WorldWideScience

Sample records for based clustering method

  1. Kernel method-based fuzzy clustering algorithm

    Institute of Scientific and Technical Information of China (English)

    Wu Zhongdong; Gao Xinbo; Xie Weixin; Yu Jianping

    2005-01-01

    The fuzzy C-means clustering algorithm (FCM) is extended to the fuzzy kernel C-means clustering algorithm (FKCM) to effectively perform cluster analysis on diverse data structures, such as non-hyperspherical data, data with noise, data with a mixture of heterogeneous cluster prototypes, asymmetric data, etc. Based on the Mercer kernel, the FKCM clustering algorithm is derived from the FCM algorithm combined with the kernel method. Experiments with synthetic and real data show that, in contrast to FCM, the FKCM clustering algorithm is more general and can effectively perform unsupervised analysis of datasets with varied structures. Kernel-based clustering is therefore expected to be one of the important research directions in fuzzy cluster analysis.
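
    As a rough illustration of the kernelized update the abstract describes, the sketch below implements a common kernel fuzzy C-means variant with an RBF (Mercer) kernel and prototypes kept in the input space; the parameter names (gamma, m, n_clusters) and the toy data are illustrative and not taken from the paper.

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    """RBF kernel k(a, b) = exp(-gamma * ||a - b||^2), evaluated row-wise."""
    return np.exp(-gamma * np.sum((a - b) ** 2, axis=-1))

def kernel_fcm(X, n_clusters=3, m=2.0, gamma=1.0, n_iter=100, seed=0):
    """Kernel fuzzy C-means with prototypes kept in the input space (a common KFCM variant)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    V = X[rng.choice(n, n_clusters, replace=False)]            # initial prototypes
    for _ in range(n_iter):
        K = np.stack([rbf(X, v, gamma) for v in V], axis=1)    # k(x_i, v_j), shape (n, c)
        d2 = np.clip(2.0 * (1.0 - K), 1e-12, None)             # kernel-induced squared distance
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)               # fuzzy membership update
        W = (U ** m) * K                                        # weights for prototype update
        V = (W.T @ X) / W.sum(axis=0)[:, None]
    return U.argmax(axis=1), V

labels, prototypes = kernel_fcm(np.random.default_rng(1).normal(size=(200, 2)))
```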

  2. Convex Decomposition Based Cluster Labeling Method for Support Vector Clustering

    Institute of Scientific and Technical Information of China (English)

    Yuan Ping; Ying-Jie Tian; Ya-Jian Zhou; Yi-Xian Yang

    2012-01-01

    Support vector clustering (SVC) is an important boundary-based clustering algorithm used in many applications for its ability to handle arbitrary cluster shapes. However, SVC's popularity is limited by its high time complexity and poor labeling performance. To overcome these problems, we present a novel, efficient, and robust convex decomposition based cluster labeling (CDCL) method based on the topological properties of the dataset. CDCL decomposes each implicit cluster into convex hulls, each comprised of a subset of support vectors (SVs). Using a robust algorithm applied to the nearest neighboring convex hulls, the adjacency matrix of convex hulls is built up to find the connected components, and the remaining data points are then assigned the label of the nearest convex hull. The approach's validity is guaranteed by geometric proofs. Time complexity analysis and comparative experiments suggest that CDCL significantly improves both efficiency and clustering quality.

  3. A dynamic fuzzy clustering method based on genetic algorithm

    Institute of Scientific and Technical Information of China (English)

    ZHENG Yan; ZHOU Chunguang; LIANG Yanchun; GUO Dongwei

    2003-01-01

    A dynamic fuzzy clustering method based on the genetic algorithm is presented. By calculating the fuzzy dissimilarity between samples, the essential associations among samples are modeled faithfully. The fuzzy dissimilarity between two samples is mapped onto their Euclidean distance, that is, the high-dimensional samples are mapped onto the two-dimensional plane. The mapping is optimized globally by the genetic algorithm, which adjusts the coordinates of each sample, and thus the Euclidean distances, to gradually approximate the fuzzy dissimilarities between samples. A key advantage of the proposed method is that the clustering is independent of the spatial distribution of the input samples, which improves flexibility and visualization. The method converges faster and clusters more accurately than some typical clustering algorithms. Simulated experiments show the feasibility and effectiveness of the proposed method.
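
    A minimal sketch of the idea, assuming a simple mutation-and-selection loop rather than the authors' exact genetic operators: 2-D coordinates are evolved so that their Euclidean distances approximate a given fuzzy dissimilarity matrix.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def evolve_embedding(D, pop_size=40, n_gen=300, sigma=0.1, seed=0):
    """Evolve 2-D coordinates whose Euclidean distances approximate dissimilarity matrix D."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    pop = rng.normal(size=(pop_size, n, 2))                      # population of candidate layouts

    def stress(coords):                                          # fit to the target dissimilarities
        return np.sum((squareform(pdist(coords)) - D) ** 2)

    for _ in range(n_gen):
        children = pop + rng.normal(scale=sigma, size=pop.shape)  # Gaussian "mutation"
        both = np.concatenate([pop, children])
        fitness = np.array([stress(c) for c in both])
        pop = both[np.argsort(fitness)[:pop_size]]               # keep the best layouts
    return pop[0]                                                # best 2-D embedding found

# toy fuzzy dissimilarity matrix for three samples (symmetric, zero diagonal)
D = np.array([[0.0, 0.2, 0.9],
              [0.2, 0.0, 0.8],
              [0.9, 0.8, 0.0]])
coords = evolve_embedding(D)
```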

  4. PERFORMANCE ANALYSIS OF CLUSTERING BASED IMAGE SEGMENTATION AND OPTIMIZATION METHODS

    Directory of Open Access Journals (Sweden)

    Jaskirat kaur

    2012-05-01

    Full Text Available Partitioning an image into several constituent components is called image segmentation. Myriad algorithms using different methods have been proposed for image segmentation, and many clustering algorithms and optimization techniques are also used to segment images. A major challenge in segmentation evaluation comes from the fundamental conflict between generality and objectivity. With the glut of image segmentation techniques available today, the customer who actually uses these techniques can easily be overwhelmed. To address this problem, this paper evaluates several image segmentation techniques based on their consistency across different applications, and quantifies different clustering algorithms based on the parameters they use.

  5. Super pixel density based clustering automatic image classification method

    Science.gov (United States)

    Xu, Mingxing; Zhang, Chuan; Zhang, Tianxu

    2015-12-01

    Image classification is an important means of image segmentation and data mining, and achieving rapid automated image classification has been a focus of research. This paper proposes a superpixel density-of-cluster-centers algorithm for automatic image classification and outlier identification. Image pixel location coordinates and gray values are used to compute a density and a distance for each point, enabling automatic classification and outlier extraction. Because a large number of pixels dramatically increases the computational complexity, the image is preprocessed into a small number of superpixel sub-blocks before the density and distance calculations. A normalized density-and-distance discrimination rule is then designed to select cluster centers automatically, whereby the image is classified and outliers are identified. Extensive experiments show that the method requires no human intervention, categorizes images automatically, runs faster than the density clustering algorithm, and effectively performs automated classification and outlier extraction.
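
    The density-and-distance rule the abstract alludes to resembles density-peaks clustering; the sketch below is a minimal version of that computation on toy pixel triples, omitting the superpixel pre-segmentation step, with an illustrative cutoff dc and centre count.

```python
import numpy as np

def density_and_distance(features, dc=2.0):
    """For each point: local density rho (neighbours within dc) and distance delta to the
    nearest denser point. Large rho*delta marks candidate cluster centres; small rho with
    large delta marks candidate outliers."""
    d = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    rho = (d < dc).sum(axis=1) - 1
    delta = np.empty(len(features))
    for i in range(len(features)):
        denser = np.where(rho > rho[i])[0]
        delta[i] = d[i, denser].min() if len(denser) else d[i].max()
    return rho, delta

# toy "pixels": (row, col, grey value) triples, as used in the abstract
pixels = np.random.default_rng(0).uniform(0, 10, size=(300, 3))
rho, delta = density_and_distance(pixels)
centres = np.argsort(rho * delta)[-3:]          # points maximising the normalised criterion
```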

  6. Density-based clustering method in the moving object database

    Institute of Scientific and Technical Information of China (English)

    ZHOU Xing; XIANG Shu; GE Jun-wei; LIU Zhao-hong; BAE Hae-young

    2004-01-01

    With the rapid advance of wireless communication, tracking the positions of moving objects is becoming increasingly feasible and necessary. Because a large number of people use mobile phones, we must handle a large moving object database and the problems that come with it. How can we provide customers with high-quality service, that is, how can we answer so many queries in as little time as possible? Given the large amount of data, the gap between CPU speed and the size of main memory has grown considerably. One way to reduce query handling time is to reduce the number of I/O operations between the buffer and secondary storage, and an effective clustering of the objects can minimize this I/O cost. In this paper, according to the characteristics of the moving object database, we analyze the objects in the buffer according to their mappings in two-dimensional coordinates, and then develop a density-based clustering method to effectively reorganize the clusters. This mechanism leads to lower I/O cost and more efficient responses to queries.
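
    A minimal sketch of the buffering idea, using scikit-learn's DBSCAN as a stand-in for the paper's own density-based method; eps, min_samples, and the toy positions are illustrative.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# 2-D positions of moving objects currently held in the buffer (toy data)
rng = np.random.default_rng(0)
positions = np.vstack([rng.normal(loc, 0.5, size=(100, 2)) for loc in [(0, 0), (8, 8), (0, 8)]])

# group spatially close objects so each cluster can be written to contiguous pages,
# reducing I/O between the buffer and secondary storage
labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(positions)
clusters = {k: positions[labels == k] for k in set(labels) if k != -1}   # -1 marks noise
```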

  7. Urban Fire Risk Clustering Method Based on Fire Statistics

    Institute of Scientific and Technical Information of China (English)

    WU Lizhi; REN Aizhu

    2008-01-01

    Fire statistics and fire analysis have become important ways for us to understand the laws governing fire, prevent the occurrence of fire, and improve the ability to control fire. Based on existing fire statistics, a weighted fire risk calculation method characterized by the number of fire occurrences, direct economic losses, and fire casualties is put forward. On the basis of this method, and having improved the K-means clustering algorithm, this paper establishes a fire risk K-means clustering model that better resolves the problem of automatically classifying fire risk. Fire risk clusters should be classified by the absolute distance to the target instead of the relative distance used in the traditional clustering algorithm. Finally, to apply the established model, this paper carries out fire risk clustering on fire statistics from January 2000 to December 2004 for Shenyang, China. This research provides technical support for urban fire management.
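
    A minimal sketch of the weighted-risk-plus-clustering idea; the weights on occurrence count, economic loss, and casualties are illustrative, and scikit-learn's standard KMeans stands in for the paper's improved K-means with absolute distances.

```python
import numpy as np
from sklearn.cluster import KMeans

# per-district fire statistics: [fire count, direct economic loss, casualties] (toy data)
stats = np.array([[120, 3.5, 2], [40, 0.8, 0], [200, 9.1, 5], [75, 2.0, 1], [15, 0.2, 0]], float)

# weighted fire risk score; these weights are illustrative, not the paper's values
weights = np.array([0.4, 0.4, 0.2])
scaled = (stats - stats.min(axis=0)) / (np.ptp(stats, axis=0) + 1e-12)
risk = scaled @ weights

# group districts into low / medium / high fire-risk classes
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(risk.reshape(-1, 1))
```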

  8. Clustering method based on data division and partition

    Institute of Scientific and Technical Information of China (English)

    卢志茂; 刘晨; 张春祥; 王蕾

    2014-01-01

    Many classical clustering algorithms do a good job when their prerequisites are met but do not scale well when applied to very large data sets (VLDS). In this work, a novel division and partition clustering method (DP) is proposed to solve the problem. DP cuts the source data set into data blocks and extracts an eigenvector for each data block to form the local feature set. The local feature set is used in a second round of characteristic aggregation over the source data to find the global eigenvector. Finally, according to the global eigenvector, the data set is assigned by the criterion of minimum distance. The experimental results show that it is more robust than conventional clustering algorithms. Its insensitivity to data dimensionality, distribution, and the number of natural clusters gives it a wide range of applications in clustering VLDS.
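
    A minimal sketch of the division-and-partition flow under stated assumptions: block "eigenvectors" are approximated here by local k-means centroids, the global eigenvector by clustering those centroids, and final labels come from minimum distance, as in the abstract.

```python
import numpy as np
from sklearn.cluster import KMeans

def dp_cluster(X, n_blocks=10, local_k=5, global_k=3, seed=0):
    """Divide X into blocks, extract local centroids, cluster them globally,
    then assign every point to the nearest global centroid."""
    rng = np.random.default_rng(seed)
    blocks = np.array_split(rng.permutation(X), n_blocks)
    local = np.vstack([KMeans(local_k, n_init=5, random_state=seed).fit(b).cluster_centers_
                       for b in blocks])                        # local feature set
    global_centres = KMeans(global_k, n_init=10, random_state=seed).fit(local).cluster_centers_
    d = np.linalg.norm(X[:, None, :] - global_centres[None, :, :], axis=-1)
    return d.argmin(axis=1)                                     # minimum-distance assignment

labels = dp_cluster(np.random.default_rng(1).normal(size=(5000, 4)))
```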

  9. Color Image Segmentation Method Based on Improved Spectral Clustering Algorithm

    Directory of Open Access Journals (Sweden)

    Dong Qin

    2014-08-01

    Full Text Available Addressing the high sparsity of image data and the problem of determining the number of clusters, we put forward a color image segmentation algorithm that combines semi-supervised machine learning with spectral graph theory. Building on the theory and methods of spectral clustering, we introduce the concept of information entropy to design a method that automatically optimizes the scale parameter, avoiding the instability in clustering results caused by manually chosen scale parameters. In addition, we mine the available prior information existing in the large amount of non-rare-class data and apply a semi-supervised algorithm to improve the clustering performance for the rare class. We also use the added labeled data to compute the similarity matrix and perform clustering with the FKCM algorithm. Simulations on standard datasets and image segmentation experiments demonstrate that our algorithm overcomes the defects of traditional spectral clustering methods, which are sensitive to outliers, prone to local optima, and slow to converge.

  10. Spectral methods and cluster structure in correlation-based networks

    Science.gov (United States)

    Heimo, Tapio; Tibély, Gergely; Saramäki, Jari; Kaski, Kimmo; Kertész, János

    2008-10-01

    We investigate how in complex systems the eigenpairs of the matrices derived from the correlations of multichannel observations reflect the cluster structure of the underlying networks. For this we use daily return data from the NYSE and focus specifically on the spectral properties of weight matrices W_ij = |C_ij| − δ_ij and diffusion matrices D_ij = W_ij/s_j − δ_ij, where C is the correlation matrix and s_i = ∑_j W_ij is the strength of node i. The eigenvalues (and corresponding eigenvectors) of the weight matrix are ranked in descending order. As in the earlier observations, the first eigenvector stands for a measure of the market correlations. Its components are, to first approximation, equal to the strengths of the nodes and there is a second order, roughly linear, correction. The high ranking eigenvectors, excluding the highest ranking one, are usually assigned to market sectors and industrial branches. Our study shows that both for weight and diffusion matrices the eigenpair analysis is not capable of easily deducing the cluster structure of the network without a priori knowledge. In addition we have studied the clustering of stocks using the asset graph approach with and without spectrum based noise filtering. It turns out that asset graphs are quite insensitive to noise and there is no sharp percolation transition as a function of the ratio of bonds included, thus no natural threshold value for that ratio seems to exist. We suggest that these observations can be of use for other correlation based networks as well.
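
    A minimal sketch of the construction above on a toy return matrix; W and D follow the definitions given in the abstract and the eigenpairs of W are ranked in descending order.

```python
import numpy as np

rng = np.random.default_rng(0)
returns = rng.normal(size=(250, 30))           # 250 trading days x 30 stocks (toy data)

C = np.corrcoef(returns, rowvar=False)         # correlation matrix
W = np.abs(C) - np.eye(len(C))                 # W_ij = |C_ij| - delta_ij
s = W.sum(axis=0)                              # node strengths s_j
D = W / s - np.eye(len(C))                     # D_ij = W_ij / s_j - delta_ij

eigval, eigvec = np.linalg.eigh(W)             # W is symmetric
order = np.argsort(eigval)[::-1]               # rank eigenpairs in descending order
market_mode = eigvec[:, order[0]]              # first eigenvector ~ overall market correlations
```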

  11. Image Clustering Method Based on Density Maps Derived from Self-Organizing Mapping: SOM

    Directory of Open Access Journals (Sweden)

    Kohei Arai

    2012-07-01

    Full Text Available A new method for image clustering with density maps derived from Self-Organizing Maps (SOM) is proposed, together with a clarification of the learning processes during the construction of clusters. The proposed SOM-based image clustering method is found to show much better clustering results for both simulated and real satellite imagery data. The separability among clusters of the proposed method is also found to be 16% larger than that of the existing k-means clustering. In accordance with the experimental results with the Landsat-5 TM image, convergence of the SOM learning processes takes more than 20,000 iterations.
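
    A minimal SOM sketch in plain NumPy (not the authors' implementation); the density map discussed above corresponds to the per-node hit counts computed at the end. The map size and learning parameters are illustrative.

```python
import numpy as np

def train_som(X, rows=8, cols=8, n_iter=20000, lr0=0.5, sigma0=3.0, seed=0):
    """Train a small rectangular SOM; returns the weight grid."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(X.min(), X.max(), size=(rows, cols, X.shape[1]))
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    for t in range(n_iter):
        x = X[rng.integers(len(X))]
        bmu = np.unravel_index(np.argmin(((W - x) ** 2).sum(-1)), (rows, cols))
        lr = lr0 * np.exp(-t / n_iter)                     # decaying learning rate
        sigma = sigma0 * np.exp(-t / n_iter)               # shrinking neighbourhood
        h = np.exp(-((grid - np.array(bmu)) ** 2).sum(-1) / (2 * sigma ** 2))
        W += lr * h[..., None] * (x - W)                   # pull neighbourhood towards sample
    return W

pixels = np.random.default_rng(1).uniform(0, 1, size=(2000, 3))   # toy image pixels (RGB)
W = train_som(pixels)

# density map: how many pixels hit each SOM node
hits = np.zeros(W.shape[:2], int)
for x in pixels:
    hits[np.unravel_index(np.argmin(((W - x) ** 2).sum(-1)), W.shape[:2])] += 1
```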

  12. A data structure and function classification based method to evaluate clustering models for gene expression data

    Institute of Scientific and Technical Information of China (English)

    YI Dong; YANG Meng-su; HUANG Ming-hui; LI Hui-zhi; WANG Wen-chang

    2002-01-01

    Objective: To establish a systematic framework for selecting the best clustering algorithm and provide an evaluation method for clustering analyses of gene expression data. Methods: Based on data structure (internal information) and function classification (external information), the evaluation of gene expression data analyses was carried out using two approaches. First, to assess the predictive power of clustering algorithms, entropy was introduced to measure the consistency between the clustering results from different algorithms and the known, validated functional classifications. Second, a modified figure of merit (adjusted FOM) was used as the internal assessment method: one clustering algorithm was used to analyze all data except one experimental condition, and the remaining condition was used to assess the predictive power of the resulting clusters. This method was applied to 3 gene expression data sets (2 from Iyer's serum data sets, and 1 from Ferea's Saccharomyces cerevisiae data set). Results: A method based on entropy and figure of merit (FOM) was proposed to explore the results obtained for the 3 data sets by 6 different algorithms; SOM and fuzzy clustering methods were confirmed to possess the highest clustering ability. Conclusion: A method based on entropy is proposed for evaluating clustering analyses. Different results are obtained when evaluating the same data set with different function classifications. According to the adjusted FOM and entropy-FOM curves, SOM and fuzzy clustering methods show the highest clustering ability on the 3 data sets.
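
    A minimal sketch of the entropy-based external assessment: for each cluster, the entropy of the known functional classes it contains is computed and averaged with cluster-size weights, so lower values mean better agreement with the functional classification. The function and data are illustrative.

```python
import numpy as np

def clustering_entropy(cluster_labels, functional_classes):
    """Size-weighted average entropy of functional classes within each cluster."""
    cluster_labels = np.asarray(cluster_labels)
    functional_classes = np.asarray(functional_classes)
    total = len(cluster_labels)
    H = 0.0
    for c in np.unique(cluster_labels):
        classes = functional_classes[cluster_labels == c]
        p = np.bincount(classes) / len(classes)
        p = p[p > 0]
        H += (len(classes) / total) * -(p * np.log2(p)).sum()
    return H

# toy example: 8 genes, 2 clusters, known functional classes 0/1/2
print(clustering_entropy([0, 0, 0, 0, 1, 1, 1, 1], [0, 0, 1, 0, 2, 2, 2, 1]))
```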

  13. AN ADAPTIVE GRID-BASED METHOD FOR CLUSTERING MULTIDIMENSIONAL ONLINE DATA STREAMS

    Directory of Open Access Journals (Sweden)

    Toktam Dehghani

    2012-10-01

    Full Text Available Clustering is an important task in mining evolving data streams. Many data streams are high dimensional in nature, and clustering in a high-dimensional space is a complex problem that is inherently harder for data streams. Most data stream clustering methods cannot deal with high-dimensional data streams and therefore sacrifice the accuracy of the clusters. To solve this problem we propose an adaptive grid-based clustering method. Our focus is on providing up-to-date, arbitrarily shaped clusters while improving the processing time and bounding the amount of memory usage. In our method (B+C tree), a structure called the "B+cell tree" is used to keep recent information about a data stream. To reduce the complexity of clustering, a structure called the "cluster tree" is proposed to maintain multidimensional clusters. A cluster tree yields high-quality clusters by keeping the boundaries of clusters in a semi-optimal way; it captures the dynamic changes of data streams and adjusts the clusters. Our performance study over a number of real and synthetic data streams demonstrates the scalability of the algorithm with respect to the number of dimensions and the amount of data without sacrificing the accuracy of the identified clusters.

  14. Clustering scientific publications based on citation relations: A systematic comparison of different methods

    CERN Document Server

    Šubelj, Lovro; Waltman, Ludo

    2015-01-01

    Clustering methods are applied regularly in the bibliometric literature to identify research areas or scientific fields. These methods are for instance used to group publications into clusters based on their relations in a citation network. In the network science literature, many clustering methods, often referred to as graph partitioning or community detection techniques, have been developed. Focusing on the problem of clustering the publications in a citation network, we present a systematic comparison of the performance of a large number of these clustering methods. Using a number of different citation networks, some of them relatively small and others very large, we extensively study the statistical properties of the results provided by different methods. In addition, we also carry out an expert-based assessment of the results produced by different methods. The expert-based assessment focuses on publications in the field of scientometrics. Our findings seem to indicate that there is a trade-off between di...

  15. An incremental DPMM-based method for trajectory clustering, modeling, and retrieval.

    Science.gov (United States)

    Hu, Weiming; Li, Xi; Tian, Guodong; Maybank, Stephen; Zhang, Zhongfei

    2013-05-01

    Trajectory analysis is the basis for many applications, such as indexing of motion events in videos, activity recognition, and surveillance. In this paper, the Dirichlet process mixture model (DPMM) is applied to trajectory clustering, modeling, and retrieval. We propose an incremental version of a DPMM-based clustering algorithm and apply it to cluster trajectories. An appropriate number of trajectory clusters is determined automatically. When trajectories belonging to new clusters arrive, the new clusters can be identified online and added to the model without any retraining using the previous data. A time-sensitive Dirichlet process mixture model (tDPMM) is applied to each trajectory cluster for learning the trajectory pattern which represents the time-series characteristics of the trajectories in the cluster. Then, a parameterized index is constructed for each cluster. A novel likelihood estimation algorithm for the tDPMM is proposed, and a trajectory-based video retrieval model is developed. The tDPMM-based probabilistic matching method and the DPMM-based model growing method are combined to make the retrieval model scalable and adaptable. Experimental comparisons with state-of-the-art algorithms demonstrate the effectiveness of our algorithm. PMID:23520251

  16. Clustering Scientific Publications Based on Citation Relations: A Systematic Comparison of Different Methods.

    Science.gov (United States)

    Šubelj, Lovro; van Eck, Nees Jan; Waltman, Ludo

    2016-01-01

    Clustering methods are applied regularly in the bibliometric literature to identify research areas or scientific fields. These methods are for instance used to group publications into clusters based on their relations in a citation network. In the network science literature, many clustering methods, often referred to as graph partitioning or community detection techniques, have been developed. Focusing on the problem of clustering the publications in a citation network, we present a systematic comparison of the performance of a large number of these clustering methods. Using a number of different citation networks, some of them relatively small and others very large, we extensively study the statistical properties of the results provided by different methods. In addition, we also carry out an expert-based assessment of the results produced by different methods. The expert-based assessment focuses on publications in the field of scientometrics. Our findings seem to indicate that there is a trade-off between different properties that may be considered desirable for a good clustering of publications. Overall, map equation methods appear to perform best in our analysis, suggesting that these methods deserve more attention from the bibliometric community.

  17. A clustering based method to evaluate soil corrosivity for pipeline external integrity management

    International Nuclear Information System (INIS)

    One important category of transportation infrastructure is underground pipelines. Corrosion of these buried pipeline systems may cause pipeline failures with the attendant hazards of property loss and fatalities. Therefore, developing the capability to estimate the soil corrosivity is important for designing and preserving materials and for risk assessment. The deterioration rate of metal is highly influenced by the physicochemical characteristics of a material and the environment of its surroundings. In this study, the field data obtained from the southeast region of Mexico was examined using various data mining techniques to determine the usefulness of these techniques for clustering soil corrosivity level. Specifically, the soil was classified into different corrosivity level clusters by k-means and Gaussian mixture model (GMM). In terms of physical space, GMM shows better separability; therefore, the distributions of the material loss of the buried petroleum pipeline walls were estimated via the empirical density within GMM clusters. The soil corrosivity levels of the clusters were determined based on the medians of metal loss. The proposed clustering method was demonstrated to be capable of classifying the soil into different levels of corrosivity severity.
    Highlights:
    • The clustering approach is applied to the data extracted from a real-life pipeline system.
    • Soil properties in the right-of-way are analyzed via clustering techniques to assess corrosivity.
    • GMM is selected as the preferred method for detecting the hidden pattern of in-situ data.
    • K–W test is performed for significant difference of corrosivity level between clusters.
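
    A minimal sketch of the clustering step on toy soil-property features, with scikit-learn's GaussianMixture standing in for the GMM and cluster corrosivity levels ordered by median metal loss, as described above.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# toy soil features per site: [resistivity, pH, moisture], plus observed wall metal loss
soil = np.column_stack([rng.normal(50, 20, 300), rng.normal(7, 1, 300), rng.normal(20, 5, 300)])
metal_loss = rng.gamma(2.0, 0.5, 300)

gmm = GaussianMixture(n_components=3, random_state=0).fit(soil)
labels = gmm.predict(soil)

# rank clusters by median metal loss to assign corrosivity severity levels
medians = {k: np.median(metal_loss[labels == k]) for k in range(3)}
severity_order = sorted(medians, key=medians.get)   # least to most corrosive cluster
```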

  18. A method for context-based adaptive QRS clustering in real-time

    CERN Document Server

    Castro, Daniel; Presedo, Jesús

    2014-01-01

    Continuous follow-up of heart condition through long-term electrocardiogram monitoring is an invaluable tool for diagnosing some cardiac arrhythmias. In such context, providing tools for fast locating alterations of normal conduction patterns is mandatory and still remains an open issue. This work presents a real-time method for adaptive clustering QRS complexes from multilead ECG signals that provides the set of QRS morphologies that appear during an ECG recording. The method processes the QRS complexes sequentially, grouping them into a dynamic set of clusters based on the information content of the temporal context. The clusters are represented by templates which evolve over time and adapt to the QRS morphology changes. Rules to create, merge and remove clusters are defined along with techniques for noise detection in order to avoid their proliferation. To cope with beat misalignment, Derivative Dynamic Time Warping is used. The proposed method has been validated against the MIT-BIH Arrhythmia Database and...

  19. A scale-independent clustering method with automatic variable selection based on trees

    OpenAIRE

    Lynch, Sarah K.

    2014-01-01

    Approved for public release; distribution is unlimited. Clustering is the process of putting observations into groups based on their distance, or dissimilarity, from one another. Measuring distance for continuous variables often requires scaling or monotonic transformation. Determining dissimilarity when observations have both continuous and categorical measurements can be difficult because each type of measurement must be approached differently. We introduce a new clustering method that u...

  20. An effective trust-based recommendation method using a novel graph clustering algorithm

    Science.gov (United States)

    Moradi, Parham; Ahmadian, Sajad; Akhlaghian, Fardin

    2015-10-01

    Recommender systems are programs that aim to provide personalized recommendations to users for specific items (e.g. music, books) in online sharing communities or on e-commerce sites. Collaborative filtering methods are important and widely accepted types of recommender systems that generate recommendations based on the ratings of like-minded users. On the other hand, these systems confront several inherent issues such as data sparsity and cold start problems, caused by fewer ratings against the unknowns that need to be predicted. Incorporating trust information into the collaborative filtering systems is an attractive approach to resolve these problems. In this paper, we present a model-based collaborative filtering method by applying a novel graph clustering algorithm and also considering trust statements. In the proposed method first of all, the problem space is represented as a graph and then a sparsest subgraph finding algorithm is applied on the graph to find the initial cluster centers. Then, the proposed graph clustering algorithm is performed to obtain the appropriate users/items clusters. Finally, the identified clusters are used as a set of neighbors to recommend unseen items to the current active user. Experimental results based on three real-world datasets demonstrate that the proposed method outperforms several state-of-the-art recommender system methods.

  1. A semantics-based method for clustering of Chinese web search results

    Science.gov (United States)

    Zhang, Hui; Wang, Deqing; Wang, Li; Bi, Zhuming; Chen, Yong

    2014-01-01

    Information explosion is a critical challenge to the development of modern information systems. In particular, when an information system operates over the Internet, the amount of information on the web has been increasing exponentially and rapidly. Search engines, such as Google and Baidu, are essential tools for people to find information on the Internet. Valuable information, however, is still likely to be submerged in the ocean of search results from those tools. By automatically clustering the results into different groups based on subjects, a search engine with a clustering feature allows users to select the most relevant results quickly. In this paper, we propose an online semantics-based method to cluster Chinese web search results. First, we employ the generalised suffix tree to extract the longest common substrings (LCSs) from search snippets. Second, we use HowNet to calculate the similarities of the words derived from the LCSs, and extract the most representative features by constructing the vocabulary chain. Third, we construct a vector of text features and calculate snippets' semantic similarities. Finally, we improve the Chameleon algorithm to cluster snippets. Extensive experimental results show that the proposed algorithm outperforms the suffix tree clustering method and other traditional clustering methods.

  2. An efficient method of key-frame extraction based on a cluster algorithm.

    Science.gov (United States)

    Zhang, Qiang; Yu, Shao-Pei; Zhou, Dong-Sheng; Wei, Xiao-Peng

    2013-12-18

    This paper proposes a novel method of key-frame extraction for use with motion capture data. This method is based on an unsupervised cluster algorithm. First, the motion sequence is clustered into two classes by the similarity distance of the adjacent frames so that the thresholds needed in the next step can be determined adaptively. Second, a dynamic cluster algorithm called ISODATA is used to cluster all the frames and the frames nearest to the center of each class are automatically extracted as key-frames of the sequence. Unlike many other clustering techniques, the present improved cluster algorithm can automatically address different motion types without any need for specified parameters from users. The proposed method is capable of summarizing motion capture data reliably and efficiently. The present work also provides a meaningful comparison between the results of the proposed key-frame extraction technique and other previous methods. These results are evaluated in terms of metrics that measure reconstructed motion and the mean absolute error value, which are derived from the reconstructed data and the original data.
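
    A minimal sketch of cluster-based key-frame selection, using scikit-learn's KMeans in place of the paper's adaptive ISODATA variant; the frame nearest each cluster centre is taken as a key-frame, and the feature extraction and adaptive thresholding steps are omitted.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
frames = rng.normal(size=(500, 60))                # toy pose/feature vectors, one per frame

k = 8                                              # ISODATA would choose this adaptively
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(frames)

# pick, for each cluster, the frame closest to the cluster centre as a key-frame
d = np.linalg.norm(frames[:, None, :] - km.cluster_centers_[None, :, :], axis=-1)
key_frames = sorted(int(d[:, j].argmin()) for j in range(k))
```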

  3. Improved fuzzy identification method based on Hough transformation and fuzzy clustering

    Institute of Scientific and Technical Information of China (English)

    刘福才; 路平立; 潘江华; 裴润

    2004-01-01

    This paper presents an approach that is useful for the identification of a fuzzy model of a SISO system. The initial values of the cluster centers are identified by the Hough transformation, which considers the linearity and continuity of the given input-output data, respectively. For identification of the premise-part parameters, we use the fuzzy C-means clustering method. The consequent parameters are identified by recursive least squares. This method not only makes the approximation more accurate, but also makes the computation simpler and the procedure easier to realize. Finally, simulation shows that this method is useful for the identification of a fuzzy model.

  4. A polymerization-based method to construct a plasmid containing clustered DNA damage and a mismatch.

    Science.gov (United States)

    Takahashi, Momoko; Akamatsu, Ken; Shikazono, Naoya

    2016-10-01

    Exposure of biological materials to ionizing radiation often induces clustered DNA damage. The mutagenicity of clustered DNA damage can be analyzed with plasmids carrying a clustered DNA damage site, in which the strand bias of a replicating plasmid (i.e., the degree to which each of the two strands of the plasmid are used as the template for replication of the plasmid) can help to clarify how clustered DNA damage enhances the mutagenic potential of comprising lesions. Placement of a mismatch near a clustered DNA damage site can help to determine the strand bias, but present plasmid-based methods do not allow insertion of a mismatch at a given site in the plasmid. Here, we describe a polymerization-based method for constructing a plasmid containing clustered DNA lesions and a mismatch. The presence of a DNA lesion and a mismatch in the plasmid was verified by enzymatic treatment and by determining the relative abundance of the progeny plasmids derived from each of the two strands of the plasmid. PMID:27449134

  5. A New Keyphrases Extraction Method Based on Suffix Tree Data Structure for Arabic Documents Clustering

    Directory of Open Access Journals (Sweden)

    Issam SAHMOUDI

    2013-12-01

    Full Text Available Document clustering is a branch of a larger area of scientific study known as data mining, which is an unsupervised classification used to find structure in a collection of unlabeled data. The useful information in the documents can be accompanied by a large amount of noise words when using the Full Text Representation, which negatively affects the result of the clustering process. There is therefore a great need to eliminate the noise words and keep only the useful information in order to enhance the quality of the clustering results. This problem occurs to different degrees for any language, such as English, European languages, Hindi, Chinese, and Arabic. To overcome this problem, in this paper we propose a new and efficient keyphrase extraction method based on the suffix tree data structure (KpST); the extracted keyphrases are then used in the clustering process instead of the Full Text Representation. The proposed method for keyphrase extraction is language independent and may therefore be applied to any language. In this investigation, we are interested in the Arabic language, which is one of the most complex languages. To evaluate our method, we conduct an experimental study on Arabic documents using the most popular clustering approach of hierarchical algorithms: the agglomerative hierarchical algorithm with seven linkage techniques and a variety of distance functions and similarity measures to perform the Arabic document clustering task. The obtained results show that our method for extracting keyphrases increases the quality of the clustering results. We also propose to study the effect of using stemming on the test dataset, clustering it with the same document clustering techniques and similarity/distance measures.

  6. New Clustering Method in High-Dimensional Space Based on Hypergraph-Models

    Institute of Scientific and Technical Information of China (English)

    CHEN Jian-bin; WANG Shu-jing; SONG Han-tao

    2006-01-01

    To overcome the limitation of traditional clustering algorithms, which fail to produce meaningful clusters in high-dimensional, sparse, binary-valued data sets, a new method based on a hypergraph model is proposed. The hypergraph model maps the relationships present in the original high-dimensional data into a hypergraph, where a hyperedge represents the similarity of the attribute-value distribution between two points. A hypergraph partitioning algorithm is used to find a partitioning of the vertices such that the corresponding data items in each partition are highly related and the weight of the hyperedges cut by the partitioning is minimized. The quality of the clustering result can be evaluated by applying the intra-cluster singularity value. Analysis and experimental results demonstrate that this approach is applicable and effective in a wide range of settings.

  7. A Cluster-based Method to Map Urban Area from DMSP/OLS Nightlights

    Energy Technology Data Exchange (ETDEWEB)

    Zhou, Yuyu; Smith, Steven J.; Elvidge, Christopher; Zhao, Kaiguang; Thomson, Allison M.; Imhoff, Marc L.

    2014-05-05

    Accurate information of urban areas at regional and global scales is important for both the science and policy-making communities. The Defense Meteorological Satellite Program/Operational Linescan System (DMSP/OLS) nighttime stable light data (NTL) provide a potential way to map urban area and its dynamics economically and timely. In this study, we developed a cluster-based method to estimate the optimal thresholds and map urban extents from the DMSP/OLS NTL data in five major steps, including data preprocessing, urban cluster segmentation, logistic model development, threshold estimation, and urban extent delineation. Different from previous fixed threshold method with over- and under-estimation issues, in our method the optimal thresholds are estimated based on cluster size and overall nightlight magnitude in the cluster, and they vary with clusters. Two large countries of United States and China with different urbanization patterns were selected to map urban extents using the proposed method. The result indicates that the urbanized area occupies about 2% of total land area in the US ranging from lower than 0.5% to higher than 10% at the state level, and less than 1% in China, ranging from lower than 0.1% to about 5% at the province level with some municipalities as high as 10%. The derived thresholds and urban extents were evaluated using high-resolution land cover data at the cluster and regional levels. It was found that our method can map urban area in both countries efficiently and accurately. Compared to previous threshold techniques, our method reduces the over- and under-estimation issues, when mapping urban extent over a large area. More important, our method shows its potential to map global urban extents and temporal dynamics using the DMSP/OLS NTL data in a timely, cost-effective way.

  8. Characterization of a Bayesian genetic clustering algorithm based on a Dirichlet process prior and comparison among Bayesian clustering methods

    Directory of Open Access Journals (Sweden)

    Morita Mitsuo

    2011-06-01

    Full Text Available Abstract Background A Bayesian approach based on a Dirichlet process (DP) prior is useful for inferring genetic population structures because it can infer the number of populations and the assignment of individuals simultaneously. However, the properties of the DP prior method are not well understood, and therefore, the use of this method is relatively uncommon. We characterized the DP prior method to increase its practical use. Results First, we evaluated the usefulness of the sequentially-allocated merge-split (SAMS) sampler, which is a technique for improving the mixing of Markov chain Monte Carlo algorithms. Although this sampler has been implemented in a preceding program, HWLER, its effectiveness has not been investigated. We showed that this sampler was effective for population structure analysis. Implementation of this sampler was useful with regard to the accuracy of inference and computational time. Second, we examined the effect of a hyperparameter for the prior distribution of allele frequencies and showed that the specification of this parameter was important and could be resolved by considering the parameter as a variable. Third, we compared the DP prior method with other Bayesian clustering methods and showed that the DP prior method was suitable for data sets with unbalanced sample sizes among populations. In contrast, although current popular algorithms for population structure analysis, such as those implemented in STRUCTURE, were suitable for data sets with uniform sample sizes, inferences with these algorithms for unbalanced sample sizes tended to be less accurate than those with the DP prior method. Conclusions The clustering method based on the DP prior was found to be useful because it can infer the number of populations and simultaneously assign individuals into populations, and it is suitable for data sets with unbalanced sample sizes among populations. Here we presented a novel program, DPART, that implements the SAMS
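
    For a concrete feel of a Dirichlet-process-prior mixture, the sketch below uses scikit-learn's BayesianGaussianMixture as a stand-in (it does not reproduce the DPART program or the SAMS sampler); the effective number of populations is read off from the non-negligible mixture weights.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# toy data: two populations with unbalanced sample sizes (200 vs 20)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 5)), rng.normal(4.0, 1.0, size=(20, 5))])

dp = BayesianGaussianMixture(
    n_components=10,                                  # upper bound, not the final count
    weight_concentration_prior_type="dirichlet_process",
    random_state=0,
).fit(X)

labels = dp.predict(X)
n_populations = int((dp.weights_ > 0.05).sum())       # clusters with non-negligible weight
```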

  9. Unconventional methods for clustering

    Science.gov (United States)

    Kotyrba, Martin

    2016-06-01

    Cluster analysis, or clustering, is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). It is the main task of exploratory data mining and a common technique for statistical data analysis used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics. The topic of this paper is one of the modern methods of clustering, namely SOM (Self-Organising Map). The paper describes the theory needed to understand the principle of clustering and the algorithms used for clustering in our experiments.

  10. Form gene clustering method about pan-ethnic-group products based on emotional semantic

    Science.gov (United States)

    Chen, Dengkai; Ding, Jingjing; Gao, Minzhuo; Ma, Danping; Liu, Donghui

    2016-09-01

    The use of pan-ethnic-group products form knowledge primarily depends on a designer's subjective experience without user participation. The majority of studies primarily focus on the detection of the perceptual demands of consumers from the target product category. A pan-ethnic-group products form gene clustering method based on emotional semantic is constructed. Consumers' perceptual images of the pan-ethnic-group products are obtained by means of product form gene extraction and coding and computer aided product form clustering technology. A case of form gene clustering about the typical pan-ethnic-group products is investigated which indicates that the method is feasible. This paper opens up a new direction for the future development of product form design which improves the agility of product design process in the era of Industry 4.0.

  11. A novel PPGA-based clustering analysis method for business cycle indicator selection

    Institute of Scientific and Technical Information of China (English)

    Dabin ZHANG; Lean YU; Shouyang WANG; Yingwen SONG

    2009-01-01

    A new clustering analysis method based on the pseudo parallel genetic algorithm (PPGA) is proposed for business cycle indicator selection. In the proposed method, the category of each indicator is coded by real numbers, and some illegal chromosomes are repaired by the identification and restoration of empty classes. Two mutation operators, namely the discrete random mutation operator and the optimal direction mutation operator, are designed to balance the local convergence speed and the global convergence performance, and are then combined with a migration strategy and an insertion strategy. For the purpose of verification and illustration, the proposed method is compared with the K-means clustering algorithm and the standard genetic algorithm via a numerical simulation experiment. The experimental result shows the feasibility and effectiveness of the new PPGA-based clustering analysis algorithm. Meanwhile, the proposed clustering analysis algorithm is also applied to select business cycle indicators to examine the status of the macro economy. Empirical results demonstrate that the proposed method can effectively and correctly select leading, coincident, and lagging indicators to reflect the business cycle, which is of great practical value for macroeconomic administrators and business decision-makers.

  12. Centroid Based Text Clustering

    Directory of Open Access Journals (Sweden)

    Priti Maheshwari

    2010-09-01

    Full Text Available Web mining is a burgeoning new field that attempts to glean meaningful information from natural language text. Web mining refers generally to the process of extracting interesting information and knowledge from unstructured text. Text clustering is one of the important Web mining functionalities: it is the task in which texts are classified into groups of similar objects based on their contents. Current research in the area of Web mining tackles problems of text data representation, classification, clustering, information extraction, and the search for and modeling of hidden patterns. In this paper we propose that, for mining large document collections, it is necessary to pre-process the web documents and store the information in a data structure that is more appropriate for further processing than a plain web file. We developed a PHP/MySQL-based utility to convert unstructured web documents into a structured tabular representation by preprocessing and indexing. We apply a centroid-based web clustering method on the preprocessed data, using three clustering methods. Finally we propose a method that can increase accuracy based on the clustering of documents.
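
    A minimal sketch of centroid-based clustering of preprocessed documents, using scikit-learn's TfidfVectorizer and KMeans; the PHP/MySQL preprocessing and indexing utility from the paper is not reproduced, and the documents are toy examples.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = [
    "web mining extracts knowledge from unstructured text",
    "text clustering groups similar documents together",
    "databases store structured tabular records",
    "sql queries retrieve rows from relational tables",
]

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(docs)                     # structured, indexed representation
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# the cluster centroids themselves summarise each group of documents
terms = vec.get_feature_names_out()
top_terms = [terms[np.argsort(c)[::-1][:3]] for c in km.cluster_centers_]
```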

  13. Galaxy Cluster Mass Reconstruction Project: I. Methods and first results on galaxy-based techniques

    CERN Document Server

    Old, L; Pearce, F R; Croton, D; Muldrew, S I; Muñoz-Cuartas, J C; Gifford, D; Gray, M E; von der Linden, A; Mamon, G A; Merrifield, M R; Müller, V; Pearson, R J; Ponman, T J; Saro, A; Sepp, T; Sifón, C; Tempel, E; Tundo, E; Wang, Y O; Wojtak, R

    2014-01-01

    This paper is the first in a series in which we perform an extensive comparison of various galaxy-based cluster mass estimation techniques that utilise the positions, velocities and colours of galaxies. Our primary aim is to test the performance of these cluster mass estimation techniques on a diverse set of models that will increase in complexity. We begin by providing participating methods with data from a simple model that delivers idealised clusters, enabling us to quantify the underlying scatter intrinsic to these mass estimation techniques. The mock catalogue is based on a Halo Occupation Distribution (HOD) model that assumes spherical Navarro, Frenk and White (NFW) haloes truncated at R_200, with no substructure nor colour segregation, and with isotropic, isothermal Maxwellian velocities. We find that, above 10^14 M_solar, recovered cluster masses are correlated with the true underlying cluster mass with an intrinsic scatter of typically a factor of two. Below 10^14 M_solar, the scatter rises as the nu...

  14. Are fragment-based quantum chemistry methods applicable to medium-sized water clusters?

    Science.gov (United States)

    Yuan, Dandan; Shen, Xiaoling; Li, Wei; Li, Shuhua

    2016-06-28

    Fragment-based quantum chemistry methods are either based on the many-body expansion or the inclusion-exclusion principle. To compare the applicability of these two categories of methods, we have systematically evaluated the performance of the generalized energy based fragmentation (GEBF) method (J. Phys. Chem. A, 2007, 111, 2193) and the electrostatically embedded many-body (EE-MB) method (J. Chem. Theory Comput., 2007, 3, 46) for medium-sized water clusters (H2O)n (n = 10, 20, 30). Our calculations demonstrate that the GEBF method provides uniformly accurate ground-state energies for 10 low-energy isomers of three water clusters under study at a series of theory levels, while the EE-MB method (with one water molecule as a fragment and without using the cutoff distance) shows a poor convergence for (H2O)20 and (H2O)30 when the basis set contains diffuse functions. Our analysis shows that the neglect of the basis set superposition error for each subsystem has little effect on the accuracy of the GEBF method, but leads to much less accurate results for the EE-MB method. The accuracy of the EE-MB method can be dramatically improved by using an appropriate cutoff distance and using two water molecules as a fragment. For (H2O)30, the average deviation of the EE-MB method truncated up to the three-body level calculated using this strategy (relative to the conventional energies) is about 0.003 hartree at the M06-2X/6-311++G** level, while the deviation of the GEBF method with a similar computational cost is less than 0.001 hartree. The GEBF method is demonstrated to be applicable for electronic structure calculations of water clusters at any basis set. PMID:27263629
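
    For reference, a sketch of the standard many-body expansion that EE-MB-type methods truncate (typically at the two- or three-body level); GEBF instead combines energies of overlapping embedded subsystems with inclusion-exclusion-style coefficients.

```latex
% Many-body expansion of the total energy of a cluster of N monomers (e.g. water molecules):
E \;=\; \sum_{i}^{N} E_i \;+\; \sum_{i<j} \Delta E_{ij} \;+\; \sum_{i<j<k} \Delta E_{ijk} \;+\; \cdots,
\qquad \Delta E_{ij} = E_{ij} - E_i - E_j .
```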

  15. Improving cluster-based methods for investigating potential for insect pest species establishment: region-specific risk factors

    OpenAIRE

    Watts, Michael J.; Worner, Susan P.

    2011-01-01

    Existing cluster-based methods for investigating insect species assemblages or profiles of a region to indicate the risk of new insect pest invasion have a major limitation in that they assign the same species risk factors to each region in a cluster. Clearly regions assigned to the same cluster have different degrees of similarity with respect to their species profile or assemblage. This study addresses this concern by applying weighting factors to the cluster elements used to calculate regi...

  16. An Energy-Efficient Cluster-Based Vehicle Detection on Road Network Using Intention Numeration Method

    Directory of Open Access Journals (Sweden)

    Deepa Devasenapathy

    2015-01-01

    Full Text Available Traffic on the road network is increasing to an ever greater extent. Good knowledge of network traffic can minimize congestion using information about the road network obtained with the aid of communal callers, pavement detectors, and so on. Using these methods, only low-level information about the users in the road network is generated. Although existing schemes obtain urban traffic information, they fail to calculate the energy drain rate of nodes and to strike a balance between overhead and the quality of the routing protocol, which poses a great challenge. Thus, an energy-efficient cluster-based vehicle detection in road networks using the intention numeration method (CVDRN-IN) is developed. Initially, sensor nodes that detect a vehicle are grouped into separate clusters. Further, we approximate the node drain rate for a cluster using a polynomial regression function. In addition, the total node energy is estimated by taking the integral over the area. Finally, enhanced data aggregation is performed to reduce the amount of data transmission using a digital signature tree. The experimental performance is evaluated with the Dodgers loop sensor data set from the UCI repository, and the evaluation shows that the approach outperforms existing work on energy consumption, clustering efficiency, and node drain rate.

  17. A novel intrusion detection method based on OCSVM and K-means recursive clustering

    Directory of Open Access Journals (Sweden)

    Leandros A. Maglaras

    2015-01-01

    Full Text Available In this paper we present an intrusion detection module capable of detecting malicious network traffic in a SCADA (Supervisory Control and Data Acquisition) system, based on the combination of a One-Class Support Vector Machine (OCSVM) with RBF kernel and recursive k-means clustering. Important parameters of the OCSVM, such as the Gaussian width σ and the parameter ν, affect the performance of the classifier. Tuning these parameters is of great importance in order to avoid false positives and overfitting. The combination of OCSVM with recursive k-means clustering allows the proposed intrusion detection module to distinguish real alarms from possible attacks regardless of the values of the parameters σ and ν, making it ideal for real-time intrusion detection mechanisms for SCADA systems. Extensive simulations have been conducted with datasets extracted from small and medium sized HTB SCADA testbeds, in order to compare the accuracy, false alarm rate and execution time against the baseline OCSVM method.
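
    A minimal sketch of the combination described above on toy traffic features: scikit-learn's OneClassSVM (RBF kernel, parameters gamma and nu) flags candidate anomalies and a single k-means pass then splits them, a simplified stand-in for the recursive clustering step.

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
normal_traffic = rng.normal(0, 1, size=(500, 6))            # toy feature vectors
test_traffic = np.vstack([rng.normal(0, 1, size=(80, 6)),   # mostly normal ...
                          rng.normal(5, 1, size=(20, 6))])  # ... plus anomalous bursts

ocsvm = OneClassSVM(kernel="rbf", gamma=0.1, nu=0.05).fit(normal_traffic)
flags = ocsvm.predict(test_traffic)                          # +1 normal, -1 candidate alarm

# cluster the flagged records; clusters far from normal behaviour are treated as real
# alarms, clusters close to it as probable false positives
suspects = test_traffic[flags == -1]
if len(suspects) >= 2:
    groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(suspects)
```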

  18. Targets Separation and Imaging Method in Sparse Scene Based on Cluster Result of Range Profile Peaks

    Directory of Open Access Journals (Sweden)

    YANG Qiu

    2015-08-01

    Full Text Available This paper focuses on synthetic aperture radar (SAR) imaging of spatially sparse targets such as ships on the sea, and proposes a method for target separation and imaging in sparse scenes based on clustering of range profile peaks. Firstly, a wavelet de-noising algorithm is used to preprocess the original echo, and then the range profiles at different viewing positions are obtained by range compression and range migration correction. Peaks of the range profiles are detected by a fast peak detection algorithm based on the second-order difference operator. Targets with sparse energy intervals can be imaged through azimuth compression after clustering the peaks in the range dimension. Moreover, targets that are not coupled in either the range energy interval or the azimuth synthetic aperture time can be imaged through azimuth compression after clustering the peaks in both the range and azimuth dimensions. Lastly, the effectiveness of the proposed method is validated by simulations. Experimental results demonstrate that spatially sparse targets such as ships can be imaged separately and completely with little computation in azimuth compression, and the resulting images are more useful for target recognition.

  19. A Method of Clustering Components into Modules Based on Products' Functional and Structural Analysis

    Institute of Scientific and Technical Information of China (English)

    MENG Xiang-hui; JIANG Zu-hua; ZHENG Ying-fei

    2006-01-01

    Modularity is the key to improving the cost-variety trade-off in product development. To achieve the functional independency and structural independency of modules, a method of clustering components to identify modules based on functional and structural analysis was presented. Two stages were included in the method. In the first stage the products' function was analyzed to determine the primary level of modules. Then the objective function for modules identifying was formulated to achieve functional independency of modules. Finally the genetic algorithm was used to solve the combinatorial optimization problem in modules identifying to form the primary modules of products. In the second stage the cohesion degree of modules and the coupling degree between modules were analyzed. Based on this structural analysis the modular scheme was refined according to the thinking of structural independency. A case study on the gear reducer was conducted to illustrate the validity of the presented method.

  20. Hybrid Decomposition Method in Parallel Molecular Dynamics Simulation Based on SMP Cluster Architecture

    Institute of Scientific and Technical Information of China (English)

    WANG Bing; SHU Jiwu; ZHENG Weimin; WANG Jinzhao; CHEN Min

    2005-01-01

    A hybrid decomposition method for molecular dynamics simulations was presented, using simultaneously spatial decomposition and force decomposition to fit the architecture of a cluster of symmetric multi-processor (SMP) nodes. The method distributes particles between nodes based on the spatial decomposition strategy to reduce inter-node communication costs. The method also partitions particle pairs within each node using the force decomposition strategy to improve the load balance for each node. Simulation results for a nucleation process with 4 000 000 particles show that the hybrid method achieves better parallel performance than either spatial or force decomposition alone, especially when applied to a large scale particle system with non-uniform spatial density.

  1. An extended affinity propagation clustering method based on different data density types.

    Science.gov (United States)

    Zhao, XiuLi; Xu, WeiXiang

    2015-01-01

    The affinity propagation (AP) algorithm, a novel clustering method, does not require the user to specify initial cluster centers in advance: it regards all data points equally as potential exemplars (cluster centers) and groups the data purely by the degree of similarity among the data points. In many cases, however, there exist areas of different density within the same data set, meaning that the data are not distributed homogeneously; in such situations the AP algorithm cannot group the data points into ideal clusters. In this paper, we propose an extended AP clustering algorithm to deal with this problem. There are two steps in our method: first, the data set is partitioned into several data density types according to the nearest-neighbor distances of each data point; then the AP clustering method is used to group the data points into clusters within each density type separately. Two experiments are carried out to evaluate the performance of our algorithm: one uses an artificial data set and the other a real seismic data set. The experimental results show that groups are obtained more accurately by our algorithm than by OPTICS and the AP clustering algorithm itself.
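
    For context, a minimal example of standard affinity propagation using scikit-learn's AffinityPropagation on a data set with two density types; the extension above would first partition the data by density and then run AP within each part, a step not reproduced here.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

rng = np.random.default_rng(0)
# two regions of very different density in the same data set
dense = rng.normal((0, 0), 0.3, size=(150, 2))
sparse = rng.normal((6, 6), 2.0, size=(40, 2))
X = np.vstack([dense, sparse])

# plain AP treats every point as a potential exemplar; no number of clusters is given
ap = AffinityPropagation(random_state=0).fit(X)
labels, exemplars = ap.labels_, ap.cluster_centers_
```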

  2. A Novel Method to Predict Genomic Islands Based on Mean Shift Clustering Algorithm.

    Science.gov (United States)

    de Brito, Daniel M; Maracaja-Coutinho, Vinicius; de Farias, Savio T; Batista, Leonardo V; do Rêgo, Thaís G

    2016-01-01

    Genomic Islands (GIs) are regions of bacterial genomes that are acquired from other organisms by the phenomenon of horizontal transfer. These regions are often responsible for many important acquired adaptations of the bacteria, with great impact on their evolution and behavior. Nevertheless, these adaptations are usually associated with pathogenicity, antibiotic resistance, degradation and metabolism. Identification of such regions is of medical and industrial interest. For this reason, different approaches for genomic islands prediction have been proposed. However, none of them are capable of predicting precisely the complete repertory of GIs in a genome. The difficulties arise due to the changes in performance of different algorithms in the face of the variety of nucleotide distribution in different species. In this paper, we present a novel method to predict GIs that is built upon mean shift clustering algorithm. It does not require any information regarding the number of clusters, and the bandwidth parameter is automatically calculated based on a heuristic approach. The method was implemented in a new user-friendly tool named MSGIP--Mean Shift Genomic Island Predictor. Genomes of bacteria with GIs discussed in other papers were used to evaluate the proposed method. The application of this tool revealed the same GIs predicted by other methods and also different novel unpredicted islands. A detailed investigation of the different features related to typical GI elements inserted in these new regions confirmed its effectiveness. Stand-alone and user-friendly versions for this new methodology are available at http://msgip.integrativebioinformatics.me. PMID:26731657
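
    A minimal illustration of mean-shift clustering with an automatically estimated bandwidth, in the spirit of the heuristic described above; scikit-learn's estimate_bandwidth and MeanShift are applied to toy per-window features, not to real genome data or the MSGIP tool itself.

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

rng = np.random.default_rng(0)
# toy per-window genomic features (e.g. GC content, codon-usage deviation)
windows = np.vstack([rng.normal(0.0, 0.3, size=(400, 2)),    # host-like windows
                     rng.normal(2.5, 0.3, size=(30, 2))])    # atypical, island-like windows

bandwidth = estimate_bandwidth(windows, quantile=0.2)         # no cluster count required
labels = MeanShift(bandwidth=bandwidth).fit_predict(windows)
island_candidates = np.where(labels != np.bincount(labels).argmax())[0]   # minority clusters
```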

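    The MSGIP tool itself is available at the URL given above; the snippet below only illustrates, in a hedged way, the core ingredient it builds on: mean shift clustering with an automatically estimated bandwidth. The per-window genomic feature vectors in `features` are synthetic placeholders, not real compositional measures.

    ```python
    import numpy as np
    from sklearn.cluster import MeanShift, estimate_bandwidth

    rng = np.random.default_rng(1)
    # Hypothetical 2-D feature vectors per genomic window (e.g. compositional measures).
    features = np.vstack([rng.normal(0.40, 0.02, (200, 2)),   # "host-like" windows
                          rng.normal(0.60, 0.02, (20, 2))])   # atypical, island-like windows

    bandwidth = estimate_bandwidth(features, quantile=0.2)    # heuristic bandwidth choice
    ms = MeanShift(bandwidth=bandwidth).fit(features)
    print("clusters found:", len(np.unique(ms.labels_)))
    ```
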
  4. A Research on Competitiveness of Guangxi City——Based on System Clustering Method and Principal Component Analysis Method

    Institute of Scientific and Technical Information of China (English)

    2010-01-01

    A total of 10 indices of regional economic development in Guangxi are selected. Based on the relevant economic data, regional economic development in Guangxi is analyzed using the system clustering method and principal component analysis. The results show that the two methods reveal a similar picture of the level of economic development. The overall economic strength of Guangxi is weak, and Nanning has relatively high factor scores owing to its advantages as the political, economic and cultural center. The comprehensive scores of the other regions are all lower than 1, leaving a large gap with the development of Nanning. The overall development strategy indicates that Guangxi should accelerate the construction of the Ring Northern Bay Economic Zone, create a strong logistics system of strategic significance to national development, and use its unique location advantage, relying on the modern transportation system, to establish a logistics center and business center connecting the hinterland and the ASEAN market. In view of the unbalanced regional economic development in Guangxi, the service industry in Nanning should be developed faster, a circular economy system should be constructed in the industrial cities, and the industrialization of the tourism cities should be accelerated in order to realize balanced development of the regional economy in Guangxi, China.

  5. The tidal tails of globular cluster Palomar 5 based on the neural networks method

    Institute of Scientific and Technical Information of China (English)

    Hu Zou; Zhen-Yu WU; Jun Ma; Xu Zhou

    2009-01-01

    The sixth Data Release (DR6) of the Sloan Digital Sky Survey (SDSS) provides more photometric regions, new features and more accurate data around the globular cluster Palomar 5. A new method, the Back Propagation Neural Network (BPNN), is used to estimate the cluster membership probability in order to detect its tidal tails. Cluster and field stars, used for training the networks, are extracted over a 40×20 deg^2 field by color-magnitude diagrams (CMDs). The best BPNNs, with two hidden layers and a Levenberg-Marquardt (LM) training algorithm, are determined by the chosen cluster and field samples. The membership probabilities of stars in the whole field are obtained with the BPNNs, and contour maps of the probability distribution show that one tail extends 5.42° to the north of the cluster and another tail extends 3.77° to the south. The tails are similar to those detected by Odenkirchen et al., but no more debris from the cluster is found to the northeast in the sky. The radial density profiles are investigated both along the tails and near the cluster center. Quite a few substructures are discovered in the tails. The number density profile of the cluster is fitted with the King model and the tidal radius is determined as 14.28'. However, the King model cannot fit the observed profile in the outer regions (R > 8') because of the tidal tails generated by the tidal force. Luminosity functions of the cluster and the tidal tails are calculated, which confirm that the tails originate from Palomar 5.

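    A hedged sketch of the membership-probability step, assuming photometric feature arrays (e.g. colour and magnitude) for known cluster stars, known field stars, and the whole field. scikit-learn's MLPClassifier with two hidden layers stands in for the BPNN; it does not offer the Levenberg-Marquardt optimiser used in the paper.

    ```python
    import numpy as np
    from sklearn.neural_network import MLPClassifier

    def membership_probabilities(cluster_stars, field_stars, all_stars):
        # Training set: label 1 for cluster stars, 0 for field stars.
        X = np.vstack([cluster_stars, field_stars])
        y = np.concatenate([np.ones(len(cluster_stars)), np.zeros(len(field_stars))])
        net = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0)
        net.fit(X, y)
        # Probability of belonging to the cluster for every star in the field.
        return net.predict_proba(all_stars)[:, 1]
    ```
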
  6. Microcalcification detection in full-field digital mammograms with PFCM clustering and weighted SVM-based method

    Science.gov (United States)

    Liu, Xiaoming; Mei, Ming; Liu, Jun; Hu, Wei

    2015-12-01

    Clustered microcalcifications (MCs) in mammograms are an important early sign of breast cancer in women, and their accurate detection is important in computer-aided detection (CADe). In this paper, we integrated the possibilistic fuzzy c-means (PFCM) clustering algorithm and a weighted support vector machine (WSVM) for the detection of MC clusters in full-field digital mammograms (FFDM). For each image, suspicious MC regions are extracted with region growing and active contour segmentation. Then geometry and texture features are extracted for each suspicious MC, a mutual information-based supervised criterion is used to select important features, and PFCM is applied to cluster the samples into two clusters. Weights of the samples are calculated based on the possibility and typicality values from the PFCM and the ground-truth labels, and a weighted nonlinear SVM is trained. During the test process, when an unknown image is presented, suspicious regions are located with the segmentation step, the selected features are extracted, and the suspicious MC regions are classified as containing MCs or not by the trained weighted nonlinear SVM. Finally, the MC regions are analyzed with spatial information to locate MC clusters. The proposed method is evaluated using a database of 410 clinical mammograms and compared with a standard unweighted support vector machine (SVM) classifier. The detection performance is evaluated using receiver operating characteristic (ROC) curves and free-response receiver operating characteristic (FROC) curves. The proposed method obtained an area under the ROC curve of 0.8676 for MC detection, while the standard SVM obtained an area of 0.8268. For MC cluster detection, the proposed method obtained a high sensitivity of 92% with a false-positive rate of 2.3 clusters/image, which is also better than the standard SVM with 4.7 false-positive clusters/image at the same sensitivity.

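    A minimal sketch of the weighted-SVM training step only, assuming the per-sample weights have already been derived from the PFCM possibility/typicality values and the ground-truth labels; the feature extraction, PFCM clustering and spatial grouping stages are not shown.

    ```python
    from sklearn.svm import SVC

    def train_weighted_svm(features, labels, weights):
        # Nonlinear (RBF) SVM in which each training sample carries its own weight.
        clf = SVC(kernel="rbf", probability=True)
        clf.fit(features, labels, sample_weight=weights)
        return clf

    # At test time, candidate MC regions would be scored with clf.predict(...)
    # and then grouped spatially into clusters.
    ```
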
  7. Assessing the Eutrophication of Shengzhong Reservoir Based on Grey Clustering Method

    Institute of Scientific and Technical Information of China (English)

    Pan An; Hu Lihui; Li Tesong; Li Chengzhu

    2009-01-01

    A reservoir water environment is a grey system. The grey clustering method is applied to assessing the reservoir water environment in order to establish a relatively complete model suitable for reservoir eutrophication evaluation and to appropriately evaluate the quality of reservoir water, providing evidence for reservoir management. According to Chiua's lake and reservoir eutrophication criteria and the characteristics of eutrophication in China, as well as certain evaluation indices, the degree of eutrophication is classified into six categories. Grey classified whitening weight functions are used to represent the boundaries of the classification, to determine the clustering weight and clustering coefficient of each index in the grey classes, and to determine the class of each clustering object. The comprehensive evaluation of reservoir eutrophication is established on this foundation, with the Sichuan Shengzhong Reservoir as the survey object and with the analysis of data obtained at several typical monitoring points there in 2006. It is found that the eutrophication of Tiebian Power Generation Station, Guoyuanchang and Dashiqiao Bridge is the heaviest, that of Tielusi and Qinggangya is second, and that of Lijiaba is the least. The eutrophication of this reservoir is closely related to irrational exploitation in its surrounding areas, especially to the aggravation of non-point source pollution and the increase of net-culture fishing. Therefore, grey clustering is feasible for environmental quality evaluation, and the key point lies in the correct division of the grey whitening functions.

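    An illustrative sketch of the grey clustering computation: triangular whitening weight functions describe the class boundaries, and each monitoring point is assigned to the class with the largest clustering coefficient. The class boundaries, index weights and sample values below are invented for illustration and are not the Shengzhong Reservoir figures.

    ```python
    import numpy as np

    def triangular(x, left, centre, right):
        # Whitening weight: 1 at `centre`, falling linearly to 0 at the edges.
        if x <= left or x >= right:
            return 0.0
        return (x - left) / (centre - left) if x < centre else (right - x) / (right - centre)

    # Three eutrophication classes per index, each described by (left, centre, right).
    classes = [[(0, 10, 25), (10, 25, 60), (25, 60, 200)],       # index 1 (hypothetical scale)
               [(0, 0.3, 0.6), (0.3, 0.6, 1.5), (0.6, 1.5, 5)]]  # index 2 (hypothetical scale)
    weights = np.array([0.5, 0.5])   # clustering weight of each index

    def grey_class(sample):
        coeff = [sum(weights[i] * triangular(sample[i], *classes[i][k])
                     for i in range(len(sample)))
                 for k in range(3)]
        return int(np.argmax(coeff)), coeff   # winning class and its clustering coefficients

    print(grey_class([30.0, 0.8]))
    ```
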
  8. Privacy Preserving Multiview Point Based BAT Clustering Algorithm and Graph Kernel Method for Data Disambiguation on Horizontally Partitioned Data

    Directory of Open Access Journals (Sweden)

    J. Anitha

    2015-06-01

    Full Text Available Data mining has been a popular research area for more than a decade due to its vast spectrum of applications. However, the popularity and wide availability of data mining tools have also raised concerns about the privacy of individuals. The burden of data privacy protection thus falls on the shoulders of the data holder, and when a data disambiguation problem occurs in the data matrix, the anonymized data become less secure. Moreover, the existing privacy-preserving clustering methods perform clustering from a single viewpoint, namely the origin, whereas a multiview approach uses many different viewpoints, i.e., objects assumed not to be in the same cluster as the two objects being measured. To solve the above problems, this study presents a multiview-point-based clustering method for anonymized data. First, the data disambiguation problem is solved by using the Ramon-Gartner Subtree Graph Kernel (RGSGK), in which weight values are assigned and the kernel value is determined for the disambiguated data. Privacy is obtained by anonymization, where the data are encrypted with a secure key obtained by Ring-Based Fully Homomorphic Encryption (RBFHE). In order to group the anonymized data, a BAT clustering method based on multiview-point similarity measurement, called MVBAT, is proposed. A distance matrix is first calculated, from which similarity and dissimilarity matrices are formed. The experimental results of the proposed MVBAT clustering algorithm are compared with conventional methods in terms of F-measure, running time, privacy loss and utility loss. The RBFHE encryption results are also compared with existing methods in terms of communication cost on UCI machine learning datasets such as the adult and house datasets.

  9. A New Elliptical Grid Clustering Method

    Science.gov (United States)

    Guansheng, Zheng

    A new grid-based clustering method is presented in this paper. The method first performs unsupervised learning on high-dimensional data. A grid-based approach to clustering is proposed: the data are mapped onto a multi-dimensional space, a linear transformation is applied to the feature space instead of to the objects themselves, and a grid-clustering procedure is then applied. Unlike conventional methods, it uses multidimensional hyper-ellipse grid cells. Some case studies and ideas on how to use the algorithm are described. The experimental results show that EGC can discover clusters of irregular shapes.

  10. A bottom-up method for module-based product platform development through mapping, clustering and matching analysis

    Institute of Scientific and Technical Information of China (English)

    ZHANG Meng; LI Guo-xi; CAO Jian-ping; GONG Jing-zhong; WU Bao-zhong

    2016-01-01

    Designing a product platform can be an effective and efficient solution for manufacturing firms. Product platforms enable firms to provide increased product variety for the marketplace with as little variety between products as possible. Consumer products and modules already developed within a firm can be further investigated to explore the possibility of product platform creation. A bottom-up method is proposed for module-based product platform development through mapping, clustering and matching analysis. The framework and the parametric model of the method are presented, which consist of three steps: (1) mapping parameters from existing product families to functional modules, (2) clustering the modules within existing module families based on their parameters so as to generate module clusters, and selecting satisfactory module clusters based on commonality, and (3) matching the parameters of the module clusters to the functional modules in order to capture platform elements. In addition, a parameter matching criterion and a mismatching treatment are put forward to ensure the effectiveness of the platform process, and the standardization and serialization of the platform elements are presented. A design case of a belt conveyor is studied to demonstrate the feasibility of the proposed method.

  11. A Semantic-based Clustering Method to Build Domain Ontology from Multiple Heterogeneous Knowledge Sources

    Institute of Scientific and Technical Information of China (English)

    LING Ling; HU Yu-jin; WANG Xue-lin; LI Cheng-gang

    2006-01-01

    In order to improve the efficiency of ontology construction from heterogeneous knowledge sources, a semantic-based approach is presented. The ontology is constructed incrementally through the application of clustering techniques. First, terms are extracted from the knowledge sources and gathered into a term set after pretreatment. Then the concept set is built via semantic-based clustering according to the semantics of the terms provided by WordNet. Next, a concept tree is constructed using mapping rules between semantic relationships and concept relationships. This semi-automatic approach avoids inconsistencies caused by knowledge engineers having different understandings of the same concept, and the resulting ontology can easily be expanded.

  12. Improving cluster-based methods for investigating potential for insect pest species establishment: region-specific risk factors

    Directory of Open Access Journals (Sweden)

    Michael J. Watts

    2011-09-01

    Full Text Available Existing cluster-based methods for investigating insect species assemblages or profiles of a region to indicate the risk of new insect pest invasion have a major limitation in that they assign the same species risk factors to each region in a cluster. Clearly regions assigned to the same cluster have different degrees of similarity with respect to their species profile or assemblage. This study addresses this concern by applying weighting factors to the cluster elements used to calculate regional risk factors, thereby producing region-specific risk factors. Using a database of the global distribution of crop insect pest species, we found that we were able to produce highly differentiated region-specific risk factors for insect pests. We did this by weighting cluster elements by their Euclidean distance from the target region. Using this approach meant that risk weightings were derived that were more realistic, as they were specific to the pest profile or species assemblage of each region. This weighting method provides an improved tool for estimating the potential invasion risk posed by exotic species given that they have an opportunity to establish in a target region.

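    A small sketch of the weighting idea, assuming a binary region-by-species presence matrix and a cluster label per region. Inverse-distance weighting of the target region's cluster mates is one plausible choice; the exact weighting function of the study may differ.

    ```python
    import numpy as np

    def region_specific_risk(presence, cluster_labels, target_idx):
        """Distance-weighted species risk factors for one target region."""
        members = np.flatnonzero(cluster_labels == cluster_labels[target_idx])
        members = members[members != target_idx]
        if members.size == 0:
            return presence[target_idx].astype(float)
        # Weight each cluster mate by its closeness (inverse Euclidean distance)
        # to the target region's species profile.
        d = np.linalg.norm(presence[members] - presence[target_idx], axis=1)
        w = 1.0 / (1.0 + d)
        return (w[:, None] * presence[members]).sum(axis=0) / w.sum()
    ```
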
  13. Molecular-based rapid inventories of sympatric diversity: a comparison of DNA barcode clustering methods applied to geography-based vs clade-based sampling of amphibians.

    Science.gov (United States)

    Paz, Andrea; Crawford, Andrew J

    2012-11-01

    Molecular markers offer a universal source of data for quantifying biodiversity. DNA barcoding uses a standardized genetic marker and a curated reference database to identify known species and to reveal cryptic diversity within well-sampled clades. Rapid biological inventories, e.g. rapid assessment programs (RAPs), unlike most barcoding campaigns, are focused on particular geographic localities rather than on clades. Because of the potentially sparse phylogenetic sampling, the addition of DNA barcoding to RAPs may present a greater challenge for the identification of named species or for revealing cryptic diversity. In this article we evaluate the use of DNA barcoding for quantifying lineage diversity within a single sampling site as compared to clade-based sampling, and present examples from amphibians. We compared algorithms for identifying DNA barcode clusters (e.g. species, cryptic species or Evolutionary Significant Units) using previously published DNA barcode data obtained from geography-based sampling at a site in Central Panama, and from clade-based sampling in Madagascar. We found that clustering algorithms based on genetic distance performed similarly on sympatric as well as clade-based barcode data, while a promising coalescent-based method performed poorly on sympatric data. The various clustering algorithms were also compared in terms of speed and software implementation. Although each method has its shortcomings in certain contexts, we recommend the use of the ABGD method, which not only performs fairly well under either sampling method, but does so in a few seconds and with a user-friendly Web interface.

  15. Watchdog-LEACH: A new method based on LEACH protocol to Secure Clustered Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Mohammad Reza Rohbanian

    2013-07-01

    Full Text Available Wireless sensor networks comprise small sensor nodes with limited resources. Clustered networks have been proposed in many studies to reduce the power consumption in sensor networks. LEACH is one of the most widely studied techniques, offering an efficient way to minimize the power consumption in sensor networks. However, due to their restricted resources and operation in hostile environments, WSNs are exposed to numerous threats and are vulnerable to attacks. This research proposes a solution that can be applied on top of LEACH to increase the level of security. In Watchdog-LEACH, some nodes are designated as watchdogs and some changes are applied to the LEACH protocol for intrusion detection. Watchdog-LEACH is able to protect against a wide range of attacks and provides security, energy efficiency and memory efficiency. The simulation results show that, in comparison to LEACH, the energy overhead is about 2%, so this method is practical and can be applied to WSNs.

  16. FAULT DIAGNOSIS BASED ON INTEGRATION OF CLUSTER ANALYSIS, ROUGH SET METHOD AND FUZZY NEURAL NETWORK

    Institute of Scientific and Technical Information of China (English)

    Feng Zhipeng; Song Xigeng; Chu Fulei

    2004-01-01

    In order to increase the efficiency and decrease the cost of machinery diagnosis, a hybrid system of computational intelligence methods is presented. Firstly, the continuous attributes in the diagnosis decision system are discretized with the self-organizing map (SOM) neural network. Then, dynamic reducts are computed based on the rough set method, and the key conditions for diagnosis are found according to the maximum cluster ratio. Lastly, according to the optimal reduct, an adaptive neuro-fuzzy inference system (ANFIS) is designed for fault identification. The diagnosis of a diesel engine verifies the feasibility of the approach for engineering applications.

  17. COOPERATIVE CLUSTERING BASED ON GRID AND DENSITY

    Institute of Scientific and Technical Information of China (English)

    HU Ruifei; YIN Guofu; TAN Ying; CAI Peng

    2006-01-01

    Based on an analysis of the features of the grid-based clustering method clustering in quest (CLIQUE) and the density-based clustering method density-based spatial clustering of applications with noise (DBSCAN), a new clustering algorithm named cooperative clustering based on grid and density (CLGRID) is presented. The new algorithm adopts an equivalence rule for regional inquiry and density unit identification. The central region of a class is computed by the grid-based method and the margin region by the density-based method. By clustering in two phases and using only a small number of seed objects in representative units to expand a cluster, the frequency of region queries can be decreased and consequently the time cost is reduced. The new algorithm retains the positive features of both grid-based and density-based methods and avoids the difficulty of parameter searching. It can discover clusters of arbitrary shape with high efficiency and is not sensitive to noise. The application of CLGRID to test data sets demonstrates its validity and higher efficiency in contrast with traditional DBSCAN using an R*-tree.

  18. Consumers' Kansei Needs Clustering Method for Product Emotional Design Based on Numerical Design Structure Matrix and Genetic Algorithms.

    Science.gov (United States)

    Yang, Yan-Pu; Chen, Deng-Kai; Gu, Rong; Gu, Yu-Feng; Yu, Sui-Huai

    2016-01-01

    Consumers' Kansei needs reflect their perception about a product and always consist of a large number of adjectives. Reducing the dimension complexity of these needs to extract primary words not only enables the target product to be explicitly positioned, but also provides a convenient design basis for designers engaging in design work. Accordingly, this study employs a numerical design structure matrix (NDSM) by parameterizing a conventional DSM and integrating genetic algorithms to find optimum Kansei clusters. A four-point scale method is applied to assign link weights of every two Kansei adjectives as values of cells when constructing an NDSM. Genetic algorithms are used to cluster the Kansei NDSM and find optimum clusters. Furthermore, the process of the proposed method is presented. The details of the proposed approach are illustrated using an example of electronic scooter for Kansei needs clustering. The case study reveals that the proposed method is promising for clustering Kansei needs adjectives in product emotional design. PMID:27630709

  20. Analysis of dynamic cerebral contrast-enhanced perfusion MRI time-series based on unsupervised clustering methods

    Science.gov (United States)

    Lange, Oliver; Meyer-Baese, Anke; Wismuller, Axel; Hurdal, Monica

    2005-03-01

    We employ unsupervised clustering techniques for the analysis of dynamic contrast-enhanced perfusion MRI time-series in patients with and without stroke. The "neural gas" network, fuzzy clustering based on deterministic annealing, self-organizing maps, and fuzzy c-means clustering enable self-organized, data-driven segmentation with respect to fine-grained differences of signal amplitude and dynamics, thus identifying asymmetries and local abnormalities of brain perfusion. We conclude that clustering is a useful extension to conventional perfusion parameter maps.

  1. Cluster Tree Based Hybrid Document Similarity Measure

    Directory of Open Access Journals (Sweden)

    M. Varshana Devi

    2015-10-01

    Full Text Available A cluster tree based hybrid similarity measure is established to measure hybrid similarity. In a cluster tree, the hybrid similarity measure can be calculated even for random data that may not co-occur, and different views can be generated. The different views of the tree can be combined, choosing the one that is most significant in cost. A method is proposed to combine the multiple views, in which multiple views represented by different distance measures are merged into a single cluster. Compared with traditional statistical methods, the cluster tree based hybrid similarity gives better feasibility for intelligence-based search and helps in improving dimensionality reduction and semantic analysis.

  2. High Dimensional Data Clustering Using Fast Cluster Based Feature Selection

    Directory of Open Access Journals (Sweden)

    Karthikeyan.P

    2014-03-01

    Full Text Available Feature selection involves identifying a subset of the most useful features that produces results compatible with the original entire set of features. A feature selection algorithm may be evaluated from both the efficiency and effectiveness points of view. While efficiency concerns the time required to find a subset of features, effectiveness is related to the quality of the subset of features. Based on these criteria, a fast clustering-based feature selection algorithm (FAST) is proposed and experimentally evaluated in this paper. The FAST algorithm works in two steps. In the first step, features are divided into clusters by using graph-theoretic clustering methods. In the second step, the most representative feature that is strongly related to the target classes is selected from each cluster to form a subset of features. Features in different clusters are relatively independent, so the clustering-based strategy of FAST has a high probability of producing a subset of useful and independent features. To ensure the efficiency of FAST, we adopt an efficient minimum spanning tree (MST) clustering method based on Kruskal's algorithm. The efficiency and effectiveness of the FAST algorithm are evaluated through an empirical study.

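    A hedged sketch in the spirit of FAST: build a feature graph weighted by one minus absolute correlation, take its minimum spanning tree, cut the heaviest edges to obtain feature clusters, and keep the feature most correlated with the target from each cluster. The correlation measure and the fixed number of clusters are simplifications of the paper's information-theoretic criteria.

    ```python
    import numpy as np
    from scipy.sparse.csgraph import minimum_spanning_tree, connected_components

    def fast_like_selection(X, y, n_clusters=5):
        corr = np.abs(np.corrcoef(X, rowvar=False))        # feature-feature similarity
        mst = minimum_spanning_tree(1.0 - corr).toarray()
        # Cut the (n_clusters - 1) heaviest MST edges to split the tree into clusters.
        edges = np.dstack(np.nonzero(mst))[0]
        heaviest = np.argsort(mst[mst > 0])[::-1][: n_clusters - 1]
        for i, j in edges[heaviest]:
            mst[i, j] = 0.0
        _, labels = connected_components(mst, directed=False)
        # From each feature cluster keep the feature most correlated with the target.
        relevance = np.abs([np.corrcoef(X[:, f], y)[0, 1] for f in range(X.shape[1])])
        return [int(np.flatnonzero(labels == c)[np.argmax(relevance[labels == c])])
                for c in range(labels.max() + 1)]
    ```
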
  3. Lick Indices and Spectral Energy Distribution Analysis based on an M31 Star Cluster Sample: Comparisons of Methods and Models

    CERN Document Server

    Fan, Zhou; Chen, Bingqiu; Jiang, Linhua; Bian, Fuyan; Li, Zhongmu

    2016-01-01

    Application of fitting techniques to obtain physical parameters---such as ages, metallicities, and $\alpha$-element to iron ratios---of stellar populations is an important approach to understand the nature of both galaxies and globular clusters (GCs). In fact, fitting methods based on different underlying models may yield different results, and with varying precision. In this paper, we have selected 22 confirmed M31 GCs for which we do not have access to previously known spectroscopic metallicities. Most are located at approximately one degree (in projection) from the galactic center. We performed spectroscopic observations with the 6.5 m MMT telescope, equipped with its Red Channel Spectrograph. Lick/IDS absorption-line indices, radial velocities, ages, and metallicities were derived based on the $\rm EZ\_Ages$ stellar population parameter calculator. We also applied full spectral fitting with the ULySS code to constrain the parameters of our sample star clusters. In addition, we performed $\chi^2_{\rm min}$...

  4. Cycle-Based Cluster Variational Method for Direct and Inverse Inference

    Science.gov (United States)

    Furtlehner, Cyril; Decelle, Aurélien

    2016-08-01

    Large scale inference problems of practical interest can often be addressed with the help of Markov random fields. This requires, in principle, solving two related problems: the first is to find offline the parameters of the MRF from empirical data (the inverse problem); the second (the direct problem) is to set up the inference algorithm to make it as precise, robust and efficient as possible. In this work we address both the direct and inverse problems with mean-field methods of statistical physics, going beyond the Bethe approximation and the associated belief propagation algorithm. We elaborate on the idea that loop corrections to belief propagation can be dealt with in a systematic way on pairwise Markov random fields, by using the elements of a cycle basis to define regions in a generalized belief propagation setting. For the direct problem, the region graph is specified in such a way as to avoid feedback loops as much as possible by selecting a minimal cycle basis. Following this line we are led to propose a two-level algorithm, where a belief propagation algorithm is run alternately at the level of each cycle and at the inter-region level. Next we observe that the inverse problem can be addressed region by region independently, with one small inverse problem per region to be solved. It turns out that each elementary inverse problem on the loop geometry can be solved efficiently. In particular, in the random Ising context we propose two complementary methods based respectively on fixed point equations and on a one-parameter log-likelihood function minimization. Numerical experiments confirm the effectiveness of this approach both for the direct and inverse MRF inference. Heterogeneous problems of size up to 10^5 are addressed in a reasonable computational time, notably with better convergence properties than ordinary belief propagation.

  5. A Novel Wireless Power Transfer-Based Weighed Clustering Cooperative Spectrum Sensing Method for Cognitive Sensor Networks.

    Science.gov (United States)

    Liu, Xin

    2015-10-30

    In a cognitive sensor network (CSN), the waste of sensing time and energy is a challenge for cooperative spectrum sensing when the number of cooperative cognitive nodes (CNs) becomes very large. In this paper, a novel wireless power transfer (WPT)-based weighed clustering cooperative spectrum sensing model is proposed, which divides all the CNs into several clusters, selects the most favorable CNs as cluster heads, and allows the common CNs to transfer the received radio frequency (RF) energy of the primary node (PN) to the cluster heads, in order to supply the electrical energy needed for sensing and cooperation. A joint resource optimization is formulated to maximize the spectrum access probability of the CSN by jointly allocating the sensing time and the clustering number. A clustering algorithm is proposed according to the resource optimization results. The simulation results show that, compared to the traditional model, the cluster heads of the proposed model can achieve more transmission power, and that there exist an optimal sensing time and clustering number that maximize the spectrum access probability.

  6. Cluster identification based on correlations

    Science.gov (United States)

    Schulman, L. S.

    2012-04-01

    The problem addressed is the identification of cooperating agents based on correlations created as a result of the joint action of these and other agents. A systematic method for using correlations beyond second moments is developed. The technique is applied to a didactic example, the identification of alphabet letters based on correlations among the pixels used in an image of the letter. As in this example, agents can belong to more than one cluster. Moreover, the identification scheme does not require that the patterns be known ahead of time.

  7. Cluster Based Text Classification Model

    DEFF Research Database (Denmark)

    Nizamani, Sarwat; Memon, Nasrullah; Wiil, Uffe Kock

    2011-01-01

    We propose a cluster based classification model for suspicious email detection and other text classification tasks. The text classification tasks comprise many training examples that require a complex classification model. Using clusters for classification makes the model simpler and increases......, the classifier is trained on each cluster having reduced dimensionality and less number of examples. The experimental results show that the proposed model outperforms the existing classification models for the task of suspicious email detection and topic categorization on the Reuters-21578 and 20 Newsgroups...... datasets. Our model also outperforms A Decision Cluster Classification (ADCC) and the Decision Cluster Forest Classification (DCFC) models on the Reuters-21578 dataset....

  8. Cosine-Based Clustering Algorithm Approach

    Directory of Open Access Journals (Sweden)

    Mohammed A. H. Lubbad

    2012-02-01

    Full Text Available Since many applications require the management of spatial data, clustering large spatial databases is an important problem, which tries to find the densely populated regions in the feature space to be used in data mining, knowledge discovery, or efficient information retrieval. A good clustering approach should be efficient and detect clusters of arbitrary shapes. It must be insensitive to outliers (noise) and to the order of input data. In this paper, Cosine Cluster is proposed based on the cosine transform, which satisfies all the above requirements. Using the multi-resolution property of cosine transforms, arbitrarily shaped clusters can be effectively identified at different degrees of accuracy. Cosine Cluster is also shown to be highly efficient in terms of time complexity. Experimental results on very large data sets are presented, which show the efficiency and effectiveness of the proposed approach compared to other recent clustering methods.

  9. Niching method using clustering crowding

    Institute of Scientific and Technical Information of China (English)

    GUO Guan-qi; GUI Wei-hua; WU Min; YU Shou-yi

    2005-01-01

    This study analyzes the drift phenomena of deterministic crowding and probabilistic crowding by using an equivalence class model and expectation proportion equations. It is proved that the replacement errors of deterministic crowding cause the population to converge to a single individual, thus resulting in premature stagnation or the loss of optima, while probabilistic crowding can maintain multiple subpopulations in equilibrium when the population size is adequately large. An improved niching method using clustering crowding is proposed. By analyzing the topology of the fitness landscape using the hill-valley function and extending the search space for similarity analysis, clustering crowding determines the locality of the search space more accurately, thus greatly decreasing the replacement errors of crowding. The integration of deterministic and probabilistic replacement increases the capacity for both parallel local hill climbing and maintaining multiple subpopulations. The experimental results on optimizing various multimodal functions show that the performance of clustering crowding, in terms of the number of effective peaks maintained, the average peak ratio and the global optimum ratio, is uniformly superior to that of evolutionary algorithms using fitness sharing, simple deterministic crowding and probabilistic crowding.

  10. Document Clustering Based on Semi-Supervised Term Clustering

    Directory of Open Access Journals (Sweden)

    Hamid Mahmoodi

    2012-05-01

    Full Text Available This study proposes a multi-step feature (term) selection process and, in a semi-supervised fashion, provides initial centers for the term clusters. The fuzzy c-means (FCM) clustering algorithm is then utilized to cluster the terms, and finally each document is assigned to its closest associated term clusters. While most text clustering algorithms use documents directly for clustering, we propose to first group the terms using the FCM algorithm and then cluster documents based on the term clusters. We evaluate the effectiveness of our technique on several standard text collections and compare our results with some classical text clustering algorithms.

  11. A new intelligent method for minerals segmentation in thin sections based on a novel incremental color clustering

    Science.gov (United States)

    Izadi, Hossein; Sadri, Javad; Mehran, Nosrat-Agha

    2015-08-01

    Mineral segmentation in thin sections is a challenging, popular, and important research topic in computational geology, mineralogy, and mining engineering. Mineral segmentation in thin sections containing altered minerals, in which there are no evident and closed boundaries, is a rather complex process, and most of the thin sections produced in industry include altered minerals. However, intelligent mineral segmentation in thin sections containing altered minerals has not been widely investigated in the literature, and the current state-of-the-art algorithms are not able to accurately segment minerals in such thin sections. In this paper, a novel method based on incremental learning for clustering pixels is proposed in order to segment index minerals in thin sections both with and without altered minerals. Our algorithm uses 12 color features extracted from thin section images: red, green, blue, hue, saturation and intensity, under plane- and cross-polarized light at maximum intensity. The proposed method has been tested on 155 igneous samples, and overall accuracies of 92.15% and 85.24% have been obtained for thin sections without altered minerals and thin sections containing altered minerals, respectively. Experimental results indicate that the proposed method outperforms other similar methods in the literature, especially for segmenting thin sections containing altered minerals. The proposed algorithm could be applied in applications that require real-time segmentation or an efficient identification map, such as petroleum geology, petrography and NASA Mars exploration.

  12. Sequential Clustering based Facial Feature Extraction Method for Automatic Creation of Facial Models from Orthogonal Views

    CERN Document Server

    Ghahari, Alireza

    2009-01-01

    Multiview 3D face modeling has attracted increasing attention recently and has become one of the potential avenues for future video systems. We aim to make automatic feature extraction and natural 3D feature construction more reliable and robust from 2D features detected in a pair of frontal- and profile-view face images. We propose several heuristic algorithms to minimize possible errors introduced by the prevalent non-perfect orthogonal condition and non-coherent luminance. In our approach, we first extract the 2D features that are visible to both cameras in both views. Then, we estimate the coordinates of the features in the hidden profile view based on the visible features extracted in the two orthogonal views. Finally, based on the coordinates of the extracted features, we deform a 3D generic model to perform the desired 3D clone modeling. The present study demonstrates the usefulness of the resulting facial models for practical applications such as face recognition and facial animation.

  13. A Clustered Multiclass Likelihood-Ratio Ensemble Method for Family-Based Association Analysis Accounting for Phenotypic Heterogeneity.

    Science.gov (United States)

    Wen, Yalu; Lu, Qing

    2016-09-01

    Although compelling evidence suggests that the genetic etiology of complex diseases could be heterogeneous in subphenotype groups, little attention has been paid to phenotypic heterogeneity in genetic association analysis of complex diseases. Simply ignoring phenotypic heterogeneity in association analysis could result in attenuated estimates of genetic effects and low power of association tests if subphenotypes with similar clinical manifestations have heterogeneous underlying genetic etiologies. To facilitate the family-based association analysis allowing for phenotypic heterogeneity, we propose a clustered multiclass likelihood-ratio ensemble (CMLRE) method. The proposed method provides an alternative way to model the complex relationship between disease outcomes and genetic variants. It allows for heterogeneous genetic causes of disease subphenotypes and can be applied to various pedigree structures. Through simulations, we found CMLRE outperformed the commonly adopted strategies in a variety of underlying disease scenarios. We further applied CMLRE to a family-based dataset from the International Consortium to Identify Genes and Interactions Controlling Oral Clefts (ICOC) to investigate the genetic variants and interactions predisposing to subphenotypes of oral clefts. The analysis suggested that two subphenotypes, nonsyndromic cleft lip without palate (CL) and cleft lip with palate (CLP), shared similar genetic etiologies, while cleft palate only (CP) had its own genetic mechanism. The analysis further revealed that rs10863790 (IRF6), rs7017252 (8q24), and rs7078160 (VAX1) were jointly associated with CL/CLP, while rs7969932 (TBK1), rs227731 (17q22), and rs2141765 (TBK1) jointly contributed to CP.

  14. Clustering in Water Based Magnetic Nanofluids: Investigations by Light Scattering Methods

    Science.gov (United States)

    Socoliuc, Vlad; Taculescu, Alina; Podaru, Camelia; Dobra, Andreea; Daia, Camelia; Marinica, Oana; Turcu, Rodica; Vekas, Ladislau

    2010-12-01

    Nanosized magnetite particles, with mean physical diameter of about 7 nm, obtained by chemical coprecipitation procedure were dispersed in water carrier by applying sterical stabilization of particles in order to prevent their aggregation and to ensure colloidal stability of the systems. Different chain length (C12, C14, C18) carboxylic acids (lauric (LA), myristic (MA) and oleic (OA)) were used for double layer coating of magnetite nanoparticles. Structural and magnetic properties were investigated by electron microscopy (TEM), dynamical and static light scattering (DLS, SLS) and magnetometry (VSM) to evaluate the role of chain length and of the saturated/unsaturated nature of surfactant layers. Also investigated were two water based magnetic nanocomposites obtained by encapsulating the magnetic nanoparticles in polymers with different functional properties.

  15. Pavement Crack Detection Using Spectral Clustering Method

    Directory of Open Access Journals (Sweden)

    Jin Huazhong

    2015-01-01

    Full Text Available Pavement crack detection plays an important role in pavement maintenance and management and can nowadays be performed through remote image analysis. The edges of pavement cracks therefore need to be extracted in advance; in general, traditional edge detection methods do not consider phase information or the spatial relationship between adjacent image areas when extracting edges. To overcome the deficiencies of the traditional approaches, this paper proposes a pavement crack detection algorithm based on a spectral clustering method. Firstly, a measure of similarity between pairs of pixels is defined through orientation energy. Then, spatial relationships are used to find regions where the similarity between pixels within a given region is high and the similarity between pixels in different regions is low. After that, crack edge detection is completed with the spectral clustering method. The presented method has been run on some real-life images of pavement cracks, and the experimental results show that the proposed crack detection method obtains good results.

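    A minimal, hedged illustration of the final grouping step with scikit-learn's spectral clustering; the orientation-energy affinity described above is assumed to be computed elsewhere, so a plain RBF affinity on toy 2-D pixel features stands in for it.

    ```python
    import numpy as np
    from sklearn.cluster import SpectralClustering

    rng = np.random.default_rng(0)
    pixels = np.vstack([rng.normal((0, 0), 0.3, (100, 2)),     # background-like pixel features
                        rng.normal((3, 3), 0.3, (100, 2))])    # crack-like pixel features

    labels = SpectralClustering(n_clusters=2, affinity="rbf",
                                random_state=0).fit_predict(pixels)
    print(np.bincount(labels))   # sizes of the two recovered groups
    ```
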
  16. Model-based clustered-dot screening

    Science.gov (United States)

    Kim, Sang Ho

    2006-01-01

    I propose a halftone screen design method based on a human visual system model and the characteristics of the electro-photographic (EP) printer engine. Generally, screen design methods based on human visual models produce dispersed-dot type screens while design methods considering EP printer characteristics generate clustered-dot type screens. In this paper, I propose a cost function balancing the conflicting characteristics of the human visual system and the printer. By minimizing the obtained cost function, I design a model-based clustered-dot screen using a modified direct binary search algorithm. Experimental results demonstrate the superior quality of the model-based clustered-dot screen compared to a conventional clustered-dot screen.

  17. The SMART CLUSTER METHOD - adaptive earthquake cluster analysis and declustering

    Science.gov (United States)

    Schaefer, Andreas; Daniell, James; Wenzel, Friedemann

    2016-04-01

    Earthquake declustering is an essential part of almost any statistical analysis of the spatial and temporal properties of seismic activity, with usual applications comprising probabilistic seismic hazard assessments (PSHAs) and earthquake prediction methods. The nature of earthquake clusters and the subsequent declustering of earthquake catalogues play a crucial role in determining the magnitude-dependent earthquake return period and its spatial variation. Various methods have been developed by other researchers to address this issue, with differing ranges of complexity, from rather simple statistical window methods to complex epidemic models. This study introduces the smart cluster method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal identification. An adaptive search algorithm for data point clusters is adopted, which uses the earthquake density in the spatio-temporal neighbourhood of each event to adjust the search properties. The identified clusters are subsequently analysed to determine directional anisotropy, focussing on a strong correlation along the rupture plane, and the search space is adjusted with respect to directional properties. In the case of rapid subsequent ruptures, like the 1992 Landers sequence or the 2010/2011 Darfield-Christchurch events, an adaptive classification procedure is applied to disassemble subsequent ruptures which may have been grouped into an individual cluster, using near-field searches, support vector machines and temporal splitting. The steering parameters of the search behaviour are linked to local earthquake properties like the magnitude of completeness, the earthquake density and the Gutenberg-Richter parameters. The method is capable of identifying and classifying earthquake clusters in space and time. It is tested and validated using earthquake data from California and New Zealand. As a result of the cluster identification process, each event in

  18. Spectral clustering based on matrix perturbation theory

    Institute of Scientific and Technical Information of China (English)

    TIAN Zheng; LI XiaoBin; JU YanWei

    2007-01-01

    This paper exposes some intrinsic characteristics of the spectral clustering method by using tools from matrix perturbation theory. We construct a weight matrix of a graph and study its eigenvalues and eigenvectors. It is shown that the number of clusters is equal to the number of eigenvalues that are larger than 1, and that the number of points in each of the clusters can be approximated by the associated eigenvalue. It is also shown that the eigenvectors of the weight matrix can be used directly to perform clustering; that is, the directional angle between two row vectors of the matrix derived from the eigenvectors is a suitable distance measure for clustering. As a result, an unsupervised spectral clustering algorithm based on the weight matrix (USCAWM) is developed. The experimental results on a number of artificial and real-world data sets show the correctness of the theoretical analysis.

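    A small sketch of the idea reported above, assuming an RBF weight matrix built from pairwise distances: the eigenvalues larger than 1 give the estimated number of clusters, and the rows of the leading eigenvectors are then grouped (here simply with k-means). The kernel width is an illustrative choice, not a value from the paper.

    ```python
    import numpy as np
    from scipy.spatial.distance import cdist
    from sklearn.cluster import KMeans

    def weight_matrix_clustering(X, sigma=3.0):
        W = np.exp(-cdist(X, X, "sqeuclidean") / (2 * sigma ** 2))   # weight matrix
        eigvals, eigvecs = np.linalg.eigh(W)
        k = int(np.sum(eigvals > 1.0))            # estimated number of clusters
        rows = eigvecs[:, -k:]                    # one row per data point
        return k, KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(rows)

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
    print(weight_matrix_clustering(X))
    ```
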
  19. Clustering based gene expression feature selection method: A computational approach to enrich the classifier efficiency of differentially expressed genes

    KAUST Repository

    Abusamra, Heba

    2016-07-20

    The high-dimension, low-sample-size nature of gene expression data makes the classification task challenging. Therefore, feature (gene) selection becomes an apparent need. Selecting meaningful and relevant genes for the classifier not only decreases the computational time and cost, but also improves the classification performance. However, most existing feature selection approaches suffer from problems such as a lack of robustness, validation issues, etc. Here, we present a new feature selection technique that takes advantage of clustering both samples and genes. Materials and methods: We used a leukemia gene expression dataset [1]. The effectiveness of the selected features was evaluated by four different classification methods: support vector machines, k-nearest neighbor, random forest, and linear discriminant analysis. The method evaluates the importance and relevance of each gene cluster by summing the expression levels of the genes belonging to that cluster. A gene cluster is considered important if it satisfies conditions depending on thresholds and percentages; otherwise it is eliminated. Results: The initial analysis identified 7120 differentially expressed genes in leukemia (Fig. 15a); after applying our feature selection methodology we ended up with 1117 specific genes discriminating the two classes of leukemia (Fig. 15b). Applying the same method with a more stringent higher positive and lower negative threshold condition reduced the number to 58 genes, which were tested to evaluate the effectiveness of the method (Fig. 15c). The results of the four classification methods are summarized in Table 11. Conclusions: The feature selection method gave good results with minimal classification error. Our heat-map result shows a distinct pattern of refined genes discriminating between the two classes of leukemia.

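    A hedged sketch of the cluster-then-threshold idea: genes are clustered, the expression of each cluster is summed per sample, and clusters failing the threshold conditions are eliminated. The use of k-means, the thresholds `hi`/`lo` and the 20% fraction are illustrative stand-ins for the study's criteria.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans

    def select_gene_clusters(expr, n_clusters=50, hi=2.0, lo=-2.0, frac=0.2):
        """expr: samples x genes matrix of (log) expression values."""
        gene_labels = KMeans(n_clusters=n_clusters, n_init=10,
                             random_state=0).fit_predict(expr.T)
        keep = []
        for c in range(n_clusters):
            genes = np.flatnonzero(gene_labels == c)
            summed = expr[:, genes].sum(axis=1)        # summed expression per sample
            # Keep the cluster if enough samples exceed the positive threshold
            # or fall below the negative one (thresholds scaled per gene).
            if (summed > hi * len(genes)).mean() >= frac or \
               (summed < lo * len(genes)).mean() >= frac:
                keep.extend(genes.tolist())
        return keep
    ```
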
  20. Voting-based consensus clustering for combining multiple clusterings of chemical structures

    Directory of Open Access Journals (Sweden)

    Saeed Faisal

    2012-12-01

    Full Text Available Abstract Background Although many consensus clustering methods have been successfully used for combining multiple classifiers in areas such as machine learning, applied statistics, pattern recognition and bioinformatics, few consensus clustering methods have been applied to combining multiple clusterings of chemical structures. It is known that no individual clustering method will always give the best results for all types of applications. So, in this paper, three voting and graph-based consensus clustering methods were used for combining multiple clusterings of chemical structures to enhance the ability to separate biologically active molecules from inactive ones in each cluster. Results The cumulative voting-based aggregation algorithm (CVAA), the cluster-based similarity partitioning algorithm (CSPA) and the hyper-graph partitioning algorithm (HGPA) were examined. The F-measure and the Quality Partition Index (QPI) were used to evaluate the clusterings, and the results were compared to Ward's clustering method. The MDL Drug Data Report (MDDR) dataset was used for the experiments and was represented by two 2D fingerprints, ALOGP and ECFP_4. The voting-based consensus clustering method outperformed Ward's method using the F-measure and QPI for both ALOGP and ECFP_4 fingerprints, while the graph-based consensus clustering methods outperformed Ward's method only for ALOGP using QPI. The Jaccard and Euclidean distance measures were the methods of choice to generate the ensembles, giving the highest values for both criteria. Conclusions The results of the experiments show that consensus clustering methods can improve the effectiveness of clusterings of chemical structures. The cumulative voting-based aggregation algorithm (CVAA) was the method of choice among the consensus clustering methods.

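    A compact sketch of voting-based consensus in spirit: each partition's labels are aligned to a reference partition with the Hungarian algorithm and a majority vote is taken per object. This is not the exact CVAA formulation, and it assumes all input partitions use the same number of clusters.

    ```python
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def voting_consensus(partitions):
        ref = partitions[0]
        k = ref.max() + 1
        votes = np.zeros((len(ref), k))
        for labels in partitions:
            # Contingency table between this partition and the reference labelling.
            cont = np.zeros((k, k))
            for a, b in zip(labels, ref):
                cont[a, b] += 1
            rows, cols = linear_sum_assignment(-cont)   # best label matching
            mapping = dict(zip(rows, cols))
            for i, a in enumerate(labels):
                votes[i, mapping[a]] += 1
        return votes.argmax(axis=1)

    # Example: three noisy 2-cluster partitions of six molecules.
    parts = [np.array([0, 0, 0, 1, 1, 1]),
             np.array([1, 1, 1, 0, 0, 0]),
             np.array([0, 0, 1, 1, 1, 1])]
    print(voting_consensus(parts))
    ```
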
  1. Single pass kernel k-means clustering method

    Indian Academy of Sciences (India)

    T Hitendra Sarma; P Viswanath; B Eswara Reddy

    2013-06-01

    In unsupervised classification, the kernel $k$-means clustering method has been shown to perform better than the conventional $k$-means clustering method in identifying non-isotropic clusters in a data set. The space and time requirements of this method are $O(n^2)$, where $n$ is the data set size. Because of this quadratic time complexity, the kernel $k$-means method is not applicable to large data sets. The paper proposes a simple and faster version of the kernel $k$-means clustering method, called the single pass kernel $k$-means clustering method. The proposed method works as follows. First, a random sample $\mathcal{S}$ is selected from the data set $\mathcal{D}$. A partition $\Pi_{\mathcal{S}}$ is obtained by applying the conventional kernel $k$-means method on the random sample $\mathcal{S}$. The novelty of the paper is that, for each cluster in $\Pi_{\mathcal{S}}$, the exact cluster center in the input space is obtained using the gradient descent approach. Finally, each unsampled pattern is assigned to its closest exact cluster center to get a partition of the entire data set. The proposed method needs to scan the data set only once and is much faster than the conventional kernel $k$-means method. The time complexity of this method is $O(s^2+t+nk)$, where $s$ is the size of the random sample $\mathcal{S}$, $k$ is the number of clusters required, and $t$ is the time taken by the gradient descent method (to find the exact cluster centers). The space complexity of the method is $O(s^2)$. The proposed method can be easily implemented and is suitable for large data sets, like those in data mining applications. Experimental results show that, with a small loss of quality, the proposed method can significantly reduce the time taken compared with the conventional kernel $k$-means clustering method. The proposed method is also compared with other recent similar methods.

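    A hedged sketch of the single-scan structure only (sample, cluster the sample, assign every point to its nearest centre). Plain k-means is used on the sample instead of kernel k-means, and the gradient-descent recovery of exact centres is omitted, so this does not reproduce the full method.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans

    def single_pass_clustering(X, k, sample_size=1000, seed=0):
        rng = np.random.default_rng(seed)
        idx = rng.choice(len(X), size=min(sample_size, len(X)), replace=False)
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X[idx])
        # Single scan over the full data set: nearest-centre assignment.
        d = np.linalg.norm(X[:, None, :] - km.cluster_centers_[None, :, :], axis=2)
        return d.argmin(axis=1)
    ```
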
  2. The Realization of a MapReduce-based DBSCAN Density-based Clustering Method

    Institute of Scientific and Technical Information of China (English)

    林阿弟; 陈晓锋

    2015-01-01

    DBSCAN is an effective density-based clustering method designed to find high-density regions that are separated by low-density regions. It is one of the most commonly used clustering algorithms and one of the most cited in the scientific literature. For high-dimensional data, the computational complexity of DBSCAN is O(n^2), which becomes challenging because the size of real-world datasets has been growing rapidly to an extra-large scale. In this paper, an efficient parallel density-based clustering algorithm is proposed and implemented using MapReduce. A quick partitioning strategy is first applied to the preprocessed data. A local DBSCAN process is then run on each subspace defined by the partition profile to generate clusters. Finally, the clusters generated in the previous phase are merged. The experimental results verify the effectiveness of the parallel algorithm.

  3. Data Clustering Analysis Based on Wavelet Feature Extraction

    Institute of Scientific and Technical Information of China (English)

    QIAN Yuntao; TANG Yuanyan

    2003-01-01

    A novel wavelet-based data clustering method is presented in this paper, which includes wavelet feature extraction and a cluster growing algorithm. The wavelet transform can provide rich and diversified information for representing the global and local inherent structures of a dataset; therefore, it is a very powerful tool for extracting clustering features. As an unsupervised classification, the target of clustering analysis depends on the specific clustering criteria. Several criteria that should be considered for a general-purpose clustering algorithm are proposed, and the cluster growing algorithm is constructed to connect these clustering criteria with the wavelet features. Compared with other popular clustering methods, our approach provides multi-resolution clustering results, needs few prior parameters, correctly deals with irregularly shaped clusters, and is insensitive to noise and outliers. As this wavelet-based clustering method is aimed at two-dimensional data clustering problems, high-dimensional datasets are first transformed into two-dimensional Euclidean space using the self-organizing map and U-matrix methods, so that high-dimensional data can also be analyzed. Results on some simulated data and standard test data are reported to illustrate the power of our method.
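
    A grid-and-wavelet sketch in the spirit of the method described above (quantize the plane, transform the density grid, label dense regions); the Haar wavelet, the 64-cell grid and the mean-plus-one-standard-deviation threshold are illustrative choices, and the SOM/U-matrix step for high-dimensional data is omitted.

```python
import numpy as np
import pywt
from scipy import ndimage
from sklearn.datasets import make_blobs

# 2-D points to cluster.
X, _ = make_blobs(n_samples=3000, centers=4, cluster_std=0.6, random_state=1)

# Quantize the plane into a grid of point counts.
bins = 64
H, xe, ye = np.histogram2d(X[:, 0], X[:, 1], bins=bins)

# 2-D wavelet transform of the density grid; the approximation band gives a
# smoothed, multi-resolution view of where the dense regions are.
cA, (cH, cV, cD) = pywt.dwt2(H, 'haar')

# Keep grid cells whose smoothed density exceeds an (ad hoc) threshold and
# label the connected components: each component is one cluster.
dense = cA > cA.mean() + cA.std()
components, n_clusters = ndimage.label(dense)

# Map every point back: locate its cell in the original grid, then the
# matching cell in the half-resolution approximation band.
ix = np.clip(np.digitize(X[:, 0], xe) - 1, 0, bins - 1) // 2
iy = np.clip(np.digitize(X[:, 1], ye) - 1, 0, bins - 1) // 2
labels = components[ix, iy]          # 0 = background / noise
print("clusters:", n_clusters, "noise points:", int((labels == 0).sum()))
```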

  4. Clustering based segmentation of text in complex color images

    Institute of Scientific and Technical Information of China (English)

    毛文革; 王洪滨; 张田文

    2004-01-01

    We propose a novel scheme based on clustering analysis in color space to solve text segmentation in complex color images. Text segmentation includes automatic clustering of the color space and foreground image generation. Two methods are also proposed for automatic clustering: the first determines the optimal number of clusters, and the second is a fuzzy competitive clustering method based on competitive learning techniques. Essential foreground images obtained from the color clusters are combined into the final foreground images. Further performance analysis reveals the advantages of the proposed methods.

  5. A semantic Web service discovery method based on service clustering%一种面向聚类的语义 Web 服务发现方法

    Institute of Scientific and Technical Information of China (English)

    薛洁; 吴兵; 杜玉越

    2012-01-01

    A semantic Web service discovery method based on service clustering is proposed. The semantic similarity of functions and of input/output parameters is calculated, and the services in a service library are clustered by a clustering algorithm. The input/output parameters of each cluster are marked with a unified label, yielding an input/output concept set from which a unit model of service cluster nets is constructed. A unit matrix of service cluster nets is then proposed, so that suitable Web services can be discovered efficiently based on this matrix. A formal model of service cluster net units is presented to carry out the service discovery. An experiment illustrates the validity and reliability of the proposed method and shows clear improvements in precision and recall.

  6. Quartile Clustering: A quartile based technique for Generating Meaningful Clusters

    CERN Document Server

    Goswami, Saptarsi

    2012-01-01

    Clustering is one of the main tasks in exploratory data analysis and descriptive statistics, where the main objective is partitioning observations into groups. Clustering has a broad range of applications in varied domains like climate, business, information retrieval, biology and psychology, to name a few. A variety of methods and algorithms have been developed for clustering tasks in the last few decades. We observe that most of these algorithms define a cluster in terms of attribute values, density, distance, etc. However, these definitions fail to attach a clear meaning/semantics to the generated clusters. We argue that clusters having understandable and distinct semantics defined in terms of quartiles/halves are more appealing to business analysts than clusters defined by data boundaries or prototypes. On the same premise, we propose our new algorithm, named the quartile clustering technique. Through a series of experiments we establish the efficacy of this algorithm. We demonstrate that the quartile clusteri...

  7. A simulation study of three methods for detecting disease clusters

    Directory of Open Access Journals (Sweden)

    Samuelsen Sven O

    2006-04-01

    Full Text Available Abstract Background Cluster detection is an important part of spatial epidemiology because it can help identify environmental factors associated with disease and thus guide investigation of the aetiology of diseases. In this article we study three methods suitable for detecting local spatial clusters: (1) a spatial scan statistic (SaTScan), (2) generalized additive models (GAM) and (3) Bayesian disease mapping (BYM). We conducted a simulation study to compare the methods. Seven geographic clusters with different shapes were initially chosen as high-risk areas. Different scenarios for the magnitude of the relative risk of these areas as compared to the normal-risk areas were considered. For each scenario the performance of the methods was assessed in terms of the sensitivity, specificity, and percentage correctly classified for each cluster. Results The performance depends on the relative risk, but all methods are in general suitable for identifying clusters with a relative risk larger than 1.5. However, it is difficult to detect clusters with lower relative risks. The GAM approach had the highest sensitivity but relatively low specificity, leading to an overestimation of the cluster area. Both the BYM and the SaTScan methods work well. Clusters with irregular shapes are more difficult to detect than more circular clusters. Conclusion Based on our simulations we conclude that the methods differ in their ability to detect spatial clusters. Different aspects should be considered for an appropriate choice of method, such as the size and shape of the assumed spatial clusters and the relative importance of sensitivity and specificity. In general, the BYM method seems preferable for local cluster detection with relatively high relative risks, whereas the SaTScan method appears preferable for lower relative risks. The GAM method needs to be tuned (using cross-validation) to get satisfactory results.

  8. ATAT@WIEN2k: An interface for cluster expansion based on the linearized augmented planewave method

    Science.gov (United States)

    Chakraborty, Monodeep; Spitaler, Jürgen; Puschnig, Peter; Ambrosch-Draxl, Claudia

    2010-05-01

    We have developed an interface between the all-electron density functional theory code WIEN2k, and the MIT Ab-initio Phase Stability (MAPS) code of the Alloy-Theoretic Automated Toolkit (ATAT). WIEN2k is an implementation of the full-potential linearized augmented planewave method which yields highly accurate total energies and optimized geometries for any given structure. The ATAT package consists of two parts. The first one is the MAPS code, which constructs a cluster expansion (CE) in conjunction with a first-principles code. These results form the basis for the second part, which computes the thermodynamic properties of the alloy. The main task of the CE is to calculate the many-body potentials or effective cluster interactions (ECIs) from the first-principles total energies of different structures or supercells using the structure-inversion technique. By linking MAPS seamlessly with WIEN2k we have created a tool to obtain the ECIs for any lattice type of an alloy. We have chosen fcc Al-Ti and bcc W-Re to evaluate our implementation. Our calculated ECIs exhibit all features of a converged CE and compare well with literature results.

  9. Quantum Monte Carlo methods and lithium cluster properties. [Atomic clusters]

    Energy Technology Data Exchange (ETDEWEB)

    Owen, R.K.

    1990-12-01

    Properties of small lithium clusters with sizes ranging from n = 1 to 5 atoms were investigated using quantum Monte Carlo (QMC) methods. Cluster geometries were found from complete active space self consistent field (CASSCF) calculations. A detailed development of the QMC method leading to the variational QMC (V-QMC) and diffusion QMC (D-QMC) methods is shown. The many-body aspect of electron correlation is introduced into the QMC importance-sampling electron-electron correlation functions by using density-dependent parameters, which are shown to increase the amount of correlation energy obtained in V-QMC calculations. A detailed analysis of D-QMC time-step bias is made, and the bias is found to be at least linear with respect to the time step. The D-QMC calculations determined the lithium cluster ionization potentials to be 0.1982(14) (0.1981), 0.1895(9) (0.1874(4)), 0.1530(34) (0.1599(73)), 0.1664(37) (0.1724(110)), 0.1613(43) (0.1675(110)) Hartrees for lithium clusters n = 1 through 5, respectively, in good agreement with the experimental results shown in brackets. Also, the binding energies per atom were computed to be 0.0177(8) (0.0203(12)), 0.0188(10) (0.0220(21)), 0.0247(8) (0.0310(12)), 0.0253(8) (0.0351(8)) Hartrees for lithium clusters n = 2 through 5, respectively. The lithium cluster one-electron density is shown to have charge concentrations corresponding to nonnuclear attractors. The overall shape of the electronic charge density also bears a remarkable similarity to the anisotropic harmonic oscillator model shape for the given number of valence electrons.

  10. Comparison of Clustering Methods for Time Course Genomic Data: Applications to Aging Effects

    OpenAIRE

    Zhang, Y.; Horvath, S.; Ophoff, R; Telesca, D

    2014-01-01

    Time course microarray data provide insight about dynamic biological processes. While several clustering methods have been proposed for the analysis of these data structures, comparison and selection of appropriate clustering methods are seldom discussed. We compared three probabilistic-based clustering methods and three distance-based clustering methods for time course microarray data. Among probabilistic methods, we considered: smoothing spline clustering, also known as model b...

  11. Clustering

    Directory of Open Access Journals (Sweden)

    Jinfei Liu

    2013-04-01

    Full Text Available DBSCAN is a well-known density-based clustering algorithm which offers advantages for finding clusters of arbitrary shapes compared to partitioning and hierarchical clustering methods. However, there are few papers studying the DBSCAN algorithm under the privacy-preserving distributed data mining model, in which the data is distributed between two or more parties, and the parties cooperate to obtain the clustering results without revealing the data held by the individual parties. In this paper, we address the problem of two-party privacy-preserving DBSCAN clustering. We first propose two protocols for privacy-preserving DBSCAN clustering over horizontally and vertically partitioned data respectively and then extend them to arbitrarily partitioned data. We also provide performance analysis and a privacy proof of our solution.

  12. CNEM: Cluster Based Network Evolution Model

    Directory of Open Access Journals (Sweden)

    Sarwat Nizamani

    2015-01-01

    Full Text Available This paper presents a network evolution model which is based on a clustering approach. The proposed approach depicts network evolution, demonstrating the formation of a network from individual nodes to a fully evolved network. An agglomerative hierarchical clustering method is applied for the evolution of the network. In the paper, we present three case studies which show the evolution of networks from scratch. These case studies include: the terrorist network of the 9/11 incidents, the terrorist network of the WMD (Weapons of Mass Destruction) plot against France, and a network of tweets discussing a topic. The network of 9/11 is also used for evaluation, using other social network analysis methods, which shows that the clusters created using the proposed model of network evolution are of good quality; thus the proposed method can be used by law enforcement agencies in order to further investigate criminal networks.

  13. Cluster beam sources. Part 1. Methods of cluster beams generation

    Directory of Open Access Journals (Sweden)

    A.Ju. Karpenko

    2012-10-01

    Full Text Available A short review of cluster beam generation is presented. The basic types of cluster sources are considered and the processes leading to cluster formation are analyzed. The parameters that affect the operation of cluster sources are presented.

  14. Comparison between optical and X-ray cluster detection methods

    CERN Document Server

    Basilakos, S; Georgakakis, A; Georgantopoulos, I; Gaga, T; Kolokotronis, V G; Stewart, G C

    2003-01-01

    In this work we present combined optical and X-ray cluster detection methods in an area near the North Galactic Pole, previously covered by the SDSS and 2dF optical surveys. The same area has been covered by shallow ($\sim 1.8$ deg$^{2}$) XMM-{\em Newton} observations. The optical cluster detection procedure is based on merging two independent selection methods - a smoothing+percolation technique, and a Matched Filter Algorithm. The X-ray cluster detection is based on a wavelet-based algorithm, incorporated in the SAS v.5.2 package. The final optical sample contains 9 candidate clusters with richness of more than 20 galaxies, corresponding roughly to APM richness class. Three of our optically detected clusters are also detected in our X-ray survey.

  15. Sequential Combination Methods for Data Clustering Analysis

    Institute of Scientific and Technical Information of China (English)

    钱 涛; Ching Y.Suen; 唐远炎

    2002-01-01

    This paper proposes the use of more than one clustering method to improve clustering performance. Clustering is an optimization procedure based on a specific clustering criterion. Clustering combination can be regarded as a technique that constructs and processes multiple clustering criteria. Since the global and local clustering criteria are complementary rather than competitive, combining these two types of clustering criteria may enhance the clustering performance. In our past work, a multi-objective programming based simultaneous clustering combination algorithm has been proposed, which incorporates multiple criteria into an objective function by a weighting method, and solves this problem with constrained nonlinear optimization programming. But this algorithm has high computational complexity. Here a sequential combination approach is investigated, which first uses the global criterion based clustering to produce an initial result, then uses the local criterion based information to improve the initial result with a probabilistic relaxation algorithm or linear additive model. Compared with the simultaneous combination method, sequential combination has low computational complexity. Results on some simulated data and standard test data are reported. It appears that clustering performance improvement can be achieved at low cost through sequential combination.

  16. Scalable Density-Based Subspace Clustering

    DEFF Research Database (Denmark)

    Müller, Emmanuel; Assent, Ira; Günnemann, Stephan;

    2011-01-01

    For knowledge discovery in high dimensional databases, subspace clustering detects clusters in arbitrary subspace projections. Scalability is a crucial issue, as the number of possible projections is exponential in the number of dimensions. We propose a scalable density-based subspace clustering...... synthetic databases show that steering is efficient and scalable, with high quality results. For future work, our steering paradigm for density-based subspace clustering opens research potential for speeding up other subspace clustering approaches as well....

  17. Coupled Cluster Methods in Lattice Gauge Theory

    Science.gov (United States)

    Watson, Nicholas Jay

    Available from UMI in association with The British Library. Requires signed TDF. The many-body coupled cluster method is applied to Hamiltonian pure lattice gauge theories. The vacuum wavefunction is written as the exponential of a single sum over the lattice of clusters of gauge invariant operators at fixed relative orientation and separation, generating excitations of the bare vacuum. The basic approximation scheme involves a truncation according to the geometrical size on the lattice of the clusters in the wavefunction. For a wavefunction including clusters up to a given size, all larger clusters generated in the Schrodinger equation are discarded. The general formalism is first given, including that for excited states. Two possible procedures for discarding clusters are considered. The first involves discarding clusters describing excitations of the bare vacuum which are larger than those in the given wavefunction. The second involves rearranging the clusters so that they describe fluctuations of the gauge invariant excitations about their self-consistently calculated expectation values, and then discarding fluctuations larger than those in the given wavefunction. The coupled cluster method is applied to the Z_2 and SU(2) models in 2+1D. For the Z_2 model, the first procedure gives poor results, while the second gives wavefunctions which explicitly display a phase transition, with critical couplings in good agreement with those obtained by other methods. For the SU(2) model, the first procedure also gives poor results, while the second gives vacuum wavefunctions valid at all couplings. The general properties of the wavefunctions at weak coupling are discussed. Approximations with clusters spanning up to four plaquettes are considered. Excited states are calculated, yielding mass gaps with fair scaling properties. Insight is obtained into the form of the wavefunctions at all couplings.

  18. PERFORMANCE OF SELECTED AGGLOMERATIVE HIERARCHICAL CLUSTERING METHODS

    Directory of Open Access Journals (Sweden)

    Nusa Erman

    2015-01-01

    Full Text Available A broad variety of methods of agglomerative hierarchical clustering raises the problem of how to choose the most appropriate method for the given data. It is well known that some methods outperform others if the analysed data have a specific structure. In the presented study we observed the behaviour of the centroid method, the median method (Gower's median), and the average method (unweighted pair-group method with arithmetic mean, UPGMA; average linkage between groups). We compared them with the most commonly used methods of hierarchical clustering: the minimum (single linkage) clustering, the maximum (complete linkage) clustering, the Ward method, and the McQuitty method (group average, weighted pair-group method using arithmetic averages, WPGMA). We applied the comparison of these methods to spherical, ellipsoid, umbrella-like, "core-and-sphere", ring-like and intertwined three-dimensional data structures. To generate the data and execute the analysis, we used R statistical software. Results show that all seven methods are successful in finding compact, ball-shaped or ellipsoid structures when they are sufficiently separated. Conversely, all methods except the minimum perform poorly on non-homogeneous, irregular and elongated structures. Especially challenging is a circular double-helix structure; it is correctly revealed only by the minimum method. We can also confirm formerly published results of other simulation studies, which usually favour the average method (besides the Ward method) in cases when the data are assumed to be fairly compact and well separated.
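
    The seven linkage methods compared above map directly onto SciPy's linkage names, so a quick sanity check can be run on synthetic data as follows; the blob data is an illustrative stand-in for the paper's R-based simulated structures, and only the easy compact case is shown.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Compact, well-separated clusters: the easy case in the study.
X, truth = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=7)

# The seven agglomerative methods, as named in SciPy
# ('weighted' is WPGMA/McQuitty, 'average' is UPGMA).
for method in ["single", "complete", "average", "weighted",
               "centroid", "median", "ward"]:
    Z = linkage(X, method=method)
    labels = fcluster(Z, t=3, criterion="maxclust")
    print(f"{method:>9}: ARI = {adjusted_rand_score(truth, labels):.3f}")
```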

  19. Using an Improved Clustering Method to Detect Anomaly Activities

    Institute of Scientific and Technical Information of China (English)

    LI Han; ZHANG Nan; BAO Lihui

    2006-01-01

    In this paper, an improved k-means based clustering method (IKCM) is proposed. By refining the initial cluster centers and adjusting the number of clusters through splitting and merging procedures, it can avoid locally optimal solutions and reduce the dependency on the number of clusters. The IKCM has been implemented and tested. We perform experiments on the KDD-99 data set. Comparison experiments with H-means+ have also been conducted. The results obtained in this study are very encouraging.

  20. Spanning Tree Based Attribute Clustering

    DEFF Research Database (Denmark)

    Zeng, Yifeng; Jorge, Cordero Hernandez

    2009-01-01

    inconsistent edges from a maximum spanning tree by starting appropriate initial modes, therefore generating stable clusters. It discovers sound clusters through simple graph operations and achieves significant computational savings. We compare the Star Discovery algorithm against earlier attribute clustering...

  1. Ontology Partitioning: Clustering Based Approach

    Directory of Open Access Journals (Sweden)

    Soraya Setti Ahmed

    2015-05-01

    Full Text Available The goal of the semantic web is to share and integrate data across different domains and organizations. The knowledge representation of semantic data is made possible by ontologies. As usage of the semantic web increases, construction of semantic web ontologies also increases. Moreover, due to the monolithic nature of an ontology, various semantic web operations like query answering, data sharing, data matching, data reuse and data integration become more complicated as the size of the ontology increases. Partitioning the ontology is the key solution to handle this scalability issue. In this work, we propose a revision and an enhancement of the K-means clustering algorithm based on a new semantic similarity measure for partitioning a given ontology into high-quality modules. The results show that our approach produces more meaningful clusters than the traditional K-means algorithm.

  2. Clustering Method in Data Mining%数据挖掘中的聚类方法

    Institute of Scientific and Technical Information of China (English)

    王实; 高文

    2000-01-01

    In this paper we introduce clustering methods in Data Mining. Clustering has been studied very deeply, and in the field of Data Mining it is facing a new situation. We summarize the major clustering methods and introduce four kinds of clustering methods that have been used broadly in Data Mining. Finally we draw the conclusion that the partitional clustering method based on distance in data mining is a typical two-phase iterative process: 1) assign each object to a cluster; 2) update the cluster centers.
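
    A minimal NumPy version of the two-phase iteration summarized above; the toy Gaussian data and the convergence test are illustrative.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means illustrating the assign/update two-phase iteration."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Phase 1: assign each point to its nearest cluster center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # Phase 2: update each center to the mean of its assigned points.
        new_centers = np.array([X[labels == c].mean(0) if (labels == c).any()
                                else centers[c] for c in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Three well-separated Gaussian blobs as toy data.
X = np.vstack([np.random.default_rng(1).normal(m, 0.3, size=(100, 2))
               for m in (0.0, 3.0, 6.0)])
labels, centers = kmeans(X, k=3)
print(np.round(centers, 2))
```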

  3. A PSO-Based Subtractive Data Clustering Algorithm

    Directory of Open Access Journals (Sweden)

    Gamal Abdel-Azeem

    2013-03-01

    Full Text Available There is a tremendous proliferation in the amount of information available on the largest shared information source, the World Wide Web. Fast and high-quality clustering algorithms play an important role in helping users to effectively navigate, summarize, and organize the information. Recent studies have shown that partitional clustering algorithms such as the k-means algorithm are the most popular algorithms for clustering large datasets. The major problem with partitional clustering algorithms is that they are sensitive to the selection of the initial partitions and are prone to premature convergence to local optima. Subtractive clustering is a fast, one-pass algorithm for estimating the number of clusters and the cluster centers for any given set of data. The cluster estimates can be used to initialize iterative optimization-based clustering methods and model identification methods. In this paper, we present a hybrid Particle Swarm Optimization and Subtractive (Subtractive+PSO) clustering algorithm that performs fast clustering. For comparison purposes, we applied the Subtractive+PSO clustering algorithm, PSO, and the Subtractive clustering algorithm on three different datasets. The results illustrate that the Subtractive+PSO clustering algorithm can generate the most compact clustering results compared to the other algorithms.
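
    A plain subtractive-clustering sketch covering only the initialization step the abstract builds on, without the PSO refinement; the radii, the acceptance threshold and the single-threshold stopping rule (a simplification of the usual accept/reject ratio test) are illustrative assumptions.

```python
import numpy as np

def subtractive_clustering(X, ra=1.5, rb=2.25, accept=0.15):
    """Chiu-style subtractive clustering: estimate cluster centres from data.

    ra/rb are the neighbourhood radii for the potential computation and its
    reduction; centre selection stops once the best remaining potential drops
    below `accept` times the first potential.
    """
    alpha, beta = 4.0 / ra ** 2, 4.0 / rb ** 2
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    potential = np.exp(-alpha * d2).sum(1)
    centers, first = [], potential.max()
    while True:
        i = potential.argmax()
        if potential[i] < accept * first or len(centers) > 20:
            break
        centers.append(X[i])
        # Subtract the selected centre's influence from all potentials.
        potential = potential - potential[i] * np.exp(-beta * d2[i])
    return np.array(centers)

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(m, 0.4, size=(80, 2)) for m in ((0, 0), (4, 0), (2, 4))])
print(subtractive_clustering(X).round(2))   # estimated cluster centres
```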

  4. 基于主动学习策略的半监督聚类算法研究%Semi-supervised clustering method based on active learning strategy

    Institute of Scientific and Technical Information of China (English)

    芦世丹; 崔荣一

    2013-01-01

    By employing an active learning strategy to select and label the most informative data, this paper proposes a semi-supervised clustering method based on active learning. Firstly, the traditional K-means algorithm is used to coarsely cluster the unlabeled dataset. Then, based on the coarse clustering result, the membership degree of each data point to each cluster is calculated; candidate points whose difference between the largest and second-largest membership degrees falls below a threshold are screened out, and those with the smallest differences, i.e., the most informative samples, are selected for labeling. Finally, each remaining unlabeled candidate is assigned to the cluster of labeled data with the minimum average distance. Experimental results show that the proposed active learning strategy selects informative data effectively, and the semi-supervised clustering method based on it achieves high accuracy across various datasets.

  5. A clustering method based on Dirichlet process mixture model%Dirichlet过程混合模型的聚类算法

    Institute of Scientific and Technical Information of China (English)

    张林; 刘辉

    2012-01-01

    When a finite mixture model is built to cluster high-dimensional data, the number of clusters must be determined in advance, which degrades the precision and generalization of the clustering. In this paper a Dirichlet process infinite mixture model is built to cluster high-dimensional data. Based on the urn representation of the Dirichlet process, the posterior distribution of each parameter is derived, and all parameters, including the number of latent clusters, are estimated by Gibbs sampling (MCMC). Clustering results on both a simulated dataset and the IRIS dataset show that this method correctly estimates the number of latent clusters after 200 Gibbs sampling iterations. The average time per iteration was 0.1850 s and 0.1455 s for the simulated and IRIS datasets respectively, and the time complexity of each iteration is O(N), where N is the number of samples.
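
    The paper's Gibbs-sampled Dirichlet process mixture is not reproduced here; as a rough stand-in, scikit-learn's variational truncated Dirichlet process mixture also infers the effective number of clusters on the IRIS data. The component cap and concentration prior below are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.mixture import BayesianGaussianMixture

X = load_iris().data

# Truncated Dirichlet-process mixture: the concentration prior lets the model
# switch off unneeded components, so the effective number of clusters is
# inferred from the data rather than fixed in advance.
dpgmm = BayesianGaussianMixture(
    n_components=10,                       # upper bound, not the final k
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=0.1,
    covariance_type="full",
    max_iter=500,
    random_state=0,
).fit(X)

labels = dpgmm.predict(X)
print("effective clusters:", len(np.unique(labels)))
print("mixture weights:", dpgmm.weights_.round(3))
```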

  6. Variable cluster analysis method for building neural network model

    Institute of Scientific and Technical Information of China (English)

    王海东; 刘元东

    2004-01-01

    To address the problems that input variables should be reduced as much as possible while still fully explaining the output variables when building a neural network model of a complicated system, a variable selection method based on cluster analysis was investigated. A similarity coefficient describing the mutual relation of variables was defined. The methods of highest contribution rate, part replacing whole, and variable replacement were put forward and derived using information theory. Software for neural network modelling based on cluster analysis, which provides several ways of defining the variable similarity coefficient, clustering the system variables and evaluating the variable clusters, was developed and applied to build a neural network forecast model of cement clinker quality. The results show that the network size, training time and prediction accuracy are all satisfactory. The practical application demonstrates that the proposed method of selecting variables for neural networks is feasible and effective.

  7. Cosmological Constraints with Clustering-Based Redshifts

    CERN Document Server

    Kovetz, Ely D; Rahman, Mubdi

    2016-01-01

    We demonstrate that observations lacking reliable redshift information, such as photometric and radio continuum surveys, can produce robust measurements of cosmological parameters when empowered by clustering-based redshift estimation. This method infers the redshift distribution based on the spatial clustering of sources, using cross-correlation with a reference dataset with known redshifts. Applying this method to the existing SDSS photometric galaxies, and projecting to future radio continuum surveys, we show that sources can be efficiently divided into several redshift bins, increasing their ability to constrain cosmological parameters. We forecast constraints on the dark-energy equation-of-state and on local non-gaussianity parameters. We explore several pertinent issues, including the tradeoff between including more sources versus minimizing the overlap between bins, the shot-noise limitations on binning, and the predicted performance of the method at high redshifts. Remarkably, we find that, once this ...

  8. 基于用户过滤的校园无线网用户聚类方法%User filtering based campus WLAN user clustering method

    Institute of Scientific and Technical Information of China (English)

    仇一泓; 尧婷娟; 秦丰林; 葛连升

    2014-01-01

    With the widespread adoption of smart terminals such as smart phones and tablets, using the MAC address as the user identifier in campus wireless local area network (WLAN) user clustering research no longer accurately represents user behavior. A user clustering method based on user filtering is therefore proposed: user behavior data are filtered by the users' degree of activeness, and clustering analysis of campus WLAN user behavior is then performed on the filtered data. Experimental results verify the effectiveness of the proposed method.

  9. A two-stage cluster sampling method using gridded population data, a GIS, and Google EarthTM imagery in a population-based mortality survey in Iraq

    Directory of Open Access Journals (Sweden)

    Galway LP

    2012-04-01

    Full Text Available Abstract Background Mortality estimates can measure and monitor the impacts of conflict on a population, guide humanitarian efforts, and help to better understand the public health impacts of conflict. Vital statistics registration and surveillance systems are rarely functional in conflict settings, posing a challenge of estimating mortality using retrospective population-based surveys. Results We present a two-stage cluster sampling method for application in population-based mortality surveys. The sampling method utilizes gridded population data and a geographic information system (GIS to select clusters in the first sampling stage and Google Earth TM imagery and sampling grids to select households in the second sampling stage. The sampling method is implemented in a household mortality study in Iraq in 2011. Factors affecting feasibility and methodological quality are described. Conclusion Sampling is a challenge in retrospective population-based mortality studies and alternatives that improve on the conventional approaches are needed. The sampling strategy presented here was designed to generate a representative sample of the Iraqi population while reducing the potential for bias and considering the context specific challenges of the study setting. This sampling strategy, or variations on it, are adaptable and should be considered and tested in other conflict settings.

  10. Progressive Exponential Clustering-Based Steganography

    Directory of Open Access Journals (Sweden)

    Li Yue

    2010-01-01

    Full Text Available Cluster indexing-based steganography is an important branch of data-hiding techniques. Such schemes normally achieve good balance between high embedding capacity and low embedding distortion. However, most cluster indexing-based steganographic schemes utilise less efficient clustering algorithms for embedding data, which causes redundancy and leaves room for increasing the embedding capacity further. In this paper, a new clustering algorithm, called progressive exponential clustering (PEC, is applied to increase the embedding capacity by avoiding redundancy. Meanwhile, a cluster expansion algorithm is also developed in order to further increase the capacity without sacrificing imperceptibility.

  11. Modified possibilistic clustering model based on kernel methods%基于核方法的改进可能聚类模型

    Institute of Scientific and Technical Information of China (English)

    武小红; 周建红

    2008-01-01

    A novel model of fuzzy clustering using kernel methods is proposed. This model is called the kernel modified possibilistic c-means (KMPCM) model. The proposed model is an extension of the modified possibilistic c-means (MPCM) algorithm obtained by using kernel methods. Different from the MPCM and fuzzy c-means (FCM) models, which are based on Euclidean distance, the proposed model is based on a kernel-induced distance. Furthermore, with kernel methods the input data can be mapped implicitly into a high-dimensional feature space where a nonlinear pattern appears linear. It is unnecessary to do the calculation in the high-dimensional feature space because the kernel function can do it. Numerical experiments show that KMPCM outperforms FCM and MPCM.

  12. DNA splice site sequences clustering method for conservativeness analysis

    Institute of Scientific and Technical Information of China (English)

    Quanwei Zhang; Qinke Peng; Tao Xu

    2009-01-01

    DNA sequences near splice sites are remarkably conserved, and many researchers have contributed to the prediction of splice sites. In order to mine the underlying biological knowledge, we analyze the conservativeness of sequences adjacent to DNA splice sites by clustering. Firstly, we propose a DNA splice site sequence clustering method based on DBSCAN and use four kinds of dissimilarity calculation methods. Then, we analyze the conservative features of the clustering results and of the experimental data set.
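
    A small sketch of density-based clustering over a precomputed sequence dissimilarity matrix; the toy sequences, the Hamming dissimilarity (standing in for one of the four measures mentioned above) and the eps/min_samples values are all illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy splice-site-adjacent sequences of equal length; real data would come
# from annotated donor/acceptor sites.
seqs = ["CAGGTAAGT", "CAGGTAAGA", "AAGGTAAGT", "TTCCTTTAG",
        "TTCCATTAG", "CTCCTTTAG", "GGGGGGGGG"]

def hamming(a, b):
    # One simple dissimilarity between equal-length sequences.
    return sum(x != y for x, y in zip(a, b)) / len(a)

n = len(seqs)
D = np.array([[hamming(seqs[i], seqs[j]) for j in range(n)] for i in range(n)])

# DBSCAN over the precomputed dissimilarity matrix (-1 marks noise).
labels = DBSCAN(eps=0.25, min_samples=2, metric="precomputed").fit_predict(D)
print(dict(zip(seqs, labels)))
```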

  13. Sparse maps--A systematic infrastructure for reduced-scaling electronic structure methods. II. Linear scaling domain based pair natural orbital coupled cluster theory.

    Science.gov (United States)

    Riplinger, Christoph; Pinski, Peter; Becker, Ute; Valeev, Edward F; Neese, Frank

    2016-01-14

    Domain based local pair natural orbital coupled cluster theory with single-, double-, and perturbative triple excitations (DLPNO-CCSD(T)) is a highly efficient local correlation method. It is known to be accurate and robust and can be used in a black box fashion in order to obtain coupled cluster quality total energies for large molecules with several hundred atoms. While previous implementations showed near linear scaling up to a few hundred atoms, several nonlinear scaling steps limited the applicability of the method for very large systems. In this work, these limitations are overcome and a linear scaling DLPNO-CCSD(T) method for closed shell systems is reported. The new implementation is based on the concept of sparse maps that was introduced in Part I of this series [P. Pinski, C. Riplinger, E. F. Valeev, and F. Neese, J. Chem. Phys. 143, 034108 (2015)]. Using the sparse map infrastructure, all essential computational steps (integral transformation and storage, initial guess, pair natural orbital construction, amplitude iterations, triples correction) are achieved in a linear scaling fashion. In addition, a number of additional algorithmic improvements are reported that lead to significant speedups of the method. The new, linear-scaling DLPNO-CCSD(T) implementation typically is 7 times faster than the previous implementation and consumes 4 times less disk space for large three-dimensional systems. For linear systems, the performance gains and memory savings are substantially larger. Calculations with more than 20 000 basis functions and 1000 atoms are reported in this work. In all cases, the time required for the coupled cluster step is comparable to or lower than for the preceding Hartree-Fock calculation, even if this is carried out with the efficient resolution-of-the-identity and chain-of-spheres approximations. The new implementation even reduces the error in absolute correlation energies by about a factor of two, compared to the already accurate

  14. Sparse maps—A systematic infrastructure for reduced-scaling electronic structure methods. II. Linear scaling domain based pair natural orbital coupled cluster theory

    Science.gov (United States)

    Riplinger, Christoph; Pinski, Peter; Becker, Ute; Valeev, Edward F.; Neese, Frank

    2016-01-01

    Domain based local pair natural orbital coupled cluster theory with single-, double-, and perturbative triple excitations (DLPNO-CCSD(T)) is a highly efficient local correlation method. It is known to be accurate and robust and can be used in a black box fashion in order to obtain coupled cluster quality total energies for large molecules with several hundred atoms. While previous implementations showed near linear scaling up to a few hundred atoms, several nonlinear scaling steps limited the applicability of the method for very large systems. In this work, these limitations are overcome and a linear scaling DLPNO-CCSD(T) method for closed shell systems is reported. The new implementation is based on the concept of sparse maps that was introduced in Part I of this series [P. Pinski, C. Riplinger, E. F. Valeev, and F. Neese, J. Chem. Phys. 143, 034108 (2015)]. Using the sparse map infrastructure, all essential computational steps (integral transformation and storage, initial guess, pair natural orbital construction, amplitude iterations, triples correction) are achieved in a linear scaling fashion. In addition, a number of additional algorithmic improvements are reported that lead to significant speedups of the method. The new, linear-scaling DLPNO-CCSD(T) implementation typically is 7 times faster than the previous implementation and consumes 4 times less disk space for large three-dimensional systems. For linear systems, the performance gains and memory savings are substantially larger. Calculations with more than 20 000 basis functions and 1000 atoms are reported in this work. In all cases, the time required for the coupled cluster step is comparable to or lower than for the preceding Hartree-Fock calculation, even if this is carried out with the efficient resolution-of-the-identity and chain-of-spheres approximations. The new implementation even reduces the error in absolute correlation energies by about a factor of two, compared to the already accurate previous

  15. Sparse maps—A systematic infrastructure for reduced-scaling electronic structure methods. II. Linear scaling domain based pair natural orbital coupled cluster theory

    Energy Technology Data Exchange (ETDEWEB)

    Riplinger, Christoph; Pinski, Peter; Becker, Ute; Neese, Frank, E-mail: frank.neese@cec.mpg.de, E-mail: evaleev@vt.edu [Max Planck Institute for Chemical Energy Conversion, Stiftstr. 34-36, D-45470 Mülheim an der Ruhr (Germany); Valeev, Edward F., E-mail: frank.neese@cec.mpg.de, E-mail: evaleev@vt.edu [Department of Chemistry, Virginia Tech, Blacksburg, Virginia 24061 (United States)

    2016-01-14

    Domain based local pair natural orbital coupled cluster theory with single-, double-, and perturbative triple excitations (DLPNO-CCSD(T)) is a highly efficient local correlation method. It is known to be accurate and robust and can be used in a black box fashion in order to obtain coupled cluster quality total energies for large molecules with several hundred atoms. While previous implementations showed near linear scaling up to a few hundred atoms, several nonlinear scaling steps limited the applicability of the method for very large systems. In this work, these limitations are overcome and a linear scaling DLPNO-CCSD(T) method for closed shell systems is reported. The new implementation is based on the concept of sparse maps that was introduced in Part I of this series [P. Pinski, C. Riplinger, E. F. Valeev, and F. Neese, J. Chem. Phys. 143, 034108 (2015)]. Using the sparse map infrastructure, all essential computational steps (integral transformation and storage, initial guess, pair natural orbital construction, amplitude iterations, triples correction) are achieved in a linear scaling fashion. In addition, a number of additional algorithmic improvements are reported that lead to significant speedups of the method. The new, linear-scaling DLPNO-CCSD(T) implementation typically is 7 times faster than the previous implementation and consumes 4 times less disk space for large three-dimensional systems. For linear systems, the performance gains and memory savings are substantially larger. Calculations with more than 20 000 basis functions and 1000 atoms are reported in this work. In all cases, the time required for the coupled cluster step is comparable to or lower than for the preceding Hartree-Fock calculation, even if this is carried out with the efficient resolution-of-the-identity and chain-of-spheres approximations. The new implementation even reduces the error in absolute correlation energies by about a factor of two, compared to the already accurate

  16. Sparse maps—A systematic infrastructure for reduced-scaling electronic structure methods. II. Linear scaling domain based pair natural orbital coupled cluster theory

    International Nuclear Information System (INIS)

    Domain based local pair natural orbital coupled cluster theory with single-, double-, and perturbative triple excitations (DLPNO-CCSD(T)) is a highly efficient local correlation method. It is known to be accurate and robust and can be used in a black box fashion in order to obtain coupled cluster quality total energies for large molecules with several hundred atoms. While previous implementations showed near linear scaling up to a few hundred atoms, several nonlinear scaling steps limited the applicability of the method for very large systems. In this work, these limitations are overcome and a linear scaling DLPNO-CCSD(T) method for closed shell systems is reported. The new implementation is based on the concept of sparse maps that was introduced in Part I of this series [P. Pinski, C. Riplinger, E. F. Valeev, and F. Neese, J. Chem. Phys. 143, 034108 (2015)]. Using the sparse map infrastructure, all essential computational steps (integral transformation and storage, initial guess, pair natural orbital construction, amplitude iterations, triples correction) are achieved in a linear scaling fashion. In addition, a number of additional algorithmic improvements are reported that lead to significant speedups of the method. The new, linear-scaling DLPNO-CCSD(T) implementation typically is 7 times faster than the previous implementation and consumes 4 times less disk space for large three-dimensional systems. For linear systems, the performance gains and memory savings are substantially larger. Calculations with more than 20 000 basis functions and 1000 atoms are reported in this work. In all cases, the time required for the coupled cluster step is comparable to or lower than for the preceding Hartree-Fock calculation, even if this is carried out with the efficient resolution-of-the-identity and chain-of-spheres approximations. The new implementation even reduces the error in absolute correlation energies by about a factor of two, compared to the already accurate

  18. A stochastic optimization method based technique for finding out reaction paths in noble gas clusters perturbed by alkali metal ions

    International Nuclear Information System (INIS)

    Graphical abstract: The structure of a minimum in the Ar19K+ cluster. Abstract: In this paper we explore the possibility of using stochastic optimizers, namely simulated annealing (SA), to locate critical points (global minima, local minima and first-order saddle points) in argon noble gas clusters perturbed by alkali metal ions, namely sodium and potassium. The atomic interaction potential is the Lennard-Jones potential. We also examine whether a continuous transformation in geometry during the search process can lead to a realization of a kind of minimum energy path (MEP) for the transformation from one minimum geometry to another through a transition state (first-order saddle point). We try our recipe for three sizes of clusters, namely (Ar)16M+, (Ar)19M+ and (Ar)24M+, where M+ is Na+ or K+.
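
    A bare-bones simulated-annealing search for a low-energy pure Lennard-Jones cluster, illustrating only the stochastic-optimization ingredient; the alkali-ion perturbation, saddle-point location and path construction described above are not attempted, and the cooling schedule, move size and cluster size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def lj_energy(pos, eps=1.0, sigma=1.0):
    """Total Lennard-Jones energy of a set of 3-D atomic positions."""
    d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    r = d[np.triu_indices(len(pos), k=1)]       # unique pair distances
    return float(np.sum(4 * eps * ((sigma / r) ** 12 - (sigma / r) ** 6)))

def simulated_annealing(n_atoms=7, steps=20000, t0=1.0, cooling=0.9997):
    """Anneal random single-atom moves to find a low-energy cluster geometry."""
    pos = rng.uniform(-1.5, 1.5, size=(n_atoms, 3))
    energy, temp = lj_energy(pos), t0
    best, best_e = pos.copy(), energy
    for _ in range(steps):
        trial = pos.copy()
        trial[rng.integers(n_atoms)] += rng.normal(scale=0.1, size=3)
        e = lj_energy(trial)
        # Metropolis acceptance: always take downhill moves, sometimes uphill.
        if e < energy or rng.random() < np.exp((energy - e) / temp):
            pos, energy = trial, e
            if e < best_e:
                best, best_e = trial.copy(), e
        temp *= cooling
    return best, best_e

_, e_min = simulated_annealing()
print(f"lowest LJ energy found for 7 atoms: {e_min:.3f}")
```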

  19. Firing Efficiency of Cluster Bomb Based on Method of Analogy%基于类比法的子母弹射击效率评定研究

    Institute of Scientific and Technical Information of China (English)

    殷培江; 李君; 张立生

    2012-01-01

    The method of analogy is a common logical reasoning tool used across many research fields, but it is rarely applied in military evaluation. Evaluating the firing efficiency of cluster bombs by analogy has several advantages: it is simple, practical, economical, and convenient for large-scale computation. The paper builds an evaluation index system for the firing efficiency of cluster bombs. Based on the similarity between cluster bombs and conventional ammunition in the firing-efficiency evaluation process, an analogy-based evaluation model is then built on top of classical damage-assessment models. Several typical targets are taken as examples, and the firing-efficiency results for these targets are displayed with radar charts. The conclusions obtained with the analogy method are compared against those of a target simulation model, and the comparison of the data verifies the feasibility of analogy-based firing-efficiency evaluation for cluster bombs.

  20. Model-Based Clustering of Large Networks

    CERN Document Server

    Vu, Duy Quang; Schweinberger, Michael

    2012-01-01

    We describe a network clustering framework, based on finite mixture models, that can be applied to discrete-valued networks with hundreds of thousands of nodes and billions of edge variables. Relative to other recent model-based clustering work for networks, we introduce a more flexible modeling framework, improve the variational-approximation estimation algorithm, discuss and implement standard error estimation via a parametric bootstrap approach, and apply these methods to much larger datasets than those seen elsewhere in the literature. The more flexible modeling framework is achieved through introducing novel parameterizations of the model, giving varying degrees of parsimony, using exponential family models whose structure may be exploited in various theoretical and algorithmic ways. The algorithms, which we show how to adapt to the more complicated optimization requirements introduced by the constraints imposed by the novel parameterizations we propose, are based on variational generalized EM algorithms...

  1. Fingerprint analysis of Hibiscus mutabilis L. leaves based on ultra performance liquid chromatography with photodiode array detector combined with similarity analysis and hierarchical clustering analysis methods

    Directory of Open Access Journals (Sweden)

    Xianrui Liang

    2013-01-01

    Full Text Available Background: A method for chemical fingerprint analysis of Hibiscus mutabilis L. leaves was developed based on ultra performance liquid chromatography with a photodiode array detector (UPLC-PAD) combined with similarity analysis (SA) and hierarchical clustering analysis (HCA). Materials and Methods: 10 batches of Hibiscus mutabilis L. leaf samples were collected from different regions of China. UPLC-PAD was employed to collect chemical fingerprints of the Hibiscus mutabilis L. leaves. Results: The relative standard deviations (RSDs) of the relative retention times (RRT) and relative peak areas (RPA) of 10 characteristic peaks (one of them identified as rutin) in the precision, repeatability and stability tests were less than 3%, and the fingerprint analysis method was validated to be suitable for Hibiscus mutabilis L. leaves. Conclusions: The chromatographic fingerprints showed abundant qualitative diversity of chemical constituents in the 10 batches of samples from different locations, according to similarity analysis based on the correlation coefficients between each pair of fingerprints. Moreover, the HCA method clustered the samples into four classes, and the HCA dendrogram showed the close or distant relations among the 10 samples, which was consistent with the SA result to some extent.

  2. Incremental Web Usage Mining Based on Active Ant Colony Clustering

    Institute of Scientific and Technical Information of China (English)

    SHEN Jie; LIN Ying; CHEN Zhimin

    2006-01-01

    To alleviate the scalability problem caused by growing Web usage and changing user interests, this paper presents a novel Web usage mining algorithm: an incremental Web usage mining algorithm based on active ant colony clustering. Firstly, an active movement strategy for direction selection and speed, different from the positive strategy employed by other ant colony clustering algorithms, is proposed to construct an active ant colony clustering algorithm, which avoids the idle and "flying over the plane" movement phenomena and effectively improves the quality and speed of clustering on large datasets. Then a mechanism for decomposing clusters based on the above method is introduced to form new clusters when users' interests change. Empirical studies on a real Web dataset show that the active ant colony clustering algorithm performs better than previous algorithms, and that the incremental approach based on the proposed mechanism can efficiently implement incremental Web usage mining.

  3. New resampling method for evaluating stability of clusters

    Directory of Open Access Journals (Sweden)

    Neuhaeuser Markus

    2008-01-01

    Full Text Available Abstract Background Hierarchical clustering is a widely applied tool in the analysis of microarray gene expression data. The assessment of cluster stability is a major challenge in clustering procedures. Statistical methods are required to distinguish between real and random clusters. Several methods for assessing cluster stability have been published, including resampling methods such as the bootstrap. We propose a new resampling method based on continuous weights to assess the stability of clusters in hierarchical clustering. While in bootstrapping approximately one third of the original items is lost, continuous weights avoid zero elements and instead allow non-integer diagonal elements, which leads to retention of the full dimensionality of the space, i.e. each variable of the original data set is represented in the resampled data. Results Comparison of continuous weights and bootstrapping using real datasets and simulation studies reveals the advantage of continuous weights, especially when the dataset has only few observations, few differentially expressed genes, and the fold change of differentially expressed genes is low. Conclusion We recommend the use of continuous weights in small as well as in large datasets, because according to our results they produce at least the same results as conventional bootstrapping and in some cases surpass it.
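
    The continuous-weights idea can be illustrated by reweighting observations with strictly positive Dirichlet weights instead of resampling them. The sketch below adapts it to weighted k-means scored by the adjusted Rand index rather than the hierarchical-clustering procedure of the paper, and the dataset and number of replicates are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

X, _ = make_blobs(n_samples=60, centers=3, cluster_std=1.0, random_state=2)
reference = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

rng = np.random.default_rng(0)
scores = []
for _ in range(100):
    # Continuous weights: every observation keeps a positive, non-integer
    # weight, unlike the bootstrap where roughly a third get weight zero.
    w = rng.dirichlet(np.ones(len(X))) * len(X)
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit(
        X, sample_weight=w).predict(X)
    scores.append(adjusted_rand_score(reference, labels))

print(f"mean stability (ARI) over 100 reweighted runs: {np.mean(scores):.3f}")
```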

  4. 基于非竞争式簇首轮转的WSNs分簇优化方法%Optimization Method of WSNs Clustering Based on Non-competitive Cluster-Head Rotation

    Institute of Scientific and Technical Information of China (English)

    程宏斌; 乐德广; 孙霞; 王海军

    2012-01-01

    To improve the low energy efficiency of nodes under the LEACH protocol, an energy consumption model for clustering protocols is established. Based on an analysis of the energy cost of cluster-head election and of the energy-consumption differences between nodes, a non-competitive cluster-head rotation method for WSNs is proposed: a cluster head is elected only once, in the first round of each rotation cycle, and in the remaining rounds the other nodes take turns acting as cluster head according to a fixed rotation. In addition, setting a reasonable number of data collections per round further reduces the energy spent on cluster-head election. Theoretical analysis and simulation results show that the optimized clustering algorithm effectively improves the overall energy consumption of the WSN clustering protocol.

  5. Efficient clustering aggregation based on data fragments.

    Science.gov (United States)

    Wu, Ou; Hu, Weiming; Maybank, Stephen J; Zhu, Mingliang; Li, Bing

    2012-06-01

    Clustering aggregation, known as clustering ensembles, has emerged as a powerful technique for combining different clustering results to obtain a single better clustering. Existing clustering aggregation algorithms are applied directly to data points, in what is referred to as the point-based approach. The algorithms are inefficient if the number of data points is large. We define an efficient approach for clustering aggregation based on data fragments. In this fragment-based approach, a data fragment is any subset of the data that is not split by any of the clustering results. To establish the theoretical bases of the proposed approach, we prove that clustering aggregation can be performed directly on data fragments under two widely used goodness measures for clustering aggregation taken from the literature. Three new clustering aggregation algorithms are described. The experimental results obtained using several public data sets show that the new algorithms have lower computational complexity than three well-known existing point-based clustering aggregation algorithms (Agglomerative, Furthest, and LocalSearch); nevertheless, the new algorithms do not sacrifice the accuracy. PMID:22334025
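
    A compact illustration of what a data fragment is and why it reduces work: points sharing the same label tuple across the ensemble collapse into one fragment. The final weighted k-means over fragment centroids is only a stand-in for the Agglomerative, Furthest and LocalSearch aggregation algorithms discussed above, and the data and ensemble are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=5000, centers=4, random_state=0)

# Several base clusterings to aggregate (one column per clustering).
labelings = np.stack([KMeans(n_clusters=k, n_init=5, random_state=s).fit_predict(X)
                      for s, k in [(0, 3), (1, 4), (2, 5)]], axis=1)

# A fragment is a maximal set of points that no base clustering splits, i.e.
# all points sharing the same tuple of labels across the ensemble.
fragments, frag_id = np.unique(labelings, axis=0, return_inverse=True)
print(f"{len(X)} points collapse to {len(fragments)} fragments")

# Aggregation can now operate on fragment representatives (here: centroids
# weighted by fragment size) instead of raw points: the source of the speed-up.
frag_centroids = np.array([X[frag_id == f].mean(0) for f in range(len(fragments))])
frag_sizes = np.bincount(frag_id)
final = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(
    frag_centroids, sample_weight=frag_sizes)
labels = final[frag_id]           # broadcast fragment labels back to points
print("final cluster sizes:", np.bincount(labels))
```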

  6. PHISHING WEB IMAGE SEGMENTATION BASED ON IMPROVING SPECTRAL CLUSTERING

    Institute of Scientific and Technical Information of China (English)

    Li Yuancheng; Zhao Liujun; Jiao Runhai

    2011-01-01

    Abstract This paper proposes a novel phishing web image segmentation algorithm based on improved spectral clustering. Firstly, we construct a set of points composed of spatial pixel locations and gray levels from a given image. Secondly, the data is clustered in the spectral space of the similarity matrix of the point set. To avoid the drawbacks of the K-means step in the conventional spectral clustering method, which is sensitive to the initial clustering centroids and converges to local optima, we introduce a clone operator and Cauchy mutation to enlarge the pool of candidate clustering centers, and a quantum-inspired evolutionary algorithm to find the globally optimal clustering centroids. Compared with phishing web image segmentation based on K-means, experimental results show that the segmentation performance of our method is much improved. Moreover, our method converges to the globally optimal solution and is more accurate for phishing web segmentation.
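
    For context, a compact sketch of the baseline pipeline the abstract improves on: each pixel is represented by (row, column, gray level), a similarity matrix is built, and the spectral embedding is clustered. Plain k-means stands in for the quantum-inspired evolutionary search of the centroids, so this is not the authors' improved algorithm, and the feature scaling and gamma value are assumptions.

    ```python
    # Hedged sketch: spectral clustering of an image from (x, y, gray) features.
    # Plain k-means is used on the spectral embedding; the paper replaces this
    # step with a quantum-inspired evolutionary search for the centroids.
    import numpy as np
    from sklearn.cluster import SpectralClustering

    def segment(gray_image, n_segments=3, gamma=0.5):
        h, w = gray_image.shape
        ys, xs = np.mgrid[0:h, 0:w]
        # each pixel -> (row, col, intensity), roughly equalised in scale
        feats = np.stack([ys.ravel() / h, xs.ravel() / w, gray_image.ravel()], axis=1)
        model = SpectralClustering(n_clusters=n_segments, affinity="rbf",
                                   gamma=gamma, assign_labels="kmeans",
                                   random_state=0)
        return model.fit_predict(feats).reshape(h, w)

    img = np.random.rand(32, 32)          # stand-in for a (downsampled) web image
    labels = segment(img, n_segments=3)
    ```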

  7. A novel clustering and supervising users' profiles method

    Institute of Scientific and Technical Information of China (English)

    Zhu Mingfu; Zhang Hongbin; Song Fangyun

    2005-01-01

    To better understand different users' access intentions, a novel method for clustering and supervising users' profiles based on access paths is presented. The method divides the users' interest space to express the distribution of users' interests, and uses it directly to guide the construction of web page indexes for improved performance.

  8. Fuzzy Clustering Methods and their Application to Fuzzy Modeling

    DEFF Research Database (Denmark)

    Kroszynski, Uri; Zhou, Jianjun

    1999-01-01

    Fuzzy modeling techniques based upon the analysis of measured input/output data sets result in a set of rules that allow system outputs to be predicted from given inputs. Fuzzy clustering methods for system modeling and identification result in relatively small rule-bases, allowing fast, yet accurate ...

  9. A Description Method of Text Feature Based on Word Clustering

    Institute of Scientific and Technical Information of China (English)

    陈炯; 张永奎

    2011-01-01

    The feature space in text mining suffers from high dimensionality. To address this, a text feature description method based on word clustering is presented, which aims to mine the semantic associations between words by machine learning, dynamically construct a concept dictionary for a specific domain, and describe text features with the constructed concepts. Without relying on a thesaurus, the method first analyzes the co-occurrence of words in the training corpus, then uses word clustering to generate word classes, each representing a topic concept and denoted by seed words, and finally takes the seed words as the text feature terms. Experimental results indicate that this method not only reduces the dimensionality of the feature space but also overcomes the limitations of the concept information in HowNet, improving the accuracy of text categorization.
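
    A small sketch of the word-clustering idea, under assumptions: words are clustered by their co-occurrence profiles, the word closest to each centroid is taken as the cluster's seed word, and documents are then described by word-cluster counts. The seed-selection rule and all names are illustrative, not the paper's exact construction.

    ```python
    # Hedged sketch: cluster words by their co-occurrence profiles and use one
    # "seed word" per cluster; documents are then described by cluster counts.
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.cluster import KMeans

    docs = ["stocks fall as markets react", "markets rally and stocks rise",
            "team wins the match", "coach praises the team after the match"]

    vec = CountVectorizer()
    X = vec.fit_transform(docs)                    # docs x words
    cooc = (X.T @ X).toarray().astype(float)       # word x word co-occurrence
    words = vec.get_feature_names_out()

    k = 2
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(cooc)
    seeds = []
    for c in range(k):
        idx = np.where(km.labels_ == c)[0]
        centre = km.cluster_centers_[c]
        seeds.append(words[idx[np.argmin(np.linalg.norm(cooc[idx] - centre, axis=1))]])
    print(seeds)                                   # one representative word per concept

    # a document is then described by word-cluster counts instead of raw words
    cluster_cols = [np.where(km.labels_ == c)[0] for c in range(k)]
    doc_features = np.hstack([np.asarray(X[:, cols].sum(axis=1)) for cols in cluster_cols])
    ```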

  10. New clustering methods for population comparison on paternal lineages.

    Science.gov (United States)

    Juhász, Z; Fehér, T; Bárány, G; Zalán, A; Németh, E; Pádár, Z; Pamjav, H

    2015-04-01

    The goal of this study is to show two new clustering and visualising techniques developed to find the most typical clusters of 18-dimensional Y chromosomal haplogroup frequency distributions of 90 Western Eurasian populations. The first technique called "self-organizing cloud (SOC)" is a vector-based self-learning method derived from the Self Organising Map and non-metric Multidimensional Scaling algorithms. The second technique is a new probabilistic method called the "maximal relation probability" (MRP) algorithm, based on a probability function having its local maximal values just in the condensation centres of the input data. This function is calculated immediately from the distance matrix of the data and can be interpreted as the probability that a given element of the database has a real genetic relation with at least one of the remaining elements. We tested these two new methods by comparing their results to both each other and the k-medoids algorithm. By means of these new algorithms, we determined 10 clusters of populations based on the similarity of haplogroup composition. The results obtained represented a genetically, geographically and historically well-interpretable picture of 10 genetic clusters of populations mirroring the early spread of populations from the Fertile Crescent to the Caucasus, Central Asia, Arabia and Southeast Europe. The results show that a parallel clustering of populations using SOC and MRP methods can be an efficient tool for studying the demographic history of populations sharing common genetic footprints.

  11. Generating a multilingual taxonomy based on multilingual terminology clustering

    Institute of Scientific and Technical Information of China (English)

    Chengzhi; ZHANG

    2011-01-01

    Taxonomy denotes the hierarchical structure of a knowledge organization system. It has important applications in knowledge navigation, semantic annotation and semantic search, and it is worthwhile to study multilingual taxonomies generated automatically in a dynamic information environment in which massive amounts of information are processed and retrieved. A multilingual taxonomy is the core component of a multilingual thesaurus or ontology. This paper presents two methods of generating a bilingual taxonomy: cross-language terminology clustering and mixed-language based terminology clustering. According to our experimental results on terminology clustering in four specific subject domains, we found that when a parallel corpus is used to cluster multilingual terminologies, mixed-language based terminology clustering outperforms cross-language terminology clustering.

  12. Study on Grey Clustering Decision Methods Based on Renyi Entropy

    Institute of Scientific and Technical Information of China (English)

    吴正朋; 张友萍; 李梅

    2011-01-01

    In traditional grey fixed-weight clustering methods, the weights are given in advance and therefore lack objectivity. Drawing on the idea of the traditional Shannon information entropy, this paper proposes a method for determining the weights based on Renyi entropy and constructs an algorithm for grey fixed-weight clustering evaluation based on Renyi entropy weights. The algorithm takes the system state data as its basis and obtains the decision weights by computing the entropy; an empirical case study set against a practical problem is carried out. The results show that the method is simple to compute and determines the weights objectively, and that it supplements and improves grey clustering decision theory.
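
    A minimal sketch of an entropy-weight computation with Shannon entropy swapped for Renyi entropy; the normalisation and the weight formula (weights proportional to 1 minus the normalised entropy) follow the common entropy-weight recipe and are assumptions about, not a reproduction of, the paper's algorithm.

    ```python
    # Hedged sketch: index weights from the Renyi entropy of the system-state data.
    # Follows the common entropy-weight recipe (w_j proportional to 1 - E_j);
    # the paper's exact construction may differ.
    import numpy as np

    def renyi_entropy_weights(data, alpha=2.0):
        """data: m samples x n indexes, non-negative values."""
        m, n = data.shape
        p = data / data.sum(axis=0, keepdims=True)                    # column-wise proportions
        p = np.clip(p, 1e-12, None)
        h = np.log(np.power(p, alpha).sum(axis=0)) / (1.0 - alpha)    # Renyi entropy per index
        e = h / np.log(m)                                             # normalise to [0, 1]
        w = 1.0 - e
        return w / w.sum()

    data = np.array([[0.8, 0.2, 0.5],
                     [0.7, 0.9, 0.5],
                     [0.9, 0.1, 0.5]])
    print(renyi_entropy_weights(data))   # the near-constant third index gets weight ~0
    ```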

  13. FLCW: Frequent Itemset Based Text Clustering with Window Constraint

    Institute of Scientific and Technical Information of China (English)

    ZHOU Chong; LU Yansheng; ZOU Lei; HU Rong

    2006-01-01

    Most of the existing text clustering algorithms overlook the fact that a document is a word sequence with semantic information, and important semantic information resides in the positions of words in the sequence. In this paper, a novel method named Frequent Itemset-based Clustering with Window (FICW) is proposed, which makes use of this semantic information for text clustering under a window constraint. The experimental results obtained from tests on three (hypertext) text sets show that FICW outperforms the compared method in both clustering accuracy and efficiency.

  14. A Practical Optimisation Method to Improve QOS and GOS-Based Key Performance Indicators in GSM Network Cell Cluster Environment

    Directory of Open Access Journals (Sweden)

    Joseph Isabona

    2014-11-01

    Full Text Available The delivery of both good quality of service (QoS) and grade of service (GoS) in any competitive mobile communication environment is a major factor in reducing the subscriber churn rate. It is therefore important for wireless mobile network operators to ensure stability and efficiency by delivering consistent, reliable and high-quality end-user (subscriber) satisfaction. This can only be achieved by conducting regular network performance monitoring and optimisation, as it directly impacts the quality of the offered services and hence user satisfaction. In this paper, we present the results of network performance evaluation and optimisation of a GSM network on a cell-cluster basis in the Asaba region, South East Nigeria. We employ a combination of essential key performance indicators such as dropped call rate, call setup success rate and outage call rate to examine the overall QoS and GoS performance of the GSM network. Our results after network optimisation showed significant performance improvement in terms of call drop rate, call setup success rate, and call block rate across the three GSM cell clusters. Specifically, the end-user satisfaction rate increased from 94.45%, 87.74% and 92.85% to 99.05%, 95.38% and 99.03% respectively. The GoS was reduced from 3.33%, 6.60% and 2.38% to 0.00%, 3.70% and 0.00% respectively. Furthermore, ESA, which corresponds to end-point service availability, improved from 94.44%, 93.40% and 97.62% to 100%, 96.30% and 100% respectively. In addition, the average throughput improved from 73.74 kbit/s, 85.06 kbit/s and 87.54 kbit/s to 77.07 kbit/s, 92.38 kbit/s and 102 kbit/s respectively across the three GSM cell clusters.

  15. Web Document Clustering Using Cuckoo Search Clustering Algorithm based on Levy Flight

    Directory of Open Access Journals (Sweden)

    Moe Moe Zaw

    2013-09-01

    Full Text Available The World Wide Web serves as a huge, widely distributed global information service center, and the amount of information on the web is increasing day by day. The process of finding relevant information on the web is therefore a major challenge in Information Retrieval. This leads to the need for new techniques that help users to effectively navigate, summarize and organize the overwhelming amount of information. One of the techniques that can play an important role towards this objective is web document clustering. This paper aims to develop a clustering algorithm and apply it to web document clustering. The Cuckoo Search optimization algorithm is a recently developed optimization algorithm based on the obligate brood-parasitic behavior of some cuckoo species, in combination with Levy flights. In this paper, a Cuckoo Search clustering algorithm based on Levy flights is proposed. The algorithm applies Cuckoo Search optimization to the web document clustering area to locate the optimal centroids of the clusters and to find a global solution of the clustering problem. To test the performance of the proposed method, experimental results on a benchmark dataset are reported. The results obtained show that the Cuckoo Search clustering algorithm based on Levy flights performs well in web document clustering.
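
    A rough sketch of cuckoo search with Mantegna-style Levy flights applied to centroid search, using the sum of squared distances to the nearest centroid as fitness; in the web-document setting the rows of X would be document vectors such as tf-idf. The parameter values, nest-update details and fitness choice are assumptions rather than the paper's configuration.

    ```python
    # Hedged sketch: Levy-flight cuckoo search for cluster centroids.
    import numpy as np
    from math import gamma, pi, sin

    def levy_step(shape, beta=1.5, rng=None):
        # Mantegna's algorithm for Levy-distributed step lengths
        sigma_u = (gamma(1 + beta) * sin(pi * beta / 2) /
                   (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
        u = rng.normal(0.0, sigma_u, shape)
        v = rng.normal(0.0, 1.0, shape)
        return u / np.abs(v) ** (1 / beta)

    def sse(X, centroids):
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        return float((d.min(axis=1) ** 2).sum())

    def cuckoo_search_clustering(X, k, n_nests=15, n_iter=200, pa=0.25, alpha=0.01):
        rng = np.random.default_rng(0)
        n, _ = X.shape
        span = X.max(axis=0) - X.min(axis=0)
        nests = X[rng.integers(0, n, size=(n_nests, k))]     # each nest = one centroid set
        fitness = np.array([sse(X, c) for c in nests])
        for _ in range(n_iter):
            for i in range(n_nests):                          # new cuckoos via Levy flights
                cand = nests[i] + alpha * span * levy_step(nests[i].shape, rng=rng)
                j = rng.integers(n_nests)                     # replace a random nest if better
                f = sse(X, cand)
                if f < fitness[j]:
                    nests[j], fitness[j] = cand, f
            n_drop = max(1, int(pa * n_nests))                # abandon the worst nests
            worst = np.argsort(fitness)[-n_drop:]
            nests[worst] = X[rng.integers(0, n, size=(n_drop, k))]
            fitness[worst] = [sse(X, c) for c in nests[worst]]
        best = nests[np.argmin(fitness)]
        labels = np.linalg.norm(X[:, None, :] - best[None, :, :], axis=2).argmin(axis=1)
        return best, labels

    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 0.5, (40, 2)), rng.normal(4, 0.5, (40, 2))])
    centroids, labels = cuckoo_search_clustering(X, k=2)
    ```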

  16. Mapping Cigarettes Similarities using Cluster Analysis Methods

    Directory of Open Access Journals (Sweden)

    Lorentz Jäntschi

    2007-09-01

    Full Text Available The aim of the research was to investigate the relationships and/or occurrences in and between chemical composition information (tar, nicotine, carbon monoxide), market information (brand, manufacturer, price) and public health information (class, health warning), as well as the clustering of a sample of cigarette data. Thirty cigarette brands were analyzed. Six categorical (cigarette brand, manufacturer, health warnings, class) and four continuous (tar, nicotine and carbon monoxide concentrations, and package price) variables were collected for investigation of chemical composition, market information and public health information. Multiple linear regression and two clusterization techniques were applied. The study revealed interesting remarks. The carbon monoxide concentration proved to be linked with tar and nicotine concentration. The applied clusterization methods identified groups of cigarette brands that show similar characteristics. The tar and carbon monoxide concentrations were the main criteria used in clusterization. An analysis of a larger sample could reveal more relevant and useful information regarding the similarities between cigarette brands.

  17. A scanning method for detecting clustering pattern of both attribute and structure in social networks

    Science.gov (United States)

    Wang, Tai-Chi; Phoa, Frederick Kin Hing

    2016-03-01

    Community/cluster structure is one of the most important features in social networks. Many cluster detection methods have been proposed to identify such patterns, but few are able to assess the statistical significance of the clusters by considering the likelihood of the network structure and its attributes. Based on the definition of clustering, we propose a scanning method, originating from the analysis of spatial data, for identifying clusters in social networks. Since the properties of network data are more complicated than those of spatial data, we verify our method's feasibility via simulation studies. The results show that the detection power is affected by cluster sizes and connection probabilities. According to our simulation results, the detection accuracy for structure clusters and for combined structure and attribute clusters detected by our proposed method is better than that of other methods in most of our simulation cases. In addition, we apply our proposed method to empirical data to identify statistically significant clusters.

  18. Comparing the performance of biomedical clustering methods

    DEFF Research Database (Denmark)

    Wiwie, Christian; Baumbach, Jan; Röttger, Richard

    2015-01-01

    Identifying groups of similar objects is a popular first step in biomedical data analysis, but it is error-prone and impossible to perform manually. Many computational methods have been developed to tackle this problem. Here we assessed 13 well-known methods using 24 data sets ranging from gene......-ranging comparison we were able to develop a short guideline for biomedical clustering tasks. ClustEval allows biomedical researchers to pick the appropriate tool for their data type and allows method developers to compare their tool to the state of the art....

  19. Analyzing and Optimizing ANT-Clustering Algorithm by Using Numerical Methods for Efficient Data Mining

    Directory of Open Access Journals (Sweden)

    Md. Asikur Rahman

    2012-10-01

    Full Text Available Clustering analysis is an important function of data mining, and various clustering algorithms have been developed based on the different clustering methods. The ant-clustering algorithm is one such approach, which performs cluster analysis based on "Swarm Intelligence". The existing ant-clustering algorithm uses two user-defined parameters to calculate the picking-up probability and dropping probability that are used to form the clusters. However, the use of user-defined parameters may lead to inaccurate clusters, and it is difficult to anticipate suitable parameter values in advance because of the diversified characteristics of the dataset. In this paper, we analyze the existing ant-clustering algorithm and then propose a numerical analysis method based on linear equations and the characteristics of the dataset that does not need any user-defined parameters to form the clusters. Results of numerical experiments on synthetic datasets demonstrate the effectiveness of the proposed method.
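
    For orientation, a sketch of the classic pick-up and drop probabilities used in ant clustering, where k1 and k2 are exactly the kind of user-defined parameters the paper aims to eliminate; the neighbourhood-similarity function is simplified and the constants are illustrative.

    ```python
    # Hedged sketch: the classic ant-clustering probabilities with user-defined
    # thresholds k1 (pick-up) and k2 (drop) -- the parameters the paper proposes
    # to derive from the data instead. f is the local average similarity of an
    # item to its grid neighbourhood, simplified here.
    import numpy as np

    def local_similarity(item, neighbours, alpha=0.5):
        if len(neighbours) == 0:
            return 0.0
        d = np.linalg.norm(neighbours - item, axis=1)
        return max(0.0, float(np.mean(1.0 - d / alpha)))

    def p_pick(f, k1=0.1):
        return (k1 / (k1 + f)) ** 2        # dissimilar items are picked up more often

    def p_drop(f, k2=0.15):
        return (f / (k2 + f)) ** 2         # items are dropped where neighbours are similar

    item = np.array([0.2, 0.4])
    neigh = np.array([[0.25, 0.38], [0.18, 0.45]])
    f = local_similarity(item, neigh)
    print(p_pick(f), p_drop(f))            # low pick-up, high drop in a similar neighbourhood
    ```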

  1. Recent advances in coupled-cluster methods

    CERN Document Server

    Bartlett, Rodney J

    1997-01-01

    Today, coupled-cluster (CC) theory has emerged as the most accurate, widely applicable approach for the correlation problem in molecules. Furthermore, the correct scaling of the energy and wavefunction with size (i.e. extensivity) recommends it for studies of polymers and crystals as well as molecules. CC methods have also paid dividends for nuclei, and for certain strongly correlated systems of interest in field theory. In order for CC methods to have achieved this distinction, it has been necessary to formulate new theoretical approaches for the treatment of a variety of essential quantities.

  2. Structure based alignment and clustering of proteins (STRALCP)

    Science.gov (United States)

    Zemla, Adam T.; Zhou, Carol E.; Smith, Jason R.; Lam, Marisa W.

    2013-06-18

    Disclosed are computational methods of clustering a set of protein structures based on local and pair-wise global similarity values. Pair-wise local and global similarity values are generated based on pair-wise structural alignments for each protein in the set of protein structures. Initially, the protein structures are clustered based on pair-wise local similarity values. The protein structures are then clustered based on pair-wise global similarity values. For each given cluster both a representative structure and spans of conserved residues are identified. The representative protein structure is used to assign newly-solved protein structures to a group. The spans are used to characterize conservation and assign a "structural footprint" to the cluster.

  3. Eros-based Fuzzy Clustering Method for Longitudinal Data

    Institute of Scientific and Technical Information of China (English)

    李会民; 闫健卓; 方丽英; 王普

    2013-01-01

    Considering the characteristics of longitudinal data sets, such as multiple variables, missing data, unequal series lengths and irregular time intervals, an algorithm based on the Eros distance similarity measure for longitudinal data is proposed, in which the Eros distance is used in fuzzy C-means clustering. First, preprocessing is performed on the unbalanced longitudinal data set, including filling missing values and reducing redundant attributes. Second, the FErosCM clustering method is used for automatic classification, and information entropy is taken into account for assessing the performance of the clustering algorithm. Experiments show that this method is effective and efficient for longitudinal data classification.
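
    A compact sketch of the Eros (extended Frobenius norm) similarity as it is commonly defined: a weighted sum of absolute cosines between corresponding right singular vectors of the two series. Uniform weights are assumed here; in practice the weights are usually derived from the eigenvalue distribution of the whole data set, and this is not the paper's full FErosCM procedure.

    ```python
    # Hedged sketch: Eros similarity between two multivariate time series
    # (weighted sum of |cos| angles between corresponding right singular vectors).
    # Uniform weights are an assumption.
    import numpy as np

    def eros(A, B, w=None):
        """A, B: arrays of shape (time_i, n_vars); lengths may differ."""
        n_vars = A.shape[1]
        w = np.ones(n_vars) / n_vars if w is None else w / w.sum()
        # right singular vectors of each (centred) series
        _, _, Va = np.linalg.svd(A - A.mean(0), full_matrices=False)
        _, _, Vb = np.linalg.svd(B - B.mean(0), full_matrices=False)
        cos = np.abs(np.sum(Va * Vb, axis=1))     # |<a_i, b_i>| for each component
        return float(np.sum(w[:len(cos)] * cos))  # in [0, 1]; 1 - eros can act as a distance

    rng = np.random.default_rng(0)
    A = rng.normal(size=(120, 4))                 # two series with different lengths
    B = rng.normal(size=(80, 4))
    print(eros(A, B))
    ```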

  4. A Cluster Based Approach for Classification of Web Results

    Directory of Open Access Journals (Sweden)

    Apeksha Khabia

    2014-12-01

    Full Text Available Nowadays a significant amount of information on the web is present in the form of text, e.g., reviews, forum postings, blogs, news articles, email messages and web pages. It becomes difficult to classify documents into predefined categories as the number of documents grows. Clustering is the partitioning of data into clusters, so that the data in each cluster share some common trait, often proximity according to some defined measure. The underlying distribution of a data set can to some extent be depicted by the learned clusters, guided by the initial data set. Thus, clusters of documents can be employed to train a classifier by using the defined features of those clusters. An important issue is also to classify text data from the web into different clusters by mining knowledge. Accordingly, this paper presents a review of most of the document clustering techniques and cluster-based classification techniques used so far; pre-processing of text data sets and document clustering methods are also explained in brief.

  5. Web resource recommendation method based on intuitionistic fuzzy clustering

    Institute of Scientific and Technical Information of China (English)

    肖满生; 汪新凡; 周丽娟

    2012-01-01

    In the classification of Web resources, traditional methods based on user interest cannot accurately reflect changes in users' interests and have difficulty distinguishing the quality and style of resource content. To address these problems, a Web resource recommendation method based on intuitionistic fuzzy C-means clustering is proposed. In this method, the Web resources are first expressed as intuitionistic fuzzy numbers according to the user interest degree; the theory of intuitionistic fuzzy information aggregation is then applied to classify the resources; finally, similar or related resources are recommended to the user. Theoretical analysis and experimental results show that this method greatly improves recommendation quality compared with the traditional fuzzy C-means and collaborative filtering methods.

  6. Fuzzy Clustering - Principles, Methods and Examples

    DEFF Research Database (Denmark)

    Kroszynski, Uri; Zhou, Jianjun

    1998-01-01

    One of the most remarkable advances in the field of identification and control of systems (in particular mechanical systems) whose behaviour cannot be described by means of the usual mathematical models has been achieved by the application of methods of fuzzy theory. In the framework of a study about identification of "black-box" properties by analysis of system input/output data sets, we have prepared an introductory note on the principles and the most popular data classification methods used in fuzzy modeling. This introductory note also includes some examples that illustrate the use of the methods. The examples were solved by hand and served as a test bench for exploration of the MATLAB capabilities included in the Fuzzy Control Toolbox. The fuzzy clustering methods described include Fuzzy c-means (FCM), Fuzzy c-lines (FCL) and Fuzzy c-elliptotypes (FCE).
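
    Since the note centres on fuzzy c-means, a minimal textbook-style FCM iteration is sketched below (alternating centre and membership updates with fuzzifier m); it is a generic sketch, not the note's MATLAB examples, and the initialisation and convergence tolerance are assumptions.

    ```python
    # Hedged sketch: standard fuzzy c-means (FCM) with fuzzifier m.
    # u[i, k] is the membership of point i in cluster k; centres are
    # membership-weighted means, memberships follow the inverse-distance rule.
    import numpy as np

    def fcm(X, c=2, m=2.0, n_iter=100, tol=1e-5, seed=0):
        rng = np.random.default_rng(seed)
        n = X.shape[0]
        u = rng.dirichlet(np.ones(c), size=n)          # random fuzzy partition, rows sum to 1
        centres = None
        for _ in range(n_iter):
            um = u ** m
            centres = (um.T @ X) / um.sum(axis=0)[:, None]       # membership-weighted means
            d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2) + 1e-12
            ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
            u_new = 1.0 / ratio.sum(axis=2)            # u_ik = 1 / sum_j (d_ik/d_ij)^(2/(m-1))
            if np.abs(u_new - u).max() < tol:
                u = u_new
                break
            u = u_new
        return centres, u

    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(2, 0.3, (50, 2))])
    centres, u = fcm(X, c=2)
    hard = u.argmax(axis=1)                            # defuzzified assignment
    ```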

  7. Clustering-Based PU Active Text Classification Method

    Institute of Scientific and Technical Information of China (English)

    刘露; 彭涛; 左万利; 戴耀康

    2013-01-01

    Text classification is a key technology in information retrieval. Collecting more reliable negative examples and building effective and efficient classifiers are two important problems in PU (positive and unlabeled) text classification. However, the existing methods mostly collect a small number of reliable negative examples, keeping the classifiers from reaching high accuracy. In this paper, a clustering-based method for PU text classification enhanced by SVM active learning is proposed. In contrast to traditional methods, this approach is based on a clustering technique which employs the characteristic that positive and negative examples should share as few words as possible: it finds more reliable negative examples by removing as many probable positive examples from the unlabeled set as possible. In the process of building the classifier, SVM active learning is combined with an improved Rocchio classifier, and an improved term weighting scheme, TFIPNDF (a modified TFIDF, term frequency inverse document frequency), is adopted for feature extraction, which significantly improves classification accuracy. Experiments on three different data sets (RCV1, Reuters-21578 and 20 Newsgroups) show that finding reliable negative examples based on clustering yields more reliable negatives while keeping the error rate low, and that introducing active learning further improves classification accuracy.
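
    A rough sketch of the reliable-negative idea, under assumptions: the unlabeled documents are clustered and the clusters whose vocabulary overlaps least with the positive documents are kept as reliable negatives. The overlap score, the keep ratio and all names are illustrative, not the paper's algorithm.

    ```python
    # Hedged sketch: extracting "reliable negative" documents from an unlabeled
    # set by clustering it and keeping the clusters whose vocabulary overlaps
    # least with the positive documents. Thresholds and scores are assumptions.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    def reliable_negatives(pos_docs, unlabeled_docs, n_clusters=2, keep_ratio=0.5):
        vec = TfidfVectorizer()
        X = vec.fit_transform(pos_docs + unlabeled_docs)
        pos_profile = np.asarray(X[:len(pos_docs)].mean(axis=0)).ravel()
        U = X[len(pos_docs):]
        labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(U)
        # score each unlabeled cluster by how much vocabulary it shares with the positives
        overlap = [float(np.asarray(U[labels == c].mean(axis=0)).ravel() @ pos_profile)
                   for c in range(n_clusters)]
        keep = set(np.argsort(overlap)[: max(1, int(keep_ratio * n_clusters))])
        return [d for d, l in zip(unlabeled_docs, labels) if l in keep]

    pos = ["solar power plant output", "wind energy turbine farm"]
    unl = ["renewable solar energy grid", "wind farm capacity rises",
           "the team won the football match", "coach praises football players"]
    print(reliable_negatives(pos, unl))   # football documents are kept as likely negatives
    ```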

  8. Document Clustering based on Topic Maps

    CERN Document Server

    Rafi, Muhammad; Farooq, Amir; 10.5120/1640-2204

    2011-01-01

    Importance of document clustering is now widely acknowledged by researchers for better management, smart navigation, efficient filtering, and concise summarization of large collections of documents like the World Wide Web (WWW). The next challenge lies in clustering documents semantically, based on their semantic contents. The problem of document clustering has two main components: (1) representing the document in a form that inherently captures the semantics of the text, which may also help to reduce the dimensionality of the document; and (2) defining a similarity measure based on the semantic representation such that it assigns higher numerical values to document pairs which have a higher semantic relationship. The feature space of documents can be very challenging for document clustering. A document may contain multiple topics, a large set of class-independent general words, and a handful of class-specific core words. With these features in mind, traditional agglomerative clustering algori...

  9. Summarization and Matching of Density-Based Clusters in Streaming Environments

    CERN Document Server

    Yang, Di; Ward, Matthew O

    2011-01-01

    Density-based cluster mining is known to serve a broad range of applications ranging from stock trade analysis to moving object monitoring. Although methods for efficient extraction of density-based clusters have been studied in the literature, the problem of summarizing and matching such clusters with arbitrary shapes and complex cluster structures remains unsolved. Therefore, the goal of our work is to extend the state of the art of density-based cluster mining in streams from cluster extraction only to now also support analysis and management of the extracted clusters. Our work solves three major technical challenges. First, we propose a novel multi-resolution cluster summarization method, called Skeletal Grid Summarization (SGS), which captures the key features of density-based clusters, covering both their external shape and internal cluster structures. Second, in order to summarize the extracted clusters in real-time, we present an integrated computation strategy C-SGS, which piggybacks the generation of...

  10. Assessment of climate change impacts on watershed in cold-arid region: an integrated multi-GCM-based stochastic weather generator and stepwise cluster analysis method

    Science.gov (United States)

    Zhuang, X. W.; Li, Y. P.; Huang, G. H.; Liu, J.

    2016-07-01

    An integrated multi-GCM-based stochastic weather generator and stepwise cluster analysis (MGCM-SWG-SCA) method is developed, through incorporating multiple global climate models (MGCM), a stochastic weather generator (SWG), and a stepwise-clustered hydrological model (SCHM) within a general framework. MGCM-SWG-SCA can investigate uncertainties of projected climate changes as well as create watershed-scale climate projections from large-scale variables. It can also assess climate change impacts on hydrological processes and capture the nonlinear relationship between input variables and outputs in watershed systems. MGCM-SWG-SCA is then applied to the Kaidu watershed, with cold-arid characteristics, in the Xinjiang Uyghur Autonomous Region of northwest China to demonstrate its efficiency. Results reveal that the variability of streamflow is mainly affected by (1) temperature change during spring, (2) precipitation change during winter, and (3) both temperature and precipitation changes in summer and autumn. Results also disclose that: (1) the projected minimum and maximum temperatures and precipitation from MGCM change with seasons in different ways; (2) various climate change projections can reproduce the seasonal variability of watershed-scale climate series; (3) SCHM can simulate daily streamflow to a satisfactory degree, and a significant increasing trend of streamflow is indicated between the validation (2006-2011) and future (2015-2035) periods; (4) the streamflow can vary under different climate change projections. These findings can be explained by the fact that, for the Kaidu watershed located in the cold-arid region, glacier melt is mainly related to temperature changes, while precipitation changes can directly cause variability in streamflow.

  11. SOFT CLUSTERING BASED EXPOSITION TO MULTIPLE DICTIONARY BAG OF WORDS

    Directory of Open Access Journals (Sweden)

    K. S. Sujatha

    2012-01-01

    Full Text Available Object classification is a highly important area of computer vision and has many applications including robotics, searching images, face recognition, aiding visually impaired people, censoring images and many more. A now-common method of classification that uses features is the Bag of Words approach. In this method a codebook of visual words is created using various clustering methods. To increase performance, the Multiple Dictionary BoW (MDBoW) method, which uses more visual words from different independent dictionaries instead of adding more words to the same dictionary, was implemented using a hard clustering method. Nearest-neighbor assignments are used in hard clustering of features. A given feature may be nearly the same distance from two cluster centers; for a typical hard clustering method, only the slightly nearer neighbor is selected to represent that feature. Thus, ambiguous features are not well represented by the visual vocabulary. To address this problem, a soft-clustering-based Multiple Dictionary Bag of Visual Words model for image classification is implemented, with the dictionary generated using a modified fuzzy C-means algorithm based on the R1 norm. A performance evaluation on images has been done by varying the dictionary size. The proposed method works better when the number of topics and the number of images per topic are larger. The results obtained indicate that the multiple dictionary bag of words model using fuzzy clustering increases recognition performance over the baseline method.

  12. An Efficient Fuzzy Clustering-Based Approach for Intrusion Detection

    CERN Document Server

    Nguyen, Huu Hoa; Darmont, Jérôme

    2011-01-01

    The need to increase accuracy in detecting sophisticated cyber attacks poses a great challenge not only to the research community but also to corporations. So far, many approaches have been proposed to cope with this threat. Among them, data mining has brought on remarkable contributions to the intrusion detection problem. However, the generalization ability of data mining-based methods remains limited, and hence detecting sophisticated attacks remains a tough task. In this thread, we present a novel method based on both clustering and classification for developing an efficient intrusion detection system (IDS). The key idea is to take useful information exploited from fuzzy clustering into account for the process of building an IDS. To this aim, we first present cornerstones to construct additional cluster features for a training set. Then, we come up with an algorithm to generate an IDS based on such cluster features and the original input features. Finally, we experimentally prove that our method outperform...
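
    A small sketch of the general idea of augmenting the training features with cluster-derived information before fitting a classifier; soft memberships computed from k-means centroid distances stand in for the paper's fuzzy-clustering features, and the data, classifier and names are illustrative assumptions.

    ```python
    # Hedged sketch: append cluster-derived features to the original features and
    # train a classifier on the augmented matrix. Distances to k-means centroids
    # stand in for the paper's fuzzy-clustering information.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.normal(size=(600, 10))                        # stand-in for connection records
    y = (X[:, 0] + X[:, 1] > 0).astype(int)               # stand-in for attack / normal labels

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X_tr)

    def augment(A):
        d = km.transform(A)                               # distance to each centroid
        soft = np.exp(-d) / np.exp(-d).sum(axis=1, keepdims=True)   # soft memberships
        return np.hstack([A, soft])

    clf = RandomForestClassifier(random_state=0).fit(augment(X_tr), y_tr)
    print(clf.score(augment(X_te), y_te))
    ```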

  13. Select and Cluster: A Method for Finding Functional Networks of Clustered Voxels in fMRI

    Science.gov (United States)

    DonGiovanni, Danilo

    2016-01-01

    Extracting functional connectivity patterns among cortical regions in fMRI datasets is a challenge stimulating the development of effective data-driven or model based techniques. Here, we present a novel data-driven method for the extraction of significantly connected functional ROIs directly from the preprocessed fMRI data without relying on a priori knowledge of the expected activations. This method finds spatially compact groups of voxels which show a homogeneous pattern of significant connectivity with other regions in the brain. The method, called Select and Cluster (S&C), consists of two steps: first, a dimensionality reduction step based on a blind multiresolution pairwise correlation by which the subset of all cortical voxels with significant mutual correlation is selected and the second step in which the selected voxels are grouped into spatially compact and functionally homogeneous ROIs by means of a Support Vector Clustering (SVC) algorithm. The S&C method is described in detail. Its performance assessed on simulated and experimental fMRI data is compared to other methods commonly used in functional connectivity analyses, such as Independent Component Analysis (ICA) or clustering. S&C method simplifies the extraction of functional networks in fMRI by identifying automatically spatially compact groups of voxels (ROIs) involved in whole brain scale activation networks.

  14. Criteria of off-diagonal long-range order in Bose and Fermi systems based on the Lee-Yang cluster expansion method

    OpenAIRE

    Sakumichi, Naoyuki; Kawakami, Norio; Ueda, Masahito

    2011-01-01

    The quantum-statistical cluster expansion method of Lee and Yang is extended to investigate off-diagonal long-range order (ODLRO) in one- and multi-component mixtures of bosons or fermions. Our formulation is applicable to both a uniform system and a trapped system without local-density approximation and allows systematic expansions of one- and multi-particle reduced density matrices in terms of cluster functions which are defined for the same system with Boltzmann statistics. Each term in th...

  15. Bases for cluster algebras from surfaces

    CERN Document Server

    Musiker, Gregg; Williams, Lauren

    2011-01-01

    We construct two bases for each cluster algebra coming from a triangulated surface without punctures. We work in the context of a coefficient system coming from a full-rank exchange matrix, for example, principal coefficients.

  16. MANNER OF STOCKS SORTING USING CLUSTER ANALYSIS METHODS

    Directory of Open Access Journals (Sweden)

    Jana Halčinová

    2014-06-01

    Full Text Available The aim of the present article is to show the possibility of using the methods of cluster analysis in the classification of stocks of finished products. Cluster analysis creates groups (clusters) of finished products according to similarity in demand, i.e. customer requirements for each product. The manner of sorting stocks of finished products by clusters is described through a practical example. The resulting clusters are incorporated into the draft layout of the distribution warehouse.

  17. Stigmergy based behavioural coordination for satellite clusters

    Science.gov (United States)

    Tripp, Howard; Palmer, Phil

    2010-04-01

    Multi-platform swarm/cluster missions are an attractive prospect for improved science return as they provide a natural capability for temporal, spatial and signal separation with further engineering and economic advantages. As spacecraft numbers increase and/or the round-trip communications delay from Earth lengthens, the traditional "remote-control" approach begins to break down. It is therefore essential to push control into space; to make spacecraft more autonomous. An autonomous group of spacecraft requires coordination, but standard terrestrial paradigms such as negotiation require high levels of inter-spacecraft communication, which is nontrivial in space. This article therefore introduces the principles of stigmergy as a novel method for coordinating a cluster. Stigmergy is an agent-based, behavioural approach that allows for infrequent communication with decisions based on local information. Behaviours are selected dynamically using a genetic algorithm onboard. Supervisors/ground stations occasionally adjust parameters and disseminate a "common environment" that is used for local decisions. After outlining the system, an analysis of some crucial parameters such as communications overhead and number of spacecraft is presented to demonstrate scalability. Further scenarios are considered to demonstrate the natural ability to deal with dynamic situations such as the failure of spacecraft, changing mission objectives and responding to sudden bursts of high priority tasks.

  18. Recursive Clustering Based Method for Message Structure Extraction

    Institute of Scientific and Technical Information of China (English)

    潘瑶; 洪征; 杜有翔; 吴礼发

    2012-01-01

    Messages of complex application-layer protocols usually have long byte sequences and many structure types, which poses serious challenges to protocol reverse analysis. A recursive clustering based method for message structure extraction is proposed. Firstly, the method recursively clusters the messages at the basic-block level through progressive multiple sequence alignment, which separates messages of different formats while reducing the scale of the sequence alignment. Then, on the basis of the aligned messages, it identifies field boundaries according to the rate of change of the aligned byte values. Moreover, a recursive backtracking policy for protocol structure analysis is applied to extract the hierarchical relations between fields by identifying format distinguisher fields. Experiments on several public protocols show that the proposed method can derive message formats in BNF form and improve the accuracy of field identification with less time overhead, and thus has high practical value.
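
    To illustrate the boundary-identification step, a toy sketch that inspects aligned messages column by column and marks a field boundary wherever the per-position byte variability changes sharply; the multiple sequence alignment itself is omitted (equal-length messages stand in for it) and the threshold is an assumption.

    ```python
    # Hedged sketch: after messages are aligned, estimate field boundaries from
    # the change rate of byte values at each position. Real use would first align
    # variable-length messages; equal-length toy messages stand in for that here.
    import numpy as np

    msgs = [b"GET /a HTTP/1.0", b"GET /b HTTP/1.1", b"GET /c HTTP/1.0"]
    cols = np.array([list(m) for m in msgs])            # messages x byte positions

    # change rate = fraction of distinct values observed at each aligned position
    change = np.array([len(set(col)) - 1 for col in cols.T]) / (len(msgs) - 1)

    boundaries = [0]
    for pos in range(1, cols.shape[1]):
        if abs(change[pos] - change[pos - 1]) > 0.5:    # sharp change -> field boundary
            boundaries.append(pos)

    fields = [bytes(cols[0, s:e]) for s, e in zip(boundaries, boundaries[1:] + [cols.shape[1]])]
    print(boundaries, fields)   # constant "GET /" region split from the variable path byte
    ```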

  19. Clustering Algorithm for Unsupervised Monaural Musical Sound Separation Based on Non-negative Matrix Factorization

    Science.gov (United States)

    Park, Sang Ha; Lee, Seokjin; Sung, Koeng-Mo

    Non-negative matrix factorization (NMF) is widely used for monaural musical sound source separation because of its efficiency and good performance. However, an additional clustering process is required because the musical sound mixture is separated into more signals than the number of musical tracks during NMF separation. In the conventional method, manual clustering or training-based clustering is performed with an additional learning process. Recently, a clustering algorithm based on the mel-frequency cepstrum coefficient (MFCC) was proposed for unsupervised clustering. However, MFCC clustering supplies limited information for clustering. In this paper, we propose various timbre features for unsupervised clustering and a clustering algorithm with these features. Simulation experiments are carried out using various musical sound mixtures. The results indicate that the proposed method improves clustering performance, as compared to conventional MFCC-based clustering.
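
    A bare-bones sketch of the overall pipeline: NMF splits a magnitude spectrogram into components, a simple spectral-shape descriptor is computed for each basis vector, and the components are grouped into sources by clustering. The descriptors used here (spectral centroid and bandwidth) are illustrative stand-ins for the paper's timbre features.

    ```python
    # Hedged sketch: NMF separation followed by unsupervised clustering of the
    # components into sources, using simple spectral-shape features of each basis
    # vector as stand-ins for the paper's timbre features.
    import numpy as np
    from sklearn.decomposition import NMF
    from sklearn.cluster import KMeans

    def cluster_nmf_components(spectrogram, n_components=8, n_sources=2):
        """spectrogram: non-negative magnitude spectrogram, shape (freq_bins, frames)."""
        model = NMF(n_components=n_components, init="nndsvda", max_iter=500, random_state=0)
        W = model.fit_transform(spectrogram)          # spectral bases (freq_bins x comps)
        H = model.components_                         # activations   (comps x frames)

        freqs = np.arange(spectrogram.shape[0])
        feats = []
        for k in range(n_components):
            w = W[:, k] / (W[:, k].sum() + 1e-12)
            centroid = (freqs * w).sum()
            bandwidth = np.sqrt(((freqs - centroid) ** 2 * w).sum())
            feats.append([centroid, bandwidth])
        labels = KMeans(n_clusters=n_sources, n_init=10, random_state=0).fit_predict(feats)

        # reconstruct one magnitude spectrogram per source from its components
        return [W[:, labels == s] @ H[labels == s] for s in range(n_sources)], labels

    S = np.abs(np.random.rand(257, 200))              # stand-in for a mixture spectrogram
    sources, labels = cluster_nmf_components(S)
    ```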

  20. Method for determining experts' weights based on entropy and cluster analysis

    Institute of Scientific and Technical Information of China (English)

    周漩; 张凤鸣; 惠晓滨; 李克武

    2011-01-01

    Existing methods for determining experts' weights in group decision-making take into account the consistency of the experts' collating vectors, but lack a measure of their information similarity. As a result, an expert whose collating vector is close to the group consensus but carries great information uncertainty may be assigned the same weight as the other experts. For this reason, a method for determining experts' weights based on entropy and cluster analysis is proposed, in which the collating vectors of all experts are classified using an information similarity coefficient, and the experts' weights are determined according to the classification result and the entropy of the collating vectors. Finally, a numerical example shows that the method is effective and feasible.

  1. Study on Pests Forecasting Using the Method of Neural Network Based on Fuzzy Clustering

    Institute of Scientific and Technical Information of China (English)

    韦艳玲

    2009-01-01

    Aiming at the characteristics of pest forecasting, such as fuzziness, correlation, nonlinearity and real-time requirements, as well as the decline in the generalization capacity of neural networks when predicting with few observations, a pest forecasting method using a neural network based on fuzzy clustering is proposed in this work. The simulation results demonstrate that the method is simple and practical and can forecast pests quickly and accurately; in particular, it obtains good results with few samples and with correlated samples.

  2. Web User Clustering Method Based on Multiple Evaluating Factors

    Institute of Scientific and Technical Information of China (English)

    吴金桥; 曹奇英; 柯夏燕; 庄怡雯

    2011-01-01

    The paper introduces the Web log mining pre-processing procedure, which includes data cleaning, website topology identification, user identification, session identification, page filtering and path completion. With respect to logs without referrer records, a path completion algorithm based on the website topology is put forward and implemented. A user similarity computation method that combines multiple evaluating factors is then discussed and applied to Web user clustering. The Davies-Bouldin index is used to evaluate the effectiveness of the clustering, and experimental results are given.

  3. The Local Maximum Clustering Method and Its Application in Microarray Gene Expression Data Analysis

    Directory of Open Access Journals (Sweden)

    Chen Yidong

    2004-01-01

    Full Text Available An unsupervised data clustering method, called the local maximum clustering (LMC) method, is proposed for identifying clusters in experimental data sets based on research interest. A magnitude property is defined according to research purposes, and data sets are clustered around each local maximum of the magnitude property. By properly defining a magnitude property, this method can overcome many difficulties in microarray data clustering, such as reduced projection in similarities, noise, and arbitrary gene distribution. To critically evaluate the performance of this clustering method in comparison with other methods, we designed three model data sets with known cluster distributions and applied the LMC method as well as the hierarchical clustering method, the k-means clustering method, and the self-organized map method to these model data sets. The results show that the LMC method produces the most accurate clustering results. As an example of application, we applied the method to cluster the leukemia samples reported in the microarray study of Golub et al. (1999).
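
    A toy sketch of the local-maximum idea: a magnitude is defined for every point (here a kernel-density-like score, an assumed choice), each point steps to the highest-magnitude point in its neighbourhood until it reaches a local maximum, and points arriving at the same maximum form one cluster.

    ```python
    # Hedged sketch: local maximum clustering. Each point follows the
    # highest-magnitude point in its neighbourhood until it reaches a local
    # maximum; points arriving at the same maximum form one cluster. The
    # magnitude here is a simple KDE-like score, an assumed stand-in for an
    # application-specific magnitude property.
    import numpy as np
    from scipy.spatial import cKDTree

    def local_max_clusters(X, k_neighbours=8, bandwidth=0.5):
        tree = cKDTree(X)
        d, idx = tree.query(X, k=k_neighbours)
        magnitude = np.exp(-(d / bandwidth) ** 2).sum(axis=1)

        # each point steps to the highest-magnitude point in its neighbourhood
        step = np.array([nbrs[np.argmax(magnitude[nbrs])] for nbrs in idx])
        roots = np.arange(len(X))
        for _ in range(len(X)):                               # iterate until fixed points
            new_roots = step[roots]
            if np.array_equal(new_roots, roots):
                break
            roots = new_roots
        _, labels = np.unique(roots, return_inverse=True)     # relabel local maxima 0..k-1
        return labels

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 0.3, (60, 2)), rng.normal(3, 0.3, (60, 2))])
    print(np.bincount(local_max_clusters(X)))   # typically recovers the two generated groups
    ```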

  4. Market Segmentation Using Bayesian Model Based Clustering

    OpenAIRE

    Van Hattum, P.

    2009-01-01

    This dissertation deals with two basic problems in marketing: market segmentation, which is the grouping of persons who share common aspects, and market targeting, which is focusing your marketing efforts on one or more attractive market segments. For the grouping of persons who share common aspects, a Bayesian model based clustering approach is proposed such that it can be applied to data sets that are specifically used for market segmentation. The cluster algorithm can handle very l...

  5. Clustering Methods Application for Customer Segmentation to Manage Advertisement Campaign

    OpenAIRE

    Maciej Kutera; Mirosława Lasek

    2010-01-01

    Clustering methods have recently become such well-elaborated algorithms for analyzing large data collections that they are now counted among data mining methods. They form a larger and larger group of techniques, evolving quickly and finding more and more varied applications. In the article, our research concerning the usefulness of clustering methods in customer segmentation to manage advertisement campaigns is presented. We introduce results obtained by using four sel...

  6. A Clustering Ensemble approach based on the similarities in 2-mode social networks

    Institute of Scientific and Technical Information of China (English)

    SU Bao-ping; ZHANG Meng-jie

    2014-01-01

    For a particular clustering problem, selecting the best clustering method is challenging. Research suggests that integrating multiple clusterings can greatly improve the accuracy of a clustering ensemble. A new clustering ensemble approach based on the similarities in 2-mode networks is proposed in this paper. First of all, the data objects and the initial clusters are transformed into a 2-mode network; then the similarities in the 2-mode network are used to calculate the similarity between different clusters iteratively so as to refine the adjacency matrix, and the K-means algorithm is finally applied to obtain the final clustering results. The method effectively uses the similarity between different clusters, and an example shows its feasibility.

  7. Analysis of Massive Emigration from Poland: The Model-Based Clustering Approach

    Science.gov (United States)

    Witek, Ewa

    The model-based approach assumes that data is generated by a finite mixture of probability distributions such as multivariate normal distributions. In finite mixture models, each component of probability distribution corresponds to a cluster. The problem of determining the number of clusters and choosing an appropriate clustering method becomes the problem of statistical model choice. Hence, the model-based approach provides a key advantage over heuristic clustering algorithms, because it selects both the correct model and the number of clusters.

  8. Unbiased methods for removing systematics from galaxy clustering measurements

    CERN Document Server

    Elsner, Franz; Peiris, Hiranya V

    2015-01-01

    Measuring the angular clustering of galaxies as a function of redshift is a powerful method for extracting information from the three-dimensional galaxy distribution. The precision of such measurements will dramatically increase with ongoing and future wide-field galaxy surveys. However, these are also increasingly sensitive to observational and astrophysical contaminants. Here, we study the statistical properties of three methods proposed for controlling such systematics - template subtraction, basic mode projection, and extended mode projection - all of which make use of externally supplied template maps, designed to characterise and capture the spatial variations of potential systematic effects. Based on a detailed mathematical analysis, and in agreement with simulations, we find that the template subtraction method in its original formulation returns biased estimates of the galaxy angular clustering. We derive closed-form expressions that should be used to correct results for this shortcoming. Turning to th...

  9. Evaluation of Pipe Network Water Quality Based on the Grey Clustering Method

    Institute of Scientific and Technical Information of China (English)

    李明

    2011-01-01

    The water quality of a pipe network can be viewed as a grey system, so the grey clustering approach can be applied to evaluate it. The grey clustering method overcomes the disadvantage of traditional methods that evaluate multi-factor, multi-index problems with a single value. The Guangzhou pipe network is taken as an example to assess pipe network water quality. The results show that the grey clustering method can use a relatively small number of samples to assess the grade of pipe network water quality, which makes it convenient to obtain information on the water quality status at each testing point.

  10. Risk assessment of water pollution sources based on an integrated k-means clustering and set pair analysis method in the region of Shiyan, China.

    Science.gov (United States)

    Li, Chunhui; Sun, Lian; Jia, Junxiang; Cai, Yanpeng; Wang, Xuan

    2016-07-01

    Source water areas are facing many potential water pollution risks. Risk assessment is an effective method to evaluate such risks. In this paper an integrated model based on k-means clustering analysis and set pair analysis was established aiming at evaluating the risks associated with water pollution in source water areas, in which the weights of indicators were determined through the entropy weight method. Then the proposed model was applied to assess water pollution risks in the region of Shiyan, in which China's key source water area, the Danjiangkou Reservoir, the water source of the middle route of the South-to-North Water Diversion Project, is located. The results showed that eleven sources with relatively high risk values were identified. At the regional scale, Shiyan City and Danjiangkou City would have a high risk value in terms of industrial discharge. Comparatively, Danjiangkou City and Yunxian County would have a high risk value in terms of agricultural pollution. Overall, the risk values of the northern regions close to the main stream and reservoir of the region of Shiyan were higher than those in the south. The results of the risk levels indicated that five sources were in the lower risk level (i.e., level II), two in the moderate risk level (i.e., level III), one in the higher risk level (i.e., level IV) and three in the highest risk level (i.e., level V). Also, risks of industrial discharge are higher than those of the agricultural sector. It is thus essential to manage the pillar industry of the region of Shiyan and certain agricultural companies in the vicinity of the reservoir to reduce water pollution risks of source water areas. PMID:27016678

  11. Seniority-based coupled cluster theory

    CERN Document Server

    Henderson, Thomas M; Stein, Tamar; Scuseria, Gustavo E

    2014-01-01

    Doubly occupied configuration interaction (DOCI) with optimized orbitals often accurately describes strong correlations while working in a Hilbert space much smaller than that needed for full configuration interaction. However, the scaling of such calculations remains combinatorial with system size. Pair coupled cluster doubles (pCCD) is very successful in reproducing DOCI energetically, but can do so with low polynomial scaling ($N^3$, disregarding the two-electron integral transformation from atomic to molecular orbitals). We show here several examples illustrating the success of pCCD in reproducing both the DOCI energy and wave function, and show how this success frequently comes about. What DOCI and pCCD lack are an effective treatment of dynamic correlations, which we here add by including higher-seniority cluster amplitudes which are excluded from pCCD. This frozen pair coupled cluster approach is comparable in cost to traditional closed-shell coupled cluster methods with results that are competitive fo...

  12. Model-based clustering of array CGH data

    OpenAIRE

    Shah, Sohrab P.; Cheung, K-John; Johnson, Nathalie A.; Alain, Guillaume; Gascoyne, Randy D.; Horsman, Douglas E.; Ng, Raymond T.; Murphy, Kevin P.

    2009-01-01

    Motivation: Analysis of array comparative genomic hybridization (aCGH) data for recurrent DNA copy number alterations from a cohort of patients can yield distinct sets of molecular signatures or profiles. This can be due to the presence of heterogeneous cancer subtypes within a supposedly homogeneous population. Results: We propose a novel statistical method for automatically detecting such subtypes or clusters. Our approach is model based: each cluster is defined in terms of a sparse profile...

  13. Cluster-based control of nonlinear dynamics

    CERN Document Server

    Kaiser, Eurika; Spohn, Andreas; Cattafesta, Louis N; Morzynski, Marek

    2016-01-01

    The ability to manipulate and control fluid flows is of great importance in many scientific and engineering applications. Here, a cluster-based control framework is proposed to determine optimal control laws with respect to a cost function for unsteady flows. The proposed methodology frames high-dimensional, nonlinear dynamics into low-dimensional, probabilistic, linear dynamics which considerably simplifies the optimal control problem while preserving nonlinear actuation mechanisms. The data-driven approach builds upon a state space discretization using a clustering algorithm which groups kinematically similar flow states into a low number of clusters. The temporal evolution of the probability distribution on this set of clusters is then described by a Markov model. The Markov model can be used as predictor for the ergodic probability distribution for a particular control law. This probability distribution approximates the long-term behavior of the original system on which basis the optimal control law is de...
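
    A small sketch of the cluster-plus-Markov-model backbone described above: snapshots are grouped with k-means and a transition matrix between clusters is estimated from consecutive labels; the actuation and optimal-control layer of the framework is omitted, and the toy data and names are assumptions.

    ```python
    # Hedged sketch: cluster-based reduced-order model of a time series of states.
    # k-means groups the snapshots; a Markov transition matrix between clusters is
    # estimated by counting transitions of consecutive snapshots.
    import numpy as np
    from sklearn.cluster import KMeans

    def cluster_markov_model(snapshots, n_clusters=5):
        """snapshots: array (time, state_dim), rows equally spaced in time."""
        labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(snapshots)
        P = np.zeros((n_clusters, n_clusters))
        for a, b in zip(labels[:-1], labels[1:]):
            P[a, b] += 1.0
        P /= np.maximum(P.sum(axis=1, keepdims=True), 1.0)       # row-stochastic transitions
        # ergodic (stationary) distribution from the leading left eigenvector
        vals, vecs = np.linalg.eig(P.T)
        pi = np.real(vecs[:, np.argmax(np.real(vals))])
        return labels, P, pi / pi.sum()

    t = np.linspace(0, 20 * np.pi, 2000)
    snapshots = np.column_stack([np.sin(t), np.cos(t)])           # stand-in for flow states
    labels, P, pi = cluster_markov_model(snapshots)
    print(np.round(P, 2))                                         # nearly cyclic transition structure
    ```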

  14. Query Expansion Based on Clustered Results

    CERN Document Server

    Liu, Ziyang; Chen, Yi

    2011-01-01

    Query expansion is a functionality of search engines that suggests a set of related queries for a user-issued keyword query. Typical corpus-driven keyword query expansion approaches return popular words in the results as expanded queries. Using these approaches, the expanded queries may correspond to a subset of possible query semantics, and thus miss relevant results. To handle ambiguous queries and exploratory queries, whose result relevance is difficult to judge, we propose a new framework for keyword query expansion: we start with clustering the results according to user specified granularity, and then generate expanded queries, such that one expanded query is generated for each cluster whose result set should ideally be the corresponding cluster. We formalize this problem and show its APX-hardness. Then we propose two efficient algorithms named iterative single-keyword refinement and partial elimination based convergence, respectively, which effectively generate a set of expanded queries from clustered r...

  15. Logistics Enterprise Evaluation Model Based On Fuzzy Clustering Analysis

    Science.gov (United States)

    Fu, Pei-hua; Yin, Hong-bo

    In this thesis, we introduce an evaluation model for logistics enterprises based on a fuzzy clustering algorithm. First of all, we present the evaluation index system, which contains basic information, management level, technical strength, transport capacity, informatization level, market competition and customer service. We decide the index weights according to the grades, and evaluate the integrated ability of the logistics enterprises using the fuzzy cluster analysis method. We describe the system evaluation module and the cluster analysis module in detail and explain how these two modules were implemented. At last, we give the results of the system.

  16. New intrusion detection method based on fuzzy kernel clustering algorithm

    Institute of Scientific and Technical Information of China (English)

    刘永芬; 陈志安

    2012-01-01

    To address the high cost of labeling data manually and the dimensionality effect suffered by traditional clustering methods, this paper proposes a new fuzzy kernel clustering method for unlabeled data. The K-means and DBSCAN algorithms are combined to generate an association matrix, a threshold on the constraint term is set to obtain an initial clustering, and the final result is achieved using fuzzy support vector domain description. Comparative experiments on network connection data verify the feasibility and effectiveness of the method.

  17. LCoMotion – Learning, Cognition and Motion; a multicomponent cluster randomized school-based intervention aimed at increasing learning and cognition - rationale, design and methods

    OpenAIRE

    Bugge, Anna; Tarp, Jakob; Østergaard, Lars; Domazet, Sidsel Louise; Andersen, Lars Bo; Froberg, Karsten

    2014-01-01

    Background The aim of the study; LCoMotion – Learning, Cognition and Motion was to develop, document, and evaluate a multi-component physical activity (PA) intervention in public schools in Denmark. The primary outcome was cognitive function. Secondary outcomes were academic skills, body composition, aerobic fitness and PA. The primary aim of the present paper was to describe the rationale, design and methods of the LCoMotion study. Methods/Design LCoMotion was designed as a cluster-randomize...

  18. Coupled Cluster Evaluation of the Stability of Atmospheric Acid-Base Clusters with up to 10 Molecules.

    Science.gov (United States)

    Myllys, Nanna; Elm, Jonas; Halonen, Roope; Kurtén, Theo; Vehkamäki, Hanna

    2016-02-01

    We investigate the utilization of the domain local pair natural orbital coupled cluster (DLPNO-CCSD(T)) method for calculating binding energies of atmospheric molecular clusters. Applied to small complexes of atmospheric relevance, we find that the DLPNO method significantly reduces the scatter in the binding energy, which is commonly present in DFT calculations. For medium-sized clusters consisting of sulfuric acid and bases, the DLPNO method yields a systematic underestimation of the binding energy compared to canonical coupled cluster results. The errors in the DFT binding energies appear to be more random, while the systematic nature of the DLPNO results allows the establishment of a scaling factor to better mimic the canonical coupled cluster calculations. Based on the trends identified for the small and medium-sized systems, we further extend the application of the DLPNO method to large acid-base clusters consisting of up to 10 molecules, which have previously been out of reach with accurate coupled cluster methods. Using the Atmospheric Cluster Dynamics Code (ACDC), we compare the sulfuric acid dimer formation based on the new DLPNO binding energies with previously published RI-CC2/aug-cc-pV(T+d)Z results. We also compare the simulated sulfuric acid dimer concentration as a function of the base concentration with measurement data from the CLOUD chamber and flow tube experiments. The DLPNO method, even after scaling, underpredicts the dimer concentration significantly. Reasons for this are discussed. PMID:26771121

  19. Ontology-based topic clustering for online discussion data

    Science.gov (United States)

    Wang, Yongheng; Cao, Kening; Zhang, Xiaoming

    2013-03-01

    With the rapid development of online communities, mining and extracting quality knowledge from online discussions has become very important for the industrial and marketing sector, as well as for e-commerce applications and government. Most existing techniques model a discussion as a social network of users represented by a user-based graph, without considering the content of the discussion. In this paper we propose a new multilayered model to analyze online discussions that combines the user-based and message-based representations. A novel clustering method based on frequent concept sets is used to cluster the original online discussion network into topic space, and domain ontology is used to improve the clustering accuracy. Parallel methods make the algorithms scalable to very large data sets. Our experimental study shows that the model and algorithms are effective when analyzing large-scale online discussion data.

  20. Model-based clustering in networks with Stochastic Community Finding

    CERN Document Server

    McDaid, Aaron F; Friel, Nial; Hurley, Neil J

    2012-01-01

    In the model-based clustering of networks, blockmodelling may be used to identify roles in the network. We identify a special case of the Stochastic Block Model (SBM) where we constrain the cluster-cluster interactions such that the density inside the clusters of nodes is expected to be greater than the density between clusters. This corresponds to the intuition behind community-finding methods, where nodes tend to be clustered together if they link to each other. We call this model Stochastic Community Finding (SCF) and present an efficient MCMC algorithm which can cluster the nodes, given the network. The algorithm is evaluated on synthetic data and is applied to a social network of interactions at a karate club and at a monastery, demonstrating how the SCF finds the 'ground truth' clustering where sometimes the SBM does not. The SCF is only one possible form of constraint or specialization that may be applied to the SBM. In a more supervised context, it may be appropriate to use other specializations to guide...

  1. Graph-based clustering and data visualization algorithms

    CERN Document Server

    Vathy-Fogarassy, Ágnes

    2013-01-01

    This work presents a data visualization technique that combines graph-based topology representation and dimensionality reduction methods to visualize the intrinsic data structure in a low-dimensional vector space. The application of graphs in clustering and visualization has several advantages. A graph of important edges (where edges characterize relations and weights represent similarities or distances) provides a compact representation of the entire complex data set. This text describes clustering and visualization methods that are able to utilize information hidden in these graphs, based on

  2. Clustering-Based Matrix Factorization

    OpenAIRE

    Mirbakhsh, Nima; Ling, Charles X.

    2013-01-01

    Recommender systems are emerging technologies that nowadays can be found in many applications such as Amazon, Netflix, and so on. These systems help users find relevant information, recommendations, and their preferred items. Even a slight improvement in the accuracy of these recommenders can strongly affect the quality of recommendations. Matrix factorization is a popular method in recommender systems, showing promising results in accuracy and complexity. In this paper we propose an extension ...

  3. Signal Identification of Radar Radiation Source Based on AP Density Clustering Method

    Institute of Scientific and Technical Information of China (English)

    王美玲; 张复春; 杨承志

    2012-01-01

    Signal identification of unknown radar radiation sources has long been a difficult problem in radar countermeasure intelligence analysis. Because density-based clustering algorithms achieve a low identification rate on non-uniform samples, this paper combines such an algorithm with affinity propagation (AP) clustering and proposes an identification method based on AP density clustering. The method first uses AP clustering to perform a primary clustering of the data samples, then sets the relevant parameters and applies density-based spatial clustering of applications with noise (DBSCAN) for a secondary clustering. Compared with the original samples, the distribution of the primary clustering results is representative, so parameter values suited to DBSCAN are easier to find. Tests show that the method achieves a higher identification rate.
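
    The two-stage scheme (affinity propagation for a primary clustering, DBSCAN for a secondary clustering of the resulting representatives) might look roughly like the following scikit-learn sketch; the parameter values and the way points inherit the label of their exemplar are illustrative assumptions.

        import numpy as np
        from sklearn.cluster import AffinityPropagation, DBSCAN

        def ap_then_dbscan(X, eps=1.5, min_samples=1):
            """Two-stage clustering: affinity propagation first, then DBSCAN on the
            resulting exemplars; every point inherits the label of its exemplar."""
            ap = AffinityPropagation(random_state=0).fit(X)
            exemplars = ap.cluster_centers_               # representatives from stage one
            db = DBSCAN(eps=eps, min_samples=min_samples).fit(exemplars)
            return db.labels_[ap.labels_]

        rng = np.random.default_rng(1)
        X = np.vstack([rng.normal(c, 0.3, size=(60, 2)) for c in ((0, 0), (4, 4), (8, 0))])
        print(np.unique(ap_then_dbscan(X)))               # three merged groups expected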

  4. Research of Web Documents Clustering Based on Dynamic Concept

    Institute of Scientific and Technical Information of China (English)

    WANG Yun-hua; CHEN Shi-hong

    2004-01-01

    Conceptual clustering is mainly used to address the deficiency and incompleteness of domain knowledge. Based on conceptual clustering technology, and aiming at the institutional framework and characteristics of Web theme information, this paper proposes and implements a dynamic conceptual clustering algorithm and a merging algorithm for Web documents, and analyses the superior performance of the clustering algorithm in terms of efficiency and clustering accuracy.

  5. Face Detection Method Based on Semi-supervised Clustering

    Institute of Scientific and Technical Information of China (English)

    王燕; 蒋正午

    2012-01-01

    This paper proposes a face detection method that combines skin color with the continuous AdaBoost algorithm. To establish the skin color model, a semi-supervised strategy is used to guide skin color clustering, and a new algorithm, SKDK, is proposed for the clustering process; the skin color model is then built from the probability distribution statistics of each pixel cluster. On this basis, mathematical morphology is used to process the image and find face candidates, which are passed to a continuous AdaBoost classifier for final face detection. Experimental results show that the detection ability of this method is superior to directly applying the continuous AdaBoost method, especially in multi-face scenes.

  6. MHCcluster, a method for functional clustering of MHC molecules

    DEFF Research Database (Denmark)

    Thomsen, Martin Christen Frølund; Lundegaard, Claus; Buus, Søren;

    2013-01-01

    binding specificity. The method has a flexible web interface that allows the user to include any MHC of interest in the analysis. The output consists of a static heat map and graphical tree-based visualizations of the functional relationship between MHC variants and a dynamic TreeViewer interface where...... both the functional relationship and the individual binding specificities of MHC molecules are visualized. We demonstrate that conventional sequence-based clustering will fail to identify the functional relationship between molecules, when applied to MHC system, and only through the use of the...

  7. Web-based Interface in Public Cluster

    CERN Document Server

    Akbar, Z

    2007-01-01

    A web-based interface dedicated to a cluster computer which is publicly accessible for free is introduced. The interface plays an important role in enabling secure public access, while providing a user-friendly computational environment for end-users and easy maintenance for administrators. The whole architecture, which integrates both hardware and software aspects, is briefly explained. It is argued that the public cluster is a globally unique approach, and could become a new kind of e-learning system, especially for parallel programming communities.

  8. TOWARDS MORE ACCURATE CLUSTERING METHOD BY USING DYNAMIC TIME WARPING

    Directory of Open Access Journals (Sweden)

    Khadoudja Ghanem

    2013-03-01

    Full Text Available An intrinsic problem of classifiers based on machine learning (ML) methods is that their learning time grows as the size and complexity of the training dataset increases. For this reason, it is important to have efficient computational methods and algorithms that can be applied to large datasets, such that it is still possible to complete the machine learning tasks in reasonable time. In this context, we present in this paper a simple process to speed up ML methods while preserving accuracy. An unsupervised clustering algorithm is combined with the Expectation-Maximization (EM) algorithm to develop an efficient Hidden Markov Model (HMM) training. The idea of the proposed process consists of two steps. In the first step, training instances with similar inputs are clustered and a weight factor which represents the frequency of these instances is assigned to each representative cluster; the Dynamic Time Warping technique is used as the dissimilarity function to cluster similar examples. In the second step, all formulas in the classical HMM training algorithm (EM) associated with the number of training instances are modified to include the weight factor in the appropriate terms. This process significantly accelerates HMM training while maintaining the same initial, transition and emission probability matrices as those obtained with the classical HMM training algorithm. Accordingly, the classification accuracy is preserved. Depending on the size of the training set, speedups of up to 2200 times are possible when the size is about 100,000 instances. The proposed approach is not limited to training HMMs, but can be employed for a large variety of ML methods.
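
    The first step of the process (cluster similar training series under a DTW dissimilarity and attach a frequency weight to each representative) could be sketched as below; the hierarchical clustering, the medoid choice and the parameter values are assumptions, and the weighted EM update of the HMM itself is not shown.

        import numpy as np
        from scipy.cluster.hierarchy import linkage, fcluster
        from scipy.spatial.distance import squareform

        def dtw(a, b):
            """Classic O(len(a)*len(b)) dynamic time warping distance for 1-D series."""
            n, m = len(a), len(b)
            D = np.full((n + 1, m + 1), np.inf)
            D[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    cost = abs(a[i - 1] - b[j - 1])
                    D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
            return D[n, m]

        def weighted_representatives(series, n_clusters=5):
            """Cluster training series under DTW and return one medoid per cluster
            together with a weight equal to the cluster size (its frequency)."""
            n = len(series)
            dist = np.zeros((n, n))
            for i in range(n):
                for j in range(i + 1, n):
                    dist[i, j] = dist[j, i] = dtw(series[i], series[j])
            labels = fcluster(linkage(squareform(dist), method='average'),
                              t=n_clusters, criterion='maxclust')
            reps, weights = [], []
            for c in np.unique(labels):
                members = np.where(labels == c)[0]
                medoid = members[np.argmin(dist[np.ix_(members, members)].sum(axis=1))]
                reps.append(series[medoid])
                weights.append(len(members))
            return reps, np.asarray(weights)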

  9. Classified base planning of tested parts based on factor-considered cluster method

    Institute of Scientific and Technical Information of China (English)

    肖新华; 王太勇; 成兵; 李煜; 胡淼

    2013-01-01

    To address the long design cycle, unstable design quality and high production cost of customized gas-tightness detecting equipment for part cavities, the detection principle and the design process of the detecting equipment were analyzed, and an adaptable design method based on a classified base of tested parts was described. A factor-considered cluster method was put forward, in which the weight vector of the influence factors is determined by the Analytic Hierarchy Process (AHP). By setting the consistency degree of the tested parts with respect to each factor, a weighted algorithm for similarity was obtained and the clustering criterion was given. The iterative clustering process for planning the classified base was explained, and software implementing the method was developed. Cases showed that the proposed method effectively plans the classified base and supports adaptable base-type design.
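
    A minimal sketch of the two ingredients named in the abstract, an AHP-derived weight vector and a weighted similarity between tested parts, is given below; the pairwise-comparison judgements and the per-factor agreement formula are illustrative assumptions rather than the authors' exact definitions.

        import numpy as np

        def ahp_weights(pairwise):
            """Priority vector of an AHP pairwise-comparison matrix
            (principal eigenvector, normalised to sum to one)."""
            vals, vecs = np.linalg.eig(np.asarray(pairwise, dtype=float))
            w = np.real(vecs[:, np.argmax(np.real(vals))])
            return np.abs(w) / np.abs(w).sum()

        def weighted_similarity(part_a, part_b, weights):
            """Weighted similarity of two parts scored on each factor in [0, 1];
            per-factor agreement is one minus the absolute score difference."""
            agreement = 1.0 - np.abs(np.asarray(part_a) - np.asarray(part_b))
            return float(np.dot(weights, agreement))

        # three influence factors compared pairwise (illustrative judgements only)
        A = [[1, 3, 5],
             [1 / 3, 1, 2],
             [1 / 5, 1 / 2, 1]]
        w = ahp_weights(A)
        print(w.round(3), weighted_similarity([0.9, 0.4, 0.7], [0.8, 0.5, 0.2], w))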

  10. Clustering-based selective neural network ensemble

    Institute of Scientific and Technical Information of China (English)

    FU Qiang; HU Shang-xu; ZHAO Sheng-ying

    2005-01-01

    An effective ensemble should consist of a set of networks that are both accurate and diverse. We propose a novel clustering-based selective algorithm for constructing a neural network ensemble, where clustering technology is used to classify trained networks according to similarity and the most accurate individual network is selected from each cluster to make up the ensemble. Empirical studies on regression with four typical datasets showed that this approach yields significantly smaller ensembles achieving better performance than traditional ones such as Bagging and Boosting. The bias-variance decomposition of the predictive error shows that the success of the proposed approach may lie in properly tuning the bias/variance trade-off to reduce the prediction error (the sum of bias squared and variance).
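
    A small sketch of the selection idea (cluster the trained networks by the similarity of their predictions and keep the most accurate member of each cluster) is given below, using scikit-learn regressors on a synthetic dataset; the pool size, the number of clusters and the validation-error criterion are assumptions.

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.datasets import make_friedman1
        from sklearn.model_selection import train_test_split
        from sklearn.neural_network import MLPRegressor

        X, y = make_friedman1(n_samples=600, random_state=0)
        X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

        # a pool of networks that differ only in their random initialisation
        pool = [MLPRegressor(hidden_layer_sizes=(32,), max_iter=800, random_state=s)
                .fit(X_tr, y_tr) for s in range(12)]
        preds = np.array([m.predict(X_val) for m in pool])    # (n_models, n_val)

        # cluster the networks by the similarity of their validation predictions
        groups = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(preds)

        selected = []                                         # best member of each cluster
        for g in np.unique(groups):
            members = np.where(groups == g)[0]
            errors = [np.mean((preds[i] - y_val) ** 2) for i in members]
            selected.append(int(members[int(np.argmin(errors))]))

        ensemble = preds[selected].mean(axis=0)
        print("selected:", selected, "MSE:", round(float(np.mean((ensemble - y_val) ** 2)), 3))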

  11. An analytic method to compute star cluster luminosity statistics

    Science.gov (United States)

    da Silva, Robert L.; Krumholz, Mark R.; Fumagalli, Michele; Fall, S. Michael

    2014-03-01

    The luminosity distribution of the brightest star clusters in a population of galaxies encodes critical pieces of information about how clusters form, evolve and disperse, and whether and how these processes depend on the large-scale galactic environment. However, extracting constraints on models from these data is challenging, in part because comparisons between theory and observation have traditionally required computationally intensive Monte Carlo methods to generate mock data that can be compared to observations. We introduce a new method that circumvents this limitation by allowing analytic computation of cluster order statistics, i.e. the luminosity distribution of the Nth most luminous cluster in a population. Our method is flexible and requires few assumptions, allowing for parametrized variations in the initial cluster mass function and its upper and lower cutoffs, variations in the cluster age distribution, stellar evolution and dust extinction, as well as observational uncertainties in both the properties of star clusters and their underlying host galaxies. The method is fast enough to make it feasible for the first time to use Markov chain Monte Carlo methods to search parameter space to find best-fitting values for the parameters describing cluster formation and disruption, and to obtain rigorous confidence intervals on the inferred values. We implement our method in a software package called the Cluster Luminosity Order-Statistic Code, which we have made publicly available.

  12. DBCSVM: Density Based Clustering Using Support VectorMachines

    Directory of Open Access Journals (Sweden)

    Santosh Kumar Rai

    2012-07-01

    Full Text Available Data categorization is a challenging job in the current scenario, as the volume of multimedia data on the internet grows day by day. Better retrieval and efficient searching of such data require a process for grouping it. Data mining can uncover useful implicit information in large databases, and data clustering is an important data mining technique for grouping data sets into clusters whose members share similar properties. In this paper we take image data sets and first apply density-based clustering to group the images; density-based clustering groups images according to their nearest feature sets but leaves outliers ungrouped. We then use a support vector machine (SVM) classifier to assign all outliers left by the density-based clustering. This method improves the efficiency of image grouping and gives better results.
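
    The core of the DBCSVM idea, as described, is to let DBSCAN form the groups and an SVM absorb the leftover outliers. A minimal scikit-learn sketch of that combination follows; the kernel choice and parameter values are assumptions.

        import numpy as np
        from sklearn.cluster import DBSCAN
        from sklearn.svm import SVC

        def dbscan_then_svm(X, eps=0.5, min_samples=5):
            """Group points with DBSCAN, then train an SVM on the clustered points
            and use it to assign every DBSCAN outlier (label -1) to a group."""
            labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
            core = labels != -1
            if core.all() or len(np.unique(labels[core])) < 2:
                return labels                             # nothing to reassign, or one class only
            svm = SVC(kernel='rbf', gamma='scale').fit(X[core], labels[core])
            labels = labels.copy()
            labels[~core] = svm.predict(X[~core])         # outliers get an SVM-predicted label
            return labels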

  13. Semisupervised Clustering for Networks Based on Fast Affinity Propagation

    Directory of Open Access Journals (Sweden)

    Mu Zhu

    2013-01-01

    Full Text Available Most of the existing clustering algorithms for networks are unsupervised and cannot improve the clustering quality by utilizing a small amount of prior knowledge. We propose a semisupervised clustering algorithm for networks based on fast affinity propagation (SCAN-FAP), which is essentially a kind of similarity metric learning method. First, we define a new constraint similarity measure integrating the structural information and the pairwise constraints, which reflects the effective similarities between nodes in networks. Then, taking the constraint similarities as input, we propose a fast affinity propagation algorithm which keeps the advantages of the original affinity propagation algorithm while increasing the time efficiency by passing messages only between certain nodes. Finally, through extensive experimental studies, we demonstrate that the proposed algorithm can take full advantage of the prior knowledge and improve the clustering quality significantly. Furthermore, our algorithm has a superior performance to some of the state-of-the-art approaches.
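
    The constraint-similarity idea can be approximated with a short sketch: fold must-link and cannot-link pairs into a precomputed similarity matrix and feed it to affinity propagation. The similarity definition and the way constraints are encoded are assumptions; the paper's fast message-passing variant is not reproduced here.

        import numpy as np
        from sklearn.cluster import AffinityPropagation

        def constrained_ap(S, must_link=(), cannot_link=()):
            """Affinity propagation on a precomputed similarity matrix S after folding
            in pairwise constraints: must-link pairs get the maximum similarity,
            cannot-link pairs the minimum."""
            S = np.array(S, dtype=float)
            hi, lo = S.max(), S.min()
            for i, j in must_link:
                S[i, j] = S[j, i] = hi
            for i, j in cannot_link:
                S[i, j] = S[j, i] = lo
            ap = AffinityPropagation(affinity='precomputed', random_state=0)
            return ap.fit_predict(S)

        # similarity here is negative squared Euclidean distance, AP's usual choice
        rng = np.random.default_rng(0)
        X = rng.normal(size=(30, 4))
        S = -((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        print(constrained_ap(S, must_link=[(0, 1)], cannot_link=[(0, 29)]))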

  14. Cancer detection based on Raman spectra super-paramagnetic clustering

    Science.gov (United States)

    González-Solís, José Luis; Guizar-Ruiz, Juan Ignacio; Martínez-Espinosa, Juan Carlos; Martínez-Zerega, Brenda Esmeralda; Juárez-López, Héctor Alfonso; Vargas-Rodríguez, Héctor; Gallegos-Infante, Luis Armando; González-Silva, Ricardo Armando; Espinoza-Padilla, Pedro Basilio; Palomares-Anda, Pascual

    2016-08-01

    The clustering of Raman spectra of serum samples is analyzed using the super-paramagnetic clustering technique based on the Potts spin model. We investigated the clustering of biochemical networks by using Raman data that define edge lengths in the network, and where the interactions are functions of the Raman spectra's individual band intensities. For this study, we used two groups of 58 and 102 control Raman spectra and the intensities of 160, 150 and 42 Raman spectra of serum samples from breast cancer, cervical cancer and leukemia patients, respectively. The spectra were collected from patients from different hospitals in Mexico. By using the super-paramagnetic clustering technique, we identified the most natural and compact clusters, allowing us to discriminate the control and cancer patients. Of special interest was the leukemia case, where the nearly hierarchical structure observed allowed the identification of the patients' leukemia type. The goal of this study is to apply a model of statistical physics, such as super-paramagnetic clustering, to find these natural clusters and thereby design a cancer detection method. To the best of our knowledge, this is the first report of preliminary results evaluating the usefulness of super-paramagnetic clustering in spectroscopy, where it is used for classification of spectra.

  15. A Multidimensional and Multimembership Clustering Method for Social Networks and Its Application in Customer Relationship Management

    Directory of Open Access Journals (Sweden)

    Peixin Zhao

    2013-01-01

    Full Text Available Community detection in social networks plays an important role in cluster analysis. Many traditional techniques for one-dimensional problems have proven inadequate for high-dimensional or mixed-type datasets due to data sparseness and attribute redundancy. In this paper we propose a graph-based clustering method for multidimensional datasets. This novel method has two distinguishing features: a nonbinary hierarchical tree and multimembership clusters. The nonbinary hierarchical tree clearly highlights meaningful clusters, while the multimembership feature may provide more useful service strategies. Experimental results on customer relationship management confirm the effectiveness of the new method.

  16. MHCcluster, a method for functional clustering of MHC molecules.

    Science.gov (United States)

    Thomsen, Martin; Lundegaard, Claus; Buus, Søren; Lund, Ole; Nielsen, Morten

    2013-09-01

    The identification of peptides binding to major histocompatibility complexes (MHC) is a critical step in the understanding of T cell immune responses. The human MHC genomic region (HLA) is extremely polymorphic comprising several thousand alleles, many encoding a distinct molecule. The potentially unique specificities remain experimentally uncharacterized for the vast majority of HLA molecules. Likewise, for nonhuman species, only a minor fraction of the known MHC molecules have been characterized. Here, we describe a tool, MHCcluster, to functionally cluster MHC molecules based on their predicted binding specificity. The method has a flexible web interface that allows the user to include any MHC of interest in the analysis. The output consists of a static heat map and graphical tree-based visualizations of the functional relationship between MHC variants and a dynamic TreeViewer interface where both the functional relationship and the individual binding specificities of MHC molecules are visualized. We demonstrate that conventional sequence-based clustering will fail to identify the functional relationship between molecules, when applied to MHC system, and only through the use of the predicted binding specificity can a correct clustering be found. Clustering of prevalent HLA-A and HLA-B alleles using MHCcluster confirms the presence of 12 major specificity groups (supertypes) some however with highly divergent specificities. Importantly, some HLA molecules are shown not to fit any supertype classification. Also, we use MHCcluster to show that chimpanzee MHC class I molecules have a reduced functional diversity compared to that of HLA class I molecules. MHCcluster is available at www.cbs.dtu.dk/services/MHCcluster-2.0. PMID:23775223

  17. MVClustViz: A Novice Yet Simple Multivariate Cluster Visualization Technique for Centroid-based Clusters

    OpenAIRE

    Sagar S. De; Minati Mishra; Satchidananda Dehuri

    2013-01-01

    In visual data mining, the visualization of clusters is a challenging task. Although many techniques have already been developed, challenges remain in representing large volumes of data with multiple dimensions and overlapping clusters. In this paper, a multivariate cluster visualization technique (MVClustViz) is presented to visualize centroid-based clusters. The geographic projection technique supports multiple dimensions, large volumes, and both crisp and fuzzy clusters visual...

  18. Core Business Selection Based on Ant Colony Clustering Algorithm

    Directory of Open Access Journals (Sweden)

    Yu Lan

    2014-01-01

    Full Text Available The core business is the most important business of an enterprise with diversified operations. In this paper, we first introduce the definition and characteristics of the core business and then describe the ant colony clustering algorithm. To test the effectiveness of the proposed method, Tianjin Port Logistics Development Co., Ltd. is selected as the research object. Based on the current state of the company's development, its core business can be identified by the ant colony clustering algorithm. The results indicate that the proposed method is an effective way to determine the core business of a company.

  19. Criteria of off-diagonal long-range order in Bose and Fermi systems based on the Lee-Yang cluster expansion method

    Science.gov (United States)

    Sakumichi, Naoyuki; Kawakami, Norio; Ueda, Masahito

    2012-04-01

    The quantum-statistical cluster expansion method of Lee and Yang is extended to investigate off-diagonal long-range order (ODLRO) in one-component and multicomponent mixtures of bosons or fermions. Our formulation is applicable to both a uniform system and a trapped system without local-density approximation and allows systematic expansions of one-particle and multiparticle reduced density matrices in terms of cluster functions, which are defined for the same system with Boltzmann statistics. Each term in this expansion can be associated with a Lee-Yang graph. We elucidate a physical meaning of each Lee-Yang graph; in particular, for a mixture of ultracold atoms and bound dimers, an infinite sum of the ladder-type Lee-Yang 0-graphs is shown to lead to Bose-Einstein condensation of dimers below the critical temperature. In the case of Bose statistics, an infinite series of Lee-Yang 1-graphs is shown to converge and gives the criteria of ODLRO at the one-particle level. Applications to a dilute Bose system of hard spheres are also made. In the case of Fermi statistics, an infinite series of Lee-Yang 2-graphs is shown to converge and gives the criteria of ODLRO at the two-particle level. Applications to a two-component Fermi gas in the tightly bound limit are also made.

  20. Finding Within Cluster Dense Regions Using Distance Based Technique

    Directory of Open Access Journals (Sweden)

    Wesam Ashour

    2012-03-01

    Full Text Available One of the main categories in data clustering is density-based clustering. Density-based clustering techniques like DBSCAN are attractive because they can find arbitrarily shaped clusters along with noisy outliers. The main weakness of traditional density-based algorithms like DBSCAN is clustering data sets whose clusters have different density levels: DBSCAN applies the same, given parameters to all points in a data set, while the densities of the clusters may be totally different. The proposed algorithm overcomes this weakness. It starts by partitioning the data within a cluster into units based on a user parameter and computes the density for each unit separately. The algorithm then compares the results and merges neighboring units with similar density values into a new cluster. The experimental results of the simulation show that the proposed algorithm gives good results in finding clusters for data sets with clusters of different densities.

  1. A Data-Clustering Based Robust SIFT Feature Matching Method

    Institute of Scientific and Technical Information of China (English)

    范志强; 赵沁平

    2012-01-01

    We present a data clustering method for robust SIFT matching. Our matching process contains an offline module that clusters features from a group of reference images and an online module that matches them to live images in order to enhance matching robustness. The main contribution lies in constructing a composite k-d data structure which can be used not only to cluster features but also to perform feature matching. An optimal keyframe selection method is then proposed using the composite k-d tree, which both advances the matching process and enables a cascading feature matching strategy that combines the matching results of the composite k-d tree and the keyframe. Experimental results show that our method dramatically enhances matching robustness.

  2. New clustered regularly interspaced short palindromic repeat locus spacer pair typing method based on the newly incorporated spacer for Salmonella enterica.

    Science.gov (United States)

    Li, Hao; Li, Peng; Xie, Jing; Yi, Shengjie; Yang, Chaojie; Wang, Jian; Sun, Jichao; Liu, Nan; Wang, Xu; Wu, Zhihao; Wang, Ligui; Hao, Rongzhang; Wang, Yong; Jia, Leili; Li, Kaiqin; Qiu, Shaofu; Song, Hongbin

    2014-08-01

    A clustered regularly interspaced short palindromic repeat (CRISPR) typing method has recently been developed and used for typing and subtyping of Salmonella spp., but it is complicated and labor intensive because it has to analyze all spacers in two CRISPR loci. Here, we developed a more convenient and efficient method, namely, CRISPR locus spacer pair typing (CLSPT), which only needs to analyze the two newly incorporated spacers adjoining the leader array in the two CRISPR loci. We analyzed a CRISPR array of 82 strains belonging to 21 Salmonella serovars isolated from humans in different areas of China by using this new method. We also retrieved the newly incorporated spacers in each CRISPR locus of 537 Salmonella isolates which have definite serotypes in the Pasteur Institute's CRISPR Database to evaluate this method. Our findings showed that this new CLSPT method presents a high level of consistency (kappa = 0.9872, Matthew's correlation coefficient = 0.9712) with the results of traditional serotyping, and thus, it can also be used to predict serotypes of Salmonella spp. Moreover, this new method has a considerable discriminatory power (discriminatory index [DI] = 0.8145), comparable to those of multilocus sequence typing (DI = 0.8088) and conventional CRISPR typing (DI = 0.8684). Because CLSPT only costs about $5 to $10 per isolate, it is a much cheaper and more attractive method for subtyping of Salmonella isolates. In conclusion, this new method will provide considerable advantages over other molecular subtyping methods, and it may become a valuable epidemiologic tool for the surveillance of Salmonella infections.

  3. Research of Evaluating Method for Subject Clusters and Industrial Clusters Cooperative Innovation Ability Based on Fuzzy Gray Degree

    Institute of Scientific and Technical Information of China (English)

    殷春武

    2013-01-01

    Cooperative innovation between subject clusters and industrial clusters is a top priority for guaranteeing the sustainable development of a regional economy. After analyzing the importance of the cooperative innovation ability of subject clusters and industrial clusters, an evaluation index system for the cooperative innovation ability of the two clusters is constructed, and the combination weights of the evaluation indices are obtained by using the OWA operator to aggregate multiple weight-determination methods. An assessment scale combining linguistic scales and gray degrees is proposed for the evaluation. Finally, an evaluation method for the cooperative innovation ability of the two clusters based on fuzzy sets and gray degrees is proposed, enriching the evaluation theory of double-cluster cooperative innovation ability.

  4. ONTOLOGY BASED DOCUMENT CLUSTERING USING MAPREDUCE

    Directory of Open Access Journals (Sweden)

    Abdelrahman Elsayed

    2015-05-01

    Full Text Available Nowadays, document clustering is considered a data-intensive task due to the dramatic, fast increase in the number of available documents; the feature sets that represent those documents are also very large. The most common method for representing documents is the vector space model, which represents document features as a bag of words and does not capture semantic relations between words. In this paper we introduce a distributed implementation of bisecting k-means using the MapReduce programming model, aimed at clustering data-intensive document collections. In addition, we propose integrating the WordNet ontology with bisecting k-means in order to utilize the semantic relations between words to enhance document clustering results. Our experimental results show that using lexical categories for nouns only enhances internal evaluation measures of document clustering and decreases the document features from thousands to tens of features. Our experiments were conducted using Amazon Elastic MapReduce to deploy the bisecting k-means algorithm.
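
    Bisecting k-means itself is simple enough to sketch in a few lines: repeatedly split the largest cluster in two until the desired number of clusters is reached (recent scikit-learn releases also ship a ready-made BisectingKMeans estimator). The MapReduce distribution and the WordNet integration described in the paper are not shown.

        import numpy as np
        from sklearn.cluster import KMeans

        def bisecting_kmeans(X, n_clusters=8, seed=0):
            """Repeatedly split the largest cluster in two with 2-means until the
            requested number of clusters is reached."""
            labels = np.zeros(len(X), dtype=int)
            while len(np.unique(labels)) < n_clusters:
                target = np.argmax(np.bincount(labels))   # largest cluster so far
                idx = np.where(labels == target)[0]
                half = KMeans(n_clusters=2, n_init=10, random_state=seed).fit_predict(X[idx])
                labels[idx[half == 1]] = labels.max() + 1 # one half keeps the old label
            return labels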

  5. A clustering routing algorithm based on improved ant colony clustering for wireless sensor networks

    Science.gov (United States)

    Xiao, Xiaoli; Li, Yang

    Because node distribution in real wireless sensor networks is not uniform, this paper presents a clustering strategy based on the ant colony clustering algorithm (ACC-C). To reduce the energy consumption of the cluster heads near the base station and of the whole network, the algorithm applies ant colony clustering to non-uniform clustering. An improved route optimality degree is presented to evaluate the performance of the chosen route. Simulation results show that, compared with other algorithms, such as the LEACH algorithm and the improved particle swarm clustering algorithm (PSC-C), the proposed approach is able to keep away from nodes with less residual energy, which can improve the lifetime of the network.

  6. Vibration Fault Diagnosis of Hydroelectric Generating Unit Using Gravitational Search Based Kernel Clustering Method

    Institute of Scientific and Technical Information of China (English)

    李超顺; 周建中; 肖剑; 肖汉

    2013-01-01

    Kernel clustering is an effective class of methods for the vibration fault diagnosis of hydro-turbine generating units (HGU). To address the problems of evaluating clustering results and selecting the kernel function parameter, a novel gravitational search based kernel clustering (GSKC) method is proposed. First, a kernel clustering objective function is built on the kernel Xie-Beni clustering index; then the gravitational search method is introduced to solve the objective function, with the cluster centers and the kernel function parameter encoded together as optimization variables; finally, a fault diagnosis model based on similarity in the kernel space is defined. UCI test data sets are used to check the classification accuracy, and GSKC is then applied to the fault diagnosis of an HGU. Experimental results show that GSKC is more accurate than traditional clustering methods, and that it can cluster HGU fault samples effectively and diagnose different kinds of faults accurately.

  7. Prioritizing the risk of plant pests by clustering methods; self-organising maps, k-means and hierarchical clustering

    Directory of Open Access Journals (Sweden)

    Susan Worner

    2013-09-01

    Full Text Available For greater preparedness, pest risk assessors are required to prioritise long lists of pest species with potential to establish and cause significant impact in an endangered area. Such prioritization is often qualitative, subjective, and sometimes biased, relying mostly on expert and stakeholder consultation. In recent years, cluster based analyses have been used to investigate regional pest species assemblages or pest profiles to indicate the risk of new organism establishment. Such an approach is based on the premise that the co-occurrence of well-known global invasive pest species in a region is not random, and that the pest species profile or assemblage integrates complex functional relationships that are difficult to tease apart. In other words, the assemblage can help identify and prioritise species that pose a threat in a target region. A computational intelligence method called a Kohonen self-organizing map (SOM), a type of artificial neural network, was the first clustering method applied to analyse assemblages of invasive pests. The SOM is a well known dimension reduction and visualization method, especially useful for high dimensional data that more conventional clustering methods may not analyse suitably. Like all clustering algorithms, the SOM can give details of clusters that identify regions with similar pest assemblages, possible donor and recipient regions. More importantly, however, the SOM connection weights that result from the analysis can be used to rank the strength of association of each species within each regional assemblage. Species with high weights that are not already established in the target region are identified as high risk. However, the SOM analysis is only the first step in a process to assess risk to be used alongside or incorporated within other measures. Here we illustrate the application of SOM analyses in a range of contexts in invasive species risk assessment, and discuss other clustering methods such as k
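
    For readers unfamiliar with the SOM machinery mentioned above, a minimal NumPy implementation is sketched below; after training, the rows of the weight grid can be inspected to rank how strongly each input feature (here, each species) loads on each map unit. The grid size, learning-rate and neighbourhood schedules are illustrative assumptions, not the settings used in the pest-risk studies.

        import numpy as np

        def train_som(X, grid=(10, 10), n_iter=2000, lr0=0.5, sigma0=3.0, seed=0):
            """Minimal self-organizing map; the returned weight grid can be inspected
            afterwards to rank how strongly each feature loads on each map unit."""
            rng = np.random.default_rng(seed)
            h, w = grid
            units = np.array([(i, j) for i in range(h) for j in range(w)], dtype=float)
            W = rng.normal(size=(h * w, X.shape[1]))
            for t in range(n_iter):
                x = X[rng.integers(len(X))]               # one random training vector
                lr = lr0 * np.exp(-t / n_iter)
                sigma = sigma0 * np.exp(-t / n_iter)
                bmu = np.argmin(((W - x) ** 2).sum(axis=1))        # best matching unit
                d2 = ((units - units[bmu]) ** 2).sum(axis=1)       # grid distance to the BMU
                influence = np.exp(-d2 / (2 * sigma ** 2))[:, None]
                W += lr * influence * (x - W)             # pull the BMU neighbourhood towards x
            return W.reshape(h, w, -1)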

  8. On Comparison of Clustering Methods for Pharmacoepidemiological Data.

    Science.gov (United States)

    Feuillet, Fanny; Bellanger, Lise; Hardouin, Jean-Benoit; Victorri-Vigneau, Caroline; Sébille, Véronique

    2015-01-01

    The high consumption of psychotropic drugs is a public health problem. Rigorous statistical methods are needed to identify consumption characteristics in the post-marketing phase. Agglomerative hierarchical clustering (AHC) and latent class analysis (LCA) can both provide clusters of subjects with similar characteristics. The objective of this study was to compare these two methods in pharmacoepidemiology on several criteria: number of clusters, concordance, interpretation, and stability over time. On a dataset on bromazepam consumption, the two methods show good concordance. AHC is a very stable method and provides homogeneous classes. LCA is an inferential approach and seems to identify extreme deviant behavior more accurately. PMID:24905478

  9. Cluster size statistic and cluster mass statistic: two novel methods for identifying changes in functional connectivity between groups or conditions.

    Directory of Open Access Journals (Sweden)

    Alex Ing

    Full Text Available Functional connectivity has become an increasingly important area of research in recent years. At a typical spatial resolution, approximately 300 million connections link each voxel in the brain with every other. This pattern of connectivity is known as the functional connectome. Connectivity is often compared between experimental groups and conditions. Standard methods used to control the type 1 error rate are likely to be insensitive when comparisons are carried out across the whole connectome, due to the huge number of statistical tests involved. To address this problem, two new cluster based methods--the cluster size statistic (CSS) and cluster mass statistic (CMS)--are introduced to control the family wise error rate across all connectivity values. These methods operate within a statistical framework similar to the cluster based methods used in conventional task based fMRI. Both methods are data driven, permutation based and require minimal statistical assumptions. Here, the performance of each procedure is evaluated in a receiver operator characteristic (ROC) analysis, utilising a simulated dataset. The relative sensitivity of each method is also tested on real data: BOLD (blood oxygen level dependent) fMRI scans were carried out on twelve subjects under normal conditions and during the hypercapnic state (induced through the inhalation of 6% CO2 in 21% O2 and 73% N2). Both CSS and CMS detected significant changes in connectivity between normal and hypercapnic states. A family wise error correction carried out at the individual connection level exhibited no significant changes in connectivity.
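
    A generic sketch of the max-cluster permutation idea behind such statistics is shown below for paired connectivity matrices: threshold edge-wise t-values, take the largest connected cluster of supra-threshold edges, and compare it against a permutation null. The threshold, the paired design and the edge-counting definition of cluster size are assumptions, not the paper's exact CSS/CMS implementation; the cluster mass variant would sum the supra-threshold t-values instead of counting edges.

        import numpy as np
        from scipy.sparse import csr_matrix
        from scipy.sparse.csgraph import connected_components
        from scipy.stats import ttest_rel

        def max_cluster_size(conn_a, conn_b, thresh=3.0):
            """Largest connected cluster of supra-threshold edges when comparing two
            sets of connectivity matrices of shape (subjects, nodes, nodes)."""
            t = ttest_rel(conn_a, conn_b, axis=0).statistic       # edge-wise paired t-values
            supra = np.abs(t) >= thresh
            np.fill_diagonal(supra, False)
            n_comp, node_lab = connected_components(csr_matrix(supra), directed=False)
            sizes = [supra[np.ix_(node_lab == c, node_lab == c)].sum() // 2
                     for c in range(n_comp)]
            return max(sizes) if sizes else 0

        def permutation_p(conn_a, conn_b, thresh=3.0, n_perm=500, seed=0):
            """Family-wise-error p-value for the observed maximum cluster size."""
            rng = np.random.default_rng(seed)
            observed = max_cluster_size(conn_a, conn_b, thresh)
            null = []
            for _ in range(n_perm):
                flip = rng.random(len(conn_a)) < 0.5      # randomly swap the two conditions
                a = np.where(flip[:, None, None], conn_b, conn_a)
                b = np.where(flip[:, None, None], conn_a, conn_b)
                null.append(max_cluster_size(a, b, thresh))
            return (np.sum(np.array(null) >= observed) + 1) / (n_perm + 1)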

  10. A heuristic method for finding the optimal number of clusters with application in medical data.

    Science.gov (United States)

    Bayati, Hamidreza; Davoudi, Heydar; Fatemizadeh, Emad

    2008-01-01

    In this paper, a heuristic method for determining the optimal number of clusters is proposed. Four clustering algorithms (K-means, Growing Neural Gas, a Simulated Annealing based technique, and Fuzzy C-means) are used in conjunction with three well-known cluster validity indices (the Davies-Bouldin, Calinski-Harabasz and Maulik-Bandyopadhyay indices), in addition to the proposed index. Our simulations evaluate the capability of these indices on several artificial and medical datasets. PMID:19163761
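
    The basic ingredient of such a heuristic, sweeping the number of clusters and scoring each partition with validity indices, can be sketched with scikit-learn; only the Davies-Bouldin and Calinski-Harabasz indices are readily available there, so the Maulik-Bandyopadhyay index and the paper's proposed index are omitted.

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.datasets import make_blobs
        from sklearn.metrics import calinski_harabasz_score, davies_bouldin_score

        def sweep_k(X, k_range=range(2, 11)):
            """Score K-means partitions over a range of k; a lower Davies-Bouldin and
            a higher Calinski-Harabasz value both indicate a better partition."""
            rows = []
            for k in k_range:
                labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
                rows.append((k, davies_bouldin_score(X, labels),
                             calinski_harabasz_score(X, labels)))
            return rows

        X, _ = make_blobs(n_samples=600, centers=4, random_state=0)
        for k, db, ch in sweep_k(X):
            print(f"k={k}  Davies-Bouldin={db:.2f}  Calinski-Harabasz={ch:.0f}")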

  11. Performance of analytical methods for overdispersed counts in cluster randomized trials: sample size, degree of clustering and imbalance.

    Science.gov (United States)

    Durán Pacheco, Gonzalo; Hattendorf, Jan; Colford, John M; Mäusezahl, Daniel; Smith, Thomas

    2009-10-30

    Many different methods have been proposed for the analysis of cluster randomized trials (CRTs) over the last 30 years. However, the evaluation of methods on overdispersed count data has been based mostly on the comparison of results using empiric data; i.e. when the true model parameters are not known. In this study, we assess via simulation the performance of five methods for the analysis of counts in situations similar to real community-intervention trials. We used the negative binomial distribution to simulate overdispersed counts of CRTs with two study arms, allowing the period of time under observation to vary among individuals. We assessed different sample sizes, degrees of clustering and degrees of cluster-size imbalance. The compared methods are: (i) the two-sample t-test of cluster-level rates, (ii) generalized estimating equations (GEE) with empirical covariance estimators, (iii) GEE with model-based covariance estimators, (iv) generalized linear mixed models (GLMM) and (v) Bayesian hierarchical models (Bayes-HM). Variation in sample size and clustering led to differences between the methods in terms of coverage, significance, power and random-effects estimation. GLMM and Bayes-HM performed better in general with Bayes-HM producing less dispersed results for random-effects estimates although upward biased when clustering was low. GEE showed higher power but anticonservative coverage and elevated type I error rates. Imbalance affected the overall performance of the cluster-level t-test and the GEE's coverage in small samples. Important effects arising from accounting for overdispersion are illustrated through the analysis of a community-intervention trial on Solar Water Disinfection in rural Bolivia. PMID:19672840

  12. A Method of Deep Web Clustering Based on SOM Neural Network

    Institute of Scientific and Technical Information of China (English)

    吴凌云

    2012-01-01

    To improve the efficiency of Deep Web data source clustering and reduce manual work, this paper presents a method of Deep Web interface clustering based on a self-organizing map (SOM) neural network. The method follows a pre-query approach and uses structural feature statistics of the interface forms as inputs. Tests on the UIUC dataset achieved the expected results.

  13. Component Based Clustering in Wireless Sensor Networks

    CERN Document Server

    Amaxilatis, Dimitrios; Koninis, Christos; Pyrgelis, Apostolos

    2011-01-01

    Clustering is an important research topic for wireless sensor networks (WSNs). A large variety of approaches has been presented focusing on different performance metrics. Even though all of them have many practical applications, an extremely limited number of software implementations is available to the research community. Furthermore, these very few techniques are implemented for specific WSN systems or are integrated in complex applications. Thus it is very difficult to comparatively study their performance and almost impossible to reuse them in future applications under a different scope. In this work we study a large body of well established algorithms. We identify their main building blocks and propose a component-based architecture for developing clustering algorithms that (a) promotes exchangeability of algorithms thus enabling the fast prototyping of new approaches, (b) allows cross-layer implementations to realize complex applications, (c) offers a common platform to comparatively study the performan...

  14. Clustering Methods Application for Customer Segmentation to Manage Advertisement Campaign

    Directory of Open Access Journals (Sweden)

    Maciej Kutera

    2010-10-01

    Full Text Available Clustering methods have become such advanced algorithms for analyzing large data collections that they are now counted among data mining methods. They form a steadily growing group of methods, evolving quickly and finding more and more applications. In this article, we present our research on the usefulness of clustering methods for customer segmentation in managing an advertisement campaign. We report results obtained with four selected methods, chosen because their characteristics suggested applicability to our purposes. One of the analyzed methods, k-means clustering with randomly selected initial cluster seeds, gave very good results for customer segmentation, and these results are presented in detail. In contrast, one of the methods (hierarchical average linkage) was found to be of no use for customer segmentation. Further investigation of the benefits of clustering methods for customer segmentation in advertisement campaign management is worth continuing, particularly since findings in this field can yield measurable profits for marketing activity.
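
    A toy version of the k-means segmentation that worked well in this study might look as follows; the recency/frequency/monetary features and all numbers are synthetic stand-ins for real campaign data, used only to show the scaling-then-clustering pattern.

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.preprocessing import StandardScaler

        # synthetic customer table: recency (days), frequency (orders), monetary value
        rng = np.random.default_rng(0)
        customers = np.column_stack([
            rng.integers(1, 365, 1000),
            rng.poisson(5, 1000),
            rng.gamma(2.0, 50.0, 1000),
        ])

        X = StandardScaler().fit_transform(customers)     # put the features on one scale
        segments = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

        # per-segment averages are what a campaign manager would actually look at
        for s in np.unique(segments):
            print(s, customers[segments == s].mean(axis=0).round(1))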

  15. Gene expression data clustering using a multiobjective symmetry based clustering technique.

    Science.gov (United States)

    Saha, Sriparna; Ekbal, Asif; Gupta, Kshitija; Bandyopadhyay, Sanghamitra

    2013-11-01

    The invention of microarrays has rapidly changed the state of biological and biomedical research. Clustering algorithms play an important role in clustering microarray data sets, where identifying groups of co-expressed genes is a very difficult task. Here we have posed the problem of clustering the microarray data as a multiobjective clustering problem. A new symmetry based fuzzy clustering technique is developed to solve this problem. The effectiveness of the proposed technique is demonstrated on five publicly available benchmark data sets. Results are compared with some widely used microarray clustering techniques. Statistical and biological significance tests have also been carried out. PMID:24209942

  16. An Effective Method of Producing Small Neutral Carbon Clusters

    Institute of Scientific and Technical Information of China (English)

    XIA Zhu-Hong; CHEN Cheng-Chu; HSU Yen-Chu

    2007-01-01

    An effective method of producing small neutral carbon clusters Cn (n = 1-6) is described. The carbon clusters (positively or negatively charged, or neutral) are formed in a plasma produced by a high-power 532 nm pulsed laser ablating the surface of a metal Mn rod, which reacts with small hydrocarbons supplied by a pulse valve; the neutral carbon clusters are then extracted and photo-ionized by another laser (266 nm or 355 nm) in the ionization region of a linear time-of-flight mass spectrometer. The distributions of the initial neutral carbon clusters are analysed from the ionic species appearing in the mass spectra. The yield of small carbon clusters with the present method is observed to be about 10 times that of the traditional, widely used technique of laser vaporization of graphite.

  17. Isochronal annealing of electron-irradiated dilute Fe alloys modelled by an ab initio based AKMC method: Influence of solute-interstitial cluster properties

    International Nuclear Information System (INIS)

    The evolution of the microstructure of dilute Fe alloys under irradiation has been modelled using a multiscale approach based on ab initio and atomistic kinetic Monte Carlo simulations. In these simulations, both self-interstitials and vacancies, isolated or in clusters, are considered. Isochronal annealing after electron irradiation has been simulated in pure Fe and in dilute Fe-Cu and Fe-Mn alloys, focusing on recovery stages I and II. The parameters for the self-interstitial/solute-atom interactions are based on ab initio predictions, and some of these interactions have been slightly adjusted, without modifying their character, against isochronal annealing experimental data. The different recovery peaks are globally well reproduced. These simulations allow the different recovery peaks, as well as the effect of varying solute concentration, to be interpreted, and for some peaks they have made it possible to revisit and re-interpret the experimental data. In Fe-Cu, the trapping of self-interstitials by Cu atoms allows the experimental results to be reproduced, although no mixed dumbbells are formed, contrary to former interpretations. In Fe-Mn, by contrast, the favorable formation of mixed dumbbells plays an important role in the Mn effect.

  18. AN IMPROVED TEACHING-LEARNING BASED OPTIMIZATION APPROACH FOR FUZZY CLUSTERING

    Directory of Open Access Journals (Sweden)

    Parastou Shahsamandi E.

    2014-11-01

    Full Text Available Fuzzy clustering has been widely studied and applied in a variety of key areas of science and engineering. In this paper the Improved Teaching-Learning Based Optimization (ITLBO) algorithm is used for data clustering, in which the objects in the same cluster are similar. The algorithm has been tested on several datasets and compared with other popular clustering algorithms. The results show that the proposed method improves the clustering output and can be used efficiently for fuzzy clustering.

  19. Parallel Density-Based Clustering for Discovery of Ionospheric Phenomena

    Science.gov (United States)

    Pankratius, V.; Gowanlock, M.; Blair, D. M.

    2015-12-01

    Ionospheric total electron content maps derived from global networks of dual-frequency GPS receivers can reveal a plethora of ionospheric features in real-time and are key to space weather studies and natural hazard monitoring. However, growing data volumes from expanding sensor networks are making manual exploratory studies challenging. As the community is heading towards Big Data ionospheric science, automation and Computer-Aided Discovery become indispensable tools for scientists. One problem of machine learning methods is that they require domain-specific adaptations in order to be effective and useful for scientists. Addressing this problem, our Computer-Aided Discovery approach allows scientists to express various physical models as well as perturbation ranges for parameters. The search space is explored through an automated system and parallel processing of batched workloads, which finds corresponding matches and similarities in empirical data. We discuss density-based clustering as a particular method we employ in this process. Specifically, we adapt Density-Based Spatial Clustering of Applications with Noise (DBSCAN). This algorithm groups geospatial data points based on density. Clusters of points can be of arbitrary shape, and the number of clusters is not predetermined by the algorithm; only two input parameters need to be specified: (1) a distance threshold, (2) a minimum number of points within that threshold. We discuss an implementation of DBSCAN for batched workloads that is amenable to parallelization on manycore architectures such as Intel's Xeon Phi accelerator with 60+ general-purpose cores. This manycore parallelization can cluster large volumes of ionospheric total electronic content data quickly. Potential applications for cluster detection include the visualization, tracing, and examination of traveling ionospheric disturbances or other propagating phenomena. Acknowledgments. We acknowledge support from NSF ACI-1442997 (PI V. Pankratius).
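
    The two DBSCAN inputs mentioned above (a distance threshold and a minimum point count) are easy to see in a small scikit-learn sketch on synthetic point data standing in for detrended TEC perturbations; the parallel manycore implementation discussed in the abstract is not reproduced here.

        import numpy as np
        from sklearn.cluster import DBSCAN

        # synthetic points standing in for detrended TEC perturbations (lon, lat in degrees):
        # uniform background noise plus a wave-like front crossing the map
        rng = np.random.default_rng(42)
        background = rng.uniform([-120.0, 25.0], [-70.0, 50.0], size=(2000, 2))
        front = np.column_stack([np.linspace(-110, -80, 300),
                                 35 + 3 * np.sin(np.linspace(0, 6 * np.pi, 300))])
        points = np.vstack([background, front + rng.normal(0, 0.15, front.shape)])

        # the two DBSCAN inputs named in the abstract: a distance threshold (eps)
        # and the minimum number of points required within that distance
        labels = DBSCAN(eps=0.8, min_samples=10).fit_predict(points)
        print("clusters:", labels.max() + 1, "noise points:", int(np.sum(labels == -1)))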

  20. Formation and evolution of MnNi clusters in neutron irradiated dilute Fe alloys modelled by a first principle-based AKMC method

    Energy Technology Data Exchange (ETDEWEB)

    Ngayam-Happy, R. [EDF-R and D, Departement Materiaux et Mecanique des Composants (MMC), Les Renardieres, F-77818 Moret sur Loing Cedex (France); Unite Materiaux et Transformations (UMET), UMR CNRS 8207, Universite de Lille 1, ENSCL, F-59655 Villeneuve d' Ascq Cedex (France); Laboratoire commun EDF-CNRS Etude et Modelisation des Microstructures pour le Vieillissement des Materiaux (EM2VM) (France); Becquart, C.S., E-mail: charlotte.becquart@univ-lille1.fr [Unite Materiaux et Transformations (UMET), UMR CNRS 8207, Universite de Lille 1, ENSCL, F-59655 Villeneuve d' Ascq Cedex (France); Laboratoire commun EDF-CNRS Etude et Modelisation des Microstructures pour le Vieillissement des Materiaux (EM2VM) (France); Domain, C. [EDF-R and D, Departement Materiaux et Mecanique des Composants (MMC), Les Renardieres, F-77818 Moret sur Loing Cedex (France); Unite Materiaux et Transformations (UMET), UMR CNRS 8207, Universite de Lille 1, ENSCL, F-59655 Villeneuve d' Ascq Cedex (France); Laboratoire commun EDF-CNRS Etude et Modelisation des Microstructures pour le Vieillissement des Materiaux (EM2VM) (France)

    2012-07-15

    An atomistic Monte Carlo model parameterised on electronic structure calculations data has been used to study the formation and evolution under irradiation of solute clusters in Fe-MnNi ternary and Fe-CuMnNi quaternary alloys. Two populations of solute rich clusters have been observed, which can be discriminated by whether or not the solute atoms are associated with self-interstitial clusters. Mn-Ni-rich clusters are observed at a very early stage of the irradiation in both modelled alloys, whereas the quaternary alloys contain also Cu-containing clusters. Mn-Ni-rich clusters nucleate very early via a self-interstitial-driven mechanism, earlier than Cu-rich clusters; the latter, however, which are likely to form via a vacancy-driven mechanism, grow in number much faster than the former, helped by the thermodynamic driving force to Cu precipitation in Fe, thereby becoming dominant in the low dose regime. The kinetics of the number density increase of the two populations is thus significantly different. Finally the main conclusion suggested by this work is that the so-called late blooming phases might as well be neither late, nor phases.

  1. Formation and evolution of MnNi clusters in neutron irradiated dilute Fe alloys modelled by a first principle-based AKMC method

    International Nuclear Information System (INIS)

    An atomistic Monte Carlo model parameterised on electronic structure calculations data has been used to study the formation and evolution under irradiation of solute clusters in Fe–MnNi ternary and Fe–CuMnNi quaternary alloys. Two populations of solute rich clusters have been observed, which can be discriminated by whether or not the solute atoms are associated with self-interstitial clusters. Mn–Ni-rich clusters are observed at a very early stage of the irradiation in both modelled alloys, whereas the quaternary alloys contain also Cu-containing clusters. Mn–Ni-rich clusters nucleate very early via a self-interstitial-driven mechanism, earlier than Cu-rich clusters; the latter, however, which are likely to form via a vacancy-driven mechanism, grow in number much faster than the former, helped by the thermodynamic driving force to Cu precipitation in Fe, thereby becoming dominant in the low dose regime. The kinetics of the number density increase of the two populations is thus significantly different. Finally the main conclusion suggested by this work is that the so-called late blooming phases might as well be neither late, nor phases.

  2. Relativistic extended coupled cluster method for magnetic hyperfine structure constant

    CERN Document Server

    Sasmal, Sudip; Nayak, Malaya K; Vaval, Nayana; Pal, Sourav

    2015-01-01

    This article deals with the general implementation of the 4-component spinor relativistic extended coupled cluster (ECC) method to calculate first-order properties of atoms and molecules in their open-shell ground-state configurations. The implemented relativistic ECC is employed to calculate the hyperfine structure (HFS) constants of alkali metals (Li, Na, K, Rb and Cs), singly charged alkaline earth metal atoms (Be+, Mg+, Ca+ and Sr+) and molecules (BeH, MgF and CaH). We have compared our ECC results with calculations based on the restricted active space configuration interaction (RAS-CI) method. Our results are in better agreement with the available experimental values than the RAS-CI values.

  3. PARTIAL TRAINING METHOD FOR HEURISTIC ALGORITHM OF POSSIBLE CLUSTERIZATION UNDER UNKNOWN NUMBER OF CLASSES

    Directory of Open Access Journals (Sweden)

    D. A. Viattchenin

    2009-01-01

    Full Text Available A method for constructing a subset of labeled objects which is used in a heuristic algorithm of possible clusterization with partial training is proposed in the paper. The method is based on data preprocessing by the heuristic algorithm of possible clusterization using a transitive closure of a fuzzy tolerance. Method efficiency is demonstrated by way of an illustrative example.

  4. Constructing storyboards based on hierarchical clustering analysis

    Science.gov (United States)

    Hasebe, Satoshi; Sami, Mustafa M.; Muramatsu, Shogo; Kikuchi, Hisakazu

    2005-07-01

    There is a growing need for quick previews of video content for the purpose of improving accessibility of video archives as well as reducing network traffic. In this paper, a storyboard that contains a user-specified number of keyframes is produced from a given video sequence. It is based on hierarchical cluster analysis of feature vectors that are derived from wavelet coefficients of video frames. Consistent use of extracted feature vectors is the key to avoiding repeated, computationally intensive parsing of the same video sequence. Experimental results suggest that a significant reduction in computational time is gained by this strategy.
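
    The storyboard idea above reduces to hierarchical clustering of per-frame feature vectors followed by picking one representative frame per cluster. The sketch below illustrates only that reduction; the wavelet-derived features of the paper are replaced by random placeholder vectors, and Ward linkage plus a nearest-to-centroid rule are assumptions of this illustration.

```python
# Keyframe selection by hierarchical clustering of per-frame feature vectors.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
features = rng.normal(size=(120, 16))   # one 16-dim feature vector per frame (placeholder)
n_keyframes = 6                         # user-specified storyboard size

# Ward linkage on the frame features, cut into the requested number of clusters.
Z = linkage(features, method="ward")
labels = fcluster(Z, t=n_keyframes, criterion="maxclust")

# Per cluster, take the frame closest to the cluster centroid as the keyframe.
keyframes = []
for c in np.unique(labels):
    idx = np.where(labels == c)[0]
    centroid = features[idx].mean(axis=0, keepdims=True)
    keyframes.append(idx[cdist(features[idx], centroid).argmin()])
print(sorted(keyframes))
```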

  5. Genetic algorithm based two-mode clustering of metabolomics data

    NARCIS (Netherlands)

    Hageman, J.A.; Berg, R.A. van den; Westerhuis, J.A.; Werf, M.J. van der; Smilde, A.K.

    2008-01-01

    Metabolomics and other omics tools are generally characterized by large data sets with many variables obtained under different environmental conditions. Clustering methods and more specifically two-mode clustering methods are excellent tools for analyzing this type of data. Two-mode clustering metho

  6. Cluster variation method in the atomic ordering theory

    International Nuclear Information System (INIS)

    A brief review is presented of the history of the origin, generalization, and application of one of the modern methods for examining cooperative phenomena in the theory of atomic ordering. The method has been named the "cluster variation method". With the use of computers, the mathematical difficulties have been overcome, and interest in the cluster variation method has considerably increased. The results are discussed which have been obtained by this method for binary alloys with a face-centered cubic lattice or with a body-centered one. The theory of atomic ordering in ternary alloys of the type of the binary superstructures L12 and L10 is also considered. The cluster variation method is applicable to a new model of the alloy as well. The method allows the range of problems solved in the statistical theory of atomic ordering to be expanded.

  7. Knowledge based cluster ensemble for cancer discovery from biomolecular data.

    Science.gov (United States)

    Yu, Zhiwen; Wong, Hau-San; You, Jane; Yang, Qinmin; Liao, Hongying

    2011-06-01

    The adoption of microarray techniques in biological and medical research provides a new way for cancer diagnosis and treatment. In order to perform successful diagnosis and treatment of cancer, discovering and classifying cancer types correctly is essential. Class discovery is one of the most important tasks in cancer classification using biomolecular data. Most of the existing works adopt single clustering algorithms to perform class discovery from biomolecular data. However, single clustering algorithms have limitations, which include a lack of robustness, stability, and accuracy. In this paper, we propose a new cluster ensemble approach called knowledge based cluster ensemble (KCE) which incorporates the prior knowledge of the data sets into the cluster ensemble framework. Specifically, KCE represents the prior knowledge of a data set in the form of pairwise constraints. Then, the spectral clustering algorithm (SC) is adopted to generate a set of clustering solutions. Next, KCE transforms pairwise constraints into confidence factors for these clustering solutions. After that, a consensus matrix is constructed by considering all the clustering solutions and their corresponding confidence factors. The final clustering result is obtained by partitioning the consensus matrix. Compared with single clustering algorithms and conventional cluster ensemble approaches, knowledge based cluster ensemble approaches are more robust, stable and accurate. The experiments on cancer data sets show that: 1) KCE works well on these data sets; 2) KCE not only outperforms most of the state-of-the-art single clustering algorithms, but also outperforms most of the state-of-the-art cluster ensemble approaches.
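
    A minimal sketch of the ensemble idea described above, under simplifying assumptions: the pairwise constraints are collapsed into a single agreement score per clustering solution, and the weighted consensus matrix is partitioned with spectral clustering on a precomputed affinity rather than the paper's exact procedure. Data, constraints, and parameters are illustrative.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=0)
must_link = [(0, 1), (50, 51)]       # example prior knowledge: same class
cannot_link = [(0, 100), (50, 120)]  # example prior knowledge: different classes

solutions, weights = [], []
for seed in range(8):
    labels = SpectralClustering(n_clusters=3, random_state=seed,
                                assign_labels="discretize").fit_predict(X)
    satisfied = sum(labels[i] == labels[j] for i, j in must_link) + \
                sum(labels[i] != labels[j] for i, j in cannot_link)
    solutions.append(labels)
    weights.append(satisfied / (len(must_link) + len(cannot_link)))  # confidence factor

# Weighted consensus matrix: entry (i, j) is the weighted fraction of solutions
# that place samples i and j in the same cluster.
n = len(X)
consensus = np.zeros((n, n))
for labels, w in zip(solutions, weights):
    consensus += w * (labels[:, None] == labels[None, :])
consensus /= sum(weights)

# Final partition: treat the consensus matrix as a precomputed affinity.
final = SpectralClustering(n_clusters=3, affinity="precomputed",
                           random_state=0).fit_predict(consensus)
```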

  8. Clustering and training set selection methods for improving the accuracy of quantitative laser induced breakdown spectroscopy

    International Nuclear Information System (INIS)

    We investigated five clustering and training set selection methods to improve the accuracy of quantitative chemical analysis of geologic samples by laser induced breakdown spectroscopy (LIBS) using partial least squares (PLS) regression. The LIBS spectra were previously acquired for 195 rock slabs and 31 pressed powder geostandards under 7 Torr CO2 at a stand-off distance of 7 m at 17 mJ per pulse to simulate the operational conditions of the ChemCam LIBS instrument on the Mars Science Laboratory Curiosity rover. The clustering and training set selection methods, which do not require prior knowledge of the chemical composition of the test-set samples, are based on grouping similar spectra and selecting appropriate training spectra for the partial least squares (PLS2) model. These methods were: (1) hierarchical clustering of the full set of training spectra and selection of a subset for use in training; (2) k-means clustering of all spectra and generation of PLS2 models based on the training samples within each cluster; (3) iterative use of PLS2 to predict sample composition and k-means clustering of the predicted compositions to subdivide the groups of spectra; (4) soft independent modeling of class analogy (SIMCA) classification of spectra, and generation of PLS2 models based on the training samples within each class; (5) use of Bayesian information criteria (BIC) to determine an optimal number of clusters and generation of PLS2 models based on the training samples within each cluster. The iterative method and the k-means method using 5 clusters showed the best performance, improving the absolute quadrature root mean squared error (RMSE) by ∼ 3 wt.%. The statistical significance of these improvements was ∼ 85%. Our results show that although clustering methods can modestly improve results, a large and diverse training set is the most reliable way to improve the accuracy of quantitative LIBS. In particular, additional sulfate standards and specifically
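
    Approach (2) above, k-means grouping of the spectra followed by a separate PLS2 model per cluster, can be sketched roughly as follows. The spectra and compositions are random placeholders, and the cluster count and number of PLS components are arbitrary choices of this illustration, not the values used in the study.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(1)
X_train = rng.normal(size=(200, 50))   # placeholder "spectra" (200 samples, 50 channels)
Y_train = rng.normal(size=(200, 3))    # placeholder compositions (3 components)
X_test = rng.normal(size=(20, 50))

k = 5
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_train)
# One PLS2 model per cluster, trained on the training spectra in that cluster.
models = {c: PLSRegression(n_components=5).fit(X_train[km.labels_ == c],
                                               Y_train[km.labels_ == c])
          for c in range(k)}

# A test spectrum is predicted by the model of the cluster it falls in.
test_clusters = km.predict(X_test)
Y_pred = np.vstack([models[c].predict(x[None, :]) for x, c in zip(X_test, test_clusters)])
```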

  9. A Study of Sequence Clustering on Protein’s Primary Structure using a Statistical Method

    Directory of Open Access Journals (Sweden)

    Alina Bogan-Marta

    2006-07-01

    Full Text Available The clustering of biological sequences into biologically meaningful classes poses two computationally complex challenges: the choice of a biologically pertinent and computable criterion to evaluate cluster homogeneity, and the optimal exploration of the solution space. Here we analyse the clustering potential of a new method of sequence similarity based on statistical sequence content evaluation. Applying the popular CLUSTAL W method for sequence similarity to the same data, we contrasted the results. The analysis, computational efficiency and high accuracy of the results from the new method are encouraging for further development that could make it an appealing alternative to the existing methods.

  10. Clustering-based redshift estimation: application to VIPERS/CFHTLS

    CERN Document Server

    Scottez, V; Granett, B R; Moutard, T; Kilbinger, M; Scodeggio, M; Garilli, B; Bolzonella, M; de la Torre, S; Guzzo, L; Abbas, U; Adami, C; Arnouts, S; Bottini, D; Branchini, E; Cappi, A; Cucciati, O; Davidzon, I; Fritz, A; Franzetti, P; Iovino, A; Krywult, J; Brun, V Le; Fèvre, O Le; Maccagni, D; Małek, K; Marulli, F; Polletta, M; Pollo, A; Tasca, L A M; Tojeiro, R; Vergani, D; Zanichelli, A; Bel, J; Coupon, J; De Lucia, G; Ilbert, O; McCracken, H J; Moscardini, L

    2016-01-01

    We explore the accuracy of the clustering-based redshift estimation proposed by Ménard et al. (2013) when applied to VIPERS and CFHTLS real data. This method enables us to reconstruct redshift distributions from measurements of the angular clustering of objects using a set of secure spectroscopic redshifts. We use state-of-the-art spectroscopic measurements with iAB < 22.5 from VIPERS, which allow us to test the accuracy of the clustering-based redshift distributions. We show that this method enables us to reproduce the true mean color-redshift relation when both populations have the same magnitude limit. We also show that this technique allows the inference of redshift distributions for a population fainter than the one of reference, and we give an estimate of the color-redshift mapping in this case. This last point is of great interest for future large redshift surveys, which suffer from the need for a complete faint spectroscopic sample.

  11. Mapping Soil Texture of a Plain Area Using Fuzzy-c-Means Clustering Method Based on Land Surface Diurnal Temperature Difference

    Institute of Scientific and Technical Information of China (English)

    WANG De-Cai; ZHANG Gan-Lin; PAN Xian-Zhang; ZHAO Yu-Guo; ZHAO Ming-Song; WANG Gai-Fen

    2012-01-01

    The use of landscape covariates to estimate soil properties is not suitable for areas of low relief due to the high variability of soil properties under similar topographic and vegetation conditions. A new method was implemented to map regional soil texture (in terms of sand, silt and clay contents) by hypothesizing that the change in the land surface diurnal temperature difference (DTD) is related to soil texture in the case of a relatively homogeneous rainfall input. To examine this hypothesis, the DTDs from the moderate resolution imaging spectroradiometer (MODIS) during a selected time period, i.e., after a heavy rainfall between autumn harvest and autumn sowing, were classified using fuzzy-c-means (FCM) clustering. Six classes were generated, and for each class, the sand (> 0.05 mm), silt (0.002-0.05 mm) and clay (< 0.002 mm) contents at the location of maximum membership value were considered as the typical values of that class. A weighted average model was then used to digitally map soil texture. The results showed that the predicted map quite accurately reflected the regional soil variation. A validation dataset produced estimates of error for the predicted maps of sand, silt and clay contents with root mean squared error values of 8.4%, 7.8% and 2.3%, respectively, which is satisfactory in a practical context. This study thus provided a methodology that can help improve the accuracy and efficiency of soil texture mapping in plain areas using easily available data sources.
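
    Fuzzy c-means itself is compact enough to sketch directly. The code below is a generic FCM (fuzziness m = 2, Euclidean distance) run on synthetic two-dimensional data; the MODIS DTD imagery, the six soil classes, and the weighted-average texture mapping of the study are not reproduced.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    """Plain fuzzy c-means: returns (centers, membership matrix U of shape (n, c))."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)           # each row of memberships sums to one
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # Standard membership update: u_ik proportional to d_ik^(-2/(m-1)).
        U = 1.0 / (d ** (2 / (m - 1)) * np.sum(d ** (-2 / (m - 1)), axis=1, keepdims=True))
    return centers, U

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc, 0.3, size=(100, 2)) for loc in ([0, 0], [2, 2], [0, 3])])
centers, U = fuzzy_c_means(X, c=3)
hard_labels = U.argmax(axis=1)                  # defuzzified assignment
print(np.round(centers, 2))
```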

  12. Time series clustering based on nonparametric multidimensional forecast densities

    OpenAIRE

    Vilar, José A.; Vilar, Juan M.

    2013-01-01

    A new time series clustering method based on comparing forecast densities for a sequence of $k>1$ consecutive horizons is proposed. The unknown $k$-dimensional forecast densities can be non-parametrically approximated by using bootstrap procedures that mimic the generating processes without parametric restrictions. However, the difficulty of constructing accurate kernel estimators of multivariate densities is well known. To circumvent the high dimensionality problem, the bootstrap prediction ...

  13. Web Search Result Clustering based on Cuckoo Search and Consensus Clustering

    OpenAIRE

    Alam, Mansaf; Sadaf, Kishwar

    2015-01-01

    Clustering of web search result documents has emerged as a promising tool for improving the retrieval performance of an Information Retrieval (IR) system. Search results are often plagued by problems such as synonymy, polysemy, and high volume. Besides alleviating these problems, clustering also makes it easier for the user to locate the desired information. In this paper, a method called WSRDC-CSCC is introduced to cluster web search results using the cuckoo search meta-heuristic method and Consensus...

  14. Cluster parallel rendering based on encoded mesh

    Institute of Scientific and Technical Information of China (English)

    QIN Ai-hong; XIONG Hua; PENG Hao-yu; LIU Zhen; SHI Jiao-ying

    2006-01-01

    Use of compressed meshes in parallel rendering architectures is still an unexplored area, the main challenge of which is to partition and sort the encoded mesh in the compression domain. This paper presents a mesh compression scheme, PRMC (Parallel Rendering based Mesh Compression), supplying encoded meshes that can be partitioned and sorted in a parallel rendering system even in the encoded domain. First, we segment the mesh into submeshes and clip the submeshes' boundaries into Runs, and then piecewise compress the submeshes and Runs respectively. With the help of several auxiliary index tables, compressed submeshes and Runs can serve as rendering primitives in a parallel rendering system. Based on PRMC, we design and implement a parallel rendering architecture. Compared with an uncompressed representation, experimental results showed that PRMC meshes applied in a cluster parallel rendering system can dramatically reduce the communication requirement.

  15. A NEW METHOD TO QUANTIFY X-RAY SUBSTRUCTURES IN CLUSTERS OF GALAXIES

    Energy Technology Data Exchange (ETDEWEB)

    Andrade-Santos, Felipe; Lima Neto, Gastao B.; Lagana, Tatiana F. [Departamento de Astronomia, Instituto de Astronomia, Geofisica e Ciencias Atmosfericas, Universidade de Sao Paulo, Geofisica e Ciencias Atmosfericas, Rua do Matao 1226, Cidade Universitaria, 05508-090 Sao Paulo, SP (Brazil)

    2012-02-20

    We present a new method to quantify substructures in clusters of galaxies, based on the analysis of the intensity of structures. This analysis is done in a residual image that is the result of the subtraction of a surface brightness model, obtained by fitting a two-dimensional analytical model (β-model or Sersic profile) with elliptical symmetry, from the X-ray image. Our method is applied to 34 clusters observed by the Chandra Space Telescope that are in the redshift range z in [0.02, 0.2] and have a signal-to-noise ratio (S/N) greater than 100. We present the calibration of the method and the relations between the substructure level and physical quantities, such as the mass, X-ray luminosity, temperature, and cluster redshift. We use our method to separate the clusters into two sub-samples of high and low substructure levels. We conclude, using Monte Carlo simulations, that the method recovers the true amount of substructure very well for clusters with small angular core radii (with respect to the whole image size) and good S/N observations. We find no evidence of correlation between the substructure level and physical properties of the clusters such as gas temperature, X-ray luminosity, and redshift; however, the analysis suggests a trend between the substructure level and cluster mass. The scaling relations for the two sub-samples (high- and low-substructure level clusters) are different (they present an offset, i.e., given a fixed mass or temperature, low-substructure clusters tend to be more X-ray luminous), which is an important result for cosmological tests using the mass-luminosity relation to obtain the cluster mass function, since they rely on the assumption that clusters do not present different scaling relations according to their dynamical state.

  16. Displacement of Building Cluster Using Field Analysis Method

    Institute of Scientific and Technical Information of China (English)

    Al Tinghua

    2003-01-01

    This paper presents a field-based method to deal with the displacement of a building cluster, which is driven by street widening. The compression of the street boundary results in a force that pushes the buildings inward, and the force propagation is a decay process. To describe this phenomenon, field theory is introduced with the representation model of isolines. On the basis of the skeleton of the Delaunay triangulation, the displacement field is built, in which the propagation force is related to the adjacency degree with respect to the street boundary. The study offers the computation of displacement direction and offset distance for the building displacement. The vector operation is performed on the basis of the gradient and other field concepts.

  17. The properties of small Ag clusters bound to DNA bases

    Science.gov (United States)

    Soto-Verdugo, Víctor; Metiu, Horia; Gwinn, Elisabeth

    2010-05-01

    We study the binding of neutral silver clusters, Agn (n=1-6), to the DNA bases adenine (A), cytosine (C), guanine (G), and thymine (T) and the absorption spectra of the silver cluster-base complexes. Using density functional theory (DFT), we find that the clusters prefer to bind to the doubly bonded ring nitrogens and that binding to T is generally much weaker than to C, G, and A. Ag3 and Ag4 make the stronger bonds. Bader charge analysis indicates a mild electron transfer from the base to the clusters for all bases, except T. The donor bases (C, G, and A) bind to the sites on the cluster where the lowest unoccupied molecular orbital has a pronounced protrusion. The site where cluster binds to the base is controlled by the shape of the higher occupied states of the base. Time-dependent DFT calculations show that different base-cluster isomers may have very different absorption spectra. In particular, we find new excitations in base-cluster molecules, at energies well below those of the isolated components, and with strengths that depend strongly on the orientations of planar clusters with respect to the base planes. Our results suggest that geometric constraints on binding, imposed by designed DNA structures, may be a feasible route to engineering the selection of specific cluster-base assemblies.

  18. A statistical information-based clustering approach in distance space

    Institute of Scientific and Technical Information of China (English)

    YUE Shi-hong; LI Ping; GUO Ji-dong; ZHOU Shui-geng

    2005-01-01

    Clustering, as a powerful data mining technique for discovering interesting data distributions and patterns in the underlying database, is used in many fields, such as statistical data analysis, pattern recognition, image processing, and other business applications. Density-Based Spatial Clustering of Applications with Noise (DBSCAN) (Ester et al., 1996) is a well-performing clustering method for dealing with spatial data, although it leaves many problems to be solved. For example, DBSCAN requires a user-specified threshold whose computation is extremely time-consuming with current methods such as OPTICS (Ankerst et al., 1999), and the performance of DBSCAN under different norms has yet to be examined. In this paper, we first developed a method based on statistical information of the distance space in the database to determine the necessary threshold. Then our examination of DBSCAN performance under different norms showed that there was a determinable relation between them. Finally, we used two artificial databases to verify the effectiveness and efficiency of the proposed methods.
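
    As a rough illustration of deriving the DBSCAN threshold from statistics of the distance space, the sketch below uses the distribution of k-nearest-neighbour distances (a common stand-in, not necessarily the statistic proposed in the record above) to set eps before running DBSCAN.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.neighbors import NearestNeighbors

X, _ = make_moons(n_samples=400, noise=0.06, random_state=0)

min_pts = 5
# Distance of every point to its min_pts-th neighbour; a high percentile of
# this distribution serves as the density threshold eps.
dists, _ = NearestNeighbors(n_neighbors=min_pts).fit(X).kneighbors(X)
eps = np.percentile(dists[:, -1], 90)

labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(X)
print("estimated eps:", round(float(eps), 3), "clusters:", len(set(labels) - {-1}))
```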

  19. Typical product process route extraction method based on intelligent clustering analysis

    Institute of Scientific and Technical Information of China (English)

    张辉; 裘乐淼; 张树有; 胡星星

    2013-01-01

    Aiming at the problem of mining enterprise process data and knowledge, a method for extracting typical product process routes based on intelligent clustering analysis was presented. A similarity factor between two process routes was established, and a multi-level comprehensive measurement method for calculating the similarity between two process routes was proposed. Based on the similarity calculation, a process route design structure matrix was constructed and noise reduction processing was applied to the matrix data. To reduce the difficulty and complexity of the clustering division, particle swarm optimization was used to realize the intelligent clustering division of the process route design structure matrix, and typical process routes were then extracted from the resulting clusters. Taking the process data of a mechanical press enterprise as an example, typical process routes were extracted and the effectiveness of the proposed method was verified.

  20. Motion estimation using point cluster method and Kalman filter.

    Science.gov (United States)

    Senesh, M; Wolf, A

    2009-05-01

    The most frequently used method in three-dimensional human gait analysis involves placing markers on the skin of the analyzed segment. This introduces a significant artifact, which strongly influences the bone position and orientation and joint kinematic estimates. In this study, we tested and evaluated the effect of adding a Kalman filter procedure to the previously reported point cluster technique (PCT) in the estimation of rigid body motion. We demonstrated the procedures by motion analysis of a compound planar pendulum from indirect opto-electronic measurements of markers attached to an elastic appendage that is restrained to slide along the rigid body long axis. The elastic frequency is close to the pendulum frequency, as in the biomechanical problem, where the soft tissue frequency content is similar to the actual movement of the bones. Comparison of the real pendulum angle to that obtained by several estimation procedures--PCT, Kalman filter followed by PCT, and low pass filter followed by PCT--enables evaluation of the accuracy of the procedures. When comparing the maximal amplitude, no effect was noted by adding the Kalman filter; however, a closer look at the signal revealed that the estimated angle based only on the PCT method was very noisy with fluctuations, while the estimated angle based on the Kalman filter followed by the PCT was a smooth signal. It was also noted that the instantaneous frequencies obtained from the estimated angle based on the PCT method are more dispersed than those obtained from the estimated angle based on the Kalman filter followed by the PCT method. Addition of a Kalman filter to the PCT method in the estimation procedure of rigid body motion results in a smoother signal that better represents the real motion, with less signal distortion than when using a digital low pass filter. Furthermore, it can be concluded that adding a Kalman filter to the PCT procedure substantially reduces the dispersion of the maximal and minimal
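
    A hedged sketch of the filtering ingredient alone: a constant-velocity Kalman filter smoothing one noisy marker coordinate, as would be done before a rigid-body (PCT-style) reconstruction. The pendulum model, the marker cluster, and the PCT itself are not reproduced, and the noise levels and process model are illustrative assumptions.

```python
import numpy as np

# Constant-velocity Kalman filter for a single marker coordinate sampled at dt.
dt, n = 0.01, 500
F = np.array([[1.0, dt], [0.0, 1.0]])       # state transition (position, velocity)
H = np.array([[1.0, 0.0]])                  # only position is observed
Q = 1e-4 * np.array([[dt**3 / 3, dt**2 / 2], [dt**2 / 2, dt]])  # process noise
R = np.array([[0.01]])                      # measurement noise variance

t = np.arange(n) * dt
true_pos = 0.1 * np.sin(2 * np.pi * 1.0 * t)               # slow "bone" motion
noisy = true_pos + 0.03 * np.sin(2 * np.pi * 8.0 * t) \
        + np.random.default_rng(0).normal(0, 0.01, n)       # soft-tissue wobble + noise

x, P = np.zeros(2), np.eye(2)
smoothed = []
for z in noisy:
    # Predict step.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update step with the new position measurement.
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (np.array([z]) - H @ x)
    P = (np.eye(2) - K @ H) @ P
    smoothed.append(x[0])
smoothed = np.array(smoothed)   # filtered position, fed to the rigid-body step
```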

  1. Lightweight and Distributed Connectivity-Based Clustering Derived from Schelling's Model

    Science.gov (United States)

    Tsugawa, Sho; Ohsaki, Hiroyuki; Imase, Makoto

    In the literature, two connectivity-based distributed clustering schemes exist: CDC (Connectivity-based Distributed node Clustering scheme) and SDC (SCM-based Distributed Clustering). While CDC and SDC have mechanisms for maintaining clusters against nodes joining and leaving, neither method assumes that frequent changes occur in the network topology. In this paper, we propose a lightweight distributed clustering method that we term SBDC (Schelling-Based Distributed Clustering) since this scheme is derived from Schelling's model — a popular segregation model in sociology. We evaluate the effectiveness of the proposed SBDC in an environment where frequent changes arise in the network topology. Our simulation results show that SBDC outperforms CDC and SDC under frequent changes in network topology caused by high node mobility.

  2. Method for detecting clusters of possible uranium deposits

    International Nuclear Information System (INIS)

    When a two-dimensional map contains points that appear to be scattered somewhat at random, a question that often arises is whether groups of points that appear to cluster are merely exhibiting ordinary behavior, which one can expect with any random distribution of points, or whether the clusters are too pronounced to be attributable to chance alone. A method for detecting clusters along a straight line is applied to the two-dimensional map of 214Bi anomalies observed as part of the National Uranium Resource Evaluation Program in the Lubbock, Texas, region. Some exact probabilities associated with this method are computed and compared with two approximate methods. The two methods for approximating probabilities work well in the cases examined and can be used when it is not feasible to obtain the exact probabilities

  3. An optical imaging method for studying the spatial distribution of argon clusters

    International Nuclear Information System (INIS)

    An optical imaging method based on Rayleigh scattering is introduced to study the spatial distribution of atomic argon clusters produced in a gas jet. The radial distribution and evolution of the clusters are captured directly by a high speed camera, resulting in greatly increased precision and accuracy. It is found that the radial distribution of the clusters follows a Gaussian curve rather than the double-humped curve observed in a previous experiment. The normalized radial and axial distributions of the clusters are not influenced by the stagnation pressure and may be strictly determined by the nozzle structure. The average cluster sizes decrease slightly at far axial distances. A method of estimating the half-angle of the nozzle is also presented

  4. HCTE: Hierarchical Clustering based routing algorithm with applying the Two cluster heads in each cluster for Energy balancing in WSN

    Directory of Open Access Journals (Sweden)

    Nasrin Azizi

    2012-01-01

    Full Text Available In wireless sensor networks, the energy constraint is one of the most important restrictions. Considering this issue, energy balancing is essential for prolonging the network lifetime, and this problem has been considered a main challenge by the research community. Many recent papers have proposed clustering-based routing algorithms to prolong the network lifetime in wireless sensor networks, but many of them do not consider energy balancing among nodes. In this work we propose a new clustering-based routing protocol, named HCTE, in which the cluster-head selection mechanism is carried out in two separate stages, so that there are two cluster heads in each cluster. The routing algorithm used in the proposed protocol is multi-hop. Simulation results show that HCTE prolongs the network lifetime by about 35% compared to LEACH.

  5. An Improved Fuzzy c-Means Clustering Algorithm Based on Shadowed Sets and PSO

    Directory of Open Access Journals (Sweden)

    Jian Zhang

    2014-01-01

    Full Text Available To organize a wide variety of data sets automatically and acquire accurate classification, this paper presents a modified fuzzy c-means algorithm (SP-FCM) based on particle swarm optimization (PSO) and shadowed sets to perform feature clustering. SP-FCM introduces the global search property of PSO to deal with the problem of premature convergence of conventional fuzzy clustering, utilizes the vagueness balance property of shadowed sets to handle overlapping among clusters, and models uncertainty in class boundaries. This new method uses the Xie-Beni index as a cluster validity measure and automatically finds the optimal cluster number within a specific range, with cluster partitions that provide compact and well-separated clusters. Experiments show that the proposed approach significantly improves the clustering effect.

  6. An incremental clustering algorithm based on Mahalanobis distance

    Science.gov (United States)

    Aik, Lim Eng; Choon, Tan Wee

    2014-12-01

    The classical fuzzy c-means clustering algorithm is insufficient for clustering non-spherical or elliptically distributed datasets. This paper replaces the Euclidean distance of classical fuzzy c-means clustering with the Mahalanobis distance and applies the Mahalanobis distance to incremental learning for its merits. A Mahalanobis distance based fuzzy incremental clustering learning algorithm is proposed. Experimental results show that the algorithm not only remedies the defect in the fuzzy c-means algorithm but also increases training accuracy.
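
    The crux of the record above is replacing the Euclidean metric with the Mahalanobis distance, which accounts for elongated (elliptical) clusters through a covariance matrix. The snippet below shows only that distance computation for a single cluster; the incremental fuzzy clustering itself is not reproduced, and the data are illustrative.

```python
import numpy as np

def mahalanobis_sq(X, center, cov):
    """Squared Mahalanobis distance of each row of X to a cluster center."""
    inv_cov = np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))  # regularised inverse
    d = X - center
    return np.einsum("ij,jk,ik->i", d, inv_cov, d)

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[3.0, 2.5], [2.5, 3.0]], size=200)  # elongated cluster
center, cov = X.mean(axis=0), np.cov(X, rowvar=False)

d2 = mahalanobis_sq(X, center, cov)
print("mean squared Mahalanobis distance:", round(float(d2.mean()), 2))  # close to the dimension (2)
```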

  7. A Method of Network Traffic Identification Based on Improved Clustering Algorithms

    Institute of Scientific and Technical Information of China (English)

    王宇科; 黎文伟; 苏欣

    2011-01-01

    The automatic detection of applications associated with network traffic is very important for network security and traffic management. Unfortunately, because some applications such as P2P and VoIP use dynamic port numbers, masquerading techniques, and encryption, it is difficult to classify packet payloads using simple port-based analysis in order to identify these applications. Many research works have proposed using clustering algorithms to identify network traffic, but these algorithms have some defects in how they choose the cluster centers and the number of clusters. In this paper, we first use the Weighting D2 algorithm to improve the selection of the initial cluster centers and use the value of NMI (Normalized Mutual Information) to determine the number of clusters, obtaining an improved clustering algorithm, and finally propose an application-level identification method based on this algorithm. The experimental results show that this method reaches 90% accuracy or more, especially for P2P applications, and achieves low false positive and false rejection rates.
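
    A hedged sketch of the two ingredients mentioned above: D2-weighted seeding (stood in for here by scikit-learn's k-means++ initialisation) and an NMI-based choice of the number of clusters. Interpreting the NMI criterion as run-to-run stability is an assumption of this illustration, not necessarily the paper's exact procedure, and the blob data stands in for flow features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import normalized_mutual_info_score

X, _ = make_blobs(n_samples=600, centers=4, cluster_std=2.5, random_state=0)

def stability(X, k, pairs=3):
    """Mean NMI between pairs of independently seeded k-means runs."""
    scores = []
    for s in range(pairs):
        a = KMeans(n_clusters=k, init="k-means++", n_init=1, random_state=2 * s).fit_predict(X)
        b = KMeans(n_clusters=k, init="k-means++", n_init=1, random_state=2 * s + 1).fit_predict(X)
        scores.append(normalized_mutual_info_score(a, b))
    return float(np.mean(scores))

nmi_by_k = {k: stability(X, k) for k in range(2, 9)}
best_k = max(nmi_by_k, key=nmi_by_k.get)   # pick the most stable cluster count
print(nmi_by_k, "-> selected k =", best_k)
```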

  8. A New Method For Galaxy Cluster Detection; 1, The Algorithm

    CERN Document Server

    Gladders, Michael D.

    2000-01-01

    Numerous methods for finding clusters at moderate to high redshifts have been proposed in recent years, at wavelengths ranging from radio to X-rays. In this paper we describe a new method for detecting clusters in two-band optical/near-IR imaging data. The method relies upon the observation that all rich clusters, at all redshifts observed so far, appear to have a red sequence of early-type galaxies. The emerging picture is that all rich clusters contain a core population of passively evolving elliptical galaxies which are coeval and formed at high redshifts. The proposed search method exploits this strong empirical fact by using the red sequence as a direct indicator of overdensity. The fundamental advantage of this approach is that with appropriate filters, cluster elliptical galaxies at a given redshift are redder than all normal galaxies at lower redshifts. A simple color cut thus virtually eliminates all foreground contamination, even at significant redshifts. In this paper, one of a series of two, we de...

  9. Methods for analyzing cost effectiveness data from cluster randomized trials

    Directory of Open Access Journals (Sweden)

    Clark Allan

    2007-09-01

    Full Text Available Background: Measurement of individuals' costs and outcomes in randomized trials allows uncertainty about cost effectiveness to be quantified. Uncertainty is expressed as probabilities that an intervention is cost effective, and confidence intervals of incremental cost effectiveness ratios. Randomizing clusters instead of individuals tends to increase uncertainty, but such data are often analysed incorrectly in published studies. Methods: We used data from a cluster randomized trial to demonstrate five appropriate analytic methods: (1) joint modeling of costs and effects with two-stage non-parametric bootstrap sampling of clusters then individuals, (2) joint modeling of costs and effects with Bayesian hierarchical models, and (3) linear regression of net benefits at different willingness-to-pay levels using (a) a least squares regression with Huber-White robust adjustment of errors, (b) a least squares hierarchical model and (c) a Bayesian hierarchical model. Results: All five methods produced similar results, with greater uncertainty than if cluster randomization was not accounted for. Conclusion: Cost effectiveness analyses alongside cluster randomized trials need to account for the study design. Several theoretically coherent methods can be implemented with common statistical software.

  10. Performance Improvement of Cache Management In Cluster Based MANET

    Directory of Open Access Journals (Sweden)

    Abdulaziz Zam

    2013-08-01

    Full Text Available Caching is one of the most effective techniques used to improve data access performance in wireless networks. Accessing data from a remote server imposes high latency and power consumption, as forwarding nodes must guide the requests to the server and send data back to the clients. In addition, data access may be unreliable or even impossible due to erroneous wireless links and frequent disconnections. Because of the nature of MANETs, with their highly frequent topology changes, and the small cache size and constrained power supply of mobile nodes, cache management is a challenge. To maintain the MANET’s stability and scalability, clustering is considered an effective approach. In this paper an efficient cache management method is proposed for the Cluster Based Mobile Ad-hoc NETwork (C-B-MANET). The performance of the method is evaluated in terms of packet delivery ratio, latency and overhead metrics.

  11. Fast optimization of binary clusters using a novel dynamic lattice searching method.

    Science.gov (United States)

    Wu, Xia; Cheng, Wen

    2014-09-28

    Global optimization of binary clusters has been a difficult task despite much effort and many efficient methods. Addressing the two types of elements in binary clusters (i.e., the homotop problem), two classes of virtual dynamic lattices are constructed and a modified dynamic lattice searching (DLS) method, the binary DLS (BDLS) method, is developed. However, it was found that BDLS can only be utilized for the optimization of binary clusters of small sizes, because the homotop problem is hard to solve without an atomic exchange operation. Therefore, the iterated local search (ILS) method is adopted to solve the homotop problem, and an efficient method based on BDLS and ILS, named BDLS-ILS, is presented for global optimization of binary clusters. In order to assess the efficiency of the proposed method, binary Lennard-Jones clusters with up to 100 atoms are investigated. Results show that the method is efficient. Furthermore, the BDLS-ILS method is also adopted to study the geometrical structures of (AuPd)79 clusters with DFT-fit parameters of the Gupta potential.

  12. Internet Forensics Framework Based-on Clustering

    Directory of Open Access Journals (Sweden)

    Imam Riadi

    2013-01-01

    Full Text Available Internet network attacks are complicated and worth studying. These attacks include Denial of Service (DoS) attacks, which exploit vulnerabilities found in operating systems, network services and applications. An indicator of a DoS attack is that legitimate users cannot access the system. This paper proposes an Internet forensics framework based on logs that aims to assist the investigation process in revealing DoS attacks. The framework in this study consists of several steps, among others: logging into a text file and a database as well as identifying an attack based on the packet header length. After the identification process, logs are grouped using the k-means clustering algorithm into three levels of attack (dangerous, rather dangerous and not dangerous) based on the port numbers and TCP flags of the packets. Based on the test results, the proposed framework can group logs into the three attack levels and find the attacker with a success rate of 89.02%, so it can be concluded that the proposed framework can meet the goals set in this research.
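
    A toy version of the grouping step described above: numeric features taken from the logs (here, an invented destination-port and TCP-flag encoding) are clustered with k-means into three groups, which an analyst would then map to the three danger levels. Feature choice and scaling are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
ports = np.concatenate([rng.integers(0, 1024, 300),        # well-known ports
                        rng.integers(1024, 65535, 200)])   # ephemeral ports
tcp_flags = rng.integers(0, 64, ports.size)                # invented numeric flag encoding
features = StandardScaler().fit_transform(np.column_stack([ports, tcp_flags]))

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)
for level in range(3):
    print(f"cluster {level}: {np.sum(labels == level)} log entries")
```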

  13. Brain Tumor Extraction from T1- Weighted MRI using Co-clustering and Level Set Methods

    Directory of Open Access Journals (Sweden)

    S.Satheesh

    2013-04-01

    Full Text Available The aim of this paper is to propose an effective technique for tumor extraction from T1-weighted magnetic resonance brain images using a combination of co-clustering and level set methods. Co-clustering is an effective region-based segmentation technique for brain tumor extraction but has a drawback at the boundary of tumors, while the level set method without re-initialization is a good edge-based segmentation technique but has some drawbacks in providing the initial contour. Therefore, in this paper the region-based co-clustering and the edge-based level set method are combined by initially extracting the tumor using co-clustering and then providing the initial contour to the level set method, which helps cancel out the drawbacks of both methods. A data set of five patients, with one slice selected from each data set, is used to analyze the performance of the proposed method. The quality metrics analysis shows that the proposed method performs much better than the level set without re-initialization method.

  14. A Satellite Beam Planning Method Based on Clustering

    Institute of Scientific and Technical Information of China (English)

    郝英川

    2014-01-01

    Satellite communication systems with mobile spot beams are able to adjust beam direction to cover areas on earth according to the distribution of clients and services. To improve the utilization of satellite resources, clustering theory is introduced into beam planning. The area to be covered is first clustered into several candidate clusters according to the statistics of the client distribution and their throughput requirements, and then the beam assignment is processed by associating candidate areas with specific beams. The efficiency of the satellite resources, as well as the throughput and QoS of the system, depends directly on the assignment of beams. The feasibility and efficiency of the algorithm are verified by simulations of typical scenarios.
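
    The clustering step can be illustrated with a weighted k-means: ground terminals are grouped into as many clusters as there are spot beams, with each terminal weighted by its traffic demand so that heavily loaded areas attract the beam centres. Positions, demands, and the beam count below are invented for illustration and do not reflect the paper's scenario.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
positions = rng.uniform(0, 100, size=(500, 2))   # terminal coordinates (arbitrary units)
demand = rng.exponential(1.0, size=500)          # per-terminal throughput requirement

n_beams = 6
km = KMeans(n_clusters=n_beams, n_init=10, random_state=0)
beam_of_terminal = km.fit_predict(positions, sample_weight=demand)
beam_centres = km.cluster_centers_               # candidate beam pointing targets
```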

  15. Report of a Workshop on Parallelization of Coupled Cluster Methods

    Energy Technology Data Exchange (ETDEWEB)

    Rodney J. Bartlett; Erik Deumens

    2008-05-08

    Coupled-cluster theory is now recognized as the benchmark ab initio quantum mechanical method for molecular structure and spectra. To benefit from the transition to tera- and petascale computers, such coupled-cluster methods must be made to run in a scalable fashion. This workshop, held as a part of the 48th annual Sanibel meeting at St. Simons Island, GA, addressed that issue. Representatives of all the principal scientific groups who are addressing this topic were in attendance, to exchange information about the problem and to identify what needs to be done in the future. This report summarizes the conclusions of the workshop.

  16. The coupled cluster method in hamiltonian lattice field theory

    CERN Document Server

    Schütte, D R; Hamer, C J; Weihong, Zheng

    1997-01-01

    The coupled cluster or exp S form of the eigenvalue problem for lattice Hamiltonian QCD (without quarks) is investigated. A new construction prescription is given for the calculation of the relevant coupled cluster matrix elements with respect to an orthogonal and independent loop space basis. The method avoids the explicit introduction of gauge group coupling coefficients by mapping the eigenvalue problem onto a suitable set of character functions, which allows a simplified procedure. Using appropriate group theoretical methods, we show that it is possible to set up the eigenvalue problem for eigenstates having arbitrary lattice momentum and lattice angular momentum.

  17. Improved method for the feature extraction of laser scanner using genetic clustering

    Institute of Scientific and Technical Information of China (English)

    Yu Jinxia; Cai Zixing; Duan Zhuohua

    2008-01-01

    Feature extraction from the range images provided by a ranging sensor is a key issue in pattern recognition. To automatically extract the environmental features sensed by a 2D laser scanner, an improved method based on genetic clustering, VGA-clustering, is presented. By integrating the spatial neighbouring information of the range data into the fuzzy clustering algorithm, a weighted fuzzy clustering algorithm (WFCA) is introduced instead of the standard clustering algorithm to realize feature extraction for the laser scanner. Since the number of clusters is not known in advance, several validation index functions are used to estimate the validity of different clustering algorithms, and one validation index is selected as the fitness function of the genetic algorithm so as to determine the correct number of clusters automatically. At the same time, an improved genetic algorithm, IVGA, based on VGA is proposed to overcome the local optima of the clustering algorithm; it is implemented by increasing the population diversity and improving the elitist genetic operators to enhance the local search capacity and to quicken the convergence. Comparison with other algorithms demonstrates the effectiveness of the introduced algorithm.

  18. Dynamic access clustering selecting mechanism based on Markov decision process for MANET

    Institute of Scientific and Technical Information of China (English)

    WANG Dao-yuan; TIAN Hui

    2007-01-01

    Clustering is an important method in the mobile Ad-hoc network (MANET). As a result of their mobility, the cluster selection is inevitable for the mobile nodes during their roaming between the different clusters. In this study, based on the analysis of the cluster-selecting problem in the environment containing multiple clusters, which are overlaying and intercrossing, a novel dynamic selecting mechanism is proposed to resolve the dynamic selection optimization of roaming between the different clusters in MANET. This selecting mechanism is also based on the consideration of the stability of communication system, the communicating bandwidth, and the effect of cluster selecting on the communication and also in accordance with the Markov decision-making model.

  19. Cluster-in-molecule local correlation method for large systems

    Institute of Scientific and Technical Information of China (English)

    LI Wei; LI ShuHua

    2014-01-01

    A linear scaling local correlation method, the cluster-in-molecule (CIM) method, was developed in the last decade for large systems. The basic idea of the CIM method is that the electron correlation energy of a large system, within the Møller-Plesset perturbation theory (MP) or coupled cluster (CC) theory, can be approximately obtained from solving the corresponding MP or CC equations of various clusters. Each of such clusters consists of a subset of localized molecular orbitals (LMOs) of the target system, and can be treated independently at various theory levels. In the present article, the main idea of the CIM method is reviewed, followed by brief descriptions of some recent developments, including its multilevel extension and different ways of constructing clusters. Then, some applications for large systems are illustrated. The CIM method is shown to be an efficient and reliable method for electron correlation calculations of large systems, including biomolecules and supramolecular complexes.

  20. Bayesian Analysis of Two Stellar Populations in Galactic Globular Clusters. I. Statistical and Computational Methods

    Science.gov (United States)

    Stenning, D. C.; Wagner-Kaiser, R.; Robinson, E.; van Dyk, D. A.; von Hippel, T.; Sarajedini, A.; Stein, N.

    2016-07-01

    We develop a Bayesian model for globular clusters composed of multiple stellar populations, extending earlier statistical models for open clusters composed of simple (single) stellar populations. Specifically, we model globular clusters with two populations that differ in helium abundance. Our model assumes a hierarchical structuring of the parameters in which physical properties—age, metallicity, helium abundance, distance, absorption, and initial mass—are common to (i) the cluster as a whole or to (ii) individual populations within a cluster, or are unique to (iii) individual stars. An adaptive Markov chain Monte Carlo (MCMC) algorithm is devised for model fitting that greatly improves convergence relative to its precursor non-adaptive MCMC algorithm. Our model and computational tools are incorporated into an open-source software suite known as BASE-9. We use numerical studies to demonstrate that our method can recover parameters of two-population clusters, and also show how model misspecification can potentially be identified. As a proof of concept, we analyze the two stellar populations of globular cluster NGC 5272 using our model and methods. (BASE-9 is available from GitHub: https://github.com/argiopetech/base/releases).

  1. Genetic association mapping via evolution-based clustering of haplotypes.

    Directory of Open Access Journals (Sweden)

    Ioanna Tachmazidou

    2007-07-01

    Full Text Available Multilocus analysis of single nucleotide polymorphism haplotypes is a promising approach to dissecting the genetic basis of complex diseases. We propose a coalescent-based model for association mapping that potentially increases the power to detect disease-susceptibility variants in genetic association studies. The approach uses Bayesian partition modelling to cluster haplotypes with similar disease risks by exploiting evolutionary information. We focus on candidate gene regions with densely spaced markers and model chromosomal segments in high linkage disequilibrium therein assuming a perfect phylogeny. To make this assumption more realistic, we split the chromosomal region of interest into sub-regions or windows of high linkage disequilibrium. The haplotype space is then partitioned into disjoint clusters, within which the phenotype-haplotype association is assumed to be the same. For example, in case-control studies, we expect chromosomal segments bearing the causal variant on a common ancestral background to be more frequent among cases than controls, giving rise to two separate haplotype clusters. The novelty of our approach arises from the fact that the distance used for clustering haplotypes has an evolutionary interpretation, as haplotypes are clustered according to the time to their most recent common ancestor. Our approach is fully Bayesian and we develop a Markov Chain Monte Carlo algorithm to sample efficiently over the space of possible partitions. We compare the proposed approach to both single-marker analyses and recently proposed multi-marker methods and show that the Bayesian partition modelling performs similarly in localizing the causal allele while yielding lower false-positive rates. Also, the method is computationally quicker than other multi-marker approaches. We present an application to real genotype data from the CYP2D6 gene region, which has a confirmed role in drug metabolism, where we succeed in mapping the location

  2. BioCluster:Tool for Identification and Clustering of Enterobacteriaceae Based on Biochemical Data

    Institute of Scientific and Technical Information of China (English)

    Ahmed Abdullah; S.M.Sabbir Alam; Munawar Sultana; M.Anwar Hossain

    2015-01-01

    Presumptive identification of different Enterobacteriaceae species is routinely achieved based on biochemical properties. Traditional practice includes manual comparison of each biochemical property of the unknown sample with known reference samples and inference of its identity based on the maximum similarity pattern with the known samples. This process is labor-intensive, time-consuming, error-prone, and subjective. Therefore, automation of sorting and similarity calculation would be advantageous. Here we present a MATLAB-based graphical user interface (GUI) tool named BioCluster. This tool was designed for automated clustering and identification of Enterobacteriaceae based on biochemical test results. In this tool, we used two types of algorithms, i.e., traditional hierarchical clustering (HC) and the Improved Hierarchical Clustering (IHC), a modified algorithm that was developed specifically for the clustering and identification of Enterobacteriaceae species. IHC takes into account the variability in the results of 1–47 biochemical tests within the Enterobacteriaceae family. This tool also provides different options to optimize the clustering in a user-friendly way. Using computer-generated synthetic data and some real data, we have demonstrated that BioCluster has high accuracy in clustering and identifying enterobacterial species based on biochemical test data. This tool can be freely downloaded at http://microbialgen.du.ac.bd/biocluster/.
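
    Only the conventional hierarchical-clustering side is sketched here (the IHC modification is specific to BioCluster and is not reproduced): binary biochemical test profiles are compared with a Jaccard distance and clustered with average linkage. Isolates and test outcomes are random placeholders.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
profiles = rng.integers(0, 2, size=(30, 20)).astype(bool)   # 30 isolates x 20 tests (+/-)

D = pdist(profiles, metric="jaccard")       # dissimilarity between test profiles
Z = linkage(D, method="average")            # UPGMA-style hierarchical clustering
groups = fcluster(Z, t=0.5, criterion="distance")
print("number of groups at 0.5 dissimilarity:", len(set(groups)))
```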

  3. A two-stage method for microcalcification cluster segmentation in mammography by deformable models

    Energy Technology Data Exchange (ETDEWEB)

    Arikidis, N.; Kazantzi, A.; Skiadopoulos, S.; Karahaliou, A.; Costaridou, L., E-mail: costarid@upatras.gr [Department of Medical Physics, School of Medicine, University of Patras, Patras 26504 (Greece); Vassiou, K. [Department of Anatomy, School of Medicine, University of Thessaly, Larissa 41500 (Greece)

    2015-10-15

    Purpose: Segmentation of microcalcification (MC) clusters in x-ray mammography is a difficult task for radiologists. Accurate segmentation is prerequisite for quantitative image analysis of MC clusters and subsequent feature extraction and classification in computer-aided diagnosis schemes. Methods: In this study, a two-stage semiautomated segmentation method of MC clusters is investigated. The first stage is targeted to accurate and time efficient segmentation of the majority of the particles of a MC cluster, by means of a level set method. The second stage is targeted to shape refinement of selected individual MCs, by means of an active contour model. Both methods are applied in the framework of a rich scale-space representation, provided by the wavelet transform at integer scales. Segmentation reliability of the proposed method in terms of inter and intraobserver agreements was evaluated in a case sample of 80 MC clusters originating from the digital database for screening mammography, corresponding to 4 morphology types (punctate: 22, fine linear branching: 16, pleomorphic: 18, and amorphous: 24) of MC clusters, assessing radiologists’ segmentations quantitatively by two distance metrics (Hausdorff distance, HDIST_cluster; average of minimum distance, AMINDIST_cluster) and the area overlap measure (AOM_cluster). The effect of the proposed segmentation method on MC cluster characterization accuracy was evaluated in a case sample of 162 pleomorphic MC clusters (72 malignant and 90 benign). Ten MC cluster features, targeted to capture morphologic properties of individual MCs in a cluster (area, major length, perimeter, compactness, and spread), were extracted and a correlation-based feature selection method yielded a feature subset to feed in a support vector machine classifier. Classification performance of the MC cluster features was estimated by means of the area under receiver operating characteristic curve (Az ± Standard Error) utilizing

  4. Green Clustering Implementation Based on DPS-MOPSO

    Directory of Open Access Journals (Sweden)

    Yang Lu

    2014-01-01

    Full Text Available A green clustering implementation is proposed as the first method in the framework of an energy-efficient strategy for centralized enterprise high-density WLANs. Traditionally, to maintain network coverage, all of the APs within the WLAN have to be powered on. Nevertheless, the new algorithm can power off a large proportion of APs while the coverage is maintained as in the always-on counterpart. The proposed algorithm is composed of two parallel and concurrent procedures: a faster procedure based on K-means and a more accurate procedure based on Dynamic Population Size Multiple Objective Particle Swarm Optimization (DPS-MOPSO). To implement green clustering efficiently and accurately, dynamic population size and mutational operators are introduced as complements to the classical MOPSO. In addition to AP selection, the new green clustering algorithm also serves as a reference and guide for AP deployment. This paper also presents simulations of scenarios modeled with the ray-tracing method and the FDTD technique; the results show that about 67% up to 90% of energy consumption can be saved while the original network coverage is maintained during periods when few users are online or when the traffic load is low.

  5. Covariance analysis of differential drag-based satellite cluster flight

    Science.gov (United States)

    Ben-Yaacov, Ohad; Ivantsov, Anatoly; Gurfil, Pini

    2016-06-01

    One possibility for satellite cluster flight is to control relative distances using differential drag. The idea is to increase or decrease the drag acceleration on each satellite by changing its attitude, and use the resulting small differential acceleration as a controller. The most significant advantage of the differential drag concept is that it enables cluster flight without consuming fuel. However, any drag-based control algorithm must cope with significant aerodynamical and mechanical uncertainties. The goal of the current paper is to develop a method for examination of the differential drag-based cluster flight performance in the presence of noise and uncertainties. In particular, the differential drag control law is examined under measurement noise, drag uncertainties, and initial condition-related uncertainties. The method used for uncertainty quantification is the Linear Covariance Analysis, which enables us to propagate the augmented state and filter covariance without propagating the state itself. Validation using a Monte-Carlo simulation is provided. The results show that all uncertainties have relatively small effect on the inter-satellite distance, even in the long term, which validates the robustness of the used differential drag controller.

  6. Clustering Analysis on E-commerce Transaction Based on K-means Clustering

    Directory of Open Access Journals (Sweden)

    Xuan HUANG

    2014-02-01

    Full Text Available Clustering algorithms based on density, increments, grids, and the like usually exhibit shortcomings such as poor elasticity, weak handling of high-dimensional data, sensitivity to the time sequence of the data, poor parameter independence, and weak handling of noise when facing a large number of high-dimensional transaction data. From experiments on a sample of data on 300 mobile phones from Taobao, the following conclusions can be obtained: compared with the Single-pass clustering algorithm, the K-means clustering algorithm achieves high intra-class similarity and inter-class dissimilarity when analyzing e-commerce transactions. In addition, the K-means clustering algorithm has very high efficiency and strong elasticity when dealing with a large number of data items. However, the clustering results of this algorithm are affected by the chosen number of clusters and the initial positions of the cluster centers, so it easily converges to locally optimal clustering results. Therefore, how to determine the number of clusters and the initial positions of the cluster centers for this algorithm remains an important topic for future research.

  7. A COMPARATIVE STUDY TO FIND A SUITABLE METHOD FOR TEXT DOCUMENT CLUSTERING

    Directory of Open Access Journals (Sweden)

    Dr.M.Punithavalli

    2012-01-01

    Full Text Available Text mining is used in various text-related tasks such as information extraction, concept/entity extraction, document summarization, entity relation modeling (i.e., learning relations between named entities), categorization/classification and clustering. This paper focuses on document clustering, a field of text mining, which groups a set of documents into a list of meaningful categories. The main focus of this paper is to present a performance analysis of various techniques available for document clustering. The results of this comparative study can be used to improve existing text data mining frameworks and improve the way of knowledge discovery. This paper considers six clustering techniques for document clustering. The techniques are grouped into three groups, namely Group 1 - K-means and its variants (traditional K-means and K* Means algorithms), Group 2 - Expectation Maximization and its variants (traditional EM, Spherical Gaussian EM algorithm and Linear Partitioning and Reallocation clustering (LPR) using EM algorithms), and Group 3 - Semantic-based techniques (Hybrid method and Feature-based algorithms). A total of seven algorithms are considered and were selected based on their popularity in the text mining field. Several experiments were conducted to analyze the performance of the algorithms and to select the winner in terms of cluster purity, clustering accuracy and speed of clustering.

  8. Exemplar-Based Clustering via Simulated Annealing

    Science.gov (United States)

    Brusco, Michael J.; Kohn, Hans-Friedrich

    2009-01-01

    Several authors have touted the p-median model as a plausible alternative to within-cluster sums of squares (i.e., K-means) partitioning. Purported advantages of the p-median model include the provision of "exemplars" as cluster centers, robustness with respect to outliers, and the accommodation of a diverse range of similarity data. We developed…

  9. Clustering in mobile ad hoc network based on neural network

    Institute of Scientific and Technical Information of China (English)

    CHEN Ai-bin; CAI Zi-xing; HU De-wen

    2006-01-01

    An on-demand distributed clustering algorithm based on a neural network was proposed. The system parameters and the combined weight for each node were computed, and cluster-heads were chosen using the weighted clustering algorithm; a training set was then created and a neural network was trained. The algorithm takes several system parameters into account, such as the ideal node degree, the transmission power, the mobility and the battery power of the nodes, and it can be used directly to test whether a node is a cluster-head or not. Moreover, cluster re-creation can be sped up.

  10. Research and Implementation of Unsupervised Clustering-Based Intrusion Detection

    Institute of Scientific and Technical Information of China (English)

    Luo Min; Zhang Huan-guo; Wang Li-na

    2003-01-01

    An unsupervised clustering-based intrusion detection algorithm is discussed in this paper. The basic idea of the algorithm is to produce clusters by comparing the distances between unlabeled training data instances. With the data instances classified, anomalous data clusters can be easily identified by the normal cluster ratio, and the identified clusters can then be used in real data detection. The benefit of the algorithm is that it does not need labeled training data sets. The experiments conclude that this approach can detect unknown intrusions efficiently in real network connections, using the KDD99 data sets.
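
    A minimal sketch of the general idea follows; it is not the paper's exact algorithm. Unlabeled connection records are clustered, and clusters whose size falls below an assumed normal-cluster ratio are flagged as anomalous. All data and parameter values below are placeholders.

```python
# Illustrative sketch: cluster unlabeled connection records, then flag small
# clusters as anomalous, assuming normal traffic dominates the data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
normal = rng.normal(0.0, 1.0, size=(950, 6))      # placeholder "normal" records
attacks = rng.normal(5.0, 0.5, size=(50, 6))      # placeholder anomalies
X = StandardScaler().fit_transform(np.vstack([normal, attacks]))

labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(X)
normal_ratio = 0.07                                # assumed: clusters below 7% of data are anomalous
sizes = np.bincount(labels)
anomalous_clusters = np.where(sizes < normal_ratio * len(X))[0]
is_anomaly = np.isin(labels, anomalous_clusters)
print("flagged records:", int(is_anomaly.sum()))
```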

  11. New Method of Community Structure Detecting Based on Fuzzy c-means Clustering

    Institute of Scientific and Technical Information of China (English)

    陈海阳; 周长银

    2012-01-01

    According to the definition of community structure, a new community structure detection method based on the fuzzy c-means clustering algorithm is presented. The shortest path lengths between the nodes of the network, the Pearson correlation coefficient and the square method are used to construct the relation (equivalence) matrix of the nodes, so that the problem of community structure detection is turned into a node clustering problem. On this basis, the fuzzy c-means clustering algorithm and the modularity of the corresponding network partition are used to determine the best community structure. Finally, the algorithm is validated on two classical network data sets, the Zachary Karate Club network and the Dolphin network.
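
    A simplified sketch of this pipeline on the Zachary Karate Club graph follows; for brevity it substitutes k-means over the shortest-path-length matrix for the fuzzy c-means step, and keeps the partition with the highest modularity.

```python
# Simplified sketch (k-means stands in for fuzzy c-means): describe each node
# by its vector of shortest-path lengths, cluster for several candidate
# community counts, and keep the partition with the highest modularity.
import numpy as np
import networkx as nx
from networkx.algorithms.community import modularity
from sklearn.cluster import KMeans

G = nx.karate_club_graph()
D = np.asarray(nx.floyd_warshall_numpy(G))        # all-pairs shortest-path lengths

best_q, best_parts = -1.0, None
for c in range(2, 6):
    labels = KMeans(n_clusters=c, n_init=10, random_state=0).fit_predict(D)
    parts = [set(map(int, np.where(labels == k)[0])) for k in range(c)]
    q = modularity(G, parts)
    if q > best_q:
        best_q, best_parts = q, parts

print(f"best number of communities: {len(best_parts)}, modularity: {best_q:.3f}")
```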

  12. Quantum Monte Carlo methods and lithium cluster properties

    Energy Technology Data Exchange (ETDEWEB)

    Owen, R.K.

    1990-12-01

    Properties of small lithium clusters with sizes ranging from n = 1 to 5 atoms were investigated using quantum Monte Carlo (QMC) methods. Cluster geometries were found from complete active space self consistent field (CASSCF) calculations. A detailed development of the QMC method leading to the variational QMC (V-QMC) and diffusion QMC (D-QMC) methods is shown. The many-body aspect of electron correlation is introduced into the QMC importance-sampling electron-electron correlation functions by using density-dependent parameters, which are shown to increase the amount of correlation energy obtained in V-QMC calculations. A detailed analysis of D-QMC time-step bias is made, and the bias is found to be at least linear with respect to the time-step. The D-QMC calculations determined the lithium cluster ionization potentials to be 0.1982(14) [0.1981], 0.1895(9) [0.1874(4)], 0.1530(34) [0.1599(73)], 0.1664(37) [0.1724(110)], 0.1613(43) [0.1675(110)] Hartrees for lithium clusters n = 1 through 5, respectively, in good agreement with the experimental results shown in brackets. Also, the binding energies per atom were computed to be 0.0177(8) [0.0203(12)], 0.0188(10) [0.0220(21)], 0.0247(8) [0.0310(12)], 0.0253(8) [0.0351(8)] Hartrees for lithium clusters n = 2 through 5, respectively. The lithium cluster one-electron density is shown to have charge concentrations corresponding to nonnuclear attractors. The overall shape of the electronic charge density also bears a remarkable similarity with the anisotropic harmonic oscillator model shape for the given number of valence electrons.

  13. A Hierarchical Clustering Based Approach in Aspect Mining

    OpenAIRE

    Gabriela Czibula; Grigoreta Sofia Cojocar

    2012-01-01

    Clustering is a division of data into groups of similar objects. Aspect mining is a process that tries to identify crosscutting concerns in existing software systems. The goal is to refactor the existing systems to use aspect oriented programming, in order to make them easier to maintain and to evolve. The aim of this paper is to present a new hierarchical clustering based approach in aspect mining. For this purpose we propose HAC algo...

  14. Reliability analysis of cluster-based ad-hoc networks

    Energy Technology Data Exchange (ETDEWEB)

    Cook, Jason L. [Quality Engineering and System Assurance, Armament Research Development Engineering Center, Picatinny Arsenal, NJ (United States); Ramirez-Marquez, Jose Emmanuel [School of Systems and Enterprises, Stevens Institute of Technology, Castle Point on Hudson, Hoboken, NJ 07030 (United States)], E-mail: Jose.Ramirez-Marquez@stevens.edu

    2008-10-15

    The mobile ad-hoc wireless network (MAWN) is a new and emerging network scheme that is being employed in a variety of applications. The MAWN varies from traditional networks because it is a self-forming and dynamic network. The MAWN is free of infrastructure and, as such, only the mobile nodes comprise the network. Pairs of nodes communicate either directly or through other nodes. To do so, each node acts, in turn, as a source, destination, and relay of messages. The virtue of a MAWN is the flexibility this provides; however, the challenge for reliability analyses is also brought about by this unique feature. The variability and volatility of the MAWN configuration makes typical reliability methods (e.g. reliability block diagram) inappropriate because no single structure or configuration represents all manifestations of a MAWN. For this reason, new methods are being developed to analyze the reliability of this new networking technology. New published methods adapt to this feature by treating the configuration probabilistically or by inclusion of embedded mobility models. This paper joins both methods together and expands upon these works by modifying the problem formulation to address the reliability analysis of a cluster-based MAWN. The cluster-based MAWN is deployed in applications with constraints on networking resources such as bandwidth and energy. This paper presents the problem's formulation, a discussion of applicable reliability metrics for the MAWN, and illustration of a Monte Carlo simulation method through the analysis of several example networks.

  15. Method of Key Frame Extraction Based on Sub-Shot Clustering

    Institute of Scientific and Technical Information of China (English)

    罗森林; 马舒洁; 梁静; 潘丽敏; 冯杨

    2011-01-01

    By analyzing existing key frame extraction techniques, a new method of key frame extraction (KFE) based on sub-shot clustering is proposed in this paper. After relocating the shot boundaries, the method uses the color histogram features of successive frames to extract key frames via sub-shot detection and clustering. Experimental results show that the proposed method is adaptable, accurate and effective: it reduces the computational complexity of the extraction process and, at the same time, effectively avoids redundant key frames.
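
    The sketch below illustrates the general pipeline in a simplified form; it is not the authors' implementation. It computes per-frame HSV color histograms, places sub-shot boundaries where consecutive histograms differ strongly, clusters sub-shot mean histograms with k-means, and keeps one representative frame per cluster. The path "video.mp4" and the boundary threshold are placeholders.

```python
# Rough sketch of a histogram-based sub-shot clustering pipeline (illustrative).
import cv2
import numpy as np
from sklearn.cluster import KMeans

def frame_histograms(path, bins=(8, 8, 8)):
    cap, hists = cv2.VideoCapture(path), []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        h = cv2.calcHist([hsv], [0, 1, 2], None, list(bins), [0, 180, 0, 256, 0, 256])
        hists.append(cv2.normalize(h, None).flatten())
    cap.release()
    return np.array(hists)

hists = frame_histograms("video.mp4")                       # placeholder path
diffs = np.linalg.norm(np.diff(hists, axis=0), axis=1)
cuts = np.where(diffs > diffs.mean() + 2 * diffs.std())[0] + 1
boundaries = [0] + list(cuts) + [len(hists)]
subshots = [(a, b) for a, b in zip(boundaries[:-1], boundaries[1:]) if b > a]
means = np.array([hists[a:b].mean(axis=0) for a, b in subshots])

k = min(5, len(subshots))                                   # assumed number of key frames
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(means)
# Simplification: take the first frame of the first sub-shot in each cluster.
key_frames = [subshots[np.where(labels == c)[0][0]][0] for c in range(k)]
print("key frame indices:", sorted(key_frames))
```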

  16. A cluster-based simulation of facet-based search

    OpenAIRE

    Urruty, T.; Hopfgartner, F.; Villa, R.; Gildea, N.; Jose, J.M.

    2008-01-01

    The recent increase of online video has challenged the research in the field of video information retrieval. Video search engines are becoming more and more interactive, helping the user to easily find what he or she is looking for. In this poster, we present a new approach of using an iterative clustering algorithm on text and visual features to simulate users creating new facets in a facet-based interface. Our experimental results prove the usefulness of such an approach.

  17. Functionalization of atomic cobalt clusters obtained by electrochemical methods

    Energy Technology Data Exchange (ETDEWEB)

    Rodriguez Cobo, Eldara [Laboratorio de Magnetismo y Tecnologia, Instituto Tecnoloxico, Pabillon de Servicios, Campus Sur, 15782 Santiago de Compostela (Spain); Departamento de Quimica Organica y Unidad Asociada al CSIC, Universidad de Santiago de Compostela, 15782 Santiago de Compostela (Spain); Rivas Rey, Jose; Blanco Varela, M. Carmen; Lopez Quintela, M. Arturo [Laboratorio de Magnetismo y Tecnologia, Instituto Tecnoloxico, Pabillon de Servicios, Campus Sur, 15782 Santiago de Compostela (Spain); Mourino Mosquera, Antonio; Torneiro Abuin, Mercedes [Departamento de Quimica Organica y Unidad Asociada al CSIC, Universidad de Santiago de Compostela, 15782 Santiago de Compostela (Spain)

    2006-05-15

    Functionalization of magnetic nanoparticles with appropriate organic molecules is very important for many applications. In the present study, cobalt nanoparticles with an average diameter of 2 nm, corresponding to Co{sub 309} clusters, were synthesised by an electrochemical method and then coated with ADCB (4-(9-decenyloxy)benzoic acid) in order to protect the clusters against oxidation and to obtain a final nanostructure which can later be attached to many different materials, such as drugs, proteins or other biological molecules. (copyright 2006 WILEY-VCH Verlag GmbH and Co. KGaA, Weinheim) (orig.)

  18. A Photometric Method for estimating CNO Abundances in Globular Clusters

    CERN Document Server

    Peat, David; Peat, David; Butler, Raymond

    2002-01-01

    Stromgren indices v and b are combined with broad-band index I, and a new index p, the short wavelength half of the v band, to estimate CN 4215A molecular absorption in a sample of stars in M22. The results have been used to estimate carbon and nitrogen abundances and suggest groups of stars within this cluster, each with a characteristic nitrogen abundance, but with a range of carbon abundances. The results suggest the possibility of stars consisting of material which has undergone CNO recycling two or three times. The method can be subsequently used for other globular clusters.

  19. Membership determination of open cluster NGC 188 based on the DBSCAN clustering algorithm

    International Nuclear Information System (INIS)

    High-precision proper motions and radial velocities of 1046 stars are used to determine member stars using three-dimensional (3D) kinematics for open cluster NGC 188 based on the density-based spatial clustering of applications with noise (DBSCAN) clustering algorithm. By implementing this algorithm, 472 member stars in the cluster are obtained with 3D kinematics. The color-magnitude diagram (CMD) of the 472 member stars using 3D kinematics shows a well-defined main sequence and a red giant branch, which indicate that the DBSCAN clustering algorithm is very effective for membership determination. The DBSCAN clustering algorithm can effectively select probable member stars in 3D kinematic space without any assumption about the distribution of the cluster or field stars. Analysis results show that the CMD of member stars is significantly clearer than the one based on 2D kinematics, which allows us to better constrain the cluster members and estimate their physical parameters. Using the 472 member stars, the average absolute proper motion and radial velocity are determined to be (PMα, PMδ) = (−2.58 ± 0.22, +0.17 ± 0.18) mas yr⁻¹ and Vr = −42.35 ± 0.05 km s⁻¹, respectively. Our values are in good agreement with values derived by other authors
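
    A minimal sketch of the same kind of analysis is given below, with a synthetic stand-in for the catalogue of 1046 stars: DBSCAN on standardized 3D kinematics, treating the noise class as field stars. The eps and min_samples values are illustrative assumptions.

```python
# Illustrative sketch: DBSCAN membership selection on (pmRA, pmDec, RV).
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
members = rng.normal([-2.58, 0.17, -42.35], [0.5, 0.5, 1.0], size=(470, 3))   # synthetic cluster
field = rng.normal([0.0, 0.0, -20.0], [5.0, 5.0, 30.0], size=(576, 3))        # synthetic field stars
kinematics = np.vstack([members, field])            # columns: pmRA, pmDec, RV

X = StandardScaler().fit_transform(kinematics)
labels = DBSCAN(eps=0.3, min_samples=10).fit_predict(X)
n_members = int((labels != -1).sum())               # label -1 = noise, treated as field
print("stars assigned to the cluster:", n_members)
```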

  20. Clustering Methods with Qualitative Data: a Mixed-Methods Approach for Prevention Research with Small Samples.

    Science.gov (United States)

    Henry, David; Dymnicki, Allison B; Mohatt, Nathaniel; Allen, James; Kelly, James G

    2015-10-01

    Qualitative methods potentially add depth to prevention research but can produce large amounts of complex data even with small samples. Studies conducted with culturally distinct samples often produce voluminous qualitative data but may lack sufficient sample sizes for sophisticated quantitative analysis. Currently lacking in mixed-methods research are methods allowing for more fully integrating qualitative and quantitative analysis techniques. Cluster analysis can be applied to coded qualitative data to clarify the findings of prevention studies by aiding efforts to reveal such things as the motives of participants for their actions and the reasons behind counterintuitive findings. By clustering groups of participants with similar profiles of codes in a quantitative analysis, cluster analysis can serve as a key component in mixed-methods research. This article reports two studies. In the first study, we conduct simulations to test the accuracy of cluster assignment using three different clustering methods with binary data as produced when coding qualitative interviews. Results indicated that hierarchical clustering, K-means clustering, and latent class analysis produced similar levels of accuracy with binary data and that the accuracy of these methods did not decrease with samples as small as 50. Whereas the first study explores the feasibility of using common clustering methods with binary data, the second study provides a "real-world" example using data from a qualitative study of community leadership connected with a drug abuse prevention project. We discuss the implications of this approach for conducting prevention research, especially with small samples and culturally distinct communities.

  1. Clustering Methods with Qualitative Data: a Mixed-Methods Approach for Prevention Research with Small Samples.

    Science.gov (United States)

    Henry, David; Dymnicki, Allison B; Mohatt, Nathaniel; Allen, James; Kelly, James G

    2015-10-01

    Qualitative methods potentially add depth to prevention research but can produce large amounts of complex data even with small samples. Studies conducted with culturally distinct samples often produce voluminous qualitative data but may lack sufficient sample sizes for sophisticated quantitative analysis. Currently lacking in mixed-methods research are methods allowing for more fully integrating qualitative and quantitative analysis techniques. Cluster analysis can be applied to coded qualitative data to clarify the findings of prevention studies by aiding efforts to reveal such things as the motives of participants for their actions and the reasons behind counterintuitive findings. By clustering groups of participants with similar profiles of codes in a quantitative analysis, cluster analysis can serve as a key component in mixed-methods research. This article reports two studies. In the first study, we conduct simulations to test the accuracy of cluster assignment using three different clustering methods with binary data as produced when coding qualitative interviews. Results indicated that hierarchical clustering, K-means clustering, and latent class analysis produced similar levels of accuracy with binary data and that the accuracy of these methods did not decrease with samples as small as 50. Whereas the first study explores the feasibility of using common clustering methods with binary data, the second study provides a "real-world" example using data from a qualitative study of community leadership connected with a drug abuse prevention project. We discuss the implications of this approach for conducting prevention research, especially with small samples and culturally distinct communities. PMID:25946969
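
    As a small illustration of the first (simulation) study's setting, the sketch below clusters synthetic binary code-presence data for about 50 participants with hierarchical clustering; Jaccard distance is one reasonable choice for 0/1 codes and is an assumption here, not the authors' specification.

```python
# Illustrative sketch: hierarchical clustering of binary qualitative codes.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
profile_a = rng.random((25, 20)) < 0.7              # two latent profiles over 20 codes
profile_b = rng.random((25, 20)) < 0.2
codes = np.vstack([profile_a, profile_b]).astype(int)

dist = pdist(codes, metric="jaccard")               # assumed distance for 0/1 data
tree = linkage(dist, method="average")
clusters = fcluster(tree, t=2, criterion="maxclust")
print("cluster sizes:", np.bincount(clusters)[1:])
```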

  2. Coresets vs clustering: comparison of methods for redundancy reduction in very large white matter fiber sets

    Science.gov (United States)

    Alexandroni, Guy; Zimmerman Moreno, Gali; Sochen, Nir; Greenspan, Hayit

    2016-03-01

    Recent advances in Diffusion Weighted Magnetic Resonance Imaging (DW-MRI) of white matter in conjunction with improved tractography produce impressive reconstructions of White Matter (WM) pathways. These pathways (fiber sets) often contain hundreds of thousands of fibers, or more. In order to make fiber-based analysis more practical, the fiber set needs to be preprocessed to eliminate redundancies and to keep only essential representative fibers. In this paper we demonstrate and compare two distinctive frameworks for selecting this reduced set of fibers. The first framework entails pre-clustering the fibers using k-means, followed by Hierarchical Clustering and replacing each cluster with one representative. For the second clustering stage, seven distance metrics were evaluated. The second framework is based on an efficient geometric approximation paradigm named coresets. Coresets present a new approach to optimization and have had huge success, especially in tasks requiring large computation time and/or memory. We propose a modified version of the coresets algorithm, Density Coreset. It is used for extracting the main fibers from dense datasets, leaving a small set that represents the main structures and connectivity of the brain. A novel approach, based on a 3D indicator structure, is used for comparing the frameworks. This comparison was applied to High Angular Resolution Diffusion Imaging (HARDI) scans of 4 healthy individuals. We show that, among the clustering-based methods, the cosine distance gives the best performance. In comparing the clustering schemes with coresets, the Density Coreset method achieves the best performance.
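
    The first framework described above can be sketched as follows (illustrative only; the random array stands in for a real tractography dataset, and all cluster counts are assumptions).

```python
# Sketch of the first framework: k-means pre-clustering, then hierarchical
# clustering with cosine distance, keeping one representative per final group.
import numpy as np
from sklearn.cluster import KMeans
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
fibers = rng.normal(size=(5000, 20 * 3))            # placeholder: 5000 fibers, 20 resampled 3D points each

# Stage 1: coarse k-means to cut the problem size (200 coarse clusters assumed).
coarse = KMeans(n_clusters=200, n_init=5, random_state=0).fit(fibers)

# Stage 2: hierarchical clustering of the coarse centroids with cosine distance.
tree = linkage(pdist(coarse.cluster_centers_, metric="cosine"), method="average")
groups = fcluster(tree, t=50, criterion="maxclust")

# One representative per final group: the coarse centroid closest to the group mean.
representatives = []
for g in np.unique(groups):
    centroids = coarse.cluster_centers_[groups == g]
    mean = centroids.mean(axis=0)
    representatives.append(centroids[np.argmin(np.linalg.norm(centroids - mean, axis=1))])
print("kept", len(representatives), "representatives for", len(fibers), "fibers")
```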

  3. Cluster Abell 520: a perspective based on member galaxies. A cluster forming at the crossing of three filaments?

    CERN Document Server

    Girardi, M; Boschin, W; Ellingson, E

    2008-01-01

    The connection of cluster mergers with the presence of extended, diffuse radio sources in galaxy clusters is still debated. An interesting case is the rich, merging cluster Abell 520, which contains a radio halo. A recent gravitational analysis has shown the presence of a massive dark core in this cluster, suggested to be a possible problem for the current cold dark matter paradigm. We aim to obtain new insights into the internal dynamics of Abell 520 by analyzing the velocities and positions of member galaxies. Our analysis is based on redshift data for 293 galaxies in the cluster field, obtained by combining new redshift data for 86 galaxies acquired at the TNG with data obtained by the CNOC team and a few other data from the literature. We also use new photometric data obtained at the INT telescope. We combine galaxy velocities and positions to select 167 cluster members around z~0.201. We analyze the cluster structure using the weighted gap analysis, the KMM method, the Dressler-Shectman statistics and the analysis of the velo...

  4. Multiple imputation methods for bivariate outcomes in cluster randomised trials.

    Science.gov (United States)

    DiazOrdaz, K; Kenward, M G; Gomes, M; Grieve, R

    2016-09-10

    Missing observations are common in cluster randomised trials. The problem is exacerbated when modelling bivariate outcomes jointly, as the proportion of complete cases is often considerably smaller than the proportion having either of the outcomes fully observed. Approaches taken to handling such missing data include the following: complete case analysis, single-level multiple imputation that ignores the clustering, multiple imputation with a fixed effect for each cluster and multilevel multiple imputation. We contrasted the alternative approaches to handling missing data in a cost-effectiveness analysis that uses data from a cluster randomised trial to evaluate an exercise intervention for care home residents. We then conducted a simulation study to assess the performance of these approaches on bivariate continuous outcomes, in terms of confidence interval coverage and empirical bias in the estimated treatment effects. Missing-at-random clustered data scenarios were simulated following a full-factorial design. Across all the missing data mechanisms considered, the multiple imputation methods provided estimators with negligible bias, while complete case analysis resulted in biased treatment effect estimates in scenarios where the randomised treatment arm was associated with missingness. Confidence interval coverage was generally in excess of nominal levels (up to 99.8%) following fixed-effects multiple imputation and too low following single-level multiple imputation. Multilevel multiple imputation led to coverage levels of approximately 95% throughout. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd. PMID:26990655

  5. A Method for Determining the Experts’ Weights of Multi-Attribute Group Decision-Making Based on Clustering Analysis

    Institute of Scientific and Technical Information of China (English)

    何立华; 王栎绮; 张连营

    2014-01-01

    To determine the experts' weights in multi-attribute group decision-making, an experts' weight determination method based on clustering analysis is proposed. The experts' weights are divided into between-category weights and within-category weights, and both the expert clustering steps and the calculation of the between-category weights are improved. A compatibility matrix is built from the experts' judgment matrices, and, using the principle of hierarchical clustering, the compatibility matrix is clustered to obtain a maximum-compatibility pedigree chart. By comparing the distances between maximum compatibility degrees with a given threshold, the experts are classified appropriately, which avoids the shortcoming of existing procedures that can only divide the experts into two categories. In addition, when determining the between-category weights, besides continuing to assign larger between-category weight coefficients to categories with larger capacity, the consistency of the attribute weights in the experts' judgment matrices is introduced to reflect the differences between categories; this effectively avoids assigning identical between-category weight coefficients to several categories that happen to contain equal numbers of experts. The proposed method has a clear structure and simple computation, and it makes the computed expert weights more reasonable and accurate. Finally, a numerical example verifies the feasibility and effectiveness of the method.

  6. A Systematic Analysis of Caustic Methods for Galaxy Cluster Masses

    CERN Document Server

    Gifford, Daniel; Kern, Nicholas

    2013-01-01

    We quantify the expected observed statistical and systematic uncertainties of the escape velocity as a measure of the gravitational potential and total mass of galaxy clusters. We focus our attention on low redshift (z < 0.15) clusters. For N_gal > 25, the scatter in the escape velocity mass is dominated by projections along the line-of-sight. Algorithmic uncertainties from the determination of the projected escape velocity profile are negligible. We quantify how target selection based on magnitude, color, and projected radial separation can induce small additional biases into the escape velocity masses. Using N_gal = 150 (25), the caustic technique has a per cluster scatter in ln(M|M_200) of 0.3 (0.5) and bias 1+/-3% (16+/-5%) for clusters with masses > 10^14 M_solar at z < 0.15.

  7. A Novel Density based improved k-means Clustering Algorithm – Dbkmeans

    Directory of Open Access Journals (Sweden)

    K. Mumtaz

    2010-03-01

    Full Text Available Mining knowledge from large amounts of spatial data is known as spatial data mining. It has become a highly demanding field because huge amounts of spatial data have been collected in various applications, ranging from geo-spatial data to bio-medical knowledge. The amount of spatial data being collected is increasing exponentially and has far exceeded humans' ability to analyze it. Recently, clustering has been recognized as a primary data mining method for knowledge discovery in spatial databases. A database can be clustered in many ways depending on the clustering algorithm employed, the parameter settings used, and other factors. Multiple clusterings can be combined so that the final partitioning of the data provides better clustering. In this paper, a novel density-based k-means clustering algorithm is proposed to overcome the drawbacks of the DBSCAN and k-means clustering algorithms. The result is an improved version of the k-means clustering algorithm. This algorithm performs better than DBSCAN when handling clusters of circularly distributed data points and slightly overlapping clusters.
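
    One way to combine density information with k-means, in the spirit of the proposed algorithm (the actual Dbkmeans procedure may differ), is to let DBSCAN estimate the number of clusters and their rough centres and then refine with seeded k-means, as sketched below.

```python
# Hedged sketch: DBSCAN estimates k and rough centres, k-means refines them.
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=1500, centers=4, cluster_std=1.2, random_state=7)

db = DBSCAN(eps=0.8, min_samples=10).fit(X)          # eps/min_samples are assumed values
found = np.unique(db.labels_[db.labels_ != -1])      # discovered cluster ids (noise excluded)
k = len(found)
seeds = np.array([X[db.labels_ == c].mean(axis=0) for c in found])

refined = KMeans(n_clusters=k, init=seeds, n_init=1).fit(X)
print("estimated k:", k, "| final inertia:", round(refined.inertia_, 1))
```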

  8. A Load Balance Routing Algorithm Based on Uneven Clustering

    Directory of Open Access Journals (Sweden)

    Liang Yuan

    2013-10-01

    Full Text Available Aiming at the problem of uneven load in clustered Wireless Sensor Networks (WSNs), a load-balancing routing algorithm based on uneven clustering is proposed, which performs uneven clustering and calculates the optimal number of clusters. By clustering nodes evenly within each cluster, the algorithm prevents any cluster head from serving so many ordinary nodes that it becomes overloaded and dies prematurely. It constructs an evaluation function that better reflects the residual energy distribution of the nodes, as well as a routing evaluation function between cluster heads, and the performance of the algorithm is simulated in MATLAB. Simulation results show that the routes established by this algorithm effectively improve the network's energy balance and lengthen the lifetime of the network.

  9. Blind signal separation of underdetermined mixtures based on clustering algorithms on planes

    Institute of Scientific and Technical Information of China (English)

    Xie Shengli; Tan Beihai; Fu Yuli

    2007-01-01

    Based on a clustering method on planes, blind signal separation (BSS) of underdetermined mixtures with three observed signals is discussed. The condition of sufficient sparsity of the source signals is not necessary when the clustering method on planes is used; in other words, it is not required that only one source signal be dominant at any given time. The proposed method first clusters the normal lines of the planes; the mixing matrix can then be identified by determining the intersection lines of the planes. This method is an effective implementation of the new theory presented by Georgiev. Simulations illustrate the accuracy and restoring capability of the method in estimating the mixing matrix.

  10. Multi-face detection based on downsampling and modified subtractive clustering for color images

    Institute of Scientific and Technical Information of China (English)

    KONG Wan-zeng; ZHU Shan-an

    2007-01-01

    This paper presents a multi-face detection method for color images. The method is based on the assumption that faces are well separated from the background by skin color detection. These faces can be located by the proposed method which modifies the subtractive clustering. The modified clustering algorithm proposes a new definition of distance for multi-face detection, and its key parameters can be predetermined adaptively by statistical information of face objects in the image. Downsampling is employed to reduce the computation of clustering and speed up the process of the proposed method. The effectiveness of the proposed method is illustrated by three experiments.

  11. Clustering analysis of ancient celadon based on SOM neural network

    Institute of Scientific and Technical Information of China (English)

    ZHOU ShaoHuai; FU Lue; LIANG BaoLiu

    2008-01-01

    In the study, chemical compositions of 48 fragments of ancient ceramics excavated in 4 archaeological kiln sites which were located in 3 cities (Hangzhou, Cixi and Longquan in Zhejiang Province, China) have been examined by energy-dispersive X-ray fluorescence (EDXRF) technique. Then the method of SOM was introduced into the clustering analysis based on the major and minor element compositions of the bodies, the results manifested that 48 samples could be perfectly distributed into 3 locations, Hangzhou, Cixi and Longquan. Because the major and minor element compositions of two Royal Kilns were similar to each other, the classification accuracy over them was merely 76.92%. In view of this, the authors have made a SOM clustering analysis again based on the trace element compositions of the bodies, the classification accuracy rose to 84.61%. These results indicated that discrepancies in the trace element compositions of the bodies of the ancient ceramics excavated in two Royal Kiln sites were more distinct than those in the major and minor element compositions, which was in accordance with the fact. We argued that SOM could be employed in the clustering analysis of ancient ceramics.

  12. Clustering analysis of ancient celadon based on SOM neural network

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    In the study, chemical compositions of 48 fragments of ancient ceramics excavated in 4 archaeological kiln sites which were located in 3 cities (Hangzhou, Cixi and Longquan in Zhejiang Province, China) have been examined by energy-dispersive X-ray fluorescence (EDXRF) technique. Then the method of SOM was introduced into the clustering analysis based on the major and minor element compositions of the bodies, the results manifested that 48 samples could be perfectly distributed into 3 locations, Hangzhou, Cixi and Longquan. Because the major and minor element compositions of two Royal Kilns were similar to each other, the classification accuracy over them was merely 76.92%. In view of this, the authors have made a SOM clustering analysis again based on the trace element compositions of the bodies, the classification accuracy rose to 84.61%. These results indicated that discrepancies in the trace element compositions of the bodies of the ancient ceramics excavated in two Royal Kiln sites were more distinct than those in the major and minor element compositions, which was in accordance with the fact. We argued that SOM could be employed in the clustering analysis of ancient ceramics.
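
    An illustrative sketch of SOM-based grouping of sherd compositions is given below; it uses the third-party minisom package and a random matrix as a placeholder for the 48 EDXRF composition vectors, so it is not the authors' analysis.

```python
# Illustrative sketch: map standardized composition vectors onto a small SOM
# and treat fragments sharing a best-matching unit as one compositional group.
import numpy as np
from minisom import MiniSom   # third-party package, used here for illustration

rng = np.random.default_rng(0)
compositions = rng.random((48, 10))                 # placeholder: 48 fragments x 10 element concentrations
compositions = (compositions - compositions.mean(0)) / compositions.std(0)

som = MiniSom(4, 4, input_len=compositions.shape[1], sigma=1.0,
              learning_rate=0.5, random_seed=0)
som.train_random(compositions, 2000)

units = [som.winner(v) for v in compositions]       # best-matching unit per fragment
print("occupied map units:", len(set(units)))
```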

  13. Comparison of Bayesian clustering and edge detection methods for inferring boundaries in landscape genetics

    Science.gov (United States)

    Safner, T.; Miller, M.P.; McRae, B.H.; Fortin, M.-J.; Manel, S.

    2011-01-01

    Recently, techniques available for identifying clusters of individuals or boundaries between clusters using genetic data from natural populations have expanded rapidly. Consequently, there is a need to evaluate these different techniques. We used spatially-explicit simulation models to compare three spatial Bayesian clustering programs and two edge detection methods. Spatially-structured populations were simulated where a continuous population was subdivided by barriers. We evaluated the ability of each method to correctly identify boundary locations while varying: (i) time after divergence, (ii) strength of isolation by distance, (iii) level of genetic diversity, and (iv) amount of gene flow across barriers. To further evaluate the methods' effectiveness to detect genetic clusters in natural populations, we used previously published data on North American pumas and a European shrub. Our results show that with simulated and empirical data, the Bayesian spatial clustering algorithms outperformed direct edge detection methods. All methods incorrectly detected boundaries in the presence of strong patterns of isolation by distance. Based on this finding, we support the application of Bayesian spatial clustering algorithms for boundary detection in empirical datasets, with necessary tests for the influence of isolation by distance. © 2011 by the authors; licensee MDPI, Basel, Switzerland.

  14. An adaptive spatial clustering method for automatic brain MR image segmentation

    Institute of Scientific and Technical Information of China (English)

    Jingdan Zhang; Daoqing Dai

    2009-01-01

    In this paper, an adaptive spatial clustering method is presented for automatic brain MR image segmentation, which is based on a competitive learning algorithm, the self-organizing map (SOM). We use a pattern recognition approach in terms of feature generation and classifier design. Firstly, a multi-dimensional feature vector is constructed using local spatial information. Then, an adaptive spatial growing hierarchical SOM (ASGHSOM) is proposed as the classifier, which is an extension of SOM, fusing multi-scale segmentation with the competitive learning clustering algorithm to overcome the problem of overlapping grey-scale intensities on boundary regions. Furthermore, an adaptive spatial distance is integrated with ASGHSOM, in which local spatial information is considered in the clustering process to reduce the noise effect and the classification ambiguity. Our proposed method is validated by extensive experiments using both simulated and real MR data with varying noise level, and is compared with the state-of-the-art algorithms.

  15. Scalable Integrated Region-Based Image Retrieval Using IRM and Statistical Clustering.

    Science.gov (United States)

    Wang, James Z.; Du, Yanping

    Statistical clustering is critical in designing scalable image retrieval systems. This paper presents a scalable algorithm for indexing and retrieving images based on region segmentation. The method uses statistical clustering on region features and IRM (Integrated Region Matching), a measure developed to evaluate overall similarity between images…

  16. PHC: A Fast Partition and Hierarchy-Based Clustering Algorithm

    Institute of Scientific and Technical Information of China (English)

    ZHOU HaoFeng(周皓峰); YUAN QingQing(袁晴晴); CHENG ZunPing(程尊平); SHI BaiLe(施伯乐)

    2003-01-01

    Cluster analysis is a process to classify data in a specified data set. In this field, much attention is paid to high-efficiency clustering algorithms. In this paper, the features of the current partition-based and hierarchy-based algorithms are reviewed, and a new hierarchy-based algorithm, PHC, is proposed by combining the advantages of both kinds of algorithms; it uses cohesion and closeness to amalgamate the clusters. Compared with similar algorithms, the performance of PHC is improved and the quality of clustering is guaranteed, and both features were proved by the theoretical and experimental analyses in the paper.

  17. A Hierarchical Clustering Method Based on the Threshold of Semantic Feature in Big Data

    Institute of Scientific and Technical Information of China (English)

    罗恩韬; 王国军

    2015-01-01

    Emerging services such as cloud computing, health care, street view map services and recommendation systems are driving the variety and volume of data to grow at an unprecedented speed, and this surge in data volume leads to many common problems, such as the representability, reliability and processability of data. How to effectively handle and analyze the relationships among data, improve the efficiency of data partitioning and establish clustering analysis models has therefore become a problem that both academia and industry urgently need to solve. This paper proposes a hierarchical clustering method based on semantic features. The data are first trained according to their semantic features, the training results are then used to perform hierarchical clustering on each subset, and finally the density center points of the whole data set are produced, which improves the efficiency and accuracy of data clustering. The method has low sampling complexity, gives accurate data analysis, is easy to implement, and shows good discriminability.

  18. The Method of Data Aggregation for Wireless Sensor Network Based on Cluster Compressed Sensing of Multi-Sparsity Basis

    Institute of Scientific and Technical Information of China (English)

    朱路; 刘媛媛; 慈白山; 潘泽中

    2016-01-01

    A novel data fusion method for WSNs (Wireless Sensor Networks) based on cluster compressed sensing (CCS) with multiple sparsity bases is presented to resolve the contradiction between the accuracy of the collected data and the energy consumption of sensor nodes. In the proposed method, an improved threshold is adopted to select cluster heads and form optimal clusters from the randomly deployed sensor nodes, and each cluster head uses a Bernoulli random matrix to linearly compress the sensor data within its cluster. The compressed information is then transmitted to the sink, which reduces data transmission and communication energy consumption and thus improves the lifetime of the network. Since the monitored signals are sparse in both the finite-difference and wavelet domains, the sink uses the OOMP algorithm to reconstruct the linearly compressed projections under the finite-difference and the wavelet sparsity bases separately, and the least-squares method is adopted to fuse the two reconstructed signals, which improves data accuracy. Simulation results show that the data fusion method based on CCS with multiple sparsity bases can guarantee the accuracy of the collected data while improving the lifetime of the whole network, resolving the contradiction between data accuracy and network lifetime.
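
    A minimal single-cluster sketch of the compressive step is given below; it is not the paper's full scheme, and a DCT sparsity basis is used in place of the finite-difference/wavelet pair for brevity. All dimensions are assumed values.

```python
# Illustrative sketch: Bernoulli measurements at the cluster head, sparse
# recovery at the sink with orthogonal matching pursuit.
import numpy as np
from scipy.fftpack import idct
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
N, M, K = 128, 48, 6                                 # nodes, measurements, sparsity (assumed)

# Synthetic smooth field: K nonzero DCT coefficients.
coeffs = np.zeros(N)
coeffs[rng.choice(N, K, replace=False)] = rng.normal(size=K)
Psi = idct(np.eye(N), norm="ortho", axis=0)          # sparsity basis
x = Psi @ coeffs                                     # sensor readings in the cluster

Phi = rng.choice([-1.0, 1.0], size=(M, N)) / np.sqrt(M)   # Bernoulli measurement matrix
y = Phi @ x                                          # what the cluster head transmits

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=K, fit_intercept=False).fit(Phi @ Psi, y)
x_hat = Psi @ omp.coef_
print("relative reconstruction error:",
      round(np.linalg.norm(x - x_hat) / np.linalg.norm(x), 4))
```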

  19. Data relationship degree-based clustering data aggregation for VANET

    Science.gov (United States)

    Kumar, Rakesh; Dave, Mayank

    2016-03-01

    Data aggregation is one of the major needs of vehicular ad hoc networks (VANETs) due to the constraints of resources. Data aggregation in VANET can reduce the data redundancy in the process of data gathering and thus conserve bandwidth. In realistic applications, it is always important to construct an effective route strategy that optimises not only the communication cost but also the aggregation cost. Data aggregation at the cluster head by each individual vehicle causes flooding of the data, which results in maximum latency and bandwidth consumption. Another approach to data aggregation in VANET is sending local representative data based on the spatial correlation of sampled data. In this article, we emphasise the problem that recent spatial correlation data models of vehicles in VANET are not appropriate for measuring the correlation in a complex and composite environment. Moreover, the data represented by these models are generally inaccurate when compared to the real data. To minimise this problem, we propose a group-based data aggregation method that uses the data relationship degree (DRD). In the proposed approach, DRD is a spatial relationship measurement parameter that measures the correlation between a vehicle's data and its neighbouring vehicles' data. The DRD clustering method, where grouping of vehicles' data is done based on the available data and its correlation, is presented in detail. Results prove that the representative data using the proposed approach have low distortion and provide an improvement in packet delivery ratio and throughput (up to 10.84% and 24.82%, respectively) as compared to other state-of-the-art solutions like Cluster-Based Accurate Syntactic Compression of Aggregated Data in VANETs.

  20. Automatic Clustering Approaches Based On Initial Seed Points

    Directory of Open Access Journals (Sweden)

    G.V.S.N.R.V.Prasad

    2011-12-01

    Full Text Available Since clustering is applied in many fields, a number of clustering techniques and algorithms have been proposed and are available in the literature. This paper proposes a novel approach to address the major problems in partitional clustering algorithms, namely choosing an appropriate K value and selecting the K initial seed points. The performance of any partitional clustering algorithm depends on the initial seed points, which are chosen randomly in all existing partitional clustering algorithms. To overcome this problem, a novel algorithm called Weighted Interior Clustering (WIC) is proposed to find approximate initial seed points, the number of clusters and the data points in the clusters. This paper also proposes another novel approach that combines the newly proposed WIC algorithm with K-means, named Weighted Interior K-means Clustering (WIKC). The novelty of WIKC is that it improves the quality and performance of the K-means clustering algorithm with reduced complexity. Experimental results on various datasets, with various instances, clearly indicate the efficacy of the proposed methods over the other methods.

  1. Cluster detection of diseases in heterogeneous populations: an alternative to scan methods

    Directory of Open Access Journals (Sweden)

    Rebeca Ramis

    2014-05-01

    Full Text Available Cluster detection has become an important part of the agenda of epidemiologists and public health authorities, the identification of high- and low-risk areas is fundamental in the definition of public health strategies and in the suggestion of potential risks factors. Currently, there are different cluster detection techniques available, the most popular being those using windows to scan the areas within the studied region. However, when these areas are heterogeneous in populations’ sizes, scan window methods can lead to inaccurate conclusions. In order to perform cluster detection over heterogeneously populated areas, we developed a method not based on scanning windows but instead on standard mortality ratios (SMR using irregular spatial aggregation (ISA. Its extension, i.e. irregular spatial aggregation with covariates (ISAC, includes covariates with residuals from Poisson regression. We compared the performance of the method with the flexible shaped spatial scan statistic (FlexScan using mortality data for stomach and bladder cancer for 8,098 Spanish towns. The results show a collection of clusters for stomach and bladder cancer similar to that detected by ISA and FlexScan. However, in general, clusters detected by FlexScan were bigger and include towns with SMR, which were not statistically significant. For bladder cancer, clusters detected by ISAC differed from those detected by ISA and FlexScan in shape and location. The ISA and ISAC methods could be an alternative to the traditional scan window methods for cluster detection over aggregated data when the areas under study are heterogeneous in terms of population. The simplicity and flexibility of the methods make them more attractive than methods based on more complicated algorithms.

  2. Cluster detection of diseases in heterogeneous populations: an alternative to scan methods.

    Science.gov (United States)

    Ramis, Rebeca; Gómez-Barroso, Diana; López-Abente, Gonzalo

    2014-05-01

    Cluster detection has become an important part of the agenda of epidemiologists and public health authorities, the identification of high- and low-risk areas is fundamental in the definition of public health strategies and in the suggestion of potential risks factors. Currently, there are different cluster detection techniques available, the most popular being those using windows to scan the areas within the studied region. However, when these areas are heterogeneous in populations' sizes, scan window methods can lead to inaccurate conclusions. In order to perform cluster detection over heterogeneously populated areas, we developed a method not based on scanning windows but instead on standard mortality ratios (SMR) using irregular spatial aggregation (ISA). Its extension, i.e. irregular spatial aggregation with covariates (ISAC), includes covariates with residuals from Poisson regression. We compared the performance of the method with the flexible shaped spatial scan statistic (FlexScan) using mortality data for stomach and bladder cancer for 8,098 Spanish towns. The results show a collection of clusters for stomach and bladder cancer similar to that detected by ISA and FlexScan. However, in general, clusters detected by FlexScan were bigger and include towns with SMR, which were not statistically significant. For bladder cancer, clusters detected by ISAC differed from those detected by ISA and FlexScan in shape and location. The ISA and ISAC methods could be an alternative to the traditional scan window methods for cluster detection over aggregated data when the areas under study are heterogeneous in terms of population. The simplicity and flexibility of the methods make them more attractive than methods based on more complicated algorithms. PMID:24893029
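
    The SMR ingredient that ISA builds on can be illustrated with a tiny worked example (synthetic counts and populations).

```python
# Worked example of the SMR building block (illustrative only):
# expected cases per town from the overall rate, SMR = observed / expected.
import numpy as np

population = np.array([1200, 45000, 800, 230000, 5600])
observed   = np.array([   3,    52,   4,    240,   11])

overall_rate = observed.sum() / population.sum()
expected = population * overall_rate
smr = observed / expected
print("SMR per town:", np.round(smr, 2))
# Aggregating adjacent towns with elevated SMRs (rather than scanning windows)
# is the step the ISA/ISAC methods then build on.
```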

  3. Monomer Basis Representation Method For Calculating The Spectra Of Molecular Clusters I. The Method And Qualitative Models

    CERN Document Server

    Ocak, Mahir E

    2012-01-01

    Firstly, a sequential symmetry adaptation procedure is derived for semidirect product groups. Then, this sequential symmetry adaptation procedure is used in the development of new method named Monomer Basis Representation (MBR) for calculating the vibration-rotation-tunneling (VRT) spectra of molecular clusters. The method is based on generation of optimized bases for each monomer in the cluster as a linear combination of some primitive basis functions and then using the sequential symmetry adaptation procedure for generating a small symmetry adapted basis for the solution of the full problem. It is seen that given an optimized basis for each monomer the application of the sequential symmetry adaptation procedure leads to a generalized eigenvalue problem instead of a standard eigenvalue problem if the procedure is used as it is. In this paper, MBR method will be developed as a solution of that problem such that it leads to generation of an orthogonal optimized basis for the cluster being studied regardless of...

  4. Remote sensing clustering analysis based on object-based interval modeling

    Science.gov (United States)

    He, Hui; Liang, Tianheng; Hu, Dan; Yu, Xianchuan

    2016-09-01

    In object-based clustering, image data are segmented into objects (groups of pixels) and then clustered based on the objects' features. This method can be used to automatically classify high-resolution, remote sensing images, but requires accurate descriptions of object features. In this paper, we ascertain that interval-valued data model is appropriate for describing clustering prototype features. With this in mind, we developed an object-based interval modeling method for high-resolution, multiband, remote sensing data. We also designed an adaptive interval-valued fuzzy clustering method. We ran experiments utilizing images from the SPOT-5 satellite sensor, for the Pearl River Delta region and Beijing. The results indicate that the proposed algorithm considers both the anisotropy of the remote sensing data and the ambiguity of objects. Additionally, we present a new dissimilarity measure for interval vectors, which better separates the interval vectors generated by features of the segmentation units (objects). This approach effectively limits classification errors caused by spectral mixing between classes. Compared with the object-based unsupervised classification method proposed earlier, the proposed algorithm improves the classification accuracy without increasing computational complexity.

  5. Moodle User Analysis Based on EM Cluster Method

    Institute of Scientific and Technical Information of China (English)

    彭勃

    2014-01-01

    The EM algorithm is applied to the study of the learning behaviors of e-learning platform users: clusters built from low-level data carrying weak semantic information are further analyzed to obtain behavioral features at a higher semantic level. The results can help instructors group students for collaborative learning, and thus the completion rate of online courses can be improved.
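
    A hedged sketch of EM-based grouping of platform users follows; the Gaussian mixture and the synthetic activity features (logins, forum posts, submissions) are illustrative assumptions, not the original study's data or code.

```python
# Illustrative sketch: a Gaussian mixture fitted by EM on per-user activity features.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
active = rng.normal([40, 25, 10], [8, 6, 2], size=(60, 3))    # placeholder "active" users
passive = rng.normal([10, 2, 3], [4, 1, 1], size=(90, 3))     # placeholder "passive" users
X = StandardScaler().fit_transform(np.vstack([active, passive]))

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X)
groups = gmm.predict(X)
print("group sizes:", np.bincount(groups))
```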

  6. Cloud Computing Application for Hotspot Clustering Using Recursive Density Based Clustering (RDBC)

    Science.gov (United States)

    Santoso, Aries; Khiyarin Nisa, Karlina

    2016-01-01

    Indonesia has vast areas of tropical forest, but these are often burned, causing extensive damage to property and human life. Monitoring hotspots can be one element of forest fire management, and each hotspot is recorded in a dataset so that it can be processed and analyzed. This research aims to build a cloud computing application which visualizes hotspot clustering. The application uses the R programming language with the Shiny web framework and implements the Recursive Density Based Clustering (RDBC) algorithm. Clustering is performed on hotspot datasets of Kalimantan Island and South Sumatra Province to find the spread pattern of hotspots. The clustering results are evaluated using the Silhouette Coefficient (SC), which yields a best value of 0.3220798 for the Kalimantan dataset. The clustering patterns are displayed as web pages so that they can be widely accessed and serve as a reference for predicting fire occurrences.

  7. LBSN user movement trajectory clustering mining method based on road network

    Institute of Scientific and Technical Information of China (English)

    邹永贵; 万建斌; 夏英

    2013-01-01

    The data in an LBSN (location-based social network) have both geographical and social attributes, and combining users' trajectories with their friendship relations helps to improve the efficiency of clustering and mining uncertain trajectories. Based on the characteristics of friendship relations among LBSN users, this paper introduces a ranking function to rank user influence and find the active users. At the same time, accuracy checking of road-network sub-trajectory matching is added to the map-matching process built on data reduction, and the road segments correctly matched by active users are stored to reduce the time complexity of matching. Finally, hot routes within a city are mined by combining the R*-tree spatial index mechanism with the DBSCAN clustering algorithm. Theoretical analysis and experiments show that, compared with existing methods, the improved clustering and mining method achieves considerably better time efficiency and accuracy in the LBSN environment, as well as good scalability.

  8. A Method for Detecting Communities Based on Clustering Technology and Ant Colony Algorithm

    Institute of Scientific and Technical Information of China (English)

    宋智玲; 贾小珠

    2011-01-01

    How to discover communities automatically is of great significance for studying the structure, function and behavior of complex networks. On the basis of clustering technology, a new method for identifying similar nodes based on the ant colony algorithm is proposed, in order to optimize the computational performance of the nodes.

  9. Estimating insolation based on PV-module currents in a cluster of stand-alone solar systems: Introduction of a new method

    NARCIS (Netherlands)

    Nieuwenhout, F; van den Borg, N.; van Sark, W.G.J.H.M.; Turkenburg, W.C.

    2006-01-01

    In order to evaluate the performance of solar home systems (SHS), data on local insolation is a prerequisite. We present the outline of a new method to estimate insolation if direct measurements are unavailable. This method comprises estimation of daily irradiation by correlating photovoltaic (PV)-m

  10. A new method for estimating insolation based on PV-module currents in a cluster of stand-alone solar systems

    NARCIS (Netherlands)

    Nieuwenhout, F; van der Borg, N; van Sark, W.G.J.H.M.; Turkenburg, W.C.

    2007-01-01

    In order to evaluate the performance of solar home systems (SHSs), data on local insolation is a prerequisite. We present a new method to estimate insolation if direct measurements are unavailable. This method comprises estimation of daily irradiation by correlating photovoltaic (PV) module currents

  11. A new method for estimating insolation based on PV-module currents in a cluster of stand-alone solar systems

    Energy Technology Data Exchange (ETDEWEB)

    Nieuwenhout, F.; Van der Borg, N. [Energy Research Centre of the Netherlands, Petten (Netherlands); Van Sark, W.; Turkenburg, W. [Utrecht University (Netherlands). Copernicus Institute for Sustainable Development and Innovation, Department of Science, Technology and Society

    2006-07-01

    In order to evaluate the performance of solar home systems (SHSs), data on local insolation is a prerequisite. We present a new method to estimate insolation if direct measurements are unavailable. This method comprises estimation of daily irradiation by correlating photovoltaic (PV) module currents from a number of SHSs, located a few kilometres apart. The method was tested with a 3-year time series for nine SHS in a remote area in Indonesia. Verification with reference cell measurements over a 2-month period showed that our method could determine average daily irradiation with a mean bias error of 1.3%. Daily irradiation figures showed a standard error of 5%. The systematic error in this method is estimated to be around 10%. Especially if calibration with measurements during a short period is possible, the proposed method provides more accurate monthly insolation figures compared with the readily available satellite data from the NASA SSE database. An advantage of the proposed method over satellite data is that irradiation figures can be calculated on a daily basis, while the SSE database only provides monthly averages. It is concluded that the new method is a valuable tool to obtain information on insolation when long-term measurements are absent. (author)
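
    The correlation idea can be sketched as follows (illustrative only; the normalisation by a per-system clear-day reference and the 14-day calibration window are assumptions, not the published procedure).

```python
# Illustrative sketch: estimate relative daily irradiation as the cross-system
# median of per-system normalised daily charge, then calibrate with a short
# reference-measurement period.
import numpy as np

rng = np.random.default_rng(0)
true_irradiation = np.clip(rng.normal(4.5, 1.2, size=90), 0.5, None)    # kWh/m2/day, synthetic
system_gain = rng.uniform(0.8, 1.2, size=9)                             # 9 SHSs with unknown gains
daily_charge = np.outer(true_irradiation, system_gain) * (1 + rng.normal(0, 0.05, (90, 9)))

normalised = daily_charge / np.percentile(daily_charge, 95, axis=0)     # per-system clear-day reference (assumed)
relative = np.median(normalised, axis=1)                                # robust cross-system estimate

# Calibrate with a short reference period (first 14 days assumed measured).
scale = true_irradiation[:14].mean() / relative[:14].mean()
estimate = relative * scale
print("mean bias [%]:", round(100 * (estimate.mean() / true_irradiation.mean() - 1), 2))
```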

  12. An improved BIRCH clustering method for radar reconnaissance data based on extremum features

    Institute of Scientific and Technical Information of China (English)

    张宇

    2016-01-01

    To solve the problem that the traditional BIRCH algorithm is sensitive to the input order of data objects and yields unstable clustering results, an improved BIRCH algorithm is proposed. In this method, the RF, PRI and PW values of radar reconnaissance data are clustered as three separate parameters, and different thresholds are set according to the measurement and systematic errors of each parameter in engineering applications. The maxima and minima of the clusters are introduced as clustering features, and a hierarchical tree is used to construct the clustering-feature model, enabling fast top-down searching and clustering of radar reconnaissance data. Experimental results show that the proposed method is feasible and effective.
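
    A loose sketch of the per-parameter clustering idea is shown below using scikit-learn's standard Birch implementation, with a different threshold for each pulse parameter. The thresholds, units and synthetic data are assumptions, and the extremum-feature modification described in the record is not reproduced.

```python
import numpy as np
from sklearn.cluster import Birch

rng = np.random.default_rng(2)
n = 300
# Two emitters, synthetic pulse parameters (units are illustrative).
rf  = np.concatenate([rng.normal(9400, 2, n), rng.normal(9600, 2, n)])      # MHz
pri = np.concatenate([rng.normal(1000, 5, n), rng.normal(1250, 5, n)])      # microseconds
pw  = np.concatenate([rng.normal(1.0, 0.02, n), rng.normal(2.0, 0.02, n)])  # microseconds

# A different threshold per parameter, reflecting its measurement error.
for name, values, threshold in [("RF", rf, 10.0), ("PRI", pri, 20.0), ("PW", pw, 0.1)]:
    labels = Birch(threshold=threshold, n_clusters=None).fit_predict(values.reshape(-1, 1))
    print(f"{name}: {len(set(labels))} clusters found")
```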

  13. A Method for Group Decision-Making Based on Entropy Weight and Gray Cluster Analysis

    Institute of Scientific and Technical Information of China (English)

    蔡忠义; 陈云翔; 徐吉辉; 项华春

    2012-01-01

    In order to determine the weight of each expert in multi-attribute group decision-making objectively and reasonably, a method based on entropy weight and grey cluster analysis is proposed. Using the ranking vectors obtained by normalizing each expert's judgment matrix, cluster analysis is performed with the grey absolute relational matrix and the inter-class weights are determined; the within-class weights are then derived from entropy-weight theory. A numerical example verifies that the method is feasible and effective. The results show that the method can effectively improve the rationality of expert weighting and contribute to more scientific group decision-making.
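
    The entropy-weight step mentioned in the record can be illustrated in a few lines: compute the entropy of each criterion from a column-normalised decision matrix and convert it into weights. The matrix below is a made-up example, and the grey absolute relational clustering of the experts is not reproduced here.

```python
import numpy as np

# Rows: alternatives, columns: criteria (illustrative scores only).
X = np.array([[0.7, 0.5, 0.9],
              [0.6, 0.8, 0.4],
              [0.9, 0.6, 0.7],
              [0.5, 0.7, 0.6]])

P = X / X.sum(axis=0)                          # column-normalised proportions
k = 1.0 / np.log(X.shape[0])
entropy = -k * (P * np.log(P)).sum(axis=0)     # entropy of each criterion
weights = (1 - entropy) / (1 - entropy).sum()  # higher dispersion -> higher weight
print(weights.round(3))
```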

  14. Ballistic Group Target Clustering Recognition Method Based on Feature Sensitivity

    Institute of Scientific and Technical Information of China (English)

    李昌玺; 郭戈; 汪毅; 赵龙华; 张晨

    2015-01-01

    To address the low utilization of sensor resources and the time-consuming identification in single ballistic target recognition, group-target theory is adopted and a ballistic group-target clustering recognition method based on feature sensitivity is proposed. The method first clusters the target group into several sub-groups, then selects target-feature combinations to calculate the sensitivity of each sub-group and establish its threat degree. At the same time, by comparing the advantages and disadvantages of the features at a particular stage, the method provides guidance for feature optimization and combination, optimizes the feature combination and improves the recognition efficiency. Finally, simulation experiments verify the feasibility of the method.

  15. Case studies of atomic properties using coupled-cluster and unitary coupled-cluster methods

    CERN Document Server

    Sur, C; Das, B P; Mukherjee, D; Sur, Chiranjib; Chaudhuri, Rajat K.

    2005-01-01

    The nuclear magnetic dipole (A) and electric quadrupole (B) coupling constants of the aluminium (^{27}Al) atom are computed using the relativistic coupled-cluster (CC) and unitary coupled-cluster (UCC) methods. The effects of electron correlation on the computed quantities are investigated using different levels of CC approximation and truncation schemes. The ionization potentials (IPs) and excitation energies (EEs) with respect to the ground state, and related properties such as oscillator strengths, transition probabilities, and nuclear magnetic dipole and electric quadrupole moments, are also computed to assess the accuracy of the scheme. The electric quadrupole coupling constant obtained from the present calculation is off by less than 1% from experiment. The IPs and EEs also agree with experiment to 0.06 eV or better, and the fine-structure splittings are accurate to better than 0.001 eV. The one-electron properties reported here are also in excellent agreement with experiment.

  16. An Efficient Semantic Model For Concept Based Clustering And Classification

    Directory of Open Access Journals (Sweden)

    SaiSindhu Bandaru

    2012-03-01

    Full Text Available In text mining, basic measures like the term frequency of a term (word or phrase) are usually computed to assess the importance of the term in a document. With purely statistical analysis, however, the original semantics of the term may not be captured. To overcome this problem, a new framework is introduced that relies on a concept-based model and a synonym-based approach. The proposed model can efficiently find significant matching and related concepts between documents according to these two approaches. Large sets of experiments using the proposed model on different data sets in clustering and classification are conducted. Experimental results demonstrate the substantial enhancement of clustering quality using sentence-based, document-based, corpus-based and combined concept analysis. A new similarity measure is proposed to find the similarity between a document and the existing clusters, which can be used to classify the document against existing clusters.

  17. A fast density-based clustering algorithm for real-time Internet of Things stream.

    Science.gov (United States)

    Amini, Amineh; Saboohi, Hadi; Wah, Teh Ying; Herawan, Tutut

    2014-01-01

    Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based methods are a prominent class of algorithms for clustering data streams: they can detect arbitrarily shaped clusters, handle outliers, and do not need the number of clusters in advance. Density-based clustering is therefore a proper choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams; however, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams with a processing time fast enough for real-time IoT applications. Experimental results show that the proposed approach obtains high-quality results with low computation time on real and synthetic datasets.

  18. A Fast Density-Based Clustering Algorithm for Real-Time Internet of Things Stream

    Directory of Open Access Journals (Sweden)

    Amineh Amini

    2014-01-01

    Full Text Available Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based methods are a prominent class of algorithms for clustering data streams: they can detect arbitrarily shaped clusters, handle outliers, and do not need the number of clusters in advance. Density-based clustering is therefore a proper choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams; however, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams with a processing time fast enough for real-time IoT applications. Experimental results show that the proposed approach obtains high-quality results with low computation time on real and synthetic datasets.

  19. Recognition of Marrow Cell Images Based on Fuzzy Clustering

    Directory of Open Access Journals (Sweden)

    Xitao Zheng

    2012-02-01

    Full Text Available In order to explore the leukocyte distribution in humans and predict recurrent leukemia, mouse marrow cells are investigated to obtain possible indications of recurrence. This paper uses the fuzzy C-means clustering recognition method to identify cells in sliced mouse marrow images. In our image processing, red cells, leukocytes, megakaryocytes and cytoplasm cannot be separated by staining colour alone, so RGB combinations are used to classify the image into 8 sectors and match the search area with these sectors. The gray-value distribution and texture patterns are used to construct the membership functions. Previous work on this project, involving recognition based on pixel distribution and probability, lays the groundwork for the data processing and preprocessing. Constraints based on size, pixel distribution and grayscale pattern are used for the successful counting of individual cells. Tests show that this shape-, pattern- and colour-based method can achieve satisfactory counting under similar illumination conditions.
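
    The clustering core referred to in the record, standard fuzzy C-means, can be written compactly as alternating membership and centre updates. The sketch below uses synthetic 2-D data and typical parameter choices (c = 3, m = 2); it does not attempt to reproduce the cell-recognition pipeline itself.

```python
import numpy as np

def fuzzy_c_means(X, c=3, m=2.0, iters=100, tol=1e-5, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)                     # random initial memberships
    for _ in range(iters):
        Um = U ** m
        centres = (Um.T @ X) / Um.sum(axis=0)[:, None]    # weighted cluster centres
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2) + 1e-12
        U_new = 1.0 / d ** (2.0 / (m - 1.0))
        U_new /= U_new.sum(axis=1, keepdims=True)         # standard FCM membership update
        if np.abs(U_new - U).max() < tol:
            return centres, U_new
        U = U_new
    return centres, U

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(mu, 0.3, (50, 2)) for mu in ([0, 0], [3, 3], [0, 3])])
centres, U = fuzzy_c_means(X, c=3)
print(centres.round(2))
```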

  20. Research and Implementation of Unsupervised Clustering-Based Intrusion Detection

    Institute of Scientific and Technical Information of China (English)

    Luo Min; Zhang Huan-guo; Wang Li-na

    2003-01-01

    An unsupervised clustering-based intrusion detection algorithm is discussed in this paper. The basic idea of the algorithm is to produce clusters by comparing the distances between unlabeled training data instances. With the data instances grouped, anomalous clusters can easily be identified by the normal-cluster ratio, and the identified clusters can then be used in detection on real data. The benefit of the algorithm is that it does not need labeled training data sets. The experiments conclude that this approach can detect unknown intrusions efficiently in real network connections using the KDD99 data sets.

  1. Translationally-invariant coupled-cluster method for finite systems

    CERN Document Server

    Guardiola, R; Navarro, J; Portesi, M

    1998-01-01

    The translationally invariant formulation of the coupled-cluster method is presented here at the complete SUB(2) level for a system of nucleons treated as bosons. The correlation amplitudes are the solution of a non-linear coupled system of equations. These equations have been solved for light and medium systems, considering the central but still semi-realistic nucleon-nucleon S3 interaction.

  2. Estimating insolation based on PV-module currents in a cluster of stand-alone solar systems: Introduction of a new method

    Energy Technology Data Exchange (ETDEWEB)

    Nieuwenhout, Frans; Van der Borg, Nico [Energy Research Centre of the Netherlands, Petten (Netherlands); Van Sark, Wilfried; Turkenburg, Wim [Copernicus Institute for Sustainable Development and Innovation, Utrecht University (Netherlands). Department of Science, Technology and Society

    2006-09-15

    In order to evaluate the performance of solar home systems (SHS), data on local insolation is a prerequisite. We present the outline of a new method to estimate insolation if direct measurements are unavailable. This method comprises estimation of daily irradiation by correlating photovoltaic (PV)-module currents from a number of solar home systems, located a few kilometres apart. The objective is to obtain reliable daily and monthly insolation figures that are representative for an area of a few square kilometres. (author)

  3. Intelligent Hybrid Cluster Based Classification Algorithm for Social Network Analysis

    Directory of Open Access Journals (Sweden)

    S. Muthurajkumar

    2014-05-01

    Full Text Available In this paper, we propose a hybrid clustering-based classification algorithm based on a mean approach to effectively mine and classify the ordered sequences (paths) from weblog data in order to perform social network analysis. In the system proposed in this work for social pattern analysis, the sequences of human activities are typically analyzed through switching behaviors, which are likely to produce overlapping clusters. A robust modified boosting algorithm is proposed for the hybrid clustering-based classification used to cluster the data. This work is useful for providing a connection between the aggregated features from the network data and the traditional indices used in social network analysis. Experimental results show that the proposed algorithm improves the decisions obtained from data clustering when combined with the proposed classification algorithm, and that it provides better classification accuracy when tested with a weblog dataset. In addition, the algorithm improves the predictive performance, especially for multiclass datasets, which increases the accuracy.

  4. Cluster Based Topology Control in Dynamic Mobile Ad Hoc Networks

    Directory of Open Access Journals (Sweden)

    T. Parameswaran

    2014-05-01

    Full Text Available In mobile ad hoc networks (MANETs), the mobility of nodes, resource constraints and selfish behavior of nodes are important factors that may degrade performance. Clustering is an effective scheme to improve MANET features such as scalability, reliability and stability. Each cluster member (CM) is associated with only one cluster head (CH) and can communicate with the CH by single-hop communication. Mobility information is used by many existing clustering schemes, such as the weighted clustering algorithm (WCA), the link expiration time prediction scheme and k-hop compound-metric-based clustering. In scheme 1, CH election is based on a weighted sum of four parameters (node status, neighbour distribution, mobility and remaining energy), which brings flexibility, but choosing the weight factor for each parameter is difficult. In scheme 2, the lifetime of a wireless link between a node pair is predicted from GPS location information. In scheme 3, the predicted mobility parameter is combined with the connectivity to create a new compound metric for CH election. Despite various efforts in mobility-aware clustering, not much work has been done specifically for high-mobility nodes. Our proposed solution provides secure CH election and incentives to encourage nodes to participate honestly in the election process. Mobility strategies are used to handle the problems caused by node movements, such as association losses to current CHs and CH role changes, to extend the connection lifetime and provide more stable clusters. The simulation results show that the proposed approach outperforms the existing clustering schemes.
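
    The WCA-style weighted-sum election mentioned as scheme 1 above can be sketched as follows: each node gets a combined weight from its degree difference, distance sum, mobility and accumulated cluster-head time, and the node with the smallest weight is elected. The attribute values and weight factors below are illustrative assumptions only.

```python
from dataclasses import dataclass

@dataclass
class Node:
    node_id: int
    degree_diff: float    # |current neighbour count - ideal degree|
    distance_sum: float   # sum of distances to neighbours
    mobility: float       # average speed
    ch_time: float        # cumulative time already spent as cluster head

# Illustrative weight factors; tuning them is exactly the difficulty noted in the record.
W1, W2, W3, W4 = 0.7, 0.2, 0.05, 0.05

def combined_weight(n: Node) -> float:
    # Lower is better: the node with the smallest combined weight is elected CH.
    return W1 * n.degree_diff + W2 * n.distance_sum + W3 * n.mobility + W4 * n.ch_time

nodes = [Node(1, 2.0, 40.0, 1.5, 3.0), Node(2, 0.0, 35.0, 0.5, 1.0), Node(3, 4.0, 60.0, 3.0, 0.5)]
head = min(nodes, key=combined_weight)
print("elected cluster head:", head.node_id)
```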

  5. Likelihood-based inference for clustered line transect data

    DEFF Research Database (Denmark)

    Waagepetersen, Rasmus Plenge; Schweder, Tore

    The uncertainty in estimation of spatial animal density from line transect surveys depends on the degree of spatial clustering in the animal population. To quantify the clustering we model line transect data as independent thinnings of spatial shot-noise Cox processes. Likelihood-based inference...

  6. Result Diversification Based on Query-Specific Cluster Ranking

    NARCIS (Netherlands)

    He, J.; Meij, E.; Rijke, M. de

    2011-01-01

    Result diversification is a retrieval strategy for dealing with ambiguous or multi-faceted queries by providing documents that cover as many facets of the query as possible. We propose a result diversification framework based on query-specific clustering and cluster ranking, in which diversification

  7. Result diversification based on query-specific cluster ranking

    NARCIS (Netherlands)

    J. He; E. Meij; M. de Rijke

    2011-01-01

    Result diversification is a retrieval strategy for dealing with ambiguous or multi-faceted queries by providing documents that cover as many facets of the query as possible. We propose a result diversification framework based on query-specific clustering and cluster ranking, in which diversification

  8. Finite mixture models and model-based clustering

    Directory of Open Access Journals (Sweden)

    Volodymyr Melnykov

    2010-01-01

    Full Text Available Finite mixture models have a long history in statistics, having been used to model population heterogeneity, to generalize distributional assumptions and, lately, to provide a convenient yet formal framework for clustering and classification. This paper provides a detailed review of mixture models and model-based clustering. Recent trends as well as open problems in the area are also discussed.
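
    A minimal, hedged illustration of model-based clustering in the sense of this record is a finite Gaussian mixture fitted by EM, with the number of components chosen by BIC. The sketch below uses scikit-learn and synthetic data; it is not tied to any particular example from the review.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
X = np.vstack([rng.normal((0, 0), 0.5, (100, 2)),
               rng.normal((4, 4), 0.7, (100, 2)),
               rng.normal((0, 5), 0.6, (100, 2))])

# Fit mixtures with 1..6 components and pick the one with the lowest BIC.
models = [GaussianMixture(n_components=k, random_state=0).fit(X) for k in range(1, 7)]
best = min(models, key=lambda m: m.bic(X))
labels = best.predict(X)
print("components chosen by BIC:", best.n_components, "| cluster sizes:", np.bincount(labels))
```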

  9. Likelihood-based inference for clustered line transect data

    DEFF Research Database (Denmark)

    Waagepetersen, Rasmus; Schweder, Tore

    2006-01-01

    The uncertainty in estimation of spatial animal density from line transect surveys depends on the degree of spatial clustering in the animal population. To quantify the clustering we model line transect data as independent thinnings of spatial shot-noise Cox processes. Likelihood-based inference...

  10. Sonar Image Detection Algorithm Based on Two-Phase Manifold Partner Clustering

    Institute of Scientific and Technical Information of China (English)

    Xingmei Wang; Zhipeng Liu; Jianchuang Sun; Shu Liu

    2015-01-01

    According to the manifold characteristics of sonar image data, a sonar image detection method based on a two-phase manifold partner clustering algorithm is proposed. First, K-means block clustering based on Euclidean distance is used to reduce the data set, with the mean value, standard deviation and minimum gray value taken as three features according to the relationship between the clustering model and the data structure. Then a K-means clustering algorithm based on manifold distance is applied to the reduced data set to improve detection efficiency. In this algorithm, the line-segment length on the manifold is analysed, and a new power-function line-segment length is proposed to decrease the computational complexity. To compute the manifold distance quickly, a new all-source shortest-path computation is proposed as an efficient preprocessing step. On this basis, the spatial feature of the image block is added to the three features to obtain the final, precise partner clustering algorithm. Comparison with other typical clustering algorithms demonstrates that the proposed algorithm achieves good detection results, and experiments on different real sonar images show that it has better adaptability.

  11. A Survey on the Taxonomy of Cluster-Based Routing Protocols for Homogeneous Wireless Sensor Networks

    Science.gov (United States)

    Naeimi, Soroush; Ghafghazi, Hamidreza; Chow, Chee-Onn; Ishii, Hiroshi

    2012-01-01

    The past few years have witnessed increased interest among researchers in cluster-based protocols for homogeneous networks because of their better scalability and higher energy efficiency than other routing protocols. Given the limited capabilities of sensor nodes in terms of energy resources, processing and communication range, cluster-based protocols should be compatible with these constraints in both the setup state and the steady data-transmission state. With a focus on these constraints, we classify routing protocols according to their objectives and the methods by which they address the shortcomings of the clustering process at each stage of cluster head selection, cluster formation, data aggregation and data communication. We summarize the techniques and methods used in these categories, while the weaknesses and strengths of each protocol are pointed out in detail. Furthermore, a taxonomy of the protocols in each phase is given to provide a deeper understanding of current clustering approaches. Finally, based on the existing research, we summarize the issues and solutions concerning the attributes and characteristics of clustering approaches and identify some open research areas in cluster-based routing protocols that can be further pursued. PMID:22969350

  12. Cluster-based global firms' use of local capabilities

    DEFF Research Database (Denmark)

    Andersen, Poul Houman; Bøllingtoft, Anne

    2011-01-01

    ’s knowledge base as a mediating variable, the purpose of this paper is to examine how globalization affected the studied firms’ use of local cluster-based knowledge, integration of local and global knowledge, and networking capabilities. Design/methodology/approach – Qualitative case studies of nine firms...... knowledge were highly active in local knowledge use, whereas CBFs characterized by a more implicit knowledge base did not use localized knowledge. Research limitations/implications – The study is exploratory and covers three clusters in one small and open developed economy. Further corroboration through...... takes a micro-oriented perspective and focus on clusters in Denmark, a small and mature economy...

  13. Cluster membership probabilities from proper motions and multi-wavelength photometric catalogues. I. Method and application to the Pleiades cluster

    Science.gov (United States)

    Sarro, L. M.; Bouy, H.; Berihuete, A.; Bertin, E.; Moraux, E.; Bouvier, J.; Cuillandre, J.-C.; Barrado, D.; Solano, E.

    2014-03-01

    Context. With the advent of deep wide surveys, large photometric and astrometric catalogues of literally all nearby clusters and associations have been produced. The unprecedented accuracy and sensitivity of these data sets and their broad spatial, temporal and wavelength coverage make obsolete the classical membership selection methods that were based on a handful of colours and luminosities. We present a new technique designed to take full advantage of the high dimensionality (photometric, astrometric, temporal) of such a survey to derive self-consistent and robust membership probabilities of the Pleiades cluster. Aims: We aim at developing a methodology to infer membership probabilities to the Pleiades cluster from the DANCe multidimensional astro-photometric data set in a consistent way throughout the entire derivation. The determination of the membership probabilities has to be applicable to censored data and must incorporate the measurement uncertainties into the inference procedure. Methods: We use Bayes' theorem and a curvilinear forward model for the likelihood of the measurements of cluster members in the colour-magnitude space, to infer posterior membership probabilities. The distribution of the cluster members proper motions and the distribution of contaminants in the full multidimensional astro-photometric space is modelled with a mixture-of-Gaussians likelihood. Results: We analyse several representation spaces composed of the proper motions plus a subset of the available magnitudes and colour indices. We select two prominent representation spaces composed of variables selected using feature relevance determination techniques based in Random Forests, and analyse the resulting samples of high probability candidates. We consistently find lists of high probability (p > 0.9975) candidates with ≈1000 sources, 4 to 5 times more than obtained in the most recent astro-photometric studies of the cluster. Conclusions: Multidimensional data sets require

  14. Perceptual Object Extraction Based on Saliency and Clustering

    Directory of Open Access Journals (Sweden)

    Qiaorong Zhang

    2010-08-01

    Full Text Available Object-based visual attention has received an increasing interest in recent years. Perceptual object is the basic attention unit of object-based visual attention. The definition and extraction of perceptual objects is one of the key technologies in object-based visual attention computation model. A novel perceptual object definition and extraction method is proposed in this paper. Based on Gestalt theory and visual feature integration theory, perceptual object is defined using homogeneity region, salient region and edges. An improved saliency map generating algorithm is employed first. Based on the saliency map, salient edges are extracted. Then graph-based clustering algorithm is introduced to get homogeneity regions in the image. Finally an integration strategy is adopted to combine salient edges and homogeneity regions to extract perceptual objects. The proposed perceptual object extraction method has been tested on lots of natural images. Experiment results and analysis are presented in this paper also. Experiment results show that the proposed method is reasonable and valid.

  15. A Hybrid Image Filtering Method for Computer-Aided Detection of Microcalcification Clusters in Mammograms

    Directory of Open Access Journals (Sweden)

    Xiaoyong Zhang

    2013-01-01

    Full Text Available The presence of microcalcification clusters (MCs) in a mammogram is a major indicator of breast cancer, and detection of MCs is one of the key issues for breast cancer control. In this paper, we present a highly accurate method based on morphological image processing and wavelet transform techniques to detect MCs in mammograms. The microcalcifications are first enhanced using multi-structure-element morphological processing. Then, candidate microcalcifications are refined by a multilevel wavelet reconstruction approach. Finally, MCs are detected based on their distribution features. Experiments are performed on 138 clinical mammograms. The proposed method is capable of detecting 92.9% of true microcalcification clusters with an average of 0.08 false microcalcification clusters detected per image.

  16. A New Method to Quantify X-ray Substructures in Clusters of Galaxies

    CERN Document Server

    Andrade-Santos, Felipe; Laganá, Tatiana Ferraz

    2011-01-01

    We present a new method to quantify substructures in clusters of galaxies, based on the analysis of the intensity of structures. This analysis is done in a residual image that is the result of subtracting a surface-brightness model, obtained by fitting a two-dimensional analytical model (beta-model or Sérsic profile) with elliptical symmetry, from the X-ray image. Our method is applied to 34 clusters observed by the Chandra Space Telescope in the redshift range 0.02 [...] method and the relations between the substructure level and physical quantities, such as the mass, X-ray luminosity, temperature and cluster redshift. We use our method to separate the clusters into two sub-samples of high and low substructure levels. We conclude, using Monte Carlo simulations, that the method recovers very well the true amount of substructure for small angular core radii clusters (with respect to the whole image s...

  17. An Ontology-based Knowledge Management System for Industry Clusters

    CERN Document Server

    Sureephong, Pradorn; Ouzrout, Yacine; Bouras, Abdelaziz

    2008-01-01

    The knowledge-based economy forces companies to group together as clusters in order to maintain their competitiveness in the world market. Cluster development relies on two key success factors: knowledge sharing and collaboration between the actors in the cluster. Thus, our study proposes a knowledge management system to support knowledge management activities within the cluster. To achieve the objectives of this study, ontology plays a very important role in the knowledge management process in various ways, such as building reusable and faster knowledge bases and providing a better way to represent knowledge explicitly. However, creating and representing an ontology is difficult for organizations because the sources of knowledge are ambiguous and unstructured. Therefore, the objective of this paper is to propose a methodology to create and represent an ontology for organization development using a knowledge engineering approach. The handicraft cluster in Thailand is used as a case stu...

  18. Neural network based cluster creation in the ATLAS silicon Pixel Detector

    CERN Document Server

    Perez Cavalcanti, T; The ATLAS collaboration

    2012-01-01

    The hit signals read out from pixels on planar semiconductor sensors are grouped into clusters to reconstruct the location where a charged particle passed through. The resolution given by the individual pixel sizes can be improved significantly using the information from the cluster of adjacent pixels. Such analog cluster creation techniques have been used by the ATLAS experiment for many years, giving excellent performance. However, in dense environments, such as those inside high-energy jets, it is likely that the charge deposited by two or more close-by tracks merges into one single cluster. A new pattern recognition algorithm based on neural network methods has been developed for the ATLAS Pixel Detector. It can identify shared clusters, split them if necessary, and estimate the positions of all particles traversing the cluster. The algorithm significantly reduces ambiguities in the assignment of pixel detector measurements to tracks within jets, and improves the positional accuracy with respect to stand...

  19. CACONET: Ant Colony Optimization (ACO) Based Clustering Algorithm for VANET.

    Science.gov (United States)

    Aadil, Farhan; Bajwa, Khalid Bashir; Khan, Salabat; Chaudary, Nadeem Majeed; Akram, Adeel

    2016-01-01

    A vehicular ad hoc network (VANET) is a wirelessly connected network of vehicular nodes. A number of techniques, such as message ferrying, data aggregation and vehicular node clustering, aim to improve communication efficiency in VANETs. Cluster heads (CHs), selected in the process of clustering, manage inter-cluster and intra-cluster communication. The lifetime of clusters and the number of CHs determine the efficiency of the network. In this paper a clustering algorithm based on ant colony optimization (ACO) for VANETs (CACONET) is proposed. CACONET forms optimized clusters for robust communication and is compared empirically with state-of-the-art baseline techniques like multi-objective particle swarm optimization (MOPSO) and comprehensive learning particle swarm optimization (CLPSO). Experiments varying the grid size of the network, the transmission range of nodes, and the number of nodes in the network were performed to evaluate the comparative effectiveness of these algorithms. For optimized clustering, the parameters considered are the transmission range, direction and speed of the nodes. The results indicate that CACONET significantly outperforms MOPSO and CLPSO. PMID:27149517

  20. A Density Based Dynamic Data Clustering Algorithm based on Incremental Dataset

    Directory of Open Access Journals (Sweden)

    K. R.S. Kumar

    2012-01-01

    Full Text Available Problem statement: Clustering and visualizing high-dimensional dynamic data is a challenging problem. Most existing clustering algorithms are based on the static statistical relationships among data, whereas dynamic clustering is a mechanism to adapt and discover clusters in real-time environments. Many applications, such as incremental data mining in data warehousing and sensor networks, rely on dynamic data clustering algorithms. Approach: In this work, we present a density-based dynamic data clustering algorithm for clustering an incremental dataset and compare its performance with full runs of standard DBSCAN and Chameleon on the dynamic dataset. Most clustering algorithms perform well when judged by clustering accuracy, which is calculated from the original class labels and the computed class labels; however, measuring performance with a cluster-validation metric can give a different picture. Results: This study addresses the problem of clustering a dynamic dataset whose size increases over time as more data are added. To evaluate the performance of the algorithms, we therefore used the Generalized Dunn Index (GDI) and the Davies-Bouldin index (DB) as cluster-validation metrics, as well as the time taken for clustering. Conclusion: In this study, we successfully implemented and evaluated the proposed density-based dynamic clustering algorithm. Its performance was compared with the Chameleon and DBSCAN clustering algorithms, and the proposed algorithm performed significantly well in terms of clustering accuracy as well as speed.
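
    The validation step referred to above can be reproduced in outline: cluster a growing dataset several times and compare the partitions with the Davies-Bouldin index (lower is better). In this sketch, off-the-shelf DBSCAN and K-means stand in for the proposed incremental algorithm, and the stream is synthetic.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.metrics import davies_bouldin_score

rng = np.random.default_rng(4)
stream = np.vstack([rng.normal(mu, 0.4, (200, 2)) for mu in ([0, 0], [3, 3], [0, 3])])
rng.shuffle(stream)                                  # interleave so every prefix is mixed

for size in (150, 300, 600):                         # the dataset grows over time
    X = stream[:size]
    for name, labels in [("DBSCAN", DBSCAN(eps=0.5, min_samples=5).fit_predict(X)),
                         ("KMeans", KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X))]:
        if len(set(labels)) > 1:                     # the DB index needs at least two clusters
            print(size, name, round(davies_bouldin_score(X, labels), 3))
```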

  1. COMPOSITE PEDAGOGICAL STAFF-CLUSTERS AS A CONDITION OF DEPARTMENT EDUCATIONAL AND METHODICAL WORK DEVELOPMENT

    OpenAIRE

    Vladimir A. Fedorov; Aleksandr V. Stepanov; Tatyana M. Stepanova

    2015-01-01

    The aim of the investigation is to justify the urgency and efficiency of the collective work of chair (department) teachers on the formation of integrated special professional competences of trainees. Methods. Teamwork is considered the leading method for realising this aim; it is suggested that the method be carried into practice on the basis of a situational technique of intra-chair structure formation – composite staff-clusters, whose activity is based on the synergetic interaction of the participants in the educational process. Results...

  2. Multiple Imputation based Clustering Validation (MIV) for Big Longitudinal Trial Data with Missing Values in eHealth.

    Science.gov (United States)

    Zhang, Zhaoyang; Fang, Hua; Wang, Honggang

    2016-06-01

    Web-delivered trials are an important component in eHealth services. These trials, mostly behavior-based, generate big heterogeneous data that are longitudinal and high-dimensional with missing values. Unsupervised learning methods have been widely applied in this area; however, validating the optimal number of clusters has been challenging. Built upon our multiple imputation (MI) based fuzzy clustering, MIfuzzy, we propose a new multiple imputation based validation (MIV) framework and corresponding MIV algorithms for clustering big longitudinal eHealth data with missing values, and more generally for fuzzy-logic based clustering methods. Specifically, we detect the optimal number of clusters by auto-searching and -synthesizing a suite of MI-based validation methods and indices, including conventional (bootstrap or cross-validation based) and emerging (modularity-based) validation indices for general clustering methods, as well as the Xie and Beni index specific to fuzzy clustering. The MIV performance was demonstrated on a big longitudinal dataset from a real web-delivered trial and using simulation. The results indicate that the MI-based Xie and Beni index is more appropriate for detecting the optimal number of clusters for such complex data. The MIV concept and algorithms could be easily adapted to different types of clustering that process big incomplete longitudinal trial data in eHealth services. PMID:27126063

  3. Multiple Imputation based Clustering Validation (MIV) for Big Longitudinal Trial Data with Missing Values in eHealth.

    Science.gov (United States)

    Zhang, Zhaoyang; Fang, Hua; Wang, Honggang

    2016-06-01

    Web-delivered trials are an important component in eHealth services. These trials, mostly behavior-based, generate big heterogeneous data that are longitudinal and high-dimensional with missing values. Unsupervised learning methods have been widely applied in this area; however, validating the optimal number of clusters has been challenging. Built upon our multiple imputation (MI) based fuzzy clustering, MIfuzzy, we propose a new multiple imputation based validation (MIV) framework and corresponding MIV algorithms for clustering big longitudinal eHealth data with missing values, and more generally for fuzzy-logic based clustering methods. Specifically, we detect the optimal number of clusters by auto-searching and -synthesizing a suite of MI-based validation methods and indices, including conventional (bootstrap or cross-validation based) and emerging (modularity-based) validation indices for general clustering methods, as well as the Xie and Beni index specific to fuzzy clustering. The MIV performance was demonstrated on a big longitudinal dataset from a real web-delivered trial and using simulation. The results indicate that the MI-based Xie and Beni index is more appropriate for detecting the optimal number of clusters for such complex data. The MIV concept and algorithms could be easily adapted to different types of clustering that process big incomplete longitudinal trial data in eHealth services.
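
    The Xie and Beni index highlighted in these two records measures the ratio of fuzzy compactness to minimum cluster separation; lower values indicate a better partition. A small, self-contained sketch of the computation is given below, with random memberships and centres standing in for the output of an actual fuzzy clustering run.

```python
import numpy as np

def xie_beni(X, U, V, m=2.0):
    """Xie-Beni index: fuzzy compactness divided by minimum centre separation."""
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)   # squared point-centre distances
    compactness = ((U ** m) * d2).sum()
    sep = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)  # squared centre-centre distances
    np.fill_diagonal(sep, np.inf)
    return compactness / (len(X) * sep.min())

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 3))                 # data
V = rng.normal(size=(4, 3))                   # cluster centres (placeholder values)
U = rng.random((100, 4))
U /= U.sum(axis=1, keepdims=True)             # fuzzy memberships summing to 1 per point
print(round(xie_beni(X, U, V), 3))
```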

  4. MBA-LF: A NEW DATA CLUSTERING METHOD USING MODIFIED BAT ALGORITHM AND LEVY FLIGHT

    Directory of Open Access Journals (Sweden)

    R. Jensi

    2015-10-01

    Full Text Available Data clustering plays an important role in partitioning a large set of data objects into a known or unknown number of groups, or clusters, so that the objects in each cluster have a high degree of similarity while objects in different clusters are dissimilar to each other. Recently, a number of data clustering methods have been explored using traditional methods as well as nature-inspired swarm intelligence algorithms. In this paper, a new data clustering method using a modified bat algorithm is presented. The experimental results show that the proposed algorithm is suitable for data clustering in an efficient and robust way.

  5. Fast Kernel Density Estimate Theorem and Scaling up Graph-based Relaxed Clustering Method%快速核密度估计定理和大规模图论松弛聚类方法

    Institute of Scientific and Technical Information of China (English)

    钱鹏江; 王士同; 邓赵红

    2011-01-01

    In this paper, the fast kernel density estimate (FKDE) theorem is presented first; it points out that the integrated squared error between the Gaussian-kernel-based KDE of the whole dataset and that of a sampled subset is related to the sample size and the kernel width, but not to the size of the whole dataset. Next, it is shown that the objective function of the graph-based relaxed clustering (GRC) algorithm with a Gaussian kernel can be decomposed into two parts, a weighted sum of Parzen windows (PW) and a quadratic entropy term, so GRC can also be viewed as a KDE problem. Accordingly, the scaling-up GRC by KDE approximation (SUGRC-KDEA) method is proposed based on the FKDE theorem. Compared with previous work, the advantage of this method is that it provides an easier and more straightforward implementation of GRC on large datasets.
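
    The statement of the FKDE theorem above can be checked numerically in a few lines: build a Gaussian KDE on the full dataset and on a modest random subsample, and compare them via the integrated squared error on a grid. This is only an illustration of the idea, not the SUGRC-KDEA method itself, and the data and sample sizes are arbitrary.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(6)
full = np.concatenate([rng.normal(-2, 1.0, 10_000), rng.normal(3, 0.5, 10_000)])
subset = rng.choice(full, size=1_000, replace=False)

grid = np.linspace(-6, 6, 400)
kde_full = gaussian_kde(full)(grid)      # KDE of the whole dataset
kde_sub = gaussian_kde(subset)(grid)     # KDE of a 5% random subsample

# Crude integrated squared error between the two density estimates.
ise = ((kde_full - kde_sub) ** 2).mean() * (grid[-1] - grid[0])
print(f"approximate integrated squared error: {ise:.2e}")
```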

  6. Density-Based Clustering with Geographical Background Constraints Using a Semantic Expression Model

    Directory of Open Access Journals (Sweden)

    Qingyun Du

    2016-05-01

    Full Text Available A semantics-based method for density-based clustering with constraints imposed by geographical background knowledge is proposed. In this paper, we apply an ontological approach to the DBSCAN (Density-Based Geospatial Clustering of Applications with Noise) algorithm in the form of knowledge representation for constrained clustering. When used in the process of clustering geographic information, semantic reasoning based on a defined ontology and its relationships is primarily intended to overcome the lack of knowledge about the relevant geospatial data; better constraints on the geographical knowledge yield more reasonable clustering results. This article uses an ontology to describe four types of semantic constraints for geographical backgrounds: “No Constraints”, “Constraints”, “Cannot-Link Constraints”, and “Must-Link Constraints”. This paper also reports the implementation of a prototype clustering program. Based on the proposed approach, DBSCAN can be applied with both obstacle and non-obstacle constraints as a semi-supervised clustering algorithm, and the clustering results are displayed on a digital map.

  7. Cluster-based spectrum sensing for cognitive radios with imperfect channel to cluster-head

    KAUST Repository

    Ben Ghorbel, Mahdi

    2012-04-01

    Spectrum sensing is considered as the first and main step for cognitive radio systems to achieve an efficient use of spectrum. Cooperation and clustering among cognitive radio users are two techniques that can be employed with spectrum sensing in order to improve the sensing performance by reducing miss-detection and false alarm. In this paper, within the framework of a clustering-based cooperative spectrum sensing scheme, we study the effect of errors in transmitting the local decisions from the secondary users to the cluster heads (or the fusion center), while considering non-identical channel conditions between the secondary users. Closed-form expressions for the global probabilities of detection and false alarm at the cluster head are derived. © 2012 IEEE.

  8. Microphone Clustering and BP Network based Acoustic Source Localization in Distributed Microphone Arrays

    Directory of Open Access Journals (Sweden)

    CHEN, Z.

    2013-11-01

    Full Text Available A microphone clustering and back-propagation (BP) neural network based acoustic source localization method using distributed microphone arrays in an intelligent meeting room is proposed. In the proposed method, a novel clustering algorithm is first used to divide all microphones into several clusters, each of which corresponds to a specified BP network. Afterwards, an energy-based cluster selection scheme is applied to select clusters that are small and close to the source. In each chosen cluster, the time difference of arrival of each microphone pair is estimated, and all estimated time delays act as input to the corresponding BP network for position estimation. Finally, all positions estimated by the chosen clusters are fused into a global position estimate. Only subsets rather than all of the microphones are responsible for acoustic source localization, which leads to lower computational cost; moreover, the local estimation in each selected cluster can be processed in parallel, which is expected to improve the localization speed. Simulation results and comparison with other related localization approaches confirm the validity of the proposed method.

  9. A ROBUST CLUSTER HEAD SELECTION BASED ON NEIGHBORHOOD CONTRIBUTION AND AVERAGE MINIMUM POWER FOR MANETs

    Directory of Open Access Journals (Sweden)

    S.Balaji

    2015-06-01

    Full Text Available A mobile ad hoc network is an instantaneous wireless network that is dynamic in nature. It supports single-hop and multihop communication. In this infrastructure-less network, clustering is a significant model for maintaining the topology of the network. The clustering process includes phases such as cluster formation, cluster head selection and cluster maintenance. Choosing the cluster head is important, as the stability of the network depends on a well-organized and resourceful cluster head. When a node has a larger number of neighbours it can act as a link between them, which further reduces the number of hops in multihop communication, but such a node should also have enough energy to provide stability in the network; hence these aspects demand attention. In weight-based cluster head selection, closeness and the average minimum power required are considered for purging ineligible nodes. The optimal set of nodes selected after purging then compete to become cluster head, and the node with the maximum weight is selected as cluster head. A mathematical formulation is developed to show that the proposed method provides an optimal result. It is also suggested that the weight factors used in calculating the node weight should give appropriate importance to energy and node stability.

  10. Gaussian Kernel Based Fuzzy C-Means Clustering Algorithm for Image Segmentation

    Directory of Open Access Journals (Sweden)

    Rehna Kalam

    2016-04-01

    Full Text Available Image processing is an important research area in computer vision, and clustering, an unsupervised technique, can also be used for image segmentation. Many methods exist for image segmentation, which plays an important role in image analysis and is one of the first and most important tasks in image analysis and computer vision. The proposed system presents a variation of the fuzzy C-means algorithm for image clustering. The kernel fuzzy C-means clustering algorithm (KFCM) is derived from the fuzzy C-means clustering algorithm (FCM); it provides image clustering and improves accuracy significantly compared with the classical FCM algorithm. The new algorithm is called the Gaussian kernel based fuzzy C-means clustering algorithm (GKFCM). The major characteristic of GKFCM is the use of a fuzzy clustering approach that aims to guarantee noise insensitivity and preservation of image detail. The objective of the work is to cluster the low-intensity inhomogeneity areas of noisy images using the clustering method and to segment those portions separately using a content-based level set approach. The purpose of designing this system is to produce better segmentation results for images corrupted by noise, so that it can be useful in fields such as medical image analysis, for example tumor detection, study of anatomical structure, and treatment planning.
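
    The key quantity behind a Gaussian-kernel fuzzy C-means variant such as the one described above is the feature-space distance induced by the kernel, which for a Gaussian kernel collapses to 2·(1 − K(x, v)). The sketch below shows only this distance, under an assumed kernel width, not the full GKFCM update loop.

```python
import numpy as np

def gaussian_kernel(x, v, sigma=1.0):
    return np.exp(-np.sum((x - v) ** 2) / (2.0 * sigma ** 2))

def kernel_distance_sq(x, v, sigma=1.0):
    # ||phi(x) - phi(v)||^2 = K(x,x) + K(v,v) - 2 K(x,v) = 2 * (1 - K(x,v)) for a Gaussian kernel
    return 2.0 * (1.0 - gaussian_kernel(x, v, sigma))

x = np.array([1.0, 2.0])
v = np.array([1.5, 1.0])
print(round(kernel_distance_sq(x, v, sigma=1.0), 4))
```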

  11. Richness-based masses of rich and famous galaxy clusters

    Science.gov (United States)

    Andreon, S.

    2016-03-01

    We present a catalog of galaxy cluster masses derived by exploiting the tight correlation between mass and richness, i.e., a properly computed number of bright cluster galaxies. The richness definition adopted in this work is properly calibrated, shows a small scatter with mass, and has a known evolution, which means that we can estimate accurate (0.16 dex) masses more precisely than by adopting any other richness estimate or X-ray or SZ-based proxies based on survey data. We measured a few hundred galaxy clusters at 0.05

  12. KFDA and clustering based multiclass SVM for intrusion detection

    Institute of Scientific and Technical Information of China (English)

    WEI Yu-xin; WU Mu-qing

    2008-01-01

    To improve the classification accuracy and reduce the training time, an intrusion detection technology is proposed which combines feature extraction technology with a multiclass support vector machine (SVM) classification algorithm. The intrusion detection model setup has two phases. The first phase is to project the original training data into kernel Fisher discriminant analysis (KFDA) space. The second phase is to use fuzzy clustering technology to cluster the projected data and construct the decision tree based on the clustering results. The overall detection model is set up based on the decision tree. Results of the experiment using the knowledge discovery and data mining (KDD) 99 datasets demonstrate that the proposed technology can be an effective way of performing intrusion detection.

  13. Clustering and rule-based classifications of chemical structures evaluated in the biological activity space.

    Science.gov (United States)

    Schuffenhauer, Ansgar; Brown, Nathan; Ertl, Peter; Jenkins, Jeremy L; Selzer, Paul; Hamon, Jacques

    2007-01-01

    Classification methods for data sets of molecules according to their chemical structure were evaluated for their biological relevance, including rule-based, scaffold-oriented classification methods and clustering based on molecular descriptors. Three data sets resulting from uniformly determined in vitro biological profiling experiments were classified according to their chemical structures, and the results were compared in a Pareto analysis with the number of classes and their average spread in the profile space as two concurrent objectives which were to be minimized. It has been found that no classification method is overall superior to all other studied methods, but there is a general trend that rule-based, scaffold-oriented methods are the better choice if classes with homogeneous biological activity are required, but a large number of clusters can be tolerated. On the other hand, clustering based on chemical fingerprints is superior if fewer and larger classes are required, and some loss of homogeneity in biological activity can be accepted.

  14. Risk Assessment for Bridges Safety Management during Operation Based on Fuzzy Clustering Algorithm

    OpenAIRE

    Xia Hanyu; Zhang Lijing; Tao Gang; Tong Bing; Zhang Haiou

    2016-01-01

    In recent years, long-span and sea-crossing bridges have been built, and bridge accidents caused by improper operational management occur frequently. In order to explore better methods of risk assessment for bridge operation departments, a method based on the fuzzy clustering algorithm is selected. The implementation steps of the fuzzy clustering algorithm are then described, a risk evaluation system is built, and Taizhou Bridge is selected as an example; the quantitation of risk factors ...

  15. Fuzzy Based Anomaly Intrusion Detection System for Clustered WSN

    OpenAIRE

    Sumathy Murugan; Sundara Rajan, M.

    2015-01-01

    In Wireless Sensor Networks (WSN), the intrusion detection technique may result in increased computational cost, packet loss, performance degradation and so on. In order to overcome these issues, in this study, we propose a fuzzy based anomaly intrusion detection system for clustered WSN. Initially the cluster heads are selected based on the parameters such as link quality, residual energy and coverage. Then the anomaly intrusion is detected using fuzzy logic technique. This technique conside...

  16. Fuzzy Logic Method for Enhancement Fault-Tolerant of Cluster Head in Wireless Sensor Networks Clustering

    Directory of Open Access Journals (Sweden)

    Farnaz Pakdeland

    2016-08-01

    Full Text Available A wireless sensor network is comprised of several sensor nodes, and several limiting factors influence network operation. In a clustering structure, the failure of a cluster head can cause loss of information. The aim of this paper is to increase the fault tolerance of the cluster head node. First, balancing the density of the clusters postpones the death of the cluster head node and lessens the collisions caused by the lack of energy balance between clusters. The innovation at this stage consists of two fuzzy logic systems: one in the phase of evaluating the cluster-head chance, and the other in the phase of balancing and migrating nodes to qualified clusters to increase balance. The focus then turns to recognizing and repairing cluster head faults.

  17. Efficient Cluster Head Selection Methods for Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Jong-Shin Chen

    2010-08-01

    Full Text Available The past few years have witnessed an increase in the potential uses of wireless sensor networks (WSNs), such as disaster management, combat field reconnaissance, border protection and security surveillance. Sensors in these applications are expected to be remotely deployed in large numbers and to operate autonomously in unattended environments. Since a WSN is composed of nodes with non-replenishable energy resources, prolonging the network lifetime is the main concern. To support scalability, nodes are often grouped into disjoint clusters. Each cluster has a leader, often referred to as the cluster head (CH). A CH is responsible not only for general requests but also for assisting the general nodes in routing the sensed data to the target nodes. The power consumption of a CH is higher than that of a general (non-CH) node, so the CH selection affects the lifetime of a WSN. However, the application scenarios of WSNs determine the definition of lifetime and thus affect how the objective of extending lifetime is achieved. In this study, we classify lifetime into different types and give a corresponding CH selection method for each to achieve the lifetime-extension objective. Simulation results demonstrate that our study can extend the lifetime for different requirements of the sensor networks.

  18. A Literature Analysis of Traditional Chinese Medicine Syndrome Types of Chronic Heart Failure Based on the Grey Clustering Method

    Institute of Scientific and Technical Information of China (English)

    刘宾; 王付; 黄明宜

    2011-01-01

    Objective: To collate and analyse literature reports on traditional Chinese medicine (TCM) syndrome differentiation of chronic heart failure (CHF) and to explore the objective laws of CHF syndrome types. Method: Literature reports on CHF syndrome differentiation over the past 10 years were collected; the grey absolute relational degree of the syndrome-type data series was calculated using grey system theory and used for clustering. Result: The 13 syndrome types obtained were clustered into deficiency of both heart qi and yin, yang deficiency with water flooding, yin exhaustion with yang collapse, blood stasis, and phlegm obstruction. Conclusion: Exploring the objective laws of TCM syndrome differentiation of CHF provides a reference and basis for further developing TCM diagnostic criteria and efficacy evaluation criteria for CHF.

  19. ENERGY EFFICIENT HIERARCHICAL CLUSTER-BASED ROUTING FOR WIRELESS SENSOR NETWORKS

    OpenAIRE

    Shideh Sadat Shirazi; Aboulfazl Torqi Haqiqat

    2015-01-01

    In this paper we propose an energy-efficient routing algorithm based on hierarchical clustering in wireless sensor networks (WSNs). This algorithm decreases the energy consumption of nodes and helps to increase the lifetime of sensor networks. To achieve this goal, the network is divided into four segments, which leads to uniform energy consumption among sensor nodes. We also propose a multi-step clustering method to send and receive data from the nodes to the base station. The s...

  20. Improving Energy Efficient Clustering Method for Wireless Sensor Network

    Directory of Open Access Journals (Sweden)

    Md. Imran Hossain

    2013-08-01

    Full Text Available Wireless sensor networks have recently emerged as an important computing platform. These sensors are power-limited and have limited computing resources, so sensor energy has to be managed wisely in order to maximize the lifetime of the network. Simply speaking, LEACH requires knowledge of the energy of every node in the network topology used, and its threshold for selecting the cluster head is fixed, so the protocol does not adapt to different network topology environments. We propose the IELP algorithm, which selects cluster heads using different thresholds; the new cluster-head selection probability incorporates the initial energy and the number of neighbour nodes. On a rotation basis, a head-set member receives data from the neighbouring nodes and transmits the aggregated results to the distant base station. For a given number of data-collecting sensor nodes, the number of control and management nodes can be systematically adjusted to reduce the energy consumption, which increases the network lifetime. The simulation results show that the performance of IELP is an improvement of 39% over LEACH and 20% over SEP in an area of 100 m x 100 m for m = 0.1 and α = 2, where m is the fraction of advanced nodes and α is the additional energy factor between advanced and normal nodes.
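
    For context, the fixed LEACH threshold that the record criticises has the well-known form T(n) = p / (1 − p·(r mod 1/p)) for nodes that have not yet served as cluster head in the current epoch. The sketch below implements that baseline election; the p_node hook is only a hypothetical placeholder for where an IELP-style per-node probability (based on initial energy and neighbour count) would plug in.

```python
import random

def leach_threshold(p: float, r: int) -> float:
    # Standard LEACH: T(n) = p / (1 - p * (r mod 1/p)) for eligible nodes.
    return p / (1 - p * (r % int(round(1 / p))))

def elect_cluster_heads(node_ids, p=0.1, r=0, p_node=None, seed=0):
    rng = random.Random(seed)
    heads = []
    for n in node_ids:
        prob = p_node(n) if p_node else p   # hypothetical hook: IELP would adjust p per node here
        if rng.random() < leach_threshold(prob, r):
            heads.append(n)
    return heads

print("cluster heads in round 3:", elect_cluster_heads(range(100), p=0.1, r=3))
```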

  1. Exploitation of semantic methods to cluster pharmacovigilance terms.

    Science.gov (United States)

    Dupuch, Marie; Dupuch, Laëtitia; Hamon, Thierry; Grabar, Natalia

    2014-01-01

    Pharmacovigilance is the activity related to the collection, analysis and prevention of adverse drug reactions (ADRs) induced by drugs. This activity is usually performed within dedicated databases (national, European, international...), in which the ADRs declared for patients are usually coded with a specific controlled terminology MedDRA (Medical Dictionary for Drug Regulatory Activities). Traditionally, the detection of adverse drug reactions is performed with data mining algorithms, while more recently the groupings of close ADR terms are also being exploited. The Standardized MedDRA Queries (SMQs) have become a standard in pharmacovigilance. They are created manually by international boards of experts with the objective to group together the MedDRA terms related to a given safety topic. Within the MedDRA version 13, 84 SMQs exist, although several important safety topics are not yet covered. The objective of our work is to propose an automatic method for assisting the creation of SMQs using the clustering of semantically close MedDRA terms. The experimented method relies on semantic approaches: semantic distance and similarity algorithms, terminology structuring methods and term clustering. The obtained results indicate that the proposed unsupervised methods appear to be complementary for this task, they can generate subsets of the existing SMQs and make this process systematic and less time consuming. PMID:24739596

  2. Comparing Methods for segmentation of Microcalcification Clusters in Digitized Mammograms

    CERN Document Server

    Moradmand, Hajar; Targhi, Hossein Khazaei

    2012-01-01

    The appearance of microcalcifications in mammograms is one of the early signs of breast cancer. So, early detection of microcalcification clusters (MCCs) in mammograms can be helpful for cancer diagnosis and better treatment of breast cancer. In this paper a computer method is proposed to support radiologists in detecting MCCs in digital mammography. First, in order to facilitate and improve the detection step, the mammogram images are enhanced with wavelet transformation and morphology operations. Then, for segmentation of suspicious MCCs, two methods are investigated: adaptive thresholding and watershed segmentation. Finally, the MCC areas detected by the different algorithms are compared to find out which segmentation method is more appropriate for extracting MCCs from mammograms.

  3. Clustering economies based on multiple criteria decision making techniques

    Directory of Open Access Journals (Sweden)

    Mansour Momeni

    2011-10-01

    Full Text Available One of the primary concerns in many countries is to determine the important factors affecting economic growth. In this paper, we study factors such as unemployment rate, inflation ratio, population growth and average annual income to cluster different countries. The proposed model uses the analytical hierarchy process (AHP) to prioritize the criteria and then uses a K-means technique to cluster 59 countries, based on the ranked criteria, into four groups. The first group includes countries with high standards such as Germany and Japan. In the second cluster, there are some developing countries with relatively good economic growth such as Saudi Arabia and Iran. The third cluster belongs to countries with faster rates of growth compared with the countries located in the second group, such as China, India and Mexico. Finally, the fourth cluster includes countries with relatively very low rates of growth such as Jordan, Mali and Niger.

  4. Relation Based Mining Model for Enhancing Web Document Clustering

    Directory of Open Access Journals (Sweden)

    M.Reka

    2014-05-01

    Full Text Available The design of web information management systems is becoming more complex, with higher time complexity. Information retrieval is a difficult task due to the huge volume of web documents. Clustering makes retrieval easier and less time consuming. This paper introduces a web document clustering approach which uses the semantic relations between documents, thereby reducing the time complexity. It identifies the relations and concepts in a document and also computes the relation score between documents. The algorithm extracts the key concepts from the web documents by preprocessing, stemming and stop-word removal. The identified concepts are used to compute the document relation score and the cluster relation score. The domain ontology is used to compute the document relation score and the cluster relation score. Based on the document relation score and the cluster relation score, the web document clusters are identified. The algorithm is evaluated on 200,000 web documents, with 60 percent used as the training set and 40 percent as the testing set.

  5. Neural network based cluster creation in the ATLAS Pixel Detector

    CERN Document Server

    Andreazza, A; The ATLAS collaboration

    2012-01-01

    The read-out from individual pixels on planar semiconductor sensors is grouped into clusters to reconstruct the location where a charged particle passed through the sensor. The resolution given by individual pixel sizes is significantly improved by using the information from the charge sharing between pixels. Such analog cluster creation techniques have been used by the ATLAS experiment for many years to obtain excellent performance. However, in dense environments, such as those inside high-energy jets, clusters have an increased probability of merging the charge deposited by multiple particles. Recently, a neural network based algorithm which estimates both the cluster position and whether a cluster should be split has been developed for the ATLAS Pixel Detector. The algorithm significantly reduces ambiguities in the assignment of pixel detector measurements to tracks and improves the position accuracy with respect to standard techniques by taking into account the 2-dimensional charge distribution.

  6. An Efficient Initialization Method for K-Means Clustering of Hyperspectral Data

    Science.gov (United States)

    Alizade Naeini, A.; Jamshidzadeh, A.; Saadatseresht, M.; Homayouni, S.

    2014-10-01

    K-means is definitely the most frequently used partitional clustering algorithm in the remote sensing community. Unfortunately, due to its gradient descent nature, this algorithm is highly sensitive to the initial placement of the cluster centers. This problem deteriorates for high-dimensional data such as hyperspectral remotely sensed imagery. To tackle this problem, in this paper, the spectral signatures of the endmembers in the image scene are extracted and used as the initial positions of the cluster centers. For this purpose, in the first step, a Neyman-Pearson detection theory based eigen-thresholding method (i.e., the HFC method) has been employed to estimate the number of endmembers in the image. Afterwards, the spectral signatures of the endmembers are obtained using the Minimum Volume Enclosing Simplex (MVES) algorithm. Eventually, these spectral signatures are used to initialize the k-means clustering algorithm. The proposed method is implemented on a hyperspectral dataset acquired by the ROSIS sensor with 103 spectral bands over the Pavia University campus, Italy. For comparative evaluation, two other commonly used initialization methods (i.e., Bradley & Fayyad (BF) and Random methods) are implemented and compared. The confusion matrix, overall accuracy and Kappa coefficient are employed to assess the methods' performance. The evaluations demonstrate that the proposed solution outperforms the other initialization methods and can be applied for unsupervised classification of hyperspectral imagery for landcover mapping.
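
    A minimal sketch of the initialization idea follows: k-means is seeded with endmember spectra instead of random centers. The HFC and MVES steps are not reimplemented here; a random subset of pixels stands in for their output, and scikit-learn is assumed to be available.

```python
# Sketch of seeding k-means with endmember spectra rather than random centers.
# `endmembers` is a placeholder for what HFC + MVES would return (one spectrum per row).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
pixels = rng.random((5000, 103))                         # hyperspectral pixels, 103 bands
endmembers = pixels[rng.choice(5000, 9, replace=False)]  # stand-in for the MVES output

km = KMeans(n_clusters=len(endmembers), init=endmembers, n_init=1)
labels = km.fit_predict(pixels)                          # unsupervised land-cover labels
print(np.bincount(labels))
```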

  7. Improving Tensor Based Recommenders with Clustering

    DEFF Research Database (Denmark)

    Leginus, Martin; Dolog, Peter; Zemaitis, Valdas

    2012-01-01

    Social tagging systems (STS) model three types of entities (i.e. tag-user-item) and relationships between them are encoded into a 3-order tensor. Latent relationships and patterns can be discovered by applying tensor factorization techniques like Higher Order Singular Value Decomposition (HOSVD...... of the recommendations and execution time are improved and memory requirements are decreased. The clustering is motivated by the fact that many tags in a tag space are semantically similar thus the tags can be grouped. Finally, promising experimental results are presented...

  8. Cluster-based DBMS Management Tool with High-Availability

    Directory of Open Access Journals (Sweden)

    Jae-Woo Chang

    2005-02-01

    Full Text Available A management tool, which is needed for monitoring and managing cluster-based DBMSs, has been little studied. So, we design and implement a cluster-based DBMS management tool with high availability that monitors the status of the nodes in a cluster system as well as the status of the DBMS instances in a node. The tool enables users to recognize a single virtual system image and provides them with the status of all the nodes and resources in the system through a graphical user interface (GUI). By using a load balancer, our management tool can increase the performance of a cluster-based DBMS and overcome the limitations of existing parallel DBMSs.

  9. Improved Density Based Spatial Clustering of Applications of Noise Clustering Algorithm for Knowledge Discovery in Spatial Data

    Directory of Open Access Journals (Sweden)

    Arvind Sharma

    2016-01-01

    Full Text Available Many techniques are available in the field of data mining, and its subfield spatial data mining aims to understand the relationships between data objects. Data objects associated with spatial features are stored in spatial databases. These relationships can be used for prediction and trend detection between spatial and nonspatial objects for social and scientific purposes. Huge data sets may be collected from different sources such as satellite images, X-rays, medical images, traffic cameras and GIS systems. Handling this large amount of data and establishing relationships among the objects with reliable results is the primary purpose of this paper. This paper describes a complete process for understanding how spatial data differ from other kinds of data sets and how they are refined to obtain useful results and to set trends for prediction in geographic information systems and the spatial data mining process. In this paper a new improved clustering algorithm is designed, because the role of clustering is indispensable in the spatial data mining process. Clustering methods are useful in various fields of human life such as GIS (Geographic Information Systems), GPS (Global Positioning Systems), weather forecasting, air traffic control, water treatment, area selection, cost estimation, planning of rural and urban areas, remote sensing, and VLSI design. This paper presents a study of various clustering methods and algorithms and an improved DBSCAN algorithm, IDBSCAN (Improved Density-Based Spatial Clustering of Applications with Noise). The algorithm is designed by adding some important attributes which are responsible for the generation of better clusters from existing data sets in comparison with other methods.
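
    For reference, the baseline that IDBSCAN builds on can be run in a few lines with scikit-learn; the sketch below clusters synthetic 2-D spatial points and counts noise points. The paper's additional attributes are not reproduced here.

```python
# Plain DBSCAN on synthetic 2-D spatial points (the baseline IDBSCAN extends).
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)
blob1 = rng.normal([0, 0], 0.3, size=(200, 2))
blob2 = rng.normal([5, 5], 0.3, size=(200, 2))
noise = rng.uniform(-2, 7, size=(20, 2))
points = np.vstack([blob1, blob2, noise])

labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(points)
print("clusters:", len(set(labels)) - (1 if -1 in labels else 0),
      "noise points:", int(np.sum(labels == -1)))
```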

  10. Energy Band Based Clustering Protocol for Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Prabhat Kumar

    2012-07-01

    Full Text Available Clustering is one of the widely used techniques to prolong the lifetime of wireless sensor networks in environments where battery replacement of individual sensor nodes is not an option after their deployment. However, clustering overheads such as cluster formation, cluster size and cluster-head selection rotation directly affect the lifetime of a WSN. This paper introduces and analyzes a new Single-Hop Energy Band Based Clustering Protocol (EBBCP) which tries to minimize the above-said overheads, resulting in a prolonged life for the WSN. EBBCP works on static clusters formed on the basis of energy bands in the setup phase. The protocol reduces the per-round overhead of cluster formation, which has been proved by simulation results in MATLAB. The paper contains an in-depth analysis of the results obtained during simulation and compares EBBCP with LEACH. Unlike LEACH, EBBCP achieves evenly distributed cluster heads throughout the target area. This protocol also produces evenly distributed dead nodes. EBBCP beats LEACH in total data packets received and produces a better network lifetime. EBBCP uses the concept of grid nodes to eliminate the need for a position-finding system like GPS for estimating the transmission signal strength.

  11. Nonparametric Cognitive Diagnosis: A Cluster Diagnostic Method Based on Grade Response Items

    Institute of Scientific and Technical Information of China (English)

    康春花; 任平; 曾平飞

    2015-01-01

    Based on the ideas of attribute sum scores and cluster analysis, a cluster diagnostic method suitable for graded (polytomous) response items is proposed, and the effects of the attribute hierarchical structure, sample size and slip rate on its classification accuracy are examined. The study found that: (1) the method yields high pattern-match and marginal-match rates under all simulated conditions; (2) the accuracy does not depend on sample size, so the method is applicable to small-scale testing and classroom assessment; (3) the accuracy is only slightly affected by the closeness of the attribute hierarchy; and (4) the method shows good internal and external validity in a practical setting. Examinations help students learn more efficiently by filling their learning gaps. To achieve this goal, we have to differentiate students who have mastered from those who have not mastered a set of attributes as measured by the test, through cognitive diagnostic assessment. K-means cluster analysis, a nonparametric cognitive diagnosis method, requires only the Q-matrix, which reflects the relationship between attributes and items. It does not require parameter estimation, so it is independent of sample size, simple to operate, and easy to understand. Previous research uses the sum-score vector or the capability-score vector as the clustering object; these methods are only applicable to dichotomous data. Structural response items are, however, the main type used in examinations, particularly as required in recent reforms. On the basis of previous research, this paper puts forward a method to calculate a capability matrix reflecting the mastery level of skills that is applicable to grade response items. Our study included four parts. First, we introduced the K-means cluster diagnosis method adapted for dichotomous data. Second, we expanded the K-means cluster diagnosis method to grade response data (GRCDM). Third, we investigated the performance of the introduced method using a simulation study. Fourth, we investigated the performance of the method in an empirical study. The simulation study focused on three factors. First, the sample size was

  12. Genetic Diversity among Parents of Hybrid Rice Based on Cluster Analysis of Morphological Traits and Simple Sequence Repeat Markers

    Institute of Scientific and Technical Information of China (English)

    WANG Sheng-jun; LU Zuo-mei; WAN Jian-min

    2006-01-01

    The genetic diversity of 41 parental lines popularized in commercial hybrid rice production in China was studied by using cluster analysis of morphological traits and simple sequence repeat (SSR) markers. The 41 entries were assigned into two clusters (i.e., an early or medium-maturing cluster and a medium or late-maturing cluster) and further assigned into six sub-clusters based on the morphological trait cluster analysis. The early or medium-maturing cluster was composed of 15 maintainer lines, four early-maturing restorer lines and two thermo-sensitive genic male sterile lines, and the medium or late-maturing cluster included 16 restorer lines and 4 medium or late-maturing maintainer lines. Moreover, the SSR cluster analysis classified the 41 entries into two clusters (i.e., a maintainer line cluster and a restorer line cluster) and seven sub-clusters. The maintainer line cluster consisted of all 19 maintainer lines and the two thermo-sensitive genic male sterile lines, while the restorer line cluster was composed of all 20 restorer lines. The SSR analysis fitted better with the pedigree information. From the viewpoint of hybrid rice breeding, the results suggest that SSR analysis may be a better method to study the diversity of parental lines in indica hybrid rice.

  13. A Study of Video Scenes Clustering Based on Shot Key Frames

    Institute of Scientific and Technical Information of China (English)

    CAI Bo; ZHANG Lu; ZHOU Dong-ru

    2005-01-01

    In digital video analysis, browsing, retrieval and querying, the shot alone is incapable of meeting the needs. A scene is a cluster of a series of shots, which partially meets the above demands. In this paper, an algorithm for clustering video scenes based on shot key frame sets is proposed. We use χ2 histogram matching and twin-histogram comparison for shot detection. A method is presented for key frame set extraction based on the distance between non-adjacent frames; furthermore, the minimum distance between key frame sets is computed as the distance between shots, and eventually scenes are clustered according to the distance between shots. Experiments with this algorithm show satisfactory performance in correctness and computing speed.
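
    A minimal sketch of the shot-detection test mentioned above: the χ2 distance between grey-level histograms of consecutive frames is compared against a threshold. The frame data, bin count and threshold are illustrative assumptions, not the paper's settings.

```python
# Chi-square histogram comparison between consecutive frames; a cut is declared
# when the distance exceeds a threshold. Synthetic frames simulate two shots.
import numpy as np

def chi2_distance(h1, h2, eps=1e-10):
    h1 = h1 / (h1.sum() + eps)
    h2 = h2 / (h2.sum() + eps)
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def detect_cuts(frames, bins=64, threshold=0.2):
    """frames: iterable of 2-D grey-level arrays; returns indices where a cut starts."""
    hists = [np.histogram(f, bins=bins, range=(0, 255))[0].astype(float) for f in frames]
    return [i for i in range(1, len(hists))
            if chi2_distance(hists[i - 1], hists[i]) > threshold]

rng = np.random.default_rng(2)
shot_a = [rng.integers(0, 100, (120, 160)) for _ in range(5)]
shot_b = [rng.integers(150, 255, (120, 160)) for _ in range(5)]
print(detect_cuts(shot_a + shot_b))   # expected: [5]
```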

  14. Evidence-Based Clustering of Reads and Taxonomic Analysis of Metagenomic Data

    Science.gov (United States)

    Folino, Gianluigi; Gori, Fabio; Jetten, Mike S. M.; Marchiori, Elena

    The rapidly emerging field of metagenomics seeks to examine the genomic content of communities of organisms to understand their roles and interactions in an ecosystem. In this paper we focus on clustering methods and their application to taxonomic analysis of metagenomic data. Clustering analysis for metagenomics amounts to grouping similar partial sequences, such as raw sequence reads, into clusters in order to discover information about the internal structure of the considered dataset, or the relative abundance of protein families. Different methods for clustering analysis of metagenomic datasets have been proposed. Here we focus on evidence-based methods for clustering that employ knowledge extracted from proteins identified by a BLASTx search (proxygenes). We consider two clustering algorithms introduced in previous works and a new one. We discuss the advantages and drawbacks of the algorithms, and use them to perform taxonomic analysis of metagenomic data. To this aim, three real-life benchmark datasets used in previous work on metagenomic data analysis are used. Comparison of the results indicates satisfactory coherence of the taxonomies output by the three algorithms, with respect to phylogenetic content at the class level and taxonomic distribution at the phylum level. In general, the experimental comparative analysis substantiates the effectiveness of evidence-based clustering methods for taxonomic analysis of metagenomic data.

  15. Clustering in Very Large Databases Based on Distance and Density

    Institute of Scientific and Technical Information of China (English)

    QIAN WeiNing(钱卫宁); GONG XueQing(宫学庆); ZHOU AoYing(周傲英)

    2003-01-01

    Clustering in very large databases or data warehouses, with many applications in areas such as spatial computation, web information collection, pattern recognition and economic analysis, is a huge task that challenges data mining researchers. Current clustering methods commonly have the following problems: 1) scanning the whole database leads to high I/O cost and expensive maintenance (e.g., an R*-tree); 2) the uncertain parameter k must be pre-specified, so clustering can only be refined by repeated trial and error; 3) they lack efficiency in treating clusters of arbitrary shape in very large data sets. In this paper, we first present a new hybrid clustering algorithm to solve these problems. This new algorithm, which combines both distance and density strategies, can handle clusters of arbitrary shape effectively. It makes full use of statistical information in mining to reduce the time complexity greatly while keeping good clustering quality. Furthermore, this algorithm can easily eliminate noise and identify outliers. An experimental evaluation is performed on a spatial database with this method and other popular clustering algorithms (CURE and DBSCAN). The results show that our algorithm outperforms them in terms of efficiency and cost, and achieves even greater speedup as the data size scales up.

  16. AN EFFICIENT INITIALIZATION METHOD FOR K-MEANS CLUSTERING OF HYPERSPECTRAL DATA

    Directory of Open Access Journals (Sweden)

    A. Alizade Naeini

    2014-10-01

    Full Text Available K-means is definitely the most frequently used partitional clustering algorithm in the remote sensing community. Unfortunately, due to its gradient descent nature, this algorithm is highly sensitive to the initial placement of the cluster centers. This problem deteriorates for high-dimensional data such as hyperspectral remotely sensed imagery. To tackle this problem, in this paper, the spectral signatures of the endmembers in the image scene are extracted and used as the initial positions of the cluster centers. For this purpose, in the first step, a Neyman-Pearson detection theory based eigen-thresholding method (i.e., the HFC method) has been employed to estimate the number of endmembers in the image. Afterwards, the spectral signatures of the endmembers are obtained using the Minimum Volume Enclosing Simplex (MVES) algorithm. Eventually, these spectral signatures are used to initialize the k-means clustering algorithm. The proposed method is implemented on a hyperspectral dataset acquired by the ROSIS sensor with 103 spectral bands over the Pavia University campus, Italy. For comparative evaluation, two other commonly used initialization methods (i.e., Bradley & Fayyad (BF) and Random methods) are implemented and compared. The confusion matrix, overall accuracy and Kappa coefficient are employed to assess the methods' performance. The evaluations demonstrate that the proposed solution outperforms the other initialization methods and can be applied for unsupervised classification of hyperspectral imagery for landcover mapping.

  17. Comparing Methods for segmentation of Microcalcification Clusters in Digitized Mammograms

    Directory of Open Access Journals (Sweden)

    Hajar Moradmand

    2011-11-01

    Full Text Available The appearance of microcalcifications in mammograms is one of the early signs of breast cancer. So, early detection of microcalcification clusters (MCCs) in mammograms can be helpful for cancer diagnosis and better treatment of breast cancer. In this paper, a computer system devised to support a radiologist in detecting MCCs in digital mammography is proposed. First, to facilitate and improve the detection step, the mammogram images are enhanced with wavelet transformation and morphology operations. Then, for segmentation of suspicious MCCs, two methods are investigated: adaptive thresholding and watershed segmentation. The purpose of this paper is to find out which segmentation method is more appropriate for extracting suspicious areas that contain MCCs in mammograms. Finally, the MCC detection areas produced by the different algorithms are compared.

  18. Cluster Development of Zhengzhou Urban Agriculture Based on Diamond Model

    Institute of Scientific and Technical Information of China (English)

    2012-01-01

    Based on the basic theory of the Diamond Model, this paper analyzes the competitive power of Zhengzhou urban agriculture in terms of production factors, demand conditions, related and supporting industries, business strategies and structure, and horizontal competition. In line with these conditions, it argues that cluster development is an effective approach to lifting the competitive power of Zhengzhou urban agriculture. Finally, it presents the following countermeasures and suggestions: optimize the spatial distribution for cluster development of urban agriculture; cultivate leading enterprises and optimize the organizational form of urban agriculture; and energetically develop low-carbon agriculture to create a favorable ecological environment for cluster development of urban agriculture.

  19. WORMHOLE ATTACK MITIGATION IN MANET: A CLUSTER BASED AVOIDANCE TECHNIQUE

    Directory of Open Access Journals (Sweden)

    Subhashis Banerjee

    2014-01-01

    Full Text Available A Mobile Ad-Hoc Network (MANET) is a self-configuring, infrastructure-less network of mobile devices connected by wireless links. Loopholes such as the wireless medium, the lack of a fixed infrastructure, dynamic topology, rapid deployment practices, and the hostile environments in which they may be deployed make MANETs vulnerable to a wide range of security attacks, and the wormhole attack is one of them. During this attack, a malicious node captures packets from one location in the network and tunnels them to another colluding malicious node at a distant point, which replays them locally. This paper presents a cluster-based wormhole attack avoidance technique. The concept of hierarchical clustering with a novel hierarchical 32-bit node addressing scheme is used for avoiding the attacking path during the route discovery phase of the DSR protocol, which is considered the underlying routing protocol. The method also pinpoints the location of the wormhole nodes in the case of an exposed attack.

  20. Neuro-fuzzy system modeling based on automatic fuzzy clustering

    Institute of Scientific and Technical Information of China (English)

    Yuangang TANG; Fuchun SUN; Zengqi SUN

    2005-01-01

    A neuro-fuzzy system model based on automatic fuzzy clustering is proposed. A hybrid model identification algorithm is also developed to decide the model structure and model parameters. The algorithm mainly includes three parts: 1) automatic fuzzy C-means (AFCM), which is applied to generate fuzzy rules automatically and then fix the size of the neuro-fuzzy network, by which the complexity of system design is reduced greatly at the price of some fitting capability; 2) recursive least-squares estimation (RLSE), which is used to update the parameters of the Takagi-Sugeno model employed to describe the behavior of the system; 3) a gradient descent algorithm, proposed for the fuzzy values according to the back-propagation algorithm of neural networks. Finally, modeling the dynamic equation of a two-link manipulator with the proposed approach is illustrated to validate the feasibility of the method.

  1. Distance based (DBCP) Cluster Protocol for Heterogeneous Wireless Sensor Network

    OpenAIRE

    Kumar, Surender; Prateek, Manish; Bhushan, Bharat

    2014-01-01

    Clustering is an important concept for reducing the energy consumption and prolonging the life of a wireless sensor network. In a heterogeneous wireless sensor network some of the nodes are equipped with more energy than the other nodes. Many routing algorithms have been proposed for heterogeneous wireless sensor networks. The Stable Election Protocol (SEP) is one of the important protocols in this category. In this research paper a novel energy-efficient distance-based cluster protocol (DBCP) is proposed for...

  2. Energy efficient cluster-based routing in wireless sensor networks

    OpenAIRE

    Zeghilet, Houda; Badache, Nadjib; Maimour, Moufida

    2009-01-01

    Because of the lack of a global naming scheme, routing protocols in sensor networks usually use flooding to select paths and deliver data. This process, although simple and effective, is very costly in terms of energy consumption, an important design issue in sensor network routing protocols. Cluster-based routing is one solution to save energy. In this paper, we propose a combination of an improved clustering algorithm and directed diffusion, a well-known data-centric routing paradigm in sen...

  3. An Incremental Algorithm of Text Clustering Based on Semantic Sequences

    Institute of Scientific and Technical Information of China (English)

    FENG Zhonghui; SHEN Junyi; BAO Junpeng

    2006-01-01

    This paper proposes an incremental text clustering algorithm based on semantic sequences. Using the similarity relation of semantic sequences and calculating the cover of the similar semantic sequence set, the candidate cluster with the minimum entropy overlap value is selected as a result cluster each time. The comparison of experimental results shows that the precision of the algorithm is higher than that of other algorithms under the same conditions, and this is especially evident on sets of long documents.

  4. Rank Based Clustering For Document Retrieval From Biomedical Databases

    Directory of Open Access Journals (Sweden)

    Jayanthi Manicassamy

    2009-09-01

    Full Text Available Nowadays, search engines are the most widely used tools for extracting information from various resources throughout the world, and a large share of searches lies in the biomedical field, retrieving related documents from various biomedical databases. Current search engines lack document clustering and do not represent the relatedness level of the documents extracted from the databases. In order to overcome these pitfalls, a text-based search engine has been developed for retrieving documents from the Medline and PubMed biomedical databases. The search engine incorporates a page-ranking-based clustering concept which automatically represents relatedness on a clustering basis. Apart from this, a graph tree is constructed to represent the level of relatedness of the documents that are networked together. Incorporating this advanced functionality into a biomedical document search engine was found to provide better results in reviewing related documents based on relatedness.

  5. Expanding Comparative Literature into Comparative Sciences Clusters with Neutrosophy and Quad-stage Method

    Directory of Open Access Journals (Sweden)

    Fu Yuhua

    2016-08-01

    Full Text Available By using Neutrosophy and Quad-stage Method, the expansions of comparative literature include: comparative social sciences clusters, comparative natural sciences clusters, comparative interdisciplinary sciences clusters, and so on. Among them, comparative social sciences clusters include: comparative literature, comparative history, comparative philosophy, and so on; comparative natural sciences clusters include: comparative mathematics, comparative physics, comparative chemistry, comparative medicine, comparative biology, and so on.

  6. Evaluation of sliding baseline methods for spatial estimation for cluster detection in the biosurveillance system

    Directory of Open Access Journals (Sweden)

    Leuze Michael

    2009-07-01

    Full Text Available Abstract Background The Centers for Disease Control and Prevention's (CDC's) BioSense system provides near-real-time situational awareness for public health monitoring through analysis of electronic health data. Determination of anomalous spatial and temporal disease clusters is a crucial part of the daily disease monitoring task. Our study focused on finding useful anomalies at manageable alert rates according to available BioSense data history. Methods The study dataset included more than 3 years of daily counts of military outpatient clinic visits for respiratory and rash syndrome groupings. We applied four spatial estimation methods in implementations of space-time scan statistics cross-checked in Matlab and C. We compared the utility of these methods according to the resultant background cluster rate (a false alarm surrogate) and sensitivity to injected cluster signals. The comparison runs used a spatial resolution based on the facility zip code in the patient record and a finer resolution based on the residence zip code. Results Simple estimation methods that account for day-of-week (DOW) data patterns yielded a clear advantage both in background cluster rate and in signal sensitivity. A 28-day baseline gave the most robust results for this estimation; the preferred baseline is long enough to remove daily fluctuations but short enough to reflect recent disease trends and data representation. Background cluster rates were lower for the rash syndrome counts than for the respiratory counts, likely because of seasonality and the large scale of the respiratory counts. Conclusion The spatial estimation method should be chosen according to characteristics of the selected data streams. In this dataset with strong day-of-week effects, the overall best detection performance was achieved using subregion averages over a 28-day baseline stratified by weekday or weekend/holiday behavior. Changing the estimation method for particular scenarios involving
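
    The estimator the study favours can be sketched as follows: the expected count for a day is the mean of the previous 28 days restricted to the same weekday/weekend stratum. Holiday handling and the space-time scan statistic itself are omitted, and the counts below are simulated, so this is only an illustration of the baseline idea.

```python
# 28-day sliding baseline stratified by weekday vs weekend, on simulated counts.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
days = pd.date_range("2024-01-01", periods=120, freq="D")
counts = pd.Series(rng.poisson(lam=np.where(days.dayofweek < 5, 30, 12)), index=days)

def expected_count(series, day, baseline=28):
    """Mean of the previous `baseline` days that fall in the same stratum as `day`."""
    window = series.loc[day - pd.Timedelta(days=baseline): day - pd.Timedelta(days=1)]
    same_stratum = window[(window.index.dayofweek < 5) == (day.dayofweek < 5)]
    return same_stratum.mean()

day = days[-1]
print(float(expected_count(counts, day)), int(counts.loc[day]))
```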

  7. A Cluster-Based Fuzzy Fusion Algorithm for Event Detection in Heterogeneous Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    ZiQi Hao

    2015-01-01

    Full Text Available As limited energy is one of the tough challenges in wireless sensor networks (WSNs), energy saving is important for increasing the life cycle of the network. Data fusion enables the combination of information from several sources so as to provide a unified scenario, which can significantly save sensor energy and enhance sensing data accuracy. In this paper, we propose a cluster-based data fusion algorithm for event detection. We use the k-means algorithm to form the nodes into clusters, which can significantly reduce the energy consumption of intracluster communication. The distances between cluster heads and the event, and the energy of the clusters, are fuzzified, and fuzzy logic is then used to select the clusters that will participate in data uploading and fusion. The fuzzy logic method is also used by cluster heads for local decisions, and then the local decision results are sent to the base station. Decision-level fusion for the final decision on the event is performed by the base station according to the uploaded local decisions and the fusion support degree of the clusters calculated by the fuzzy logic method. The effectiveness of this algorithm is demonstrated by simulation results.

  8. Targets Separation and Imaging Method in Sparse Scene Based on Cluster Result of Range Profile Peaks

    Institute of Scientific and Technical Information of China (English)

    2015-01-01

    This paper focuses on the synthetic aperture radar (SAR) imaging of space-sparse targets such as ships on the sea, and proposes a method for target separation and imaging in sparse scenes based on the clustering of range profile peaks. Firstly, a wavelet de-noising algorithm is used to preprocess the original echo, and then the range profiles at different viewing positions are obtained by range compression and range migration correction. Peaks of the range profiles are detected by a fast peak detection algorithm based on a second-order difference operator. Targets with sparse energy intervals can be imaged through azimuth compression after clustering the peaks in the range dimension. Moreover, targets without coupling in range energy interval and azimuth synthetic aperture time can be imaged through azimuth compression after clustering the peaks in both the range and azimuth dimensions. Lastly, the effectiveness of the proposed method is validated by simulations. The experimental results demonstrate that space-sparse targets such as ships can be imaged separately and completely with a small amount of computation in azimuth compression, and the resulting images are more beneficial for target recognition.

  9. Detecting and extracting clusters in atom probe data: A simple, automated method using Voronoi cells

    International Nuclear Information System (INIS)

    The analysis of the formation of clusters in solid solutions is one of the most common uses of atom probe tomography. Here, we present a method where we use the Voronoi tessellation of the solute atoms and its geometric dual, the Delaunay triangulation to test for spatial/chemical randomness of the solid solution as well as extracting the clusters themselves. We show how the parameters necessary for cluster extraction can be determined automatically, i.e. without user interaction, making it an ideal tool for the screening of datasets and the pre-filtering of structures for other spatial analysis techniques. Since the Voronoi volumes are closely related to atomic concentrations, the parameters resulting from this analysis can also be used for other concentration based methods such as iso-surfaces. - Highlights: • Cluster analysis of atom probe data can be significantly simplified by using the Voronoi cell volumes of the atomic distribution. • Concentration fields are defined on a single atomic basis using Voronoi cells. • All parameters for the analysis are determined by optimizing the separation probability of bulk atoms vs clustered atoms
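
    The core quantity used by this method, the per-atom Voronoi cell volume (whose inverse acts as a local, single-atom concentration estimate), can be computed with SciPy as sketched below; unbounded cells at the dataset edge are skipped and the atom positions are synthetic.

```python
# Per-point Voronoi cell volumes in 3-D; Voronoi cells are convex, so the volume of
# a bounded cell is the convex hull volume of its vertices.
import numpy as np
from scipy.spatial import Voronoi, ConvexHull

rng = np.random.default_rng(4)
positions = rng.random((500, 3)) * 10.0        # stand-in for solute atom coordinates (nm)

vor = Voronoi(positions)
volumes = np.full(len(positions), np.nan)
for i, region_index in enumerate(vor.point_region):
    region = vor.regions[region_index]
    if -1 in region or len(region) == 0:       # open cell touching the boundary
        continue
    volumes[i] = ConvexHull(vor.vertices[region]).volume

finite = volumes[~np.isnan(volumes)]
print(f"{len(finite)} bounded cells, median volume {np.median(finite):.3f} nm^3")
```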

  10. Detecting and extracting clusters in atom probe data: A simple, automated method using Voronoi cells

    Energy Technology Data Exchange (ETDEWEB)

    Felfer, P., E-mail: peter.felfer@sydney.edu.au [Australian Centre for Microscopy and Microanalysis, The University of Sydney, NSW 2006 (Australia); School of Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney, NSW 2006 (Australia); Ceguerra, A.V., E-mail: anna.ceguerra@sydney.edu.au [Australian Centre for Microscopy and Microanalysis, The University of Sydney, NSW 2006 (Australia); School of Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney, NSW 2006 (Australia); Ringer, S.P., E-mail: simon.ringer@sydney.edu.au [Australian Centre for Microscopy and Microanalysis, The University of Sydney, NSW 2006 (Australia); School of Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney, NSW 2006 (Australia); Cairney, J.M., E-mail: julie.cairney@sydney.edu.au [Australian Centre for Microscopy and Microanalysis, The University of Sydney, NSW 2006 (Australia); School of Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney, NSW 2006 (Australia)

    2015-03-15

    The analysis of the formation of clusters in solid solutions is one of the most common uses of atom probe tomography. Here, we present a method where we use the Voronoi tessellation of the solute atoms and its geometric dual, the Delaunay triangulation to test for spatial/chemical randomness of the solid solution as well as extracting the clusters themselves. We show how the parameters necessary for cluster extraction can be determined automatically, i.e. without user interaction, making it an ideal tool for the screening of datasets and the pre-filtering of structures for other spatial analysis techniques. Since the Voronoi volumes are closely related to atomic concentrations, the parameters resulting from this analysis can also be used for other concentration based methods such as iso-surfaces. - Highlights: • Cluster analysis of atom probe data can be significantly simplified by using the Voronoi cell volumes of the atomic distribution. • Concentration fields are defined on a single atomic basis using Voronoi cells. • All parameters for the analysis are determined by optimizing the separation probability of bulk atoms vs clustered atoms.

  11. Authentication Based on Multilayer Clustering in Ad Hoc Networks

    Directory of Open Access Journals (Sweden)

    Suh Heyi-Sook

    2005-01-01

    Full Text Available In this paper, we describe a secure cluster-routing protocol based on a multilayer scheme in ad hoc networks. This work provides a scalable, threshold authentication scheme in ad hoc networks. We present detailed security threats against ad hoc routing protocols, specifically examining cluster-based routing. Our proposed protocol, called “authentication based on multilayer clustering for ad hoc networks” (AMCAN), designs an end-to-end authentication protocol that relies on mutual trust between nodes in other clusters. The AMCAN strategy takes advantage of a multilayer architecture that is designed for an authentication protocol in a cluster head (CH) using a new concept of a control cluster head (CCH) scheme. We propose an authentication protocol that uses certificates containing an asymmetric key and a multilayer architecture so that the CCH is achieved using the threshold scheme, thereby reducing the computational overhead and successfully defeating all identified attacks. We also use a more extensive area, such as a CCH, using an identification protocol to build a highly secure, highly available authentication service, which forms the core of our security framework.

  12. Two clusters of child molesters based on impulsiveness

    Directory of Open Access Journals (Sweden)

    Danilo A. Baltieri

    2015-06-01

    Full Text Available Objective: High impulsiveness is a general problem that affects most criminal offenders and is associated with greater recidivism risk. A cluster analysis of impulsiveness measured by the Barratt Impulsiveness Scale - Version 11 (BIS-11) was performed on a sample of hands-on child molesters. Methods: The sample consisted of 208 child molesters enrolled in two different sectional studies carried out in São Paulo, Brazil. Using three factors from the BIS-11, a k-means cluster analysis was performed using the average silhouette width to determine cluster number. Direct logistic regression was performed to analyze the association of criminological and clinical features with the resulting clusters. Results: Two clusters were delineated. The cluster characterized by higher impulsiveness showed higher scores on the Sexual Screening for Pedophilic Interests (SSPI), Static-99, and Sexual Addiction Screening Test. Conclusions: Given that child molesters are an extremely heterogeneous population, the “number of victims” item of the SSPI should call attention to those offenders with the highest motor, attentional, and non-planning impulsiveness. Our findings could have implications in terms of differences in therapeutic management for these two groups, with the most impulsive cluster benefitting from psychosocial strategies combined with pharmacological interventions.

  13. DSN Beowulf Cluster-Based VLBI Correlator

    Science.gov (United States)

    Rogstad, Stephen P.; Jongeling, Andre P.; Finley, Susan G.; White, Leslie A.; Lanyi, Gabor E.; Clark, John E.; Goodhart, Charles E.

    2009-01-01

    The NASA Deep Space Network (DSN) requires a broadband VLBI (very long baseline interferometry) correlator to process data routinely taken as part of the VLBI source Catalogue Maintenance and Enhancement task (CAT M&E) and the Time and Earth Motion Precision Observations task (TEMPO). The data provided by these measurements are a crucial ingredient in the formation of precision deep-space navigation models. In addition, a VLBI correlator is needed to provide support for other VLBI related activities for both internal and external customers. The JPL VLBI Correlator (JVC) was designed, developed, and delivered to the DSN as a successor to the legacy Block II Correlator. The JVC is a full-capability VLBI correlator that uses software processes running on multiple computers to cross-correlate two-antenna broadband noise data. Components of this new system (see Figure 1) consist of Linux PCs integrated into a Beowulf Cluster, an existing Mark5 data storage system, a RAID array, an existing software correlator package (SoftC) originally developed for Delta DOR Navigation processing, and various custom-developed software processes and scripts. Parallel processing on the JVC is achieved by assigning slave nodes of the Beowulf cluster to process separate scans in parallel until all scans have been processed. Due to the single-stream sequential playback of the Mark5 data, some ramp-up time is required before all nodes can have access to the required scan data. Core functions of each processing step are accomplished using optimized C programs. The coordination and execution of these programs across the cluster is accomplished using Perl scripts, PostgreSQL commands, and a handful of miscellaneous system utilities. Mark5 data modules are loaded on Mark5 Data system playback units, one per station. Data processing is started when the operator scans the Mark5 systems and runs a script that reads various configuration files and then creates an experiment-dependent status database

  14. Determination of the lipophilicity of Salvia miltiorrhiza Radix et Rhizoma (danshen root) ingredients by microemulsion liquid chromatography: optimization using cluster analysis and a linear solvation energy relationship-based method.

    Science.gov (United States)

    Li, Liangxing; Yang, Jianrui; Huang, Hongzhang; Xu, Liyuan; Gao, Chongkai; Li, Ning

    2016-07-01

    We evaluated 26 microemulsion liquid chromatography (MELC) systems for their potential as high-throughput screening platforms capable of modeling the partitioning behaviors of drug compounds in an n-octanol-water system, and for predicting the lipophilicity of those compounds (i.e. logP values). The MELC systems were compared by cluster analysis and a linear solvation energy relationship (LSER)-based method, and the optimal system was identified by comparing their Euclidean distances with the LSER coefficients. The most effective MELC system had a mobile phase consisting of 6.0% (w/w) Brij35 (a detergent), 6.6% (w/w) butanol, 0.8% (w/w) cyclohexane, 86.6% (w/w) buffer solution and 8 mM cetyltrimethylammonium bromide. The reliability of the established platform was confirmed by the agreement between the experimental data and the predicted values. The logP values of the ingredients of danshen root (Salvia miltiorrhiza Radix et Rhizoma) were then predicted. Copyright © 2015 John Wiley & Sons, Ltd. PMID:26490541

  15. Internet2-based 3D PET image reconstruction using a PC cluster.

    Science.gov (United States)

    Shattuck, D W; Rapela, J; Asma, E; Chatzioannou, A; Qi, J; Leahy, R M

    2002-08-01

    We describe an approach to fast iterative reconstruction from fully three-dimensional (3D) PET data using a network of Pentium III PCs configured as a Beowulf cluster. To facilitate the use of this system, we have developed a browser-based interface using Java. The system compresses PET data on the user's machine, sends these data over a network, and instructs the PC cluster to reconstruct the image. The cluster implements a parallelized version of our preconditioned conjugate gradient method for fully 3D MAP image reconstruction. We report on the speed-up factors achieved with the Beowulf approach and the impact of communication latencies in the local cluster network and in the network connection between the user's machine and our PC cluster.

  16. A fuzzy-clustering analysis based phonetic tied-mixture HMM

    Institute of Scientific and Technical Information of China (English)

    XU Xianghua; ZHU Jie; GUO Qiang

    2005-01-01

    To efficiently decrease the number of parameters and improve the robustness of parameter training, a fuzzy-clustering-based phonetic tied-mixture model, FPTM, is presented. The Gaussian codebook of the FPTM is synthesized from Gaussian components belonging to the same root node in the phonetic decision tree. A fuzzy clustering method is further used for FPTM covariance sharing. Experimental results show that, compared with the conventional PTM with approximately the same parameter size, the FPTM decreases the size of the Gaussian weights by 77.59% and increases word accuracy by 7.92%, which proves that Gaussian fuzzy clustering is efficient. Compared with the FPTM, the covariance-shared FPTM decreases the word error rate by 1.14%, which proves that the combined fuzzy clustering of both Gaussians and covariances is superior to Gaussian fuzzy clustering alone.

  17. Development Strategy Research of Industrial Clusters Based on the SWOT Analysis Method: The Pingxiang Bicycle Industrial Cluster as an Example

    Institute of Scientific and Technical Information of China (English)

    杨金廷; 高敬

    2013-01-01

    The SWOT analysis method is widely used in strategic research; it analyzes the established internal conditions of the research subject itself. As carriers of the regional economy, industrial clusters have clear development advantages through the formation of strong, sustained competitiveness. We analyze the external opportunities and threats and the internal strengths and weaknesses of the Pingxiang bicycle cluster industry based on the SWOT analysis method, and discuss the development strategies corresponding to the SO, WO, ST and WT combinations. Under the SO combination, a market development strategy, a brand strategy and a product development strategy should be pursued; under the WO combination, an intellectual property strategy, promoting development through innovation, enhancing product added value and meeting consumers' demand for high quality; under the ST combination, an industry-association collaboration strategy, sharing resources through industry collaboration and uniting enterprises to jointly build a successful model for the Pingxiang bicycle industry; and under the WT combination, an industrial agglomeration and upgrading strategy, changing the current situation of small workshops and lax management in Pingxiang County and optimizing the allocation of industrial resources. In short, the cluster effect should be brought into full play to provide a reference for the development of industrial clusters.

  18. Combined Density-based and Constraint-based Algorithm for Clustering

    Institute of Scientific and Technical Information of China (English)

    CHEN Tung-shou; CHEN Rong-chang; LIN Chih-chiang; CHIU Yung-hsing

    2006-01-01

    We propose a new clustering algorithm that assists researchers to quickly and accurately analyze data. We call this algorithm the Combined Density-based and Constraint-based Algorithm (CDC). CDC consists of two phases. In the first phase, CDC employs the idea of density-based clustering to split the original data into a number of fragmented clusters. At the same time, CDC cuts off the noise and outliers. In the second phase, CDC employs the concept of the K-means clustering algorithm to select a greater cluster to be the center. Then, the greater cluster merges some smaller clusters which satisfy certain constraint rules. Due to the merged clusters around the center cluster, the clustering results show high accuracy. Moreover, CDC reduces the calculations and speeds up the clustering process. In this paper, the accuracy of CDC is evaluated and compared with those of K-means, hierarchical clustering, and the genetic clustering algorithm (GCA) proposed in 2004. Experimental results show that CDC has better performance.

  19. Multi-hop routing-based optimization of the number of cluster-heads in wireless sensor networks.

    Science.gov (United States)

    Nam, Choon Sung; Han, Young Shin; Shin, Dong Ryeol

    2011-01-01

    Wireless sensor networks require energy-efficient data transmission because the sensor nodes have limited power. A cluster-based routing method is more energy-efficient than a flat routing method, as it can send only the specific data required by users and aggregate similar data by dividing the network into local clusters. However, previous clustering algorithms have some problems in that the transmission radius of the sensor nodes is not realistic and multi-hop communication is not used both inside and outside the local clusters. As the energy consumption of clustering depends on the number of clusters, we need to know how many clusters are best. Thus, we propose an optimal number of cluster heads based on multi-hop routing in wireless sensor networks. We observe that the local cluster formed by a cluster head influences the energy consumption of the sensor nodes. We determined an equation for the number of packets to send and relay, and calculated the energy consumption of the sensor network using it. Through the process of calculating the energy consumption, we can obtain the optimal number of cluster heads in wireless sensor networks. PMID:22163771
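
    A sketch of the underlying idea, under an assumed first-order radio energy model (not the paper's exact equations): evaluate the total per-round energy for each candidate number of cluster heads k and take the minimizer.

```python
# Search for the energy-minimizing number of cluster heads under a simple,
# assumed LEACH-style radio model; all parameter values are illustrative.
import numpy as np

N, AREA, D_BS = 100, 100.0, 87.7           # nodes, field side (m), mean distance to BS (m)
E_ELEC, EPS_FS, PKT = 50e-9, 10e-12, 4000  # J/bit, J/bit/m^2, bits per packet

def round_energy(k):
    d_to_ch = AREA / np.sqrt(2 * np.pi * k)          # expected member-to-head distance
    e_member = PKT * (E_ELEC + EPS_FS * d_to_ch**2)  # transmit to cluster head
    e_head = PKT * (E_ELEC * N / k + E_ELEC + EPS_FS * D_BS**2)  # receive + forward to BS
    return (N - k) * e_member + k * e_head

ks = np.arange(1, 21)
energies = np.array([round_energy(k) for k in ks])
print("optimal number of cluster heads:", int(ks[np.argmin(energies)]))
```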

  20. A method for clustering of miRNA sequences using fragmented programming

    Science.gov (United States)

    Ivashchenko, Anatoly; Pyrkova, Anna; Niyazova, Raigul

    2016-01-01

    Clustering of miRNA sequences is an important problem in molecular genetics and associated cellular biology. Thousands of such sequences are known today through advances in sophisticated molecular tools, sequencing techniques, computational resources and rule-based mathematical models. Analysis of such large-scale miRNA sequences for inferring patterns towards deducing cellular function is a great challenge in modern molecular biology. Therefore, it is of interest to develop mathematical models specific to miRNA sequences. The process is to group (cluster) such miRNA sequences using well-defined known features. We describe a method for clustering miRNA sequences using fragmented programming. Subsequently, we illustrate the utility of the model using a dendrogram (a tree diagram) for publicly known A. thaliana miRNA nucleotide sequences towards the inference of observed conserved patterns. PMID:27212839
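
    The fragmented-programming model itself is not reproduced here; the generic sketch below shows how short RNA sequences can be clustered and summarized as a dendrogram from simple k-mer count features. The sequences are illustrative, not real A. thaliana miRNAs.

```python
# Hierarchical clustering of short RNA sequences from 3-mer count vectors.
from itertools import product
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

def kmer_vector(seq, k=3, alphabet="ACGU"):
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = {km: 0 for km in kmers}
    for i in range(len(seq) - k + 1):
        counts[seq[i:i + k]] = counts.get(seq[i:i + k], 0) + 1
    return np.array([counts[km] for km in kmers], dtype=float)

seqs = {   # invented example sequences
    "mir-a": "UGACAGAAGAGAGUGAGCAC",
    "mir-b": "UGACAGAAGAGAGCGAGCAC",
    "mir-c": "UUGGACUGAAGGGAGCUCCC",
}
X = np.array([kmer_vector(s) for s in seqs.values()])
Z = linkage(X, method="average", metric="cosine")
dendrogram(Z, labels=list(seqs), no_plot=True)   # set no_plot=False with matplotlib to draw
print(Z)                                          # the linkage matrix encodes the tree
```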

  1. Clustered iterative stochastic ensemble method for multi-modal calibration of subsurface flow models

    KAUST Repository

    Elsheikh, Ahmed H.

    2013-05-01

    A novel multi-modal parameter estimation algorithm is introduced. Parameter estimation is an ill-posed inverse problem that might admit many different solutions. This is attributed to the limited amount of measured data used to constrain the inverse problem. The proposed multi-modal model calibration algorithm uses an iterative stochastic ensemble method (ISEM) for parameter estimation. ISEM employs an ensemble of directional derivatives within a Gauss-Newton iteration for nonlinear parameter estimation. ISEM is augmented with a clustering step based on k-means algorithm to form sub-ensembles. These sub-ensembles are used to explore different parts of the search space. Clusters are updated at regular intervals of the algorithm to allow merging of close clusters approaching the same local minima. Numerical testing demonstrates the potential of the proposed algorithm in dealing with multi-modal nonlinear parameter estimation for subsurface flow models. © 2013 Elsevier B.V.

  2. Timing-Driven Nonuniform Depopulation-Based Clustering

    Directory of Open Access Journals (Sweden)

    Hanyu Liu

    2010-01-01

    hence improve routability by spreading the logic over the architecture. However, all depopulation-based clustering algorithms to date increase critical path delay. In this paper, we present a timing-driven nonuniform depopulation-based clustering technique, T-NDPack, that targets critical path delay and channel width constraints simultaneously. T-NDPack adjusts the CLB capacity based on the criticality of the Basic Logic Element (BLE). Results show that T-NDPack reduces minimum channel width by 11.07% while increasing the number of CLBs by 13.28% compared to T-VPack. More importantly, T-NDPack decreases critical path delay by 2.89%.

  3. ANONYMIZATION BASED ON NESTED CLUSTERING FOR PRIVACY PRESERVATION IN DATA MINING

    Directory of Open Access Journals (Sweden)

    V.Rajalakshmi

    2013-07-01

    Full Text Available Privacy preservation in data mining protects the data from revealing unauthorized extraction of information. Data anonymization techniques implement this by modifying the data so that the original values cannot be acquired easily. Perturbation techniques are widely used, but they can greatly affect the quality of the data, since there is a trade-off between privacy preservation and information loss which subsequently affects the results of data mining. The method proposed in this paper is based on nested clustering of the data and perturbation of each cluster. The size of the clusters is kept optimal to reduce the information loss. The paper explains the methodology, implementation and results of nested clustering. Various metrics are also provided to show that this method overcomes the disadvantages of other perturbation methods.
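
    A hedged sketch of the cluster-then-perturb idea (the paper's exact nested scheme is not specified here): records are grouped by an outer and an inner k-means pass, and each value is released as its inner-cluster centroid plus small noise, which bounds the information loss per cluster.

```python
# Nested clustering followed by per-cluster perturbation; all parameters are illustrative.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)
records = rng.normal(size=(300, 4)) * [10, 1, 5, 2] + [40, 3, 70, 20]

def anonymize(data, n_outer=5, n_inner=3, noise=0.05):
    out = np.empty_like(data)
    outer = KMeans(n_clusters=n_outer, n_init=10).fit(data)
    for c in range(n_outer):
        block = data[outer.labels_ == c]
        inner = KMeans(n_clusters=min(n_inner, len(block)), n_init=10).fit(block)
        centroids = inner.cluster_centers_[inner.labels_]       # centroid of each record
        out[outer.labels_ == c] = centroids + rng.normal(scale=noise, size=block.shape)
    return out

released = anonymize(records)
print(np.abs(released - records).mean())   # average distortion introduced
```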

  4. Improved Clustered Routing Algorithm Based on Distance and Energy in Wireless Sensor Networks

    OpenAIRE

    Wang, Dejun; Meng, Bo; Shaomin JIN

    2013-01-01

    Since the energy supply of a sensor node is limited, energy optimization should be considered the key objective when studying wireless sensor networks (WSNs). Facing these challenges, clustering is one of the methods used to manage network energy consumption efficiently, and it plays an important role in prolonging network lifetime and reducing energy consumption. An improved clustered routing algorithm based on distance and energy is proposed, which efficiently improves the rate of data agg...

  5. Taste Identification of Tea Through a Fuzzy Neural Network Based on Fuzzy C-means Clustering

    Institute of Scientific and Technical Information of China (English)

    ZHENG Yan; ZHOU Chun-guang

    2003-01-01

    In this paper, we present a fuzzy neural network model based on the Fuzzy C-Means (FCM) clustering algorithm to realize the taste identification of tea. The proposed method can acquire the fuzzy subsets and their membership functions automatically with the aid of the FCM clustering algorithm. Moreover, we improve the fuzzy weighted inference approach. The proposed model is illustrated with a simulation of the taste identification of tea.
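
    The clustering step that this model builds on, standard fuzzy C-means, can be written compactly in NumPy as below; the fuzzy-weighted inference layer of the paper is not reproduced.

```python
# Standard fuzzy C-means: alternate center and membership updates until convergence.
import numpy as np

def fcm(X, c, m=2.0, iters=100, eps=1e-5, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)              # fuzzy memberships, rows sum to 1
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U_new = 1.0 / (d ** (2 / (m - 1)) * np.sum(d ** (-2 / (m - 1)), axis=1, keepdims=True))
        if np.abs(U_new - U).max() < eps:
            U = U_new
            break
        U = U_new
    return centers, U

X = np.vstack([np.random.default_rng(1).normal(0, 0.5, (50, 2)),
               np.random.default_rng(2).normal(4, 0.5, (50, 2))])
centers, U = fcm(X, c=2)
print(np.round(centers, 2))
```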

  6. A Novel Clustering Algorithm Based on Quantum Games

    CERN Document Server

    Li, Qiang; Jiang, Jing-ping

    2008-01-01

    Enormous successes have been made by quantum algorithms during the last decade. In this paper, we combine quantum games with the problem of data clustering, and develop clustering algorithms based on them, in which data points in a dataset are considered as players who can make decisions and play quantum strategies in quantum games. After playing the quantum games, each player's expected payoff is calculated, and then he uses a link-removing-and-rewiring (LRR) function to change his neighbors and adjust the strength of the links connecting to them in order to maximize his payoff. Further, the algorithms are discussed and analyzed for two cases of strategies, two payoff matrices and two LRR functions. The experimental results demonstrate that data points in the datasets are clustered reasonably and efficiently, and that the clustering algorithms have fast rates of convergence. Moreover, the comparison with other algorithms also provides an indication of the effectiveness of the proposed approach.

  7. A Data-origin Authentication Protocol Based on ONOS Cluster

    Directory of Open Access Journals (Sweden)

    Qin Hua

    2016-01-01

    Full Text Available This paper proposes a data-origin authentication protocol based on an ONOS cluster. ONOS is an SDN controller that can work in a distributed environment. However, the security of an ONOS cluster is seldom considered, and communication within an ONOS cluster may suffer from many security threats. In this paper, we use a two-tier self-renewable hash chain for identity authentication and data-origin authentication. We analyse the security and overhead of our proposal and make a comparison with current security measures. The results show that, with the help of our proposal, communication in an ONOS cluster can be protected from identity forging, replay attacks, data tampering, MITM attacks and repudiation, while the computational overhead decreases noticeably.
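
    As a rough illustration of the hash-chain idea behind such a protocol, the sketch below implements a plain single-tier chain with SHA-256: values are revealed in reverse order and each one is verified against the previously accepted anchor. The paper's two-tier, self-renewable construction and the ONOS integration are not reproduced; names and chain length are illustrative.

        # Simplified single-tier hash-chain sketch (the paper uses a two-tier,
        # self-renewable chain; this only illustrates the basic commit/verify idea).
        import hashlib, os

        def sha256(b):
            return hashlib.sha256(b).digest()

        def build_chain(length, seed=None):
            seed = seed or os.urandom(32)
            chain = [seed]
            for _ in range(length):
                chain.append(sha256(chain[-1]))
            return chain                      # chain[-1] is the public anchor

        chain = build_chain(5)
        anchor = chain[-1]                    # distributed to cluster members in advance

        # Sender reveals chain[-2]; receiver checks it hashes to the stored anchor.
        revealed = chain[-2]
        assert sha256(revealed) == anchor
        anchor = revealed                     # the verified value becomes the new anchor
        print("data-origin check passed")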

  8. Richness-based masses of rich and famous galaxy clusters

    CERN Document Server

    Andreon, S

    2016-01-01

    We present a catalog of galaxy cluster masses derived by exploiting the tight correlation between mass and richness, i.e., a properly computed number of bright cluster galaxies. The richness definition adopted in this work is properly calibrated, shows a small scatter with mass, and has a known evolution, which means that we can estimate accurate ($0.16$ dex) masses more precisely than by adopting any other richness estimate or X-ray or SZ-based proxies based on survey data. We measured a few hundred galaxy clusters at redshift $z > 0.05$: clusters that have a known X-ray emission, that are in the Abell catalog, or that are among the most cited in the literature. Diagnostic plots and direct images of the clusters are individually inspected; we improved the cluster centers and, when needed, revised the redshifts. Whenever possible, we also checked for indications of contamination from other clus...

  9. A Data Cleansing Method for Clustering Large-scale Transaction Databases

    CERN Document Server

    Loh, Woong-Kee; Kang, Jun-Gyu

    2010-01-01

    In this paper, we emphasize the need for data cleansing when clustering large-scale transaction databases and propose a new data cleansing method that improves clustering quality and performance. We evaluate our data cleansing method through a series of experiments. As a result, the clustering quality and performance were significantly improved by up to 165% and 330%, respectively.

  10. Cluster Based Hybrid Niche Mimetic and Genetic Algorithm for Text Document Categorization

    Directory of Open Access Journals (Sweden)

    A. K. Santra

    2011-09-01

    Full Text Available An efficient cluster based hybrid niche mimetic and genetic algorithm for text document categorization, aimed at improving the retrieval rate of relevant documents, is addressed. The proposal minimizes the processing needed to structure the documents through better feature selection using the hybrid algorithm. In addition, the restructuring of feature words to associated documents is reduced, which in turn increases the document clustering rate. The performance of the proposed work is measured in terms of cluster object accuracy, term weight, term frequency and inverse document frequency. Experimental results demonstrate that it achieves very good performance on both feature selection and text document categorization compared to other classifier methods.
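
    The sketch below shows only a baseline document-clustering pipeline (TF-IDF term weighting followed by k-means), the kind of setup such a categorization method builds on; the hybrid niche mimetic and genetic feature selection itself is not reproduced, and the toy corpus and cluster count are illustrative.

        # Baseline sketch only: TF-IDF features plus k-means document clustering.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.cluster import KMeans

        docs = [
            "stock market trading shares",
            "football match goal score",
            "market shares fall on trading floor",
            "the team scored a late goal",
        ]
        X = TfidfVectorizer(stop_words="english").fit_transform(docs)   # term weight = tf-idf
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
        print(labels)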

  11. Voxel-based clustered imaging by multiparameter diffusion tensor images for glioma grading.

    Science.gov (United States)

    Inano, Rika; Oishi, Naoya; Kunieda, Takeharu; Arakawa, Yoshiki; Yamao, Yukihiro; Shibata, Sumiya; Kikuchi, Takayuki; Fukuyama, Hidenao; Miyamoto, Susumu

    2014-01-01

    Gliomas are the most common intra-axial primary brain tumour; therefore, predicting glioma grade would influence therapeutic strategies. Although several methods based on single or multiple parameters from diagnostic images exist, a definitive method for pre-operatively determining glioma grade remains unknown. We aimed to develop an unsupervised method using multiple parameters from pre-operative diffusion tensor images for obtaining a clustered image that could enable visual grading of gliomas. Fourteen patients with low-grade gliomas and 19 with high-grade gliomas underwent diffusion tensor imaging and three-dimensional T1-weighted magnetic resonance imaging before tumour resection. Seven features including diffusion-weighted imaging, fractional anisotropy, first eigenvalue, second eigenvalue, third eigenvalue, mean diffusivity and raw T2 signal with no diffusion weighting, were extracted as multiple parameters from diffusion tensor imaging. We developed a two-level clustering approach for a self-organizing map followed by the K-means algorithm to enable unsupervised clustering of a large number of input vectors with the seven features for the whole brain. The vectors were grouped by the self-organizing map as protoclusters, which were classified into the smaller number of clusters by K-means to make a voxel-based diffusion tensor-based clustered image. Furthermore, we also determined if the diffusion tensor-based clustered image was really helpful for predicting pre-operative glioma grade in a supervised manner. The ratio of each class in the diffusion tensor-based clustered images was calculated from the regions of interest manually traced on the diffusion tensor imaging space, and the common logarithmic ratio scales were calculated. We then applied support vector machine as a classifier for distinguishing between low- and high-grade gliomas. Consequently, the sensitivity, specificity, accuracy and area under the curve of receiver operating characteristic
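
    A minimal sketch of the two-level clustering idea described above: a self-organizing map first compresses the per-voxel feature vectors into protoclusters, and k-means then groups the prototypes into a small number of classes. It assumes the third-party minisom package; the grid size, iteration count, number of classes and random feature matrix are illustrative, not the study's settings.

        # Two-level clustering sketch: SOM protoclusters, then k-means on the prototypes.
        import numpy as np
        from minisom import MiniSom
        from sklearn.cluster import KMeans

        X = np.random.rand(5000, 7)                     # toy stand-in for 7 DTI-derived features per voxel

        som = MiniSom(10, 10, X.shape[1], sigma=1.0, learning_rate=0.5, random_seed=0)
        som.train_random(X, 2000)                       # level 1: learn 10x10 protoclusters

        protos = som.get_weights().reshape(-1, X.shape[1])
        km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(protos)   # level 2: cluster prototypes

        # Each voxel inherits the class of its best-matching SOM unit.
        bmu_index = np.array([np.ravel_multi_index(som.winner(x), (10, 10)) for x in X])
        voxel_class = km.labels_[bmu_index]
        print(np.bincount(voxel_class))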

  12. A Method of Constructing Similarity Matrices and Its Clustering Process Based on Minimum Entropy

    Institute of Scientific and Technical Information of China (English)

    魏书堤

    2012-01-01

    For the classification problem of cluster analysis based on a similarity matrix, the similarity matrix and its properties are defined and some common construction methods are introduced. Because the existing construction methods lack practical meaning, especially for fuzzy problems, a method of constructing the similarity matrix based on minimum information entropy is proposed. In this method, the weights of the different attributes in a multi-criteria decision problem are obtained using the minimum information entropy principle, and the weighted distances among the different scheme sets are then computed. By constructing positive and negative ideal solutions, the weighted distances between each scheme and the positive and negative ideal solutions are obtained, and the similarity matrix is constructed from the proportion of the weighted distances among the schemes to the distances between the schemes and the positive and negative ideal solutions. An example shows that the method is feasible.
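
    The sketch below follows the entropy-weighting and ideal-solution steps described above: attribute weights from minimum information entropy, weighted distances to the positive and negative ideal solutions, and a similarity matrix derived from pairwise weighted distances. The final similarity rule used here (a Gaussian of the pairwise distance scaled by the ideal-solution distances) is an illustrative choice and may differ from the paper's exact proportion formula.

        # Entropy weights, distances to ideal solutions, and a similarity matrix (sketch).
        import numpy as np

        def entropy_weights(X):
            P = X / X.sum(axis=0, keepdims=True)
            E = -(P * np.log(P + 1e-12)).sum(axis=0) / np.log(len(X))   # information entropy per attribute
            d = 1.0 - E
            return d / d.sum()

        X = np.random.rand(6, 4)                         # 6 schemes, 4 attributes (toy data)
        w = entropy_weights(X)

        pos_ideal, neg_ideal = X.max(axis=0), X.min(axis=0)
        d_pos = np.sqrt(((X - pos_ideal) ** 2 * w).sum(axis=1))
        d_neg = np.sqrt(((X - neg_ideal) ** 2 * w).sum(axis=1))

        pair = np.sqrt((((X[:, None, :] - X[None, :, :]) ** 2) * w).sum(axis=2))
        scale = (d_pos + d_neg).mean()
        R = np.exp(-(pair / scale) ** 2)                 # similarity matrix in (0, 1], R[i, i] = 1
        print(np.round(R, 2))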

  13. SOFT CLUSTERING BASED EXPOSITION TO MULTIPLE DICTIONARY BAG OF WORDS

    OpenAIRE

    Sujatha, K. S.; B. Vinod

    2012-01-01

    Object classification is a highly important area of computer vision and has many applications, including robotics, image search, face recognition, aiding visually impaired people, censoring images and many more. A common recent method of classification that uses features is the Bag of Words approach. In this method a codebook of visual words is created using various clustering methods. To increase performance, the Multiple Dictionaries BoW (MDBoW) method uses more visual words from dif...

  14. An Algorithm of Speaker Clustering Based on Model Distance

    Directory of Open Access Journals (Sweden)

    Wei Li

    2014-03-01

    Full Text Available An algorithm based on Model Distance (MD) for spectral speaker clustering is proposed to deal with the shortcoming of the general spectral clustering algorithm in describing the distribution of the signal source. First, a Universal Background Model (UBM) is created with a large quantity of independent speakers; then, a Gaussian Mixture Model (GMM) is trained from the UBM for every speech segment; finally, the probability distance between the GMMs of the speech segments is used to build an affinity matrix, and speaker spectral clustering is performed on the affinity matrix. Experimental results based on news and conference data sets show that an average improvement of 6.38% in F-measure is obtained in comparison with the algorithm based on feature-vector distance. In addition, the proposed algorithm is 11.72 times faster.
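
    A compact sketch of the overall pipeline: per-segment GMMs, a probability-based distance between models, an affinity matrix, and spectral clustering. UBM adaptation is omitted, and the distance used here is a Monte-Carlo symmetric KL approximation, which may differ from the paper's exact model distance; the toy segments stand in for MFCC features.

        # GMM-per-segment, model distance, affinity matrix, spectral clustering (sketch).
        import numpy as np
        from sklearn.mixture import GaussianMixture
        from sklearn.cluster import SpectralClustering

        def fit_gmm(feats, n_components=4, seed=0):
            return GaussianMixture(n_components, covariance_type="diag", random_state=seed).fit(feats)

        def sym_kl(gmm_a, gmm_b, n=2000):
            xa, _ = gmm_a.sample(n)
            xb, _ = gmm_b.sample(n)
            kl_ab = np.mean(gmm_a.score_samples(xa) - gmm_b.score_samples(xa))
            kl_ba = np.mean(gmm_b.score_samples(xb) - gmm_a.score_samples(xb))
            return kl_ab + kl_ba

        segments = [np.random.randn(200, 13) + s for s in (0, 0, 3, 3)]   # toy "MFCC" segments, 2 speakers
        gmms = [fit_gmm(f) for f in segments]

        D = np.array([[sym_kl(a, b) for b in gmms] for a in gmms])
        A = np.exp(-D / D.mean())                                          # affinity from model distances
        labels = SpectralClustering(n_clusters=2, affinity="precomputed", random_state=0).fit_predict(A)
        print(labels)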

  15. Mapping the Generator Coordinate Method to the Coupled Cluster Approach

    CERN Document Server

    Stuber, Jason L

    2015-01-01

    The generator coordinate method (GCM) casts the wavefunction as an integral over a weighted set of non-orthogonal single determinantal states. In principle this representation can be used like the configuration interaction (CI) or shell model to systematically improve the approximate wavefunction towards an exact solution. In practice, applications have generally been limited to systems with fewer than three degrees of freedom. This bottleneck is directly linked to the exponential computational expense associated with the numerical projection of broken-symmetry Hartree-Fock (HF) or Hartree-Fock-Bogoliubov (HFB) wavefunctions and to the use of a variational rather than a bi-variational expression for the energy. We circumvent these issues by choosing a hole-particle representation for the generator and applying algebraic symmetry projection, via the use of tensor operators and the invariant mean (operator average). The resulting GCM formulation can be mapped directly to the coupled cluster (CC) approach, leading...

  16. A Dynamic Knowledge Extraction Method Based on Sentence-Clustering Recognition

    Institute of Scientific and Technical Information of China (English)

    苏牧; 肖人彬

    2001-01-01

    In view of the clustering phenomenon of natural language as well as the demand for dynamic updating of the knowledge architecture, a dynamic knowledge extraction method based on sentence-clustering recognition is put forward in this paper. First of all, the paper gives a research framework for the proposed method, which describes the transformation process from natural language texts to an object-oriented knowledge architecture. Some problems related to sentence vectorization are investigated, several fundamental definitions as well as one judgement theorem are given, and the postpositional processing of attribute vectors of sentence cells is discussed. A sentence-clustering recognition approach is constructed using an ART2 neural network, and the concept of prior belief degree is adopted to measure the outcome of sentence recognition. A simulation procedure for the ART2 neural network is implemented in Matlab, the effects of sentence recognition by the procedure are given, and the corresponding analysis is made. A width-first code-generating method is proposed so as to perform knowledge transformation for the clustered sentences according to the outcome of sentence recognition by the ART2 neural network, and the posterior belief degree is defined as a final evaluation index for sentence recognition and semantic model construction. The implementation steps of the above method are further introduced for a specific sentence pattern. Finally, the derived relation is generated by utilizing the new approach to structural modeling proposed by the authors, such that the knowledge extraction process from natural language texts to an object-oriented knowledge architecture can be accomplished. The effectiveness of the proposed method is verified by a practical example from mechanical CAD, which is used throughout the paper to demonstrate the complete implementation details.

  17. Factorial PD-Clustering

    CERN Document Server

    Tortora, Cristina; Summa, Mireille Gettler

    2011-01-01

    Factorial clustering methods have been developed in recent years thanks to improvements in computational power. These methods perform a linear transformation of the data and a clustering on the transformed data, optimizing a common criterion. Factorial PD-clustering is based on Probabilistic Distance clustering (PD-clustering), an iterative, distribution-free, probabilistic clustering method. Factorial PD-clustering makes a linear transformation of the original variables into a reduced number of orthogonal ones using a criterion common with PD-clustering. It is demonstrated that the Tucker3 decomposition allows this transformation to be obtained. Factorial PD-clustering alternates a Tucker3 decomposition and a PD-clustering on the transformed data until convergence. This approach can significantly improve the algorithm's performance and allows it to work with large datasets, improving the stability and robustness of the method.
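
    For reference, the following is a minimal sketch of the basic PD-clustering iteration (the plain method, without the factorial Tucker3 step): memberships follow the product-of-distances rule implied by p_k(x) d_k(x) = const, and centers are updated as probability-weighted means. Initialization and toy data are illustrative.

        # Basic PD-clustering iteration (sketch of the plain method, not the factorial variant).
        import numpy as np

        def pd_clustering(X, k=3, max_iter=100, tol=1e-6, seed=0):
            rng = np.random.default_rng(seed)
            centers = X[rng.choice(len(X), k, replace=False)]
            for _ in range(max_iter):
                d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
                # p_ik proportional to the product of distances to the *other* centers
                prod = np.prod(d, axis=1, keepdims=True) / d
                P = prod / prod.sum(axis=1, keepdims=True)
                U = P ** 2 / d                                  # weights used for the center update
                new_centers = (U.T @ X) / U.sum(axis=0)[:, None]
                if np.linalg.norm(new_centers - centers) < tol:
                    centers = new_centers
                    break
                centers = new_centers
            return centers, P

        X = np.vstack([np.random.randn(60, 2) + off for off in ([0, 0], [5, 0], [0, 5])])
        centers, P = pd_clustering(X)
        print(np.round(centers, 2))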

  18. Sleeping Cluster based Medium Access Control Layer Routing Protocol for Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    T. R. Rangaswamy

    2012-01-01

    Full Text Available Wireless sensor networks play a vital role in remote-area applications, where human intervention is not possible. In a Wireless Sensor Network (WSN) each and every node is strictly energy- as well as bandwidth-constrained. Problem statement: In a standard WSN, most routing techniques move data from multiple sources to a single fixed base station. Because of the greater number of computational tasks, existing routing protocols do not address the energy-efficiency problem properly. In order to overcome the problem of energy consumption due to the larger number of computational tasks, a new method is developed. Approach: The proposed algorithm divides the sensing field into three active clusters and one sleeping cluster. Cluster head selection is based on the distance between the base station and the normal nodes. The Time Division Multiple Access (TDMA) mechanism is used to make the clusters alternate between the active state and the sleeping state. In an active cluster 50% of the nodes are made active and the remaining 50% remain in the sleep state. A sleeping cluster is made active after a period of time and periodically changes its functionality. Results: Due to this periodic change of state, energy consumption is minimized. The performance of the Low Energy Adaptive Clustering Hierarchy (LEACH) algorithm is also analyzed, using the network simulator NS2, in terms of the number of Cluster Heads (CH), energy consumption, lifetime and the number of nodes alive. Conclusion: The simulation studies were carried out using the network simulation tool NS2 for the proposed method, and the results are compared with the performance of the existing protocol. The superiority of the proposed method is highlighted.

  19. A Comprehensive Comparison of Different Clustering Methods for Reliability Analysis of Microarray Data

    Science.gov (United States)

    Kafieh, Rahele; Mehridehnavi, Alireza

    2013-01-01

    In this study, we considered some competitive learning methods, including hard competitive learning and soft competitive learning with/without fixed network dimensionality, for reliability analysis in microarrays. In order to have a more extensive view, and keeping in mind that competitive learning methods aim at error minimization or entropy maximization (different kinds of function optimization), we decided to investigate the abilities of mixture decomposition schemes. Therefore, this study covers algorithms based on function optimization, with particular emphasis on different competitive learning methods. The goal is to find the most powerful method according to a pre-specified criterion determined with numerical methods and matrix similarity measures. Furthermore, one should have an indication of the intrinsic ability of a dataset to form clusters before applying a clustering algorithm. Therefore, we propose the Hopkins statistic as a method for assessing the intrinsic clusterability of the data. The results show the remarkable ability of the Rayleigh mixture model in comparison with the other methods in the reliability analysis task. PMID:24083134

  1. Vibrational anharmonicity of small gold and silver clusters using the VSCF method.

    Science.gov (United States)

    Mancera, Luis A; Benoit, David M

    2016-01-01

    We study the vibrational spectra of small neutral gold (Au2-Au10) and silver (Ag2-Ag5) clusters using the vibrational self-consistent field method (VSCF) in order to account for anharmonicity. We report harmonic, VSCF, and correlation-corrected VSCF calculations obtained using a vibrational configuration interaction approach (VSCF/VCI). Our implementation of the method is based on an efficient calculation of the potential energy surfaces (PES), using periodic density functional theory (DFT) with a plane-wave pseudopotential basis. In some cases, we use an efficient technique (fast-VSCF), assisted by the Voter-Chen potential, to obtain an efficient reduction of the number of pair couplings between modes. This allows us to reduce the computing time of the 2D-PES without degrading the accuracy. We found that the anharmonicity of the gold clusters is very small, with maximum rms deviations of about 1 cm(-1), although for some particular modes the anharmonicity reaches values slightly larger than 2 cm(-1). Silver clusters show slightly larger anharmonicity. In both cases, large differences between calculated and experimental vibrational frequencies (when available) stem more likely from the quality of the electronic structure method used than from vibrational anharmonicity. We show that noble gas embedding often affects the vibrational properties of these clusters more than anharmonicity, and we discuss our results in the context of experimental studies. PMID:26619274

  2. Cluster Evolution in Undercooled Melt and Solidification of Undercooled Ge-based Alloy Melts Induced by Extrinsic Clusters

    Institute of Scientific and Technical Information of China (English)

    王煦; 景勤; 王文魁

    2003-01-01

    The structure or short-range order of clusters in undercooled metallic melts is influenced, to some extent, by the interfacial free energy between the cluster and the melt. Analyses of the effects of interfacial energy on the cluster structure based on the Gibbs equation show the possibility that atoms in the clusters tend to be packed more loosely with increasing cluster size (or undercooling). Following these analyses, nucleation may occur when clusters reach a definite size and the atoms in the clusters relax to some extent to form the crystal structure. Indirect support for this viewpoint is provided by the present results of cluster-induced nucleation experiments on undercooled Ge73.7Ni26.3 alloy melts.

  3. Variable Selection in Model-based Clustering: A General Variable Role Modeling

    OpenAIRE

    Maugis, Cathy; Celeux, Gilles; Martin-Magniette, Marie-Laure

    2008-01-01

    The currently available variable selection procedures in model-based clustering assume that the irrelevant clustering variables are all independent or are all linked with the relevant clustering variables. We propose a more versatile variable selection model which describes three possible roles for each variable: The relevant clustering variables, the irrelevant clustering variables dependent on a part of the relevant clustering variables and the irrelevant clustering variables totally indepe...

  4. A rough set based rational clustering framework for determining correlated genes.

    Science.gov (United States)

    Jeyaswamidoss, Jeba Emilyn; Thangaraj, Kesavan; Ramar, Kadarkarai; Chitra, Muthusamy

    2016-06-01

    Cluster analysis plays a foremost role in identifying groups of genes that show similar behavior under a set of experimental conditions. Several clustering algorithms have been proposed for identifying gene behaviors and to understand their significance. The principal aim of this work is to develop an intelligent rough clustering technique, which will efficiently remove the irrelevant dimensions in a high-dimensional space and obtain appropriate meaningful clusters. This paper proposes a novel biclustering technique that is based on rough set theory. The proposed algorithm uses correlation coefficient as a similarity measure to simultaneously cluster both the rows and columns of a gene expression data matrix and mean squared residue to generate the initial biclusters. Furthermore, the biclusters are refined to form the lower and upper boundaries by determining the membership of the genes in the clusters using mean squared residue. The algorithm is illustrated with yeast gene expression data and the experiment proves the effectiveness of the method. The main advantage is that it overcomes the problem of selection of initial clusters and also the restriction of one object belonging to only one cluster by allowing overlapping of biclusters. PMID:27352972
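
    Since the method above seeds and refines biclusters with the mean squared residue, the following sketch shows that score for a candidate submatrix (a Cheng-and-Church-style computation); the rough-set lower and upper boundary construction is not reproduced, and the data are random toy values.

        # Mean squared residue H(I, J) of a candidate bicluster (sketch).
        import numpy as np

        def mean_squared_residue(A, rows, cols):
            sub = A[np.ix_(rows, cols)]
            row_mean = sub.mean(axis=1, keepdims=True)
            col_mean = sub.mean(axis=0, keepdims=True)
            all_mean = sub.mean()
            residue = sub - row_mean - col_mean + all_mean
            return (residue ** 2).mean()

        A = np.random.rand(100, 20)                       # toy gene-by-condition expression matrix
        print(mean_squared_residue(A, rows=[0, 3, 7, 9], cols=[1, 2, 5]))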

  5. Risk Probability Estimating Based on Clustering

    DEFF Research Database (Denmark)

    Chen, Yong; Jensen, Christian D.; Gray, Elizabeth;

    2003-01-01

    of prior experiences, recommendations from a trusted entity or the reputation of the other entity. In this paper we propose a dynamic mechanism for estimating the risk probability of a certain interaction in a given environment using hybrid neural networks. We argue that traditional risk assessment models ... from the insurance industry do not directly apply to ubiquitous computing environments. Instead, we propose a dynamic mechanism for risk assessment, which is based on pattern matching, classification and prediction procedures. This mechanism uses an estimator of risk probability, which is based...

  6. Hierarchical Compressed Sensing for Cluster Based Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Vishal Krishna Singh

    2016-02-01

    Full Text Available Data transmission consumes a significant amount of energy in large-scale wireless sensor networks (WSNs). In such an environment, reducing the in-network communication and distributing the load evenly over the network can reduce the overall energy consumption and maximize the network lifetime significantly. In this work, the aforementioned problems of network lifetime and uneven energy consumption in large-scale wireless sensor networks are addressed. This work proposes a hierarchical compressed sensing (HCS) scheme to reduce the in-network communication during the data-gathering process. Correlated sensor readings are collected via a hierarchical clustering scheme. A compressed sensing (CS) based data processing scheme is devised to transmit the data from the source to the sink. The proposed HCS is able to identify the optimal position for the application of CS so as to achieve a reduced and similar number of transmissions on all the nodes in the network. An activity map is generated to validate the reduced and uniformly distributed communication load of the WSN. Based on the number of transmissions per data-gathering round, the bit-hop metric model is used to analyse the overall energy consumption. Simulation results validate the efficiency of the proposed method over the existing CS-based approaches.
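
    The sketch below isolates the compressed-sensing step only: a sparse reading vector is measured through a random matrix and recovered with orthogonal matching pursuit, so far fewer values than raw readings need to be transmitted. The hierarchical clustering and in-network placement of CS in the HCS scheme are not modelled; dimensions and sparsity are illustrative.

        # Toy compressed-sensing step: random measurements, OMP recovery.
        import numpy as np
        from sklearn.linear_model import OrthogonalMatchingPursuit

        rng = np.random.default_rng(0)
        n, m, k = 256, 64, 8                  # signal length, number of measurements, sparsity
        x = np.zeros(n)
        x[rng.choice(n, k, replace=False)] = rng.normal(size=k)

        Phi = rng.normal(size=(m, n)) / np.sqrt(m)       # measurement matrix shared by the nodes
        y = Phi @ x                                      # m measurements instead of n transmissions

        omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k, fit_intercept=False).fit(Phi, y)
        print(np.linalg.norm(omp.coef_ - x) / np.linalg.norm(x))   # relative reconstruction error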

  7. Applying clustering approach in predictive uncertainty estimation: a case study with the UNEEC method

    Science.gov (United States)

    Dogulu, Nilay; Solomatine, Dimitri; Lal Shrestha, Durga

    2014-05-01

    Within the context of flood forecasting, assessment of predictive uncertainty has become a necessity for most modelling studies in operational hydrology. There are several uncertainty analysis and/or prediction methods available in the literature; however, most of them rely on normality and homoscedasticity assumptions for the model residuals occurring in reproducing the observed data. This study focuses on a statistical method that analyzes model residuals without any such assumptions and that is based on a clustering approach: Uncertainty Estimation based on local Errors and Clustering (UNEEC). The aim of this work is to provide a comprehensive evaluation of the UNEEC method's performance in view of the clustering approach employed within its methodology. This is done by analyzing the normality of model residuals and comparing the uncertainty analysis results (for the 50% and 90% confidence levels) with those obtained from the uniform interval and quantile regression methods. An important part of the basis on which the methods are compared is the analysis of data clusters representing different hydrometeorological conditions. The validation measures used are PICP, MPI, ARIL and NUE where necessary. A new validation measure linking the prediction interval to the (hydrological) model quality, the weighted mean prediction interval (WMPI), is also proposed for comparing the methods more effectively. The case study is the Brue catchment, located in the South West of England. A different parametrization of the method than in its previous application in Shrestha and Solomatine (2008) is used, i.e. past error values in addition to discharge and effective rainfall are considered. The results show that UNEEC's notable characteristic, i.e. applying clustering to data of predictors in which catchment behaviour information is encapsulated, contributes to the increased accuracy of the method's results for varying flow conditions. Besides, classifying data so that extreme flow events are individually

  8. Richness-based masses of rich and famous galaxy clusters

    Science.gov (United States)

    Andreon, S.

    2016-03-01

    We present a catalog of galaxy cluster masses derived by exploiting the tight correlation between mass and richness, i.e., a properly computed number of bright cluster galaxies. The richness definition adopted in this work is properly calibrated, shows a small scatter with mass, and has a known evolution, which means that we can estimate accurate (0.16 dex) masses more precisely than by adopting any other richness estimate or X-ray or SZ-based proxies based on survey data. We measured a few hundred galaxy clusters at redshift z > 0.05. A web front-end is available at the URL http://www.brera.mi.astro.it/~andreon/famous.html

  9. Efficient Clustering for Irregular Geometries Based on Identification of Concavities

    Directory of Open Access Journals (Sweden)

    Velázquez-Villegas Fernando

    2014-04-01

    Full Text Available The two-dimensional clustering problem is highly relevant in applications related to the efficient use of raw material, such as cutting stock, packing, etc. This is a very complex problem in which multiple bodies are accommodated efficiently in such a way that they occupy as little space as possible. The complexity of the problem increases with the complexity of the bodies. Clearly the number of possible arrangements between bodies is huge. The No Fit Polygon (NFP) allows all the relative positions between two patterns (regular or irregular) in contact and without overlap to be determined, so that the best position can be selected. However, NFP generation requires a lot of calculation; besides, selecting the best cluster is not a simple task because, between two irregular patterns in contact, hollows (unusable areas) and external concavities (usable areas) can be produced. This work presents a quick and simple method to reduce the calculations associated with NFP generation and to minimize unusable areas in a cluster. The method consists of generating partial NFPs, only on the concave regions of the patterns, and selecting the best cluster using a total weighted efficiency, i.e. a weighted value of the enclosure efficiency (ratio of occupied area to convex hull area) and the hollow efficiency (ratio of occupied area to cluster area). The proposed method produces results similar to those obtained by other methods; however, the shape of the clusters obtained allows more parts to be accommodated in similar spaces, which is a desirable result when it comes to optimizing the use of material. We present two examples to show the performance of the proposal.

  10. A Resampling Based Clustering Algorithm for Replicated Gene Expression Data.

    Science.gov (United States)

    Li, Han; Li, Chun; Hu, Jie; Fan, Xiaodan

    2015-01-01

    In gene expression data analysis, clustering is a fruitful exploratory technique to reveal the underlying molecular mechanism by identifying groups of co-expressed genes. To reduce the noise, usually multiple experimental replicates are performed. An integrative analysis of the full replicate data, instead of reducing the data to the mean profile, carries the promise of yielding more precise and robust clusters. In this paper, we propose a novel resampling based clustering algorithm for genes with replicated expression measurements. Assuming those replicates are exchangeable, we formulate the problem in the bootstrap framework, and aim to infer the consensus clustering based on the bootstrap samples of replicates. In our approach, we adopt the mixed effect model to accommodate the heterogeneous variances and implement a quasi-MCMC algorithm to conduct statistical inference. Experiments demonstrate that by taking advantage of the full replicate data, our algorithm produces more reliable clusters and has robust performance in diverse scenarios, especially when the data is subject to multiple sources of variance. PMID:26671802
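
    As a rough illustration of the resampling idea (not the paper's mixed-effect model or quasi-MCMC inference), the sketch below bootstraps the replicates, clusters the averaged profiles each time, accumulates a consensus matrix, and extracts a final partition from it; the sizes and the choice of k-means are illustrative.

        # Bootstrap-consensus clustering of replicated expression profiles (sketch).
        import numpy as np
        from sklearn.cluster import KMeans
        from scipy.cluster.hierarchy import linkage, fcluster
        from scipy.spatial.distance import squareform

        rng = np.random.default_rng(0)
        n_genes, n_cond, n_rep, k = 100, 6, 3, 4
        data = rng.normal(size=(n_genes, n_cond, n_rep))          # toy replicated expression data

        consensus = np.zeros((n_genes, n_genes))
        B = 50
        for _ in range(B):
            idx = rng.integers(0, n_rep, size=n_rep)               # bootstrap the replicates
            profile = data[:, :, idx].mean(axis=2)
            labels = KMeans(n_clusters=k, n_init=5, random_state=0).fit_predict(profile)
            consensus += labels[:, None] == labels[None, :]
        consensus /= B

        dist = squareform(1.0 - consensus, checks=False)           # consensus -> distance
        final = fcluster(linkage(dist, method="average"), t=k, criterion="maxclust")
        print(np.bincount(final))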

  11. Hybrid Weighted-based Clustering Routing Protocol for Railway Communications

    Directory of Open Access Journals (Sweden)

    Jianli Xie

    2013-12-01

    Full Text Available In this paper, a hybrid clustering routing strategy is proposed for railway emergency ad hoc networks, for use when GSM-R base stations are destroyed or some terminals (or nodes) are far from the signal coverage. In this case, the cluster-head (CH) election procedure is invoked on demand, and it takes into consideration the degree difference from the ideal degree, relative clustering stability, the sum of the distances between the node and its one-hop neighbors, consumed power, node type and node mobility. For cluster formation, the weights of the CH election parameters are allocated rationally by rough set theory. The hybrid weighted-based clustering routing (HWBCR) strategy is designed for the railway emergency communication scene and aims at a good trade-off between computation costs and performance. A simulation platform is constructed to evaluate the performance of our strategy in terms of the average end-to-end delay, packet loss ratio, routing overhead and average throughput. The results, compared with the railway communication QoS index, reveal that our strategy is suitable for transmitting dispatching voice and data between train and ground when the train speed is less than 220 km/h.

  12. Thematic clustering of text documents using an EM-based approach.

    Science.gov (United States)

    Kim, Sun; Wilbur, W John

    2012-10-01

    Clustering textual contents is an important step in mining useful information on the web or other text-based resources. The common task in text clustering is to handle text in a multi-dimensional space, and to partition documents into groups, where each group contains documents that are similar to each other. However, this strategy lacks a comprehensive view for humans in general since it cannot explain the main subject of each cluster. Utilizing semantic information can solve this problem, but it needs a well-defined ontology or pre-labeled gold standard set. In this paper, we present a thematic clustering algorithm for text documents. Given text, subject terms are extracted and used for clustering documents in a probabilistic framework. An EM approach is used to ensure documents are assigned to correct subjects, hence it converges to a locally optimal solution. The proposed method is distinctive because its results are sufficiently explanatory for human understanding as well as efficient for clustering performance. The experimental results show that the proposed method provides a competitive performance compared to other state-of-the-art approaches. We also show that the extracted themes from the MEDLINE® dataset represent the subjects of clusters reasonably well.

  13. A THREE-STEP SPATIAL-TEMPORAL-SEMANTIC CLUSTERING METHOD FOR HUMAN ACTIVITY PATTERN ANALYSIS

    Directory of Open Access Journals (Sweden)

    W. Huang

    2016-06-01

    Full Text Available How people move in cities and what they do in various locations at different times form human activity patterns. Human activity patterns play a key role in urban planning, traffic forecasting, public health and safety, emergency response, friend recommendation, and so on. Therefore, scholars from different fields, such as social science, geography, transportation, physics and computer science, have made great efforts in modelling and analysing human activity patterns or human mobility patterns. One of the essential tasks in such studies is to find the locations or places where individuals stay to perform some kind of activity before further activity pattern analysis. In the era of Big Data, the emergence of social media along with wearable devices enables human activity data to be collected more easily and efficiently. Furthermore, the dimension of the accessible human activity data has been extended from two or three (space, or space-time) to four dimensions (space, time and semantics). More specifically, not only the location and time where people stay and spend time are collected, but also what people "say" in a location at a time can be obtained. The characteristics of these datasets shed new light on the analysis of human mobility, and some new methodologies should accordingly be developed to handle them. Traditional methods such as neural networks, statistics and clustering have been applied to study human activity patterns using geosocial media data. Among them, clustering methods have been widely used to analyse spatiotemporal patterns. However, to the best of our knowledge, few clustering algorithms are specifically developed for handling datasets that contain spatial, temporal and semantic aspects all together. In this work, we propose a three-step human activity clustering method based on space, time and semantics to fill this gap. One year of Twitter data, posted in Toronto, Canada, is used to test the clustering-based method. The

  14. a Three-Step Spatial-Temporal Clustering Method for Human Activity Pattern Analysis

    Science.gov (United States)

    Huang, W.; Li, S.; Xu, S.

    2016-06-01

    How people move in cities and what they do in various locations at different times form human activity patterns. Human activity patterns play a key role in urban planning, traffic forecasting, public health and safety, emergency response, friend recommendation, and so on. Therefore, scholars from different fields, such as social science, geography, transportation, physics and computer science, have made great efforts in modelling and analysing human activity patterns or human mobility patterns. One of the essential tasks in such studies is to find the locations or places where individuals stay to perform some kind of activity before further activity pattern analysis. In the era of Big Data, the emergence of social media along with wearable devices enables human activity data to be collected more easily and efficiently. Furthermore, the dimension of the accessible human activity data has been extended from two or three (space or space-time) to four dimensions (space, time and semantics). More specifically, not only the location and time where people stay and spend time are collected, but also what people "say" in a location at a time can be obtained. The characteristics of these datasets shed new light on the analysis of human mobility, and some new methodologies should accordingly be developed to handle them. Traditional methods such as neural networks, statistics and clustering have been applied to study human activity patterns using geosocial media data. Among them, clustering methods have been widely used to analyse spatiotemporal patterns. However, to the best of our knowledge, few clustering algorithms are specifically developed for handling datasets that contain spatial, temporal and semantic aspects all together. In this work, we propose a three-step human activity clustering method based on space, time and semantics to fill this gap. One year of Twitter data, posted in Toronto, Canada, is used to test the clustering-based method. The results show that the

  15. Automatic Testcase Generation Method Based on PSOABC and K-means Clustering Algorithm

    Institute of Scientific and Technical Information of China (English)

    贾冀婷

    2015-01-01

    Improving the automation of test case generation in software testing is very important for guaranteeing software quality and reducing development cost. In this paper, we propose an automatic test case generation method based on a hybrid of the K-means clustering algorithm, particle swarm optimization and the artificial bee colony algorithm, and carry out simulation experiments. The results show that, compared with automatic test case generation based on the basic particle swarm optimization or genetic algorithms, the method based on the improved algorithm generates test cases more efficiently and has stronger convergence ability.

  16. An Analysis on Density Based Clustering of Multi Dimensional Spatial Data

    Directory of Open Access Journals (Sweden)

    K. Mumtaz

    2010-06-01

    Full Text Available Mining knowledge from large amounts of spatial data is known as spatial data mining. It has become a highly demanding field because huge amounts of spatial data have been collected in various applications, ranging from geo-spatial data to bio-medical knowledge. The amount of spatial data being collected is increasing exponentially and far exceeds humans' ability to analyze it. Recently, clustering has been recognized as a primary data mining method for knowledge discovery in spatial databases. The development of clustering algorithms has received a lot of attention in the last few years and new clustering algorithms have been proposed. DBSCAN is a pioneering density-based clustering algorithm. It can find clusters of different shapes and sizes in large amounts of data containing noise and outliers. This paper shows the results of analyzing the density-based clustering characteristics of three clustering algorithms, namely DBSCAN, k-means and SOM, using synthetic two-dimensional spatial data sets.
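
    A quick illustration of the density-based versus centroid-based contrast discussed above, on two-dimensional synthetic data; SOM is omitted because it is not part of scikit-learn, and the dataset and parameters are illustrative rather than those used in the paper.

        # DBSCAN vs. k-means on non-convex synthetic 2D clusters.
        from sklearn.datasets import make_moons
        from sklearn.cluster import DBSCAN, KMeans
        from sklearn.metrics import adjusted_rand_score

        X, y = make_moons(n_samples=600, noise=0.07, random_state=0)   # two non-convex clusters

        db = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)
        km = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

        print("DBSCAN ARI :", adjusted_rand_score(y, db))   # close to 1: shapes recovered
        print("k-means ARI:", adjusted_rand_score(y, km))   # lower: convex partitions only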

  17. 3D Partition-Based Clustering for Supply Chain Data Management

    Science.gov (United States)

    Suhaibah, A.; Uznir, U.; Anton, F.; Mioc, D.; Rahman, A. A.

    2015-10-01

    Supply Chain Management (SCM) is the management of the flow of products and goods from their point of origin to the point of consumption. During the SCM process, the information and datasets gathered for this application are massive and complex, owing to its several processes, such as procurement, product development and commercialization, physical distribution, outsourcing and partnerships. For a practical application, SCM datasets need to be managed and maintained to serve its three main categories better: distributor, customer and supplier. To manage these datasets, a data constellation structure is used to accommodate the data in the spatial database. However, this situation creates a few problems in the geospatial database; for example, the performance of the database deteriorates, especially during query operations. We strongly believe that a more practical hierarchical tree structure is required for the efficient processing of SCM. Besides that, a three-dimensional approach is required for the management of SCM datasets, since they involve multi-level locations such as shop lots and residential apartments. The 3D R-Tree has been increasingly used for 3D geospatial database management due to its simplicity and extendibility. However, it suffers from serious overlaps between nodes. In this paper, we propose partition-based clustering for the construction of a hierarchical tree structure. Several datasets are tested using the proposed method, and the percentage of overlapping nodes and the volume coverage are computed and compared with the original 3D R-Tree and other practical approaches. The experiments presented in this paper substantiate that the hierarchical structure of the proposed partition-based clustering is capable of preserving minimal overlap and coverage. The query performance was tested using 300,000 points of an SCM dataset and the results are presented in this paper. The paper also discusses the outlook of the structure for future reference.

  18. Stepwise threshold clustering: a new method for genotyping MHC loci using next-generation sequencing technology.

    Directory of Open Access Journals (Sweden)

    William E Stutz

    Full Text Available Genes of the vertebrate major histocompatibility complex (MHC) are of great interest to biologists because of their important role in immunity and disease, and their extremely high levels of genetic diversity. Next generation sequencing (NGS) technologies are quickly becoming the method of choice for high-throughput genotyping of multi-locus templates like MHC in non-model organisms. Previous approaches to genotyping MHC genes using NGS technologies suffer from two problems: (1) a "gray zone" where low-frequency alleles and high-frequency artifacts can be difficult to disentangle, and (2) a similar-sequence problem, where very similar alleles can be difficult to distinguish as two distinct alleles. Here we present a new method for genotyping MHC loci, Stepwise Threshold Clustering (STC), that addresses these problems by taking full advantage of the increase in sequence data provided by NGS technologies. Unlike previous approaches for genotyping MHC with NGS data that attempt to classify individual sequences as alleles or artifacts, STC uses a quasi-Dirichlet clustering algorithm to cluster similar sequences at increasing levels of sequence similarity. By applying frequency- and similarity-based criteria to clusters rather than to individual sequences, STC is able to successfully identify clusters of sequences that correspond to individual or similar alleles present in the genomes of individual samples. Furthermore, STC does not require duplicate runs of all samples, increasing the number of samples that can be genotyped in a given project. We show how the STC method works using a single sample library. We then apply STC to 295 threespine stickleback (Gasterosteus aculeatus) samples from four populations and show that neighboring populations differ significantly in MHC allele pools. We show that STC is a reliable, accurate, efficient, and flexible method for genotyping MHC that will be of use to biologists interested in a variety of downstream applications.

  19. Mining Representative Subset Based on Fuzzy Clustering

    Institute of Scientific and Technical Information of China (English)

    ZHOU Hongfang; FENG Boqin; L(U) Lintao

    2007-01-01

    Two new concepts, fuzzy mutuality and average fuzzy entropy, are presented. Based on these concepts, a new algorithm, RSMA (representative subset mining algorithm), is proposed, which can abstract a representative subset from massive data. To accelerate the production of the representative subset, an improved algorithm, ARSMA (accelerated representative subset mining algorithm), is advanced, which combines forward and backward strategies. In this way, the performance of the algorithm is improved. Finally we run experiments on real datasets and evaluate the representative subset. The experiments show that the ARSMA algorithm outperforms the RandomPick algorithm in both effectiveness and efficiency.

  20. Structural variation from heterometallic cluster-based 1D chain to heterometallic tetranuclear cluster: Syntheses, structures and magnetic properties

    Science.gov (United States)

    Zhang, Shu-Hua; Zhao, Ru-Xia; Li, He-Ping; Ge, Cheng-Min; Li, Gui; Huang, Qiu-Ping; Zou, Hua-Hong

    2014-08-01

    Using the solvothermal method, we present the comparative preparation of {[Co3Na(dmaep)3(ehbd)(N3)3]·DMF}n (1) and [Co2Na2(hmbd)4(N3)2(DMF)2] (2), where Hehbd is 3-ethoxy-2-hydroxy-benzaldehyde, Hhmbd is 3-methoxy-2-hydroxy-benzaldehyde, and Hdmaep is 2-dimethylaminomethyl-6-ethoxy-phenol, which was synthesized by an in-situ reaction. Complexes 1 and 2 were characterized by elemental analysis, IR spectroscopy, and X-ray single-crystal diffraction. Complex 1 is a novel heterometallic cluster-based 1-D chain and 2 is a heterometallic tetranuclear cluster. The {Co3IINa} and {Co2IINa2} cores display dominant ferromagnetic interaction from the nature of the binding modes through μ1,1,1-N3- (end-on, EO).

  1. Clustering Based Feature Learning on Variable Stars

    CERN Document Server

    Mackenzie, Cristóbal; Protopapas, Pavlos

    2016-01-01

    The success of automatic classification of variable stars strongly depends on the lightcurve representation. Usually, lightcurves are represented as a vector of many statistical descriptors designed by astronomers, called features. These descriptors commonly demand significant computational power to calculate, require substantial research effort to develop and do not guarantee good performance on the final classification task. Today, lightcurve representation is not entirely automatic; algorithms that extract lightcurve features are designed by humans and must be manually tuned for every survey. The vast amounts of data that will be generated in future surveys like LSST mean astronomers must develop analysis pipelines that are both scalable and automated. Recently, substantial efforts have been made in the machine learning community to develop methods that dispense with expert-designed and manually tuned features in favor of features that are automatically learned from data. In this work we present what is, to our ...

  2. Efficiency of a Multi-Reference Coupled Cluster method

    CERN Document Server

    Giner, Emmanuel; Scemama, Anthony; Malrieu, Jean Paul

    2015-01-01

    The multi-reference Coupled Cluster method first proposed by Meller et al. (J. Chem. Phys. 1996) has been implemented and tested. Guess values of the amplitudes of the single and double excitations (the $\hat{T}$ operator) on top of the references are extracted from the knowledge of the coefficients of the Multi Reference Singles and Doubles Configuration Interaction (MRSDCI) matrix. The multiple parentage problem is solved by scaling these amplitudes on the interaction between the references and the Singles and Doubles. One then proceeds to a dressing of the MRSDCI matrix under the effect of the Triples and Quadruples, the coefficients of which are estimated from the action of $\hat{T}^2$. This dressing follows the logic of the intermediate effective Hamiltonian formalism. The dressed MRSDCI matrix is diagonalized and the process is iterated to convergence. The method is tested on a series of benchmark systems from Complete Active Spaces (CAS) involving 2 or 4 active electrons up to bond breakings. The...

  3. The Heterogeneous P-Median Problem for Categorization Based Clustering

    Science.gov (United States)

    Blanchard, Simon J.; Aloise, Daniel; DeSarbo, Wayne S.

    2012-01-01

    The p-median offers an alternative to centroid-based clustering algorithms for identifying unobserved categories. However, existing p-median formulations typically require data aggregation into a single proximity matrix, resulting in masked respondent heterogeneity. A proposed three-way formulation of the p-median problem explicitly considers…

  4. Frailty phenotypes in the elderly based on cluster analysis

    DEFF Research Database (Denmark)

    Dato, Serena; Montesanto, Alberto; Lagani, Vincenzo;

    2012-01-01

    genetic background on the frailty status is still questioned. We investigated the applicability of a cluster analysis approach based on specific geriatric parameters, previously set up and validated in a southern Italian population, to two large longitudinal Danish samples. In both cohorts, we identified...

  5. Cluster based parallel database management system for data intensive computing

    Institute of Scientific and Technical Information of China (English)

    Jianzhong LI; Wei ZHANG

    2009-01-01

    This paper describes a computer-cluster based parallel database management system (DBMS), InfiniteDB, developed by the authors. InfiniteDB aims at efficiently supporting data intensive computing in response to the rapid growth in database size and the need for high-performance analysis of massive databases. It can be efficiently executed in computing systems composed of thousands of computers, such as cloud computing systems. It supports the parallelisms of intra-query, inter-query, intra-operation, inter-operation and pipelining. It provides effective strategies for managing massive databases, including multiple data declustering methods, declustering-aware algorithms for relational operations and other database operations, and an adaptive query optimization method. It also provides the functions of parallel data warehousing and data mining, a coordinator-wrapper mechanism to support the integration of heterogeneous information resources on the Internet, and fault-tolerant and resilient infrastructures. It has been used in many applications and has proved quite effective for data intensive computing.

  6. Multilevel Analysis Methods for Partially Nested Cluster Randomized Trials

    Science.gov (United States)

    Sanders, Elizabeth A.

    2011-01-01

    This paper explores multilevel modeling approaches for 2-group randomized experiments in which a treatment condition involving clusters of individuals is compared to a control condition involving only ungrouped individuals, otherwise known as partially nested cluster randomized designs (PNCRTs). Strategies for comparing groups from a PNCRT in the…

  7. Health state evaluation of shield tunnel SHM using fuzzy cluster method

    Science.gov (United States)

    Zhou, Fa; Zhang, Wei; Sun, Ke; Shi, Bin

    2015-04-01

    Shield tunnel SHM is developing rapidly, but processing the massive monitoring data and grading tunnel health quantitatively remain a real challenge, since multiple sensors of different types are employed in an SHM system. This paper addresses a fuzzy cluster method based on the fuzzy equivalence relationship for the health evaluation of shield tunnel SHM. The method is optimized by exporting the FSV map to automatically generate the threshold value. A new holistic health score (HHS) is proposed and its effectiveness is validated by conducting a pilot test. A case study on the Nanjing Yangtze River Tunnel is presented to apply this method. Three types of indicators, namely soil pressure, pore pressure and steel strain, are used to develop the evaluation set U. The clustering results are verified by analyzing the engineering geological conditions; the applicability and validity of the proposed method are also demonstrated. Besides, the advantage of multi-factor evaluation over a single-factor model is discussed using the proposed HHS. This investigation indicates that the fuzzy cluster method and HHS are capable of characterizing the fuzziness of tunnel health and are beneficial for clarifying the uncertainties in tunnel health evaluation.
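
    The sketch below shows the textbook core of clustering by a fuzzy equivalence relationship: a fuzzy similarity matrix is made transitive by repeated max-min self-composition and then cut at a threshold lambda. The paper's indicator construction, FSV-based threshold selection and HHS are not reproduced; the matrix and lambda value are illustrative.

        # Fuzzy equivalence clustering: transitive closure by max-min composition, then a lambda-cut.
        import numpy as np

        def max_min_compose(R):
            # (R o R)[i, j] = max_k min(R[i, k], R[k, j])
            return np.max(np.minimum(R[:, :, None], R[None, :, :]), axis=1)

        def transitive_closure(R):
            while True:
                R2 = max_min_compose(R)
                if np.allclose(R2, R):
                    return R
                R = R2

        def lambda_cut_clusters(T, lam):
            labels, current = -np.ones(len(T), dtype=int), 0
            for i in range(len(T)):
                if labels[i] < 0:
                    labels[T[i] >= lam] = current       # whole equivalence class of i
                    current += 1
            return labels

        R = np.array([[1.0, 0.8, 0.3, 0.2],
                      [0.8, 1.0, 0.4, 0.1],
                      [0.3, 0.4, 1.0, 0.9],
                      [0.2, 0.1, 0.9, 1.0]])
        T = transitive_closure(R)
        print(lambda_cut_clusters(T, lam=0.7))          # two groups: {0, 1} and {2, 3}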

  8. Threshold selection for classification of MR brain images by clustering method

    Energy Technology Data Exchange (ETDEWEB)

    Moldovanu, Simona [Faculty of Sciences and Environment, Department of Chemistry, Physics and Environment, Dunărea de Jos University of Galaţi, 47 Domnească St., 800008, Romania, Phone: +40 236 460 780 (Romania); Dumitru Moţoc High School, 15 Milcov St., 800509, Galaţi (Romania); Obreja, Cristian; Moraru, Luminita, E-mail: luminita.moraru@ugal.ro [Faculty of Sciences and Environment, Department of Chemistry, Physics and Environment, Dunărea de Jos University of Galaţi, 47 Domnească St., 800008, Romania, Phone: +40 236 460 780 (Romania)

    2015-12-07

    Given a grey-intensity image, our method detects the optimal threshold for a suitable binarization of MR brain images. In MR brain image processing, the grey levels of pixels belonging to the object are not substantially different from the grey levels belonging to the background. Threshold optimization is an effective tool to separate objects from the background and, further, in classification applications. This paper gives a detailed investigation of the selection of thresholds. Our method does not use the well-known methods for binarization. Instead, we perform a simple threshold optimization which, in turn, allows the best classification of the analyzed images into healthy subjects and multiple sclerosis disease. The dissimilarity (or the distance between classes) has been established using a clustering method based on dendrograms. We tested our method using two classes of images: the first consists of 20 T2-weighted and 20 proton density (PD)-weighted scans from two healthy subjects and from two patients with multiple sclerosis. For each image and for each threshold, the number of white pixels (or the area of white objects in the binary image) has been determined. These pixel numbers represent the objects in the clustering operation. The following optimum threshold values are obtained: T = 80 for PD images and T = 30 for T2w images. Each threshold clearly separates the clusters belonging to the studied groups, healthy subjects and multiple sclerosis disease.
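
    A small sketch of the dendrogram-based grouping step described above: each image is binarized at a candidate threshold, the white-pixel count is used as the object description, and the counts are clustered hierarchically. The synthetic images, the threshold and the use of Ward linkage are illustrative, not the study's data or settings.

        # Binarize at a candidate threshold, cluster the white-pixel counts hierarchically.
        import numpy as np
        from scipy.cluster.hierarchy import linkage, fcluster

        rng = np.random.default_rng(0)
        healthy = rng.normal(90, 10, size=(20, 64, 64))      # toy stand-ins for MR slices
        patient = rng.normal(120, 10, size=(20, 64, 64))
        images, threshold = np.vstack([healthy, patient]), 105

        white_counts = (images > threshold).reshape(len(images), -1).sum(axis=1).astype(float)
        Z = linkage(white_counts[:, None], method="ward")
        groups = fcluster(Z, t=2, criterion="maxclust")
        print(groups)                                        # first 20 vs. last 20 should separate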

  9. Utility-guided Clustering-based Transaction Data Anonymization

    Directory of Open Access Journals (Sweden)

    Aris Gkoulalas-Divanis

    2012-04-01

    Full Text Available Transaction data about individuals are increasingly collected to support a plethora of applications, spanning from marketing to biomedical studies. Publishing these data is required by many organizations, but may result in privacy breaches if an attacker exploits potentially identifying information to link individuals to their records in the published data. Algorithms that prevent this threat by transforming transaction data prior to their release have been proposed recently, but they may incur significant utility loss due to their inability to: (i) accommodate a range of different privacy requirements that data owners often have, and (ii) guarantee that the produced data will satisfy data owners' utility requirements. To address this issue, we propose a novel clustering-based framework for anonymizing transaction data, which provides the basis for designing algorithms that better preserve data utility. Based on this framework, we develop two anonymization algorithms which explore a larger solution space than existing methods and can satisfy a wide range of privacy requirements. Additionally, the second algorithm allows the specification and enforcement of utility requirements, thereby ensuring that the anonymized data remain useful in the intended tasks. Experiments with both benchmark and real medical datasets verify that our algorithms significantly outperform the current state-of-the-art algorithms in terms of data utility, while being comparable in terms of efficiency.

  10. SEARCH PROFILES BASED ON USER TO CLUSTER SIMILARITY

    Directory of Open Access Journals (Sweden)

    Ilija Subasic

    2007-12-01

    Full Text Available Privacy of web users' query search logs has, since last year's AOL dataset release, been treated as one of the central issues concerning privacy on the Internet. Therefore, the question of privacy preservation has also attracted a lot of attention in the different communities surrounding search engines. This paper examines the use of clustering methods for providing low-level contextual search while retaining high privacy and utility. By using only the user's cluster membership, the search query terms need no longer be retained, reducing privacy concerns for both users and companies. The paper presents a lightweight framework for combining query words, user similarities and clustering in order to provide a meaningful way of mining user searches while protecting their privacy. This differs from previous privacy-preservation attempts, which anonymize the queries instead of the users.

  11. Comparative Studies of Various Clustering Techniques and Its Characteristics

    Directory of Open Access Journals (Sweden)

    M.Sathya Deepa

    2014-05-01

    Full Text Available Discovering knowledge from massive databases is the main objective of data mining, and clustering is one of its key techniques. A cluster is made up of a number of similar objects grouped together, and clustering is an unsupervised learning task. There are many methods for forming clusters; four important families are partitional clustering, hierarchical clustering, density-based clustering and grid-based clustering. In this paper, we discuss these four methods in detail.
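
    As a quick illustration, a hedged sketch of three of the four families on toy data using scikit-learn (grid-based methods such as STING have no standard scikit-learn implementation and are omitted); dataset sizes and parameters are arbitrary:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.7, random_state=0)

partitional = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
hierarchical = AgglomerativeClustering(n_clusters=3).fit_predict(X)
density_based = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)  # label -1 marks noise points

for name, labels in [("partitional", partitional),
                     ("hierarchical", hierarchical),
                     ("density-based", density_based)]:
    print(name, "->", len(set(labels) - {-1}), "clusters")
```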

  12. VANET Clustering Based Routing Protocol Suitable for Deserts.

    Science.gov (United States)

    Nasr, Mohammed Mohsen Mohammed; Abdelgader, Abdeldime Mohamed Salih; Wang, Zhi-Gong; Shen, Lian-Feng

    2016-01-01

    In recent years, applications of vehicular ad hoc networks (VANETs) have emerged in security, safety, rescue, exploration, military and communication-redundancy systems for non-populated areas, besides their ordinary use in urban environments as an essential part of intelligent transportation systems (ITS). This paper proposes a novel algorithm for organizing a cluster structure and for cluster head election (CHE) suitable for VANETs. Moreover, it presents a robust clustering-based routing protocol, which is appropriate for deserts and can achieve high communication efficiency, ensuring reliable information delivery and optimal exploitation of the equipment on each vehicle. A comprehensive simulation is conducted to evaluate the performance of the proposed CHE and routing algorithms. PMID:27058539

  13. Environment-based selection effects of Planck clusters

    CERN Document Server

    Kosyra, Ralf; Seitz, Stella; Mana, Annalisa; Rozo, Eduardo; Rykoff, Eli; Sanchez, Ariel; Bender, Ralf

    2015-01-01

    We investigate whether the large scale structure environment of galaxy clusters imprints a selection bias on Sunyaev Zel'dovich (SZ) catalogs. Such a selection effect might be caused by line of sight (LoS) structures that add to the SZ signal or contain point sources that disturb the signal extraction in the SZ survey. We use the Planck PSZ1 union catalog (Planck Collaboration et al. 2013a) in the SDSS region as our sample of SZ selected clusters. We calculate the angular two-point correlation function (2pcf) for physically correlated, foreground and background structure in the RedMaPPer SDSS DR8 catalog with respect to each cluster. We compare our results with an optically selected comparison cluster sample and with theoretical predictions. In contrast to the hypothesis of no environment-based selection, we find a mean 2pcf for background structures of -0.049 on scales of $\lesssim 40'$, significantly non-zero at $\sim 4 \sigma$, which means that Planck clusters are more likely to be detected in regions of...

  14. Information bottleneck based incremental fuzzy clustering for large biomedical data.

    Science.gov (United States)

    Liu, Yongli; Wan, Xing

    2016-08-01

    Incremental fuzzy clustering combines the advantages of fuzzy clustering and incremental clustering, and is therefore important for classifying large collections of biomedical literature. Conventional algorithms, suffering from data sparsity and high dimensionality, often fail to produce reasonable results and may even assign all the objects to a single cluster. In this paper, we propose two incremental algorithms based on the information bottleneck, Single-Pass fuzzy c-means (spFCM-IB) and Online fuzzy c-means (oFCM-IB). These two algorithms modify the conventional algorithms by considering different weights for each centroid and object and by scoring mutual information loss to measure the distance between centroids and objects. spFCM-IB and oFCM-IB are used to group a collection of biomedical text abstracts from the Medline database. Experimental results show that the clustering performance of our approaches is better than that of such prominent counterparts as spFCM, spHFCM, oFCM and oHFCM in terms of accuracy. PMID:27260783

  15. Problem decomposition by mutual information and force-based clustering

    Science.gov (United States)

    Otero, Richard Edward

    The scale of engineering problems has sharply increased over the last twenty years. Larger coupled systems, increasing complexity, and limited resources create a need for methods that automatically decompose problems into manageable sub-problems by discovering and leveraging problem structure. The ability to learn the coupling (inter-dependence) structure and reorganize the original problem could lead to large reductions in the time to analyze complex problems. Such decomposition methods could also provide engineering insight on the fundamental physics driving problem solution. This work forwards the current state of the art in engineering decomposition through the application of techniques originally developed within computer science and information theory. The work describes the current state of automatic problem decomposition in engineering and utilizes several promising ideas to advance the state of the practice. Mutual information is a novel metric for data dependence and works on both continuous and discrete data. Mutual information can measure both the linear and non-linear dependence between variables without the limitations of linear dependence measured through covariance. Mutual information is also able to handle data that does not have derivative information, unlike other metrics that require it. The value of mutual information to engineering design work is demonstrated on a planetary entry problem. This study utilizes a novel tool developed in this work for planetary entry system synthesis. A graphical method, force-based clustering, is used to discover related sub-graph structure as a function of problem structure and links ranked by their mutual information. This method does not require the stochastic use of neural networks and could be used with any link ranking method currently utilized in the field. Application of this method is demonstrated on a large, coupled low-thrust trajectory problem. Mutual information also serves as the basis for an
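
    One way to see the point about non-linear dependence is a small histogram-based mutual information estimate: the quadratic relation below has near-zero linear correlation but clearly non-zero mutual information. This is a generic illustration, not the tool developed in this work; the bin count is arbitrary.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(1)
x = rng.normal(size=5000)
y = x**2 + 0.1 * rng.normal(size=5000)   # non-linear dependence; linear correlation is ~0

def histogram_mi(a, b, bins=32):
    """Estimate I(a;b) in nats by discretizing both variables into histogram bins."""
    a_d = np.digitize(a, np.histogram_bin_edges(a, bins))
    b_d = np.digitize(b, np.histogram_bin_edges(b, bins))
    return mutual_info_score(a_d, b_d)

print("correlation coefficient:", round(float(np.corrcoef(x, y)[0, 1]), 3))
print("mutual information (nats):", round(histogram_mi(x, y), 3))
```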

  16. Design of the optimum insulator gate bipolar transistor using response surface method with cluster analysis

    CERN Document Server

    Wang, Chi Ling; Huang Sy Ruen; Yeh Chao Yu

    2004-01-01

    In this paper, a statistical methodology that can be used for the optimization of Insulated Gate Bipolar Transistor (IGBT) devices is proposed. This is achieved by integrating the response surface method (RSM) with cluster analysis, a weighted composite method and a genetic algorithm (GA). The device characteristics of the IGBT were simulated using the fabrication simulator ATHENA and the device simulator ATLAS. This methodology yielded another way to investigate the IGBT device and to make a decision in the tradeoff between the breakdown voltage and the on-resistance. In this methodology, we also show how to use cluster analysis to determine the dominant factors that are not visible in the screening of all experiments. 20 Refs.

  17. Performance Comparison of Cluster based and Threshold based Algorithms for Detection and Prevention of Cooperative Black Hole Attack in MANETs

    Directory of Open Access Journals (Sweden)

    P. S. Hiremath

    2014-11-01

    Full Text Available In mobile ad-hoc networks (MANETs), the movement of the nodes may quickly change the network topology, resulting in an increase of overhead messages for topology maintenance. The nodes communicate with each other by exchanging hello packets and constructing a neighbor list at each node. MANETs are vulnerable to attacks such as the black hole attack, gray hole attack, worm hole attack and sybil attack. A black hole attack has a serious impact on routing, packet delivery ratio, throughput, and end-to-end delay of packets. In this paper, the performance of clustering-based and threshold-based algorithms for detection and prevention of cooperative black hole attacks in MANETs is compared. In this study every node is monitored by its own cluster head (CH), while the server (SV) monitors the entire network by a channel-overhearing method. The server computes the trust value based on the sent and received packet counts of the receiver node. The scheme is implemented using the AODV routing protocol in NS2 simulations. The results are obtained by comparing the performance of the clustering-based and threshold-based methods while varying the concentration of black hole nodes, and are analyzed in terms of throughput, packet delivery ratio and end-to-end delay. The results demonstrate that the threshold-based method outperforms the clustering-based method in terms of throughput, packet delivery ratio and end-to-end delay.
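
    A minimal sketch of the threshold-based trust check described in the abstract, assuming the server already knows per-node sent/forwarded packet counts; the 0.2 threshold and the function names are illustrative, not the paper's implementation:

```python
def trust_value(packets_sent_to_node, packets_forwarded_by_node):
    """Ratio of packets a node forwarded to packets it was asked to forward."""
    if packets_sent_to_node == 0:
        return 1.0
    return packets_forwarded_by_node / packets_sent_to_node

def is_black_hole(sent, forwarded, threshold=0.2):
    # A node that silently drops most packets falls below the trust threshold.
    return trust_value(sent, forwarded) < threshold

print(is_black_hole(sent=100, forwarded=95))  # False: well-behaved node
print(is_black_hole(sent=100, forwarded=3))   # True: likely cooperative black hole member
```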

  18. Management of Energy Consumption on Cluster Based Routing Protocol for MANET

    Science.gov (United States)

    Hosseini-Seno, Seyed-Amin; Wan, Tat-Chee; Budiarto, Rahmat; Yamada, Masashi

    The usage of light-weight mobile devices is increasing rapidly, leading to demand for more telecommunication services. Consequently, mobile ad hoc networks and their applications have become feasible with the proliferation of light-weight mobile devices. Many protocols have been developed to handle service discovery and routing in ad hoc networks. However, the majority of them do not consider one critical aspect of this type of network, which is the limited available energy at each node. The Cluster Based Routing Protocol (CBRP) is a robust, scalable routing protocol for Mobile Ad hoc Networks (MANETs) and is superior to existing protocols such as Ad hoc On-demand Distance Vector (AODV) in terms of throughput and overhead. Therefore, based on this strength, methods to increase the efficiency of energy usage are incorporated into CBRP in this work. In order to increase the stability (in terms of lifetime) of the network and to decrease the energy consumption of inter-cluster gateway nodes, an Enhanced Gateway Cluster Based Routing Protocol (EGCBRP) is proposed. Three methods have been introduced by EGCBRP as enhancements to CBRP: basing the election of cluster heads (CHs) on the maximum available energy level, implementing load balancing for inter-cluster traffic using multiple gateways, and implementing a sleep state for gateway nodes to further save energy. Furthermore, we propose an Energy Efficient Cluster Based Routing Protocol (EECBRP) which extends the EGCBRP sleep state concept to all idle member nodes, excluding the active nodes in all clusters. The experiment results show that the EGCBRP decreases the overall energy consumption of the gateway nodes by up to 10% and the EECBRP reduces the energy consumption of the member nodes by up to 60%, both of which in turn contribute to stabilizing the network.
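
    The EGCBRP cluster-head election step can be illustrated in a few lines, assuming each member node reports its residual energy; the node records and energy values below are made up:

```python
from dataclasses import dataclass

@dataclass
class Node:
    node_id: int
    residual_energy: float  # energy remaining, in arbitrary units (illustrative values)

def elect_cluster_head(cluster_members):
    """EGCBRP-style election: the member with the highest residual energy becomes CH."""
    return max(cluster_members, key=lambda n: n.residual_energy)

cluster = [Node(1, 0.42), Node(2, 0.87), Node(3, 0.55)]
print("elected CH:", elect_cluster_head(cluster).node_id)  # -> 2
```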

  19. Public Opinion Assessment about Urban Public Security Based on Entropy Weight-Deviation and Clustering Method

    Institute of Scientific and Technical Information of China (English)

    张庆民; 王海燕; 吴春梅; 吴士亮

    2012-01-01

    Taking the Tianya community forum as an example, public opinion about urban public security was evaluated in order to promote the construction of safe cities and communities. Based on a community security index system, security opinion data for five big cities (Beijing, Shanghai, Tianjin, Chongqing and Guangzhou) were mined from January 2010 to May 2012 using the LocoySpider crawler. The weights of the security opinion indicators were determined by the entropy weight method and the maximizing-deviations method, and a multi-attribute decision model was built. Urban public security opinion was then evaluated using Ward hierarchical clustering. The results show that the five cities rank as follows by urban public security opinion index: Shanghai, Chongqing, Beijing, Guangzhou, and Tianjin. Obvious differences exist among the urban public security opinion indexes of the five cities.
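
    A small numpy sketch of the entropy weight method used above to weight the opinion indicators; the indicator matrix is random and merely stands in for the mined city data, and the maximizing-deviations step and Ward clustering are omitted:

```python
import numpy as np

def entropy_weights(X):
    """Entropy weight method: columns are indicators, rows are alternatives (cities).
    Assumes strictly positive indicator values."""
    X = np.asarray(X, dtype=float)
    P = X / X.sum(axis=0, keepdims=True)      # column-wise proportions
    k = 1.0 / np.log(X.shape[0])
    e = -k * (P * np.log(P)).sum(axis=0)      # entropy of each indicator
    d = 1.0 - e                               # degree of diversification
    return d / d.sum()                        # normalized weights

rng = np.random.default_rng(0)
scores = rng.uniform(1, 10, size=(5, 4))      # 5 cities x 4 opinion indicators (illustrative)
print(entropy_weights(scores).round(3))
```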

  20. Communication style and exercise compliance in physiotherapy (CONNECT. A cluster randomized controlled trial to test a theory-based intervention to increase chronic low back pain patients’ adherence to physiotherapists’ recommendations: study rationale, design, and methods

    Directory of Open Access Journals (Sweden)

    Lonsdale Chris

    2012-06-01

    Full Text Available Abstract Background Physical activity and exercise therapy are among the accepted clinical rehabilitation guidelines and are recommended self-management strategies for chronic low back pain. However, many back pain sufferers do not adhere to their physiotherapist's recommendations. Poor patient adherence may decrease the effectiveness of advice and home-based rehabilitation exercises. According to self-determination theory, support from health care practitioners can promote patients' autonomous motivation and greater long-term behavioral persistence (e.g., adherence to physiotherapists' recommendations). The aim of this trial is to assess the effect of an intervention designed to increase physiotherapists' autonomy-supportive communication on low back pain patients' adherence to physical activity and exercise therapy recommendations. Methods/Design This study will be a single-blinded cluster randomized controlled trial. Outpatient physiotherapy centers (N = 12) in Dublin, Ireland (population = 1.25 million) will be randomly assigned using a computer-generated algorithm to either the experimental or control arm. Physiotherapists in the experimental arm (two hospitals and four primary care clinics) will attend eight hours of communication skills training. Training will include handouts, workbooks, video examples, role-play, and discussion designed to teach physiotherapists how to communicate in a manner that promotes autonomous patient motivation. Physiotherapists in the waitlist control arm (two hospitals and four primary care clinics) will not receive this training. Participants (N = 292) with chronic low back pain will complete assessments at baseline, as well as 1 week, 4 weeks, 12 weeks, and 24 weeks after their first physiotherapy appointment. Primary outcomes will include adherence to physiotherapy recommendations, as well as low back pain, function, and well-being. Participants will be blinded to treatment allocation, as

  1. A RM-Based Static Deployment System for Cluster

    OpenAIRE

    Weiguo Wu; Leiqiang Zhang; Lei Wang; Zhenghua Xue

    2009-01-01

    In this paper, a Reliable-Multicast (RM) based static deployment system for clusters is designed, which can be functionally divided into three parts: image capture, image transference, and node configuration. Image capture takes a snapshot of the operating system on the source node. Image transference distributes the captured image to target nodes. Node configuration mainly finishes the configuration of certain identification information on the target nodes. Applying the technology of Image-Based Install...

  2. A Review of Density-Based clustering in Spatial Data

    Directory of Open Access Journals (Sweden)

    Pragati Shrivastava

    2012-09-01

    Full Text Available Data mining is a non-trivial process of identifying novel, valid and potentially useful patterns in data. Data mining supports automatic data exploration, that is, extracting hidden information from huge databases; it refers to searching for useful and relevant information in a database. Spatial mining is a branch of data mining that deals with location or geo-referenced data. Spatial mining approaches are often based on density-based clustering, where density refers to the area covered by the data.

  3. Customer-Classified Algorithm Based on Fuzzy Clustering Analysis

    Institute of Scientific and Technical Information of China (English)

    郭蕴华; 祖巧红; 陈定方

    2004-01-01

    A customer-classified evaluation system is described with a customization-supporting tree of evaluation indexes, in which users can determine any evaluation index independently. Based on this system, a customer-classified algorithm based on fuzzy clustering analysis is proposed to implement customer-classified management. A numerical example is presented, which provides correct results, indicating that the algorithm can be used in the decision support system of CRM.

  4. Approximate K-Nearest Neighbour Based Spatial Clustering Using K-D Tree

    Directory of Open Access Journals (Sweden)

    Mohammed Otair

    2013-03-01

    Full Text Available Different spatial objects that vary in their characteristics, such as in molecular biology and geography, are presented in spatial areas. Methods to organize, manage, and maintain those objects in a structured manner are required. Data mining provides different techniques to meet these requirements. Among the major tasks of data mining, the most widely used is clustering. Data within the same cluster share common features that give each cluster its characteristics. In this paper, an implementation of an approximate kNN-based spatial clustering algorithm using the k-d tree is proposed. The major contribution of this research is the use of the k-d tree data structure for spatial clustering and a comparison of its performance to the brute-force approach. The results of the work performed in this paper revealed better performance using the k-d tree, compared to the traditional brute-force approach.
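
    A hedged sketch of the comparison the paper reports, querying the k nearest neighbours with a k-d tree (scipy's cKDTree) and with a brute-force distance computation; point counts are arbitrary and this is not the authors' code:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
points = rng.uniform(0, 100, size=(20000, 2))   # spatial objects (x, y)
query = rng.uniform(0, 100, size=(100, 2))

# k-d tree: build once, then answer k-nearest-neighbour queries efficiently.
tree = cKDTree(points)
dist_kd, idx_kd = tree.query(query, k=5)

# Brute force: full pairwise distances, for comparison.
diffs = query[:, None, :] - points[None, :, :]
d2 = (diffs ** 2).sum(axis=-1)
idx_bf = np.argsort(d2, axis=1)[:, :5]

# Both approaches return the same neighbours (barring exact distance ties).
print(np.array_equal(np.sort(idx_kd, axis=1), np.sort(idx_bf, axis=1)))
```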

  5. Cluster-based reduced-order modelling of a mixing layer

    CERN Document Server

    Kaiser, Eurika; Cordier, Laurent; Spohn, Andreas; Segond, Marc; Abel, Markus; Daviller, Guillaume; Niven, Robert K

    2013-01-01

    We propose a novel cluster-based reduced-order modelling (CROM) strategy for unsteady flows. CROM builds on the pioneering works of Gunzburger's group in cluster analysis (Burkardt et al. 2006) and Eckhardt's group in transition matrix models (Schneider et al. 2007) and constitutes a potential alternative to POD models. This strategy processes a time-resolved sequence of flow snapshots in two steps. First, the snapshot data is clustered into a small number of representative states, called centroids, in the state space. These centroids partition the state space into complementary non-overlapping regions (centroidal Voronoi cells). Departing from the standard algorithm, the probabilities of the clusters are determined, and the states are sorted by transition-matrix considerations. Secondly, the transitions between the states are dynamically modelled via a Markov process. Physical mechanisms are then distilled by a refined analysis of the Markov process, e.g. with the finite-time Lyapunov exponent and entropic methods...
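
    The two CROM steps can be sketched compactly: cluster the snapshots into centroids, then estimate the cluster-to-cluster Markov transition matrix from the time-ordered labels. The "snapshots" below are random vectors standing in for flow data, and k-means is used as a generic stand-in for the centroidal Voronoi step:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
snapshots = rng.normal(size=(500, 20))        # 500 time-ordered flow snapshots (toy data)

# Step 1: partition the state space into k centroidal regions.
k = 5
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(snapshots)

# Step 2: model transitions between clusters as a Markov process.
P = np.zeros((k, k))
for a, b in zip(labels[:-1], labels[1:]):
    P[a, b] += 1
P /= np.maximum(P.sum(axis=1, keepdims=True), 1)   # row-stochastic transition matrix

print(P.round(2))
```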

  6. Recognition of Spontaneous Combustion in Coal Mines Based on Genetic Clustering

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

    Spontaneous combustion is one of the greatest disasters in coal mines. Early recognition is important because it may be a potential inducement for other coal-mine accidents. However, early recognition is difficult because of the complexity of different coal mines. Fuzzy clustering has been proposed to incorporate the uncertainty of spontaneous combustion in coal mines, as it can give a clear degree of classification of combustion. Because FCM clustering tends to become trapped in local minima, a new approach to fuzzy c-means clustering based on a genetic algorithm is proposed. A genetic algorithm is capable of locating optimal or near-optimal solutions to difficult problems and can be applied in many fields without first obtaining detailed knowledge about correlations. It is helpful in improving the effectiveness of fuzzy clustering in detecting spontaneous combustion. The effectiveness of the method is demonstrated by means of an experiment.

  7. The Cluster Variation Method: A Primer for Neuroscientists

    Directory of Open Access Journals (Sweden)

    Alianna J. Maren

    2016-09-01

    Full Text Available Effective Brain–Computer Interfaces (BCIs) require that the time-varying activation patterns of 2-D neural ensembles be modelled. The cluster variation method (CVM) offers a means for the characterization of 2-D local pattern distributions. This paper provides neuroscientists and BCI researchers with a CVM tutorial that will help them to understand how the CVM statistical thermodynamics formulation can model 2-D pattern distributions expressing structural and functional dynamics in the brain. The premise is that local-in-time free energy minimization works alongside neural connectivity adaptation, supporting the development and stabilization of consistent stimulus-specific responsive activation patterns. The equilibrium distribution of local patterns, or configuration variables, is defined in terms of a single interaction enthalpy parameter (h) for the case of an equiprobable distribution of bistate (neural/neural ensemble) units. Thus, either one enthalpy parameter (or two, for the case of a non-equiprobable distribution) yields equilibrium configuration variable values. Modeling 2-D neural activation distribution patterns with the representational layer of a computational engine, we can thus correlate variational free energy minimization with specific configuration variable distributions. The CVM triplet configuration variables also map well to the notion of a M = 3 functional motif. This paper addresses the special case of an equiprobable unit distribution, for which an analytic solution can be found.

  8. A new sparse Bayesian learning method for inverse synthetic aperture radar imaging via exploiting cluster patterns

    Science.gov (United States)

    Fang, Jun; Zhang, Lizao; Duan, Huiping; Huang, Lei; Li, Hongbin

    2016-05-01

    The application of sparse representation to SAR/ISAR imaging has attracted much attention over the past few years. This new class of sparse representation based imaging methods presents a number of unique advantages over conventional range-Doppler methods; the basic idea behind these works is to formulate SAR/ISAR imaging as a sparse signal recovery problem. In this paper, we propose a new two-dimensional pattern-coupled sparse Bayesian learning (SBL) method to capture the underlying cluster patterns of ISAR target images. Based on this model, an expectation-maximization (EM) algorithm is developed to infer the maximum a posteriori (MAP) estimate of the hyperparameters, along with the posterior distribution of the sparse signal. Experimental results demonstrate that the proposed method is able to achieve a substantial performance improvement over existing algorithms, including the conventional SBL method.

  9. Search for global-minimum geometries of medium-sized germanium clusters. II. Motif-based low-lying clusters Ge21-Ge29

    Science.gov (United States)

    Yoo, S.; Zeng, X. C.

    2006-05-01

    We performed a constrained search for the geometries of low-lying neutral germanium clusters GeN in the size range of 21⩽N⩽29. The basin-hopping global optimization method is employed for the search. The potential-energy surface is computed based on the plane-wave pseudopotential density functional theory. A new series of low-lying clusters is found on the basis of several generic structural motifs identified previously for silicon clusters [S. Yoo and X. C. Zeng, J. Chem. Phys. 124, 054304 (2006)] as well as for smaller-sized germanium clusters [S. Bulusu et al., J. Chem. Phys. 122, 164305 (2005)]. Among the generic motifs examined, we found that two motifs stand out in producing most low-lying clusters, namely, the six/nine motif, a puckered-hexagonal-ring Ge6 unit attached to a tricapped trigonal prism Ge9, and the six/ten motif, a puckered-hexagonal-ring Ge6 unit attached to a bicapped antiprism Ge10. The low-lying clusters obtained are all prolate in shape and their energies are appreciably lower than those of the near-spherical low-energy clusters. This result is consistent with ion-mobility measurements, in which the medium-sized germanium clusters detected are all prolate in shape up to a size of N ∼ 65.

  10. Comparisons of Graph-structure Clustering Methods for Gene Expression Data

    Institute of Scientific and Technical Information of China (English)

    Zhuo FANG; Lei LIU; Jiong YANG; Qing-Ming LUO; Yi-Xue LI

    2006-01-01

    Although many numerical clustering algorithms have been applied to gene expression data analysis, the essential step is still biological interpretation by manual inspection. The correlation between genetic co-regulation and affiliation to a common biological process is what biologists expect. Here, we introduce some clustering algorithms that are based on a graph structure constituted by biological knowledge. Applying them to a widely used dataset, we compared the resulting clusters of two of these algorithms in terms of cluster homogeneity, coherence of annotation and matching ratio. The results show that the clusters of the knowledge-guided analysis are the kernel parts of the clusters of the Gene Ontology (GO)-Cluster software, containing the genes that are most correlated in expression and most consistent with biological functions. Moreover, knowledge-guided analysis seems much more applicable than GO-Cluster to larger datasets.

  11. Selections of data preprocessing methods and similarity metrics for gene cluster analysis

    Institute of Scientific and Technical Information of China (English)

    YANG Chunmei; WAN Baikun; GAO Xiaofeng

    2006-01-01

    Clustering is one of the major exploratory techniques for gene expression data analysis. Only with suitable similarity metrics, and when datasets are properly preprocessed, can results of high quality be obtained in cluster analysis. In this study, gene expression datasets with external evaluation criteria were preprocessed by normalization by line, normalization by column or logarithm transformation to base 2, and were subsequently clustered by hierarchical clustering, k-means clustering and self-organizing maps (SOMs) with the Pearson correlation coefficient or Euclidean distance as the similarity metric. Finally, the quality of the clusters was evaluated by the adjusted Rand index. The results illustrate that k-means clustering and SOMs have distinct advantages over hierarchical clustering in gene clustering, and that SOMs are a bit better than k-means when randomly initialized. It also shows that hierarchical clustering prefers the Pearson correlation coefficient as the similarity metric and datasets normalized by line, whereas k-means clustering and SOMs produce better clusters with Euclidean distance and logarithm-transformed datasets. These results provide a valuable reference for the implementation of gene expression cluster analysis.
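
    A hedged sketch of the evaluation loop described above, on synthetic data: normalize by line, cluster with k-means (Euclidean) and with average-linkage hierarchical clustering on Pearson correlation distance, then score both with the adjusted Rand index. It assumes a recent scikit-learn (where the `metric=` parameter replaced `affinity=`) and is not the study's code:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

X, truth = make_blobs(n_samples=200, n_features=30, centers=4, random_state=0)

# "Normalization by line": zero mean, unit variance per gene (row).
Xn = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(Xn)

# Pearson correlation distance = 1 - r, fed to average-linkage hierarchical clustering.
corr_dist = 1.0 - np.corrcoef(Xn)
hc = AgglomerativeClustering(n_clusters=4, metric="precomputed",
                             linkage="average").fit_predict(corr_dist)

print("k-means ARI:     ", round(adjusted_rand_score(truth, km), 3))
print("hierarchical ARI:", round(adjusted_rand_score(truth, hc), 3))
```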

  12. A Virtual Router Cluster System Based on the Separation of the Control Plane and the Data Plane

    Institute of Scientific and Technical Information of China (English)

    2012-01-01

    This paper proposes a virtual router cluster system based on the separation of the control plane and the data plane, and discusses it from multiple perspectives, such as architecture, key technologies, scenarios and standardization. To some extent, the cluster simplifies network topology and management, achieves automatic configuration, and saves IP addresses, offering a low-cost way of expanding the port density of aggregation equipment.

  13. Risk Assessment for Bridges Safety Management during Operation Based on Fuzzy Clustering Algorithm

    Directory of Open Access Journals (Sweden)

    Xia Hanyu

    2016-01-01

    Full Text Available In recent years, as long-span and large sea-crossing bridges have been built, bridge accidents caused by improper operational management have occurred frequently. In order to explore better methods for risk assessment in bridge operation departments, a method based on a fuzzy clustering algorithm is selected. The implementation steps of the fuzzy clustering algorithm are described, a risk evaluation system is built, and Taizhou Bridge is selected as an example to illustrate the quantification of risk factors. After that, the clustering algorithm based on fuzzy equivalence is computed in MATLAB 2010a. Finally, the Taizhou Bridge operation management departments are classified and ranked according to their degree of risk, and the safety situation of the operation departments is analyzed.

  14. Unsupervised active learning based on hierarchical graph-theoretic clustering.

    Science.gov (United States)

    Hu, Weiming; Hu, Wei; Xie, Nianhua; Maybank, Steve

    2009-10-01

    Most existing active learning approaches are supervised. Supervised active learning has the following problems: inefficiency in dealing with the semantic gap between the distribution of samples in the feature space and their labels, lack of ability in selecting new samples that belong to new categories that have not yet appeared in the training samples, and lack of adaptability to changes in the semantic interpretation of sample categories. To tackle these problems, we propose an unsupervised active learning framework based on hierarchical graph-theoretic clustering. In the framework, two promising graph-theoretic clustering algorithms, namely, dominant-set clustering and spectral clustering, are combined in a hierarchical fashion. Our framework has some advantages, such as ease of implementation, flexibility in architecture, and adaptability to changes in the labeling. Evaluations on data sets for network intrusion detection, image classification, and video classification have demonstrated that our active learning framework can effectively reduce the workload of manual classification while maintaining a high accuracy of automatic classification. It is shown that, overall, our framework outperforms the support-vector-machine-based supervised active learning, particularly in terms of dealing much more efficiently with new samples whose categories have not yet appeared in the training samples. PMID:19336318

  15. Identification of essential proteins based on edge clustering coefficient.

    Science.gov (United States)

    Wang, Jianxin; Li, Min; Wang, Huan; Pan, Yi

    2012-01-01

    Identification of essential proteins is key to understanding the minimal requirements for cellular life and important for drug design. The rapid increase of available protein-protein interaction (PPI) data has made it possible to detect protein essentiality at the network level. A series of centrality measures have been proposed to discover essential proteins based on network topology. However, most of them tend to focus only on the location of a single protein, ignoring the relevance between interactions and protein essentiality. In this paper, a new centrality measure for identifying essential proteins based on the edge clustering coefficient, named NC, is proposed. Different from previous centrality measures, NC considers both the centrality of a node and the relationship between it and its neighbors. For each interaction in the network, we calculate its edge clustering coefficient. A node's essentiality is determined by the sum of the edge clustering coefficients of the interactions connecting it to its neighbors. The new centrality measure NC takes into account the modular nature of protein essentiality. NC is applied to three different types of yeast protein-protein interaction networks, obtained from the DIP database, the MIPS database and the BioGRID database, respectively. The experimental results on the three networks show that the number of essential proteins discovered by NC universally exceeds that discovered by six other centrality measures: DC, BC, CC, SC, EC, and IC. Moreover, the essential proteins discovered by NC show a significant cluster effect. PMID:22084147
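
    A small networkx sketch of the NC measure as the abstract describes it; the exact edge-clustering-coefficient variant (common neighbours over min(deg-1)) is an assumption about the formula, and the karate-club graph merely stands in for a PPI network:

```python
import networkx as nx

def edge_clustering_coefficient(G, u, v):
    """Common neighbours of (u, v) divided by the maximum possible number of triangles
    on that edge, min(deg(u)-1, deg(v)-1). One common variant; the exact form is assumed."""
    common = len(list(nx.common_neighbors(G, u, v)))
    denom = min(G.degree(u) - 1, G.degree(v) - 1)
    return common / denom if denom > 0 else 0.0

def nc_centrality(G):
    """NC(v) = sum of edge clustering coefficients over the edges incident to v."""
    return {v: sum(edge_clustering_coefficient(G, v, w) for w in G.neighbors(v))
            for v in G}

G = nx.karate_club_graph()   # toy stand-in for a protein-protein interaction network
top = sorted(nc_centrality(G).items(), key=lambda kv: -kv[1])[:5]
print(top)                   # the five highest-NC nodes
```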

  16. HYPONYMY EXTRACTION OF DOMAIN ONTOLOGY CONCEPT BASED ON CCRFS AND HIERARCHY CLUSTERING

    Directory of Open Access Journals (Sweden)

    Qiang Zhan

    2015-07-01

    Full Text Available Concept hierarchy is the backbone of an ontology, and concept hierarchy acquisition has been a hot topic in the field of ontology learning. This paper proposes a hyponymy extraction method for domain ontology concepts based on cascaded conditional random fields (CCRFs) and hierarchical clustering. It takes free text as the extraction object and adopts CCRFs to identify domain concepts. First, the lower layer of the CCRFs is used to identify simple domain concepts; the results are then sent to the higher layer, in which nested concepts are recognized. Next, we adopt hierarchical clustering to identify hyponymy relations between domain ontology concepts. The experimental results demonstrate that the proposed method is effective.

  17. Communities recognition in the Chesapeake Bay ecosystem by dynamical clustering algorithms based on different oscillators systems

    CERN Document Server

    Pluchino, Alessandro; Latora, Vito

    2008-01-01

    We have recently introduced an efficient method for the detection and identification of modules in complex networks, based on the de-synchronization properties (dynamical clustering) of phase oscillators. In this paper we apply the dynamical clustering technique to the identification of communities of marine organisms living in the Chesapeake Bay food web. We show that our algorithm is able to perform a very reliable classification of the real communities existing in this ecosystem by using different kinds of dynamical oscillators. We also compare our results with those of other methods for the detection of community structures in complex networks.

  18. Clustering as an EDA Method: The Case of Pedestrian Directional Flow Behavior

    Directory of Open Access Journals (Sweden)

    Ma. Regina E. Estuar

    2010-01-01

    Full Text Available Given data on pedestrian trajectories in NTXY format, three clustering methods, K-means, Expectation Maximization (EM) and Affinity Propagation, were utilized as exploratory data analysis (EDA) to find the pattern of pedestrian directional flow behavior. The analysis begins without any prior notion of the structure of the pattern and consequently infers the structure of the directional flow pattern. Significant similarities in the patterns for both individual and instantaneous walking angles found with the EDA methods are reported and explained in case studies.
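
    A hedged sketch of the three clusterings applied to synthetic walking angles (embedded on the unit circle so that angular wrap-around does not split a stream); the data are made up and the parameters are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans, AffinityPropagation
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic walking angles (radians): two opposing pedestrian streams plus noise.
angles = np.concatenate([rng.normal(0.0, 0.2, 300), rng.normal(np.pi, 0.2, 300)])
X = np.column_stack([np.cos(angles), np.sin(angles)])   # embed angles on the unit circle

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
em_labels = GaussianMixture(n_components=2, random_state=0).fit_predict(X)
ap_labels = AffinityPropagation(random_state=0).fit_predict(X)

print("k-means clusters:", len(set(kmeans_labels)))
print("EM clusters:     ", len(set(em_labels)))
print("AP clusters:     ", len(set(ap_labels)))   # AP chooses its own number of exemplars
```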

  19. Formation of fragments in heavy-ion collisions using modified clusterization method

    CERN Document Server

    Goyal, Supriya

    2011-01-01

    We study the formation of fragments by extending the minimum spanning tree (MST) method for clusterization. In this extension, each fragment is subjected to a binding-energy check calculated using the modified Bethe-Weizsacker formula. Earlier, a constant binding-energy cut of 4 MeV/nucleon was imposed. Our results for 197Au + 197Au collisions are compared with ALADIN data and also with calculations based on the simulated annealing technique. We show that the present modified version improves the agreement compared to the MST method.

  20. The XMM/2dF survey III: Comparison between optical and X-ray cluster detection methods

    CERN Document Server

    Basilakos, S; Georgakakis, A; Georgantopoulos, I; Gaga, T; Kolokotronis, V G; Stewart, G C

    2003-01-01

    We directly compare X-ray and optical techniques of cluster detection by combining SDSS photometric data with a wide-field ($\sim 1.8$ deg$^{2}$) XMM-{\em Newton} survey in the North Galactic Pole region. The optical cluster detection procedure is based on merging two independent selection methods - a smoothing+percolation technique, and a Matched Filter Algorithm. The X-ray cluster detection is based on a wavelet-based algorithm, incorporated in the SAS v.5.2 package. The final optical sample contains 9 candidate clusters with a richness of more than 20 galaxies, corresponding roughly to APM richness class. Three of our optically detected clusters are also detected in our X-ray survey. The most probable cause of the small number of optical cluster candidates detected in our X-ray survey is that they are relatively poor clusters, fainter than the X-ray flux limit (for extended sources) of our survey, $f_{x}(0.3-2\,{\rm keV}) \simeq 2 \times 10^{-14}\,{\rm erg\,cm^{-2}\,s^{-1}}$.

  1. Earthquakes clustering based on the magnitude and the depths in Molluca Province

    International Nuclear Information System (INIS)

    In this paper, we present a model to classify the earthquakes that occurred in Molluca Province. We use the K-means clustering method to classify the earthquakes based on their magnitude and depth. The result can be used for disaster mitigation and for designing evacuation routes in Molluca Province.

  2. Earthquakes clustering based on the magnitude and the depths in Molluca Province

    Energy Technology Data Exchange (ETDEWEB)

    Wattimanela, H. J., E-mail: hwattimaela@yahoo.com [Pattimura University, Ambon (Indonesia); Institute of Technology Bandung, Bandung (Indonesia); Pasaribu, U. S.; Indratno, S. W.; Puspito, A. N. T. [Institute of Technology Bandung, Bandung (Indonesia)

    2015-12-22

    In this paper, we present a model to classify the earthquakes that occurred in Molluca Province. We use the K-means clustering method to classify the earthquakes based on their magnitude and depth. The result can be used for disaster mitigation and for designing evacuation routes in Molluca Province.
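
    A minimal sketch of the K-means classification by magnitude and depth; the catalogue below is synthetic, and standardizing the two features before clustering is an added assumption rather than part of the paper:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Columns: magnitude, depth (km) -- synthetic events standing in for the real catalogue.
events = np.column_stack([rng.uniform(3.0, 7.5, 200), rng.uniform(5, 300, 200)])

X = StandardScaler().fit_transform(events)      # put magnitude and depth on comparable scales
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

for c in range(3):
    m, d = events[labels == c].mean(axis=0)
    print(f"cluster {c}: mean magnitude {m:.1f}, mean depth {d:.0f} km")
```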

  3. FPGA BASED SOFTWARE TESTING PRIORITIZATION USING RnK-MEANS CLUSTERING

    Directory of Open Access Journals (Sweden)

    N. Bharathi

    2013-10-01

    Full Text Available Testing software validates its correctness when it is deployed in its actual environment. Various test cases should be implemented and executed to validate the software. When more than one test case is involved, the order of testing needs to be prioritized to optimize the testing process. This paper proposes a prioritization method based on repeated n-times K-means (RnK-means) clustering. Priority for the test cases is assigned based on the cluster mean values obtained by executing RnK-means for each factor of the test cases. Existing techniques merely calculate the average of the factor weights for each test case to decide priority. The proposed method involves K-means computations, which are accelerated by an FPGA for deciding priority. The observed results showed 20 percent better performance with RnK-means clustering than with the existing weighted-average method.

  4. Voxel-based clustered imaging by multiparameter diffusion tensor images for glioma grading

    OpenAIRE

    Rika Inano; Naoya Oishi; Takeharu Kunieda; Yoshiki Arakawa; Yukihiro Yamao; Sumiya Shibata; Takayuki Kikuchi; Hidenao Fukuyama; Susumu Miyamoto

    2014-01-01

    Gliomas are the most common intra-axial primary brain tumour; therefore, predicting glioma grade would influence therapeutic strategies. Although several methods based on single or multiple parameters from diagnostic images exist, a definitive method for pre-operatively determining glioma grade remains unknown. We aimed to develop an unsupervised method using multiple parameters from pre-operative diffusion tensor images for obtaining a clustered image that could enable visual grading of glio...

  5. Bio-Inspired Prototype-Based Models and Applied Gompertzian Dynamics in Cluster Analysis

    OpenAIRE

    Pastorek, Lukáš

    2010-01-01

    The thesis deals with the analysis of the clustering and mapping techniques derived from the principles of the neural and statistical learning and growth theory. The selected branch of the unsupervised bio-inspired prototype-based models is described in terms of the proposed logical framework, which highlights the continuity of these methods with the classical "pure" statistical methods. Moreover, as those methods are broadly understood as the "black boxes" with the unpredictable, unclear and...

  6. A Novel Clustering Methodology Based on Modularity Optimisation for Detecting Authorship Affinities in Shakespearean Era Plays

    Science.gov (United States)

    Craig, Hugh; Berretta, Regina; Moscato, Pablo

    2016-01-01

    In this study we propose a novel, unsupervised clustering methodology for analyzing large datasets. This new, efficient methodology converts the general clustering problem into a community detection problem in a graph by using the Jensen-Shannon distance, a dissimilarity measure originating in information theory. Moreover, we use graph theoretic concepts for the generation and analysis of proximity graphs. Our methodology is based on a newly proposed memetic algorithm (iMA-Net) for discovering clusters of data elements by maximizing the modularity function in proximity graphs of literary works. To test the effectiveness of this general methodology, we apply it to a text corpus dataset, which contains frequencies of approximately 55,114 unique words across all 168 plays written in the Shakespearean era (16th and 17th centuries), to analyze and detect clusters of similar plays. Experimental results and comparison with state-of-the-art clustering methods demonstrate the remarkable performance of our new method for identifying high quality clusters which reflect the commonalities in the literary style of the plays. PMID:27571416

  7. A Novel Clustering Methodology Based on Modularity Optimisation for Detecting Authorship Affinities in Shakespearean Era Plays.

    Science.gov (United States)

    Naeni, Leila M; Craig, Hugh; Berretta, Regina; Moscato, Pablo

    2016-01-01

    In this study we propose a novel, unsupervised clustering methodology for analyzing large datasets. This new, efficient methodology converts the general clustering problem into a community detection problem in a graph by using the Jensen-Shannon distance, a dissimilarity measure originating in information theory. Moreover, we use graph theoretic concepts for the generation and analysis of proximity graphs. Our methodology is based on a newly proposed memetic algorithm (iMA-Net) for discovering clusters of data elements by maximizing the modularity function in proximity graphs of literary works. To test the effectiveness of this general methodology, we apply it to a text corpus dataset, which contains frequencies of approximately 55,114 unique words across all 168 plays written in the Shakespearean era (16th and 17th centuries), to analyze and detect clusters of similar plays. Experimental results and comparison with state-of-the-art clustering methods demonstrate the remarkable performance of our new method for identifying high quality clusters which reflect the commonalities in the literary style of the plays. PMID:27571416
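
    A hedged sketch of the pipeline on toy data: Jensen-Shannon distances between word-frequency profiles, a k-nearest-neighbour proximity graph, and modularity-based community detection. networkx's greedy modularity heuristic stands in for the authors' memetic algorithm (iMA-Net), and the "plays" are random profiles:

```python
import numpy as np
import networkx as nx
from scipy.spatial.distance import jensenshannon
from networkx.algorithms.community import greedy_modularity_communities

rng = np.random.default_rng(0)
# Toy word-frequency profiles for 30 "plays" over a 200-word vocabulary.
counts = rng.poisson(3, size=(30, 200)).astype(float)
profiles = counts / counts.sum(axis=1, keepdims=True)

# Proximity graph: connect each play to its k most similar plays (smallest JS distance).
k = 4
G = nx.Graph()
G.add_nodes_from(range(len(profiles)))
for i in range(len(profiles)):
    d = np.array([jensenshannon(profiles[i], profiles[j]) for j in range(len(profiles))])
    for j in np.argsort(d)[1:k + 1]:        # skip self (distance 0)
        G.add_edge(i, int(j), weight=1.0 - d[j])

# Community detection by modularity maximization (greedy heuristic).
communities = greedy_modularity_communities(G, weight="weight")
print([sorted(c) for c in communities])
```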

  8. Model-based Clustering of Categorical Time Series with Multinomial Logit Classification

    Science.gov (United States)

    Frühwirth-Schnatter, Sylvia; Pamminger, Christoph; Winter-Ebmer, Rudolf; Weber, Andrea

    2010-09-01

    A common problem in many areas of applied statistics is to identify groups of similar time series in a panel of time series. However, distance-based clustering methods cannot easily be extended to time series data, where an appropriate distance measure is rather difficult to define, particularly for discrete-valued time series. Markov chain clustering, proposed by Pamminger and Frühwirth-Schnatter [6], is an approach for clustering discrete-valued time series obtained by observing a categorical variable with several states. This model-based clustering method is based on finite mixtures of first-order time-homogeneous Markov chain models. In order to further explain group membership we present an extension to the approach of Pamminger and Frühwirth-Schnatter [6] by formulating a probabilistic model for the latent group indicators within the Bayesian classification rule using a multinomial logit model. The parameters are estimated for a fixed number of clusters within a Bayesian framework using a Markov chain Monte Carlo (MCMC) sampling scheme representing a (full) Gibbs-type sampler which involves only draws from standard distributions. Finally, an application to a panel of Austrian wage mobility data is presented, which leads to an interesting segmentation of the Austrian labour market.

  9. Clustering of User Behaviour based on Web Log data using Improved K-Means Clustering Algorithm

    Directory of Open Access Journals (Sweden)

    S.Padmaja

    2016-02-01

    Full Text Available The proposed work applies an improved K-means clustering algorithm to identify internet user behaviour. Web data analysis includes the transformation and interpretation of web log data to discover information, patterns and knowledge. The efficiency of the algorithm is analyzed by considering certain parameters: date, time, S_id, CS_method, C_IP, User_agent and time taken. The research was done using more than 2 years of real data collected from the web servers of two different groups of institutions; this dataset provides a better analysis of log data for identifying internet user behaviour.

  10. A Dirichlet Process Mixture Based Name Origin Clustering and Alignment Model for Transliteration

    Directory of Open Access Journals (Sweden)

    Chunyue Zhang

    2015-01-01

    Full Text Available In machine transliteration, it is common that the transliterated names in the target language come from multiple language origins. A conventional maximum-likelihood based single model cannot deal with this issue very well and often suffers from overfitting. In this paper, we exploit a coupled Dirichlet process mixture model (cDPMM) to address the overfitting and multiorigin name clustering issues simultaneously in the transliteration sequence alignment step over the name pairs. After the alignment step, the cDPMM automatically clusters name pairs into many groups according to their origin information. In the decoding step, in order to use the learned origin information sufficiently, we use a cluster combination method (CCM) to build cluster-specific transliteration models by combining small clusters into large ones based on the perplexities of the name language and transliteration models, which ensures that each origin cluster has enough data for training a transliteration model. On three different Western-Chinese multiorigin name corpora, the cDPMM outperforms two state-of-the-art baseline models in terms of both top-1 accuracy and mean F-score, and the CCM further significantly improves the cDPMM.

  11. Priority Based Congestion Control Dynamic Clustering Protocol in Mobile Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    R. Beulah Jayakumari

    2015-01-01

    Full Text Available Wireless sensor networks are widely used to monitor natural phenomena, because natural disasters have increased globally, causing significant loss of life, economic setbacks, and disruption of social development. Saving energy in a wireless sensor network (WSN) is a critical factor to be considered. The sensor nodes are deployed to sense, compute, and communicate alerts in a WSN which are used to prevent natural hazards. Generally communication consumes more energy than sensing and computing; hence cluster-based protocols are preferred. Even with clustering, multiclass traffic creates congested hotspots in the cluster, thereby causing packet loss and delay. In order to conserve energy and to avoid congestion during multiclass traffic, a novel Priority Based Congestion Control Dynamic Clustering (PCCDC) protocol is developed. PCCDC is designed with mobile nodes which are organized dynamically into clusters to provide complete coverage and connectivity. PCCDC computes congestion at the intra- and intercluster level using linear and binary feedback methods. Each mobile node within the cluster has an appropriate queue model for scheduling prioritized packets during congestion without drop or delay. Simulation results have proven that packet drop, control overhead, and end-to-end delay are much lower in PCCDC, which in turn significantly increases packet delivery ratio, network lifetime, and residual energy when compared with the PASCC protocol.

  12. Priority Based Congestion Control Dynamic Clustering Protocol in Mobile Wireless Sensor Networks.

    Science.gov (United States)

    Jayakumari, R Beulah; Senthilkumar, V Jawahar

    2015-01-01

    Wireless sensor networks are widely used to monitor natural phenomena, because natural disasters have increased globally, causing significant loss of life, economic setbacks, and disruption of social development. Saving energy in a wireless sensor network (WSN) is a critical factor to be considered. The sensor nodes are deployed to sense, compute, and communicate alerts in a WSN which are used to prevent natural hazards. Generally communication consumes more energy than sensing and computing; hence cluster-based protocols are preferred. Even with clustering, multiclass traffic creates congested hotspots in the cluster, thereby causing packet loss and delay. In order to conserve energy and to avoid congestion during multiclass traffic, a novel Priority Based Congestion Control Dynamic Clustering (PCCDC) protocol is developed. PCCDC is designed with mobile nodes which are organized dynamically into clusters to provide complete coverage and connectivity. PCCDC computes congestion at the intra- and intercluster level using linear and binary feedback methods. Each mobile node within the cluster has an appropriate queue model for scheduling prioritized packets during congestion without drop or delay. Simulation results have proven that packet drop, control overhead, and end-to-end delay are much lower in PCCDC, which in turn significantly increases packet delivery ratio, network lifetime, and residual energy when compared with the PASCC protocol. PMID:26504898

  13. A Thread-based Two-stage Clustering Method for Microblog Topic Detection

    Institute of Scientific and Technical Information of China (English)

    马彬; 洪宇; 陆剑江; 姚建民; 朱巧明

    2012-01-01

    Microblogging is a novel individual publication model on the Internet that makes information significantly more open and interactive, but it also causes explosive growth of information in the microblog space. Utilizing topic detection techniques to classify and organize microblog texts by topic can help users efficiently access information of interest or hot topics in a dynamically changing environment. To deal with short, semi-structured, context-rich microblog texts, this paper proposes a thread-based two-stage clustering method. In the first stage, a Temporal-Author-Topic (TAT) model, which fuses temporal features and author information, is applied for local clustering within each thread and for filtering out noisy (spam) microblog texts. In the second stage, the consolidated thread texts are used for global topic detection. Experimental results show that the approach handles data sparsity well, achieving an F-measure of 31.2% for topic detection.

  14. Statistical physics based heuristic clustering algorithms with an application to econophysics

    Science.gov (United States)

    Baldwin, Lucia Liliana

    Three new approaches to the clustering of data sets are presented. They are heuristic methods and represent forms of unsupervised (non-parametric) clustering. Applied to an unknown set of data, these methods automatically determine the number of clusters and their location using no a priori assumptions. All are based on analogies with different physical phenomena. The first technique, named the Percolation Clustering Algorithm, embodies a novel variation on the nearest-neighbor algorithm focusing on the connectivity between sample points. Exploiting the equivalence with a percolation process, this algorithm considers data points to be surrounded by expanding hyperspheres, which bond when they touch each other. Once a sequence of joined spheres spans an entire cluster, percolation occurs and the cluster size remains constant until it merges with a neighboring cluster. The second procedure, named Nucleation and Growth Clustering, exploits the analogy with the nucleation and growth that occurs in island formation during epitaxial growth of solids. The original data points are nucleation centers, around which aggregation will occur. Additional "ad-data" introduced into the sample space interact with the data points and stick if located within a threshold distance. These "ad-data" are used as a tool to facilitate the detection of clusters. The third method, named the Discrete Deposition Clustering Algorithm, constrains deposition to occur on a grid, which has the advantage of computational efficiency as opposed to the continuous deposition used in the previous method. The original data form the vertexes of a sparse graph and the deposition sites are defined to be the middle points of this graph's edges. Ad-data are introduced on the deposition sites and the system is allowed to evolve in a self-organizing regime. This allows the simulation of a phase transition and by monitoring the specific heat capacity of the system one can mark out a "natural" criterion for

  15. Analysis of cost data in a cluster-randomized, controlled trial: comparison of methods

    DEFF Research Database (Denmark)

    Sokolowski, Ineta; Ørnbøl, Eva; Rosendal, Marianne;

    in clusters of general practices. There have been suggestions to apply different methods, e.g., the non-parametric bootstrap, to highly skewed data from pragmatic randomized trials without clusters, but there is very little information about how to analyse skewed data from cluster-randomized trials. Many ... studies have used non-valid analysis of skewed data. We propose two different methods to compare mean cost in two groups. Firstly, we use a non-parametric bootstrap method where the re-sampling takes place on two levels in order to take into account the cluster effect. Secondly, we proceed with a log...
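
    The two-level bootstrap idea can be sketched as follows, with skewed synthetic costs grouped by practice; resampling clusters first and then patients within each sampled cluster preserves the cluster effect. This is an illustration of the general approach, not the authors' analysis code:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_arm(n_clusters=6, patients_per_cluster=40, scale=1.0):
    """Skewed (log-normal) costs, grouped by practice (cluster)."""
    return [scale * rng.lognormal(mean=7.0, sigma=1.0, size=patients_per_cluster)
            for _ in range(n_clusters)]

def cluster_bootstrap_mean(arm, n_boot=2000):
    """Resample clusters with replacement, then patients within each sampled cluster."""
    means = np.empty(n_boot)
    for b in range(n_boot):
        sampled_clusters = rng.choice(len(arm), size=len(arm), replace=True)
        costs = np.concatenate([rng.choice(arm[c], size=len(arm[c]), replace=True)
                                for c in sampled_clusters])
        means[b] = costs.mean()
    return means

control, intervention = make_arm(), make_arm(scale=1.15)
diff = cluster_bootstrap_mean(intervention) - cluster_bootstrap_mean(control)
lo, hi = np.percentile(diff, [2.5, 97.5])
print(f"bootstrap 95% CI for mean cost difference: ({lo:.0f}, {hi:.0f})")
```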

  16. Survey of Clustering based Financial Fraud Detection Research

    Directory of Open Access Journals (Sweden)

    Andrei Sorin SABAU

    2012-01-01

    Full Text Available Given the current global economic context, increasing efforts are being made both to prevent and to detect fraud. This is a natural response to the upward trend in fraud activity recorded in the last couple of years, with a 13% increase in 2011 alone. Due to the ever-increasing volumes of data that need to be analyzed, data mining methods and techniques are being used more and more often. One domain in which data mining can excel, suspicious transaction monitoring, emerged for the first time in 2011 as the most effective fraud detection method. Among the available data mining techniques, clustering has proven itself a consistently applied solution for detecting fraud. This paper surveys clustering techniques used in fraud detection over the last ten years, briefly reviewing each one.

  17. An Aggregation Cache Replacement Algorithm Based on Ontology Clustering

    Institute of Scientific and Technical Information of China (English)

    ZHU Jiang; SHEN Qingguo; TANG Tang; LI Yongqiang

    2006-01-01

    This paper describes the theory, implementation, and experimental evaluation of an Aggregation Cache Replacement (ACR) algorithm. By considering the application background, carefully choosing weight values, using a special formula to calculate the similarity, and clustering ontologies by similarity to capture deeper embedded relations, ACR combines ontology similarity with the value of an object and decides which object is to be replaced. We demonstrate the usefulness of ACR through experiments. (a) The aggregation tree is created quite differently depending on the application case; therefore, clustering can direct content adaptation more accurately according to user perception and can satisfy users with different preferences. (b) Comparing this new method with the widely used Least-Recently-Used (LRU) and First-In-First-Out (FIFO) algorithms, it is found that ACR outperforms the latter two in accuracy and usability. (c) It has a better semantic explanation and makes adaptation more personalized and more precise.

  18. Application of the Clustering Method in Molecular Dynamics Simulation of the Diffusion Coefficient

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    Using molecular dynamics (MD) simulation, the diffusion of oxygen, methane, ammonia and carbon dioxide in water was simulated in the canonical (NVT) ensemble, and the diffusion coefficient was analyzed by the clustering method. By comparison with the conventional method (using the Einstein model) and the differentiation-interval variation method, we found that the results obtained by the clustering method used in this study are closer to the experimental values. This method proved to be more reasonable than the other two methods.
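
    For reference, the conventional Einstein-relation estimate mentioned above amounts to fitting the slope of the mean-squared displacement and dividing by 6 for three dimensions; the sketch below uses a random walk as a stand-in for the MD trajectory, with made-up units:

```python
import numpy as np

rng = np.random.default_rng(0)
dt = 1e-3                                    # time per step, in ps (illustrative)
steps = rng.normal(scale=0.05, size=(20000, 3))
trajectory = np.cumsum(steps, axis=0)        # 3-D random walk standing in for an MD trajectory

def msd(traj, max_lag=2000):
    """Mean-squared displacement as a function of lag time."""
    lags = np.arange(1, max_lag)
    return lags, np.array([np.mean(np.sum((traj[lag:] - traj[:-lag]) ** 2, axis=1))
                           for lag in lags])

lags, m = msd(trajectory)
slope = np.polyfit(lags * dt, m, 1)[0]       # MSD(t) ~ 6 D t for 3-D diffusion
D = slope / 6.0
print(f"estimated diffusion coefficient: {D:.3g} (length^2 / ps)")
```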

  19. Energy Aware Cluster Based Routing Scheme For Wireless Sensor Network

    Directory of Open Access Journals (Sweden)

    Roy Sohini

    2015-09-01

    Full Text Available Wireless Sensor Networks (WSNs) have emerged as an important supplement to modern wireless communication systems due to their wide range of applications. Recent research addresses the various challenges of sensor networks more and more gracefully; however, energy efficiency has remained a matter of concern for researchers. Meeting countless security needs, ensuring timely data delivery and quick action, and performing efficient route selection and multi-path routing can only be achieved at the cost of energy. Hierarchical routing is more useful in this regard. The proposed algorithm, Energy Aware Cluster Based Routing Scheme (EACBRS), aims at conserving energy with the help of hierarchical routing by calculating the optimum number of cluster heads for the network, selecting energy-efficient routes to the sink and offering congestion control. Simulation results prove that EACBRS performs better than existing hierarchical routing algorithms like the Distributed Energy-Efficient Clustering (DEEC) algorithm for heterogeneous wireless sensor networks and the Energy Efficient Heterogeneous Clustered scheme for Wireless Sensor Networks (EEHC).

  20. Enhancing Text Clustering Using Concept-based Mining Model

    Directory of Open Access Journals (Sweden)

    Lincy Liptha R.

    2012-03-01

    Full Text Available Text mining techniques are mostly based on the statistical analysis of a word or phrase. The statistical analysis of term frequency captures the importance of a term within a document only. But two terms can have the same frequency in the same document, while the meaning that one term contributes might be more appropriate than the meaning contributed by the other. Hence, the terms that capture the semantics of the text should be given more importance. Here, a new concept-based mining model is introduced. It analyses the terms at the sentence, document and corpus levels. The model consists of sentence-based concept analysis, which calculates the conceptual term frequency (ctf); document-based concept analysis, which finds the term frequency (tf); corpus-based concept analysis, which determines the document frequency (df); and a concept-based similarity measure. The process of calculating the ctf, tf and df measures in a corpus is carried out by the proposed Concept-Based Analysis Algorithm. By doing so, we cluster web documents efficiently, and the quality of the clusters achieved by this model significantly surpasses that of traditional single-term-based approaches.