WorldWideScience

Sample records for based clustering method

  1. An image segmentation method based on network clustering model

    Science.gov (United States)

    Jiao, Yang; Wu, Jianshe; Jiao, Licheng

    2018-01-01

    Network clustering phenomena are ubiquitous in nature and human society. In this paper, a method involving a network clustering model is proposed for mass segmentation in mammograms. First, the watershed transform is used to divide an image into regions, and features of the image are computed. Then a graph is constructed from the obtained regions and features. The network clustering model is applied to realize clustering of nodes in the graph. Compared with two classic methods, the algorithm based on the network clustering model performs more effectively in experiments.

  2. Improving local clustering based top-L link prediction methods via asymmetric link clustering information

    Science.gov (United States)

    Wu, Zhihao; Lin, Youfang; Zhao, Yiji; Yan, Hongyan

    2018-02-01

    Networks can represent a wide range of complex systems, such as social, biological and technological systems. Link prediction is one of the most important problems in network analysis, and has attracted much research interest recently. Many link prediction methods have been proposed to solve this problem with various techniques. We can note that clustering information plays an important role in solving the link prediction problem. In previous literatures, we find node clustering coefficient appears frequently in many link prediction methods. However, node clustering coefficient is limited to describe the role of a common-neighbor in different local networks, because it cannot distinguish different clustering abilities of a node to different node pairs. In this paper, we shift our focus from nodes to links, and propose the concept of asymmetric link clustering (ALC) coefficient. Further, we improve three node clustering based link prediction methods via the concept of ALC. The experimental results demonstrate that ALC-based methods outperform node clustering based methods, especially achieving remarkable improvements on food web, hamster friendship and Internet networks. Besides, comparing with other methods, the performance of ALC-based methods are very stable in both globalized and personalized top-L link prediction tasks.

  3. Agent-based method for distributed clustering of textual information

    Science.gov (United States)

    Potok, Thomas E [Oak Ridge, TN; Reed, Joel W [Knoxville, TN; Elmore, Mark T [Oak Ridge, TN; Treadwell, Jim N [Louisville, TN

    2010-09-28

    A computer method and system for storing, retrieving and displaying information has a multiplexing agent (20) that calculates a new document vector (25) for a new document (21) to be added to the system and transmits the new document vector (25) to master cluster agents (22) and cluster agents (23) for evaluation. These agents (22, 23) perform the evaluation and return values upstream to the multiplexing agent (20) based on the similarity of the document to documents stored under their control. The multiplexing agent (20) then sends the document (21) and the document vector (25) to the master cluster agent (22), which then forwards it to a cluster agent (23) or creates a new cluster agent (23) to manage the document (21). The system also searches for stored documents according to a search query having at least one term and identifying the documents found in the search, and displays the documents in a clustering display (80) of similarity so as to indicate similarity of the documents to each other.

  4. Kernel method for clustering based on optimal target vector

    International Nuclear Information System (INIS)

    Angelini, Leonardo; Marinazzo, Daniele; Pellicoro, Mario; Stramaglia, Sebastiano

    2006-01-01

    We introduce Ising models, suitable for dichotomic clustering, with couplings that are (i) both ferro- and anti-ferromagnetic (ii) depending on the whole data-set and not only on pairs of samples. Couplings are determined exploiting the notion of optimal target vector, here introduced, a link between kernel supervised and unsupervised learning. The effectiveness of the method is shown in the case of the well-known iris data-set and in benchmarks of gene expression levels, where it works better than existing methods for dichotomic clustering

  5. Segmentation of MRI Volume Data Based on Clustering Method

    Directory of Open Access Journals (Sweden)

    Ji Dongsheng

    2016-01-01

    Full Text Available Here we analyze the difficulties of segmentation without tag line of left ventricle MR images, and propose an algorithm for automatic segmentation of left ventricle (LV internal and external profiles. Herein, we propose an Incomplete K-means and Category Optimization (IKCO method. Initially, using Hough transformation to automatically locate initial contour of the LV, the algorithm uses a simple approach to complete data subsampling and initial center determination. Next, according to the clustering rules, the proposed algorithm finishes MR image segmentation. Finally, the algorithm uses a category optimization method to improve segmentation results. Experiments show that the algorithm provides good segmentation results.

  6. The relationship between supplier networks and industrial clusters: an analysis based on the cluster mapping method

    Directory of Open Access Journals (Sweden)

    Ichiro IWASAKI

    2010-06-01

    Full Text Available Michael Porter’s concept of competitive advantages emphasizes the importance of regional cooperation of various actors in order to gain competitiveness on globalized markets. Foreign investors may play an important role in forming such cooperation networks. Their local suppliers tend to concentrate regionally. They can form, together with local institutions of education, research, financial and other services, development agencies, the nucleus of cooperative clusters. This paper deals with the relationship between supplier networks and clusters. Two main issues are discussed in more detail: the interest of multinational companies in entering regional clusters and the spillover effects that may stem from their participation. After the discussion on the theoretical background, the paper introduces a relatively new analytical method: “cluster mapping” - a method that can spot regional hot spots of specific economic activities with cluster building potential. Experience with the method was gathered in the US and in the European Union. After the discussion on the existing empirical evidence, the authors introduce their own cluster mapping results, which they obtained by using a refined version of the original methodology.

  7. Spectral methods and cluster structure in correlation-based networks

    Science.gov (United States)

    Heimo, Tapio; Tibély, Gergely; Saramäki, Jari; Kaski, Kimmo; Kertész, János

    2008-10-01

    We investigate how in complex systems the eigenpairs of the matrices derived from the correlations of multichannel observations reflect the cluster structure of the underlying networks. For this we use daily return data from the NYSE and focus specifically on the spectral properties of weight W=|-δ and diffusion matrices D=W/sj-δ, where C is the correlation matrix and si=∑jW the strength of node j. The eigenvalues (and corresponding eigenvectors) of the weight matrix are ranked in descending order. As in the earlier observations, the first eigenvector stands for a measure of the market correlations. Its components are, to first approximation, equal to the strengths of the nodes and there is a second order, roughly linear, correction. The high ranking eigenvectors, excluding the highest ranking one, are usually assigned to market sectors and industrial branches. Our study shows that both for weight and diffusion matrices the eigenpair analysis is not capable of easily deducing the cluster structure of the network without a priori knowledge. In addition we have studied the clustering of stocks using the asset graph approach with and without spectrum based noise filtering. It turns out that asset graphs are quite insensitive to noise and there is no sharp percolation transition as a function of the ratio of bonds included, thus no natural threshold value for that ratio seems to exist. We suggest that these observations can be of use for other correlation based networks as well.

  8. A comparison of heuristic and model-based clustering methods for dietary pattern analysis.

    Science.gov (United States)

    Greve, Benjamin; Pigeot, Iris; Huybrechts, Inge; Pala, Valeria; Börnhorst, Claudia

    2016-02-01

    Cluster analysis is widely applied to identify dietary patterns. A new method based on Gaussian mixture models (GMM) seems to be more flexible compared with the commonly applied k-means and Ward's method. In the present paper, these clustering approaches are compared to find the most appropriate one for clustering dietary data. The clustering methods were applied to simulated data sets with different cluster structures to compare their performance knowing the true cluster membership of observations. Furthermore, the three methods were applied to FFQ data assessed in 1791 children participating in the IDEFICS (Identification and Prevention of Dietary- and Lifestyle-Induced Health Effects in Children and Infants) Study to explore their performance in practice. The GMM outperformed the other methods in the simulation study in 72 % up to 100 % of cases, depending on the simulated cluster structure. Comparing the computationally less complex k-means and Ward's methods, the performance of k-means was better in 64-100 % of cases. Applied to real data, all methods identified three similar dietary patterns which may be roughly characterized as a 'non-processed' cluster with a high consumption of fruits, vegetables and wholemeal bread, a 'balanced' cluster with only slight preferences of single foods and a 'junk food' cluster. The simulation study suggests that clustering via GMM should be preferred due to its higher flexibility regarding cluster volume, shape and orientation. The k-means seems to be a good alternative, being easier to use while giving similar results when applied to real data.

  9. Trend analysis using non-stationary time series clustering based on the finite element method

    OpenAIRE

    Gorji Sefidmazgi, M.; Sayemuzzaman, M.; Homaifar, A.; Jha, M. K.; Liess, S.

    2014-01-01

    In order to analyze low-frequency variability of climate, it is useful to model the climatic time series with multiple linear trends and locate the times of significant changes. In this paper, we have used non-stationary time series clustering to find change points in the trends. Clustering in a multi-dimensional non-stationary time series is challenging, since the problem is mathematically ill-posed. Clustering based on the finite element method (FEM) is one of the methods ...

  10. Bootstrap-based methods for estimating standard errors in Cox's regression analyses of clustered event times.

    Science.gov (United States)

    Xiao, Yongling; Abrahamowicz, Michal

    2010-03-30

    We propose two bootstrap-based methods to correct the standard errors (SEs) from Cox's model for within-cluster correlation of right-censored event times. The cluster-bootstrap method resamples, with replacement, only the clusters, whereas the two-step bootstrap method resamples (i) the clusters, and (ii) individuals within each selected cluster, with replacement. In simulations, we evaluate both methods and compare them with the existing robust variance estimator and the shared gamma frailty model, which are available in statistical software packages. We simulate clustered event time data, with latent cluster-level random effects, which are ignored in the conventional Cox's model. For cluster-level covariates, both proposed bootstrap methods yield accurate SEs, and type I error rates, and acceptable coverage rates, regardless of the true random effects distribution, and avoid serious variance under-estimation by conventional Cox-based standard errors. However, the two-step bootstrap method over-estimates the variance for individual-level covariates. We also apply the proposed bootstrap methods to obtain confidence bands around flexible estimates of time-dependent effects in a real-life analysis of cluster event times.

  11. Clustering Scientific Publications Based on Citation Relations: A Systematic Comparison of Different Methods.

    Directory of Open Access Journals (Sweden)

    Lovro Šubelj

    Full Text Available Clustering methods are applied regularly in the bibliometric literature to identify research areas or scientific fields. These methods are for instance used to group publications into clusters based on their relations in a citation network. In the network science literature, many clustering methods, often referred to as graph partitioning or community detection techniques, have been developed. Focusing on the problem of clustering the publications in a citation network, we present a systematic comparison of the performance of a large number of these clustering methods. Using a number of different citation networks, some of them relatively small and others very large, we extensively study the statistical properties of the results provided by different methods. In addition, we also carry out an expert-based assessment of the results produced by different methods. The expert-based assessment focuses on publications in the field of scientometrics. Our findings seem to indicate that there is a trade-off between different properties that may be considered desirable for a good clustering of publications. Overall, map equation methods appear to perform best in our analysis, suggesting that these methods deserve more attention from the bibliometric community.

  12. Clustering Scientific Publications Based on Citation Relations: A Systematic Comparison of Different Methods.

    Science.gov (United States)

    Šubelj, Lovro; van Eck, Nees Jan; Waltman, Ludo

    2016-01-01

    Clustering methods are applied regularly in the bibliometric literature to identify research areas or scientific fields. These methods are for instance used to group publications into clusters based on their relations in a citation network. In the network science literature, many clustering methods, often referred to as graph partitioning or community detection techniques, have been developed. Focusing on the problem of clustering the publications in a citation network, we present a systematic comparison of the performance of a large number of these clustering methods. Using a number of different citation networks, some of them relatively small and others very large, we extensively study the statistical properties of the results provided by different methods. In addition, we also carry out an expert-based assessment of the results produced by different methods. The expert-based assessment focuses on publications in the field of scientometrics. Our findings seem to indicate that there is a trade-off between different properties that may be considered desirable for a good clustering of publications. Overall, map equation methods appear to perform best in our analysis, suggesting that these methods deserve more attention from the bibliometric community.

  13. Clustering Scientific Publications Based on Citation Relations: A Systematic Comparison of Different Methods

    Science.gov (United States)

    Šubelj, Lovro; van Eck, Nees Jan; Waltman, Ludo

    2016-01-01

    Clustering methods are applied regularly in the bibliometric literature to identify research areas or scientific fields. These methods are for instance used to group publications into clusters based on their relations in a citation network. In the network science literature, many clustering methods, often referred to as graph partitioning or community detection techniques, have been developed. Focusing on the problem of clustering the publications in a citation network, we present a systematic comparison of the performance of a large number of these clustering methods. Using a number of different citation networks, some of them relatively small and others very large, we extensively study the statistical properties of the results provided by different methods. In addition, we also carry out an expert-based assessment of the results produced by different methods. The expert-based assessment focuses on publications in the field of scientometrics. Our findings seem to indicate that there is a trade-off between different properties that may be considered desirable for a good clustering of publications. Overall, map equation methods appear to perform best in our analysis, suggesting that these methods deserve more attention from the bibliometric community. PMID:27124610

  14. A Spatial Division Clustering Method and Low Dimensional Feature Extraction Technique Based Indoor Positioning System

    Directory of Open Access Journals (Sweden)

    Yun Mo

    2014-01-01

    Full Text Available Indoor positioning systems based on the fingerprint method are widely used due to the large number of existing devices with a wide range of coverage. However, extensive positioning regions with a massive fingerprint database may cause high computational complexity and error margins, therefore clustering methods are widely applied as a solution. However, traditional clustering methods in positioning systems can only measure the similarity of the Received Signal Strength without being concerned with the continuity of physical coordinates. Besides, outage of access points could result in asymmetric matching problems which severely affect the fine positioning procedure. To solve these issues, in this paper we propose a positioning system based on the Spatial Division Clustering (SDC method for clustering the fingerprint dataset subject to physical distance constraints. With the Genetic Algorithm and Support Vector Machine techniques, SDC can achieve higher coarse positioning accuracy than traditional clustering algorithms. In terms of fine localization, based on the Kernel Principal Component Analysis method, the proposed positioning system outperforms its counterparts based on other feature extraction methods in low dimensionality. Apart from balancing online matching computational burden, the new positioning system exhibits advantageous performance on radio map clustering, and also shows better robustness and adaptability in the asymmetric matching problem aspect.

  15. Trend analysis using non-stationary time series clustering based on the finite element method

    Science.gov (United States)

    Gorji Sefidmazgi, M.; Sayemuzzaman, M.; Homaifar, A.; Jha, M. K.; Liess, S.

    2014-05-01

    In order to analyze low-frequency variability of climate, it is useful to model the climatic time series with multiple linear trends and locate the times of significant changes. In this paper, we have used non-stationary time series clustering to find change points in the trends. Clustering in a multi-dimensional non-stationary time series is challenging, since the problem is mathematically ill-posed. Clustering based on the finite element method (FEM) is one of the methods that can analyze multidimensional time series. One important attribute of this method is that it is not dependent on any statistical assumption and does not need local stationarity in the time series. In this paper, it is shown how the FEM-clustering method can be used to locate change points in the trend of temperature time series from in situ observations. This method is applied to the temperature time series of North Carolina (NC) and the results represent region-specific climate variability despite higher frequency harmonics in climatic time series. Next, we investigated the relationship between the climatic indices with the clusters/trends detected based on this clustering method. It appears that the natural variability of climate change in NC during 1950-2009 can be explained mostly by AMO and solar activity.

  16. Nearest neighbor-density-based clustering methods for large hyperspectral images

    Science.gov (United States)

    Cariou, Claude; Chehdi, Kacem

    2017-10-01

    We address the problem of hyperspectral image (HSI) pixel partitioning using nearest neighbor - density-based (NN-DB) clustering methods. NN-DB methods are able to cluster objects without specifying the number of clusters to be found. Within the NN-DB approach, we focus on deterministic methods, e.g. ModeSeek, knnClust, and GWENN (standing for Graph WatershEd using Nearest Neighbors). These methods only require the availability of a k-nearest neighbor (kNN) graph based on a given distance metric. Recently, a new DB clustering method, called Density Peak Clustering (DPC), has received much attention, and kNN versions of it have quickly followed and showed their efficiency. However, NN-DB methods still suffer from the difficulty of obtaining the kNN graph due to the quadratic complexity with respect to the number of pixels. This is why GWENN was embedded into a multiresolution (MR) scheme to bypass the computation of the full kNN graph over the image pixels. In this communication, we propose to extent the MR-GWENN scheme on three aspects. Firstly, similarly to knnClust, the original labeling rule of GWENN is modified to account for local density values, in addition to the labels of previously processed objects. Secondly, we set up a modified NN search procedure within the MR scheme, in order to stabilize of the number of clusters found from the coarsest to the finest spatial resolution. Finally, we show that these extensions can be easily adapted to the three other NN-DB methods (ModeSeek, knnClust, knnDPC) for pixel clustering in large HSIs. Experiments are conducted to compare the four NN-DB methods for pixel clustering in HSIs. We show that NN-DB methods can outperform a classical clustering method such as fuzzy c-means (FCM), in terms of classification accuracy, relevance of found clusters, and clustering speed. Finally, we demonstrate the feasibility and evaluate the performances of NN-DB methods on a very large image acquired by our AISA Eagle hyperspectral

  17. An effective trust-based recommendation method using a novel graph clustering algorithm

    Science.gov (United States)

    Moradi, Parham; Ahmadian, Sajad; Akhlaghian, Fardin

    2015-10-01

    Recommender systems are programs that aim to provide personalized recommendations to users for specific items (e.g. music, books) in online sharing communities or on e-commerce sites. Collaborative filtering methods are important and widely accepted types of recommender systems that generate recommendations based on the ratings of like-minded users. On the other hand, these systems confront several inherent issues such as data sparsity and cold start problems, caused by fewer ratings against the unknowns that need to be predicted. Incorporating trust information into the collaborative filtering systems is an attractive approach to resolve these problems. In this paper, we present a model-based collaborative filtering method by applying a novel graph clustering algorithm and also considering trust statements. In the proposed method first of all, the problem space is represented as a graph and then a sparsest subgraph finding algorithm is applied on the graph to find the initial cluster centers. Then, the proposed graph clustering algorithm is performed to obtain the appropriate users/items clusters. Finally, the identified clusters are used as a set of neighbors to recommend unseen items to the current active user. Experimental results based on three real-world datasets demonstrate that the proposed method outperforms several state-of-the-art recommender system methods.

  18. Grey Wolf Optimizer Based on Powell Local Optimization Method for Clustering Analysis

    Directory of Open Access Journals (Sweden)

    Sen Zhang

    2015-01-01

    Full Text Available One heuristic evolutionary algorithm recently proposed is the grey wolf optimizer (GWO, inspired by the leadership hierarchy and hunting mechanism of grey wolves in nature. This paper presents an extended GWO algorithm based on Powell local optimization method, and we call it PGWO. PGWO algorithm significantly improves the original GWO in solving complex optimization problems. Clustering is a popular data analysis and data mining technique. Hence, the PGWO could be applied in solving clustering problems. In this study, first the PGWO algorithm is tested on seven benchmark functions. Second, the PGWO algorithm is used for data clustering on nine data sets. Compared to other state-of-the-art evolutionary algorithms, the results of benchmark and data clustering demonstrate the superior performance of PGWO algorithm.

  19. Cluster based statistical feature extraction method for automatic bleeding detection in wireless capsule endoscopy video.

    Science.gov (United States)

    Ghosh, Tonmoy; Fattah, Shaikh Anowarul; Wahid, Khan A; Zhu, Wei-Ping; Ahmad, M Omair

    2018-03-01

    Wireless capsule endoscopy (WCE) is capable of demonstrating the entire gastrointestinal tract at an expense of exhaustive reviewing process for detecting bleeding disorders. The main objective is to develop an automatic method for identifying the bleeding frames and zones from WCE video. Different statistical features are extracted from the overlapping spatial blocks of the preprocessed WCE image in a transformed color plane containing green to red pixel ratio. The unique idea of the proposed method is to first perform unsupervised clustering of different blocks for obtaining two clusters and then extract cluster based features (CBFs). Finally, a global feature consisting of the CBFs and differential CBF is used to detect bleeding frame via supervised classification. In order to handle continuous WCE video, a post-processing scheme is introduced utilizing the feature trends in neighboring frames. The CBF along with some morphological operations is employed to identify bleeding zones. Based on extensive experimentation on several WCE videos, it is found that the proposed method offers significantly better performance in comparison to some existing methods in terms of bleeding detection accuracy, sensitivity, specificity and precision in bleeding zone detection. It is found that the bleeding detection performance obtained by using the proposed CBF based global feature is better than the feature extracted from the non-clustered image. The proposed method can reduce the burden of physicians in investigating WCE video to detect bleeding frame and zone with a high level of accuracy. Copyright © 2018 Elsevier Ltd. All rights reserved.

  20. Form gene clustering method about pan-ethnic-group products based on emotional semantic

    Science.gov (United States)

    Chen, Dengkai; Ding, Jingjing; Gao, Minzhuo; Ma, Danping; Liu, Donghui

    2016-09-01

    The use of pan-ethnic-group products form knowledge primarily depends on a designer's subjective experience without user participation. The majority of studies primarily focus on the detection of the perceptual demands of consumers from the target product category. A pan-ethnic-group products form gene clustering method based on emotional semantic is constructed. Consumers' perceptual images of the pan-ethnic-group products are obtained by means of product form gene extraction and coding and computer aided product form clustering technology. A case of form gene clustering about the typical pan-ethnic-group products is investigated which indicates that the method is feasible. This paper opens up a new direction for the future development of product form design which improves the agility of product design process in the era of Industry 4.0.

  1. Image Retrieval Based on Multiview Constrained Nonnegative Matrix Factorization and Gaussian Mixture Model Spectral Clustering Method

    Directory of Open Access Journals (Sweden)

    Qunyi Xie

    2016-01-01

    Full Text Available Content-based image retrieval has recently become an important research topic and has been widely used for managing images from repertories. In this article, we address an efficient technique, called MNGS, which integrates multiview constrained nonnegative matrix factorization (NMF and Gaussian mixture model- (GMM- based spectral clustering for image retrieval. In the proposed methodology, the multiview NMF scheme provides competitive sparse representations of underlying images through decomposition of a similarity-preserving matrix that is formed by fusing multiple features from different visual aspects. In particular, the proposed method merges manifold constraints into the standard NMF objective function to impose an orthogonality constraint on the basis matrix and satisfy the structure preservation requirement of the coefficient matrix. To manipulate the clustering method on sparse representations, this paper has developed a GMM-based spectral clustering method in which the Gaussian components are regrouped in spectral space, which significantly improves the retrieval effectiveness. In this way, image retrieval of the whole database translates to a nearest-neighbour search in the cluster containing the query image. Simultaneously, this study investigates the proof of convergence of the objective function and the analysis of the computational complexity. Experimental results on three standard image datasets reveal the advantages that can be achieved with the proposed retrieval scheme.

  2. Selection of representative embankments based on rough set - fuzzy clustering method

    Science.gov (United States)

    Bin, Ou; Lin, Zhi-xiang; Fu, Shu-yan; Gao, Sheng-song

    2018-02-01

    The premise condition of comprehensive evaluation of embankment safety is selection of representative unit embankment, on the basis of dividing the unit levee the influencing factors and classification of the unit embankment are drafted.Based on the rough set-fuzzy clustering, the influence factors of the unit embankment are measured by quantitative and qualitative indexes.Construct to fuzzy similarity matrix of standard embankment then calculate fuzzy equivalent matrix of fuzzy similarity matrix by square method. By setting the threshold of the fuzzy equivalence matrix, the unit embankment is clustered, and the representative unit embankment is selected from the classification of the embankment.

  3. Relation chain based clustering analysis

    Science.gov (United States)

    Zhang, Cheng-ning; Zhao, Ming-yang; Luo, Hai-bo

    2011-08-01

    Clustering analysis is currently one of well-developed branches in data mining technology which is supposed to find the hidden structures in the multidimensional space called feature or pattern space. A datum in the space usually possesses a vector form and the elements in the vector represent several specifically selected features. These features are often of efficiency to the problem oriented. Generally, clustering analysis goes into two divisions: one is based on the agglomerative clustering method, and the other one is based on divisive clustering method. The former refers to a bottom-up process which regards each datum as a singleton cluster while the latter refers to a top-down process which regards entire data as a cluster. As the collected literatures, it is noted that the divisive clustering is currently overwhelming both in application and research. Although some famous divisive clustering methods are designed and well developed, clustering problems are still far from being solved. The k - means algorithm is the original divisive clustering method which initially assigns some important index values, such as the clustering number and the initial clustering prototype positions, and that could not be reasonable in some certain occasions. More than the initial problem, the k - means algorithm may also falls into local optimum, clusters in a rigid way and is not available for non-Gaussian distribution. One can see that seeking for a good or natural clustering result, in fact, originates from the one's understanding of the concept of clustering. Thus, the confusion or misunderstanding of the definition of clustering always derives some unsatisfied clustering results. One should consider the definition deeply and seriously. This paper demonstrates the nature of clustering, gives the way of understanding clustering, discusses the methodology of designing a clustering algorithm, and proposes a new clustering method based on relation chains among 2D patterns. In

  4. A Combinational Clustering Based Method for cDNA Microarray Image Segmentation.

    Directory of Open Access Journals (Sweden)

    Guifang Shao

    Full Text Available Microarray technology plays an important role in drawing useful biological conclusions by analyzing thousands of gene expressions simultaneously. Especially, image analysis is a key step in microarray analysis and its accuracy strongly depends on segmentation. The pioneering works of clustering based segmentation have shown that k-means clustering algorithm and moving k-means clustering algorithm are two commonly used methods in microarray image processing. However, they usually face unsatisfactory results because the real microarray image contains noise, artifacts and spots that vary in size, shape and contrast. To improve the segmentation accuracy, in this article we present a combination clustering based segmentation approach that may be more reliable and able to segment spots automatically. First, this new method starts with a very simple but effective contrast enhancement operation to improve the image quality. Then, an automatic gridding based on the maximum between-class variance is applied to separate the spots into independent areas. Next, among each spot region, the moving k-means clustering is first conducted to separate the spot from background and then the k-means clustering algorithms are combined for those spots failing to obtain the entire boundary. Finally, a refinement step is used to replace the false segmentation and the inseparable ones of missing spots. In addition, quantitative comparisons between the improved method and the other four segmentation algorithms--edge detection, thresholding, k-means clustering and moving k-means clustering--are carried out on cDNA microarray images from six different data sets. Experiments on six different data sets, 1 Stanford Microarray Database (SMD, 2 Gene Expression Omnibus (GEO, 3 Baylor College of Medicine (BCM, 4 Swiss Institute of Bioinformatics (SIB, 5 Joe DeRisi's individual tiff files (DeRisi, and 6 University of California, San Francisco (UCSF, indicate that the improved

  5. A Combinational Clustering Based Method for cDNA Microarray Image Segmentation.

    Science.gov (United States)

    Shao, Guifang; Li, Tiejun; Zuo, Wangda; Wu, Shunxiang; Liu, Tundong

    2015-01-01

    Microarray technology plays an important role in drawing useful biological conclusions by analyzing thousands of gene expressions simultaneously. Especially, image analysis is a key step in microarray analysis and its accuracy strongly depends on segmentation. The pioneering works of clustering based segmentation have shown that k-means clustering algorithm and moving k-means clustering algorithm are two commonly used methods in microarray image processing. However, they usually face unsatisfactory results because the real microarray image contains noise, artifacts and spots that vary in size, shape and contrast. To improve the segmentation accuracy, in this article we present a combination clustering based segmentation approach that may be more reliable and able to segment spots automatically. First, this new method starts with a very simple but effective contrast enhancement operation to improve the image quality. Then, an automatic gridding based on the maximum between-class variance is applied to separate the spots into independent areas. Next, among each spot region, the moving k-means clustering is first conducted to separate the spot from background and then the k-means clustering algorithms are combined for those spots failing to obtain the entire boundary. Finally, a refinement step is used to replace the false segmentation and the inseparable ones of missing spots. In addition, quantitative comparisons between the improved method and the other four segmentation algorithms--edge detection, thresholding, k-means clustering and moving k-means clustering--are carried out on cDNA microarray images from six different data sets. Experiments on six different data sets, 1) Stanford Microarray Database (SMD), 2) Gene Expression Omnibus (GEO), 3) Baylor College of Medicine (BCM), 4) Swiss Institute of Bioinformatics (SIB), 5) Joe DeRisi's individual tiff files (DeRisi), and 6) University of California, San Francisco (UCSF), indicate that the improved approach is

  6. Research on the method of information system risk state estimation based on clustering particle filter

    Directory of Open Access Journals (Sweden)

    Cui Jia

    2017-05-01

    Full Text Available With the purpose of reinforcing correlation analysis of risk assessment threat factors, a dynamic assessment method of safety risks based on particle filtering is proposed, which takes threat analysis as the core. Based on the risk assessment standards, the method selects threat indicates, applies a particle filtering algorithm to calculate influencing weight of threat indications, and confirms information system risk levels by combining with state estimation theory. In order to improve the calculating efficiency of the particle filtering algorithm, the k-means cluster algorithm is introduced to the particle filtering algorithm. By clustering all particles, the author regards centroid as the representative to operate, so as to reduce calculated amount. The empirical experience indicates that the method can embody the relation of mutual dependence and influence in risk elements reasonably. Under the circumstance of limited information, it provides the scientific basis on fabricating a risk management control strategy.

  7. Research on the method of information system risk state estimation based on clustering particle filter

    Science.gov (United States)

    Cui, Jia; Hong, Bei; Jiang, Xuepeng; Chen, Qinghua

    2017-05-01

    With the purpose of reinforcing correlation analysis of risk assessment threat factors, a dynamic assessment method of safety risks based on particle filtering is proposed, which takes threat analysis as the core. Based on the risk assessment standards, the method selects threat indicates, applies a particle filtering algorithm to calculate influencing weight of threat indications, and confirms information system risk levels by combining with state estimation theory. In order to improve the calculating efficiency of the particle filtering algorithm, the k-means cluster algorithm is introduced to the particle filtering algorithm. By clustering all particles, the author regards centroid as the representative to operate, so as to reduce calculated amount. The empirical experience indicates that the method can embody the relation of mutual dependence and influence in risk elements reasonably. Under the circumstance of limited information, it provides the scientific basis on fabricating a risk management control strategy.

  8. A new multistage medical segmentation method based on superpixel and fuzzy clustering.

    Science.gov (United States)

    Ji, Shiyong; Wei, Benzheng; Yu, Zhen; Yang, Gongping; Yin, Yilong

    2014-01-01

    The medical image segmentation is the key approach of image processing for brain MRI images. However, due to the visual complex appearance of image structures and the imaging characteristic, it is still challenging to automatically segment brain MRI image. A new multi-stage segmentation method based on superpixel and fuzzy clustering (MSFCM) is proposed to achieve the good brain MRI segmentation results. The MSFCM utilizes the superpixels as the clustering objects instead of pixels, and it can increase the clustering granularity and overcome the influence of noise and bias effectively. In the first stage, the MRI image is parsed into several atomic areas, namely, superpixels, and a further parsing step is adopted for the areas with bigger gray variance over setting threshold. Subsequently, designed fuzzy clustering is carried out to the fuzzy membership of each superpixel, and an iterative broadcast method based on the Butterworth function is used to redefine their classifications. Finally, the segmented image is achieved by merging the superpixels which have the same classification label. The simulated brain database from BrainWeb site is used in the experiments, and the experimental results demonstrate that MSFCM method outperforms the traditional FCM algorithm in terms of segmentation accuracy and stability for MRI image.

  9. Fuzzy clustering methods application for Alzheimer’s diseases diagnosis based on PET images

    OpenAIRE

    Krashenyi, Ihor Eduardovych; Popov, Anton Oleksandrovych; Ramirez, Haver; Gorriz, Huan Manuel

    2016-01-01

    This work was dedicated to clustering methods application in fuzzy inference system for Alzheimer’s disease diagnosis using PET-images. Three methods (Subtractive Clustering, C-means and Fuzzy Grid Partition) of clustering were discussed and their performance in Alzheimer’s disease diagnosis were measured. Recommendation of the future use of Subtractive Clustering algorithm in the computeraided diagnosis system for Alzheimer’s disease are given. The performance of this algorithm is AUC=0,8791...

  10. Application of attribute weighting method based on clustering centers to discrimination of linearly non-separable medical datasets.

    Science.gov (United States)

    Polat, Kemal

    2012-08-01

    In this paper, attribute weighting method based on the cluster centers with aim of increasing the discrimination between classes has been proposed and applied to nonlinear separable datasets including two medical datasets (mammographic mass dataset and bupa liver disorders dataset) and 2-D spiral dataset. The goals of this method are to gather the data points near to cluster center all together to transform from nonlinear separable datasets to linear separable dataset. As clustering algorithm, k-means clustering, fuzzy c-means clustering, and subtractive clustering have been used. The proposed attribute weighting methods are k-means clustering based attribute weighting (KMCBAW), fuzzy c-means clustering based attribute weighting (FCMCBAW), and subtractive clustering based attribute weighting (SCBAW) and used prior to classifier algorithms including C4.5 decision tree and adaptive neuro-fuzzy inference system (ANFIS). To evaluate the proposed method, the recall, precision value, true negative rate (TNR), G-mean1, G-mean2, f-measure, and classification accuracy have been used. The results have shown that the best attribute weighting method was the subtractive clustering based attribute weighting with respect to classification performance in the classification of three used datasets.

  11. An Energy-Efficient Cluster-Based Vehicle Detection on Road Network Using Intention Numeration Method

    Directory of Open Access Journals (Sweden)

    Deepa Devasenapathy

    2015-01-01

    Full Text Available The traffic in the road network is progressively increasing at a greater extent. Good knowledge of network traffic can minimize congestions using information pertaining to road network obtained with the aid of communal callers, pavement detectors, and so on. Using these methods, low featured information is generated with respect to the user in the road network. Although the existing schemes obtain urban traffic information, they fail to calculate the energy drain rate of nodes and to locate equilibrium between the overhead and quality of the routing protocol that renders a great challenge. Thus, an energy-efficient cluster-based vehicle detection in road network using the intention numeration method (CVDRN-IN is developed. Initially, sensor nodes that detect a vehicle are grouped into separate clusters. Further, we approximate the strength of the node drain rate for a cluster using polynomial regression function. In addition, the total node energy is estimated by taking the integral over the area. Finally, enhanced data aggregation is performed to reduce the amount of data transmission using digital signature tree. The experimental performance is evaluated with Dodgers loop sensor data set from UCI repository and the performance evaluation outperforms existing work on energy consumption, clustering efficiency, and node drain rate.

  12. Swarm: robust and fast clustering method for amplicon-based studies

    Science.gov (United States)

    Rognes, Torbjørn; Quince, Christopher; de Vargas, Colomban; Dunthorn, Micah

    2014-01-01

    Popular de novo amplicon clustering methods suffer from two fundamental flaws: arbitrary global clustering thresholds, and input-order dependency induced by centroid selection. Swarm was developed to address these issues by first clustering nearly identical amplicons iteratively using a local threshold, and then by using clusters’ internal structure and amplicon abundances to refine its results. This fast, scalable, and input-order independent approach reduces the influence of clustering parameters and produces robust operational taxonomic units. PMID:25276506

  13. A novel intrusion detection method based on OCSVM and K-means recursive clustering

    Directory of Open Access Journals (Sweden)

    Leandros A. Maglaras

    2015-01-01

    Full Text Available In this paper we present an intrusion detection module capable of detecting malicious network traffic in a SCADA (Supervisory Control and Data Acquisition system, based on the combination of One-Class Support Vector Machine (OCSVM with RBF kernel and recursive k-means clustering. Important parameters of OCSVM, such as Gaussian width o and parameter v affect the performance of the classifier. Tuning of these parameters is of great importance in order to avoid false positives and over fitting. The combination of OCSVM with recursive k- means clustering leads the proposed intrusion detection module to distinguish real alarms from possible attacks regardless of the values of parameters o and v, making it ideal for real-time intrusion detection mechanisms for SCADA systems. Extensive simulations have been conducted with datasets extracted from small and medium sized HTB SCADA testbeds, in order to compare the accuracy, false alarm rate and execution time against the base line OCSVM method.

  14. Multiple-Features-Based Semisupervised Clustering DDoS Detection Method

    Directory of Open Access Journals (Sweden)

    Yonghao Gu

    2017-01-01

    Full Text Available DDoS attack stream from different agent host converged at victim host will become very large, which will lead to system halt or network congestion. Therefore, it is necessary to propose an effective method to detect the DDoS attack behavior from the massive data stream. In order to solve the problem that large numbers of labeled data are not provided in supervised learning method, and the relatively low detection accuracy and convergence speed of unsupervised k-means algorithm, this paper presents a semisupervised clustering detection method using multiple features. In this detection method, we firstly select three features according to the characteristics of DDoS attacks to form detection feature vector. Then, Multiple-Features-Based Constrained-K-Means (MF-CKM algorithm is proposed based on semisupervised clustering. Finally, using MIT Laboratory Scenario (DDoS 1.0 data set, we verify that the proposed method can improve the convergence speed and accuracy of the algorithm under the condition of using a small amount of labeled data sets.

  15. A hybrid method based on a new clustering technique and multilayer perceptron neural networks for hourly solar radiation forecasting

    International Nuclear Information System (INIS)

    Azimi, R.; Ghayekhloo, M.; Ghofrani, M.

    2016-01-01

    Highlights: • A novel clustering approach is proposed based on the data transformation approach. • A novel cluster selection method based on correlation analysis is presented. • The proposed hybrid clustering approach leads to deep learning for MLPNN. • A hybrid forecasting method is developed to predict solar radiations. • The evaluation results show superior performance of the proposed forecasting model. - Abstract: Accurate forecasting of renewable energy sources plays a key role in their integration into the grid. This paper proposes a hybrid solar irradiance forecasting framework using a Transformation based K-means algorithm, named TB K-means, to increase the forecast accuracy. The proposed clustering method is a combination of a new initialization technique, K-means algorithm and a new gradual data transformation approach. Unlike the other K-means based clustering methods which are not capable of providing a fixed and definitive answer due to the selection of different cluster centroids for each run, the proposed clustering provides constant results for different runs of the algorithm. The proposed clustering is combined with a time-series analysis, a novel cluster selection algorithm and a multilayer perceptron neural network (MLPNN) to develop the hybrid solar radiation forecasting method for different time horizons (1 h ahead, 2 h ahead, …, 48 h ahead). The performance of the proposed TB K-means clustering is evaluated using several different datasets and compared with different variants of K-means algorithm. Solar datasets with different solar radiation characteristics are also used to determine the accuracy and processing speed of the developed forecasting method with the proposed TB K-means and other clustering techniques. The results of direct comparison with other well-established forecasting models demonstrate the superior performance of the proposed hybrid forecasting method. Furthermore, a comparative analysis with the benchmark solar

  16. A Dimensionality Reduction-Based Multi-Step Clustering Method for Robust Vessel Trajectory Analysis

    Directory of Open Access Journals (Sweden)

    Huanhuan Li

    2017-08-01

    Full Text Available The Shipboard Automatic Identification System (AIS is crucial for navigation safety and maritime surveillance, data mining and pattern analysis of AIS information have attracted considerable attention in terms of both basic research and practical applications. Clustering of spatio-temporal AIS trajectories can be used to identify abnormal patterns and mine customary route data for transportation safety. Thus, the capacities of navigation safety and maritime traffic monitoring could be enhanced correspondingly. However, trajectory clustering is often sensitive to undesirable outliers and is essentially more complex compared with traditional point clustering. To overcome this limitation, a multi-step trajectory clustering method is proposed in this paper for robust AIS trajectory clustering. In particular, the Dynamic Time Warping (DTW, a similarity measurement method, is introduced in the first step to measure the distances between different trajectories. The calculated distances, inversely proportional to the similarities, constitute a distance matrix in the second step. Furthermore, as a widely-used dimensional reduction method, Principal Component Analysis (PCA is exploited to decompose the obtained distance matrix. In particular, the top k principal components with above 95% accumulative contribution rate are extracted by PCA, and the number of the centers k is chosen. The k centers are found by the improved center automatically selection algorithm. In the last step, the improved center clustering algorithm with k clusters is implemented on the distance matrix to achieve the final AIS trajectory clustering results. In order to improve the accuracy of the proposed multi-step clustering algorithm, an automatic algorithm for choosing the k clusters is developed according to the similarity distance. Numerous experiments on realistic AIS trajectory datasets in the bridge area waterway and Mississippi River have been implemented to compare our

  17. A Dimensionality Reduction-Based Multi-Step Clustering Method for Robust Vessel Trajectory Analysis.

    Science.gov (United States)

    Li, Huanhuan; Liu, Jingxian; Liu, Ryan Wen; Xiong, Naixue; Wu, Kefeng; Kim, Tai-Hoon

    2017-08-04

    The Shipboard Automatic Identification System (AIS) is crucial for navigation safety and maritime surveillance, data mining and pattern analysis of AIS information have attracted considerable attention in terms of both basic research and practical applications. Clustering of spatio-temporal AIS trajectories can be used to identify abnormal patterns and mine customary route data for transportation safety. Thus, the capacities of navigation safety and maritime traffic monitoring could be enhanced correspondingly. However, trajectory clustering is often sensitive to undesirable outliers and is essentially more complex compared with traditional point clustering. To overcome this limitation, a multi-step trajectory clustering method is proposed in this paper for robust AIS trajectory clustering. In particular, the Dynamic Time Warping (DTW), a similarity measurement method, is introduced in the first step to measure the distances between different trajectories. The calculated distances, inversely proportional to the similarities, constitute a distance matrix in the second step. Furthermore, as a widely-used dimensional reduction method, Principal Component Analysis (PCA) is exploited to decompose the obtained distance matrix. In particular, the top k principal components with above 95% accumulative contribution rate are extracted by PCA, and the number of the centers k is chosen. The k centers are found by the improved center automatically selection algorithm. In the last step, the improved center clustering algorithm with k clusters is implemented on the distance matrix to achieve the final AIS trajectory clustering results. In order to improve the accuracy of the proposed multi-step clustering algorithm, an automatic algorithm for choosing the k clusters is developed according to the similarity distance. Numerous experiments on realistic AIS trajectory datasets in the bridge area waterway and Mississippi River have been implemented to compare our proposed method with

  18. An Extended Affinity Propagation Clustering Method Based on Different Data Density Types

    Directory of Open Access Journals (Sweden)

    XiuLi Zhao

    2015-01-01

    Full Text Available Affinity propagation (AP algorithm, as a novel clustering method, does not require the users to specify the initial cluster centers in advance, which regards all data points as potential exemplars (cluster centers equally and groups the clusters totally by the similar degree among the data points. But in many cases there exist some different intensive areas within the same data set, which means that the data set does not distribute homogeneously. In such situation the AP algorithm cannot group the data points into ideal clusters. In this paper, we proposed an extended AP clustering algorithm to deal with such a problem. There are two steps in our method: firstly the data set is partitioned into several data density types according to the nearest distances of each data point; and then the AP clustering method is, respectively, used to group the data points into clusters in each data density type. Two experiments are carried out to evaluate the performance of our algorithm: one utilizes an artificial data set and the other uses a real seismic data set. The experiment results show that groups are obtained more accurately by our algorithm than OPTICS and AP clustering algorithm itself.

  19. A Novel Method to Predict Genomic Islands Based on Mean Shift Clustering Algorithm

    Science.gov (United States)

    de Brito, Daniel M.; Maracaja-Coutinho, Vinicius; de Farias, Savio T.; Batista, Leonardo V.; do Rêgo, Thaís G.

    2016-01-01

    Genomic Islands (GIs) are regions of bacterial genomes that are acquired from other organisms by the phenomenon of horizontal transfer. These regions are often responsible for many important acquired adaptations of the bacteria, with great impact on their evolution and behavior. Nevertheless, these adaptations are usually associated with pathogenicity, antibiotic resistance, degradation and metabolism. Identification of such regions is of medical and industrial interest. For this reason, different approaches for genomic islands prediction have been proposed. However, none of them are capable of predicting precisely the complete repertory of GIs in a genome. The difficulties arise due to the changes in performance of different algorithms in the face of the variety of nucleotide distribution in different species. In this paper, we present a novel method to predict GIs that is built upon mean shift clustering algorithm. It does not require any information regarding the number of clusters, and the bandwidth parameter is automatically calculated based on a heuristic approach. The method was implemented in a new user-friendly tool named MSGIP—Mean Shift Genomic Island Predictor. Genomes of bacteria with GIs discussed in other papers were used to evaluate the proposed method. The application of this tool revealed the same GIs predicted by other methods and also different novel unpredicted islands. A detailed investigation of the different features related to typical GI elements inserted in these new regions confirmed its effectiveness. Stand-alone and user-friendly versions for this new methodology are available at http://msgip.integrativebioinformatics.me. PMID:26731657

  20. A Novel Method to Predict Genomic Islands Based on Mean Shift Clustering Algorithm.

    Directory of Open Access Journals (Sweden)

    Daniel M de Brito

    Full Text Available Genomic Islands (GIs are regions of bacterial genomes that are acquired from other organisms by the phenomenon of horizontal transfer. These regions are often responsible for many important acquired adaptations of the bacteria, with great impact on their evolution and behavior. Nevertheless, these adaptations are usually associated with pathogenicity, antibiotic resistance, degradation and metabolism. Identification of such regions is of medical and industrial interest. For this reason, different approaches for genomic islands prediction have been proposed. However, none of them are capable of predicting precisely the complete repertory of GIs in a genome. The difficulties arise due to the changes in performance of different algorithms in the face of the variety of nucleotide distribution in different species. In this paper, we present a novel method to predict GIs that is built upon mean shift clustering algorithm. It does not require any information regarding the number of clusters, and the bandwidth parameter is automatically calculated based on a heuristic approach. The method was implemented in a new user-friendly tool named MSGIP--Mean Shift Genomic Island Predictor. Genomes of bacteria with GIs discussed in other papers were used to evaluate the proposed method. The application of this tool revealed the same GIs predicted by other methods and also different novel unpredicted islands. A detailed investigation of the different features related to typical GI elements inserted in these new regions confirmed its effectiveness. Stand-alone and user-friendly versions for this new methodology are available at http://msgip.integrativebioinformatics.me.

  1. A method to determine agro-climatic zones based on correlation and cluster analyses

    Science.gov (United States)

    Borges Valeriano, Taynara Tuany; de Souza Rolim, Glauco; de Oliveira Aparecido, Lucas Eduardo

    2017-12-01

    Determining agro-climatic zones (ACZs) is traditionally made by cross-comparing meteorological elements such as air temperature, rainfall, and water deficit (DEF). This study proposes a new method based on correlations between monthly DEFs during the crop cycle and annual yield and performs a multivariate cluster analysis on these correlations. This `correlation method' was applied to all municipalities in the state of São Paulo to determine ACZs for coffee plantations. A traditional ACZ method for coffee, which is based on temperature and DEF ranges (Evangelista et al.; RBEAA, 6:445-452, 2002), was applied to the study area to compare against the correlation method. The traditional ACZ classified the "Alta Mogina," "Média Mogiana," and "Garça and Marília" regions as traditional coffee regions that were either suitable or even restricted for coffee plantations. These traditional regions have produced coffee since 1800 and should not be classified as restricted. The correlation method classified those areas as high-producing regions and expanded them into other areas. The proposed method is innovative, because it is more detailed than common ACZ methods. Each developmental crop phase was analyzed based on correlations between the monthly DEF and yield, improving the importance of crop physiology in relation to climate.

  2. A robust automatic leukocyte recognition method based on island-clustering texture

    Directory of Open Access Journals (Sweden)

    Xiaoshun Li

    2016-01-01

    Full Text Available A leukocyte recognition method for human peripheral blood smear based on island-clustering texture (ICT is proposed. By analyzing the features of the five typical classes of leukocyte images, a new ICT model is established. Firstly, some feature points are extracted in a gray leukocyte image by mean-shift clustering to be the centers of islands. Secondly, the growing region is employed to create regions of the islands in which the seeds are just these feature points. These islands distribution can describe a new texture. Finally, a distinguished parameter vector of these islands is created as the ICT features by combining the ICT features with the geometric features of the leukocyte. Then the five typical classes of leukocytes can be recognized successfully at the correct recognition rate of more than 92.3% with a total sample of 1310 leukocytes. Experimental results show the feasibility of the proposed method. Further analysis reveals that the method is robust and results can provide important information for disease diagnosis.

  3. Global optimization method using SLE and adaptive RBF based on fuzzy clustering

    Science.gov (United States)

    Zhu, Huaguang; Liu, Li; Long, Teng; Zhao, Junfeng

    2012-07-01

    High fidelity analysis models, which are beneficial to improving the design quality, have been more and more widely utilized in the modern engineering design optimization problems. However, the high fidelity analysis models are so computationally expensive that the time required in design optimization is usually unacceptable. In order to improve the efficiency of optimization involving high fidelity analysis models, the optimization efficiency can be upgraded through applying surrogates to approximate the computationally expensive models, which can greately reduce the computation time. An efficient heuristic global optimization method using adaptive radial basis function (RBF) based on fuzzy clustering (ARFC) is proposed. In this method, a novel algorithm of maximin Latin hypercube design using successive local enumeration (SLE) is employed to obtain sample points with good performance in both space-filling and projective uniformity properties, which does a great deal of good to metamodels accuracy. RBF method is adopted for constructing the metamodels, and with the increasing the number of sample points the approximation accuracy of RBF is gradually enhanced. The fuzzy c-means clustering method is applied to identify the reduced attractive regions in the original design space. The numerical benchmark examples are used for validating the performance of ARFC. The results demonstrates that for most application examples the global optima are effectively obtained and comparison with adaptive response surface method (ARSM) proves that the proposed method can intuitively capture promising design regions and can efficiently identify the global or near-global design optimum. This method improves the efficiency and global convergence of the optimization problems, and gives a new optimization strategy for engineering design optimization problems involving computationally expensive models.

  4. A Spectrum Sensing Method Based on Signal Feature and Clustering Algorithm in Cognitive Wireless Multimedia Sensor Networks

    Directory of Open Access Journals (Sweden)

    Yongwei Zhang

    2017-01-01

    Full Text Available In order to solve the problem of difficulty in determining the threshold in spectrum sensing technologies based on the random matrix theory, a spectrum sensing method based on clustering algorithm and signal feature is proposed for Cognitive Wireless Multimedia Sensor Networks. Firstly, the wireless communication signal features are obtained according to the sampling signal covariance matrix. Then, the clustering algorithm is used to classify and test the signal features. Different signal features and clustering algorithms are compared in this paper. The experimental results show that the proposed method has better sensing performance.

  5. Stability of maximum-likelihood-based clustering methods: exploring the backbone of classifications

    International Nuclear Information System (INIS)

    Mungan, Muhittin; Ramasco, José J

    2010-01-01

    Components of complex systems are often classified according to the way they interact with each other. In graph theory such groups are known as clusters or communities. Many different techniques have been recently proposed to detect them, some of which involve inference methods using either Bayesian or maximum likelihood approaches. In this paper, we study a statistical model designed for detecting clusters based on connection similarity. The basic assumption of the model is that the graph was generated by a certain grouping of the nodes and an expectation maximization algorithm is employed to infer that grouping. We show that the method admits further development to yield a stability analysis of the groupings that quantifies the extent to which each node influences its neighbors' group membership. Our approach naturally allows for the identification of the key elements responsible for the grouping and their resilience to changes in the network. Given the generality of the assumptions underlying the statistical model, such nodes are likely to play special roles in the original system. We illustrate this point by analyzing several empirical networks for which further information about the properties of the nodes is available. The search and identification of stabilizing nodes constitutes thus a novel technique to characterize the relevance of nodes in complex networks

  6. A hybrid method based on fuzzy clustering and local region-based level set for segmentation of inhomogeneous medical images.

    Science.gov (United States)

    Rastgarpour, Maryam; Shanbehzadeh, Jamshid; Soltanian-Zadeh, Hamid

    2014-08-01

    medical images are more affected by intensity inhomogeneity rather than noise and outliers. This has a great impact on the efficiency of region-based image segmentation methods, because they rely on homogeneity of intensities in the regions of interest. Meanwhile, initialization and configuration of controlling parameters affect the performance of level set segmentation. To address these problems, this paper proposes a new hybrid method that integrates a local region-based level set method with a variation of fuzzy clustering. Specifically it takes an information fusion approach based on a coarse-to-fine framework that seamlessly fuses local spatial information and gray level information with the information of the local region-based level set method. Also, the controlling parameters of level set are directly computed from fuzzy clustering result. This approach has valuable benefits such as automation, no need to prior knowledge about the region of interest (ROI), robustness on intensity inhomogeneity, automatic adjustment of controlling parameters, insensitivity to initialization, and satisfactory accuracy. So, the contribution of this paper is to provide these advantages together which have not been proposed yet for inhomogeneous medical images. Proposed method was tested on several medical images from different modalities for performance evaluation. Experimental results approve its effectiveness in segmenting medical images in comparison with similar methods.

  7. Comparison Of Keyword Based Clustering Of Web Documents By Using Openstack 4j And By Traditional Method

    Directory of Open Access Journals (Sweden)

    Shiza Anand

    2015-08-01

    Full Text Available As the number of hypertext documents are increasing continuously day by day on world wide web. Therefore clustering methods will be required to bind documents into the clusters repositories according to the similarity lying between the documents. Various clustering methods exist such as Hierarchical Based K-means Fuzzy Logic Based Centroid Based etc. These keyword based clustering methods takes much more amount of time for creating containers and putting documents in their respective containers. These traditional methods use File Handling techniques of different programming languages for creating repositories and transferring web documents into these containers. In contrast openstack4j SDK is a new technique for creating containers and shifting web documents into these containers according to the similarity in much more less amount of time as compared to the traditional methods. Another benefit of this technique is that this SDK understands and reads all types of files such as jpg html pdf doc etc. This paper compares the time required for clustering of documents by using openstack4j and by traditional methods and suggests various search engines to adopt this technique for clustering so that they give result to the user querries in less amount of time.

  8. Fuzzy clustering-based feature extraction method for mental task classification.

    Science.gov (United States)

    Gupta, Akshansh; Kumar, Dhirendra

    2017-06-01

    A brain computer interface (BCI) is a communication system by which a person can send messages or requests for basic necessities without using peripheral nerves and muscles. Response to mental task-based BCI is one of the privileged areas of investigation. Electroencephalography (EEG) signals are used to represent the brain activities in the BCI domain. For any mental task classification model, the performance of the learning model depends on the extraction of features from EEG signal. In literature, wavelet transform and empirical mode decomposition are two popular feature extraction methods used to analyze a signal having non-linear and non-stationary property. By adopting the virtue of both techniques, a theoretical adaptive filter-based method to decompose non-linear and non-stationary signal has been proposed known as empirical wavelet transform (EWT) in recent past. EWT does not work well for the signals having overlapped in frequency and time domain and failed to provide good features for further classification. In this work, Fuzzy c-means algorithm is utilized along with EWT to handle this problem. It has been observed from the experimental results that EWT along with fuzzy clustering outperforms in comparison to EWT for the EEG-based response to mental task problem. Further, in case of mental task classification, the ratio of samples to features is very small. To handle the problem of small ratio of samples to features, in this paper, we have also utilized three well-known multivariate feature selection methods viz. Bhattacharyya distance (BD), ratio of scatter matrices (SR), and linear regression (LR). The results of experiment demonstrate that the performance of mental task classification has improved considerably by aforesaid methods. Ranking method and Friedman's statistical test are also performed to rank and compare different combinations of feature extraction methods and feature selection methods which endorse the efficacy of the proposed approach.

  9. Method of Selection of Bacteria Antibiotic Resistance Genes Based on Clustering of Similar Nucleotide Sequences.

    Science.gov (United States)

    Balashov, I S; Naumov, V A; Borovikov, P I; Gordeev, A B; Dubodelov, D V; Lyubasovskaya, L A; Rodchenko, Yu V; Bystritskii, A A; Aleksandrova, N V; Trofimov, D Yu; Priputnevich, T V

    2017-10-01

    A new method for selection of bacterium antibiotic resistance genes is proposed and tested for solving the problems related to selection of primers for PCR assay. The method implies clustering of similar nucleotide sequences and selection of group primers for all genes of each cluster. Clustering of resistance genes for six groups of antibiotics (aminoglycosides, β-lactams, fluoroquinolones, glycopeptides, macrolides and lincosamides, and fusidic acid) was performed. The method was tested for 81 strains of bacteria of different genera isolated from patients (K. pneumoniae, Staphylococcus spp., S. agalactiae, E. faecalis, E. coli, and G. vaginalis). The results obtained by us are comparable to those in the selection of individual genes; this allows reducing the number of primers necessary for maximum coverage of the known antibiotic resistance genes during PCR analysis.

  10. Microcalcification detection in full-field digital mammograms with PFCM clustering and weighted SVM-based method

    Science.gov (United States)

    Liu, Xiaoming; Mei, Ming; Liu, Jun; Hu, Wei

    2015-12-01

    Clustered microcalcifications (MCs) in mammograms are an important early sign of breast cancer in women. Their accurate detection is important in computer-aided detection (CADe). In this paper, we integrated the possibilistic fuzzy c-means (PFCM) clustering algorithm and weighted support vector machine (WSVM) for the detection of MC clusters in full-field digital mammograms (FFDM). For each image, suspicious MC regions are extracted with region growing and active contour segmentation. Then geometry and texture features are extracted for each suspicious MC, a mutual information-based supervised criterion is used to select important features, and PFCM is applied to cluster the samples into two clusters. Weights of the samples are calculated based on possibilities and typicality values from the PFCM, and the ground truth labels. A weighted nonlinear SVM is trained. During the test process, when an unknown image is presented, suspicious regions are located with the segmentation step, selected features are extracted, and the suspicious MC regions are classified as containing MC or not by the trained weighted nonlinear SVM. Finally, the MC regions are analyzed with spatial information to locate MC clusters. The proposed method is evaluated using a database of 410 clinical mammograms and compared with a standard unweighted support vector machine (SVM) classifier. The detection performance is evaluated using response receiver operating (ROC) curves and free-response receiver operating characteristic (FROC) curves. The proposed method obtained an area under the ROC curve of 0.8676, while the standard SVM obtained an area of 0.8268 for MC detection. For MC cluster detection, the proposed method obtained a high sensitivity of 92 % with a false-positive rate of 2.3 clusters/image, and it is also better than standard SVM with 4.7 false-positive clusters/image at the same sensitivity.

  11. Functional connectivity analysis of the neural bases of emotion regulation: A comparison of independent component method with density-based k-means clustering method.

    Science.gov (United States)

    Zou, Ling; Guo, Qian; Xu, Yi; Yang, Biao; Jiao, Zhuqing; Xiang, Jianbo

    2016-04-29

    Functional magnetic resonance imaging (fMRI) is an important tool in neuroscience for assessing connectivity and interactions between distant areas of the brain. To find and characterize the coherent patterns of brain activity as a means of identifying brain systems for the cognitive reappraisal of the emotion task, both density-based k-means clustering and independent component analysis (ICA) methods can be applied to characterize the interactions between brain regions involved in cognitive reappraisal of emotion. Our results reveal that compared with the ICA method, the density-based k-means clustering method provides a higher sensitivity of polymerization. In addition, it is more sensitive to those relatively weak functional connection regions. Thus, the study concludes that in the process of receiving emotional stimuli, the relatively obvious activation areas are mainly distributed in the frontal lobe, cingulum and near the hypothalamus. Furthermore, density-based k-means clustering method creates a more reliable method for follow-up studies of brain functional connectivity.

  12. A Method for Traffic Congestion Clustering Judgment Based on Grey Relational Analysis

    Directory of Open Access Journals (Sweden)

    Yingya Zhang

    2016-05-01

    Full Text Available Traffic congestion clustering judgment is a fundamental problem in the study of traffic jam warning. However, it is not satisfactory to judge traffic congestion degrees using only vehicle speed. In this paper, we collect traffic flow information with three properties (traffic flow velocity, traffic flow density and traffic volume of urban trunk roads, which is used to judge the traffic congestion degree. We first define a grey relational clustering model by leveraging grey relational analysis and rough set theory to mine relationships of multidimensional-attribute information. Then, we propose a grey relational membership degree rank clustering algorithm (GMRC to discriminant clustering priority and further analyze the urban traffic congestion degree. Our experimental results show that the average accuracy of the GMRC algorithm is 24.9% greater than that of the K-means algorithm and 30.8% greater than that of the Fuzzy C-Means (FCM algorithm. Furthermore, we find that our method can be more conducive to dynamic traffic warnings.

  13. Document clustering methods, document cluster label disambiguation methods, document clustering apparatuses, and articles of manufacture

    Science.gov (United States)

    Sanfilippo, Antonio [Richland, WA; Calapristi, Augustin J [West Richland, WA; Crow, Vernon L [Richland, WA; Hetzler, Elizabeth G [Kennewick, WA; Turner, Alan E [Kennewick, WA

    2009-12-22

    Document clustering methods, document cluster label disambiguation methods, document clustering apparatuses, and articles of manufacture are described. In one aspect, a document clustering method includes providing a document set comprising a plurality of documents, providing a cluster comprising a subset of the documents of the document set, using a plurality of terms of the documents, providing a cluster label indicative of subject matter content of the documents of the cluster, wherein the cluster label comprises a plurality of word senses, and selecting one of the word senses of the cluster label.

  14. Semi-supervised clustering methods

    Science.gov (United States)

    Bair, Eric

    2013-01-01

    Cluster analysis methods seek to partition a data set into homogeneous subgroups. It is useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning that there is no outcome variable nor is anything known about the relationship between the observations in the data set. In many situations, however, information about the clusters is available in addition to the values of the features. For example, the cluster labels of some observations may be known, or certain observations may be known to belong to the same cluster. In other cases, one may wish to identify clusters that are associated with a particular outcome variable. This review describes several clustering algorithms (known as “semi-supervised clustering” methods) that can be applied in these situations. The majority of these methods are modifications of the popular k-means clustering method, and several of them will be described in detail. A brief description of some other semi-supervised clustering algorithms is also provided. PMID:24729830

  15. Analysis of density based and fuzzy c-means clustering methods on lesion border extraction in dermoscopy images.

    Science.gov (United States)

    Kockara, Sinan; Mete, Mutlu; Chen, Bernard; Aydin, Kemal

    2010-10-07

    Computer-aided segmentation and border detection in dermoscopic images is one of the core components of diagnostic procedures and therapeutic interventions for skin cancer. Automated assessment tools for dermoscopy images have become an important research field mainly because of inter- and intra-observer variations in human interpretation. In this study, we compare two approaches for automatic border detection in dermoscopy images: density based clustering (DBSCAN) and Fuzzy C-Means (FCM) clustering algorithms. In the first approach, if there exists enough density--greater than certain number of points--around a point, then either a new cluster is formed around the point or an existing cluster grows by including the point and its neighbors. In the second approach FCM clustering is used. This approach has the ability to assign one data point into more than one cluster. Each approach is examined on a set of 100 dermoscopy images whose manually drawn borders by a dermatologist are used as the ground truth. Error rates; false positives and false negatives along with true positives and true negatives are quantified by comparing results with manually determined borders from a dermatologist. The assessments obtained from both methods are quantitatively analyzed over three accuracy measures: border error, precision, and recall. As well as low border error, high precision and recall, visual outcome showed that the DBSCAN effectively delineated targeted lesion, and has bright future; however, the FCM had poor performance especially in border error metric.

  16. Cancer Transcriptome Dataset Analysis: Comparing Methods of Pathway and Gene Regulatory Network-Based Cluster Identification.

    Science.gov (United States)

    Nam, Seungyoon

    2017-04-01

    Cancer transcriptome analysis is one of the leading areas of Big Data science, biomarker, and pharmaceutical discovery, not to forget personalized medicine. Yet, cancer transcriptomics and postgenomic medicine require innovation in bioinformatics as well as comparison of the performance of available algorithms. In this data analytics context, the value of network generation and algorithms has been widely underscored for addressing the salient questions in cancer pathogenesis. Analysis of cancer trancriptome often results in complicated networks where identification of network modularity remains critical, for example, in delineating the "druggable" molecular targets. Network clustering is useful, but depends on the network topology in and of itself. Notably, the performance of different network-generating tools for network cluster (NC) identification has been little investigated to date. Hence, using gastric cancer (GC) transcriptomic datasets, we compared two algorithms for generating pathway versus gene regulatory network-based NCs, showing that the pathway-based approach better agrees with a reference set of cancer-functional contexts. Finally, by applying pathway-based NC identification to GC transcriptome datasets, we describe cancer NCs that associate with candidate therapeutic targets and biomarkers in GC. These observations collectively inform future research on cancer transcriptomics, drug discovery, and rational development of new analysis tools for optimal harnessing of omics data.

  17. Cluster forest based fuzzy logic for massive data clustering

    Science.gov (United States)

    Lahmar, Ines; Ben Ayed, Abdelkarim; Ben Halima, Mohamed; Alimi, Adel M.

    2017-03-01

    This article is focused in developing an improved cluster ensemble method based cluster forests. Cluster forests (CF) is considered as a version of clustering inspired from Random Forests (RF) in the context of clustering for massive data. It aggregates intermediate Fuzzy C-Means (FCM) clustering results via spectral clustering since pseudo-clustering results are presented in the spectral space in order to classify these data sets in the multidimensional data space. One of the main advantages is the use of FCM, which allows building fuzzy membership to all partitions of the datasets due to the fuzzy logic whereas the classical algorithms as K-means permitted to build just hard partitions. In the first place, we ameliorate the CF clustering algorithm with the integration of fuzzy FCM and we compare it with other existing clustering methods. In the second place, we compare K-means and FCM clustering methods with the agglomerative hierarchical clustering (HAC) and other theory presented methods using data benchmarks from UCI repository.

  18. Improving cluster-based methods for investigating potential for insect pest species establishment: region-specific risk factors

    Directory of Open Access Journals (Sweden)

    Michael J. Watts

    2011-09-01

    Full Text Available Existing cluster-based methods for investigating insect species assemblages or profiles of a region to indicate the risk of new insect pest invasion have a major limitation in that they assign the same species risk factors to each region in a cluster. Clearly regions assigned to the same cluster have different degrees of similarity with respect to their species profile or assemblage. This study addresses this concern by applying weighting factors to the cluster elements used to calculate regional risk factors, thereby producing region-specific risk factors. Using a database of the global distribution of crop insect pest species, we found that we were able to produce highly differentiated region-specific risk factors for insect pests. We did this by weighting cluster elements by their Euclidean distance from the target region. Using this approach meant that risk weightings were derived that were more realistic, as they were specific to the pest profile or species assemblage of each region. This weighting method provides an improved tool for estimating the potential invasion risk posed by exotic species given that they have an opportunity to establish in a target region.

  19. Research on the Purification Effect of Aquatic Plants Based on Grey Clustering Method

    Science.gov (United States)

    Gu, Sudan; Du, Fuhui

    2018-01-01

    This paper uses the grey clustering method to evaluate the water quality level of the MingGuan constructed wetland at the import and export of artificial wetlands. Constructed wetland of Ming Guanis established on the basis of the Fuyang River’s water quality improvement, to choose suitable aquatic plants, in order to achieve the Fuyang River water purification effect. Namely TP, TN, NH3-N, DO, COD and COMMn and permanganate index are selected as clustering indicators. Water quality is divided into five grades according to the Surface Water Environmental Quality Standard (GB3838-2002) as the evaluation standard. In order to select the suitable wetland plants, the purification effect of 6 kinds of higher aquatic plants on the sewage of fuyang river was tested. one kind of plants was selected: Typha. The results show that the water quality of the section is gradually changed from V water quality to III water quality. After tartificial wetland of cycle for a long time, Typha has good purification effect. In November, water quality categories are basically concentrated in the VI, V class, may be caused by chemical decomposition of aquatic plants, should strengthen the academic research.

  20. Analyses of crime patterns in NIBRS data based on a novel graph theory clustering method: Virginia as a case study.

    Science.gov (United States)

    Zhao, Peixin; Darrah, Marjorie; Nolan, Jim; Zhang, Cun-Quan

    2014-01-01

    This paper suggests a novel clustering method for analyzing the National Incident-Based Reporting System (NIBRS) data, which include the determination of correlation of different crime types, the development of a likelihood index for crimes to occur in a jurisdiction, and the clustering of jurisdictions based on crime type. The method was tested by using the 2005 assault data from 121 jurisdictions in Virginia as a test case. The analyses of these data show that some different crime types are correlated and some different crime parameters are correlated with different crime types. The analyses also show that certain jurisdictions within Virginia share certain crime patterns. This information assists with constructing a pattern for a specific crime type and can be used to determine whether a jurisdiction may be more likely to see this type of crime occur in their area.

  1. Analyses of Crime Patterns in NIBRS Data Based on a Novel Graph Theory Clustering Method: Virginia as a Case Study

    Directory of Open Access Journals (Sweden)

    Peixin Zhao

    2014-01-01

    Full Text Available This paper suggests a novel clustering method for analyzing the National Incident-Based Reporting System (NIBRS data, which include the determination of correlation of different crime types, the development of a likelihood index for crimes to occur in a jurisdiction, and the clustering of jurisdictions based on crime type. The method was tested by using the 2005 assault data from 121 jurisdictions in Virginia as a test case. The analyses of these data show that some different crime types are correlated and some different crime parameters are correlated with different crime types. The analyses also show that certain jurisdictions within Virginia share certain crime patterns. This information assists with constructing a pattern for a specific crime type and can be used to determine whether a jurisdiction may be more likely to see this type of crime occur in their area.

  2. A segmentation method for lung nodule image sequences based on superpixels and density-based spatial clustering of applications with noise.

    Science.gov (United States)

    Zhang, Wei; Zhang, Xiaolong; Zhao, Juanjuan; Qiang, Yan; Tian, Qi; Tang, Xiaoxian

    2017-01-01

    The fast and accurate segmentation of lung nodule image sequences is the basis of subsequent processing and diagnostic analyses. However, previous research investigating nodule segmentation algorithms cannot entirely segment cavitary nodules, and the segmentation of juxta-vascular nodules is inaccurate and inefficient. To solve these problems, we propose a new method for the segmentation of lung nodule image sequences based on superpixels and density-based spatial clustering of applications with noise (DBSCAN). First, our method uses three-dimensional computed tomography image features of the average intensity projection combined with multi-scale dot enhancement for preprocessing. Hexagonal clustering and morphological optimized sequential linear iterative clustering (HMSLIC) for sequence image oversegmentation is then proposed to obtain superpixel blocks. The adaptive weight coefficient is then constructed to calculate the distance required between superpixels to achieve precise lung nodules positioning and to obtain the subsequent clustering starting block. Moreover, by fitting the distance and detecting the change in slope, an accurate clustering threshold is obtained. Thereafter, a fast DBSCAN superpixel sequence clustering algorithm, which is optimized by the strategy of only clustering the lung nodules and adaptive threshold, is then used to obtain lung nodule mask sequences. Finally, the lung nodule image sequences are obtained. The experimental results show that our method rapidly, completely and accurately segments various types of lung nodule image sequences.

  3. De novo clustering methods outperform reference-based methods for assigning 16S rRNA gene sequences to operational taxonomic units

    Directory of Open Access Journals (Sweden)

    Sarah L. Westcott

    2015-12-01

    Full Text Available Background. 16S rRNA gene sequences are routinely assigned to operational taxonomic units (OTUs that are then used to analyze complex microbial communities. A number of methods have been employed to carry out the assignment of 16S rRNA gene sequences to OTUs leading to confusion over which method is optimal. A recent study suggested that a clustering method should be selected based on its ability to generate stable OTU assignments that do not change as additional sequences are added to the dataset. In contrast, we contend that the quality of the OTU assignments, the ability of the method to properly represent the distances between the sequences, is more important.Methods. Our analysis implemented six de novo clustering algorithms including the single linkage, complete linkage, average linkage, abundance-based greedy clustering, distance-based greedy clustering, and Swarm and the open and closed-reference methods. Using two previously published datasets we used the Matthew’s Correlation Coefficient (MCC to assess the stability and quality of OTU assignments.Results. The stability of OTU assignments did not reflect the quality of the assignments. Depending on the dataset being analyzed, the average linkage and the distance and abundance-based greedy clustering methods generated OTUs that were more likely to represent the actual distances between sequences than the open and closed-reference methods. We also demonstrated that for the greedy algorithms VSEARCH produced assignments that were comparable to those produced by USEARCH making VSEARCH a viable free and open source alternative to USEARCH. Further interrogation of the reference-based methods indicated that when USEARCH or VSEARCH were used to identify the closest reference, the OTU assignments were sensitive to the order of the reference sequences because the reference sequences can be identical over the region being considered. More troubling was the observation that while both USEARCH and

  4. Supplier Risk Assessment Based on Best-Worst Method and K-Means Clustering: A Case Study

    Directory of Open Access Journals (Sweden)

    Merve Er Kara

    2018-04-01

    Full Text Available Supplier evaluation and selection is one of the most critical strategic decisions for developing a competitive and sustainable organization. Companies have to consider supplier related risks and threats in their purchasing decisions. In today’s competitive and risky business environment, it is very important to work with reliable suppliers. This study proposes a clustering based approach to group suppliers based on their risk profile. Suppliers of a company in the heavy-machinery sector are assessed based on 17 qualitative and quantitative risk types. The weights of the criteria are determined by using the Best-Worst method. Four factors are extracted by applying Factor Analysis to the supplier risk data. Then k-means clustering algorithm is applied to group core suppliers of the company based on the four risk factors. Three clusters are created with different risk exposure levels. The interpretation of the results provides insights for risk management actions and supplier development programs to mitigate supplier risk.

  5. K2: A NEW METHOD FOR THE DETECTION OF GALAXY CLUSTERS BASED ON CANADA-FRANCE-HAWAII TELESCOPE LEGACY SURVEY MULTICOLOR IMAGES

    International Nuclear Information System (INIS)

    Thanjavur, Karun; Willis, Jon; Crampton, David

    2009-01-01

    We have developed a new method, K2, optimized for the detection of galaxy clusters in multicolor images. Based on the Red Sequence approach, K2 detects clusters using simultaneous enhancements in both colors and position. The detection significance is robustly determined through extensive Monte Carlo simulations and through comparison with available cluster catalogs based on two different optical methods, and also on X-ray data. K2 also provides quantitative estimates of the candidate clusters' richness and photometric redshifts. Initially, K2 was applied to the two color (gri) 161 deg 2 images of the Canada-France-Hawaii Telescope Legacy Survey Wide (CFHTLS-W) data. Our simulations show that the false detection rate for these data, at our selected threshold, is only ∼1%, and that the cluster catalogs are ∼80% complete up to a redshift of z = 0.6 for Fornax-like and richer clusters and to z ∼ 0.3 for poorer clusters. Based on the g-, r-, and i-band photometric catalogs of the Terapix T05 release, 35 clusters/deg 2 are detected, with 1-2 Fornax-like or richer clusters every 2 deg 2 . Catalogs containing data for 6144 galaxy clusters have been prepared, of which 239 are rich clusters. These clusters, especially the latter, are being searched for gravitational lenses-one of our chief motivations for cluster detection in CFHTLS. The K2 method can be easily extended to use additional color information and thus improve overall cluster detection to higher redshifts. The complete set of K2 cluster catalogs, along with the supplementary catalogs for the member galaxies, are available on request from the authors.

  6. A Novel Method of Statistical Line Loss Estimation for Distribution Feeders Based on Feeder Cluster and Modified XGBoost

    Directory of Open Access Journals (Sweden)

    Shouxiang Wang

    2017-12-01

    Full Text Available The estimation of losses of distribution feeders plays a crucial guiding role for the planning, design, and operation of a distribution system. This paper proposes a novel estimation method of statistical line loss of distribution feeders using the feeder cluster technique and modified eXtreme Gradient Boosting (XGBoost algorithm that is based on the characteristic data of feeders that are collected in the smart power distribution and utilization system. In order to enhance the applicability and accuracy of the estimation model, k-medoids algorithm with weighting distance for clustering distribution feeders is proposed. Meanwhile, a variable selection method for clustering distribution feeders is discussed, considering the correlation and validity of variables. This paper next modifies the XGBoost algorithm by adding a penalty function in consideration of the effect of the theoretical value to the loss function for the estimation of statistical line loss of distribution feeders. The validity of the proposed methodology is verified by 762 distribution feeders in the Shanghai distribution system. The results show that the XGBoost method has higher accuracy than decision tree, neural network, and random forests by comparison of Root Mean Square Error (RMSE, Mean Absolute Percentage Error (MAPE, and Absolute Percentage Error (APE indexes. In particular, the theoretical value can significantly improve the reasonability of estimated results.

  7. A coherent graph-based semantic clustering and summarization approach for biomedical literature and a new summarization evaluation method.

    Science.gov (United States)

    Yoo, Illhoi; Hu, Xiaohua; Song, Il-Yeol

    2007-11-27

    A huge amount of biomedical textual information has been produced and collected in MEDLINE for decades. In order to easily utilize biomedical information in the free text, document clustering and text summarization together are used as a solution for text information overload problem. In this paper, we introduce a coherent graph-based semantic clustering and summarization approach for biomedical literature. Our extensive experimental results show the approach shows 45% cluster quality improvement and 72% clustering reliability improvement, in terms of misclassification index, over Bisecting K-means as a leading document clustering approach. In addition, our approach provides concise but rich text summary in key concepts and sentences. Our coherent biomedical literature clustering and summarization approach that takes advantage of ontology-enriched graphical representations significantly improves the quality of document clusters and understandability of documents through summaries.

  8. Scalable Density-Based Subspace Clustering

    DEFF Research Database (Denmark)

    Müller, Emmanuel; Assent, Ira; Günnemann, Stephan

    2011-01-01

    method that steers mining to few selected subspace clusters. Our novel steering technique reduces subspace processing by identifying and clustering promising subspaces and their combinations directly. Thereby, it narrows down the search space while maintaining accuracy. Thorough experiments on real...... and synthetic databases show that steering is efficient and scalable, with high quality results. For future work, our steering paradigm for density-based subspace clustering opens research potential for speeding up other subspace clustering approaches as well....

  9. Consumers' Kansei Needs Clustering Method for Product Emotional Design Based on Numerical Design Structure Matrix and Genetic Algorithms.

    Science.gov (United States)

    Yang, Yan-Pu; Chen, Deng-Kai; Gu, Rong; Gu, Yu-Feng; Yu, Sui-Huai

    2016-01-01

    Consumers' Kansei needs reflect their perception about a product and always consist of a large number of adjectives. Reducing the dimension complexity of these needs to extract primary words not only enables the target product to be explicitly positioned, but also provides a convenient design basis for designers engaging in design work. Accordingly, this study employs a numerical design structure matrix (NDSM) by parameterizing a conventional DSM and integrating genetic algorithms to find optimum Kansei clusters. A four-point scale method is applied to assign link weights of every two Kansei adjectives as values of cells when constructing an NDSM. Genetic algorithms are used to cluster the Kansei NDSM and find optimum clusters. Furthermore, the process of the proposed method is presented. The details of the proposed approach are illustrated using an example of electronic scooter for Kansei needs clustering. The case study reveals that the proposed method is promising for clustering Kansei needs adjectives in product emotional design.

  10. An Improved Clustering Method for Detection System of Public Security Events Based on Genetic Algorithm and Semisupervised Learning

    Directory of Open Access Journals (Sweden)

    Heng Wang

    2017-01-01

    Full Text Available The occurrence of series of events is always associated with the news report, social network, and Internet media. In this paper, a detecting system for public security events is designed, which carries out clustering operation to cluster relevant text data, in order to benefit relevant departments by evaluation and handling. Firstly, texts are mapped into three-dimensional space using the vector space model. Then, to overcome the shortcoming of the traditional clustering algorithm, an improved fuzzy c-means (FCM algorithm based on adaptive genetic algorithm and semisupervised learning is proposed. In the proposed algorithm, adaptive genetic algorithm is employed to select optimal initial clustering centers. Meanwhile, motivated by semisupervised learning, guiding effect of prior knowledge is used to accelerate iterative process. Finally, simulation experiments are conducted from two aspects of qualitative analysis and quantitative analysis, which demonstrate that the proposed algorithm performs excellently in improving clustering centers, clustering results, and consuming time.

  11. AF-DHNN: Fuzzy Clustering and Inference-Based Node Fault Diagnosis Method for Fire Detection.

    Science.gov (United States)

    Jin, Shan; Cui, Wen; Jin, Zhigang; Wang, Ying

    2015-07-17

    Wireless Sensor Networks (WSNs) have been utilized for node fault diagnosis in the fire detection field since the 1990s. However, the traditional methods have some problems, including complicated system structures, intensive computation needs, unsteady data detection and local minimum values. In this paper, a new diagnosis mechanism for WSN nodes is proposed, which is based on fuzzy theory and an Adaptive Fuzzy Discrete Hopfield Neural Network (AF-DHNN). First, the original status of each sensor over time is obtained with two features. One is the root mean square of the filtered signal (FRMS), the other is the normalized summation of the positive amplitudes of the difference spectrum between the measured signal and the healthy one (NSDS). Secondly, distributed fuzzy inference is introduced. The evident abnormal nodes' status is pre-alarmed to save time. Thirdly, according to the dimensions of the diagnostic data, an adaptive diagnostic status system is established with a Fuzzy C-Means Algorithm (FCMA) and Sorting and Classification Algorithm to reducing the complexity of the fault determination. Fourthly, a Discrete Hopfield Neural Network (DHNN) with iterations is improved with the optimization of the sensors' detected status information and standard diagnostic levels, with which the associative memory is achieved, and the search efficiency is improved. The experimental results show that the AF-DHNN method can diagnose abnormal WSN node faults promptly and effectively, which improves the WSN reliability.

  12. Projection-based curve clustering

    International Nuclear Information System (INIS)

    Auder, Benjamin; Fischer, Aurelie

    2012-01-01

    This paper focuses on unsupervised curve classification in the context of nuclear industry. At the Commissariat a l'Energie Atomique (CEA), Cadarache (France), the thermal-hydraulic computer code CATHARE is used to study the reliability of reactor vessels. The code inputs are physical parameters and the outputs are time evolution curves of a few other physical quantities. As the CATHARE code is quite complex and CPU time-consuming, it has to be approximated by a regression model. This regression process involves a clustering step. In the present paper, the CATHARE output curves are clustered using a k-means scheme, with a projection onto a lower dimensional space. We study the properties of the empirically optimal cluster centres found by the clustering method based on projections, compared with the 'true' ones. The choice of the projection basis is discussed, and an algorithm is implemented to select the best projection basis among a library of orthonormal bases. The approach is illustrated on a simulated example and then applied to the industrial problem. (authors)

  13. A Novel Wireless Power Transfer-Based Weighed Clustering Cooperative Spectrum Sensing Method for Cognitive Sensor Networks.

    Science.gov (United States)

    Liu, Xin

    2015-10-30

    In a cognitive sensor network (CSN), the wastage of sensing time and energy is a challenge to cooperative spectrum sensing, when the number of cooperative cognitive nodes (CNs) becomes very large. In this paper, a novel wireless power transfer (WPT)-based weighed clustering cooperative spectrum sensing model is proposed, which divides all the CNs into several clusters, and then selects the most favorable CNs as the cluster heads and allows the common CNs to transfer the received radio frequency (RF) energy of the primary node (PN) to the cluster heads, in order to supply the electrical energy needed for sensing and cooperation. A joint resource optimization is formulated to maximize the spectrum access probability of the CSN, through jointly allocating sensing time and clustering number. According to the resource optimization results, a clustering algorithm is proposed. The simulation results have shown that compared to the traditional model, the cluster heads of the proposed model can achieve more transmission power and there exists optimal sensing time and clustering number to maximize the spectrum access probability.

  14. Cluster identification based on correlations.

    Science.gov (United States)

    Schulman, L S

    2012-04-01

    The problem addressed is the identification of cooperating agents based on correlations created as a result of the joint action of these and other agents. A systematic method for using correlations beyond second moments is developed. The technique is applied to a didactic example, the identification of alphabet letters based on correlations among the pixels used in an image of the letter. As in this example, agents can belong to more than one cluster. Moreover, the identification scheme does not require that the patterns be known ahead of time.

  15. Cluster identification based on correlations

    Science.gov (United States)

    Schulman, L. S.

    2012-04-01

    The problem addressed is the identification of cooperating agents based on correlations created as a result of the joint action of these and other agents. A systematic method for using correlations beyond second moments is developed. The technique is applied to a didactic example, the identification of alphabet letters based on correlations among the pixels used in an image of the letter. As in this example, agents can belong to more than one cluster. Moreover, the identification scheme does not require that the patterns be known ahead of time.

  16. Scalable implementations of accurate excited-state coupled cluster theories: application of high-level methods to porphyrin based systems

    Energy Technology Data Exchange (ETDEWEB)

    Kowalski, Karol; Krishnamoorthy, Sriram; Olson, Ryan M.; Tipparaju, Vinod; Apra, Edoardo

    2011-11-30

    The development of reliable tools for excited-state simulations is emerging as an extremely powerful computational chemistry tool for understanding complex processes in the broad class of light harvesting systems and optoelectronic devices. Over the last years we have been developing equation of motion coupled cluster (EOMCC) methods capable of tackling these problems. In this paper we discuss the parallel performance of EOMCC codes which provide accurate description of the excited-state correlation effects. Two aspects are discuss in details: (1) a new algorithm for the iterative EOMCC methods based on the novel task scheduling algorithms, and (2) parallel algorithms for the non-iterative methods describing the effect of triply excited configurations. We demonstrate that the most computationally intensive non-iterative part can take advantage of 210,000 cores of the Cray XT5 system at OLCF. In particular, we demonstrate the importance of non-iterative many-body methods for achieving experimental level of accuracy for several porphyrin-based system.

  17. Cluster Based Text Classification Model

    DEFF Research Database (Denmark)

    Nizamani, Sarwat; Memon, Nasrullah; Wiil, Uffe Kock

    2011-01-01

    We propose a cluster based classification model for suspicious email detection and other text classification tasks. The text classification tasks comprise many training examples that require a complex classification model. Using clusters for classification makes the model simpler and increases......, the classifier is trained on each cluster having reduced dimensionality and less number of examples. The experimental results show that the proposed model outperforms the existing classification models for the task of suspicious email detection and topic categorization on the Reuters-21578 and 20 Newsgroups...... datasets. Our model also outperforms A Decision Cluster Classification (ADCC) and the Decision Cluster Forest Classification (DCFC) models on the Reuters-21578 dataset....

  18. A line-based spectral clustering method for efficient planar structure extraction from LiDAR data

    Directory of Open Access Journals (Sweden)

    Y. He

    2013-10-01

    Full Text Available Planar structures are essential components of the urban landscape and automated extraction planar structure from LiDAR data is a fundamental step in solving complex mapping tasks such as building recognition and urban modelling. This paper presents a new and effective method for planar structure extraction from airborne LiDAR data based on spectral clustering of straight line segments. The straight line segments are derived from LiDAR scan lines using an Iterative-End-Point-Fit simplification algorithm. Adjacency matrix is then formed based on pair-wise similarity of the extracted line segments, and a symmetric affine matrix is derived which is then decomposed into eigenspace. The planar structures are then detected by mean-shift clustering algorithm in eigenspace. The use of straight line segments facilitates the processing and significantly reduces the computational load. Spectral analysis of straight line segments in eigenspace makes the planar structures more prominent, resulting in a robust extraction of planar surfaces. Experiments are performed on the ISPRS benchmark LiDAR data over three test sites containing a variety of buildings with complex roof structures and varying sizes. The experimental results, which are quantitatively evaluated independently by the ISPRS benchmark test group, are presented. The results show that the proposed method achieves on average 80% of completeness with over 98% of correctness. Better performance is observed over larger size of buildings (>10m2 with over 92% of completeness and nearly 100% of correctness in all test areas, indicating the robustness and high reliability of the proposed algorithm.

  19. Distribution-Based Cluster Structure Selection.

    Science.gov (United States)

    Yu, Zhiwen; Zhu, Xianjun; Wong, Hau-San; You, Jane; Zhang, Jun; Han, Guoqiang

    2017-11-01

    The objective of cluster structure ensemble is to find a unified cluster structure from multiple cluster structures obtained from different datasets. Unfortunately, not all the cluster structures contribute to the unified cluster structure. This paper investigates the problem of how to select the suitable cluster structures in the ensemble which will be summarized to a more representative cluster structure. Specifically, the cluster structure is first represented by a mixture of Gaussian distributions, the parameters of which are estimated using the expectation-maximization algorithm. Then, several distribution-based distance functions are designed to evaluate the similarity between two cluster structures. Based on the similarity comparison results, we propose a new approach, which is referred to as the distribution-based cluster structure ensemble (DCSE) framework, to find the most representative unified cluster structure. We then design a new technique, the distribution-based cluster structure selection strategy (DCSSS), to select a subset of cluster structures. Finally, we propose using a distribution-based normalized hypergraph cut algorithm to generate the final result. In our experiments, a nonparametric test is adopted to evaluate the difference between DCSE and its competitors. We adopt 20 real-world datasets obtained from the University of California, Irvine and knowledge extraction based on evolutionary learning repositories, and a number of cancer gene expression profiles to evaluate the performance of the proposed methods. The experimental results show that: 1) DCSE works well on the real-world datasets and 2) DCSE based on DCSSS can further improve the performance of the algorithm.

  20. A Local Pair Natural Orbital-Based Multireference Mukherjee’s Coupled Cluster Method

    Czech Academy of Sciences Publication Activity Database

    Demel, Ondřej; Pittner, Jiří

    2015-01-01

    Roč. 11, č. 7 (2015), s. 3104-3114 ISSN 1549-9618 R&D Projects: GA ČR GAP208/11/2222; GA ČR(CZ) GJ15-00058Y Institutional support: RVO:61388955 Keywords : ELECTRON CORRELATION METHODS * BRILLOUIN-WIGNER * CONFIGURATION-INTERACTION Subject RIV: CF - Physical ; Theoretical Chemistry Impact factor: 5.301, year: 2015

  1. The SMART CLUSTER METHOD - adaptive earthquake cluster analysis and declustering

    Science.gov (United States)

    Schaefer, Andreas; Daniell, James; Wenzel, Friedemann

    2016-04-01

    Earthquake declustering is an essential part of almost any statistical analysis of spatial and temporal properties of seismic activity with usual applications comprising of probabilistic seismic hazard assessments (PSHAs) and earthquake prediction methods. The nature of earthquake clusters and subsequent declustering of earthquake catalogues plays a crucial role in determining the magnitude-dependent earthquake return period and its respective spatial variation. Various methods have been developed to address this issue from other researchers. These have differing ranges of complexity ranging from rather simple statistical window methods to complex epidemic models. This study introduces the smart cluster method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal identification. Hereby, an adaptive search algorithm for data point clusters is adopted. It uses the earthquake density in the spatio-temporal neighbourhood of each event to adjust the search properties. The identified clusters are subsequently analysed to determine directional anisotropy, focussing on a strong correlation along the rupture plane and adjusts its search space with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010/2011 Darfield-Christchurch events, an adaptive classification procedure is applied to disassemble subsequent ruptures which may have been grouped into an individual cluster using near-field searches, support vector machines and temporal splitting. The steering parameters of the search behaviour are linked to local earthquake properties like magnitude of completeness, earthquake density and Gutenberg-Richter parameters. The method is capable of identifying and classifying earthquake clusters in space and time. It is tested and validated using earthquake data from California and New Zealand. As a result of the cluster identification process, each event in

  2. DBH: A de Bruijn graph-based heuristic method for clustering large-scale 16S rRNA sequences into OTUs.

    Science.gov (United States)

    Wei, Ze-Gang; Zhang, Shao-Wu

    2017-07-21

    Recent sequencing revolution driven by high-throughput technologies has led to rapid accumulation of 16S rRNA sequences for microbial communities. Clustering short sequences into operational taxonomic units (OTUs) is an initial crucial process in analyzing metagenomic data. Although many heuristic methods have been proposed for OTU inferences with low computational complexity, they just select one sequence as the seed for each cluster and the results are sensitive to the selected sequences that represent the clusters. To address this issue, we present a de Bruijn graph-based heuristic clustering method (DBH) for clustering massive 16S rRNA sequences into OTUs by introducing a novel seed selection strategy and greedy clustering approach. Compared with existing widely used methods on several simulated and real-life metagenomic datasets, the results show that DBH has higher clustering performance and low memory usage, facilitating the overestimation of OTUs number. DBH is more effective to handle large-scale metagenomic datasets. The DBH software can be freely downloaded from https://github.com/nwpu134/DBH.git for academic users. Copyright © 2017 Elsevier Ltd. All rights reserved.

  3. A scheduling algorithm based on Clara clustering

    Science.gov (United States)

    Kuang, Ling; Zhang, Lichen

    2017-08-01

    Task scheduling is a key issue in cloud computing. A new algorithm for queuing task scheduling based on Clara clustering and SJF cloud computing is proposed to introduce the Clara clustering for the shortcomings of SJF algorithm load imbalance. The Clara clustering method prepares the task clustering based on the task execution time and the waiting time of the task, and then divides the task into three groups according to the reference point obtained by the clustering. Based on the number of tasks per group in the proportion of the total number of tasks assigned to the implementation of the quota. Each queue team will perform task scheduling based on these quotas and SJF. The simulation results show that the algorithm has good load balancing and system performance.

  4. [Research on K-means clustering segmentation method for MRI brain image based on selecting multi-peaks in gray histogram].

    Science.gov (United States)

    Chen, Zhaoxue; Yu, Haizhong; Chen, Hao

    2013-12-01

    To solve the problem of traditional K-means clustering in which initial clustering centers are selected randomly, we proposed a new K-means segmentation algorithm based on robustly selecting 'peaks' standing for White Matter, Gray Matter and Cerebrospinal Fluid in multi-peaks gray histogram of MRI brain image. The new algorithm takes gray value of selected histogram 'peaks' as the initial K-means clustering center and can segment the MRI brain image into three parts of tissue more effectively, accurately, steadily and successfully. Massive experiments have proved that the proposed algorithm can overcome many shortcomings caused by traditional K-means clustering method such as low efficiency, veracity, robustness and time consuming. The histogram 'peak' selecting idea of the proposed segmentootion method is of more universal availability.

  5. Clustering based gene expression feature selection method: A computational approach to enrich the classifier efficiency of differentially expressed genes

    KAUST Repository

    Abusamra, Heba

    2016-07-20

    The native nature of high dimension low sample size of gene expression data make the classification task more challenging. Therefore, feature (gene) selection become an apparent need. Selecting a meaningful and relevant genes for classifier not only decrease the computational time and cost, but also improve the classification performance. Among different approaches of feature selection methods, however most of them suffer from several problems such as lack of robustness, validation issues etc. Here, we present a new feature selection technique that takes advantage of clustering both samples and genes. Materials and methods We used leukemia gene expression dataset [1]. The effectiveness of the selected features were evaluated by four different classification methods; support vector machines, k-nearest neighbor, random forest, and linear discriminate analysis. The method evaluate the importance and relevance of each gene cluster by summing the expression level for each gene belongs to this cluster. The gene cluster consider important, if it satisfies conditions depend on thresholds and percentage otherwise eliminated. Results Initial analysis identified 7120 differentially expressed genes of leukemia (Fig. 15a), after applying our feature selection methodology we end up with specific 1117 genes discriminating two classes of leukemia (Fig. 15b). Further applying the same method with more stringent higher positive and lower negative threshold condition, number reduced to 58 genes have be tested to evaluate the effectiveness of the method (Fig. 15c). The results of the four classification methods are summarized in Table 11. Conclusions The feature selection method gave good results with minimum classification error. Our heat-map result shows distinct pattern of refines genes discriminating between two classes of leukemia.

  6. Integration K-Means Clustering Method and Elbow Method For Identification of The Best Customer Profile Cluster

    Science.gov (United States)

    Syakur, M. A.; Khotimah, B. K.; Rochman, E. M. S.; Satoto, B. D.

    2018-04-01

    Clustering is a data mining technique used to analyse data that has variations and the number of lots. Clustering was process of grouping data into a cluster, so they contained data that is as similar as possible and different from other cluster objects. SMEs Indonesia has a variety of customers, but SMEs do not have the mapping of these customers so they did not know which customers are loyal or otherwise. Customer mapping is a grouping of customer profiling to facilitate analysis and policy of SMEs in the production of goods, especially batik sales. Researchers will use a combination of K-Means method with elbow to improve efficient and effective k-means performance in processing large amounts of data. K-Means Clustering is a localized optimization method that is sensitive to the selection of the starting position from the midpoint of the cluster. So choosing the starting position from the midpoint of a bad cluster will result in K-Means Clustering algorithm resulting in high errors and poor cluster results. The K-means algorithm has problems in determining the best number of clusters. So Elbow looks for the best number of clusters on the K-means method. Based on the results obtained from the process in determining the best number of clusters with elbow method can produce the same number of clusters K on the amount of different data. The result of determining the best number of clusters with elbow method will be the default for characteristic process based on case study. Measurement of k-means value of k-means has resulted in the best clusters based on SSE values on 500 clusters of batik visitors. The result shows the cluster has a sharp decrease is at K = 3, so K as the cut-off point as the best cluster.

  7. The polarizable embedding coupled cluster method

    DEFF Research Database (Denmark)

    Sneskov, Kristian; Schwabe, Tobias; Kongsted, Jacob

    2011-01-01

    We formulate a new combined quantum mechanics/molecular mechanics (QM/MM) method based on a self-consistent polarizable embedding (PE) scheme. For the description of the QM region, we apply the popular coupled cluster (CC) method detailing the inclusion of electrostatic and polarization effects......-called PE-CCSDR(3) model. Finally, we utilize the presented method in the description of a full protein by investigating the shift of the intense electronic excitation energy of the photoactive yellow protein due to the surrounding amino acids....

  8. Neural networks and Fuzzy clustering methods for assessing the efficacy of microarray based intrinsic gene signatures in breast cancer classification and the character and relations of identified subtypes.

    Science.gov (United States)

    Samarasinghe, Sandhya; Chaiboonchoe, Amphun

    2015-01-01

    In the classification of breast cancer subtypes using microarray data, hierarchical clustering is commonly used. Although this form of clustering shows basic cluster patterns, more needs to be done to investigate the accuracy of clusters as well as to extract meaningful cluster characteristics and their relations to increase our confidence in their use in a clinical setting. In this study, an in-depth investigation of the efficacy of three reported gene subsets in distinguishing breast cancer subtypes was performed using four advanced computational intelligence methods-Self-Organizing Maps (SOM), Emergent Self-Organizing Maps (ESOM), Fuzzy Clustering by Local Approximation of Memberships (FLAME), and Fuzzy C-means (FCM)-each differing in the way they view data in terms of distance measures and fuzzy or crisp clustering. The gene subsets consisted of 71, 93, and 71 genes reported in the literature from three comprehensive experimental studies for distinguishing Luminal (A and B), Basal, Normal breast-like, and HER2 subtypes. Given the costly procedures involved in clinical studies, the proposed 93-gene set can be used for preliminary classification of breast cancer. Then, as a decision aid, SOM can be used to map the gene signature of a new patient to locate them with respect to all subtypes to get a comprehensive view of the classification. These can be followed by a deeper investigation in the light of the observations made in this study regarding overlapping subtypes. Results from the study could be used as the base for further refining the gene signatures from later experiments and from new experiments designed to separate overlapping clusters as well as to maximally separate all clusters.

  9. Vinayaka : A Semi-Supervised Projected Clustering Method Using Differential Evolution

    OpenAIRE

    Satish Gajawada; Durga Toshniwal

    2012-01-01

    Differential Evolution (DE) is an algorithm for evolutionary optimization. Clustering problems have beensolved by using DE based clustering methods but these methods may fail to find clusters hidden insubspaces of high dimensional datasets. Subspace and projected clustering methods have been proposed inliterature to find subspace clusters that are present in subspaces of dataset. In this paper we proposeVINAYAKA, a semi-supervised projected clustering method based on DE. In this method DE opt...

  10. Evidence-based case selection: An innovative knowledge management method to cluster public technical and vocational education and training colleges in South Africa

    Directory of Open Access Journals (Sweden)

    Margaretha M. Visser

    2017-03-01

    Full Text Available Background: Case studies are core constructs used in information management research. A persistent challenge for business, information management and social science researchers is how to select a representative sample of cases among a population with diverse characteristics when convenient or purposive sampling is not considered rigorous enough. The context of the study is post-school education, and it involves an investigation of quantitative methods of clustering the population of public technical and vocational education and training (TVET colleges in South Africa into groups with a similar level of maturity in terms of their information systems. Objectives: The aim of the study was to propose an evidence-based quantitative method for the selection of cases for case study research and to demonstrate the use and usefulness thereof by clustering public TVET colleges. Method: The clustering method was based on the use of a representative characteristic of the context, as a proxy. In this context of management information systems (MISs, website maturity was used as a proxy and website maturity model theory was used in the development of an evaluation questionnaire. The questionnaire was used for capturing data on website characteristics, which was used to determine website maturity. The websites of the 50 public TVET colleges were evaluated by nine evaluators. Multiple statistical techniques were applied to establish inter-rater reliability and to produce clusters of colleges. Results: The analyses revealed three clusters of public TVET colleges based on their website maturity levels. The first cluster includes three colleges with no websites or websites at a low maturity level. The second cluster consists of 30 colleges with websites at an average maturity level. The third cluster contains 17 colleges with websites at a high maturity level. Conclusion: The main contribution to the knowledge domain is an innovative quantitative method employing a

  11. Visual cluster analysis and pattern recognition methods

    Science.gov (United States)

    Osbourn, Gordon Cecil; Martinez, Rubel Francisco

    2001-01-01

    A method of clustering using a novel template to define a region of influence. Using neighboring approximation methods, computation times can be significantly reduced. The template and method are applicable and improve pattern recognition techniques.

  12. Semi-supervised spectral algorithms for community detection in complex networks based on equivalence of clustering methods

    Science.gov (United States)

    Ma, Xiaoke; Wang, Bingbo; Yu, Liang

    2018-01-01

    Community detection is fundamental for revealing the structure-functionality relationship in complex networks, which involves two issues-the quantitative function for community as well as algorithms to discover communities. Despite significant research on either of them, few attempt has been made to establish the connection between the two issues. To attack this problem, a generalized quantification function is proposed for community in weighted networks, which provides a framework that unifies several well-known measures. Then, we prove that the trace optimization of the proposed measure is equivalent with the objective functions of algorithms such as nonnegative matrix factorization, kernel K-means as well as spectral clustering. It serves as the theoretical foundation for designing algorithms for community detection. On the second issue, a semi-supervised spectral clustering algorithm is developed by exploring the equivalence relation via combining the nonnegative matrix factorization and spectral clustering. Different from the traditional semi-supervised algorithms, the partial supervision is integrated into the objective of the spectral algorithm. Finally, through extensive experiments on both artificial and real world networks, we demonstrate that the proposed method improves the accuracy of the traditional spectral algorithms in community detection.

  13. Fuzzy Clustering Methods and their Application to Fuzzy Modeling

    DEFF Research Database (Denmark)

    Kroszynski, Uri; Zhou, Jianjun

    1999-01-01

    . A method to obtain an optimized number of clusters is outlined. Based upon the cluster's characteristics, a behavioural model is formulated in terms of a rule-base and an inference engine. The article reviews several variants for the model formulation. Some limitations of the methods are listed......Fuzzy modeling techniques based upon the analysis of measured input/output data sets result in a set of rules that allow to predict system outputs from given inputs. Fuzzy clustering methods for system modeling and identification result in relatively small rule-bases, allowing fast, yet accurate...

  14. Normalization based K means Clustering Algorithm

    OpenAIRE

    Virmani, Deepali; Taneja, Shweta; Malhotra, Geetika

    2015-01-01

    K-means is an effective clustering technique used to separate similar data into groups based on initial centroids of clusters. In this paper, Normalization based K-means clustering algorithm(N-K means) is proposed. Proposed N-K means clustering algorithm applies normalization prior to clustering on the available data as well as the proposed approach calculates initial centroids based on weights. Experimental results prove the betterment of proposed N-K means clustering algorithm over existing...

  15. Pragmatic Cluster Randomized Trials Using Covariate Constrained Randomization: A Method for Practice-based Research Networks (PBRNs).

    Science.gov (United States)

    Dickinson, L Miriam; Beaty, Brenda; Fox, Chet; Pace, Wilson; Dickinson, W Perry; Emsermann, Caroline; Kempe, Allison

    2015-01-01

    Cluster randomized trials (CRTs) are useful in practice-based research network translational research. However, simple or stratified randomization often yields study groups that differ on key baseline variables when the number of clusters is small. Unbalanced study arms constitute a potentially serious methodological problem for CRTs. Covariate constrained randomization with data on relevant variables before randomization was used to achieve balanced study arms in 2 pragmatic CRTs. In study 1, 16 counties in Colorado were randomized to practice-based or population-based reminder recall for vaccinating children ages 19 to 35 months. In study 2, 18 primary care practices were randomized to computer decision support plus practice facilitation versus computer decision support alone to improve care for patients with stage 3 and 4 chronic kidney disease. For each study, a set of optimal randomizations, which minimized differences of key variables between study arms, was identified from the set of all possible randomizations. Differences between study arms were smaller in the optimal versus remaining randomizations. Even for the randomization in the optimal set with the largest difference between groups, study arms did not differ significantly on any variable for either study (P > .05). Covariate constrained randomization, which restricts the full randomization set to a subset in which differences between study arms are minimized, is a useful tool for achieving balanced study arms in CRTs. Because of the increasing recognition of the risk of imbalance in CRTs and implications for interpreting study findings, procedures of this type should be considered in designing practice-based or community-based trials. © Copyright 2015 by the American Board of Family Medicine.

  16. Quantum Monte Carlo methods and lithium cluster properties. [Atomic clusters

    Energy Technology Data Exchange (ETDEWEB)

    Owen, R.K.

    1990-12-01

    Properties of small lithium clusters with sizes ranging from n = 1 to 5 atoms were investigated using quantum Monte Carlo (QMC) methods. Cluster geometries were found from complete active space self consistent field (CASSCF) calculations. A detailed development of the QMC method leading to the variational QMC (V-QMC) and diffusion QMC (D-QMC) methods is shown. The many-body aspect of electron correlation is introduced into the QMC importance sampling electron-electron correlation functions by using density dependent parameters, and are shown to increase the amount of correlation energy obtained in V-QMC calculations. A detailed analysis of D-QMC time-step bias is made and is found to be at least linear with respect to the time-step. The D-QMC calculations determined the lithium cluster ionization potentials to be 0.1982(14) (0.1981), 0.1895(9) (0.1874(4)), 0.1530(34) (0.1599(73)), 0.1664(37) (0.1724(110)), 0.1613(43) (0.1675(110)) Hartrees for lithium clusters n = 1 through 5, respectively; in good agreement with experimental results shown in the brackets. Also, the binding energies per atom was computed to be 0.0177(8) (0.0203(12)), 0.0188(10) (0.0220(21)), 0.0247(8) (0.0310(12)), 0.0253(8) (0.0351(8)) Hartrees for lithium clusters n = 2 through 5, respectively. The lithium cluster one-electron density is shown to have charge concentrations corresponding to nonnuclear attractors. The overall shape of the electronic charge density also bears a remarkable similarity with the anisotropic harmonic oscillator model shape for the given number of valence electrons.

  17. Development of innovative methods for risk assessment in high-rise construction based on clustering of risk factors

    Science.gov (United States)

    Okolelova, Ella; Shibaeva, Marina; Shalnev, Oleg

    2018-03-01

    The article analyses risks in high-rise construction in terms of investment value with account of the maximum probable loss in case of risk event. The authors scrutinized the risks of high-rise construction in regions with various geographic, climatic and socio-economic conditions that may influence the project environment. Risk classification is presented in general terms, that includes aggregated characteristics of risks being common for many regions. Cluster analysis tools, that allow considering generalized groups of risk depending on their qualitative and quantitative features, were used in order to model the influence of the risk factors on the implementation of investment project. For convenience of further calculations, each type of risk is assigned a separate code with the number of the cluster and the subtype of risk. This approach and the coding of risk factors makes it possible to build a risk matrix, which greatly facilitates the task of determining the degree of impact of risks. The authors clarified and expanded the concept of the price risk, which is defined as the expected value of the event, 105 which extends the capabilities of the model, allows estimating an interval of the probability of occurrence and also using other probabilistic methods of calculation.

  18. Tidal evolution of globular clusters. I - Method

    Science.gov (United States)

    Oh, K. S.; Lin, D. N. C.; Aarseth, S. J.

    1992-01-01

    Tidal evolution of globular clusters is regulated by both Galactic tidal effects and internal relaxation processes. In order to investigate the tidal evolution of globular clusters, a numerical scheme which utilizes a Fokker-Planck approach as well as direct numerical integration of the restricted three-body problem is developed. In the inner regions of the cluster, stellar orbits are mapped with the cluster's gravitational potential and orbit-averaged diffusion coefficients. In the outer regions, the Galactic tidal field is explicitly included in the direct orbital integration. This method is presented here with some tests on King-Michie models.

  19. A graph clustering method for community detection in complex networks

    Science.gov (United States)

    Zhou, HongFang; Li, Jin; Li, JunHuai; Zhang, FaCun; Cui, YingAn

    2017-03-01

    Information mining from complex networks by identifying communities is an important problem in a number of research fields, including the social sciences, biology, physics and medicine. First, two concepts are introduced, Attracting Degree and Recommending Degree. Second, a graph clustering method, referred to as AR-Cluster, is presented for detecting community structures in complex networks. Third, a novel collaborative similarity measure is adopted to calculate node similarities. In the AR-Cluster method, vertices are grouped together based on calculated similarity under a K-Medoids framework. Extensive experimental results on two real datasets show the effectiveness of AR-Cluster.

  20. A particle swarm optimized kernel-based clustering method for crop mapping from multi-temporal polarimetric L-band SAR observations

    Science.gov (United States)

    Tamiminia, Haifa; Homayouni, Saeid; McNairn, Heather; Safari, Abdoreza

    2017-06-01

    Polarimetric Synthetic Aperture Radar (PolSAR) data, thanks to their specific characteristics such as high resolution, weather and daylight independence, have become a valuable source of information for environment monitoring and management. The discrimination capability of observations acquired by these sensors can be used for land cover classification and mapping. The aim of this paper is to propose an optimized kernel-based C-means clustering algorithm for agriculture crop mapping from multi-temporal PolSAR data. Firstly, several polarimetric features are extracted from preprocessed data. These features are linear polarization intensities, and several statistical and physical based decompositions such as Cloude-Pottier, Freeman-Durden and Yamaguchi techniques. Then, the kernelized version of hard and fuzzy C-means clustering algorithms are applied to these polarimetric features in order to identify crop types. The kernel function, unlike the conventional partitioning clustering algorithms, simplifies the non-spherical and non-linearly patterns of data structure, to be clustered easily. In addition, in order to enhance the results, Particle Swarm Optimization (PSO) algorithm is used to tune the kernel parameters, cluster centers and to optimize features selection. The efficiency of this method was evaluated by using multi-temporal UAVSAR L-band images acquired over an agricultural area near Winnipeg, Manitoba, Canada, during June and July in 2012. The results demonstrate more accurate crop maps using the proposed method when compared to the classical approaches, (e.g. 12% improvement in general). In addition, when the optimization technique is used, greater improvement is observed in crop classification, e.g. 5% in overall. Furthermore, a strong relationship between Freeman-Durden volume scattering component, which is related to canopy structure, and phenological growth stages is observed.

  1. A Novel Cluster Head Selection Algorithm Based on Fuzzy Clustering and Particle Swarm Optimization.

    Science.gov (United States)

    Ni, Qingjian; Pan, Qianqian; Du, Huimin; Cao, Cen; Zhai, Yuqing

    2017-01-01

    An important objective of wireless sensor network is to prolong the network life cycle, and topology control is of great significance for extending the network life cycle. Based on previous work, for cluster head selection in hierarchical topology control, we propose a solution based on fuzzy clustering preprocessing and particle swarm optimization. More specifically, first, fuzzy clustering algorithm is used to initial clustering for sensor nodes according to geographical locations, where a sensor node belongs to a cluster with a determined probability, and the number of initial clusters is analyzed and discussed. Furthermore, the fitness function is designed considering both the energy consumption and distance factors of wireless sensor network. Finally, the cluster head nodes in hierarchical topology are determined based on the improved particle swarm optimization. Experimental results show that, compared with traditional methods, the proposed method achieved the purpose of reducing the mortality rate of nodes and extending the network life cycle.

  2. Comparing the performance of biomedical clustering methods

    DEFF Research Database (Denmark)

    Wiwie, Christian; Baumbach, Jan; Röttger, Richard

    2015-01-01

    expression to protein domains. Performance was judged on the basis of 13 common cluster validity indices. We developed a clustering analysis platform, ClustEval (http://clusteval.mpi-inf.mpg.de), to promote streamlined evaluation, comparison and reproducibility of clustering results in the future....... This allowed us to objectively evaluate the performance of all tools on all data sets with up to 1,000 different parameter sets each, resulting in a total of more than 4 million calculated cluster validity indices. We observed that there was no universal best performer, but on the basis of this wide......-ranging comparison we were able to develop a short guideline for biomedical clustering tasks. ClustEval allows biomedical researchers to pick the appropriate tool for their data type and allows method developers to compare their tool to the state of the art....

  3. ADVANCED CLUSTER BASED IMAGE SEGMENTATION

    Directory of Open Access Journals (Sweden)

    D. Kesavaraja

    2011-11-01

    Full Text Available This paper presents efficient and portable implementations of a useful image segmentation technique which makes use of the faster and a variant of the conventional connected components algorithm which we call parallel Components. In the Modern world majority of the doctors are need image segmentation as the service for various purposes and also they expect this system is run faster and secure. Usually Image segmentation Algorithms are not working faster. In spite of several ongoing researches in Conventional Segmentation and its Algorithms might not be able to run faster. So we propose a cluster computing environment for parallel image Segmentation to provide faster result. This paper is the real time implementation of Distributed Image Segmentation in Clustering of Nodes. We demonstrate the effectiveness and feasibility of our method on a set of Medical CT Scan Images. Our general framework is a single address space, distributed memory programming model. We use efficient techniques for distributing and coalescing data as well as efficient combinations of task and data parallelism. The image segmentation algorithm makes use of an efficient cluster process which uses a novel approach for parallel merging. Our experimental results are consistent with the theoretical analysis and practical results. It provides the faster execution time for segmentation, when compared with Conventional method. Our test data is different CT scan images from the Medical database. More efficient implementations of Image Segmentation will likely result in even faster execution times.

  4. Spanning Tree Based Attribute Clustering

    DEFF Research Database (Denmark)

    Zeng, Yifeng; Jorge, Cordero Hernandez

    2009-01-01

    inconsistent edges from a maximum spanning tree by starting appropriate initial modes, therefore generating stable clusters. It discovers sound clusters through simple graph operations and achieves significant computational savings. We compare the Star Discovery algorithm against earlier attribute clustering...

  5. A Latent Variable Clustering Method for Wireless Sensor Networks

    DEFF Research Database (Denmark)

    Vasilev, Vladislav; Iliev, Georgi; Poulkov, Vladimir

    2016-01-01

    In this paper we derive a clustering method based on the Hidden Conditional Random Field (HCRF) model in order to maximizes the performance of a wireless sensor. Our novel approach to clustering in this paper is in the application of an index invariant graph that we defined in a previous work...... and that precisely links a hyper-tree structure to the data set assumptions. We show that a set of conditional index invariant hyper graph forms a tree and then, we show that any tree factorization optimizes the conditional probability of an HCRF model. We evaluate our method based on a custom data set that we...... obtain by running simulations of a time dynamic sensor network. The performance of the proposed method outperforms the existing clustering methods, such as the Girvan-Newmans algorithm, the Kargers algorithm and the Spectral Clustering method, in terms of packet acceptance probability and delay....

  6. Communication: An improved linear scaling perturbative triples correction for the domain based local pair-natural orbital based singles and doubles coupled cluster method [DLPNO-CCSD(T)

    Science.gov (United States)

    Guo, Yang; Riplinger, Christoph; Becker, Ute; Liakos, Dimitrios G.; Minenkov, Yury; Cavallo, Luigi; Neese, Frank

    2018-01-01

    In this communication, an improved perturbative triples correction (T) algorithm for domain based local pair-natural orbital singles and doubles coupled cluster (DLPNO-CCSD) theory is reported. In our previous implementation, the semi-canonical approximation was used and linear scaling was achieved for both the DLPNO-CCSD and (T) parts of the calculation. In this work, we refer to this previous method as DLPNO-CCSD(T0) to emphasize the semi-canonical approximation. It is well-established that the DLPNO-CCSD method can predict very accurate absolute and relative energies with respect to the parent canonical CCSD method. However, the (T0) approximation may introduce significant errors in absolute energies as the triples correction grows up in magnitude. In the majority of cases, the relative energies from (T0) are as accurate as the canonical (T) results of themselves. Unfortunately, in rare cases and in particular for small gap systems, the (T0) approximation breaks down and relative energies show large deviations from the parent canonical CCSD(T) results. To address this problem, an iterative (T) algorithm based on the previous DLPNO-CCSD(T0) algorithm has been implemented [abbreviated here as DLPNO-CCSD(T)]. Using triples natural orbitals to represent the virtual spaces for triples amplitudes, storage bottlenecks are avoided. Various carefully designed approximations ease the computational burden such that overall, the increase in the DLPNO-(T) calculation time over DLPNO-(T0) only amounts to a factor of about two (depending on the basis set). Benchmark calculations for the GMTKN30 database show that compared to DLPNO-CCSD(T0), the errors in absolute energies are greatly reduced and relative energies are moderately improved. The particularly problematic case of cumulene chains of increasing lengths is also successfully addressed by DLPNO-CCSD(T).

  7. Communication: An improved linear scaling perturbative triples correction for the domain based local pair-natural orbital based singles and doubles coupled cluster method [DLPNO-CCSD(T)

    KAUST Repository

    Guo, Yang

    2018-01-04

    In this communication, an improved perturbative triples correction (T) algorithm for domain based local pair-natural orbital singles and doubles coupled cluster (DLPNO-CCSD) theory is reported. In our previous implementation, the semi-canonical approximation was used and linear scaling was achieved for both the DLPNO-CCSD and (T) parts of the calculation. In this work, we refer to this previous method as DLPNO-CCSD(T0) to emphasize the semi-canonical approximation. It is well-established that the DLPNO-CCSD method can predict very accurate absolute and relative energies with respect to the parent canonical CCSD method. However, the (T0) approximation may introduce significant errors in absolute energies as the triples correction grows up in magnitude. In the majority of cases, the relative energies from (T0) are as accurate as the canonical (T) results of themselves. Unfortunately, in rare cases and in particular for small gap systems, the (T0) approximation breaks down and relative energies show large deviations from the parent canonical CCSD(T) results. To address this problem, an iterative (T) algorithm based on the previous DLPNO-CCSD(T0) algorithm has been implemented [abbreviated here as DLPNO-CCSD(T)]. Using triples natural orbitals to represent the virtual spaces for triples amplitudes, storage bottlenecks are avoided. Various carefully designed approximations ease the computational burden such that overall, the increase in the DLPNO-(T) calculation time over DLPNO-(T0) only amounts to a factor of about two (depending on the basis set). Benchmark calculations for the GMTKN30 database show that compared to DLPNO-CCSD(T0), the errors in absolute energies are greatly reduced and relative energies are moderately improved. The particularly problematic case of cumulene chains of increasing lengths is also successfully addressed by DLPNO-CCSD(T).

  8. Progeny Clustering: A Method to Identify Biological Phenotypes

    Science.gov (United States)

    Hu, Chenyue W.; Kornblau, Steven M.; Slater, John H.; Qutub, Amina A.

    2015-01-01

    Estimating the optimal number of clusters is a major challenge in applying cluster analysis to any type of dataset, especially to biomedical datasets, which are high-dimensional and complex. Here, we introduce an improved method, Progeny Clustering, which is stability-based and exceptionally efficient in computing, to find the ideal number of clusters. The algorithm employs a novel Progeny Sampling method to reconstruct cluster identity, a co-occurrence probability matrix to assess the clustering stability, and a set of reference datasets to overcome inherent biases in the algorithm and data space. Our method was shown successful and robust when applied to two synthetic datasets (datasets of two-dimensions and ten-dimensions containing eight dimensions of pure noise), two standard biological datasets (the Iris dataset and Rat CNS dataset) and two biological datasets (a cell phenotype dataset and an acute myeloid leukemia (AML) reverse phase protein array (RPPA) dataset). Progeny Clustering outperformed some popular clustering evaluation methods in the ten-dimensional synthetic dataset as well as in the cell phenotype dataset, and it was the only method that successfully discovered clinically meaningful patient groupings in the AML RPPA dataset. PMID:26267476

  9. A two-stage cluster sampling method using gridded population data, a GIS, and Google EarthTM imagery in a population-based mortality survey in Iraq

    Directory of Open Access Journals (Sweden)

    Galway LP

    2012-04-01

    Full Text Available Abstract Background Mortality estimates can measure and monitor the impacts of conflict on a population, guide humanitarian efforts, and help to better understand the public health impacts of conflict. Vital statistics registration and surveillance systems are rarely functional in conflict settings, posing a challenge of estimating mortality using retrospective population-based surveys. Results We present a two-stage cluster sampling method for application in population-based mortality surveys. The sampling method utilizes gridded population data and a geographic information system (GIS to select clusters in the first sampling stage and Google Earth TM imagery and sampling grids to select households in the second sampling stage. The sampling method is implemented in a household mortality study in Iraq in 2011. Factors affecting feasibility and methodological quality are described. Conclusion Sampling is a challenge in retrospective population-based mortality studies and alternatives that improve on the conventional approaches are needed. The sampling strategy presented here was designed to generate a representative sample of the Iraqi population while reducing the potential for bias and considering the context specific challenges of the study setting. This sampling strategy, or variations on it, are adaptable and should be considered and tested in other conflict settings.

  10. A Hybrid Method for Image Segmentation Based on Artificial Fish Swarm Algorithm and Fuzzy c-Means Clustering.

    Science.gov (United States)

    Ma, Li; Li, Yang; Fan, Suohai; Fan, Runzhu

    2015-01-01

    Image segmentation plays an important role in medical image processing. Fuzzy c-means (FCM) clustering is one of the popular clustering algorithms for medical image segmentation. However, FCM has the problems of depending on initial clustering centers, falling into local optimal solution easily, and sensitivity to noise disturbance. To solve these problems, this paper proposes a hybrid artificial fish swarm algorithm (HAFSA). The proposed algorithm combines artificial fish swarm algorithm (AFSA) with FCM whose advantages of global optimization searching and parallel computing ability of AFSA are utilized to find a superior result. Meanwhile, Metropolis criterion and noise reduction mechanism are introduced to AFSA for enhancing the convergence rate and antinoise ability. The artificial grid graph and Magnetic Resonance Imaging (MRI) are used in the experiments, and the experimental results show that the proposed algorithm has stronger antinoise ability and higher precision. A number of evaluation indicators also demonstrate that the effect of HAFSA is more excellent than FCM and suppressed FCM (SFCM).

  11. Sparse maps—A systematic infrastructure for reduced-scaling electronic structure methods. II. Linear scaling domain based pair natural orbital coupled cluster theory

    International Nuclear Information System (INIS)

    Riplinger, Christoph; Pinski, Peter; Becker, Ute; Neese, Frank; Valeev, Edward F.

    2016-01-01

    Domain based local pair natural orbital coupled cluster theory with single-, double-, and perturbative triple excitations (DLPNO-CCSD(T)) is a highly efficient local correlation method. It is known to be accurate and robust and can be used in a black box fashion in order to obtain coupled cluster quality total energies for large molecules with several hundred atoms. While previous implementations showed near linear scaling up to a few hundred atoms, several nonlinear scaling steps limited the applicability of the method for very large systems. In this work, these limitations are overcome and a linear scaling DLPNO-CCSD(T) method for closed shell systems is reported. The new implementation is based on the concept of sparse maps that was introduced in Part I of this series [P. Pinski, C. Riplinger, E. F. Valeev, and F. Neese, J. Chem. Phys. 143, 034108 (2015)]. Using the sparse map infrastructure, all essential computational steps (integral transformation and storage, initial guess, pair natural orbital construction, amplitude iterations, triples correction) are achieved in a linear scaling fashion. In addition, a number of additional algorithmic improvements are reported that lead to significant speedups of the method. The new, linear-scaling DLPNO-CCSD(T) implementation typically is 7 times faster than the previous implementation and consumes 4 times less disk space for large three-dimensional systems. For linear systems, the performance gains and memory savings are substantially larger. Calculations with more than 20 000 basis functions and 1000 atoms are reported in this work. In all cases, the time required for the coupled cluster step is comparable to or lower than for the preceding Hartree-Fock calculation, even if this is carried out with the efficient resolution-of-the-identity and chain-of-spheres approximations. The new implementation even reduces the error in absolute correlation energies by about a factor of two, compared to the already accurate

  12. Progressive Exponential Clustering-Based Steganography

    Directory of Open Access Journals (Sweden)

    Li Yue

    2010-01-01

    Full Text Available Cluster indexing-based steganography is an important branch of data-hiding techniques. Such schemes normally achieve good balance between high embedding capacity and low embedding distortion. However, most cluster indexing-based steganographic schemes utilise less efficient clustering algorithms for embedding data, which causes redundancy and leaves room for increasing the embedding capacity further. In this paper, a new clustering algorithm, called progressive exponential clustering (PEC, is applied to increase the embedding capacity by avoiding redundancy. Meanwhile, a cluster expansion algorithm is also developed in order to further increase the capacity without sacrificing imperceptibility.

  13. Clustering Incomplete Spectral Data with Robust Methods

    Science.gov (United States)

    Äyrämö, S.; Pölönen, I.; Eskelinen, M. A.

    2017-10-01

    Missing value imputation is a common approach for preprocessing incomplete data sets. In case of data clustering, imputation methods may cause unexpected bias because they may change the underlying structure of the data. In order to avoid prior imputation of missing values the computational operations must be projected on the available data values. In this paper, we apply a robust nan-K-spatmed algorithm to the clustering problem on hyperspectral image data. Robust statistics, such as multivariate medians, are more insensitive to outliers than classical statistics relying on the Gaussian assumptions. They are, however, computationally more intractable due to the lack of closed-form solutions. We will compare robust clustering methods on the bands incomplete data cubes to standard K-means with full data cubes.

  14. METHOD OF CONSTRUCTION OF GENETIC DATA CLUSTERS

    Directory of Open Access Journals (Sweden)

    N. A. Novoselova

    2016-01-01

    Full Text Available The paper presents a method of construction of genetic data clusters (functional modules using the randomized matrices. To build the functional modules the selection and analysis of the eigenvalues of the gene profiles correlation matrix is performed. The principal components, corresponding to the eigenvalues, which are significantly different from those obtained for the randomly generated correlation matrix, are used for the analysis. Each selected principal component forms gene cluster. In a comparative experiment with the analogs the proposed method shows the advantage in allocating statistically significant different-sized clusters, the ability to filter non- informative genes and to extract the biologically interpretable functional modules matching the real data structure.

  15. Classical Music Clustering Based on Acoustic Features

    OpenAIRE

    Wang, Xindi; Haque, Syed Arefinul

    2017-01-01

    In this paper we cluster 330 classical music pieces collected from MusicNet database based on their musical note sequence. We use shingling and chord trajectory matrices to create signature for each music piece and performed spectral clustering to find the clusters. Based on different resolution, the output clusters distinctively indicate composition from different classical music era and different composing style of the musicians.

  16. CCM: A Text Classification Method by Clustering

    DEFF Research Database (Denmark)

    Nizamani, Sarwat; Memon, Nasrullah; Wiil, Uffe Kock

    2011-01-01

    In this paper, a new Cluster based Classification Model (CCM) for suspicious email detection and other text classification tasks, is presented. Comparative experiments of the proposed model against traditional classification models and the boosting algorithm are also discussed. Experimental results...... approach to text classification tasks simplifies the model and at the same time increases the accuracy....... show that the CCM outperforms traditional classification models as well as the boosting algorithm for the task of suspicious email detection on terrorism domain email dataset and topic categorization on the Reuters-21578 and 20 Newsgroups datasets. The overall finding is that applying a cluster based...

  17. Hybrid coupled cluster methods: combining active space coupled cluster methods with coupled cluster singles, doubles, and perturbative triples.

    Science.gov (United States)

    Kou, Zhuangfei; Shen, Jun; Xu, Enhua; Li, Shuhua

    2012-05-21

    Based on the coupled-cluster singles, doubles, and a hybrid treatment of triples (CCSD(T)-h) method developed by us [J. Shen, E. Xu, Z. Kou, and S. Li, J. Chem. Phys. 132, 114115 (2010); and ibid. 133, 234106 (2010); and ibid. 134, 044134 (2011)], we developed and implemented a new hybrid coupled cluster (CC) method, named CCSD(T)q-h, by combining CC singles and doubles, and active triples and quadruples (CCSDtq) with CCSD(T) to deal with the electronic structures of molecules with significant multireference character. These two hybrid CC methods can be solved with non-canonical and canonical MOs. With canonical MOs, the CCSD(T)-like equations in these two methods can be solved directly without iteration so that the storage of all triple excitation amplitudes can be avoided. A practical procedure to divide canonical MOs into active and inactive subsets is proposed. Numerical calculations demonstrated that CCSD(T)-h with canonical MOs can well reproduce the corresponding results obtained with non-canonical MOs. For three atom exchange reactions, we found that CCSD(T)-h can offer a significant improvement over the popular CCSD(T) method in describing the reaction barriers. For the bond-breaking processes in F(2) and H(2)O, our calculations demonstrated that CCSD(T)q-h is a good approximation to CCSDTQ over the entire bond dissociation processes.

  18. A Hybrid Method for Image Segmentation Based on Artificial Fish Swarm Algorithm and Fuzzy c-Means Clustering

    Directory of Open Access Journals (Sweden)

    Li Ma

    2015-01-01

    Full Text Available Image segmentation plays an important role in medical image processing. Fuzzy c-means (FCM clustering is one of the popular clustering algorithms for medical image segmentation. However, FCM has the problems of depending on initial clustering centers, falling into local optimal solution easily, and sensitivity to noise disturbance. To solve these problems, this paper proposes a hybrid artificial fish swarm algorithm (HAFSA. The proposed algorithm combines artificial fish swarm algorithm (AFSA with FCM whose advantages of global optimization searching and parallel computing ability of AFSA are utilized to find a superior result. Meanwhile, Metropolis criterion and noise reduction mechanism are introduced to AFSA for enhancing the convergence rate and antinoise ability. The artificial grid graph and Magnetic Resonance Imaging (MRI are used in the experiments, and the experimental results show that the proposed algorithm has stronger antinoise ability and higher precision. A number of evaluation indicators also demonstrate that the effect of HAFSA is more excellent than FCM and suppressed FCM (SFCM.

  19. Radionuclide identification using subtractive clustering method

    Energy Technology Data Exchange (ETDEWEB)

    Farias, Marcos Santana, E-mail: msantana@ien.gov.br [Instituto de Engenharia Nuclear (IEN/CNEN-RJ), Rio de Janeiro, RJ (Brazil). Div. de Instrumentacao e Confiabilidade Humana; Nedjah, Nadia, E-mail: nadia@eng.uerj.br [Departamento de Engenharia Eletronica e Telecomunicacoes. Universidade do Estado do Rio de Janeiro (UERJ), RJ (Brazil); Mourelle, Luiza de Macedo, E-mail: ldmm@eng.uerj.br [Departamento de Engenharia de Sistemas e Computacao. Universidade do Estado do Rio de Janeiro (UERJ), RJ (Brazil)

    2011-07-01

    Radionuclide identification is crucial to planning protective measures in emergency situations. This paper presents the application of a method for a classification system of radioactive elements with a fast and efficient response. To achieve this goal is proposed the application of subtractive clustering algorithm. The proposed application can be implemented in reconfigurable hardware, a flexible medium to implement digital hardware circuits. (author)

  20. Radionuclide identification using subtractive clustering method

    International Nuclear Information System (INIS)

    Farias, Marcos Santana; Mourelle, Luiza de Macedo

    2011-01-01

    Radionuclide identification is crucial to planning protective measures in emergency situations. This paper presents the application of a method for a classification system of radioactive elements with a fast and efficient response. To achieve this goal is proposed the application of subtractive clustering algorithm. The proposed application can be implemented in reconfigurable hardware, a flexible medium to implement digital hardware circuits. (author)

  1. Orbit Clustering Based on Transfer Cost

    Science.gov (United States)

    Gustafson, Eric D.; Arrieta-Camacho, Juan J.; Petropoulos, Anastassios E.

    2013-01-01

    We propose using cluster analysis to perform quick screening for combinatorial global optimization problems. The key missing component currently preventing cluster analysis from use in this context is the lack of a useable metric function that defines the cost to transfer between two orbits. We study several proposed metrics and clustering algorithms, including k-means and the expectation maximization algorithm. We also show that proven heuristic methods such as the Q-law can be modified to work with cluster analysis.

  2. Comparing clustering models in bank customers: Based on Fuzzy relational clustering approach

    Directory of Open Access Journals (Sweden)

    Ayad Hendalianpour

    2016-11-01

    Full Text Available Clustering is absolutely useful information to explore data structures and has been employed in many places. It organizes a set of objects into similar groups called clusters, and the objects within one cluster are both highly similar and dissimilar with the objects in other clusters. The K-mean, C-mean, Fuzzy C-mean and Kernel K-mean algorithms are the most popular clustering algorithms for their easy implementation and fast work, but in some cases we cannot use these algorithms. Regarding this, in this paper, a hybrid model for customer clustering is presented that is applicable in five banks of Fars Province, Shiraz, Iran. In this way, the fuzzy relation among customers is defined by using their features described in linguistic and quantitative variables. As follows, the customers of banks are grouped according to K-mean, C-mean, Fuzzy C-mean and Kernel K-mean algorithms and the proposed Fuzzy Relation Clustering (FRC algorithm. The aim of this paper is to show how to choose the best clustering algorithms based on density-based clustering and present a new clustering algorithm for both crisp and fuzzy variables. Finally, we apply the proposed approach to five datasets of customer's segmentation in banks. The result of the FCR shows the accuracy and high performance of FRC compared other clustering methods.

  3. APPECT: An Approximate Backbone-Based Clustering Algorithm for Tags

    DEFF Research Database (Denmark)

    Zong, Yu; Xu, Guandong; Jin, Pin

    2011-01-01

    resulting from the severe difficulty of ambiguity, redundancy and less semantic nature of tags. Clustering method is a useful tool to address the aforementioned difficulties. Most of the researches on tag clustering are directly using traditional clustering algorithms such as K-means or Hierarchical...... algorithm for Tags (APPECT). The main steps of APPECT are: (1) we execute the K-means algorithm on a tag similarity matrix for M times and collect a set of tag clustering results Z={C1,C2,…,Cm}; (2) we form the approximate backbone of Z by executing a greedy search; (3) we fix the approximate backbone...... Agglomerative Clustering on tagging data, which possess the inherent drawbacks, such as the sensitivity of initialization. In this paper, we instead make use of the approximate backbone of tag clustering results to find out better tag clusters. In particular, we propose an APProximate backbonE-based Clustering...

  4. Cluster-based tangible programming

    CSIR Research Space (South Africa)

    Smith, Andrew C

    2014-05-01

    Full Text Available Clustering is the act of grouping items that belong together. In this paper we explore clustering as a means to construct tangible program logic, and specifically as a means to use multiple tangible objects collectively as a single tangible program...

  5. Cluster randomized trial of an active, multifaceted information dissemination intervention based on The WHO Reproductive health library to change obstetric practices: methods and design issues [ISRCTN14055385

    Directory of Open Access Journals (Sweden)

    Lumbiganon Pisake

    2004-01-01

    Full Text Available Abstract Background Effective strategies for implementing best practices in low and middle income countries are needed. RHL is an annually updated electronic publication containing Cochrane systematic reviews, commentaries and practical recommendations on how to implement evidence-based practices. We are conducting a trial to evaluate the improvement in obstetric practices using an active dissemination strategy to promote uptake of recommendations in The WHO Reproductive Health Library (RHL. Methods A cluster randomized trial to improve obstetric practices in 40 hospitals in Mexico and Thailand is conducted. The trial uses a stratified random allocation based on country, size and type of hospitals. The core intervention consists of three interactive workshops delivered over a period of six months. The main outcome measures are changes in clinical practices that are recommended in RHL measured approximately a year after the first workshop. Results The design and implementation of a complex intervention using a cluster randomized trial design are presented. Conclusion Designing the intervention, choosing outcome variables and implementing the protocol in two diverse settings has been a time-consuming and challenging process. We hope that sharing this experience will help others planning similar projects and improve our ability to implement change.

  6. A Trajectory Regression Clustering Technique Combining a Novel Fuzzy C-Means Clustering Algorithm with the Least Squares Method

    Directory of Open Access Journals (Sweden)

    Xiangbing Zhou

    2018-04-01

    Full Text Available Rapidly growing GPS (Global Positioning System trajectories hide much valuable information, such as city road planning, urban travel demand, and population migration. In order to mine the hidden information and to capture better clustering results, a trajectory regression clustering method (an unsupervised trajectory clustering method is proposed to reduce local information loss of the trajectory and to avoid getting stuck in the local optimum. Using this method, we first define our new concept of trajectory clustering and construct a novel partitioning (angle-based partitioning method of line segments; second, the Lagrange-based method and Hausdorff-based K-means++ are integrated in fuzzy C-means (FCM clustering, which are used to maintain the stability and the robustness of the clustering process; finally, least squares regression model is employed to achieve regression clustering of the trajectory. In our experiment, the performance and effectiveness of our method is validated against real-world taxi GPS data. When comparing our clustering algorithm with the partition-based clustering algorithms (K-means, K-median, and FCM, our experimental results demonstrate that the presented method is more effective and generates a more reasonable trajectory.

  7. Single-Pass Clustering Algorithm Based on Storm

    Science.gov (United States)

    Fang, LI; Longlong, DAI; Zhiying, JIANG; Shunzi, LI

    2017-02-01

    The dramatically increasing volume of data makes the computational complexity of traditional clustering algorithm rise rapidly accordingly, which leads to the longer time. So as to improve the efficiency of the stream data clustering, a distributed real-time clustering algorithm (S-Single-Pass) based on the classic Single-Pass [1] algorithm and Storm [2] computation framework was designed in this paper. By employing this kind of method in the Topic Detection and Tracking (TDT) [3], the real-time performance of topic detection arises effectively. The proposed method splits the clustering process into two parts: one part is to form clusters for the multi-thread parallel clustering, the other part is to merge the generated clusters in the previous process and update the global clusters. Through the experimental results, the conclusion can be drawn that the proposed method have the nearly same clustering accuracy as the traditional Single-Pass algorithm and the clustering accuracy remains steady, computing rate increases linearly when increasing the number of cluster machines and nodes (processing threads).

  8. Recent advances in coupled-cluster methods

    CERN Document Server

    Bartlett, Rodney J

    1997-01-01

    Today, coupled-cluster (CC) theory has emerged as the most accurate, widely applicable approach for the correlation problem in molecules. Furthermore, the correct scaling of the energy and wavefunction with size (i.e. extensivity) recommends it for studies of polymers and crystals as well as molecules. CC methods have also paid dividends for nuclei, and for certain strongly correlated systems of interest in field theory.In order for CC methods to have achieved this distinction, it has been necessary to formulate new, theoretical approaches for the treatment of a variety of essential quantities

  9. Domain-Based Local Pair Natural Orbital Version of Mukherjee’s State-Specific Coupled Cluster Method

    Czech Academy of Sciences Publication Activity Database

    Brabec, Jiří; Lang, Jakub; Saitow, M.; Pittner, Jiří; Neese, F.; Demel, Ondřej

    2018-01-01

    Roč. 14, č. 3 (2018), s. 1370-1382 ISSN 1549-9618 R&D Projects: GA ČR GJ15-00058Y Institutional support: RVO:61388955 Keywords : MULTIREFERENCE PERTURBATION-THEORY * SINGLE-REFERENCE FORMALISM * ELECTRON CORRELATION METHODS Subject RIV: CF - Physical ; Theoretical Chemistry OBOR OECD: Physical chemistry Impact factor: 5.245, year: 2016

  10. Dynamic analysis of clustered building structures using substructures methods

    International Nuclear Information System (INIS)

    Leimbach, K.R.; Krutzik, N.J.

    1989-01-01

    The dynamic substructure approach to the building cluster on a common base mat starts with the generation of Ritz-vectors for each building on a rigid foundation. The base mat plus the foundation soil is subjected to kinematic constraint modes, for example constant, linear, quadratic or cubic constraints. These constraint modes are also imposed on the buildings. By enforcing kinematic compatibility of the complete structural system on the basis of the constraint modes a reduced Ritz model of the complete cluster is obtained. This reduced model can now be analyzed by modal time history or response spectrum methods

  11. PROSPECTS OF THE REGIONAL INTEGRATION POLICY BASED ON CLUSTER FORMATION

    Directory of Open Access Journals (Sweden)

    Elena Tsepilova

    2018-01-01

    Full Text Available The purpose of this article is to develop the theoretical foundations of regional integration policy and to determine its prospects on the basis of cluster formation. The authors use such research methods as systematization, comparative and complex analysis, synthesis, statistical method. Within the framework of the research, the concept of regional integration policy is specified, and its integration core – cluster – is allocated. The authors work out an algorithm of regional clustering, which will ensure the growth of economy and tax income. Measures have been proposed to optimize the organizational mechanism of interaction between the participants of the territorial cluster and the authorities that allow to ensure the effective functioning of clusters, including taxation clusters. Based on the results of studying the existing methods for assessing the effectiveness of cluster policy, the authors propose their own approach to evaluating the consequences of implementing the regional integration policy, according to which the list of quantitative and qualitative indicators is defined. The present article systematizes the experience and results of the cluster policy of certain European countries, that made it possible to determine the prospects and synergetic effect from the development of clusters as an integration foundation of regional policy in the Russian Federation. The authors carry out the analysis of activity of cluster formations using the example of the Rostov region – a leader in the formation of conditions for the cluster policy development in the Southern Federal District. 11 clusters and cluster initiatives are developing in this region. As a result, the authors propose measures for support of the already existing clusters and creation of the new ones.

  12. Nearest Neighbor Networks: clustering expression data based on gene neighborhoods

    Directory of Open Access Journals (Sweden)

    Olszewski Kellen L

    2007-07-01

    Full Text Available Abstract Background The availability of microarrays measuring thousands of genes simultaneously across hundreds of biological conditions represents an opportunity to understand both individual biological pathways and the integrated workings of the cell. However, translating this amount of data into biological insight remains a daunting task. An important initial step in the analysis of microarray data is clustering of genes with similar behavior. A number of classical techniques are commonly used to perform this task, particularly hierarchical and K-means clustering, and many novel approaches have been suggested recently. While these approaches are useful, they are not without drawbacks; these methods can find clusters in purely random data, and even clusters enriched for biological functions can be skewed towards a small number of processes (e.g. ribosomes. Results We developed Nearest Neighbor Networks (NNN, a graph-based algorithm to generate clusters of genes with similar expression profiles. This method produces clusters based on overlapping cliques within an interaction network generated from mutual nearest neighborhoods. This focus on nearest neighbors rather than on absolute distance measures allows us to capture clusters with high connectivity even when they are spatially separated, and requiring mutual nearest neighbors allows genes with no sufficiently similar partners to remain unclustered. We compared the clusters generated by NNN with those generated by eight other clustering methods. NNN was particularly successful at generating functionally coherent clusters with high precision, and these clusters generally represented a much broader selection of biological processes than those recovered by other methods. Conclusion The Nearest Neighbor Networks algorithm is a valuable clustering method that effectively groups genes that are likely to be functionally related. It is particularly attractive due to its simplicity, its success in the

  13. Fuzzy Clustering - Principles, Methods and Examples

    DEFF Research Database (Denmark)

    Kroszynski, Uri; Zhou, Jianjun

    1998-01-01

    of the methods. The examples were solved by hand and served as a test bench for exploration of the MATLAB capabilities included in the Fuzzy Control Toolbox. The fuzzy clustering methods described include Fuzzy c-means (FCM), Fuzzy c-lines (FCL) and Fuzzy c-elliptotypes (FCE).......One of the most remarkable advances in the field of identification and control of systems -in particular mechanical systems- whose behaviour can not be described by means of the usual mathematical models, has been achieved by the application of methods of fuzzy theory.In the framework of a study...... about identification of "black-box" properties by analysis of system input/output data sets, we have prepared an introductory note on the principles and the most popular data classification methods used in fuzzy modeling. This introductory note also includes some examples that illustrate the use...

  14. XML documents cluster research based on frequent subpatterns

    Science.gov (United States)

    Ding, Tienan; Li, Wei; Li, Xiongfei

    2015-12-01

    XML data is widely used in the information exchange field of Internet, and XML document data clustering is the hot research topic. In the XML document clustering process, measure differences between two XML documents is time costly, and impact the efficiency of XML document clustering. This paper proposed an XML documents clustering method based on frequent patterns of XML document dataset, first proposed a coding tree structure for encoding the XML document, and translate frequent pattern mining from XML documents into frequent pattern mining from string. Further, using the cosine similarity calculation method and cohesive hierarchical clustering method for XML document dataset by frequent patterns. Because of frequent patterns are subsets of the original XML document data, so the time consumption of XML document similarity measure is reduced. The experiment runs on synthetic dataset and the real datasets, the experimental result shows that our method is efficient.

  15. Cyanoform and Its Isomers. Relative Stabilities, Spectroscopic Features, and Rearrangements by Coupled Cluster and MCSCF-Based Methods.

    Science.gov (United States)

    Szczepaniak, Marek; Moc, Jerzy

    2017-02-16

    Although an isolation of elusive tricyanomethane HC(CN) 3 was recently reported, the existence of other HC 4 N 3 species has yet to be confirmed. In this work, the relative stabilities, spectroscopic features, and rearrangements of tricyanomethane and its four isomers are examined using single- (CCSD(T), CCSD(T)-F12) and multireference (MCSCF, MRPT2) methods. Tricyanomethane and dicyanoketenimine (NC) 2 CCNH, which are found to be the two most stable HC 4 N 3 isomers lying within 9 kcal/mol, can be discriminated by their spectroscopic parameters. The predicted stepwise interconversion path relating HC(CN) 3 and (NC) 2 CCNH features the HC 4 N 3 species comprising the C-C-N ring moiety, with the largest barrier being associated with the initial H migration to one of the CN carbons. Adding a water molecule reduces the H migration barrier strongly and makes it possible to interconvert tricyanomethane to dicyanoketenimine in a "concerted" way.

  16. Polarizable Density Embedding Coupled Cluster Method

    DEFF Research Database (Denmark)

    Hršak, Dalibor; Olsen, Jógvan Magnus Haugaard; Kongsted, Jacob

    2018-01-01

    by an embedding potential consisting of a set of fragment densities obtained from calculations on isolated fragments with a quantum-chemistry method such as Hartree-Fock (HF) or Kohn-Sham density functional theory (KS-DFT) and dressed with a set of atom-centered anisotropic dipole-dipole polarizabilities......). In the PDE-CC method, the smaller, but chemically important core region is described with a high-level CC method. The environment surrounding the core region can be separated into two levels of description: an inner and an outer region. The effect of the inner region on the core region is described......We present the theory and implementation of the polarizable density embedding (PDE) model in combination with coupled cluster (CC) theory (PDE-CC). This model has been implemented in the Dalton quantum chemistry program by adapting the CC code to the polarizable embedding library (PElib...

  17. Efficient Computation of Multiple Density-Based Clustering Hierarchies

    OpenAIRE

    Neto, Antonio Cavalcante Araujo; Sander, Joerg; Campello, Ricardo J. G. B.; Nascimento, Mario A.

    2017-01-01

    HDBSCAN*, a state-of-the-art density-based hierarchical clustering method, produces a hierarchical organization of clusters in a dataset w.r.t. a parameter mpts. While the performance of HDBSCAN* is robust w.r.t. mpts in the sense that a small change in mpts typically leads to only a small or no change in the clustering structure, choosing a "good" mpts value can be challenging: depending on the data distribution, a high or low value for mpts may be more appropriate, and certain data clusters...

  18. Cluster Ensemble-Based Image Segmentation

    Directory of Open Access Journals (Sweden)

    Xiaoru Wang

    2013-07-01

    Full Text Available Image segmentation is the foundation of computer vision applications. In this paper, we propose a new cluster ensemble-based image segmentation algorithm, which overcomes several problems of traditional methods. We make two main contributions in this paper. First, we introduce the cluster ensemble concept to fuse the segmentation results from different types of visual features effectively, which can deliver a better final result and achieve a much more stable performance for broad categories of images. Second, we exploit the PageRank idea from Internet applications and apply it to the image segmentation task. This can improve the final segmentation results by combining the spatial information of the image and the semantic similarity of regions. Our experiments on four public image databases validate the superiority of our algorithm over conventional single type of feature or multiple types of features-based algorithms, since our algorithm can fuse multiple types of features effectively for better segmentation results. Moreover, our method is also proved to be very competitive in comparison with other state-of-the-art segmentation algorithms.

  19. Structure based alignment and clustering of proteins (STRALCP)

    Science.gov (United States)

    Zemla, Adam T.; Zhou, Carol E.; Smith, Jason R.; Lam, Marisa W.

    2013-06-18

    Disclosed are computational methods of clustering a set of protein structures based on local and pair-wise global similarity values. Pair-wise local and global similarity values are generated based on pair-wise structural alignments for each protein in the set of protein structures. Initially, the protein structures are clustered based on pair-wise local similarity values. The protein structures are then clustered based on pair-wise global similarity values. For each given cluster both a representative structure and spans of conserved residues are identified. The representative protein structure is used to assign newly-solved protein structures to a group. The spans are used to characterize conservation and assign a "structural footprint" to the cluster.

  20. Clustering-based classification of road traffic accidents using hierarchical clustering and artificial neural networks.

    Science.gov (United States)

    Taamneh, Madhar; Taamneh, Salah; Alkheder, Sharaf

    2017-09-01

    Artificial neural networks (ANNs) have been widely used in predicting the severity of road traffic crashes. All available information about previously occurred accidents is typically used for building a single prediction model (i.e., classifier). Too little attention has been paid to the differences between these accidents, leading, in most cases, to build less accurate predictors. Hierarchical clustering is a well-known clustering method that seeks to group data by creating a hierarchy of clusters. Using hierarchical clustering and ANNs, a clustering-based classification approach for predicting the injury severity of road traffic accidents was proposed. About 6000 road accidents occurred over a six-year period from 2008 to 2013 in Abu Dhabi were used throughout this study. In order to reduce the amount of variation in data, hierarchical clustering was applied on the data set to organize it into six different forms, each with different number of clusters (i.e., clusters from 1 to 6). Two ANN models were subsequently built for each cluster of accidents in each generated form. The first model was built and validated using all accidents (training set), whereas only 66% of the accidents were used to build the second model, and the remaining 34% were used to test it (percentage split). Finally, the weighted average accuracy was computed for each type of models in each from of data. The results show that when testing the models using the training set, clustering prior to classification achieves (11%-16%) more accuracy than without using clustering, while the percentage split achieves (2%-5%) more accuracy. The results also suggest that partitioning the accidents into six clusters achieves the best accuracy if both types of models are taken into account.

  1. MANNER OF STOCKS SORTING USING CLUSTER ANALYSIS METHODS

    Directory of Open Access Journals (Sweden)

    Jana Halčinová

    2014-06-01

    Full Text Available The aim of the present article is to show the possibility of using the methods of cluster analysis in classification of stocks of finished products. Cluster analysis creates groups (clusters of finished products according to similarity in demand i.e. customer requirements for each product. Manner stocks sorting of finished products by clusters is described a practical example. The resultants clusters are incorporated into the draft layout of the distribution warehouse.

  2. Initialization of Possibilistic Clustering via Mountain Clustering Method

    Czech Academy of Sciences Publication Activity Database

    Coufal, David

    2000-01-01

    Roč. 10, č. 5 (2000), s. 797-809 ISSN 1210-0552. [SOFSEM 2000 Workshop on Soft Computing. Milovy, 27.11.2000-28.11.2000] R&D Projects: GA ČR GA201/00/1489 Institutional research plan: AV0Z1030915 Keywords : Mountain clustering * fuzzy c-means * possibilistic c-means Subject RIV: BA - General Mathematics

  3. Single pass kernel k-means clustering method

    Indian Academy of Sciences (India)

    This approach has reduced both time complexity and memory requirements. However, the clustering result of this method will be very much deviated form that obtained using the conventional kernel k-means method. This is because of the fact that pseudo cluster centers in the input space may not represent the exact cluster ...

  4. The Local Maximum Clustering Method and Its Application in Microarray Gene Expression Data Analysis

    Directory of Open Access Journals (Sweden)

    Chen Yidong

    2004-01-01

    Full Text Available An unsupervised data clustering method, called the local maximum clustering (LMC method, is proposed for identifying clusters in experiment data sets based on research interest. A magnitude property is defined according to research purposes, and data sets are clustered around each local maximum of the magnitude property. By properly defining a magnitude property, this method can overcome many difficulties in microarray data clustering such as reduced projection in similarities, noises, and arbitrary gene distribution. To critically evaluate the performance of this clustering method in comparison with other methods, we designed three model data sets with known cluster distributions and applied the LMC method as well as the hierarchic clustering method, the -mean clustering method, and the self-organized map method to these model data sets. The results show that the LMC method produces the most accurate clustering results. As an example of application, we applied the method to cluster the leukemia samples reported in the microarray study of Golub et al. (1999.

  5. Adaptive density trajectory cluster based on time and space distance

    Science.gov (United States)

    Liu, Fagui; Zhang, Zhijie

    2017-10-01

    There are some hotspot problems remaining in trajectory cluster for discovering mobile behavior regularity, such as the computation of distance between sub trajectories, the setting of parameter values in cluster algorithm and the uncertainty/boundary problem of data set. As a result, based on the time and space, this paper tries to define the calculation method of distance between sub trajectories. The significance of distance calculation for sub trajectories is to clearly reveal the differences in moving trajectories and to promote the accuracy of cluster algorithm. Besides, a novel adaptive density trajectory cluster algorithm is proposed, in which cluster radius is computed through using the density of data distribution. In addition, cluster centers and number are selected by a certain strategy automatically, and uncertainty/boundary problem of data set is solved by designed weighted rough c-means. Experimental results demonstrate that the proposed algorithm can perform the fuzzy trajectory cluster effectively on the basis of the time and space distance, and obtain the optimal cluster centers and rich cluster results information adaptably for excavating the features of mobile behavior in mobile and sociology network.

  6. Risk assessment of water pollution sources based on an integrated k-means clustering and set pair analysis method in the region of Shiyan, China.

    Science.gov (United States)

    Li, Chunhui; Sun, Lian; Jia, Junxiang; Cai, Yanpeng; Wang, Xuan

    2016-07-01

    Source water areas are facing many potential water pollution risks. Risk assessment is an effective method to evaluate such risks. In this paper an integrated model based on k-means clustering analysis and set pair analysis was established aiming at evaluating the risks associated with water pollution in source water areas, in which the weights of indicators were determined through the entropy weight method. Then the proposed model was applied to assess water pollution risks in the region of Shiyan in which China's key source water area Danjiangkou Reservoir for the water source of the middle route of South-to-North Water Diversion Project is located. The results showed that eleven sources with relative high risk value were identified. At the regional scale, Shiyan City and Danjiangkou City would have a high risk value in term of the industrial discharge. Comparatively, Danjiangkou City and Yunxian County would have a high risk value in terms of agricultural pollution. Overall, the risk values of north regions close to the main stream and reservoir of the region of Shiyan were higher than that in the south. The results of risk level indicated that five sources were in lower risk level (i.e., level II), two in moderate risk level (i.e., level III), one in higher risk level (i.e., level IV) and three in highest risk level (i.e., level V). Also risks of industrial discharge are higher than that of the agricultural sector. It is thus essential to manage the pillar industry of the region of Shiyan and certain agricultural companies in the vicinity of the reservoir to reduce water pollution risks of source water areas. Copyright © 2016 Elsevier B.V. All rights reserved.

  7. GA-Based Membrane Evolutionary Algorithm for Ensemble Clustering

    OpenAIRE

    Wang, Yanhua; Liu, Xiyu; Xiang, Laisheng

    2017-01-01

    Ensemble clustering can improve the generalization ability of a single clustering algorithm and generate a more robust clustering result by integrating multiple base clusterings, so it becomes the focus of current clustering research. Ensemble clustering aims at finding a consensus partition which agrees as much as possible with base clusterings. Genetic algorithm is a highly parallel, stochastic, and adaptive search algorithm developed from the natural selection and evolutionary mechanism of...

  8. Likelihood-based inference for clustered line transect data

    DEFF Research Database (Denmark)

    Waagepetersen, Rasmus Plenge; Schweder, Tore

    The uncertainty in estimation of spatial animal density from line transect surveys depends on the degree of spatial clustering in the animal population. To quantify the clustering we model line transect data as independent thinnings of spatial shot-noise Cox processes. Likelihood-based inference...... is implemented using Markov Chain Monte Carlo methods to obtain efficient estimates of spatial clustering parameters. Uncertainty is addressed using parametric bootstrap or by consideration of posterior distributions in a Bayesian setting. Maximum likelihood estimation and Bayesian inference is compared...

  9. Unbiased methods for removing systematics from galaxy clustering measurements

    Science.gov (United States)

    Elsner, Franz; Leistedt, Boris; Peiris, Hiranya V.

    2016-02-01

    Measuring the angular clustering of galaxies as a function of redshift is a powerful method for extracting information from the three-dimensional galaxy distribution. The precision of such measurements will dramatically increase with ongoing and future wide-field galaxy surveys. However, these are also increasingly sensitive to observational and astrophysical contaminants. Here, we study the statistical properties of three methods proposed for controlling such systematics - template subtraction, basic mode projection, and extended mode projection - all of which make use of externally supplied template maps, designed to characterize and capture the spatial variations of potential systematic effects. Based on a detailed mathematical analysis, and in agreement with simulations, we find that the template subtraction method in its original formulation returns biased estimates of the galaxy angular clustering. We derive closed-form expressions that should be used to correct results for this shortcoming. Turning to the basic mode projection algorithm, we prove it to be free of any bias, whereas we conclude that results computed with extended mode projection are biased. Within a simplified setup, we derive analytical expressions for the bias and discuss the options for correcting it in more realistic configurations. Common to all three methods is an increased estimator variance induced by the cleaning process, albeit at different levels. These results enable unbiased high-precision clustering measurements in the presence of spatially varying systematics, an essential step towards realizing the full potential of current and planned galaxy surveys.

  10. Ontology-based topic clustering for online discussion data

    Science.gov (United States)

    Wang, Yongheng; Cao, Kening; Zhang, Xiaoming

    2013-03-01

    With the rapid development of online communities, mining and extracting quality knowledge from online discussions becomes very important for the industrial and marketing sector, as well as for e-commerce applications and government. Most of the existing techniques model a discussion as a social network of users represented by a user-based graph without considering the content of the discussion. In this paper we propose a new multilayered mode to analysis online discussions. The user-based and message-based representation is combined in this model. A novel frequent concept sets based clustering method is used to cluster the original online discussion network into topic space. Domain ontology is used to improve the clustering accuracy. Parallel methods are also used to make the algorithms scalable to very large data sets. Our experimental study shows that the model and algorithms are effective when analyzing large scale online discussion data.

  11. Hierarchical clustering of RGB surface water images based on MIA ...

    African Journals Online (AJOL)

    Thus characterised images were partitioned into clusters of similar images using hierarchical clustering. The best defined clusters were obtained when the Ward's method was applied. Images were partitioned into the 2 main clusters in terms of similar colours of displayed objects. Each main cluster was further partitioned ...

  12. Cluster-based exposure variation analysis

    Science.gov (United States)

    2013-01-01

    Background Static posture, repetitive movements and lack of physical variation are known risk factors for work-related musculoskeletal disorders, and thus needs to be properly assessed in occupational studies. The aims of this study were (i) to investigate the effectiveness of a conventional exposure variation analysis (EVA) in discriminating exposure time lines and (ii) to compare it with a new cluster-based method for analysis of exposure variation. Methods For this purpose, we simulated a repeated cyclic exposure varying within each cycle between “low” and “high” exposure levels in a “near” or “far” range, and with “low” or “high” velocities (exposure change rates). The duration of each cycle was also manipulated by selecting a “small” or “large” standard deviation of the cycle time. Theses parameters reflected three dimensions of exposure variation, i.e. range, frequency and temporal similarity. Each simulation trace included two realizations of 100 concatenated cycles with either low (ρ = 0.1), medium (ρ = 0.5) or high (ρ = 0.9) correlation between the realizations. These traces were analyzed by conventional EVA, and a novel cluster-based EVA (C-EVA). Principal component analysis (PCA) was applied on the marginal distributions of 1) the EVA of each of the realizations (univariate approach), 2) a combination of the EVA of both realizations (multivariate approach) and 3) C-EVA. The least number of principal components describing more than 90% of variability in each case was selected and the projection of marginal distributions along the selected principal component was calculated. A linear classifier was then applied to these projections to discriminate between the simulated exposure patterns, and the accuracy of classified realizations was determined. Results C-EVA classified exposures more correctly than univariate and multivariate EVA approaches; classification accuracy was 49%, 47% and 52% for EVA (univariate

  13. Visual cluster analysis and pattern recognition template and methods

    Science.gov (United States)

    Osbourn, Gordon Cecil; Martinez, Rubel Francisco

    1999-01-01

    A method of clustering using a novel template to define a region of influence. Using neighboring approximation methods, computation times can be significantly reduced. The template and method are applicable and improve pattern recognition techniques.

  14. Visual cluster analysis and pattern recognition template and methods

    Energy Technology Data Exchange (ETDEWEB)

    Osbourn, G.C.; Martinez, R.F.

    1993-12-31

    This invention is comprised of a method of clustering using a novel template to define a region of influence. Using neighboring approximation methods, computation times can be significantly reduced. The template and method are applicable and improve pattern recognition techniques.

  15. Graph-based clustering and data visualization algorithms

    CERN Document Server

    Vathy-Fogarassy, Ágnes

    2013-01-01

    This work presents a data visualization technique that combines graph-based topology representation and dimensionality reduction methods to visualize the intrinsic data structure in a low-dimensional vector space. The application of graphs in clustering and visualization has several advantages. A graph of important edges (where edges characterize relations and weights represent similarities or distances) provides a compact representation of the entire complex data set. This text describes clustering and visualization methods that are able to utilize information hidden in these graphs, based on

  16. Improvement of economic potential estimation methods for enterprise with potential branch clusters use

    Directory of Open Access Journals (Sweden)

    V.Ya. Nusinov

    2017-08-01

    Full Text Available The research determines that the current existing methods of enterprise’s economic potential estimation are based on the use of additive, multiplicative and rating models. It is determined that the existing methods have a row of defects. For example, not all the methods take into account the branch features of the analysis, and also the level of development of the enterprise comparatively with other enterprises. It is suggested to level such defects by an account at the estimation of potential integral level not only by branch features of enterprises activity but also by the intra-account economic clusterization of such enterprises. Scientific works which are connected with the using of clusters for the estimation of economic potential are generalized. According to the results of generalization it is determined that it is possible to distinguish 9 scientific approaches in this direction: the use of natural clusterization of enterprises with the purpose of estimation and increase of region potential; the use of natural clusterization of enterprises with the purpose of estimation and increase of industry potential; use of artificial clusterization of enterprises with the purpose of estimation and increase of region potential; use of artificial clusterization of enterprises with the purpose of estimation and increase of industry potential; the use of artificial clusterization of enterprises with the purpose of clustering potential estimation; the use of artificial clusterization of enterprises with the purpose of estimation of clustering competitiveness potential; the use of natural (artificial clusterization for the estimation of clustering efficiency; the use of natural (artificial clusterization for the increase of level at region (industries development; the use of methods of economic potential of region (industries estimation or its constituents for the construction of the clusters. It is determined that the use of clusterization method in

  17. Homological methods, representation theory, and cluster algebras

    CERN Document Server

    Trepode, Sonia

    2018-01-01

    This text presents six mini-courses, all devoted to interactions between representation theory of algebras, homological algebra, and the new ever-expanding theory of cluster algebras. The interplay between the topics discussed in this text will continue to grow and this collection of courses stands as a partial testimony to this new development. The courses are useful for any mathematician who would like to learn more about this rapidly developing field; the primary aim is to engage graduate students and young researchers. Prerequisites include knowledge of some noncommutative algebra or homological algebra. Homological algebra has always been considered as one of the main tools in the study of finite-dimensional algebras. The strong relationship with cluster algebras is more recent and has quickly established itself as one of the important highlights of today’s mathematical landscape. This connection has been fruitful to both areas—representation theory provides a categorification of cluster algebras, wh...

  18. Advanced cluster methods for correlated-electron systems

    Energy Technology Data Exchange (ETDEWEB)

    Fischer, Andre

    2015-04-27

    In this thesis, quantum cluster methods are used to calculate electronic properties of correlated-electron systems. A special focus lies in the determination of the ground state properties of a 3/4 filled triangular lattice within the one-band Hubbard model. At this filling, the electronic density of states exhibits a so-called van Hove singularity and the Fermi surface becomes perfectly nested, causing an instability towards a variety of spin-density-wave (SDW) and superconducting states. While chiral d+id-wave superconductivity has been proposed as the ground state in the weak coupling limit, the situation towards strong interactions is unclear. Additionally, quantum cluster methods are used here to investigate the interplay of Coulomb interactions and symmetry-breaking mechanisms within the nematic phase of iron-pnictide superconductors. The transition from a tetragonal to an orthorhombic phase is accompanied by a significant change in electronic properties, while long-range magnetic order is not established yet. The driving force of this transition may not only be phonons but also magnetic or orbital fluctuations. The signatures of these scenarios are studied with quantum cluster methods to identify the most important effects. Here, cluster perturbation theory (CPT) and its variational extention, the variational cluster approach (VCA) are used to treat the respective systems on a level beyond mean-field theory. Short-range correlations are incorporated numerically exactly by exact diagonalization (ED). In the VCA, long-range interactions are included by variational optimization of a fictitious symmetry-breaking field based on a self-energy functional approach. Due to limitations of ED, cluster sizes are limited to a small number of degrees of freedom. For the 3/4 filled triangular lattice, the VCA is performed for different cluster symmetries. A strong symmetry dependence and finite-size effects make a comparison of the results from different clusters difficult

  19. Single pass kernel k-means clustering method

    Indian Academy of Sciences (India)

    In unsupervised classification, kernel -means clustering method has been shown to perform better than conventional -means clustering method in ... 518501, India; Department of Computer Science and Engineering, Jawaharlal Nehru Technological University, Anantapur College of Engineering, Anantapur 515002, India ...

  20. Cluster randomized trial of an active, multifaceted information dissemination intervention based on The WHO Reproductive health library to change obstetric practices: methods and design issues [ISRCTN14055385].

    Science.gov (United States)

    Gülmezoglu, A Metin; Villar, José; Grimshaw, Jeremy; Piaggio, Gilda; Lumbiganon, Pisake; Langer, Ana

    2004-01-15

    Effective strategies for implementing best practices in low and middle income countries are needed. RHL is an annually updated electronic publication containing Cochrane systematic reviews, commentaries and practical recommendations on how to implement evidence-based practices. We are conducting a trial to evaluate the improvement in obstetric practices using an active dissemination strategy to promote uptake of recommendations in The WHO Reproductive Health Library (RHL). A cluster randomized trial to improve obstetric practices in 40 hospitals in Mexico and Thailand is conducted. The trial uses a stratified random allocation based on country, size and type of hospitals. The core intervention consists of three interactive workshops delivered over a period of six months. The main outcome measures are changes in clinical practices that are recommended in RHL measured approximately a year after the first workshop. The design and implementation of a complex intervention using a cluster randomized trial design are presented. Designing the intervention, choosing outcome variables and implementing the protocol in two diverse settings has been a time-consuming and challenging process. We hope that sharing this experience will help others planning similar projects and improve our ability to implement change.

  1. Anomaly based Intrusion Detection using Modified Fuzzy Clustering

    Directory of Open Access Journals (Sweden)

    B.S. Harish

    2017-12-01

    Full Text Available This paper presents a network anomaly detection method based on fuzzy clustering. Computer security has become an increasingly vital field in computer science in response to the proliferation of private sensitive information. As a result, Intrusion Detection System has become an indispensable component of computer security. The proposed method consists of three steps: Pre-Processing, Feature Selection and Clustering. In pre-processing step, the duplicate samples are eliminated from the sample set. Next, principal component analysis is adopted to select the most discriminative features. In clustering step, the network samples are clustered using Robust Spatial Kernel Fuzzy C-Means (RSKFCM algorithm. RSKFCM is a variant of traditional Fuzzy C-Means which considers the neighbourhood membership information and uses kernel distance metric. To evaluate the proposed method, we conducted experiments on standard dataset and compared the results with state-of-the-art methods. We used cluster validity indices, accuracy and false positive rate as performance metrics. Experimental results inferred that, the proposed method achieves better results compared to other methods.

  2. Semantic based cluster content discovery in description first clustering algorithm

    International Nuclear Information System (INIS)

    Khan, M.W.; Asif, H.M.S.

    2017-01-01

    In the field of data analytics grouping of like documents in textual data is a serious problem. A lot of work has been done in this field and many algorithms have purposed. One of them is a category of algorithms which firstly group the documents on the basis of similarity and then assign the meaningful labels to those groups. Description first clustering algorithm belong to the category in which the meaningful description is deduced first and then relevant documents are assigned to that description. LINGO (Label Induction Grouping Algorithm) is the algorithm of description first clustering category which is used for the automatic grouping of documents obtained from search results. It uses LSI (Latent Semantic Indexing); an IR (Information Retrieval) technique for induction of meaningful labels for clusters and VSM (Vector Space Model) for cluster content discovery. In this paper we present the LINGO while it is using LSI during cluster label induction and cluster content discovery phase. Finally, we compare results obtained from the said algorithm while it uses VSM and Latent semantic analysis during cluster content discovery phase. (author)

  3. Semantic Based Cluster Content Discovery in Description First Clustering Algorithm

    Directory of Open Access Journals (Sweden)

    MUHAMMAD WASEEM KHAN

    2017-01-01

    Full Text Available In the field of data analytics grouping of like documents in textual data is a serious problem. A lot of work has been done in this field and many algorithms have purposed. One of them is a category of algorithms which firstly group the documents on the basis of similarity and then assign the meaningful labels to those groups. Description first clustering algorithm belong to the category in which the meaningful description is deduced first and then relevant documents are assigned to that description. LINGO (Label Induction Grouping Algorithm is the algorithm of description first clustering category which is used for the automatic grouping of documents obtained from search results. It uses LSI (Latent Semantic Indexing; an IR (Information Retrieval technique for induction of meaningful labels for clusters and VSM (Vector Space Model for cluster content discovery. In this paper we present the LINGO while it is using LSI during cluster label induction and cluster content discovery phase. Finally, we compare results obtained from the said algorithm while it uses VSM and Latent semantic analysis during cluster content discovery phase.

  4. A Multidimensional and Multimembership Clustering Method for Social Networks and Its Application in Customer Relationship Management

    Directory of Open Access Journals (Sweden)

    Peixin Zhao

    2013-01-01

    Full Text Available Community detection in social networks plays an important role in cluster analysis. Many traditional techniques for one-dimensional problems have been proven inadequate for high-dimensional or mixed type datasets due to the data sparseness and attribute redundancy. In this paper we propose a graph-based clustering method for multidimensional datasets. This novel method has two distinguished features: nonbinary hierarchical tree and the multi-membership clusters. The nonbinary hierarchical tree clearly highlights meaningful clusters, while the multimembership feature may provide more useful service strategies. Experimental results on the customer relationship management confirm the effectiveness of the new method.

  5. Simulation-based marginal likelihood for cluster strong lensing cosmology

    Science.gov (United States)

    Killedar, M.; Borgani, S.; Fabjan, D.; Dolag, K.; Granato, G.; Meneghetti, M.; Planelles, S.; Ragone-Figueroa, C.

    2018-01-01

    Comparisons between observed and predicted strong lensing properties of galaxy clusters have been routinely used to claim either tension or consistency with Λ cold dark matter cosmology. However, standard approaches to such cosmological tests are unable to quantify the preference for one cosmology over another. We advocate approximating the relevant Bayes factor using a marginal likelihood that is based on the following summary statistic: the posterior probability distribution function for the parameters of the scaling relation between Einstein radii and cluster mass, α and β. We demonstrate, for the first time, a method of estimating the marginal likelihood using the X-ray selected z > 0.5 Massive Cluster Survey clusters as a case in point and employing both N-body and hydrodynamic simulations of clusters. We investigate the uncertainty in this estimate and consequential ability to compare competing cosmologies, which arises from incomplete descriptions of baryonic processes, discrepancies in cluster selection criteria, redshift distribution and dynamical state. The relation between triaxial cluster masses at various overdensities provides a promising alternative to the strong lensing test.

  6. Clustering based hybrid approach for facility location problem

    Directory of Open Access Journals (Sweden)

    Ashish Sharma

    2017-12-01

    Full Text Available The main objective of facility location problem is the utilization of the facility by maximum number of possible customers so that the profit is maximized. For instance, in some services like wireless sensor networks, Wi-Fi, repeaters, etc., where the service area is limited, some specific equipment is installed in such a way that it could be used by maximum number of users. Here, the number of users for a particular facility is optimized with the help of clustering technique. The study develops a model for facility allocation problem. For the solution algorithm, a hybrid approach which is based on clustering and mixed integer linear programming (MILP is proposed. The proposed method consists of two parts where in the first part, the K-means clustering technique is used and in the second part, for each cluster an MILP technique is implemented so that the facility which yields the maximum profit is obtained. Numerical examples for clustering and without clustering are presented. Analysis shows that due to clustering the average distance between facility and customer is significantly reduced.

  7. Binomial outcomes in dataset with some clusters of size two: can the dependence of twins be accounted for? A simulation study comparing the reliability of statistical methods based on a dataset of preterm infants

    Directory of Open Access Journals (Sweden)

    Odile Sauzet

    2017-07-01

    Full Text Available Abstract Background The analysis of perinatal outcomes often involves datasets with some multiple births. These are datasets mostly formed of independent observations and a limited number of clusters of size two (twins and maybe of size three or more. This non-independence needs to be accounted for in the statistical analysis. Using simulated data based on a dataset of preterm infants we have previously investigated the performance of several approaches to the analysis of continuous outcomes in the presence of some clusters of size two. Mixed models have been developed for binomial outcomes but very little is known about their reliability when only a limited number of small clusters are present. Methods Using simulated data based on a dataset of preterm infants we investigated the performance of several approaches to the analysis of binomial outcomes in the presence of some clusters of size two. Logistic models, several methods of estimation for the logistic random intercept models and generalised estimating equations were compared. Results The presence of even a small percentage of twins means that a logistic regression model will underestimate all parameters but a logistic random intercept model fails to estimate the correlation between siblings if the percentage of twins is too small and will provide similar estimates to logistic regression. The method which seems to provide the best balance between estimation of the standard error and the parameter for any percentage of twins is the generalised estimating equations. Conclusions This study has shown that the number of covariates or the level two variance do not necessarily affect the performance of the various methods used to analyse datasets containing twins but when the percentage of small clusters is too small, mixed models cannot capture the dependence between siblings.

  8. Automated clustering-based workload characterization

    Science.gov (United States)

    Pentakalos, Odysseas I.; Menasce, Daniel A.; Yesha, Yelena

    1996-01-01

    The demands placed on the mass storage systems at various federal agencies and national laboratories are continuously increasing in intensity. This forces system managers to constantly monitor the system, evaluate the demand placed on it, and tune it appropriately using either heuristics based on experience or analytic models. Performance models require an accurate workload characterization. This can be a laborious and time consuming process. It became evident from our experience that a tool is necessary to automate the workload characterization process. This paper presents the design and discusses the implementation of a tool for workload characterization of mass storage systems. The main features of the tool discussed here are: (1)Automatic support for peak-period determination. Histograms of system activity are generated and presented to the user for peak-period determination; (2) Automatic clustering analysis. The data collected from the mass storage system logs is clustered using clustering algorithms and tightness measures to limit the number of generated clusters; (3) Reporting of varied file statistics. The tool computes several statistics on file sizes such as average, standard deviation, minimum, maximum, frequency, as well as average transfer time. These statistics are given on a per cluster basis; (4) Portability. The tool can easily be used to characterize the workload in mass storage systems of different vendors. The user needs to specify through a simple log description language how the a specific log should be interpreted. The rest of this paper is organized as follows. Section two presents basic concepts in workload characterization as they apply to mass storage systems. Section three describes clustering algorithms and tightness measures. The following section presents the architecture of the tool. Section five presents some results of workload characterization using the tool.Finally, section six presents some concluding remarks.

  9. A Map Spectrum-Based Spatiotemporal Clustering Method for GDP Variation Pattern Analysis Using Nighttime Light Images of the Wuhan Urban Agglomeration

    Directory of Open Access Journals (Sweden)

    Penglin Zhang

    2017-05-01

    Full Text Available Estimates of gross domestic product (GDP play a significant role in evaluating the economic performance of a country or region. Understanding the spatiotemporal process of GDP growth is important for estimating or monitoring the economic state of a region. Various GDP studies have been reported, and several studies have focused on spatiotemporal GDP variations. This study presents a map spectrum-based clustering approach to analyze the spatiotemporal variation patterns of GDP growth. First, a sequence of nighttime light images (from the Defense Meteorological Satellite Program-Operational Linescan System (DMSP-OLS is used to support the spatial distribution of statistical GDP data. Subsequently, the time spectrum of each spatial unit is generated using a time series of dasymetric GDP maps, and then the spatial units with similar time spectra are clustered into one class. Each category has a similar spatiotemporal GDP variation pattern. Finally, the proposed approach is applied to analyze the spatiotemporal patterns of GDP growth in the Wuhan urban agglomeration. The experimental results illustrated regional discrepancies of GDP growth existed in the study area.

  10. A new method to unveil embedded stellar clusters

    Science.gov (United States)

    Lombardi, Marco; Lada, Charles J.; Alves, João

    2017-11-01

    In this paper we present a novel method to identify and characterize stellar clusters deeply embedded in a dark molecular cloud. The method is based on measuring stellar surface density in wide-field infrared images using star counting techniques. It takes advantage of the differing H-band luminosity functions (HLFs) of field stars and young stellar populations and is able to statistically associate each star in an image as a member of either the background stellar population or a young stellar population projected on or near the cloud. Moreover, the technique corrects for the effects of differential extinction toward each individual star. We have tested this method against simulations as well as observations. In particular, we have applied the method to 2MASS point sources observed in the Orion A and B complexes, and the results obtained compare very well with those obtained from deep Spitzer and Chandra observations where presence of infrared excess or X-ray emission directly determines membership status for every star. Additionally, our method also identifies unobscured clusters and a low resolution version of the Orion stellar surface density map shows clearly the relatively unobscured and diffuse OB 1a and 1b sub-groups and provides useful insights on their spatial distribution.

  11. Core Business Selection Based on Ant Colony Clustering Algorithm

    Directory of Open Access Journals (Sweden)

    Yu Lan

    2014-01-01

    Full Text Available Core business is the most important business to the enterprise in diversified business. In this paper, we first introduce the definition and characteristics of the core business and then descript the ant colony clustering algorithm. In order to test the effectiveness of the proposed method, Tianjin Port Logistics Development Co., Ltd. is selected as the research object. Based on the current situation of the development of the company, the core business of the company can be acquired by ant colony clustering algorithm. Thus, the results indicate that the proposed method is an effective way to determine the core business for company.

  12. Application of clustering methods: Regularized Markov clustering (R-MCL) for analyzing dengue virus similarity

    Science.gov (United States)

    Lestari, D.; Raharjo, D.; Bustamam, A.; Abdillah, B.; Widhianto, W.

    2017-07-01

    Dengue virus consists of 10 different constituent proteins and are classified into 4 major serotypes (DEN 1 - DEN 4). This study was designed to perform clustering against 30 protein sequences of dengue virus taken from Virus Pathogen Database and Analysis Resource (VIPR) using Regularized Markov Clustering (R-MCL) algorithm and then we analyze the result. By using Python program 3.4, R-MCL algorithm produces 8 clusters with more than one centroid in several clusters. The number of centroid shows the density level of interaction. Protein interactions that are connected in a tissue, form a complex protein that serves as a specific biological process unit. The analysis of result shows the R-MCL clustering produces clusters of dengue virus family based on the similarity role of their constituent protein, regardless of serotypes.

  13. Clustering by Partitioning around Medoids using Distance-Based ...

    African Journals Online (AJOL)

    OLUWASOGO

    object belonging to exactly one cluster. Two common heuristics used to determine cluster membership in partitioning algorithms are a centroid-based technique –. K-means, where each cluster centre is represented by the mean value of the objects in the cluster, and a representative object-based technique – K-Medoids, ...

  14. Binomial outcomes in dataset with some clusters of size two: can the dependence of twins be accounted for? A simulation study comparing the reliability of statistical methods based on a dataset of preterm infants.

    Science.gov (United States)

    Sauzet, Odile; Peacock, Janet L

    2017-07-20

    The analysis of perinatal outcomes often involves datasets with some multiple births. These are datasets mostly formed of independent observations and a limited number of clusters of size two (twins) and maybe of size three or more. This non-independence needs to be accounted for in the statistical analysis. Using simulated data based on a dataset of preterm infants we have previously investigated the performance of several approaches to the analysis of continuous outcomes in the presence of some clusters of size two. Mixed models have been developed for binomial outcomes but very little is known about their reliability when only a limited number of small clusters are present. Using simulated data based on a dataset of preterm infants we investigated the performance of several approaches to the analysis of binomial outcomes in the presence of some clusters of size two. Logistic models, several methods of estimation for the logistic random intercept models and generalised estimating equations were compared. The presence of even a small percentage of twins means that a logistic regression model will underestimate all parameters but a logistic random intercept model fails to estimate the correlation between siblings if the percentage of twins is too small and will provide similar estimates to logistic regression. The method which seems to provide the best balance between estimation of the standard error and the parameter for any percentage of twins is the generalised estimating equations. This study has shown that the number of covariates or the level two variance do not necessarily affect the performance of the various methods used to analyse datasets containing twins but when the percentage of small clusters is too small, mixed models cannot capture the dependence between siblings.

  15. Prioritizing the risk of plant pests by clustering methods; self-organising maps, k-means and hierarchical clustering

    Directory of Open Access Journals (Sweden)

    Susan Worner

    2013-09-01

    Full Text Available For greater preparedness, pest risk assessors are required to prioritise long lists of pest species with potential to establish and cause significant impact in an endangered area. Such prioritization is often qualitative, subjective, and sometimes biased, relying mostly on expert and stakeholder consultation. In recent years, cluster based analyses have been used to investigate regional pest species assemblages or pest profiles to indicate the risk of new organism establishment. Such an approach is based on the premise that the co-occurrence of well-known global invasive pest species in a region is not random, and that the pest species profile or assemblage integrates complex functional relationships that are difficult to tease apart. In other words, the assemblage can help identify and prioritise species that pose a threat in a target region. A computational intelligence method called a Kohonen self-organizing map (SOM, a type of artificial neural network, was the first clustering method applied to analyse assemblages of invasive pests. The SOM is a well known dimension reduction and visualization method especially useful for high dimensional data that more conventional clustering methods may not analyse suitably. Like all clustering algorithms, the SOM can give details of clusters that identify regions with similar pest assemblages, possible donor and recipient regions. More important, however SOM connection weights that result from the analysis can be used to rank the strength of association of each species within each regional assemblage. Species with high weights that are not already established in the target region are identified as high risk. However, the SOM analysis is only the first step in a process to assess risk to be used alongside or incorporated within other measures. Here we illustrate the application of SOM analyses in a range of contexts in invasive species risk assessment, and discuss other clustering methods such as k

  16. The smart cluster method. Adaptive earthquake cluster identification and analysis in strong seismic regions

    Science.gov (United States)

    Schaefer, Andreas M.; Daniell, James E.; Wenzel, Friedemann

    2017-07-01

    Earthquake clustering is an essential part of almost any statistical analysis of spatial and temporal properties of seismic activity. The nature of earthquake clusters and subsequent declustering of earthquake catalogues plays a crucial role in determining the magnitude-dependent earthquake return period and its respective spatial variation for probabilistic seismic hazard assessment. This study introduces the Smart Cluster Method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal cluster identification. It utilises the magnitude-dependent spatio-temporal earthquake density to adjust the search properties, subsequently analyses the identified clusters to determine directional variation and adjusts its search space with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010-2011 Darfield-Christchurch sequence, a reclassification procedure is applied to disassemble subsequent ruptures using near-field searches, nearest neighbour classification and temporal splitting. The method is capable of identifying and classifying earthquake clusters in space and time. It has been tested and validated using earthquake data from California and New Zealand. A total of more than 1500 clusters have been found in both regions since 1980 with M m i n = 2.0. Utilising the knowledge of cluster classification, the method has been adjusted to provide an earthquake declustering algorithm, which has been compared to existing methods. Its performance is comparable to established methodologies. The analysis of earthquake clustering statistics lead to various new and updated correlation functions, e.g. for ratios between mainshock and strongest aftershock and general aftershock activity metrics.

  17. Nuclear Structure Calculations with Coupled Cluster Methods from Quantum Chemistry

    CERN Document Server

    Dean, D.J.; Hagen, G.; Hjorth-Jensen, M.; Kowalski, K.; Papenbrock, T.; Piecuch, P.; Wloch, M.

    2005-01-01

    We present several coupled-cluster calculations of ground and excited states of 4He and 16O employing methods from quantum chemistry. A comparison of coupled cluster results with the results of exact diagonalization of the hamiltonian in the same model space and other truncated shell-model calculations shows that the quantum chemistry inspired coupled cluster approximations provide an excellent description of ground and excited states of nuclei, with much less computational effort than traditional large-scale shell-model approaches. Unless truncations are made, for nuclei like 16O, full-fledged shell-model calculations with four or more major shells are not possible. However, these and even larger systems can be studied with the coupled cluster methods due to the polynomial rather than factorial scaling inherent in standard shell-model studies. This makes the coupled cluster approaches, developed in quantum chemistry, viable methods for describing weakly bound systems of interest for future nuclear facilities...

  18. Information Clustering Based on Fuzzy Multisets.

    Science.gov (United States)

    Miyamoto, Sadaaki

    2003-01-01

    Proposes a fuzzy multiset model for information clustering with application to information retrieval on the World Wide Web. Highlights include search engines; term clustering; document clustering; algorithms for calculating cluster centers; theoretical properties concerning clustering algorithms; and examples to show how the algorithms work.…

  19. Multipole moments using extended coupled cluster method

    Science.gov (United States)

    Joshi, Sayali P.; Vaval, Nayana

    2013-05-01

    Using analytic extended coupled cluster (ECC) response approach quadrupole moments, dipole-quadrupole polarizabilities and dipole polarizabilities are studied. In the current implementation of the functional we have included all the double linked terms within (CCSD) approximation. These terms will be important for the accurate description of properties at the stretched geometries. We report the properties for carbon monoxide and hydrogen fluoride molecules, as a function of bond distance and compare our results for carbon monoxide with the full CI results. We have also reported the properties of methane, tetrafluoromethane, acetylene, difluoroacetylene, water and ammonia.

  20. A dynamic lattice searching method with rotation operation for optimization of large clusters

    International Nuclear Information System (INIS)

    Wu Xia; Cai Wensheng; Shao Xueguang

    2009-01-01

    Global optimization of large clusters has been a difficult task, though much effort has been paid and many efficient methods have been proposed. During our works, a rotation operation (RO) is designed to realize the structural transformation from decahedra to icosahedra for the optimization of large clusters, by rotating the atoms below the center atom with a definite degree around the fivefold axis. Based on the RO, a development of the previous dynamic lattice searching with constructed core (DLSc), named as DLSc-RO, is presented. With an investigation of the method for the optimization of Lennard-Jones (LJ) clusters, i.e., LJ 500 , LJ 561 , LJ 600 , LJ 665-667 , LJ 670 , LJ 685 , and LJ 923 , Morse clusters, silver clusters by Gupta potential, and aluminum clusters by NP-B potential, it was found that both the global minima with icosahedral and decahedral motifs can be obtained, and the method is proved to be efficient and universal.

  1. Towards semantically sensitive text clustering: a feature space modeling technology based on dimension extension.

    Science.gov (United States)

    Liu, Yuanchao; Liu, Ming; Wang, Xin

    2015-01-01

    The objective of text clustering is to divide document collections into clusters based on the similarity between documents. In this paper, an extension-based feature modeling approach towards semantically sensitive text clustering is proposed along with the corresponding feature space construction and similarity computation method. By combining the similarity in traditional feature space and that in extension space, the adverse effects of the complexity and diversity of natural language can be addressed and clustering semantic sensitivity can be improved correspondingly. The generated clusters can be organized using different granularities. The experimental evaluations on well-known clustering algorithms and datasets have verified the effectiveness of our approach.

  2. Towards semantically sensitive text clustering: a feature space modeling technology based on dimension extension.

    Directory of Open Access Journals (Sweden)

    Yuanchao Liu

    Full Text Available The objective of text clustering is to divide document collections into clusters based on the similarity between documents. In this paper, an extension-based feature modeling approach towards semantically sensitive text clustering is proposed along with the corresponding feature space construction and similarity computation method. By combining the similarity in traditional feature space and that in extension space, the adverse effects of the complexity and diversity of natural language can be addressed and clustering semantic sensitivity can be improved correspondingly. The generated clusters can be organized using different granularities. The experimental evaluations on well-known clustering algorithms and datasets have verified the effectiveness of our approach.

  3. Statistical methods in cancer research: Investigating localized clusters of disease

    International Nuclear Information System (INIS)

    Boyle, P.

    1992-01-01

    The search for 'clusters' of cancer cases has been a major feature of epidemiological research, and has come to the attention of the public following the publicity given to some childhood leukemia cases in villages surrounding the Sellafield reprocessing plant in England. Clustering is a poorly defined concept. Modern methodological developments have involved two distinct methods, quadrat counts and distance methods. Although statistical methods can indicate (with some error) whether clustering is present, they provide little information as to the real cause. Putative causes will always require further, more specific, studies before the results become meaningful. 5 refs

  4. Clustering Methods Application for Customer Segmentation to Manage Advertisement Campaign

    Directory of Open Access Journals (Sweden)

    Maciej Kutera

    2010-10-01

    Full Text Available Clustering methods are recently so advanced elaborated algorithms for large collection data analysis that they have been already included today to data mining methods. Clustering methods are nowadays larger and larger group of methods, very quickly evolving and having more and more various applications. In the article, our research concerning usefulness of clustering methods in customer segmentation to manage advertisement campaign is presented. We introduce results obtained by using four selected methods which have been chosen because their peculiarities suggested their applicability to our purposes. One of the analyzed method – k-means clustering with random selected initial cluster seeds gave very good results in customer segmentation to manage advertisement campaign and these results were presented in details in the article. In contrast one of the methods (hierarchical average linkage was found useless in customer segmentation. Further investigations concerning benefits of clustering methods in customer segmentation to manage advertisement campaign is worth continuing, particularly that finding solutions in this field can give measurable profits for marketing activity.

  5. Determining wood chip size: image analysis and clustering methods

    Directory of Open Access Journals (Sweden)

    Paolo Febbi

    2013-09-01

    Full Text Available One of the standard methods for the determination of the size distribution of wood chips is the oscillating screen method (EN 15149- 1:2010. Recent literature demonstrated how image analysis could return highly accurate measure of the dimensions defined for each individual particle, and could promote a new method depending on the geometrical shape to determine the chip size in a more accurate way. A sample of wood chips (8 litres was sieved through horizontally oscillating sieves, using five different screen hole diameters (3.15, 8, 16, 45, 63 mm; the wood chips were sorted in decreasing size classes and the mass of all fractions was used to determine the size distribution of the particles. Since the chip shape and size influence the sieving results, Wang’s theory, which concerns the geometric forms, was considered. A cluster analysis on the shape descriptors (Fourier descriptors and size descriptors (area, perimeter, Feret diameters, eccentricity was applied to observe the chips distribution. The UPGMA algorithm was applied on Euclidean distance. The obtained dendrogram shows a group separation according with the original three sieving fractions. A comparison has been made between the traditional sieve and clustering results. This preliminary result shows how the image analysis-based method has a high potential for the characterization of wood chip size distribution and could be further investigated. Moreover, this method could be implemented in an online detection machine for chips size characterization. An improvement of the results is expected by using supervised multivariate methods that utilize known class memberships. The main objective of the future activities will be to shift the analysis from a 2-dimensional method to a 3- dimensional acquisition process.

  6. Efficient nonparametric and asymptotic Bayesian model selection methods for attributed graph clustering

    KAUST Repository

    Xu, Zhiqiang

    2017-02-16

    Attributed graph clustering, also known as community detection on attributed graphs, attracts much interests recently due to the ubiquity of attributed graphs in real life. Many existing algorithms have been proposed for this problem, which are either distance based or model based. However, model selection in attributed graph clustering has not been well addressed, that is, most existing algorithms assume the cluster number to be known a priori. In this paper, we propose two efficient approaches for attributed graph clustering with automatic model selection. The first approach is a popular Bayesian nonparametric method, while the second approach is an asymptotic method based on a recently proposed model selection criterion, factorized information criterion. Experimental results on both synthetic and real datasets demonstrate that our approaches for attributed graph clustering with automatic model selection significantly outperform the state-of-the-art algorithm.

  7. MHCcluster, a method for functional clustering of MHC molecules

    DEFF Research Database (Denmark)

    Thomsen, Martin Christen Frølund; Lundegaard, Claus; Buus, Søren

    2013-01-01

    The identification of peptides binding to major histocompatibility complexes (MHC) is a critical step in the understanding of T cell immune responses. The human MHC genomic region (HLA) is extremely polymorphic comprising several thousand alleles, many encoding a distinct molecule. The potentially...... unique specificities remain experimentally uncharacterized for the vast majority of HLA molecules. Likewise, for nonhuman species, only a minor fraction of the known MHC molecules have been characterized. Here, we describe a tool, MHCcluster, to functionally cluster MHC molecules based on their predicted...... binding specificity. The method has a flexible web interface that allows the user to include any MHC of interest in the analysis. The output consists of a static heat map and graphical tree-based visualizations of the functional relationship between MHC variants and a dynamic TreeViewer interface where...

  8. A mixed methods protocol for developing and testing implementation strategies for evidence-based obesity prevention in childcare: a cluster randomized hybrid type III trial.

    Science.gov (United States)

    Swindle, Taren; Johnson, Susan L; Whiteside-Mansell, Leanne; Curran, Geoffrey M

    2017-07-18

    Despite the potential to reach at-risk children in childcare, there is a significant gap between current practices and evidence-based obesity prevention in this setting. There are few investigations of the impact of implementation strategies on the uptake of evidence-based practices (EBPs) for obesity prevention and nutrition promotion. This study protocol describes a three-phase approach to developing and testing implementation strategies to support uptake of EBPs for obesity prevention practices in childcare (i.e., key components of the WISE intervention). Informed by the i-PARIHS framework, we will use a stakeholder-driven evidence-based quality improvement (EBQI) process to apply information gathered in qualitative interviews on barriers and facilitators to practice to inform the design of implementation strategies. Then, a Hybrid Type III cluster randomized trial will compare a basic implementation strategy (i.e., intervention as usual) with an enhanced implementation strategy informed by stakeholders. All Head Start centers (N = 12) within one agency in an urban area in a southern state in the USA will be randomized to receive the basic or enhanced implementation with approximately 20 classrooms per group (40 educators, 400 children per group). The educators involved in the study, the data collectors, and the biostastician will be blinded to the study condition. The basic and enhanced implementation strategies will be compared on outcomes specified by the RE-AIM model (e.g., Reach to families, Effectiveness of impact on child diet and health indicators, Adoption commitment of agency, Implementation fidelity and acceptability, and Maintenance after 6 months). Principles of formative evaluation will be used throughout the hybrid trial. This study will test a stakeholder-driven approach to improve implementation, fidelity, and maintenance of EBPs for obesity prevention in childcare. Further, this study provides an example of a systematic process to develop

  9. PARTIAL TRAINING METHOD FOR HEURISTIC ALGORITHM OF POSSIBLE CLUSTERIZATION UNDER UNKNOWN NUMBER OF CLASSES

    Directory of Open Access Journals (Sweden)

    D. A. Viattchenin

    2009-01-01

    Full Text Available A method for constructing a subset of labeled objects which is used in a heuristic algorithm of possible  clusterization with partial  training is proposed in the  paper.  The  method  is  based  on  data preprocessing by the heuristic algorithm of possible clusterization using a transitive closure of a fuzzy tolerance. Method efficiency is demonstrated by way of an illustrative example.

  10. Method for exploratory cluster analysis and visualisation of single-trial ERP ensembles.

    Science.gov (United States)

    Williams, N J; Nasuto, S J; Saddy, J D

    2015-07-30

    The validity of ensemble averaging on event-related potential (ERP) data has been questioned, due to its assumption that the ERP is identical across trials. Thus, there is a need for preliminary testing for cluster structure in the data. We propose a complete pipeline for the cluster analysis of ERP data. To increase the signal-to-noise (SNR) ratio of the raw single-trials, we used a denoising method based on Empirical Mode Decomposition (EMD). Next, we used a bootstrap-based method to determine the number of clusters, through a measure called the Stability Index (SI). We then used a clustering algorithm based on a Genetic Algorithm (GA) to define initial cluster centroids for subsequent k-means clustering. Finally, we visualised the clustering results through a scheme based on Principal Component Analysis (PCA). After validating the pipeline on simulated data, we tested it on data from two experiments - a P300 speller paradigm on a single subject and a language processing study on 25 subjects. Results revealed evidence for the existence of 6 clusters in one experimental condition from the language processing study. Further, a two-way chi-square test revealed an influence of subject on cluster membership. Our analysis operates on denoised single-trials, the number of clusters are determined in a principled manner and the results are presented through an intuitive visualisation. Given the cluster structure in some experimental conditions, we suggest application of cluster analysis as a preliminary step before ensemble averaging. Copyright © 2015 Elsevier B.V. All rights reserved.

  11. Genetic algorithm based two-mode clustering of metabolomics data

    NARCIS (Netherlands)

    Hageman, J.A.; van den Berg, R.A.; Westerhuis, J.A.; van der Werf, M.J.; Smilde, A.K.

    2008-01-01

    Metabolomics and other omics tools are generally characterized by large data sets with many variables obtained under different environmental conditions. Clustering methods and more specifically two-mode clustering methods are excellent tools for analyzing this type of data. Two-mode clustering

  12. A new method to prepare colloids of size-controlled clusters from a matrix assembly cluster source

    Science.gov (United States)

    Cai, Rongsheng; Jian, Nan; Murphy, Shane; Bauer, Karl; Palmer, Richard E.

    2017-05-01

    A new method for the production of colloidal suspensions of physically deposited clusters is demonstrated. A cluster source has been used to deposit size-controlled clusters onto water-soluble polymer films, which are then dissolved to produce colloidal suspensions of clusters encapsulated with polymer molecules. This process has been demonstrated using different cluster materials (Au and Ag) and polymers (polyvinylpyrrolidone, polyvinyl alcohol, and polyethylene glycol). Scanning transmission electron microscopy of the clusters before and after colloidal dispersion confirms that the polymers act as stabilizing agents. We propose that this method is suitable for the production of biocompatible colloids of ultraprecise clusters.

  13. Coherence-based Time Series Clustering for Brain Connectivity Visualization

    KAUST Repository

    Euan, Carolina

    2017-11-19

    We develop the hierarchical cluster coherence (HCC) method for brain signals, a procedure for characterizing connectivity in a network by clustering nodes or groups of channels that display high level of coordination as measured by

  14. Single pass kernel k-means clustering method

    Indian Academy of Sciences (India)

    easily implemented and is suitable for large data sets, like those in data mining appli- cations. Experimental results show that, with a small loss of quality, the proposed method can significantly reduce the time taken than the conventional kernel k-means cluster- ing method. The proposed method is also compared with other ...

  15. Evidence based case selection: An innovative knowledge management method to cluster public technical and vocational education and training colleges in South Africa

    CSIR Research Space (South Africa)

    Visser, MM

    2017-03-01

    Full Text Available and vocational education and training (TVET) colleges in South Africa into groups with a similar level of maturity in terms of their information systems. Objectives: The aim of the study was to propose an evidence-based quantitative method for the selection...

  16. Change Detection by Fusing Advantages of Threshold and Clustering Methods

    Science.gov (United States)

    Tan, M.; Hao, M.

    2017-09-01

    In change detection (CD) of medium-resolution remote sensing images, the threshold and clustering methods are two kinds of the most popular ones. It is found that the threshold method of the expectation maximum (EM) algorithm usually generates a CD map including many false alarms but almost detecting all changes, and the fuzzy local information c-means algorithm (FLICM) obtains a homogeneous CD map but with some missed detections. Therefore, we aim to design a framework to improve CD results by fusing the advantages of threshold and clustering methods. Experimental results indicate the effectiveness of the proposed method.

  17. CHANGE DETECTION BY FUSING ADVANTAGES OF THRESHOLD AND CLUSTERING METHODS

    Directory of Open Access Journals (Sweden)

    M. Tan

    2017-09-01

    Full Text Available In change detection (CD of medium-resolution remote sensing images, the threshold and clustering methods are two kinds of the most popular ones. It is found that the threshold method of the expectation maximum (EM algorithm usually generates a CD map including many false alarms but almost detecting all changes, and the fuzzy local information c-means algorithm (FLICM obtains a homogeneous CD map but with some missed detections. Therefore, we aim to design a framework to improve CD results by fusing the advantages of threshold and clustering methods. Experimental results indicate the effectiveness of the proposed method.

  18. Sensitivity evaluation of dynamic speckle activity measurements using clustering methods

    International Nuclear Information System (INIS)

    Etchepareborda, Pablo; Federico, Alejandro; Kaufmann, Guillermo H.

    2010-01-01

    We evaluate and compare the use of competitive neural networks, self-organizing maps, the expectation-maximization algorithm, K-means, and fuzzy C-means techniques as partitional clustering methods, when the sensitivity of the activity measurement of dynamic speckle images needs to be improved. The temporal history of the acquired intensity generated by each pixel is analyzed in a wavelet decomposition framework, and it is shown that the mean energy of its corresponding wavelet coefficients provides a suited feature space for clustering purposes. The sensitivity obtained by using the evaluated clustering techniques is also compared with the well-known methods of Konishi-Fujii, weighted generalized differences, and wavelet entropy. The performance of the partitional clustering approach is evaluated using simulated dynamic speckle patterns and also experimental data.

  19. Consensus of satellite cluster flight using an energy-matching optimal control method

    Science.gov (United States)

    Luo, Jianjun; Zhou, Liang; Zhang, Bo

    2017-11-01

    This paper presents an optimal control method for consensus of satellite cluster flight under a kind of energy matching condition. Firstly, the relation between energy matching and satellite periodically bounded relative motion is analyzed, and the satellite energy matching principle is applied to configure the initial conditions. Then, period-delayed errors are adopted as state variables to establish the period-delayed errors dynamics models of a single satellite and the cluster. Next a novel satellite cluster feedback control protocol with coupling gain is designed, so that the satellite cluster periodically bounded relative motion consensus problem (period-delayed errors state consensus problem) is transformed to the stability of a set of matrices with the same low dimension. Based on the consensus region theory in the research of multi-agent system consensus issues, the coupling gain can be obtained to satisfy the requirement of consensus region and decouple the satellite cluster information topology and the feedback control gain matrix, which can be determined by Linear quadratic regulator (LQR) optimal method. This method can realize the consensus of satellite cluster period-delayed errors, leading to the consistency of semi-major axes (SMA) and the energy-matching of satellite cluster. Then satellites can emerge the global coordinative cluster behavior. Finally the feasibility and effectiveness of the present energy-matching optimal consensus for satellite cluster flight is verified through numerical simulations.

  20. Microarray gene cluster identification and annotation through cluster ensemble and EM-based informative textual summarization.

    Science.gov (United States)

    Hu, Xiaohua; Park, E K; Zhang, Xiaodan

    2009-09-01

    Generating high-quality gene clusters and identifying the underlying biological mechanism of the gene clusters are the important goals of clustering gene expression analysis. To get high-quality cluster results, most of the current approaches rely on choosing the best cluster algorithm, in which the design biases and assumptions meet the underlying distribution of the dataset. There are two issues for this approach: 1) usually, the underlying data distribution of the gene expression datasets is unknown and 2) there are so many clustering algorithms available and it is very challenging to choose the proper one. To provide a textual summary of the gene clusters, the most explored approach is the extractive approach that essentially builds upon techniques borrowed from the information retrieval, in which the objective is to provide terms to be used for query expansion, and not to act as a stand-alone summary for the entire document sets. Another drawback is that the clustering quality and cluster interpretation are treated as two isolated research problems and are studied separately. In this paper, we design and develop a unified system Gene Expression Miner to address these challenging issues in a principled and general manner by integrating cluster ensemble, text clustering, and multidocument summarization and provide an environment for comprehensive gene expression data analysis. We present a novel cluster ensemble approach to generate high-quality gene cluster. In our text summarization module, given a gene cluster, our expectation-maximization based algorithm can automatically identify subtopics and extract most probable terms for each topic. Then, the extracted top k topical terms from each subtopic are combined to form the biological explanation of each gene cluster. Experimental results demonstrate that our system can obtain high-quality clusters and provide informative key terms for the gene clusters.

  1. Method and apparatus for the production of cluster ions

    Science.gov (United States)

    Friedman, Lewis; Beuhler, Robert J.

    1988-01-01

    A method and apparatus for the production of cluster ions, and preferably isotopic hydrogen cluster ions is disclosed. A gas, preferably comprising a carrier gas and a substrate gas, is cooled to about its boiling point and expanded through a supersonic nozzle into a region maintained at a low pressure. Means are provided for the generation of a plasma in the gas before or just as it enters the nozzle.

  2. Semi-supervised dimensionality reduction using orthogonal projection divergence-based clustering for hyperspectral imagery

    Science.gov (United States)

    Su, Hongjun; Du, Peijun; Du, Qian

    2012-11-01

    Band clustering and selection are applied to dimensionality reduction of hyperspectral imagery. The proposed method is based on a hierarchical clustering structure, which aims to group bands using an information or similarity measure. Specifically, the distance based on orthogonal projection divergence is used as a criterion for clustering. After clustering, a band selection step is applied to select representative band to be used in the following data analysis. Different from unsupervised clustering using all the pixels or supervised clustering requiring labeled pixels, the proposed semi-supervised band clustering and selection needs class spectral signatures only. The experimental results show that the proposed algorithm can significantly outperform other existing methods with regard to pixel-based classification task.

  3. Centroid based clustering of high throughput sequencing reads based on n-mer counts.

    Science.gov (United States)

    Solovyov, Alexander; Lipkin, W Ian

    2013-09-08

    Many problems in computational biology require alignment-free sequence comparisons. One of the common tasks involving sequence comparison is sequence clustering. Here we apply methods of alignment-free comparison (in particular, comparison using sequence composition) to the challenge of sequence clustering. We study several centroid based algorithms for clustering sequences based on word counts. Study of their performance shows that using k-means algorithm with or without the data whitening is efficient from the computational point of view. A higher clustering accuracy can be achieved using the soft expectation maximization method, whereby each sequence is attributed to each cluster with a specific probability. We implement an open source tool for alignment-free clustering. It is publicly available from github: https://github.com/luscinius/afcluster. We show the utility of alignment-free sequence clustering for high throughput sequencing analysis despite its limitations. In particular, it allows one to perform assembly with reduced resources and a minimal loss of quality. The major factor affecting performance of alignment-free read clustering is the length of the read.

  4. Model-based clustering with certainty estimation: implication for clade assignment of influenza viruses.

    Science.gov (United States)

    Zhang, Shunpu; Li, Zhong; Beland, Kevin; Lu, Guoqing

    2016-07-21

    Clustering is a common technique used by molecular biologists to group homologous sequences and study evolution. There remain issues such as how to cluster molecular sequences accurately and in particular how to evaluate the certainty of clustering results. We presented a model-based clustering method to analyze molecular sequences, described a subset bootstrap scheme to evaluate a certainty of the clusters, and showed an intuitive way using 3D visualization to examine clusters. We applied the above approach to analyze influenza viral hemagglutinin (HA) sequences. Nine clusters were estimated for high pathogenic H5N1 avian influenza, which agree with previous findings. The certainty for a given sequence that can be correctly assigned to a cluster was all 1.0 whereas the certainty for a given cluster was also very high (0.92-1.0), with an overall clustering certainty of 0.95. For influenza A H7 viruses, ten HA clusters were estimated and the vast majority of sequences could be assigned to a cluster with a certainty of more than 0.99. The certainties for clusters, however, varied from 0.40 to 0.98; such certainty variation is likely attributed to the heterogeneity of sequence data in different clusters. In both cases, the certainty values estimated using the subset bootstrap method are all higher than those calculated based upon the standard bootstrap method, suggesting our bootstrap scheme is applicable for the estimation of clustering certainty. We formulated a clustering analysis approach with the estimation of certainties and 3D visualization of sequence data. We analysed 2 sets of influenza A HA sequences and the results indicate our approach was applicable for clustering analysis of influenza viral sequences.

  5. Motion estimation using point cluster method and Kalman filter.

    Science.gov (United States)

    Senesh, M; Wolf, A

    2009-05-01

    The most frequently used method in a three dimensional human gait analysis involves placing markers on the skin of the analyzed segment. This introduces a significant artifact, which strongly influences the bone position and orientation and joint kinematic estimates. In this study, we tested and evaluated the effect of adding a Kalman filter procedure to the previously reported point cluster technique (PCT) in the estimation of a rigid body motion. We demonstrated the procedures by motion analysis of a compound planar pendulum from indirect opto-electronic measurements of markers attached to an elastic appendage that is restrained to slide along the rigid body long axis. The elastic frequency is close to the pendulum frequency, as in the biomechanical problem, where the soft tissue frequency content is similar to the actual movement of the bones. Comparison of the real pendulum angle to that obtained by several estimation procedures--PCT, Kalman filter followed by PCT, and low pass filter followed by PCT--enables evaluation of the accuracy of the procedures. When comparing the maximal amplitude, no effect was noted by adding the Kalman filter; however, a closer look at the signal revealed that the estimated angle based only on the PCT method was very noisy with fluctuation, while the estimated angle based on the Kalman filter followed by the PCT was a smooth signal. It was also noted that the instantaneous frequencies obtained from the estimated angle based on the PCT method is more dispersed than those obtained from the estimated angle based on Kalman filter followed by the PCT method. Addition of a Kalman filter to the PCT method in the estimation procedure of rigid body motion results in a smoother signal that better represents the real motion, with less signal distortion than when using a digital low pass filter. Furthermore, it can be concluded that adding a Kalman filter to the PCT procedure substantially reduces the dispersion of the maximal and minimal

  6. Method for detecting clusters of possible uranium deposits

    International Nuclear Information System (INIS)

    Conover, W.J.; Bement, T.R.; Iman, R.L.

    1978-01-01

    When a two-dimensional map contains points that appear to be scattered somewhat at random, a question that often arises is whether groups of points that appear to cluster are merely exhibiting ordinary behavior, which one can expect with any random distribution of points, or whether the clusters are too pronounced to be attributable to chance alone. A method for detecting clusters along a straight line is applied to the two-dimensional map of 214 Bi anomalies observed as part of the National Uranium Resource Evaluation Program in the Lubbock, Texas, region. Some exact probabilities associated with this method are computed and compared with two approximate methods. The two methods for approximating probabilities work well in the cases examined and can be used when it is not feasible to obtain the exact probabilities

  7. A NEW METHOD TO QUANTIFY X-RAY SUBSTRUCTURES IN CLUSTERS OF GALAXIES

    Energy Technology Data Exchange (ETDEWEB)

    Andrade-Santos, Felipe; Lima Neto, Gastao B.; Lagana, Tatiana F. [Departamento de Astronomia, Instituto de Astronomia, Geofisica e Ciencias Atmosfericas, Universidade de Sao Paulo, Geofisica e Ciencias Atmosfericas, Rua do Matao 1226, Cidade Universitaria, 05508-090 Sao Paulo, SP (Brazil)

    2012-02-20

    We present a new method to quantify substructures in clusters of galaxies, based on the analysis of the intensity of structures. This analysis is done in a residual image that is the result of the subtraction of a surface brightness model, obtained by fitting a two-dimensional analytical model ({beta}-model or Sersic profile) with elliptical symmetry, from the X-ray image. Our method is applied to 34 clusters observed by the Chandra Space Telescope that are in the redshift range z in [0.02, 0.2] and have a signal-to-noise ratio (S/N) greater than 100. We present the calibration of the method and the relations between the substructure level with physical quantities, such as the mass, X-ray luminosity, temperature, and cluster redshift. We use our method to separate the clusters in two sub-samples of high- and low-substructure levels. We conclude, using Monte Carlo simulations, that the method recuperates very well the true amount of substructure for small angular core radii clusters (with respect to the whole image size) and good S/N observations. We find no evidence of correlation between the substructure level and physical properties of the clusters such as gas temperature, X-ray luminosity, and redshift; however, analysis suggest a trend between the substructure level and cluster mass. The scaling relations for the two sub-samples (high- and low-substructure level clusters) are different (they present an offset, i.e., given a fixed mass or temperature, low-substructure clusters tend to be more X-ray luminous), which is an important result for cosmological tests using the mass-luminosity relation to obtain the cluster mass function, since they rely on the assumption that clusters do not present different scaling relations according to their dynamical state.

  8. A New Method to Quantify X-Ray Substructures in Clusters of Galaxies

    Science.gov (United States)

    Andrade-Santos, Felipe; Lima Neto, Gastão B.; Laganá, Tatiana F.

    2012-02-01

    We present a new method to quantify substructures in clusters of galaxies, based on the analysis of the intensity of structures. This analysis is done in a residual image that is the result of the subtraction of a surface brightness model, obtained by fitting a two-dimensional analytical model (β-model or Sérsic profile) with elliptical symmetry, from the X-ray image. Our method is applied to 34 clusters observed by the Chandra Space Telescope that are in the redshift range z in [0.02, 0.2] and have a signal-to-noise ratio (S/N) greater than 100. We present the calibration of the method and the relations between the substructure level with physical quantities, such as the mass, X-ray luminosity, temperature, and cluster redshift. We use our method to separate the clusters in two sub-samples of high- and low-substructure levels. We conclude, using Monte Carlo simulations, that the method recuperates very well the true amount of substructure for small angular core radii clusters (with respect to the whole image size) and good S/N observations. We find no evidence of correlation between the substructure level and physical properties of the clusters such as gas temperature, X-ray luminosity, and redshift; however, analysis suggest a trend between the substructure level and cluster mass. The scaling relations for the two sub-samples (high- and low-substructure level clusters) are different (they present an offset, i.e., given a fixed mass or temperature, low-substructure clusters tend to be more X-ray luminous), which is an important result for cosmological tests using the mass-luminosity relation to obtain the cluster mass function, since they rely on the assumption that clusters do not present different scaling relations according to their dynamical state.

  9. Clustering-based analysis for residential district heating data

    DEFF Research Database (Denmark)

    Gianniou, Panagiota; Liu, Xiufeng; Heller, Alfred

    2018-01-01

    residential heating consumption data and evaluate information included in national building databases. The proposed method uses the K-means algorithm to segment consumption groups based on consumption intensity and representative patterns and ranks the groups according to daily consumption. This paper also......The wide use of smart meters enables collection of a large amount of fine-granular time series, which can be used to improve the understanding of consumption behavior and used for consumption optimization. This paper presents a clustering-based knowledge discovery in databases method to analyze...

  10. Analysis of historical and modern hard red spring wheat cultivars based on parentage and HPLC of gluten proteins using Ward's clustering method

    Science.gov (United States)

    There have been substantial breeding efforts in North Dakota to produce wheat cultivars that are well adapted to weather conditions and are disease resistant. In this study, 30 hard red spring (HRS) wheat cultivars released between 1910 and 2013 were analyzed with regard to how they cluster in terms...

  11. Based on Similarity Metric Learning for Semi-Supervised Clustering

    Directory of Open Access Journals (Sweden)

    Wei QIU

    2014-08-01

    Full Text Available Semi-supervised clustering employs a small amount of labeled data to aid unsupervised learning. The focus of this paper is on Metric Learning, with particular interest in incorporating side information to make it semi-supervised. This study is primarily motivated by an application: face-image clustering. In the paper introduces metric learning and semi-supervised clustering, Similarity metric learning method that adapt the underlying similarity metric used by the clustering algorithm. This paper provides new methods for the two approaches as well as presents a new semi-supervised clustering algorithm that integrates both of these techniques in a uniform, principled framework. Experimental results demonstrate that the unified approach produces better clusters than both individual approaches as well as previously proposed semi-supervised clustering algorithms. This paper followed by the discussion of experiments on face-image clustering, as well as future work.

  12. Clustering self-organizing maps (SOM) method for human papillomavirus (HPV) DNA as the main cause of cervical cancer disease

    Science.gov (United States)

    Bustamam, A.; Aldila, D.; Fatimah, Arimbi, M. D.

    2017-07-01

    One of the most widely used clustering method, since it has advantage on its robustness, is Self-Organizing Maps (SOM) method. This paper discusses the application of SOM method on Human Papillomavirus (HPV) DNA which is the main cause of cervical cancer disease, the most dangerous cancer in developing countries. We use 18 types of HPV DNA-based on the newest complete genome. By using open-source-based program R, clustering process can separate 18 types of HPV into two different clusters. There are two types of HPV in the first cluster while 16 others in the second cluster. The analyzing result of 18 types HPV based on the malignancy of the virus (the difficultness to cure). Two of HPV types the first cluster can be classified as tame HPV, while 16 others in the second cluster are classified as vicious HPV.

  13. Fast optimization of binary clusters using a novel dynamic lattice searching method

    International Nuclear Information System (INIS)

    Wu, Xia; Cheng, Wen

    2014-01-01

    Global optimization of binary clusters has been a difficult task despite of much effort and many efficient methods. Directing toward two types of elements (i.e., homotop problem) in binary clusters, two classes of virtual dynamic lattices are constructed and a modified dynamic lattice searching (DLS) method, i.e., binary DLS (BDLS) method, is developed. However, it was found that the BDLS can only be utilized for the optimization of binary clusters with small sizes because homotop problem is hard to be solved without atomic exchange operation. Therefore, the iterated local search (ILS) method is adopted to solve homotop problem and an efficient method based on the BDLS method and ILS, named as BDLS-ILS, is presented for global optimization of binary clusters. In order to assess the efficiency of the proposed method, binary Lennard-Jones clusters with up to 100 atoms are investigated. Results show that the method is proved to be efficient. Furthermore, the BDLS-ILS method is also adopted to study the geometrical structures of (AuPd) 79 clusters with DFT-fit parameters of Gupta potential

  14. Cluster Analytical Method of Fault Risk Analysis in Systems

    Science.gov (United States)

    Michaľčonok, German; Horalová Kalinová, Michaela

    2016-12-01

    In providing safety functions, the proposal of safety functions of control systems is an important part of a risk reduction strategy. In the specification of security requirements, it is necessary to determine and document individual characteristics and the desired performance level for each safety. This article presents the results of the experiment cluster analysis. The results of the experiment prove that the methods of cluster analysis provide a suitable tool for analyzing the reliability of safety systems analysis. Regarding the increasing complexity of the systems, we can state that the application of these methods in the subject area is a good choice.

  15. Concept based clustering for descriptive document classification

    Directory of Open Access Journals (Sweden)

    S Senthamarai Kannan

    2007-09-01

    Full Text Available We present an approach for improving the relevance of search results by clustering the search results obtained for a query string with the help of a Concept Clustering Algorithm. The Concept Clustering Algorithm combines common phrase discovery and latent semantic indexing techniques to separate search results into meaningful groups. It looks for meaningful phrases to use as cluster labels and then assigns documents to the labels to form groups. The labels assigned to each document cluster provide meaningful information on the various documents available under that cluster. This provides a more interactive and easier way to probe through search results and identifying the relevant documents for the users using the search engine.

  16. An improved fuzzy c-means clustering algorithm based on shadowed sets and PSO.

    Science.gov (United States)

    Zhang, Jian; Shen, Ling

    2014-01-01

    To organize the wide variety of data sets automatically and acquire accurate classification, this paper presents a modified fuzzy c-means algorithm (SP-FCM) based on particle swarm optimization (PSO) and shadowed sets to perform feature clustering. SP-FCM introduces the global search property of PSO to deal with the problem of premature convergence of conventional fuzzy clustering, utilizes vagueness balance property of shadowed sets to handle overlapping among clusters, and models uncertainty in class boundaries. This new method uses Xie-Beni index as cluster validity and automatically finds the optimal cluster number within a specific range with cluster partitions that provide compact and well-separated clusters. Experiments show that the proposed approach significantly improves the clustering effect.

  17. An Improved Fuzzy c-Means Clustering Algorithm Based on Shadowed Sets and PSO

    Directory of Open Access Journals (Sweden)

    Jian Zhang

    2014-01-01

    Full Text Available To organize the wide variety of data sets automatically and acquire accurate classification, this paper presents a modified fuzzy c-means algorithm (SP-FCM based on particle swarm optimization (PSO and shadowed sets to perform feature clustering. SP-FCM introduces the global search property of PSO to deal with the problem of premature convergence of conventional fuzzy clustering, utilizes vagueness balance property of shadowed sets to handle overlapping among clusters, and models uncertainty in class boundaries. This new method uses Xie-Beni index as cluster validity and automatically finds the optimal cluster number within a specific range with cluster partitions that provide compact and well-separated clusters. Experiments show that the proposed approach significantly improves the clustering effect.

  18. Novel density-based and hierarchical density-based clustering algorithms for uncertain data.

    Science.gov (United States)

    Zhang, Xianchao; Liu, Han; Zhang, Xiaotong

    2017-09-01

    Uncertain data has posed a great challenge to traditional clustering algorithms. Recently, several algorithms have been proposed for clustering uncertain data, and among them density-based techniques seem promising for handling data uncertainty. However, some issues like losing uncertain information, high time complexity and nonadaptive threshold have not been addressed well in the previous density-based algorithm FDBSCAN and hierarchical density-based algorithm FOPTICS. In this paper, we firstly propose a novel density-based algorithm PDBSCAN, which improves the previous FDBSCAN from the following aspects: (1) it employs a more accurate method to compute the probability that the distance between two uncertain objects is less than or equal to a boundary value, instead of the sampling-based method in FDBSCAN; (2) it introduces new definitions of probability neighborhood, support degree, core object probability, direct reachability probability, thus reducing the complexity and solving the issue of nonadaptive threshold (for core object judgement) in FDBSCAN. Then, we modify the algorithm PDBSCAN to an improved version (PDBSCANi), by using a better cluster assignment strategy to ensure that every object will be assigned to the most appropriate cluster, thus solving the issue of nonadaptive threshold (for direct density reachability judgement) in FDBSCAN. Furthermore, as PDBSCAN and PDBSCANi have difficulties for clustering uncertain data with non-uniform cluster density, we propose a novel hierarchical density-based algorithm POPTICS by extending the definitions of PDBSCAN, adding new definitions of fuzzy core distance and fuzzy reachability distance, and employing a new clustering framework. POPTICS can reveal the cluster structures of the datasets with different local densities in different regions better than PDBSCAN and PDBSCANi, and it addresses the issues in FOPTICS. Experimental results demonstrate the superiority of our proposed algorithms over the existing

  19. Communication Base Station Log Analysis Based on Hierarchical Clustering

    Directory of Open Access Journals (Sweden)

    Zhang Shao-Hua

    2017-01-01

    Full Text Available Communication base stations generate massive data every day, these base station logs play an important value in mining of the business circles. This paper use data mining technology and hierarchical clustering algorithm to group the scope of business circle for the base station by recording the data of these base stations.Through analyzing the data of different business circle based on feature extraction and comparing different business circle category characteristics, which can choose a suitable area for operators of commercial marketing.

  20. A cluster approximation for the transfer-matrix method

    International Nuclear Information System (INIS)

    Surda, A.

    1990-08-01

    A cluster approximation for the transfer-method is formulated. The calculation of the partition function of lattice models is transformed to a nonlinear mapping problem. The method yields the free energy, correlation functions and the phase diagrams for a large class of lattice models. The high accuracy of the method is exemplified by the calculation of the critical temperature of the Ising model. (author). 14 refs, 2 figs, 1 tab

  1. An improved initialization center k-means clustering algorithm based on distance and density

    Science.gov (United States)

    Duan, Yanling; Liu, Qun; Xia, Shuyin

    2018-04-01

    Aiming at the problem of the random initial clustering center of k means algorithm that the clustering results are influenced by outlier data sample and are unstable in multiple clustering, a method of central point initialization method based on larger distance and higher density is proposed. The reciprocal of the weighted average of distance is used to represent the sample density, and the data sample with the larger distance and the higher density are selected as the initial clustering centers to optimize the clustering results. Then, a clustering evaluation method based on distance and density is designed to verify the feasibility of the algorithm and the practicality, the experimental results on UCI data sets show that the algorithm has a certain stability and practicality.

  2. A two-stage method for microcalcification cluster segmentation in mammography by deformable models

    International Nuclear Information System (INIS)

    Arikidis, N.; Kazantzi, A.; Skiadopoulos, S.; Karahaliou, A.; Costaridou, L.; Vassiou, K.

    2015-01-01

    Purpose: Segmentation of microcalcification (MC) clusters in x-ray mammography is a difficult task for radiologists. Accurate segmentation is prerequisite for quantitative image analysis of MC clusters and subsequent feature extraction and classification in computer-aided diagnosis schemes. Methods: In this study, a two-stage semiautomated segmentation method of MC clusters is investigated. The first stage is targeted to accurate and time efficient segmentation of the majority of the particles of a MC cluster, by means of a level set method. The second stage is targeted to shape refinement of selected individual MCs, by means of an active contour model. Both methods are applied in the framework of a rich scale-space representation, provided by the wavelet transform at integer scales. Segmentation reliability of the proposed method in terms of inter and intraobserver agreements was evaluated in a case sample of 80 MC clusters originating from the digital database for screening mammography, corresponding to 4 morphology types (punctate: 22, fine linear branching: 16, pleomorphic: 18, and amorphous: 24) of MC clusters, assessing radiologists’ segmentations quantitatively by two distance metrics (Hausdorff distance—HDIST cluster , average of minimum distance—AMINDIST cluster ) and the area overlap measure (AOM cluster ). The effect of the proposed segmentation method on MC cluster characterization accuracy was evaluated in a case sample of 162 pleomorphic MC clusters (72 malignant and 90 benign). Ten MC cluster features, targeted to capture morphologic properties of individual MCs in a cluster (area, major length, perimeter, compactness, and spread), were extracted and a correlation-based feature selection method yielded a feature subset to feed in a support vector machine classifier. Classification performance of the MC cluster features was estimated by means of the area under receiver operating characteristic curve (Az ± Standard Error) utilizing tenfold cross

  3. A two-stage method for microcalcification cluster segmentation in mammography by deformable models

    Energy Technology Data Exchange (ETDEWEB)

    Arikidis, N.; Kazantzi, A.; Skiadopoulos, S.; Karahaliou, A.; Costaridou, L., E-mail: costarid@upatras.gr [Department of Medical Physics, School of Medicine, University of Patras, Patras 26504 (Greece); Vassiou, K. [Department of Anatomy, School of Medicine, University of Thessaly, Larissa 41500 (Greece)

    2015-10-15

    Purpose: Segmentation of microcalcification (MC) clusters in x-ray mammography is a difficult task for radiologists. Accurate segmentation is prerequisite for quantitative image analysis of MC clusters and subsequent feature extraction and classification in computer-aided diagnosis schemes. Methods: In this study, a two-stage semiautomated segmentation method of MC clusters is investigated. The first stage is targeted to accurate and time efficient segmentation of the majority of the particles of a MC cluster, by means of a level set method. The second stage is targeted to shape refinement of selected individual MCs, by means of an active contour model. Both methods are applied in the framework of a rich scale-space representation, provided by the wavelet transform at integer scales. Segmentation reliability of the proposed method in terms of inter and intraobserver agreements was evaluated in a case sample of 80 MC clusters originating from the digital database for screening mammography, corresponding to 4 morphology types (punctate: 22, fine linear branching: 16, pleomorphic: 18, and amorphous: 24) of MC clusters, assessing radiologists’ segmentations quantitatively by two distance metrics (Hausdorff distance—HDIST{sub cluster}, average of minimum distance—AMINDIST{sub cluster}) and the area overlap measure (AOM{sub cluster}). The effect of the proposed segmentation method on MC cluster characterization accuracy was evaluated in a case sample of 162 pleomorphic MC clusters (72 malignant and 90 benign). Ten MC cluster features, targeted to capture morphologic properties of individual MCs in a cluster (area, major length, perimeter, compactness, and spread), were extracted and a correlation-based feature selection method yielded a feature subset to feed in a support vector machine classifier. Classification performance of the MC cluster features was estimated by means of the area under receiver operating characteristic curve (Az ± Standard Error) utilizing

  4. Personalized PageRank Clustering: A graph clustering algorithm based on random walks

    Science.gov (United States)

    A. Tabrizi, Shayan; Shakery, Azadeh; Asadpour, Masoud; Abbasi, Maziar; Tavallaie, Mohammad Ali

    2013-11-01

    Graph clustering has been an essential part in many methods and thus its accuracy has a significant effect on many applications. In addition, exponential growth of real-world graphs such as social networks, biological networks and electrical circuits demands clustering algorithms with nearly-linear time and space complexity. In this paper we propose Personalized PageRank Clustering (PPC) that employs the inherent cluster exploratory property of random walks to reveal the clusters of a given graph. We combine random walks and modularity to precisely and efficiently reveal the clusters of a graph. PPC is a top-down algorithm so it can reveal inherent clusters of a graph more accurately than other nearly-linear approaches that are mainly bottom-up. It also gives a hierarchy of clusters that is useful in many applications. PPC has a linear time and space complexity and has been superior to most of the available clustering algorithms on many datasets. Furthermore, its top-down approach makes it a flexible solution for clustering problems with different requirements.

  5. BioCluster: Tool for Identification and Clustering of Enterobacteriaceae Based on Biochemical Data

    Directory of Open Access Journals (Sweden)

    Ahmed Abdullah

    2015-06-01

    Full Text Available Presumptive identification of different Enterobacteriaceae species is routinely achieved based on biochemical properties. Traditional practice includes manual comparison of each biochemical property of the unknown sample with known reference samples and inference of its identity based on the maximum similarity pattern with the known samples. This process is labor-intensive, time-consuming, error-prone, and subjective. Therefore, automation of sorting and similarity in calculation would be advantageous. Here we present a MATLAB-based graphical user interface (GUI tool named BioCluster. This tool was designed for automated clustering and identification of Enterobacteriaceae based on biochemical test results. In this tool, we used two types of algorithms, i.e., traditional hierarchical clustering (HC and the Improved Hierarchical Clustering (IHC, a modified algorithm that was developed specifically for the clustering and identification of Enterobacteriaceae species. IHC takes into account the variability in result of 1–47 biochemical tests within this Enterobacteriaceae family. This tool also provides different options to optimize the clustering in a user-friendly way. Using computer-generated synthetic data and some real data, we have demonstrated that BioCluster has high accuracy in clustering and identifying enterobacterial species based on biochemical test data. This tool can be freely downloaded at http://microbialgen.du.ac.bd/biocluster/.

  6. BioCluster: tool for identification and clustering of Enterobacteriaceae based on biochemical data.

    Science.gov (United States)

    Abdullah, Ahmed; Sabbir Alam, S M; Sultana, Munawar; Hossain, M Anwar

    2015-06-01

    Presumptive identification of different Enterobacteriaceae species is routinely achieved based on biochemical properties. Traditional practice includes manual comparison of each biochemical property of the unknown sample with known reference samples and inference of its identity based on the maximum similarity pattern with the known samples. This process is labor-intensive, time-consuming, error-prone, and subjective. Therefore, automation of sorting and similarity in calculation would be advantageous. Here we present a MATLAB-based graphical user interface (GUI) tool named BioCluster. This tool was designed for automated clustering and identification of Enterobacteriaceae based on biochemical test results. In this tool, we used two types of algorithms, i.e., traditional hierarchical clustering (HC) and the Improved Hierarchical Clustering (IHC), a modified algorithm that was developed specifically for the clustering and identification of Enterobacteriaceae species. IHC takes into account the variability in result of 1-47 biochemical tests within this Enterobacteriaceae family. This tool also provides different options to optimize the clustering in a user-friendly way. Using computer-generated synthetic data and some real data, we have demonstrated that BioCluster has high accuracy in clustering and identifying enterobacterial species based on biochemical test data. This tool can be freely downloaded at http://microbialgen.du.ac.bd/biocluster/. Copyright © 2015 The Authors. Production and hosting by Elsevier Ltd.. All rights reserved.

  7. Green Clustering Implementation Based on DPS-MOPSO

    Directory of Open Access Journals (Sweden)

    Yang Lu

    2014-01-01

    Full Text Available A green clustering implementation is proposed to be as the first method in the framework of an energy-efficient strategy for centralized enterprise high-density WLANs. Traditionally, to maintain the network coverage, all of the APs within the WLAN have to be powered on. Nevertheless, the new algorithm can power off a large proportion of APs while the coverage is maintained as the always-on counterpart. The proposed algorithm is composed of two parallel and concurrent procedures, which are the faster procedure based on K-means and the more accurate procedure based on Dynamic Population Size Multiple Objective Particle Swarm Optimization (DPS-MOPSO. To implement green clustering efficiently and accurately, dynamic population size and mutational operators are introduced as complements for the classical MOPSO. In addition to the function of AP selection, the new green clustering algorithm has another new function as the reference and guidance for AP deployment. This paper also presents simulations in scenarios modeled with ray-tracing method and FDTD technique, and the results show that about 67% up to 90% of energy consumption can be saved while the original network coverage is maintained during periods when few users are online or when the traffic load is low.

  8. Quantum Monte Carlo methods and lithium cluster properties

    Energy Technology Data Exchange (ETDEWEB)

    Owen, Richard Kent [Univ. of California, Berkeley, CA (United States)

    1990-12-01

    Properties of small lithium clusters with sizes ranging from n = 1 to 5 atoms were investigated using quantum Monte Carlo (QMC) methods. Cluster geometries were found from complete active space self consistent field (CASSCF) calculations. A detailed development of the QMC method leading to the variational QMC (V-QMC) and diffusion QMC (D-QMC) methods is shown. The many-body aspect of electron correlation is introduced into the QMC importance sampling electron-electron correlation functions by using density dependent parameters, and are shown to increase the amount of correlation energy obtained in V-QMC calculations. A detailed analysis of D-QMC time-step bias is made and is found to be at least linear with respect to the time-step. The D-QMC calculations determined the lithium cluster ionization potentials to be 0.1982(14) [0.1981], 0.1895(9) [0.1874(4)], 0.1530(34) [0.1599(73)], 0.1664(37) [0.1724(110)], 0.1613(43) [0.1675(110)] Hartrees for lithium clusters n = 1 through 5, respectively; in good agreement with experimental results shown in the brackets. Also, the binding energies per atom was computed to be 0.0177(8) [0.0203(12)], 0.0188(10) [0.0220(21)], 0.0247(8) [0.0310(12)], 0.0253(8) [0.0351(8)] Hartrees for lithium clusters n = 2 through 5, respectively. The lithium cluster one-electron density is shown to have charge concentrations corresponding to nonnuclear attractors. The overall shape of the electronic charge density also bears a remarkable similarity with the anisotropic harmonic oscillator model shape for the given number of valence electrons.

  9. Cluster cosmological analysis with X ray instrumental observables: introduction and testing of AsPIX method

    International Nuclear Information System (INIS)

    Valotti, Andrea

    2016-01-01

    Cosmology is one of the fundamental pillars of astrophysics, as such it contains many unsolved puzzles. To investigate some of those puzzles, we analyze X-ray surveys of galaxy clusters. These surveys are possible thanks to the bremsstrahlung emission of the intra-cluster medium. The simultaneous fit of cluster counts as a function of mass and distance provides an independent measure of cosmological parameters such as Ω_m, σ_s, and the dark energy equation of state w0. A novel approach to cosmological analysis using galaxy cluster data, called top-down, was developed in N. Clerc et al. (2012). This top-down approach is based purely on instrumental observables that are considered in a two-dimensional X-ray color-magnitude diagram. The method self-consistently includes selection effects and scaling relationships. It also provides a means of bypassing the computation of individual cluster masses. My work presents an extension of the top-down method by introducing the apparent size of the cluster, creating a three-dimensional X-ray cluster diagram. The size of a cluster is sensitive to both the cluster mass and its angular diameter, so it must also be included in the assessment of selection effects. The performance of this new method is investigated using a Fisher analysis. In parallel, I have studied the effects of the intrinsic scatter in the cluster size scaling relation on the sample selection as well as on the obtained cosmological parameters. To validate the method, I estimate uncertainties of cosmological parameters with MCMC method Amoeba minimization routine and using two simulated XMM surveys that have an increasing level of complexity. The first simulated survey is a set of toy catalogues of 100 and 10000 deg"2, whereas the second is a 1000 deg"2 catalogue that was generated using an Aardvark semi-analytical N-body simulation. This comparison corroborates the conclusions of the Fisher analysis. In conclusion, I find that a cluster diagram that accounts for

  10. Analysis of protein profiles using fuzzy clustering methods

    DEFF Research Database (Denmark)

    Karemore, Gopal Raghunath; Ukendt, Sujatha; Rai, Lavanya

    clustering methods for their classification followed by various validation  measures.    The  clustering  algorithms  used  for  the  study  were  K-  means,  K- medoid, Fuzzy C-means, Gustafson-Kessel, and Gath-Geva.  The results presented in this study  conclude  that  the  protein  profiles  of  tissue...

  11. Functionalization of atomic cobalt clusters obtained by electrochemical methods

    Energy Technology Data Exchange (ETDEWEB)

    Rodriguez Cobo, Eldara [Laboratorio de Magnetismo y Tecnologia, Instituto Tecnoloxico, Pabillon de Servicios, Campus Sur, 15782 Santiago de Compostela (Spain); Departamento de Quimica Organica y Unidad Asociada al CSIC, Universidad de Santiago de Compostela, 15782 Santiago de Compostela (Spain); Rivas Rey, Jose; Blanco Varela, M. Carmen; Lopez Quintela, M. Arturo [Laboratorio de Magnetismo y Tecnologia, Instituto Tecnoloxico, Pabillon de Servicios, Campus Sur, 15782 Santiago de Compostela (Spain); Mourino Mosquera, Antonio; Torneiro Abuin, Mercedes [Departamento de Quimica Organica y Unidad Asociada al CSIC, Universidad de Santiago de Compostela, 15782 Santiago de Compostela (Spain)

    2006-05-15

    Functionalization of magnetic nanoparticles with appropriate organic molecules is very important for many applications. In the present study, cobalt nanoparticles, with an average diameter of 2 nm corresponding to Co{sub 309} clusters were synthesised by an electrochemical method, and then coated with ADCB (4-(9-deceniloxi)benzoic acid), in order to protect the clusters against oxidation and to obtain a final nanostructure, which can be attached later on to many different materials, like drugs, proteins or some other biological molecules. (copyright 2006 WILEY-VCH Verlag GmbH and Co. KGaA, Weinheim) (orig.)

  12. A Comparison of Methods for Player Clustering via Behavioral Telemetry

    DEFF Research Database (Denmark)

    Drachen, Anders; Thurau, C.; Sifa, R.

    2013-01-01

    patterns in the behavioral data, and developing profiles that are actionable to game developers. There are numerous methods for unsupervised clustering of user behavior, e.g. k-means/c-means, Nonnegative Matrix Factorization, or Principal Component Analysis. Although all yield behavior categorizations......, interpretation of the resulting categories in terms of actual play behavior can be difficult if not impossible. In this paper, a range of unsupervised techniques are applied together with Archetypal Analysis to develop behavioral clusters from playtime data of 70,014 World of Warcraft players, covering a five...

  13. Multiple co-clustering based on nonparametric mixture models with heterogeneous marginal distributions.

    Science.gov (United States)

    Tokuda, Tomoki; Yoshimoto, Junichiro; Shimizu, Yu; Okada, Go; Takamura, Masahiro; Okamoto, Yasumasa; Yamawaki, Shigeto; Doya, Kenji

    2017-01-01

    We propose a novel method for multiple clustering, which is useful for analysis of high-dimensional data containing heterogeneous types of features. Our method is based on nonparametric Bayesian mixture models in which features are automatically partitioned (into views) for each clustering solution. This feature partition works as feature selection for a particular clustering solution, which screens out irrelevant features. To make our method applicable to high-dimensional data, a co-clustering structure is newly introduced for each view. Further, the outstanding novelty of our method is that we simultaneously model different distribution families, such as Gaussian, Poisson, and multinomial distributions in each cluster block, which widens areas of application to real data. We apply the proposed method to synthetic and real data, and show that our method outperforms other multiple clustering methods both in recovering true cluster structures and in computation time. Finally, we apply our method to a depression dataset with no true cluster structure available, from which useful inferences are drawn about possible clustering structures of the data.

  14. Multiple co-clustering based on nonparametric mixture models with heterogeneous marginal distributions.

    Directory of Open Access Journals (Sweden)

    Tomoki Tokuda

    Full Text Available We propose a novel method for multiple clustering, which is useful for analysis of high-dimensional data containing heterogeneous types of features. Our method is based on nonparametric Bayesian mixture models in which features are automatically partitioned (into views for each clustering solution. This feature partition works as feature selection for a particular clustering solution, which screens out irrelevant features. To make our method applicable to high-dimensional data, a co-clustering structure is newly introduced for each view. Further, the outstanding novelty of our method is that we simultaneously model different distribution families, such as Gaussian, Poisson, and multinomial distributions in each cluster block, which widens areas of application to real data. We apply the proposed method to synthetic and real data, and show that our method outperforms other multiple clustering methods both in recovering true cluster structures and in computation time. Finally, we apply our method to a depression dataset with no true cluster structure available, from which useful inferences are drawn about possible clustering structures of the data.

  15. Multiple co-clustering based on nonparametric mixture models with heterogeneous marginal distributions

    Science.gov (United States)

    Yoshimoto, Junichiro; Shimizu, Yu; Okada, Go; Takamura, Masahiro; Okamoto, Yasumasa; Yamawaki, Shigeto; Doya, Kenji

    2017-01-01

    We propose a novel method for multiple clustering, which is useful for analysis of high-dimensional data containing heterogeneous types of features. Our method is based on nonparametric Bayesian mixture models in which features are automatically partitioned (into views) for each clustering solution. This feature partition works as feature selection for a particular clustering solution, which screens out irrelevant features. To make our method applicable to high-dimensional data, a co-clustering structure is newly introduced for each view. Further, the outstanding novelty of our method is that we simultaneously model different distribution families, such as Gaussian, Poisson, and multinomial distributions in each cluster block, which widens areas of application to real data. We apply the proposed method to synthetic and real data, and show that our method outperforms other multiple clustering methods both in recovering true cluster structures and in computation time. Finally, we apply our method to a depression dataset with no true cluster structure available, from which useful inferences are drawn about possible clustering structures of the data. PMID:29049392

  16. Clustering Methods with Qualitative Data: A Mixed Methods Approach for Prevention Research with Small Samples

    Science.gov (United States)

    Henry, David; Dymnicki, Allison B.; Mohatt, Nathaniel; Allen, James; Kelly, James G.

    2016-01-01

    Qualitative methods potentially add depth to prevention research, but can produce large amounts of complex data even with small samples. Studies conducted with culturally distinct samples often produce voluminous qualitative data, but may lack sufficient sample sizes for sophisticated quantitative analysis. Currently lacking in mixed methods research are methods allowing for more fully integrating qualitative and quantitative analysis techniques. Cluster analysis can be applied to coded qualitative data to clarify the findings of prevention studies by aiding efforts to reveal such things as the motives of participants for their actions and the reasons behind counterintuitive findings. By clustering groups of participants with similar profiles of codes in a quantitative analysis, cluster analysis can serve as a key component in mixed methods research. This article reports two studies. In the first study, we conduct simulations to test the accuracy of cluster assignment using three different clustering methods with binary data as produced when coding qualitative interviews. Results indicated that hierarchical clustering, K-Means clustering, and latent class analysis produced similar levels of accuracy with binary data, and that the accuracy of these methods did not decrease with samples as small as 50. Whereas the first study explores the feasibility of using common clustering methods with binary data, the second study provides a “real-world” example using data from a qualitative study of community leadership connected with a drug abuse prevention project. We discuss the implications of this approach for conducting prevention research, especially with small samples and culturally distinct communities. PMID:25946969

  17. Clustering Methods with Qualitative Data: a Mixed-Methods Approach for Prevention Research with Small Samples.

    Science.gov (United States)

    Henry, David; Dymnicki, Allison B; Mohatt, Nathaniel; Allen, James; Kelly, James G

    2015-10-01

    Qualitative methods potentially add depth to prevention research but can produce large amounts of complex data even with small samples. Studies conducted with culturally distinct samples often produce voluminous qualitative data but may lack sufficient sample sizes for sophisticated quantitative analysis. Currently lacking in mixed-methods research are methods allowing for more fully integrating qualitative and quantitative analysis techniques. Cluster analysis can be applied to coded qualitative data to clarify the findings of prevention studies by aiding efforts to reveal such things as the motives of participants for their actions and the reasons behind counterintuitive findings. By clustering groups of participants with similar profiles of codes in a quantitative analysis, cluster analysis can serve as a key component in mixed-methods research. This article reports two studies. In the first study, we conduct simulations to test the accuracy of cluster assignment using three different clustering methods with binary data as produced when coding qualitative interviews. Results indicated that hierarchical clustering, K-means clustering, and latent class analysis produced similar levels of accuracy with binary data and that the accuracy of these methods did not decrease with samples as small as 50. Whereas the first study explores the feasibility of using common clustering methods with binary data, the second study provides a "real-world" example using data from a qualitative study of community leadership connected with a drug abuse prevention project. We discuss the implications of this approach for conducting prevention research, especially with small samples and culturally distinct communities.

  18. Analysis of protein profiles using fuzzy clustering methods

    DEFF Research Database (Denmark)

    Karemore, Gopal Raghunath; Ukendt, Sujatha; Rai, Lavanya

    clustering methods for their classification followed by various validation  measures.    The  clustering  algorithms  used  for  the  study  were  K-  means,  K- medoid, Fuzzy C-means, Gustafson-Kessel, and Gath-Geva.  The results presented in this study  conclude  that  the  protein  profiles  of  tissue......  samples  recorded  by  using  the  HPLC- LIF  system  and  the  data  analyzed  by  clustering  algorithms  quite  successfully  classifies them as belonging from normal and malignant conditions....

  19. Multiple imputation methods for bivariate outcomes in cluster randomised trials.

    Science.gov (United States)

    DiazOrdaz, K; Kenward, M G; Gomes, M; Grieve, R

    2016-09-10

    Missing observations are common in cluster randomised trials. The problem is exacerbated when modelling bivariate outcomes jointly, as the proportion of complete cases is often considerably smaller than the proportion having either of the outcomes fully observed. Approaches taken to handling such missing data include the following: complete case analysis, single-level multiple imputation that ignores the clustering, multiple imputation with a fixed effect for each cluster and multilevel multiple imputation. We contrasted the alternative approaches to handling missing data in a cost-effectiveness analysis that uses data from a cluster randomised trial to evaluate an exercise intervention for care home residents. We then conducted a simulation study to assess the performance of these approaches on bivariate continuous outcomes, in terms of confidence interval coverage and empirical bias in the estimated treatment effects. Missing-at-random clustered data scenarios were simulated following a full-factorial design. Across all the missing data mechanisms considered, the multiple imputation methods provided estimators with negligible bias, while complete case analysis resulted in biased treatment effect estimates in scenarios where the randomised treatment arm was associated with missingness. Confidence interval coverage was generally in excess of nominal levels (up to 99.8%) following fixed-effects multiple imputation and too low following single-level multiple imputation. Multilevel multiple imputation led to coverage levels of approximately 95% throughout. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

  20. A PROBABILISTIC EMBEDDING CLUSTERING METHOD FOR URBAN STRUCTURE DETECTION

    Directory of Open Access Journals (Sweden)

    X. Lin

    2017-09-01

    Full Text Available Urban structure detection is a basic task in urban geography. Clustering is a core technology to detect the patterns of urban spatial structure, urban functional region, and so on. In big data era, diverse urban sensing datasets recording information like human behaviour and human social activity, suffer from complexity in high dimension and high noise. And unfortunately, the state-of-the-art clustering methods does not handle the problem with high dimension and high noise issues concurrently. In this paper, a probabilistic embedding clustering method is proposed. Firstly, we come up with a Probabilistic Embedding Model (PEM to find latent features from high dimensional urban sensing data by “learning” via probabilistic model. By latent features, we could catch essential features hidden in high dimensional data known as patterns; with the probabilistic model, we can also reduce uncertainty caused by high noise. Secondly, through tuning the parameters, our model could discover two kinds of urban structure, the homophily and structural equivalence, which means communities with intensive interaction or in the same roles in urban structure. We evaluated the performance of our model by conducting experiments on real-world data and experiments with real data in Shanghai (China proved that our method could discover two kinds of urban structure, the homophily and structural equivalence, which means clustering community with intensive interaction or under the same roles in urban space.

  1. Method for discovering relationships in data by dynamic quantum clustering

    Science.gov (United States)

    Weinstein, Marvin; Horn, David

    2014-10-28

    Data clustering is provided according to a dynamical framework based on quantum mechanical time evolution of states corresponding to data points. To expedite computations, we can approximate the time-dependent Hamiltonian formalism by a truncated calculation within a set of Gaussian wave-functions (coherent states) centered around the original points. This allows for analytic evaluation of the time evolution of all such states, opening up the possibility of exploration of relationships among data-points through observation of varying dynamical-distances among points and convergence of points into clusters. This formalism may be further supplemented by preprocessing, such as dimensional reduction through singular value decomposition and/or feature filtering.

  2. Method for discovering relationships in data by dynamic quantum clustering

    Energy Technology Data Exchange (ETDEWEB)

    Weinstein, Marvin; Horn, David

    2017-05-09

    Data clustering is provided according to a dynamical framework based on quantum mechanical time evolution of states corresponding to data points. To expedite computations, we can approximate the time-dependent Hamiltonian formalism by a truncated calculation within a set of Gaussian wave-functions (coherent states) centered around the original points. This allows for analytic evaluation of the time evolution of all such states, opening up the possibility of exploration of relationships among data-points through observation of varying dynamical-distances among points and convergence of points into clusters. This formalism may be further supplemented by preprocessing, such as dimensional reduction through singular value decomposition and/or feature filtering.

  3. Radiobiological analyse based on cell cluster models

    International Nuclear Information System (INIS)

    Lin Hui; Jing Jia; Meng Damin; Xu Yuanying; Xu Liangfeng

    2010-01-01

    The influence of cell cluster dimension on EUD and TCP for targeted radionuclide therapy was studied using the radiobiological method. The radiobiological features of tumor with activity-lack in core were evaluated and analyzed by associating EUD, TCP and SF.The results show that EUD will increase with the increase of tumor dimension under the activity homogeneous distribution. If the extra-cellular activity was taken into consideration, the EUD will increase 47%. Under the activity-lack in tumor center and the requirement of TCP=0.90, the α cross-fire influence of 211 At could make up the maximum(48 μm)3 activity-lack for Nucleus source, but(72 μm)3 for Cytoplasm, Cell Surface, Cell and Voxel sources. In clinic,the physician could prefer the suggested dose of Cell Surface source in case of the future of local tumor control for under-dose. Generally TCP could well exhibit the effect difference between under-dose and due-dose, but not between due-dose and over-dose, which makes TCP more suitable for the therapy plan choice. EUD could well exhibit the difference between different models and activity distributions,which makes it more suitable for the research work. When the user uses EUD to study the influence of activity inhomogeneous distribution, one should keep the consistency of the configuration and volume of the former and the latter models. (authors)

  4. Coresets vs clustering: comparison of methods for redundancy reduction in very large white matter fiber sets

    Science.gov (United States)

    Alexandroni, Guy; Zimmerman Moreno, Gali; Sochen, Nir; Greenspan, Hayit

    2016-03-01

    Recent advances in Diffusion Weighted Magnetic Resonance Imaging (DW-MRI) of white matter in conjunction with improved tractography produce impressive reconstructions of White Matter (WM) pathways. These pathways (fiber sets) often contain hundreds of thousands of fibers, or more. In order to make fiber based analysis more practical, the fiber set needs to be preprocessed to eliminate redundancies and to keep only essential representative fibers. In this paper we demonstrate and compare two distinctive frameworks for selecting this reduced set of fibers. The first framework entails pre-clustering the fibers using k-means, followed by Hierarchical Clustering and replacing each cluster with one representative. For the second clustering stage seven distance metrics were evaluated. The second framework is based on an efficient geometric approximation paradigm named coresets. Coresets present a new approach to optimization and have huge success especially in tasks requiring large computation time and/or memory. We propose a modified version of the coresets algorithm, Density Coreset. It is used for extracting the main fibers from dense datasets, leaving a small set that represents the main structures and connectivity of the brain. A novel approach, based on a 3D indicator structure, is used for comparing the frameworks. This comparison was applied to High Angular Resolution Diffusion Imaging (HARDI) scans of 4 healthy individuals. We show that among the clustering based methods, that cosine distance gives the best performance. In comparing the clustering schemes with coresets, Density Coreset method achieves the best performance.

  5. WEIGHING GALAXY CLUSTERS WITH GAS. I. ON THE METHODS OF COMPUTING HYDROSTATIC MASS BIAS

    Energy Technology Data Exchange (ETDEWEB)

    Lau, Erwin T.; Nagai, Daisuke [Department of Physics, Yale University, New Haven, CT 06520 (United States); Nelson, Kaylea, E-mail: erwin.lau@yale.edu [Department of Astronomy, Yale University, New Haven, CT 06520 (United States)

    2013-11-10

    Mass estimates of galaxy clusters from X-ray and Sunyeav-Zel'dovich observations assume the intracluster gas is in hydrostatic equilibrium with their gravitational potential. However, since galaxy clusters are dynamically active objects whose dynamical states can deviate significantly from the equilibrium configuration, the departure from the hydrostatic equilibrium assumption is one of the largest sources of systematic uncertainties in cluster cosmology. In the literature there have been two methods for computing the hydrostatic mass bias based on the Euler and the modified Jeans equations, respectively, and there has been some confusion about the validity of these two methods. The word 'Jeans' was a misnomer, which incorrectly implies that the gas is collisionless. To avoid further confusion, we instead refer these methods as 'summation' and 'averaging' methods respectively. In this work, we show that these two methods for computing the hydrostatic mass bias are equivalent by demonstrating that the equation used in the second method can be derived from taking spatial averages of the Euler equation. Specifically, we identify the correspondences of individual terms in these two methods mathematically and show that these correspondences are valid to within a few percent level using hydrodynamical simulations of galaxy cluster formation. In addition, we compute the mass bias associated with the acceleration of gas and show that its contribution is small in the virialized regions in the interior of galaxy clusters, but becomes non-negligible in the outskirts of massive galaxy clusters. We discuss future prospects of understanding and characterizing biases in the mass estimate of galaxy clusters using both hydrodynamical simulations and observations and their implications for cluster cosmology.

  6. Cluster Validity Classification Approaches Based on Geometric Probability and Application in the Classification of Remotely Sensed Images

    Directory of Open Access Journals (Sweden)

    LI Jian-Wei

    2014-08-01

    Full Text Available On the basis of the cluster validity function based on geometric probability in literature [1, 2], propose a cluster analysis method based on geometric probability to process large amount of data in rectangular area. The basic idea is top-down stepwise refinement, firstly categories then subcategories. On all clustering levels, use the cluster validity function based on geometric probability firstly, determine clusters and the gathering direction, then determine the center of clustering and the border of clusters. Through TM remote sensing image classification examples, compare with the supervision and unsupervised classification in ERDAS and the cluster analysis method based on geometric probability in two-dimensional square which is proposed in literature 2. Results show that the proposed method can significantly improve the classification accuracy.

  7. Comparison of Bayesian clustering and edge detection methods for inferring boundaries in landscape genetics

    Science.gov (United States)

    Safner, T.; Miller, M.P.; McRae, B.H.; Fortin, M.-J.; Manel, S.

    2011-01-01

    Recently, techniques available for identifying clusters of individuals or boundaries between clusters using genetic data from natural populations have expanded rapidly. Consequently, there is a need to evaluate these different techniques. We used spatially-explicit simulation models to compare three spatial Bayesian clustering programs and two edge detection methods. Spatially-structured populations were simulated where a continuous population was subdivided by barriers. We evaluated the ability of each method to correctly identify boundary locations while varying: (i) time after divergence, (ii) strength of isolation by distance, (iii) level of genetic diversity, and (iv) amount of gene flow across barriers. To further evaluate the methods' effectiveness to detect genetic clusters in natural populations, we used previously published data on North American pumas and a European shrub. Our results show that with simulated and empirical data, the Bayesian spatial clustering algorithms outperformed direct edge detection methods. All methods incorrectly detected boundaries in the presence of strong patterns of isolation by distance. Based on this finding, we support the application of Bayesian spatial clustering algorithms for boundary detection in empirical datasets, with necessary tests for the influence of isolation by distance. ?? 2011 by the authors; licensee MDPI, Basel, Switzerland.

  8. Medical Imaging Lesion Detection Based on Unified Gravitational Fuzzy Clustering

    Directory of Open Access Journals (Sweden)

    Jean Marie Vianney Kinani

    2017-01-01

    Full Text Available We develop a swift, robust, and practical tool for detecting brain lesions with minimal user intervention to assist clinicians and researchers in the diagnosis process, radiosurgery planning, and assessment of the patient’s response to the therapy. We propose a unified gravitational fuzzy clustering-based segmentation algorithm, which integrates the Newtonian concept of gravity into fuzzy clustering. We first perform fuzzy rule-based image enhancement on our database which is comprised of T1/T2 weighted magnetic resonance (MR and fluid-attenuated inversion recovery (FLAIR images to facilitate a smoother segmentation. The scalar output obtained is fed into a gravitational fuzzy clustering algorithm, which separates healthy structures from the unhealthy. Finally, the lesion contour is automatically outlined through the initialization-free level set evolution method. An advantage of this lesion detection algorithm is its precision and its simultaneous use of features computed from the intensity properties of the MR scan in a cascading pattern, which makes the computation fast, robust, and self-contained. Furthermore, we validate our algorithm with large-scale experiments using clinical and synthetic brain lesion datasets. As a result, an 84%–93% overlap performance is obtained, with an emphasis on robustness with respect to different and heterogeneous types of lesion and a swift computation time.

  9. Cluster-based global firms' use of local capabilities

    DEFF Research Database (Denmark)

    Andersen, Poul Houman; Bøllingtoft, Anne

    2011-01-01

    replicated studies and possibly triangulation with quantitative studies would further develop the understanding on how globalization impacts on the internal organization of CBFs. Practical implications – For policy makers, cluster policies should be reconsidered if the role of clusters differs from what has...... and developed countries. Originality/value – Several studies have examined the changing role of clusters in the evolving global division of labour. However, research is lacking that addresses the challenges of transformation from the level of the CBF and how these may be affected by cluster evolution. The paper......Purpose – Despite growing interest in clusters role for the global competitiveness of firms, there has been little research into how globalization affects cluster-based firms’ (CBFs) use of local knowledge resources and the combination of local and global knowledge used. Using the cluster...

  10. A Fast Density-Based Clustering Algorithm for Real-Time Internet of Things Stream

    OpenAIRE

    Amini, Amineh; Saboohi, Hadi; Ying Wah, Teh; Herawan, Tutut

    2014-01-01

    Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based method is a prominent class in clustering data streams. It has the ability to detect arbitrary shape clusters, to handle outlier, and it does not need the number of clusters in advance. Therefore, density-bas...

  11. A rapid ATR-FTIR spectroscopic method for detection of sibutramine adulteration in tea and coffee based on hierarchical cluster and principal component analyses.

    Science.gov (United States)

    Cebi, Nur; Yilmaz, Mustafa Tahsin; Sagdic, Osman

    2017-08-15

    Sibutramine may be illicitly included in herbal slimming foods and supplements marketed as "100% natural" to enhance weight loss. Considering public health and legal regulations, there is an urgent need for effective, rapid and reliable techniques to detect sibutramine in dietetic herbal foods, teas and dietary supplements. This research comprehensively explored, for the first time, detection of sibutramine in green tea, green coffee and mixed herbal tea using ATR-FTIR spectroscopic technique combined with chemometrics. Hierarchical cluster analysis and PCA principle component analysis techniques were employed in spectral range (2746-2656cm -1 ) for classification and discrimination through Euclidian distance and Ward's algorithm. Unadulterated and adulterated samples were classified and discriminated with respect to their sibutramine contents with perfect accuracy without any false prediction. The results suggest that existence of the active substance could be successfully determined at the levels in the range of 0.375-12mg in totally 1.75g of green tea, green coffee and mixed herbal tea by using FTIR-ATR technique combined with chemometrics. Copyright © 2017 Elsevier Ltd. All rights reserved.

  12. Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets

    Science.gov (United States)

    Shrimankar, D. D.; Sathe, S. R.

    2016-01-01

    Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today’s supercomputer often consists of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, the OpenMP programs cannot be scaled for more than a single SMP node. However, programs written in MPI can have more than single SMP nodes. But such a programming paradigm has an overhead of internode communication. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that the communication overhead incurs significantly even in OpenMP loop execution and increases with the number of cores participating. We also demonstrate a communication model to approximate the overhead from communication in OpenMP loops. Our results are astonishing and interesting to a large variety of input data files. We have developed our own load balancing and cache optimization technique for message passing model. Our experimental results show that our own developed techniques give optimum performance of our parallel algorithm for various sizes of input parameter, such as sequence size and tile size, on a wide variety of multicore architectures. PMID:27932868

  13. Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets.

    Science.gov (United States)

    Shrimankar, D D; Sathe, S R

    2016-01-01

    Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today's supercomputer often consists of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, the OpenMP programs cannot be scaled for more than a single SMP node. However, programs written in MPI can have more than single SMP nodes. But such a programming paradigm has an overhead of internode communication. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that the communication overhead incurs significantly even in OpenMP loop execution and increases with the number of cores participating. We also demonstrate a communication model to approximate the overhead from communication in OpenMP loops. Our results are astonishing and interesting to a large variety of input data files. We have developed our own load balancing and cache optimization technique for message passing model. Our experimental results show that our own developed techniques give optimum performance of our parallel algorithm for various sizes of input parameter, such as sequence size and tile size, on a wide variety of multicore architectures.

  14. Fast Search the Density Peaks and Clustering Method for Check-in Data

    Directory of Open Access Journals (Sweden)

    LIU Meng

    2017-04-01

    Full Text Available Check-in data obtained from Location-based Social Network (LBSN is a sort of crowd geographic data which will reveal daily activities of urban residents. Different check-in behaviors with the same check-in location will produce the phenomenon of location duplication because of location candidate function in LBSN system. The current density-based spatial clustering algorithms have the following problems: ①difficulty to find density peak point. ②clustering error caused by check-in point objects with duplicate positions. In order to solve these problems, we proposed a fast search density peaks and clustering method for check-in data, based on clustering by fast search and find of density peaks (CFSFDP. Firstly, position repetition frequency was introduced and calculated to illustrate the number of the check-in position duplications data. Secondly, a new type of point feature was constructed by adding position repetition frequency of the original check-in position data, which was used as study object to search density peaks. At last, clustering algorithm based on density peak point was constructed in which density connectivity was taken into account to ensure the continuity and integrity of density clusters. Taking check-in data obtained from Sina Microblog as an example, an experiment was designed and implemented. The results demonstrates:①Clustering method can effectively avoid the problem that the outlier location object with high repeatability is chosen as the peak and clustering, and has excellent spatial adaptability as well when comparing with check-in data from other area. ②Extracted density peak points can not only be used to represent the center of the hot zone, but also reflect the concentration trend of the hot zone, which can help to explore the dynamic change of the hot zone.

  15. Use of multiple cluster analysis methods to explore the validity of a community outcomes concept map.

    Science.gov (United States)

    Orsi, Rebecca

    2017-02-01

    Concept mapping is now a commonly-used technique for articulating and evaluating programmatic outcomes. However, research regarding validity of knowledge and outcomes produced with concept mapping is sparse. The current study describes quantitative validity analyses using a concept mapping dataset. We sought to increase the validity of concept mapping evaluation results by running multiple cluster analysis methods and then using several metrics to choose from among solutions. We present four different clustering methods based on analyses using the R statistical software package: partitioning around medoids (PAM), fuzzy analysis (FANNY), agglomerative nesting (AGNES) and divisive analysis (DIANA). We then used the Dunn and Davies-Bouldin indices to assist in choosing a valid cluster solution for a concept mapping outcomes evaluation. We conclude that the validity of the outcomes map is high, based on the analyses described. Finally, we discuss areas for further concept mapping methods research. Copyright © 2016 Elsevier Ltd. All rights reserved.

  16. Translationally-invariant coupled-cluster method for finite systems

    Energy Technology Data Exchange (ETDEWEB)

    Guardiola, R.; Moliner, I. [Valencia Univ., Burjassot (Spain). Dept. de Fisica Atomica Molecular i Nuclear; Navarro, J.; Portesi, M. [IFIC (Centre Mixt CSIC -Universitat de Valencia), Avda. Dr. Moliner 50, E-46.100 Burjassot (Spain)

    1998-01-12

    The translational invariant formulation of the coupled-cluster method is presented here at the complete SUB(2) level for a system of nucleons treated as bosons. The correlation amplitudes are solutions of a non-linear coupled system of equations. These equations have been solved for light and medium systems, considering the central but still semi-realistic nucleon-nucleon S3 interaction. (orig.). 16 refs.

  17. Analysis of space payload operation modes based on divide-and-conquer clustering

    Directory of Open Access Journals (Sweden)

    Si Feng

    2016-01-01

    Full Text Available With the development of space electronic technology, the space payload operation modes are more and more complex, and manual interpretation is prone to errors for much workload. Generally the space payload’s operation modes are reflected by its telemetry data. By analysing the characteristics of the payload telemetry data, it is proposed an automatic analysis method of payload operation modes based on divide–and–conquer clustering. The clustering method combines division and incremental clustering. The principle of the method is introduced and the method is validated using the actual payload telemetry data. Furthermore the improved method is proposed to the problems encountered. Experimental results show that divide–and–conquer clustering method has the feature of calculation simple and classification accurate, when applied to the classification of payload operation modes. Furthermore this method can be applied to the other areas of payload data processing by extending the method.

  18. GA-Based Membrane Evolutionary Algorithm for Ensemble Clustering

    Directory of Open Access Journals (Sweden)

    Yanhua Wang

    2017-01-01

    Full Text Available Ensemble clustering can improve the generalization ability of a single clustering algorithm and generate a more robust clustering result by integrating multiple base clusterings, so it becomes the focus of current clustering research. Ensemble clustering aims at finding a consensus partition which agrees as much as possible with base clusterings. Genetic algorithm is a highly parallel, stochastic, and adaptive search algorithm developed from the natural selection and evolutionary mechanism of biology. In this paper, an improved genetic algorithm is designed by improving the coding of chromosome. A new membrane evolutionary algorithm is constructed by using genetic mechanisms as evolution rules and combines with the communication mechanism of cell-like P system. The proposed algorithm is used to optimize the base clusterings and find the optimal chromosome as the final ensemble clustering result. The global optimization ability of the genetic algorithm and the rapid convergence of the membrane system make membrane evolutionary algorithm perform better than several state-of-the-art techniques on six real-world UCI data sets.

  19. GA-Based Membrane Evolutionary Algorithm for Ensemble Clustering.

    Science.gov (United States)

    Wang, Yanhua; Liu, Xiyu; Xiang, Laisheng

    2017-01-01

    Ensemble clustering can improve the generalization ability of a single clustering algorithm and generate a more robust clustering result by integrating multiple base clusterings, so it becomes the focus of current clustering research. Ensemble clustering aims at finding a consensus partition which agrees as much as possible with base clusterings. Genetic algorithm is a highly parallel, stochastic, and adaptive search algorithm developed from the natural selection and evolutionary mechanism of biology. In this paper, an improved genetic algorithm is designed by improving the coding of chromosome. A new membrane evolutionary algorithm is constructed by using genetic mechanisms as evolution rules and combines with the communication mechanism of cell-like P system. The proposed algorithm is used to optimize the base clusterings and find the optimal chromosome as the final ensemble clustering result. The global optimization ability of the genetic algorithm and the rapid convergence of the membrane system make membrane evolutionary algorithm perform better than several state-of-the-art techniques on six real-world UCI data sets.

  20. Cluster analysis of European Y-chromosomal STR haplotypes using the discrete Laplace method

    DEFF Research Database (Denmark)

    Andersen, Mikkel Meyer; Eriksen, Poul Svante; Morling, Niels

    2014-01-01

    method can be used for cluster analysis to further validate the discrete Laplace method. A very important practical fact is that the calculations can be performed on a normal computer. We identified two sub-clusters of the Eastern and Western European Y-STR haplotypes similar to results of previous......The European Y-chromosomal short tandem repeat (STR) haplotype distribution has previously been analysed in various ways. Here, we introduce a new way of analysing population substructure using a new method based on clustering within the discrete Laplace exponential family that models...... the probability distribution of the Y-STR haplotypes. Creating a consistent statistical model of the haplotypes enables us to perform a wide range of analyses. Previously, haplotype frequency estimation using the discrete Laplace method has been validated. In this paper we investigate how the discrete Laplace...

  1. Medical Inpatient Journey Modeling and Clustering: A Bayesian Hidden Markov Model Based Approach.

    Science.gov (United States)

    Huang, Zhengxing; Dong, Wei; Wang, Fei; Duan, Huilong

    2015-01-01

    Modeling and clustering medical inpatient journeys is useful to healthcare organizations for a number of reasons including inpatient journey reorganization in a more convenient way for understanding and browsing, etc. In this study, we present a probabilistic model-based approach to model and cluster medical inpatient journeys. Specifically, we exploit a Bayesian Hidden Markov Model based approach to transform medical inpatient journeys into a probabilistic space, which can be seen as a richer representation of inpatient journeys to be clustered. Then, using hierarchical clustering on the matrix of similarities, inpatient journeys can be clustered into different categories w.r.t their clinical and temporal characteristics. We evaluated the proposed approach on a real clinical data set pertaining to the unstable angina treatment process. The experimental results reveal that our method can identify and model latent treatment topics underlying in personalized inpatient journeys, and yield impressive clustering quality.

  2. Histological image segmentation using fast mean shift clustering method.

    Science.gov (United States)

    Wu, Geming; Zhao, Xinyan; Luo, Shuqian; Shi, Hongli

    2015-03-20

    Colour image segmentation is fundamental and critical for quantitative histological image analysis. The complexity of the microstructure and the approach to make histological images results in variable staining and illumination variations. And ultra-high resolution of histological images makes it is hard for image segmentation methods to achieve high-quality segmentation results and low computation cost at the same time. Mean Shift clustering approach is employed for histological image segmentation. Colour histological image is transformed from RGB to CIE L*a*b* colour space, and then a* and b* components are extracted as features. To speed up Mean Shift algorithm, the probability density distribution is estimated in feature space in advance and then the Mean Shift scheme is used to separate the feature space into different regions by finding the density peaks quickly. And an integral scheme is employed to reduce the computation cost of mean shift vector significantly. Finally image pixels are classified into clusters according to which region their features fall into in feature space. Numerical experiments are carried on liver fibrosis histological images. Experimental results demonstrate that Mean Shift clustering achieves more accurate results than k-means but is computational expensive, and the speed of the improved Mean Shift method is comparable to that of k-means while the accuracy of segmentation results is the same as that achieved using standard Mean Shift method. An effective and reliable histological image segmentation approach is proposed in this paper. It employs improved Mean Shift clustering, which is speed up by using probability density distribution estimation and the integral scheme.

  3. Cluster algebras bases on vertex operator algebras

    Czech Academy of Sciences Publication Activity Database

    Zuevsky, Alexander

    2016-01-01

    Roč. 30, 28-29 (2016), č. článku 1640030. ISSN 0217-9792 Institutional support: RVO:67985840 Keywords : cluster alegbras * vertex operator algebras * Riemann surfaces Subject RIV: BA - General Mathematics Impact factor: 0.736, year: 2016 http://www.worldscientific.com/doi/abs/10.1142/S0217979216400300

  4. Risk Probability Estimating Based on Clustering

    DEFF Research Database (Denmark)

    Chen, Yong; Jensen, Christian D.; Gray, Elizabeth

    2003-01-01

    biquitous computing environments are highly dynamic, with new unforeseen circumstances and constantly changing environments, which introduces new risks that cannot be assessed through traditional means of risk analysis. Mobile entities in a ubiquitous computing environment require the ability to ...... on the automatic clustering of defining features of the environment and the other entity, which helps avoid subjective judgments as much as possible....

  5. An Intelligent Clustering Based Methodology for Confusable ...

    African Journals Online (AJOL)

    Journal of the Nigerian Association of Mathematical Physics ... The system assigns patients with severity levels in all the clusters. ... The system compares favorably with diagnosis arrived at by experienced physicians and also provides patients' level of severity in each confusable disease and the degree of confusability of ...

  6. Semi-supervised weighted kernel clustering based on gravitational search for fault diagnosis.

    Science.gov (United States)

    Li, Chaoshun; Zhou, Jianzhong

    2014-09-01

    Supervised learning method, like support vector machine (SVM), has been widely applied in diagnosing known faults, however this kind of method fails to work correctly when new or unknown fault occurs. Traditional unsupervised kernel clustering can be used for unknown fault diagnosis, but it could not make use of the historical classification information to improve diagnosis accuracy. In this paper, a semi-supervised kernel clustering model is designed to diagnose known and unknown faults. At first, a novel semi-supervised weighted kernel clustering algorithm based on gravitational search (SWKC-GS) is proposed for clustering of dataset composed of labeled and unlabeled fault samples. The clustering model of SWKC-GS is defined based on wrong classification rate of labeled samples and fuzzy clustering index on the whole dataset. Gravitational search algorithm (GSA) is used to solve the clustering model, while centers of clusters, feature weights and parameter of kernel function are selected as optimization variables. And then, new fault samples are identified and diagnosed by calculating the weighted kernel distance between them and the fault cluster centers. If the fault samples are unknown, they will be added in historical dataset and the SWKC-GS is used to partition the mixed dataset and update the clustering results for diagnosing new fault. In experiments, the proposed method has been applied in fault diagnosis for rotatory bearing, while SWKC-GS has been compared not only with traditional clustering methods, but also with SVM and neural network, for known fault diagnosis. In addition, the proposed method has also been applied in unknown fault diagnosis. The results have shown effectiveness of the proposed method in achieving expected diagnosis accuracy for both known and unknown faults of rotatory bearing. Copyright © 2014 ISA. Published by Elsevier Ltd. All rights reserved.

  7. Development of a data-processing method based on Bayesian k-means clustering to discriminate aneugens and clastogens in a high-content micronucleus assay.

    Science.gov (United States)

    Huang, Z H; Li, N; Rao, K F; Liu, C T; Huang, Y; Ma, M; Wang, Z J

    2018-03-01

    Genotoxicants can be identified as aneugens and clastogens through a micronucleus (MN) assay. The current high-content screening-based MN assays usually discriminate an aneugen from a clastogen based on only one parameter, such as the MN size, intensity, or morphology, which yields low accuracies (70-84%) because each of these parameters may contribute to the results. Therefore, the development of an algorithm that can synthesize high-dimensionality data to attain comparative results is important. To improve the automation and accuracy of detection using the current parameter-based mode of action (MoA), the MN MoA signatures of 20 chemicals were systematically recruited in this study to develop an algorithm. The results of the algorithm showed very good agreement (93.58%) between the prediction and reality, indicating that the proposed algorithm is a validated analytical platform for the rapid and objective acquisition of genotoxic MoA messages.

  8. A fast density-based clustering algorithm for real-time Internet of Things stream.

    Science.gov (United States)

    Amini, Amineh; Saboohi, Hadi; Wah, Teh Ying; Herawan, Tutut

    2014-01-01

    Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based method is a prominent class in clustering data streams. It has the ability to detect arbitrary shape clusters, to handle outlier, and it does not need the number of clusters in advance. Therefore, density-based clustering algorithm is a proper choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has fast processing time to be applicable in real-time application of IoT devices. Experimental results show that the proposed approach obtains high quality results with low computation time on real and synthetic datasets.

  9. A Fast Density-Based Clustering Algorithm for Real-Time Internet of Things Stream

    Directory of Open Access Journals (Sweden)

    Amineh Amini

    2014-01-01

    Full Text Available Data streams are continuously generated over time from Internet of Things (IoT devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based method is a prominent class in clustering data streams. It has the ability to detect arbitrary shape clusters, to handle outlier, and it does not need the number of clusters in advance. Therefore, density-based clustering algorithm is a proper choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has fast processing time to be applicable in real-time application of IoT devices. Experimental results show that the proposed approach obtains high quality results with low computation time on real and synthetic datasets.

  10. A New Soft Computing Method for K-Harmonic Means Clustering.

    Science.gov (United States)

    Yeh, Wei-Chang; Jiang, Yunzhi; Chen, Yee-Fen; Chen, Zhe

    2016-01-01

    The K-harmonic means clustering algorithm (KHM) is a new clustering method used to group data such that the sum of the harmonic averages of the distances between each entity and all cluster centroids is minimized. Because it is less sensitive to initialization than K-means (KM), many researchers have recently been attracted to studying KHM. In this study, the proposed iSSO-KHM is based on an improved simplified swarm optimization (iSSO) and integrates a variable neighborhood search (VNS) for KHM clustering. As evidence of the utility of the proposed iSSO-KHM, we present extensive computational results on eight benchmark problems. From the computational results, the comparison appears to support the superiority of the proposed iSSO-KHM over previously developed algorithms for all experiments in the literature.

  11. A Flocking Based algorithm for Document Clustering Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Cui, Xiaohui [ORNL; Gao, Jinzhu [ORNL; Potok, Thomas E [ORNL

    2006-01-01

    Social animals or insects in nature often exhibit a form of emergent collective behavior known as flocking. In this paper, we present a novel Flocking based approach for document clustering analysis. Our Flocking clustering algorithm uses stochastic and heuristic principles discovered from observing bird flocks or fish schools. Unlike other partition clustering algorithm such as K-means, the Flocking based algorithm does not require initial partitional seeds. The algorithm generates a clustering of a given set of data through the embedding of the high-dimensional data items on a two-dimensional grid for easy clustering result retrieval and visualization. Inspired by the self-organized behavior of bird flocks, we represent each document object with a flock boid. The simple local rules followed by each flock boid result in the entire document flock generating complex global behaviors, which eventually result in a clustering of the documents. We evaluate the efficiency of our algorithm with both a synthetic dataset and a real document collection that includes 100 news articles collected from the Internet. Our results show that the Flocking clustering algorithm achieves better performance compared to the K- means and the Ant clustering algorithm for real document clustering.

  12. Application of AI methods in the clustering of architecture interior forms

    Directory of Open Access Journals (Sweden)

    Maryam Banaei

    2017-09-01

    Full Text Available Form or shape is one of the main aspects of architecture design. A gap exists in scientific studies on categorizing different architecture interior forms according to design. This paper presents a methodology for categorizing interior forms of built places. The main innovation of this study was to evaluate the architecture interior forms of real built places as a base for any analysis on form. We proposed a clustering method by selecting 343 images of living rooms from residential places according to their history and interior design style. We labeled all the images in AutoCAD software depending on form features. The labeling results showed that images had 1104 distinct form features, including sloped, vertical and horizontal linear solids, and edges. Regarding the high dimension of data, we used Graphical Clustering Toolkit software for clustering, which involved the use of correlation coefficients and internal similarity among clusters. The clustering analysis grouped all the images into 25 clusters with the highest internal and the lowest external similarities. The descriptive features of each cluster could show its formal characteristics.

  13. Clustering common bean mutants based on heterotic groupings ...

    African Journals Online (AJOL)

    The objective of this study was to cluster bean mutants from a bean mutation breeding programme, based on heterotic groupings. This was achieved by genotyping 16 bean genotypes, using 21 Simple Sequence Repeats (SSR) bean markers. From the results, three different clusters A, B and C, were obtained suggesting ...

  14. A Hybrid Image Filtering Method for Computer-Aided Detection of Microcalcification Clusters in Mammograms

    Directory of Open Access Journals (Sweden)

    Xiaoyong Zhang

    2013-01-01

    Full Text Available The presence of microcalcification clusters (MCs in mammogram is a major indicator of breast cancer. Detection of an MC is one of the key issues for breast cancer control. In this paper, we present a highly accurate method based on a morphological image processing and wavelet transform technique to detect the MCs in mammograms. The microcalcifications are firstly enhanced by using multistructure elements morphological processing. Then, the candidates of microcalcifications are refined by a multilevel wavelet reconstruction approach. Finally, MCs are detected based on their distributions feature. Experiments are performed on 138 clinical mammograms. The proposed method is capable of detecting 92.9% of true microcalcification clusters with an average of 0.08 false microcalcification clusters detected per image.

  15. Reconstruction of a digital core containing clay minerals based on a clustering algorithm

    Science.gov (United States)

    He, Yanlong; Pu, Chunsheng; Jing, Cheng; Gu, Xiaoyu; Chen, Qingdong; Liu, Hongzhi; Khan, Nasir; Dong, Qiaoling

    2017-10-01

    It is difficult to obtain a core sample and information for digital core reconstruction of mature sandstone reservoirs around the world, especially for an unconsolidated sandstone reservoir. Meanwhile, reconstruction and division of clay minerals play a vital role in the reconstruction of the digital cores, although the two-dimensional data-based reconstruction methods are specifically applicable as the microstructure reservoir simulation methods for the sandstone reservoir. However, reconstruction of clay minerals is still challenging from a research viewpoint for the better reconstruction of various clay minerals in the digital cores. In the present work, the content of clay minerals was considered on the basis of two-dimensional information about the reservoir. After application of the hybrid method, and compared with the model reconstructed by the process-based method, the digital core containing clay clusters without the labels of the clusters' number, size, and texture were the output. The statistics and geometry of the reconstruction model were similar to the reference model. In addition, the Hoshen-Kopelman algorithm was used to label various connected unclassified clay clusters in the initial model and then the number and size of clay clusters were recorded. At the same time, the K -means clustering algorithm was applied to divide the labeled, large connecting clusters into smaller clusters on the basis of difference in the clusters' characteristics. According to the clay minerals' characteristics, such as types, textures, and distributions, the digital core containing clay minerals was reconstructed by means of the clustering algorithm and the clay clusters' structure judgment. The distributions and textures of the clay minerals of the digital core were reasonable. The clustering algorithm improved the digital core reconstruction and provided an alternative method for the simulation of different clay minerals in the digital cores.

  16. Reconstruction of a digital core containing clay minerals based on a clustering algorithm.

    Science.gov (United States)

    He, Yanlong; Pu, Chunsheng; Jing, Cheng; Gu, Xiaoyu; Chen, Qingdong; Liu, Hongzhi; Khan, Nasir; Dong, Qiaoling

    2017-10-01

    It is difficult to obtain a core sample and information for digital core reconstruction of mature sandstone reservoirs around the world, especially for an unconsolidated sandstone reservoir. Meanwhile, reconstruction and division of clay minerals play a vital role in the reconstruction of the digital cores, although the two-dimensional data-based reconstruction methods are specifically applicable as the microstructure reservoir simulation methods for the sandstone reservoir. However, reconstruction of clay minerals is still challenging from a research viewpoint for the better reconstruction of various clay minerals in the digital cores. In the present work, the content of clay minerals was considered on the basis of two-dimensional information about the reservoir. After application of the hybrid method, and compared with the model reconstructed by the process-based method, the digital core containing clay clusters without the labels of the clusters' number, size, and texture were the output. The statistics and geometry of the reconstruction model were similar to the reference model. In addition, the Hoshen-Kopelman algorithm was used to label various connected unclassified clay clusters in the initial model and then the number and size of clay clusters were recorded. At the same time, the K-means clustering algorithm was applied to divide the labeled, large connecting clusters into smaller clusters on the basis of difference in the clusters' characteristics. According to the clay minerals' characteristics, such as types, textures, and distributions, the digital core containing clay minerals was reconstructed by means of the clustering algorithm and the clay clusters' structure judgment. The distributions and textures of the clay minerals of the digital core were reasonable. The clustering algorithm improved the digital core reconstruction and provided an alternative method for the simulation of different clay minerals in the digital cores.

  17. Fuzzy Rules for Ant Based Clustering Algorithm

    Directory of Open Access Journals (Sweden)

    Amira Hamdi

    2016-01-01

    Full Text Available This paper provides a new intelligent technique for semisupervised data clustering problem that combines the Ant System (AS algorithm with the fuzzy c-means (FCM clustering algorithm. Our proposed approach, called F-ASClass algorithm, is a distributed algorithm inspired by foraging behavior observed in ant colonyT. The ability of ants to find the shortest path forms the basis of our proposed approach. In the first step, several colonies of cooperating entities, called artificial ants, are used to find shortest paths in a complete graph that we called graph-data. The number of colonies used in F-ASClass is equal to the number of clusters in dataset. Hence, the partition matrix of dataset founded by artificial ants is given in the second step, to the fuzzy c-means technique in order to assign unclassified objects generated in the first step. The proposed approach is tested on artificial and real datasets, and its performance is compared with those of K-means, K-medoid, and FCM algorithms. Experimental section shows that F-ASClass performs better according to the error rate classification, accuracy, and separation index.

  18. Applying Clustering Methods in Drawing Maps of Science: Case Study of the Map For Urban Management Science

    Directory of Open Access Journals (Sweden)

    Mohammad Abuei Ardakan

    2010-04-01

    Full Text Available The present paper offers a basic introduction to data clustering and demonstrates the application of clustering methods in drawing maps of science. All approaches towards classification and clustering of information are briefly discussed. Their application to the process of visualization of conceptual information and drawing of science maps are illustrated by reviewing similar researches in this field. By implementing aggregated hierarchical clustering algorithm, which is an algorithm based on complete-link method, the map for urban management science as an emerging, interdisciplinary scientific field is analyzed and reviewed.

  19. Statistical analysis of two-dimensional cluster structures composed of ferromagnetic particles based on a flexible chain model.

    Science.gov (United States)

    Morimoto, Hisao; Maekawa, Toru; Matsumoto, Yoichiro

    2003-12-01

    We investigate two-dimensional cluster structures composed of ferromagnetic colloidal particles, based on a flexible chain model, by the configurational-bias Monte Carlo method. We clarify the dependence of the probabilities of the creation of different types of clusters on the dipole-dipole interactive energy and the cluster size.

  20. A survey on the taxonomy of cluster-based routing protocols for homogeneous wireless sensor networks.

    Science.gov (United States)

    Naeimi, Soroush; Ghafghazi, Hamidreza; Chow, Chee-Onn; Ishii, Hiroshi

    2012-01-01

    The past few years have witnessed increased interest among researchers in cluster-based protocols for homogeneous networks because of their better scalability and higher energy efficiency than other routing protocols. Given the limited capabilities of sensor nodes in terms of energy resources, processing and communication range, the cluster-based protocols should be compatible with these constraints in either the setup state or steady data transmission state. With focus on these constraints, we classify routing protocols according to their objectives and methods towards addressing the shortcomings of clustering process on each stage of cluster head selection, cluster formation, data aggregation and data communication. We summarize the techniques and methods used in these categories, while the weakness and strength of each protocol is pointed out in details. Furthermore, taxonomy of the protocols in each phase is given to provide a deeper understanding of current clustering approaches. Ultimately based on the existing research, a summary of the issues and solutions of the attributes and characteristics of clustering approaches and some open research areas in cluster-based routing protocols that can be further pursued are provided.

  1. A Survey on the Taxonomy of Cluster-Based Routing Protocols for Homogeneous Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Hiroshi Ishii

    2012-05-01

    Full Text Available The past few years have witnessed increased interest among researchers in cluster-based protocols for homogeneous networks because of their better scalability and higher energy efficiency than other routing protocols. Given the limited capabilities of sensor nodes in terms of energy resources, processing and communication range, the cluster-based protocols should be compatible with these constraints in either the setup state or steady data transmission state. With focus on these constraints, we classify routing protocols according to their objectives and methods towards addressing the shortcomings of clustering process on each stage of cluster head selection, cluster formation, data aggregation and data communication. We summarize the techniques and methods used in these categories, while the weakness and strength of each protocol is pointed out in details. Furthermore, taxonomy of the protocols in each phase is given to provide a deeper understanding of current clustering approaches. Ultimately based on the existing research, a summary of the issues and solutions of the attributes and characteristics of clustering approaches and some open research areas in cluster-based routing protocols that can be further pursued are provided.

  2. Fuzzy Logic Method for Enhancement Fault-Tolerant of Cluster Head in Wireless Sensor Networks Clustering

    Directory of Open Access Journals (Sweden)

    Farnaz Pakdeland

    2016-08-01

    Full Text Available Wireless sensor network is comprised of several sensor nodes. The retaining factors influence the network operation. In the clustering structure the cluster head failure can cause loss of information.The aim of this paper is to increase tolerance error in the cluster head node. At first, paying attention to the producing balance in the density of the cluster cause to postpone the death time of the cluster head node and lessen the collision due to the lack of the energy balance in clusters. The innovation in this stage is formed by using two fuzzy logic systems. One in the phase of evaluation of the cluster head chance, and the other in the phase of producing balance and the nodes migration to the qualified clusters to increase balance, Then the focus is on recognizing and repairing the cluster head fault.

  3. Cluster-based spectrum sensing for cognitive radios with imperfect channel to cluster-head

    KAUST Repository

    Ben Ghorbel, Mahdi

    2012-04-01

    Spectrum sensing is considered as the first and main step for cognitive radio systems to achieve an efficient use of spectrum. Cooperation and clustering among cognitive radio users are two techniques that can be employed with spectrum sensing in order to improve the sensing performance by reducing miss-detection and false alarm. In this paper, within the framework of a clustering-based cooperative spectrum sensing scheme, we study the effect of errors in transmitting the local decisions from the secondary users to the cluster heads (or the fusion center), while considering non-identical channel conditions between the secondary users. Closed-form expressions for the global probabilities of detection and false alarm at the cluster head are derived. © 2012 IEEE.

  4. Neural network based cluster creation in the ATLAS silicon Pixel Detector

    CERN Document Server

    Perez Cavalcanti, T; The ATLAS collaboration

    2012-01-01

    The hit signals read out from pixels on planar semi-conductor sensors are grouped into clusters, to reconstruct the location where a charged particle passed through. The resolution of the individual pixel sizes can be improved significantly using the information from the cluster of adjacent pixels. Such analog cluster creation techniques have been used by the ATLAS experiment for many years giving an excellent performance. However, in dense environments, such as those inside high-energy jets, is likely that the charge deposited by two or more close-by tracks merges into one single cluster. A new pattern recognition algorithm based on neural network methods has been developed for the ATLAS Pixel Detector. This can identify the shared clusters, split them if necessary, and estimate the positions of all particles traversing the cluster. The algorithm significantly reduces ambiguities in the assignment of pixel detector measurements to tracks within jets, and improves the positional accuracy with respect to stand...

  5. Unsupervised Clustering Method for Complexity Reduction of Terrestrial Lidar Data in Marshes

    Directory of Open Access Journals (Sweden)

    Chuyen Nguyen

    2018-01-01

    Full Text Available Accurate characterization of marsh elevation and landcover evolution is important for coastal management and conservation. This research proposes a novel unsupervised clustering method specifically developed for segmenting dense terrestrial laser scanning (TLS data in coastal marsh environments. The framework implements unsupervised clustering with the well-known K-means algorithm by applying an optimization to determine the “k” clusters. The fundamental idea behind this novel framework is the application of multi-scale voxel representation of 3D space to create a set of features that characterizes the local complexity and geometry of the terrain. A combination of point- and voxel-generated features are utilized to segment 3D point clouds into homogenous groups in order to study surface changes and vegetation cover. Results suggest that the combination of point and voxel features represent the dataset well. The developed method compresses millions of 3D points representing the complex marsh environment into eight distinct clusters representing different landcover: tidal flat, mangrove, low marsh to high marsh, upland, and power lines. A quantitative assessment of the automated delineation of the tidal flat areas shows acceptable results considering the proposed method is unsupervised with no training data. Clustering results based on K-means are also compared to results based on the Self Organizing Map (SOM clustering algorithm. Results demonstrate that the developed multi-scale voxelization approach and representative feature set are transferrable to other clustering algorithms, thereby providing an unsupervised framework for intelligent scene segmentation of TLS point cloud data in marshes.

  6. Fuzzy clustering-based segmented attenuation correction in whole-body PET

    CERN Document Server

    Zaidi, H; Boudraa, A; Slosman, DO

    2001-01-01

    Segmented-based attenuation correction is now a widely accepted technique to reduce noise contribution of measured attenuation correction. In this paper, we present a new method for segmenting transmission images in positron emission tomography. This reduces the noise on the correction maps while still correcting for differing attenuation coefficients of specific tissues. Based on the Fuzzy C-Means (FCM) algorithm, the method segments the PET transmission images into a given number of clusters to extract specific areas of differing attenuation such as air, the lungs and soft tissue, preceded by a median filtering procedure. The reconstructed transmission image voxels are therefore segmented into populations of uniform attenuation based on the human anatomy. The clustering procedure starts with an over-specified number of clusters followed by a merging process to group clusters with similar properties and remove some undesired substructures using anatomical knowledge. The method is unsupervised, adaptive and a...

  7. Improving Tensor Based Recommenders with Clustering

    DEFF Research Database (Denmark)

    Leginus, Martin; Dolog, Peter; Zemaitis, Valdas

    2012-01-01

    Social tagging systems (STS) model three types of entities (i.e. tag-user-item) and relationships between them are encoded into a 3-order tensor. Latent relationships and patterns can be discovered by applying tensor factorization techniques like Higher Order Singular Value Decomposition (HOSVD...... of the recommendations and execution time are improved and memory requirements are decreased. The clustering is motivated by the fact that many tags in a tag space are semantically similar thus the tags can be grouped. Finally, promising experimental results are presented...

  8. Method of removing crud deposited on fuel element clusters

    International Nuclear Information System (INIS)

    Yokota, Tokunobu; Yashima, Akira; Tajima, Jun-ichiro.

    1982-01-01

    Purpose: To enable easy elimination of claddings deposited on the surface of fuel element. Method: An operator manipulates a pole from above a platform, engages the longitudinal flange of the cover to the opening at the upper end of a channel box and starts up a suction pump. The suction amount of the pump is set such that water flow becomes within the channel box at greater flow rate than the operational flow rate in the channel box of the fuel element clusters during reactor operation. This enables to remove crud deposited on the surface of individual fuel elements with ease and rapidly without detaching the channel box. (Moriyama, K.)

  9. Unsupervised Performance Evaluation Strategy for Bridge Superstructure Based on Fuzzy Clustering and Field Data

    Directory of Open Access Journals (Sweden)

    Yubo Jiao

    2013-01-01

    Full Text Available Performance evaluation of a bridge is critical for determining the optimal maintenance strategy. An unsupervised bridge superstructure state assessment method is proposed in this paper based on fuzzy clustering and bridge field measured data. Firstly, the evaluation index system of bridge is constructed. Secondly, a certain number of bridge health monitoring data are selected as clustering samples to obtain the fuzzy similarity matrix and fuzzy equivalent matrix. Finally, different thresholds are selected to form dynamic clustering maps and determine the best classification based on statistic analysis. The clustering result is regarded as a sample base, and the bridge state can be evaluated by calculating the fuzzy nearness between the unknown bridge state data and the sample base. Nanping Bridge in Jilin Province is selected as the engineering project to verify the effectiveness of the proposed method.

  10. Unsupervised performance evaluation strategy for bridge superstructure based on fuzzy clustering and field data.

    Science.gov (United States)

    Jiao, Yubo; Liu, Hanbing; Zhang, Peng; Wang, Xianqiang; Wei, Haibin

    2013-01-01

    Performance evaluation of a bridge is critical for determining the optimal maintenance strategy. An unsupervised bridge superstructure state assessment method is proposed in this paper based on fuzzy clustering and bridge field measured data. Firstly, the evaluation index system of bridge is constructed. Secondly, a certain number of bridge health monitoring data are selected as clustering samples to obtain the fuzzy similarity matrix and fuzzy equivalent matrix. Finally, different thresholds are selected to form dynamic clustering maps and determine the best classification based on statistic analysis. The clustering result is regarded as a sample base, and the bridge state can be evaluated by calculating the fuzzy nearness between the unknown bridge state data and the sample base. Nanping Bridge in Jilin Province is selected as the engineering project to verify the effectiveness of the proposed method.

  11. Correction for dispersion and Coulombic interactions in molecular clusters with density functional derived methods: Application to polycyclic aromatic hydrocarbon clusters

    Science.gov (United States)

    Rapacioli, Mathias; Spiegelman, Fernand; Talbi, Dahbia; Mineva, Tzonka; Goursot, Annick; Heine, Thomas; Seifert, Gotthard

    2009-06-01

    The density functional based tight binding (DFTB) is a semiempirical method derived from the density functional theory (DFT). It inherits therefore its problems in treating van der Waals clusters. A major error comes from dispersion forces, which are poorly described by commonly used DFT functionals, but which can be accounted for by an a posteriori treatment DFT-D. This correction is used for DFTB. The self-consistent charge (SCC) DFTB is built on Mulliken charges which are known to give a poor representation of Coulombic intermolecular potential. We propose to calculate this potential using the class IV/charge model 3 definition of atomic charges. The self-consistent calculation of these charges is introduced in the SCC procedure and corresponding nuclear forces are derived. Benzene dimer is then studied as a benchmark system with this corrected DFTB (c-DFTB-D) method, but also, for comparison, with the DFT-D. Both methods give similar results and are in agreement with references calculations (CCSD(T) and symmetry adapted perturbation theory) calculations. As a first application, pyrene dimer is studied with the c-DFTB-D and DFT-D methods. For coronene clusters, only the c-DFTB-D approach is used, which finds the sandwich configurations to be more stable than the T-shaped ones.

  12. The Quintuplet cluster - A young massive cluster study based on proper motion membership

    Science.gov (United States)

    Hußmann, Benjamin

    2014-01-01

    Young massive clusters define the high mass range of current clustered star formation and are frequently found in starburst and interacting galaxies. As - with the exception of the nearest galaxies within the local group - extragalactic clusters can not be resolved into individual stars, the few young massive clusters in the Milky Way and the Magellanic Clouds might serve as templates for unresolved young massive clusters in more distant galaxies. Due to their high masses, these clusters sample the full range of stellar masses. In combination with the small or negligible spreads in age or metallicity of their stellar populations, this makes these object unique laboratories to study stellar evolution, especially in the high mass range.Furthermore, they allow to probe the initial mass function, which describes the distribution of masses of a stellar population at its birth, in its entirety. The Quintuplet cluster is one of three known young massive clusters residing in the central molecular zone and is located at a projected distance of 30 pc from the Galactic centre. Because of the rather extreme conditions in this region, a potential dependence of the outcome of the star formation process on the environmental conditions under which the star formation event takes place might leave its imprint in the stellar mass function. As the Quintuplet cluster is lacking a dense core and shows a somewhat dispersed appearance, it is crucial to effectively distinguish between cluster stars and the rich population of stars from the Galactic field along the line of sight to the Galactic centre in order to measure its present-day mass function. In this thesis, a clean sample of cluster stars is derived based on the common bulk proper motion of the cluster with respect to the Galactic field and a subsequent colour selection. The diffraction limited resolution of multi-epoch near-infrared imaging observations obtained at the ESO Very Large Telescope with adaptive optics correction

  13. Unsupervised Learning —A Novel Clustering Method for Rolling Bearing Faults Identification

    Science.gov (United States)

    Kai, Li; Bo, Luo; Tao, Ma; Xuefeng, Yang; Guangming, Wang

    2017-12-01

    To promptly process the massive fault data and automatically provide accurate diagnosis results, numerous studies have been conducted on intelligent fault diagnosis of rolling bearing. Among these studies, such as artificial neural networks, support vector machines, decision trees and other supervised learning methods are used commonly. These methods can detect the failure of rolling bearing effectively, but to achieve better detection results, it often requires a lot of training samples. Based on above, a novel clustering method is proposed in this paper. This novel method is able to find the correct number of clusters automatically the effectiveness of the proposed method is validated using datasets from rolling element bearings. The diagnosis results show that the proposed method can accurately detect the fault types of small samples. Meanwhile, the diagnosis results are also relative high accuracy even for massive samples.

  14. Communication: Time-dependent optimized coupled-cluster method for multielectron dynamics

    Science.gov (United States)

    Sato, Takeshi; Pathak, Himadri; Orimo, Yuki; Ishikawa, Kenichi L.

    2018-02-01

    Time-dependent coupled-cluster method with time-varying orbital functions, called time-dependent optimized coupled-cluster (TD-OCC) method, is formulated for multielectron dynamics in an intense laser field. We have successfully derived the equations of motion for CC amplitudes and orthonormal orbital functions based on the real action functional, and implemented the method including double excitations (TD-OCCD) and double and triple excitations (TD-OCCDT) within the optimized active orbitals. The present method is size extensive and gauge invariant, a polynomial cost-scaling alternative to the time-dependent multiconfiguration self-consistent-field method. The first application of the TD-OCC method of intense-laser driven correlated electron dynamics in Ar atom is reported.

  15. Local Community Detection Algorithm Based on Minimal Cluster

    Directory of Open Access Journals (Sweden)

    Yong Zhou

    2016-01-01

    Full Text Available In order to discover the structure of local community more effectively, this paper puts forward a new local community detection algorithm based on minimal cluster. Most of the local community detection algorithms begin from one node. The agglomeration ability of a single node must be less than multiple nodes, so the beginning of the community extension of the algorithm in this paper is no longer from the initial node only but from a node cluster containing this initial node and nodes in the cluster are relatively densely connected with each other. The algorithm mainly includes two phases. First it detects the minimal cluster and then finds the local community extended from the minimal cluster. Experimental results show that the quality of the local community detected by our algorithm is much better than other algorithms no matter in real networks or in simulated networks.

  16. Clustering economies based on multiple criteria decision making techniques

    Directory of Open Access Journals (Sweden)

    Mansour Momeni

    2011-10-01

    Full Text Available One of the primary concerns on many countries is to determine different important factors affecting economic growth. In this paper, we study some factors such as unemployment rate, inflation ratio, population growth, average annual income, etc to cluster different countries. The proposed model of this paper uses analytical hierarchy process (AHP to prioritize the criteria and then uses a K-mean technique to cluster 59 countries based on the ranked criteria into four groups. The first group includes countries with high standards such as Germany and Japan. In the second cluster, there are some developing countries with relatively good economic growth such as Saudi Arabia and Iran. The third cluster belongs to countries with faster rates of growth compared with the countries located in the second group such as China, India and Mexico. Finally, the fourth cluster includes countries with relatively very low rates of growth such as Jordan, Mali, Niger, etc.

  17. Improved Density Based Spatial Clustering of Applications of Noise Clustering Algorithm for Knowledge Discovery in Spatial Data

    Directory of Open Access Journals (Sweden)

    Arvind Sharma

    2016-01-01

    Full Text Available There are many techniques available in the field of data mining and its subfield spatial data mining is to understand relationships between data objects. Data objects related with spatial features are called spatial databases. These relationships can be used for prediction and trend detection between spatial and nonspatial objects for social and scientific reasons. A huge data set may be collected from different sources as satellite images, X-rays, medical images, traffic cameras, and GIS system. To handle this large amount of data and set relationship between them in a certain manner with certain results is our primary purpose of this paper. This paper gives a complete process to understand how spatial data is different from other kinds of data sets and how it is refined to apply to get useful results and set trends to predict geographic information system and spatial data mining process. In this paper a new improved algorithm for clustering is designed because role of clustering is very indispensable in spatial data mining process. Clustering methods are useful in various fields of human life such as GIS (Geographic Information System, GPS (Global Positioning System, weather forecasting, air traffic controller, water treatment, area selection, cost estimation, planning of rural and urban areas, remote sensing, and VLSI designing. This paper presents study of various clustering methods and algorithms and an improved algorithm of DBSCAN as IDBSCAN (Improved Density Based Spatial Clustering of Application of Noise. The algorithm is designed by addition of some important attributes which are responsible for generation of better clusters from existing data sets in comparison of other methods.

  18. DLTAP: A Network-efficient Scheduling Method for Distributed Deep Learning Workload in Containerized Cluster Environment

    Directory of Open Access Journals (Sweden)

    Qiao Wei

    2017-01-01

    Full Text Available Deep neural networks (DNNs have recently yielded strong results on a range of applications. Training these DNNs using a cluster of commodity machines is a promising approach since training is time consuming and compute-intensive. Furthermore, putting DNN tasks into containers of clusters would enable broader and easier deployment of DNN-based algorithms. Toward this end, this paper addresses the problem of scheduling DNN tasks in the containerized cluster environment. Efficiently scheduling data-parallel computation jobs like DNN over containerized clusters is critical for job performance, system throughput, and resource utilization. It becomes even more challenging with the complex workloads. We propose a scheduling method called Deep Learning Task Allocation Priority (DLTAP which performs scheduling decisions in a distributed manner, and each of scheduling decisions takes aggregation degree of parameter sever task and worker task into account, in particularly, to reduce cross-node network transmission traffic and, correspondingly, decrease the DNN training time. We evaluate the DLTAP scheduling method using a state-of-the-art distributed DNN training framework on 3 benchmarks. The results show that the proposed method can averagely reduce 12% cross-node network traffic, and decrease the DNN training time even with the cluster of low-end servers.

  19. AN EFFICIENT INITIALIZATION METHOD FOR K-MEANS CLUSTERING OF HYPERSPECTRAL DATA

    Directory of Open Access Journals (Sweden)

    A. Alizade Naeini

    2014-10-01

    Full Text Available K-means is definitely the most frequently used partitional clustering algorithm in the remote sensing community. Unfortunately due to its gradient decent nature, this algorithm is highly sensitive to the initial placement of cluster centers. This problem deteriorates for the high-dimensional data such as hyperspectral remotely sensed imagery. To tackle this problem, in this paper, the spectral signatures of the endmembers in the image scene are extracted and used as the initial positions of the cluster centers. For this purpose, in the first step, A Neyman–Pearson detection theory based eigen-thresholding method (i.e., the HFC method has been employed to estimate the number of endmembers in the image. Afterwards, the spectral signatures of the endmembers are obtained using the Minimum Volume Enclosing Simplex (MVES algorithm. Eventually, these spectral signatures are used to initialize the k-means clustering algorithm. The proposed method is implemented on a hyperspectral dataset acquired by ROSIS sensor with 103 spectral bands over the Pavia University campus, Italy. For comparative evaluation, two other commonly used initialization methods (i.e., Bradley & Fayyad (BF and Random methods are implemented and compared. The confusion matrix, overall accuracy and Kappa coefficient are employed to assess the methods’ performance. The evaluations demonstrate that the proposed solution outperforms the other initialization methods and can be applied for unsupervised classification of hyperspectral imagery for landcover mapping.

  20. A Cluster-based Approach Towards Detecting and Modeling Network Dictionary Attacks

    Directory of Open Access Journals (Sweden)

    A. Tajari Siahmarzkooh

    2016-12-01

    Full Text Available In this paper, we provide an approach to detect network dictionary attacks using a data set collected as flows based on which a clustered graph is resulted. These flows provide an aggregated view of the network traffic in which the exchanged packets in the network are considered so that more internally connected nodes would be clustered. We show that dictionary attacks could be detected through some parameters namely the number and the weight of clusters in time series and their evolution over the time. Additionally, the Markov model based on the average weight of clusters,will be also created. Finally, by means of our suggested model, we demonstrate that artificial clusters of the flows are created for normal and malicious traffic. The results of the proposed approach on CAIDA 2007 data set suggest a high accuracy for the model and, therefore, it provides a proper method for detecting the dictionary attack.

  1. An AK-LDMeans algorithm based on image clustering

    Science.gov (United States)

    Chen, Huimin; Li, Xingwei; Zhang, Yongbin; Chen, Nan

    2018-03-01

    Clustering is an effective analytical technique for handling unmarked data for value mining. Its ultimate goal is to mark unclassified data quickly and correctly. We use the roadmap for the current image processing as the experimental background. In this paper, we propose an AK-LDMeans algorithm to automatically lock the K value by designing the Kcost fold line, and then use the long-distance high-density method to select the clustering centers to further replace the traditional initial clustering center selection method, which further improves the efficiency and accuracy of the traditional K-Means Algorithm. And the experimental results are compared with the current clustering algorithm and the results are obtained. The algorithm can provide effective reference value in the fields of image processing, machine vision and data mining.

  2. ROC-based determination of the number of clusters for fMRI activation detection

    Science.gov (United States)

    Jahanian, Hesamoddin; Soltanian-Zadeh, Hamid; Hossein-Zadeh, Gholam A.; Siadat, Mohammad-Reza

    2004-05-01

    Fuzzy C-means (FCM), in spite of its potent advantages in exploratory analyze of functional magnetic resonance imaging (fMRI), suffers from limitations such as a priori determination of number of clusters, unknown statistical significance for the results, and instability of the results when it is applied on raw fMRI time series. Choosing different number of clusters, or thresholding the membership degree at different levels, lead to considerably different activation maps. However, research work for finding a standard index to determine the number of clusters has not yet succeeded. Using randomization, we developed a method to control false positive rate in FCM, which gives a meaningful statistical significance to the results. Making use of this novel method and an ROC-based cluster validity measure, we determined the optimal number of clusters. In this study, we applied the FCM on a feature space that takes the variability of hemodynamic response function into account (HRF-based feature space). The proposed method found the accurate number of clusters in simulated fMRI data. In addition, the proposed method generated excellent results for experimental fMRI data and showed a good reproducibility for determining the number of clusters.

  3. A Clustering-Based Automatic Transfer Function Design for Volume Visualization

    Directory of Open Access Journals (Sweden)

    Tianjin Zhang

    2016-01-01

    Full Text Available The two-dimensional transfer functions (TFs designed based on intensity-gradient magnitude (IGM histogram are effective tools for the visualization and exploration of 3D volume data. However, traditional design methods usually depend on multiple times of trial-and-error. We propose a novel method for the automatic generation of transfer functions by performing the affinity propagation (AP clustering algorithm on the IGM histogram. Compared with previous clustering algorithms that were employed in volume visualization, the AP clustering algorithm has much faster convergence speed and can achieve more accurate clustering results. In order to obtain meaningful clustering results, we introduce two similarity measurements: IGM similarity and spatial similarity. These two similarity measurements can effectively bring the voxels of the same tissue together and differentiate the voxels of different tissues so that the generated TFs can assign different optical properties to different tissues. Before performing the clustering algorithm on the IGM histogram, we propose to remove noisy voxels based on the spatial information of voxels. Our method does not require users to input the number of clusters, and the classification and visualization process is automatic and efficient. Experiments on various datasets demonstrate the effectiveness of the proposed method.

  4. Substructures in DAFT/FADA survey clusters based on XMM and optical data

    Science.gov (United States)

    Durret, F.; DAFT/FADA Team

    2014-07-01

    The DAFT/FADA survey was initiated to perform weak lensing tomography on a sample of 90 massive clusters in the redshift range [0.4,0.9] with HST imaging available. The complementary deep multiband imaging constitutes a high quality imaging data base for these clusters. In X-rays, we have analysed the XMM-Newton and/or Chandra data available for 32 clusters, and for 23 clusters we fit the X-ray emissivity with a beta-model and subtract it to search for substructures in the X-ray gas. This study was coupled with a dynamical analysis for the 18 clusters with at least 15 spectroscopic galaxy redshifts in the cluster range, based on a Serna & Gerbal (SG) analysis. We detected ten substructures in eight clusters by both methods (X-rays and SG). The percentage of mass included in substructures is found to be roughly constant with redshift, with values of 5-15%. Most of the substructures detected both in X-rays and with the SG method are found to be relatively recent infalls, probably at their first cluster pericenter approach.

  5. Recognition of genetically modified product based on affinity propagation clustering and terahertz spectroscopy

    Science.gov (United States)

    Liu, Jianjun; Kan, Jianquan

    2018-04-01

    In this paper, based on the terahertz spectrum, a new identification method of genetically modified material by support vector machine (SVM) based on affinity propagation clustering is proposed. This algorithm mainly uses affinity propagation clustering algorithm to make cluster analysis and labeling on unlabeled training samples, and in the iterative process, the existing SVM training data are continuously updated, when establishing the identification model, it does not need to manually label the training samples, thus, the error caused by the human labeled samples is reduced, and the identification accuracy of the model is greatly improved.

  6. Rough hypercuboid based supervised clustering of miRNAs.

    Science.gov (United States)

    Paul, Sushmita; Vera, Julio

    2015-07-01

    The microRNAs are small, endogenous non-coding RNAs found in plants, animals, and some viruses, which function in RNA silencing and post-transcriptional regulation of gene expression. It is suggested by various genome-wide studies that a substantial fraction of miRNA genes is likely to form clusters. The coherent expression of the miRNA clusters can then be used to classify samples according to the clinical outcome. In this regard, a new clustering algorithm, termed as rough hypercuboid based supervised attribute clustering (RH-SAC), is proposed to find such groups of miRNAs. The proposed algorithm is based on the theory of rough set, which directly incorporates the information of sample categories into the miRNA clustering process, generating a supervised clustering algorithm for miRNAs. The effectiveness of the new approach is demonstrated on several publicly available miRNA expression data sets using support vector machine. The so-called B.632+ bootstrap error estimate is used to minimize the variability and biasedness of the derived results. The association of the miRNA clusters to various biological pathways is also shown by doing pathway enrichment analysis.

  7. Simple method to calculate percolation, Ising and Potts clusters

    International Nuclear Information System (INIS)

    Tsallis, C.

    1981-01-01

    A procedure ('break-collapse method') is introduced which considerably simplifies the calculation of two - or multirooted clusters like those commonly appearing in real space renormalization group (RG) treatments of bond-percolation, and pure and random Ising and Potts problems. The method is illustrated through two applications for the q-state Potts ferromagnet. The first of them concerns a RG calculation of the critical exponent ν for the isotropic square lattice: numerical consistence is obtained (particularly for q→0) with den Nijs conjecture. The second application is a compact reformulation of the standard star-triangle and duality transformations which provide the exact critical temperature for the anisotropic triangular and honeycomb lattices. (Author) [pt

  8. Profiling physical activity motivation based on self-determination theory: a cluster analysis approach.

    Science.gov (United States)

    Friederichs, Stijn Ah; Bolman, Catherine; Oenema, Anke; Lechner, Lilian

    2015-01-01

    In order to promote physical activity uptake and maintenance in individuals who do not comply with physical activity guidelines, it is important to increase our understanding of physical activity motivation among this group. The present study aimed to examine motivational profiles in a large sample of adults who do not comply with physical activity guidelines. The sample for this study consisted of 2473 individuals (31.4% male; age 44.6 ± 12.9). In order to generate motivational profiles based on motivational regulation, a cluster analysis was conducted. One-way analyses of variance were then used to compare the clusters in terms of demographics, physical activity level, motivation to be active and subjective experience while being active. Three motivational clusters were derived based on motivational regulation scores: a low motivation cluster, a controlled motivation cluster and an autonomous motivation cluster. These clusters differed significantly from each other with respect to physical activity behavior, motivation to be active and subjective experience while being active. Overall, the autonomous motivation cluster displayed more favorable characteristics compared to the other two clusters. The results of this study provide additional support for the importance of autonomous motivation in the context of physical activity behavior. The three derived clusters may be relevant in the context of physical activity interventions as individuals within the different clusters might benefit most from different intervention approaches. In addition, this study shows that cluster analysis is a useful method for differentiating between motivational profiles in large groups of individuals who do not comply with physical activity guidelines.

  9. Open-Source Sequence Clustering Methods Improve the State Of the Art.

    Science.gov (United States)

    Kopylova, Evguenia; Navas-Molina, Jose A; Mercier, Céline; Xu, Zhenjiang Zech; Mahé, Frédéric; He, Yan; Zhou, Hong-Wei; Rognes, Torbjørn; Caporaso, J Gregory; Knight, Rob

    2016-01-01

    Sequence clustering is a common early step in amplicon-based microbial community analysis, when raw sequencing reads are clustered into operational taxonomic units (OTUs) to reduce the run time of subsequent analysis steps. Here, we evaluated the performance of recently released state-of-the-art open-source clustering software products, namely, OTUCLUST, Swarm, SUMACLUST, and SortMeRNA, against current principal options (UCLUST and USEARCH) in QIIME, hierarchical clustering methods in mothur, and USEARCH's most recent clustering algorithm, UPARSE. All the latest open-source tools showed promising results, reporting up to 60% fewer spurious OTUs than UCLUST, indicating that the underlying clustering algorithm can vastly reduce the number of these derived OTUs. Furthermore, we observed that stringent quality filtering, such as is done in UPARSE, can cause a significant underestimation of species abundance and diversity, leading to incorrect biological results. Swarm, SUMACLUST, and SortMeRNA have been included in the QIIME 1.9.0 release. IMPORTANCE Massive collections of next-generation sequencing data call for fast, accurate, and easily accessible bioinformatics algorithms to perform sequence clustering. A comprehensive benchmark is presented, including open-source tools and the popular USEARCH suite. Simulated, mock, and environmental communities were used to analyze sensitivity, selectivity, species diversity (alpha and beta), and taxonomic composition. The results demonstrate that recent clustering algorithms can significantly improve accuracy and preserve estimated diversity without the application of aggressive filtering. Moreover, these tools are all open source, apply multiple levels of multithreading, and scale to the demands of modern next-generation sequencing data, which is essential for the analysis of massive multidisciplinary studies such as the Earth Microbiome Project (EMP) (J. A. Gilbert, J. K. Jansson, and R. Knight, BMC Biol 12:69, 2014, http

  10. A robust approach based on Weibull distribution for clustering gene expression data

    Directory of Open Access Journals (Sweden)

    Gong Binsheng

    2011-05-01

    Full Text Available Abstract Background Clustering is a widely used technique for analysis of gene expression data. Most clustering methods group genes based on the distances, while few methods group genes according to the similarities of the distributions of the gene expression levels. Furthermore, as the biological annotation resources accumulated, an increasing number of genes have been annotated into functional categories. As a result, evaluating the performance of clustering methods in terms of the functional consistency of the resulting clusters is of great interest. Results In this paper, we proposed the WDCM (Weibull Distribution-based Clustering Method, a robust approach for clustering gene expression data, in which the gene expressions of individual genes are considered as the random variables following unique Weibull distributions. Our WDCM is based on the concept that the genes with similar expression profiles have similar distribution parameters, and thus the genes are clustered via the Weibull distribution parameters. We used the WDCM to cluster three cancer gene expression data sets from the lung cancer, B-cell follicular lymphoma and bladder carcinoma and obtained well-clustered results. We compared the performance of WDCM with k-means and Self Organizing Map (SOM using functional annotation information given by the Gene Ontology (GO. The results showed that the functional annotation ratios of WDCM are higher than those of the other methods. We also utilized the external measure Adjusted Rand Index to validate the performance of the WDCM. The comparative results demonstrate that the WDCM provides the better clustering performance compared to k-means and SOM algorithms. The merit of the proposed WDCM is that it can be applied to cluster incomplete gene expression data without imputing the missing values. Moreover, the robustness of WDCM is also evaluated on the incomplete data sets. Conclusions The results demonstrate that our WDCM produces clusters

  11. Bilingual Cluster Based Models for Statistical Machine Translation

    Science.gov (United States)

    Yamamoto, Hirofumi; Sumita, Eiichiro

    We propose a domain specific model for statistical machine translation. It is well-known that domain specific language models perform well in automatic speech recognition. We show that domain specific language and translation models also benefit statistical machine translation. However, there are two problems with using domain specific models. The first is the data sparseness problem. We employ an adaptation technique to overcome this problem. The second issue is domain prediction. In order to perform adaptation, the domain must be provided, however in many cases, the domain is not known or changes dynamically. For these cases, not only the translation target sentence but also the domain must be predicted. This paper focuses on the domain prediction problem for statistical machine translation. In the proposed method, a bilingual training corpus, is automatically clustered into sub-corpora. Each sub-corpus is deemed to be a domain. The domain of a source sentence is predicted by using its similarity to the sub-corpora. The predicted domain (sub-corpus) specific language and translation models are then used for the translation decoding. This approach gave an improvement of 2.7 in BLEU score on the IWSLT05 Japanese to English evaluation corpus (improving the score from 52.4 to 55.1). This is a substantial gain and indicates the validity of the proposed bilingual cluster based models.

  12. Evaluation of sliding baseline methods for spatial estimation for cluster detection in the biosurveillance system

    Directory of Open Access Journals (Sweden)

    Leuze Michael

    2009-07-01

    Full Text Available Abstract Background The Centers for Disease Control and Prevention's (CDC's BioSense system provides near-real time situational awareness for public health monitoring through analysis of electronic health data. Determination of anomalous spatial and temporal disease clusters is a crucial part of the daily disease monitoring task. Our study focused on finding useful anomalies at manageable alert rates according to available BioSense data history. Methods The study dataset included more than 3 years of daily counts of military outpatient clinic visits for respiratory and rash syndrome groupings. We applied four spatial estimation methods in implementations of space-time scan statistics cross-checked in Matlab and C. We compared the utility of these methods according to the resultant background cluster rate (a false alarm surrogate and sensitivity to injected cluster signals. The comparison runs used a spatial resolution based on the facility zip code in the patient record and a finer resolution based on the residence zip code. Results Simple estimation methods that account for day-of-week (DOW data patterns yielded a clear advantage both in background cluster rate and in signal sensitivity. A 28-day baseline gave the most robust results for this estimation; the preferred baseline is long enough to remove daily fluctuations but short enough to reflect recent disease trends and data representation. Background cluster rates were lower for the rash syndrome counts than for the respiratory counts, likely because of seasonality and the large scale of the respiratory counts. Conclusion The spatial estimation method should be chosen according to characteristics of the selected data streams. In this dataset with strong day-of-week effects, the overall best detection performance was achieved using subregion averages over a 28-day baseline stratified by weekday or weekend/holiday behavior. Changing the estimation method for particular scenarios involving

  13. DSN Beowulf Cluster-Based VLBI Correlator

    Science.gov (United States)

    Rogstad, Stephen P.; Jongeling, Andre P.; Finley, Susan G.; White, Leslie A.; Lanyi, Gabor E.; Clark, John E.; Goodhart, Charles E.

    2009-01-01

    The NASA Deep Space Network (DSN) requires a broadband VLBI (very long baseline interferometry) correlator to process data routinely taken as part of the VLBI source Catalogue Maintenance and Enhancement task (CAT M&E) and the Time and Earth Motion Precision Observations task (TEMPO). The data provided by these measurements are a crucial ingredient in the formation of precision deep-space navigation models. In addition, a VLBI correlator is needed to provide support for other VLBI related activities for both internal and external customers. The JPL VLBI Correlator (JVC) was designed, developed, and delivered to the DSN as a successor to the legacy Block II Correlator. The JVC is a full-capability VLBI correlator that uses software processes running on multiple computers to cross-correlate two-antenna broadband noise data. Components of this new system (see Figure 1) consist of Linux PCs integrated into a Beowulf Cluster, an existing Mark5 data storage system, a RAID array, an existing software correlator package (SoftC) originally developed for Delta DOR Navigation processing, and various custom- developed software processes and scripts. Parallel processing on the JVC is achieved by assigning slave nodes of the Beowulf cluster to process separate scans in parallel until all scans have been processed. Due to the single stream sequential playback of the Mark5 data, some ramp-up time is required before all nodes can have access to required scan data. Core functions of each processing step are accomplished using optimized C programs. The coordination and execution of these programs across the cluster is accomplished using Pearl scripts, PostgreSQL commands, and a handful of miscellaneous system utilities. Mark5 data modules are loaded on Mark5 Data systems playback units, one per station. Data processing is started when the operator scans the Mark5 systems and runs a script that reads various configuration files and then creates an experiment-dependent status database

  14. PBL - Problem Based Learning for Companies and Clusters

    Energy Technology Data Exchange (ETDEWEB)

    Hamburg, I; Vladut, G.

    2016-07-01

    Small and medium sized companies (SMEs) assure economic growth in Europe. Generally many SMEs are struggling to survive in an ongoing global recession and often they are becoming reluctant to release or pay for staff training. In this paper we present shortly the learning methods in SMEs particularly the Problem Based Learning (PBL) as an efficient form for SMEs and entrepreneurship education. In the field of Urban Logistics it was developed four Clusters with potential of innovation and research in four European Regions: Tuscany - Italy, Valencia - Spain, Lisbon and Tagus - Portugal, Oltenia – Romania. Training and mentoring for SMEs, are essential to create competitiveness. Information and communication technologies (ICT) support the tutors by using an ICT platform which is in the development. (Author)

  15. Detecting and extracting clusters in atom probe data: A simple, automated method using Voronoi cells

    International Nuclear Information System (INIS)

    Felfer, P.; Ceguerra, A.V.; Ringer, S.P.; Cairney, J.M.

    2015-01-01

    The analysis of the formation of clusters in solid solutions is one of the most common uses of atom probe tomography. Here, we present a method where we use the Voronoi tessellation of the solute atoms and its geometric dual, the Delaunay triangulation to test for spatial/chemical randomness of the solid solution as well as extracting the clusters themselves. We show how the parameters necessary for cluster extraction can be determined automatically, i.e. without user interaction, making it an ideal tool for the screening of datasets and the pre-filtering of structures for other spatial analysis techniques. Since the Voronoi volumes are closely related to atomic concentrations, the parameters resulting from this analysis can also be used for other concentration based methods such as iso-surfaces. - Highlights: • Cluster analysis of atom probe data can be significantly simplified by using the Voronoi cell volumes of the atomic distribution. • Concentration fields are defined on a single atomic basis using Voronoi cells. • All parameters for the analysis are determined by optimizing the separation probability of bulk atoms vs clustered atoms

  16. Detecting and extracting clusters in atom probe data: A simple, automated method using Voronoi cells

    Energy Technology Data Exchange (ETDEWEB)

    Felfer, P., E-mail: peter.felfer@sydney.edu.au [Australian Centre for Microscopy and Microanalysis, The University of Sydney, NSW 2006 (Australia); School of Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney, NSW 2006 (Australia); Ceguerra, A.V., E-mail: anna.ceguerra@sydney.edu.au [Australian Centre for Microscopy and Microanalysis, The University of Sydney, NSW 2006 (Australia); School of Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney, NSW 2006 (Australia); Ringer, S.P., E-mail: simon.ringer@sydney.edu.au [Australian Centre for Microscopy and Microanalysis, The University of Sydney, NSW 2006 (Australia); School of Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney, NSW 2006 (Australia); Cairney, J.M., E-mail: julie.cairney@sydney.edu.au [Australian Centre for Microscopy and Microanalysis, The University of Sydney, NSW 2006 (Australia); School of Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney, NSW 2006 (Australia)

    2015-03-15

    The analysis of the formation of clusters in solid solutions is one of the most common uses of atom probe tomography. Here, we present a method where we use the Voronoi tessellation of the solute atoms and its geometric dual, the Delaunay triangulation to test for spatial/chemical randomness of the solid solution as well as extracting the clusters themselves. We show how the parameters necessary for cluster extraction can be determined automatically, i.e. without user interaction, making it an ideal tool for the screening of datasets and the pre-filtering of structures for other spatial analysis techniques. Since the Voronoi volumes are closely related to atomic concentrations, the parameters resulting from this analysis can also be used for other concentration based methods such as iso-surfaces. - Highlights: • Cluster analysis of atom probe data can be significantly simplified by using the Voronoi cell volumes of the atomic distribution. • Concentration fields are defined on a single atomic basis using Voronoi cells. • All parameters for the analysis are determined by optimizing the separation probability of bulk atoms vs clustered atoms.

  17. Vertex finding by sparse model-based clustering

    Science.gov (United States)

    Frühwirth, R.; Eckstein, K.; Frühwirth-Schnatter, S.

    2016-10-01

    The application of sparse model-based clustering to the problem of primary vertex finding is discussed. The observed z-positions of the charged primary tracks in a bunch crossing are modeled by a Gaussian mixture. The mixture parameters are estimated via Markov Chain Monte Carlo (MCMC). Sparsity is achieved by an appropriate prior on the mixture weights. The results are shown and compared to clustering by the expectation-maximization (EM) algorithm.

  18. Cluster-based localization and tracking in ubiquitous computing systems

    CERN Document Server

    Martínez-de Dios, José Ramiro; Torres-González, Arturo; Ollero, Anibal

    2017-01-01

    Localization and tracking are key functionalities in ubiquitous computing systems and techniques. In recent years a very high variety of approaches, sensors and techniques for indoor and GPS-denied environments have been developed. This book briefly summarizes the current state of the art in localization and tracking in ubiquitous computing systems focusing on cluster-based schemes. Additionally, existing techniques for measurement integration, node inclusion/exclusion and cluster head selection are also described in this book.

  19. bcl::Cluster : A method for clustering biological molecules coupled with visualization in the Pymol Molecular Graphics System.

    Science.gov (United States)

    Alexander, Nathan; Woetzel, Nils; Meiler, Jens

    2011-02-01

    Clustering algorithms are used as data analysis tools in a wide variety of applications in Biology. Clustering has become especially important in protein structure prediction and virtual high throughput screening methods. In protein structure prediction, clustering is used to structure the conformational space of thousands of protein models. In virtual high throughput screening, databases with millions of drug-like molecules are organized by structural similarity, e.g. common scaffolds. The tree-like dendrogram structure obtained from hierarchical clustering can provide a qualitative overview of the results, which is important for focusing detailed analysis. However, in practice it is difficult to relate specific components of the dendrogram directly back to the objects of which it is comprised and to display all desired information within the two dimensions of the dendrogram. The current work presents a hierarchical agglomerative clustering method termed bcl::Cluster. bcl::Cluster utilizes the Pymol Molecular Graphics System to graphically depict dendrograms in three dimensions. This allows simultaneous display of relevant biological molecules as well as additional information about the clusters and the members comprising them.

  20. Two clusters of child molesters based on impulsiveness

    Directory of Open Access Journals (Sweden)

    Danilo A. Baltieri

    2015-06-01

    Full Text Available Objective:High impulsiveness is a general problem that affects most criminal offenders and is associated with greater recidivism risk. A cluster analysis of impulsiveness measured by the Barratt Impulsiveness Scale - Version 11 (BIS-11 was performed on a sample of hands-on child molesters.Methods:The sample consisted of 208 child molesters enrolled in two different sectional studies carried out in São Paulo, Brazil. Using three factors from the BIS-11, a k-means cluster analysis was performed using the average silhouette width to determine cluster number. Direct logistic regression was performed to analyze the association of criminological and clinical features with the resulting clusters.Results:Two clusters were delineated. The cluster characterized by higher impulsiveness showed higher scores on the Sexual Screening for Pedophilic Interests (SSPI, Static-99, and Sexual Addiction Screening Test.Conclusions:Given that child molesters are an extremely heterogeneous population, the “number of victims” item of the SSPI should call attention to those offenders with the highest motor, attentional, and non-planning impulsiveness. Our findings could have implications in terms of differences in therapeutic management for these two groups, with the most impulsive cluster benefitting from psychosocial strategies combined with pharmacological interventions.

  1. Authentication Based on Multilayer Clustering in Ad Hoc Networks

    Directory of Open Access Journals (Sweden)

    Suh Heyi-Sook

    2005-01-01

    Full Text Available In this paper, we describe a secure cluster-routing protocol based on a multilayer scheme in ad hoc networks. This work provides scalable, threshold authentication scheme in ad hoc networks. We present detailed security threats against ad hoc routing protocols, specifically examining cluster-based routing. Our proposed protocol, called "authentication based on multilayer clustering for ad hoc networks" (AMCAN, designs an end-to-end authentication protocol that relies on mutual trust between nodes in other clusters. The AMCAN strategy takes advantage of a multilayer architecture that is designed for an authentication protocol in a cluster head (CH using a new concept of control cluster head (CCH scheme. We propose an authentication protocol that uses certificates containing an asymmetric key and a multilayer architecture so that the CCH is achieved using the threshold scheme, thereby reducing the computational overhead and successfully defeating all identified attacks. We also use a more extensive area, such as a CCH, using an identification protocol to build a highly secure, highly available authentication service, which forms the core of our security framework.

  2. Evaluation of sliding baseline methods for spatial estimation for cluster detection in the biosurveillance system.

    Science.gov (United States)

    Xing, Jian; Burkom, Howard; Moniz, Linda; Edgerton, James; Leuze, Michael; Tokars, Jerome

    2009-07-17

    The Centers for Disease Control and Prevention's (CDC's) BioSense system provides near-real time situational awareness for public health monitoring through analysis of electronic health data. Determination of anomalous spatial and temporal disease clusters is a crucial part of the daily disease monitoring task. Our study focused on finding useful anomalies at manageable alert rates according to available BioSense data history. The study dataset included more than 3 years of daily counts of military outpatient clinic visits for respiratory and rash syndrome groupings. We applied four spatial estimation methods in implementations of space-time scan statistics cross-checked in Matlab and C. We compared the utility of these methods according to the resultant background cluster rate (a false alarm surrogate) and sensitivity to injected cluster signals. The comparison runs used a spatial resolution based on the facility zip code in the patient record and a finer resolution based on the residence zip code. Simple estimation methods that account for day-of-week (DOW) data patterns yielded a clear advantage both in background cluster rate and in signal sensitivity. A 28-day baseline gave the most robust results for this estimation; the preferred baseline is long enough to remove daily fluctuations but short enough to reflect recent disease trends and data representation. Background cluster rates were lower for the rash syndrome counts than for the respiratory counts, likely because of seasonality and the large scale of the respiratory counts. The spatial estimation method should be chosen according to characteristics of the selected data streams. In this dataset with strong day-of-week effects, the overall best detection performance was achieved using subregion averages over a 28-day baseline stratified by weekday or weekend/holiday behavior. Changing the estimation method for particular scenarios involving different spatial resolution or other syndromes can yield further

  3. Clustered iterative stochastic ensemble method for multi-modal calibration of subsurface flow models

    KAUST Repository

    Elsheikh, Ahmed H.

    2013-05-01

    A novel multi-modal parameter estimation algorithm is introduced. Parameter estimation is an ill-posed inverse problem that might admit many different solutions. This is attributed to the limited amount of measured data used to constrain the inverse problem. The proposed multi-modal model calibration algorithm uses an iterative stochastic ensemble method (ISEM) for parameter estimation. ISEM employs an ensemble of directional derivatives within a Gauss-Newton iteration for nonlinear parameter estimation. ISEM is augmented with a clustering step based on k-means algorithm to form sub-ensembles. These sub-ensembles are used to explore different parts of the search space. Clusters are updated at regular intervals of the algorithm to allow merging of close clusters approaching the same local minima. Numerical testing demonstrates the potential of the proposed algorithm in dealing with multi-modal nonlinear parameter estimation for subsurface flow models. © 2013 Elsevier B.V.

  4. An ant colony based resilience approach to cascading failures in cluster supply network

    Science.gov (United States)

    Wang, Yingcong; Xiao, Renbin

    2016-11-01

    Cluster supply chain network is a typical complex network and easily suffers cascading failures under disruption events, which is caused by the under-load of enterprises. Improving network resilience can increase the ability of recovery from cascading failures. Social resilience is found in ant colony and comes from ant's spatial fidelity zones (SFZ). Starting from the under-load failures, this paper proposes a resilience method to cascading failures in cluster supply chain network by leveraging on social resilience of ant colony. First, the mapping between ant colony SFZ and cluster supply chain network SFZ is presented. Second, a new cascading model for cluster supply chain network is constructed based on under-load failures. Then, the SFZ-based resilience method and index to cascading failures are developed according to ant colony's social resilience. Finally, a numerical simulation and a case study are used to verify the validity of the cascading model and the resilience method. Experimental results show that, the cluster supply chain network becomes resilient to cascading failures under the SFZ-based resilience method, and the cluster supply chain network resilience can be enhanced by improving the ability of enterprises to recover and adjust.

  5. CASSIS and SMIPS: promoter-based prediction of secondary metabolite gene clusters in eukaryotic genomes.

    Science.gov (United States)

    Wolf, Thomas; Shelest, Vladimir; Nath, Neetika; Shelest, Ekaterina

    2016-04-15

    Secondary metabolites (SM) are structurally diverse natural products of high pharmaceutical importance. Genes involved in their biosynthesis are often organized in clusters, i.e., are co-localized and co-expressed. In silico cluster prediction in eukaryotic genomes remains problematic mainly due to the high variability of the clusters' content and lack of other distinguishing sequence features. We present Cluster Assignment by Islands of Sites (CASSIS), a method for SM cluster prediction in eukaryotic genomes, and Secondary Metabolites by InterProScan (SMIPS), a tool for genome-wide detection of SM key enzymes ('anchor' genes): polyketide synthases, non-ribosomal peptide synthetases and dimethylallyl tryptophan synthases. Unlike other tools based on protein similarity, CASSIS exploits the idea of co-regulation of the cluster genes, which assumes the existence of common regulatory patterns in the cluster promoters. The method searches for 'islands' of enriched cluster-specific motifs in the vicinity of anchor genes. It was validated in a series of cross-validation experiments and showed high sensitivity and specificity. CASSIS and SMIPS are freely available at https://sbi.hki-jena.de/cassis thomas.wolf@leibniz-hki.de or ekaterina.shelest@leibniz-hki.de Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  6. A Novel Clustering-Based Feature Representation for the Classification of Hyperspectral Imagery

    Directory of Open Access Journals (Sweden)

    Qikai Lu

    2014-06-01

    Full Text Available In this study, a new clustering-based feature extraction algorithm is proposed for the spectral-spatial classification of hyperspectral imagery. The clustering approach is able to group the high-dimensional data into a subspace by mining the salient information and suppressing the redundant information. In this way, the relationship between neighboring pixels, which was hidden in the original data, can be extracted more effectively. Specifically, in the proposed algorithm, a two-step process is adopted to make use of the clustering-based information. A clustering approach is first used to produce the initial clustering map, and, subsequently, a multiscale cluster histogram (MCH is proposed to represent the spatial information around each pixel. In order to evaluate the robustness of the proposed MCH, four clustering techniques are employed to analyze the influence of the clustering methods. Meanwhile, the performance of the MCH is compared to three other widely used spatial features: the gray-level co-occurrence matrix (GLCM, the 3D wavelet texture, and differential morphological profiles (DMPs. The experiments conducted on four well-known hyperspectral datasets verify that the proposed MCH can significantly improve the classification accuracy, and it outperforms other commonly used spatial features.

  7. Internet2-based 3D PET image reconstruction using a PC cluster.

    Science.gov (United States)

    Shattuck, D W; Rapela, J; Asma, E; Chatzioannou, A; Qi, J; Leahy, R M

    2002-08-07

    We describe an approach to fast iterative reconstruction from fully three-dimensional (3D) PET data using a network of PentiumIII PCs configured as a Beowulf cluster. To facilitate the use of this system, we have developed a browser-based interface using Java. The system compresses PET data on the user's machine, sends these data over a network, and instructs the PC cluster to reconstruct the image. The cluster implements a parallelized version of our preconditioned conjugate gradient method for fully 3D MAP image reconstruction. We report on the speed-up factors using the Beowulf approach and the impacts of communication latencies in the local cluster network and the network connection between the user's machine and our PC cluster.

  8. Internet2-based 3D PET image reconstruction using a PC cluster

    International Nuclear Information System (INIS)

    Shattuck, D.W.; Rapela, J.; Asma, E.; Leahy, R.M.; Chatzioannou, A.; Qi, J.

    2002-01-01

    We describe an approach to fast iterative reconstruction from fully three-dimensional (3D) PET data using a network of PentiumIII PCs configured as a Beowulf cluster. To facilitate the use of this system, we have developed a browser-based interface using Java. The system compresses PET data on the user's machine, sends these data over a network, and instructs the PC cluster to reconstruct the image. The cluster implements a parallelized version of our preconditioned conjugate gradient method for fully 3D MAP image reconstruction. We report on the speed-up factors using the Beowulf approach and the impacts of communication latencies in the local cluster network and the network connection between the user's machine and our PC cluster. (author)

  9. A novel clustering algorithm based on quantum games

    International Nuclear Information System (INIS)

    Li Qiang; He Yan; Jiang Jingping

    2009-01-01

    Enormous successes have been made by quantum algorithms during the last decade. In this paper, we combine the quantum game with the problem of data clustering, and then develop a quantum-game-based clustering algorithm, in which data points in a dataset are considered as players who can make decisions and implement quantum strategies in quantum games. After each round of a quantum game, each player's expected payoff is calculated. Later, he uses a link-removing-and-rewiring (LRR) function to change his neighbors and adjust the strength of links connecting to them in order to maximize his payoff. Further, algorithms are discussed and analyzed in two cases of strategies, two payoff matrixes and two LRR functions. Consequently, the simulation results have demonstrated that data points in datasets are clustered reasonably and efficiently, and the clustering algorithms have fast rates of convergence. Moreover, the comparison with other algorithms also provides an indication of the effectiveness of the proposed approach.

  10. Random Walk Quantum Clustering Algorithm Based on Space

    Science.gov (United States)

    Xiao, Shufen; Dong, Yumin; Ma, Hongyang

    2018-01-01

    In the random quantum walk, which is a quantum simulation of the classical walk, data points interacted when selecting the appropriate walk strategy by taking advantage of quantum-entanglement features; thus, the results obtained when the quantum walk is used are different from those when the classical walk is adopted. A new quantum walk clustering algorithm based on space is proposed by applying the quantum walk to clustering analysis. In this algorithm, data points are viewed as walking participants, and similar data points are clustered using the walk function in the pay-off matrix according to a certain rule. The walk process is simplified by implementing a space-combining rule. The proposed algorithm is validated by a simulation test and is proved superior to existing clustering algorithms, namely, Kmeans, PCA + Kmeans, and LDA-Km. The effects of some of the parameters in the proposed algorithm on its performance are also analyzed and discussed. Specific suggestions are provided.

  11. WebGimm: An integrated web-based platform for cluster analysis, functional analysis, and interactive visualization of results

    Directory of Open Access Journals (Sweden)

    Medvedovic Mario

    2011-01-01

    Full Text Available Abstract Cluster analysis methods have been extensively researched, but the adoption of new methods is often hindered by technical barriers in their implementation and use. WebGimm is a free cluster analysis web-service, and an open source general purpose clustering web-server infrastructure designed to facilitate easy deployment of integrated cluster analysis servers based on clustering and functional annotation algorithms implemented in R. Integrated functional analyses and interactive browsing of both, clustering structure and functional annotations provides a complete analytical environment for cluster analysis and interpretation of results. The Java Web Start client-based interface is modeled after the familiar cluster/treeview packages making its use intuitive to a wide array of biomedical researchers. For biomedical researchers, WebGimm provides an avenue to access state of the art clustering procedures. For Bioinformatics methods developers, WebGimm offers a convenient avenue to deploy their newly developed clustering methods. WebGimm server, software and manuals can be freely accessed at http://ClusterAnalysis.org/.

  12. WebGimm: An integrated web-based platform for cluster analysis, functional analysis, and interactive visualization of results.

    Science.gov (United States)

    Joshi, Vineet K; Freudenberg, Johannes M; Hu, Zhen; Medvedovic, Mario

    2011-01-17

    Cluster analysis methods have been extensively researched, but the adoption of new methods is often hindered by technical barriers in their implementation and use. WebGimm is a free cluster analysis web-service, and an open source general purpose clustering web-server infrastructure designed to facilitate easy deployment of integrated cluster analysis servers based on clustering and functional annotation algorithms implemented in R. Integrated functional analyses and interactive browsing of both, clustering structure and functional annotations provides a complete analytical environment for cluster analysis and interpretation of results. The Java Web Start client-based interface is modeled after the familiar cluster/treeview packages making its use intuitive to a wide array of biomedical researchers. For biomedical researchers, WebGimm provides an avenue to access state of the art clustering procedures. For Bioinformatics methods developers, WebGimm offers a convenient avenue to deploy their newly developed clustering methods. WebGimm server, software and manuals can be freely accessed at http://ClusterAnalysis.org/.

  13. Clustering Multiple Sclerosis Subgroups with Multifractal Methods and Self-Organizing Map Algorithm

    Science.gov (United States)

    Karaca, Yeliz; Cattani, Carlo

    Magnetic resonance imaging (MRI) is the most sensitive method to detect chronic nervous system diseases such as multiple sclerosis (MS). In this paper, Brownian motion Hölder regularity functions (polynomial, periodic (sine), exponential) for 2D image, such as multifractal methods were applied to MR brain images, aiming to easily identify distressed regions, in MS patients. With these regions, we have proposed an MS classification based on the multifractal method by using the Self-Organizing Map (SOM) algorithm. Thus, we obtained a cluster analysis by identifying pixels from distressed regions in MR images through multifractal methods and by diagnosing subgroups of MS patients through artificial neural networks.

  14. The Views of Turkish Pre-Service Teachers about Effectiveness of Cluster Method as a Teaching Writing Method

    Science.gov (United States)

    Kitis, Emine; Türkel, Ali

    2017-01-01

    The aim of this study is to find out Turkish pre-service teachers' views on effectiveness of cluster method as a writing teaching method. The Cluster Method can be defined as a connotative creative writing method. The way the method works is that the person who brainstorms on connotations of a word or a concept in abscence of any kind of…

  15. Base motif recognition and design of DNA templates for fluorescent silver clusters by machine learning.

    Science.gov (United States)

    Copp, Stacy M; Bogdanov, Petko; Debord, Mark; Singh, Ambuj; Gwinn, Elisabeth

    2014-09-03

    Discriminative base motifs within DNA templates for fluorescent silver clusters are identified using methods that combine large experimental data sets with machine learning tools for pattern recognition. Combining the discovery of certain multibase motifs important for determining fluorescence brightness with a generative algorithm, the probability of selecting DNA templates that stabilize fluorescent silver clusters is increased by a factor of >3. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  16. clusters

    Indian Academy of Sciences (India)

    2017-09-27

    Sep 27, 2017 ... while CuCoNO, Co3NO, Cu3CoNO, Cu2Co3NO, Cu3Co3NO and Cu6CoNO clusters display stronger chemical stability. Magnetic and electronic properties are also discussed. The magnetic moment is affected by charge transfer and the spd hybridization. Keywords. CumConNO (m + n = 2–7) clusters; ...

  17. A Comparison Study of Validity Indices on Swarm-Intelligence-Based Clustering.

    Science.gov (United States)

    Rui Xu; Jie Xu; Wunsch, D C

    2012-08-01

    Swarm intelligence has emerged as a worthwhile class of clustering methods due to its convenient implementation, parallel capability, ability to avoid local minima, and other advantages. In such applications, clustering validity indices usually operate as fitness functions to evaluate the qualities of the obtained clusters. However, as the validity indices are usually data dependent and are designed to address certain types of data, the selection of different indices as the fitness functions may critically affect cluster quality. Here, we compare the performances of eight well-known and widely used clustering validity indices, namely, the Caliński-Harabasz index, the CS index, the Davies-Bouldin index, the Dunn index with two of its generalized versions, the I index, and the silhouette statistic index, on both synthetic and real data sets in the framework of differential-evolution-particle-swarm-optimization (DEPSO)-based clustering. DEPSO is a hybrid evolutionary algorithm of the stochastic optimization approach (differential evolution) and the swarm intelligence method (particle swarm optimization) that further increases the search capability and achieves higher flexibility in exploring the problem space. According to the experimental results, we find that the silhouette statistic index stands out in most of the data sets that we examined. Meanwhile, we suggest that users reach their conclusions not just based on only one index, but after considering the results of several indices to achieve reliable clustering structures.

  18. Implementation of K-Means Clustering Method for Electronic Learning Model

    Science.gov (United States)

    Latipa Sari, Herlina; Suranti Mrs., Dewi; Natalia Zulita, Leni

    2017-12-01

    Teaching and Learning process at SMK Negeri 2 Bengkulu Tengah has applied e-learning system for teachers and students. The e-learning was based on the classification of normative, productive, and adaptive subjects. SMK Negeri 2 Bengkulu Tengah consisted of 394 students and 60 teachers with 16 subjects. The record of e-learning database was used in this research to observe students’ activity pattern in attending class. K-Means algorithm in this research was used to classify students’ learning activities using e-learning, so that it was obtained cluster of students’ activity and improvement of student’s ability. Implementation of K-Means Clustering method for electronic learning model at SMK Negeri 2 Bengkulu Tengah was conducted by observing 10 students’ activities, namely participation of students in the classroom, submit assignment, view assignment, add discussion, view discussion, add comment, download course materials, view article, view test, and submit test. In the e-learning model, the testing was conducted toward 10 students that yielded 2 clusters of membership data (C1 and C2). Cluster 1: with membership percentage of 70% and it consisted of 6 members, namely 1112438 Anggi Julian, 1112439 Anis Maulita, 1112441 Ardi Febriansyah, 1112452 Berlian Sinurat, 1112460 Dewi Anugrah Anwar and 1112467 Eka Tri Oktavia Sari. Cluster 2:with membership percentage of 30% and it consisted of 4 members, namely 1112463 Dosita Afriyani, 1112471 Erda Novita, 1112474 Eskardi and 1112477 Fachrur Rozi.

  19. Voxel-based clustered imaging by multiparameter diffusion tensor images for glioma grading.

    Science.gov (United States)

    Inano, Rika; Oishi, Naoya; Kunieda, Takeharu; Arakawa, Yoshiki; Yamao, Yukihiro; Shibata, Sumiya; Kikuchi, Takayuki; Fukuyama, Hidenao; Miyamoto, Susumu

    2014-01-01

    Gliomas are the most common intra-axial primary brain tumour; therefore, predicting glioma grade would influence therapeutic strategies. Although several methods based on single or multiple parameters from diagnostic images exist, a definitive method for pre-operatively determining glioma grade remains unknown. We aimed to develop an unsupervised method using multiple parameters from pre-operative diffusion tensor images for obtaining a clustered image that could enable visual grading of gliomas. Fourteen patients with low-grade gliomas and 19 with high-grade gliomas underwent diffusion tensor imaging and three-dimensional T1-weighted magnetic resonance imaging before tumour resection. Seven features including diffusion-weighted imaging, fractional anisotropy, first eigenvalue, second eigenvalue, third eigenvalue, mean diffusivity and raw T2 signal with no diffusion weighting, were extracted as multiple parameters from diffusion tensor imaging. We developed a two-level clustering approach for a self-organizing map followed by the K-means algorithm to enable unsupervised clustering of a large number of input vectors with the seven features for the whole brain. The vectors were grouped by the self-organizing map as protoclusters, which were classified into the smaller number of clusters by K-means to make a voxel-based diffusion tensor-based clustered image. Furthermore, we also determined if the diffusion tensor-based clustered image was really helpful for predicting pre-operative glioma grade in a supervised manner. The ratio of each class in the diffusion tensor-based clustered images was calculated from the regions of interest manually traced on the diffusion tensor imaging space, and the common logarithmic ratio scales were calculated. We then applied support vector machine as a classifier for distinguishing between low- and high-grade gliomas. Consequently, the sensitivity, specificity, accuracy and area under the curve of receiver operating characteristic

  20. CLUE: cluster-based retrieval of images by unsupervised learning.

    Science.gov (United States)

    Chen, Yixin; Wang, James Z; Krovetz, Robert

    2005-08-01

    In a typical content-based image retrieval (CBIR) system, target images (images in the database) are sorted by feature similarities with respect to the query. Similarities among target images are usually ignored. This paper introduces a new technique, cluster-based retrieval of images by unsupervised learning (CLUE), for improving user interaction with image retrieval systems by fully exploiting the similarity information. CLUE retrieves image clusters by applying a graph-theoretic clustering algorithm to a collection of images in the vicinity of the query. Clustering in CLUE is dynamic. In particular, clusters formed depend on which images are retrieved in response to the query. CLUE can be combined with any real-valued symmetric similarity measure (metric or nonmetric). Thus, it may be embedded in many current CBIR systems, including relevance feedback systems. The performance of an experimental image retrieval system using CLUE is evaluated on a database of around 60,000 images from COREL. Empirical results demonstrate improved performance compared with a CBIR system using the same image similarity measure. In addition, results on images returned by Google's Image Search reveal the potential of applying CLUE to real-world image data and integrating CLUE as a part of the interface for keyword-based image retrieval systems.

  1. A Hybrid Image Filtering Method for Computer-Aided Detection of Microcalcification Clusters in Mammograms

    OpenAIRE

    Zhang, Xiaoyong; Homma, Noriyasu; Goto, Shotaro; Kawasumi, Yosuke; Ishibashi, Tadashi; Abe, Makoto; Sugita, Norihiro; Yoshizawa, Makoto

    2013-01-01

    The presence of microcalcification clusters (MCs) in mammogram is a major indicator of breast cancer. Detection of an MC is one of the key issues for breast cancer control. In this paper, we present a highly accurate method based on a morphological image processing and wavelet transform technique to detect the MCs in mammograms. The microcalcifications are firstly enhanced by using multistructure elements morphological processing. Then, the candidates of microcalcifications are refined by a m...

  2. Carbon based nanostructures: diamond clusters structured with nanotubes

    Directory of Open Access Journals (Sweden)

    O.A. Shenderova

    2003-01-01

    Full Text Available Feasibility of designing composites from carbon nanotubes and nanodiamond clusters is discussed based on atomistic simulations. Depending on nanotube size and morphology, some types of open nanotubes can be chemically connected with different facets of diamond clusters. The geometrical relation between different types of nanotubes and different diamond facets for construction of mechanically stable composites with all bonds saturated is summarized. Potential applications of the suggested nanostructures are briefly discussed based on the calculations of their electronic properties using environment dependent self-consistent tight-binding approach.

  3. Fuzzy ensemble clustering based on random projections for DNA microarray data analysis.

    Science.gov (United States)

    Avogadri, Roberto; Valentini, Giorgio

    2009-01-01

    Two major problems related the unsupervised analysis of gene expression data are represented by the accuracy and reliability of the discovered clusters, and by the biological fact that the boundaries between classes of patients or classes of functionally related genes are sometimes not clearly defined. The main goal of this work consists in the exploration of new strategies and in the development of new clustering methods to improve the accuracy and robustness of clustering results, taking into account the uncertainty underlying the assignment of examples to clusters in the context of gene expression data analysis. We propose a fuzzy ensemble clustering approach both to improve the accuracy of clustering results and to take into account the inherent fuzziness of biological and bio-medical gene expression data. We applied random projections that obey the Johnson-Lindenstrauss lemma to obtain several instances of lower dimensional gene expression data from the original high-dimensional ones, approximately preserving the information and the metric structure of the original data. Then we adopt a double fuzzy approach to obtain a consensus ensemble clustering, by first applying a fuzzy k-means algorithm to the different instances of the projected low-dimensional data and then by using a fuzzy t-norm to combine the multiple clusterings. Several variants of the fuzzy ensemble clustering algorithms are proposed, according to different techniques to combine the base clusterings and to obtain the final consensus clustering. We applied our proposed fuzzy ensemble methods to the gene expression analysis of leukemia, lymphoma, adenocarcinoma and melanoma patients, and we compared the results with other state of the art ensemble methods. Results show that in some cases, taking into account the natural fuzziness of the data, we can improve the discovery of classes of patients defined at bio-molecular level. The reduction of the dimension of the data, achieved through random

  4. Clustering Based Approximation in Facial Image Retrieval

    OpenAIRE

    R.Pitchaiah

    2016-01-01

    The web search tool returns a great many pictures positioned by the essential words separated from the encompassing content. Existing article acknowledgment systems to prepare characterization models from human-named preparing pictures or endeavor to deduce the connection/probabilities in the middle of pictures and commented magic words. Albeit proficient in supporting in mining comparatively looking facial picture results utilizing feebly named ones, the learning phase of above bunch based c...

  5. Efficient Clustering for Irregular Geometries Based on Identification of Concavities

    Directory of Open Access Journals (Sweden)

    Velázquez-Villegas Fernando

    2014-04-01

    Full Text Available Two dimensional clustering problem has much relevance in applications related to the efficient use of raw material, such as cutting stock, packing, etc. This is a very complex problem in which multiple bodies are accommodated efficiently in a way that they occupy as little space as possible. The complexity of the problem increases with the complexity of the bodies. Clearly the number of possible arrangements between bodies is huge. No Fit Polygon (NFP allows to determine the entire relative positions between two patterns (regular or irregular in contact, non-overlapping, therefore the best position can be selected. However, NFP generation requires a lot of calculations; besides, selecting the best cluster isn’t a simple task because, between two irregular patterns in contact, hollows (unusable areas and external concavities (usable areas can be produced. This work presents a quick and simple method to reduce calculations associated with NFP generation and to minimize unusable areas in a cluster. This method consists of generating partial NFP, just on concave regions of the patterns, and selecting the best cluster using a total weighted efficiency, i.e. a weighted value of enclosure efficiency (ratio of occupied area on convex hull area and hollow efficiency (ratio of occupied area on cluster area. The proposed method produces similar results as those obtained by other methods; however the shape of the clusters obtained allows to accommodate more parts in similar spaces, which is a desirable result when it comes to optimizing the use of material. We present two examples to show the performance of the proposal.

  6. Subtypes of autism by cluster analysis based on structural MRI data.

    Science.gov (United States)

    Hrdlicka, Michal; Dudova, Iva; Beranova, Irena; Lisy, Jiri; Belsan, Tomas; Neuwirth, Jiri; Komarek, Vladimir; Faladova, Ludvika; Havlovicova, Marketa; Sedlacek, Zdenek; Blatny, Marek; Urbanek, Tomas

    2005-05-01

    The aim of our study was to subcategorize Autistic Spectrum Disorders (ASD) using a multidisciplinary approach. Sixty four autistic patients (mean age 9.4+/-5.6 years) were entered into a cluster analysis. The clustering analysis was based on MRI data. The clusters obtained did not differ significantly in the overall severity of autistic symptomatology as measured by the total score on the Childhood Autism Rating Scale (CARS). The clusters could be characterized as showing significant differences: Cluster 1: showed the largest sizes of the genu and splenium of the corpus callosum (CC), the lowest pregnancy order and the lowest frequency of facial dysmorphic features. Cluster 2: showed the largest sizes of the amygdala and hippocampus (HPC), the least abnormal visual response on the CARS, the lowest frequency of epilepsy and the least frequent abnormal psychomotor development during the first year of life. Cluster 3: showed the largest sizes of the caput of the nucleus caudatus (NC), the smallest sizes of the HPC and facial dysmorphic features were always present. Cluster 4: showed the smallest sizes of the genu and splenium of the CC, as well as the amygdala, and caput of the NC, the most abnormal visual response on the CARS, the highest frequency of epilepsy, the highest pregnancy order, abnormal psychomotor development during the first year of life was always present and facial dysmorphic features were always present. This multidisciplinary approach seems to be a promising method for subtyping autism.

  7. Transient identification by clustering based on Integrated Deterministic and Probabilistic Safety Analysis outcomes

    International Nuclear Information System (INIS)

    Di Maio, Francesco; Vagnoli, Matteo; Zio, Enrico

    2016-01-01

    Highlights: • We develop an Integrated Deterministic and Probabilistic Safety Analysis (IDPSA). • We present a transient identification approach for retrieving IDPSA scenarios information. • We post-process the IDPSA scenarios for clustering Prime Implicants and Near Misses. • The approach is useful for an on-line cluster assignment of an unknown developing scenario. • We apply the approach to the accidental scenarios of a dynamic Steam Generator of a NPP. - Abstract: In this work, we present a transient identification approach that utilizes clustering for retrieving scenarios information from an Integrated Deterministic and Probabilistic Safety Analysis (IDPSA). The approach requires: (i) creation of a database of scenarios by IDPSA; (ii) scenario post-processing for clustering Prime Implicants (PIs), i.e., minimum combinations of failure events that are capable of leading the system into a fault state, and Near Misses, i.e., combinations of failure events that lead the system to a quasi-fault state; (iii) on-line cluster assignment of an unknown developing scenario. In the step (ii), we adopt a visual interactive method and risk-based clustering to identify PIs and Near Misses, respectively; in the on-line step (iii), to assign a scenario to a cluster we consider the sequence of events in the scenario and evaluate the Hamming similarity to the sequences of the previously clustered scenarios. The feasibility of the analysis is shown with respect to the accidental scenarios of a dynamic Steam Generator (SG) of a NPP.

  8. Quantum cluster variational method and message passing algorithms revisited

    Science.gov (United States)

    Domínguez, E.; Mulet, Roberto

    2018-02-01

    We present a general framework to study quantum disordered systems in the context of the Kikuchi's cluster variational method (CVM). The method relies in the solution of message passing-like equations for single instances or in the iterative solution of complex population dynamic algorithms for an average case scenario. We first show how a standard application of the Kikuchi's CVM can be easily translated to message passing equations for specific instances of the disordered system. We then present an "ad hoc" extension of these equations to a population dynamic algorithm representing an average case scenario. At the Bethe level, these equations are equivalent to the dynamic population equations that can be derived from a proper cavity ansatz. However, at the plaquette approximation, the interpretation is more subtle and we discuss it taking also into account previous results in classical disordered models. Moreover, we develop a formalism to properly deal with the average case scenario using a replica-symmetric ansatz within this CVM for quantum disordered systems. Finally, we present and discuss numerical solutions of the different approximations for the quantum transverse Ising model and the quantum random field Ising model in two-dimensional lattices.

  9. Variable Selection in Model-based Clustering: A General Variable Role Modeling

    OpenAIRE

    Maugis, Cathy; Celeux, Gilles; Martin-Magniette, Marie-Laure

    2008-01-01

    The currently available variable selection procedures in model-based clustering assume that the irrelevant clustering variables are all independent or are all linked with the relevant clustering variables. We propose a more versatile variable selection model which describes three possible roles for each variable: The relevant clustering variables, the irrelevant clustering variables dependent on a part of the relevant clustering variables and the irrelevant clustering variables totally indepe...

  10. Use of a modified cluster sampling method to perform rapid needs assessment after Hurricane Andrew.

    Science.gov (United States)

    Hlady, W G; Quenemoen, L E; Armenia-Cope, R R; Hurt, K J; Malilay, J; Noji, E K; Wurm, G

    1994-04-01

    To rapidly obtain population-based estimates of needs in the early aftermath of Hurricane Andrew in South Florida. We used a modified cluster-sampling method (the Expanded Programme on Immunization [EPI] method) for three surveys. We selected a systematic sample of 30 quarter-mile square clusters for each survey and, beginning from a random start, interviewed members of seven consecutive occupied households in each cluster. Two surveys were of the most affected area (1990 population, 32,672) at three and ten days after the hurricane struck; one survey was of a less affected area (1990 population, 15,576) seven days after the hurricane struck. Results were available within 24 hours of beginning each survey. Initial findings emphasized the need for restoring utilities and sanitation and helped to focus medical relief on primary care and preventive services. The second survey of the most affected area showed improvement in the availability of food, water, electricity, and sanitation (P < or = .05). There was no evidence of disease outbreaks. For the first time, the EPI method provided population-based information to guide and evaluate relief operations after a sudden-impact natural disaster. An improvement over previous approaches, the EPI method warrants further evaluation as a needs assessment tool in acute disasters.

  11. a Three-Step Spatial-Temporal Clustering Method for Human Activity Pattern Analysis

    Science.gov (United States)

    Huang, W.; Li, S.; Xu, S.

    2016-06-01

    How people move in cities and what they do in various locations at different times form human activity patterns. Human activity pattern plays a key role in in urban planning, traffic forecasting, public health and safety, emergency response, friend recommendation, and so on. Therefore, scholars from different fields, such as social science, geography, transportation, physics and computer science, have made great efforts in modelling and analysing human activity patterns or human mobility patterns. One of the essential tasks in such studies is to find the locations or places where individuals stay to perform some kind of activities before further activity pattern analysis. In the era of Big Data, the emerging of social media along with wearable devices enables human activity data to be collected more easily and efficiently. Furthermore, the dimension of the accessible human activity data has been extended from two to three (space or space-time) to four dimensions (space, time and semantics). More specifically, not only a location and time that people stay and spend are collected, but also what people "say" for in a location at a time can be obtained. The characteristics of these datasets shed new light on the analysis of human mobility, where some of new methodologies should be accordingly developed to handle them. Traditional methods such as neural networks, statistics and clustering have been applied to study human activity patterns using geosocial media data. Among them, clustering methods have been widely used to analyse spatiotemporal patterns. However, to our best knowledge, few of clustering algorithms are specifically developed for handling the datasets that contain spatial, temporal and semantic aspects all together. In this work, we propose a three-step human activity clustering method based on space, time and semantics to fill this gap. One-year Twitter data, posted in Toronto, Canada, is used to test the clustering-based method. The results show that the

  12. A THREE-STEP SPATIAL-TEMPORAL-SEMANTIC CLUSTERING METHOD FOR HUMAN ACTIVITY PATTERN ANALYSIS

    Directory of Open Access Journals (Sweden)

    W. Huang

    2016-06-01

    Full Text Available How people move in cities and what they do in various locations at different times form human activity patterns. Human activity pattern plays a key role in in urban planning, traffic forecasting, public health and safety, emergency response, friend recommendation, and so on. Therefore, scholars from different fields, such as social science, geography, transportation, physics and computer science, have made great efforts in modelling and analysing human activity patterns or human mobility patterns. One of the essential tasks in such studies is to find the locations or places where individuals stay to perform some kind of activities before further activity pattern analysis. In the era of Big Data, the emerging of social media along with wearable devices enables human activity data to be collected more easily and efficiently. Furthermore, the dimension of the accessible human activity data has been extended from two to three (space or space-time to four dimensions (space, time and semantics. More specifically, not only a location and time that people stay and spend are collected, but also what people “say” for in a location at a time can be obtained. The characteristics of these datasets shed new light on the analysis of human mobility, where some of new methodologies should be accordingly developed to handle them. Traditional methods such as neural networks, statistics and clustering have been applied to study human activity patterns using geosocial media data. Among them, clustering methods have been widely used to analyse spatiotemporal patterns. However, to our best knowledge, few of clustering algorithms are specifically developed for handling the datasets that contain spatial, temporal and semantic aspects all together. In this work, we propose a three-step human activity clustering method based on space, time and semantics to fill this gap. One-year Twitter data, posted in Toronto, Canada, is used to test the clustering-based method. The

  13. Metabolomic-based identification of clusters that reflect dietary patterns.

    Science.gov (United States)

    Gibbons, Helena; Carr, Eibhlin; McNulty, Breige A; Nugent, Anne P; Walton, Janette; Flynn, Albert; Gibney, Michael J; Brennan, Lorraine

    2017-10-01

    Classification of subjects into dietary patterns generally relies on self-reporting dietary data which are prone to error. The aim of the present study was to develop a model for objective classification of people into dietary patterns based on metabolomic data. Dietary and urinary metabolomic data from the National Adult Nutrition Survey (NANS) was used in the analysis (n = 567). Two-step cluster analysis was applied to the urinary data to identify clusters. The subsequent model was used in an independent cohort to classify people into dietary patterns. Two distinct dietary patterns were identified. Cluster 1 was characterized by significantly higher intakes of breakfast cereals, low fat and skimmed milks, potatoes, fruit, fish and fish dishes (p patterns based on metabolomics data. Future applications of this approach could be developed for rapid and objective assignment of subjects into dietary patterns. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  14. Progressive Amalgamation of Building Clusters for Map Generalization Based on Scaling Subgroups

    Directory of Open Access Journals (Sweden)

    Xianjin He

    2018-03-01

    Full Text Available Map generalization utilizes transformation operations to derive smaller-scale maps from larger-scale maps, and is a key procedure for the modelling and understanding of geographic space. Studies to date have largely applied a fixed tolerance to aggregate clustered buildings into a single object, resulting in the loss of details that meet cartographic constraints and may be of importance for users. This study aims to develop a method that amalgamates clustered buildings gradually without significant modification of geometry, while preserving the map details as much as possible under cartographic constraints. The amalgamation process consists of three key steps. First, individual buildings are grouped into distinct clusters by using the graph-based spatial clustering application with random forest (GSCARF method. Second, building clusters are decomposed into scaling subgroups according to homogeneity with regard to the mean distance of subgroups. Thus, hierarchies of building clusters can be derived based on scaling subgroups. Finally, an amalgamation operation is progressively performed from the bottom-level subgroups to the top-level subgroups using the maximum distance of each subgroup as the amalgamating tolerance instead of using a fixed tolerance. As a consequence of this step, generalized intermediate scaling results are available, which can form the multi-scale representation of buildings. The experimental results show that the proposed method can generate amalgams with correct details, statistical area balance and orthogonal shape while satisfying cartographic constraints (e.g., minimum distance and minimum area.

  15. Threshold selection for classification of MR brain images by clustering method

    International Nuclear Information System (INIS)

    Moldovanu, Simona; Obreja, Cristian; Moraru, Luminita

    2015-01-01

    Given a grey-intensity image, our method detects the optimal threshold for a suitable binarization of MR brain images. In MR brain image processing, the grey levels of pixels belonging to the object are not substantially different from the grey levels belonging to the background. Threshold optimization is an effective tool to separate objects from the background and further, in classification applications. This paper gives a detailed investigation on the selection of thresholds. Our method does not use the well-known method for binarization. Instead, we perform a simple threshold optimization which, in turn, will allow the best classification of the analyzed images into healthy and multiple sclerosis disease. The dissimilarity (or the distance between classes) has been established using the clustering method based on dendrograms. We tested our method using two classes of images: the first consists of 20 T2-weighted and 20 proton density PD-weighted scans from two healthy subjects and from two patients with multiple sclerosis. For each image and for each threshold, the number of the white pixels (or the area of white objects in binary image) has been determined. These pixel numbers represent the objects in clustering operation. The following optimum threshold values are obtained, T = 80 for PD images and T = 30 for T2w images. Each mentioned threshold separate clearly the clusters that belonging of the studied groups, healthy patient and multiple sclerosis disease

  16. Health state evaluation of shield tunnel SHM using fuzzy cluster method

    Science.gov (United States)

    Zhou, Fa; Zhang, Wei; Sun, Ke; Shi, Bin

    2015-04-01

    Shield tunnel SHM is in the path of rapid development currently while massive monitoring data processing and quantitative health grading remain a real challenge, since multiple sensors belonging to different types are employed in SHM system. This paper addressed the fuzzy cluster method based on fuzzy equivalence relationship for the health evaluation of shield tunnel SHM. The method was optimized by exporting the FSV map to automatically generate the threshold value. A new holistic health score(HHS) was proposed and its effectiveness was validated by conducting a pilot test. A case study on Nanjing Yangtze River Tunnel was presented to apply this method. Three types of indicators, namely soil pressure, pore pressure and steel strain, were used to develop the evaluation set U. The clustering results were verified by analyzing the engineering geological conditions; the applicability and validity of the proposed method was also demonstrated. Besides, the advantage of multi-factor evaluation over single-factor model was discussed by using the proposed HHS. This investigation indicated the fuzzy cluster method and HHS is capable of characterizing the fuzziness of tunnel health, and it is beneficial to clarify the tunnel health evaluation uncertainties.

  17. Threshold selection for classification of MR brain images by clustering method

    Energy Technology Data Exchange (ETDEWEB)

    Moldovanu, Simona [Faculty of Sciences and Environment, Department of Chemistry, Physics and Environment, Dunărea de Jos University of Galaţi, 47 Domnească St., 800008, Romania, Phone: +40 236 460 780 (Romania); Dumitru Moţoc High School, 15 Milcov St., 800509, Galaţi (Romania); Obreja, Cristian; Moraru, Luminita, E-mail: luminita.moraru@ugal.ro [Faculty of Sciences and Environment, Department of Chemistry, Physics and Environment, Dunărea de Jos University of Galaţi, 47 Domnească St., 800008, Romania, Phone: +40 236 460 780 (Romania)

    2015-12-07

    Given a grey-intensity image, our method detects the optimal threshold for a suitable binarization of MR brain images. In MR brain image processing, the grey levels of pixels belonging to the object are not substantially different from the grey levels belonging to the background. Threshold optimization is an effective tool to separate objects from the background and further, in classification applications. This paper gives a detailed investigation on the selection of thresholds. Our method does not use the well-known method for binarization. Instead, we perform a simple threshold optimization which, in turn, will allow the best classification of the analyzed images into healthy and multiple sclerosis disease. The dissimilarity (or the distance between classes) has been established using the clustering method based on dendrograms. We tested our method using two classes of images: the first consists of 20 T2-weighted and 20 proton density PD-weighted scans from two healthy subjects and from two patients with multiple sclerosis. For each image and for each threshold, the number of the white pixels (or the area of white objects in binary image) has been determined. These pixel numbers represent the objects in clustering operation. The following optimum threshold values are obtained, T = 80 for PD images and T = 30 for T2w images. Each mentioned threshold separate clearly the clusters that belonging of the studied groups, healthy patient and multiple sclerosis disease.

  18. Developing cluster strategy of apples dodol SMEs by integration K-means clustering and analytical hierarchy process method

    Science.gov (United States)

    Mustaniroh, S. A.; Effendi, U.; Silalahi, R. L. R.; Sari, T.; Ala, M.

    2018-03-01

    The purposes of this research were to determine the grouping of apples dodol small and medium enterprises (SMEs) in Batu City and to determine an appropriate development strategy for each cluster. The methods used for clustering SMEs was k-means. The Analytical Hierarchy Process (AHP) approach was then applied to determine the development strategy priority for each cluster. The variables used in grouping include production capacity per month, length of operation, investment value, average sales revenue per month, amount of SMEs assets, and the number of workers. Several factors were considered in AHP include industry cluster, government, as well as related and supporting industries. Data was collected using the methods of questionaire and interviews. SMEs respondents were selected among SMEs appels dodol in Batu City using purposive sampling. The result showed that two clusters were formed from five apples dodol SMEs. The 1stcluster of apples dodol SMEs, classified as small enterprises, included SME A, SME C, and SME D. The 2ndcluster of SMEs apples dodol, classified as medium enterprises, consisted of SME B and SME E. The AHP results indicated that the priority development strategy for the 1stcluster of apples dodol SMEs was improving quality and the product standardisation, while for the 2nd cluster was increasing the marketing access.

  19. Clusters in the biopharmaceutical industry: toward a new method of analysis.

    Science.gov (United States)

    Erden, Zeynep; von Krogh, Georg

    2011-05-01

    Clusters are groups of co-located and interconnected firms and institutions linked by commonalities in their strategies and complementarities in their activities and resources. There are several reasons for the geographical clustering of firms in the biopharmaceutical industry. This review unpacks some advantages and disadvantages of cluster participation, and proposes a new method to enable managers and researchers to identify clusters in the biopharmaceutical industry. Copyright © 2011 Elsevier Ltd. All rights reserved.

  20. A Cooperative Learning-Based Clustering Approach to Lip Segmentation Without Knowing Segment Number.

    Science.gov (United States)

    Cheung, Yiu-Ming; Li, Meng; Peng, Qinmu; Chen, C L Philip

    2017-01-01

    It is usually hard to predetermine the true number of segments in lip segmentation. This paper, therefore, presents a clustering-based approach to lip segmentation without knowing the true segment number. The objective function in the proposed approach is a variant of the partition entropy (PE) and features that the coincident cluster centroids in pattern space can be equivalently substituted by one centroid with the function value unchanged. It is shown that the minimum of the proposed objective function can be reached provided that: 1) the number of positions occupied by cluster centroids in pattern space is equal to the true number of clusters and 2) these positions are coincident with the optimal cluster centroids obtained under PE criterion. In implementation, we first randomly initialize the clusters provided that the number of clusters is greater than or equal to the ground truth. Then, an iterative algorithm is utilized to minimize the proposed objective function. For each iterative step, not only is the winner, i.e., the centroid with the maximum membership degree, updated to adapt to the corresponding input data, but also the other centroids are adjusted with a specific cooperation strength, so that they are each close to the winner. Subsequently, the initial overpartition will be gradually faded out with the redundant centroids superposed over the convergence of the algorithm. Based upon the proposed algorithm, we present a lip segmentation scheme. Empirical studies have shown its efficacy in comparison with the existing methods.

  1. cluster

    Indian Academy of Sciences (India)

    has been investigated electrochemically in positive and negative microenvironments, both in solution and in film. Charge nature around the active centre ... in plants, bacteria and also in mammals. This cluster is also an important constituent of a ..... selection of non-cysteine amino acid in the active centre of Rieske proteins.

  2. Multispectral image compression algorithm based on spectral clustering and wavelet transform

    Science.gov (United States)

    Huang, Rong; Qiao, Weidong; Yang, Jianfeng; Wang, Hong; Xue, Bin; Tao, Jinyou

    2017-11-01

    In this paper, a method based on spectral clustering and the discrete wavelet transform (DWT) is proposed, which is based on the problem of the high degree of space-time redundancy in the current multispectral image compression algorithm. First, the spectral images are grouped by spectral clustering methods, and the clusters of similar heights are grouped together to remove the redundancy of the spectra. Then, wavelet transform and coding of the class representative are performed, and the space redundancy is eliminated, and the difference composition is applied to the Karhunen-Loeve transform (KLT) and wavelet transform. Experimental results show that with JPEG2000 and upon KLT + DWT algorithm, compared with the method has better peak signal-to-noise ratio and compression ratio, and it is suitable for compression of different spectral bands.

  3. Problem decomposition by mutual information and force-based clustering

    Science.gov (United States)

    Otero, Richard Edward

    The scale of engineering problems has sharply increased over the last twenty years. Larger coupled systems, increasing complexity, and limited resources create a need for methods that automatically decompose problems into manageable sub-problems by discovering and leveraging problem structure. The ability to learn the coupling (inter-dependence) structure and reorganize the original problem could lead to large reductions in the time to analyze complex problems. Such decomposition methods could also provide engineering insight on the fundamental physics driving problem solution. This work forwards the current state of the art in engineering decomposition through the application of techniques originally developed within computer science and information theory. The work describes the current state of automatic problem decomposition in engineering and utilizes several promising ideas to advance the state of the practice. Mutual information is a novel metric for data dependence and works on both continuous and discrete data. Mutual information can measure both the linear and non-linear dependence between variables without the limitations of linear dependence measured through covariance. Mutual information is also able to handle data that does not have derivative information, unlike other metrics that require it. The value of mutual information to engineering design work is demonstrated on a planetary entry problem. This study utilizes a novel tool developed in this work for planetary entry system synthesis. A graphical method, force-based clustering, is used to discover related sub-graph structure as a function of problem structure and links ranked by their mutual information. This method does not require the stochastic use of neural networks and could be used with any link ranking method currently utilized in the field. Application of this method is demonstrated on a large, coupled low-thrust trajectory problem. Mutual information also serves as the basis for an

  4. Fuzzy C-means method with empirical mode decomposition for clustering microarray data.

    Science.gov (United States)

    Wang, Yan-Fei; Yu, Zu-Guo; Anh, Vo

    2013-01-01

    Microarray techniques have revolutionised genomic research by making it possible to monitor the expression of thousands of genes in parallel. The Fuzzy C-Means (FCM) method is an efficient clustering approach devised for microarray data analysis. However, microarray data contains noise, which would affect clustering results. In this paper, we propose to combine the FCM method with the Empirical Mode Decomposition (EMD) for clustering microarray data to reduce the effect of the noise. The results suggest the clustering structures of denoised microarray data are more reasonable and genes have tighter association with their clusters than those using FCM only.

  5. Nonuniform Sparse Data Clustering Cascade Algorithm Based on Dynamic Cumulative Entropy

    Directory of Open Access Journals (Sweden)

    Ning Li

    2016-01-01

    Full Text Available A small amount of prior knowledge and randomly chosen initial cluster centers have a direct impact on the accuracy of the performance of iterative clustering algorithm. In this paper we propose a new algorithm to compute initial cluster centers for k-means clustering and the best number of the clusters with little prior knowledge and optimize clustering result. It constructs the Euclidean distance control factor based on aggregation density sparse degree to select the initial cluster center of nonuniform sparse data and obtains initial data clusters by multidimensional diffusion density distribution. Multiobjective clustering approach based on dynamic cumulative entropy is adopted to optimize the initial data clusters and the best number of the clusters. The experimental results show that the newly proposed algorithm has good performance to obtain the initial cluster centers for the k-means algorithm and it effectively improves the clustering accuracy of nonuniform sparse data by about 5%.

  6. Research on retailer data clustering algorithm based on Spark

    Science.gov (United States)

    Huang, Qiuman; Zhou, Feng

    2017-03-01

    Big data analysis is a hot topic in the IT field now. Spark is a high-reliability and high-performance distributed parallel computing framework for big data sets. K-means algorithm is one of the classical partition methods in clustering algorithm. In this paper, we study the k-means clustering algorithm on Spark. Firstly, the principle of the algorithm is analyzed, and then the clustering analysis is carried out on the supermarket customers through the experiment to find out the different shopping patterns. At the same time, this paper proposes the parallelization of k-means algorithm and the distributed computing framework of Spark, and gives the concrete design scheme and implementation scheme. This paper uses the two-year sales data of a supermarket to validate the proposed clustering algorithm and achieve the goal of subdividing customers, and then analyze the clustering results to help enterprises to take different marketing strategies for different customer groups to improve sales performance.

  7. The Cluster Variation Method: A Primer for Neuroscientists

    Directory of Open Access Journals (Sweden)

    Alianna J. Maren

    2016-09-01

    Full Text Available Effective Brain–Computer Interfaces (BCIs require that the time-varying activation patterns of 2-D neural ensembles be modelled. The cluster variation method (CVM offers a means for the characterization of 2-D local pattern distributions. This paper provides neuroscientists and BCI researchers with a CVM tutorial that will help them to understand how the CVM statistical thermodynamics formulation can model 2-D pattern distributions expressing structural and functional dynamics in the brain. The premise is that local-in-time free energy minimization works alongside neural connectivity adaptation, supporting the development and stabilization of consistent stimulus-specific responsive activation patterns. The equilibrium distribution of local patterns, or configuration variables, is defined in terms of a single interaction enthalpy parameter (h for the case of an equiprobable distribution of bistate (neural/neural ensemble units. Thus, either one enthalpy parameter (or two, for the case of non-equiprobable distribution yields equilibrium configuration variable values. Modeling 2-D neural activation distribution patterns with the representational layer of a computational engine, we can thus correlate variational free energy minimization with specific configuration variable distributions. The CVM triplet configuration variables also map well to the notion of a M = 3 functional motif. This paper addresses the special case of an equiprobable unit distribution, for which an analytic solution can be found.

  8. Design of the optimum insulator gate bipolar transistor using response surface method with cluster analysis

    CERN Document Server

    Wang, Chi Ling; Huang Sy Ruen; Yeh Chao Yu

    2004-01-01

    In this paper, a statistical methodology that can be used for the optimization of the Insulator Gate Bipolar Transistor (IGBT) devices is proposed. This is achieved by integrating the response surface method (RSM) with cluster analysis, weighted composite method and genetic algorithm (GA). The device characteristic of IGBT was simulated based upon the fabrication simulator, ATHENA, and the device simulator, ATLAS. This methodology, yielded another way to investigate the IGBT device and to make a decision in the tradeoff between the breakdown voltage and the on-resistance. In this methodology, we also show how to use cluster analysis to determine the dominant factors that are not visible in the screening of all experiments. 20 Refs.

  9. Renormalization group method for predicting frequency clusters in a chain of nearest-neighbor Kuramoto oscillators.

    Science.gov (United States)

    Kogan, Oleg; Refael, Gil; Cross, Michael; Rogers, Jeffrey

    2008-03-01

    We develop a renormalization group (RG) method to predict frequency clusters and their statistical properties in a 1-dimensional chain of nearest-neighbor coupled Kuramoto oscillators. The intrinsic frequencies and couplings are random numbers chosen from a distribution. The method is designed to work in the regime of strong randomness, where the distribution of intrinsic frequencies and couplings has long tails. Two types of decimation steps are possible: elimination of oscillators with exceptionally large frequency and renormalization of two oscillators bonded by a very large coupling into a single one. Based on these steps, we perform a numerical RG calculation. The oscillators in the renormalized chain correspond to frequency clusters. We compare the RG results with those obtained directly from the numerical solution of the chain's equations of motion.

  10. Effective Social Relationship Measurement and Cluster Based Routing in Mobile Opportunistic Networks †

    Science.gov (United States)

    Zeng, Feng; Zhao, Nan; Li, Wenjia

    2017-01-01

    In mobile opportunistic networks, the social relationship among nodes has an important impact on data transmission efficiency. Motivated by the strong share ability of “circles of friends” in communication networks such as Facebook, Twitter, Wechat and so on, we take a real-life example to show that social relationships among nodes consist of explicit and implicit parts. The explicit part comes from direct contact among nodes, and the implicit part can be measured through the “circles of friends”. We present the definitions of explicit and implicit social relationships between two nodes, adaptive weights of explicit and implicit parts are given according to the contact feature of nodes, and the distributed mechanism is designed to construct the “circles of friends” of nodes, which is used for the calculation of the implicit part of social relationship between nodes. Based on effective measurement of social relationships, we propose a social-based clustering and routing scheme, in which each node selects the nodes with close social relationships to form a local cluster, and the self-control method is used to keep all cluster members always having close relationships with each other. A cluster-based message forwarding mechanism is designed for opportunistic routing, in which each node only forwards the copy of the message to nodes with the destination node as a member of the local cluster. Simulation results show that the proposed social-based clustering and routing outperforms the other classic routing algorithms. PMID:28498309

  11. Effective Social Relationship Measurement and Cluster Based Routing in Mobile Opportunistic Networks.

    Science.gov (United States)

    Zeng, Feng; Zhao, Nan; Li, Wenjia

    2017-05-12

    In mobile opportunistic networks, the social relationship among nodes has an important impact on data transmission efficiency. Motivated by the strong share ability of "circles of friends" in communication networks such as Facebook, Twitter, Wechat and so on, we take a real-life example to show that social relationships among nodes consist of explicit and implicit parts. The explicit part comes from direct contact among nodes, and the implicit part can be measured through the "circles of friends". We present the definitions of explicit and implicit social relationships between two nodes, adaptive weights of explicit and implicit parts are given according to the contact feature of nodes, and the distributed mechanism is designed to construct the "circles of friends" of nodes, which is used for the calculation of the implicit part of social relationship between nodes. Based on effective measurement of social relationships, we propose a social-based clustering and routing scheme, in which each node selects the nodes with close social relationships to form a local cluster, and the self-control method is used to keep all cluster members always having close relationships with each other. A cluster-based message forwarding mechanism is designed for opportunistic routing, in which each node only forwards the copy of the message to nodes with the destination node as a member of the local cluster. Simulation results show that the proposed social-based clustering and routing outperforms the other classic routing algorithms.

  12. Clustering method for counting passengers getting in a bus with single camera

    Science.gov (United States)

    Yang, Tao; Zhang, Yanning; Shao, Dapei; Li, Ying

    2010-03-01

    Automatic counting of passengers is very important for both business and security applications. We present a single-camera-based vision system that is able to count passengers in a highly crowded situation at the entrance of a traffic bus. The unique characteristics of the proposed system include, First, a novel feature-point-tracking- and online clustering-based passenger counting framework, which performs much better than those of background-modeling-and foreground-blob-tracking-based methods. Second, a simple and highly accurate clustering algorithm is developed that projects the high-dimensional feature point trajectories into a 2-D feature space by their appearance and disappearance times and counts the number of people through online clustering. Finally, all test video sequences in the experiment are captured from a real traffic bus in Shanghai, China. The results show that the system can process two 320×240 video sequences at a frame rate of 25 fps simultaneously, and can count passengers reliably in various difficult scenarios with complex interaction and occlusion among people. The method achieves high accuracy rates up to 96.5%.

  13. Information bottleneck based incremental fuzzy clustering for large biomedical data.

    Science.gov (United States)

    Liu, Yongli; Wan, Xing

    2016-08-01

    Incremental fuzzy clustering combines advantages of fuzzy clustering and incremental clustering, and therefore is important in classifying large biomedical literature. Conventional algorithms, suffering from data sparsity and high-dimensionality, often fail to produce reasonable results and may even assign all the objects to a single cluster. In this paper, we propose two incremental algorithms based on information bottleneck, Single-Pass fuzzy c-means (spFCM-IB) and Online fuzzy c-means (oFCM-IB). These two algorithms modify conventional algorithms by considering different weights for each centroid and object and scoring mutual information loss to measure the distance between centroids and objects. spFCM-IB and oFCM-IB are used to group a collection of biomedical text abstracts from Medline database. Experimental results show that clustering performances of our approaches are better than such prominent counterparts as spFCM, spHFCM, oFCM and oHFCM, in terms of accuracy. Copyright © 2016 Elsevier Inc. All rights reserved.

  14. GENERALISED MODEL BASED CONFIDENCE INTERVALS IN TWO STAGE CLUSTER SAMPLING

    Directory of Open Access Journals (Sweden)

    Christopher Ouma Onyango

    2010-09-01

    Full Text Available Chambers and Dorfman (2002 constructed bootstrap confidence intervals in model based estimation for finite population totals assuming that auxiliary values are available throughout a target population and that the auxiliary values are independent. They also assumed that the cluster sizes are known throughout the target population. We now extend to two stage sampling in which the cluster sizes are known only for the sampled clusters, and we therefore predict the unobserved part of the population total. Jan and Elinor (2008 have done similar work, but unlike them, we use a general model, in which the auxiliary values are not necessarily independent. We demonstrate that the asymptotic properties of our proposed estimator and its coverage rates are better than those constructed under the model assisted local polynomial regression model.

  15. VANET Clustering Based Routing Protocol Suitable for Deserts

    Science.gov (United States)

    Mohammed Nasr, Mohammed Mohsen; Abdelgader, Abdeldime Mohamed Salih; Wang, Zhi-Gong; Shen, Lian-Feng

    2016-01-01

    In recent years, there has emerged applications of vehicular ad hoc networks (VANETs) towards security, safety, rescue, exploration, military and communication redundancy systems in non-populated areas, besides its ordinary use in urban environments as an essential part of intelligent transportation systems (ITS). This paper proposes a novel algorithm for the process of organizing a cluster structure and cluster head election (CHE) suitable for VANETs. Moreover, it presents a robust clustering-based routing protocol, which is appropriate for deserts and can achieve high communication efficiency, ensuring reliable information delivery and optimal exploitation of the equipment on each vehicle. A comprehensive simulation is conducted to evaluate the performance of the proposed CHE and routing algorithms. PMID:27058539

  16. Automatic detection of arterial input function in dynamic contrast enhanced MRI based on affinity propagation clustering.

    Science.gov (United States)

    Shi, Lin; Wang, Defeng; Liu, Wen; Fang, Kui; Wang, Yi-Xiang J; Huang, Wenhua; King, Ann D; Heng, Pheng Ann; Ahuja, Anil T

    2014-05-01

    To automatically and robustly detect the arterial input function (AIF) with high detection accuracy and low computational cost in dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI). In this study, we developed an automatic AIF detection method using an accelerated version (Fast-AP) of affinity propagation (AP) clustering. The validity of this Fast-AP-based method was proved on two DCE-MRI datasets, i.e., rat kidney and human head and neck. The detailed AIF detection performance of this proposed method was assessed in comparison with other clustering-based methods, namely original AP and K-means, as well as the manual AIF detection method. Both the automatic AP- and Fast-AP-based methods achieved satisfactory AIF detection accuracy, but the computational cost of Fast-AP could be reduced by 64.37-92.10% on rat dataset and 73.18-90.18% on human dataset compared with the cost of AP. The K-means yielded the lowest computational cost, but resulted in the lowest AIF detection accuracy. The experimental results demonstrated that both the AP- and Fast-AP-based methods were insensitive to the initialization of cluster centers, and had superior robustness compared with K-means method. The Fast-AP-based method enables automatic AIF detection with high accuracy and efficiency. Copyright © 2013 Wiley Periodicals, Inc.

  17. A study of several CAD methods for classification of clustered microcalcifications

    Science.gov (United States)

    Wei, Liyang; Yang, Yongyi; Nishikawa, Robert M.; Jiang, Yulei

    2005-04-01

    In this paper we investigate several state-of-the-art machine-learning methods for automated classification of clustered microcalcifications (MCs), aimed to assisting radiologists for more accurate diagnosis of breast cancer in a computer-aided diagnosis (CADx) scheme. The methods we consider include: support vector machine (SVM), kernel Fisher discriminant (KFD), and committee machines (ensemble averaging and AdaBoost), most of which have been developed recently in statistical learning theory. We formulate differentiation of malignant from benign MCs as a supervised learning problem, and apply these learning methods to develop the classification algorithms. As input, these methods use image features automatically extracted from clustered MCs. We test these methods using a database of 697 clinical mammograms from 386 cases, which include a wide spectrum of difficult-to-classify cases. We use receiver operating characteristic (ROC) analysis to evaluate and compare the classification performance by the different methods. In addition, we also investigate how to combine information from multiple-view mammograms of the same case so that the best decision can be made by a classifier. In our experiments, the kernel-based methods (i.e., SVM, KFD) yield the best performance, significantly outperforming a well-established CADx approach based on neural network learning.

  18. Conveyor Performance based on Motor DC 12 Volt Eg-530ad-2f using K-Means Clustering

    Science.gov (United States)

    Arifin, Zaenal; Artini, Sri DP; Much Ibnu Subroto, Imam

    2017-04-01

    To produce goods in industry, a controlled tool to improve production is required. Separation process has become a part of production process. Separation process is carried out based on certain criteria to get optimum result. By knowing the characteristics performance of a controlled tools in separation process the optimum results is also possible to be obtained. Clustering analysis is popular method for clustering data into smaller segments. Clustering analysis is useful to divide a group of object into a k-group in which the member value of the group is homogeny or similar. Similarity in the group is set based on certain criteria. The work in this paper based on K-Means method to conduct clustering of loading in the performance of a conveyor driven by a dc motor 12 volt eg-530-2f. This technique gives a complete clustering data for a prototype of conveyor driven by dc motor to separate goods in term of height. The parameters involved are voltage, current, time of travelling. These parameters give two clusters namely optimal cluster with center of cluster 10.50 volt, 0.3 Ampere, 10.58 second, and unoptimal cluster with center of cluster 10.88 volt, 0.28 Ampere and 40.43 second.

  19. An improved clustering algorithm based on reverse learning in intelligent transportation

    Science.gov (United States)

    Qiu, Guoqing; Kou, Qianqian; Niu, Ting

    2017-05-01

    With the development of artificial intelligence and data mining technology, big data has gradually entered people's field of vision. In the process of dealing with large data, clustering is an important processing method. By introducing the reverse learning method in the clustering process of PAM clustering algorithm, to further improve the limitations of one-time clustering in unsupervised clustering learning, and increase the diversity of clustering clusters, so as to improve the quality of clustering. The algorithm analysis and experimental results show that the algorithm is feasible.

  20. A nonparametric Bayesian approach for clustering bisulfate-based DNA methylation profiles.

    Science.gov (United States)

    Zhang, Lin; Meng, Jia; Liu, Hui; Huang, Yufei

    2012-01-01

    DNA methylation occurs in the context of a CpG dinucleotide. It is an important epigenetic modification, which can be inherited through cell division. The two major types of methylation include hypomethylation and hypermethylation. Unique methylation patterns have been shown to exist in diseases including various types of cancer. DNA methylation analysis promises to become a powerful tool in cancer diagnosis, treatment and prognostication. Large-scale methylation arrays are now available for studying methylation genome-wide. The Illumina methylation platform simultaneously measures cytosine methylation at more than 1500 CpG sites associated with over 800 cancer-related genes. Cluster analysis is often used to identify DNA methylation subgroups for prognosis and diagnosis. However, due to the unique non-Gaussian characteristics, traditional clustering methods may not be appropriate for DNA and methylation data, and the determination of optimal cluster number is still problematic. A Dirichlet process beta mixture model (DPBMM) is proposed that models the DNA methylation expressions as an infinite number of beta mixture distribution. The model allows automatic learning of the relevant parameters such as the cluster mixing proportion, the parameters of beta distribution for each cluster, and especially the number of potential clusters. Since the model is high dimensional and analytically intractable, we proposed a Gibbs sampling "no-gaps" solution for computing the posterior distributions, hence the estimates of the parameters. The proposed algorithm was tested on simulated data as well as methylation data from 55 Glioblastoma multiform (GBM) brain tissue samples. To reduce the computational burden due to the high data dimensionality, a dimension reduction method is adopted. The two GBM clusters yielded by DPBMM are based on data of different number of loci (P-value < 0.1), while hierarchical clustering cannot yield statistically significant clusters.

  1. Management of Energy Consumption on Cluster Based Routing Protocol for MANET

    Science.gov (United States)

    Hosseini-Seno, Seyed-Amin; Wan, Tat-Chee; Budiarto, Rahmat; Yamada, Masashi

    The usage of light-weight mobile devices is increasing rapidly, leading to demand for more telecommunication services. Consequently, mobile ad hoc networks and their applications have become feasible with the proliferation of light-weight mobile devices. Many protocols have been developed to handle service discovery and routing in ad hoc networks. However, the majority of them did not consider one critical aspect of this type of network, which is the limited of available energy in each node. Cluster Based Routing Protocol (CBRP) is a robust/scalable routing protocol for Mobile Ad hoc Networks (MANETs) and superior to existing protocols such as Ad hoc On-demand Distance Vector (AODV) in terms of throughput and overhead. Therefore, based on this strength, methods to increase the efficiency of energy usage are incorporated into CBRP in this work. In order to increase the stability (in term of life-time) of the network and to decrease the energy consumption of inter-cluster gateway nodes, an Enhanced Gateway Cluster Based Routing Protocol (EGCBRP) is proposed. Three methods have been introduced by EGCBRP as enhancements to the CBRP: improving the election of cluster Heads (CHs) in CBRP which is based on the maximum available energy level, implementing load balancing for inter-cluster traffic using multiple gateways, and implementing sleep state for gateway nodes to further save the energy. Furthermore, we propose an Energy Efficient Cluster Based Routing Protocol (EECBRP) which extends the EGCBRP sleep state concept into all idle member nodes, excluding the active nodes in all clusters. The experiment results show that the EGCBRP decreases the overall energy consumption of the gateway nodes up to 10% and the EECBRP reduces the energy consumption of the member nodes up to 60%, both of which in turn contribute to stabilizing the network.

  2. Cerebrospinal fluid image segmentation using spatial fuzzy clustering method with improved evolutionary Expectation Maximization.

    Science.gov (United States)

    Abdullah, Afnizanfaizal; Hirayama, Akihiro; Yatsushiro, Satoshi; Matsumae, Mitsunori; Kuroda, Kagayaki

    2013-01-01

    Visualization of cerebrospinal fluid (CSF), that flow in the brain and spinal cord, plays an important role to detect neurodegenerative diseases such as Alzheimer's disease. This is performed by measuring the substantial changes in the CSF flow dynamics, volume and/or pressure gradient. Magnetic resonance imaging (MRI) technique has become a prominent tool to quantitatively measure these changes and image segmentation method has been widely used to distinguish the CSF flows from the brain tissues. However, this is often hampered by the presence of partial volume effect in the images. In this paper, a new hybrid evolutionary spatial fuzzy clustering method is introduced to overcome the partial volume effect in the MRI images. The proposed method incorporates Expectation Maximization (EM) method, which is improved by the evolutionary operations of the Genetic Algorithm (GA) to differentiate the CSF from the brain tissues. The proposed improvement is incorporated into a spatial-based fuzzy clustering (SFCM) method to improve segmentation of the boundary curve of the CSF and the brain tissues. The proposed method was validated using MRI images of Alzheimer's disease patient. The results presented that the proposed method is capable to filter the CSF regions from the brain tissues more effectively compared to the standard EM, FCM, and SFCM methods.

  3. Atomic and electronic structure of clusters from car-Parrinello method

    International Nuclear Information System (INIS)

    Kumar, V.

    1994-06-01

    With the development of ab-initio molecular dynamics method, it has now become possible to study the static and dynamical properties of clusters containing up to a few tens of atoms. Here I present a review of the method within the framework of the density functional theory and pseudopotential approach to represent the electron-ion interaction and discuss some of its applications to clusters. Particular attention is focussed on the structure and bonding properties of clusters as a function of their size. Applications to clusters of alkali metals and Al, non-metal - metal transition in divalent metal clusters, molecular clusters of carbon and Sb are discussed in detail. Some results are also presented on mixed clusters. (author). 121 refs, 24 ifigs

  4. Unsupervised active learning based on hierarchical graph-theoretic clustering.

    Science.gov (United States)

    Hu, Weiming; Hu, Wei; Xie, Nianhua; Maybank, Steve

    2009-10-01

    Most existing active learning approaches are supervised. Supervised active learning has the following problems: inefficiency in dealing with the semantic gap between the distribution of samples in the feature space and their labels, lack of ability in selecting new samples that belong to new categories that have not yet appeared in the training samples, and lack of adaptability to changes in the semantic interpretation of sample categories. To tackle these problems, we propose an unsupervised active learning framework based on hierarchical graph-theoretic clustering. In the framework, two promising graph-theoretic clustering algorithms, namely, dominant-set clustering and spectral clustering, are combined in a hierarchical fashion. Our framework has some advantages, such as ease of implementation, flexibility in architecture, and adaptability to changes in the labeling. Evaluations on data sets for network intrusion detection, image classification, and video classification have demonstrated that our active learning framework can effectively reduce the workload of manual classification while maintaining a high accuracy of automatic classification. It is shown that, overall, our framework outperforms the support-vector-machine-based supervised active learning, particularly in terms of dealing much more efficiently with new samples whose categories have not yet appeared in the training samples.

  5. Different Methods of Forming Cold Fronts in Nonmerging Clusters

    Science.gov (United States)

    Dupke, Renato; White, Raymond E., III; Bregman, Joel N.

    2007-12-01

    Sharp edges in X-ray surface brightness with continuous gas pressure called cold fronts have often been found in relaxed galaxy clusters such as Abell 496. Models that explain cold fronts as surviving cores of head-on subcluster mergers do not work well for these clusters, and competing models involving gas sloshing have been recently proposed. Here, we test some concrete predictions of these models in a combined analysis of density, temperature, metal abundances, and abundance ratios in a deep Chandra exposure of Abell 496. We confirm that the chemical discontinuities found in this cluster are not consistent with a core merger remnant scenario. However, we find chemical gradients across a spiral ``arm'' discovered at 73 kpc north of the cluster center and coincident with the sharp edge of the main cold front in the cluster. Despite the overall SN Ia iron mass fraction dominance found within the cooling radius of this cluster, the metal enrichment along the arm, determined from silicon and iron abundances, is consistent with a lower SN Ia iron mass fraction (51%+/-14%) than that measured in the surrounding regions (85%+/-14%). The ``arm'' is also significantly colder than the surroundings by 0.5-1.6 keV. The arm extends from a boxy colder region surrounding the center of the cluster, where two other cold fronts are found. This cold arm is a prediction of current high resolution numerical simulations as a result of an off-center encounter with a less massive pure dark matter halo, and we suggest that the cold fronts in A496 provide the first clear corroboration of such model, where the closest encounter happened ~0.5 Gyr ago. We also argue for a possible candidate dark matter halo responsible for the cold fronts in the outskirts of A496.

  6. Risk Assessment for Bridges Safety Management during Operation Based on Fuzzy Clustering Algorithm

    Directory of Open Access Journals (Sweden)

    Xia Hanyu

    2016-01-01

    Full Text Available In recent years, large span and large sea-crossing bridges are built, bridges accidents caused by improper operational management occur frequently. In order to explore the better methods for risk assessment of the bridges operation departments, the method based on fuzzy clustering algorithm is selected. Then, the implementation steps of fuzzy clustering algorithm are described, the risk evaluation system is built, and Taizhou Bridge is selected as an example, the quantitation of risk factors is described. After that, the clustering algorithm based on fuzzy equivalence is calculated on MATLAB 2010a. In the last, Taizhou Bridge operation management departments are classified and sorted according to the degree of risk, and the safety situation of operation departments is analyzed.

  7. 3D Building Models Segmentation Based on K-Means++ Cluster Analysis

    Science.gov (United States)

    Zhang, C.; Mao, B.

    2016-10-01

    3D mesh model segmentation is drawing increasing attentions from digital geometry processing field in recent years. The original 3D mesh model need to be divided into separate meaningful parts or surface patches based on certain standards to support reconstruction, compressing, texture mapping, model retrieval and etc. Therefore, segmentation is a key problem for 3D mesh model segmentation. In this paper, we propose a method to segment Collada (a type of mesh model) 3D building models into meaningful parts using cluster analysis. Common clustering methods segment 3D mesh models by K-means, whose performance heavily depends on randomized initial seed points (i.e., centroid) and different randomized centroid can get quite different results. Therefore, we improved the existing method and used K-means++ clustering algorithm to solve this problem. Our experiments show that K-means++ improves both the speed and the accuracy of K-means, and achieve good and meaningful results.

  8. 3D BUILDING MODELS SEGMENTATION BASED ON K-MEANS++ CLUSTER ANALYSIS

    Directory of Open Access Journals (Sweden)

    C. Zhang

    2016-10-01

    Full Text Available 3D mesh model segmentation is drawing increasing attentions from digital geometry processing field in recent years. The original 3D mesh model need to be divided into separate meaningful parts or surface patches based on certain standards to support reconstruction, compressing, texture mapping, model retrieval and etc. Therefore, segmentation is a key problem for 3D mesh model segmentation. In this paper, we propose a method to segment Collada (a type of mesh model 3D building models into meaningful parts using cluster analysis. Common clustering methods segment 3D mesh models by K-means, whose performance heavily depends on randomized initial seed points (i.e., centroid and different randomized centroid can get quite different results. Therefore, we improved the existing method and used K-means++ clustering algorithm to solve this problem. Our experiments show that K-means++ improves both the speed and the accuracy of K-means, and achieve good and meaningful results.

  9. The Home Care Crew Scheduling Problem: Preference-based visit clustering and temporal dependencies

    DEFF Research Database (Denmark)

    Rasmussen, Matias Sevel; Justesen, Tor Fog; Dohn, Anders Høeg

    2012-01-01

    branch-and-price solution algorithm, as this method has previously given solid results for classical vehicle routing problems. Temporal dependencies are modelled as generalised precedence constraints and enforced through the branching. We introduce a novel visit clustering approach based on the soft...

  10. Earthquakes clustering based on the magnitude and the depths in Molluca Province

    Energy Technology Data Exchange (ETDEWEB)

    Wattimanela, H. J., E-mail: hwattimaela@yahoo.com [Pattimura University, Ambon (Indonesia); Institute of Technology Bandung, Bandung (Indonesia); Pasaribu, U. S.; Indratno, S. W.; Puspito, A. N. T. [Institute of Technology Bandung, Bandung (Indonesia)

    2015-12-22

    In this paper, we present a model to classify the earthquakes occurred in Molluca Province. We use K-Means clustering method to classify the earthquake based on the magnitude and the depth of the earthquake. The result can be used for disaster mitigation and for designing evacuation route in Molluca Province.

  11. Maximizing genetic differentiation in core collections by PCA-based clustering of molecular marker data

    NARCIS (Netherlands)

    Heerwaarden, van J.; Odong, T.L.; Eeuwijk, van F.A.

    2013-01-01

    Developing genetically diverse core sets is key to the effective management and use of crop genetic resources. Core selection increasingly uses molecular marker-based dissimilarity and clustering methods, under the implicit assumption that markers and genes of interest are genetically correlated. In

  12. Earthquakes clustering based on the magnitude and the depths in Molluca Province

    International Nuclear Information System (INIS)

    Wattimanela, H. J.; Pasaribu, U. S.; Indratno, S. W.; Puspito, A. N. T.

    2015-01-01

    In this paper, we present a model to classify the earthquakes occurred in Molluca Province. We use K-Means clustering method to classify the earthquake based on the magnitude and the depth of the earthquake. The result can be used for disaster mitigation and for designing evacuation route in Molluca Province

  13. The impact of semantic document expansion on cluster-based fusion for microblog search

    NARCIS (Netherlands)

    Liang, S.; Ren, Z.; de Rijke, M.; de Rijke, M.; Kenter, T.; de Vries, A.P.; Zhai, C.X.; de Jong, F.; Radinsky, K.; Hofmann, K.

    2014-01-01

    Searching microblog posts, with their limited length and creative language usage, is challenging. We frame the microblog search problem as a data fusion problem. We examine the effectiveness of a recent cluster-based fusion method on the task of retrieving microblog posts. We find that in the

  14. Clustering cliques for graph-based summarization of the biomedical research literature

    DEFF Research Database (Denmark)

    Zhang, Han; Fiszman, Marcelo; Shin, Dongwook

    2013-01-01

    Background: Graph-based notions are increasingly used in biomedical data mining and knowledge discovery tasks. In this paper, we present a clique-clustering method to automatically summarize graphs of semantic predications produced from PubMed citations (titles and abstracts).Results: Sem...

  15. Equations of explicitly-correlated coupled-cluster methods.

    Science.gov (United States)

    Shiozaki, Toru; Kamiya, Muneaki; Hirata, So; Valeev, Edward F

    2008-06-21

    The tensor contraction expressions defining a variety of high-rank coupled-cluster energies and wave functions that include the interelectronic distances (r(12)) explicitly (CC-R12) have been derived with the aid of a newly-developed computerized symbolic algebra smith. Efficient computational sequences to perform these tensor contractions have also been suggested, defining intermediate tensors-some reusable-as a sum of binary tensor contractions. smith can elucidate the index permutation symmetry of intermediate tensors that arise from a Slater-determinant expectation value of any number of excitation, deexcitation and other general second-quantized operators. smith also automates additional algebraic transformation steps specific to R12 methods, i.e. the identification and isolation of the special intermediates that need to be evaluated analytically and the resolution-of-the-identity insertion to facilitate high-dimensional molecular integral computation. The tensor contraction expressions defining the CC-R12 methods including through the connected quadruple excitation operator (CCSDTQ-R12) have been documented and efficient computational sequences have been suggested not just for the ground state but also for excited states via the equation-of-motion formalism (EOM-CC-R12) and for the so-called Lambda equation (Lambda-CC-R12) of the CC analytical gradient theory. Additional equations (the geminal amplitude equation) arise in CC-R12 that need to be solved to determine the coefficients multiplying the r(12)-dependent factors. The operation cost of solving the geminal amplitude equations of rank-k CC-R12 and EOM-CC-R12 (right-hand side) scales as O(n(6)) (k = 2) or O(n(7)) (k > or = 3) with the number of orbitals n and is surpassed by the cost of solving the usual amplitude equations O(n(2k+2)). While the complexity of the geminal amplitude equations of Lambda- and EOM-CC-R12 (left-hand side) nominally scales as O(n(2k+2)), it is less than that of the other O(n(2k

  16. Classification of excessive domestic water consumption using Fuzzy Clustering Method

    Science.gov (United States)

    Zairi Zaidi, A.; Rasmani, Khairul A.

    2016-08-01

    Demand for clean and treated water is increasing all over the world. Therefore it is crucial to conserve water for better use and to avoid unnecessary, excessive consumption or wastage of this natural resource. Classification of excessive domestic water consumption is a difficult task due to the complexity in determining the amount of water usage per activity, especially as the data is known to vary between individuals. In this study, classification of excessive domestic water consumption is carried out using a well-known Fuzzy C-Means (FCM) clustering algorithm. Consumer data containing information on daily, weekly and monthly domestic water usage was employed for the purpose of classification. Using the same dataset, the result produced by the FCM clustering algorithm is compared with the result obtained from a statistical control chart. The finding of this study demonstrates the potential use of the FCM clustering algorithm for the classification of domestic consumer water consumption data.

  17. Anharmonic effects in the quantum cluster equilibrium method

    Science.gov (United States)

    von Domaros, Michael; Perlt, Eva

    2017-03-01

    The well-established quantum cluster equilibrium (QCE) model provides a statistical thermodynamic framework to apply high-level ab initio calculations of finite cluster structures to macroscopic liquid phases using the partition function. So far, the harmonic approximation has been applied throughout the calculations. In this article, we apply an important correction in the evaluation of the one-particle partition function and account for anharmonicity. Therefore, we implemented an analytical approximation to the Morse partition function and the derivatives of its logarithm with respect to temperature, which are required for the evaluation of thermodynamic quantities. This anharmonic QCE approach has been applied to liquid hydrogen chloride and cluster distributions, and the molar volume, the volumetric thermal expansion coefficient, and the isobaric heat capacity have been calculated. An improved description for all properties is observed if anharmonic effects are considered.

  18. A Novel Clustering Methodology Based on Modularity Optimisation for Detecting Authorship Affinities in Shakespearean Era Plays.

    Directory of Open Access Journals (Sweden)

    Leila M Naeni

    Full Text Available In this study we propose a novel, unsupervised clustering methodology for analyzing large datasets. This new, efficient methodology converts the general clustering problem into the community detection problem in graph by using the Jensen-Shannon distance, a dissimilarity measure originating in Information Theory. Moreover, we use graph theoretic concepts for the generation and analysis of proximity graphs. Our methodology is based on a newly proposed memetic algorithm (iMA-Net for discovering clusters of data elements by maximizing the modularity function in proximity graphs of literary works. To test the effectiveness of this general methodology, we apply it to a text corpus dataset, which contains frequencies of approximately 55,114 unique words across all 168 written in the Shakespearean era (16th and 17th centuries, to analyze and detect clusters of similar plays. Experimental results and comparison with state-of-the-art clustering methods demonstrate the remarkable performance of our new method for identifying high quality clusters which reflect the commonalities in the literary style of the plays.

  19. Nationwide Registry-Based Analysis of Cancer Clustering Detects Strong Familial Occurrence of Kaposi Sarcoma

    Science.gov (United States)

    Vahteristo, Pia; Patama, Toni; Li, Yilong; Saarinen, Silva; Kilpivaara, Outi; Pitkänen, Esa; Knekt, Paul; Laaksonen, Maarit; Artama, Miia; Lehtonen, Rainer; Aaltonen, Lauri A.; Pukkala, Eero

    2013-01-01

    Many cancer predisposition syndromes are rare or have incomplete penetrance, and traditional epidemiological tools are not well suited for their detection. Here we have used an approach that employs the entire population based data in the Finnish Cancer Registry (FCR) for analyzing familial aggregation of all types of cancer, in order to find evidence for previously unrecognized cancer susceptibility conditions. We performed a systematic clustering of 878,593 patients in FCR based on family name at birth, municipality of birth, and tumor type, diagnosed between years 1952 and 2011. We also estimated the familial occurrence of the tumor types using cluster score that reflects the proportion of patients belonging to the most significant clusters compared to all patients in Finland. The clustering effort identified 25,910 birth name-municipality based clusters representing 183 different tumor types characterized by topography and morphology. We produced information about familial occurrence of hundreds of tumor types, and many of the tumor types with high cluster score represented known cancer syndromes. Unexpectedly, Kaposi sarcoma (KS) also produced a very high score (cluster score 1.91, p-value <0.0001). We verified from population records that many of the KS patients forming the clusters were indeed close relatives, and identified one family with five affected individuals in two generations and several families with two first degree relatives. Our approach is unique in enabling systematic examination of a national epidemiological database to derive evidence of aberrant familial aggregation of all tumor types, both common and rare. It allowed effortless identification of families displaying features of both known as well as potentially novel cancer predisposition conditions, including striking familial aggregation of KS. Further work with high-throughput methods should elucidate the molecular basis of the potentially novel predisposition conditions found in this

  20. Semantic-based multilingual document clustering via tensor modeling

    OpenAIRE

    Romeo, S.; Tagarelli, A.; Ienco, D.

    2014-01-01

    EMNLP, Conference on Empirical Methods in Natural Language Processing , Doha, QAT, 25-/10/2014 - 29/10/2014; International audience; A major challenge in document clustering research arises from the growing amount of text data written in different languages. Previous approaches depend on language-specific solutions (e.g., bilingual dictionaries, sequential machine translation) to evaluate document similarities, and the required transformations may alter the original document semantics. To cop...

  1. Implementation of Data Mining on Rice Imports by Major Country of Origin Using Algorithm Using K-Means Clustering Method

    Directory of Open Access Journals (Sweden)

    Agus Perdana Windarto

    2017-10-01

    Full Text Available Indonesia is a country where most of its people rely on the agricultural sector as a livelihood. Indonesia's rice production is so high that it can not meet the needs of its population, consequently Indonesia still has to import rice from other food producing countries. One of the main causes is the enormous population. Statistics show that in the range of 230-237 million people, the staple food of all residents is rice so it is clear that the need for rice becomes very large. This study discusses the application of datamining on rice import by main country of origin using K-Means Clustering Method. Sources of data of this study were collected based on import import declaration documents produced by the Directorate General of Customs and Excise. In addition since 2015, import data also comes from PT. Pos Indonesia, records of other agencies at the border, and the results of cross-border maritime trade surveys. The data used in this study is the data of rice imports by country of origin from 2000-2015 consisting of 10 countries namely Vietnam, Thailand, China, India, Pakistan, United States, Taiwan, Singapore, Myanmar and Others. Variable used (1 total import of rice (net and (2 import purchase value (CIF. The data will be processed by clustering rice imports by main country of origin in 3 clusters ie high imported cluster, medium imported cluster and low import level cluster. The clustering method used in this research is K-Means method. Cetroid data for high import level clusters 7429180 and 2735452,25, Cetroid data for medium import level clusters 1046359.5 and 337703.05 and Cetroid data for low import level clusters 185559.425 and 53089.225. The result is an assessment based on rice import index with 2 high imported cluster countries namely Vietnam and Thailand, 4 medium-level clusters of moderate import countries namely China, India, Pakistan and Lainya and 4 low imported cluster countries namely USA, Taiwan, Singapore and Myanmar. The

  2. An optimal design of cluster spacing intervals for staged fracturing in horizontal shale gas wells based on the optimal SRVs

    Directory of Open Access Journals (Sweden)

    Lan Ren

    2017-09-01

    Full Text Available When horizontal well staged cluster fracturing is applied in shale gas reservoirs, the cluster spacing is essential to fracturing performance. If the cluster spacing is too small, the stimulated area between major fractures will be overlapped, and the efficiency of fracturing stimulation will be decreased. If the cluster spacing is too large, the area between major fractures cannot be stimulated completely and reservoir recovery extent will be adversely impacted. At present, cluster spacing design is mainly based on the static model with the potential reservoir stimulation area as the target, and there is no cluster spacing design method in accordance with the actual fracturing process and targets dynamic stimulated reservoir volume (SRV. In this paper, a dynamic SRV calculation model for cluster fracture propagation was established by analyzing the coupling mechanisms among fracture propagation, fracturing fluid loss and stress. Then, the cluster spacing was optimized to reach the target of the optimal SRVs. This model was applied for validation on site in the Jiaoshiba shale gasfield in the Fuling area of the Sichuan Basin. The key geological engineering parameters influencing the optimal cluster spacing intervals were analyzed. The reference charts for the optimal cluster spacing design were prepared based on the geological characteristics of south and north blocks in the Jiaoshiba shale gasfield. It is concluded that the cluster spacing optimal design method proposed in this paper is of great significance in overcoming the blindness in current cluster perforation design and guiding the optimal design of volume fracturing in shale gas reservoirs. Keywords: Shale gas, Horizontal well, Staged fracturing, Cluster spacing, Reservoir, Stimulated reservoir volume (SRV, Mathematical model, Optimal method, Sichuan basin, Jiaoshiba shale gasfield

  3. Priority Based Congestion Control Dynamic Clustering Protocol in Mobile Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    R. Beulah Jayakumari

    2015-01-01

    Full Text Available Wireless sensor network is widely used to monitor natural phenomena because natural disaster has globally increased which causes significant loss of life, economic setback, and social development. Saving energy in a wireless sensor network (WSN is a critical factor to be considered. The sensor nodes are deployed to sense, compute, and communicate alerts in a WSN which are used to prevent natural hazards. Generally communication consumes more energy than sensing and computing; hence cluster based protocol is preferred. Even with clustering, multiclass traffic creates congested hotspots in the cluster, thereby causing packet loss and delay. In order to conserve energy and to avoid congestion during multiclass traffic a novel Priority Based Congestion Control Dynamic Clustering (PCCDC protocol is developed. PCCDC is designed with mobile nodes which are organized dynamically into clusters to provide complete coverage and connectivity. PCCDC computes congestion at intra- and intercluster level using linear and binary feedback method. Each mobile node within the cluster has an appropriate queue model for scheduling prioritized packet during congestion without drop or delay. Simulation results have proven that packet drop, control overhead, and end-to-end delay are much lower in PCCDC which in turn significantly increases packet delivery ratio, network lifetime, and residual energy when compared with PASCC protocol.

  4. Chinese Text Clustering for Topic Detection Based on Word Pattern Relation

    Science.gov (United States)

    Yang, Yen-Ju; Yu, Su-Hsin

    This research adopt the method of word expansion to compose relevant features into the same semantic concept, then conduct the corresponding documents to concept clusters, and finally merge the concepts with common documents into document clusters. We expect the mechanism, the use of semantic concept to form a feature index, can reduce the problems of polysemy and synonymy. The frequent two or three sequent nouns in the same sentence are used to form a key pattern to replace the keyword as the feature of the text. The distributive strength of key patterns is measured by Pattern Frequency, Pattern Frequency-Inverse Document Frequency, Conditional Probability, Mutual Information, and Association Norm. According to the strength the agglomerate hierarchical clustering technique is applied to cluster these key patterns into semantic concepts. Then, based on the common documents between concepts, several semantic concepts are merged to a group, in which the corresponding text will be considered as topic-related. The experimental results show that our proposed text clustering based on five strength measures of key patterns are all better than the traditional VSM clustering. PFIDF is the best in average F-measure, 97.5%.

  5. A Dirichlet Process Mixture Based Name Origin Clustering and Alignment Model for Transliteration

    Directory of Open Access Journals (Sweden)

    Chunyue Zhang

    2015-01-01

    Full Text Available In machine transliteration, it is common that the transliterated names in the target language come from multiple language origins. A conventional maximum likelihood based single model can not deal with this issue very well and often suffers from overfitting. In this paper, we exploit a coupled Dirichlet process mixture model (cDPMM to address overfitting and names multiorigin cluster issues simultaneously in the transliteration sequence alignment step over the name pairs. After the alignment step, the cDPMM clusters name pairs into many groups according to their origin information automatically. In the decoding step, in order to use the learned origin information sufficiently, we use a cluster combination method (CCM to build clustering-specific transliteration models by combining small clusters into large ones based on the perplexities of name language and transliteration model, which makes sure each origin cluster has enough data for training a transliteration model. On the three different Western-Chinese multiorigin names corpora, the cDPMM outperforms two state-of-the-art baseline models in terms of both the top-1 accuracy and mean F-score, and furthermore the CCM significantly improves the cDPMM.

  6. Energy Aware Cluster Based Routing Scheme For Wireless Sensor Network

    Directory of Open Access Journals (Sweden)

    Roy Sohini

    2015-09-01

    Full Text Available Wireless Sensor Network (WSN has emerged as an important supplement to the modern wireless communication systems due to its wide range of applications. The recent researches are facing the various challenges of the sensor network more gracefully. However, energy efficiency has still remained a matter of concern for the researches. Meeting the countless security needs, timely data delivery and taking a quick action, efficient route selection and multi-path routing etc. can only be achieved at the cost of energy. Hierarchical routing is more useful in this regard. The proposed algorithm Energy Aware Cluster Based Routing Scheme (EACBRS aims at conserving energy with the help of hierarchical routing by calculating the optimum number of cluster heads for the network, selecting energy-efficient route to the sink and by offering congestion control. Simulation results prove that EACBRS performs better than existing hierarchical routing algorithms like Distributed Energy-Efficient Clustering (DEEC algorithm for heterogeneous wireless sensor networks and Energy Efficient Heterogeneous Clustered scheme for Wireless Sensor Network (EEHC.

  7. Saccharomyces cerevisiae-based system for studying clustered DNA damages

    Energy Technology Data Exchange (ETDEWEB)

    Moscariello, M.M.; Sutherland, B.

    2010-08-01

    DNA-damaging agents can induce clustered lesions or multiply damaged sites (MDSs) on the same or opposing DNA strands. In the latter, attempts to repair MDS can generate closely opposed single-strand break intermediates that may convert non-lethal or mutagenic base damage into double-strand breaks (DSBs). We constructed a diploid S. cerevisiae yeast strain with a chromosomal context targeted by integrative DNA fragments carrying different damages to determine whether closely opposed base damages are converted to DSBs following the outcomes of the homologous recombination repair pathway. As a model of MDS, we studied clustered uracil DNA damages with a known location and a defined distance separating the lesions. The system we describe might well be extended to assessing the repair of MDSs with different compositions, and to most of the complex DNA lesions induced by physical and chemical agents.

  8. Analytical network process based optimum cluster head selection in wireless sensor network.

    Science.gov (United States)

    Farman, Haleem; Javed, Huma; Jan, Bilal; Ahmad, Jamil; Ali, Shaukat; Khalil, Falak Naz; Khan, Murad

    2017-01-01

    Wireless Sensor Networks (WSNs) are becoming ubiquitous in everyday life due to their applications in weather forecasting, surveillance, implantable sensors for health monitoring and other plethora of applications. WSN is equipped with hundreds and thousands of small sensor nodes. As the size of a sensor node decreases, critical issues such as limited energy, computation time and limited memory become even more highlighted. In such a case, network lifetime mainly depends on efficient use of available resources. Organizing nearby nodes into clusters make it convenient to efficiently manage each cluster as well as the overall network. In this paper, we extend our previous work of grid-based hybrid network deployment approach, in which merge and split technique has been proposed to construct network topology. Constructing topology through our proposed technique, in this paper we have used analytical network process (ANP) model for cluster head selection in WSN. Five distinct parameters: distance from nodes (DistNode), residual energy level (REL), distance from centroid (DistCent), number of times the node has been selected as cluster head (TCH) and merged node (MN) are considered for CH selection. The problem of CH selection based on these parameters is tackled as a multi criteria decision system, for which ANP method is used for optimum cluster head selection. Main contribution of this work is to check the applicability of ANP model for cluster head selection in WSN. In addition, sensitivity analysis is carried out to check the stability of alternatives (available candidate nodes) and their ranking for different scenarios. The simulation results show that the proposed method outperforms existing energy efficient clustering protocols in terms of optimum CH selection and minimizing CH reselection process that results in extending overall network lifetime. This paper analyzes that ANP method used for CH selection with better understanding of the dependencies of

  9. Computer-Assisted Sleep Staging Based on Segmentation and Clustering

    Science.gov (United States)

    2001-10-25

    9) and females (3) with differnt sleep related complaints (8 normals and 4 with different pathologies ). The age of the subjects ranged form 17 to 63...COMPUTER-ASSISTED SLEEP STAGING BASED ON SEGMENTATION AND CLUSTERING Rajeev Agarwal12∗ and Jean Gotman13 1Stellate Systems, Montreal Canada...can be used to au- tomatically classify sleep states in an all-night polysomno- gram (PSG) to generate a hypnogram for the assesment of sleep -related

  10. A comparison of DNA barcode clustering methods applied to ...

    Indian Academy of Sciences (India)

    2012-10-15

    Oct 15, 2012 ... ABGD; biodiversity inventory; cluster analysis; cryptic species; cytochrome oxidase subunit I; DNA barcode of life; Fuzzy. Identification; GMYC; SAP .... set of phylogenetic trees is sampled using Bayesian Markov chain Monte Carlo ..... Critical factors for assembling a high volume of DNA barcodes. Philos.

  11. A New Cluster Analysis-Marker-Controlled Watershed Method for Separating Particles of Granular Soils

    Directory of Open Access Journals (Sweden)

    Md Ferdous Alam

    2017-10-01

    Full Text Available An accurate determination of particle-level fabric of granular soils from tomography data requires a maximum correct separation of particles. The popular marker-controlled watershed separation method is widely used to separate particles. However, the watershed method alone is not capable of producing the maximum separation of particles when subjected to boundary stresses leading to crushing of particles. In this paper, a new separation method, named as Monash Particle Separation Method (MPSM, has been introduced. The new method automatically determines the optimal contrast coefficient based on cluster evaluation framework to produce the maximum accurate separation outcomes. Finally, the particles which could not be separated by the optimal contrast coefficient were separated by integrating cuboid markers generated from the clustering by Gaussian mixture models into the routine watershed method. The MPSM was validated on a uniformly graded sand volume subjected to one-dimensional compression loading up to 32 MPa. It was demonstrated that the MPSM is capable of producing the best possible separation of particles required for the fabric analysis.

  12. Seminal Quality Prediction Using Clustering-Based Decision Forests

    Directory of Open Access Journals (Sweden)

    Hong Wang

    2014-08-01

    Full Text Available Prediction of seminal quality with statistical learning tools is an emerging methodology in decision support systems in biomedical engineering and is very useful in early diagnosis of seminal patients and selection of semen donors candidates. However, as is common in medical diagnosis, seminal quality prediction faces the class imbalance problem. In this paper, we propose a novel supervised ensemble learning approach, namely Clustering-Based Decision Forests, to tackle unbalanced class learning problem in seminal quality prediction. Experiment results on real fertility diagnosis dataset have shown that Clustering-Based Decision Forests outperforms decision tree, Support Vector Machines, random forests, multilayer perceptron neural networks and logistic regression by a noticeable margin. Clustering-Based Decision Forests can also be used to evaluate variables’ importance and the top five important factors that may affect semen concentration obtained in this study are age, serious trauma, sitting time, the season when the semen sample is produced, and high fevers in the last year. The findings could be helpful in explaining seminal concentration problems in infertile males or pre-screening semen donor candidates.

  13. Wavelet-based clustering of resting state MRI data in the rat

    Science.gov (United States)

    Medda, Alessio; Hoffmann, Lukas; Magnuson, Matthew; Thompson, Garth; Pan, Wen-Ju; Keilholz, Shella

    2015-01-01

    While functional connectivity has typically been calculated over the entire length of the scan (5-10 min), interest has been growing in dynamic analysis methods that can detect changes in connectivity on the order of cognitive processes (seconds). Previous work with sliding window correlation has shown that changes in functional connectivity can be observed on these time scales in the awake human and in anesthetized animals. This exciting advance creates a need for improved approaches to characterize dynamic functional networks in the brain. Previous studies were performed using sliding window analysis on regions of interest defined based on anatomy or obtained from traditional steady-state analysis methods. The parcellation of the brain may therefore be suboptimal, and the characteristics of the time-varying connectivity between regions are dependent upon the length of the sliding window chosen. This manuscript describes an algorithm based on wavelet decomposition that allows data-driven clustering of voxels into functional regions based on temporal and spectral properties. Previous work has shown that different networks have characteristic frequency fingerprints, and the use of wavelets ensures that both the frequency and the timing of the BOLD fluctuations are considered during the clustering process. The method was applied to resting state data acquired from anesthetized rats, and the resulting clusters agreed well with known anatomical areas. Clusters were highly reproducible across subjects. Wavelet cross-correlation values between clusters from a single scan were significantly higher than the values from randomly-matched clusters that shared no temporal information, indicating that wavelet-based analysis is sensitive to the relationship between areas. PMID:26481903

  14. K-Line Patterns’ Predictive Power Analysis Using the Methods of Similarity Match and Clustering

    Directory of Open Access Journals (Sweden)

    Lv Tao

    2017-01-01

    Full Text Available Stock price prediction based on K-line patterns is the essence of candlestick technical analysis. However, there are some disputes on whether the K-line patterns have predictive power in academia. To help resolve the debate, this paper uses the data mining methods of pattern recognition, pattern clustering, and pattern knowledge mining to research the predictive power of K-line patterns. The similarity match model and nearest neighbor-clustering algorithm are proposed for solving the problem of similarity match and clustering of K-line series, respectively. The experiment includes testing the predictive power of the Three Inside Up pattern and Three Inside Down pattern with the testing dataset of the K-line series data of Shanghai 180 index component stocks over the latest 10 years. Experimental results show that (1 the predictive power of a pattern varies a great deal for different shapes and (2 each of the existing K-line patterns requires further classification based on the shape feature for improving the prediction performance.

  15. a Web-Based Interactive Platform for Co-Clustering Spatio-Temporal Data

    Science.gov (United States)

    Wu, X.; Poorthuis, A.; Zurita-Milla, R.; Kraak, M.-J.

    2017-09-01

    Since current studies on clustering analysis mainly focus on exploring spatial or temporal patterns separately, a co-clustering algorithm is utilized in this study to enable the concurrent analysis of spatio-temporal patterns. To allow users to adopt and adapt the algorithm for their own analysis, it is integrated within the server side of an interactive web-based platform. The client side of the platform, running within any modern browser, is a graphical user interface (GUI) with multiple linked visualizations that facilitates the understanding, exploration and interpretation of the raw dataset and co-clustering results. Users can also upload their own datasets and adjust clustering parameters within the platform. To illustrate the use of this platform, an annual temperature dataset from 28 weather stations over 20 years in the Netherlands is used. After the dataset is loaded, it is visualized in a set of linked visualizations: a geographical map, a timeline and a heatmap. This aids the user in understanding the nature of their dataset and the appropriate selection of co-clustering parameters. Once the dataset is processed by the co-clustering algorithm, the results are visualized in the small multiples, a heatmap and a timeline to provide various views for better understanding and also further interpretation. Since the visualization and analysis are integrated in a seamless platform, the user can explore different sets of co-clustering parameters and instantly view the results in order to do iterative, exploratory data analysis. As such, this interactive web-based platform allows users to analyze spatio-temporal data using the co-clustering method and also helps the understanding of the results using multiple linked visualizations.

  16. Yeast homologous recombination-based promoter engineering for the activation of silent natural product biosynthetic gene clusters.

    Science.gov (United States)

    Montiel, Daniel; Kang, Hahk-Soo; Chang, Fang-Yuan; Charlop-Powers, Zachary; Brady, Sean F

    2015-07-21

    Large-scale sequencing of prokaryotic (meta)genomic DNA suggests that most bacterial natural product gene clusters are not expressed under common laboratory culture conditions. Silent gene clusters represent a promising resource for natural product discovery and the development of a new generation of therapeutics. Unfortunately, the characterization of molecules encoded by these clusters is hampered owing to our inability to express these gene clusters in the laboratory. To address this bottleneck, we have developed a promoter-engineering platform to transcriptionally activate silent gene clusters in a model heterologous host. Our approach uses yeast homologous recombination, an auxotrophy complementation-based yeast selection system and sequence orthogonal promoter cassettes to exchange all native promoters in silent gene clusters with constitutively active promoters. As part of this platform, we constructed and validated a set of bidirectional promoter cassettes consisting of orthogonal promoter sequences, Streptomyces ribosome binding sites, and yeast selectable marker genes. Using these tools we demonstrate the ability to simultaneously insert multiple promoter cassettes into a gene cluster, thereby expediting the reengineering process. We apply this method to model active and silent gene clusters (rebeccamycin and tetarimycin) and to the silent, cryptic pseudogene-containing, environmental DNA-derived Lzr gene cluster. Complete promoter refactoring and targeted gene exchange in this "dead" cluster led to the discovery of potent indolotryptoline antiproliferative agents, lazarimides A and B. This potentially scalable and cost-effective promoter reengineering platform should streamline the discovery of natural products from silent natural product biosynthetic gene clusters.

  17. Emergy-based comparative analysis on industrial clusters: economic and technological development zone of Shenyang area, China.

    Science.gov (United States)

    Liu, Zhe; Geng, Yong; Zhang, Pan; Dong, Huijuan; Liu, Zuoxi

    2014-09-01

    In China, local governments of many areas prefer to give priority to the development of heavy industrial clusters in pursuit of high value of gross domestic production (GDP) growth to get political achievements, which usually results in higher costs from ecological degradation and environmental pollution. Therefore, effective methods and reasonable evaluation system are urgently needed to evaluate the overall efficiency of industrial clusters. Emergy methods links economic and ecological systems together, which can evaluate the contribution of ecological products and services as well as the load placed on environmental systems. This method has been successfully applied in many case studies of ecosystem but seldom in industrial clusters. This study applied the methodology of emergy analysis to perform the efficiency of industrial clusters through a series of emergy-based indices as well as the proposed indicators. A case study of Shenyang Economic Technological Development Area (SETDA) was investigated to show the emergy method's practical potential to evaluate industrial clusters to inform environmental policy making. The results of our study showed that the industrial cluster of electric equipment and electronic manufacturing produced the most economic value and had the highest efficiency of energy utilization among the four industrial clusters. However, the sustainability index of the industrial cluster of food and beverage processing was better than the other industrial clusters.

  18. Discrimination of neutrons and γ-rays in liquid scintillators based of fuzzy c-means clustering

    International Nuclear Information System (INIS)

    Luo Xiaoliang; Liu Guofu; Yang Jun

    2011-01-01

    A novel method based on fuzzy c-means (FCM) clustering for the discrimination of neutrons and γ-rays in liquid scintillators was presented. The neutrons and γ-rays in the environment were firstly acquired by the portable real-time n-γ discriminator and then discriminated using fuzzy c-means clustering and pulse gradient analysis, respectively. By comparing the results with each other, it is shown that the discrimination results of the fuzzy c-means clustering are consistent with those of the pulse gradient analysis. The decrease in uncertainty and the improvement in discrimination performance of the fuzzy c-means clustering were also observed. (authors)

  19. Multi-documents summarization based on clustering of learning object using hierarchical clustering

    Science.gov (United States)

    Mustamiin, M.; Budi, I.; Santoso, H. B.

    2018-03-01

    The Open Educational Resources (OER) is a portal of teaching, learning and research resources that is available in public domain and freely accessible. Learning contents or Learning Objects (LO) are granular and can be reused for constructing new learning materials. LO ontology-based searching techniques can be used to search for LO in the Indonesia OER. In this research, LO from search results are used as an ingredient to create new learning materials according to the topic searched by users. Summarizing-based grouping of LO use Hierarchical Agglomerative Clustering (HAC) with the dependency context to the user’s query which has an average value F-Measure of 0.487, while summarizing by K-Means F-Measure only has an average value of 0.336.

  20. Flocking-based Document Clustering on the Graphics Processing Unit

    Energy Technology Data Exchange (ETDEWEB)

    Cui, Xiaohui [ORNL; Potok, Thomas E [ORNL; Patton, Robert M [ORNL; ST Charles, Jesse Lee [ORNL

    2008-01-01

    Abstract?Analyzing and grouping documents by content is a complex problem. One explored method of solving this problem borrows from nature, imitating the flocking behavior of birds. Each bird represents a single document and flies toward other documents that are similar to it. One limitation of this method of document clustering is its complexity O(n2). As the number of documents grows, it becomes increasingly difficult to receive results in a reasonable amount of time. However, flocking behavior, along with most naturally inspired algorithms such as ant colony optimization and particle swarm optimization, are highly parallel and have found increased performance on expensive cluster computers. In the last few years, the graphics processing unit (GPU) has received attention for its ability to solve highly-parallel and semi-parallel problems much faster than the traditional sequential processor. Some applications see a huge increase in performance on this new platform. The cost of these high-performance devices is also marginal when compared with the price of cluster machines. In this paper, we have conducted research to exploit this architecture and apply its strengths to the document flocking problem. Our results highlight the potential benefit the GPU brings to all naturally inspired algorithms. Using the CUDA platform from NIVIDA? we developed a document flocking implementation to be run on the NIVIDA?GEFORCE 8800. Additionally, we developed a similar but sequential implementation of the same algorithm to be run on a desktop CPU. We tested the performance of each on groups of news articles ranging in size from 200 to 3000 documents. The results of these tests were very significant. Performance gains ranged from three to nearly five times improvement of the GPU over the CPU implementation. This dramatic improvement in runtime makes the GPU a potentially revolutionary platform for document clustering algorithms.

  1. Comparison of Skin Moisturizer: Consumer-Based Brand Equity (CBBE Factors in Clusters Based on Consumer Ethnocentrism

    Directory of Open Access Journals (Sweden)

    Yossy Hanna Garlina

    2014-09-01

    Full Text Available This research aims to analyze relevant factors contributing to the four dimensions of consumer-based brand equity in skin moisturizer industry. It is then followed by the clustering of female consumers of skin moisturizer based on ethnocentrism and differentiating each cluster’s consumer-based brand equity dimensions towards a domestic skin moisturizer brand Mustika Ratu, skin moisturizer. Research used descriptive survey method analysis. Primary data was obtained through questionnaire distribution to 70 female respondents for factor analysis and 120 female respondents for cluster analysis and one way analysis of variance (ANOVA. This research employed factor analysis to obtain relevant factors contributing to the five dimensions of consumer-based brand equity in skin moisturizer industry. Cluster analysis and one way analysis of variance (ANOVA were to see the difference of consumer-based brand equity between highly ethnocentric consumer and low ethnocentric consumer towards the same skin moisturizer domestic brand, Mustika Ratu skin moisturizer. Research found in all individual dimension analysis, all variable means and individual means show distinct difference between the high ethnocentric consumer and the low ethnocentric consumer. The low ethnocentric consumer cluster tends to be lower in mean score of Brand Loyalty, Perceived Quality, Brand Awareness, Brand Association, and Overall Brand Equity than the high ethnocentric consumer cluster. Research concludes consumer ethnocentrism is positively correlated with preferences towards domestic products and negatively correlated with foreign-made product preference. It is, then, highly ethnocentric consumers have positive perception towards domestic product.

  2. Cluster Method Analysis of K. S. C. Image

    Science.gov (United States)

    Rodriguez, Joe, Jr.; Desai, M.

    1997-01-01

    Information obtained from satellite-based systems has moved to the forefront as a method in the identification of many land cover types. Identification of different land features through remote sensing is an effective tool for regional and global assessment of geometric characteristics. Classification data acquired from remote sensing images have a wide variety of applications. In particular, analysis of remote sensing images have special applications in the classification of various types of vegetation. Results obtained from classification studies of a particular area or region serve towards a greater understanding of what parameters (ecological, temporal, etc.) affect the region being analyzed. In this paper, we make a distinction between both types of classification approaches although, focus is given to the unsupervised classification method using 1987 Thematic Mapped (TM) images of Kennedy Space Center.

  3. Cluster chain based energy efficient routing protocol for moblie WSN

    Directory of Open Access Journals (Sweden)

    WU Ziyu

    2016-04-01

    Full Text Available With the ubiquitous smart devices acting as mobile sensor nodes in the wireless sensor networks(WSNs to sense and transmit physical information,routing protocols should be designed to accommodate the mobility issues,in addition to conventional considerations on energy efficiency.However,due to frequent topology change,traditional routing schemes cannot perform well.Moreover,existence of mobile nodes poses new challenges on energy dissipation and packet loss.In this paper,a novel routing scheme called cluster chain based routing protocol(CCBRP is proposed,which employs a combination of cluster and chain structure to accomplish data collection and transmission and thereafter selects qualified cluster heads as chain leaders to transmit data to the sink.Furthermore,node mobility is handled based on periodical membership update of mobile nodes.Simulation results demonstrate that CCBRP has a good performance in terms of network lifetime and packet delivery,also strikes a better balance between successful packet reception and energy consumption.

  4. Research on Bridge Sensor Validation Based on Correlation in Cluster

    Directory of Open Access Journals (Sweden)

    Huang Xiaowei

    2016-01-01

    Full Text Available In order to avoid the false alarm and alarm failure caused by sensor malfunction or failure, it has been critical to diagnose the fault and analyze the failure of the sensor measuring system in major infrastructures. Based on the real time monitoring of bridges and the study on the correlation probability distribution between multisensors adopted in the fault diagnosis system, a clustering algorithm based on k-medoid is proposed, by dividing sensors of the same type into k clusters. Meanwhile, the value of k is optimized by a specially designed evaluation function. Along with the further study of the correlation of sensors within the same cluster, this paper presents the definition and corresponding calculation algorithm of the sensor’s validation. The algorithm is applied to the analysis of the sensor data from an actual health monitoring system. The result reveals that the algorithm can not only accurately measure the failure degree and orientate the malfunction in time domain but also quantitatively evaluate the performance of sensors and eliminate error of diagnosis caused by the failure of the reference sensor.

  5. Direct Reconstruction of CT-based Attenuation Correction Images for PET with Cluster-Based Penalties

    Science.gov (United States)

    Kim, Soo Mee; Alessio, Adam M.; De Man, Bruno; Asma, Evren; Kinahan, Paul E.

    2015-01-01

    Extremely low-dose CT acquisitions for the purpose of PET attenuation correction will have a high level of noise and biasing artifacts due to factors such as photon starvation. This work explores a priori knowledge appropriate for CT iterative image reconstruction for PET attenuation correction. We investigate the maximum a posteriori (MAP) framework with cluster-based, multinomial priors for the direct reconstruction of the PET attenuation map. The objective function for direct iterative attenuation map reconstruction was modeled as a Poisson log-likelihood with prior terms consisting of quadratic (Q) and mixture (M) distributions. The attenuation map is assumed to have values in 4 clusters: air+background, lung, soft tissue, and bone. Under this assumption, the MP was a mixture probability density function consisting of one exponential and three Gaussian distributions. The relative proportion of each cluster was jointly estimated during each voxel update of direct iterative coordinate decent (dICD) method. Noise-free data were generated from NCAT phantom and Poisson noise was added. Reconstruction with FBP (ramp filter) was performed on the noise-free (ground truth) and noisy data. For the noisy data, dICD reconstruction was performed with the combination of different prior strength parameters (β and γ) of Q- and M-penalties. The combined quadratic and mixture penalties reduces the RMSE by 18.7% compared to post-smoothed iterative reconstruction and only 0.7% compared to quadratic alone. For direct PET attenuation map reconstruction from ultra-low dose CT acquisitions, the combination of quadratic and mixture priors offers regularization of both variance and bias and is a potential method to derive attenuation maps with negligible patient dose. However, the small improvement in quantitative accuracy relative to the substantial increase in algorithm complexity does not currently justify the use of mixture-based PET attenuation priors for reconstruction of CT

  6. Spike detection II: automatic, perception-based detection and clustering.

    Science.gov (United States)

    Wilson, S B; Turner, C A; Emerson, R G; Scheuer, M L

    1999-03-01

    We developed perception-based spike detection and clustering algorithms. The detection algorithm employs a novel, multiple monotonic neural network (MMNN). It is tested on two short-duration EEG databases containing 2400 spikes from 50 epilepsy patients and 10 control subjects. Previous studies are compared for database difficulty and reliability and algorithm accuracy. Automatic grouping of spikes via hierarchical clustering (using topology and morphology) is visually compared with hand marked grouping on a single record. The MMNN algorithm is found to operate close to the ability of a human expert while alleviating problems related to overtraining. The hierarchical and hand marked spike groupings are found to be strikingly similar. An automatic detection algorithm need not be as accurate as a human expert to be clinically useful. A user interface that allows the neurologist to quickly delete artifacts and determine whether there are multiple spike generators is sufficient.

  7. Image Registration Algorithm Based on Parallax Constraint and Clustering Analysis

    Science.gov (United States)

    Wang, Zhe; Dong, Min; Mu, Xiaomin; Wang, Song

    2018-01-01

    To resolve the problem of slow computation speed and low matching accuracy in image registration, a new image registration algorithm based on parallax constraint and clustering analysis is proposed. Firstly, Harris corner detection algorithm is used to extract the feature points of two images. Secondly, use Normalized Cross Correlation (NCC) function to perform the approximate matching of feature points, and the initial feature pair is obtained. Then, according to the parallax constraint condition, the initial feature pair is preprocessed by K-means clustering algorithm, which is used to remove the feature point pairs with obvious errors in the approximate matching process. Finally, adopt Random Sample Consensus (RANSAC) algorithm to optimize the feature points to obtain the final feature point matching result, and the fast and accurate image registration is realized. The experimental results show that the image registration algorithm proposed in this paper can improve the accuracy of the image matching while ensuring the real-time performance of the algorithm.

  8. Modeling Molecular Systems at Extreme Pressure by an Extension of the Polarizable Continuum Model (PCM) Based on the Symmetry-Adapted Cluster-Configuration Interaction (SAC-CI) Method: Confined Electronic Excited States of Furan as a Test Case.

    Science.gov (United States)

    Fukuda, Ryoichi; Ehara, Masahiro; Cammi, Roberto

    2015-05-12

    Novel molecular photochemistry can be developed by combining high pressure and laser irradiation. For studying such high-pressure effects on the confined electronic ground and excited states, we extend the PCM (polarizable continuum model) SAC (symmetry-adapted cluster) and SAC-CI (SAC-configuration interaction) methods to the PCM-XP (extreme pressure) framework. By using the PCM-XP SAC/SAC-CI method, molecular systems in various electronic states can be confined by polarizable media in a smooth and flexible way. The PCM-XP SAC/SAC-CI method is applied to a furan (C4H4O) molecule in cyclohexane at high pressure (1-60 GPa). The relationship between the calculated free-energy and cavity volume can be approximately represented with the Murnaghan equation of state. The excitation energies of furan in cyclohexane show blueshifts with increasing pressure, and the extents of the blueshifts significantly depend on the character of the excitations. Particularly large confinement effects are found in the Rydberg states. The energy ordering of the lowest Rydberg and valence states alters under high-pressure. The pressure effects on the electronic structure may be classified into two contributions: a confinement of the molecular orbital and a suppression of the mixing between the valence and Rydberg configurations. The valence or Rydberg character in an excited state is, therefore, enhanced under high pressure.

  9. Prediction of CpG-island function: CpG clustering vs. sliding-window methods

    Directory of Open Access Journals (Sweden)

    Luque-Escamilla Pedro L

    2010-05-01

    Full Text Available Abstract Background Unmethylated stretches of CpG dinucleotides (CpG islands are an outstanding property of mammal genomes. Conventionally, these regions are detected by sliding window approaches using %G + C, CpG observed/expected ratio and length thresholds as main parameters. Recently, clustering methods directly detect clusters of CpG dinucleotides as a statistical property of the genome sequence. Results We compare sliding-window to clustering (i.e. CpGcluster predictions by applying new ways to detect putative functionality of CpG islands. Analyzing the co-localization with several genomic regions as a function of window size vs. statistical significance (p-value, CpGcluster shows a higher overlap with promoter regions and highly conserved elements, at the same time showing less overlap with Alu retrotransposons. The major difference in the prediction was found for short islands (CpG islets, often exclusively predicted by CpGcluster. Many of these islets seem to be functional, as they are unmethylated, highly conserved and/or located within the promoter region. Finally, we show that window-based islands can spuriously overlap several, differentially regulated promoters as well as different methylation domains, which might indicate a wrong merge of several CpG islands into a single, very long island. The shorter CpGcluster islands seem to be much more specific when concerning the overlap with alternative transcription start sites or the detection of homogenous methylation domains. Conclusions The main difference between sliding-window approaches and clustering methods is the length of the predicted islands. Short islands, often differentially methylated, are almost exclusively predicted by CpGcluster. This suggests that CpGcluster may be the algorithm of choice to explore the function of these short, but putatively functional CpG islands.

  10. A nonparametric Bayesian approach for clustering bisulfate-based DNA methylation profiles

    Directory of Open Access Journals (Sweden)

    Zhang Lin

    2012-10-01

    Full Text Available Abstract Background DNA methylation occurs in the context of a CpG dinucleotide. It is an important epigenetic modification, which can be inherited through cell division. The two major types of methylation include hypomethylation and hypermethylation. Unique methylation patterns have been shown to exist in diseases including various types of cancer. DNA methylation analysis promises to become a powerful tool in cancer diagnosis, treatment and prognostication. Large-scale methylation arrays are now available for studying methylation genome-wide. The Illumina methylation platform simultaneously measures cytosine methylation at more than 1500 CpG sites associated with over 800 cancer-related genes. Cluster analysis is often used to identify DNA methylation subgroups for prognosis and diagnosis. However, due to the unique non-Gaussian characteristics, traditional clustering methods may not be appropriate for DNA and methylation data, and the determination of optimal cluster number is still problematic. Method A Dirichlet process beta mixture model (DPBMM is proposed that models the DNA methylation expressions as an infinite number of beta mixture distribution. The model allows automatic learning of the relevant parameters such as the cluster mixing proportion, the parameters of beta distribution for each cluster, and especially the number of potential clusters. Since the model is high dimensional and analytically intractable, we proposed a Gibbs sampling "no-gaps" solution for computing the posterior distributions, hence the estimates of the parameters. Result The proposed algorithm was tested on simulated data as well as methylation data from 55 Glioblastoma multiform (GBM brain tissue samples. To reduce the computational burden due to the high data dimensionality, a dimension reduction method is adopted. The two GBM clusters yielded by DPBMM are based on data of different number of loci (P-value

  11. PCA based clustering for brain tumor segmentation of T1w MRI images.

    Science.gov (United States)

    Kaya, Irem Ersöz; Pehlivanlı, Ayça Çakmak; Sekizkardeş, Emine Gezmez; Ibrikci, Turgay

    2017-03-01

    Medical images are huge collections of information that are difficult to store and process consuming extensive computing time. Therefore, the reduction techniques are commonly used as a data pre-processing step to make the image data less complex so that a high-dimensional data can be identified by an appropriate low-dimensional representation. PCA is one of the most popular multivariate methods for data reduction. This paper is focused on T1-weighted MRI images clustering for brain tumor segmentation with dimension reduction by different common Principle Component Analysis (PCA) algorithms. Our primary aim is to present a comparison between different variations of PCA algorithms on MRIs for two cluster methods. Five most common PCA algorithms; namely the conventional PCA, Probabilistic Principal Component Analysis (PPCA), Expectation Maximization Based Principal Component Analysis (EM-PCA), Generalize Hebbian Algorithm (GHA), and Adaptive Principal Component Extraction (APEX) were applied to reduce dimensionality in advance of two clustering algorithms, K-Means and Fuzzy C-Means. In the study, the T1-weighted MRI images of the human brain with brain tumor were used for clustering. In addition to the original size of 512 lines and 512 pixels per line, three more different sizes, 256 × 256, 128 × 128 and 64 × 64, were included in the study to examine their effect on the methods. The obtained results were compared in terms of both the reconstruction errors and the Euclidean distance errors among the clustered images containing the same number of principle components. According to the findings, the PPCA obtained the best results among all others. Furthermore, the EM-PCA and the PPCA assisted K-Means algorithm to accomplish the best clustering performance in the majority as well as achieving significant results with both clustering algorithms for all size of T1w MRI images. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  12. Clustering Batik Images using Fuzzy C-Means Algorithm Based on Log-Average Luminance

    Directory of Open Access Journals (Sweden)

    Ahmad Sanmorino

    2012-06-01

    Full Text Available Batik is a fabric or clothes that are made ​​with a special staining technique called wax-resist dyeing and is one of the cultural heritage which has high artistic value. In order to improve the efficiency and give better semantic to the image, some researchers apply clustering algorithm for managing images before they can be retrieved. Image clustering is a process of grouping images based on their similarity. In this paper we attempt to provide an alternative method of grouping batik image using fuzzy c-means (FCM algorithm based on log-average luminance of the batik. FCM clustering algorithm is an algorithm that works using fuzzy models that allow all data from all cluster members are formed with different degrees of membership between 0 and 1. Log-average luminance (LAL is the average value of the lighting in an image. We can compare different image lighting from one image to another using LAL. From the experiments that have been made, it can be concluded that fuzzy c-means algorithm can be used for batik image clustering based on log-average luminance of each image possessed.

  13. A clustering-based graph Laplacian framework for value function approximation in reinforcement learning.

    Science.gov (United States)

    Xu, Xin; Huang, Zhenhua; Graves, Daniel; Pedrycz, Witold

    2014-12-01

    In order to deal with the sequential decision problems with large or continuous state spaces, feature representation and function approximation have been a major research topic in reinforcement learning (RL). In this paper, a clustering-based graph Laplacian framework is presented for feature representation and value function approximation (VFA) in RL. By making use of clustering-based techniques, that is, K-means clustering or fuzzy C-means clustering, a graph Laplacian is constructed by subsampling in Markov decision processes (MDPs) with continuous state spaces. The basis functions for VFA can be automatically generated from spectral analysis of the graph Laplacian. The clustering-based graph Laplacian is integrated with a class of approximation policy iteration algorithms called representation policy iteration (RPI) for RL in MDPs with continuous state spaces. Simulation and experimental results show that, compared with previous RPI methods, the proposed approach needs fewer sample points to compute an efficient set of basis functions and the learning control performance can be improved for a variety of parameter settings.

  14. A novel Bayesian DNA motif comparison method for clustering and retrieval.

    Directory of Open Access Journals (Sweden)

    Naomi Habib

    2008-02-01

    Full Text Available Characterizing the DNA-binding specificities of transcription factors is a key problem in computational biology that has been addressed by multiple algorithms. These usually take as input sequences that are putatively bound by the same factor and output one or more DNA motifs. A common practice is to apply several such algorithms simultaneously to improve coverage at the price of redundancy. In interpreting such results, two tasks are crucial: clustering of redundant motifs, and attributing the motifs to transcription factors by retrieval of similar motifs from previously characterized motif libraries. Both tasks inherently involve motif comparison. Here we present a novel method for comparing and merging motifs, based on Bayesian probabilistic principles. This method takes into account both the similarity in positional nucleotide distributions of the two motifs and their dissimilarity to the background distribution. We demonstrate the use of the new comparison method as a basis for motif clustering and retrieval procedures, and compare it to several commonly used alternatives. Our results show that the new method outperforms other available methods in accuracy and sensitivity. We incorporated the resulting motif clustering and retrieval procedures in a large-scale automated pipeline for analyzing DNA motifs. This pipeline integrates the results of various DNA motif discovery algorithms and automatically merges redundant motifs from multiple training sets into a coherent annotated library of motifs. Application of this pipeline to recent genome-wide transcription factor location data in S. cerevisiae successfully identified DNA motifs in a manner that is as good as semi-automated analysis reported in the literature. Moreover, we show how this analysis elucidates the mechanisms of condition-specific preferences of transcription factors.

  15. Comparison of methods for estimating the intraclass correlation coefficient for binary responses in cancer prevention cluster randomized trials.

    Science.gov (United States)

    Wu, Sheng; Crespi, Catherine M; Wong, Weng Kee

    2012-09-01

    The intraclass correlation coefficient (ICC) is a fundamental parameter of interest in cluster randomized trials as it can greatly affect statistical power. We compare common methods of estimating the ICC in cluster randomized trials with binary outcomes, with a specific focus on their application to community-based cancer prevention trials with primary outcome of self-reported cancer screening. Using three real data sets from cancer screening intervention trials with different numbers and types of clusters and cluster sizes, we obtained point estimates and 95% confidence intervals for the ICC using five methods: the analysis of variance estimator, the Fleiss-Cuzick estimator, the Pearson estimator, an estimator based on generalized estimating equations and an estimator from a random intercept logistic regression model. We compared estimates of the ICC for the overall sample and by study condition. Our results show that ICC estimates from different methods can be quite different, although confidence intervals generally overlap. The ICC varied substantially by study condition in two studies, suggesting that the common practice of assuming a common ICC across all clusters in the trial is questionable. A simulation study confirmed pitfalls of erroneously assuming a common ICC. Investigators should consider using sample size and analysis methods that allow the ICC to vary by study condition. Copyright © 2012. Published by Elsevier Inc.

  16. INTERSECTION DETECTION BASED ON QUALITATIVE SPATIAL REASONING ON STOPPING POINT CLUSTERS

    Directory of Open Access Journals (Sweden)

    S. Zourlidou

    2016-06-01

    Full Text Available The purpose of this research is to propose and test a method for detecting intersections by analysing collectively acquired trajectories of moving vehicles. Instead of solely relying on the geometric features of the trajectories, such as heading changes, which may indicate turning points and consequently intersections, we extract semantic features of the trajectories in form of sequences of stops and moves. Under this spatiotemporal prism, the extracted semantic information which indicates where vehicles stop can reveal important locations, such as junctions. The advantage of the proposed approach in comparison with existing turning-points oriented approaches is that it can detect intersections even when not all the crossing road segments are sampled and therefore no turning points are observed in the trajectories. The challenge with this approach is that first of all, not all vehicles stop at the same location – thus, the stop-location is blurred along the direction of the road; this, secondly, leads to the effect that nearby junctions can induce similar stop-locations. As a first step, a density-based clustering is applied on the layer of stop observations and clusters of stop events are found. Representative points of the clusters are determined (one per cluster and in a last step the existence of an intersection is clarified based on spatial relational cluster reasoning, with which less informative geospatial clusters, in terms of whether a junction exists and where its centre lies, are transformed in more informative ones. Relational reasoning criteria, based on the relative orientation of the clusters with their adjacent ones are discussed for making sense of the relation that connects them, and finally for forming groups of stop events that belong to the same junction.

  17. A three-stage strategy for optimal price offering by a retailer based on clustering techniques

    International Nuclear Information System (INIS)

    Mahmoudi-Kohan, N.; Shayesteh, E.; Moghaddam, M. Parsa; Sheikh-El-Eslami, M.K.

    2010-01-01

    In this paper, an innovative strategy for optimal price offering to customers for maximizing the profit of a retailer is proposed. This strategy is based on load profile clustering techniques and includes three stages. For the purpose of clustering, an improved weighted fuzzy average K-means is proposed. Also, in this paper a new acceptance function for increasing the profit of the retailer is proposed. The new method is evaluated by implementation on a group of 300 customers of a 20 kV distribution network. (author)

  18. DLTAP: A Network-efficient Scheduling Method for Distributed Deep Learning Workload in Containerized Cluster Environment

    OpenAIRE

    Qiao Wei; Li Ying; Wu Zhong-Hai

    2017-01-01

    Deep neural networks (DNNs) have recently yielded strong results on a range of applications. Training these DNNs using a cluster of commodity machines is a promising approach since training is time consuming and compute-intensive. Furthermore, putting DNN tasks into containers of clusters would enable broader and easier deployment of DNN-based algorithms. Toward this end, this paper addresses the problem of scheduling DNN tasks in the containerized cluster environment. Efficiently scheduling ...

  19. Performance Based Clustering for Benchmarking of Container Ports: an Application of Dea and Cluster Analysis Technique

    Directory of Open Access Journals (Sweden)

    Jie Wu

    2010-12-01

    Full Text Available The operational performance of container ports has received more and more attentions in both academic and practitioner circles, the performance evaluation and process improvement of container ports have also been the focus of several studies. In this paper, Data Envelopment Analysis (DEA, an effective tool for relative efficiency assessment, is utilized for measuring the performances and benchmarking of the 77 world container ports in 2007. The used approaches in the current study consider four inputs (Capacity of Cargo Handling Machines, Number of Berths, Terminal Area and Storage Capacity and a single output (Container Throughput. The results for the efficiency scores are analyzed, and a unique ordering of the ports based on average cross efficiency is provided, also cluster analysis technique is used to select the more appropriate targets for poorly performing ports to use as benchmarks.

  20. Identification of suitable genes contributes to lung adenocarcinoma clustering by multiple meta-analysis methods.

    Science.gov (United States)

    Yang, Ze-Hui; Zheng, Rui; Gao, Yuan; Zhang, Qiang

    2016-09-01

    With the widespread application of high-throughput technology, numerous meta-analysis methods have been proposed for differential expression profiling across multiple studies. We identified the suitable differentially expressed (DE) genes that contributed to lung adenocarcinoma (ADC) clustering based on seven popular multiple meta-analysis methods. Seven microarray expression profiles of ADC and normal controls were extracted from the ArrayExpress database. The Bioconductor was used to perform the data preliminary preprocessing. Then, DE genes across multiple studies were identified. Hierarchical clustering was applied to compare the classification performance for microarray data samples. The classification efficiency was compared based on accuracy, sensitivity and specificity. Across seven datasets, 573 ADC cases and 222 normal controls were collected. After filtering out unexpressed and noninformative genes, 3688 genes were remained for further analysis. The classification efficiency analysis showed that DE genes identified by sum of ranks method separated ADC from normal controls with the best accuracy, sensitivity and specificity of 0.953, 0.969 and 0.932, respectively. The gene set with the highest classification accuracy mainly participated in the regulation of response to external stimulus (P = 7.97E-04), cyclic nucleotide-mediated signaling (P = 0.01), regulation of cell morphogenesis (P = 0.01) and regulation of cell proliferation (P = 0.01). Evaluation of DE genes identified by different meta-analysis methods in classification efficiency provided a new perspective to the choice of the suitable method in a given application. Varying meta-analysis methods always present varying abilities, so synthetic consideration should be taken when providing meta-analysis methods for particular research. © 2015 John Wiley & Sons Ltd.

  1. Evaluating the Relevance of Corporate Ontological Knowledge Base Documents Using Their Hierarchical Role Clustering

    Directory of Open Access Journals (Sweden)

    A. P. Karpenko

    2014-01-01

    Full Text Available The article considers a corporate knowledge base representing a set of the large number of various semi-structured documents, which describe to some detail precedents i.e. some situations and decisions made in these situations. The task is to find the decision in such knowledge base and search the most suitable precedents and their corresponding documents in it.The work continues a series of the authors’ works, which develop an original approach to the solution of this task. It is supposed that metadata of the knowledge base documents are based on the ontology of the corresponding subject domain specified as a semantic network. As opposed to the previous works it is necessary that ontology of subject domain, and thereby and corresponding semantic network, is hierarchically clustered based on the roles of concepts.Documents in the knowledge base, and also search queries are presented as frames, which are called ‘patterns of design and inquiry’, respectively. Slots of these patterns correspond to roles of concepts of the used ontology. Above-noted roles break concepts of ontology, document, and request to the knowledge base into clusters. It is supposed that semantic networks of the abovementioned clusters are designed by a technique for creation of a semantic network of the document. Thus, search images of the document and inquiry are presented as a set of the semantic networks corresponding to slots of a pattern of design and a pattern of inquiry.The work offers the model of a semantic network of the ontological knowledge base using a hierarchical role clustering of concepts of this ontology. The task for a multi-criteria evaluation of relevance of corporate ontological knowledge base documents using their hierarchical role clustering is set. The method is offered to solve the task using the original measures of proximity of role patterns of design of the required document and inquiry.

  2. EUV resists based on tin-oxo clusters

    Science.gov (United States)

    Cardineau, Brian; Del Re, Ryan; Al-Mashat, Hashim; Marnell, Miles; Vockenhuber, Michaela; Ekinci, Yasin; Sarma, Chandra; Neisser, Mark; Freedman, Daniel A.; Brainard, Robert L.

    2014-03-01

    We have studied the photolysis of tin clusters of the type [(RSn)12O14(OH)6] X2 using extreme ultraviolet (EUV, 13.5 nm) light, and developed these clusters into novel high-resolution photoresists. A thin film of [(BuSn)12O14(OH)6][p-toluenesulfonate]2 (1) was prepared by spin coating a solution of (1) in 2-butanone onto a silicon wafer. Exposure to EUV light caused the compound (1) to be converted into a substance that was markedly less soluble in aqueous isopropanol. To optimize the EUV lithographic performance of resists using tin-oxo clusters, and to gain insight into the mechanism of their photochemical reactions, we prepared several compounds based on [(RSn)12O14(OH)6] X2. The sensitivity of tin-oxide films to EUV light were studied as a function of variations in the structure of the counter-anions (X, primarily carboxylates) and organic ligands bound to tin (R). Correlations were sought between the EUV sensitivity of these complexes vs. the strength of the carbon-carboxylate bonds in the counteranions and vs. the strength of the carbon-tin bonds. No correlation was observed between the strength of the carboncarboxylate bonds in the counter-anions (X) and the EUV photosensitivity. However, the EUV sensitivity of the tinoxide films appears to be well-correlated with the strength of the carbon-tin bonds. We hypothesize this correlation indicates a mechanism of carbon-tin bond homolysis during exposure. Using these tin clusters, 18-nm lines were printed showcasing the high resolution capabilities of these materials as photoresists for EUV lithography.

  3. Clustering Methods; Part IV of Scientific Report No. ISR-18, Information Storage and Retrieval...

    Science.gov (United States)

    Cornell Univ., Ithaca, NY. Dept. of Computer Science.

    Two papers are included as Part Four of this report on Salton's Magical Automatic Retriever of Texts (SMART) project report. The first paper: "A Controlled Single Pass Classification Algorithm with Application to Multilevel Clustering" by D. B. Johnson and J. M. Laferente presents a single pass clustering method which compares favorably…

  4. Clustering of hydrological data: a review of methods for runoff predictions in ungauged basins

    Science.gov (United States)

    Dogulu, Nilay; Kentel, Elcin

    2017-04-01

    There is a great body of research that has looked into the challenge of hydrological predictions in ungauged basins as driven by the Prediction in Ungauged Basins (PUB) initiative of the International Association of Hydrological Sciences (IAHS). Transfer of hydrological information (e.g. model parameters, flow signatures) from gauged to ungauged catchment, often referred as "regionalization", is the main objective and benefits from identification of hydrologically homogenous regions. Within this context, indirect representation of hydrologic similarity for ungauged catchments, which is not a straightforward task due to absence of streamflow measurements and insufficient knowledge of hydrologic behavior, has been explored in the literature. To this aim, clustering methods have been widely adopted. While most of the studies employ hard clustering techniques such as hierarchical (divisive or agglomerative) clustering, there have been more recent attempts taking advantage of fuzzy set theory (fuzzy clustering) and nonlinear methods (e.g. self-organizing maps). The relevant research findings from this fundamental task of hydrologic sciences have revealed the value of different clustering methods for improved understanding of catchment hydrology. However, despite advancements there still remains challenges and yet opportunities for research on clustering for regionalization purposes. The present work provides an overview of clustering techniques and their applications in hydrology with focus on regionalization for the PUB problem. Identifying their advantages and disadvantages, we discuss the potential of innovative clustering methods and reflect on future challenges in view of the research objectives of the PUB initiative.

  5. The initial conditions of observed star clusters - I. Method description and validation

    NARCIS (Netherlands)

    Pijloo, J.T.; Portegies Zwart, S.F.; Alexander, P.E.R.; Gieles, M.; Larsen, S.S.; Groot, P.J.; Devecchi, B.A.

    2015-01-01

    We have coupled a fast, parametrized star cluster evolution code to a Markov Chain Monte Carlo code to determine the distribution of probable initial conditions of observed star clusters, that may serve as a starting point for future N-body calculations. In this paper, we validate our method by

  6. Possible world based consistency learning model for clustering and classifying uncertain data.

    Science.gov (United States)

    Liu, Han; Zhang, Xianchao; Zhang, Xiaotong

    2018-06-01

    Possible world has shown to be effective for handling various types of data uncertainty in uncertain data management. However, few uncertain data clustering and classification algorithms are proposed based on possible world. Moreover, existing possible world based algorithms suffer from the following issues: (1) they deal with each possible world independently and ignore the consistency principle across different possible worlds; (2) they require the extra post-processing procedure to obtain the final result, which causes that the effectiveness highly relies on the post-processing method and the efficiency is also not very good. In this paper, we propose a novel possible world based consistency learning model for uncertain data, which can be extended both for clustering and classifying uncertain data. This model utilizes the consistency principle to learn a consensus affinity matrix for uncertain data, which can make full use of the information across different possible worlds and then improve the clustering and classification performance. Meanwhile, this model imposes a new rank constraint on the Laplacian matrix of the consensus affinity matrix, thereby ensuring that the number of connected components in the consensus affinity matrix is exactly equal to the number of classes. This also means that the clustering and classification results can be directly obtained without any post-processing procedure. Furthermore, for the clustering and classification tasks, we respectively derive the efficient optimization methods to solve the proposed model. Experimental results on real benchmark datasets and real world uncertain datasets show that the proposed model outperforms the state-of-the-art uncertain data clustering and classification algorithms in effectiveness and performs competitively in efficiency. Copyright © 2018 Elsevier Ltd. All rights reserved.

  7. Performance Analysis of Entropy Methods on K Means in Clustering Process

    Science.gov (United States)

    Dicky Syahputra Lubis, Mhd.; Mawengkang, Herman; Suwilo, Saib

    2017-12-01

    K Means is a non-hierarchical data clustering method that attempts to partition existing data into one or more clusters / groups. This method partitions the data into clusters / groups so that data that have the same characteristics are grouped into the same cluster and data that have different characteristics are grouped into other groups.The purpose of this data clustering is to minimize the objective function set in the clustering process, which generally attempts to minimize variation within a cluster and maximize the variation between clusters. However, the main disadvantage of this method is that the number k is often not known before. Furthermore, a randomly chosen starting point may cause two points to approach the distance to be determined as two centroids. Therefore, for the determination of the starting point in K Means used entropy method where this method is a method that can be used to determine a weight and take a decision from a set of alternatives. Entropy is able to investigate the harmony in discrimination among a multitude of data sets. Using Entropy criteria with the highest value variations will get the highest weight. Given this entropy method can help K Means work process in determining the starting point which is usually determined at random. Thus the process of clustering on K Means can be more quickly known by helping the entropy method where the iteration process is faster than the K Means Standard process. Where the postoperative patient dataset of the UCI Repository Machine Learning used and using only 12 data as an example of its calculations is obtained by entropy method only with 2 times iteration can get the desired end result.

  8. Recommendations for choosing an analysis method that controls Type I error for unbalanced cluster sample designs with Gaussian outcomes.

    Science.gov (United States)

    Johnson, Jacqueline L; Kreidler, Sarah M; Catellier, Diane J; Murray, David M; Muller, Keith E; Glueck, Deborah H

    2015-11-30

    We used theoretical and simulation-based approaches to study Type I error rates for one-stage and two-stage analytic methods for cluster-randomized designs. The one-stage approach uses the observed data as outcomes and accounts for within-cluster correlation using a general linear mixed model. The two-stage model uses the cluster specific means as the outcomes in a general linear univariate model. We demonstrate analytically that both one-stage and two-stage models achieve exact Type I error rates when cluster sizes are equal. With unbalanced data, an exact size α test does not exist, and Type I error inflation may occur. Via simulation, we compare the Type I error rates for four one-stage and six two-stage hypothesis testing approaches for unbalanced data. With unbalanced data, the two-stage model, weighted by the inverse of the estimated theoretical variance of the cluster means, and with variance constrained to be positive, provided the best Type I error control for studies having at least six clusters per arm. The one-stage model with Kenward-Roger degrees of freedom and unconstrained variance performed well for studies having at least 14 clusters per arm. The popular analytic method of using a one-stage model with denominator degrees of freedom appropriate for balanced data performed poorly for small sample sizes and low intracluster correlation. Because small sample sizes and low intracluster correlation are common features of cluster-randomized trials, the Kenward-Roger method is the preferred one-stage approach. Copyright © 2015 John Wiley & Sons, Ltd.

  9. Group analyses of connectivity-based cortical parcellation using repeated k-means clustering.

    Science.gov (United States)

    Nanetti, Luca; Cerliani, Leonardo; Gazzola, Valeria; Renken, Remco; Keysers, Christian

    2009-10-01

    K-means clustering has become a popular tool for connectivity-based cortical segmentation using Diffusion Weighted Imaging (DWI) data. A sometimes ignored issue is, however, that the output of the algorithm depends on the initial placement of starting points, and that different sets of starting points therefore could lead to different solutions. In this study we explore this issue. We apply k-means clustering a thousand times to the same DWI dataset collected in 10 individuals to segment two brain regions: the SMA-preSMA on the medial wall, and the insula. At the level of single subjects, we found that in both brain regions, repeatedly applying k-means indeed often leads to a variety of rather different cortical based parcellations. By assessing the similarity and frequency of these different solutions, we show that approximately 256 k-means repetitions are needed to accurately estimate the distribution of possible solutions. Using nonparametric group statistics, we then propose a method to employ the variability of clustering solutions to assess the reliability with which certain voxels can be attributed to a particular cluster. In addition, we show that the proportion of voxels that can be attributed significantly to either cluster in the SMA and preSMA is relatively higher than in the insula and discuss how this difference may relate to differences in the anatomy of these regions.

  10. Hessian regularization based non-negative matrix factorization for gene expression data clustering.

    Science.gov (United States)

    Liu, Xiao; Shi, Jun; Wang, Congzhi

    2015-01-01

    Since a key step in the analysis of gene expression data is to detect groups of genes that have similar expression patterns, clustering technique is then commonly used to analyze gene expression data. Data representation plays an important role in clustering analysis. The non-negative matrix factorization (NMF) is a widely used data representation method with great success in machine learning. Although the traditional manifold regularization method, Laplacian regularization (LR), can improve the performance of NMF, LR still suffers from the problem of its weak extrapolating power. Hessian regularization (HR) is a newly developed manifold regularization method, whose natural properties make it more extrapolating, especially for small sample data. In this work, we propose the HR-based NMF (HR-NMF) algorithm, and then apply it to represent gene expression data for further clustering task. The clustering experiments are conducted on five commonly used gene datasets, and the results indicate that the proposed HR-NMF outperforms LR-based NMM and original NMF, which suggests the potential application of HR-NMF for gene expression data.

  11. Feature Selection and Kernel Learning for Local Learning-Based Clustering.

    Science.gov (United States)

    Zeng, Hong; Cheung, Yiu-ming

    2011-08-01

    The performance of the most clustering algorithms highly relies on the representation of data in the input space or the Hilbert space of kernel methods. This paper is to obtain an appropriate data representation through feature selection or kernel learning within the framework of the Local Learning-Based Clustering (LLC) (Wu and Schölkopf 2006) method, which can outperform the global learning-based ones when dealing with the high-dimensional data lying on manifold. Specifically, we associate a weight to each feature or kernel and incorporate it into the built-in regularization of the LLC algorithm to take into account the relevance of each feature or kernel for the clustering. Accordingly, the weights are estimated iteratively in the clustering process. We show that the resulting weighted regularization with an additional constraint on the weights is equivalent to a known sparse-promoting penalty. Hence, the weights of those irrelevant features or kernels can be shrunk toward zero. Extensive experiments show the efficacy of the proposed methods on the benchmark data sets.

  12. The swift UVOT stars survey. I. Methods and test clusters

    International Nuclear Information System (INIS)

    Siegel, Michael H.; Porterfield, Blair L.; Linevsky, Jacquelyn S.; Bond, Howard E.; Hoversten, Erik A.; Berrier, Joshua L.; Gronwall, Caryl A.; Holland, Stephen T.; Breeveld, Alice A.; Brown, Peter J.

    2014-01-01

    We describe the motivations and background of a large survey of nearby stellar populations using the Ultraviolet Optical Telescope (UVOT) on board the Swift Gamma-Ray Burst Mission. UVOT, with its wide field, near-UV sensitivity, and 2.″3 spatial resolution, is uniquely suited to studying nearby stellar populations and providing insight into the near-UV properties of hot stars and the contribution of those stars to the integrated light of more distant stellar populations. We review the state of UV stellar photometry, outline the survey, and address problems specific to wide- and crowded-field UVOT photometry. We present color–magnitude diagrams of the nearby open clusters M67, NGC 188, and NGC 2539, and the globular cluster M79. We demonstrate that UVOT can easily discern the young- and intermediate-age main sequences, blue stragglers, and hot white dwarfs, producing results consistent with previous studies. We also find that it characterizes the blue horizontal branch of M79 and easily identifies a known post-asymptotic giant branch star.

  13. Neighborhood clustering of non-communicable diseases: results from a community-based study in Northern Tanzania

    Directory of Open Access Journals (Sweden)

    John W. Stanifer

    2016-03-01

    Full Text Available Abstract Background In order to begin to address the burden of non-communicable diseases (NCDs in sub-Saharan Africa, high quality community-based epidemiological studies from the region are urgently needed. Cluster-designed sampling methods may be most efficient, but designing such studies requires assumptions about the clustering of the outcomes of interest. Currently, few studies from Sub-Saharan Africa have been published that describe the clustering of NCDs. Therefore, we report the neighborhood clustering of several NCDs from a community-based study in Northern Tanzania. Methods We conducted a cluster-designed cross-sectional household survey between January and June 2014. We used a three-stage cluster probability sampling method to select thirty-seven sampling areas from twenty-nine neighborhood clusters, stratified by urban and rural. Households were then randomly selected from each of the sampling areas, and eligible participants were tested for chronic kidney disease (CKD, glucose impairment including diabetes, hypertension, and obesity as part of the CKD-AFRiKA study. We used linear mixed models to explore clustering across each of the samplings units, and we estimated absolute-agreement intra-cluster correlation (ICC coefficients (ρ for the neighborhood clusters. Results We enrolled 481 participants from 346 urban and rural households. Neighborhood cluster sizes ranged from 6 to 49 participants (median: 13.0; 25th–75th percentiles: 9–21. Clustering varied across neighborhoods and differed by urban or rural setting. Among NCDs, hypertension (ρ = 0.075 exhibited the strongest clustering within neighborhoods followed by CKD (ρ = 0.440, obesity (ρ = 0.040, and glucose impairment (ρ = 0.039. Conclusion The neighborhood clustering was substantial enough to contribute to a design effect for NCD outcomes including hypertension, CKD, obesity, and glucose impairment, and it may also highlight NCD risk factors that vary

  14. CLUSS: Clustering of protein sequences based on a new similarity measure

    Directory of Open Access Journals (Sweden)

    Brzezinski Ryszard

    2007-08-01

    Full Text Available Abstract Background The rapid burgeoning of available protein data makes the use of clustering within families of proteins increasingly important. The challenge is to identify subfamilies of evolutionarily related sequences. This identification reveals phylogenetic relationships, which provide prior knowledge to help researchers understand biological phenomena. A good evolutionary model is essential to achieve a clustering that reflects the biological reality, and an accurate estimate of protein sequence similarity is crucial to the building of such a model. Most existing algorithms estimate this similarity using techniques that are not necessarily biologically plausible, especially for hard-to-align sequences such as proteins with different domain structures, which cause many difficulties for the alignment-dependent algorithms. In this paper, we propose a novel similarity measure based on matching amino acid subsequences. This measure, named SMS for Substitution Matching Similarity, is especially designed for application to non-aligned protein sequences. It allows us to develop a new alignment-free algorithm, named CLUSS, for clustering protein families. To the best of our knowledge, this is the first alignment-free algorithm for clustering protein sequences. Unlike other clustering algorithms, CLUSS is effective on both alignable and non-alignable protein families. In the rest of the paper, we use the term "phylogenetic" in the sense of "relatedness of biological functions". Results To show the effectiveness of CLUSS, we performed an extensive clustering on COG database. To demonstrate its ability to deal with hard-to-align sequences, we tested it on the GH2 family. In addition, we carried out experimental comparisons of CLUSS with a variety of mainstream algorithms. These comparisons were made on hard-to-align and easy-to-align protein sequences. The results of these experiments show the superiority of CLUSS in yielding clusters of proteins

  15. Comparison of clustering methods for tracking features in RGB-D images

    CSIR Research Space (South Africa)

    Pancham, Ardhisha

    2016-10-01

    Full Text Available and the performance of k-means, mean shift, a contrario, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and Gaussian Mixture Models (GMM) clustering algorithms are validated in tests with synthetic and RGB-D data. Results indicate that mean...

  16. A first packet processing subdomain cluster model based on SDN

    Science.gov (United States)

    Chen, Mingyong; Wu, Weimin

    2017-08-01

    For the current controller cluster packet processing performance bottlenecks and controller downtime problems. An SDN controller is proposed to allocate the priority of each device in the SDN (Software Defined Network) network, and the domain contains several network devices and Controller, the controller is responsible for managing the network equipment within the domain, the switch performs data delivery based on the load of the controller, processing network equipment data. The experimental results show that the model can effectively solve the risk of single point failure of the controller, and can solve the performance bottleneck of the first packet processing.

  17. Estimating the concrete compressive strength using hard clustering and fuzzy clustering based regression techniques.

    Science.gov (United States)

    Nagwani, Naresh Kumar; Deo, Shirish V

    2014-01-01

    Understanding of the compressive strength of concrete is important for activities like construction arrangement, prestressing operations, and proportioning new mixtures and for the quality assurance. Regression techniques are most widely used for prediction tasks where relationship between the independent variables and dependent (prediction) variable is identified. The accuracy of the regression techniques for prediction can be improved if clustering can be used along with regression. Clustering along with regression will ensure the more accurate curve fitting between the dependent and independent variables. In this work cluster regression technique is applied for estimating the compressive strength of the concrete and a novel state of the art is proposed for predicting the concrete compressive strength. The objective of this work is to demonstrate that clustering along with regression ensures less prediction errors for estimating the concrete compressive strength. The proposed technique consists of two major stages: in the first stage, clustering is used to group the similar characteristics concrete data and then in the second stage regression techniques are applied over these clusters (groups) to predict the compressive strength from individual clusters. It is found from experiments that clustering along with regression techniques gives minimum errors for predicting compressive strength of concrete; also fuzzy clustering algorithm C-means performs better than K-means algorithm.

  18. Multi-class clustering of cancer subtypes through SVM based ensemble of pareto-optimal solutions for gene marker identification.

    Science.gov (United States)

    Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra; Maulik, Ujjwal

    2010-11-12

    With the advancement of microarray technology, it is now possible to study the expression profiles of thousands of genes across different experimental conditions or tissue samples simultaneously. Microarray cancer datasets, organized as samples versus genes fashion, are being used for classification of tissue samples into benign and malignant or their subtypes. They are also useful for identifying potential gene markers for each cancer subtype, which helps in successful diagnosis of particular cancer types. In this article, we have presented an unsupervised cancer classification technique based on multiobjective genetic clustering of the tissue samples. In this regard, a real-coded encoding of the cluster centers is used and cluster compactness and separation are simultaneously optimized. The resultant set of near-Pareto-optimal solutions contains a number of non-dominated solutions. A novel approach to combine the clustering information possessed by the non-dominated solutions through Support Vector Machine (SVM) classifier has been proposed. Final clustering is obtained by consensus among the clusterings yielded by different kernel functions. The performance of the proposed multiobjective clustering method has been compared with that of several other microarray clustering algorithms for three publicly available benchmark cancer datasets. Moreover, statistical significance tests have been conducted to establish the statistical superiority of the proposed clustering method. Furthermore, relevant gene markers have been identified using the clustering result produced by the proposed clustering method and demonstrated visually. Biological relationships among the gene markers are also studied based on gene ontology. The results obtained are found to be promising and can possibly have important impact in the area of unsupervised cancer classification as well as gene marker identification for multiple cancer subtypes.

  19. Automated three-dimensional morphology-based clustering of human erythrocytes with regular shapes: stomatocytes, discocytes, and echinocytes

    Science.gov (United States)

    Ahmadzadeh, Ezat; Jaferzadeh, Keyvan; Lee, Jieun; Moon, Inkyu

    2017-07-01

    We present unsupervised clustering methods for automatic grouping of human red blood cells (RBCs) extracted from RBC quantitative phase images obtained by digital holographic microscopy into three RBC clusters with regular shapes, including biconcave, stomatocyte, and sphero-echinocyte. We select some good features related to the RBC profile and morphology, such as RBC average thickness, sphericity coefficient, and mean corpuscular volume, and clustering methods, including density-based spatial clustering applications with noise, k-medoids, and k-means, are applied to the set of morphological features. The clustering results of RBCs using a set of three-dimensional features are compared against a set of two-dimensional features. Our experimental results indicate that by utilizing the introduced set of features, two groups of biconcave RBCs and old RBCs (suffering from the sphero-echinocyte process) can be perfectly clustered. In addition, by increasing the number of clusters, the three RBC types can be effectively clustered in an automated unsupervised manner with high accuracy. The performance evaluation of the clustering techniques reveals that they can assist hematologists in further diagnosis.

  20. Heuristic methods using grasp, path relinking and variable neighborhood search for the clustered traveling salesman problem

    Directory of Open Access Journals (Sweden)

    Mário Mestria

    2013-08-01

    Full Text Available The Clustered Traveling Salesman Problem (CTSP is a generalization of the Traveling Salesman Problem (TSP in which the set of vertices is partitioned into disjoint clusters and objective is to find a minimum cost Hamiltonian cycle such that the vertices of each cluster are visited contiguously. The CTSP is NP-hard and, in this context, we are proposed heuristic methods for the CTSP using GRASP, Path Relinking and Variable Neighborhood Descent (VND. The heuristic methods were tested using Euclidean instances with up to 2000 vertices and clusters varying between 4 to 150 vertices. The computational tests were performed to compare the performance of the heuristic methods with an exact algorithm using the Parallel CPLEX software. The computational results showed that the hybrid heuristic method using VND outperforms other heuristic methods.

  1. Regions of micro-calcifications clusters detection based on new features from imbalance data in mammograms

    Science.gov (United States)

    Wang, Keju; Dong, Min; Yang, Zhen; Guo, Yanan; Ma, Yide

    2017-02-01

    Breast cancer is the most common cancer among women. Micro-calcification cluster on X-ray mammogram is one of the most important abnormalities, and it is effective for early cancer detection. Surrounding Region Dependence Method (SRDM), a statistical texture analysis method is applied for detecting Regions of Interest (ROIs) containing microcalcifications. Inspired by the SRDM, we present a method that extract gray and other features which are effective to predict the positive and negative regions of micro-calcifications clusters in mammogram. By constructing a set of artificial images only containing micro-calcifications, we locate the suspicious pixels of calcifications of a SRDM matrix in original image map. Features are extracted based on these pixels for imbalance date and then the repeated random subsampling method and Random Forest (RF) classifier are used for classification. True Positive (TP) rate and False Positive (FP) can reflect how the result will be. The TP rate is 90% and FP rate is 88.8% when the threshold q is 10. We draw the Receiver Operating Characteristic (ROC) curve and the Area Under the ROC Curve (AUC) value reaches 0.9224. The experiment indicates that our method is effective. A novel regions of micro-calcifications clusters detection method is developed, which is based on new features for imbalance data in mammography, and it can be considered to help improving the accuracy of computer aided diagnosis breast cancer.

  2. Methods for clustered encouragement design studies with noncompliance and missing data.

    Science.gov (United States)

    Taylor, Leslie; Zhou, Xiao-Hua

    2011-04-01

    Encouragement design studies are particularly useful for estimating the effect of an intervention that cannot itself be randomly administered to some and not to others. They require a randomly selected group receive extra encouragement to undertake the treatment of interest, where the encouragement typically takes the form of additional information or incentives. We consider a "clustered encouragement design" (CED), where the randomization is at the level of the clusters (e.g. physicians), but the compliance with assignment is at the level of the units (e.g. patients) within clusters. Noncompliance and missing data are particular problems in encouragement design studies, where encouragement to take the treatment, rather than the treatment itself, is randomized. The motivating study looks at whether computer-based care suggestions can improve patient outcomes in veterans with chronic heart failure. Since physician adherence has been inadequate, the original study focused on methods to improve physician adherence, although an equally important question is whether physician adherence improves patient outcomes. Here, we reanalyze the data to determine the effect of physician adherence on patient outcomes. We propose causal inference methodology for the effect of a treatment versus a control in a randomized CED study with all-or-none compliance at the unit level. These methods extend the current approaches to account for nonignorable missing data and use an alternative approach to inference using multiple imputation methods, which have been successfully applied to a wide variety of missing data problems and have recently been applied to the potential outcomes framework of causal inference (Taylor and Zhou, 2009b).

  3. Stability and structure of rare-gas ionic clusters using density functional methods: A study of helium clusters

    Energy Technology Data Exchange (ETDEWEB)

    Gianturco, F.A.; De Lara-Castells, M.P. [Univ. of Rome (Italy)

    1996-10-05

    Several modelings of exchange and correlation forces which can be carried out using density functional theory (DFT) methods have been analyzed to study their efficiency and reliability when evaluating possible competing structures of helium ionic clusters of increasing size. This study examines He{sub n}{sup +} systems with n from 1 to 7 and compares the present calculations with earlier evaluations that used more conventional, and more computationally intensive, methods with configuration interaction (CI) approaches. The present results indicate that it is indeed possible to strike a fruitful balance between reduction of computational times and quality of the ensuing structural information. 62 refs., 1 fig., 8 tabs.

  4. A comparison of latent class, K-means, and K-median methods for clustering dichotomous data.

    Science.gov (United States)

    Brusco, Michael J; Shireman, Emilie; Steinley, Douglas

    2017-09-01

    The problem of partitioning a collection of objects based on their measurements on a set of dichotomous variables is a well-established problem in psychological research, with applications including clinical diagnosis, educational testing, cognitive categorization, and choice analysis. Latent class analysis and K-means clustering are popular methods for partitioning objects based on dichotomous measures in the psychological literature. The K-median clustering method has recently been touted as a potentially useful tool for psychological data and might be preferable to its close neighbor, K-means, when the variable measures are dichotomous. We conducted simulation-based comparisons of the latent class, K-means, and K-median approaches for partitioning dichotomous data. Although all 3 methods proved capable of recovering cluster structure, K-median clustering yielded the best average performance, followed closely by latent class analysis. We also report results for the 3 methods within the context of an application to transitive reasoning data, in which it was found that the 3 approaches can exhibit profound differences when applied to real data. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  5. Data Clustering

    Science.gov (United States)

    Wagstaff, Kiri L.

    2012-03-01

    particular application involves considerations of the kind of data being analyzed, algorithm runtime efficiency, and how much prior knowledge is available about the problem domain, which can dictate the nature of clusters sought. Fundamentally, the clustering method and its representations of clusters carries with it a definition of what a cluster is, and it is important that this be aligned with the analysis goals for the problem at hand. In this chapter, I emphasize this point by identifying for each algorithm the cluster representation as a model, m_j , even for algorithms that are not typically thought of as creating a “model.” This chapter surveys a basic collection of clustering methods useful to any practitioner who is interested in applying clustering to a new data set. The algorithms include k-means (Section 25.2), EM (Section 25.3), agglomerative (Section 25.4), and spectral (Section 25.5) clustering, with side mentions of variants such as kernel k-means and divisive clustering. The chapter also discusses each algorithm’s strengths and limitations and provides pointers to additional in-depth reading for each subject. Section 25.6 discusses methods for incorporating domain knowledge into the clustering process. This chapter concludes with a brief survey of interesting applications of clustering methods to astronomy data (Section 25.7). The chapter begins with k-means because it is both generally accessible and so widely used that understanding it can be considered a necessary prerequisite for further work in the field. EM can be viewed as a more sophisticated version of k-means that uses a generative model for each cluster and probabilistic item assignments. Agglomerative clustering is the most basic form of hierarchical clustering and provides a basis for further exploration of algorithms in that vein. Spectral clustering permits a departure from feature-vector-based clustering and can operate on data sets instead represented as affinity, or similarity

  6. Entropy-Based Incomplete Cholesky Decomposition for a Scalable Spectral Clustering Algorithm: Computational Studies and Sensitivity Analysis

    Directory of Open Access Journals (Sweden)

    Rocco Langone

    2016-05-01

    Full Text Available Spectral clustering methods allow datasets to be partitioned into clusters by mapping the input datapoints into the space spanned by the eigenvectors of the Laplacian matrix. In this article, we make use of the incomplete Cholesky decomposition (ICD to construct an approximation of the graph Laplacian and reduce the size of the related eigenvalue problem from N to m, with m ≪ N . In particular, we introduce a new stopping criterion based on normalized mutual information between consecutive partitions, which terminates the ICD when the change in the cluster assignments is below a given threshold. Compared with existing ICD-based spectral clustering approaches, the proposed method allows the reduction of the number m of selected pivots (i.e., to obtain a sparser model and at the same time, to maintain high clustering quality. The method scales linearly with respect to the number of input datapoints N and has low memory requirements, because only matrices of size N × m and m × m are calculated (in contrast to standard spectral clustering, where the construction of the full N × N similarity matrix is needed. Furthermore, we show that the number of clusters can be reliably selected based on the gap heuristics computed using just a small matrix R of size m × m instead of the entire graph Laplacian. The effectiveness of the proposed algorithm is tested on several datasets.

  7. Identification of rural landscape classes through a GIS clustering method

    Directory of Open Access Journals (Sweden)

    Irene Diti

    2013-09-01

    Full Text Available The paper presents a methodology aimed at supporting the rural planning process. The analysis of the state of the art of local and regional policies focused on rural and suburban areas, and the study of the scientific literature in the field of spatial analysis methodologies, have allowed the definition of the basic concept of the research. The proposed method, developed in a GIS, is based on spatial metrics selected and defined to cover various agricultural, environmental, and socio-economic components. The specific goal of the proposed methodology is to identify homogeneous extra-urban areas through their objective characterization at different scales. Once areas with intermediate urban-rural characters have been identified, the analysis is then focused on the more detailed definition of periurban agricultural areas. The synthesis of the results of the analysis of the various landscape components is achieved through an original interpretative key which aims to quantify the potential impacts of rural areas on the urban system. This paper presents the general framework of the methodology and some of the main results of its first implementation through an Italian case study.

  8. Relation between financial market structure and the real economy: comparison between clustering methods.

    Science.gov (United States)

    Musmeci, Nicoló; Aste, Tomaso; Di Matteo, T

    2015-01-01

    We quantify the amount of information filtered by different hierarchical clustering methods on correlations between stock returns comparing the clustering structure with the underlying industrial activity classification. We apply, for the first time to financial data, a novel hierarchical clustering approach, the Directed Bubble Hierarchical Tree and we compare it with other methods including the Linkage and k-medoids. By taking the industrial sector classification of stocks as a benchmark partition, we evaluate how the different methods retrieve this classification. The results show that the Directed Bubble Hierarchical Tree can outperform other methods, being able to retrieve more information with fewer clusters. Moreover,we show that the economic information is hidden at different levels of the hierarchical structures depending on the clustering method. The dynamical analysis on a rolling window also reveals that the different methods show different degrees of sensitivity to events affecting financial markets, like crises. These results can be of interest for all the applications of clustering methods to portfolio optimization and risk hedging [corrected].

  9. Crowd Analysis by Using Optical Flow and Density Based Clustering

    DEFF Research Database (Denmark)

    Santoro, Francesco; Pedro, Sergio; Tan, Zheng-Hua

    2010-01-01

    In this paper, we present a system to detect and track crowds in a video sequence captured by a camera. In a first step, we compute optical flows by means of pyramidal Lucas-Kanade feature tracking. Afterwards, a density based clustering is used to group similar vectors. In the last step, it is a......In this paper, we present a system to detect and track crowds in a video sequence captured by a camera. In a first step, we compute optical flows by means of pyramidal Lucas-Kanade feature tracking. Afterwards, a density based clustering is used to group similar vectors. In the last step......, it is applied a crowd tracker in every frame, allowing us to detect and track the crowds. Our system gives the output as a graphic overlay, i.e it adds arrows and colors to the original frame sequence, in order to identify crowds and their movements. For the evaluation, we check when our system detect certains...

  10. An adaptive enhancement algorithm for infrared video based on modified k-means clustering

    Science.gov (United States)

    Zhang, Linze; Wang, Jingqi; Wu, Wen

    2016-09-01

    In this paper, we have proposed a video enhancement algorithm to improve the output video of the infrared camera. Sometimes the video obtained by infrared camera is very dark since there is no clear target. In this case, infrared video should be divided into frame images by frame extraction, in order to carry out the image enhancement. For the first frame image, which can be divided into k sub images by using K-means clustering according to the gray interval it occupies before k sub images' histogram equalization according to the amount of information per sub image, we used a method to solve a problem that final cluster centers close to each other in some cases; and for the other frame images, their initial cluster centers can be determined by the final clustering centers of the previous ones, and the histogram equalization of each sub image will be carried out after image segmentation based on K-means clustering. The histogram equalization can make the gray value of the image to the whole gray level, and the gray level of each sub image is determined by the ratio of pixels to a frame image. Experimental results show that this algorithm can improve the contrast of infrared video where night target is not obvious which lead to a dim scene, and reduce the negative effect given by the overexposed pixels adaptively in a certain range.

  11. Nonnegative Matrix Factorization-Based Spatial-Temporal Clustering for Multiple Sensor Data Streams

    Directory of Open Access Journals (Sweden)

    Di-Hua Sun

    2014-01-01

    Full Text Available Cyber physical systems have grown exponentially and have been attracting a lot of attention over the last few years. To retrieve and mine the useful information from massive amounts of sensor data streams with spatial, temporal, and other multidimensional information has become an active research area. Moreover, recent research has shown that clusters of streams change with a comprehensive spatial-temporal viewpoint in real applications. In this paper, we propose a spatial-temporal clustering algorithm (STClu based on nonnegative matrix trifactorization by utilizing time-series observational data streams and geospatial relationship for clustering multiple sensor data streams. Instead of directly clustering multiple data streams periodically, STClu incorporates the spatial relationship between two sensors in proximity and integrates the historical information into consideration. Furthermore, we develop an iterative updating optimization algorithm STClu. The effectiveness and efficiency of the algorithm STClu are both demonstrated in experiments on real and synthetic data sets. The results show that the proposed STClu algorithm outperforms existing methods for clustering sensor data streams.

  12. Recruiting faith- and non-faith-based schools, adolescents and parents to a cluster randomised sexual-health trial: experiences, challenges and lessons from the mixed-methods Jack Feasibility Trial.

    Science.gov (United States)

    Aventin, Áine; Lohan, Maria; Maguire, Lisa; Clarke, Mike

    2016-07-29

    The move toward evidence-based education has led to increasing numbers of randomised trials in schools. However, the literature on recruitment to non-clinical trials is relatively underdeveloped, when compared to that of clinical trials. Recruitment to school-based randomised trials is, however, challenging, even more so when the focus of the study is a sensitive issue such as sexual health. This article reflects on the challenges of recruiting post-primary schools, adolescent pupils and parents to a cluster randomised feasibility trial of a sexual-health intervention, and the strategies employed to address them. The Jack Trial was funded by the UK National Institute for Health Research. It comprised a feasibility study of an interactive film-based sexual-health intervention entitled If I Were Jack, recruiting over 800 adolescents from eight socio-demographically diverse post-primary schools in Northern Ireland. It aimed to determine the facilitators and barriers to recruitment and retention to a school-based sexual-health trial and identify optimal multi-level strategies for an effectiveness study. As part of an embedded process evaluation, we conducted semi-structured interviews and focus groups with principals, vice-principals, teachers, pupils and parents recruited to the study as well as classroom observations and a parents' survey. With reference to social learning theory, we identified a number of individual-, behavioural- and environmental-level factors that influenced recruitment. Commonly identified facilitators included perceptions of the relevance and potential benefit of the intervention to adolescents, the credibility of the organisation and individuals running the study, support offered by trial staff, and financial incentives. Key barriers were prior commitment to other research, lack of time and resources, and perceptions that the intervention was incompatible with pupil or parent needs or the school ethos. Reflecting on the methodological

  13. FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data.

    Science.gov (United States)

    Fu, Limin; Medico, Enzo

    2007-01-04

    Data clustering analysis has been extensively applied to extract information from gene expression profiles obtained with DNA microarrays. To this aim, existing clustering approaches, mainly developed in computer science, have been adapted to microarray data analysis. However, previous studies revealed that microarray datasets have very diverse structures, some of which may not be correctly captured by current clustering methods. We therefore approached the problem from a new starting point, and developed a clustering algorithm designed to capture dataset-specific structures at the beginning of the process. The clustering algorithm is named Fuzzy clustering by Local Approximation of MEmbership (FLAME). Distinctive elements of FLAME are: (i) definition of the neighborhood of each object (gene or sample) and identification of objects with "archetypal" features named Cluster Supporting Objects, around which to construct the clusters; (ii) assignment to each object of a fuzzy membership vector approximated from the memberships of its neighboring objects, by an iterative converging process in which membership spreads from the Cluster Supporting Objects through their neighbors. Comparative analysis with K-means, hierarchical, fuzzy C-means and fuzzy self-organizing maps (SOM) showed that data partitions generated by FLAME are not superimposable to those of other methods and, although different types of datasets are better partitioned by different algorithms, FLAME displays the best overall performance. FLAME is implemented, together with all the above-mentioned algorithms, in a C++ software with graphical interface for Linux and Windows, capable of handling very large datasets, named Gene Expression Data Analysis Studio (GEDAS), freely available under GNU General Public License. The FLAME algorithm has intrinsic advantages, such as the ability to capture non-linear relationships and non-globular clusters, the automated definition of the number of clusters, and the

  14. A Clustering K-Anonymity Privacy-Preserving Method for Wearable IoT Devices

    Directory of Open Access Journals (Sweden)

    Fang Liu

    2018-01-01

    Full Text Available Wearable technology is one of the greatest applications of the Internet of Things. The popularity of wearable devices has led to a massive scale of personal (user-specific data. Generally, data holders (manufacturers of wearable devices are willing to share these data with others to get benefits. However, significant privacy concerns would arise when sharing the data with the third party in an improper manner. In this paper, we first propose a specific threat model about the data sharing process of wearable devices’ data. Then we propose a K-anonymity method based on clustering to preserve privacy of wearable IoT devices’ data and guarantee the usability of the collected data. Experiment results demonstrate the effectiveness of the proposed method.

  15. Phenotypic clustering: a novel method for microglial morphology analysis.

    Science.gov (United States)

    Verdonk, Franck; Roux, Pascal; Flamant, Patricia; Fiette, Laurence; Bozza, Fernando A; Simard, Sébastien; Lemaire, Marc; Plaud, Benoit; Shorte, Spencer L; Sharshar, Tarek; Chrétien, Fabrice; Danckaert, Anne

    2016-06-17

    Microglial cells are tissue-resident macrophages of the central nervous system. They are extremely dynamic, sensitive to their microenvironment and present a characteristic complex and heterogeneous morphology and distribution within the brain tissue. Many experimental clues highlight a strong link between their morphology and their function in response to aggression. However, due to their complex "dendritic-like" aspect that constitutes the major pool of murine microglial cells and their dense network, precise and powerful morphological studies are not easy to realize and complicate correlation with molecular or clinical parameters. Using the knock-in mouse model CX3CR1(GFP/+), we developed a 3D automated confocal tissue imaging system coupled with morphological modelling of many thousands of microglial cells revealing precise and quantitative assessment of major cell features: cell density, cell body area, cytoplasm area and number of primary, secondary and tertiary processes. We determined two morphological criteria that are the complexity index (CI) and the covered environment area (CEA) allowing an innovative approach lying in (i) an accurate and objective study of morphological changes in healthy or pathological condition, (ii) an in situ mapping of the microglial distribution in different neuroanatomical regions and (iii) a study of the clustering of numerous cells, allowing us to discriminate different sub-populations. Our results on more than 20,000 cells by condition confirm at baseline a regional heterogeneity of the microglial distribution and phenotype that persists after induction of neuroinflammation by systemic injection of lipopolysaccharide (LPS). Using clustering analysis, we highlight that, at resting state, microglial cells are distributed in four microglial sub-populations defined by their CI and CEA with a regional pattern and a specific behaviour after challenge. Our results counteract the classical view of a homogenous regional resting

  16. Bootstrap-Based Improvements for Inference with Clustered Errors

    OpenAIRE

    Doug Miller; A. Colin Cameron; Jonah B. Gelbach

    2006-01-01

    Microeconometrics researchers have increasingly realized the essential need to account for any within-group dependence in estimating standard errors of regression parameter estimates. The typical preferred solution is to calculate cluster-robust or sandwich standard errors that permit quite general heteroskedasticity and within-cluster error correlation, but presume that the number of clusters is large. In applications with few (5-30) clusters, standard asymptotic tests can over-reject consid...

  17. A PSO-Based Subtractive Data Clustering Algorithm

    OpenAIRE

    Gamal Abdel-Azeem; Mahmoud Marie; Rehab Abdel-Kader; Mariam El-Tarabily

    2013-01-01

    There is a tremendous proliferation in the amount of information available on the largest shared information source, the World Wide Web. Fast and high-quality clustering algorithms play an important role in helping users to effectively navigate, summarize, and organize the information. Recent studies have shown that partitional clustering algorithms such as the k-means algorithm are the most popular algorithms for clustering large datasets. The major problem with partitional clustering algori...

  18. The multi-scattering-Xα method for analysis of the electronic structure of atomic clusters

    International Nuclear Information System (INIS)

    Bahurmuz, A.A.; Woo, C.H.

    1984-12-01

    A computer program, MSXALPHA, has been developed to carry out a quantum-mechanical analysis of the electronic structure of molecules and atomic clusters using the Multi-Scattering-Xα (MSXα) method. The MSXALPHA program is based on a code obtained from the University of Alberta; several improvements and new features were incorporated to increase generality and efficiency. The major ones are: (1) minimization of core memory usage, (2) reduction of execution time, (3) introduction of a dynamic core allocation scheme for a large number of arrays, (4) incorporation of an atomic program to generate numerical orbitals used to construct the initial molecular potential, and (5) inclusion of a routine to evaluate total energy. This report is divided into three parts. The first discusses the theory of the MSXα method. The second gives a detailed description of the program, MSXALPHA. The third discusses the results of calculations carried out for the methane molecule (CH 4 ) and a four-atom zirconium cluster (Zr 4 )

  19. A method for improved clustering and classification of microscopy images using quantitative co-localization coefficients

    LENUS (Irish Health Repository)

    Singan, Vasanth R

    2012-06-08

    AbstractBackgroundThe localization of proteins to specific subcellular structures in eukaryotic cells provides important information with respect to their function. Fluorescence microscopy approaches to determine localization distribution have proved to be an essential tool in the characterization of unknown proteins, and are now particularly pertinent as a result of the wide availability of fluorescently-tagged constructs and antibodies. However, there are currently very few image analysis options able to effectively discriminate proteins with apparently similar distributions in cells, despite this information being important for protein characterization.FindingsWe have developed a novel method for combining two existing image analysis approaches, which results in highly efficient and accurate discrimination of proteins with seemingly similar distributions. We have combined image texture-based analysis with quantitative co-localization coefficients, a method that has traditionally only been used to study the spatial overlap between two populations of molecules. Here we describe and present a novel application for quantitative co-localization, as applied to the study of Rab family small GTP binding proteins localizing to the endomembrane system of cultured cells.ConclusionsWe show how quantitative co-localization can be used alongside texture feature analysis, resulting in improved clustering of microscopy images. The use of co-localization as an additional clustering parameter is non-biased and highly applicable to high-throughput image data sets.

  20. Non-hierarchical clustering of Manihot esculenta Crantz germplasm based on quantitative traits

    OpenAIRE

    Oliveira, Eder Jorge de; Aud, Fabiana Ferraz; Morales, Cinara Fernanda Garcia; Oliveira, Saulo Alves Santos de; Santos, Vanderlei da Silva

    2016-01-01

    ABSTRACT The knowledge of the phenotypic variation of cassava (Manihot esculenta Crantz) germplasm allows the estimative of the genetic variability to support the selection of contrasting genitors. Therefore, the aim of this work was to define homogeneous groups of cassava germplasm based on yield traits, disease resistance and root quality using K-means as a non-hierarchical method. Breeding values estimated by Best Linear Unbiased Predictor (BLUP) were used for the cluster analysis. The num...

  1. Clustering the objective interestingness measures based on tendency of variation in statistical implications

    Directory of Open Access Journals (Sweden)

    Nghia Quoc Phan

    2016-05-01

    Full Text Available In recent years, the research cluster of objective interestingness measures has rapidly developed in order to assist users to choose the appropriate measure for their application. Researchers in this field mainly focus on three main directions: clustering based on the properties of the measures, clustering based on the behavior of measures and clustering tendency of variation in statistical implications. In this paper we propose a new approach to cluster the objective interestingness measures based on tendency of variation in statistical implications. In this proposal, we built the statistical implication data of 31 objective interestingness measures based on the examination of the partial derivatives on four parameters. From this data, two distance matrices of interestingness measures are established based on Euclidean and Manhattan distance. The similarity trees are built based on distance matrix that gave results of 31 measures clustering with two different clustering thresholds.

  2. [Predicting Incidence of Hepatitis E in Chinausing Fuzzy Time Series Based on Fuzzy C-Means Clustering Analysis].

    Science.gov (United States)

    Luo, Yi; Zhang, Tao; Li, Xiao-song

    2016-05-01

    To explore the application of fuzzy time series model based on fuzzy c-means clustering in forecasting monthly incidence of Hepatitis E in mainland China. Apredictive model (fuzzy time series method based on fuzzy c-means clustering) was developed using Hepatitis E incidence data in mainland China between January 2004 and July 2014. The incidence datafrom August 2014 to November 2014 were used to test the fitness of the predictive model. The forecasting results were compared with those resulted from traditional fuzzy time series models. The fuzzy time series model based on fuzzy c-means clustering had 0.001 1 mean squared error (MSE) of fitting and 6.977 5 x 10⁻⁴ MSE of forecasting, compared with 0.0017 and 0.0014 from the traditional forecasting model. The results indicate that the fuzzy time series model based on fuzzy c-means clustering has a better performance in forecasting incidence of Hepatitis E.

  3. Methodology сomparative statistical analysis of Russian industry based on cluster analysis

    Directory of Open Access Journals (Sweden)

    Sergey S. Shishulin

    2017-01-01

    Full Text Available The article is devoted to researching of the possibilities of applying multidimensional statistical analysis in the study of industrial production on the basis of comparing its growth rates and structure with other developed and developing countries of the world. The purpose of this article is to determine the optimal set of statistical methods and the results of their application to industrial production data, which would give the best acce