WorldWideScience

Sample records for clustering tests based

  1. Model-based testing of a vehicle instrument cluster for design validation using machine vision

    International Nuclear Information System (INIS)

    Huang, Yingping; McMurran, Ross; Dhadyalla, Gunwant; Jones, R Peter; Mouzakitis, Alexandros

    2009-01-01

    This paper presents an advanced testing system, combining model-based testing and machine vision technologies, for automated design validation of a vehicle instrument cluster. In the system, a hardware-in-the-loop (HIL) tester, supported by model-based approaches, simulates vehicle operations in real time and dynamically provides all essential signals to the instrument cluster under test. A machine vision system with advanced image processing algorithms is designed to inspect the visual displays. Experiments demonstrate that the system developed is accurate for measuring the pointer position, bar graph position, pointer angular velocity and indicator flash rate, and is highly robust for validating various functionalities including warning light status, symbol and text displays. Moreover, the system developed greatly eases the task of tedious validation testing and makes onerous repeated tests possible.

  2. Family-based clusters of cognitive test performance in familial schizophrenia

    Directory of Open Access Journals (Sweden)

    Partonen Timo

    2004-07-01

    Full Text Available Abstract Background Cognitive traits derived from neuropsychological test data are considered to be potential endophenotypes of schizophrenia. Previously, these traits have been found to form a valid basis for clustering samples of schizophrenia patients into homogeneous subgroups. We set out to identify such clusters but, unlike previous studies, we included both schizophrenia patients and family members in the cluster analysis. The aim of the study was to detect family clusters with similar cognitive test performance. Methods Test scores from 54 randomly selected families comprising at least two siblings with schizophrenia spectrum disorders, and at least two unaffected family members were included in a complete-linkage cluster analysis with interactive data visualization. Results A well-performing, an impaired, and an intermediate family cluster emerged from the analysis. While the neuropsychological test scores differed significantly between the clusters, only minor differences were observed in the clinical variables. Conclusions The visually aided clustering algorithm was successful in identifying family clusters comprising both schizophrenia patients and their relatives. The present classification method may serve as a basis for selecting phenotypically more homogeneous groups of families in subsequent genetic analyses.
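
    A minimal, hypothetical sketch of the kind of analysis described above (complete-linkage clustering of standardized test scores, cut into three clusters); the clinical data and the interactive visualization of the paper are not reproduced.

```python
# Complete-linkage hierarchical clustering of neuropsychological test scores.
# Synthetic data: 54 "families" x 8 hypothetical test scores.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import zscore

rng = np.random.default_rng(0)
scores = rng.normal(size=(54, 8))
Z = linkage(zscore(scores, axis=0), method="complete", metric="euclidean")
labels = fcluster(Z, t=3, criterion="maxclust")  # e.g. well-performing / intermediate / impaired
print(np.bincount(labels)[1:])                   # cluster sizes
```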

  3. Uncovering and testing the fuzzy clusters based on lumped Markov chain in complex network.

    Science.gov (United States)

    Jing, Fan; Jianbin, Xie; Jinlong, Wang; Jinshuai, Qu

    2013-01-01

    Identifying clusters, namely groups of nodes with comparatively strong internal connectivity, is a fundamental task for deeply understanding the structure and function of a network. By means of a lumped Markov chain model of a random walker, we propose two novel ways of inferring the lumped Markov transition matrix. Furthermore, some useful results are derived from the analysis of the properties of the lumped Markov process. To find the best partition of complex networks, a novel framework including two algorithms for network partition based on the optimal lumped Markovian dynamics is derived. The algorithms are constructed to minimize the objective function under this framework. Simulation experiments demonstrate that our algorithms can efficiently determine the probabilities with which a node belongs to different clusters during the learning process and naturally support fuzzy partitioning. Moreover, they are successfully applied to a real-world network, the social interactions between members of a karate club.
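
    For readers unfamiliar with the lumping construction, the sketch below shows one standard way to form a lumped random-walk transition matrix for a hard partition of a network; it illustrates the concept only and is not the estimator proposed in the paper.

```python
import numpy as np

def lumped_transition(A, labels):
    """Lumped transition matrix of the random walk on adjacency A for a hard partition."""
    deg = A.sum(axis=1)
    P = A / deg[:, None]          # node-level random-walk transition matrix
    pi = deg / deg.sum()          # stationary distribution of the walk
    K = labels.max() + 1
    U = np.eye(K)[labels]         # n x K one-hot membership matrix
    pi_c = U.T @ pi               # stationary probability mass of each cluster
    return (U.T * pi) @ P @ U / pi_c[:, None]

A = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
print(lumped_transition(A, np.array([0, 0, 0, 1])))   # 2 x 2 cluster-level matrix
```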

  4. A voxelation-corrected non-stationary 3D cluster-size test based on random field theory.

    Science.gov (United States)

    Li, Huanjie; Nickerson, Lisa D; Zhao, Xuna; Nichols, Thomas E; Gao, Jia-Hong

    2015-09-01

    Cluster-size tests (CSTs) based on random field theory (RFT) are commonly adopted to identify significant differences in brain images. However, the use of RFT in CSTs rests on the assumption of uniform smoothness (stationarity). When images are non-stationary, CSTs based on RFT will likely lead to increased false positives in smooth regions and reduced power in rough regions. An adjustment to the cluster size according to the local smoothness at each voxel has been proposed for the standard RFT-based test to address non-stationarity; however, this technique requires images with a large degree of spatial smoothing, large degrees of freedom and high intensity thresholding. Recently, we proposed a voxelation-corrected 3D CST based on Gaussian random field theory that does not place constraints on the degree of spatial smoothness. However, this approach is only applicable to stationary images, requiring further modification to enable use for non-stationary images. In this study, we present modifications of this method to develop a voxelation-corrected non-stationary 3D CST based on RFT. Both simulated and real data were used to compare the voxelation-corrected non-stationary CST to the standard cluster-size adjusted non-stationary CST based on RFT and to the voxelation-corrected stationary CST. We found that the voxelation-corrected stationary CST is liberal for non-stationary images and that the voxelation-corrected non-stationary CST performs better than the cluster-size adjusted non-stationary CST based on RFT under low smoothness, low intensity thresholds and low degrees of freedom. Published by Elsevier Inc.
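
    As a toy illustration of the cluster-size idea only (not of the RFT or voxelation corrections themselves), the snippet below thresholds a smoothed random statistic map and measures suprathreshold cluster extents; the array size and threshold are arbitrary.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(1)
stat_map = ndimage.gaussian_filter(rng.normal(size=(32, 32, 32)), sigma=2)  # smoothed noise
threshold = np.percentile(stat_map, 99)             # arbitrary cluster-forming threshold
mask = stat_map > threshold
labels, n_clusters = ndimage.label(mask)            # connected suprathreshold clusters
sizes = ndimage.sum(mask, labels, index=range(1, n_clusters + 1))
print(n_clusters, sorted(sizes, reverse=True)[:5])  # largest cluster extents (in voxels)
```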

  5. International Network Performance and Security Testing Based on Distributed Abyss Storage Cluster and Draft of Data Lake Framework

    Directory of Open Access Journals (Sweden)

    ByungRae Cha

    2018-01-01

    Full Text Available The megatrends and Industry 4.0 in ICT (Information and Communication Technology) are concentrated in IoT (Internet of Things), Big Data, CPS (Cyber-Physical Systems), and AI (Artificial Intelligence). These megatrends do not operate independently, and mass storage technology is essential because large-scale computing technology is needed in the background to support them. In order to evaluate the performance of high-capacity storage based on open-source Ceph, we carry out network performance tests of Abyss storage between domestic and overseas sites using KOREN (Korea Advanced Research Network). Storage media and network bonding are also tested to evaluate the performance of the storage itself. Additionally, a security test using the Cuckoo sandbox and Yara malware detection is demonstrated between the Abyss storage cluster and overseas sites. Lastly, we propose a draft design of a Data Lake framework in order to solve the garbage dump problem.

  6. Relation chain based clustering analysis

    Science.gov (United States)

    Zhang, Cheng-ning; Zhao, Ming-yang; Luo, Hai-bo

    2011-08-01

    Clustering analysis is a well-developed branch of data mining that aims to find hidden structures in a multidimensional space called the feature or pattern space. A datum in this space usually takes the form of a vector whose elements represent specifically selected, problem-oriented features. Clustering analysis generally falls into two divisions: one based on agglomerative clustering and the other on divisive clustering. The former is a bottom-up process that regards each datum as a singleton cluster, while the latter is a top-down process that regards the entire data set as one cluster. The collected literature indicates that divisive clustering currently dominates both application and research. Although several well-known divisive clustering methods have been designed and developed, clustering problems are still far from solved. The k-means algorithm is the original divisive clustering method; it requires key quantities to be assigned in advance, such as the number of clusters and the initial prototype positions, which may not be reasonable in some situations. Beyond this initialization problem, the k-means algorithm may also fall into local optima, clusters in a rigid way, and is not suitable for non-Gaussian distributions. Seeking a good or natural clustering result ultimately originates from one's understanding of the concept of clustering; confusion or misunderstanding of the definition of clustering often leads to unsatisfactory results, so the definition should be considered carefully. This paper discusses the nature of clustering, a way of understanding clustering, and the methodology of designing a clustering algorithm, and proposes a new clustering method based on relation chains among 2D patterns.
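
    The initialization sensitivity of k-means mentioned above is easy to demonstrate; the short example below (hypothetical 2-D data, scikit-learn's KMeans) simply shows that different random starts can converge to solutions of different quality.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(loc=c, scale=0.4, size=(100, 2))
               for c in ((0, 0), (3, 0), (0, 3))])          # three synthetic blobs
inertias = [KMeans(n_clusters=3, init="random", n_init=1, random_state=s).fit(X).inertia_
            for s in range(10)]                             # one random start per run
print(round(min(inertias), 2), round(max(inertias), 2))     # spread across local optima
```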

  7. Testing use of payers to facilitate evidence-based practice adoption: protocol for a cluster-randomized trial.

    Science.gov (United States)

    Molfenter, Todd; Kim, Jee-Seon; Quanbeck, Andrew; Patel-Porter, Terry; Starr, Sandy; McCarty, Dennis

    2013-05-10

    More effective methods are needed to implement evidence-based findings into practice. The Advancing Recovery Framework offers a multi-level approach to evidence-based practice implementation by aligning purchasing and regulatory policies at the payer level with organizational change strategies at the organizational level. The Advancing Recovery Buprenorphine Implementation Study is a cluster-randomized controlled trial designed to increase use of the evidence-based practice buprenorphine medication to treat opiate addiction. Ohio Alcohol, Drug Addiction, and Mental Health Services Boards (ADAMHS), who are payers, and their addiction treatment organizations were recruited for a trial to assess the effects of payer and treatment organization changes (using the Advancing Recovery Framework) versus treatment organization changes alone on the use of buprenorphine. A matched-pair randomization, based on county characteristics, was applied, resulting in seven county ADAMHS boards and twenty-five treatment organizations in each arm. Opioid dependent patients are nested within cluster (treatment organization), and treatment organization clusters are nested within ADAMHS county board. The primary outcome is the percentage of individuals with an opioid dependence diagnosis who use buprenorphine during the 24-month intervention period and the 12-month sustainability period. The trial is currently in the baseline data collection stage. Although addiction treatment providers are under increasing pressure to implement evidence-based practices that have been proven to improve patient outcomes, adoption of these practices lags compared to other areas of healthcare. Reasons frequently cited for the slow adoption of EBPs in addiction treatment include regulatory issues, staff or client resistance, and lack of resources. Yet the way addiction treatment is funded, the payer's role, has not received much attention in research on EBP adoption. This research is unique because it

  8. Testing use of payers to facilitate evidence-based practice adoption: protocol for a cluster-randomized trial

    OpenAIRE

    Molfenter, Todd; Kim, Jee-Seon; Quanbeck, Andrew; Patel-Porter, Terry; Starr, Sandy; McCarty, Dennis

    2013-01-01

    Background More effective methods are needed to implement evidence-based findings into practice. The Advancing Recovery Framework offers a multi-level approach to evidence-based practice implementation by aligning purchasing and regulatory policies at the payer level with organizational change strategies at the organizational level. Methods The Advancing Recovery Buprenorphine Implementation Study is a cluster-randomized controlled trial designed to increase use of the evidence-based practice...

  9. Cluster Based Text Classification Model

    DEFF Research Database (Denmark)

    Nizamani, Sarwat; Memon, Nasrullah; Wiil, Uffe Kock

    2011-01-01

    We propose a cluster based classification model for suspicious email detection and other text classification tasks. The text classification tasks comprise many training examples that require a complex classification model. Using clusters for classification makes the model simpler and increases..., the classifier is trained on each cluster, which has reduced dimensionality and fewer examples. The experimental results show that the proposed model outperforms the existing classification models for the task of suspicious email detection and topic categorization on the Reuters-21578 and 20 Newsgroups... datasets. Our model also outperforms A Decision Cluster Classification (ADCC) and the Decision Cluster Forest Classification (DCFC) models on the Reuters-21578 dataset...
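
    A hedged sketch of the cluster-then-classify idea (not the authors' model): documents are grouped by k-means on TF-IDF vectors and a separate classifier is trained per cluster, using the 20 Newsgroups loader shipped with scikit-learn.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

data = fetch_20newsgroups(subset="train", categories=["sci.space", "rec.autos"])
X = TfidfVectorizer(max_features=2000).fit_transform(data.data)
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

models = {}
for c in set(clusters):
    idx = clusters == c
    if len(set(data.target[idx])) > 1:        # train only where both classes occur
        models[c] = LogisticRegression(max_iter=1000).fit(X[idx], data.target[idx])
print(len(models), "per-cluster classifiers trained")
```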

  10. Distribution-Based Cluster Structure Selection.

    Science.gov (United States)

    Yu, Zhiwen; Zhu, Xianjun; Wong, Hau-San; You, Jane; Zhang, Jun; Han, Guoqiang

    2017-11-01

    The objective of cluster structure ensemble is to find a unified cluster structure from multiple cluster structures obtained from different datasets. Unfortunately, not all the cluster structures contribute to the unified cluster structure. This paper investigates the problem of how to select the suitable cluster structures in the ensemble which will be summarized to a more representative cluster structure. Specifically, the cluster structure is first represented by a mixture of Gaussian distributions, the parameters of which are estimated using the expectation-maximization algorithm. Then, several distribution-based distance functions are designed to evaluate the similarity between two cluster structures. Based on the similarity comparison results, we propose a new approach, which is referred to as the distribution-based cluster structure ensemble (DCSE) framework, to find the most representative unified cluster structure. We then design a new technique, the distribution-based cluster structure selection strategy (DCSSS), to select a subset of cluster structures. Finally, we propose using a distribution-based normalized hypergraph cut algorithm to generate the final result. In our experiments, a nonparametric test is adopted to evaluate the difference between DCSE and its competitors. We adopt 20 real-world datasets obtained from the University of California, Irvine and knowledge extraction based on evolutionary learning repositories, and a number of cancer gene expression profiles to evaluate the performance of the proposed methods. The experimental results show that: 1) DCSE works well on the real-world datasets and 2) DCSE based on DCSSS can further improve the performance of the algorithm.
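
    As an illustration of representing a cluster structure by an EM-fitted Gaussian mixture and comparing two such representations, the sketch below uses a Monte Carlo symmetrised KL divergence; this is one possible distribution-based distance, not necessarily any of those designed in the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
X1 = rng.normal(0.0, 1.0, size=(300, 2))          # two hypothetical datasets
X2 = rng.normal(0.5, 1.0, size=(300, 2))
g1 = GaussianMixture(n_components=3, random_state=0).fit(X1)   # EM-fitted mixtures
g2 = GaussianMixture(n_components=3, random_state=0).fit(X2)

def mc_kl(p, q, n=5000):
    """Monte Carlo estimate of KL(p || q) between two fitted mixtures."""
    samples, _ = p.sample(n)
    return np.mean(p.score_samples(samples) - q.score_samples(samples))

print(0.5 * (mc_kl(g1, g2) + mc_kl(g2, g1)))      # symmetrised divergence
```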

  11. Advances in Significance Testing for Cluster Detection

    Science.gov (United States)

    Coleman, Deidra Andrea

    surveillance data while controlling the Bayesian False Discovery Rate (BFDR). The procedure entails choosing an appropriate Bayesian model that captures the spatial dependency inherent in epidemiological data and considers all days of interest, selecting a test statistic based on a chosen measure that provides the magnitude of the maximal spatial cluster for each day, and identifying a cutoff value that controls the BFDR for rejecting the collective null hypothesis of no outbreak over a collection of days for a specified region. We use our procedure to analyze botulism-like syndrome data collected by the North Carolina Disease Event Tracking and Epidemiologic Collection Tool (NC DETECT).
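
    A generic sketch of the BFDR-controlling cutoff step (the Bayesian spatial model itself is not implemented): given posterior null probabilities, reject the smallest ones for as long as their running mean stays below the target level.

```python
import numpy as np

def bfdr_rejections(post_null_prob, alpha=0.05):
    """Indices rejected while the mean posterior null probability stays <= alpha."""
    order = np.argsort(post_null_prob)
    running_mean = np.cumsum(post_null_prob[order]) / np.arange(1, len(order) + 1)
    k = np.searchsorted(running_mean, alpha, side="right")
    return order[:k]

post = np.array([0.01, 0.20, 0.03, 0.60, 0.02, 0.90])   # hypothetical per-day null probabilities
print(bfdr_rejections(post, alpha=0.05))                 # days flagged as possible outbreaks
```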

  12. Normalization based K means Clustering Algorithm

    OpenAIRE

    Virmani, Deepali; Taneja, Shweta; Malhotra, Geetika

    2015-01-01

    K-means is an effective clustering technique used to separate similar data into groups based on initial centroids of clusters. In this paper, a Normalization-based K-means clustering algorithm (N-K means) is proposed. The proposed N-K means algorithm applies normalization to the available data prior to clustering, and calculates the initial centroids based on weights. Experimental results demonstrate the improvement of the proposed N-K means clustering algorithm over existing...
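
    A minimal sketch of the normalization step only (the weight-based centroid initialization of the proposed N-K means is not reproduced): features are rescaled before running standard k-means so that no single attribute dominates the distance.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
X = np.column_stack([rng.normal(50, 10, 200),     # feature on a large scale
                     rng.normal(0.5, 0.1, 200)])  # feature on a small scale
X_norm = MinMaxScaler().fit_transform(X)          # normalize prior to clustering
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_norm)
print(np.bincount(labels))
```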

  13. Testing cosmology with galaxy clusters

    DEFF Research Database (Denmark)

    Rapetti Serra, David Angelo

    2011-01-01

    PASCOS 2011 will be held in Cambridge, UK. The conference will be hosted by the Centre for Theoretical Cosmology (DAMTP) at the Mathematical Sciences site of the University of Cambridge. The aim of the conference is to explore and develop synergies between particle physics, string theory and cosmology. There will be an emphasis on timely interdisciplinary topics: critical tests of inflationary cosmology, advances in fundamental cosmology, applications of string theory (AdS/CMT), particle and string phenomenology, new experimental particle physics results, and cosmological probes...

  14. Cluster forest based fuzzy logic for massive data clustering

    Science.gov (United States)

    Lahmar, Ines; Ben Ayed, Abdelkarim; Ben Halima, Mohamed; Alimi, Adel M.

    2017-03-01

    This article focuses on developing an improved cluster ensemble method based on cluster forests. Cluster forests (CF) is a clustering method inspired by Random Forests (RF), in the context of clustering massive data. It aggregates intermediate Fuzzy C-Means (FCM) clustering results via spectral clustering, since pseudo-clustering results are represented in the spectral space, in order to classify these data sets in the multidimensional data space. One of the main advantages is the use of FCM, which builds fuzzy memberships to all partitions of the datasets thanks to fuzzy logic, whereas classical algorithms such as K-means build only hard partitions. First, we improve the CF clustering algorithm by integrating fuzzy FCM and compare it with other existing clustering methods. Second, we compare the K-means and FCM clustering methods with agglomerative hierarchical clustering (HAC) and other presented methods using benchmark data sets from the UCI repository.
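
    A generic ensemble sketch in the spirit described above: several base clusterings are aggregated into a co-association matrix that is then cut with spectral clustering. K-means stands in for FCM here (scikit-learn has no fuzzy c-means), so this illustrates only the aggregation step, not the CF algorithm itself.

```python
import numpy as np
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
n_runs, n = 10, len(X)
coassoc = np.zeros((n, n))
for seed in range(n_runs):                               # base clusterings
    labels = KMeans(n_clusters=3, n_init=1, random_state=seed).fit_predict(X)
    coassoc += (labels[:, None] == labels[None, :])
coassoc /= n_runs                                        # fraction of runs that agree

final = SpectralClustering(n_clusters=3, affinity="precomputed",
                           random_state=0).fit_predict(coassoc)
print(np.bincount(final))
```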

  15. Projection-based curve clustering

    International Nuclear Information System (INIS)

    Auder, Benjamin; Fischer, Aurelie

    2012-01-01

    This paper focuses on unsupervised curve classification in the context of nuclear industry. At the Commissariat a l'Energie Atomique (CEA), Cadarache (France), the thermal-hydraulic computer code CATHARE is used to study the reliability of reactor vessels. The code inputs are physical parameters and the outputs are time evolution curves of a few other physical quantities. As the CATHARE code is quite complex and CPU time-consuming, it has to be approximated by a regression model. This regression process involves a clustering step. In the present paper, the CATHARE output curves are clustered using a k-means scheme, with a projection onto a lower dimensional space. We study the properties of the empirically optimal cluster centres found by the clustering method based on projections, compared with the 'true' ones. The choice of the projection basis is discussed, and an algorithm is implemented to select the best projection basis among a library of orthonormal bases. The approach is illustrated on a simulated example and then applied to the industrial problem. (authors)
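
    A simple analogue of the approach (synthetic curves, principal components instead of the library of orthonormal bases discussed in the paper): each curve is reduced to a few projection coefficients, and k-means is run on those coefficients.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)
t = np.linspace(0.0, 1.0, 200)
curves = np.vstack([np.sin(2 * np.pi * (1 + i % 3) * t) + 0.1 * rng.normal(size=t.size)
                    for i in range(90)])               # 90 synthetic time-evolution curves
coeffs = PCA(n_components=5).fit_transform(curves)     # projection onto a low-dimensional basis
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(coeffs)
print(np.bincount(labels))
```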

  16. Spanning Tree Based Attribute Clustering

    DEFF Research Database (Denmark)

    Zeng, Yifeng; Jorge, Cordero Hernandez

    2009-01-01

    inconsistent edges from a maximum spanning tree by starting appropriate initial modes, therefore generating stable clusters. It discovers sound clusters through simple graph operations and achieves significant computational savings. We compare the Star Discovery algorithm against earlier attribute clustering...

  17. ADVANCED CLUSTER BASED IMAGE SEGMENTATION

    Directory of Open Access Journals (Sweden)

    D. Kesavaraja

    2011-11-01

    Full Text Available This paper presents efficient and portable implementations of a useful image segmentation technique that makes use of a faster variant of the conventional connected components algorithm, which we call parallel components. In the modern world, many doctors need image segmentation as a service for various purposes and expect the system to run fast and securely, yet conventional segmentation algorithms are often slow and, despite ongoing research, may not run fast enough. We therefore propose a cluster computing environment for parallel image segmentation to provide faster results. This paper describes a real-time implementation of distributed image segmentation on a cluster of nodes. We demonstrate the effectiveness and feasibility of our method on a set of medical CT scan images. Our general framework is a single address space, distributed memory programming model. We use efficient techniques for distributing and coalescing data as well as efficient combinations of task and data parallelism. The image segmentation algorithm makes use of an efficient cluster process with a novel approach to parallel merging. Our experimental results are consistent with the theoretical analysis, and the method provides faster execution times for segmentation than the conventional method. Our test data consist of different CT scan images from a medical database. More efficient implementations of image segmentation will likely result in even faster execution times.
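
    A sequential toy version of the connected-components step only (the cluster-computing parallelisation and the medical CT data are not reproduced here):

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(6)
image = ndimage.gaussian_filter(rng.random((128, 128)), sigma=3)   # stand-in grayscale image
mask = image > image.mean()                       # crude intensity threshold
labels, n_regions = ndimage.label(mask)           # connected components = segments
sizes = ndimage.sum(mask, labels, index=range(1, n_regions + 1))
print(n_regions, int(sizes.max()))                # number of segments, largest segment size
```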

  18. Cluster identification based on correlations.

    Science.gov (United States)

    Schulman, L S

    2012-04-01

    The problem addressed is the identification of cooperating agents based on correlations created as a result of the joint action of these and other agents. A systematic method for using correlations beyond second moments is developed. The technique is applied to a didactic example, the identification of alphabet letters based on correlations among the pixels used in an image of the letter. As in this example, agents can belong to more than one cluster. Moreover, the identification scheme does not require that the patterns be known ahead of time.

  19. Cluster identification based on correlations

    Science.gov (United States)

    Schulman, L. S.

    2012-04-01

    The problem addressed is the identification of cooperating agents based on correlations created as a result of the joint action of these and other agents. A systematic method for using correlations beyond second moments is developed. The technique is applied to a didactic example, the identification of alphabet letters based on correlations among the pixels used in an image of the letter. As in this example, agents can belong to more than one cluster. Moreover, the identification scheme does not require that the patterns be known ahead of time.

  20. Testing chameleon gravity with the Coma cluster

    International Nuclear Information System (INIS)

    Terukina, Ayumu; Yamamoto, Kazuhiro; Lombriser, Lucas; Bacon, David; Koyama, Kazuya; Nichol, Robert C.

    2014-01-01

    We propose a novel method to test the gravitational interactions in the outskirts of galaxy clusters. When gravity is modified, this is typically accompanied by the introduction of an additional scalar degree of freedom, which mediates an attractive fifth force. The presence of an extra gravitational coupling, however, is tightly constrained by local measurements. In chameleon modifications of gravity, local tests can be evaded by employing a screening mechanism that suppresses the fifth force in dense environments. While the chameleon field may be screened in the interior of the cluster, its outer region can still be affected by the extra force, introducing a deviation between the hydrostatic and lensing mass of the cluster. Thus, the chameleon modification can be tested by combining the gas and lensing measurements of the cluster. We demonstrate the operability of our method with the Coma cluster, for which both a lensing measurement and gas observations from the X-ray surface brightness, the X-ray temperature, and the Sunyaev-Zel'dovich effect are available. Using the joint observational data set, we perform a Markov chain Monte Carlo analysis of the parameter space describing the different profiles in both the Newtonian and chameleon scenarios. We report competitive constraints on the chameleon field amplitude and its coupling strength to matter. In the case of f(R) gravity, corresponding to a specific choice of the coupling, we find an upper bound on the background field amplitude of |f_R0| < 6 × 10^−5, which is currently the tightest constraint on cosmological scales.
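
    For reference, the hydrostatic mass entering this comparison is commonly estimated, assuming spherical symmetry and hydrostatic equilibrium of the intracluster gas, as

```latex
M_{\mathrm{HSE}}(<r) = -\frac{k_{\mathrm{B}}\,T(r)\,r}{G\,\mu m_{\mathrm{p}}}
\left(\frac{\mathrm{d}\ln n_{\mathrm{e}}}{\mathrm{d}\ln r}
    + \frac{\mathrm{d}\ln T}{\mathrm{d}\ln r}\right),
```

    where n_e and T are the electron density and temperature profiles of the gas; in the chameleon scenario an additional fifth-force term enters the equilibrium equation, which is what produces the difference between the hydrostatic and lensing masses exploited by the test.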

  1. Testing chameleon gravity with the Coma cluster

    Energy Technology Data Exchange (ETDEWEB)

    Terukina, Ayumu; Yamamoto, Kazuhiro [Department of Physical Science, Hiroshima University, Higashi-Hiroshima, Kagamiyama 1-3-1, 739-8526 (Japan); Lombriser, Lucas; Bacon, David; Koyama, Kazuya; Nichol, Robert C., E-mail: telkina@theo.phys.sci.hiroshima-u.ac.jp, E-mail: lucas.lombriser@port.ac.uk, E-mail: kazuhiro@hiroshima-u.ac.jp, E-mail: david.bacon@port.ac.uk, E-mail: kazuya.koyama@port.ac.uk, E-mail: bob.nichol@port.ac.uk [Institute of Cosmology and Gravitation, University of Portsmouth, Dennis Sciama Building, Portsmouth, PO1 3FX (United Kingdom)

    2014-04-01

    We propose a novel method to test the gravitational interactions in the outskirts of galaxy clusters. When gravity is modified, this is typically accompanied by the introduction of an additional scalar degree of freedom, which mediates an attractive fifth force. The presence of an extra gravitational coupling, however, is tightly constrained by local measurements. In chameleon modifications of gravity, local tests can be evaded by employing a screening mechanism that suppresses the fifth force in dense environments. While the chameleon field may be screened in the interior of the cluster, its outer region can still be affected by the extra force, introducing a deviation between the hydrostatic and lensing mass of the cluster. Thus, the chameleon modification can be tested by combining the gas and lensing measurements of the cluster. We demonstrate the operability of our method with the Coma cluster, for which both a lensing measurement and gas observations from the X-ray surface brightness, the X-ray temperature, and the Sunyaev-Zel'dovich effect are available. Using the joint observational data set, we perform a Markov chain Monte Carlo analysis of the parameter space describing the different profiles in both the Newtonian and chameleon scenarios. We report competitive constraints on the chameleon field amplitude and its coupling strength to matter. In the case of f(R) gravity, corresponding to a specific choice of the coupling, we find an upper bound on the background field amplitude of |f_R0| < 6 × 10^−5, which is currently the tightest constraint on cosmological scales.

  2. Effect of a congregation-based intervention on uptake of HIV testing and linkage to care in pregnant women in Nigeria (Baby Shower): a cluster randomised trial.

    Science.gov (United States)

    Ezeanolue, Echezona E; Obiefune, Michael C; Ezeanolue, Chinenye O; Ehiri, John E; Osuji, Alice; Ogidi, Amaka G; Hunt, Aaron T; Patel, Dina; Yang, Wei; Pharr, Jennifer; Ogedegbe, Gbenga

    2015-11-01

    Few effective community-based interventions exist to increase HIV testing and uptake of antiretroviral therapy (ART) in pregnant women in hard-to-reach resource-limited settings. We assessed whether delivery of an intervention through churches, the Healthy Beginning Initiative, would increase uptake of HIV testing in pregnant women compared with standard health facility referral. In this cluster randomised trial, we enrolled self-identified pregnant women aged 18 years and older who attended churches in southeast Nigeria. We randomised churches (clusters) to intervention or control groups, stratified by mean annual number of infant baptisms. The intervention comprised health education and onsite laboratory testing during baby showers in intervention group churches, whereas participants in control group churches were referred to health facilities as standard. Participants and investigators were aware of church allocation. The primary outcome was confirmed HIV testing. This trial is registered with ClinicalTrials.gov, identifier number NCT01795261. Between Jan 20, 2013, and Aug 31, 2014, we enrolled 3002 participants at 40 churches (20 per group). 1309 (79%) of 1647 women attended antenatal care in the intervention group compared with 1080 (80%) of 1355 in the control group. 1514 women (92%) in the intervention group had an HIV test compared with 740 (55%) controls (adjusted odds ratio 11·2, 95% CI 8·77-14·25; p<0·0001). Culturally adapted, community-based programmes such as the Healthy Beginning Initiative can be effective in increasing HIV screening in pregnant women in resource-limited settings. US National Institutes of Health and US President's Emergency Plan for AIDS Relief. Copyright © 2015 Ezeanolue et al. Open Access article published under the terms of CC BY-NC-ND. Published by Elsevier Ltd. All rights reserved.

  3. A Test for Cluster Bias: Detecting Violations of Measurement Invariance across Clusters in Multilevel Data

    Science.gov (United States)

    Jak, Suzanne; Oort, Frans J.; Dolan, Conor V.

    2013-01-01

    We present a test for cluster bias, which can be used to detect violations of measurement invariance across clusters in 2-level data. We show how measurement invariance assumptions across clusters imply measurement invariance across levels in a 2-level factor model. Cluster bias is investigated by testing whether the within-level factor loadings…

  4. Progressive Exponential Clustering-Based Steganography

    Directory of Open Access Journals (Sweden)

    Li Yue

    2010-01-01

    Full Text Available Cluster indexing-based steganography is an important branch of data-hiding techniques. Such schemes normally achieve good balance between high embedding capacity and low embedding distortion. However, most cluster indexing-based steganographic schemes utilise less efficient clustering algorithms for embedding data, which causes redundancy and leaves room for increasing the embedding capacity further. In this paper, a new clustering algorithm, called progressive exponential clustering (PEC, is applied to increase the embedding capacity by avoiding redundancy. Meanwhile, a cluster expansion algorithm is also developed in order to further increase the capacity without sacrificing imperceptibility.

  5. Classical Music Clustering Based on Acoustic Features

    OpenAIRE

    Wang, Xindi; Haque, Syed Arefinul

    2017-01-01

    In this paper we cluster 330 classical music pieces collected from the MusicNet database based on their musical note sequences. We use shingling and chord trajectory matrices to create a signature for each music piece and perform spectral clustering to find the clusters. At different resolutions, the output clusters distinctly indicate compositions from different classical music eras and different composing styles of the musicians.

  6. Increasing chlamydia screening tests in general practice: a modified Zelen prospective Cluster Randomised Controlled Trial evaluating a complex intervention based on the Theory of Planned Behaviour.

    Science.gov (United States)

    McNulty, Cliodna A M; Hogan, Angela H; Ricketts, Ellie J; Wallace, Louise; Oliver, Isabel; Campbell, Rona; Kalwij, Sebastian; O'Connell, Elaine; Charlett, Andre

    2014-05-01

    To determine if a structured complex intervention increases opportunistic chlamydia screening testing of patients aged 15-24 years attending English general practitioner (GP) practices. A prospective, Cluster Randomised Controlled Trial with a modified Zelen design involving 160 practices in South West England in 2010. The intervention was based on the Theory of Planned Behaviour (TPB). It comprised practice-based education with up to two additional contacts to increase the importance of screening to GP staff and their confidence to offer tests through skill development (including videos). Practical resources (targets, posters, invitation cards, computer reminders, newsletters including feedback) aimed to actively influence social cognitions of staff, increasing their testing intention. Data from 76 intervention and 81 control practices were analysed. In intervention practices, chlamydia screening test rates were 2.43/100 15-24-year-olds registered preintervention, 4.34 during intervention and 3.46 postintervention; control practices' testing rates were 2.61/100 registered patients preintervention, 3.0 during intervention and 2.82 postintervention. During the intervention period, testing in intervention practices was 1.76 times as great (CI 1.24 to 2.48) as in controls; this persisted for 9 months postintervention (1.57 times as great, CI 1.27 to 2.30). Chlamydia infections detected increased in intervention practices from 2.1/1000 registered 15-24-year-olds preintervention to 2.5 during the intervention, compared with 2.0 and 2.3/1000 in controls (estimated rate ratio intervention versus controls 1.4, CI 1.01 to 1.93). This complex intervention doubled chlamydia screening tests in fully engaged practices. The modified Zelen design gave realistic measures of practice full engagement (63%) and efficacy of this educational intervention in general practice; it should be used more often. The trial was registered on the UK Clinical Research Network Study Portfolio database

  7. BioCluster: Tool for Identification and Clustering of Enterobacteriaceae Based on Biochemical Data

    Directory of Open Access Journals (Sweden)

    Ahmed Abdullah

    2015-06-01

    Full Text Available Presumptive identification of different Enterobacteriaceae species is routinely achieved based on biochemical properties. Traditional practice includes manual comparison of each biochemical property of the unknown sample with known reference samples and inference of its identity based on the maximum similarity pattern with the known samples. This process is labor-intensive, time-consuming, error-prone, and subjective. Therefore, automation of sorting and similarity in calculation would be advantageous. Here we present a MATLAB-based graphical user interface (GUI tool named BioCluster. This tool was designed for automated clustering and identification of Enterobacteriaceae based on biochemical test results. In this tool, we used two types of algorithms, i.e., traditional hierarchical clustering (HC and the Improved Hierarchical Clustering (IHC, a modified algorithm that was developed specifically for the clustering and identification of Enterobacteriaceae species. IHC takes into account the variability in result of 1–47 biochemical tests within this Enterobacteriaceae family. This tool also provides different options to optimize the clustering in a user-friendly way. Using computer-generated synthetic data and some real data, we have demonstrated that BioCluster has high accuracy in clustering and identifying enterobacterial species based on biochemical test data. This tool can be freely downloaded at http://microbialgen.du.ac.bd/biocluster/.
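
    A rough illustration of the plain HC step on binary biochemical profiles (synthetic data; the IHC modification and BioCluster's 1-47 test panel are not implemented):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(7)
profiles = rng.integers(0, 2, size=(30, 15))      # 30 isolates x 15 hypothetical +/- test results
D = pdist(profiles, metric="hamming")             # fraction of discordant test results
Z = linkage(D, method="average")
print(fcluster(Z, t=0.4, criterion="distance"))   # cluster label per isolate
```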

  8. BioCluster: tool for identification and clustering of Enterobacteriaceae based on biochemical data.

    Science.gov (United States)

    Abdullah, Ahmed; Sabbir Alam, S M; Sultana, Munawar; Hossain, M Anwar

    2015-06-01

    Presumptive identification of different Enterobacteriaceae species is routinely achieved based on biochemical properties. Traditional practice includes manual comparison of each biochemical property of the unknown sample with known reference samples and inference of its identity based on the maximum similarity pattern with the known samples. This process is labor-intensive, time-consuming, error-prone, and subjective. Therefore, automation of sorting and similarity in calculation would be advantageous. Here we present a MATLAB-based graphical user interface (GUI) tool named BioCluster. This tool was designed for automated clustering and identification of Enterobacteriaceae based on biochemical test results. In this tool, we used two types of algorithms, i.e., traditional hierarchical clustering (HC) and the Improved Hierarchical Clustering (IHC), a modified algorithm that was developed specifically for the clustering and identification of Enterobacteriaceae species. IHC takes into account the variability in result of 1-47 biochemical tests within this Enterobacteriaceae family. This tool also provides different options to optimize the clustering in a user-friendly way. Using computer-generated synthetic data and some real data, we have demonstrated that BioCluster has high accuracy in clustering and identifying enterobacterial species based on biochemical test data. This tool can be freely downloaded at http://microbialgen.du.ac.bd/biocluster/. Copyright © 2015 The Authors. Production and hosting by Elsevier Ltd.. All rights reserved.

  9. Impact of a congregation-based intervention on uptake of HIV testing and linkage to care among pregnant women in Nigeria: The Baby Shower cluster randomized trial

    Science.gov (United States)

    Ezeanolue, Echezona E.; Obiefune, Michael C; Ezeanolue, Chinenye O; Ehiri, John E.; Osuji, Alice; Ogidi, Amaka G.; Hunt, Aaron T.; Patel, Dina; Yang, Wei; Pharr, Jennifer; Ogedegbe, Gbenga

    2016-01-01

    Summary Background There is a dearth of effective community-based interventions to increase HIV testing and uptake of antiretroviral therapy (ART) among pregnant women in hard-to-reach resource-limited settings. We assessed whether a faith-based intervention, the Healthy Beginning Initiative (HBI), would increase uptake of HIV testing and ART among pregnant women as compared to health facility referral. Methods This trial was conducted in southeast Nigeria, between January 20, 2013, and August 31, 2014. Eligible churches had at least 20 annual infant baptisms. Forty churches (clusters), stratified by number of infant baptisms (80), were randomized 1:1 to intervention (IG) or control (CG). Three thousand and two (3002) self-identified pregnant women aged 18 and older participated. The intervention included health education and onsite laboratory testing implemented during baby showers in IG churches, while participants in CG churches were referred to health facilities. The primary outcome (confirmed HIV testing) and secondary outcome (receipt of ART during pregnancy) were assessed at the individual level. Findings Antenatal care attendance was similar in both groups (IG=79.4% [1309/1647] vs. CG=79.7% [1080/1355], P=0.8). The intervention was associated with higher HIV testing (CG=54.6% [740/1355] vs. IG=91.9% [1514/1647]; AOR=11.2; 95% CI: 8.77-14.25; P<0.001). Women in the IG were significantly more likely to be linked to care prior to delivery (P<0.01) and more likely to have received ART during pregnancy (P=0.042) compared to those in the CG. Interpretation Culturally-adapted, community-based programs such as HBI can be effective in increasing HIV screening and ART among pregnant women in resource-limited settings. Funding National Institute of Health and President's Emergency Plan for AIDS Relief PMID:26475016

  10. Scalable Density-Based Subspace Clustering

    DEFF Research Database (Denmark)

    Müller, Emmanuel; Assent, Ira; Günnemann, Stephan

    2011-01-01

    method that steers mining to few selected subspace clusters. Our novel steering technique reduces subspace processing by identifying and clustering promising subspaces and their combinations directly. Thereby, it narrows down the search space while maintaining accuracy. Thorough experiments on real...... and synthetic databases show that steering is efficient and scalable, with high quality results. For future work, our steering paradigm for density-based subspace clustering opens research potential for speeding up other subspace clustering approaches as well....

  11. Cluster-based tangible programming

    CSIR Research Space (South Africa)

    Smith, Andrew C

    2014-05-01

    Full Text Available Clustering is the act of grouping items that belong together. In this paper we explore clustering as a means to construct tangible program logic, and specifically as a means to use multiple tangible objects collectively as a single tangible program...

  12. A mixed methods protocol for developing and testing implementation strategies for evidence-based obesity prevention in childcare: a cluster randomized hybrid type III trial.

    Science.gov (United States)

    Swindle, Taren; Johnson, Susan L; Whiteside-Mansell, Leanne; Curran, Geoffrey M

    2017-07-18

    Despite the potential to reach at-risk children in childcare, there is a significant gap between current practices and evidence-based obesity prevention in this setting. There are few investigations of the impact of implementation strategies on the uptake of evidence-based practices (EBPs) for obesity prevention and nutrition promotion. This study protocol describes a three-phase approach to developing and testing implementation strategies to support uptake of EBPs for obesity prevention practices in childcare (i.e., key components of the WISE intervention). Informed by the i-PARIHS framework, we will use a stakeholder-driven evidence-based quality improvement (EBQI) process to apply information gathered in qualitative interviews on barriers and facilitators to practice to inform the design of implementation strategies. Then, a Hybrid Type III cluster randomized trial will compare a basic implementation strategy (i.e., intervention as usual) with an enhanced implementation strategy informed by stakeholders. All Head Start centers (N = 12) within one agency in an urban area in a southern state in the USA will be randomized to receive the basic or enhanced implementation with approximately 20 classrooms per group (40 educators, 400 children per group). The educators involved in the study, the data collectors, and the biostatistician will be blinded to the study condition. The basic and enhanced implementation strategies will be compared on outcomes specified by the RE-AIM model (e.g., Reach to families, Effectiveness of impact on child diet and health indicators, Adoption commitment of agency, Implementation fidelity and acceptability, and Maintenance after 6 months). Principles of formative evaluation will be used throughout the hybrid trial. This study will test a stakeholder-driven approach to improve implementation, fidelity, and maintenance of EBPs for obesity prevention in childcare. Further, this study provides an example of a systematic process to develop

  13. Effect of Test-Based versus Presumptive Treatment of Malaria in Under-Five Children in Rural Ghana--A Cluster-Randomised Trial.

    Science.gov (United States)

    Baiden, Frank; Bruce, Jane; Webster, Jayne; Tivura, Mathilda; Delmini, Rupert; Amengo-Etego, Seeba; Owusu-Agyei, Seth; Chandramohan, Daniel

    2016-01-01

    Malaria-endemic countries in sub-Saharan Africa are shifting from the presumptive approach, which is based on clinical judgement (CJ), to the test-based approach, which is based on confirmation with rapid diagnostic tests (RDT). It has been suggested that the loss of the prophylactic effect of presumptively administered ACT in children who do not have malaria will result in an increase in their risk of malaria and anaemia. We undertook a cluster-randomized controlled trial to compare the effects of the presumptive approach using clinical judgment (CJ arm) and the test-based approach using RDTs (RDT arm) in a high-transmission setting in Ghana. A total of 3046 eligible children (1527 in the RDT arm and 1519 in the CJ arm) living around 32 health centres were enrolled. Nearly half were female (48.7%) and 47.8% were below the age of 12 months at enrolment. Over 24 months, the incidence of all episodes of malaria following the first febrile illness was 0.64 (95% CI 0.49-0.82) and 0.76 (0.63-0.93) per child per year in the RDT and CJ arms respectively (adjusted rate ratio 1.13, 0.82-1.55). After the first episode of febrile illness, the incidence of severe anaemia was the same in both arms (0.11 per child per year) and that of moderate anaemia was 0.16 (0.13-0.21) vs. 0.17 (0.14-0.21) per child per year respectively. The incidence of severe febrile illness was 0.15 (0.09, 0.24) per child per year in the RDT arm compared to 0.17 (0.11, 0.28) in the CJ arm. The proportion of fever cases receiving ACT was lower in the RDT arm (72% vs 81%; p = 0.02). The test-based approach to the management of malaria did not increase the incidence of malaria or anaemia among under-five children in this setting. ClinicalTrials.gov NCT00832754.

  14. Effects of home-based voluntary counselling and testing on HIV-related stigma: findings from a cluster-randomized trial in Zambia.

    Science.gov (United States)

    Jürgensen, Marte; Sandøy, Ingvild Fossgard; Michelo, Charles; Fylkesnes, Knut

    2013-03-01

    HIV-related stigma continues to be a prominent barrier to testing, treatment and care. However, few studies have investigated changes in stigma over time and the factors contributing to these changes, and there is no evidence of the impact of HIV testing and counselling on stigma. This study was nested within a pair-matched cluster-randomized trial on the acceptance of home-based voluntary HIV counselling and testing conducted in a rural district in Zambia between 2009 and 2011, and investigated changes in stigma over time and the impact of HIV testing and counselling on stigma. Data from a baseline survey (n = 1500) and a follow-up survey (n = 1107) were used to evaluate changes in stigma. There was an overall reduction of seven per cent in stigma from baseline to follow-up. This was mainly due to a reduction in individual stigmatizing attitudes but not in perceived stigma. The reduction did not differ between the trial arms (β = -0.22, p = 0.423). Being tested for HIV was associated with a reduction in stigma (β = -0.57, p = 0.030), and there was a trend towards home-based Voluntary Counselling and Testing having a larger impact on stigma than other testing approaches (β = -0.78, p = 0.080 vs. β = -0.37, p = 0.551), possibly explained by a strong focus on counselling and the safe environment of the home. The reduction observed in both arms may give reason to be optimistic as it may have consequences for disclosure, treatment access and adherence. Yet, the change in stigma may have been affected by social desirability bias, as extensive community mobilization was carried out in both arms. The study underscores the challenges in measuring and monitoring HIV-related stigma. Adjustment for social desirability bias and inclusion of qualitative methods are recommended for further studies on the impact of HIV testing on stigma. Copyright © 2013 Elsevier Ltd. All rights reserved.

  15. Increasing chlamydia screening tests in general practice: a modified Zelen prospective Cluster Randomised Controlled Trial evaluating a complex intervention based on the Theory of Planned Behaviour

    Science.gov (United States)

    McNulty, Cliodna A M; Hogan, Angela H; Ricketts, Ellie J; Wallace, Louise; Oliver, Isabel; Campbell, Rona; Kalwij, Sebastian; O'Connell, Elaine; Charlett, Andre

    2014-01-01

    Objective To determine if a structured complex intervention increases opportunistic chlamydia screening testing of patients aged 15–24 years attending English general practitioner (GP) practices. Methods A prospective, Cluster Randomised Controlled Trial with a modified Zelen design involving 160 practices in South West England in 2010. The intervention was based on the Theory of Planned Behaviour (TPB). It comprised practice-based education with up to two additional contacts to increase the importance of screening to GP staff and their confidence to offer tests through skill development (including videos). Practical resources (targets, posters, invitation cards, computer reminders, newsletters including feedback) aimed to actively influence social cognitions of staff, increasing their testing intention. Results Data from 76 intervention and 81 control practices were analysed. In intervention practices, chlamydia screening test rates were 2.43/100 15–24-year-olds registered preintervention, 4.34 during intervention and 3.46 postintervention; control practices' testing rates were 2.61/100 registered patients preintervention, 3.0 during intervention and 2.82 postintervention. During the intervention period, testing in intervention practices was 1.76 times as great (CI 1.24 to 2.48) as in controls; this persisted for 9 months postintervention (1.57 times as great, CI 1.27 to 2.30). Chlamydia infections detected increased in intervention practices from 2.1/1000 registered 15–24-year-olds preintervention to 2.5 during the intervention, compared with 2.0 and 2.3/1000 in controls (estimated rate ratio intervention versus controls 1.4, CI 1.01 to 1.93). Conclusions This complex intervention doubled chlamydia screening tests in fully engaged practices. The modified Zelen design gave realistic measures of practice full engagement (63%) and efficacy of this educational intervention in general practice; it should be used more often. Trial registration The trial was

  16. Experimental Tests of the Algebraic Cluster Model

    Science.gov (United States)

    Gai, Moshe

    2018-02-01

    The Algebraic Cluster Model (ACM) of Bijker and Iachello, proposed in 2000, has recently been applied to 12C and 16O with much success. We review the current status in 12C, with the outstanding observation of the ground-state rotational band composed of the spin-parity states 0+, 2+, 3-, 4± and 5-. The observation of the 4± parity doublet is characteristic of a (tri-atomic) molecular configuration in which the three alpha particles are arranged in an equilateral triangular configuration of a symmetric spinning top. We discuss future measurements with electron scattering, 12C(e,e'), to test the predicted B(Eλ) values of the ACM.

  17. Clustering-based classification of road traffic accidents using hierarchical clustering and artificial neural networks.

    Science.gov (United States)

    Taamneh, Madhar; Taamneh, Salah; Alkheder, Sharaf

    2017-09-01

    Artificial neural networks (ANNs) have been widely used in predicting the severity of road traffic crashes. All available information about previously occurred accidents is typically used for building a single prediction model (i.e., classifier). Too little attention has been paid to the differences between these accidents, leading, in most cases, to less accurate predictors. Hierarchical clustering is a well-known clustering method that seeks to group data by creating a hierarchy of clusters. Using hierarchical clustering and ANNs, a clustering-based classification approach for predicting the injury severity of road traffic accidents was proposed. About 6000 road accidents that occurred over a six-year period from 2008 to 2013 in Abu Dhabi were used throughout this study. In order to reduce the amount of variation in the data, hierarchical clustering was applied to the data set to organize it into six different forms, each with a different number of clusters (from 1 to 6). Two ANN models were subsequently built for each cluster of accidents in each generated form. The first model was built and validated using all accidents (training set), whereas only 66% of the accidents were used to build the second model, and the remaining 34% were used to test it (percentage split). Finally, the weighted average accuracy was computed for each type of model in each form of the data. The results show that when testing the models using the training set, clustering prior to classification achieves 11%-16% higher accuracy than classification without clustering, while with the percentage split it achieves 2%-5% higher accuracy. The results also suggest that partitioning the accidents into six clusters achieves the best accuracy if both types of models are taken into account.
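
    A schematic version of the cluster-then-classify workflow (synthetic features stand in for the Abu Dhabi crash records, and the 34% hold-out mirrors the percentage split described above):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.cluster import AgglomerativeClustering
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=10, n_informative=6, random_state=0)
clusters = AgglomerativeClustering(n_clusters=3).fit_predict(X)   # hierarchical grouping

accuracies = []
for c in range(3):
    Xc, yc = X[clusters == c], y[clusters == c]
    Xtr, Xte, ytr, yte = train_test_split(Xc, yc, test_size=0.34, random_state=0)
    ann = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0).fit(Xtr, ytr)
    accuracies.append(ann.score(Xte, yte))        # one ANN per cluster
print(np.round(accuracies, 3))
```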

  18. A scheduling algorithm based on Clara clustering

    Science.gov (United States)

    Kuang, Ling; Zhang, Lichen

    2017-08-01

    Task scheduling is a key issue in cloud computing. A new queuing task scheduling algorithm for cloud computing, based on Clara clustering and SJF, is proposed; Clara clustering is introduced to address the load imbalance of the SJF algorithm. The Clara method clusters tasks based on task execution time and waiting time, and then divides the tasks into three groups according to the reference points obtained by the clustering. An execution quota is assigned to each group in proportion to its share of the total number of tasks. Each queue then performs task scheduling based on these quotas and SJF. The simulation results show that the algorithm achieves good load balancing and system performance.
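
    A hedged sketch of the control flow only: tasks are grouped by (execution time, waiting time), each group receives a quota proportional to its size, and jobs inside each group are ordered shortest-first. K-means stands in for CLARA here, so this does not reproduce the proposed algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(8)
exec_time = rng.exponential(5.0, size=60)         # hypothetical task execution times
wait_time = rng.exponential(2.0, size=60)         # hypothetical task waiting times
tasks = np.column_stack([exec_time, wait_time])

groups = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(tasks)
total_slots = 30
for g in range(3):
    idx = np.flatnonzero(groups == g)
    quota = round(total_slots * len(idx) / len(tasks))      # proportional execution quota
    order = idx[np.argsort(exec_time[idx])][:quota]         # shortest-job-first within the group
    print(f"group {g}: quota={quota}, scheduled jobs={order.tolist()[:5]}")
```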

  19. Testing dark energy and dark matter cosmological models with clusters of galaxies

    Energy Technology Data Exchange (ETDEWEB)

    Boehringer, Hans [Max-Planck-Institut fuer Extraterrestrische Physik, Garching (Germany)

    2008-07-01

    Galaxy clusters are, as the largest building blocks of our Universe, ideal probes to study large-scale structure and to test cosmological models. The principal approach and the status of this research are reviewed. Clusters lend themselves to tests in several ways: the cluster mass function, the spatial clustering, the evolution of both functions with redshift, and the internal composition can all be used to constrain cosmological parameters. X-ray observations are currently the best means of obtaining the relevant data on the galaxy cluster population. We illustrate all the above-mentioned methods in particular with our ROSAT-based cluster surveys. The mass calibration of clusters is an important issue that is currently being addressed with XMM-Newton and Chandra studies. Based on current experience, we provide an outlook for future research, especially with eROSITA.

  20. Brief counselling after home-based HIV counselling and testing strongly increases linkage to care: a cluster-randomized trial in Uganda.

    Science.gov (United States)

    Ruzagira, Eugene; Grosskurth, Heiner; Kamali, Anatoli; Baisley, Kathy

    2017-10-01

    The aim of this study was to determine whether counselling provided subsequent to HIV testing and referral for care increases linkage to care among HIV-positive persons identified through home-based HIV counselling and testing (HBHCT) in Masaka, Uganda. The study was an open-label cluster-randomized trial. Twenty-eight rural communities were randomly allocated (1:1) to intervention (HBHCT, referral and counselling at one and two months) or control (HBHCT and referral only). HIV-positive care-naïve adults (≥18 years) were enrolled. To conceal participants' HIV status, one HIV-negative person was recruited for every three HIV-positive participants. Primary outcomes were linkage to care (clinic-verified registration for care) status at six months, and time to linkage. Primary analyses were intention-to-treat using random effects logistic regression or Cox regression with shared frailty, as appropriate. Three hundred and two (intervention, n = 149; control, n = 153) HIV-positive participants were enrolled. Except for travel time to the nearest HIV clinic, baseline participant characteristics were generally balanced between trial arms. Retention was similar across trial arms (92% overall). One hundred and twenty-seven (42.1%) participants linked to care: 76 (51.0%) in the intervention arm versus 51 (33.3%) in the control arm [odds ratio = 2.18, 95% confidence interval (CI) = 1.26-3.78; p = 0.008)]. There was evidence of interaction between trial arm and follow-up time (p = 0.009). The probability of linkage to care did not differ between arms in the first two months of follow-up, but was subsequently higher in the intervention arm versus the control arm [hazard ratio = 4.87, 95% CI = 1.79-13.27, p = 0.002]. Counselling substantially increases linkage to care among HIV-positive adults identified through HBHCT and may enhance efforts to increase antiretroviral therapy coverage in sub-Saharan Africa. © 2017 The Authors. Journal of the International AIDS

  1. GA-Based Membrane Evolutionary Algorithm for Ensemble Clustering

    OpenAIRE

    Wang, Yanhua; Liu, Xiyu; Xiang, Laisheng

    2017-01-01

    Ensemble clustering can improve the generalization ability of a single clustering algorithm and generate a more robust clustering result by integrating multiple base clusterings, so it becomes the focus of current clustering research. Ensemble clustering aims at finding a consensus partition which agrees as much as possible with base clusterings. Genetic algorithm is a highly parallel, stochastic, and adaptive search algorithm developed from the natural selection and evolutionary mechanism of...

  2. Community-based intermittent mass testing and treatment for malaria in an area of high transmission intensity, western Kenya: study design and methodology for a cluster randomized controlled trial.

    Science.gov (United States)

    Samuels, Aaron M; Awino, Nobert; Odongo, Wycliffe; Abong'o, Benard; Gimnig, John; Otieno, Kephas; Shi, Ya Ping; Were, Vincent; Allen, Denise Roth; Were, Florence; Sang, Tony; Obor, David; Williamson, John; Hamel, Mary J; Patrick Kachur, S; Slutsker, Laurence; Lindblade, Kim A; Kariuki, Simon; Desai, Meghna

    2017-06-07

    Most human Plasmodium infections in western Kenya are asymptomatic and are believed to contribute importantly to malaria transmission. Elimination of asymptomatic infections requires active treatment approaches, such as mass testing and treatment (MTaT) or mass drug administration (MDA), as infected persons do not seek care for their infection. Evaluations of community-based approaches that are designed to reduce malaria transmission require careful attention to study design to ensure that important effects can be measured accurately. This manuscript describes the study design and methodology of a cluster-randomized controlled trial to evaluate a MTaT approach for malaria transmission reduction in an area of high malaria transmission. Ten health facilities in western Kenya were purposively selected for inclusion. The communities within 3 km of each health facility were divided into three clusters of approximately equal population size. Two clusters around each health facility were randomly assigned to the control arm, and one to the intervention arm. Three times per year for 2 years, after the long and short rains, and again before the long rains, teams of community health volunteers visited every household within the intervention arm, tested all consenting individuals with malaria rapid diagnostic tests, and treated all positive individuals with an effective anti-malarial. The effect of mass testing and treatment on malaria transmission was measured through population-based longitudinal cohorts, outpatient visits for clinical malaria, periodic population-based cross-sectional surveys, and entomological indices.

  3. The Reliability of Inverse Scree Tests for Cluster Analysis.

    Science.gov (United States)

    Lathrop, Richard G.; Williams, Janice E.

    1987-01-01

    A Monte Carlo study, involving 6,000 "computer subjects" and three raters, explored the reliability of the inverse scree test for cluster analysis. Results indicate that the inverse scree may be a useful and reliable cluster analytic technique for determining the number of true groups. (TJH)

  4. Simulation-based marginal likelihood for cluster strong lensing cosmology

    Science.gov (United States)

    Killedar, M.; Borgani, S.; Fabjan, D.; Dolag, K.; Granato, G.; Meneghetti, M.; Planelles, S.; Ragone-Figueroa, C.

    2018-01-01

    Comparisons between observed and predicted strong lensing properties of galaxy clusters have been routinely used to claim either tension or consistency with Λ cold dark matter cosmology. However, standard approaches to such cosmological tests are unable to quantify the preference for one cosmology over another. We advocate approximating the relevant Bayes factor using a marginal likelihood that is based on the following summary statistic: the posterior probability distribution function for the parameters of the scaling relation between Einstein radii and cluster mass, α and β. We demonstrate, for the first time, a method of estimating the marginal likelihood using the X-ray selected z > 0.5 Massive Cluster Survey clusters as a case in point and employing both N-body and hydrodynamic simulations of clusters. We investigate the uncertainty in this estimate and consequential ability to compare competing cosmologies, which arises from incomplete descriptions of baryonic processes, discrepancies in cluster selection criteria, redshift distribution and dynamical state. The relation between triaxial cluster masses at various overdensities provides a promising alternative to the strong lensing test.

  5. Random Walk Quantum Clustering Algorithm Based on Space

    Science.gov (United States)

    Xiao, Shufen; Dong, Yumin; Ma, Hongyang

    2018-01-01

    In the random quantum walk, which is a quantum simulation of the classical walk, data points interact when selecting the appropriate walk strategy by taking advantage of quantum-entanglement features; thus, the results obtained when the quantum walk is used are different from those when the classical walk is adopted. A new quantum walk clustering algorithm based on space is proposed by applying the quantum walk to clustering analysis. In this algorithm, data points are viewed as walking participants, and similar data points are clustered using the walk function in the pay-off matrix according to a certain rule. The walk process is simplified by implementing a space-combining rule. The proposed algorithm is validated by a simulation test and is shown to be superior to existing clustering algorithms, namely, Kmeans, PCA + Kmeans, and LDA-Km. The effects of some of the parameters in the proposed algorithm on its performance are also analyzed and discussed. Specific suggestions are provided.

  6. Old star clusters: Bench tests of low mass stellar models

    Directory of Open Access Journals (Sweden)

    Salaris M.

    2013-03-01

    Full Text Available Old star clusters in the Milky Way and external galaxies have been (and still are) traditionally used to constrain the age of the universe and the timescales of galaxy formation. A parallel avenue of old star cluster research considers these objects as bench tests of low-mass stellar models. This short review will highlight some recent tests of stellar evolution models that make use of photometric and spectroscopic observations of resolved old star clusters. In some cases these tests have pointed to additional physical processes, efficient in low-mass stars, that are not routinely included in model computations. Moreover, recent results from the Kepler mission about the old open cluster NGC 6791 are adding new tight constraints to the models.

  7. Semantic based cluster content discovery in description first clustering algorithm

    International Nuclear Information System (INIS)

    Khan, M.W.; Asif, H.M.S.

    2017-01-01

    In the field of data analytics, grouping of similar documents in textual data is a serious problem. A lot of work has been done in this field and many algorithms have been proposed. One category of algorithms first groups the documents on the basis of similarity and then assigns meaningful labels to those groups. A description-first clustering algorithm belongs to the category in which the meaningful description is deduced first and then the relevant documents are assigned to that description. LINGO (Label Induction Grouping Algorithm) is a description-first clustering algorithm used for the automatic grouping of documents obtained from search results. It uses LSI (Latent Semantic Indexing), an IR (Information Retrieval) technique, for the induction of meaningful cluster labels, and VSM (Vector Space Model) for cluster content discovery. In this paper, we present LINGO using LSI during both the cluster label induction and the cluster content discovery phases. Finally, we compare the results obtained from the algorithm when it uses VSM and when it uses latent semantic analysis during the cluster content discovery phase. (author)
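    To make the LSI step more concrete, here is a brief sketch of label induction on a handful of snippets: TF-IDF plus truncated SVD stands in for the latent semantic indexing used by LINGO-style description-first clustering, and the snippet list and the way candidate label terms are scored are simplifications, not the algorithm's exact procedure.

```python
# Sketch: TF-IDF + truncated SVD (LSI) for inducing candidate cluster labels.
# The snippets and the scoring of label terms are illustrative only.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

snippets = [
    "apache spark cluster computing engine",
    "spark streaming real time processing",
    "galaxy cluster mass function survey",
    "x-ray survey of galaxy clusters",
]
vec = TfidfVectorizer()
X = vec.fit_transform(snippets)                 # document-term matrix
svd = TruncatedSVD(n_components=2, random_state=0)
svd.fit(X)                                      # latent "concepts" of the result set

# Terms loading most strongly on each latent concept serve as label candidates.
terms = np.array(vec.get_feature_names_out())
for k, comp in enumerate(svd.components_):
    top = terms[np.argsort(comp)[::-1][:3]]
    print(f"concept {k}: candidate label terms -> {list(top)}")
```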

  8. Semantic Based Cluster Content Discovery in Description First Clustering Algorithm

    Directory of Open Access Journals (Sweden)

    MUHAMMAD WASEEM KHAN

    2017-01-01

    Full Text Available In the field of data analytics, grouping of similar documents in textual data is a serious problem. A lot of work has been done in this field and many algorithms have been proposed. One category of algorithms first groups the documents on the basis of similarity and then assigns meaningful labels to those groups. A description-first clustering algorithm belongs to the category in which the meaningful description is deduced first and then the relevant documents are assigned to that description. LINGO (Label Induction Grouping Algorithm) is a description-first clustering algorithm used for the automatic grouping of documents obtained from search results. It uses LSI (Latent Semantic Indexing), an IR (Information Retrieval) technique, for the induction of meaningful cluster labels, and VSM (Vector Space Model) for cluster content discovery. In this paper, we present LINGO using LSI during both the cluster label induction and the cluster content discovery phases. Finally, we compare the results obtained from the algorithm when it uses VSM and when it uses latent semantic analysis during the cluster content discovery phase.

  9. Automated clustering-based workload characterization

    Science.gov (United States)

    Pentakalos, Odysseas I.; Menasce, Daniel A.; Yesha, Yelena

    1996-01-01

    The demands placed on the mass storage systems at various federal agencies and national laboratories are continuously increasing in intensity. This forces system managers to constantly monitor the system, evaluate the demand placed on it, and tune it appropriately using either heuristics based on experience or analytic models. Performance models require an accurate workload characterization. This can be a laborious and time-consuming process. It became evident from our experience that a tool is necessary to automate the workload characterization process. This paper presents the design and discusses the implementation of a tool for workload characterization of mass storage systems. The main features of the tool discussed here are: (1) automatic support for peak-period determination: histograms of system activity are generated and presented to the user for peak-period determination; (2) automatic clustering analysis: the data collected from the mass storage system logs are clustered using clustering algorithms and tightness measures to limit the number of generated clusters; (3) reporting of varied file statistics: the tool computes several statistics on file sizes, such as average, standard deviation, minimum, maximum and frequency, as well as average transfer time, all given on a per-cluster basis; (4) portability: the tool can easily be used to characterize the workload in mass storage systems of different vendors, since the user only needs to specify, through a simple log description language, how a specific log should be interpreted. The rest of this paper is organized as follows. Section two presents basic concepts in workload characterization as they apply to mass storage systems. Section three describes clustering algorithms and tightness measures. The following section presents the architecture of the tool. Section five presents some results of workload characterization using the tool. Finally, section six presents some concluding remarks.

  10. Clustering by Partitioning around Medoids using Distance-Based ...

    African Journals Online (AJOL)

    OLUWASOGO

    object belonging to exactly one cluster. Two common heuristics used to determine cluster membership in partitioning algorithms are a centroid-based technique –. K-means, where each cluster centre is represented by the mean value of the objects in the cluster, and a representative object-based technique – K-Medoids, ...
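    The record above contrasts centroid-based K-means with representative-object-based K-Medoids. As a minimal, self-contained illustration of the latter (not the authors' implementation), the following PAM-style sketch alternates assignment and medoid update on toy 2-D data.

```python
# Minimal k-medoids (PAM-style alternation) sketch on toy 2-D data.
import numpy as np

def k_medoids(X, k=2, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)        # assign each point to its nearest medoid
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if len(members):
                # the medoid minimises total distance to its cluster members
                new_medoids[j] = members[np.argmin(D[np.ix_(members, members)].sum(axis=1))]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids, labels

X = np.vstack([np.random.randn(30, 2), np.random.randn(30, 2) + 5])
medoids, labels = k_medoids(X)
print("medoid indices:", medoids)
```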

  11. Information Clustering Based on Fuzzy Multisets.

    Science.gov (United States)

    Miyamoto, Sadaaki

    2003-01-01

    Proposes a fuzzy multiset model for information clustering with application to information retrieval on the World Wide Web. Highlights include search engines; term clustering; document clustering; algorithms for calculating cluster centers; theoretical properties concerning clustering algorithms; and examples to show how the algorithms work.…

  12. Core Business Selection Based on Ant Colony Clustering Algorithm

    Directory of Open Access Journals (Sweden)

    Yu Lan

    2014-01-01

    Full Text Available Core business is the most important business to an enterprise with diversified operations. In this paper, we first introduce the definition and characteristics of the core business and then describe the ant colony clustering algorithm. In order to test the effectiveness of the proposed method, Tianjin Port Logistics Development Co., Ltd. is selected as the research object. Based on the current situation of the development of the company, the core business of the company can be acquired by the ant colony clustering algorithm. The results indicate that the proposed method is an effective way to determine the core business of a company.

  13. Testing the accuracy of clustering redshifts with simulations

    Science.gov (United States)

    Scottez, V.; Benoit-Lévy, A.; Coupon, J.; Ilbert, O.; Mellier, Y.

    2018-03-01

    We explore the accuracy of clustering-based redshift inference within the MICE2 simulation. This method uses the spatial clustering of galaxies between a spectroscopic reference sample and an unknown sample. This study gives an estimate of the reachable accuracy of this method. First, we discuss the requirements for the number of objects in the two samples, confirming that this method does not require a representative spectroscopic sample for calibration. In the context of the next generation of cosmological surveys, we estimate that the density of the Quasi Stellar Objects in BOSS allows us to reach 0.2 per cent accuracy in the mean redshift. Secondly, we estimate individual redshifts for galaxies in the densest regions of colour space (˜30 per cent of the galaxies) without using the photometric redshift procedure. The advantage of this procedure is threefold. It allows: (i) the use of cluster-zs for any field in astronomy, (ii) the possibility to combine photo-zs and cluster-zs to get an improved redshift estimation, and (iii) the use of cluster-zs to define tomographic bins for weak lensing. Finally, we explore this last option and build five cluster-z selected tomographic bins from redshift 0.2 to 1. We find a bias on the mean redshift estimate of 0.002 per bin. We conclude that cluster-zs could be used as a primary redshift estimator by the next generation of cosmological surveys.

  14. Fuzzy Rules for Ant Based Clustering Algorithm

    Directory of Open Access Journals (Sweden)

    Amira Hamdi

    2016-01-01

    Full Text Available This paper provides a new intelligent technique for the semisupervised data clustering problem that combines the Ant System (AS) algorithm with the fuzzy c-means (FCM) clustering algorithm. Our proposed approach, called the F-ASClass algorithm, is a distributed algorithm inspired by the foraging behavior observed in ant colonies. The ability of ants to find the shortest path forms the basis of our proposed approach. In the first step, several colonies of cooperating entities, called artificial ants, are used to find shortest paths in a complete graph that we call the graph-data. The number of colonies used in F-ASClass is equal to the number of clusters in the dataset. Hence, the partition matrix of the dataset found by the artificial ants is given, in the second step, to the fuzzy c-means technique in order to assign the unclassified objects generated in the first step. The proposed approach is tested on artificial and real datasets, and its performance is compared with those of the K-means, K-medoid, and FCM algorithms. The experimental section shows that F-ASClass performs better according to the classification error rate, accuracy, and separation index.
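    Since F-ASClass hands the ant-derived partition to fuzzy c-means in its second step, a compact sketch of the standard FCM centre and membership updates is given below. It is generic FCM on hypothetical data, not the combined F-ASClass algorithm itself.

```python
# Generic fuzzy c-means updates (the second-stage technique used above).
import numpy as np

def fcm(X, c=2, m=2.0, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)            # memberships of each point sum to 1
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1) + 1e-12
        # u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))
        U = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1))).sum(axis=2)
    return centers, U

X = np.vstack([np.random.randn(40, 2), np.random.randn(40, 2) + 4])
centers, U = fcm(X)
print("cluster centres:\n", centers)
```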

  15. A Novel Cluster Head Selection Algorithm Based on Fuzzy Clustering and Particle Swarm Optimization.

    Science.gov (United States)

    Ni, Qingjian; Pan, Qianqian; Du, Huimin; Cao, Cen; Zhai, Yuqing

    2017-01-01

    An important objective of a wireless sensor network is to prolong the network life cycle, and topology control is of great significance for extending it. Based on previous work, for cluster head selection in hierarchical topology control, we propose a solution based on fuzzy clustering preprocessing and particle swarm optimization. More specifically, first, a fuzzy clustering algorithm is used for the initial clustering of sensor nodes according to their geographical locations, where a sensor node belongs to a cluster with a determined probability, and the number of initial clusters is analyzed and discussed. Furthermore, the fitness function is designed considering both the energy consumption and distance factors of the wireless sensor network. Finally, the cluster head nodes in the hierarchical topology are determined based on the improved particle swarm optimization. Experimental results show that, compared with traditional methods, the proposed method reduces the mortality rate of nodes and extends the network life cycle.

  16. Microarray gene cluster identification and annotation through cluster ensemble and EM-based informative textual summarization.

    Science.gov (United States)

    Hu, Xiaohua; Park, E K; Zhang, Xiaodan

    2009-09-01

    Generating high-quality gene clusters and identifying the underlying biological mechanism of the gene clusters are the important goals of clustering gene expression analysis. To get high-quality cluster results, most of the current approaches rely on choosing the best cluster algorithm, in which the design biases and assumptions meet the underlying distribution of the dataset. There are two issues with this approach: 1) usually, the underlying data distribution of the gene expression datasets is unknown and 2) there are so many clustering algorithms available that it is very challenging to choose the proper one. To provide a textual summary of the gene clusters, the most explored approach is the extractive approach that essentially builds upon techniques borrowed from information retrieval, in which the objective is to provide terms to be used for query expansion, and not to act as a stand-alone summary for the entire document set. Another drawback is that the clustering quality and cluster interpretation are treated as two isolated research problems and are studied separately. In this paper, we design and develop a unified system, Gene Expression Miner, to address these challenging issues in a principled and general manner by integrating cluster ensemble, text clustering, and multidocument summarization, and provide an environment for comprehensive gene expression data analysis. We present a novel cluster ensemble approach to generate high-quality gene clusters. In our text summarization module, given a gene cluster, our expectation-maximization based algorithm can automatically identify subtopics and extract the most probable terms for each topic. Then, the extracted top k topical terms from each subtopic are combined to form the biological explanation of each gene cluster. Experimental results demonstrate that our system can obtain high-quality clusters and provide informative key terms for the gene clusters.

  17. Cluster Ensemble-Based Image Segmentation

    Directory of Open Access Journals (Sweden)

    Xiaoru Wang

    2013-07-01

    Full Text Available Image segmentation is the foundation of computer vision applications. In this paper, we propose a new cluster ensemble-based image segmentation algorithm, which overcomes several problems of traditional methods. We make two main contributions in this paper. First, we introduce the cluster ensemble concept to fuse the segmentation results from different types of visual features effectively, which can deliver a better final result and achieve a much more stable performance for broad categories of images. Second, we exploit the PageRank idea from Internet applications and apply it to the image segmentation task. This can improve the final segmentation results by combining the spatial information of the image and the semantic similarity of regions. Our experiments on four public image databases validate the superiority of our algorithm over conventional single type of feature or multiple types of features-based algorithms, since our algorithm can fuse multiple types of features effectively for better segmentation results. Moreover, our method is also proved to be very competitive in comparison with other state-of-the-art segmentation algorithms.

  18. Orbit Clustering Based on Transfer Cost

    Science.gov (United States)

    Gustafson, Eric D.; Arrieta-Camacho, Juan J.; Petropoulos, Anastassios E.

    2013-01-01

    We propose using cluster analysis to perform quick screening for combinatorial global optimization problems. The key missing component currently preventing cluster analysis from use in this context is the lack of a useable metric function that defines the cost to transfer between two orbits. We study several proposed metrics and clustering algorithms, including k-means and the expectation maximization algorithm. We also show that proven heuristic methods such as the Q-law can be modified to work with cluster analysis.

  19. Bootstrap-Based Improvements for Inference with Clustered Errors

    OpenAIRE

    Doug Miller; A. Colin Cameron; Jonah B. Gelbach

    2006-01-01

    Microeconometrics researchers have increasingly realized the essential need to account for any within-group dependence in estimating standard errors of regression parameter estimates. The typical preferred solution is to calculate cluster-robust or sandwich standard errors that permit quite general heteroskedasticity and within-cluster error correlation, but presume that the number of clusters is large. In applications with few (5-30) clusters, standard asymptotic tests can over-reject consid...
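    As a concrete illustration of the resampling idea behind such corrections, the sketch below runs a plain pairs cluster bootstrap for an OLS slope, resampling whole clusters with replacement. It is a generic textbook version on simulated data, not the specific wild-bootstrap refinements studied by the authors.

```python
# Pairs cluster bootstrap for the standard error of an OLS slope (generic sketch).
import numpy as np

rng = np.random.default_rng(0)
G, n_per = 10, 30                                  # few clusters, as in the setting above
cluster = np.repeat(np.arange(G), n_per)
u_g = rng.normal(0, 1, G)[cluster]                 # within-cluster correlated error component
x = rng.normal(size=cluster.size)
y = 1.0 + 2.0 * x + u_g + rng.normal(size=cluster.size)

def ols_slope(x, y):
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

boot = []
for _ in range(999):
    sampled = rng.choice(G, size=G, replace=True)  # resample whole clusters
    idx = np.concatenate([np.where(cluster == g)[0] for g in sampled])
    boot.append(ols_slope(x[idx], y[idx]))
print(f"slope={ols_slope(x, y):.3f}, cluster-bootstrap SE={np.std(boot, ddof=1):.3f}")
```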

  20. Comparing clustering models in bank customers: Based on Fuzzy relational clustering approach

    Directory of Open Access Journals (Sweden)

    Ayad Hendalianpour

    2016-11-01

    Full Text Available Clustering is a useful way to explore data structures and has been employed in many fields. It organizes a set of objects into similar groups called clusters; the objects within one cluster are highly similar to one another and dissimilar to the objects in other clusters. The K-mean, C-mean, Fuzzy C-mean and Kernel K-mean algorithms are the most popular clustering algorithms because of their easy implementation and speed, but in some cases we cannot use these algorithms. Regarding this, in this paper, a hybrid model for customer clustering is presented that is applicable in five banks of Fars Province, Shiraz, Iran. In this way, the fuzzy relation among customers is defined by using their features described in linguistic and quantitative variables. The customers of the banks are then grouped according to the K-mean, C-mean, Fuzzy C-mean and Kernel K-mean algorithms and the proposed Fuzzy Relation Clustering (FRC) algorithm. The aim of this paper is to show how to choose the best clustering algorithm based on density-based clustering and to present a new clustering algorithm for both crisp and fuzzy variables. Finally, we apply the proposed approach to five datasets of customer segmentation in banks. The results show the accuracy and high performance of FRC compared with other clustering methods.

  1. Comparison of background EEG activity of different groups of patients with idiopathic epilepsy using Shannon spectral entropy and cluster-based permutation statistical testing.

    Directory of Open Access Journals (Sweden)

    Jose Antonio Urigüen

    Full Text Available Idiopathic epilepsy is characterized by generalized seizures with no apparent cause. One of its main problems is the lack of biomarkers to monitor the evolution of patients. The only tools they can use are limited to inspecting the number of seizures during previous periods of time and assessing the existence of interictal discharges. As a result, there is a need for improving the tools to assist the diagnosis and follow up of these patients. The goal of the present study is to compare and find a way to differentiate between two groups of patients suffering from idiopathic epilepsy, one group that could be followed up by means of specific electroencephalographic (EEG) signatures (intercritical activity present), and another one that could not, due to the absence of these markers. To do that, we analyzed the background EEG activity of each in the absence of seizures and epileptic intercritical activity. We used the Shannon spectral entropy (SSE) as a metric to discriminate between the two groups and performed permutation-based statistical tests to detect the set of frequencies that show significant differences. By constraining the spectral entropy estimation to the 6.25-12.89 Hz range, we detect statistical differences (at a 0.05 alpha-level) between both types of epileptic patients at all available recording channels. Interestingly, entropy values follow a trend that is inversely related to the elapsed time from the last seizure. Indeed, this trend shows asymptotical convergence to the SSE values measured in a group of healthy subjects, which present SSE values lower than either of the two groups of patients. All these results suggest that the SSE, measured in a specific range of frequencies, could serve to follow up the evolution of patients suffering from idiopathic epilepsy. Future studies remain to be conducted in order to assess the predictive value of this approach for the anticipation of seizures.
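    A short sketch of how a band-limited Shannon spectral entropy can be computed for one channel is shown below. The signal is synthetic, the sampling rate and Welch settings are assumptions, and only the frequency band mirrors the range reported above; the study's preprocessing is not reproduced.

```python
# Band-limited Shannon spectral entropy of a single channel (illustrative signal).
import numpy as np
from scipy.signal import welch

fs = 256.0                                   # assumed sampling rate
t = np.arange(0, 10, 1 / fs)
x = np.sin(2 * np.pi * 10 * t) + 0.5 * np.random.randn(t.size)   # toy "EEG" trace

f, psd = welch(x, fs=fs, nperseg=512)
band = (f >= 6.25) & (f <= 12.89)            # frequency range used in the study
p = psd[band] / psd[band].sum()              # normalise the PSD to a probability distribution
sse = -np.sum(p * np.log2(p))                # Shannon spectral entropy (bits)
print(f"SSE in 6.25-12.89 Hz: {sse:.3f} bits")
```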

  2. Concept based clustering for descriptive document classification

    Directory of Open Access Journals (Sweden)

    S Senthamarai Kannan

    2007-09-01

    Full Text Available We present an approach for improving the relevance of search results by clustering the search results obtained for a query string with the help of a Concept Clustering Algorithm. The Concept Clustering Algorithm combines common phrase discovery and latent semantic indexing techniques to separate search results into meaningful groups. It looks for meaningful phrases to use as cluster labels and then assigns documents to the labels to form groups. The labels assigned to each document cluster provide meaningful information on the various documents available under that cluster. This provides a more interactive and easier way to probe through search results and identify the relevant documents for users of the search engine.

  3. Communication Base Station Log Analysis Based on Hierarchical Clustering

    Directory of Open Access Journals (Sweden)

    Zhang Shao-Hua

    2017-01-01

    Full Text Available Communication base stations generate massive amounts of data every day, and these base station logs are of significant value for mining business circles. This paper uses data mining technology and a hierarchical clustering algorithm to delineate the business circle of each base station from its recorded data. By analyzing the data of different business circles based on feature extraction and comparing the characteristics of different business circle categories, operators can choose a suitable area for commercial marketing.
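    A minimal sketch of the grouping step is given below: agglomerative clustering of per-station features with scipy. The features and the number of groups are hypothetical and only illustrate the kind of hierarchical clustering described above.

```python
# Agglomerative clustering of base stations on illustrative traffic features.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# hypothetical features per station: [daytime traffic, night traffic, weekend ratio]
stations = np.array([
    [120, 20, 0.30],
    [115, 25, 0.35],
    [ 40, 80, 0.90],
    [ 35, 75, 0.85],
    [200, 30, 0.20],
])
Z = linkage(stations, method="ward")             # build the dendrogram
labels = fcluster(Z, t=3, criterion="maxclust")  # cut it into 3 business-circle groups
print(labels)
```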

  4. APPECT: An Approximate Backbone-Based Clustering Algorithm for Tags

    DEFF Research Database (Denmark)

    Zong, Yu; Xu, Guandong; Jin, Pin

    2011-01-01

    ...resulting from the severe difficulty of ambiguity, redundancy and the less semantic nature of tags. Clustering methods are a useful tool to address the aforementioned difficulties. Most research on tag clustering directly uses traditional clustering algorithms such as K-means or Hierarchical Agglomerative Clustering on tagging data, which possess inherent drawbacks such as sensitivity to initialization. In this paper, we instead make use of the approximate backbone of tag clustering results to find better tag clusters. In particular, we propose an APProximate backbonE-based Clustering algorithm for Tags (APPECT). The main steps of APPECT are: (1) we execute the K-means algorithm on a tag similarity matrix M times and collect a set of tag clustering results Z={C1,C2,…,Cm}; (2) we form the approximate backbone of Z by executing a greedy search; (3) we fix the approximate backbone...
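    The listed steps can be sketched as follows: run K-means several times and keep the pairs of tags that are co-clustered in every run as an approximate backbone. The greedy search and the assignment of the remaining tags are omitted, and the tag similarity matrix is random, so this is only an outline of the idea rather than the authors' APPECT.

```python
# Sketch of an approximate-backbone ensemble: tags that always co-cluster
# across M K-means runs form the backbone; data and parameters are illustrative.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
S = rng.random((30, 30))
S = (S + S.T) / 2                       # stand-in symmetric tag similarity matrix
M, k = 10, 4

co = np.zeros((30, 30))
for m in range(M):
    labels = KMeans(n_clusters=k, n_init=5, random_state=m).fit_predict(S)
    co += (labels[:, None] == labels[None, :])          # co-association counts

always_together = (co == M) & ~np.eye(30, dtype=bool)   # pairs clustered together in every run
print("tag pairs in the approximate backbone:", int(always_together.sum() // 2))
```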

  5. Spike detection II: automatic, perception-based detection and clustering.

    Science.gov (United States)

    Wilson, S B; Turner, C A; Emerson, R G; Scheuer, M L

    1999-03-01

    We developed perception-based spike detection and clustering algorithms. The detection algorithm employs a novel, multiple monotonic neural network (MMNN). It is tested on two short-duration EEG databases containing 2400 spikes from 50 epilepsy patients and 10 control subjects. Previous studies are compared for database difficulty and reliability and algorithm accuracy. Automatic grouping of spikes via hierarchical clustering (using topology and morphology) is visually compared with hand marked grouping on a single record. The MMNN algorithm is found to operate close to the ability of a human expert while alleviating problems related to overtraining. The hierarchical and hand marked spike groupings are found to be strikingly similar. An automatic detection algorithm need not be as accurate as a human expert to be clinically useful. A user interface that allows the neurologist to quickly delete artifacts and determine whether there are multiple spike generators is sufficient.

  6. TESTING STRICT HYDROSTATIC EQUILIBRIUM IN SIMULATED CLUSTERS OF GALAXIES: IMPLICATIONS FOR A1689

    International Nuclear Information System (INIS)

    Molnar, S. M.; Umetsu, K.; Chiu, I.-N.; Chen, P.; Hearn, N.; Broadhurst, T.; Bryan, G.; Shang, C.

    2010-01-01

    Accurate mass determination of clusters of galaxies is crucial if they are to be used as cosmological probes. However, there are some discrepancies between cluster masses determined based on gravitational lensing and X-ray observations assuming strict hydrostatic equilibrium (i.e., the equilibrium gas pressure is provided entirely by thermal pressure). Cosmological simulations suggest that turbulent gas motions remaining from hierarchical structure formation may provide a significant contribution to the equilibrium pressure in clusters. We analyze a sample of massive clusters of galaxies drawn from high-resolution cosmological simulations and find a significant contribution (20%-45%) from non-thermal pressure near the center of relaxed clusters, and, in accord with previous studies, a minimum contribution at about 0.1 R_vir, growing to about 30%-45% at the virial radius, R_vir. Our results strongly suggest that relaxed clusters should have significant non-thermal support in their core region. As an example, we test the validity of strict hydrostatic equilibrium in the well-studied massive galaxy cluster A1689 using the latest high-resolution gravitational lensing and X-ray observations. We find a contribution of about 40% from non-thermal pressure within the core region of A1689, suggesting an alternate explanation for the mass discrepancy: the strict hydrostatic equilibrium is not valid in this region.

  7. Cluster-based exposure variation analysis

    Science.gov (United States)

    2013-01-01

    Background Static posture, repetitive movements and lack of physical variation are known risk factors for work-related musculoskeletal disorders, and thus need to be properly assessed in occupational studies. The aims of this study were (i) to investigate the effectiveness of a conventional exposure variation analysis (EVA) in discriminating exposure time lines and (ii) to compare it with a new cluster-based method for analysis of exposure variation. Methods For this purpose, we simulated a repeated cyclic exposure varying within each cycle between “low” and “high” exposure levels in a “near” or “far” range, and with “low” or “high” velocities (exposure change rates). The duration of each cycle was also manipulated by selecting a “small” or “large” standard deviation of the cycle time. These parameters reflected three dimensions of exposure variation, i.e. range, frequency and temporal similarity. Each simulation trace included two realizations of 100 concatenated cycles with either low (ρ = 0.1), medium (ρ = 0.5) or high (ρ = 0.9) correlation between the realizations. These traces were analyzed by conventional EVA, and a novel cluster-based EVA (C-EVA). Principal component analysis (PCA) was applied on the marginal distributions of 1) the EVA of each of the realizations (univariate approach), 2) a combination of the EVA of both realizations (multivariate approach) and 3) C-EVA. The least number of principal components describing more than 90% of variability in each case was selected and the projection of marginal distributions along the selected principal component was calculated. A linear classifier was then applied to these projections to discriminate between the simulated exposure patterns, and the accuracy of classified realizations was determined. Results C-EVA classified exposures more correctly than univariate and multivariate EVA approaches; classification accuracy was 49%, 47% and 52% for EVA (univariate

  8. An image segmentation method based on network clustering model

    Science.gov (United States)

    Jiao, Yang; Wu, Jianshe; Jiao, Licheng

    2018-01-01

    Network clustering phenomena are ubiquitous in nature and human society. In this paper, a method involving a network clustering model is proposed for mass segmentation in mammograms. First, the watershed transform is used to divide an image into regions, and features of the image are computed. Then a graph is constructed from the obtained regions and features. The network clustering model is applied to realize clustering of nodes in the graph. Compared with two classic methods, the algorithm based on the network clustering model performs more effectively in experiments.

  9. Cluster-based global firms' use of local capabilities

    DEFF Research Database (Denmark)

    Andersen, Poul Houman; Bøllingtoft, Anne

    2011-01-01

    Purpose – Despite growing interest in clusters' role in the global competitiveness of firms, there has been little research into how globalization affects cluster-based firms' (CBFs) use of local knowledge resources and the combination of local and global knowledge used. Using the cluster... ...replicated studies and possibly triangulation with quantitative studies would further develop the understanding of how globalization impacts the internal organization of CBFs. Practical implications – For policy makers, cluster policies should be reconsidered if the role of clusters differs from what has... ...and developed countries. Originality/value – Several studies have examined the changing role of clusters in the evolving global division of labour. However, research is lacking that addresses the challenges of transformation from the level of the CBF and how these may be affected by cluster evolution. The paper...

  10. Improving local clustering based top-L link prediction methods via asymmetric link clustering information

    Science.gov (United States)

    Wu, Zhihao; Lin, Youfang; Zhao, Yiji; Yan, Hongyan

    2018-02-01

    Networks can represent a wide range of complex systems, such as social, biological and technological systems. Link prediction is one of the most important problems in network analysis, and has attracted much research interest recently. Many link prediction methods have been proposed to solve this problem with various techniques. We can note that clustering information plays an important role in solving the link prediction problem. In the previous literature, we find that the node clustering coefficient appears frequently in many link prediction methods. However, the node clustering coefficient is limited in describing the role of a common neighbor in different local networks, because it cannot distinguish the different clustering abilities of a node with respect to different node pairs. In this paper, we shift our focus from nodes to links, and propose the concept of the asymmetric link clustering (ALC) coefficient. Further, we improve three node clustering based link prediction methods via the concept of ALC. The experimental results demonstrate that ALC-based methods outperform node clustering based methods, especially achieving remarkable improvements on food web, hamster friendship and Internet networks. Besides, compared with other methods, the performance of ALC-based methods is very stable in both globalized and personalized top-L link prediction tasks.
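    For orientation, the snippet below scores candidate links with a node-clustering-coefficient weighted common-neighbour index, i.e. the kind of node-centred baseline that the ALC-based methods improve upon; the graph and the scoring rule are illustrative and do not reproduce the ALC coefficient itself.

```python
# Node-clustering-coefficient weighted common-neighbour link scores (a baseline
# of the node-centred kind discussed above); toy graph, illustrative only.
import networkx as nx

G = nx.karate_club_graph()
cc = nx.clustering(G)                                   # node clustering coefficients

def score(u, v):
    # sum the clustering coefficients of the common neighbours of (u, v)
    return sum(cc[w] for w in nx.common_neighbors(G, u, v))

candidates = [(u, v) for u in G for v in G if u < v and not G.has_edge(u, v)]
top = sorted(candidates, key=lambda e: score(*e), reverse=True)[:5]
print("top-5 predicted links:", top)
```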

  11. Single-Pass Clustering Algorithm Based on Storm

    Science.gov (United States)

    Fang, LI; Longlong, DAI; Zhiying, JIANG; Shunzi, LI

    2017-02-01

    The dramatically increasing volume of data makes the computational complexity of traditional clustering algorithms rise rapidly, which leads to longer processing times. To improve the efficiency of stream data clustering, a distributed real-time clustering algorithm (S-Single-Pass), based on the classic Single-Pass [1] algorithm and the Storm [2] computation framework, is designed in this paper. By employing this kind of method in Topic Detection and Tracking (TDT) [3], the real-time performance of topic detection improves effectively. The proposed method splits the clustering process into two parts: one part forms clusters through multi-threaded parallel clustering, and the other merges the clusters generated in the previous step and updates the global clusters. The experimental results show that the proposed method has nearly the same clustering accuracy as the traditional Single-Pass algorithm, that this accuracy remains steady, and that the computing rate increases linearly with the number of cluster machines and nodes (processing threads).
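    A compact sketch of the classic Single-Pass rule that S-Single-Pass parallelises is shown below: each incoming document joins the most similar existing cluster if the similarity exceeds a threshold, otherwise it starts a new cluster. The vectors and the threshold are hypothetical, and the Storm topology itself is not shown.

```python
# Classic single-pass clustering over a stream of document vectors (sketch).
import numpy as np

def single_pass(vectors, threshold=0.8):
    centroids, members = [], []
    for i, v in enumerate(vectors):
        v = v / np.linalg.norm(v)
        if centroids:
            sims = [float(c @ v) for c in centroids]      # cosine similarity to each centroid
            best = int(np.argmax(sims))
            if sims[best] >= threshold:
                members[best].append(i)
                c = centroids[best] * (len(members[best]) - 1) + v
                centroids[best] = c / np.linalg.norm(c)   # blend the new vector into the centroid
                continue
        centroids.append(v)                               # open a new cluster
        members.append([i])
    return members

docs = np.random.rand(50, 16)
print(single_pass(docs, threshold=0.9))
```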

  12. A fuzzy logic based clustering strategy for improving vehicular ad ...

    Indian Academy of Sciences (India)

    ...tions of cluster head selection approaches, common performance metrics are utilized, such as CH changes, CH stability, etc. However, a reasonable comparison of various clustering approaches is a hard task due to the lack of scenarios and standard testing processes; therefore, standardization and more research are needed in this ...

  13. PROSPECTS OF THE REGIONAL INTEGRATION POLICY BASED ON CLUSTER FORMATION

    Directory of Open Access Journals (Sweden)

    Elena Tsepilova

    2018-01-01

    Full Text Available The purpose of this article is to develop the theoretical foundations of regional integration policy and to determine its prospects on the basis of cluster formation. The authors use such research methods as systematization, comparative and complex analysis, synthesis, and the statistical method. Within the framework of the research, the concept of regional integration policy is specified, and its integration core – the cluster – is identified. The authors develop an algorithm of regional clustering, which will ensure the growth of the economy and of tax income. Measures are proposed to optimize the organizational mechanism of interaction between the participants of the territorial cluster and the authorities, allowing the effective functioning of clusters, including taxation clusters, to be ensured. Based on the results of studying the existing methods for assessing the effectiveness of cluster policy, the authors propose their own approach to evaluating the consequences of implementing the regional integration policy, according to which a list of quantitative and qualitative indicators is defined. The present article systematizes the experience and results of the cluster policy of certain European countries, which makes it possible to determine the prospects and the synergetic effect of developing clusters as an integration foundation of regional policy in the Russian Federation. The authors analyse the activity of cluster formations using the example of the Rostov region – a leader in creating conditions for cluster policy development in the Southern Federal District. Eleven clusters and cluster initiatives are developing in this region. As a result, the authors propose measures to support the existing clusters and to create new ones.

  14. Structure based alignment and clustering of proteins (STRALCP)

    Science.gov (United States)

    Zemla, Adam T.; Zhou, Carol E.; Smith, Jason R.; Lam, Marisa W.

    2013-06-18

    Disclosed are computational methods of clustering a set of protein structures based on local and pair-wise global similarity values. Pair-wise local and global similarity values are generated based on pair-wise structural alignments for each protein in the set of protein structures. Initially, the protein structures are clustered based on pair-wise local similarity values. The protein structures are then clustered based on pair-wise global similarity values. For each given cluster both a representative structure and spans of conserved residues are identified. The representative protein structure is used to assign newly-solved protein structures to a group. The spans are used to characterize conservation and assign a "structural footprint" to the cluster.

  15. GA-Based Membrane Evolutionary Algorithm for Ensemble Clustering

    Directory of Open Access Journals (Sweden)

    Yanhua Wang

    2017-01-01

    Full Text Available Ensemble clustering can improve the generalization ability of a single clustering algorithm and generate a more robust clustering result by integrating multiple base clusterings, so it becomes the focus of current clustering research. Ensemble clustering aims at finding a consensus partition which agrees as much as possible with base clusterings. Genetic algorithm is a highly parallel, stochastic, and adaptive search algorithm developed from the natural selection and evolutionary mechanism of biology. In this paper, an improved genetic algorithm is designed by improving the coding of chromosome. A new membrane evolutionary algorithm is constructed by using genetic mechanisms as evolution rules and combines with the communication mechanism of cell-like P system. The proposed algorithm is used to optimize the base clusterings and find the optimal chromosome as the final ensemble clustering result. The global optimization ability of the genetic algorithm and the rapid convergence of the membrane system make membrane evolutionary algorithm perform better than several state-of-the-art techniques on six real-world UCI data sets.

  16. GA-Based Membrane Evolutionary Algorithm for Ensemble Clustering.

    Science.gov (United States)

    Wang, Yanhua; Liu, Xiyu; Xiang, Laisheng

    2017-01-01

    Ensemble clustering can improve the generalization ability of a single clustering algorithm and generate a more robust clustering result by integrating multiple base clusterings, so it becomes the focus of current clustering research. Ensemble clustering aims at finding a consensus partition which agrees as much as possible with base clusterings. Genetic algorithm is a highly parallel, stochastic, and adaptive search algorithm developed from the natural selection and evolutionary mechanism of biology. In this paper, an improved genetic algorithm is designed by improving the coding of chromosome. A new membrane evolutionary algorithm is constructed by using genetic mechanisms as evolution rules and combines with the communication mechanism of cell-like P system. The proposed algorithm is used to optimize the base clusterings and find the optimal chromosome as the final ensemble clustering result. The global optimization ability of the genetic algorithm and the rapid convergence of the membrane system make membrane evolutionary algorithm perform better than several state-of-the-art techniques on six real-world UCI data sets.

  17. Cluster algebras based on vertex operator algebras

    Czech Academy of Sciences Publication Activity Database

    Zuevsky, Alexander

    2016-01-01

    Vol. 30, Nos. 28-29 (2016), article no. 1640030. ISSN 0217-9792. Institutional support: RVO:67985840. Keywords: cluster algebras * vertex operator algebras * Riemann surfaces. Subject RIV: BA - General Mathematics. Impact factor: 0.736, year: 2016. http://www.worldscientific.com/doi/abs/10.1142/S0217979216400300

  18. Risk Probability Estimating Based on Clustering

    DEFF Research Database (Denmark)

    Chen, Yong; Jensen, Christian D.; Gray, Elizabeth

    2003-01-01

    Ubiquitous computing environments are highly dynamic, with new unforeseen circumstances and constantly changing environments, which introduces new risks that cannot be assessed through traditional means of risk analysis. Mobile entities in a ubiquitous computing environment require the ability to ...... on the automatic clustering of defining features of the environment and the other entity, which helps avoid subjective judgments as much as possible.

  19. An Intelligent Clustering Based Methodology for Confusable ...

    African Journals Online (AJOL)

    Journal of the Nigerian Association of Mathematical Physics ... The system assigns patients with severity levels in all the clusters. ... The system compares favorably with diagnosis arrived at by experienced physicians and also provides patients' level of severity in each confusable disease and the degree of confusability of ...

  20. Oak Ridge Institutional Cluster Autotune Test Drive Report

    Energy Technology Data Exchange (ETDEWEB)

    Jibonananda, Sanyal [ORNL; New, Joshua Ryan [ORNL

    2014-02-01

    The Oak Ridge Institutional Cluster (OIC) provides general purpose computational resources for the ORNL staff to run computation-heavy jobs that are larger than desktop applications but do not quite require the scale and power of the Oak Ridge Leadership Computing Facility (OLCF). This report details the efforts made and conclusions derived in performing a short test drive of the cluster resources on Phase 5 of the OIC. EnergyPlus was used in the analysis as a candidate user program, and the overall software environment was evaluated against anticipated challenges experienced with resources such as the shared-memory Nautilus system (JICS) and Titan (OLCF). The OIC performed within reason and was found to be acceptable in the context of running EnergyPlus simulations. The number of cores per node and the availability of scratch space per node allow non-traditional, desktop-focused applications to leverage parallel ensemble execution. Although only individual runs of EnergyPlus were executed, the software environment on the OIC appeared suitable to run ensemble simulations with some modifications to the Autotune workflow. From a standpoint of general usability, the system supports common Linux libraries, compilers, standard job scheduling software (Torque/Moab), and the OpenMPI library (the only MPI library) for MPI communications. The file system is a Panasas file system, which the literature indicates to be efficient.

  1. Efficient Computation of Multiple Density-Based Clustering Hierarchies

    OpenAIRE

    Neto, Antonio Cavalcante Araujo; Sander, Joerg; Campello, Ricardo J. G. B.; Nascimento, Mario A.

    2017-01-01

    HDBSCAN*, a state-of-the-art density-based hierarchical clustering method, produces a hierarchical organization of clusters in a dataset w.r.t. a parameter mpts. While the performance of HDBSCAN* is robust w.r.t. mpts in the sense that a small change in mpts typically leads to only a small or no change in the clustering structure, choosing a "good" mpts value can be challenging: depending on the data distribution, a high or low value for mpts may be more appropriate, and certain data clusters...
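    For readers who want to try the method, a minimal usage sketch with the publicly available hdbscan package is shown below; min_cluster_size and min_samples here play the role of the density parameter discussed above, the data are synthetic, and the parameter values are arbitrary.

```python
# Minimal HDBSCAN* usage sketch on synthetic blobs (requires the hdbscan package).
import numpy as np
import hdbscan

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)),
               rng.normal(8, 1, (100, 2)),
               rng.uniform(-5, 15, (40, 2))])      # two blobs plus background noise

clusterer = hdbscan.HDBSCAN(min_cluster_size=15, min_samples=10)
labels = clusterer.fit_predict(X)                  # -1 marks points left as noise
print("clusters found:", set(labels) - {-1})
```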

  2. A Flocking Based algorithm for Document Clustering Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Cui, Xiaohui [ORNL; Gao, Jinzhu [ORNL; Potok, Thomas E [ORNL

    2006-01-01

    Social animals or insects in nature often exhibit a form of emergent collective behavior known as flocking. In this paper, we present a novel Flocking based approach for document clustering analysis. Our Flocking clustering algorithm uses stochastic and heuristic principles discovered from observing bird flocks or fish schools. Unlike other partition clustering algorithms such as K-means, the Flocking based algorithm does not require initial partitional seeds. The algorithm generates a clustering of a given set of data through the embedding of the high-dimensional data items on a two-dimensional grid for easy clustering result retrieval and visualization. Inspired by the self-organized behavior of bird flocks, we represent each document object with a flock boid. The simple local rules followed by each flock boid result in the entire document flock generating complex global behaviors, which eventually result in a clustering of the documents. We evaluate the efficiency of our algorithm with both a synthetic dataset and a real document collection that includes 100 news articles collected from the Internet. Our results show that the Flocking clustering algorithm achieves better performance compared to the K-means and the Ant clustering algorithm for real document clustering.

  3. Clustering common bean mutants based on heterotic groupings ...

    African Journals Online (AJOL)

    The objective of this study was to cluster bean mutants from a bean mutation breeding programme, based on heterotic groupings. This was achieved by genotyping 16 bean genotypes, using 21 Simple Sequence Repeats (SSR) bean markers. From the results, three different clusters A, B and C, were obtained suggesting ...

  4. Nearest Neighbor Networks: clustering expression data based on gene neighborhoods

    Directory of Open Access Journals (Sweden)

    Olszewski Kellen L

    2007-07-01

    Full Text Available Abstract Background The availability of microarrays measuring thousands of genes simultaneously across hundreds of biological conditions represents an opportunity to understand both individual biological pathways and the integrated workings of the cell. However, translating this amount of data into biological insight remains a daunting task. An important initial step in the analysis of microarray data is clustering of genes with similar behavior. A number of classical techniques are commonly used to perform this task, particularly hierarchical and K-means clustering, and many novel approaches have been suggested recently. While these approaches are useful, they are not without drawbacks; these methods can find clusters in purely random data, and even clusters enriched for biological functions can be skewed towards a small number of processes (e.g. ribosomes). Results We developed Nearest Neighbor Networks (NNN), a graph-based algorithm to generate clusters of genes with similar expression profiles. This method produces clusters based on overlapping cliques within an interaction network generated from mutual nearest neighborhoods. This focus on nearest neighbors rather than on absolute distance measures allows us to capture clusters with high connectivity even when they are spatially separated, and requiring mutual nearest neighbors allows genes with no sufficiently similar partners to remain unclustered. We compared the clusters generated by NNN with those generated by eight other clustering methods. NNN was particularly successful at generating functionally coherent clusters with high precision, and these clusters generally represented a much broader selection of biological processes than those recovered by other methods. Conclusion The Nearest Neighbor Networks algorithm is a valuable clustering method that effectively groups genes that are likely to be functionally related. It is particularly attractive due to its simplicity, its success in the
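    A small sketch of the mutual-nearest-neighbour idea behind NNN follows: build a kNN graph on gene profiles, keep only mutual edges, and read off connected components as candidate clusters. This is a simplification (NNN itself works with overlapping cliques), and the expression matrix is random.

```python
# Mutual nearest-neighbour graph clustering sketch (simplified relative to NNN,
# which searches for overlapping cliques rather than connected components).
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
X = rng.random((60, 20))                       # stand-in expression matrix (genes x conditions)
k = 5

corr = np.corrcoef(X)                          # similarity between gene profiles
np.fill_diagonal(corr, -np.inf)                # a gene is never its own neighbour
knn = np.argsort(corr, axis=1)[:, -k:]         # indices of the k most similar genes

G = nx.Graph()
G.add_nodes_from(range(len(X)))
for i in range(len(X)):
    for j in knn[i]:
        if i in knn[j]:                        # keep the edge only if it is mutual
            G.add_edge(i, int(j))

clusters = [c for c in nx.connected_components(G) if len(c) > 1]
print(f"{len(clusters)} mutual-kNN clusters")
```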

  5. Effect of point-of-care CD4 cell count results on linkage to care and antiretroviral initiation during a home-based HIV testing campaign: a non-blinded, cluster-randomised trial.

    Science.gov (United States)

    Desai, Mitesh A; Okal, Dancun O; Rose, Charles E; Ndivo, Richard; Oyaro, Boaz; Otieno, Fredrick O; Williams, Tiffany; Chen, Robert T; Zeh, Clement; Samandari, Taraz

    2017-09-01

    HIV disease staging with referral laboratory-based CD4 cell count testing is a key barrier to the initiation of antiretroviral treatment (ART). Point-of-care CD4 cell counts can improve linkage to HIV care among people living with HIV, but their effect has not been assessed with a randomised controlled trial in the context of home-based HIV counselling and testing (HBCT). We did a two-arm, cluster-randomised, controlled efficacy trial in two districts of western Kenya with ongoing HBCT. Housing compounds were randomly assigned (1:1) to point-of-care CD4 cell counts (366 compounds with 417 participants) or standard-of-care (318 compounds with 353 participants) CD4 cell counts done at one of three referral laboratories serving the study catchment area. In each compound, we enrolled people with HIV not engaged in care in the previous 6 months. All participants received post-test counselling and referral for HIV care. Point-of-care test participants received additional counselling on the result, including ART eligibility if CD4 was less than 350 cells per μL, the cutoff in Kenyan guidelines. Participants were interviewed 6 months after enrolment to ascertain whether they sought HIV care, verified through chart reviews at 23 local clinics. The prevalence of loss to follow-up at 6 months (LTFU) was listed as the main outcome in the study protocol. We analysed linkage to care at 6 months (defined as 1-LTFU) as the primary outcome. All analyses were by intention to treat. This trial is registered at ClinicalTrials.gov, number NCT02515149. We enrolled 770 participants between July 1, 2013, and Feb 28, 2014. 692 (90%) had verified linkage to care status and 78 (10%) were lost to follow-up. Of 371 participants in the point-of-care group, 215 (58%) had linked to care within 6 months versus 108 (34%) of 321 in the standard-of-care group (Cox proportional multivariable hazard ratio [HR] 2·14, 95% CI 1·67-2·74; log rank p ...). Point-of-care CD4 cell counts in a resource-limited HBCT ...

  6. Adaptive density trajectory cluster based on time and space distance

    Science.gov (United States)

    Liu, Fagui; Zhang, Zhijie

    2017-10-01

    There are several open problems in trajectory clustering for discovering mobile behavior regularity, such as the computation of the distance between sub-trajectories, the setting of parameter values in the clustering algorithm, and the uncertainty/boundary problem of the data set. Addressing these, this paper defines a calculation method for the distance between sub-trajectories based on time and space. The significance of this distance calculation is that it clearly reveals the differences between moving trajectories and improves the accuracy of the clustering algorithm. In addition, a novel adaptive density trajectory clustering algorithm is proposed, in which the cluster radius is computed from the density of the data distribution, cluster centers and their number are selected automatically by a certain strategy, and the uncertainty/boundary problem of the data set is solved by a designed weighted rough c-means. Experimental results demonstrate that the proposed algorithm can perform fuzzy trajectory clustering effectively on the basis of the time and space distance, and adaptively obtains the optimal cluster centers and rich clustering information for mining the features of mobile behavior in mobile and social networks.

  7. Cluster-based spectrum sensing for cognitive radios with imperfect channel to cluster-head

    KAUST Repository

    Ben Ghorbel, Mahdi

    2012-04-01

    Spectrum sensing is considered as the first and main step for cognitive radio systems to achieve an efficient use of spectrum. Cooperation and clustering among cognitive radio users are two techniques that can be employed with spectrum sensing in order to improve the sensing performance by reducing miss-detection and false alarm. In this paper, within the framework of a clustering-based cooperative spectrum sensing scheme, we study the effect of errors in transmitting the local decisions from the secondary users to the cluster heads (or the fusion center), while considering non-identical channel conditions between the secondary users. Closed-form expressions for the global probabilities of detection and false alarm at the cluster head are derived. © 2012 IEEE.
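    The record reports closed-form detection and false-alarm probabilities at the cluster head under imperfect reporting. The snippet below illustrates only a generic version of that calculation, assuming OR-rule fusion and binary symmetric reporting channels, which is our assumption and not necessarily the fusion rule analysed in the paper.

```python
# OR-rule fusion at a cluster head when each local decision is flipped with
# probability pe on its reporting channel (generic, assumed model).
import numpy as np

def global_probs(pd, pf, pe):
    pd, pf, pe = map(np.asarray, (pd, pf, pe))
    pd_eff = pd * (1 - pe) + (1 - pd) * pe       # detection prob. as seen by the cluster head
    pf_eff = pf * (1 - pe) + (1 - pf) * pe       # false-alarm prob. as seen by the cluster head
    Qd = 1 - np.prod(1 - pd_eff)                 # OR rule: any reported "1" triggers detection
    Qf = 1 - np.prod(1 - pf_eff)
    return Qd, Qf

# three secondary users with non-identical sensing performance and reporting channels
Qd, Qf = global_probs(pd=[0.90, 0.85, 0.80], pf=[0.10, 0.08, 0.12], pe=[0.01, 0.05, 0.02])
print(f"global Pd={Qd:.3f}, global Pfa={Qf:.3f}")
```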

  8. A cluster randomised trial introducing rapid diagnostic tests into registered drug shops in Uganda

    DEFF Research Database (Denmark)

    Mbonye, Anthony K; Magnussen, Pascal; Lal, Sham

    2015-01-01

    the impact of introducing rapid diagnostic tests for malaria (mRDTs) in registered drug shops in Uganda, with the aim to increase appropriate treatment of malaria with artemisinin-based combination therapy (ACT) in patients seeking treatment for fever in drug shops. METHODS: A cluster-randomized trial...... substantially improved appropriate treatment of malaria with ACT. TRIAL REGISTRATION: ClinicalTrials.gov NCT01194557....

  9. Improving Tensor Based Recommenders with Clustering

    DEFF Research Database (Denmark)

    Leginus, Martin; Dolog, Peter; Zemaitis, Valdas

    2012-01-01

    Social tagging systems (STS) model three types of entities (i.e. tag-user-item) and relationships between them are encoded into a 3-order tensor. Latent relationships and patterns can be discovered by applying tensor factorization techniques like Higher Order Singular Value Decomposition (HOSVD...... of the recommendations and execution time are improved and memory requirements are decreased. The clustering is motivated by the fact that many tags in a tag space are semantically similar thus the tags can be grouped. Finally, promising experimental results are presented...

  10. The Quintuplet cluster - A young massive cluster study based on proper motion membership

    Science.gov (United States)

    Hußmann, Benjamin

    2014-01-01

    Young massive clusters define the high-mass range of current clustered star formation and are frequently found in starburst and interacting galaxies. As - with the exception of the nearest galaxies within the Local Group - extragalactic clusters cannot be resolved into individual stars, the few young massive clusters in the Milky Way and the Magellanic Clouds may serve as templates for unresolved young massive clusters in more distant galaxies. Due to their high masses, these clusters sample the full range of stellar masses. In combination with the small or negligible spreads in age and metallicity of their stellar populations, this makes these objects unique laboratories for studying stellar evolution, especially in the high-mass range. Furthermore, they allow the initial mass function, which describes the distribution of masses of a stellar population at its birth, to be probed in its entirety. The Quintuplet cluster is one of three known young massive clusters residing in the central molecular zone and is located at a projected distance of 30 pc from the Galactic centre. Because of the rather extreme conditions in this region, a potential dependence of the outcome of the star formation process on the environmental conditions under which the star formation event takes place might leave its imprint in the stellar mass function. As the Quintuplet cluster lacks a dense core and shows a somewhat dispersed appearance, it is crucial to effectively distinguish between cluster stars and the rich population of stars from the Galactic field along the line of sight to the Galactic centre in order to measure its present-day mass function. In this thesis, a clean sample of cluster stars is derived based on the common bulk proper motion of the cluster with respect to the Galactic field and a subsequent colour selection. The diffraction limited resolution of multi-epoch near-infrared imaging observations obtained at the ESO Very Large Telescope with adaptive optics correction

  11. XML documents cluster research based on frequent subpatterns

    Science.gov (United States)

    Ding, Tienan; Li, Wei; Li, Xiongfei

    2015-12-01

    XML data are widely used for information exchange on the Internet, and clustering of XML documents is an active research topic. In the XML document clustering process, measuring the difference between two XML documents is time-consuming and limits the efficiency of the clustering. This paper proposes an XML document clustering method based on the frequent patterns of an XML document data set. It first introduces a coding tree structure for encoding XML documents, which translates frequent pattern mining from XML documents into frequent pattern mining from strings. It then applies cosine similarity and cohesive (agglomerative) hierarchical clustering to the documents represented by their frequent patterns. Because the frequent patterns are subsets of the original XML document data, the time spent on document similarity measurement is reduced. Experiments on synthetic and real data sets show that the method is efficient.
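
    A minimal sketch of the final clustering stage, assuming each document has already been reduced to a set of mined frequent patterns (the pattern strings below are placeholders): binary pattern vectors are compared with cosine distance and grouped by average-linkage agglomerative clustering.

    ```python
    # Sketch: represent each XML document by the frequent patterns it contains,
    # then cluster with cosine similarity and average-linkage agglomeration.
    # The pattern sets below are placeholders; the paper mines them from a
    # string encoding of the document trees.
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import pdist

    docs = {
        "d1": {"book/title", "book/author", "book/price"},
        "d2": {"book/title", "book/author", "book/isbn"},
        "d3": {"movie/title", "movie/director"},
        "d4": {"movie/title", "movie/director", "movie/year"},
    }
    vocab = sorted(set().union(*docs.values()))
    X = np.array([[1 if p in pats else 0 for p in vocab] for pats in docs.values()], float)

    # Cosine distance on the binary frequent-pattern vectors.
    dist = pdist(X, metric="cosine")
    labels = fcluster(linkage(dist, method="average"), t=2, criterion="maxclust")
    print(dict(zip(docs, labels)))   # e.g. book-like docs in one cluster, movie-like in the other
    ```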

  12. Local Community Detection Algorithm Based on Minimal Cluster

    Directory of Open Access Journals (Sweden)

    Yong Zhou

    2016-01-01

    Full Text Available In order to discover local community structure more effectively, this paper puts forward a new local community detection algorithm based on a minimal cluster. Most local community detection algorithms start from a single node; however, the agglomeration ability of a single node is weaker than that of multiple nodes, so the community expansion in this algorithm starts not from the initial node alone but from a minimal cluster that contains the initial node and whose members are relatively densely connected with each other. The algorithm has two phases: it first detects the minimal cluster and then finds the local community extended from that minimal cluster. Experimental results show that the quality of the local communities detected by our algorithm is much better than that of other algorithms, on both real and simulated networks.
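
    A compact sketch of the two phases on a toy graph, assuming a simple minimal-cluster rule (the seed plus its most strongly co-connected neighbour and their common neighbours) and a standard local-modularity-style expansion score; both are stand-ins rather than the paper's exact definitions.

    ```python
    # Sketch of the two-phase idea: build a small "minimal cluster" around a
    # seed node, then grow it greedily while a local score (internal edges
    # over internal-plus-boundary edges) improves.
    import networkx as nx

    def minimal_cluster(G, seed):
        # Seed plus the neighbour it shares the most common neighbours with.
        best = max(G[seed], key=lambda v: len(set(G[seed]) & set(G[v])))
        return {seed, best} | (set(G[seed]) & set(G[best]))

    def local_community(G, seed):
        C = minimal_cluster(G, seed)
        def score(S):
            internal = G.subgraph(S).number_of_edges()
            boundary = sum(1 for u in S for v in G[u] if v not in S)
            return internal / (internal + boundary) if internal + boundary else 0.0
        improved = True
        while improved:
            improved = False
            frontier = {v for u in C for v in G[u] if v not in C}
            for v in sorted(frontier):
                if score(C | {v}) > score(C):
                    C.add(v); improved = True
        return C

    G = nx.karate_club_graph()
    print(sorted(local_community(G, seed=0)))
    ```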

  13. Clustering economies based on multiple criteria decision making techniques

    Directory of Open Access Journals (Sweden)

    Mansour Momeni

    2011-10-01

    Full Text Available One of the primary concerns in many countries is to determine the important factors affecting economic growth. In this paper, we study factors such as unemployment rate, inflation rate, population growth and average annual income to cluster different countries. The proposed model uses the analytical hierarchy process (AHP) to prioritize the criteria and then uses a K-means technique to cluster 59 countries, based on the ranked criteria, into four groups. The first group includes countries with high living standards such as Germany and Japan. The second cluster contains developing countries with relatively good economic growth such as Saudi Arabia and Iran. The third cluster belongs to countries with faster rates of growth than the countries in the second group, such as China, India and Mexico. Finally, the fourth cluster includes countries with very low rates of growth such as Jordan, Mali and Niger.
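
    A minimal sketch of the AHP-then-K-means pipeline with simulated indicator data; the AHP priority weights shown are assumed values standing in for weights that would come from pairwise comparison matrices.

    ```python
    # Sketch: weight standardized indicators by AHP-derived priorities, then
    # run K-means with four clusters. The weights and the toy data are
    # illustrative assumptions.
    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    X = rng.normal(size=(59, 4))          # unemployment, inflation, pop. growth, income
    ahp_weights = np.array([0.35, 0.25, 0.15, 0.25])   # assumed AHP priorities, sum to 1

    Xw = StandardScaler().fit_transform(X) * ahp_weights
    labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(Xw)
    print(np.bincount(labels))            # country counts per cluster
    ```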

  14. The design of the SAFE or SORRY? study: a cluster randomised trial on the development and testing of an evidence based inpatient safety program for the prevention of adverse events

    Directory of Open Access Journals (Sweden)

    Koopmans Raymond TCM

    2009-04-01

    Full Text Available Abstract Background Patients in hospitals and nursing homes are at risk of the development of, often preventable, adverse events (AEs), which threaten patient safety. Guidelines for prevention of many types of AEs are available; however, compliance with these guidelines appears to be lacking. Besides general barriers that inhibit implementation, this non-compliance is associated with the large number of guidelines competing for attention. As implementation of a guideline is time-consuming, it is difficult for organisations to implement all available guidelines. Another problem is lack of feedback about performance using quality indicators of guideline based care and lack of a recognisable, unambiguous system for implementation. A program that allows organisations to implement multiple guidelines simultaneously may facilitate guideline use and thus improve patient safety. The aim of this study is to develop and test such an integral patient safety program that addresses several AEs simultaneously in hospitals and nursing homes. This paper reports the design of this study. Methods and design The patient safety program addresses three AEs: pressure ulcers, falls and urinary tract infections. It consists of bundles and outcome and process indicators based on the existing evidence based guidelines. In addition it includes a multifaceted tailored implementation strategy: education, patient involvement, and a computerized registration and feedback system. The patient safety program was tested in a cluster randomised trial on ten hospital wards and ten nursing home wards. The baseline period was three months followed by the implementation of the patient safety program for fourteen months. Subsequently the follow-up period was nine months. Primary outcome measure was the incidence of AEs on every ward. Secondary outcome measures were the utilization of preventive interventions and the knowledge of nurses regarding the three topics. Randomisation took

  15. Fuzzy subtractive clustering based prediction model for brand association analysis

    Directory of Open Access Journals (Sweden)

    Widodo Imam Djati

    2018-01-01

    Full Text Available The brand is one of the crucial elements that determine the success of a product. When choosing a product, consumers always consider product attributes (such as features, shape, and color), but they also consider the brand. A brand guides consumers to associate a product with specific attributes and qualities. This study was designed to identify the relevant product attributes and to predict brand performance from those attributes. A survey was run to obtain the attributes affecting the brand. Fuzzy subtractive clustering was used to classify and predict brand association based on aspects of the product under investigation. The results indicate that five attributes, namely shape, ease, image, quality and price, can be used to classify and predict the brand. The training step gives the best FSC model with radius ra = 0.1, producing 70 clusters/rules with a training MSE of 9.7093e-016. Using 14 test records, the model predicts the brand very well (close to the target), with an MSE of 0.6005 and an accuracy rate of 71%.
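
    The sketch below shows subtractive clustering in its usual Chiu-style form, which selects cluster/rule centres by repeatedly picking the highest-potential point and suppressing its neighbourhood; the radius and stopping parameters are common defaults rather than the paper's settings (the paper reports ra = 0.1).

    ```python
    # Compact sketch of subtractive clustering used to pick cluster/rule
    # centres from data scaled to [0, 1]. Parameter values are illustrative.
    import numpy as np

    def subtractive_clustering(X, ra=0.5, squash=1.5, accept_ratio=0.15):
        X = np.asarray(X, float)
        alpha, beta = 4.0 / ra**2, 4.0 / (squash * ra)**2
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        potential = np.exp(-alpha * d2).sum(axis=1)
        centres, first = [], potential.max()
        while potential.max() >= accept_ratio * first:
            i = int(potential.argmax())
            centres.append(X[i])
            # Suppress the potential around the newly selected centre.
            potential -= potential[i] * np.exp(-beta * d2[i])
            potential = np.clip(potential, 0, None)
        return np.array(centres)

    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0.2, 0.05, (30, 2)), rng.normal(0.8, 0.05, (30, 2))])
    print(subtractive_clustering(X))     # roughly one centre per blob
    ```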

  16. Community Clustering Algorithm in Complex Networks Based on Microcommunity Fusion

    Directory of Open Access Journals (Sweden)

    Jin Qi

    2015-01-01

    Full Text Available With further research in recent years on the physical meaning and numerical features of community structure in complex networks, improving the effectiveness and efficiency of community mining algorithms has become an important subject in this area. This paper puts forward the concept of the microcommunity and obtains the final community mining results by fusing different microcommunities. Starting from the basic definition of a network community, the paper applies an expansion step to microcommunity clustering, which provides the prerequisites for microcommunity fusion. Analysis of test results on network data sets shows that the proposed algorithm is more efficient and yields higher solution quality than other similar algorithms.

  17. A Geometric Fuzzy-Based Approach for Airport Clustering

    Directory of Open Access Journals (Sweden)

    Maria Nadia Postorino

    2014-01-01

    Full Text Available Airport classification is a common need in the air transport field due to several purposes—such as resource allocation, identification of crucial nodes, and real-time identification of substitute nodes—which also depend on the involved actors’ expectations. In this paper a fuzzy-based procedure has been proposed to cluster airports by using a fuzzy geometric point of view according to the concept of unit-hypercube. By representing each airport as a point in the given reference metric space, the geometric distance among airports—which corresponds to a measure of similarity—has in fact an intrinsic fuzzy nature due to the airport specific characteristics. The proposed procedure has been applied to a test case concerning the Italian airport network and the obtained results are in line with expectations.

  18. Symptom-Based Clustering in Chronic Rhinosinusitis Relates to History of Aspirin Sensitivity and Postsurgical Outcomes.

    Science.gov (United States)

    Divekar, Rohit; Patel, Neil; Jin, Jay; Hagan, John; Rank, Matthew; Lal, Devyani; Kita, Hirohito; O'Brien, Erin

    2015-01-01

    Symptom burden in chronic rhinosinusitis (CRS) may be assessed by interviews or by means of validated tools such as the 22-item SinoNasal Outcome Test (SNOT-22). However, when only the total SNOT-22 scores are used, the pattern of symptom distribution and heterogeneity in patient symptoms is lost. To use a standardized symptom assessment tool (SNOT-22) on preoperative symptoms to understand symptom heterogeneity in CRS and to aid in characterization of distinguishing clinical features between subgroups. This was a retrospective review of 97 surgical patients with CRS. Symptom-based clusters were derived on the basis of presurgical SNOT-22 scores using unsupervised analysis and network graphs. Comparison between clusters was performed for clinical and demographic parameters, postsurgical symptom scores, and presence or absence of a history of aspirin sensitivity. Unsupervised analysis reveals coclustering of specific symptoms in the SNOT-22 tool. Using symptom-based clustering, patients with CRS were stratified into severe overall (mean total score, 90.8), severe sinonasal (score, 62), moderate sinonasal (score, 40), moderate nonsinonasal (score, 37) and mild sinonasal (score, 16) clusters. The last 2 clusters were associated with lack of history of aspirin sensitivity. The first cluster had a rapid relapse in symptoms postoperatively, and the last cluster demonstrated minimal symptomatic improvement after surgery. Symptom-based clusters in CRS reveal a distinct grouping of symptom burden that may relate to aspirin sensitivity and treatment outcomes. Copyright © 2015 American Academy of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.

  19. Rough hypercuboid based supervised clustering of miRNAs.

    Science.gov (United States)

    Paul, Sushmita; Vera, Julio

    2015-07-01

    The microRNAs are small, endogenous non-coding RNAs found in plants, animals, and some viruses, which function in RNA silencing and post-transcriptional regulation of gene expression. It is suggested by various genome-wide studies that a substantial fraction of miRNA genes is likely to form clusters. The coherent expression of the miRNA clusters can then be used to classify samples according to the clinical outcome. In this regard, a new clustering algorithm, termed as rough hypercuboid based supervised attribute clustering (RH-SAC), is proposed to find such groups of miRNAs. The proposed algorithm is based on the theory of rough set, which directly incorporates the information of sample categories into the miRNA clustering process, generating a supervised clustering algorithm for miRNAs. The effectiveness of the new approach is demonstrated on several publicly available miRNA expression data sets using support vector machine. The so-called B.632+ bootstrap error estimate is used to minimize the variability and biasedness of the derived results. The association of the miRNA clusters to various biological pathways is also shown by doing pathway enrichment analysis.

  20. Communication style and exercise compliance in physiotherapy (CONNECT): a cluster randomized controlled trial to test a theory-based intervention to increase chronic low back pain patients' adherence to physiotherapists' recommendations: study rationale, design, and methods.

    Science.gov (United States)

    Lonsdale, Chris; Hall, Amanda M; Williams, Geoffrey C; McDonough, Suzanne M; Ntoumanis, Nikos; Murray, Aileen; Hurley, Deirdre A

    2012-06-15

    Physical activity and exercise therapy are among the accepted clinical rehabilitation guidelines and are recommended self-management strategies for chronic low back pain. However, many back pain sufferers do not adhere to their physiotherapist's recommendations. Poor patient adherence may decrease the effectiveness of advice and home-based rehabilitation exercises. According to self-determination theory, support from health care practitioners can promote patients' autonomous motivation and greater long-term behavioral persistence (e.g., adherence to physiotherapists' recommendations). The aim of this trial is to assess the effect of an intervention designed to increase physiotherapists' autonomy-supportive communication on low back pain patients' adherence to physical activity and exercise therapy recommendations. This study will be a single-blinded cluster randomized controlled trial. Outpatient physiotherapy centers (N =12) in Dublin, Ireland (population = 1.25 million) will be randomly assigned using a computer-generated algorithm to either the experimental or control arm. Physiotherapists in the experimental arm (two hospitals and four primary care clinics) will attend eight hours of communication skills training. Training will include handouts, workbooks, video examples, role-play, and discussion designed to teach physiotherapists how to communicate in a manner that promotes autonomous patient motivation. Physiotherapists in the waitlist control arm (two hospitals and four primary care clinics) will not receive this training. Participants (N = 292) with chronic low back pain will complete assessments at baseline, as well as 1 week, 4 weeks, 12 weeks, and 24 weeks after their first physiotherapy appointment. Primary outcomes will include adherence to physiotherapy recommendations, as well as low back pain, function, and well-being. Participants will be blinded to treatment allocation, as they will not be told if their physiotherapist has

  1. Likelihood-based inference for clustered line transect data

    DEFF Research Database (Denmark)

    Waagepetersen, Rasmus Plenge; Schweder, Tore

    The uncertainty in estimation of spatial animal density from line transect surveys depends on the degree of spatial clustering in the animal population. To quantify the clustering we model line transect data as independent thinnings of spatial shot-noise Cox processes. Likelihood-based inference...... is implemented using Markov Chain Monte Carlo methods to obtain efficient estimates of spatial clustering parameters. Uncertainty is addressed using parametric bootstrap or by consideration of posterior distributions in a Bayesian setting. Maximum likelihood estimation and Bayesian inference is compared...

  2. Two clusters of child molesters based on impulsiveness

    Directory of Open Access Journals (Sweden)

    Danilo A. Baltieri

    2015-06-01

    Full Text Available Objective: High impulsiveness is a general problem that affects most criminal offenders and is associated with greater recidivism risk. A cluster analysis of impulsiveness measured by the Barratt Impulsiveness Scale - Version 11 (BIS-11) was performed on a sample of hands-on child molesters. Methods: The sample consisted of 208 child molesters enrolled in two different sectional studies carried out in São Paulo, Brazil. Using three factors from the BIS-11, a k-means cluster analysis was performed using the average silhouette width to determine cluster number. Direct logistic regression was performed to analyze the association of criminological and clinical features with the resulting clusters. Results: Two clusters were delineated. The cluster characterized by higher impulsiveness showed higher scores on the Sexual Screening for Pedophilic Interests (SSPI), Static-99, and Sexual Addiction Screening Test. Conclusions: Given that child molesters are an extremely heterogeneous population, the “number of victims” item of the SSPI should call attention to those offenders with the highest motor, attentional, and non-planning impulsiveness. Our findings could have implications in terms of differences in therapeutic management for these two groups, with the most impulsive cluster benefitting from psychosocial strategies combined with pharmacological interventions.
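
    A short sketch of the cluster-number choice described here: run k-means for several values of k on the three BIS-11 factor scores and keep the solution with the largest average silhouette width. The factor scores below are simulated stand-ins.

    ```python
    # Sketch of choosing the number of k-means clusters by the average
    # silhouette width, as done for the three BIS-11 factor scores.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score

    rng = np.random.default_rng(2)
    scores = np.vstack([rng.normal(20, 3, (100, 3)), rng.normal(30, 3, (100, 3))])

    best_k, best_sil = None, -1.0
    for k in range(2, 6):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(scores)
        sil = silhouette_score(scores, labels)
        if sil > best_sil:
            best_k, best_sil = k, sil
    print(best_k, round(best_sil, 3))    # expected: 2 clusters for this toy data
    ```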

  3. DSN Beowulf Cluster-Based VLBI Correlator

    Science.gov (United States)

    Rogstad, Stephen P.; Jongeling, Andre P.; Finley, Susan G.; White, Leslie A.; Lanyi, Gabor E.; Clark, John E.; Goodhart, Charles E.

    2009-01-01

    The NASA Deep Space Network (DSN) requires a broadband VLBI (very long baseline interferometry) correlator to process data routinely taken as part of the VLBI source Catalogue Maintenance and Enhancement task (CAT M&E) and the Time and Earth Motion Precision Observations task (TEMPO). The data provided by these measurements are a crucial ingredient in the formation of precision deep-space navigation models. In addition, a VLBI correlator is needed to provide support for other VLBI related activities for both internal and external customers. The JPL VLBI Correlator (JVC) was designed, developed, and delivered to the DSN as a successor to the legacy Block II Correlator. The JVC is a full-capability VLBI correlator that uses software processes running on multiple computers to cross-correlate two-antenna broadband noise data. Components of this new system (see Figure 1) consist of Linux PCs integrated into a Beowulf Cluster, an existing Mark5 data storage system, a RAID array, an existing software correlator package (SoftC) originally developed for Delta DOR Navigation processing, and various custom-developed software processes and scripts. Parallel processing on the JVC is achieved by assigning slave nodes of the Beowulf cluster to process separate scans in parallel until all scans have been processed. Due to the single stream sequential playback of the Mark5 data, some ramp-up time is required before all nodes can have access to required scan data. Core functions of each processing step are accomplished using optimized C programs. The coordination and execution of these programs across the cluster is accomplished using Perl scripts, PostgreSQL commands, and a handful of miscellaneous system utilities. Mark5 data modules are loaded on Mark5 Data systems playback units, one per station. Data processing is started when the operator scans the Mark5 systems and runs a script that reads various configuration files and then creates an experiment-dependent status database

  4. Vertex finding by sparse model-based clustering

    Science.gov (United States)

    Frühwirth, R.; Eckstein, K.; Frühwirth-Schnatter, S.

    2016-10-01

    The application of sparse model-based clustering to the problem of primary vertex finding is discussed. The observed z-positions of the charged primary tracks in a bunch crossing are modeled by a Gaussian mixture. The mixture parameters are estimated via Markov Chain Monte Carlo (MCMC). Sparsity is achieved by an appropriate prior on the mixture weights. The results are shown and compared to clustering by the expectation-maximization (EM) algorithm.
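
    As a rough illustration, the sketch below fits a one-dimensional Gaussian mixture to simulated track z-positions, using plain EM and a variational Dirichlet-process mixture (which, like a sparsity prior, drives unused component weights towards zero) as stand-ins for the paper's MCMC estimation.

    ```python
    # Sketch of modelling track z-positions as a 1-D Gaussian mixture.
    # EM and a variational Dirichlet-process mixture are stand-ins for the
    # paper's MCMC with a sparsity-inducing prior on the weights.
    import numpy as np
    from sklearn.mixture import GaussianMixture, BayesianGaussianMixture

    rng = np.random.default_rng(3)
    z = np.concatenate([rng.normal(-2.0, 0.05, 40),    # vertex 1
                        rng.normal( 1.5, 0.05, 25),    # vertex 2
                        rng.normal( 4.0, 0.05, 10)]).reshape(-1, 1)

    em = GaussianMixture(n_components=3, random_state=0).fit(z)
    dp = BayesianGaussianMixture(n_components=10, random_state=0,
                                 weight_concentration_prior_type="dirichlet_process").fit(z)
    print(np.sort(em.means_.ravel()))    # estimated vertex positions
    print(np.round(dp.weights_, 2))      # most of the 10 weights collapse towards 0
    ```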

  5. Cluster-based localization and tracking in ubiquitous computing systems

    CERN Document Server

    Martínez-de Dios, José Ramiro; Torres-González, Arturo; Ollero, Anibal

    2017-01-01

    Localization and tracking are key functionalities in ubiquitous computing systems and techniques. In recent years a very high variety of approaches, sensors and techniques for indoor and GPS-denied environments have been developed. This book briefly summarizes the current state of the art in localization and tracking in ubiquitous computing systems focusing on cluster-based schemes. Additionally, existing techniques for measurement integration, node inclusion/exclusion and cluster head selection are also described in this book.

  6. Radiobiological analyse based on cell cluster models

    International Nuclear Information System (INIS)

    Lin Hui; Jing Jia; Meng Damin; Xu Yuanying; Xu Liangfeng

    2010-01-01

    The influence of cell cluster dimension on EUD and TCP for targeted radionuclide therapy was studied using the radiobiological method. The radiobiological features of tumor with activity-lack in core were evaluated and analyzed by associating EUD, TCP and SF. The results show that EUD will increase with the increase of tumor dimension under the activity homogeneous distribution. If the extra-cellular activity was taken into consideration, the EUD will increase 47%. Under the activity-lack in tumor center and the requirement of TCP=0.90, the α cross-fire influence of 211At could make up the maximum (48 μm)³ activity-lack for Nucleus source, but (72 μm)³ for Cytoplasm, Cell Surface, Cell and Voxel sources. In clinic, the physician could prefer the suggested dose of Cell Surface source in case of the future of local tumor control for under-dose. Generally TCP could well exhibit the effect difference between under-dose and due-dose, but not between due-dose and over-dose, which makes TCP more suitable for the therapy plan choice. EUD could well exhibit the difference between different models and activity distributions, which makes it more suitable for the research work. When the user uses EUD to study the influence of activity inhomogeneous distribution, one should keep the consistency of the configuration and volume of the former and the latter models. (authors)

  7. Authentication Based on Multilayer Clustering in Ad Hoc Networks

    Directory of Open Access Journals (Sweden)

    Suh Heyi-Sook

    2005-01-01

    Full Text Available In this paper, we describe a secure cluster-routing protocol based on a multilayer scheme in ad hoc networks. This work provides scalable, threshold authentication scheme in ad hoc networks. We present detailed security threats against ad hoc routing protocols, specifically examining cluster-based routing. Our proposed protocol, called "authentication based on multilayer clustering for ad hoc networks" (AMCAN, designs an end-to-end authentication protocol that relies on mutual trust between nodes in other clusters. The AMCAN strategy takes advantage of a multilayer architecture that is designed for an authentication protocol in a cluster head (CH using a new concept of control cluster head (CCH scheme. We propose an authentication protocol that uses certificates containing an asymmetric key and a multilayer architecture so that the CCH is achieved using the threshold scheme, thereby reducing the computational overhead and successfully defeating all identified attacks. We also use a more extensive area, such as a CCH, using an identification protocol to build a highly secure, highly available authentication service, which forms the core of our security framework.

  8. Clustering based hybrid approach for facility location problem

    Directory of Open Access Journals (Sweden)

    Ashish Sharma

    2017-12-01

    Full Text Available The main objective of the facility location problem is the utilization of a facility by the maximum possible number of customers so that profit is maximized. For instance, in services such as wireless sensor networks, Wi-Fi and repeaters, where the service area is limited, specific equipment is installed in such a way that it can be used by the maximum number of users. Here, the number of users for a particular facility is optimized with the help of a clustering technique. The study develops a model for the facility allocation problem. For the solution algorithm, a hybrid approach based on clustering and mixed integer linear programming (MILP) is proposed. The proposed method consists of two parts: in the first part, the K-means clustering technique is used, and in the second part, an MILP is solved for each cluster so that the facility which yields the maximum profit is obtained. Numerical examples with and without clustering are presented. The analysis shows that, due to clustering, the average distance between facility and customer is significantly reduced.
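
    A simplified sketch of the hybrid idea: K-means partitions the customers, then a small siting problem is solved inside each cluster. Enumerating candidate sites by coverage replaces the MILP here purely for illustration; all coordinates and the service radius are made up.

    ```python
    # Sketch of the cluster-then-site approach. A full MILP (as in the paper)
    # would use a solver; here each cluster simply picks the candidate site
    # covering the most of its customers within a service radius.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(4)
    customers = rng.uniform(0, 10, (200, 2))
    candidates = rng.uniform(0, 10, (30, 2))
    radius = 1.5

    labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(customers)
    for c in range(4):
        pts = customers[labels == c]
        covered = (np.linalg.norm(pts[:, None, :] - candidates[None, :, :], axis=2)
                   <= radius).sum(axis=0)
        best = int(covered.argmax())
        print(f"cluster {c}: site {best} covers {covered[best]} of {len(pts)} customers")
    ```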

  9. A Hybrid III stepped wedge cluster randomized trial testing an implementation strategy to facilitate the use of an evidence-based practice in VA Homeless Primary Care Treatment Programs.

    Science.gov (United States)

    Simmons, Molly M; Gabrielian, Sonya; Byrne, Thomas; McCullough, Megan B; Smith, Jeffery L; Taylor, Thom J; O'Toole, Tom P; Kane, Vincent; Yakovchenko, Vera; McInnes, D Keith; Smelson, David A

    2017-04-04

    Homeless veterans often have multiple health care and psychosocial needs, including assistance with access to housing and health care, as well as support for ongoing treatment engagement. The Department of Veterans Affairs (VA) developed specialized Homeless Patient Alignment Care Teams (HPACT) with the goal of offering an integrated, "one-stop program" to address housing and health care needs of homeless veterans. However, while 70% of HPACT's veteran enrollees have co-occurring mental health and substance use disorders, HPACT does not have a uniform, embedded treatment protocol for this subpopulation. One wraparound intervention designed to address the needs of homeless veterans with co-occurring mental health and substance use disorders which is suitable to be integrated into HPACT clinic sites is the evidence-based practice called Maintaining Independence and Sobriety through Systems Integration, Outreach, and Networking-Veterans Edition, or MISSION-Vet. Despite the promise of MISSION-Vet within HPACT clinics, implementation of an evidence-based intervention within a busy program like HPACT can be difficult. The current study is being undertaken to identify an appropriate implementation strategy for MISSION-Vet within HPACT. The study will test the implementation platform called Facilitation and compared to implementation as usual (IU). The aims of this study are as follows: (1) Compare the extent to which IU or Facilitation strategies achieve fidelity to the MISSION-Vet intervention as delivered by HPACT homeless provider staff. (2) Compare the effects of Facilitation and IU strategies on the National HPACT Performance Measures. (3) Compare the effects of IU and Facilitation on the permanent housing status. (4) Identify and describe key stakeholders' (patients, providers, staff) experiences with, and perspectives on, the barriers to, and facilitators of implementing MISSION. Type III Hybrid modified stepped wedge implementation comparing IU to Facilitation

  10. Ontology-based topic clustering for online discussion data

    Science.gov (United States)

    Wang, Yongheng; Cao, Kening; Zhang, Xiaoming

    2013-03-01

    With the rapid development of online communities, mining and extracting quality knowledge from online discussions becomes very important for the industrial and marketing sector, as well as for e-commerce applications and government. Most existing techniques model a discussion as a social network of users represented by a user-based graph, without considering the content of the discussion. In this paper we propose a new multilayered model to analyse online discussions, in which the user-based and message-based representations are combined. A novel clustering method based on frequent concept sets is used to cluster the original online discussion network into a topic space, and domain ontology is used to improve the clustering accuracy. Parallel methods are also used to make the algorithms scalable to very large data sets. Our experimental study shows that the model and algorithms are effective when analyzing large-scale online discussion data.

  11. A novel clustering algorithm based on quantum games

    International Nuclear Information System (INIS)

    Li Qiang; He Yan; Jiang Jingping

    2009-01-01

    Enormous successes have been made by quantum algorithms during the last decade. In this paper, we combine the quantum game with the problem of data clustering, and then develop a quantum-game-based clustering algorithm, in which data points in a dataset are considered as players who can make decisions and implement quantum strategies in quantum games. After each round of a quantum game, each player's expected payoff is calculated. Later, he uses a link-removing-and-rewiring (LRR) function to change his neighbors and adjust the strength of links connecting to them in order to maximize his payoff. Further, algorithms are discussed and analyzed in two cases of strategies, two payoff matrixes and two LRR functions. Consequently, the simulation results have demonstrated that data points in datasets are clustered reasonably and efficiently, and the clustering algorithms have fast rates of convergence. Moreover, the comparison with other algorithms also provides an indication of the effectiveness of the proposed approach.

  12. A time-series approach for clustering farms based on slaughterhouse health aberration data.

    Science.gov (United States)

    Hulsegge, B; de Greef, K H

    2018-05-01

    A large amount of data is collected routinely in meat inspection in pig slaughterhouses. A time series clustering approach is presented and applied that groups farms based on similar statistical characteristics of meat inspection data over time. A three-step characteristic-based clustering approach was used, based on the idea that the data contain more information than the incidence figures alone. A stratified subset containing 511,645 pigs was derived as a study set from 3.5 years of meat inspection data. The monthly averages of the incidence of pleuritis and of pneumonia of 44 Dutch farms (delivering 5149 batches to 2 pig slaughterhouses) were subjected to 1) derivation of farm-level data characteristics, 2) factor analysis and 3) clustering into groups of farms. The characteristic-based clustering was able to cluster farms for both lung aberrations. Three groups of data characteristics were informative, describing incidence, time pattern and degree of autocorrelation. The consistency of clustering similar farms was confirmed by repetition of the analysis in a larger dataset. The robustness of the clustering was tested on a substantially extended dataset. This confirmed the earlier results: three data distribution aspects make up the majority of the distinction between groups of farms, and in these groups (clusters) the majority of the farms were allocated comparably to the earlier allocation (75% and 62% for pleuritis and pneumonia, respectively). The difference between pleuritis and pneumonia in their seasonal dependency was confirmed, supporting the biological relevance of the clustering. Comparison of the identified clusters of statistically comparable farms can be used to detect farm-level risk factors causing the health aberrations, beyond comparison on disease incidence and trend alone. Copyright © 2018 Elsevier B.V. All rights reserved.
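
    A minimal sketch of characteristic-based clustering on simulated monthly series: each farm is summarised by level, trend and lag-1 autocorrelation (mirroring the incidence, time-pattern and autocorrelation groups mentioned above) and the farms are then clustered on those summaries.

    ```python
    # Sketch: summarise each farm's monthly incidence series by a few
    # statistics, then cluster the farms on the standardized characteristics.
    # The series are simulated; the three statistics are illustrative choices.
    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans

    def characteristics(series):
        t = np.arange(len(series))
        trend = np.polyfit(t, series, 1)[0]                     # linear slope
        lag1 = np.corrcoef(series[:-1], series[1:])[0, 1]       # autocorrelation
        return [series.mean(), trend, lag1]

    rng = np.random.default_rng(5)
    farms = [rng.normal(loc=m, scale=0.02, size=42) + s * np.arange(42) / 42
             for m, s in zip(rng.uniform(0.05, 0.3, 44), rng.uniform(-0.05, 0.05, 44))]

    F = StandardScaler().fit_transform([characteristics(f) for f in farms])
    print(KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(F))
    ```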

  13. clusters

    Indian Academy of Sciences (India)

    2017-09-27

    Sep 27, 2017 ... while CuCoNO, Co3NO, Cu3CoNO, Cu2Co3NO, Cu3Co3NO and Cu6CoNO clusters display stronger chemical stability. Magnetic and electronic properties are also discussed. The magnetic moment is affected by charge transfer and the spd hybridization. Keywords. CumConNO (m + n = 2–7) clusters; ...

  14. Analyzing Dynamic Probabilistic Risk Assessment Data through Topology-Based Clustering

    Energy Technology Data Exchange (ETDEWEB)

    Diego Mandelli; Dan Maljovec; BeiWang; Valerio Pascucci; Peer-Timo Bremer

    2013-09-01

    We investigate the use of a topology-based clustering technique on the data generated by dynamic event tree methodologies. The clustering technique we utilize focuses on a domain-partitioning algorithm based on topological structures known as the Morse-Smale complex, which partitions the data points into clusters based on their uniform gradient flow behavior. We perform both end state analysis and transient analysis to classify the set of nuclear scenarios. We demonstrate our methodology on a dataset generated for a sodium-cooled fast reactor during an aircraft crash scenario. The simulation tracks the temperature of the reactor as well as the time for a recovery team to fix the passive cooling system. Combined with clustering results obtained previously through mean shift methodology, we present the user with complementary views of the data that help illuminate key features that may otherwise be hidden when using a single methodology. By clustering the data, the number of relevant test cases to be selected for further analysis can be drastically reduced by selecting a representative from each cluster. Identifying the similarities of simulations within a cluster can also aid in drawing important conclusions with respect to safety analysis.

  15. Agent-based method for distributed clustering of textual information

    Science.gov (United States)

    Potok, Thomas E [Oak Ridge, TN; Reed, Joel W [Knoxville, TN; Elmore, Mark T [Oak Ridge, TN; Treadwell, Jim N [Louisville, TN

    2010-09-28

    A computer method and system for storing, retrieving and displaying information has a multiplexing agent (20) that calculates a new document vector (25) for a new document (21) to be added to the system and transmits the new document vector (25) to master cluster agents (22) and cluster agents (23) for evaluation. These agents (22, 23) perform the evaluation and return values upstream to the multiplexing agent (20) based on the similarity of the document to documents stored under their control. The multiplexing agent (20) then sends the document (21) and the document vector (25) to the master cluster agent (22), which then forwards it to a cluster agent (23) or creates a new cluster agent (23) to manage the document (21). The system also searches for stored documents according to a search query having at least one term and identifying the documents found in the search, and displays the documents in a clustering display (80) of similarity so as to indicate similarity of the documents to each other.

  16. CLUE: cluster-based retrieval of images by unsupervised learning.

    Science.gov (United States)

    Chen, Yixin; Wang, James Z; Krovetz, Robert

    2005-08-01

    In a typical content-based image retrieval (CBIR) system, target images (images in the database) are sorted by feature similarities with respect to the query. Similarities among target images are usually ignored. This paper introduces a new technique, cluster-based retrieval of images by unsupervised learning (CLUE), for improving user interaction with image retrieval systems by fully exploiting the similarity information. CLUE retrieves image clusters by applying a graph-theoretic clustering algorithm to a collection of images in the vicinity of the query. Clustering in CLUE is dynamic. In particular, clusters formed depend on which images are retrieved in response to the query. CLUE can be combined with any real-valued symmetric similarity measure (metric or nonmetric). Thus, it may be embedded in many current CBIR systems, including relevance feedback systems. The performance of an experimental image retrieval system using CLUE is evaluated on a database of around 60,000 images from COREL. Empirical results demonstrate improved performance compared with a CBIR system using the same image similarity measure. In addition, results on images returned by Google's Image Search reveal the potential of applying CLUE to real-world image data and integrating CLUE as a part of the interface for keyword-based image retrieval systems.

  17. Carbon based nanostructures: diamond clusters structured with nanotubes

    Directory of Open Access Journals (Sweden)

    O.A. Shenderova

    2003-01-01

    Full Text Available Feasibility of designing composites from carbon nanotubes and nanodiamond clusters is discussed based on atomistic simulations. Depending on nanotube size and morphology, some types of open nanotubes can be chemically connected with different facets of diamond clusters. The geometrical relation between different types of nanotubes and different diamond facets for construction of mechanically stable composites with all bonds saturated is summarized. Potential applications of the suggested nanostructures are briefly discussed based on the calculations of their electronic properties using environment dependent self-consistent tight-binding approach.

  18. Graph-based clustering and data visualization algorithms

    CERN Document Server

    Vathy-Fogarassy, Ágnes

    2013-01-01

    This work presents a data visualization technique that combines graph-based topology representation and dimensionality reduction methods to visualize the intrinsic data structure in a low-dimensional vector space. The application of graphs in clustering and visualization has several advantages. A graph of important edges (where edges characterize relations and weights represent similarities or distances) provides a compact representation of the entire complex data set. This text describes clustering and visualization methods that are able to utilize information hidden in these graphs, based on

  19. Clustering Based Approximation in Facial Image Retrieval

    OpenAIRE

    R.Pitchaiah

    2016-01-01

    The web search engine returns a great many images ranked by the keywords extracted from the surrounding text. Existing object recognition techniques train classification models from human-labelled training images, or attempt to infer the correlations/probabilities between images and annotated keywords. Although efficient in supporting the mining of similar-looking facial image results using weakly labelled ones, the learning phase of the above cluster-based c...

  20. Compositional based testing with ioco

    NARCIS (Netherlands)

    van der Bijl, H.M.; Rensink, Arend; Tretmans, G.J.; Petrenko, A.; Ulrich, A.

    2004-01-01

    Compositional testing concerns the testing of systems that consist of communicating components which can also be tested in isolation. Examples are component based testing and interoperability testing. We show that, with certain restrictions, the ioco-test theory for conformance testing is suitable

  1. Variable Selection in Model-based Clustering: A General Variable Role Modeling

    OpenAIRE

    Maugis, Cathy; Celeux, Gilles; Martin-Magniette, Marie-Laure

    2008-01-01

    The currently available variable selection procedures in model-based clustering assume that the irrelevant clustering variables are all independent or are all linked with the relevant clustering variables. We propose a more versatile variable selection model which describes three possible roles for each variable: The relevant clustering variables, the irrelevant clustering variables dependent on a part of the relevant clustering variables and the irrelevant clustering variables totally indepe...

  2. Metabolomic-based identification of clusters that reflect dietary patterns.

    Science.gov (United States)

    Gibbons, Helena; Carr, Eibhlin; McNulty, Breige A; Nugent, Anne P; Walton, Janette; Flynn, Albert; Gibney, Michael J; Brennan, Lorraine

    2017-10-01

    Classification of subjects into dietary patterns generally relies on self-reported dietary data, which are prone to error. The aim of the present study was to develop a model for objective classification of people into dietary patterns based on metabolomic data. Dietary and urinary metabolomic data from the National Adult Nutrition Survey (NANS) were used in the analysis (n = 567). Two-step cluster analysis was applied to the urinary data to identify clusters. The subsequent model was used in an independent cohort to classify people into dietary patterns. Two distinct dietary patterns were identified. Cluster 1 was characterized by significantly higher intakes of breakfast cereals, low-fat and skimmed milks, potatoes, fruit, fish and fish dishes than the other cluster, demonstrating that subjects can be classified into dietary patterns based on metabolomics data. Future applications of this approach could be developed for rapid and objective assignment of subjects into dietary patterns. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  3. Enhanced Chain-Cluster Based Mixed Routing Algorithm for Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Husam Kareem Farhan

    2017-01-01

    Full Text Available Energy efficiency is a significant aspect in designing robust routing protocols for wireless sensor networks (WSNs). A reliable routing protocol has to be energy efficient and adaptive to the network size. To achieve high energy conservation and data aggregation, there are two major techniques: clusters and chains. In the clustering technique, the sensor network is divided into non-overlapping subsets called clusters. In the chain technique, sensor nodes are connected to their two closest neighbours, starting with the node farthest from the base station and ending with the node closest to the base station. Each technique has its own advantages and disadvantages, which has motivated researchers to propose hybrid routing algorithms that combine the advantages of both the cluster and chain techniques, such as CCM (Chain-Cluster based Mixed routing). In this paper, we introduce a routing algorithm that builds on the CCM algorithm, called Enhanced Chain-Cluster based Mixed routing (E-CCM). Simulation results show that the E-CCM algorithm improves the performance of the CCM algorithm in terms of three performance metrics: energy consumption, network lifetime, and FND and LND (first and last node dies). A MATLAB program is used to develop and test the simulation process on a computer with the following specifications: Windows 7 (32-bit operating system, Core i5, 4 GB RAM, 512 GB hard disk).

  4. Component Based Testing with ioco

    NARCIS (Netherlands)

    van der Bijl, H.M.; Rensink, Arend; Tretmans, G.J.

    Component based testing concerns the integration of components which have already been tested separately. We show that, with certain restrictions, the ioco-test theory for conformance testing is suitable for component based testing, in the sense that the integration of fully conformant components is

  5. Testing for spatial clustering of amino acid replacements within protein tertiary structure

    DEFF Research Database (Denmark)

    Yu, Jiaye; Thorne, Jeffrey L

    2006-01-01

    Widely used models of protein evolution ignore protein structure. Therefore, these models do not predict spatial clustering of amino acid replacements with respect to tertiary structure. One formal and biologically implausible possibility is that there is no tendency for amino acid replacements...... to be spatially clustered during evolution. An alternative to this is that amino acid replacements are spatially clustered and this spatial clustering can be fully explained by a tendency for similar rates of amino acid replacement at sites that are nearby in protein tertiary structure. A third possibility...... is that the amount of clustering exceeds that which can be explained solely on the basis of independently evolving protein sites with spatially clustered replacement rates. We introduce two simple and not very parametric hypothesis tests that help distinguish these three possibilities. We then apply these tests...

  6. cluster

    Indian Academy of Sciences (India)

    has been investigated electrochemically in positive and negative microenvironments, both in solution and in film. Charge nature around the active centre ... in plants, bacteria and also in mammals. This cluster is also an important constituent of a ..... selection of non-cysteine amino acid in the active centre of Rieske proteins.

  7. Nonuniform Sparse Data Clustering Cascade Algorithm Based on Dynamic Cumulative Entropy

    Directory of Open Access Journals (Sweden)

    Ning Li

    2016-01-01

    Full Text Available A small amount of prior knowledge and randomly chosen initial cluster centers have a direct impact on the accuracy of an iterative clustering algorithm. In this paper we propose a new algorithm that computes initial cluster centers for k-means clustering and the optimal number of clusters with little prior knowledge, and optimizes the clustering result. It constructs a Euclidean distance control factor based on the sparseness of the aggregation density to select the initial cluster centers of nonuniform sparse data, and obtains initial data clusters by multidimensional diffusion density distribution. A multiobjective clustering approach based on dynamic cumulative entropy is then adopted to optimize the initial data clusters and the number of clusters. The experimental results show that the newly proposed algorithm performs well in obtaining initial cluster centers for the k-means algorithm and effectively improves the clustering accuracy of nonuniform sparse data by about 5%.
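
    The sketch below illustrates density-aware seeding in the same spirit: a local density is estimated for every point, and initial centres are chosen as dense points that are far from centres already selected. The scoring heuristic and radius are assumptions, not the paper's exact construction.

    ```python
    # Sketch of density-aware initial centres for k-means on nonuniform data.
    import numpy as np
    from sklearn.cluster import KMeans

    def density_seeds(X, k, radius):
        d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
        density = (d < radius).sum(axis=1)
        seeds = [int(density.argmax())]
        while len(seeds) < k:
            # score = local density * distance to the nearest chosen seed
            gap = d[:, seeds].min(axis=1)
            score = density * gap
            score[seeds] = -1
            seeds.append(int(score.argmax()))
        return X[seeds]

    rng = np.random.default_rng(6)
    X = np.vstack([rng.normal(c, 0.3, (n, 2)) for c, n in [(0, 200), (3, 40), (6, 10)]])
    init = density_seeds(X, k=3, radius=0.5)
    print(KMeans(n_clusters=3, init=init, n_init=1).fit(X).cluster_centers_)
    ```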

  8. Assessment of Random Assignment in Training and Test Sets using Generalized Cluster Analysis Technique

    Directory of Open Access Journals (Sweden)

    Sorana D. BOLBOACĂ

    2011-06-01

    Full Text Available Aim: The appropriateness of the random assignment of compounds to training and validation sets was assessed using a generalized cluster technique. Material and Method: A quantitative structure-activity relationship model using the Molecular Descriptors Family on Vertices was evaluated in terms of the assignment of carboquinone derivatives to training and test sets during the leave-many-out analysis. The assignment of compounds was investigated using five variables: observed anticancer activity and four structure descriptors. Generalized cluster analysis with the K-means algorithm was applied in order to investigate whether the assignment of compounds was proper. The Euclidean distance and maximization of the initial distance using cross-validation with a v-fold of 10 were applied. Results: All five variables included in the analysis proved to have a statistically significant contribution to the identification of clusters. Three clusters were identified, each of them containing carboquinone derivatives belonging to the training as well as to the test sets. The observed activity of carboquinone derivatives proved to be normally distributed in every cluster. The presence of training and test set compounds in all clusters identified using generalized cluster analysis with the K-means algorithm, and the distribution of observed activity within clusters, sustain a proper assignment of compounds to training and test sets. Conclusion: Generalized cluster analysis using the K-means algorithm proved to be a valid method for assessing the random assignment of carboquinone derivatives to training and test sets.
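
    A small sketch of the assignment check with simulated descriptors: cluster the pooled compounds with k-means (k = 3) and cross-tabulate cluster membership against the training/test split to confirm that both sets appear in every cluster.

    ```python
    # Sketch of checking a random train/test split via k-means clusters.
    # Data are simulated stand-ins for the activity and descriptor variables.
    import numpy as np
    import pandas as pd
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(7)
    X = rng.normal(size=(37, 5))                      # activity + 4 descriptors
    split = np.array(["train"] * 27 + ["test"] * 10)
    rng.shuffle(split)

    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
        StandardScaler().fit_transform(X))
    print(pd.crosstab(labels, split))                 # both sets should appear in each row
    ```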

  9. Grey Wolf Optimizer Based on Powell Local Optimization Method for Clustering Analysis

    Directory of Open Access Journals (Sweden)

    Sen Zhang

    2015-01-01

    Full Text Available One heuristic evolutionary algorithm recently proposed is the grey wolf optimizer (GWO, inspired by the leadership hierarchy and hunting mechanism of grey wolves in nature. This paper presents an extended GWO algorithm based on Powell local optimization method, and we call it PGWO. PGWO algorithm significantly improves the original GWO in solving complex optimization problems. Clustering is a popular data analysis and data mining technique. Hence, the PGWO could be applied in solving clustering problems. In this study, first the PGWO algorithm is tested on seven benchmark functions. Second, the PGWO algorithm is used for data clustering on nine data sets. Compared to other state-of-the-art evolutionary algorithms, the results of benchmark and data clustering demonstrate the superior performance of PGWO algorithm.

  10. Anomaly based Intrusion Detection using Modified Fuzzy Clustering

    Directory of Open Access Journals (Sweden)

    B.S. Harish

    2017-12-01

    Full Text Available This paper presents a network anomaly detection method based on fuzzy clustering. Computer security has become an increasingly vital field in computer science in response to the proliferation of private sensitive information. As a result, Intrusion Detection System has become an indispensable component of computer security. The proposed method consists of three steps: Pre-Processing, Feature Selection and Clustering. In pre-processing step, the duplicate samples are eliminated from the sample set. Next, principal component analysis is adopted to select the most discriminative features. In clustering step, the network samples are clustered using Robust Spatial Kernel Fuzzy C-Means (RSKFCM algorithm. RSKFCM is a variant of traditional Fuzzy C-Means which considers the neighbourhood membership information and uses kernel distance metric. To evaluate the proposed method, we conducted experiments on standard dataset and compared the results with state-of-the-art methods. We used cluster validity indices, accuracy and false positive rate as performance metrics. Experimental results inferred that, the proposed method achieves better results compared to other methods.
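
    A minimal sketch of the three-step pipeline on simulated data, with plain fuzzy c-means standing in for the RSKFCM variant (which additionally uses neighbourhood membership information and a kernel distance).

    ```python
    # Sketch: de-duplicate samples, reduce dimensionality with PCA, then apply
    # fuzzy c-means. Plain FCM is a stand-in for the paper's RSKFCM.
    import numpy as np
    from sklearn.decomposition import PCA

    def fuzzy_c_means(X, c=2, m=2.0, iters=100, seed=0):
        rng = np.random.default_rng(seed)
        U = rng.dirichlet(np.ones(c), size=len(X))           # memberships, shape (n, c)
        for _ in range(iters):
            Um = U ** m
            centres = (Um.T @ X) / Um.sum(axis=0)[:, None]
            d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2) + 1e-12
            U = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1))).sum(axis=2)
        return U, centres

    rng = np.random.default_rng(8)
    raw = np.vstack([rng.normal(0, 1, (300, 10)), rng.normal(4, 1, (30, 10))])
    X = np.unique(raw, axis=0)                    # step 1: remove duplicate samples
    X = PCA(n_components=3).fit_transform(X)      # step 2: dimensionality reduction
    U, _ = fuzzy_c_means(X, c=2)                  # step 3: fuzzy clustering
    print(np.bincount(U.argmax(axis=1)))          # sizes of the two fuzzy clusters
    ```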

  11. Information bottleneck based incremental fuzzy clustering for large biomedical data.

    Science.gov (United States)

    Liu, Yongli; Wan, Xing

    2016-08-01

    Incremental fuzzy clustering combines advantages of fuzzy clustering and incremental clustering, and therefore is important in classifying large biomedical literature. Conventional algorithms, suffering from data sparsity and high-dimensionality, often fail to produce reasonable results and may even assign all the objects to a single cluster. In this paper, we propose two incremental algorithms based on information bottleneck, Single-Pass fuzzy c-means (spFCM-IB) and Online fuzzy c-means (oFCM-IB). These two algorithms modify conventional algorithms by considering different weights for each centroid and object and scoring mutual information loss to measure the distance between centroids and objects. spFCM-IB and oFCM-IB are used to group a collection of biomedical text abstracts from Medline database. Experimental results show that clustering performances of our approaches are better than such prominent counterparts as spFCM, spHFCM, oFCM and oHFCM, in terms of accuracy. Copyright © 2016 Elsevier Inc. All rights reserved.

  12. GENERALISED MODEL BASED CONFIDENCE INTERVALS IN TWO STAGE CLUSTER SAMPLING

    Directory of Open Access Journals (Sweden)

    Christopher Ouma Onyango

    2010-09-01

    Full Text Available Chambers and Dorfman (2002) constructed bootstrap confidence intervals in model-based estimation for finite population totals, assuming that auxiliary values are available throughout a target population and that the auxiliary values are independent. They also assumed that the cluster sizes are known throughout the target population. We now extend this to two-stage sampling, in which the cluster sizes are known only for the sampled clusters, and we therefore predict the unobserved part of the population total. Jan and Elinor (2008) have done similar work, but unlike them, we use a general model in which the auxiliary values are not necessarily independent. We demonstrate that the asymptotic properties of our proposed estimator and its coverage rates are better than those constructed under the model-assisted local polynomial regression model.

  13. VANET Clustering Based Routing Protocol Suitable for Deserts

    Science.gov (United States)

    Mohammed Nasr, Mohammed Mohsen; Abdelgader, Abdeldime Mohamed Salih; Wang, Zhi-Gong; Shen, Lian-Feng

    2016-01-01

    In recent years, applications of vehicular ad hoc networks (VANETs) have emerged in security, safety, rescue, exploration, military and communication-redundancy systems for non-populated areas, besides their ordinary use in urban environments as an essential part of intelligent transportation systems (ITS). This paper proposes a novel algorithm for the process of organizing a cluster structure and cluster head election (CHE) suitable for VANETs. Moreover, it presents a robust clustering-based routing protocol, which is appropriate for deserts and can achieve high communication efficiency, ensuring reliable information delivery and optimal exploitation of the equipment on each vehicle. A comprehensive simulation is conducted to evaluate the performance of the proposed CHE and routing algorithms. PMID:27058539

  14. Using Clustering Techniques To Detect Usage Patterns in a Web-based Information System.

    Science.gov (United States)

    Chen, Hui-Min; Cooper, Michael D.

    2001-01-01

    This study developed an analytical approach to detecting groups with homogenous usage patterns in a Web-based information system. Principal component analysis was used for data reduction, cluster analysis for categorizing usage into groups. The methodology was demonstrated and tested using two independent samples of user sessions from the…

  15. Unsupervised active learning based on hierarchical graph-theoretic clustering.

    Science.gov (United States)

    Hu, Weiming; Hu, Wei; Xie, Nianhua; Maybank, Steve

    2009-10-01

    Most existing active learning approaches are supervised. Supervised active learning has the following problems: inefficiency in dealing with the semantic gap between the distribution of samples in the feature space and their labels, lack of ability in selecting new samples that belong to new categories that have not yet appeared in the training samples, and lack of adaptability to changes in the semantic interpretation of sample categories. To tackle these problems, we propose an unsupervised active learning framework based on hierarchical graph-theoretic clustering. In the framework, two promising graph-theoretic clustering algorithms, namely, dominant-set clustering and spectral clustering, are combined in a hierarchical fashion. Our framework has some advantages, such as ease of implementation, flexibility in architecture, and adaptability to changes in the labeling. Evaluations on data sets for network intrusion detection, image classification, and video classification have demonstrated that our active learning framework can effectively reduce the workload of manual classification while maintaining a high accuracy of automatic classification. It is shown that, overall, our framework outperforms the support-vector-machine-based supervised active learning, particularly in terms of dealing much more efficiently with new samples whose categories have not yet appeared in the training samples.

  16. The swift UVOT stars survey. I. Methods and test clusters

    International Nuclear Information System (INIS)

    Siegel, Michael H.; Porterfield, Blair L.; Linevsky, Jacquelyn S.; Bond, Howard E.; Hoversten, Erik A.; Berrier, Joshua L.; Gronwall, Caryl A.; Holland, Stephen T.; Breeveld, Alice A.; Brown, Peter J.

    2014-01-01

    We describe the motivations and background of a large survey of nearby stellar populations using the Ultraviolet Optical Telescope (UVOT) on board the Swift Gamma-Ray Burst Mission. UVOT, with its wide field, near-UV sensitivity, and 2.″3 spatial resolution, is uniquely suited to studying nearby stellar populations and providing insight into the near-UV properties of hot stars and the contribution of those stars to the integrated light of more distant stellar populations. We review the state of UV stellar photometry, outline the survey, and address problems specific to wide- and crowded-field UVOT photometry. We present color–magnitude diagrams of the nearby open clusters M67, NGC 188, and NGC 2539, and the globular cluster M79. We demonstrate that UVOT can easily discern the young- and intermediate-age main sequences, blue stragglers, and hot white dwarfs, producing results consistent with previous studies. We also find that it characterizes the blue horizontal branch of M79 and easily identifies a known post-asymptotic giant branch star.

  17. A nonparametric Bayesian approach for clustering bisulfate-based DNA methylation profiles.

    Science.gov (United States)

    Zhang, Lin; Meng, Jia; Liu, Hui; Huang, Yufei

    2012-01-01

    DNA methylation occurs in the context of a CpG dinucleotide. It is an important epigenetic modification, which can be inherited through cell division. The two major types of methylation include hypomethylation and hypermethylation. Unique methylation patterns have been shown to exist in diseases including various types of cancer. DNA methylation analysis promises to become a powerful tool in cancer diagnosis, treatment and prognostication. Large-scale methylation arrays are now available for studying methylation genome-wide. The Illumina methylation platform simultaneously measures cytosine methylation at more than 1500 CpG sites associated with over 800 cancer-related genes. Cluster analysis is often used to identify DNA methylation subgroups for prognosis and diagnosis. However, due to the unique non-Gaussian characteristics, traditional clustering methods may not be appropriate for DNA methylation data, and the determination of the optimal cluster number is still problematic. A Dirichlet process beta mixture model (DPBMM) is proposed that models the DNA methylation expressions as a mixture of an infinite number of beta distributions. The model allows automatic learning of the relevant parameters such as the cluster mixing proportion, the parameters of the beta distribution for each cluster, and especially the number of potential clusters. Since the model is high dimensional and analytically intractable, we propose a Gibbs sampling "no-gaps" solution for computing the posterior distributions, hence the estimates of the parameters. The proposed algorithm was tested on simulated data as well as methylation data from 55 Glioblastoma multiforme (GBM) brain tissue samples. To reduce the computational burden due to the high data dimensionality, a dimension reduction method is adopted. The two GBM clusters yielded by DPBMM are based on different numbers of loci (P-value < 0.1), while hierarchical clustering cannot yield statistically significant clusters.

  18. Energy Aware Cluster Based Routing Scheme For Wireless Sensor Network

    Directory of Open Access Journals (Sweden)

    Roy Sohini

    2015-09-01

    Full Text Available The Wireless Sensor Network (WSN) has emerged as an important supplement to modern wireless communication systems due to its wide range of applications. Recent research handles the various challenges of sensor networks more gracefully; however, energy efficiency remains a matter of concern. Meeting countless security needs, delivering data on time and acting quickly, selecting efficient routes and supporting multi-path routing can only be achieved at the cost of energy. Hierarchical routing is particularly useful in this regard. The proposed algorithm, the Energy Aware Cluster Based Routing Scheme (EACBRS), aims at conserving energy with the help of hierarchical routing by calculating the optimum number of cluster heads for the network, selecting energy-efficient routes to the sink and offering congestion control. Simulation results show that EACBRS performs better than existing hierarchical routing algorithms such as the Distributed Energy-Efficient Clustering (DEEC) algorithm for heterogeneous wireless sensor networks and the Energy Efficient Heterogeneous Clustered scheme for Wireless Sensor Networks (EEHC).

  19. Saccharomyces cerevisiae-based system for studying clustered DNA damages

    Energy Technology Data Exchange (ETDEWEB)

    Moscariello, M.M.; Sutherland, B.

    2010-08-01

    DNA-damaging agents can induce clustered lesions or multiply damaged sites (MDSs) on the same or opposing DNA strands. In the latter, attempts to repair MDS can generate closely opposed single-strand break intermediates that may convert non-lethal or mutagenic base damage into double-strand breaks (DSBs). We constructed a diploid S. cerevisiae yeast strain with a chromosomal context targeted by integrative DNA fragments carrying different damages to determine whether closely opposed base damages are converted to DSBs following the outcomes of the homologous recombination repair pathway. As a model of MDS, we studied clustered uracil DNA damages with a known location and a defined distance separating the lesions. The system we describe might well be extended to assessing the repair of MDSs with different compositions, and to most of the complex DNA lesions induced by physical and chemical agents.

  20. Computer-Assisted Sleep Staging Based on Segmentation and Clustering

    Science.gov (United States)

    2001-10-25

    Rajeev Agarwal and Jean Gotman, Stellate Systems, Montreal, Canada. The study included male (9) and female (3) subjects with different sleep-related complaints (8 normals and 4 with different pathologies); the age of the subjects ranged from 17 to 63. The proposed segmentation-and-clustering method can be used to automatically classify sleep states in an all-night polysomnogram (PSG) to generate a hypnogram for the assessment of sleep-related complaints.

  1. Hierarchical clustering of RGB surface water images based on MIA ...

    African Journals Online (AJOL)

    The images thus characterised were partitioned into clusters of similar images using hierarchical clustering. The best-defined clusters were obtained when Ward's method was applied. Images were partitioned into 2 main clusters in terms of the similar colours of the displayed objects. Each main cluster was further partitioned ...

  2. Seminal Quality Prediction Using Clustering-Based Decision Forests

    Directory of Open Access Journals (Sweden)

    Hong Wang

    2014-08-01

    Full Text Available Prediction of seminal quality with statistical learning tools is an emerging methodology in decision support systems in biomedical engineering and is very useful for early diagnosis of seminal patients and the selection of semen donor candidates. However, as is common in medical diagnosis, seminal quality prediction faces the class imbalance problem. In this paper, we propose a novel supervised ensemble learning approach, namely Clustering-Based Decision Forests, to tackle the unbalanced class learning problem in seminal quality prediction. Experimental results on a real fertility diagnosis dataset show that Clustering-Based Decision Forests outperforms decision trees, Support Vector Machines, random forests, multilayer perceptron neural networks and logistic regression by a noticeable margin. Clustering-Based Decision Forests can also be used to evaluate variables' importance, and the top five important factors that may affect semen concentration obtained in this study are age, serious trauma, sitting time, the season when the semen sample is produced, and high fevers in the last year. The findings could be helpful in explaining seminal concentration problems in infertile males or in pre-screening semen donor candidates.

  3. Flocking-based Document Clustering on the Graphics Processing Unit

    Energy Technology Data Exchange (ETDEWEB)

    Cui, Xiaohui [ORNL; Potok, Thomas E [ORNL; Patton, Robert M [ORNL; ST Charles, Jesse Lee [ORNL

    2008-01-01

    Analyzing and grouping documents by content is a complex problem. One explored method of solving this problem borrows from nature, imitating the flocking behavior of birds. Each bird represents a single document and flies toward other documents that are similar to it. One limitation of this method of document clustering is its complexity O(n²). As the number of documents grows, it becomes increasingly difficult to receive results in a reasonable amount of time. However, flocking behavior, along with most naturally inspired algorithms such as ant colony optimization and particle swarm optimization, is highly parallel and has found increased performance on expensive cluster computers. In the last few years, the graphics processing unit (GPU) has received attention for its ability to solve highly-parallel and semi-parallel problems much faster than the traditional sequential processor. Some applications see a huge increase in performance on this new platform. The cost of these high-performance devices is also marginal when compared with the price of cluster machines. In this paper, we have conducted research to exploit this architecture and apply its strengths to the document flocking problem. Our results highlight the potential benefit the GPU brings to all naturally inspired algorithms. Using the CUDA platform from NVIDIA, we developed a document flocking implementation to be run on the NVIDIA GeForce 8800. Additionally, we developed a similar but sequential implementation of the same algorithm to be run on a desktop CPU. We tested the performance of each on groups of news articles ranging in size from 200 to 3000 documents. The results of these tests were very significant. Performance gains ranged from three to nearly five times improvement of the GPU over the CPU implementation. This dramatic improvement in runtime makes the GPU a potentially revolutionary platform for document clustering algorithms.

  4. A Novel Clustering Methodology Based on Modularity Optimisation for Detecting Authorship Affinities in Shakespearean Era Plays.

    Directory of Open Access Journals (Sweden)

    Leila M Naeni

    Full Text Available In this study we propose a novel, unsupervised clustering methodology for analyzing large datasets. This new, efficient methodology converts the general clustering problem into the community detection problem in a graph by using the Jensen-Shannon distance, a dissimilarity measure originating in Information Theory. Moreover, we use graph-theoretic concepts for the generation and analysis of proximity graphs. Our methodology is based on a newly proposed memetic algorithm (iMA-Net) for discovering clusters of data elements by maximizing the modularity function in proximity graphs of literary works. To test the effectiveness of this general methodology, we apply it to a text corpus dataset, which contains frequencies of approximately 55,114 unique words across all 168 plays written in the Shakespearean era (16th and 17th centuries), to analyze and detect clusters of similar plays. Experimental results and comparison with state-of-the-art clustering methods demonstrate the remarkable performance of our new method for identifying high-quality clusters which reflect the commonalities in the literary style of the plays.
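
    A rough sketch of the overall pipeline under stated assumptions: word-frequency profiles are compared with the Jensen-Shannon distance, a k-nearest-neighbour proximity graph is built, and clusters are obtained by modularity maximisation. The greedy modularity routine stands in for the memetic iMA-Net algorithm, and the data are random placeholders.

```python
import numpy as np
import networkx as nx
from scipy.spatial.distance import jensenshannon
from networkx.algorithms.community import greedy_modularity_communities

# hypothetical word-frequency profiles: rows = plays, columns = vocabulary counts
rng = np.random.default_rng(0)
counts = rng.integers(0, 20, size=(30, 200)).astype(float)
profiles = counts / counts.sum(axis=1, keepdims=True)          # normalise to distributions

n = len(profiles)
D = np.array([[jensenshannon(profiles[i], profiles[j]) for j in range(n)] for i in range(n)])

# proximity graph: connect each play to its k nearest neighbours in JS distance
G = nx.Graph()
k = 5
for i in range(n):
    for j in np.argsort(D[i])[1:k + 1]:
        G.add_edge(i, int(j), weight=1.0 - D[i, j])             # similarity as edge weight

# community detection by modularity maximisation (stand-in for the memetic iMA-Net)
communities = greedy_modularity_communities(G, weight="weight")
for c, members in enumerate(communities):
    print(f"cluster {c}: plays {sorted(members)}")
```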

  5. Multi-documents summarization based on clustering of learning object using hierarchical clustering

    Science.gov (United States)

    Mustamiin, M.; Budi, I.; Santoso, H. B.

    2018-03-01

    The Open Educational Resources (OER) portal offers teaching, learning and research resources that are in the public domain and freely accessible. Learning contents or Learning Objects (LO) are granular and can be reused to construct new learning materials. LO ontology-based searching techniques can be used to search for LO in the Indonesia OER. In this research, LO from search results are used as ingredients to create new learning materials according to the topic searched by users. Summarization based on grouping LO with Hierarchical Agglomerative Clustering (HAC), using the dependency context of the user's query, achieves an average F-measure of 0.487, while summarization based on K-means achieves an average F-measure of only 0.336.
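
    A small illustrative sketch of summarisation by clustering with Hierarchical Agglomerative Clustering: snippets are grouped and one representative sentence is kept per cluster. The snippets, vectorisation and selection rule are assumptions, not the paper's exact setup.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics.pairwise import cosine_similarity

# hypothetical snippets retrieved from learning objects for one query
snippets = [
    "Sorting algorithms order the elements of a list.",
    "Quicksort is a divide and conquer sorting algorithm.",
    "Binary search finds an item in a sorted array.",
    "Merge sort splits the list and merges sorted halves.",
    "Hash tables map keys to values for fast lookup.",
    "Search trees support ordered lookups and range queries.",
]

X = TfidfVectorizer(stop_words="english").fit_transform(snippets).toarray()
labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)

# pick one representative sentence per cluster: the one closest to its cluster centroid
summary = []
for k in np.unique(labels):
    idx = np.where(labels == k)[0]
    centroid = X[idx].mean(axis=0, keepdims=True)
    summary.append(snippets[idx[np.argmax(cosine_similarity(X[idx], centroid))]])
print("\n".join(summary))
```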

  6. Tests for informative cluster size using a novel balanced bootstrap scheme.

    Science.gov (United States)

    Nevalainen, Jaakko; Oja, Hannu; Datta, Somnath

    2017-07-20

    Clustered data are often encountered in biomedical studies, and to date, a number of approaches have been proposed to analyze such data. However, the phenomenon of informative cluster size (ICS) is a challenging problem, and its presence has an impact on the choice of a correct analysis methodology. For example, Dutta and Datta (2015, Biometrics) presented a number of marginal distributions that could be tested. Depending on the nature and degree of informativeness of the cluster size, these marginal distributions may differ, as do the choices of the appropriate test. In particular, they applied their new test to a periodontal data set where the plausibility of the informativeness was mentioned, but no formal test for it was conducted. We propose bootstrap tests for testing the presence of ICS. A balanced bootstrap method is developed to successfully estimate the null distribution by merging the re-sampled observations with closely matching counterparts. Relying on the assumption of exchangeability within clusters, the proposed procedure performs well in simulations even with a small number of clusters, at different distributions and against different alternative hypotheses, thus making it an omnibus test. We also explain how to extend the ICS test to a regression setting, thereby enhancing its practical utility. The methodologies are illustrated using the periodontal data set mentioned earlier. Copyright © 2017 John Wiley & Sons, Ltd.
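
    The following is not the balanced bootstrap scheme of the paper, but a simple permutation check in the same spirit: it tests whether cluster size is associated with the within-cluster mean, which is one symptom of informative cluster size. The data and effect size are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical clustered data: larger clusters drawn with a slightly higher mean (informative size)
sizes = rng.integers(2, 15, size=60)
clusters = [rng.normal(loc=0.05 * n, scale=1.0, size=n) for n in sizes]

cluster_means = np.array([c.mean() for c in clusters])
obs_stat = np.corrcoef(sizes, cluster_means)[0, 1]      # association of size with cluster mean

# permutation null: break any size/mean association by shuffling the sizes
n_perm = 5000
null = np.empty(n_perm)
for b in range(n_perm):
    null[b] = np.corrcoef(rng.permutation(sizes), cluster_means)[0, 1]

p_value = np.mean(np.abs(null) >= abs(obs_stat))
print(f"corr(size, cluster mean) = {obs_stat:.3f}, permutation p ≈ {p_value:.4f}")
```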

  7. INTERSECTION DETECTION BASED ON QUALITATIVE SPATIAL REASONING ON STOPPING POINT CLUSTERS

    Directory of Open Access Journals (Sweden)

    S. Zourlidou

    2016-06-01

    Full Text Available The purpose of this research is to propose and test a method for detecting intersections by analysing collectively acquired trajectories of moving vehicles. Instead of relying solely on the geometric features of the trajectories, such as heading changes, which may indicate turning points and consequently intersections, we extract semantic features of the trajectories in the form of sequences of stops and moves. Under this spatiotemporal prism, the extracted semantic information, which indicates where vehicles stop, can reveal important locations such as junctions. The advantage of the proposed approach in comparison with existing turning-point oriented approaches is that it can detect intersections even when not all the crossing road segments are sampled and therefore no turning points are observed in the trajectories. The challenge with this approach is, first of all, that not all vehicles stop at the same location, so the stop location is blurred along the direction of the road; secondly, this leads to the effect that nearby junctions can induce similar stop locations. As a first step, density-based clustering is applied on the layer of stop observations and clusters of stop events are found. Representative points of the clusters are determined (one per cluster) and, in a last step, the existence of an intersection is clarified based on spatial relational cluster reasoning, with which less informative geospatial clusters, in terms of whether a junction exists and where its centre lies, are transformed into more informative ones. Relational reasoning criteria, based on the relative orientation of the clusters with respect to their adjacent ones, are discussed for making sense of the relation that connects them, and finally for forming groups of stop events that belong to the same junction.
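
    A minimal sketch of the first step only (density-based clustering of the stop layer and one representative point per cluster), using DBSCAN on synthetic stop coordinates; the relational reasoning step is not shown.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# hypothetical stop events (x, y in metres) extracted from vehicle trajectories:
# two nearby junctions plus scattered mid-road stops
rng = np.random.default_rng(0)
junction_a = rng.normal([0, 0], 8, size=(80, 2))
junction_b = rng.normal([120, 15], 8, size=(60, 2))
noise = rng.uniform(-50, 200, size=(25, 2))
stops = np.vstack([junction_a, junction_b, noise])

# density-based clustering of the stop layer: eps in metres, min_samples = required stop density
labels = DBSCAN(eps=15, min_samples=10).fit_predict(stops)

for k in set(labels) - {-1}:
    centre = stops[labels == k].mean(axis=0)
    print(f"stop cluster {k}: {np.sum(labels == k)} stops, representative point {centre.round(1)}")
print(f"{np.sum(labels == -1)} stop events treated as noise")
```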

  8. 12 T test module coil (TMC-II) in the cluster test program

    International Nuclear Information System (INIS)

    Ando, T.; Nakajima, H.; Nishi, M.; Okuno, K.; Shimamoto, S.; Tada, E.; Takahashi, Y.; Yasukochi, K.; Yoshida, K.

    1983-01-01

    A 12 T forced-flow cooled test module coil (TMC-II) has been designed as the second step of the cluster test program, which is a demonstration program for high-field toroidal coils in tokamak fusion machines. The TMC-II is a dee-shaped coil with a 0.76 m x 0.92 m winding bore, and the average current density in the winding is 37.5 A/mm² at an operating current of 11.4 kA. A cable-in-conduit type conductor, which consists of 60 strands, is selected for the TMC-II. Each strand is composed of multi-filamentary (MF) Nb3Sn superconducting material, with copper and aluminum as the stabilizer. A thermal barrier is mounted on the surface of each strand to decrease the thermal input due to friction between strands. To obtain rigidity of the winding, the cable-in-conduit conductor, covered by insulator, is placed in a stainless steel ''Armor'', which is welded to the ''Armor'' of the next turn. This advanced design has been performed with strong research and development support.

  9. Spectral methods and cluster structure in correlation-based networks

    Science.gov (United States)

    Heimo, Tapio; Tibély, Gergely; Saramäki, Jari; Kaski, Kimmo; Kertész, János

    2008-10-01

    We investigate how in complex systems the eigenpairs of the matrices derived from the correlations of multichannel observations reflect the cluster structure of the underlying networks. For this we use daily return data from the NYSE and focus specifically on the spectral properties of the weight matrix W_ij = |C_ij| - δ_ij and the diffusion matrix D_ij = W_ij/s_j - δ_ij, where C is the correlation matrix and s_i = Σ_j W_ij is the strength of node i. The eigenvalues (and corresponding eigenvectors) of the weight matrix are ranked in descending order. As in the earlier observations, the first eigenvector stands for a measure of the market correlations. Its components are, to first approximation, equal to the strengths of the nodes and there is a second order, roughly linear, correction. The high-ranking eigenvectors, excluding the highest ranking one, are usually assigned to market sectors and industrial branches. Our study shows that both for weight and diffusion matrices the eigenpair analysis is not capable of easily deducing the cluster structure of the network without a priori knowledge. In addition we have studied the clustering of stocks using the asset graph approach with and without spectrum-based noise filtering. It turns out that asset graphs are quite insensitive to noise and there is no sharp percolation transition as a function of the ratio of bonds included, thus no natural threshold value for that ratio seems to exist. We suggest that these observations can be of use for other correlation-based networks as well.
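
    A small numerical sketch of the matrices defined above on synthetic return data, assuming a one-factor "market mode"; it only illustrates that the leading eigenvector of W roughly tracks the node strengths.

```python
import numpy as np

# hypothetical daily log-returns for N assets over T days
rng = np.random.default_rng(0)
T, N = 1000, 50
market = rng.normal(0, 1, size=(T, 1))
returns = 0.4 * market + rng.normal(0, 1, size=(T, N))    # common "market mode" plus noise

C = np.corrcoef(returns, rowvar=False)                    # N x N correlation matrix
W = np.abs(C) - np.eye(N)                                 # weight matrix  W_ij = |C_ij| - delta_ij
s = W.sum(axis=1)                                         # node strengths s_i = sum_j W_ij
D = W / s[None, :] - np.eye(N)                            # diffusion matrix, shown only for the definition

eigval, eigvec = np.linalg.eigh(W)                        # W is symmetric
order = np.argsort(eigval)[::-1]
v1 = eigvec[:, order[0]]
v1 = v1 if v1.sum() > 0 else -v1                          # fix sign for comparison

# the leading eigenvector should roughly track the node strengths (the "market mode")
print("corr(first eigenvector, strengths):", round(np.corrcoef(v1, s)[0, 1], 3))
```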

  10. Cluster chain based energy efficient routing protocol for mobile WSN

    Directory of Open Access Journals (Sweden)

    WU Ziyu

    2016-04-01

    Full Text Available With ubiquitous smart devices acting as mobile sensor nodes in wireless sensor networks (WSNs) to sense and transmit physical information, routing protocols should be designed to accommodate mobility issues, in addition to conventional considerations on energy efficiency. However, due to frequent topology changes, traditional routing schemes cannot perform well. Moreover, the existence of mobile nodes poses new challenges on energy dissipation and packet loss. In this paper, a novel routing scheme called cluster chain based routing protocol (CCBRP) is proposed, which employs a combination of cluster and chain structure to accomplish data collection and transmission and thereafter selects qualified cluster heads as chain leaders to transmit data to the sink. Furthermore, node mobility is handled based on periodical membership updates of mobile nodes. Simulation results demonstrate that CCBRP has a good performance in terms of network lifetime and packet delivery, and also strikes a better balance between successful packet reception and energy consumption.

  11. Medical Imaging Lesion Detection Based on Unified Gravitational Fuzzy Clustering

    Directory of Open Access Journals (Sweden)

    Jean Marie Vianney Kinani

    2017-01-01

    Full Text Available We develop a swift, robust, and practical tool for detecting brain lesions with minimal user intervention to assist clinicians and researchers in the diagnosis process, radiosurgery planning, and assessment of the patient's response to the therapy. We propose a unified gravitational fuzzy clustering-based segmentation algorithm, which integrates the Newtonian concept of gravity into fuzzy clustering. We first perform fuzzy rule-based image enhancement on our database, which is comprised of T1/T2 weighted magnetic resonance (MR) and fluid-attenuated inversion recovery (FLAIR) images, to facilitate a smoother segmentation. The scalar output obtained is fed into a gravitational fuzzy clustering algorithm, which separates healthy structures from the unhealthy. Finally, the lesion contour is automatically outlined through the initialization-free level set evolution method. An advantage of this lesion detection algorithm is its precision and its simultaneous use of features computed from the intensity properties of the MR scan in a cascading pattern, which makes the computation fast, robust, and self-contained. Furthermore, we validate our algorithm with large-scale experiments using clinical and synthetic brain lesion datasets. As a result, an 84%–93% overlap performance is obtained, with an emphasis on robustness with respect to different and heterogeneous types of lesion and a swift computation time.

  12. Green Clustering Implementation Based on DPS-MOPSO

    Directory of Open Access Journals (Sweden)

    Yang Lu

    2014-01-01

    Full Text Available A green clustering implementation is proposed as the first method in the framework of an energy-efficient strategy for centralized enterprise high-density WLANs. Traditionally, to maintain network coverage, all of the APs within the WLAN have to be powered on. Nevertheless, the new algorithm can power off a large proportion of APs while maintaining the same coverage as the always-on counterpart. The proposed algorithm is composed of two parallel and concurrent procedures: a faster procedure based on K-means and a more accurate procedure based on Dynamic Population Size Multiple Objective Particle Swarm Optimization (DPS-MOPSO). To implement green clustering efficiently and accurately, dynamic population size and mutational operators are introduced as complements to the classical MOPSO. In addition to the function of AP selection, the new green clustering algorithm has another function as a reference and guide for AP deployment. This paper also presents simulations in scenarios modeled with the ray-tracing method and the FDTD technique; the results show that about 67% to 90% of energy consumption can be saved while the original network coverage is maintained during periods when few users are online or the traffic load is low.

  13. Research on Bridge Sensor Validation Based on Correlation in Cluster

    Directory of Open Access Journals (Sweden)

    Huang Xiaowei

    2016-01-01

    Full Text Available In order to avoid the false alarms and missed alarms caused by sensor malfunction or failure, it is critical to diagnose faults and analyze failures of the sensor measuring systems in major infrastructures. Based on the real-time monitoring of bridges and a study of the correlation probability distribution between the multiple sensors adopted in the fault diagnosis system, a clustering algorithm based on k-medoids is proposed, which divides sensors of the same type into k clusters. Meanwhile, the value of k is optimized by a specially designed evaluation function. Along with a further study of the correlation of sensors within the same cluster, this paper presents the definition and corresponding calculation algorithm of a sensor's validation. The algorithm is applied to the analysis of sensor data from an actual health monitoring system. The result reveals that the algorithm can not only accurately measure the failure degree and locate the malfunction in the time domain but also quantitatively evaluate the performance of sensors and eliminate diagnosis errors caused by the failure of the reference sensor.
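
    A bare-bones k-medoids (PAM-style) sketch on a correlation-distance matrix between sensors of the same type; the alternating update and the toy sensor signals are illustrative, not the paper's exact algorithm or its evaluation function for choosing k.

```python
import numpy as np

def k_medoids(D, k, n_iter=100, seed=0):
    """Plain alternating k-medoids on a precomputed distance matrix D."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(len(D), size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)                 # assign to nearest medoid
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if len(members):
                new_medoids[j] = members[np.argmin(D[np.ix_(members, members)].sum(axis=1))]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids, labels

# hypothetical sensor readings: 12 strain sensors, two groups that behave similarly
rng = np.random.default_rng(1)
base_a, base_b = rng.normal(size=(2, 500))
readings = np.vstack([base_a + 0.1 * rng.normal(size=(6, 500)),
                      base_b + 0.1 * rng.normal(size=(6, 500))])

corr = np.corrcoef(readings)
D = 1.0 - np.abs(corr)                        # correlation distance between sensors
medoids, labels = k_medoids(D, k=2)
print("medoid sensors:", medoids, "cluster labels:", labels)
```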

  14. Image Registration Algorithm Based on Parallax Constraint and Clustering Analysis

    Science.gov (United States)

    Wang, Zhe; Dong, Min; Mu, Xiaomin; Wang, Song

    2018-01-01

    To resolve the problems of slow computation speed and low matching accuracy in image registration, a new image registration algorithm based on a parallax constraint and clustering analysis is proposed. Firstly, the Harris corner detection algorithm is used to extract the feature points of the two images. Secondly, the Normalized Cross Correlation (NCC) function is used to perform approximate matching of the feature points, yielding the initial feature pairs. Then, according to the parallax constraint condition, the initial feature pairs are preprocessed by the K-means clustering algorithm, which removes feature point pairs with obvious errors from the approximate matching step. Finally, the Random Sample Consensus (RANSAC) algorithm is adopted to optimize the feature points and obtain the final feature point matching result, achieving fast and accurate image registration. The experimental results show that the image registration algorithm proposed in this paper can improve the accuracy of image matching while ensuring the real-time performance of the algorithm.

  15. Clustering-based analysis for residential district heating data

    DEFF Research Database (Denmark)

    Gianniou, Panagiota; Liu, Xiufeng; Heller, Alfred

    2018-01-01

    The wide use of smart meters enables collection of a large amount of fine-granular time series, which can be used to improve the understanding of consumption behavior and used for consumption optimization. This paper presents a clustering-based knowledge discovery in databases method to analyze residential heating consumption data and evaluate information included in national building databases. The proposed method uses the K-means algorithm to segment consumption groups based on consumption intensity and representative patterns and ranks the groups according to daily consumption. This paper also…
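
    A minimal sketch of the segmentation idea under stated assumptions: daily profiles are separated into intensity (daily total) and pattern (normalised shape), the shapes are grouped with K-means, and the groups are ranked by mean daily consumption. The meter data are synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans

# hypothetical smart-meter data: hourly heat consumption for 200 dwellings over one day
rng = np.random.default_rng(0)
hours = np.arange(24)
morning = np.exp(-0.5 * ((hours - 7) / 2.0) ** 2)
evening = np.exp(-0.5 * ((hours - 19) / 2.5) ** 2)
profiles = np.array([rng.uniform(5, 25) * (rng.uniform(0.3, 1.0) * morning +
                                           rng.uniform(0.3, 1.0) * evening)
                     + rng.normal(0, 0.3, 24) for _ in range(200)])

daily_total = profiles.sum(axis=1)
shape = profiles / daily_total[:, None]                # separate intensity from pattern

labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(shape)

# rank consumption groups by their average daily total, as in the grouping idea above
order = np.argsort([daily_total[labels == k].mean() for k in range(4)])
for rank, k in enumerate(order, start=1):
    print(f"rank {rank}: cluster {k}, mean daily consumption {daily_total[labels == k].mean():.1f} kWh")
```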

  16. Practice-related changes in neural activation patterns investigated via wavelet-based clustering analysis

    Science.gov (United States)

    Lee, Jinae; Park, Cheolwoo; Dyckman, Kara A.; Lazar, Nicole A.; Austin, Benjamin P.; Li, Qingyang; McDowell, Jennifer E.

    2012-01-01

    Objectives To evaluate brain activation using functional magnetic resonance imaging (fMRI) and specifically, activation changes across time associated with practice-related cognitive control during eye movement tasks. Experimental design Participants were engaged in antisaccade performance (generating a glance away from a cue) while fMR images were acquired during two separate time points: 1) at pre-test before any exposure to the task, and 2) at post-test, after one week of daily practice on antisaccades, prosaccades (glancing towards a target) or fixation (maintaining gaze on a target). Principal observations The three practice groups were compared across the two time points, and analyses were conducted via the application of a model-free clustering technique based on wavelet analysis. This series of procedures was developed to avoid analysis problems inherent in fMRI data and was composed of several steps: detrending, data aggregation, wavelet transform and thresholding, no trend test, principal component analysis and K-means clustering. The main clustering algorithm was built in the wavelet domain to account for temporal correlation. We applied a no trend test based on wavelets to significantly reduce the high dimension of the data. We clustered the thresholded wavelet coefficients of the remaining voxels using the principal component analysis K-means clustering. Conclusion Over the series of analyses, we found that the antisaccade practice group was the only group to show decreased activation from pre- to post-test in saccadic circuitry, particularly evident in supplementary eye field, frontal eye fields, superior parietal lobe, and cuneus. PMID:22505290
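
    A loose, simplified sketch of the wavelet-then-cluster idea (wavelet decomposition, coefficient thresholding, principal component analysis, K-means) on synthetic voxel time courses; it omits the detrending, the no-trend test and all fMRI-specific preprocessing.

```python
import numpy as np
import pywt
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# hypothetical voxel time courses: 500 voxels x 128 time points, responders plus noise-only voxels
rng = np.random.default_rng(0)
t = np.arange(128)
resp = np.sin(2 * np.pi * t / 32)
courses = np.vstack([resp + 0.5 * rng.normal(size=(250, 128)),
                     0.2 * rng.normal(size=(250, 128))])

def wavelet_features(x, wavelet="db4", level=3, keep=0.2):
    """Concatenate wavelet coefficients and keep only the largest ones (crude thresholding)."""
    coeffs = np.concatenate(pywt.wavedec(x, wavelet, level=level))
    thresh = np.quantile(np.abs(coeffs), 1 - keep)
    return np.where(np.abs(coeffs) >= thresh, coeffs, 0.0)

W = np.array([wavelet_features(v) for v in courses])
Z = PCA(n_components=5).fit_transform(W)              # reduce the wavelet feature space
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)
print("voxels per cluster:", np.bincount(labels))
```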

  17. A NEW TEST OF THE STATISTICAL NATURE OF THE BRIGHTEST CLUSTER GALAXIES

    International Nuclear Information System (INIS)

    Lin, Yen-Ting; Ostriker, Jeremiah P.; Miller, Christopher J.

    2010-01-01

    A novel statistic is proposed to examine the hypothesis that all cluster galaxies are drawn from the same luminosity distribution (LD). In such a 'statistical model' of galaxy LD, the brightest cluster galaxies (BCGs) are simply the statistical extreme of the galaxy population. Using a large sample of nearby clusters, we show that BCGs in high luminosity clusters (e.g., L_tot ≳ 4 x 10^11 h_70^-2 L_sun) are unlikely (probability ≤ 3 x 10^-4) to be drawn from the LD defined by all red cluster galaxies more luminous than M_r = -20. On the other hand, BCGs in less luminous clusters are consistent with being the statistical extreme. Applying our method to the second brightest galaxies, we show that they are consistent with being the statistical extreme, which implies that the BCGs are also distinct from non-BCG luminous, red, cluster galaxies. We point out some issues with the interpretation of the classical tests proposed by Tremaine and Richstone (TR) that are designed to examine the statistical nature of BCGs, investigate the robustness of both our statistical test and those of TR against difficulties in photometry of galaxies of large angular size, and discuss the implication of our findings on surveys that use the luminous red galaxies to measure the baryon acoustic oscillation features in the galaxy power spectrum.

  18. Performance Based Clustering for Benchmarking of Container Ports: an Application of Dea and Cluster Analysis Technique

    Directory of Open Access Journals (Sweden)

    Jie Wu

    2010-12-01

    Full Text Available The operational performance of container ports has received more and more attention in both academic and practitioner circles, and the performance evaluation and process improvement of container ports have been the focus of several studies. In this paper, Data Envelopment Analysis (DEA), an effective tool for relative efficiency assessment, is utilized for measuring the performance and benchmarking of 77 world container ports in 2007. The approaches used in the current study consider four inputs (Capacity of Cargo Handling Machines, Number of Berths, Terminal Area and Storage Capacity) and a single output (Container Throughput). The results for the efficiency scores are analyzed, and a unique ordering of the ports based on average cross-efficiency is provided; the cluster analysis technique is also used to select more appropriate targets for poorly performing ports to use as benchmarks.
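
    For illustration, a basic input-oriented CCR efficiency score can be computed per port as a linear program; this sketch uses invented port data and plain CCR scores, not the cross-efficiency ranking or the cluster-analysis benchmarking step described above.

```python
import numpy as np
from scipy.optimize import linprog

# hypothetical port data: 6 ports, 4 inputs (machines, berths, area, storage), 1 output (throughput)
X = np.array([[60, 10, 120, 300],
              [45,  8,  90, 250],
              [80, 14, 160, 420],
              [30,  6,  70, 180],
              [55,  9, 110, 260],
              [70, 12, 150, 380]], dtype=float)
Y = np.array([[480], [400], [610], [260], [430], [520]], dtype=float)

def ccr_efficiency(o, X, Y):
    """Input-oriented CCR multiplier model for DMU o, solved as a linear program."""
    n, m = X.shape
    s = Y.shape[1]
    c = np.concatenate([-Y[o], np.zeros(m)])                 # maximise u.y_o  ->  minimise -u.y_o
    A_ub = np.hstack([Y, -X])                                # u.y_j - v.x_j <= 0 for every DMU j
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.zeros(s), X[o]])[None, :]      # normalisation v.x_o = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (s + m), method="highs")
    return -res.fun

for o in range(len(X)):
    print(f"port {o}: CCR efficiency {ccr_efficiency(o, X, Y):.3f}")
```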

  19. IRT-based test construction

    NARCIS (Netherlands)

    van der Linden, Willem J.; Theunissen, T.J.J.M.; Boekkooi-Timminga, Ellen; Kelderman, Henk

    1987-01-01

    Four discussions of test construction based on item response theory (IRT) are presented. The first discussion, "Test Design as Model Building in Mathematical Programming" (T.J.J.M. Theunissen), presents test design as a decision process under certainty. A natural way of modeling this process leads

  20. Personalized PageRank Clustering: A graph clustering algorithm based on random walks

    Science.gov (United States)

    A. Tabrizi, Shayan; Shakery, Azadeh; Asadpour, Masoud; Abbasi, Maziar; Tavallaie, Mohammad Ali

    2013-11-01

    Graph clustering has been an essential part in many methods and thus its accuracy has a significant effect on many applications. In addition, exponential growth of real-world graphs such as social networks, biological networks and electrical circuits demands clustering algorithms with nearly-linear time and space complexity. In this paper we propose Personalized PageRank Clustering (PPC) that employs the inherent cluster exploratory property of random walks to reveal the clusters of a given graph. We combine random walks and modularity to precisely and efficiently reveal the clusters of a graph. PPC is a top-down algorithm so it can reveal inherent clusters of a graph more accurately than other nearly-linear approaches that are mainly bottom-up. It also gives a hierarchy of clusters that is useful in many applications. PPC has a linear time and space complexity and has been superior to most of the available clustering algorithms on many datasets. Furthermore, its top-down approach makes it a flexible solution for clustering problems with different requirements.
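
    A toy sketch of the personalized-PageRank idea for finding a cluster around a seed node: rank nodes by degree-normalised PPR score and take the prefix with the lowest conductance. This is a generic PPR sweep, not the PPC algorithm itself.

```python
import networkx as nx

# toy graph with two dense groups joined by a single bridge edge
G = nx.Graph()
G.add_edges_from([(0, 1), (0, 2), (1, 2), (2, 3), (1, 3),      # group A
                  (4, 5), (4, 6), (5, 6), (6, 7), (5, 7),      # group B
                  (3, 4)])                                      # bridge

def ppr_cluster(G, seed, alpha=0.85):
    """Grow a cluster around `seed` by a conductance sweep over personalized PageRank scores."""
    pr = nx.pagerank(G, alpha=alpha, personalization={seed: 1.0})
    order = sorted(G.nodes, key=lambda v: pr[v] / G.degree(v), reverse=True)  # degree-normalised
    best, best_phi = None, float("inf")
    for i in range(1, len(order)):
        S = set(order[:i])
        phi = nx.conductance(G, S)
        if phi < best_phi:
            best, best_phi = S, phi
    return best, best_phi

cluster, phi = ppr_cluster(G, seed=0)
print(f"cluster around node 0: {sorted(cluster)}, conductance {phi:.3f}")
```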

  1. EUV resists based on tin-oxo clusters

    Science.gov (United States)

    Cardineau, Brian; Del Re, Ryan; Al-Mashat, Hashim; Marnell, Miles; Vockenhuber, Michaela; Ekinci, Yasin; Sarma, Chandra; Neisser, Mark; Freedman, Daniel A.; Brainard, Robert L.

    2014-03-01

    We have studied the photolysis of tin clusters of the type [(RSn)12O14(OH)6]X2 using extreme ultraviolet (EUV, 13.5 nm) light, and developed these clusters into novel high-resolution photoresists. A thin film of [(BuSn)12O14(OH)6][p-toluenesulfonate]2 (1) was prepared by spin coating a solution of (1) in 2-butanone onto a silicon wafer. Exposure to EUV light caused compound (1) to be converted into a substance that was markedly less soluble in aqueous isopropanol. To optimize the EUV lithographic performance of resists using tin-oxo clusters, and to gain insight into the mechanism of their photochemical reactions, we prepared several compounds based on [(RSn)12O14(OH)6]X2. The sensitivity of the tin-oxide films to EUV light was studied as a function of variations in the structure of the counter-anions (X, primarily carboxylates) and the organic ligands bound to tin (R). Correlations were sought between the EUV sensitivity of these complexes and the strength of the carbon-carboxylate bonds in the counter-anions, and between the sensitivity and the strength of the carbon-tin bonds. No correlation was observed between the strength of the carbon-carboxylate bonds in the counter-anions (X) and the EUV photosensitivity. However, the EUV sensitivity of the tin-oxide films appears to be well correlated with the strength of the carbon-tin bonds. We hypothesize that this correlation indicates a mechanism of carbon-tin bond homolysis during exposure. Using these tin clusters, 18 nm lines were printed, showcasing the high-resolution capabilities of these materials as photoresists for EUV lithography.

  2. Dynamical mass of a star cluster in M 83: a test of fibre-fed multi-object spectroscopy

    Science.gov (United States)

    Moll, S. L.; de Grijs, R.; Anders, P.; Crowther, P. A.; Larsen, S. S.; Smith, L. J.; Portegies Zwart, S. F.

    2008-10-01

    Aims: We obtained VLT/FLAMES+UVES high-resolution, fibre-fed spectroscopy of five young massive clusters (YMCs) in M 83 (NGC 5236). This forms the basis of a pilot study testing the feasibility of using fibre-fed spectroscopy to measure the velocity dispersions of several clusters simultaneously, in order to determine their dynamical masses. In principle, this reduces the telescope time required to obtain a statistically significant sample of dynamical cluster masses. These can be used to assess the long-term survivability of YMCs by comparing their dynamical and photometric masses, which are necessary to ascertain the potential evolution of YMCs into second-generation globular clusters. Methods: We adopted two methods for determining the velocity dispersion of the star clusters: cross-correlating the cluster spectrum with the template spectra and minimising a χ² value between the cluster spectrum and the broadened template spectra. We also considered both red giant and red supergiant template stars. Cluster 805 in M 83 (following the notation of Larsen) was chosen as a control to test the reliability of the results obtained by this observational method, through a comparison with the results obtained from a standard echelle VLT/UVES spectrum obtained by Larsen & Richtler. Results: We find no dependence of the velocity dispersions measured for a cluster on the choice of red giant versus red supergiant templates, nor on the method adopted. However, we do find that the standard deviation of the results obtained with only one method may underestimate the true uncertainty. We measure a velocity dispersion of σ_los = 10.2 ± 1.1 km s⁻¹ for cluster 805 from our fibre-fed spectroscopy. This is in excellent agreement with the velocity dispersion of σ_los = 10.6 ± 1.4 km s⁻¹ determined from the standard echelle UVES spectrum of cluster 805. Our FLAMES+UVES velocity dispersion measurement gives M_vir = (6.6 ± 1.7) × 10^5 M_⊙, consistent with previous results. This

  3. Centroid based clustering of high throughput sequencing reads based on n-mer counts.

    Science.gov (United States)

    Solovyov, Alexander; Lipkin, W Ian

    2013-09-08

    Many problems in computational biology require alignment-free sequence comparisons. One of the common tasks involving sequence comparison is sequence clustering. Here we apply methods of alignment-free comparison (in particular, comparison using sequence composition) to the challenge of sequence clustering. We study several centroid-based algorithms for clustering sequences based on word counts. A study of their performance shows that using the k-means algorithm, with or without data whitening, is efficient from the computational point of view. A higher clustering accuracy can be achieved using the soft expectation maximization method, whereby each sequence is attributed to each cluster with a specific probability. We implement an open source tool for alignment-free clustering. It is publicly available from github: https://github.com/luscinius/afcluster. We show the utility of alignment-free sequence clustering for high throughput sequencing analysis despite its limitations. In particular, it allows one to perform assembly with reduced resources and a minimal loss of quality. The major factor affecting the performance of alignment-free read clustering is the length of the read.
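
    A compact sketch of alignment-free clustering by word (k-mer) composition: each read is mapped to a normalised 3-mer count vector and the vectors are grouped with k-means. The reads and parameters are synthetic, and this is not the afcluster implementation itself.

```python
import numpy as np
from itertools import product
from sklearn.cluster import KMeans

K = 3
KMERS = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=K))}

def kmer_profile(read):
    """Normalised k-mer count vector for one read (alignment-free representation)."""
    v = np.zeros(len(KMERS))
    for i in range(len(read) - K + 1):
        idx = KMERS.get(read[i:i + K])
        if idx is not None:                      # skip k-mers containing ambiguous bases
            v[idx] += 1
    return v / max(v.sum(), 1)

# hypothetical reads from two sources with different composition
rng = np.random.default_rng(0)
reads = ["".join(rng.choice(list("ACGT"), p=[0.4, 0.1, 0.1, 0.4], size=150)) for _ in range(100)]
reads += ["".join(rng.choice(list("ACGT"), p=[0.1, 0.4, 0.4, 0.1], size=150)) for _ in range(100)]

X = np.array([kmer_profile(r) for r in reads])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("reads per cluster:", np.bincount(labels))
```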

  4. A Spectrum Sensing Method Based on Signal Feature and Clustering Algorithm in Cognitive Wireless Multimedia Sensor Networks

    Directory of Open Access Journals (Sweden)

    Yongwei Zhang

    2017-01-01

    Full Text Available In order to solve the difficulty of determining the threshold in spectrum sensing technologies based on random matrix theory, a spectrum sensing method based on a clustering algorithm and signal features is proposed for Cognitive Wireless Multimedia Sensor Networks. Firstly, the wireless communication signal features are obtained from the sampled signal covariance matrix. Then, the clustering algorithm is used to classify and test the signal features. Different signal features and clustering algorithms are compared in this paper. The experimental results show that the proposed method has better sensing performance.
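
    A hedged sketch of the general idea: an eigenvalue-ratio feature is computed from the sampled signal covariance matrix, and a two-cluster k-means separates "signal present" from "noise only" without a hand-tuned threshold. The signal model and numbers are invented, and the feature is only one possible choice.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

def covariance_feature(snapshot):
    """Max/min eigenvalue ratio of the sample covariance matrix of a multi-antenna snapshot."""
    R = snapshot @ snapshot.conj().T / snapshot.shape[1]
    ev = np.linalg.eigvalsh(R)
    return np.log10(ev.max() / ev.min())

def snapshot(occupied, antennas=4, samples=500, snr=0.5):
    noise = rng.normal(size=(antennas, samples))
    if occupied:
        s = rng.normal(size=(1, samples))
        noise += snr * rng.normal(size=(antennas, 1)) * s      # correlated primary-user component
    return noise

# build a feature set from a mix of idle and occupied sensing intervals
truth = rng.integers(0, 2, size=200).astype(bool)
feats = np.array([[covariance_feature(snapshot(o))] for o in truth])

# unsupervised split into "signal present" / "noise only"
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(feats)
signal_cluster = labels[np.argmax(feats[:, 0])]                 # cluster holding the largest ratio
accuracy = np.mean((labels == signal_cluster) == truth)
print(f"agreement with ground truth: {accuracy:.2%}")
```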

  5. A first packet processing subdomain cluster model based on SDN

    Science.gov (United States)

    Chen, Mingyong; Wu, Weimin

    2017-08-01

    To address the packet-processing performance bottlenecks and controller downtime problems of current controller clusters, an SDN (Software Defined Network) model is proposed in which the controller allocates a priority to each device in the network. A domain contains several network devices and a controller; the controller is responsible for managing the network equipment within the domain, while the switches perform data delivery based on the load of the controller when processing network equipment data. The experimental results show that the model can effectively solve the risk of a single point of failure of the controller and can relieve the performance bottleneck of first packet processing.

  6. Estimating the concrete compressive strength using hard clustering and fuzzy clustering based regression techniques.

    Science.gov (United States)

    Nagwani, Naresh Kumar; Deo, Shirish V

    2014-01-01

    Understanding the compressive strength of concrete is important for activities like construction arrangement, prestressing operations, proportioning new mixtures and quality assurance. Regression techniques are most widely used for prediction tasks where the relationship between the independent variables and the dependent (prediction) variable is identified. The accuracy of regression techniques for prediction can be improved if clustering is used along with regression. Clustering along with regression ensures a more accurate curve fitting between the dependent and independent variables. In this work a cluster-regression technique is applied for estimating the compressive strength of concrete, and a novel state-of-the-art approach is proposed for predicting the concrete compressive strength. The objective of this work is to demonstrate that clustering along with regression ensures smaller prediction errors for estimating the concrete compressive strength. The proposed technique consists of two major stages: in the first stage, clustering is used to group concrete data with similar characteristics, and then in the second stage regression techniques are applied over these clusters (groups) to predict the compressive strength from individual clusters. It is found from experiments that clustering along with regression techniques gives minimum errors for predicting the compressive strength of concrete; also, the fuzzy clustering algorithm C-means performs better than the K-means algorithm.
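
    A small sketch of the cluster-then-regress idea on synthetic mixture data: group similar mixtures with K-means, fit one regression per group, and compare against a single global regression. The feature set and data-generating model are assumptions, not the paper's dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# hypothetical mixture data: cement, water, age -> compressive strength
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(150, 500, 800),     # cement (kg/m3)
                     rng.uniform(120, 250, 800),     # water  (kg/m3)
                     rng.uniform(3, 365, 800)])      # age    (days)
y = 0.08 * X[:, 0] - 0.15 * X[:, 1] + 8 * np.log(X[:, 2]) + rng.normal(0, 3, 800)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# stage 1: group similar mixtures; stage 2: one regression model per group
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X_tr)
models = {k: LinearRegression().fit(X_tr[km.labels_ == k], y_tr[km.labels_ == k]) for k in range(4)}

test_clusters = km.predict(X_te)
pred = np.array([models[k].predict(x[None, :])[0] for k, x in zip(test_clusters, X_te)])
rmse_cluster = np.sqrt(np.mean((pred - y_te) ** 2))
rmse_global = np.sqrt(np.mean((LinearRegression().fit(X_tr, y_tr).predict(X_te) - y_te) ** 2))
print(f"global regression RMSE {rmse_global:.2f} vs cluster-wise RMSE {rmse_cluster:.2f}")
```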

  7. The effectiveness of mandatory-random student drug testing: a cluster randomized trial.

    Science.gov (United States)

    James-Burdumy, Susanne; Goesling, Brian; Deke, John; Einspruch, Eric

    2012-02-01

    This article presents findings from the largest experimental evaluation to date of school-based mandatory-random student drug testing (MRSDT). The study tested the effectiveness of MRSDT in reducing substance use among high school students. The cluster randomized trial included 36 high schools and more than 4,700 9th through 12th grade students. After baseline data collection in spring 2007, about half the schools were randomly assigned to a treatment group that was permitted to implement MRSDT immediately, and the remaining half were assigned to a control group that delayed MRSDT until after follow-up data collection was completed 1 year later, in spring 2008. Data from self-administered student questionnaires were used to compare rates of substance use in treatment and control schools at follow-up. Students subject to MRSDT by their districts reported less substance use in the past 30 days compared with students in schools without MRSDT. The program had no detectable spillover effects on the substance use of students not subject to testing. We found no evidence of unintentional negative effects on students' future intentions to use substances, on the proportion of students who participated in activities subject to drug testing, or on students' attitudes toward school and perceived consequences of substance use. MRSDT shows promise in reducing illicit substance use among high school students. The impacts in this study were measured over a 1-year period and may not represent longer-term effects. Copyright © 2012 Society for Adolescent Health and Medicine. All rights reserved.

  8. Model-Based Security Testing

    Directory of Open Access Journals (Sweden)

    Ina Schieferdecker

    2012-02-01

    Full Text Available Security testing aims at validating software system requirements related to security properties like confidentiality, integrity, authentication, authorization, availability, and non-repudiation. Although security testing techniques have been available for many years, there have been few approaches that allow for the specification of test cases at a higher level of abstraction, for enabling guidance on test identification and specification, and for automated test generation. Model-based security testing (MBST) is a relatively new field especially dedicated to the systematic and efficient specification and documentation of security test objectives, security test cases and test suites, as well as to their automated or semi-automated generation. In particular, the combination of security modelling and test generation approaches is still a challenge in research and of high interest for industrial applications. MBST includes e.g. security functional testing, model-based fuzzing, risk- and threat-oriented testing, and the usage of security test patterns. This paper provides a survey on MBST techniques and the related models as well as samples of new methods and tools that are under development in the European ITEA2 project DIAMONDS.

  9. Clinical Implications of Cluster Analysis-Based Classification of Acute Decompensated Heart Failure and Correlation with Bedside Hemodynamic Profiles.

    Directory of Open Access Journals (Sweden)

    Tariq Ahmad

    Full Text Available Classification of acute decompensated heart failure (ADHF) is based on subjective criteria that crudely capture disease heterogeneity. Improved phenotyping of the syndrome may help improve therapeutic strategies. To derive cluster analysis-based groupings for patients hospitalized with ADHF, and compare their prognostic performance to hemodynamic classifications derived at the bedside. We performed a cluster analysis on baseline clinical variables and PAC measurements of 172 ADHF patients from the ESCAPE trial. Employing regression techniques, we examined associations between clusters and clinically determined hemodynamic profiles (warm/cold/wet/dry). We assessed association with clinical outcomes using Cox proportional hazards models. Likelihood ratio tests were used to compare the prognostic value of cluster data to that of hemodynamic data. We identified four advanced HF clusters: 1) male Caucasians with ischemic cardiomyopathy, multiple comorbidities, and the lowest B-type natriuretic peptide (BNP) levels; 2) females with non-ischemic cardiomyopathy, few comorbidities, and the most favorable hemodynamics; 3) young African American males with non-ischemic cardiomyopathy, the most adverse hemodynamics, and advanced disease; and 4) older Caucasians with ischemic cardiomyopathy, concomitant renal insufficiency, and the highest BNP levels. There was no association between clusters and bedside-derived hemodynamic profiles (p = 0.70). For all adverse clinical outcomes, Cluster 4 had the highest risk, and Cluster 2 the lowest. Compared to Cluster 4, Clusters 1-3 had 45-70% lower risk of all-cause mortality. Clusters were significantly associated with clinical outcomes, whereas hemodynamic profiles were not. By clustering patients with similar objective variables, we identified four clinically relevant phenotypes of ADHF patients, with no discernable relationship to hemodynamic profiles, but distinct associations with adverse outcomes. Our analysis suggests that ADHF classification using

  10. A nonparametric Bayesian approach for clustering bisulfate-based DNA methylation profiles

    Directory of Open Access Journals (Sweden)

    Zhang Lin

    2012-10-01

    Full Text Available Abstract Background DNA methylation occurs in the context of a CpG dinucleotide. It is an important epigenetic modification, which can be inherited through cell division. The two major types of methylation include hypomethylation and hypermethylation. Unique methylation patterns have been shown to exist in diseases including various types of cancer. DNA methylation analysis promises to become a powerful tool in cancer diagnosis, treatment and prognostication. Large-scale methylation arrays are now available for studying methylation genome-wide. The Illumina methylation platform simultaneously measures cytosine methylation at more than 1500 CpG sites associated with over 800 cancer-related genes. Cluster analysis is often used to identify DNA methylation subgroups for prognosis and diagnosis. However, due to the unique non-Gaussian characteristics, traditional clustering methods may not be appropriate for DNA methylation data, and the determination of the optimal cluster number is still problematic. Method A Dirichlet process beta mixture model (DPBMM) is proposed that models the DNA methylation expressions as a mixture of an infinite number of beta distributions. The model allows automatic learning of the relevant parameters such as the cluster mixing proportion, the parameters of the beta distribution for each cluster, and especially the number of potential clusters. Since the model is high dimensional and analytically intractable, we propose a Gibbs sampling "no-gaps" solution for computing the posterior distributions, hence the estimates of the parameters. Result The proposed algorithm was tested on simulated data as well as methylation data from 55 Glioblastoma multiforme (GBM) brain tissue samples. To reduce the computational burden due to the high data dimensionality, a dimension reduction method is adopted. The two GBM clusters yielded by DPBMM are based on different numbers of loci (P-value

  11. Problem decomposition by mutual information and force-based clustering

    Science.gov (United States)

    Otero, Richard Edward

    The scale of engineering problems has sharply increased over the last twenty years. Larger coupled systems, increasing complexity, and limited resources create a need for methods that automatically decompose problems into manageable sub-problems by discovering and leveraging problem structure. The ability to learn the coupling (inter-dependence) structure and reorganize the original problem could lead to large reductions in the time to analyze complex problems. Such decomposition methods could also provide engineering insight on the fundamental physics driving problem solution. This work forwards the current state of the art in engineering decomposition through the application of techniques originally developed within computer science and information theory. The work describes the current state of automatic problem decomposition in engineering and utilizes several promising ideas to advance the state of the practice. Mutual information is a novel metric for data dependence and works on both continuous and discrete data. Mutual information can measure both the linear and non-linear dependence between variables without the limitations of linear dependence measured through covariance. Mutual information is also able to handle data that does not have derivative information, unlike other metrics that require it. The value of mutual information to engineering design work is demonstrated on a planetary entry problem. This study utilizes a novel tool developed in this work for planetary entry system synthesis. A graphical method, force-based clustering, is used to discover related sub-graph structure as a function of problem structure and links ranked by their mutual information. This method does not require the stochastic use of neural networks and could be used with any link ranking method currently utilized in the field. Application of this method is demonstrated on a large, coupled low-thrust trajectory problem. Mutual information also serves as the basis for an
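
    As a toy illustration of using mutual information to expose sub-problem structure: pairwise mutual information is estimated between design variables, each variable is linked to its strongest partners, and communities in the resulting graph suggest a decomposition. The variables are synthetic, and community detection by modularity stands in for the force-based clustering described above.

```python
import numpy as np
import networkx as nx
from sklearn.feature_selection import mutual_info_regression
from networkx.algorithms.community import greedy_modularity_communities

# hypothetical design variables of a coupled problem: two weakly coupled groups
rng = np.random.default_rng(0)
n = 2000
a = rng.normal(size=(n, 1))
b = rng.normal(size=(n, 1))
X = np.hstack([a, a ** 2 + 0.1 * rng.normal(size=(n, 1)), np.sin(a) + 0.1 * rng.normal(size=(n, 1)),
               b, 2 * b + 0.1 * rng.normal(size=(n, 1))])
names = ["x1", "x2", "x3", "y1", "y2"]

# pairwise mutual information (captures non-linear dependence, unlike covariance)
p = X.shape[1]
MI = np.zeros((p, p))
for j in range(p):
    MI[:, j] = mutual_info_regression(X, X[:, j], random_state=0)

# link each variable to its strongest MI partner and look for sub-problem structure
G = nx.Graph()
G.add_nodes_from(names)
for i in range(p):
    mi = MI[i].copy()
    mi[i] = -np.inf                                   # ignore self-information
    j = int(np.argmax(mi))
    G.add_edge(names[i], names[j], weight=MI[i, j])

for k, group in enumerate(greedy_modularity_communities(G, weight="weight")):
    print(f"sub-problem {k}: {sorted(group)}")
```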

  12. CLUSS: Clustering of protein sequences based on a new similarity measure

    Directory of Open Access Journals (Sweden)

    Brzezinski Ryszard

    2007-08-01

    Full Text Available Abstract Background The rapid burgeoning of available protein data makes the use of clustering within families of proteins increasingly important. The challenge is to identify subfamilies of evolutionarily related sequences. This identification reveals phylogenetic relationships, which provide prior knowledge to help researchers understand biological phenomena. A good evolutionary model is essential to achieve a clustering that reflects the biological reality, and an accurate estimate of protein sequence similarity is crucial to the building of such a model. Most existing algorithms estimate this similarity using techniques that are not necessarily biologically plausible, especially for hard-to-align sequences such as proteins with different domain structures, which cause many difficulties for the alignment-dependent algorithms. In this paper, we propose a novel similarity measure based on matching amino acid subsequences. This measure, named SMS for Substitution Matching Similarity, is especially designed for application to non-aligned protein sequences. It allows us to develop a new alignment-free algorithm, named CLUSS, for clustering protein families. To the best of our knowledge, this is the first alignment-free algorithm for clustering protein sequences. Unlike other clustering algorithms, CLUSS is effective on both alignable and non-alignable protein families. In the rest of the paper, we use the term "phylogenetic" in the sense of "relatedness of biological functions". Results To show the effectiveness of CLUSS, we performed an extensive clustering on COG database. To demonstrate its ability to deal with hard-to-align sequences, we tested it on the GH2 family. In addition, we carried out experimental comparisons of CLUSS with a variety of mainstream algorithms. These comparisons were made on hard-to-align and easy-to-align protein sequences. The results of these experiments show the superiority of CLUSS in yielding clusters of proteins

  13. Novel density-based and hierarchical density-based clustering algorithms for uncertain data.

    Science.gov (United States)

    Zhang, Xianchao; Liu, Han; Zhang, Xiaotong

    2017-09-01

    Uncertain data has posed a great challenge to traditional clustering algorithms. Recently, several algorithms have been proposed for clustering uncertain data, and among them density-based techniques seem promising for handling data uncertainty. However, some issues like losing uncertain information, high time complexity and nonadaptive threshold have not been addressed well in the previous density-based algorithm FDBSCAN and hierarchical density-based algorithm FOPTICS. In this paper, we firstly propose a novel density-based algorithm PDBSCAN, which improves the previous FDBSCAN from the following aspects: (1) it employs a more accurate method to compute the probability that the distance between two uncertain objects is less than or equal to a boundary value, instead of the sampling-based method in FDBSCAN; (2) it introduces new definitions of probability neighborhood, support degree, core object probability, direct reachability probability, thus reducing the complexity and solving the issue of nonadaptive threshold (for core object judgement) in FDBSCAN. Then, we modify the algorithm PDBSCAN to an improved version (PDBSCANi), by using a better cluster assignment strategy to ensure that every object will be assigned to the most appropriate cluster, thus solving the issue of nonadaptive threshold (for direct density reachability judgement) in FDBSCAN. Furthermore, as PDBSCAN and PDBSCANi have difficulties for clustering uncertain data with non-uniform cluster density, we propose a novel hierarchical density-based algorithm POPTICS by extending the definitions of PDBSCAN, adding new definitions of fuzzy core distance and fuzzy reachability distance, and employing a new clustering framework. POPTICS can reveal the cluster structures of the datasets with different local densities in different regions better than PDBSCAN and PDBSCANi, and it addresses the issues in FOPTICS. Experimental results demonstrate the superiority of our proposed algorithms over the existing

  14. Towards semantically sensitive text clustering: a feature space modeling technology based on dimension extension.

    Science.gov (United States)

    Liu, Yuanchao; Liu, Ming; Wang, Xin

    2015-01-01

    The objective of text clustering is to divide document collections into clusters based on the similarity between documents. In this paper, an extension-based feature modeling approach towards semantically sensitive text clustering is proposed along with the corresponding feature space construction and similarity computation method. By combining the similarity in traditional feature space and that in extension space, the adverse effects of the complexity and diversity of natural language can be addressed and clustering semantic sensitivity can be improved correspondingly. The generated clusters can be organized using different granularities. The experimental evaluations on well-known clustering algorithms and datasets have verified the effectiveness of our approach.

  15. Towards semantically sensitive text clustering: a feature space modeling technology based on dimension extension.

    Directory of Open Access Journals (Sweden)

    Yuanchao Liu

    Full Text Available The objective of text clustering is to divide document collections into clusters based on the similarity between documents. In this paper, an extension-based feature modeling approach towards semantically sensitive text clustering is proposed along with the corresponding feature space construction and similarity computation method. By combining the similarity in traditional feature space and that in extension space, the adverse effects of the complexity and diversity of natural language can be addressed and clustering semantic sensitivity can be improved correspondingly. The generated clusters can be organized using different granularities. The experimental evaluations on well-known clustering algorithms and datasets have verified the effectiveness of our approach.
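
    A hedged sketch of the general idea of combining similarities from two spaces follows; the "extension" step here (appending hand-picked related terms) is only a stand-in for the dimension-extension technology described in the paper, and the weight alpha is an arbitrary choice.

```python
# Combine document similarity in the ordinary tf-idf feature space with
# similarity in an "extension" space; the extension below is a stand-in.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["the cat sat on the mat", "a dog barked at the cat", "stocks fell sharply today"]
related = {"cat": "pet animal", "dog": "pet animal", "stocks": "finance market"}

def extend(doc):
    # Append related terms for each word, forming the extension-space view.
    extra = " ".join(related.get(w, "") for w in doc.split())
    return doc + " " + extra

base_sim = cosine_similarity(TfidfVectorizer().fit_transform(docs))
ext_sim = cosine_similarity(TfidfVectorizer().fit_transform([extend(d) for d in docs]))

alpha = 0.6  # weight of the traditional space (hypothetical value)
combined = alpha * base_sim + (1 - alpha) * ext_sim
print(combined.round(2))
```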

  16. [Predicting Incidence of Hepatitis E in China Using Fuzzy Time Series Based on Fuzzy C-Means Clustering Analysis].

    Science.gov (United States)

    Luo, Yi; Zhang, Tao; Li, Xiao-song

    2016-05-01

    To explore the application of a fuzzy time series model based on fuzzy c-means clustering in forecasting the monthly incidence of Hepatitis E in mainland China. A predictive model (fuzzy time series method based on fuzzy c-means clustering) was developed using Hepatitis E incidence data in mainland China between January 2004 and July 2014. The incidence data from August 2014 to November 2014 were used to test the fitness of the predictive model. The forecasting results were compared with those obtained from traditional fuzzy time series models. The fuzzy time series model based on fuzzy c-means clustering had a fitting mean squared error (MSE) of 0.0011 and a forecasting MSE of 6.9775 × 10⁻⁴, compared with 0.0017 and 0.0014 for the traditional forecasting model. The results indicate that the fuzzy time series model based on fuzzy c-means clustering has a better performance in forecasting the incidence of Hepatitis E.

  17. Crowd Analysis by Using Optical Flow and Density Based Clustering

    DEFF Research Database (Denmark)

    Santoro, Francesco; Pedro, Sergio; Tan, Zheng-Hua

    2010-01-01

    In this paper, we present a system to detect and track crowds in a video sequence captured by a camera. In a first step, we compute optical flows by means of pyramidal Lucas-Kanade feature tracking. Afterwards, density-based clustering is used to group similar vectors. In the last step, a crowd tracker is applied in every frame, allowing us to detect and track the crowds. Our system gives the output as a graphic overlay, i.e. it adds arrows and colors to the original frame sequence, in order to identify crowds and their movements. For the evaluation, we check when our system detects certain...
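
    A minimal sketch of such a pipeline, assuming OpenCV and scikit-learn are available, is shown below: pyramidal Lucas-Kanade flow on tracked corners followed by DBSCAN on stacked (position, flow) vectors. The input file name, the flow scaling factor and the DBSCAN parameters are illustrative assumptions, and the crowd-tracking and overlay stages are omitted.

```python
# Pyramidal Lucas-Kanade optical flow on good features, then DBSCAN on
# (position, flow) vectors to group coherently moving points into candidate crowds.
import cv2
import numpy as np
from sklearn.cluster import DBSCAN

cap = cv2.VideoCapture("crowd.avi")          # hypothetical input file
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

ok, frame = cap.read()
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500, qualityLevel=0.01, minDistance=5)
p1, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, p0, None)

good_old = p0[status.flatten() == 1].reshape(-1, 2)
good_new = p1[status.flatten() == 1].reshape(-1, 2)
flow = good_new - good_old

# Cluster similar vectors: position and (scaled) motion are stacked so that
# nearby points moving in the same direction fall into the same cluster.
features = np.hstack([good_new, 10.0 * flow])      # 10.0 is an ad-hoc scale
labels = DBSCAN(eps=25.0, min_samples=8).fit_predict(features)
print("crowd candidates:", set(labels) - {-1})
```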

  18. Cluster cosmological analysis with X ray instrumental observables: introduction and testing of AsPIX method

    International Nuclear Information System (INIS)

    Valotti, Andrea

    2016-01-01

    Cosmology is one of the fundamental pillars of astrophysics, and as such it contains many unsolved puzzles. To investigate some of those puzzles, we analyze X-ray surveys of galaxy clusters. These surveys are possible thanks to the bremsstrahlung emission of the intra-cluster medium. The simultaneous fit of cluster counts as a function of mass and distance provides an independent measure of cosmological parameters such as Ω_m, σ_8, and the dark energy equation of state w_0. A novel approach to cosmological analysis using galaxy cluster data, called top-down, was developed in N. Clerc et al. (2012). This top-down approach is based purely on instrumental observables that are considered in a two-dimensional X-ray color-magnitude diagram. The method self-consistently includes selection effects and scaling relationships. It also provides a means of bypassing the computation of individual cluster masses. My work presents an extension of the top-down method by introducing the apparent size of the cluster, creating a three-dimensional X-ray cluster diagram. The size of a cluster is sensitive to both the cluster mass and its angular diameter, so it must also be included in the assessment of selection effects. The performance of this new method is investigated using a Fisher analysis. In parallel, I have studied the effects of the intrinsic scatter in the cluster size scaling relation on the sample selection as well as on the obtained cosmological parameters. To validate the method, I estimate uncertainties of cosmological parameters with an MCMC method and the Amoeba minimization routine, using two simulated XMM surveys that have an increasing level of complexity. The first simulated survey is a set of toy catalogues of 100 and 10000 deg², whereas the second is a 1000 deg² catalogue that was generated using an Aardvark semi-analytical N-body simulation. This comparison corroborates the conclusions of the Fisher analysis. In conclusion, I find that a cluster diagram that accounts for

  19. Coherence-based Time Series Clustering for Brain Connectivity Visualization

    KAUST Repository

    Euan, Carolina

    2017-11-19

    We develop the hierarchical cluster coherence (HCC) method for brain signals, a procedure for characterizing connectivity in a network by clustering nodes or groups of channels that display high level of coordination as measured by

  20. Based on Similarity Metric Learning for Semi-Supervised Clustering

    Directory of Open Access Journals (Sweden)

    Wei QIU

    2014-08-01

    Full Text Available Semi-supervised clustering employs a small amount of labeled data to aid unsupervised learning. The focus of this paper is on metric learning, with particular interest in incorporating side information to make it semi-supervised. This study is primarily motivated by an application: face-image clustering. The paper introduces metric learning and semi-supervised clustering, and a similarity metric learning method that adapts the underlying similarity metric used by the clustering algorithm. It provides new methods for the two approaches as well as a new semi-supervised clustering algorithm that integrates both techniques in a uniform, principled framework. Experimental results demonstrate that the unified approach produces better clusters than both individual approaches as well as previously proposed semi-supervised clustering algorithms. The paper concludes with a discussion of experiments on face-image clustering, as well as future work.

  1. A PSO-Based Subtractive Data Clustering Algorithm

    OpenAIRE

    Gamal Abdel-Azeem; Mahmoud Marie; Rehab Abdel-Kader; Mariam El-Tarabily

    2013-01-01

    There is a tremendous proliferation in the amount of information available on the largest shared information source, the World Wide Web. Fast and high-quality clustering algorithms play an important role in helping users to effectively navigate, summarize, and organize the information. Recent studies have shown that partitional clustering algorithms such as the k-means algorithm are the most popular algorithms for clustering large datasets. The major problem with partitional clustering algori...

  2. Genetic algorithm based two-mode clustering of metabolomics data

    NARCIS (Netherlands)

    Hageman, J.A.; van den Berg, R.A.; Westerhuis, J.A.; van der Werf, M.J.; Smilde, A.K.

    2008-01-01

    Metabolomics and other omics tools are generally characterized by large data sets with many variables obtained under different environmental conditions. Clustering methods and more specifically two-mode clustering methods are excellent tools for analyzing this type of data. Two-mode clustering

  3. Clustering the objective interestingness measures based on tendency of variation in statistical implications

    Directory of Open Access Journals (Sweden)

    Nghia Quoc Phan

    2016-05-01

    Full Text Available In recent years, research on clustering objective interestingness measures has developed rapidly in order to assist users in choosing the appropriate measure for their application. Researchers in this field mainly focus on three directions: clustering based on the properties of the measures, clustering based on the behavior of the measures, and clustering based on the tendency of variation in statistical implications. In this paper we propose a new approach to clustering the objective interestingness measures based on the tendency of variation in statistical implications. In this proposal, we built the statistical implication data of 31 objective interestingness measures by examining the partial derivatives with respect to four parameters. From this data, two distance matrices over the interestingness measures are established, based on the Euclidean and Manhattan distances. Similarity trees are then built from each distance matrix, yielding clusterings of the 31 measures at two different clustering thresholds.
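
    The clustering step can be illustrated roughly as follows, with random placeholder vectors standing in for the statistical-implication data of the 31 measures; the cut threshold is hypothetical.

```python
# Two distance matrices (Euclidean and Manhattan) over the measure profiles,
# followed by average-link hierarchical clustering ("similarity trees").
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
profiles = rng.normal(size=(31, 4))      # 31 measures x 4 parameters (placeholder data)

for metric in ("euclidean", "cityblock"):        # cityblock == Manhattan
    dists = pdist(profiles, metric=metric)
    tree = linkage(dists, method="average")      # similarity tree
    clusters = fcluster(tree, t=3.0, criterion="distance")  # hypothetical threshold
    print(metric, "->", len(set(clusters)), "clusters")
```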

  4. Artificial Bee Colony Algorithm Based on K-Means Clustering for Multiobjective Optimal Power Flow Problem

    Directory of Open Access Journals (Sweden)

    Liling Sun

    2015-01-01

    Full Text Available An improved multiobjective ABC algorithm based on K-means clustering, called CMOABC, is proposed. To speed up the convergence of the canonical MOABC, the way information is communicated in the employed bees' phase is modified. To maintain population diversity, multiswarm technology based on K-means clustering is employed to decompose the population into many clusters. Because each subcomponent evolves separately, the population is reclustered after a specified number of iterations to facilitate information exchange among different clusters. Application of the new CMOABC to several multiobjective benchmark functions shows a marked improvement in performance over the fast nondominated sorting genetic algorithm (NSGA-II), the multiobjective particle swarm optimizer (MOPSO), and the multiobjective ABC (MOABC). Finally, the CMOABC is applied to solve the real-world optimal power flow (OPF) problem, which considers cost, loss, and emission impacts as the objective functions. The 30-bus IEEE test system is presented to illustrate the application of the proposed algorithm. The simulation results demonstrate that, compared to NSGA-II, MOPSO, and MOABC, the proposed CMOABC is superior for solving the OPF problem in terms of optimization accuracy.

  5. Clustering cliques for graph-based summarization of the biomedical research literature.

    Science.gov (United States)

    Zhang, Han; Fiszman, Marcelo; Shin, Dongwook; Wilkowski, Bartlomiej; Rindflesch, Thomas C

    2013-06-07

    Graph-based notions are increasingly used in biomedical data mining and knowledge discovery tasks. In this paper, we present a clique-clustering method to automatically summarize graphs of semantic predications produced from PubMed citations (titles and abstracts). SemRep is used to extract semantic predications from the citations returned by a PubMed search. Cliques were identified from frequently occurring predications with highly connected arguments filtered by degree centrality. Themes contained in the summary were identified with a hierarchical clustering algorithm based on common arguments shared among cliques. The validity of the clusters in the summaries produced was compared to the Silhouette-generated baseline for cohesion, separation and overall validity. The theme labels were also compared to a reference standard produced with major MeSH headings. For 11 topics in the testing data set, the overall validity of clusters from the system summary was 10% better than the baseline (43% versus 33%). When compared to the reference standard from MeSH headings, the results for recall, precision, and F-score were 0.64, 0.65, and 0.65, respectively.
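
    A rough sketch of the clique-clustering idea (SemRep extraction and degree-centrality filtering are not reproduced) might look like the following, where cliques of predication arguments are grouped by average-link clustering on Jaccard distances; the predications and the threshold are made up for illustration.

```python
# Build a graph of predication arguments, enumerate cliques, and group cliques
# that share arguments via average-link clustering on Jaccard distances.
import networkx as nx
import numpy as np
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical predication argument pairs
edges = [("aspirin", "inflammation"), ("aspirin", "pain"), ("pain", "inflammation"),
         ("statin", "cholesterol"), ("statin", "heart disease"), ("cholesterol", "heart disease")]
g = nx.Graph(edges)

cliques = [frozenset(c) for c in nx.find_cliques(g) if len(c) >= 3]

def jaccard_distance(a, b):
    return 1.0 - len(a & b) / len(a | b)

n = len(cliques)
dist = np.array([[jaccard_distance(cliques[i], cliques[j]) for j in range(n)] for i in range(n)])
if n > 1:
    labels = fcluster(linkage(squareform(dist, checks=False), method="average"),
                      t=0.5, criterion="distance")   # hypothetical cut
else:
    labels = [1] * n
print(list(zip(cliques, labels)))
```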

  6. Clustering cliques for graph-based summarization of the biomedical research literature

    Science.gov (United States)

    2013-01-01

    Background Graph-based notions are increasingly used in biomedical data mining and knowledge discovery tasks. In this paper, we present a clique-clustering method to automatically summarize graphs of semantic predications produced from PubMed citations (titles and abstracts). Results SemRep is used to extract semantic predications from the citations returned by a PubMed search. Cliques were identified from frequently occurring predications with highly connected arguments filtered by degree centrality. Themes contained in the summary were identified with a hierarchical clustering algorithm based on common arguments shared among cliques. The validity of the clusters in the summaries produced was compared to the Silhouette-generated baseline for cohesion, separation and overall validity. The theme labels were also compared to a reference standard produced with major MeSH headings. Conclusions For 11 topics in the testing data set, the overall validity of clusters from the system summary was 10% better than the baseline (43% versus 33%). When compared to the reference standard from MeSH headings, the results for recall, precision, and F-score were 0.64, 0.65, and 0.65, respectively. PMID:23742159

  7. A Secure Cluster-Based Multipath Routing Protocol for WMSNs

    Directory of Open Access Journals (Sweden)

    Jamal N. Al-Karaki

    2011-04-01

    Full Text Available The new characteristics of Wireless Multimedia Sensor Networks (WMSNs) and the design issues brought by handling different traffic classes of multimedia content (video streams, audio, and still images) as well as scalar data over the network make the proposed routing protocols for typical WSNs not directly applicable for WMSNs. Handling real-time multimedia data requires both energy efficiency and QoS assurance in order to ensure efficient utility of different capabilities of sensor resources and correct delivery of collected information. In this paper, we propose a Secure Cluster-based Multipath Routing protocol for WMSNs, SCMR, to satisfy the requirements of delivering different data types and support high data rate multimedia traffic. SCMR exploits the hierarchical structure of powerful cluster heads and the optimized multiple paths to support timeliness and reliable high data rate multimedia communication with minimum energy dissipation. Also, we present a light-weight distributed security mechanism of key management in order to secure the communication between sensor nodes and protect the network against different types of attacks. Performance evaluation from simulation results demonstrates a significant performance improvement compared with existing protocols (which do not even provide any kind of security feature) in terms of average end-to-end delay, network throughput, packet delivery ratio, and energy consumption.

  8. Bilingual Cluster Based Models for Statistical Machine Translation

    Science.gov (United States)

    Yamamoto, Hirofumi; Sumita, Eiichiro

    We propose a domain specific model for statistical machine translation. It is well-known that domain specific language models perform well in automatic speech recognition. We show that domain specific language and translation models also benefit statistical machine translation. However, there are two problems with using domain specific models. The first is the data sparseness problem. We employ an adaptation technique to overcome this problem. The second issue is domain prediction. In order to perform adaptation, the domain must be provided; however, in many cases, the domain is not known or changes dynamically. For these cases, not only the translation target sentence but also the domain must be predicted. This paper focuses on the domain prediction problem for statistical machine translation. In the proposed method, a bilingual training corpus is automatically clustered into sub-corpora. Each sub-corpus is deemed to be a domain. The domain of a source sentence is predicted by using its similarity to the sub-corpora. The predicted domain (sub-corpus) specific language and translation models are then used for the translation decoding. This approach gave an improvement of 2.7 in BLEU score on the IWSLT05 Japanese to English evaluation corpus (improving the score from 52.4 to 55.1). This is a substantial gain and indicates the validity of the proposed bilingual cluster based models.
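
    A simplified sketch of the domain-prediction step, using only source-side sentences and k-means on tf-idf vectors as a stand-in for the paper's corpus clustering, could look like this; the sentences and the number of domains are invented for illustration.

```python
# Cluster source sentences into sub-corpora ("domains") and assign a new source
# sentence to the sub-corpus whose centroid it is most similar to.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

train_sources = ["please book a hotel room", "where is the train station",
                 "the quarterly earnings rose", "the stock price fell sharply"]

vec = TfidfVectorizer()
x = vec.fit_transform(train_sources)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(x)   # 2 "domains"

def predict_domain(sentence):
    sims = cosine_similarity(vec.transform([sentence]), km.cluster_centers_)
    return int(sims.argmax())

# The predicted sub-corpus would then select the domain-specific language and
# translation models used for decoding.
print(predict_domain("is there a cheap hotel nearby"))
```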

  9. A secure cluster-based multipath routing protocol for WMSNs.

    Science.gov (United States)

    Almalkawi, Islam T; Zapata, Manel Guerrero; Al-Karaki, Jamal N

    2011-01-01

    The new characteristics of Wireless Multimedia Sensor Network (WMSN) and its design issues brought by handling different traffic classes of multimedia content (video streams, audio, and still images) as well as scalar data over the network, make the proposed routing protocols for typical WSNs not directly applicable for WMSNs. Handling real-time multimedia data requires both energy efficiency and QoS assurance in order to ensure efficient utility of different capabilities of sensor resources and correct delivery of collected information. In this paper, we propose a Secure Cluster-based Multipath Routing protocol for WMSNs, SCMR, to satisfy the requirements of delivering different data types and support high data rate multimedia traffic. SCMR exploits the hierarchical structure of powerful cluster heads and the optimized multiple paths to support timeliness and reliable high data rate multimedia communication with minimum energy dissipation. Also, we present a light-weight distributed security mechanism of key management in order to secure the communication between sensor nodes and protect the network against different types of attacks. Performance evaluation from simulation results demonstrates a significant performance improvement compared with existing protocols (which do not even provide any kind of security feature) in terms of average end-to-end delay, network throughput, packet delivery ratio, and energy consumption.

  10. Multi-class clustering of cancer subtypes through SVM based ensemble of pareto-optimal solutions for gene marker identification.

    Science.gov (United States)

    Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra; Maulik, Ujjwal

    2010-11-12

    With the advancement of microarray technology, it is now possible to study the expression profiles of thousands of genes across different experimental conditions or tissue samples simultaneously. Microarray cancer datasets, organized in a samples-versus-genes fashion, are being used for classification of tissue samples into benign and malignant or their subtypes. They are also useful for identifying potential gene markers for each cancer subtype, which helps in successful diagnosis of particular cancer types. In this article, we present an unsupervised cancer classification technique based on multiobjective genetic clustering of the tissue samples. In this regard, a real-coded encoding of the cluster centers is used and cluster compactness and separation are simultaneously optimized. The resultant set of near-Pareto-optimal solutions contains a number of non-dominated solutions. A novel approach to combine the clustering information possessed by the non-dominated solutions through a Support Vector Machine (SVM) classifier has been proposed. The final clustering is obtained by consensus among the clusterings yielded by different kernel functions. The performance of the proposed multiobjective clustering method has been compared with that of several other microarray clustering algorithms for three publicly available benchmark cancer datasets. Moreover, statistical significance tests have been conducted to establish the statistical superiority of the proposed clustering method. Furthermore, relevant gene markers have been identified using the clustering result produced by the proposed clustering method and demonstrated visually. Biological relationships among the gene markers are also studied based on gene ontology. The results obtained are found to be promising and can possibly have important impact in the area of unsupervised cancer classification as well as gene marker identification for multiple cancer subtypes.

  11. Neighborhood clustering of non-communicable diseases: results from a community-based study in Northern Tanzania

    Directory of Open Access Journals (Sweden)

    John W. Stanifer

    2016-03-01

    Full Text Available Abstract Background In order to begin to address the burden of non-communicable diseases (NCDs) in sub-Saharan Africa, high quality community-based epidemiological studies from the region are urgently needed. Cluster-designed sampling methods may be most efficient, but designing such studies requires assumptions about the clustering of the outcomes of interest. Currently, few studies from Sub-Saharan Africa have been published that describe the clustering of NCDs. Therefore, we report the neighborhood clustering of several NCDs from a community-based study in Northern Tanzania. Methods We conducted a cluster-designed cross-sectional household survey between January and June 2014. We used a three-stage cluster probability sampling method to select thirty-seven sampling areas from twenty-nine neighborhood clusters, stratified by urban and rural. Households were then randomly selected from each of the sampling areas, and eligible participants were tested for chronic kidney disease (CKD), glucose impairment including diabetes, hypertension, and obesity as part of the CKD-AFRiKA study. We used linear mixed models to explore clustering across each of the sampling units, and we estimated absolute-agreement intra-cluster correlation (ICC) coefficients (ρ) for the neighborhood clusters. Results We enrolled 481 participants from 346 urban and rural households. Neighborhood cluster sizes ranged from 6 to 49 participants (median: 13.0; 25th–75th percentiles: 9–21). Clustering varied across neighborhoods and differed by urban or rural setting. Among NCDs, hypertension (ρ = 0.075) exhibited the strongest clustering within neighborhoods, followed by CKD (ρ = 0.044), obesity (ρ = 0.040), and glucose impairment (ρ = 0.039). Conclusion The neighborhood clustering was substantial enough to contribute to a design effect for NCD outcomes including hypertension, CKD, obesity, and glucose impairment, and it may also highlight NCD risk factors that vary

  12. The relationship between supplier networks and industrial clusters: an analysis based on the cluster mapping method

    Directory of Open Access Journals (Sweden)

    Ichiro IWASAKI

    2010-06-01

    Full Text Available Michael Porter’s concept of competitive advantages emphasizes the importance of regional cooperation of various actors in order to gain competitiveness on globalized markets. Foreign investors may play an important role in forming such cooperation networks. Their local suppliers tend to concentrate regionally. They can form, together with local institutions of education, research, financial and other services, development agencies, the nucleus of cooperative clusters. This paper deals with the relationship between supplier networks and clusters. Two main issues are discussed in more detail: the interest of multinational companies in entering regional clusters and the spillover effects that may stem from their participation. After the discussion on the theoretical background, the paper introduces a relatively new analytical method: “cluster mapping” - a method that can spot regional hot spots of specific economic activities with cluster building potential. Experience with the method was gathered in the US and in the European Union. After the discussion on the existing empirical evidence, the authors introduce their own cluster mapping results, which they obtained by using a refined version of the original methodology.

  13. PBL - Problem Based Learning for Companies and Clusters

    Energy Technology Data Exchange (ETDEWEB)

    Hamburg, I; Vladut, G.

    2016-07-01

    Small and medium sized companies (SMEs) assure economic growth in Europe. Generally, many SMEs are struggling to survive in an ongoing global recession and are often reluctant to release staff for training or to pay for it. In this paper we briefly present the learning methods used in SMEs, particularly Problem Based Learning (PBL), as an efficient form of SME and entrepreneurship education. In the field of urban logistics, four clusters with innovation and research potential were developed in four European regions: Tuscany - Italy, Valencia - Spain, Lisbon and Tagus - Portugal, Oltenia - Romania. Training and mentoring for SMEs are essential to create competitiveness. Information and communication technologies (ICT) support the tutors through an ICT platform which is under development. (Author)

  14. Operational Numerical Weather Prediction systems based on Linux cluster architectures

    International Nuclear Information System (INIS)

    Pasqui, M.; Baldi, M.; Gozzini, B.; Maracchi, G.; Giuliani, G.; Montagnani, S.

    2005-01-01

    Progress in weather forecasting and atmospheric science has always been closely linked to improvements in computing technology. In order to have more accurate weather forecasts and climate predictions, more powerful computing resources are needed, in addition to more complex and better-performing numerical models. To meet such large computing demands, powerful workstations or massively parallel systems have been used. In the last few years, parallel architectures based on the Linux operating system have been introduced and become popular, representing true high-performance, low-cost systems. In this work the Linux cluster experience gained at the Laboratory for Meteorology and Environmental Analysis (LaMMA-CNR-IBIMET) is described, and tips and performance results are analysed.

  15. GPU-based parallel clustered differential pulse code modulation

    Science.gov (United States)

    Wu, Jiaji; Li, Wenze; Kong, Wanqiu

    2015-10-01

    Hyperspectral remote sensing technology is widely used in marine remote sensing, geological exploration, and atmospheric and environmental remote sensing. Owing to its rapid development, the resolution of hyperspectral images has increased greatly, and so has their data size. In order to reduce storage and transmission costs, lossless compression of hyperspectral images has become an important research topic. In recent years, a large number of algorithms have been proposed to reduce the redundancy between different spectra. Among them, the most classical and extensible algorithm is the Clustered Differential Pulse Code Modulation (C-DPCM) algorithm. The algorithm contains three parts: it first clusters all spectral lines and trains linear predictors for each band; it then uses these predictors to predict pixels and obtains the residual image by subtracting the predicted image from the original image; finally, it encodes the residual image. However, the process of calculating the predictors is time-consuming. In order to improve the processing speed, we propose a parallel C-DPCM based on CUDA (Compute Unified Device Architecture) with GPU. Recently, general-purpose computing on GPUs has developed greatly. GPU capacity improves rapidly as the number of processing units and storage control units increases. CUDA is a parallel computing platform and programming model created by NVIDIA. It gives developers direct access to the virtual instruction set and memory of the parallel computational elements in GPUs. Our core idea is to compute the predictors in parallel. By respectively adopting global memory, shared memory and register memory, we finally obtain a decent speedup.

  16. Visualizing Confidence in Cluster-Based Ensemble Weather Forecast Analyses.

    Science.gov (United States)

    Kumpf, Alexander; Tost, Bianca; Baumgart, Marlene; Riemer, Michael; Westermann, Rudiger; Rautenhaus, Marc

    2018-01-01

    In meteorology, cluster analysis is frequently used to determine representative trends in ensemble weather predictions in a selected spatio-temporal region, e.g., to reduce a set of ensemble members to simplify and improve their analysis. Identified clusters (i.e., groups of similar members), however, can be very sensitive to small changes of the selected region, so that clustering results can be misleading and bias subsequent analyses. In this article, we, a team of visualization scientists and meteorologists, deliver visual analytics solutions to analyze the sensitivity of clustering results with respect to changes of a selected region. We propose an interactive visual interface that enables simultaneous visualization of a) the variation in composition of identified clusters (i.e., their robustness), b) the variability in cluster membership for individual ensemble members, and c) the uncertainty in the spatial locations of identified trends. We demonstrate that our solution shows meteorologists how representative a clustering result is, and with respect to which changes in the selected region it becomes unstable. Furthermore, our solution helps to identify those ensemble members which stably belong to a given cluster and can thus be considered similar. In a real-world application case we show how our approach is used to analyze the clustering behavior of different regions in a forecast of "Tropical Cyclone Karl", guiding the user towards the cluster robustness information required for subsequent ensemble analysis.

  17. Entropy-Based Incomplete Cholesky Decomposition for a Scalable Spectral Clustering Algorithm: Computational Studies and Sensitivity Analysis

    Directory of Open Access Journals (Sweden)

    Rocco Langone

    2016-05-01

    Full Text Available Spectral clustering methods allow datasets to be partitioned into clusters by mapping the input datapoints into the space spanned by the eigenvectors of the Laplacian matrix. In this article, we make use of the incomplete Cholesky decomposition (ICD) to construct an approximation of the graph Laplacian and reduce the size of the related eigenvalue problem from N to m, with m ≪ N. In particular, we introduce a new stopping criterion based on normalized mutual information between consecutive partitions, which terminates the ICD when the change in the cluster assignments is below a given threshold. Compared with existing ICD-based spectral clustering approaches, the proposed method allows the reduction of the number m of selected pivots (i.e., to obtain a sparser model) and, at the same time, to maintain high clustering quality. The method scales linearly with respect to the number of input datapoints N and has low memory requirements, because only matrices of size N × m and m × m are calculated (in contrast to standard spectral clustering, where the construction of the full N × N similarity matrix is needed). Furthermore, we show that the number of clusters can be reliably selected based on the gap heuristics computed using just a small matrix R of size m × m instead of the entire graph Laplacian. The effectiveness of the proposed algorithm is tested on several datasets.
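
    The stopping criterion can be illustrated independently of the ICD itself; in the sketch below, scikit-learn's Nystroem feature map is used as a stand-in low-rank approximation, and the number of components m grows until the normalized mutual information between consecutive partitions exceeds a threshold. The kernel parameter, threshold and step size are arbitrary.

```python
# NMI-based stopping: increase the approximation rank m until consecutive
# partitions agree (Nystroem is a stand-in for the incomplete Cholesky step).
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.kernel_approximation import Nystroem
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

x, _ = make_blobs(n_samples=1000, centers=3, random_state=0)
k, threshold, prev_labels = 3, 0.98, None

for m in range(10, 200, 10):
    feats = Nystroem(gamma=0.5, n_components=m, random_state=0).fit_transform(x)
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(feats)
    if prev_labels is not None:
        nmi = normalized_mutual_info_score(prev_labels, labels)
        if nmi >= threshold:
            print(f"stopping at m={m} components (NMI={nmi:.3f})")
            break
    prev_labels = labels
```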

  18. Clustering more than two million biomedical publications: comparing the accuracies of nine text-based similarity approaches.

    Directory of Open Access Journals (Sweden)

    Kevin W Boyack

    2011-03-01

    Full Text Available We investigate the accuracy of different similarity approaches for clustering over two million biomedical documents. Clustering large sets of text documents is important for a variety of information needs and applications such as collection management and navigation, summary and analysis. The few comparisons of clustering results from different similarity approaches have focused on small literature sets and have given conflicting results. Our study was designed to seek a robust answer to the question of which similarity approach would generate the most coherent clusters of a biomedical literature set of over two million documents. We used a corpus of 2.15 million recent (2004-2008) records from MEDLINE, and generated nine different document-document similarity matrices from information extracted from their bibliographic records, including titles, abstracts and subject headings. The nine approaches were comprised of five different analytical techniques with two data sources. The five analytical techniques are cosine similarity using term frequency-inverse document frequency vectors (tf-idf cosine), latent semantic analysis (LSA), topic modeling, and two Poisson-based language models--BM25 and PMRA (PubMed Related Articles). The two data sources were (a) MeSH subject headings, and (b) words from titles and abstracts. Each similarity matrix was filtered to keep the top-n highest similarities per document and then clustered using a combination of graph layout and average-link clustering. Cluster results from the nine similarity approaches were compared using (1) within-cluster textual coherence based on the Jensen-Shannon divergence, and (2) two concentration measures based on grant-to-article linkages indexed in MEDLINE. PubMed's own related article approach (PMRA) generated the most coherent and most concentrated cluster solution of the nine text-based similarity approaches tested, followed closely by the BM25 approach using titles and abstracts. Approaches

  19. Clustering more than two million biomedical publications: comparing the accuracies of nine text-based similarity approaches.

    Science.gov (United States)

    Boyack, Kevin W; Newman, David; Duhon, Russell J; Klavans, Richard; Patek, Michael; Biberstine, Joseph R; Schijvenaars, Bob; Skupin, André; Ma, Nianli; Börner, Katy

    2011-03-17

    We investigate the accuracy of different similarity approaches for clustering over two million biomedical documents. Clustering large sets of text documents is important for a variety of information needs and applications such as collection management and navigation, summary and analysis. The few comparisons of clustering results from different similarity approaches have focused on small literature sets and have given conflicting results. Our study was designed to seek a robust answer to the question of which similarity approach would generate the most coherent clusters of a biomedical literature set of over two million documents. We used a corpus of 2.15 million recent (2004-2008) records from MEDLINE, and generated nine different document-document similarity matrices from information extracted from their bibliographic records, including titles, abstracts and subject headings. The nine approaches were comprised of five different analytical techniques with two data sources. The five analytical techniques are cosine similarity using term frequency-inverse document frequency vectors (tf-idf cosine), latent semantic analysis (LSA), topic modeling, and two Poisson-based language models--BM25 and PMRA (PubMed Related Articles). The two data sources were a) MeSH subject headings, and b) words from titles and abstracts. Each similarity matrix was filtered to keep the top-n highest similarities per document and then clustered using a combination of graph layout and average-link clustering. Cluster results from the nine similarity approaches were compared using (1) within-cluster textual coherence based on the Jensen-Shannon divergence, and (2) two concentration measures based on grant-to-article linkages indexed in MEDLINE. PubMed's own related article approach (PMRA) generated the most coherent and most concentrated cluster solution of the nine text-based similarity approaches tested, followed closely by the BM25 approach using titles and abstracts. Approaches using only
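
    The tf-idf cosine variant of this pipeline can be sketched roughly as below (graph layout is omitted and average-link clustering is applied directly to the sparsified similarities); the toy documents, top-n value and cut threshold are illustrative.

```python
# tf-idf vectors -> cosine similarity -> keep top-n similarities per document
# -> average-link clustering on the resulting distances.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

docs = ["gene expression in cancer cells", "tumor gene expression profiling",
        "influenza vaccine immune response", "antibody response to influenza vaccination"]

sim = cosine_similarity(TfidfVectorizer(stop_words="english").fit_transform(docs))
np.fill_diagonal(sim, 0.0)

top_n = 1
filtered = np.zeros_like(sim)
for i, row in enumerate(sim):
    keep = np.argsort(row)[-top_n:]          # indices of the top-n similarities
    filtered[i, keep] = row[keep]
filtered = np.maximum(filtered, filtered.T)  # symmetrize

dist = 1.0 - filtered
np.fill_diagonal(dist, 0.0)
labels = fcluster(linkage(squareform(dist, checks=False), method="average"),
                  t=0.8, criterion="distance")
print(labels)
```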

  20. Risk Based Optimal Fatigue Testing

    DEFF Research Database (Denmark)

    Sørensen, John Dalsgaard; Faber, M.H.; Kroon, I.B.

    1992-01-01

    Optimal fatigue life testing of materials is considered. Based on minimization of the total expected costs of a mechanical component a strategy is suggested to determine the optimal stress range levels for which additional experiments are to be performed together with an optimal value of the maxi...

  1. The Successful Test Taker: Exploring Test-Taking Behavior Profiles through Cluster Analysis

    Science.gov (United States)

    Stenlund, Tova; Lyrén, Per-Erik; Eklöf, Hanna

    2018-01-01

    To be successful in a high-stakes testing situation is desirable for any test taker. It has been found that, beside content knowledge, test-taking behavior, such as risk-taking strategies, motivation, and test anxiety, is important for test performance. The purposes of the present study were to identify and group test takers with similar patterns…

  2. Model-based clustering with certainty estimation: implication for clade assignment of influenza viruses.

    Science.gov (United States)

    Zhang, Shunpu; Li, Zhong; Beland, Kevin; Lu, Guoqing

    2016-07-21

    Clustering is a common technique used by molecular biologists to group homologous sequences and study evolution. There remain issues such as how to cluster molecular sequences accurately and in particular how to evaluate the certainty of clustering results. We presented a model-based clustering method to analyze molecular sequences, described a subset bootstrap scheme to evaluate the certainty of the clusters, and showed an intuitive way using 3D visualization to examine clusters. We applied the above approach to analyze influenza viral hemagglutinin (HA) sequences. Nine clusters were estimated for high pathogenic H5N1 avian influenza, which agree with previous findings. The certainty that a given sequence could be correctly assigned to a cluster was 1.0 in all cases, whereas the certainty for a given cluster was also very high (0.92-1.0), with an overall clustering certainty of 0.95. For influenza A H7 viruses, ten HA clusters were estimated and the vast majority of sequences could be assigned to a cluster with a certainty of more than 0.99. The certainties for clusters, however, varied from 0.40 to 0.98; such certainty variation is likely attributed to the heterogeneity of sequence data in different clusters. In both cases, the certainty values estimated using the subset bootstrap method are all higher than those calculated based upon the standard bootstrap method, suggesting our bootstrap scheme is applicable for the estimation of clustering certainty. We formulated a clustering analysis approach with the estimation of certainties and 3D visualization of sequence data. We analysed two sets of influenza A HA sequences and the results indicate our approach was applicable for clustering analysis of influenza viral sequences.

  3. CONSTRAINTS ON HELIUM ENHANCEMENT IN THE GLOBULAR CLUSTER M3 (NGC 5272): THE HORIZONTAL BRANCH TEST

    International Nuclear Information System (INIS)

    Catelan, M.; Valcarce, A. A. R.; Cortes, C.; Grundahl, F.; Sweigart, A. V.

    2009-01-01

    It has recently been suggested that the presence of multiple populations showing various amounts of helium enhancement is the rule, rather than the exception, among globular star clusters. An important prediction of this helium enhancement scenario is that the helium-enhanced blue horizontal branch (HB) stars should be brighter than the red HB stars which are not helium enhanced. In this Letter, we test this prediction in the case of the Galactic globular cluster M3 (NGC 5272), for which the helium-enhancement scenario predicts helium enhancements of ≳0.02 in virtually all blue HB stars. Using high-precision Strömgren photometry and spectroscopic gravities for blue HB stars, we find that any helium enhancement among most of the cluster's blue HB stars is very likely less than 0.01, thus ruling out the much higher helium enhancements that have been proposed in the literature.

  4. Canonical PSO Based K-Means Clustering Approach for Real Datasets.

    Science.gov (United States)

    Dey, Lopamudra; Chakraborty, Sanjay

    2014-01-01

    "Clustering" the significance and application of this technique is spread over various fields. Clustering is an unsupervised process in data mining, that is why the proper evaluation of the results and measuring the compactness and separability of the clusters are important issues. The procedure of evaluating the results of a clustering algorithm is known as cluster validity measure. Different types of indexes are used to solve different types of problems and indices selection depends on the kind of available data. This paper first proposes Canonical PSO based K-means clustering algorithm and also analyses some important clustering indices (intercluster, intracluster) and then evaluates the effects of those indices on real-time air pollution database, wholesale customer, wine, and vehicle datasets using typical K-means, Canonical PSO based K-means, simple PSO based K-means, DBSCAN, and Hierarchical clustering algorithms. This paper also describes the nature of the clusters and finally compares the performances of these clustering algorithms according to the validity assessment. It also defines which algorithm will be more desirable among all these algorithms to make proper compact clusters on this particular real life datasets. It actually deals with the behaviour of these clustering algorithms with respect to validation indexes and represents their results of evaluation in terms of mathematical and graphical forms.

  5. DIDS Using Cooperative Agents Based on Ant Colony Clustering

    Directory of Open Access Journals (Sweden)

    Muhammad Nur Kholish Abdurrazaq

    2015-07-01

    Full Text Available Intrusion detection systems (IDSs) play an important role in information security. Two major problems in the development of IDSs are the computational aspect and the architectural aspect. The computational or algorithmic problems include the lack of ability to detect novel attacks and computation overload caused by large data traffic. The architectural problems are related to the communication between components of detection, including difficulties in overcoming distributed and coordinated attacks because of the need for large amounts of distributed information and synchronization between detection components. This paper proposes a multi-agent architecture for a distributed intrusion detection system (DIDS) based on ant-colony clustering (ACC), for recognizing new and coordinated attacks, handling large data traffic, synchronization, cooperation between components without the presence of centralized computation, and good detection performance in real time with immediate alarm notification. Feature selection based on principal component analysis (PCA) is used for dimensionality reduction of NSL-KDD. Initial features are transformed into new features in smaller dimensions, where probing attacks (Ra-Probe) have a characteristic sign in their average value that differs from that of normal activity. Selection is based on the characteristics of these factors, resulting in a two-dimensional feature subset, i.e. a 75% reduction of the data.

  6. A Gamblers Clustering Based on Their Favorite Gambling Activity.

    Science.gov (United States)

    Challet-Bouju, Gaëlle; Hardouin, Jean-Benoit; Renard, Noëlle; Legauffre, Cindy; Valleur, Marc; Magalon, David; Fatséas, Mélina; Chéreau-Boudet, Isabelle; Gorsane, Mohamed-Ali; Vénisse, Jean-Luc; Grall-Bronnec, Marie

    2015-12-01

    The objective of this study was to identify profiles of gamblers to explain the choice of preferred gambling activity among both problem and non-problem gamblers. 628 non-problem and problem gamblers were assessed with a structured interview including "healthy" (sociodemographic characteristics, gambling habits and personality profile assessed with the Temperament and Character Inventory-125) and "pathological" [diagnosis of pathological gambling, gambling-related cognitions (GRCs) and psychiatric comorbidity] variables. We performed a two-step cluster analysis based solely on "healthy" variables to identify gamblers' profiles which typically reflect the choice of preferred gambling activity. The obtained classes were then described using both "healthy" and "pathological" variables, by comparing each class to the rest of the sample. Five clusters were generated. Class 1 (Electronic Gaming Machines gamblers) showed high cooperativeness, a lower level of GRC about strategy and more depressive disorders. Class 2 (games with deferred results gamblers) were high novelty seekers and showed a higher level of GRC about strategy and more addictive disorders. Class 3 (roulette gamblers) were more often high rollers and showed a higher level of GRC about strategy and more manic or hypomanic episodes and more obsessive-compulsive disorders. Class 4 (instant lottery gamblers) showed a lower tendency to suicide attempts. Class 5 (scratch cards gamblers) were high harm avoiders and showed a lower overall level of GRC and more panic attacks and eating disorders. The preference for one particular gambling activity may concern different profiles of gamblers. This study highlights the importance of considering the pair gambler-game rather than one or the other separately, and may provide support for future research on gambling and preventive actions directed toward a particular game.

  7. Testing modified gravity with globular clusters: the case of NGC 2419

    Science.gov (United States)

    Llinares, Claudio

    2018-02-01

    The dynamics of globular clusters has been studied in great detail in the context of general relativity as well as with modifications of gravity that strongly depart from the standard paradigm such as MOND. However, at present there are no studies that aim to test the impact that less extreme modifications of gravity (e.g. models constructed as alternatives to dark energy) have on the behaviour of globular clusters. This Letter presents fits to the velocity dispersion profile of the cluster NGC 2419 under the symmetron modified gravity model. The data shows an increase in the velocity dispersion towards the centre of the cluster which could be difficult to explain within general relativity. By finding the best fitting solution associated with the symmetron model, we show that this tension does not exist in modified gravity. However, the best fitting parameters give a model that is inconsistent with the dynamics of the Solar System. Exploration of different screening mechanisms should give us the chance to understand if it is possible to maintain the appealing properties of the symmetron model when it comes to globular clusters and at the same time recover the Solar System dynamics properly.

  8. Risk Based Optimal Fatigue Testing

    DEFF Research Database (Denmark)

    Sørensen, John Dalsgaard; Faber, M.H.; Kroon, I.B.

    1992-01-01

    Optimal fatigue life testing of materials is considered. Based on minimization of the total expected costs of a mechanical component a strategy is suggested to determine the optimal stress range levels for which additional experiments are to be performed together with an optimal value...

  9. Semi-supervised dimensionality reduction using orthogonal projection divergence-based clustering for hyperspectral imagery

    Science.gov (United States)

    Su, Hongjun; Du, Peijun; Du, Qian

    2012-11-01

    Band clustering and selection are applied to dimensionality reduction of hyperspectral imagery. The proposed method is based on a hierarchical clustering structure, which aims to group bands using an information or similarity measure. Specifically, the distance based on orthogonal projection divergence is used as a criterion for clustering. After clustering, a band selection step is applied to select a representative band to be used in the following data analysis. Unlike unsupervised clustering using all the pixels or supervised clustering requiring labeled pixels, the proposed semi-supervised band clustering and selection needs class spectral signatures only. The experimental results show that the proposed algorithm can significantly outperform other existing methods in the pixel-based classification task.

  10. An improved fuzzy c-means clustering algorithm based on shadowed sets and PSO.

    Science.gov (United States)

    Zhang, Jian; Shen, Ling

    2014-01-01

    To organize the wide variety of data sets automatically and acquire accurate classification, this paper presents a modified fuzzy c-means algorithm (SP-FCM) based on particle swarm optimization (PSO) and shadowed sets to perform feature clustering. SP-FCM introduces the global search property of PSO to deal with the problem of premature convergence of conventional fuzzy clustering, utilizes vagueness balance property of shadowed sets to handle overlapping among clusters, and models uncertainty in class boundaries. This new method uses Xie-Beni index as cluster validity and automatically finds the optimal cluster number within a specific range with cluster partitions that provide compact and well-separated clusters. Experiments show that the proposed approach significantly improves the clustering effect.

  11. An Improved Fuzzy c-Means Clustering Algorithm Based on Shadowed Sets and PSO

    Directory of Open Access Journals (Sweden)

    Jian Zhang

    2014-01-01

    Full Text Available To organize the wide variety of data sets automatically and acquire accurate classification, this paper presents a modified fuzzy c-means algorithm (SP-FCM) based on particle swarm optimization (PSO) and shadowed sets to perform feature clustering. SP-FCM introduces the global search property of PSO to deal with the problem of premature convergence of conventional fuzzy clustering, utilizes vagueness balance property of shadowed sets to handle overlapping among clusters, and models uncertainty in class boundaries. This new method uses Xie-Beni index as cluster validity and automatically finds the optimal cluster number within a specific range with cluster partitions that provide compact and well-separated clusters. Experiments show that the proposed approach significantly improves the clustering effect.
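
    A baseline sketch of the underlying machinery is given below: a plain fuzzy c-means loop plus the Xie-Beni index used to pick the cluster number. The PSO and shadowed-set components of SP-FCM are not reproduced, and the data are synthetic.

```python
# Plain fuzzy c-means and the Xie-Beni validity index (smaller is better).
import numpy as np
from sklearn.datasets import make_blobs

def fuzzy_c_means(x, c, m=2.0, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    u = rng.random((c, len(x)))
    u /= u.sum(axis=0)                               # memberships sum to 1 per point
    for _ in range(n_iter):
        um = u ** m
        centers = um @ x / um.sum(axis=1, keepdims=True)
        d = np.linalg.norm(x[None, :, :] - centers[:, None, :], axis=-1) + 1e-10
        u = 1.0 / (d ** (2.0 / (m - 1.0)))           # standard FCM membership update
        u /= u.sum(axis=0)
    return centers, u

def xie_beni(x, centers, u, m=2.0):
    d2 = np.linalg.norm(x[None, :, :] - centers[:, None, :], axis=-1) ** 2
    compactness = np.sum((u ** m) * d2)
    sep = np.min([np.sum((centers[i] - centers[j]) ** 2)
                  for i in range(len(centers)) for j in range(len(centers)) if i != j])
    return compactness / (len(x) * sep)

x, _ = make_blobs(n_samples=300, centers=3, random_state=1)
for c in range(2, 6):                                # scan a range of cluster numbers
    centers, u = fuzzy_c_means(x, c)
    print(c, round(xie_beni(x, centers, u), 4))
```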

  12. Interface-based software testing

    Directory of Open Access Journals (Sweden)

    Aziz Ahmad Rais

    2016-10-01

    Full Text Available Software quality is determined by assessing the characteristics that specify how it should work, which are verified through testing. If it were possible to touch, see, or measure software, it would be easier to analyze and prove its quality. Unfortunately, software is an intangible asset, which makes testing complex. This is especially true when software quality is not a question of particular functions that can be tested through a graphical user interface. The primary objective of software architecture is to design software quality through modeling and visualization. There are many methods and standards that define how to control and manage quality. However, many IT software development projects still fail due to the difficulties involved in measuring, controlling, and managing software quality. Software quality failure factors are numerous. Examples include beginning to test software too late in the development process, or failing to properly understand or design the software architecture and the software component structure. The goal of this article is to provide an interface-based software testing technique that better measures software quality, automates software quality testing, encourages early testing, and increases the software’s overall testability
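
    One common way to realize interface-based testing is to write a contract test against an abstract interface and run it over every implementation; the sketch below illustrates this with a hypothetical KeyValueStore interface (the article's own technique may differ).

```python
# Contract testing against an interface rather than a concrete implementation.
from abc import ABC, abstractmethod
from typing import Optional

class KeyValueStore(ABC):
    @abstractmethod
    def put(self, key: str, value: str) -> None: ...
    @abstractmethod
    def get(self, key: str) -> Optional[str]: ...

class InMemoryStore(KeyValueStore):
    def __init__(self):
        self._data = {}
    def put(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data.get(key)

def contract_test(store: KeyValueStore) -> None:
    """Interface-level test: valid for any conforming implementation."""
    assert store.get("missing") is None
    store.put("a", "1")
    assert store.get("a") == "1"
    store.put("a", "2")                 # overwriting must return the latest value
    assert store.get("a") == "2"

for impl in (InMemoryStore,):           # add further implementations here
    contract_test(impl())
    print(impl.__name__, "passes the interface contract")
```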

  13. A fuzzy logic based clustering strategy for improving vehicular ad ...

    Indian Academy of Sciences (India)

    Plenty of parameters related to user preferences, network conditions and application requirements such as speed of mobile nodes, distance to cluster head, data rate and signal strength must be evaluated in the cluster head selection process together with the direction parameter for highly dynamic VANET structures.

  14. Clustering by partitioning around medoids using distance-based ...

    African Journals Online (AJOL)

    This paper reports the results of a study of the partitioning around medoids (PAM) clustering algorithm applied to four datasets, both standardized and not, of varying sizes and numbers of clusters. The angular distance proximity measure was used in addition to the two more traditional proximity measures, namely the ...

  15. A fuzzy logic based clustering strategy for improving vehicular ad ...

    Indian Academy of Sciences (India)

    ITS proposes to manage vehicle traffic, support drivers with safety .... the same time. The vehicle that first sends a message inviting other vehicles to join and that has more cluster members will be elected as cluster head. There are ... In this study, an alternative approach using fuzzy logic under dynamic network conditions.

  16. A Coupled User Clustering Algorithm Based on Mixed Data for Web-Based Learning Systems

    Directory of Open Access Journals (Sweden)

    Ke Niu

    2015-01-01

    Full Text Available In traditional Web-based learning systems, due to insufficient analysis of learning behaviors and a lack of personalized study guidance, a few user clustering algorithms have been introduced. While analyzing behaviors with these algorithms, researchers generally focus on continuous data but easily neglect discrete data, both of which are generated from online learning actions. Moreover, there are implicit coupled interactions among the data, which are frequently ignored by the introduced algorithms. Therefore, a mass of significant information which could positively affect clustering accuracy is neglected. To solve the above issues, we propose a coupled user clustering algorithm for Web-based learning systems that takes into account both discrete and continuous data, as well as intracoupled and intercoupled interactions of the data. The experimental results in this paper demonstrate that the proposed algorithm outperforms existing ones.

  17. Resource Provisioning in SLA-Based Cluster Computing

    Science.gov (United States)

    Xiong, Kaiqi; Suh, Sang

    Cluster computing is excellent for parallel computation. It has become increasingly popular. In cluster computing, a service level agreement (SLA) is a set of quality of services (QoS) and a fee agreed between a customer and an application service provider. It plays an important role in an e-business application. An application service provider uses a set of cluster computing resources to support e-business applications subject to an SLA. In this paper, the QoS includes percentile response time and cluster utilization. We present an approach for resource provisioning in such an environment that minimizes the total cost of cluster computing resources used by an application service provider for an e-business application that often requires parallel computation for high service performance, availability, and reliability while satisfying a QoS and a fee negotiated between a customer and the application service provider. Simulation experiments demonstrate the applicability of the approach.

  18. Modeling Molecular Systems at Extreme Pressure by an Extension of the Polarizable Continuum Model (PCM) Based on the Symmetry-Adapted Cluster-Configuration Interaction (SAC-CI) Method: Confined Electronic Excited States of Furan as a Test Case.

    Science.gov (United States)

    Fukuda, Ryoichi; Ehara, Masahiro; Cammi, Roberto

    2015-05-12

    Novel molecular photochemistry can be developed by combining high pressure and laser irradiation. For studying such high-pressure effects on the confined electronic ground and excited states, we extend the PCM (polarizable continuum model) SAC (symmetry-adapted cluster) and SAC-CI (SAC-configuration interaction) methods to the PCM-XP (extreme pressure) framework. By using the PCM-XP SAC/SAC-CI method, molecular systems in various electronic states can be confined by polarizable media in a smooth and flexible way. The PCM-XP SAC/SAC-CI method is applied to a furan (C4H4O) molecule in cyclohexane at high pressure (1-60 GPa). The relationship between the calculated free-energy and cavity volume can be approximately represented with the Murnaghan equation of state. The excitation energies of furan in cyclohexane show blueshifts with increasing pressure, and the extents of the blueshifts significantly depend on the character of the excitations. Particularly large confinement effects are found in the Rydberg states. The energy ordering of the lowest Rydberg and valence states alters under high-pressure. The pressure effects on the electronic structure may be classified into two contributions: a confinement of the molecular orbital and a suppression of the mixing between the valence and Rydberg configurations. The valence or Rydberg character in an excited state is, therefore, enhanced under high pressure.

  19. CLUSTERING-BASED FEATURE LEARNING ON VARIABLE STARS

    International Nuclear Information System (INIS)

    Mackenzie, Cristóbal; Pichara, Karim; Protopapas, Pavlos

    2016-01-01

    The success of automatic classification of variable stars depends strongly on the lightcurve representation. Usually, lightcurves are represented as a vector of many descriptors designed by astronomers called features. These descriptors are expensive in terms of computing, require substantial research effort to develop, and do not guarantee a good classification. Today, lightcurve representation is not entirely automatic; algorithms must be designed and manually tuned up for every survey. The amounts of data that will be generated in the future mean astronomers must develop scalable and automated analysis pipelines. In this work we present a feature learning algorithm designed for variable objects. Our method works by extracting a large number of lightcurve subsequences from a given set, which are then clustered to find common local patterns in the time series. Representatives of these common patterns are then used to transform lightcurves of a labeled set into a new representation that can be used to train a classifier. The proposed algorithm learns the features from both labeled and unlabeled lightcurves, overcoming the bias using only labeled data. We test our method on data sets from the Massive Compact Halo Object survey and the Optical Gravitational Lensing Experiment; the results show that our classification performance is as good as and in some cases better than the performance achieved using traditional statistical features, while the computational cost is significantly lower. With these promising results, we believe that our method constitutes a significant step toward the automation of the lightcurve classification pipeline
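
    A rough sketch of the general idea described here (extract light-curve subsequences, cluster them into common local patterns, and re-represent each light curve over those patterns) is given below; it is not the authors' algorithm, and the synthetic curves, window length and number of patterns are assumptions.

```python
# Rough sketch of clustering-based feature learning for time series: extract
# sliding-window subsequences, cluster them, and represent each series as a
# histogram over the learned "pattern" clusters. Synthetic data only.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
lightcurves = [rng.normal(size=300).cumsum() for _ in range(50)]   # toy light curves
window, n_patterns = 20, 32

def subsequences(series, w):
    return np.array([series[i:i + w] for i in range(len(series) - w + 1)])

# 1) learn common local patterns from all (unlabeled) curves
all_subs = np.vstack([subsequences(lc, window) for lc in lightcurves])
all_subs = all_subs - all_subs.mean(axis=1, keepdims=True)          # crude normalisation
patterns = KMeans(n_clusters=n_patterns, n_init=10, random_state=0).fit(all_subs)

# 2) transform each curve into a bag-of-patterns feature vector
def featurize(series):
    subs = subsequences(series, window)
    subs = subs - subs.mean(axis=1, keepdims=True)
    counts = np.bincount(patterns.predict(subs), minlength=n_patterns)
    return counts / counts.sum()

features = np.array([featurize(lc) for lc in lightcurves])
print(features.shape)   # (50, 32) -> ready to train any classifier on the labeled curves
```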

  20. A fuzzy relational clustering algorithm based on a dissimilarity measure extracted from data.

    Science.gov (United States)

    Corsini, Paolo; Lazzerini, Beatrice; Marcelloni, Francesco

    2004-02-01

    One of the critical aspects of clustering algorithms is the correct identification of the dissimilarity measure used to drive the partitioning of the data set. The dissimilarity measure induces the cluster shape and therefore determines the success of clustering algorithms. As cluster shapes change from one data set to another, dissimilarity measures should be extracted from the data. To this aim, we exploit some pairs of points with known dissimilarity values to teach a dissimilarity relation to a feed-forward neural network. Then, we use the neural dissimilarity measure to guide an unsupervised relational clustering algorithm. Experiments on synthetic data sets and on the Iris data set show that the relational clustering algorithm based on the neural dissimilarity outperforms some popular clustering algorithms (with possible partial supervision) based on spatial dissimilarity.

  1. Rotating Machinery Fault Diagnosis for Imbalanced Data Based on Fast Clustering Algorithm and Support Vector Machine

    Directory of Open Access Journals (Sweden)

    Xiaochen Zhang

    2017-01-01

    Full Text Available To diagnose rotating machinery fault for imbalanced data, a method based on fast clustering algorithm (FCA) and support vector machine (SVM) was proposed. Combined with variational mode decomposition (VMD) and principal component analysis (PCA), sensitive features of the rotating machinery fault were obtained and constituted the imbalanced fault sample set. Next, a fast clustering algorithm was adopted to reduce the number of the majority data from the imbalanced fault sample set. Consequently, the balanced fault sample set consisted of the clustered data and the minority data from the imbalanced fault sample set. After that, SVM was trained with the balanced fault sample set and tested with the imbalanced fault sample set so the fault diagnosis model of the rotating machinery could be obtained. Finally, the gearbox fault data set and the rolling bearing fault data set were adopted to test the fault diagnosis model. The experimental results showed that the fault diagnosis model could effectively diagnose the rotating machinery fault for imbalanced data.
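
    The balancing step can be illustrated with a minimal sketch: cluster the majority class and let the cluster centres stand in for it, then train an SVM on the balanced set; the synthetic features below replace the VMD/PCA features of the record and all sizes are assumptions.

```python
# Minimal sketch of the balancing idea: reduce the majority class with a fast
# clustering step (k-means centroids stand in for the clustered majority data),
# then train an SVM on the balanced set and evaluate on the original imbalanced set.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_major = rng.normal(0.0, 1.0, size=(1000, 8))    # healthy condition (majority class)
X_minor = rng.normal(2.5, 1.0, size=(60, 8))      # fault condition (minority class)

# cluster the majority class down to roughly the minority size
km = KMeans(n_clusters=len(X_minor), n_init=10, random_state=0).fit(X_major)
X_major_reduced = km.cluster_centers_

X_bal = np.vstack([X_major_reduced, X_minor])
y_bal = np.hstack([np.zeros(len(X_major_reduced)), np.ones(len(X_minor))])

clf = SVC(kernel="rbf", gamma="scale").fit(X_bal, y_bal)

X_all = np.vstack([X_major, X_minor])
y_all = np.hstack([np.zeros(len(X_major)), np.ones(len(X_minor))])
print("accuracy on the imbalanced set:", clf.score(X_all, y_all))
```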

  2. A Cluster-Based Dual-Adaptive Topology Control Approach in Wireless Sensor Networks

    Science.gov (United States)

    Gui, Jinsong; Zhou, Kai; Xiong, Naixue

    2016-01-01

    Multi-Input Multi-Output (MIMO) can improve wireless network performance. Sensors are usually single-antenna devices due to the high hardware complexity and cost, so several sensors are used to form a virtual MIMO array, which is a desirable approach to efficiently take advantage of MIMO gains. Also, in large Wireless Sensor Networks (WSNs), clustering can improve the network scalability, which is an effective topology control approach. The existing virtual MIMO-based clustering schemes either do not fully explore the benefits of MIMO or do not adaptively determine the clustering ranges. Also, the clustering mechanism needs to be further improved to extend the life of the cluster structure. In this paper, we propose an improved clustering scheme for virtual MIMO-based topology construction (ICV-MIMO), which can adaptively determine not only the inter-cluster transmission modes but also the clustering ranges. Through the rational division of cluster head functions and the optimization of the cluster head selection criteria and information exchange process, the ICV-MIMO scheme effectively reduces the network energy consumption and improves the lifetime of the cluster structure when compared with the existing typical virtual MIMO-based scheme. Moreover, the message overhead and time complexity remain in the same order of magnitude. PMID:27681731

  3. Cluster-based Data Gathering in Long-Strip Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    FANG, W.

    2012-02-01

    Full Text Available This paper investigates a special class of wireless sensor networks that differ from traditional ones in that the sensor nodes are deployed along narrowly elongated geographical areas and form a long-strip topology. According to the hardware capabilities of current sensor nodes, a cluster-based protocol for reliable and efficient data gathering in long-strip wireless sensor networks (LSWSN) is proposed. A well-distributed cluster-based architecture is first formed in the whole network through contention-based cluster head election. Cluster heads are responsible for coordination among the nodes within their clusters and aggregation of their sensory data, as well as transmission of the data to the sink node on behalf of their own clusters. The intra-cluster coordination is based on the traditional TDMA schedule, in which the inter-cluster interference caused by the border nodes is resolved by a multi-channel communication technique. The cluster reporting is based on CSMA contention, in which a connected overlay network is formed by relay nodes to forward the data from the cluster heads through multiple hops to the sink node. The relay nodes are non-uniformly deployed to resolve the energy-hole problem, which is extremely serious in the LSWSN. Extensive simulation results demonstrate the distinguished performance of the proposed protocol.

  4. Cell-Based Genotoxicity Testing

    Science.gov (United States)

    Reifferscheid, Georg; Buchinger, Sebastian

    Genotoxicity test systems that are based on bacteria play an important role in the detection and assessment of DNA damaging chemicals. They belong to the basic line of test systems due to their easy realization, rapidness, broad applicability, high sensitivity and good reproducibility. Since the development of the Salmonella microsomal mutagenicity assay by Ames and coworkers in the early 1970s, significant development in bacterial genotoxicity assays has been achieved and is still a subject of research. The basic principle of the mutagenicity assay is a reversion of a growth-inhibited bacterial strain, e.g., due to auxotrophy, back to a fast growing phenotype (regain of prototrophy). Deeper knowledge of the mutation events allows a mechanistic understanding of the induced DNA damage by the utilization of base-specific tester strains. Collections of such specific tester strains were extended by genetic engineering. Beside the reversion assays, test systems utilizing the bacterial SOS response were invented. These methods are based on the fusion of various SOS-responsive promoters with a broad variety of reporter genes facilitating numerous methods of signal detection. A very important aspect of genotoxicity testing is the bioactivation of xenobiotics to DNA-damaging compounds. Most widely used is the extracellular metabolic activation by making use of rodent liver homogenates. Again, genetic engineering allows the construction of highly sophisticated bacterial tester strains with significantly enhanced sensitivity due to overexpression of enzymes that are involved in the metabolism of xenobiotics. This provides mechanistic insights into the toxification and detoxification pathways of xenobiotics and helps explain the chemical nature of hazardous substances in unknown mixtures. In summary, beginning with "natural" tester strains, the rational design of bacteria has led to highly specific and sensitive tools for a rapid, reliable and cost effective

  5. Multiscale deep drawing analysis of dual-phase steels using grain cluster-based RGC scheme

    International Nuclear Information System (INIS)

    Tjahjanto, D D; Eisenlohr, P; Roters, F

    2015-01-01

    Multiscale modelling and simulation play an important role in sheet metal forming analysis, since the overall material responses at macroscopic engineering scales, e.g. formability and anisotropy, are strongly influenced by microstructural properties, such as grain size and crystal orientations (texture). In the present report, a multiscale analysis of the deep drawing of dual-phase steels is performed using an efficient grain cluster-based homogenization scheme. The homogenization scheme, called relaxed grain cluster (RGC), is based on a generalization of the grain cluster concept, where a (representative) volume element consists of p  ×  q  ×  r (hexahedral) grains. In this scheme, variation of the strain or deformation of individual grains is taken into account through the so-called interface relaxation, which is formulated within an energy minimization framework. An interfacial penalty term is introduced into the energy minimization framework in order to account for the effects of grain boundaries. The grain cluster-based homogenization scheme has been implemented and incorporated into the advanced material simulation platform DAMASK, which aims to bridge the macroscale boundary value problems associated with deep drawing analysis to the micromechanical constitutive law, e.g. a crystal plasticity model. Standard Lankford anisotropy tests are performed to validate the model parameters prior to the deep drawing analysis. Model predictions for the deep drawing simulations are analyzed and compared to the corresponding experimental data. The result shows that the predictions of the model are in very good agreement with the experimental measurements. (paper)

  6. Retrieval with Clustering in a Case-Based Reasoning System for Radiotherapy Treatment Planning

    International Nuclear Information System (INIS)

    Khussainova, Gulmira; Petrovic, Sanja; Jagannathan, Rupa

    2015-01-01

    Radiotherapy treatment planning aims to deliver a sufficient radiation dose to cancerous tumour cells while sparing healthy organs in the tumour surrounding area. This is a trial and error process highly dependent on the medical staff's experience and knowledge. Case-Based Reasoning (CBR) is an artificial intelligence tool that uses past experiences to solve new problems. A CBR system has been developed to facilitate radiotherapy treatment planning for brain cancer. Given a new patient case the existing CBR system retrieves a similar case from an archive of successfully treated patient cases with the suggested treatment plan. The next step requires adaptation of the retrieved treatment plan to meet the specific demands of the new case. The CBR system was tested by medical physicists for the new patient cases. It was discovered that some of the retrieved cases were not suitable and could not be adapted for the new cases. This motivated us to revise the retrieval mechanism of the existing CBR system by adding a clustering stage that clusters cases based on their tumour positions. A number of well-known clustering methods were investigated and employed in the retrieval mechanism. Results using real world brain cancer patient cases have shown that the success rate of the new CBR retrieval is higher than that of the original system. (paper)
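
    A minimal sketch of such a clustering stage in front of retrieval is shown below: archived cases are clustered by tumour position and a new case is matched only within its own cluster; the coordinates, cluster count and distance measure are hypothetical, not those of the described CBR system.

```python
# Sketch of a clustering stage in front of case retrieval: archived cases are
# clustered by tumour position, and a new case is compared only against the
# cases in its own cluster. Coordinates and cluster count are hypothetical.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
case_positions = rng.uniform(-50, 50, size=(120, 3))   # tumour centroids (mm), archived cases

clusterer = KMeans(n_clusters=4, n_init=10, random_state=0).fit(case_positions)

def retrieve(new_position):
    cluster_id = clusterer.predict(new_position.reshape(1, -1))[0]
    members = np.where(clusterer.labels_ == cluster_id)[0]
    dists = np.linalg.norm(case_positions[members] - new_position, axis=1)
    return members[np.argmin(dists)]          # index of the most similar archived case

new_case = np.array([10.0, -5.0, 22.0])
print("retrieved case index:", retrieve(new_case))
```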

  7. A cluster-based architecture to structure the topology of parallel wireless sensor networks.

    Science.gov (United States)

    Lloret, Jaime; Garcia, Miguel; Bri, Diana; Diaz, Juan R

    2009-01-01

    A wireless sensor network is a self-configuring network of mobile nodes connected by wireless links where the nodes have limited capacity and energy. In many cases, the application environment requires the design of an exclusive network topology for a particular case. Cluster-based network developments and proposals in existence have been designed to build a network for just one type of node, where all nodes can communicate with any other nodes in their coverage area. Let us suppose a set of clusters of sensor nodes where each cluster is formed by different types of nodes (e.g., they could be classified by the sensed parameter using different transmitting interfaces, by the node profile or by the type of device: laptops, PDAs, sensors etc.) and exclusive networks, as virtual networks, are needed with the same type of sensed data, or the same type of devices, or even the same type of profiles. In this paper, we propose an algorithm that is able to structure the topology of different wireless sensor networks so that they coexist in the same environment. It allows control and management of the topology of each network. The architecture operation and the protocol messages are described. Measurements from a real test-bench show that the designed protocol has low bandwidth consumption and also demonstrate the viability and the scalability of the proposed architecture. Our cluster-based algorithm is compared with other algorithms reported in the literature in terms of architecture and protocol measurements.

  8. Contact-based ligand-clustering approach for the identification of active compounds in virtual screening

    Directory of Open Access Journals (Sweden)

    Mantsyzov AB

    2012-09-01

    Full Text Available Alexey B Mantsyzov,1 Guillaume Bouvier,2 Nathalie Evrard-Todeschi,1 Gildas Bertho1 (1Université Paris Descartes, Sorbonne, Paris, France; 2Institut Pasteur, Paris, France). Abstract: Evaluation of docking results is one of the most important problems for virtual screening and in silico drug design. Modern approaches for the identification of active compounds in a large data set of docked molecules use energy scoring functions. One of the most significant general limitations of these methods relates to inaccurate binding energy estimation, which results in false scoring of docked compounds. Automatic analysis of poses using self-organizing maps (AuPosSOM) represents an alternative approach for the evaluation of docking results, based on the clustering of compounds by the similarity of their contacts with the receptor. A scoring function was developed for the identification of the active compounds in the AuPosSOM-clustered dataset. In addition, the AuPosSOM efficiency for the clustering of compounds and the identification of key contacts considered important for activity were also improved. Benchmark tests for several targets revealed that, together with the developed scoring function, AuPosSOM represents a good alternative to the energy-based scoring functions for the evaluation of docking results. Keywords: scoring, docking, virtual screening, CAR, AuPosSOM
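
    As a simplified stand-in for the contact-based clustering idea (not AuPosSOM itself, which uses self-organizing maps), the sketch below groups binary receptor-contact fingerprints by Jaccard dissimilarity with hierarchical clustering; the fingerprints are randomly generated.

```python
# Simplified stand-in for contact-based ligand clustering (not AuPosSOM itself):
# binary receptor-contact fingerprints are compared with the Jaccard distance and
# grouped by hierarchical clustering. Fingerprints are randomly generated here.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# rows = docked compounds, columns = receptor residues (True = contact observed)
fingerprints = rng.integers(0, 2, size=(80, 40), dtype=bool)

D = pdist(fingerprints, metric="jaccard")          # contact-pattern dissimilarity
Z = linkage(D, method="average")
labels = fcluster(Z, t=4, criterion="maxclust")    # cut into 4 contact-based groups

print(np.bincount(labels)[1:])                     # cluster sizes
```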

  9. Retrieval with Clustering in a Case-Based Reasoning System for Radiotherapy Treatment Planning

    Science.gov (United States)

    Khussainova, Gulmira; Petrovic, Sanja; Jagannathan, Rupa

    2015-05-01

    Radiotherapy treatment planning aims to deliver a sufficient radiation dose to cancerous tumour cells while sparing healthy organs in the tumour surrounding area. This is a trial and error process highly dependent on the medical staff's experience and knowledge. Case-Based Reasoning (CBR) is an artificial intelligence tool that uses past experiences to solve new problems. A CBR system has been developed to facilitate radiotherapy treatment planning for brain cancer. Given a new patient case the existing CBR system retrieves a similar case from an archive of successfully treated patient cases with the suggested treatment plan. The next step requires adaptation of the retrieved treatment plan to meet the specific demands of the new case. The CBR system was tested by medical physicists for the new patient cases. It was discovered that some of the retrieved cases were not suitable and could not be adapted for the new cases. This motivated us to revise the retrieval mechanism of the existing CBR system by adding a clustering stage that clusters cases based on their tumour positions. A number of well-known clustering methods were investigated and employed in the retrieval mechanism. Results using real world brain cancer patient cases have shown that the success rate of the new CBR retrieval is higher than that of the original system.

  10. A Human Activity Recognition System Based on Dynamic Clustering of Skeleton Data

    Directory of Open Access Journals (Sweden)

    Alessandro Manzi

    2017-05-01

    Full Text Available Human activity recognition is an important area in computer vision, with a wide range of applications including ambient assisted living. In this paper, an activity recognition system based on skeleton data extracted from a depth camera is presented. The system makes use of machine learning techniques to classify the actions that are described with a set of a few basic postures. The training phase creates several models related to the number of clustered postures by means of a multiclass Support Vector Machine (SVM), trained with Sequential Minimal Optimization (SMO). The classification phase adopts the X-means algorithm to find the optimal number of clusters dynamically. The contribution of the paper is twofold. The first aim is to perform activity recognition employing features based on a small number of informative postures, extracted independently from each activity instance; secondly, it aims to assess the minimum number of frames needed for an adequate classification. The system is evaluated on two publicly available datasets, the Cornell Activity Dataset (CAD-60) and the Telecommunication Systems Team (TST) Fall detection dataset. The number of clusters needed to model each instance ranges from two to four elements. The proposed approach reaches excellent performance using only about 4 s of input data (~100 frames) and outperforms the state of the art when it uses approximately 500 frames on the CAD-60 dataset. The results are promising for tests in real contexts.

  11. A comparison of heuristic and model-based clustering methods for dietary pattern analysis.

    Science.gov (United States)

    Greve, Benjamin; Pigeot, Iris; Huybrechts, Inge; Pala, Valeria; Börnhorst, Claudia

    2016-02-01

    Cluster analysis is widely applied to identify dietary patterns. A new method based on Gaussian mixture models (GMM) seems to be more flexible compared with the commonly applied k-means and Ward's method. In the present paper, these clustering approaches are compared to find the most appropriate one for clustering dietary data. The clustering methods were applied to simulated data sets with different cluster structures to compare their performance knowing the true cluster membership of observations. Furthermore, the three methods were applied to FFQ data assessed in 1791 children participating in the IDEFICS (Identification and Prevention of Dietary- and Lifestyle-Induced Health Effects in Children and Infants) Study to explore their performance in practice. The GMM outperformed the other methods in the simulation study in 72 % up to 100 % of cases, depending on the simulated cluster structure. Comparing the computationally less complex k-means and Ward's methods, the performance of k-means was better in 64-100 % of cases. Applied to real data, all methods identified three similar dietary patterns which may be roughly characterized as a 'non-processed' cluster with a high consumption of fruits, vegetables and wholemeal bread, a 'balanced' cluster with only slight preferences of single foods and a 'junk food' cluster. The simulation study suggests that clustering via GMM should be preferred due to its higher flexibility regarding cluster volume, shape and orientation. The k-means seems to be a good alternative, being easier to use while giving similar results when applied to real data.
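
    A small comparison in the spirit of this simulation study can be set up as below, scoring a Gaussian mixture model, k-means and Ward's method with the adjusted Rand index on data with elongated clusters; the data generator and settings are assumptions, not the authors' design.

```python
# A small comparison in the spirit of the simulation study: Gaussian mixture,
# k-means and Ward's method scored with the adjusted Rand index on data with
# elongated (non-spherical) clusters. Not the authors' simulation design.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

X, y = make_blobs(n_samples=600, centers=3, cluster_std=1.0, random_state=0)
X = X @ np.array([[2.5, 0.0], [1.5, 0.4]])     # stretch the clusters so shape matters

gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0).fit_predict(X)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
ward = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)

for name, labels in [("GMM", gmm), ("k-means", km), ("Ward", ward)]:
    print(f"{name:8s} ARI = {adjusted_rand_score(y, labels):.3f}")
```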

  12. Clustering-based urbanisation to improve enterprise information systems agility

    Science.gov (United States)

    Imache, Rabah; Izza, Said; Ahmed-Nacer, Mohamed

    2015-11-01

    Enterprises face daily pressures to demonstrate their ability to adapt quickly to the unpredictable changes of their dynamic environment in terms of technology, society, legislation, competitiveness and globalisation. Thus, to ensure its place in this hard context, an enterprise must always be agile and must ensure its sustainability by a continuous improvement of its information system (IS). Therefore, the agility of enterprise information systems (EISs) can be considered today as a primary objective of any enterprise. One way of achieving this objective is by the urbanisation of the EIS in the context of continuous improvement, to make it a real asset serving the enterprise strategy. This paper investigates the benefits of EIS urbanisation based on clustering techniques as a driver for agility production and/or improvement, to help managers and IT management departments continuously improve the performance of the enterprise and make appropriate decisions in the scope of the enterprise objectives and strategy. This approach is applied to the urbanisation of a tour operator EIS.

  13. Frailty phenotypes in the elderly based on cluster analysis

    DEFF Research Database (Denmark)

    Dato, Serena; Montesanto, Alberto; Lagani, Vincenzo

    2012-01-01

    Frailty is a physiological state characterized by the deregulation of multiple physiologic systems of an aging organism determining the loss of homeostatic capacity, which exposes the elderly to disability, diseases, and finally death. An operative definition of frailty, useful for the classifica...... genetic background on the frailty status is still questioned. We investigated the applicability of a cluster analysis approach based on specific geriatric parameters, previously set up and validated in a southern Italian population, to two large longitudinal Danish samples. In both cohorts, we identified...... groups of subjects homogeneous for their frailty status and characterized by different survival patterns. A subsequent survival analysis availing of Accelerated Failure Time models allowed us to formulate an operative index able to correlate classification variables with survival probability. From...

  14. Microcalcification detection in full-field digital mammograms with PFCM clustering and weighted SVM-based method

    Science.gov (United States)

    Liu, Xiaoming; Mei, Ming; Liu, Jun; Hu, Wei

    2015-12-01

    Clustered microcalcifications (MCs) in mammograms are an important early sign of breast cancer in women. Their accurate detection is important in computer-aided detection (CADe). In this paper, we integrated the possibilistic fuzzy c-means (PFCM) clustering algorithm and weighted support vector machine (WSVM) for the detection of MC clusters in full-field digital mammograms (FFDM). For each image, suspicious MC regions are extracted with region growing and active contour segmentation. Then geometry and texture features are extracted for each suspicious MC, a mutual information-based supervised criterion is used to select important features, and PFCM is applied to cluster the samples into two clusters. Weights of the samples are calculated based on possibilities and typicality values from the PFCM, and the ground truth labels. A weighted nonlinear SVM is trained. During the test process, when an unknown image is presented, suspicious regions are located with the segmentation step, selected features are extracted, and the suspicious MC regions are classified as containing MC or not by the trained weighted nonlinear SVM. Finally, the MC regions are analyzed with spatial information to locate MC clusters. The proposed method is evaluated using a database of 410 clinical mammograms and compared with a standard unweighted support vector machine (SVM) classifier. The detection performance is evaluated using receiver operating characteristic (ROC) curves and free-response receiver operating characteristic (FROC) curves. The proposed method obtained an area under the ROC curve of 0.8676, while the standard SVM obtained an area of 0.8268 for MC detection. For MC cluster detection, the proposed method obtained a high sensitivity of 92 % with a false-positive rate of 2.3 clusters/image, and it is also better than standard SVM with 4.7 false-positive clusters/image at the same sensitivity.
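
    PFCM is not available in common Python libraries, so the sketch below uses a rough stand-in: fuzzy-c-means-style memberships computed from distances to k-means centres provide per-sample weights for a weighted SVM via sample_weight; the features and labels are synthetic, and the segmentation and feature-selection steps are not reproduced.

```python
# Rough stand-in for the PFCM + weighted SVM idea: fuzzy memberships derived from
# distances to k-means centres give each sample a confidence-like weight, which is
# passed to a weighted SVM. Synthetic features; not the paper's PFCM formulation.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (150, 6)), rng.normal(2, 1, (150, 6))])
y = np.hstack([np.zeros(150), np.ones(150)])                  # ground-truth MC / not-MC

centers = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X).cluster_centers_
d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
m = 2.0                                                        # fuzzifier
u = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1)), axis=2)
weights = u.max(axis=1)                                        # strongest membership per sample

clf = SVC(kernel="rbf", gamma="scale")
clf.fit(X, y, sample_weight=weights)
print("training accuracy:", clf.score(X, y))
```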

  15. Variable selection in multivariate calibration based on clustering of variable concept.

    Science.gov (United States)

    Farrokhnia, Maryam; Karimi, Sadegh

    2016-01-01

    Recently we proposed a new variable selection algorithm based on the clustering of variables concept (CLoVA) for classification problems. With the same idea, this concept has now been applied to a regression problem and the obtained results have been compared with conventional variable selection strategies for PLS. The basic idea behind the clustering of variables is that the instrument channels are grouped into different clusters via clustering algorithms. Then, the spectral data of each cluster are subjected to PLS regression. Different real data sets (Cargill corn, Biscuit dough, ACE QSAR, Soy, and Tablet) have been used to evaluate the influence of the clustering of variables on the prediction performance of PLS. In almost all cases, the statistical parameters, especially the prediction error, show the superiority of CLoVA-PLS with respect to other variable selection strategies. Finally, synergy clustering of variables (sCLoVA-PLS), which uses a combination of clusters, is proposed as an efficient modification of the CLoVA algorithm. The obtained statistical parameters indicate that variable clustering can separate the useful variables from the redundant ones, so that a stable model can be reached based on the informative clusters. Copyright © 2015 Elsevier B.V. All rights reserved.
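
    The clustering-of-variables idea can be sketched as follows: cluster the spectral channels (columns), fit a PLS model on each channel cluster, and keep the cluster with the best cross-validated error; the synthetic spectra and the number of variable clusters are assumptions, not the data sets named above.

```python
# Minimal sketch of the clustering-of-variables idea: wavelength channels are
# clustered, a PLS model is fitted per channel cluster, and the cluster with the
# best cross-validated score is retained. Synthetic spectra only.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 200))                               # 100 samples x 200 wavelengths
y = X[:, 40:60].mean(axis=1) + 0.05 * rng.normal(size=100)    # informative band plus noise

# cluster the variables (transpose so each wavelength is an observation)
var_labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X.T)

best = None
for c in range(5):
    cols = np.where(var_labels == c)[0]
    pls = PLSRegression(n_components=min(3, len(cols)))
    score = cross_val_score(pls, X[:, cols], y, cv=5, scoring="r2").mean()
    if best is None or score > best[1]:
        best = (c, score)
print(f"best variable cluster: {best[0]} (CV R2 = {best[1]:.3f})")
```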

  16. Patterns of object relations and reality testing deficits in schizophrenia: clusters and their symptom and personality correlates.

    Science.gov (United States)

    Bell, M D; Conway Greig, T; Bryson, G; Kaplan, E

    2001-12-01

    Bell Object Relations Reality Testing Inventory (BORRTI) profile scores were used to cluster 222 outpatients with schizophrenia or schizoaffective disorder. An eight-cluster solution was subjected to replication analysis, and six clusters were found valid and replicable. These clusters were sorted into three pairs that were interpreted as follows: Residually Impaired consisted of Sealed-Over Recovery and Integrated Recovery; Socially Withdrawn consisted of Socially Withdrawn and Socially Withdrawn-Autistic; and Psychotically Egocentric consisted of Psychotically Egocentric and Psychotically Egocentric-Severe. Clusters were compared on Positive and Negative Syndrome Scale ratings and on subscales from the Eysenck Personality Questionnaire. MANOVAs indicated significant differences among clusters. These differences provided further interpretations of cluster membership. Implications for the use of BORRTI profiles for treatment and rehabilitation planning are discussed. Copyright 2001 John Wiley & Sons, Inc.

  17. An AK-LDMeans algorithm based on image clustering

    Science.gov (United States)

    Chen, Huimin; Li, Xingwei; Zhang, Yongbin; Chen, Nan

    2018-03-01

    Clustering is an effective analytical technique for handling unmarked data for value mining. Its ultimate goal is to mark unclassified data quickly and correctly. We use the roadmap of current image processing as the experimental background. In this paper, we propose an AK-LDMeans algorithm that automatically locks the K value by designing the Kcost fold line and then uses a long-distance, high-density method to select the clustering centers, replacing the traditional initial clustering center selection and further improving the efficiency and accuracy of the traditional K-Means algorithm. The experimental results are compared with those of current clustering algorithms. The algorithm can provide an effective reference value in the fields of image processing, machine vision and data mining.

  18. Automatic script identification from images using cluster-based templates

    Energy Technology Data Exchange (ETDEWEB)

    Hochberg, J.; Kerns, L.; Kelly, P.; Thomas, T.

    1995-02-01

    We have developed a technique for automatically identifying the script used to generate a document that is stored electronically in bit image form. Our approach differs from previous work in that the distinctions among scripts are discovered by an automatic learning procedure, without any hands-on analysis. We first develop a set of representative symbols (templates) for each script in our database (Cyrillic, Roman, etc.). We do this by identifying all textual symbols in a set of training documents, scaling each symbol to a fixed size, clustering similar symbols, pruning minor clusters, and finding each cluster's centroid. To identify a new document's script, we identify and scale a subset of symbols from the document and compare them to the templates for each script. We choose the script whose templates provide the best match. Our current system distinguishes among the Armenian, Burmese, Chinese, Cyrillic, Ethiopic, Greek, Hebrew, Japanese, Korean, Roman, and Thai scripts with over 90% accuracy.
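
    The cluster-based template idea can be sketched as below: symbols of each script are clustered, the cluster centroids serve as templates, and a new document is assigned to the script whose templates match its symbols best on average; random arrays stand in for scaled glyph bitmaps, and the scripts and cluster counts are assumptions.

```python
# Sketch of cluster-based script templates: symbols of each script are clustered,
# centroids serve as templates, and a new document is assigned to the script whose
# templates best match its symbols. Random arrays stand in for scaled glyph bitmaps.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
scripts = ["Cyrillic", "Roman"]
train = {s: rng.normal(loc=i, size=(500, 16 * 16)) for i, s in enumerate(scripts)}

# build templates: centroids of symbol clusters, per script
templates = {
    s: KMeans(n_clusters=20, n_init=5, random_state=0).fit(symbols).cluster_centers_
    for s, symbols in train.items()
}

def identify(document_symbols):
    scores = {}
    for s, tpl in templates.items():
        # distance from each document symbol to its closest template of this script
        d = np.linalg.norm(document_symbols[:, None, :] - tpl[None, :, :], axis=2).min(axis=1)
        scores[s] = d.mean()
    return min(scores, key=scores.get)            # best (lowest) mean matching distance

new_doc = rng.normal(loc=1, size=(80, 16 * 16))   # symbols from an unknown document
print("identified script:", identify(new_doc))
```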

  19. Personalized Profile Based Search Interface With Ranked and Clustered Display

    National Research Council Canada - National Science Library

    Kumar, Sachin; Oztekin, B. U; Ertoz, Levent; Singhal, Saurabh; Han, Euihong; Kumar, Vipin

    2001-01-01

    We have developed an experimental meta-search engine, which takes the snippets from traditional search engines and presents them to the user either in the form of clusters, indices or re-ranked list...

  20. Constraints on helium enhancement in the globular cluster M4 (NGC 6121): The horizontal branch test

    International Nuclear Information System (INIS)

    Valcarce, A. A. R.; De Medeiros, J. R.; Catelan, M.; Alonso-García, J.; Cortés, C.

    2014-01-01

    Recent pieces of evidence have revealed that most, and possibly all, globular star clusters are composed of groups of stars that formed in multiple episodes with different chemical compositions. In this sense, it has also been argued that variations in the initial helium abundance (Y) from one population to the next are also the rule, rather than the exception. In the case of the metal-intermediate globular cluster M4 (NGC 6121), recent high-resolution spectroscopic observations of blue horizontal branch (HB) stars (i.e., HB stars hotter than the RR Lyrae instability strip) suggest that a large fraction of blue HB stars are second-generation stars formed with high helium abundances. In this paper, we test this scenario by using recent photometric and spectroscopic data together with theoretical evolutionary computations for different Y values. Comparing the photometric data with the theoretically derived color-magnitude diagrams, we find that the bulk of the blue HB stars in M4 have ΔY ≲ 0.01 with respect to the cluster's red HB stars (i.e., HB stars cooler than the RR Lyrae strip)—a result which is corroborated by comparison with spectroscopically derived gravities and temperatures, which also favor little He enhancement. However, the possible existence of a minority population on the blue HB of the cluster with a significant He enhancement level is also discussed.

  1. Constraints on helium enhancement in the globular cluster M4 (NGC 6121): The horizontal branch test

    Energy Technology Data Exchange (ETDEWEB)

    Valcarce, A. A. R.; De Medeiros, J. R. [Universidade Federal do Rio Grande do Norte, Departamento de Física, 59072-970 Natal, RN (Brazil); Catelan, M. [Pontificia Universidad Católica de Chile, Centro de Astroingeniería, Av. Vicuña Mackena 4860, 782-0436 Macul, Santiago (Chile); Alonso-García, J. [Pontificia Universidad Católica de Chile, Instituto de Astrofísica, Facultad de Física, Av. Vicuña Mackena 4860, 782-0436 Macul, Santiago (Chile); Cortés, C. [Universidad Metropolitana de Ciencias de la Educación, Facultad de Ciencias Básicas, Departamento de Física, Av. José Pedro Alessandri 774, Santiago (Chile)

    2014-02-20

    Recent pieces of evidence have revealed that most, and possibly all, globular star clusters are composed of groups of stars that formed in multiple episodes with different chemical compositions. In this sense, it has also been argued that variations in the initial helium abundance (Y) from one population to the next are also the rule, rather than the exception. In the case of the metal-intermediate globular cluster M4 (NGC 6121), recent high-resolution spectroscopic observations of blue horizontal branch (HB) stars (i.e., HB stars hotter than the RR Lyrae instability strip) suggest that a large fraction of blue HB stars are second-generation stars formed with high helium abundances. In this paper, we test this scenario by using recent photometric and spectroscopic data together with theoretical evolutionary computations for different Y values. Comparing the photometric data with the theoretically derived color-magnitude diagrams, we find that the bulk of the blue HB stars in M4 have ΔY ≲ 0.01 with respect to the cluster's red HB stars (i.e., HB stars cooler than the RR Lyrae strip)—a result which is corroborated by comparison with spectroscopically derived gravities and temperatures, which also favor little He enhancement. However, the possible existence of a minority population on the blue HB of the cluster with a significant He enhancement level is also discussed.

  2. Improved Density Based Spatial Clustering of Applications of Noise Clustering Algorithm for Knowledge Discovery in Spatial Data

    Directory of Open Access Journals (Sweden)

    Arvind Sharma

    2016-01-01

    Full Text Available There are many techniques available in the field of data mining, and its subfield spatial data mining, to understand relationships between data objects. Data objects with spatial features are stored in spatial databases. These relationships can be used for prediction and trend detection between spatial and nonspatial objects for social and scientific purposes. A huge data set may be collected from different sources such as satellite images, X-rays, medical images, traffic cameras, and GIS systems. The primary purpose of this paper is to handle this large amount of data and to establish relationships among the objects in a way that yields useful results. The paper describes a complete process for understanding how spatial data differ from other kinds of data sets and how they are refined to obtain useful results and to set trends for geographic information systems and the spatial data mining process. In this paper a new improved clustering algorithm is designed, because the role of clustering is indispensable in the spatial data mining process. Clustering methods are useful in various fields of human life such as GIS (Geographic Information System), GPS (Global Positioning System), weather forecasting, air traffic control, water treatment, area selection, cost estimation, planning of rural and urban areas, remote sensing, and VLSI design. This paper presents a study of various clustering methods and algorithms and an improved DBSCAN algorithm, IDBSCAN (Improved Density Based Spatial Clustering of Applications with Noise). The algorithm is designed by adding some important attributes which are responsible for the generation of better clusters from existing data sets in comparison with other methods.
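
    For reference, the baseline algorithm that IDBSCAN builds on can be run on geographic points with scikit-learn as sketched below (coordinates in radians with a haversine metric); the eps radius, minimum samples and synthetic points are assumptions, and the paper's additional attributes are not reproduced.

```python
# Baseline DBSCAN (the algorithm the record improves on) applied to geographic
# points, using scikit-learn's haversine metric on [lat, lon] in radians.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# two dense spots plus scattered background noise (latitude, longitude in degrees)
spots = np.vstack([
    rng.normal([48.85, 2.35], 0.01, size=(100, 2)),     # around Paris
    rng.normal([51.51, -0.13], 0.01, size=(100, 2)),    # around London
    rng.uniform([45, -5], [55, 10], size=(30, 2)),      # noise points
])

earth_radius_km = 6371.0
eps_km = 5.0
db = DBSCAN(eps=eps_km / earth_radius_km, min_samples=10,
            metric="haversine", algorithm="ball_tree")
labels = db.fit_predict(np.radians(spots))

print("clusters found:", len(set(labels)) - (1 if -1 in labels else 0))
print("noise points  :", int(np.sum(labels == -1)))
```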

  3. A user credit assessment model based on clustering ensemble for broadband network new media service supervision

    Science.gov (United States)

    Liu, Fang; Cao, San-xing; Lu, Rui

    2012-04-01

    This paper proposes a user credit assessment model based on a clustering ensemble, aiming to solve the problem that users illegally spread pirated and pornographic media content within user self-service oriented broadband network new media platforms. The idea is to assess new media user credit by establishing an indices system based on user credit behaviors, so that illegal users can be found according to the credit assessment results, thus curbing the bad videos and audios transmitted on the network. The proposed clustering ensemble model integrates the advantages of swarm intelligence clustering, which is suitable for user credit behavior analysis, and of K-means clustering, which can eliminate the scattered users present in the result of swarm intelligence clustering, thus realizing the credit classification of all users automatically. Verification experiments are carried out on a standard credit application dataset from the UCI machine learning repository, and the statistical results of a comparative experiment with a single swarm intelligence clustering model indicate that this clustering ensemble model has a stronger ability to distinguish creditworthiness, especially in predicting the user clusters with the best and the worst credit, which will help operators take incentive or punitive measures accurately. Besides, compared with the experimental results of a Logistic regression based model under the same conditions, this clustering ensemble model is robust and has better prediction accuracy.

  4. A time-series approach for clustering farms based on slaughterhouse health aberration data

    NARCIS (Netherlands)

    Hulsegge, Ina; Greef, de K.H.; Hulsegge, Ina

    2018-01-01

    A large amount of data is collected routinely in meat inspection in pig slaughterhouses. A time-series clustering approach is presented and applied that groups farms based on similar statistical characteristics of meat inspection data over time. A three-step characteristic-based clustering approach

  5. Parallel File System I/O Performance Testing On LANL Clusters

    Energy Technology Data Exchange (ETDEWEB)

    Wiens, Isaac Christian [Los Alamos National Lab. (LANL), Los Alamos, NM (United States). High Performance Computing Division. Programming and Runtime Environments; Green, Jennifer Kathleen [Los Alamos National Lab. (LANL), Los Alamos, NM (United States). High Performance Computing Division. Programming and Runtime Environments

    2016-08-18

    These are slides from a presentation on parallel file system I/O performance testing on LANL clusters. I/O is a known bottleneck for HPC applications. Performance optimization of I/O is often required. This summer project entailed integrating IOR under Pavilion and automating the results analysis. The slides cover the following topics: scope of the work, tools utilized, IOR-Pavilion test workflow, build script, IOR parameters, how parameters are passed to IOR, *run_ior: functionality, Python IOR-Output Parser, Splunk data format, Splunk dashboard and features, and future work.

  6. Construction and application of Red5 cluster based on OpenStack

    Science.gov (United States)

    Wang, Jiaqing; Song, Jianxin

    2017-08-01

    With the application and development of cloud computing technology in various fields, the resource utilization rate of data centers has improved markedly, and systems based on cloud computing platforms have also improved in scalability and stability. In the traditional setup, Red5 cluster resource utilization is low and system stability is poor. This paper uses the efficient resource allocation capability of cloud computing to build a Red5 server cluster based on OpenStack. Multimedia applications can be published to the Red5 cloud server cluster. The system achieves flexible provisioning of computing resources and also greatly improves the stability of the cluster and the efficiency of the service.

  7. Residual energy level based clustering routing protocol for wireless sensor networks

    Science.gov (United States)

    Yuan, Xu; Zhong, Fangming; Chen, Zhikui; Yang, Deli

    2015-12-01

    In wireless sensor networks, nodes are prone to premature death, energy consumption is unbalanced and the lifetime is short, which has hindered the promotion and application of this technology in the internet of things in agriculture. This paper proposes a clustering routing protocol based on the residual energy level (RELCP). RELCP includes three stages: selection of the cluster head, establishment of the cluster, and data transmission. RELCP considers the remaining energy level and the distance to the base station during the election of cluster head nodes and data transmission. Simulation results demonstrate that the protocol can efficiently balance the energy dissipation of all nodes and prolong the network lifetime.

  8. Study of cluster headache: A hospital-based study

    Directory of Open Access Journals (Sweden)

    Amita Bhargava

    2014-01-01

    Full Text Available Introduction: Cluster headache (CH) is uncommon and the most painful of all primary headaches, and it continues to be managed suboptimally because of wrong diagnosis. It needs to be diagnosed correctly and treated specifically. There are few studies on CH and none from this region. Materials and Methods: The aim was to study the detailed clinical profile of CH patients and to compare it between the genders. The study was conducted at Mahatma Gandhi hospital, Jodhpur (from January 2011 to December 2013) and comprises 30 CH patients diagnosed according to International Headache Society guidelines (ICHD-II). Routine investigations and brain MRI were done in all patients. All measurements were reported as mean ± SD. Categorical variables were compared using the Chi-square test, and continuous variables were compared using Student's t-test. SPSS for Windows, Version 16.0, was used for statistical analyses with the significance level set at P = 0.05. Results: The M:F ratio was 9:1. Age at presentation was 22-60 years (mean 38 years). Latency before diagnosis was 3 months to 12 years (mean 3.5 years). All suffered from episodic CH and aura was found in none. Pain was strictly unilateral (right 19, left 11), predominantly over the temporal region in 18 (60%). Pain intensity was severe in 27 (90%) and moderate in 3 (10%). Pain quality was throbbing in 12 (40%). Peak intensity was reached in 5-30 minutes and attack duration varied from 30 minutes to 3 hours (mean 2.45 hours). Among autonomic features, conjunctival injection (23, 76.6%) and lacrimation (25, 83.3%) were most common. Restlessness during episodes was found in 80%. CH duration varied from 10 days to 12 weeks. Circadian periodicity of attacks was noted in 24 (80%). Conclusion: The results are consistent with other studies on many accounts, but differ from Western studies with respect to the low frequency of family history, chronic CH, restlessness and aura preceding the attack. Detailed elicitation of history is

  9. DCE: A Distributed Energy-Efficient Clustering Protocol for Wireless Sensor Network Based on Double-Phase Cluster-Head Election.

    Science.gov (United States)

    Han, Ruisong; Yang, Wei; Wang, Yipeng; You, Kaiming

    2017-05-01

    Clustering is an effective technique used to reduce energy consumption and extend the lifetime of wireless sensor network (WSN). The characteristic of energy heterogeneity of WSNs should be considered when designing clustering protocols. We propose and evaluate a novel distributed energy-efficient clustering protocol called DCE for heterogeneous wireless sensor networks, based on a Double-phase Cluster-head Election scheme. In DCE, the procedure of cluster head election is divided into two phases. In the first phase, tentative cluster heads are elected with the probabilities which are decided by the relative levels of initial and residual energy. Then, in the second phase, the tentative cluster heads are replaced by their cluster members to form the final set of cluster heads if any member in their cluster has more residual energy. Employing two phases for cluster-head election ensures that the nodes with more energy have a higher chance to be cluster heads. Energy consumption is well-distributed in the proposed protocol, and the simulation results show that DCE achieves longer stability periods than other typical clustering protocols in heterogeneous scenarios.

  10. Clustering of commercial fish sauce products based on an e-panel technique

    Directory of Open Access Journals (Sweden)

    Mitsutoshi Nakano

    2018-02-01

    Full Text Available Fish sauce is a brownish liquid seasoning with a characteristic flavor that is produced in Asian countries and limited areas of Europe. The types of fish and shellfish and the fermentation process used in its production depend on the region from which it derives. Variations in ingredients and fermentation procedures yield end products with different smells, tastes, and colors. For this data article, we employed an electronic panel (e-panel) technique including an electronic nose (e-nose), electronic tongue (e-tongue), and electronic eye (e-eye), in which smell, taste, and color are evaluated by sensors instead of the human nose, tongue, and eye to avoid subjective error. The presented data comprise clustering of 46 commercially available fish sauce products based on separate e-nose, e-tongue, and e-eye test results. Sensory intensity data from the e-nose, e-tongue, and e-eye were separately classified by cluster analysis and are shown in dendrograms. The hierarchical cluster analysis indicates three major groups in the e-nose and e-tongue data, and four major groups in the e-eye data.
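
    The hierarchical clustering step behind such dendrograms can be sketched as below, with random intensities standing in for the e-nose, e-tongue and e-eye measurements; the linkage method and the three-group cut are assumptions.

```python
# Sketch of the hierarchical clustering step: sensory intensity vectors (one row
# per fish sauce product) are clustered and cut into major groups. Random
# intensities stand in for real e-nose/e-tongue/e-eye data.
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

rng = np.random.default_rng(0)
intensities = rng.normal(size=(46, 12))           # 46 products x 12 sensor channels

Z = linkage(intensities, method="ward")           # agglomerative clustering tree
groups = fcluster(Z, t=3, criterion="maxclust")   # cut into three major groups
print("group sizes:", np.bincount(groups)[1:])

# dendrogram(Z)  # with matplotlib available, this draws a tree like those in the article
```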

  11. Clustering and information in correlation based financial networks

    Science.gov (United States)

    Onnela, J.-P.; Kaski, K.; Kertész, J.

    2004-03-01

    Networks of companies can be constructed by using return correlations. A crucial issue in this approach is to select the relevant correlations from the correlation matrix. In order to study this problem, we start from an empty graph with no edges where the vertices correspond to stocks. Then, one by one, we insert edges between the vertices according to the rank of their correlation strength, resulting in a network called asset graph. We study its properties, such as topologically different growth types, number and size of clusters and clustering coefficient. These properties, calculated from empirical data, are compared against those of a random graph. The growth of the graph can be classified according to the topological role of the newly inserted edge. We find that the type of growth which is responsible for creating cycles in the graph sets in much earlier for the empirical asset graph than for the random graph, and thus reflects the high degree of networking present in the market. We also find the number of clusters in the random graph to be one order of magnitude higher than for the asset graph. At a critical threshold, the random graph undergoes a radical change in topology related to percolation transition and forms a single giant cluster, a phenomenon which is not observed for the asset graph. Differences in mean clustering coefficient lead us to conclude that most information is contained roughly within 10% of the edges.
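
    The asset-graph construction can be sketched as below: edges are inserted between vertices in decreasing order of correlation strength while the number of connected clusters and the clustering coefficient are tracked; a random symmetric matrix stands in for the empirical return-correlation matrix.

```python
# Sketch of asset-graph construction: edges are inserted in decreasing order of
# correlation strength, tracking cluster count and clustering coefficient as the
# graph grows. A random symmetric matrix stands in for return correlations.
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
n = 50
C = rng.uniform(-1, 1, size=(n, n))
C = (C + C.T) / 2                                  # symmetric "correlation" matrix

pairs = [(i, j, C[i, j]) for i in range(n) for j in range(i + 1, n)]
pairs.sort(key=lambda t: t[2], reverse=True)       # strongest correlations first

G = nx.Graph()
G.add_nodes_from(range(n))
for step, (i, j, w) in enumerate(pairs[:200], start=1):   # insert the top-ranked edges
    G.add_edge(i, j, weight=w)
    if step % 50 == 0:
        print(f"{step:4d} edges | clusters: {nx.number_connected_components(G):3d} "
              f"| clustering coeff: {nx.average_clustering(G):.3f}")
```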

  12. Optimal colour quality of LED clusters based on memory colours.

    Science.gov (United States)

    Smet, Kevin; Ryckaert, Wouter R; Pointer, Michael R; Deconinck, Geert; Hanselaer, Peter

    2011-03-28

    The spectral power distributions of tri- and tetrachromatic clusters of Light-Emitting-Diodes, composed of simulated and commercially available LEDs, were optimized with a genetic algorithm to maximize the luminous efficacy of radiation and the colour quality as assessed by the memory colour quality metric developed by the authors. The trade-off of the colour quality as assessed by the memory colour metric and the luminous efficacy of radiation was investigated by calculating the Pareto optimal front using the NSGA-II genetic algorithm. Optimal peak wavelengths and spectral widths of the LEDs were derived, and over half of them were found to be close to Thornton's prime colours. The Pareto optimal fronts of real LED clusters were always found to be smaller than those of the simulated clusters. The effect of binning on designing a real LED cluster was investigated and was found to be quite large. Finally, a real LED cluster of commercially available AlGaInP, InGaN and phosphor white LEDs was optimized to obtain a higher score on memory colour quality scale than its corresponding CIE reference illuminant.

  13. Research on retailer data clustering algorithm based on Spark

    Science.gov (United States)

    Huang, Qiuman; Zhou, Feng

    2017-03-01

    Big data analysis is a hot topic in the IT field. Spark is a high-reliability and high-performance distributed parallel computing framework for big data sets, and the k-means algorithm is one of the classical partition methods in clustering. In this paper, we study the k-means clustering algorithm on Spark. First, the principle of the algorithm is analyzed, and then clustering analysis is carried out on supermarket customers through an experiment to find different shopping patterns. At the same time, this paper proposes a parallelization of the k-means algorithm on the Spark distributed computing framework and gives the concrete design and implementation schemes. The paper uses two years of sales data from a supermarket to validate the proposed clustering algorithm and to subdivide customers, and then analyzes the clustering results to help enterprises adopt different marketing strategies for different customer groups to improve sales performance.
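
    A minimal PySpark sketch of k-means on customer features is given below; the input file, column names and number of clusters are illustrative assumptions rather than the paper's setup.

```python
# Minimal PySpark sketch of k-means on customer features; the file path, column
# names and k value are hypothetical, not the paper's setup.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("retailer-kmeans").getOrCreate()

df = spark.read.csv("sales_features.csv", header=True, inferSchema=True)  # hypothetical file
assembler = VectorAssembler(
    inputCols=["purchase_frequency", "avg_basket_value", "recency_days"],  # assumed columns
    outputCol="features",
)
data = assembler.transform(df)

kmeans = KMeans(k=5, seed=42, featuresCol="features", predictionCol="segment")
model = kmeans.fit(data)
segments = model.transform(data)

segments.groupBy("segment").count().show()     # size of each customer segment
spark.stop()
```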

  14. Medical Inpatient Journey Modeling and Clustering: A Bayesian Hidden Markov Model Based Approach.

    Science.gov (United States)

    Huang, Zhengxing; Dong, Wei; Wang, Fei; Duan, Huilong

    2015-01-01

    Modeling and clustering medical inpatient journeys is useful to healthcare organizations for a number of reasons including inpatient journey reorganization in a more convenient way for understanding and browsing, etc. In this study, we present a probabilistic model-based approach to model and cluster medical inpatient journeys. Specifically, we exploit a Bayesian Hidden Markov Model based approach to transform medical inpatient journeys into a probabilistic space, which can be seen as a richer representation of inpatient journeys to be clustered. Then, using hierarchical clustering on the matrix of similarities, inpatient journeys can be clustered into different categories w.r.t their clinical and temporal characteristics. We evaluated the proposed approach on a real clinical data set pertaining to the unstable angina treatment process. The experimental results reveal that our method can identify and model latent treatment topics underlying in personalized inpatient journeys, and yield impressive clustering quality.

  15. An improved initialization center k-means clustering algorithm based on distance and density

    Science.gov (United States)

    Duan, Yanling; Liu, Qun; Xia, Shuyin

    2018-04-01

    To address the problem that the random initial cluster centers of the k-means algorithm make the clustering results sensitive to outlier samples and unstable across repeated runs, a center initialization method based on larger distance and higher density is proposed. The reciprocal of the weighted average distance is used to represent the sample density, and the data samples with larger distance and higher density are selected as the initial cluster centers to optimize the clustering results. Then, a clustering evaluation method based on distance and density is designed to verify the feasibility and practicality of the algorithm. The experimental results on UCI data sets show that the algorithm has a certain stability and practicality.
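
    One simple reading of the proposed initialization (density as the reciprocal of a sample's average distance to the others, and new centers chosen to maximize density times distance to the already chosen centers) is sketched below and handed to scikit-learn's k-means as explicit initial centers; this is an interpretation, not necessarily the authors' exact formulation.

```python
# One interpretation of distance-and-density-based centre initialisation: the
# densest point is the first centre, and each further centre maximises
# (density x distance to the nearest chosen centre). Not the authors' exact scheme.
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)
D = cdist(X, X)
density = 1.0 / (D.mean(axis=1) + 1e-12)

k = 4
centers = [int(np.argmax(density))]
for _ in range(k - 1):
    dist_to_centers = D[:, centers].min(axis=1)
    centers.append(int(np.argmax(density * dist_to_centers)))

km = KMeans(n_clusters=k, init=X[centers], n_init=1, random_state=0).fit(X)
print("inertia with density-distance init:", round(km.inertia_, 2))
```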

  16. Efficient Clustering for Irregular Geometries Based on Identification of Concavities

    Directory of Open Access Journals (Sweden)

    Velázquez-Villegas Fernando

    2014-04-01

    Full Text Available The two-dimensional clustering problem is highly relevant in applications related to the efficient use of raw material, such as cutting stock, packing, etc. It is a very complex problem in which multiple bodies must be accommodated efficiently so that they occupy as little space as possible. The complexity of the problem increases with the complexity of the bodies, and the number of possible arrangements between bodies is clearly huge. The No Fit Polygon (NFP) makes it possible to determine all the relative positions in which two patterns (regular or irregular) are in contact without overlapping, so that the best position can be selected. However, NFP generation requires a lot of calculations; besides, selecting the best cluster is not a simple task because, between two irregular patterns in contact, hollows (unusable areas) and external concavities (usable areas) can be produced. This work presents a quick and simple method to reduce the calculations associated with NFP generation and to minimize unusable areas in a cluster. The method consists of generating partial NFPs, just on the concave regions of the patterns, and selecting the best cluster using a total weighted efficiency, i.e. a weighted value of the enclosure efficiency (ratio of occupied area to convex hull area) and the hollow efficiency (ratio of occupied area to cluster area). The proposed method produces results similar to those obtained by other methods; however, the shape of the clusters obtained allows more parts to be accommodated in similar spaces, which is a desirable result when it comes to optimizing the use of material. We present two examples to show the performance of the proposal.

  17. A Fast Density-Based Clustering Algorithm for Real-Time Internet of Things Stream

    OpenAIRE

    Amini, Amineh; Saboohi, Hadi; Ying Wah, Teh; Herawan, Tutut

    2014-01-01

    Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based methods are a prominent class in clustering data streams. They can detect arbitrarily shaped clusters, handle outliers, and do not need the number of clusters in advance. Therefore, density-bas...

  18. Trend analysis using non-stationary time series clustering based on the finite element method

    OpenAIRE

    Gorji Sefidmazgi, M.; Sayemuzzaman, M.; Homaifar, A.; Jha, M. K.; Liess, S.

    2014-01-01

    In order to analyze low-frequency variability of climate, it is useful to model the climatic time series with multiple linear trends and locate the times of significant changes. In this paper, we have used non-stationary time series clustering to find change points in the trends. Clustering in a multi-dimensional non-stationary time series is challenging, since the problem is mathematically ill-posed. Clustering based on the finite element method (FEM) is one of the methods ...

  19. Kernel method for clustering based on optimal target vector

    International Nuclear Information System (INIS)

    Angelini, Leonardo; Marinazzo, Daniele; Pellicoro, Mario; Stramaglia, Sebastiano

    2006-01-01

    We introduce Ising models, suitable for dichotomic clustering, with couplings that are (i) both ferro- and anti-ferromagnetic and (ii) dependent on the whole data set and not only on pairs of samples. Couplings are determined by exploiting the notion of an optimal target vector, introduced here, which links kernel supervised and unsupervised learning. The effectiveness of the method is shown on the well-known iris data set and on benchmarks of gene expression levels, where it works better than existing methods for dichotomic clustering.

  20. Dynamical mass of a star cluster in M 83: a test of fibre-fed multi-object spectroscopy

    NARCIS (Netherlands)

    Moll, S.L.; Grijs, R.; Anders, P.; Crowther, P.A.; Larsen, S.S.; Smith, L.J.; Portegies Zwart, S.F.

    2008-01-01

    Aims. We obtained VLT/FLAMES+UVES high-resolution, fibre-fed spectroscopy of five young massive clusters (YMCs) in M 83 (NGC 5236). This forms the basis of a pilot study testing the feasibility of using fibre-fed spectroscopy to measure the velocity dispersions of several clusters simultaneously, in

  1. Dynamical mass of a star cluster in M 83: A test of fibre-fed multi-object spectroscopy

    NARCIS (Netherlands)

    Moll, S.L.; de Grijs, R.; Anders, P.; Crowther, P.A.; Larsen, S.S.; Smith, L.J.; Portegies Zwart, S.F.

    2008-01-01

    Aims. We obtained VLT/FLAMES+UVES high-resolution, fibre-fed spectroscopy of five young massive clusters (YMCs) in M 83 (NGC 5236). This forms the basis of a pilot study testing the feasibility of using fibre-fed spectroscopy to measure the velocity dispersions of several clusters simultaneously, in

  2. Maximizing genetic differentiation in core collections by PCA-based clustering of molecular marker data.

    Science.gov (United States)

    van Heerwaarden, Joost; Odong, T L; van Eeuwijk, F A

    2013-03-01

    Developing genetically diverse core sets is key to the effective management and use of crop genetic resources. Core selection increasingly uses molecular marker-based dissimilarity and clustering methods, under the implicit assumption that markers and genes of interest are genetically correlated. In practice, low marker densities mean that genome-wide correlations are mainly caused by genetic differentiation, rather than by physical linkage. Although of central concern, genetic differentiation per se is not specifically targeted by most commonly employed dissimilarity and clustering methods. Principal component analysis (PCA) on genotypic data is known to effectively describe the inter-locus correlations caused by differentiation, but to date there has been no evaluation of its application to core selection. Here, we explore PCA-based clustering of marker data as a basis for core selection, with the aim of demonstrating its use in capturing genetic differentiation in the data. Using simulated datasets, we show that replacing full-rank genotypic data by the subset of genetically significant PCs leads to better description of differentiation and improves assignment of genotypes to their population of origin. We test the effectiveness of differentiation as a criterion for the formation of core sets by applying a simple new PCA-based core selection method to simulated and actual data and comparing its performance to one of the best existing selection algorithms. We find that although gains in genetic diversity are generally modest, PCA-based core selection is equally effective at maximizing diversity at non-marker loci, while providing better representation of genetically differentiated groups.
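
    A rough sketch of the underlying idea, clustering on the leading principal components of a marker matrix and drawing a core proportionally from the clusters, is given below; the number of retained PCs and the allocation rule are illustrative assumptions, not the authors' procedure.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def pca_core_selection(G, n_pcs=5, n_groups=4, core_size=20, seed=0):
    """G: genotypes coded 0/1/2, rows = accessions, columns = markers."""
    rng = np.random.default_rng(seed)
    scores = PCA(n_components=n_pcs).fit_transform(G - G.mean(axis=0))
    labels = KMeans(n_clusters=n_groups, n_init=10, random_state=seed).fit_predict(scores)
    core = []
    for g in range(n_groups):                      # proportional allocation per group
        members = np.flatnonzero(labels == g)
        take = max(1, round(core_size * len(members) / len(G)))
        core.extend(rng.choice(members, size=min(take, len(members)), replace=False))
    return sorted(core)

G = np.random.default_rng(1).integers(0, 3, size=(200, 100))  # toy marker data
print(pca_core_selection(G))
```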

  3. Cluster based architecture and network maintenance protocol for medical priority aware cognitive radio based hospital.

    Science.gov (United States)

    Al Mamoon, Ishtiak; Muzahidul Islam, A K M; Baharun, Sabariah; Ahmed, Ashir; Komaki, Shozo

    2016-08-01

    Due to the rapid growth of wireless medical devices in the near future, wireless healthcare services may face some inescapable issues such as medical spectrum scarcity, electromagnetic interference (EMI), bandwidth constraints, security and, finally, the medical data communication model. To mitigate these issues, cognitive radio (CR) or opportunistic radio network enabled wireless technology is suitable for the upcoming wireless healthcare system. Recent research on CR-based healthcare has reported some progress on the EMI and spectrum problems; however, investigations of system design and network models for CR-enabled hospitals are rare. Thus, this research designs a hierarchy-based hybrid network architecture and network maintenance protocols for a previously proposed CR hospital system, known as CogMed, whose detailed architecture and maintenance protocols were not presented in the previous study. The proposed architecture includes clustering concepts for cognitive base stations and non-medical devices. Two cluster head (CH) selector equations are formulated based on the priority of location, device, mobility rate of devices and number of accessible channels. In order to maintain the integrity of the proposed network model, node joining and node leaving protocols are also proposed. Finally, the simulation results show that the proposed network maintenance time is very low for emergency medical devices (average maintenance period 9.5 ms) and that the re-clustering effects for non-medical devices with different mobility are balanced.

  4. INTERNATIONAL BEHAVIOUR AND PERFORMANCE BASED ROMANIAN ENTREPRENEURIAL AND TRADITIONAL FIRM CLUSTERS

    Directory of Open Access Journals (Sweden)

    FEDER Emoke - Szidonia

    2015-07-01

    Full Text Available The micro, small and medium-sized firms (SMEs) present a key interest at European level due to their potential positive influence on regional, national and firm level competitiveness. At a certain moment in time, internationalisation became an expected and even unavoidable strategy in firms’ future development, growth and evolution. From a theoretical perspective, an integrative, complementary approach is adopted concerning the dominant paradigm of stage models from incremental internationalisation theory and the emergent paradigm of international entrepreneurship theory. Several researchers call for empirical testing of the different theoretical frameworks on international firms. Therefore, the first aim of this quantitative study is to empirically demonstrate the existence of various clusters based on internationalisation behaviour configurations, such as sporadic and traditional international firms, born-again global and born global firms, within the framework of Romanian SMEs. Secondly, within the research framework, the study proposes to assess the distinguishing internationalisation behavioural characteristics and patterns of the delimited clusters, in terms of foreign market scope, internationalisation pace and rhythm, initial and current entry modes, international product portfolio and commitment. Thirdly, the differential influence and contribution of internationalisation cluster membership and patterns is analysed with respect to firm-level international business performance, measured as internationalisation degree and financial and marketing measures. The framework was tested on a transversal sample consisting of 140 Romanian internationalised SMEs. Findings are especially useful for entrepreneurs and SME managers, presenting various decision possibilities and options on internationalisation behaviours and performance. These emphasize the importance of internationalisation scope, pace, object and opportunity seeking, along with positive influence on performance, indifferent

  5. A fast density-based clustering algorithm for real-time Internet of Things stream.

    Science.gov (United States)

    Amini, Amineh; Saboohi, Hadi; Wah, Teh Ying; Herawan, Tutut

    2014-01-01

    Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based methods are a prominent class in clustering data streams: they can detect arbitrarily shaped clusters, handle outliers, and do not need the number of clusters in advance. Therefore, a density-based clustering algorithm is a proper choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams; however, density-based clustering within a limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has a fast processing time, making it applicable in real-time IoT applications. Experimental results show that the proposed approach obtains high quality results with low computation time on real and synthetic datasets.

  6. A Fast Density-Based Clustering Algorithm for Real-Time Internet of Things Stream

    Directory of Open Access Journals (Sweden)

    Amineh Amini

    2014-01-01

    Full Text Available Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based methods are a prominent class in clustering data streams: they can detect arbitrarily shaped clusters, handle outliers, and do not need the number of clusters in advance. Therefore, a density-based clustering algorithm is a proper choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams; however, density-based clustering within a limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has a fast processing time, making it applicable in real-time IoT applications. Experimental results show that the proposed approach obtains high quality results with low computation time on real and synthetic datasets.
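
    The streaming algorithm itself is not spelled out in these abstracts; as a rough stand-in for the density-based idea, the sketch below reruns DBSCAN over a sliding window of recent points, which keeps the per-update cost bounded (the window length and DBSCAN parameters are assumed values).

```python
from collections import deque
import numpy as np
from sklearn.cluster import DBSCAN

class SlidingWindowDBSCAN:
    """Very small stand-in for density-based stream clustering."""
    def __init__(self, window=500, eps=0.5, min_samples=5):
        self.buffer = deque(maxlen=window)   # only the most recent points are kept
        self.eps, self.min_samples = eps, min_samples

    def update(self, point):
        self.buffer.append(point)
        X = np.asarray(self.buffer)
        labels = DBSCAN(eps=self.eps, min_samples=self.min_samples).fit_predict(X)
        return labels[-1]                    # cluster id of the newest point (-1 = outlier)

rng = np.random.default_rng(0)
stream = SlidingWindowDBSCAN(window=200)
for t in range(1000):
    center = np.array([0.0, 0.0]) if t % 2 == 0 else np.array([5.0, 5.0])
    label = stream.update(center + rng.normal(0, 0.3, 2))
print("label of last reading:", label)
```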

  7. Cluster-based service discovery for heterogeneous wireless sensor networks

    NARCIS (Netherlands)

    Marin Perianu, Raluca; Scholten, Johan; Havinga, Paul J.M.; Hartel, Pieter H.

    2007-01-01

    We propose an energy-efficient service discovery protocol for heterogeneous wireless sensor networks. Our solution exploits a cluster overlay, where the clusterhead nodes form a distributed service registry. A service lookup results in visiting only the clusterhead nodes. We aim for minimizing the

  8. Accelerated EM-based clustering of large data sets

    NARCIS (Netherlands)

    Verbeek, J.J.; Nunnink, J.R.J.; Vlassis, N.

    2006-01-01

    Motivated by the poor performance (linear complexity) of the EM algorithm in clustering large data sets, and inspired by the successful accelerated versions of related algorithms like k-means, we derive an accelerated variant of the EM algorithm for Gaussian mixtures that: (1) offers speedups that

  9. Fault-tolerant measurement-based quantum computing with continuous-variable cluster states.

    Science.gov (United States)

    Menicucci, Nicolas C

    2014-03-28

    A long-standing open question about Gaussian continuous-variable cluster states is whether they enable fault-tolerant measurement-based quantum computation. The answer is yes. Initial squeezing in the cluster above a threshold value of 20.5 dB ensures that errors from finite squeezing acting on encoded qubits are below the fault-tolerance threshold of known qubit-based error-correcting codes. By concatenating with one of these codes and using ancilla-based error correction, fault-tolerant measurement-based quantum computation of theoretically indefinite length is possible with finitely squeezed cluster states.

  10. Automatic content extraction of filled-form images based on clustering component block projection vectors

    Science.gov (United States)

    Peng, Hanchuan; He, Xiaofeng; Long, Fuhui

    2003-12-01

    Automatic understanding of document images is a hard problem. Here we consider a sub-problem, automatically extracting content from filled form images. Without pre-selected templates or sophisticated structural/semantic analysis, we propose a novel approach based on clustering the component-block-projection-vectors. By combining spectral clustering and minimal spanning tree clustering, we generate highly accurate clusters, from which the adaptive templates are constructed to extract the filled-in content. Our experiments show this approach is effective for a set of 1040 US IRS tax form images belonging to 208 types.

  11. A new validity measure for a correlation-based fuzzy c-means clustering algorithm.

    Science.gov (United States)

    Zhang, Mingrui; Zhang, Wei; Sicotte, Hugues; Yang, Ping

    2009-01-01

    One of the major challenges in unsupervised clustering is the lack of consistent means for assessing the quality of clusters. In this paper, we evaluate several validity measures in fuzzy clustering and develop a new measure for a fuzzy c-means algorithm which uses a Pearson correlation in its distance metric. The measure is based on the within-cluster sum of squares and makes use of fuzzy memberships. Compared to the existing fuzzy partition coefficient and a fuzzy validity index, this new measure performs consistently across six microarray datasets. The newly developed measure can be used to assess the validity of fuzzy clusters produced by a correlation-based fuzzy c-means clustering algorithm.

  12. Neural network based cluster creation in the ATLAS silicon Pixel Detector

    CERN Document Server

    Perez Cavalcanti, T; The ATLAS collaboration

    2012-01-01

    The hit signals read out from pixels on planar semiconductor sensors are grouped into clusters to reconstruct the location where a charged particle passed through. The resolution, limited by the individual pixel size, can be improved significantly using the information from the cluster of adjacent pixels. Such analog cluster creation techniques have been used by the ATLAS experiment for many years, giving excellent performance. However, in dense environments, such as those inside high-energy jets, it is likely that the charge deposited by two or more close-by tracks merges into one single cluster. A new pattern recognition algorithm based on neural network methods has been developed for the ATLAS Pixel Detector. It can identify the shared clusters, split them if necessary, and estimate the positions of all particles traversing the cluster. The algorithm significantly reduces ambiguities in the assignment of pixel detector measurements to tracks within jets, and improves the positional accuracy with respect to stand...

  13. Multiple- vs Non- or Single-Imputation based Fuzzy Clustering for Incomplete Longitudinal Behavioral Intervention Data.

    Science.gov (United States)

    Zhang, Zhaoyang; Fang, Hua

    2016-06-01

    Disentangling patients' behavioral variations is a critical step for better understanding an intervention's effects on individual outcomes. Missing data commonly exist in longitudinal behavioral intervention studies. Multiple imputation (MI) has been well studied for missing data analyses in the statistical field; however, it has not yet been scrutinized for clustering or unsupervised learning, which are important techniques for explaining the heterogeneity of treatment effects. Built upon previous work on MI fuzzy clustering, this paper theoretically, empirically and numerically demonstrates how an MI-based approach can reduce the uncertainty of clustering accuracy in comparison to non- and single-imputation based clustering approaches. This paper advances our understanding of the utility and strength of the multiple-imputation (MI) based fuzzy clustering approach for processing incomplete longitudinal behavioral intervention data.

  14. Bootstrap-based methods for estimating standard errors in Cox's regression analyses of clustered event times.

    Science.gov (United States)

    Xiao, Yongling; Abrahamowicz, Michal

    2010-03-30

    We propose two bootstrap-based methods to correct the standard errors (SEs) from Cox's model for within-cluster correlation of right-censored event times. The cluster-bootstrap method resamples, with replacement, only the clusters, whereas the two-step bootstrap method resamples (i) the clusters, and (ii) individuals within each selected cluster, with replacement. In simulations, we evaluate both methods and compare them with the existing robust variance estimator and the shared gamma frailty model, which are available in statistical software packages. We simulate clustered event time data, with latent cluster-level random effects, which are ignored in the conventional Cox's model. For cluster-level covariates, both proposed bootstrap methods yield accurate SEs and type I error rates, and acceptable coverage rates, regardless of the true random-effects distribution, and avoid the serious variance under-estimation of conventional Cox-based standard errors. However, the two-step bootstrap method over-estimates the variance for individual-level covariates. We also apply the proposed bootstrap methods to obtain confidence bands around flexible estimates of time-dependent effects in a real-life analysis of clustered event times.
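
    A minimal sketch of the cluster-bootstrap idea (not the authors' implementation) is shown below, assuming a pandas DataFrame with duration, event and cluster-id columns and using the lifelines Cox model; the two-step variant that also resamples individuals within clusters is omitted.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

def cluster_bootstrap_se(df, duration_col, event_col, cluster_col, n_boot=200, seed=0):
    """Standard errors of Cox coefficients from resampling whole clusters."""
    rng = np.random.default_rng(seed)
    clusters = df[cluster_col].unique()
    coefs = []
    for _ in range(n_boot):
        sampled = rng.choice(clusters, size=len(clusters), replace=True)
        boot = pd.concat([df[df[cluster_col] == c] for c in sampled], ignore_index=True)
        cph = CoxPHFitter().fit(boot.drop(columns=cluster_col),
                                duration_col=duration_col, event_col=event_col)
        coefs.append(cph.params_)
    return pd.DataFrame(coefs).std()

# Toy clustered data: 30 clusters of 10 subjects, one binary covariate.
rng = np.random.default_rng(1)
rows = []
frailty = rng.gamma(2.0, 0.5, 30)
for c in range(30):
    for _ in range(10):
        x = int(rng.integers(0, 2))
        t = rng.exponential(1.0 / (frailty[c] * np.exp(0.5 * x)))
        rows.append({"T": t, "E": 1, "x": x, "cluster": c})
df = pd.DataFrame(rows)
print(cluster_bootstrap_se(df, "T", "E", "cluster", n_boot=50))
```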

  15. How clustering dynamics influence lumber utilization patterns in the Amish-based furniture industry in Ohio

    Science.gov (United States)

    Matthew S. Bumgardner; Gary W. Graham; P. Charles Goebel; Robert L. Romig

    2011-01-01

    Preliminary studies have suggested that the Amish-based furniture and related products manufacturing cluster located in and around Holmes County, Ohio, uses sizeable quantities of hardwood lumber. The number of firms within the cluster has grown even as the broader domestic furniture manufacturing sector has contracted. The present study was undertaken in 2008 (spring/...

  16. A survey on the taxonomy of cluster-based routing protocols for homogeneous wireless sensor networks.

    Science.gov (United States)

    Naeimi, Soroush; Ghafghazi, Hamidreza; Chow, Chee-Onn; Ishii, Hiroshi

    2012-01-01

    The past few years have witnessed increased interest among researchers in cluster-based protocols for homogeneous networks because of their better scalability and higher energy efficiency than other routing protocols. Given the limited capabilities of sensor nodes in terms of energy resources, processing and communication range, the cluster-based protocols should be compatible with these constraints in either the setup state or the steady data transmission state. With focus on these constraints, we classify routing protocols according to their objectives and methods towards addressing the shortcomings of the clustering process at each stage of cluster head selection, cluster formation, data aggregation and data communication. We summarize the techniques and methods used in these categories, while the weaknesses and strengths of each protocol are pointed out in detail. Furthermore, a taxonomy of the protocols in each phase is given to provide a deeper understanding of current clustering approaches. Ultimately, based on the existing research, a summary of the issues and solutions related to the attributes and characteristics of clustering approaches is provided, along with some open research areas in cluster-based routing protocols that can be further pursued.

  17. A Survey on the Taxonomy of Cluster-Based Routing Protocols for Homogeneous Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Hiroshi Ishii

    2012-05-01

    Full Text Available The past few years have witnessed increased interest among researchers in cluster-based protocols for homogeneous networks because of their better scalability and higher energy efficiency than other routing protocols. Given the limited capabilities of sensor nodes in terms of energy resources, processing and communication range, the cluster-based protocols should be compatible with these constraints in either the setup state or the steady data transmission state. With focus on these constraints, we classify routing protocols according to their objectives and methods towards addressing the shortcomings of the clustering process at each stage of cluster head selection, cluster formation, data aggregation and data communication. We summarize the techniques and methods used in these categories, while the weaknesses and strengths of each protocol are pointed out in detail. Furthermore, a taxonomy of the protocols in each phase is given to provide a deeper understanding of current clustering approaches. Ultimately, based on the existing research, a summary of the issues and solutions related to the attributes and characteristics of clustering approaches is provided, along with some open research areas in cluster-based routing protocols that can be further pursued.

  18. Profiling physical activity motivation based on self-determination theory: a cluster analysis approach.

    Science.gov (United States)

    Friederichs, Stijn Ah; Bolman, Catherine; Oenema, Anke; Lechner, Lilian

    2015-01-01

    In order to promote physical activity uptake and maintenance in individuals who do not comply with physical activity guidelines, it is important to increase our understanding of physical activity motivation among this group. The present study aimed to examine motivational profiles in a large sample of adults who do not comply with physical activity guidelines. The sample for this study consisted of 2473 individuals (31.4% male; age 44.6 ± 12.9). In order to generate motivational profiles based on motivational regulation, a cluster analysis was conducted. One-way analyses of variance were then used to compare the clusters in terms of demographics, physical activity level, motivation to be active and subjective experience while being active. Three motivational clusters were derived based on motivational regulation scores: a low motivation cluster, a controlled motivation cluster and an autonomous motivation cluster. These clusters differed significantly from each other with respect to physical activity behavior, motivation to be active and subjective experience while being active. Overall, the autonomous motivation cluster displayed more favorable characteristics compared to the other two clusters. The results of this study provide additional support for the importance of autonomous motivation in the context of physical activity behavior. The three derived clusters may be relevant in the context of physical activity interventions as individuals within the different clusters might benefit most from different intervention approaches. In addition, this study shows that cluster analysis is a useful method for differentiating between motivational profiles in large groups of individuals who do not comply with physical activity guidelines.

  19. Intelligent cruise control field operational test. Vol III, Performance of a string or cluster of ACC-equipped cars

    Science.gov (United States)

    1998-07-01

    This report is one element of a cooperative agreement between NHTSA and UMTRI entitled Intelligent Cruise Control (ICC) Field Operational Test (FOT). It addresses the operation of a serial string or dense cluster of passenger cars equipped with a new...

  20. A hybrid method based on fuzzy clustering and local region-based level set for segmentation of inhomogeneous medical images.

    Science.gov (United States)

    Rastgarpour, Maryam; Shanbehzadeh, Jamshid; Soltanian-Zadeh, Hamid

    2014-08-01

    Medical images are affected more by intensity inhomogeneity than by noise and outliers. This has a great impact on the efficiency of region-based image segmentation methods, because they rely on homogeneity of intensities in the regions of interest. Meanwhile, initialization and configuration of the controlling parameters affect the performance of level set segmentation. To address these problems, this paper proposes a new hybrid method that integrates a local region-based level set method with a variation of fuzzy clustering. Specifically, it takes an information fusion approach based on a coarse-to-fine framework that seamlessly fuses local spatial information and gray level information with the information of the local region-based level set method. In addition, the controlling parameters of the level set are computed directly from the fuzzy clustering result. This approach has valuable benefits such as automation, no need for prior knowledge about the region of interest (ROI), robustness to intensity inhomogeneity, automatic adjustment of controlling parameters, insensitivity to initialization, and satisfactory accuracy. The contribution of this paper is thus to provide these advantages together, which had not yet been achieved for inhomogeneous medical images. The proposed method was tested on several medical images from different modalities for performance evaluation. Experimental results confirm its effectiveness in segmenting medical images in comparison with similar methods.

  1. The implementation of two stages clustering (k-means clustering and adaptive neuro fuzzy inference system) for prediction of medicine need based on medical data

    Science.gov (United States)

    Husein, A. M.; Harahap, M.; Aisyah, S.; Purba, W.; Muhazir, A.

    2018-03-01

    Medication planning aims to determine the types and amounts of medicine needed and to avoid running out of medicine, based on patterns of disease. Medication planning still relies on individual ability and leadership experience; it takes a long time and skill, definite disease data are difficult to obtain, good record keeping and reporting are needed, and dependence on the budget means that planning often does not go well, leading to frequent shortages and surpluses of medicines. In this research, we propose an Adaptive Neuro Fuzzy Inference System (ANFIS) method to predict medication needs in 2016 and 2017 based on medical data from 2015 and 2016 from two hospital sources. The analysis framework uses two approaches: the first applies ANFIS directly to a data source, while the second also uses ANFIS, but only after clustering with the K-means algorithm; root mean square error (RMSE) values for training and testing are calculated for both approaches. The testing results show that the proposed method achieves better prediction rates than existing systems according to quantitative and qualitative evaluation; however, applying the K-means algorithm before ANFIS affects the duration of the training process, while providing classification accuracy significantly better than without clustering.
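
    The two-stage idea can be sketched as follows under stated assumptions: a gradient-boosting regressor stands in for ANFIS (which has no standard scikit-learn implementation), the data are synthetic, and RMSE is reported for both the direct approach and the cluster-then-fit approach.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 4))                       # toy "medical data" features
y = np.where(X[:, 0] > 0, 3.0, -2.0) * X[:, 1] + rng.normal(0, 0.3, 600)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Approach 1: one global regressor (stand-in for plain ANFIS).
global_model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
rmse_global = mean_squared_error(y_te, global_model.predict(X_te)) ** 0.5

# Approach 2: k-means first, then one regressor per cluster.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_tr)
models = {c: GradientBoostingRegressor(random_state=0)
             .fit(X_tr[km.labels_ == c], y_tr[km.labels_ == c])
          for c in range(km.n_clusters)}
test_labels = km.predict(X_te)
pred = np.array([models[c].predict(x.reshape(1, -1))[0]
                 for c, x in zip(test_labels, X_te)])
rmse_clustered = mean_squared_error(y_te, pred) ** 0.5
print(f"RMSE global: {rmse_global:.3f}  RMSE with clustering: {rmse_clustered:.3f}")
```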

  2. Electrochemical DNA detection based on the polyhedral boron cluster label

    Czech Academy of Sciences Publication Activity Database

    Jelen, František; Olejniczak, A.B.; Kouřilová, Alena; Lesnikowski, Z.J.; Paleček, Emil

    2009-01-01

    Roč. 81, č. 2 (2009), s. 840-844 ISSN 0003-2700 R&D Projects: GA AV ČR(CZ) IAA100040602; GA AV ČR(CZ) IAA400040804; GA ČR(CZ) GA301/07/0490 Institutional research plan: CEZ:AV0Z50040507; CEZ:AV0Z50040702 Keywords : DNA detection * DNA hybridization * polyhedral boron cluster Subject RIV: BO - Biophysics Impact factor: 5.214, year: 2009

  3. Semantic-based multilingual document clustering via tensor modeling

    OpenAIRE

    Romeo, S.; Tagarelli, A.; Ienco, D.

    2014-01-01

    EMNLP, Conference on Empirical Methods in Natural Language Processing, Doha, QAT, 25/10/2014 - 29/10/2014; International audience; A major challenge in document clustering research arises from the growing amount of text data written in different languages. Previous approaches depend on language-specific solutions (e.g., bilingual dictionaries, sequential machine translation) to evaluate document similarities, and the required transformations may alter the original document semantics. To cop...

  4. Ligand Effects in Aluminum Cluster based Energetic Materials

    Science.gov (United States)

    2017-09-01

    This dissertation examines the electronic structure and thermochemistry of low-valent aluminum clusters that may serve as

  5. A Cluster-based Approach Towards Detecting and Modeling Network Dictionary Attacks

    Directory of Open Access Journals (Sweden)

    A. Tajari Siahmarzkooh

    2016-12-01

    Full Text Available In this paper, we provide an approach to detect network dictionary attacks using a data set collected as flows, from which a clustered graph is derived. These flows provide an aggregated view of the network traffic in which the packets exchanged in the network are considered, so that more internally connected nodes are clustered together. We show that dictionary attacks can be detected through parameters such as the number and the weight of clusters in the time series and their evolution over time. Additionally, a Markov model based on the average weight of clusters is also created. Finally, by means of our suggested model, we demonstrate that artificial clusters of the flows are created for normal and malicious traffic. The results of the proposed approach on the CAIDA 2007 data set suggest a high accuracy for the model and, therefore, it provides a proper method for detecting dictionary attacks.

  6. Multiple co-clustering based on nonparametric mixture models with heterogeneous marginal distributions.

    Science.gov (United States)

    Tokuda, Tomoki; Yoshimoto, Junichiro; Shimizu, Yu; Okada, Go; Takamura, Masahiro; Okamoto, Yasumasa; Yamawaki, Shigeto; Doya, Kenji

    2017-01-01

    We propose a novel method for multiple clustering, which is useful for analysis of high-dimensional data containing heterogeneous types of features. Our method is based on nonparametric Bayesian mixture models in which features are automatically partitioned (into views) for each clustering solution. This feature partition works as feature selection for a particular clustering solution, which screens out irrelevant features. To make our method applicable to high-dimensional data, a co-clustering structure is newly introduced for each view. Further, the outstanding novelty of our method is that we simultaneously model different distribution families, such as Gaussian, Poisson, and multinomial distributions in each cluster block, which widens areas of application to real data. We apply the proposed method to synthetic and real data, and show that our method outperforms other multiple clustering methods both in recovering true cluster structures and in computation time. Finally, we apply our method to a depression dataset with no true cluster structure available, from which useful inferences are drawn about possible clustering structures of the data.

  7. Multiple co-clustering based on nonparametric mixture models with heterogeneous marginal distributions.

    Directory of Open Access Journals (Sweden)

    Tomoki Tokuda

    Full Text Available We propose a novel method for multiple clustering, which is useful for analysis of high-dimensional data containing heterogeneous types of features. Our method is based on nonparametric Bayesian mixture models in which features are automatically partitioned (into views) for each clustering solution. This feature partition works as feature selection for a particular clustering solution, which screens out irrelevant features. To make our method applicable to high-dimensional data, a co-clustering structure is newly introduced for each view. Further, the outstanding novelty of our method is that we simultaneously model different distribution families, such as Gaussian, Poisson, and multinomial distributions in each cluster block, which widens areas of application to real data. We apply the proposed method to synthetic and real data, and show that our method outperforms other multiple clustering methods both in recovering true cluster structures and in computation time. Finally, we apply our method to a depression dataset with no true cluster structure available, from which useful inferences are drawn about possible clustering structures of the data.

  8. Multiple co-clustering based on nonparametric mixture models with heterogeneous marginal distributions

    Science.gov (United States)

    Yoshimoto, Junichiro; Shimizu, Yu; Okada, Go; Takamura, Masahiro; Okamoto, Yasumasa; Yamawaki, Shigeto; Doya, Kenji

    2017-01-01

    We propose a novel method for multiple clustering, which is useful for analysis of high-dimensional data containing heterogeneous types of features. Our method is based on nonparametric Bayesian mixture models in which features are automatically partitioned (into views) for each clustering solution. This feature partition works as feature selection for a particular clustering solution, which screens out irrelevant features. To make our method applicable to high-dimensional data, a co-clustering structure is newly introduced for each view. Further, the outstanding novelty of our method is that we simultaneously model different distribution families, such as Gaussian, Poisson, and multinomial distributions in each cluster block, which widens areas of application to real data. We apply the proposed method to synthetic and real data, and show that our method outperforms other multiple clustering methods both in recovering true cluster structures and in computation time. Finally, we apply our method to a depression dataset with no true cluster structure available, from which useful inferences are drawn about possible clustering structures of the data. PMID:29049392

  9. Accurate single nucleotide variant detection in viral populations by combining probabilistic clustering with a statistical test of strand bias

    Science.gov (United States)

    2013-01-01

    Background Deep sequencing is a powerful tool for assessing viral genetic diversity. Such experiments harness the high coverage afforded by next generation sequencing protocols by treating sequencing reads as a population sample. Distinguishing true single nucleotide variants (SNVs) from sequencing errors remains challenging, however. Current protocols are characterised by high false positive rates, with results requiring time consuming manual checking. Results By statistical modelling, we show that if multiple variant sites are considered at once, SNVs can be called reliably from high coverage viral deep sequencing data at frequencies lower than the error rate of the sequencing technology, and that SNV calling accuracy increases as true sequence diversity within a read length increases. We demonstrate these findings on two control data sets, showing that SNV detection is more reliable on a high diversity human immunodeficiency virus sample as compared to a moderate diversity sample of hepatitis C virus. Finally, we show that in situations where probabilistic clustering retains false positive SNVs (for instance due to insufficient sample diversity or systematic errors), applying a strand bias test based on a beta-binomial model of forward read distribution can improve precision, with negligible cost to true positive recall. Conclusions By combining probabilistic clustering (implemented in the program ShoRAH) with a statistical test of strand bias, SNVs may be called from deeply sequenced viral populations with high accuracy. PMID:23879730
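
    A simplified version of such a strand-bias check is sketched below; the concentration parameter and the two-sided construction are assumptions, and the coupling to ShoRAH's probabilistic clustering is not reproduced.

```python
from scipy.stats import betabinom

def strand_bias_pvalue(k_fwd, n_var, p_fwd, concentration=20.0):
    """Two-sided p-value that k_fwd of n_var variant-supporting reads are forward,
    given the site-wide forward-read proportion p_fwd, under a beta-binomial
    model with the given concentration (a + b)."""
    a = p_fwd * concentration
    b = (1.0 - p_fwd) * concentration
    dist = betabinom(n_var, a, b)
    lower = dist.cdf(k_fwd)          # P(X <= k_fwd)
    upper = dist.sf(k_fwd - 1)       # P(X >= k_fwd)
    return min(1.0, 2.0 * min(lower, upper))

# A site with balanced coverage (50% forward overall) but a variant seen
# almost exclusively on the forward strand looks suspicious:
print(strand_bias_pvalue(k_fwd=19, n_var=20, p_fwd=0.5))   # small p-value
print(strand_bias_pvalue(k_fwd=11, n_var=20, p_fwd=0.5))   # unremarkable
```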

  10. Reconstruction of a digital core containing clay minerals based on a clustering algorithm

    Science.gov (United States)

    He, Yanlong; Pu, Chunsheng; Jing, Cheng; Gu, Xiaoyu; Chen, Qingdong; Liu, Hongzhi; Khan, Nasir; Dong, Qiaoling

    2017-10-01

    It is difficult to obtain a core sample and information for digital core reconstruction of mature sandstone reservoirs around the world, especially for an unconsolidated sandstone reservoir. Meanwhile, reconstruction and division of clay minerals play a vital role in the reconstruction of the digital cores, although the two-dimensional data-based reconstruction methods are specifically applicable as the microstructure reservoir simulation methods for the sandstone reservoir. However, reconstruction of clay minerals is still challenging from a research viewpoint for the better reconstruction of various clay minerals in the digital cores. In the present work, the content of clay minerals was considered on the basis of two-dimensional information about the reservoir. After application of the hybrid method, and compared with the model reconstructed by the process-based method, the digital core containing clay clusters, without labels for the clusters' number, size, and texture, was the output. The statistics and geometry of the reconstruction model were similar to those of the reference model. In addition, the Hoshen-Kopelman algorithm was used to label various connected unclassified clay clusters in the initial model, and then the number and size of the clay clusters were recorded. At the same time, the K-means clustering algorithm was applied to divide the labeled, large connecting clusters into smaller clusters on the basis of differences in the clusters' characteristics. According to the clay minerals' characteristics, such as types, textures, and distributions, the digital core containing clay minerals was reconstructed by means of the clustering algorithm and the clay clusters' structure judgment. The distributions and textures of the clay minerals of the digital core were reasonable. The clustering algorithm improved the digital core reconstruction and provided an alternative method for the simulation of different clay minerals in the digital cores.

  11. Reconstruction of a digital core containing clay minerals based on a clustering algorithm.

    Science.gov (United States)

    He, Yanlong; Pu, Chunsheng; Jing, Cheng; Gu, Xiaoyu; Chen, Qingdong; Liu, Hongzhi; Khan, Nasir; Dong, Qiaoling

    2017-10-01

    It is difficult to obtain a core sample and information for digital core reconstruction of mature sandstone reservoirs around the world, especially for an unconsolidated sandstone reservoir. Meanwhile, reconstruction and division of clay minerals play a vital role in the reconstruction of the digital cores, although the two-dimensional data-based reconstruction methods are specifically applicable as the microstructure reservoir simulation methods for the sandstone reservoir. However, reconstruction of clay minerals is still challenging from a research viewpoint for the better reconstruction of various clay minerals in the digital cores. In the present work, the content of clay minerals was considered on the basis of two-dimensional information about the reservoir. After application of the hybrid method, and compared with the model reconstructed by the process-based method, the digital core containing clay clusters, without labels for the clusters' number, size, and texture, was the output. The statistics and geometry of the reconstruction model were similar to those of the reference model. In addition, the Hoshen-Kopelman algorithm was used to label various connected unclassified clay clusters in the initial model, and then the number and size of the clay clusters were recorded. At the same time, the K-means clustering algorithm was applied to divide the labeled, large connecting clusters into smaller clusters on the basis of differences in the clusters' characteristics. According to the clay minerals' characteristics, such as types, textures, and distributions, the digital core containing clay minerals was reconstructed by means of the clustering algorithm and the clay clusters' structure judgment. The distributions and textures of the clay minerals of the digital core were reasonable. The clustering algorithm improved the digital core reconstruction and provided an alternative method for the simulation of different clay minerals in the digital cores.
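
    The labelling-then-splitting step can be sketched for a binary clay map as follows; scipy's connected-component labelling stands in for the Hoshen-Kopelman algorithm (both identify connected clusters), and the size threshold and sub-cluster count are assumed values.

```python
import numpy as np
from scipy.ndimage import label
from sklearn.cluster import KMeans

def split_large_clay_clusters(clay_mask, max_size=50, n_sub=2, seed=0):
    """Label connected clay clusters, then split those larger than max_size
    into n_sub spatial sub-clusters with k-means on the pixel coordinates."""
    labels, n = label(clay_mask)                 # connected-component labelling
    out = labels.copy()
    next_id = n + 1
    for cid in range(1, n + 1):
        coords = np.argwhere(labels == cid)
        if len(coords) <= max_size:
            continue
        sub = KMeans(n_clusters=n_sub, n_init=10, random_state=seed).fit_predict(coords)
        for s in range(1, n_sub):                # keep sub-cluster 0 under the old id
            out[tuple(coords[sub == s].T)] = next_id
            next_id += 1
    return out

rng = np.random.default_rng(0)
mask = rng.random((60, 60)) < 0.45               # toy binary "clay" map
relabelled = split_large_clay_clusters(mask)
print("clusters before/after splitting:", label(mask)[1], relabelled.max())
```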

  12. Semi-supervised weighted kernel clustering based on gravitational search for fault diagnosis.

    Science.gov (United States)

    Li, Chaoshun; Zhou, Jianzhong

    2014-09-01

    Supervised learning method, like support vector machine (SVM), has been widely applied in diagnosing known faults, however this kind of method fails to work correctly when new or unknown fault occurs. Traditional unsupervised kernel clustering can be used for unknown fault diagnosis, but it could not make use of the historical classification information to improve diagnosis accuracy. In this paper, a semi-supervised kernel clustering model is designed to diagnose known and unknown faults. At first, a novel semi-supervised weighted kernel clustering algorithm based on gravitational search (SWKC-GS) is proposed for clustering of dataset composed of labeled and unlabeled fault samples. The clustering model of SWKC-GS is defined based on wrong classification rate of labeled samples and fuzzy clustering index on the whole dataset. Gravitational search algorithm (GSA) is used to solve the clustering model, while centers of clusters, feature weights and parameter of kernel function are selected as optimization variables. And then, new fault samples are identified and diagnosed by calculating the weighted kernel distance between them and the fault cluster centers. If the fault samples are unknown, they will be added in historical dataset and the SWKC-GS is used to partition the mixed dataset and update the clustering results for diagnosing new fault. In experiments, the proposed method has been applied in fault diagnosis for rotatory bearing, while SWKC-GS has been compared not only with traditional clustering methods, but also with SVM and neural network, for known fault diagnosis. In addition, the proposed method has also been applied in unknown fault diagnosis. The results have shown effectiveness of the proposed method in achieving expected diagnosis accuracy for both known and unknown faults of rotatory bearing. Copyright © 2014 ISA. Published by Elsevier Ltd. All rights reserved.

  13. Image processing of globular clusters - Simulation for deconvolution tests (GlencoeSim)

    Science.gov (United States)

    Blazek, Martin; Pata, Petr

    2016-10-01

    This paper presents an algorithmic approach for efficiency tests of deconvolution algorithms in astronomical image processing. Due to the existence of noise in astronomical data, there is no certainty that a mathematically exact result of stellar deconvolution exists, and iterative or other methods such as aperture or PSF-fitting photometry are commonly used. Iterative methods are important particularly in the case of crowded fields (e.g., globular clusters). For tests of the efficiency of these iterative methods on various stellar fields, information about the real fluxes of the sources is essential. For this purpose a simulator of artificial images with crowded stellar fields provides initial information on source fluxes for a robust statistical comparison of various deconvolution methods. The "GlencoeSim" simulator and the algorithms presented in this paper consider various settings of Point-Spread Functions, noise types and spatial distributions, with the aim of producing as realistic an astronomical optical stellar image as possible.

  14. A Novel Clustering-Based Feature Representation for the Classification of Hyperspectral Imagery

    Directory of Open Access Journals (Sweden)

    Qikai Lu

    2014-06-01

    Full Text Available In this study, a new clustering-based feature extraction algorithm is proposed for the spectral-spatial classification of hyperspectral imagery. The clustering approach is able to group the high-dimensional data into a subspace by mining the salient information and suppressing the redundant information. In this way, the relationship between neighboring pixels, which was hidden in the original data, can be extracted more effectively. Specifically, in the proposed algorithm, a two-step process is adopted to make use of the clustering-based information. A clustering approach is first used to produce the initial clustering map, and, subsequently, a multiscale cluster histogram (MCH) is proposed to represent the spatial information around each pixel. In order to evaluate the robustness of the proposed MCH, four clustering techniques are employed to analyze the influence of the clustering methods. Meanwhile, the performance of the MCH is compared to three other widely used spatial features: the gray-level co-occurrence matrix (GLCM), the 3D wavelet texture, and differential morphological profiles (DMPs). The experiments conducted on four well-known hyperspectral datasets verify that the proposed MCH can significantly improve the classification accuracy, and it outperforms other commonly used spatial features.
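
    A compact illustration of the MCH idea is given below, with assumed window sizes and k-means as the initial clustering step: for each pixel, histograms of cluster labels are computed in windows at several scales and concatenated into a spatial feature vector.

```python
import numpy as np
from sklearn.cluster import KMeans

def multiscale_cluster_histogram(image, n_clusters=8, scales=(3, 7, 11), seed=0):
    """image: H x W x B hyperspectral cube. Returns H x W x (len(scales)*n_clusters)."""
    H, W, B = image.shape
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed) \
        .fit_predict(image.reshape(-1, B)).reshape(H, W)
    feats = np.zeros((H, W, len(scales) * n_clusters))
    for s_idx, win in enumerate(scales):
        r = win // 2
        for i in range(H):
            for j in range(W):
                patch = labels[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1]
                hist = np.bincount(patch.ravel(), minlength=n_clusters)
                feats[i, j, s_idx * n_clusters:(s_idx + 1) * n_clusters] = hist / patch.size
    return feats

cube = np.random.default_rng(0).random((30, 30, 10))   # toy hyperspectral cube
print(multiscale_cluster_histogram(cube).shape)         # (30, 30, 24)
```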

  15. Substructures in DAFT/FADA survey clusters based on XMM and optical data

    Science.gov (United States)

    Durret, F.; DAFT/FADA Team

    2014-07-01

    The DAFT/FADA survey was initiated to perform weak lensing tomography on a sample of 90 massive clusters in the redshift range [0.4,0.9] with HST imaging available. The complementary deep multiband imaging constitutes a high quality imaging data base for these clusters. In X-rays, we have analysed the XMM-Newton and/or Chandra data available for 32 clusters, and for 23 clusters we fit the X-ray emissivity with a beta-model and subtract it to search for substructures in the X-ray gas. This study was coupled with a dynamical analysis for the 18 clusters with at least 15 spectroscopic galaxy redshifts in the cluster range, based on a Serna & Gerbal (SG) analysis. We detected ten substructures in eight clusters by both methods (X-rays and SG). The percentage of mass included in substructures is found to be roughly constant with redshift, with values of 5-15%. Most of the substructures detected both in X-rays and with the SG method are found to be relatively recent infalls, probably at their first cluster pericenter approach.

  16. Segmentation of MRI Volume Data Based on Clustering Method

    Directory of Open Access Journals (Sweden)

    Ji Dongsheng

    2016-01-01

    Full Text Available Here we analyze the difficulties of segmenting left ventricle MR images without tag lines, and propose an algorithm for automatic segmentation of the left ventricle (LV) internal and external profiles, called the Incomplete K-means and Category Optimization (IKCO) method. Initially, using the Hough transformation to automatically locate the initial contour of the LV, the algorithm uses a simple approach to complete data subsampling and initial center determination. Next, according to the clustering rules, the proposed algorithm completes the MR image segmentation. Finally, the algorithm uses a category optimization method to improve the segmentation results. Experiments show that the algorithm provides good segmentation results.

  17. Aluminum Cluster-Based Materials for Propulsion and Other Applications

    Science.gov (United States)

    2012-04-04

    Pairs of complementary active sites on aluminum clusters enable the spontaneous generation of hydrogen from water. More recent work has shown that such pairs can even break the O-H bond in methanol and the C-O bond in formaldehyde, as well as other covalent bonds including the carbonyl bond in formaldehyde.

  18. Unsupervised Performance Evaluation Strategy for Bridge Superstructure Based on Fuzzy Clustering and Field Data

    Directory of Open Access Journals (Sweden)

    Yubo Jiao

    2013-01-01

    Full Text Available Performance evaluation of a bridge is critical for determining the optimal maintenance strategy. An unsupervised bridge superstructure state assessment method is proposed in this paper based on fuzzy clustering and bridge field measured data. Firstly, the evaluation index system of the bridge is constructed. Secondly, a certain number of bridge health monitoring data are selected as clustering samples to obtain the fuzzy similarity matrix and the fuzzy equivalent matrix. Finally, different thresholds are selected to form dynamic clustering maps and determine the best classification based on statistical analysis. The clustering result is regarded as a sample base, and the bridge state can be evaluated by calculating the fuzzy nearness between the unknown bridge state data and the sample base. Nanping Bridge in Jilin Province is selected as the engineering project to verify the effectiveness of the proposed method.

  19. Unsupervised performance evaluation strategy for bridge superstructure based on fuzzy clustering and field data.

    Science.gov (United States)

    Jiao, Yubo; Liu, Hanbing; Zhang, Peng; Wang, Xianqiang; Wei, Haibin

    2013-01-01

    Performance evaluation of a bridge is critical for determining the optimal maintenance strategy. An unsupervised bridge superstructure state assessment method is proposed in this paper based on fuzzy clustering and bridge field measured data. Firstly, the evaluation index system of the bridge is constructed. Secondly, a certain number of bridge health monitoring data are selected as clustering samples to obtain the fuzzy similarity matrix and the fuzzy equivalent matrix. Finally, different thresholds are selected to form dynamic clustering maps and determine the best classification based on statistical analysis. The clustering result is regarded as a sample base, and the bridge state can be evaluated by calculating the fuzzy nearness between the unknown bridge state data and the sample base. Nanping Bridge in Jilin Province is selected as the engineering project to verify the effectiveness of the proposed method.
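
    The fuzzy-similarity/fuzzy-equivalence step can be sketched generically (the index construction and thresholds below are assumptions): a similarity matrix is made max-min transitive by repeated composition with itself, and cutting the resulting equivalent matrix at a threshold lambda yields the clustering at that level.

```python
import numpy as np

def max_min_composition(R):
    """(R o R)[i, j] = max_k min(R[i, k], R[k, j])."""
    return np.max(np.minimum(R[:, :, None], R[None, :, :]), axis=1)

def fuzzy_equivalence(R, max_iter=50):
    """Iterate R <- R o R until transitive (the fuzzy equivalent matrix)."""
    for _ in range(max_iter):
        R2 = max_min_composition(R)
        if np.allclose(R2, R):
            return R
        R = R2
    return R

def clusters_at_level(R_eq, lam):
    """Groups induced by the lambda-cut of the fuzzy equivalent matrix."""
    n = len(R_eq)
    labels, current = -np.ones(n, dtype=int), 0
    for i in range(n):
        if labels[i] == -1:
            labels[np.flatnonzero(R_eq[i] >= lam)] = current
            current += 1
    return labels

# Toy similarity matrix for five monitored bridge states.
R = np.array([[1.0, 0.8, 0.4, 0.5, 0.3],
              [0.8, 1.0, 0.4, 0.5, 0.3],
              [0.4, 0.4, 1.0, 0.4, 0.3],
              [0.5, 0.5, 0.4, 1.0, 0.3],
              [0.3, 0.3, 0.3, 0.3, 1.0]])
R_eq = fuzzy_equivalence(R)
for lam in (0.8, 0.5, 0.4):
    print(lam, clusters_at_level(R_eq, lam))
```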

  20. Recognition of genetically modified product based on affinity propagation clustering and terahertz spectroscopy

    Science.gov (United States)

    Liu, Jianjun; Kan, Jianquan

    2018-04-01

    In this paper, a new method for identifying genetically modified material from terahertz spectra is proposed, using a support vector machine (SVM) based on affinity propagation clustering. The algorithm uses affinity propagation clustering to analyze and label the unlabeled training samples, and the existing SVM training data are continuously updated during the iterative process. When establishing the identification model, the training samples do not need to be labeled manually; thus, the error caused by humanly labeled samples is reduced and the identification accuracy of the model is greatly improved.
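
    A sketch of the semi-supervised idea under assumptions (the terahertz preprocessing and the exact label-propagation rule are not specified in the abstract): affinity propagation groups the unlabeled spectra, each cluster inherits the label the current SVM predicts for its exemplar, and the enlarged training set retrains the SVM.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy "terahertz spectra": two classes of 50-point spectra with shifted peaks.
def spectrum(kind, n=50):
    x = np.linspace(0, 1, n)
    peak = 0.3 if kind == 0 else 0.7
    return np.exp(-((x - peak) ** 2) / 0.01) + rng.normal(0, 0.05, n)

X_labeled = np.array([spectrum(k) for k in (0, 1) for _ in range(5)])
y_labeled = np.repeat([0, 1], 5)
X_unlabeled = np.array([spectrum(int(rng.integers(0, 2))) for _ in range(100)])

# Step 1: an initial SVM from the few labeled spectra.
svm = SVC(kernel="rbf", gamma="scale").fit(X_labeled, y_labeled)

# Step 2: affinity propagation groups the unlabeled spectra into clusters.
ap = AffinityPropagation(random_state=0).fit(X_unlabeled)

# Step 3: each cluster inherits the SVM's label for its exemplar,
# and the pseudo-labeled spectra are added to the training set.
pseudo_y = svm.predict(ap.cluster_centers_)[ap.labels_]
X_train = np.vstack([X_labeled, X_unlabeled])
y_train = np.concatenate([y_labeled, pseudo_y])
svm = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)
print("clusters found:", len(ap.cluster_centers_indices_))
```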

  1. pdc: An R Package for Complexity-Based Clustering of Time Series

    Directory of Open Access Journals (Sweden)

    Andreas M. Brandmaier

    2015-10-01

    Full Text Available Permutation distribution clustering is a complexity-based approach to clustering time series. The dissimilarity of time series is formalized as the squared Hellinger distance between the permutation distribution of embedded time series. The resulting distance measure has linear time complexity, is invariant to phase and monotonic transformations, and robust to outliers. A probabilistic interpretation allows the determination of the number of significantly different clusters. An entropy-based heuristic relieves the user of the need to choose the parameters of the underlying time-delayed embedding manually and, thus, makes it possible to regard the approach as parameter-free. This approach is illustrated with examples on empirical data.
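
    The core computation, comparing ordinal-pattern (permutation) distributions of embedded series with the squared Hellinger distance and clustering the resulting dissimilarities, can be sketched in a few lines; the pdc package itself is an R implementation, so the Python sketch below, with a fixed embedding dimension and no entropy heuristic, is only an illustration.

```python
import itertools
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def permutation_distribution(x, m=3):
    """Relative frequencies of the m! ordinal patterns of m-point embeddings."""
    patterns = {p: i for i, p in enumerate(itertools.permutations(range(m)))}
    counts = np.zeros(len(patterns))
    for t in range(len(x) - m + 1):
        counts[patterns[tuple(np.argsort(x[t:t + m]))]] += 1
    return counts / counts.sum()

def squared_hellinger(p, q):
    return 1.0 - np.sum(np.sqrt(p * q))

def pdc_cluster(series, m=3, n_clusters=2):
    P = [permutation_distribution(s, m) for s in series]
    D = np.array([[squared_hellinger(p, q) for q in P] for p in P])
    Z = linkage(squareform(D, checks=False), method="complete")
    return fcluster(Z, t=n_clusters, criterion="maxclust")

rng = np.random.default_rng(0)
noisy = [rng.normal(size=1000) for _ in range(3)]                 # white noise
smooth = [np.cumsum(rng.normal(size=1000)) for _ in range(3)]     # random walks
print(pdc_cluster(noisy + smooth))   # e.g. [1 1 1 2 2 2]
```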

  2. Process evaluation of a cluster-randomised trial testing a pressure ulcer prevention care bundle: a mixed-methods study.

    Science.gov (United States)

    Roberts, Shelley; McInnes, Elizabeth; Bucknall, Tracey; Wallis, Marianne; Banks, Merrilyn; Chaboyer, Wendy

    2017-02-13

    As pressure ulcers contribute to significant patient burden and increased health care costs, their prevention is a clinical priority. Our team developed and tested a complex intervention, a pressure ulcer prevention care bundle promoting patient participation in care, in a cluster-randomised trial. The UK Medical Research Council recommends process evaluation of complex interventions to provide insight into why they work or fail and how they might be improved. This study aimed to evaluate processes underpinning implementation of the intervention and explore end-users' perceptions of it, in order to give a deeper understanding of its effects. A pre-specified, mixed-methods process evaluation was conducted as an adjunct to the main trial, guided by a framework for process evaluation of cluster-randomised trials. Data was collected across eight Australian hospitals but mainly focused on the four intervention hospitals. Quantitative and qualitative data were collected across the evaluation domains: recruitment, reach, intervention delivery and response to intervention, at both cluster and individual patient level. Quantitative data were analysed using descriptive and inferential statistics. Qualitative data were analysed using thematic analysis. In the context of the main trial, which found a 42% reduction in risk of pressure ulcer with the intervention that was not significant after adjusting for clustering and covariates, this process evaluation provides important insights. Recruitment and reach among clusters and individuals was high, indicating that patients, nurses and hospitals are willing to engage with a pressure ulcer prevention care bundle. Of 799 intervention patients in the trial, 96.7% received the intervention, which took under 10 min to deliver. Patients and nurses accepted the care bundle, recognising benefits to it and describing how it enabled participation in pressure ulcer prevention (PUP) care. This process evaluation found no major failures

  3. A comparison of two suffix tree-based document clustering algorithms

    OpenAIRE

    Rafi, Muhammad; Maujood, M.; Fazal, M. M.; Ali, S. M.

    2011-01-01

    Document clustering as an unsupervised approach extensively used to navigate, filter, summarize and manage large collection of document repositories like the World Wide Web (WWW). Recently, focuses in this domain shifted from traditional vector based document similarity for clustering to suffix tree based document similarity, as it offers more semantic representation of the text present in the document. In this paper, we compare and contrast two recently introduced approaches to document clus...

  4. Graph Based Models for Unsupervised High Dimensional Data Clustering and Network Analysis

    Science.gov (United States)

    2015-01-01

    This University of California Los Angeles dissertation develops graph-based models for unsupervised high-dimensional data clustering and network analysis, including a spectral partitioning scheme in which each step performs either a bipartition or a tripartition, with a discussion of its application to the network of network scientists.

  5. Current and Future Tests of the Algebraic Cluster Model of 12C

    Science.gov (United States)

    Gai, Moshe

    2017-07-01

    A new theoretical approach to clustering in the frame of the Algebraic Cluster Model (ACM) has been developed. It predicts, in 12C, rotation-vibration structure with rotational bands of an oblate equilateral triangular symmetric spinning top with a D3h symmetry characterized by the sequence of states: 0+, 2+, 3-, 4±, 5- with degenerate 4+ and 4- (parity doublet) states. Our newly measured second 2+ state in 12C allows the first study of rotation-vibration structure in 12C. The newly measured 5- state and 4- states fit very well the predicted ground state rotational band structure with the predicted sequence of states: 0+, 2+, 3-, 4±, 5- with almost degenerate 4+ and 4- (parity doublet) states. Such a D3h symmetry is characteristic of triatomic molecules, but it is observed in the ground state rotational band of 12C for the first time in a nucleus. We discuss predictions of the ACM of other rotation-vibration bands in 12C such as the (0+) Hoyle band and the (1-) bending mode with prediction of (“missing” 3- and 4-) states that may shed new light on clustering in 12C and light nuclei. In particular, the observation (or non-observation) of the predicted (“missing”) states in the Hoyle band will allow us to conclude the geometrical arrangement of the three alpha particles composing the Hoyle state at 7.6542 MeV in 12C. We discuss proposed research programs at the Darmstadt S-DALINAC and at the newly constructed ELI-NP facility near Bucharest to test the predictions of the ACM in isotopes of carbon.

  6. Automatic spike sorting for extracellular electrophysiological recording using unsupervised single linkage clustering based on grey relational analysis

    Science.gov (United States)

    Lai, Hsin-Yi; Chen, You-Yin; Lin, Sheng-Huang; Lo, Yu-Chun; Tsang, Siny; Chen, Shin-Yuan; Zhao, Wan-Ting; Chao, Wen-Hung; Chang, Yao-Chuan; Wu, Robby; Shih, Yen-Yu I.; Tsai, Sheng-Tsung; Jaw, Fu-Shan

    2011-06-01

    Automatic spike sorting is a prerequisite for neuroscience research on multichannel extracellular recordings of neuronal activity. A novel spike sorting framework, combining efficient feature extraction and an unsupervised clustering method, is described here. Wavelet transform (WT) is adopted to extract features from each detected spike, and the Kolmogorov-Smirnov test (KS test) is utilized to select discriminative wavelet coefficients from the extracted features. Next, an unsupervised single linkage clustering method based on grey relational analysis (GSLC) is applied for spike clustering. The GSLC uses the grey relational grade as the similarity measure, instead of the Euclidean distance for distance calculation; the number of clusters is automatically determined by the elbow criterion in the threshold-cumulative distribution. Four simulated data sets with four noise levels and electrophysiological data recorded from the subthalamic nucleus of eight patients with Parkinson's disease during deep brain stimulation surgery are used to evaluate the performance of GSLC. Feature extraction results from the use of WT with the KS test indicate a reduced number of feature coefficients, as well as good noise rejection, despite similar spike waveforms. Accordingly, the use of GSLC for spike sorting achieves high classification accuracy in all simulated data sets. Moreover, J-measure results on the electrophysiological data indicate that the quality of spike sorting is adequate with the use of GSLC.
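
    A rough sketch of the feature-extraction step is given below, assuming the common pattern of a wavelet decomposition per spike followed by Kolmogorov-Smirnov ranking of coefficients; the grey-relational single-linkage clustering and elbow criterion of GSLC are not reproduced, and the wavelet family, decomposition level and number of kept coefficients are assumptions.

```python
# Sketch of the feature-extraction step only: wavelet coefficients per detected
# spike, with the Kolmogorov-Smirnov statistic (deviation from normality) used
# to rank coefficients. The grey-relational single-linkage step is not shown.
# Wavelet family, level and the number of kept features are assumptions.
import numpy as np
import pywt
from scipy import stats

def wavelet_features(spikes, wavelet="haar", level=4):
    """Stack the multilevel DWT coefficients of each spike into one row."""
    return np.array([np.concatenate(pywt.wavedec(s, wavelet, level=level))
                     for s in spikes])

def select_by_ks(features, n_keep=10):
    """Keep the coefficients whose distribution deviates most from a Gaussian."""
    ks_stat = []
    for j in range(features.shape[1]):
        col = features[:, j]
        z = (col - col.mean()) / (col.std() + 1e-12)
        ks_stat.append(stats.kstest(z, "norm").statistic)
    order = np.argsort(ks_stat)[::-1][:n_keep]
    return features[:, order], order

rng = np.random.default_rng(1)
spikes = rng.normal(size=(200, 64))          # placeholder "detected spikes"
feats = wavelet_features(spikes)
selected, idx = select_by_ks(feats)
print(selected.shape, idx[:5])
```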

  7. Analyses of crime patterns in NIBRS data based on a novel graph theory clustering method: Virginia as a case study.

    Science.gov (United States)

    Zhao, Peixin; Darrah, Marjorie; Nolan, Jim; Zhang, Cun-Quan

    2014-01-01

    This paper suggests a novel clustering method for analyzing the National Incident-Based Reporting System (NIBRS) data, which include the determination of correlation of different crime types, the development of a likelihood index for crimes to occur in a jurisdiction, and the clustering of jurisdictions based on crime type. The method was tested by using the 2005 assault data from 121 jurisdictions in Virginia as a test case. The analyses of these data show that some different crime types are correlated and some different crime parameters are correlated with different crime types. The analyses also show that certain jurisdictions within Virginia share certain crime patterns. This information assists with constructing a pattern for a specific crime type and can be used to determine whether a jurisdiction may be more likely to see this type of crime occur in their area.

  8. Analyses of Crime Patterns in NIBRS Data Based on a Novel Graph Theory Clustering Method: Virginia as a Case Study

    Directory of Open Access Journals (Sweden)

    Peixin Zhao

    2014-01-01

    Full Text Available This paper suggests a novel clustering method for analyzing the National Incident-Based Reporting System (NIBRS) data, which include the determination of correlation of different crime types, the development of a likelihood index for crimes to occur in a jurisdiction, and the clustering of jurisdictions based on crime type. The method was tested by using the 2005 assault data from 121 jurisdictions in Virginia as a test case. The analyses of these data show that some different crime types are correlated and some different crime parameters are correlated with different crime types. The analyses also show that certain jurisdictions within Virginia share certain crime patterns. This information assists with constructing a pattern for a specific crime type and can be used to determine whether a jurisdiction may be more likely to see this type of crime occur in their area.

  9. Internet2-based 3D PET image reconstruction using a PC cluster.

    Science.gov (United States)

    Shattuck, D W; Rapela, J; Asma, E; Chatzioannou, A; Qi, J; Leahy, R M

    2002-08-07

    We describe an approach to fast iterative reconstruction from fully three-dimensional (3D) PET data using a network of PentiumIII PCs configured as a Beowulf cluster. To facilitate the use of this system, we have developed a browser-based interface using Java. The system compresses PET data on the user's machine, sends these data over a network, and instructs the PC cluster to reconstruct the image. The cluster implements a parallelized version of our preconditioned conjugate gradient method for fully 3D MAP image reconstruction. We report on the speed-up factors using the Beowulf approach and the impacts of communication latencies in the local cluster network and the network connection between the user's machine and our PC cluster.

  10. An Adaptive Sweep-Circle Spatial Clustering Algorithm Based on Gestalt

    Directory of Open Access Journals (Sweden)

    Qingming Zhan

    2017-08-01

    Full Text Available An adaptive spatial clustering (ASC) algorithm is proposed in the present study, which employs sweep-circle techniques and a dynamic threshold setting based on the Gestalt theory to detect spatial clusters. The proposed algorithm can automatically discover clusters in one pass, rather than through the modification of the initial model (for example, a minimal spanning tree, Delaunay triangulation, or Voronoi diagram). It can quickly identify arbitrarily-shaped clusters while adapting efficiently to non-homogeneous density characteristics of spatial data, without the need for prior knowledge or parameters. The proposed algorithm is also well suited to streaming data with dynamic characteristics, supporting spatial clustering in large data sets.

  11. Hierarchical cluster analysis of ignitable liquids based on the total ion spectrum.

    Science.gov (United States)

    Waddell, Erin E; Frisch-Daiello, Jessica L; Williams, Mary R; Sigman, Michael E

    2014-09-01

    Gas chromatography-mass spectrometry (GC-MS) data of ignitable liquids in the Ignitable Liquids Reference Collection (ILRC) database were processed to obtain 445 total ion spectra (TIS), that is, average mass spectra across the chromatographic profile. Hierarchical cluster analysis, an unsupervised learning technique, was applied to find features useful for classification of ignitable liquids. A combination of the correlation distance and average linkage was utilized for grouping ignitable liquids with similar chemical composition. This study evaluated whether hierarchical cluster analysis of the TIS would cluster together ignitable liquids of the same ASTM class assignment, as designated in the ILRC database. The ignitable liquids clustered based on their chemical composition, and the ignitable liquids within each cluster were predominantly from one ASTM E1618-11 class. These results reinforce use of the TIS as a tool to aid in forensic fire debris analysis. © 2014 American Academy of Forensic Sciences.
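
    The grouping step described above (correlation distance with average linkage) can be sketched with SciPy as follows; the spectra here are random placeholders rather than ILRC total ion spectra, and the number of groups at which the tree is cut is an arbitrary choice.

```python
# Sketch of the grouping step: average-linkage hierarchical clustering with a
# correlation distance on total ion spectra (TIS). Random placeholder spectra.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
tis = rng.random((40, 500))                  # 40 spectra x 500 m/z bins
tis /= tis.sum(axis=1, keepdims=True)        # normalise each spectrum

dist = pdist(tis, metric="correlation")      # correlation distance
tree = linkage(dist, method="average")       # average linkage
labels = fcluster(tree, t=6, criterion="maxclust")  # cut into 6 groups
print(np.bincount(labels))
```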

  12. Internet2-based 3D PET image reconstruction using a PC cluster

    International Nuclear Information System (INIS)

    Shattuck, D.W.; Rapela, J.; Asma, E.; Leahy, R.M.; Chatzioannou, A.; Qi, J.

    2002-01-01

    We describe an approach to fast iterative reconstruction from fully three-dimensional (3D) PET data using a network of PentiumIII PCs configured as a Beowulf cluster. To facilitate the use of this system, we have developed a browser-based interface using Java. The system compresses PET data on the user's machine, sends these data over a network, and instructs the PC cluster to reconstruct the image. The cluster implements a parallelized version of our preconditioned conjugate gradient method for fully 3D MAP image reconstruction. We report on the speed-up factors using the Beowulf approach and the impacts of communication latencies in the local cluster network and the network connection between the user's machine and our PC cluster. (author)

  13. Neural network based cluster creation in the ATLAS silicon pixel detector

    CERN Document Server

    Selbach, K E; The ATLAS collaboration

    2012-01-01

    The read-outs from individual pixels on planar semiconductor sensors are grouped into clusters to reconstruct the location where a charged particle passed through the sensor. The resolution given by individual pixel sizes is significantly improved by using the information from the charge sharing between pixels. Such analog cluster creation techniques have been used by the ATLAS experiment for many years to obtain an excellent performance. However, in dense environments, such as those inside high-energy jets, clusters have an increased probability of merging the charge deposited by multiple particles. Recently, a neural network based algorithm which estimates both the cluster position and whether a cluster should be split has been developed for the ATLAS pixel detector. The algorithm significantly reduces ambiguities in the assignment of pixel detector measurements to tracks within jets and improves the position accuracy with respect to standard interpolation techniques by taking into account the 2-dimensional ...

  14. Neural network based cluster creation in the ATLAS silicon Pixel Detector

    CERN Document Server

    Andreazza, A; The ATLAS collaboration

    2013-01-01

    The read-outs from individual pixels on planar semiconductor sensors are grouped into clusters to reconstruct the location where a charged particle passed through the sensor. The resolution given by individual pixel sizes is significantly improved by using the information from the charge sharing between pixels. Such analog cluster creation techniques have been used by the ATLAS experiment for many years to obtain an excellent performance. However, in dense environments, such as those inside high-energy jets, clusters have an increased probability of merging the charge deposited by multiple particles. Recently, a neural network based algorithm which estimates both the cluster position and whether a cluster should be split has been developed for the ATLAS Pixel Detector. The algorithm significantly reduces ambiguities in the assignment of pixel detector measurements to tracks within jets and improves the position accuracy with respect to standard interpolation techniques by taking into account the 2-dimensional ...

  15. Fatigue Feature Extraction Analysis based on a K-Means Clustering Approach

    Directory of Open Access Journals (Sweden)

    M.F.M. Yunoh

    2015-06-01

    Full Text Available This paper focuses on clustering analysis using a K-means approach for fatigue feature dataset extraction. The aim of this study is to group the dataset as closely as possible (homogeneity) for the scattered dataset. Kurtosis, the wavelet-based energy coefficient and fatigue damage are calculated for all segments after the extraction process using the wavelet transform. Kurtosis, the wavelet-based energy coefficient and fatigue damage are used as input data for the K-means clustering approach. K-means clustering calculates the average distance of each group from the centroid and gives the objective function values. Based on the results, the maximum value of the objective function, 11.58, is obtained with two cluster centroids, and the minimum value, 8.06, with five cluster centroids. The number of clusters with the lowest objective function value is therefore five, which gives the best clustering for the dataset.
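
    A minimal sketch of the clustering step follows: K-means on three fatigue features per segment, reporting the objective function for several cluster counts. The feature values are synthetic placeholders, not the paper's wavelet-derived data.

```python
# Sketch of the clustering step: K-means on three fatigue features per segment
# (kurtosis, wavelet-based energy, fatigue damage), reporting the objective
# function (within-cluster sum of squared distances) for several cluster counts.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
features = rng.random((120, 3))   # columns: kurtosis, energy coeff., damage

for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
    print(f"k={k}: objective (inertia) = {km.inertia_:.3f}")
```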

  16. Subtypes of autism by cluster analysis based on structural MRI data.

    Science.gov (United States)

    Hrdlicka, Michal; Dudova, Iva; Beranova, Irena; Lisy, Jiri; Belsan, Tomas; Neuwirth, Jiri; Komarek, Vladimir; Faladova, Ludvika; Havlovicova, Marketa; Sedlacek, Zdenek; Blatny, Marek; Urbanek, Tomas

    2005-05-01

    The aim of our study was to subcategorize Autistic Spectrum Disorders (ASD) using a multidisciplinary approach. Sixty four autistic patients (mean age 9.4+/-5.6 years) were entered into a cluster analysis. The clustering analysis was based on MRI data. The clusters obtained did not differ significantly in the overall severity of autistic symptomatology as measured by the total score on the Childhood Autism Rating Scale (CARS). The clusters could be characterized as showing significant differences: Cluster 1: showed the largest sizes of the genu and splenium of the corpus callosum (CC), the lowest pregnancy order and the lowest frequency of facial dysmorphic features. Cluster 2: showed the largest sizes of the amygdala and hippocampus (HPC), the least abnormal visual response on the CARS, the lowest frequency of epilepsy and the least frequent abnormal psychomotor development during the first year of life. Cluster 3: showed the largest sizes of the caput of the nucleus caudatus (NC), the smallest sizes of the HPC and facial dysmorphic features were always present. Cluster 4: showed the smallest sizes of the genu and splenium of the CC, as well as the amygdala, and caput of the NC, the most abnormal visual response on the CARS, the highest frequency of epilepsy, the highest pregnancy order, abnormal psychomotor development during the first year of life was always present and facial dysmorphic features were always present. This multidisciplinary approach seems to be a promising method for subtyping autism.

  17. CASSIS and SMIPS: promoter-based prediction of secondary metabolite gene clusters in eukaryotic genomes.

    Science.gov (United States)

    Wolf, Thomas; Shelest, Vladimir; Nath, Neetika; Shelest, Ekaterina

    2016-04-15

    Secondary metabolites (SM) are structurally diverse natural products of high pharmaceutical importance. Genes involved in their biosynthesis are often organized in clusters, i.e., are co-localized and co-expressed. In silico cluster prediction in eukaryotic genomes remains problematic mainly due to the high variability of the clusters' content and lack of other distinguishing sequence features. We present Cluster Assignment by Islands of Sites (CASSIS), a method for SM cluster prediction in eukaryotic genomes, and Secondary Metabolites by InterProScan (SMIPS), a tool for genome-wide detection of SM key enzymes ('anchor' genes): polyketide synthases, non-ribosomal peptide synthetases and dimethylallyl tryptophan synthases. Unlike other tools based on protein similarity, CASSIS exploits the idea of co-regulation of the cluster genes, which assumes the existence of common regulatory patterns in the cluster promoters. The method searches for 'islands' of enriched cluster-specific motifs in the vicinity of anchor genes. It was validated in a series of cross-validation experiments and showed high sensitivity and specificity. CASSIS and SMIPS are freely available at https://sbi.hki-jena.de/cassis. Contact: thomas.wolf@leibniz-hki.de or ekaterina.shelest@leibniz-hki.de. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  18. GAECH: Genetic Algorithm Based Energy Efficient Clustering Hierarchy in Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    B. Baranidharan

    2015-01-01

    Full Text Available Clustering Wireless Sensor Networks (WSNs) is a major issue that determines the lifetime of the network. The parameters chosen for clustering should be appropriate to form the clusters according to the needs of the applications. Some of the well-known clustering techniques in WSN are designed only to reduce overall energy consumption in the network and increase the network lifetime. These algorithms achieve increased lifetime, but at the cost of overloading individual sensor nodes. Load balancing among the nodes in the network is equally important in achieving increased lifetime. First Node Die (FND), Half Node Die (HND), and Last Node Die (LND) are different metrics for analysing the lifetime of the network. In this paper, a new clustering algorithm, the Genetic Algorithm based Energy efficient Clustering Hierarchy (GAECH) algorithm, is proposed to increase FND, HND, and LND with a novel fitness function. The fitness function in GAECH forms well-balanced clusters considering the core parameters of a cluster, which again increases both the stability period and lifetime of the network. The experimental results also clearly indicate better performance of GAECH over other algorithms in all the necessary aspects.

  19. Transient identification by clustering based on Integrated Deterministic and Probabilistic Safety Analysis outcomes

    International Nuclear Information System (INIS)

    Di Maio, Francesco; Vagnoli, Matteo; Zio, Enrico

    2016-01-01

    Highlights: • We develop an Integrated Deterministic and Probabilistic Safety Analysis (IDPSA). • We present a transient identification approach for retrieving IDPSA scenarios information. • We post-process the IDPSA scenarios for clustering Prime Implicants and Near Misses. • The approach is useful for an on-line cluster assignment of an unknown developing scenario. • We apply the approach to the accidental scenarios of a dynamic Steam Generator of a NPP. - Abstract: In this work, we present a transient identification approach that utilizes clustering for retrieving scenarios information from an Integrated Deterministic and Probabilistic Safety Analysis (IDPSA). The approach requires: (i) creation of a database of scenarios by IDPSA; (ii) scenario post-processing for clustering Prime Implicants (PIs), i.e., minimum combinations of failure events that are capable of leading the system into a fault state, and Near Misses, i.e., combinations of failure events that lead the system to a quasi-fault state; (iii) on-line cluster assignment of an unknown developing scenario. In the step (ii), we adopt a visual interactive method and risk-based clustering to identify PIs and Near Misses, respectively; in the on-line step (iii), to assign a scenario to a cluster we consider the sequence of events in the scenario and evaluate the Hamming similarity to the sequences of the previously clustered scenarios. The feasibility of the analysis is shown with respect to the accidental scenarios of a dynamic Steam Generator (SG) of a NPP.
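
    Step (iii), the on-line assignment of a developing scenario by Hamming similarity to previously clustered sequences, can be sketched as below; the event codes, sequence length and cluster contents are invented for illustration and are not taken from the steam generator case study.

```python
# Sketch of step (iii) only: assign a developing scenario (a sequence of event
# codes) to the cluster whose stored sequences it matches best under a Hamming
# similarity. Event codes, sequence length and clusters are illustrative.
import numpy as np

def hamming_similarity(a, b):
    """Fraction of positions at which two equal-length event sequences agree."""
    a, b = np.asarray(a), np.asarray(b)
    return np.mean(a == b)

# Previously clustered scenarios: cluster label -> list of event sequences.
clusters = {
    "prime_implicant": [[1, 0, 2, 2, 0], [1, 0, 2, 1, 0]],
    "near_miss":       [[0, 0, 2, 2, 1], [0, 1, 2, 2, 1]],
    "safe":            [[0, 0, 0, 0, 0], [0, 0, 1, 0, 0]],
}

def assign(scenario):
    scores = {label: max(hamming_similarity(scenario, s) for s in seqs)
              for label, seqs in clusters.items()}
    return max(scores, key=scores.get), scores

label, scores = assign([1, 0, 2, 2, 1])
print(label, scores)
```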

  20. Fuzzy clustering-based segmented attenuation correction in whole-body PET

    CERN Document Server

    Zaidi, H; Boudraa, A; Slosman, DO

    2001-01-01

    Segmented attenuation correction is now a widely accepted technique to reduce the noise contribution of measured attenuation correction. In this paper, we present a new method for segmenting transmission images in positron emission tomography. This reduces the noise on the correction maps while still correcting for differing attenuation coefficients of specific tissues. Based on the Fuzzy C-Means (FCM) algorithm, the method segments the PET transmission images into a given number of clusters to extract specific areas of differing attenuation such as air, the lungs and soft tissue, preceded by a median filtering procedure. The reconstructed transmission image voxels are therefore segmented into populations of uniform attenuation based on the human anatomy. The clustering procedure starts with an over-specified number of clusters followed by a merging process to group clusters with similar properties and remove some undesired substructures using anatomical knowledge. The method is unsupervised, adaptive and a...
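
    A minimal fuzzy C-means sketch on simulated voxel values is shown below to illustrate the clustering step; the median filtering, the over-specification of clusters and the subsequent merging stage of the full method are omitted, and the attenuation values are synthetic.

```python
# Minimal fuzzy C-means sketch applied to voxel values of a (simulated)
# transmission image, grouping them into a chosen number of attenuation
# classes. The merging/median-filtering steps of the full method are omitted.
import numpy as np

def fuzzy_cmeans(x, n_clusters=4, m=2.0, n_iter=100, seed=0):
    """1-D fuzzy C-means on a flat array of voxel values."""
    rng = np.random.default_rng(seed)
    u = rng.random((n_clusters, x.size))
    u /= u.sum(axis=0)                                  # memberships sum to 1
    for _ in range(n_iter):
        um = u ** m
        centers = um @ x / um.sum(axis=1)               # weighted cluster centers
        d = np.abs(x[None, :] - centers[:, None]) + 1e-9
        u = 1.0 / (d ** (2 / (m - 1)))
        u /= u.sum(axis=0)
    return centers, u

rng = np.random.default_rng(1)
# crude stand-in for a transmission image: air, lung and soft-tissue voxels
voxels = np.concatenate([rng.normal(0.00, 0.01, 3000),
                         rng.normal(0.03, 0.01, 2000),
                         rng.normal(0.10, 0.01, 5000)])
centers, memberships = fuzzy_cmeans(voxels, n_clusters=3)
print(np.sort(centers))
```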

  1. A Model-Based Cluster Analysis of Maternal Emotion Regulation and Relations to Parenting Behavior.

    Science.gov (United States)

    Shaffer, Anne; Whitehead, Monica; Davis, Molly; Morelen, Diana; Suveg, Cynthia

    2017-10-15

    In a diverse community sample of mothers (N = 108) and their preschool-aged children (mean age = 3.50 years), this study conducted person-oriented analyses of maternal emotion regulation (ER) based on a multimethod assessment incorporating physiological, observational, and self-report indicators. A model-based cluster analysis was applied to five indicators of maternal ER: maternal self-report, observed negative affect in a parent-child interaction, baseline respiratory sinus arrhythmia (RSA), and RSA suppression across two laboratory tasks. Model-based cluster analyses revealed four maternal ER profiles, including a group of mothers with average ER functioning, characterized by socioeconomic advantage and more positive parenting behavior. A dysregulated cluster demonstrated the greatest challenges with parenting and dyadic interactions. Two clusters of intermediate dysregulation were also identified. Implications for assessment and applications to parenting interventions are discussed. © 2017 Family Process Institute.
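
    Model-based clustering in the Gaussian-mixture sense can be sketched as follows: mixtures with different numbers of components are fitted to the five ER indicators and the model with the lowest BIC is kept. The indicator values are synthetic placeholders and the authors' exact model specification may differ.

```python
# Sketch of model-based clustering via Gaussian mixtures with BIC selection.
# Synthetic stand-ins for the five ER indicators of 108 mothers.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
er_indicators = np.vstack([rng.normal(0.0, 1.0, (60, 5)),
                           rng.normal(2.0, 1.0, (48, 5))])   # 108 x 5

models = [GaussianMixture(n_components=k, covariance_type="full",
                          random_state=0).fit(er_indicators)
          for k in range(1, 7)]
bics = [m.bic(er_indicators) for m in models]
best = models[int(np.argmin(bics))]
profiles = best.predict(er_indicators)
print("chosen number of profiles:", best.n_components)
print("profile sizes:", np.bincount(profiles))
```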

  2. [Automatic Sleep Stage Classification Based on an Improved K-means Clustering Algorithm].

    Science.gov (United States)

    Xiao, Shuyuan; Wang, Bei; Zhang, Jian; Zhang, Qunfeng; Zou, Junzhong

    2016-10-01

    Sleep stage scoring is a hotspot in the field of medicine and neuroscience. Visual inspection of sleep is laborious and the results may be subjective to different clinicians. An automatic sleep stage classification algorithm can be used to reduce the manual workload. However, there are still limitations when it encounters complicated and changeable clinical cases. The purpose of this paper is to develop an automatic sleep staging algorithm based on the characteristics of actual sleep data. In the proposed improved K-means clustering algorithm, points were selected as the initial centers by using a concept of density to avoid the randomness of the original K-means algorithm. Meanwhile, the cluster centers were updated according to the ‘Three-Sigma Rule’ during the iteration to abate the influence of the outliers. The proposed method was tested and analyzed on the overnight sleep data of healthy persons and patients with sleep disorders after continuous positive airway pressure (CPAP) treatment. The automatic sleep stage classification results were compared with the visual inspection by qualified clinicians and the averaged accuracy reached 76%. With the analysis of morphological diversity of sleep data, it was proved that the proposed improved K-means algorithm was feasible and valid for clinical practice.
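
    The two modifications described above can be sketched as follows: density-based selection of the initial centers and a three-sigma rule applied when updating each center. The data, neighbourhood radius and iteration count are illustrative assumptions, not taken from the paper.

```python
# Sketch of the two modifications on synthetic 2-D data:
# (1) pick initial centers from high-density points that are far apart, and
# (2) update each center from members within three standard deviations of it.
import numpy as np

def density_init(X, k, radius=0.5):
    """Choose k initial centers among high-density points, spread apart."""
    d = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    density = (d < radius).sum(axis=1)
    centers = [X[np.argmax(density)]]
    for _ in range(k - 1):
        dist_to_centers = np.min(
            [np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(density * dist_to_centers)])
    return np.array(centers)

def robust_kmeans(X, k, n_iter=20):
    centers = density_init(X, k)
    for _ in range(n_iter):
        labels = np.argmin(
            np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) == 0:
                continue
            dist = np.linalg.norm(members - centers[j], axis=1)
            keep = dist <= dist.mean() + 3 * dist.std()   # three-sigma rule
            centers[j] = members[keep].mean(axis=0)
    return centers, labels

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (100, 2)),
               rng.normal(3, 0.3, (100, 2)),
               [[10.0, 10.0]]])               # one outlier
centers, labels = robust_kmeans(X, k=2)
print(np.round(centers, 2))
```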

  3. Dynamic Clustering-Based Estimation of Missing Values in Mixed Type Data

    Science.gov (United States)

    Ayuyev, Vadim V.; Jupin, Joseph; Harris, Philip W.; Obradovic, Zoran

    The appropriate choice of a method for imputation of missing data becomes especially important when the fraction of missing values is large and the data are of mixed type. The proposed dynamic clustering imputation (DCI) algorithm relies on similarity information from shared neighbors, where mixed type variables are considered together. When evaluated on a public social science dataset of 46,043 mixed type instances with up to 33% missing values, DCI resulted in more than 20% improved imputation accuracy over Multiple Imputation, Predictive Mean Matching, Linear and Multilevel Regression, and Mean Mode Replacement methods. Data imputed by 6 methods were used for prediction tests by NB-Tree, Random Subset Selection and Neural Network-based classification models. In our experiments classification accuracy obtained using DCI-preprocessed data was much better than when relying on alternative imputation methods for data preprocessing.

  4. Alerts Visualization and Clustering in Network-based Intrusion Detection

    Energy Technology Data Exchange (ETDEWEB)

    Yang, Dr. Li [University of Tennessee; Gasior, Wade C [ORNL; Dasireddy, Swetha [University of Tennessee

    2010-04-01

    Today's intrusion detection systems, when deployed on a busy network, overload the network with a huge number of alerts. This behavior of producing too much raw information makes them less effective. We propose a system which takes both raw data and Snort alerts to visualize and analyze possible intrusions in a network. We then present two models for the visualization of clustered alerts. Our first model provides the network administrator with the logical topology of the network and detailed information on each node, including its associated alerts and connections. The second model, a flocking model, presents the network administrator with a visual representation of IDS data in which each alert is represented in a different color and alerts with maximum similarity move together. This gives the network administrator a way of detecting various intrusions by visualizing the alert patterns.

  5. Constraints on Ωm and σ8 from the potential-based cluster temperature function

    Science.gov (United States)

    Angrick, Christian; Pace, Francesco; Bartelmann, Matthias; Roncarelli, Mauro

    2015-12-01

    The abundance of galaxy clusters is in principle a powerful tool to constrain cosmological parameters, especially Ωm and σ8, due to the exponential dependence in the high-mass regime. While the best observables are the X-ray temperature and luminosity, the abundance of galaxy clusters, however, is conventionally predicted as a function of mass. Hence, the intrinsic scatter and the uncertainties in the scaling relations between mass and either temperature or luminosity lower the reliability of galaxy clusters to constrain cosmological parameters. In this article, we further refine the X-ray temperature function for galaxy clusters by Angrick et al., which is based on the statistics of perturbations in the cosmic gravitational potential and proposed to replace the classical mass-based temperature function, by including a refined analytic merger model and compare the theoretical prediction to results from a cosmological hydrodynamical simulation. Although we find already a good agreement if we compare with a cluster temperature function based on the mass-weighted temperature, including a redshift-dependent scaling between mass-based and spectroscopic temperature yields even better agreement between theoretical model and numerical results. As a proof of concept, incorporating this additional scaling in our model, we constrain the cosmological parameters Ωm and σ8 from an X-ray sample of galaxy clusters and tentatively find agreement with the recent cosmic microwave background based results from the Planck mission at 1σ-level.

  6. A robust approach based on Weibull distribution for clustering gene expression data

    Directory of Open Access Journals (Sweden)

    Gong Binsheng

    2011-05-01

    Full Text Available Abstract Background Clustering is a widely used technique for analysis of gene expression data. Most clustering methods group genes based on the distances, while few methods group genes according to the similarities of the distributions of the gene expression levels. Furthermore, as the biological annotation resources accumulated, an increasing number of genes have been annotated into functional categories. As a result, evaluating the performance of clustering methods in terms of the functional consistency of the resulting clusters is of great interest. Results In this paper, we proposed the WDCM (Weibull Distribution-based Clustering Method), a robust approach for clustering gene expression data, in which the gene expressions of individual genes are considered as the random variables following unique Weibull distributions. Our WDCM is based on the concept that the genes with similar expression profiles have similar distribution parameters, and thus the genes are clustered via the Weibull distribution parameters. We used the WDCM to cluster three cancer gene expression data sets from the lung cancer, B-cell follicular lymphoma and bladder carcinoma and obtained well-clustered results. We compared the performance of WDCM with k-means and Self Organizing Map (SOM) using functional annotation information given by the Gene Ontology (GO). The results showed that the functional annotation ratios of WDCM are higher than those of the other methods. We also utilized the external measure Adjusted Rand Index to validate the performance of the WDCM. The comparative results demonstrate that the WDCM provides the better clustering performance compared to k-means and SOM algorithms. The merit of the proposed WDCM is that it can be applied to cluster incomplete gene expression data without imputing the missing values. Moreover, the robustness of WDCM is also evaluated on the incomplete data sets. Conclusions The results demonstrate that our WDCM produces clusters
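
    The core idea, clustering genes on fitted Weibull parameters rather than on distances between expression profiles, can be sketched as below; the synthetic expression values and the use of K-means on the fitted (shape, scale) parameters are illustrative choices rather than the authors' exact procedure.

```python
# Sketch: fit a Weibull distribution to each gene's expression values and
# cluster genes on the fitted (shape, scale) parameters. Synthetic data.
import numpy as np
from scipy.stats import weibull_min
from sklearn.cluster import KMeans

# 60 "genes", each with 30 positive expression values from two regimes
expr = np.vstack([weibull_min.rvs(1.2, scale=1.0, size=(30, 30), random_state=1),
                  weibull_min.rvs(4.0, scale=2.0, size=(30, 30), random_state=2)])

params = []
for gene_values in expr:
    shape, loc, scale = weibull_min.fit(gene_values, floc=0)  # fix location at 0
    params.append([shape, scale])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(np.array(params))
print(np.bincount(labels))
```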

  7. Voxel-based clustered imaging by multiparameter diffusion tensor images for glioma grading.

    Science.gov (United States)

    Inano, Rika; Oishi, Naoya; Kunieda, Takeharu; Arakawa, Yoshiki; Yamao, Yukihiro; Shibata, Sumiya; Kikuchi, Takayuki; Fukuyama, Hidenao; Miyamoto, Susumu

    2014-01-01

    Gliomas are the most common intra-axial primary brain tumour; therefore, predicting glioma grade would influence therapeutic strategies. Although several methods based on single or multiple parameters from diagnostic images exist, a definitive method for pre-operatively determining glioma grade remains unknown. We aimed to develop an unsupervised method using multiple parameters from pre-operative diffusion tensor images for obtaining a clustered image that could enable visual grading of gliomas. Fourteen patients with low-grade gliomas and 19 with high-grade gliomas underwent diffusion tensor imaging and three-dimensional T1-weighted magnetic resonance imaging before tumour resection. Seven features including diffusion-weighted imaging, fractional anisotropy, first eigenvalue, second eigenvalue, third eigenvalue, mean diffusivity and raw T2 signal with no diffusion weighting, were extracted as multiple parameters from diffusion tensor imaging. We developed a two-level clustering approach for a self-organizing map followed by the K-means algorithm to enable unsupervised clustering of a large number of input vectors with the seven features for the whole brain. The vectors were grouped by the self-organizing map as protoclusters, which were classified into the smaller number of clusters by K-means to make a voxel-based diffusion tensor-based clustered image. Furthermore, we also determined if the diffusion tensor-based clustered image was really helpful for predicting pre-operative glioma grade in a supervised manner. The ratio of each class in the diffusion tensor-based clustered images was calculated from the regions of interest manually traced on the diffusion tensor imaging space, and the common logarithmic ratio scales were calculated. We then applied support vector machine as a classifier for distinguishing between low- and high-grade gliomas. Consequently, the sensitivity, specificity, accuracy and area under the curve of receiver operating characteristic

  8. Direct Reconstruction of CT-based Attenuation Correction Images for PET with Cluster-Based Penalties

    Science.gov (United States)

    Kim, Soo Mee; Alessio, Adam M.; De Man, Bruno; Asma, Evren; Kinahan, Paul E.

    2015-01-01

    Extremely low-dose CT acquisitions for the purpose of PET attenuation correction will have a high level of noise and biasing artifacts due to factors such as photon starvation. This work explores a priori knowledge appropriate for CT iterative image reconstruction for PET attenuation correction. We investigate the maximum a posteriori (MAP) framework with cluster-based, multinomial priors for the direct reconstruction of the PET attenuation map. The objective function for direct iterative attenuation map reconstruction was modeled as a Poisson log-likelihood with prior terms consisting of quadratic (Q) and mixture (M) distributions. The attenuation map is assumed to have values in 4 clusters: air+background, lung, soft tissue, and bone. Under this assumption, the mixture prior was a mixture probability density function consisting of one exponential and three Gaussian distributions. The relative proportion of each cluster was jointly estimated during each voxel update of the direct iterative coordinate descent (dICD) method. Noise-free data were generated from the NCAT phantom and Poisson noise was added. Reconstruction with FBP (ramp filter) was performed on the noise-free (ground truth) and noisy data. For the noisy data, dICD reconstruction was performed with the combination of different prior strength parameters (β and γ) of the Q- and M-penalties. The combined quadratic and mixture penalties reduce the RMSE by 18.7% compared to post-smoothed iterative reconstruction and only 0.7% compared to quadratic alone. For direct PET attenuation map reconstruction from ultra-low dose CT acquisitions, the combination of quadratic and mixture priors offers regularization of both variance and bias and is a potential method to derive attenuation maps with negligible patient dose. However, the small improvement in quantitative accuracy relative to the substantial increase in algorithm complexity does not currently justify the use of mixture-based PET attenuation priors for reconstruction of CT

  9. The role of clusters on the healing efficiency of a modified Zn based ionomer

    NARCIS (Netherlands)

    Vega Vega, J.M.; Van der Zwaag, S.; Garcia Espallargas, S.J.

    2013-01-01

    Poly(ethylene-co-methacrylic acid) (EMAA) ionomers have shown healing capabilities in both ballistic and static tests. In previous studies it was shown that the degree of crosslinking (clusters) affects (positively or negatively) the healing under impact tests. Moreover, it has also been reported

  10. ROC-based determination of the number of clusters for fMRI activation detection

    Science.gov (United States)

    Jahanian, Hesamoddin; Soltanian-Zadeh, Hamid; Hossein-Zadeh, Gholam A.; Siadat, Mohammad-Reza

    2004-05-01

    Fuzzy C-means (FCM), in spite of its potent advantages in exploratory analysis of functional magnetic resonance imaging (fMRI), suffers from limitations such as a priori determination of the number of clusters, unknown statistical significance of the results, and instability of the results when it is applied to raw fMRI time series. Choosing a different number of clusters, or thresholding the membership degree at different levels, leads to considerably different activation maps. However, research work for finding a standard index to determine the number of clusters has not yet succeeded. Using randomization, we developed a method to control the false positive rate in FCM, which gives a meaningful statistical significance to the results. Making use of this novel method and an ROC-based cluster validity measure, we determined the optimal number of clusters. In this study, we applied FCM to a feature space that takes the variability of the hemodynamic response function into account (an HRF-based feature space). The proposed method found the accurate number of clusters in simulated fMRI data. In addition, the proposed method generated excellent results for experimental fMRI data and showed good reproducibility for determining the number of clusters.

  11. A Comparison Study of Validity Indices on Swarm-Intelligence-Based Clustering.

    Science.gov (United States)

    Rui Xu; Jie Xu; Wunsch, D C

    2012-08-01

    Swarm intelligence has emerged as a worthwhile class of clustering methods due to its convenient implementation, parallel capability, ability to avoid local minima, and other advantages. In such applications, clustering validity indices usually operate as fitness functions to evaluate the qualities of the obtained clusters. However, as the validity indices are usually data dependent and are designed to address certain types of data, the selection of different indices as the fitness functions may critically affect cluster quality. Here, we compare the performances of eight well-known and widely used clustering validity indices, namely, the Caliński-Harabasz index, the CS index, the Davies-Bouldin index, the Dunn index with two of its generalized versions, the I index, and the silhouette statistic index, on both synthetic and real data sets in the framework of differential-evolution-particle-swarm-optimization (DEPSO)-based clustering. DEPSO is a hybrid evolutionary algorithm of the stochastic optimization approach (differential evolution) and the swarm intelligence method (particle swarm optimization) that further increases the search capability and achieves higher flexibility in exploring the problem space. According to the experimental results, we find that the silhouette statistic index stands out in most of the data sets that we examined. Meanwhile, we suggest that users reach their conclusions not just based on only one index, but after considering the results of several indices to achieve reliable clustering structures.
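
    Three of the indices named above are available in scikit-learn and can be compared directly on candidate partitions; in the sketch below a plain K-means stands in for the DEPSO-based clustering, so it only illustrates how the indices are evaluated, not the swarm-intelligence framework itself.

```python
# Sketch comparing three validity indices (silhouette, Calinski-Harabasz,
# Davies-Bouldin) on partitions produced for different cluster counts.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (silhouette_score, calinski_harabasz_score,
                             davies_bouldin_score)

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=0)

for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(f"k={k}: silhouette={silhouette_score(X, labels):.3f}  "
          f"CH={calinski_harabasz_score(X, labels):.1f}  "
          f"DB={davies_bouldin_score(X, labels):.3f}")
```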

  12. A Cooperative Learning-Based Clustering Approach to Lip Segmentation Without Knowing Segment Number.

    Science.gov (United States)

    Cheung, Yiu-Ming; Li, Meng; Peng, Qinmu; Chen, C L Philip

    2017-01-01

    It is usually hard to predetermine the true number of segments in lip segmentation. This paper, therefore, presents a clustering-based approach to lip segmentation without knowing the true segment number. The objective function in the proposed approach is a variant of the partition entropy (PE) and features that the coincident cluster centroids in pattern space can be equivalently substituted by one centroid with the function value unchanged. It is shown that the minimum of the proposed objective function can be reached provided that: 1) the number of positions occupied by cluster centroids in pattern space is equal to the true number of clusters and 2) these positions are coincident with the optimal cluster centroids obtained under PE criterion. In implementation, we first randomly initialize the clusters provided that the number of clusters is greater than or equal to the ground truth. Then, an iterative algorithm is utilized to minimize the proposed objective function. For each iterative step, not only is the winner, i.e., the centroid with the maximum membership degree, updated to adapt to the corresponding input data, but also the other centroids are adjusted with a specific cooperation strength, so that they are each close to the winner. Subsequently, the initial overpartition will be gradually faded out with the redundant centroids superposed over the convergence of the algorithm. Based upon the proposed algorithm, we present a lip segmentation scheme. Empirical studies have shown its efficacy in comparison with the existing methods.

  13. A Clustering-Based Automatic Transfer Function Design for Volume Visualization

    Directory of Open Access Journals (Sweden)

    Tianjin Zhang

    2016-01-01

    Full Text Available The two-dimensional transfer functions (TFs) designed based on the intensity-gradient magnitude (IGM) histogram are effective tools for the visualization and exploration of 3D volume data. However, traditional design methods usually depend on multiple rounds of trial-and-error. We propose a novel method for the automatic generation of transfer functions by performing the affinity propagation (AP) clustering algorithm on the IGM histogram. Compared with previous clustering algorithms that were employed in volume visualization, the AP clustering algorithm has a much faster convergence speed and can achieve more accurate clustering results. In order to obtain meaningful clustering results, we introduce two similarity measurements: IGM similarity and spatial similarity. These two similarity measurements can effectively bring the voxels of the same tissue together and differentiate the voxels of different tissues so that the generated TFs can assign different optical properties to different tissues. Before performing the clustering algorithm on the IGM histogram, we propose to remove noisy voxels based on the spatial information of voxels. Our method does not require users to input the number of clusters, and the classification and visualization process is automatic and efficient. Experiments on various datasets demonstrate the effectiveness of the proposed method.
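
    The clustering step can be sketched on a synthetic volume as below: the intensity-gradient magnitude (IGM) histogram is built and affinity propagation is run on the occupied bins. The paper's combined IGM/spatial similarity and the noise-removal step are not reproduced, and the bin features and damping value are assumptions.

```python
# Sketch on a synthetic volume: build the intensity-gradient-magnitude (IGM)
# histogram and run affinity propagation on the occupied bins.
import numpy as np
from sklearn.cluster import AffinityPropagation

rng = np.random.default_rng(0)
volume = rng.normal(0.2, 0.05, (32, 32, 32))
volume[8:24, 8:24, 8:24] += 0.5                    # embedded "tissue" block

grad = np.linalg.norm(np.stack(np.gradient(volume)), axis=0)
hist, xedges, yedges = np.histogram2d(volume.ravel(), grad.ravel(), bins=48)

# represent each occupied bin by (intensity, gradient magnitude, log count)
ix, iy = np.nonzero(hist)
bins = np.column_stack([xedges[ix], yedges[iy], np.log1p(hist[ix, iy])])

ap = AffinityPropagation(damping=0.9, random_state=0).fit(bins)
print("number of clusters found:", len(ap.cluster_centers_indices_))
```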

  14. Performance Test of Automated Photographic Photometer and Photometry of Cluster Byur 2

    Science.gov (United States)

    Cho, Dong-Hwan; Lee, See-Woo; Lee, Hyun-Gon

    1993-12-01

    The Automated Eichner Iris Photometer (AEIP) at the Korea Basic Science Center was tested for its function and the proper procedure for photographic photometry. The AEIP requires about three hours to reach electrical stability. When the iris is controlled automatically, the repeatability of the density unit (DU) is accurate to within (0.0028~0.0048) DU. The iris reading is found to be accurate within a mean error of 0.05 mag, which could be reduced to 0.02 mag by manual control. To check the applicability of the AEIP, two photographic plates for each of the UBV colors, which were taken by Dupuy and Zukauskas (1976) for the open cluster Byur 2, were measured by using the AEIP, and the photographic magnitudes and colors of the stars in Byur 2 were determined, and the previous results are discussed.

  15. Cluster Validity Classification Approaches Based on Geometric Probability and Application in the Classification of Remotely Sensed Images

    Directory of Open Access Journals (Sweden)

    LI Jian-Wei

    2014-08-01

    Full Text Available On the basis of the cluster validity function based on geometric probability in the literature [1, 2], we propose a cluster analysis method based on geometric probability to process large amounts of data in a rectangular area. The basic idea is top-down stepwise refinement: first categories, then subcategories. At all clustering levels, the cluster validity function based on geometric probability is used first to determine the clusters and the gathering direction, and then the clustering centers and the borders of the clusters are determined. Through TM remote sensing image classification examples, the method is compared with the supervised and unsupervised classification in ERDAS and with the cluster analysis method based on geometric probability in a two-dimensional square proposed in literature [2]. The results show that the proposed method can significantly improve the classification accuracy.

  16. A Spatial Division Clustering Method and Low Dimensional Feature Extraction Technique Based Indoor Positioning System

    Directory of Open Access Journals (Sweden)

    Yun Mo

    2014-01-01

    Full Text Available Indoor positioning systems based on the fingerprint method are widely used due to the large number of existing devices with a wide range of coverage. However, extensive positioning regions with a massive fingerprint database may cause high computational complexity and error margins, therefore clustering methods are widely applied as a solution. However, traditional clustering methods in positioning systems can only measure the similarity of the Received Signal Strength without being concerned with the continuity of physical coordinates. Besides, outage of access points could result in asymmetric matching problems which severely affect the fine positioning procedure. To solve these issues, in this paper we propose a positioning system based on the Spatial Division Clustering (SDC) method for clustering the fingerprint dataset subject to physical distance constraints. With the Genetic Algorithm and Support Vector Machine techniques, SDC can achieve higher coarse positioning accuracy than traditional clustering algorithms. In terms of fine localization, based on the Kernel Principal Component Analysis method, the proposed positioning system outperforms its counterparts based on other feature extraction methods in low dimensionality. Apart from balancing online matching computational burden, the new positioning system exhibits advantageous performance on radio map clustering, and also shows better robustness and adaptability in the asymmetric matching problem aspect.

  17. Effective Social Relationship Measurement and Cluster Based Routing in Mobile Opportunistic Networks †

    Science.gov (United States)

    Zeng, Feng; Zhao, Nan; Li, Wenjia

    2017-01-01

    In mobile opportunistic networks, the social relationship among nodes has an important impact on data transmission efficiency. Motivated by the strong share ability of “circles of friends” in communication networks such as Facebook, Twitter, Wechat and so on, we take a real-life example to show that social relationships among nodes consist of explicit and implicit parts. The explicit part comes from direct contact among nodes, and the implicit part can be measured through the “circles of friends”. We present the definitions of explicit and implicit social relationships between two nodes, adaptive weights of explicit and implicit parts are given according to the contact feature of nodes, and the distributed mechanism is designed to construct the “circles of friends” of nodes, which is used for the calculation of the implicit part of social relationship between nodes. Based on effective measurement of social relationships, we propose a social-based clustering and routing scheme, in which each node selects the nodes with close social relationships to form a local cluster, and the self-control method is used to keep all cluster members always having close relationships with each other. A cluster-based message forwarding mechanism is designed for opportunistic routing, in which each node only forwards the copy of the message to nodes with the destination node as a member of the local cluster. Simulation results show that the proposed social-based clustering and routing outperforms the other classic routing algorithms. PMID:28498309

  18. Effective Social Relationship Measurement and Cluster Based Routing in Mobile Opportunistic Networks.

    Science.gov (United States)

    Zeng, Feng; Zhao, Nan; Li, Wenjia

    2017-05-12

    In mobile opportunistic networks, the social relationship among nodes has an important impact on data transmission efficiency. Motivated by the strong share ability of "circles of friends" in communication networks such as Facebook, Twitter, Wechat and so on, we take a real-life example to show that social relationships among nodes consist of explicit and implicit parts. The explicit part comes from direct contact among nodes, and the implicit part can be measured through the "circles of friends". We present the definitions of explicit and implicit social relationships between two nodes, adaptive weights of explicit and implicit parts are given according to the contact feature of nodes, and the distributed mechanism is designed to construct the "circles of friends" of nodes, which is used for the calculation of the implicit part of social relationship between nodes. Based on effective measurement of social relationships, we propose a social-based clustering and routing scheme, in which each node selects the nodes with close social relationships to form a local cluster, and the self-control method is used to keep all cluster members always having close relationships with each other. A cluster-based message forwarding mechanism is designed for opportunistic routing, in which each node only forwards the copy of the message to nodes with the destination node as a member of the local cluster. Simulation results show that the proposed social-based clustering and routing outperforms the other classic routing algorithms.

  19. Dynamic Characteristics Analysis and Stabilization of PV-Based Multiple Microgrid Clusters

    DEFF Research Database (Denmark)

    Zhao, Zhuoli; Yang, Ping; Wang, Yuewu

    2018-01-01

    As the penetration of PV generation increases, there is a growing operational demand on PV systems to participate in microgrid frequency regulation. It is expected that future distribution systems will consist of multiple microgrid clusters. However, interconnecting PV microgrids may lead to system interactions and instability. To date, no research work has been done to analyze the dynamic behavior and enhance the stability of microgrid clusters considering the dynamics of the PV primary sources and dc links. To fill this gap, this paper presents comprehensive modeling, analysis, and stabilization of PV-based multiple microgrid clusters. A detailed small-signal model for PV-based microgrid clusters considering the local adaptive dynamic droop control mechanism of the voltage-source PV system is developed. The complete dynamic model is then used to assess and compare the dynamic characteristics of the single...

  20. K-means-clustering-based fiber nonlinearity equalization techniques for 64-QAM coherent optical communication system.

    Science.gov (United States)

    Zhang, Junfeng; Chen, Wei; Gao, Mingyi; Shen, Gangxiang

    2017-10-30

    In this work, we proposed two k-means-clustering-based algorithms to mitigate the fiber nonlinearity for 64-quadrature amplitude modulation (64-QAM) signal, the training-sequence assisted k-means algorithm and the blind k-means algorithm. We experimentally demonstrated the proposed k-means-clustering-based fiber nonlinearity mitigation techniques in 75-Gb/s 64-QAM coherent optical communication system. The proposed algorithms have reduced clustering complexity and low data redundancy and they are able to quickly find appropriate initial centroids and select correctly the centroids of the clusters to obtain the global optimal solutions for large k value. We measured the bit-error-ratio (BER) performance of 64-QAM signal with different launched powers into the 50-km single mode fiber and the proposed techniques can greatly mitigate the signal impairments caused by the amplified spontaneous emission noise and the fiber Kerr nonlinearity and improve the BER performance.
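
    The decision step of the blind variant can be sketched as below: received 64-QAM symbols are clustered with K-means (64 clusters) and each symbol is decided by its learned centroid instead of the ideal grid point. The channel model is a crude stand-in for fiber nonlinearity (a power-dependent phase rotation plus noise), not the experimental 75-Gb/s link.

```python
# Sketch of a K-means decision stage for 64-QAM symbols under a toy
# nonlinearity model (power-dependent phase rotation plus Gaussian noise).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
levels = np.arange(-7, 8, 2)
constellation = np.array([i + 1j * q for i in levels for q in levels])  # 64-QAM

tx = rng.choice(constellation, size=20000)
rx = tx * np.exp(1j * 0.01 * np.abs(tx) ** 2)            # power-dependent rotation
rx += (rng.normal(scale=0.3, size=rx.size)
       + 1j * rng.normal(scale=0.3, size=rx.size))

points = np.column_stack([rx.real, rx.imag])
init = np.column_stack([constellation.real, constellation.imag])
km = KMeans(n_clusters=64, init=init, n_init=1).fit(points)

# decide each symbol by its cluster's centroid rather than the ideal grid point
decided = km.cluster_centers_[km.labels_]
print("mean distance to decided centroid:",
      np.mean(np.linalg.norm(points - decided, axis=1)).round(3))
```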

  1. Promoting STI testing among senior vocational students in Rotterdam, the Netherlands: effects of a cluster randomized study

    Directory of Open Access Journals (Sweden)

    Wolfers Mireille

    2011-12-01

    Full Text Available Abstract Background Adolescents are a risk group for acquiring sexually transmitted infections (STIs). In the Netherlands, senior vocational school students are particularly at risk. However, STI test rates among adolescents are low and interventions that promote testing are scarce. To enhance voluntary STI testing, an intervention was designed and evaluated in senior vocational schools. The intervention combined classroom health education with sexual health services at the school site. The purpose of this study was to assess the combined and single effects on STI testing of health education and school-based sexual health services. Methods In a cluster-randomized study the intervention was evaluated in 24 schools, using three experimental conditions: 1) health education, 2) sexual health services, 3) both components; and a control group. STI testing was assessed by self-reported behavior and registrations at regional sexual health services. Follow-up measurements were performed at 1, 3, and 6-9 months. Of 1302 students present at baseline, 739 (57%) completed at least 1 follow-up measurement; of these students, 472 (64%) were sexually experienced and considered to be susceptible to the intervention. Multi-level analyses were conducted. To perform analyses according to the principle of intention-to-treat, missing observations at follow-up on the outcome measure were imputed with multiple imputation techniques. Results were compared with the complete cases analysis. Results Sexually experienced students who received the combined intervention of health education and sexual health services reported more STI testing (29%) than students in the control group (4%) (OR = 4.3, p ...). Conclusions Despite the low dose of the intervention received by the students and high attrition, we were able to show an intervention effect on STI testing among sexually experienced students. This study confirmed our hypothesis that offering health education to vocational students

  2. Classifying Aerosols Based on Fuzzy Clustering and Their Optical and Microphysical Properties Study in Beijing, China

    OpenAIRE

    Wenhao Zhang; Hui Xu; Fengjie Zheng

    2017-01-01

    Classification of Beijing aerosol is carried out based on clustering optical properties obtained from three Aerosol Robotic Network (AERONET) sites. The fuzzy c-means (FCM) clustering algorithm is used to classify fourteen years (2001–2014) of observations, 6,732 records in total, into six aerosol types. They are identified as fine-particle nonabsorbing, two kinds of fine-particle moderately absorbing (fine-MA1 and fine-MA2), fine-particle highly absorbing, polluted dust, and desert dust aerosol...

  3. A Deep Learning Prediction Model Based on Extreme-Point Symmetric Mode Decomposition and Cluster Analysis

    OpenAIRE

    Li, Guohui; Zhang, Songling; Yang, Hong

    2017-01-01

    To address the irregularity of nonlinear signals and the difficulty of predicting them, a deep learning prediction model based on extreme-point symmetric mode decomposition (ESMD) and cluster analysis is proposed. Firstly, the original data are decomposed by ESMD to obtain a finite number of intrinsic mode functions (IMFs) and residuals. Secondly, fuzzy c-means is used to cluster the decomposed components, and then a deep belief network (DBN) is used to predict each cluster. Finally, the reconstructed ...

  4. Base motif recognition and design of DNA templates for fluorescent silver clusters by machine learning.

    Science.gov (United States)

    Copp, Stacy M; Bogdanov, Petko; Debord, Mark; Singh, Ambuj; Gwinn, Elisabeth

    2014-09-03

    Discriminative base motifs within DNA templates for fluorescent silver clusters are identified using methods that combine large experimental data sets with machine learning tools for pattern recognition. Combining the discovery of certain multibase motifs important for determining fluorescence brightness with a generative algorithm, the probability of selecting DNA templates that stabilize fluorescent silver clusters is increased by a factor of >3. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  5. Fuzzy ensemble clustering based on random projections for DNA microarray data analysis.

    Science.gov (United States)

    Avogadri, Roberto; Valentini, Giorgio

    2009-01-01

    Two major problems related to the unsupervised analysis of gene expression data are represented by the accuracy and reliability of the discovered clusters, and by the biological fact that the boundaries between classes of patients or classes of functionally related genes are sometimes not clearly defined. The main goal of this work consists in the exploration of new strategies and in the development of new clustering methods to improve the accuracy and robustness of clustering results, taking into account the uncertainty underlying the assignment of examples to clusters in the context of gene expression data analysis. We propose a fuzzy ensemble clustering approach both to improve the accuracy of clustering results and to take into account the inherent fuzziness of biological and bio-medical gene expression data. We applied random projections that obey the Johnson-Lindenstrauss lemma to obtain several instances of lower dimensional gene expression data from the original high-dimensional ones, approximately preserving the information and the metric structure of the original data. Then we adopt a double fuzzy approach to obtain a consensus ensemble clustering, by first applying a fuzzy k-means algorithm to the different instances of the projected low-dimensional data and then by using a fuzzy t-norm to combine the multiple clusterings. Several variants of the fuzzy ensemble clustering algorithms are proposed, according to different techniques to combine the base clusterings and to obtain the final consensus clustering. We applied our proposed fuzzy ensemble methods to the gene expression analysis of leukemia, lymphoma, adenocarcinoma and melanoma patients, and we compared the results with other state-of-the-art ensemble methods. Results show that in some cases, taking into account the natural fuzziness of the data, we can improve the discovery of classes of patients defined at the bio-molecular level. The reduction of the dimension of the data, achieved through random
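
    A simplified sketch of the ensemble idea is given below: several Johnson-Lindenstrauss-style random projections, one base clustering per projection, and a consensus built from the averaged co-association matrix. Crisp K-means base clusterings and an average-linkage consensus are used here in place of the paper's fuzzy k-means and t-norm combination, so this approximates rather than reproduces the method.

```python
# Simplified ensemble sketch: random projections, base clusterings, and a
# consensus from the averaged co-association matrix (crisp stand-in for the
# fuzzy variant described in the abstract).
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.random_projection import GaussianRandomProjection

X, _ = make_blobs(n_samples=200, n_features=50, centers=3, random_state=0)

n_views, k = 10, 3
coassoc = np.zeros((len(X), len(X)))
for seed in range(n_views):
    Xp = GaussianRandomProjection(n_components=10, random_state=seed).fit_transform(X)
    labels = KMeans(n_clusters=k, n_init=5, random_state=seed).fit_predict(Xp)
    coassoc += (labels[:, None] == labels[None, :])
coassoc /= n_views

consensus = AgglomerativeClustering(
    n_clusters=k, metric="precomputed", linkage="average"
).fit_predict(1.0 - coassoc)                       # distance = 1 - co-association
print(np.bincount(consensus))
```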

  6. Depth data research of GIS based on clustering analysis algorithm

    Science.gov (United States)

    Xiong, Yan; Xu, Wenli

    2018-03-01

    The data of a GIS are spatially distributed: geographic data have both spatial and attribute characteristics and also change over time, so the amount of data is very large. Nowadays, many industries and departments in society use GIS; however, without a proper data analysis and mining scheme, a GIS will not exert its maximum effectiveness and a lot of data will be wasted. In this paper, we use the geographic information requirements of a national security department as the experimental object and, combining the characteristics of GIS data and taking time, space, attributes and other factors into account, apply a cluster analysis algorithm. We further study a mining scheme for depth data and obtain the algorithm model. This algorithm can automatically classify sample data and then carry out exploratory analysis. The research shows that the algorithm model and the information mining scheme can quickly find hidden depth information beneath the surface data of a GIS, thus improving the efficiency of the security department. The algorithm can also be extended to other fields.

  7. Particle Swarm Optimization and harmony search based clustering and routing in Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Veena Anand

    2017-01-01

    Full Text Available Wireless Sensor Networks (WSNs) have limited and non-rechargeable energy resources, a constraint that creates a challenge and has led to the development of various clustering and routing algorithms. This paper proposes an approach for improving network lifetime by using Particle Swarm Optimization based clustering and Harmony Search based routing in WSNs. Globally optimal cluster heads (CHs) are selected, and gateway nodes are introduced to decrease the energy consumption of the CHs while sending aggregated data to the Base Station (BS). Next, a Harmony Search based local search strategy finds the best routing path from the gateway nodes to the Base Station. Finally, the proposed algorithm is presented.
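
    A self-contained toy sketch of PSO-based cluster-head placement. The node coordinates, swarm constants, and distance-based fitness are assumptions standing in for the paper's energy model, and heads here are continuous positions rather than actual sensor nodes.

```python
import numpy as np

rng = np.random.default_rng(1)
nodes = rng.uniform(0, 100, size=(60, 2))        # hypothetical sensor positions
K, N_PARTICLES, N_ITER = 5, 20, 100

def fitness(heads):
    """Total node-to-nearest-head distance, a rough proxy for transmission energy."""
    d = np.linalg.norm(nodes[:, None, :] - heads[None, :, :], axis=2)
    return d.min(axis=1).sum()

pos = rng.uniform(0, 100, size=(N_PARTICLES, K, 2))   # each particle = K head positions
vel = np.zeros_like(pos)
pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_fit.argmin()].copy()

w, c1, c2 = 0.7, 1.5, 1.5                              # illustrative PSO constants
for _ in range(N_ITER):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, 0, 100)
    fit = np.array([fitness(p) for p in pos])
    improved = fit < pbest_fit
    pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
    gbest = pbest[pbest_fit.argmin()].copy()

# assign each node to its nearest optimized cluster head
assignment = np.linalg.norm(nodes[:, None, :] - gbest[None, :, :], axis=2).argmin(axis=1)
```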

  8. An ant colony based resilience approach to cascading failures in cluster supply network

    Science.gov (United States)

    Wang, Yingcong; Xiao, Renbin

    2016-11-01

    A cluster supply chain network is a typical complex network and easily suffers cascading failures under disruption events, caused by the under-load of enterprises. Improving network resilience can increase the ability to recover from cascading failures. Social resilience is found in ant colonies and comes from ants' spatial fidelity zones (SFZs). Starting from under-load failures, this paper proposes a resilience method against cascading failures in cluster supply chain networks by leveraging the social resilience of ant colonies. First, the mapping between ant colony SFZs and cluster supply chain network SFZs is presented. Second, a new cascading model for the cluster supply chain network is constructed based on under-load failures. Then, the SFZ-based resilience method and an index for cascading failures are developed according to the ant colony's social resilience. Finally, a numerical simulation and a case study are used to verify the validity of the cascading model and the resilience method. Experimental results show that the cluster supply chain network becomes resilient to cascading failures under the SFZ-based resilience method, and that network resilience can be enhanced by improving the ability of enterprises to recover and adjust.
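
    A toy illustration of an under-load cascade on a random graph. The load values, thresholds, and redistribution rule are assumed for illustration only and are not the paper's SFZ-based model.

```python
import random
import networkx as nx

random.seed(0)
G = nx.erdos_renyi_graph(100, 0.05, seed=0)          # stand-in supply network
load = {n: 1.0 + 0.5 * G.degree(n) for n in G}       # hypothetical initial loads
lower = {n: 0.6 * load[n] for n in G}                # under-load failure threshold

def cascade(G, load, lower, seed_node):
    """Fail `seed_node`, withdraw its load from neighbours, and propagate under-load failures."""
    failed = {seed_node}
    frontier = [seed_node]
    while frontier:
        nxt = []
        for f in frontier:
            for nb in G.neighbors(f):
                if nb in failed:
                    continue
                # losing a partner removes part of the load this enterprise carried
                load[nb] -= load[f] / max(G.degree(f), 1)
                if load[nb] < lower[nb]:
                    failed.add(nb)
                    nxt.append(nb)
        frontier = nxt
    return failed

print(len(cascade(G, dict(load), lower, seed_node=0)), "enterprises fail")
```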

  9. Clustering Scientific Publications Based on Citation Relations: A Systematic Comparison of Different Methods.

    Directory of Open Access Journals (Sweden)

    Lovro Šubelj

    Full Text Available Clustering methods are applied regularly in the bibliometric literature to identify research areas or scientific fields. These methods are for instance used to group publications into clusters based on their relations in a citation network. In the network science literature, many clustering methods, often referred to as graph partitioning or community detection techniques, have been developed. Focusing on the problem of clustering the publications in a citation network, we present a systematic comparison of the performance of a large number of these clustering methods. Using a number of different citation networks, some of them relatively small and others very large, we extensively study the statistical properties of the results provided by different methods. In addition, we also carry out an expert-based assessment of the results produced by different methods. The expert-based assessment focuses on publications in the field of scientometrics. Our findings seem to indicate that there is a trade-off between different properties that may be considered desirable for a good clustering of publications. Overall, map equation methods appear to perform best in our analysis, suggesting that these methods deserve more attention from the bibliometric community.
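
    For readers who want to experiment, a tiny stand-in pipeline: networkx's greedy modularity communities grouping a handful of hypothetical citation edges. The map equation (Infomap) methods favoured by the comparison are not reproduced here; this only shows the general clustering-of-a-citation-network workflow.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# hypothetical citation edges: (citing paper, cited paper)
edges = [("p1", "p2"), ("p2", "p3"), ("p1", "p3"),
         ("p4", "p5"), ("p5", "p6"), ("p4", "p6"), ("p3", "p4")]
G = nx.Graph(edges)                      # citation direction ignored for clustering

for i, community in enumerate(greedy_modularity_communities(G)):
    print(f"cluster {i}: {sorted(community)}")
```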

  10. Clustering Scientific Publications Based on Citation Relations: A Systematic Comparison of Different Methods.

    Science.gov (United States)

    Šubelj, Lovro; van Eck, Nees Jan; Waltman, Ludo

    2016-01-01

    Clustering methods are applied regularly in the bibliometric literature to identify research areas or scientific fields. These methods are for instance used to group publications into clusters based on their relations in a citation network. In the network science literature, many clustering methods, often referred to as graph partitioning or community detection techniques, have been developed. Focusing on the problem of clustering the publications in a citation network, we present a systematic comparison of the performance of a large number of these clustering methods. Using a number of different citation networks, some of them relatively small and others very large, we extensively study the statistical properties of the results provided by different methods. In addition, we also carry out an expert-based assessment of the results produced by different methods. The expert-based assessment focuses on publications in the field of scientometrics. Our findings seem to indicate that there is a trade-off between different properties that may be considered desirable for a good clustering of publications. Overall, map equation methods appear to perform best in our analysis, suggesting that these methods deserve more attention from the bibliometric community.

  11. Regional SAR Image Segmentation Based on Fuzzy Clustering with Gamma Mixture Model

    Science.gov (United States)

    Li, X. L.; Zhao, Q. H.; Li, Y.

    2017-09-01

    Most stochastic fuzzy clustering algorithms are pixel-based and cannot effectively overcome the inherent speckle noise in SAR images. To deal with this problem, a regional SAR image segmentation algorithm based on fuzzy clustering with a Gamma mixture model is proposed in this paper. First, generating points are initialized randomly on the image and the image domain is divided into many sub-regions using a Voronoi tessellation; each sub-region is regarded as a homogeneous area in which the pixels share the same cluster label. Then, the pixel intensity is assumed to follow a Gamma mixture model whose parameters depend on the cluster to which the pixel belongs. The negative logarithm of the probability serves as the dissimilarity measure between the pixel and the cluster, and the regional dissimilarity measure of a sub-region is defined as the sum of the measures of the pixels in that region. Furthermore, the Markov Random Field (MRF) model is extended from the pixel level to the Voronoi sub-regions, and the regional objective function is established within the fuzzy clustering framework. The optimal segmentation is obtained by solving for the model parameters and generating points. Finally, the effectiveness of the proposed algorithm is demonstrated by qualitative and quantitative analysis of segmentation results on simulated and real SAR images.
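
    A small fragment showing how the regional dissimilarity described above can be computed as a sum of negative Gamma log-likelihoods; the parameter values and synthetic amplitudes below are illustrative only.

```python
import numpy as np
from scipy.stats import gamma

def regional_dissimilarity(pixels, shape, scale):
    """Sum of negative log Gamma likelihoods over all pixels of one Voronoi sub-region."""
    return -np.sum(gamma.logpdf(pixels, a=shape, scale=scale))

# synthetic SAR-like amplitudes for one sub-region
region = np.random.default_rng(0).gamma(shape=2.0, scale=3.0, size=200)
print(regional_dissimilarity(region, shape=2.0, scale=3.0))   # low for well-matched cluster parameters
print(regional_dissimilarity(region, shape=8.0, scale=1.0))   # higher for a mismatched cluster
```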

  13. A parallel tempering based study of Coulombic explosion and identification of dissociating fragments in charged noble gas clusters.

    Science.gov (United States)

    Talukder, Srijeeta; Sen, Shrabani; Neogi, Soumya Ganguly; Chaudhury, Pinaki

    2013-10-28

    In this communication, we would like to test the feasibility of a parallel tempering based study of dissociation in dicationic noble gas clusters, namely, Ar(n)(2+), Kr(n)(2+), and Xe(n)(2+), where "n" is the size of the cluster units. We would like to find out the correct limit for sizes of each of these systems, above which the clusters stay intact as a single unit and do not dissociate into fragments by the process of Coulomb explosion. Moreover, we would also like to study in detail, for a specific case, i.e., Ar(n)(2+), the fragmentation patterns and point out the switchover from the non-fission to the fission mechanism of dissociation. In all these calculations, we would like to analyse how close our predictions are to experimental results. As a further check on the dissociation patterns found by parallel tempering, we also conduct a basin hopping based study on representative sizes of the clusters and find that parallel tempering, as used in the present work as an optimizer, is able to predict correct features when compared with other celebrated methods like the basin hopping algorithm.
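
    A bare-bones replica-exchange (parallel tempering) sketch on a one-dimensional toy potential; the double-well function and the temperature ladder are placeholders for the cluster energy surfaces actually sampled in the study.

```python
import numpy as np

rng = np.random.default_rng(0)

def energy(x):
    # double-well toy potential standing in for a cluster energy surface
    return (x ** 2 - 1.0) ** 2

temps = np.array([0.05, 0.1, 0.2, 0.5, 1.0])          # temperature ladder
x = rng.normal(size=temps.size)                        # one replica per temperature

for step in range(20000):
    # Metropolis move within each replica
    prop = x + rng.normal(scale=0.2, size=x.size)
    dE = energy(prop) - energy(x)
    accept = rng.random(x.size) < np.exp(np.minimum(0.0, -dE / temps))
    x = np.where(accept, prop, x)

    # attempt a swap between a random pair of neighbouring temperatures
    i = rng.integers(temps.size - 1)
    delta = (1 / temps[i] - 1 / temps[i + 1]) * (energy(x[i]) - energy(x[i + 1]))
    if rng.random() < np.exp(min(0.0, delta)):
        x[i], x[i + 1] = x[i + 1], x[i]

print("lowest-temperature replica ended near", x[0])
```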

  14. Gene microarray data analysis using parallel point-symmetry-based clustering.

    Science.gov (United States)

    Sarkar, Anasua; Maulik, Ujjwal

    2015-01-01

    Identification of co-expressed genes is the central goal in microarray gene expression analysis. Point-symmetry-based clustering is an important unsupervised learning technique for recognising symmetrical convex- or non-convex-shaped clusters. To enable fast clustering of large microarray data, we propose a distributed, time-efficient and scalable approach for the point-symmetry-based K-Means algorithm. A natural basis for analysing gene expression data using a symmetry-based algorithm is to group together genes with similar symmetrical expression patterns. This new parallel implementation also achieves linear speedup in timing without sacrificing the quality of the clustering solution on large microarray data sets. The parallel point-symmetry-based K-Means algorithm is compared with another new parallel symmetry-based K-Means and an existing parallel K-Means over eight artificial and benchmark microarray data sets, to demonstrate its superiority in both timing and validity. A statistical analysis is also performed to establish the significance of this message-passing-interface based point-symmetry K-Means implementation. We also analysed the biological relevance of the clustering solutions.
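
    A sketch of the point-symmetry distance underlying such methods (a serial, unnormalized variant; the MPI-parallel machinery of the paper is not shown): a point is close to a centre if the data set contains an approximate mirror image of it about that centre.

```python
import numpy as np

def point_symmetry_distance(x, centre, X):
    """Smallest residual between the reflection of x about `centre` and any other data point."""
    reflected = 2 * centre - x
    d = np.linalg.norm(X - reflected, axis=1)
    d[np.all(X == x, axis=1)] = np.inf       # exclude x itself
    return d.min()

rng = np.random.default_rng(0)
ring = rng.normal(size=(200, 2))
ring /= np.linalg.norm(ring, axis=1, keepdims=True)   # a symmetric, non-convex (ring-shaped) cluster
# small value: the ring is approximately symmetric about the origin
print(point_symmetry_distance(ring[0], np.zeros(2), ring))
```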

  15. Space Launch System Base Heating Test: Experimental Operations & Results

    Science.gov (United States)

    Dufrene, Aaron; Mehta, Manish; MacLean, Matthew; Seaford, Mark; Holden, Michael

    2016-01-01

    NASA's Space Launch System (SLS) uses four clustered liquid rocket engines along with two solid rocket boosters. The interaction between all six rocket exhaust plumes will produce a complex and severe thermal environment in the base of the vehicle. This work focuses on a recent 2% scale, hot-fire SLS base heating test. These base heating tests are short-duration tests executed with chamber pressures near the full-scale values with gaseous hydrogen/oxygen engines and RSRMV analogous solid propellant motors. The LENS II shock tunnel/Ludwieg tube tunnel was used at or near flight duplicated conditions up to Mach 5. Model development was based on the Space Shuttle base heating tests with several improvements including doubling of the maximum chamber pressures and duplication of freestream conditions. Test methodology and conditions are presented, and base heating results from 76 runs are reported in non-dimensional form. Regions of high heating are identified and comparisons of various configuration and conditions are highlighted. Base pressure and radiometer results are also reported.

  16. The Impact of Visual Field Clusters on Performance-based Measures and Vision-Related Quality of Life in Patients With Glaucoma.

    Science.gov (United States)

    Sun, Yi; Lin, Clarissa; Waisbourd, Michael; Ekici, Feyzahan; Erdem, Elif; Wizov, Sheryl S; Hark, Lisa A; Spaeth, George L

    2016-03-01

    To investigate how visual field (VF) clusters affect performance-based measures of the ability to perform activities of daily living and subjective measures of vision-related quality of life (QoL) in patients with glaucoma. This was a prospective, cross-sectional study conducted at Wills Eye Hospital, including 322 eyes of 161 patients with moderate-stage glaucoma. VF tests were conducted using the Humphrey 24-2 Swedish Interactive Thresholding Algorithm standard perimeter. The VFs of each patient were divided into 5 clusters: nasal, temporal, central, paracentral, and peripheral. The score for each cluster was the average of the total deviation scores of all tested points within the cluster. Each cluster score was correlated with performance-based measures of visual function and subjective assessments of vision-related QoL; the outcome measures were the Compressed Assessment of Ability Related to Vision, the National Eye Institute Visual Functioning Questionnaire 25 (NEI VFQ-25), and the Modified Glaucoma Symptom Scale. The central VF cluster in the better eye was positively correlated with all Compressed Assessment of Ability Related to Vision (performance-based measure) subscales. The strongest correlation for the better eye was between the central VF cluster and total Compressed Assessment of Ability Related to Vision score (0.39, P < .001). The inferior VF hemisphere in both eyes was positively correlated with most Compressed Assessment of Ability Related to Vision subscales. Central VF clusters in the better eye were positively correlated with a majority of the NEI VFQ-25 subscales. There were no significant correlations between VF clusters and Modified Glaucoma Symptom Scale subscales. Scores of central VF defects in the better eye and inferior hemisphere defects in both eyes were positively correlated with performance-based measures of the ability to perform activities of daily living. Glaucoma patients with central defects in the better eye were more likely to have reduced scores on assessments

  17. Nationwide Registry-Based Analysis of Cancer Clustering Detects Strong Familial Occurrence of Kaposi Sarcoma

    Science.gov (United States)

    Vahteristo, Pia; Patama, Toni; Li, Yilong; Saarinen, Silva; Kilpivaara, Outi; Pitkänen, Esa; Knekt, Paul; Laaksonen, Maarit; Artama, Miia; Lehtonen, Rainer; Aaltonen, Lauri A.; Pukkala, Eero

    2013-01-01

    Many cancer predisposition syndromes are rare or have incomplete penetrance, and traditional epidemiological tools are not well suited for their detection. Here we have used an approach that employs the entire population based data in the Finnish Cancer Registry (FCR) for analyzing familial aggregation of all types of cancer, in order to find evidence for previously unrecognized cancer susceptibility conditions. We performed a systematic clustering of 878,593 patients in FCR based on family name at birth, municipality of birth, and tumor type, diagnosed between years 1952 and 2011. We also estimated the familial occurrence of the tumor types using cluster score that reflects the proportion of patients belonging to the most significant clusters compared to all patients in Finland. The clustering effort identified 25,910 birth name-municipality based clusters representing 183 different tumor types characterized by topography and morphology. We produced information about familial occurrence of hundreds of tumor types, and many of the tumor types with high cluster score represented known cancer syndromes. Unexpectedly, Kaposi sarcoma (KS) also produced a very high score (cluster score 1.91, p-value <0.0001). We verified from population records that many of the KS patients forming the clusters were indeed close relatives, and identified one family with five affected individuals in two generations and several families with two first degree relatives. Our approach is unique in enabling systematic examination of a national epidemiological database to derive evidence of aberrant familial aggregation of all tumor types, both common and rare. It allowed effortless identification of families displaying features of both known as well as potentially novel cancer predisposition conditions, including striking familial aggregation of KS. Further work with high-throughput methods should elucidate the molecular basis of the potentially novel predisposition conditions found in this

  18. Atomic Action Refinement in Model Based Testing

    NARCIS (Netherlands)

    van der Bijl, H.M.; Rensink, Arend; Tretmans, G.J.

    2007-01-01

    In model based testing (MBT) test cases are derived from a specification of the system that we want to test. In general the specification is more abstract than the implementation. This may result in 1) test cases that are not executable, because their actions are too abstract (the implementation

  19. Model-based testing for software safety

    NARCIS (Netherlands)

    Gurbuz, Havva Gulay; Tekinerdogan, Bedir

    2017-01-01

    Testing safety-critical systems is crucial since a failure or malfunction may result in death or serious injuries to people, equipment, or environment. An important challenge in testing is the derivation of test cases that can identify the potential faults. Model-based testing adopts models of a

  20. Unit root tests based on M estimators

    NARCIS (Netherlands)

    Lucas, André

    1995-01-01

    This paper considers unit root tests based on M estimators. The asymptotic theory for these tests is developed. It is shown how the asymptotic distributions of the tests depend on nuisance parameters and how tests can be constructed that are invariant to these parameters. It is also shown that a

  1. DCT-Yager FNN: a novel Yager-based fuzzy neural network with the discrete clustering technique.

    Science.gov (United States)

    Singh, A; Quek, C; Cho, S Y

    2008-04-01

    superior performance. Extensive experiments have been conducted to test the effectiveness of these two networks, using various clustering algorithms. It follows that the SDCT and UDCT clustering algorithms are particularly suited to networks based on the Yager inference rule.

  2. Novel Biological Approaches for Testing the Contributions of Single DSBs and DSB Clusters to the Biological Effects of High LET Radiation.

    Science.gov (United States)

    Mladenova, Veronika; Mladenov, Emil; Iliakis, George

    2016-01-01

    The adverse biological effects of ionizing radiation (IR) are commonly attributed to the generation of DNA double-strand breaks (DSBs). IR-induced DSBs are generated by clusters of ionizations, bear damaged terminal nucleotides, and frequently comprise base damages and single-strand breaks in the vicinity, generating a unique DNA damage-clustering effect that increases DSB "complexity." The number of ionizations in clusters of different radiation modalities increases with increasing linear energy transfer (LET), and is thought to determine the long-known LET-dependence of the relative biological effectiveness (RBE). Multiple ionizations may also lead to the formation of DSB clusters, comprising two or more DSBs that destabilize chromatin further and compromise overall processing. DSB complexity and DSB-cluster formation are increasingly considered in the development of mathematical models of radiation action, which are then "tested" by fitting available experimental data. Despite a plethora of such mathematical models, the ultimate goal, i.e., the "a priori" prediction of the radiation effect, has not yet been achieved. The difficulty partly arises from the insurmountable problems of testing the fundamental assumptions of such mathematical models in defined biological model systems capable of providing conclusive answers. Recently, revolutionary advances in methods allowing the generation of enzymatic DSBs at random or at well-defined locations in the genome have created unique opportunities for testing several key assumptions frequently fed into mathematical modeling, including the role of DSB clusters in the overall effect. Here, we review the problem of DSB-cluster formation in radiation action and present novel biological technologies that promise to revolutionize the way we address the biological consequences of such lesions. We describe new ways of exploiting the I-SceI endonuclease to generate DSB-clusters at random locations in the genome and describe the

  3. FRCA: a fuzzy relevance-based cluster head selection algorithm for wireless mobile ad-hoc sensor networks.

    Science.gov (United States)

    Lee, Chongdeuk; Jeong, Taegwon

    2011-01-01

    Clustering is an important mechanism that efficiently provides information for mobile nodes and improves the processing capacity of routing, bandwidth allocation, and resource management and sharing. Clustering algorithms can be based on such criteria as the battery power of nodes, mobility, network size, distance, speed and direction. Above all, in order to achieve good clustering performance, overhead should be minimized, allowing mobile nodes to join and leave without perturbing the membership of the cluster while preserving the current cluster structure as much as possible. This paper proposes a Fuzzy Relevance-based Cluster head selection Algorithm (FRCA) to solve problems found in existing wireless mobile ad hoc sensor networks, such as dynamically changing node distributions due to mobility, flat structures, and disturbance of cluster formation. The proposed mechanism uses fuzzy relevance to select the cluster head for clustering in wireless mobile ad hoc sensor networks. In the simulation implemented on the NS-2 simulator, the proposed FRCA is compared with algorithms such as the Cluster-based Routing Protocol (CBRP), the Weighted-based Adaptive Clustering Algorithm (WACA), and the Scenario-based Clustering Algorithm for Mobile ad hoc networks (SCAM). The simulation results showed that the proposed FRCA achieves better performance than the other existing mechanisms.

  4. Management of Energy Consumption on Cluster Based Routing Protocol for MANET

    Science.gov (United States)

    Hosseini-Seno, Seyed-Amin; Wan, Tat-Chee; Budiarto, Rahmat; Yamada, Masashi

    The usage of light-weight mobile devices is increasing rapidly, leading to demand for more telecommunication services. Consequently, mobile ad hoc networks and their applications have become feasible with the proliferation of light-weight mobile devices. Many protocols have been developed to handle service discovery and routing in ad hoc networks. However, the majority of them did not consider one critical aspect of this type of network, which is the limited available energy in each node. The Cluster Based Routing Protocol (CBRP) is a robust/scalable routing protocol for Mobile Ad hoc Networks (MANETs) and superior to existing protocols such as Ad hoc On-demand Distance Vector (AODV) in terms of throughput and overhead. Therefore, based on this strength, methods to increase the efficiency of energy usage are incorporated into CBRP in this work. In order to increase the stability (in terms of lifetime) of the network and to decrease the energy consumption of inter-cluster gateway nodes, an Enhanced Gateway Cluster Based Routing Protocol (EGCBRP) is proposed. Three methods have been introduced by EGCBRP as enhancements to the CBRP: improving the election of cluster heads (CHs) in CBRP, which is based on the maximum available energy level; implementing load balancing for inter-cluster traffic using multiple gateways; and implementing a sleep state for gateway nodes to further save energy. Furthermore, we propose an Energy Efficient Cluster Based Routing Protocol (EECBRP) which extends the EGCBRP sleep state concept to all idle member nodes, excluding the active nodes in all clusters. The experiment results show that the EGCBRP decreases the overall energy consumption of the gateway nodes by up to 10% and the EECBRP reduces the energy consumption of the member nodes by up to 60%, both of which in turn contribute to stabilizing the network.

  5. Efficient Regression Testing Based on Test History: An Industrial Evaluation

    OpenAIRE

    Ekelund, Edward Dunn; Engström, Emelie

    2015-01-01

    Due to changes in the development practices at Axis Communications, towards continuous integration, faster regression testing feedback is needed. The current automated regression test suite takes approximately seven hours to run which prevents developers from integrating code changes several times a day as preferred. Therefore we want to implement a highly selective yet accurate regression testing strategy. Traditional code coverage based techniques are not applicable due to the size and comp...

  6. An Enhanced PSO-Based Clustering Energy Optimization Algorithm for Wireless Sensor Network

    Directory of Open Access Journals (Sweden)

    C. Vimalarani

    2016-01-01

    Full Text Available A Wireless Sensor Network (WSN) is a network formed of a large number of sensor nodes positioned in an application environment to monitor physical entities in a target area, for example, environmental temperature, water level, pressure, health care, and various military applications. Sensor nodes are mostly equipped with self-supported battery power, through which they can perform adequate operations and communicate with neighboring nodes. To maximize the lifetime of the network, energy conservation measures are essential for improving the performance of WSNs. This paper proposes an Enhanced PSO-Based Clustering Energy Optimization (EPSO-CEO) algorithm for Wireless Sensor Networks in which clustering and cluster head selection are done by using the Particle Swarm Optimization (PSO) algorithm with respect to minimizing the power consumption in the WSN. The performance metrics are evaluated and the results are compared with a competitive clustering algorithm to validate the reduction in energy consumption.

  7. An Enhanced PSO-Based Clustering Energy Optimization Algorithm for Wireless Sensor Network.

    Science.gov (United States)

    Vimalarani, C; Subramanian, R; Sivanandam, S N

    2016-01-01

    A Wireless Sensor Network (WSN) is a network formed of a large number of sensor nodes positioned in an application environment to monitor physical entities in a target area, for example, environmental temperature, water level, pressure, health care, and various military applications. Sensor nodes are mostly equipped with self-supported battery power, through which they can perform adequate operations and communicate with neighboring nodes. To maximize the lifetime of the network, energy conservation measures are essential for improving the performance of WSNs. This paper proposes an Enhanced PSO-Based Clustering Energy Optimization (EPSO-CEO) algorithm for Wireless Sensor Networks in which clustering and cluster head selection are done by using the Particle Swarm Optimization (PSO) algorithm with respect to minimizing the power consumption in the WSN. The performance metrics are evaluated and the results are compared with a competitive clustering algorithm to validate the reduction in energy consumption.

  8. Intelligent Control of the Complex Technology Process Based on Adaptive Pattern Clustering and Feature Map

    Directory of Open Access Journals (Sweden)

    Wushan Cheng

    2008-01-01

    Full Text Available A fuzzy neural network (FNN) based on adaptive pattern clustering and feature map (APCFM) is proposed to cope with the large delay and time-varying behavior of the sintering process. Using density clustering and learning vector quantization (LVQ), the sintering process is divided automatically into subclasses which have similar cluster centers and are labeled with a fitting number. These labeled subclass samples are then used to train the fuzzy neural network, which is applied to the prediction problem of the burning-through point (BTP). Experiments using 707 groups of actual process data to train the APCFM algorithm show that the system has strong robustness and wide generality in clustering analysis and feature extraction.

  9. Analysis of space payload operation modes based on divide-and-conquer clustering

    Directory of Open Access Journals (Sweden)

    Si Feng

    2016-01-01

    Full Text Available With the development of space electronic technology, space payload operation modes are becoming more and more complex, and manual interpretation is error-prone because of the heavy workload. The payload's operation modes are generally reflected in its telemetry data. By analysing the characteristics of payload telemetry data, an automatic analysis method for payload operation modes based on divide-and-conquer clustering is proposed; the clustering method combines division with incremental clustering. The principle of the method is introduced, the method is validated using actual payload telemetry data, and an improved method is proposed for the problems encountered. Experimental results show that the divide-and-conquer clustering method is computationally simple and classifies accurately when applied to the classification of payload operation modes, and that it can be extended to other areas of payload data processing.
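
    A toy sketch of the divide-and-conquer idea as read from the abstract: cluster each telemetry chunk separately, then cluster the chunk-level centres to obtain the global operation modes. K-means stands in for the paper's combined division/incremental scheme, and the data are synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
telemetry = rng.normal(size=(10_000, 6))             # hypothetical payload telemetry frames

chunk_centres = []
for chunk in np.array_split(telemetry, 20):          # "divide": process the stream in chunks
    km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(chunk)
    chunk_centres.append(km.cluster_centers_)
chunk_centres = np.vstack(chunk_centres)

# "conquer": cluster the per-chunk centres to obtain the global operation modes
modes = KMeans(n_clusters=4, n_init=10, random_state=0).fit(chunk_centres)
labels = modes.predict(telemetry)                    # final operation-mode label per frame
```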

  10. On the Power and Limits of Sequence Similarity Based Clustering of Proteins Into Families

    DEFF Research Database (Denmark)

    Wiwie, Christian; Röttger, Richard

    2017-01-01

    important to also unravel the proteomic repertoire of an organism. A classical computational approach for detecting protein families is a sequence-based similarity calculation coupled with a subsequent cluster analysis. In this work we have intensively analyzed various clustering tools on a large scale. We...... used the data to investigate the behavior of the tools' parameters underlining the diversity of the protein families. Furthermore, we trained regression models for predicting the expected performance of a clustering tool for an unknown data set and aimed to also suggest optimal parameters...... in an automated fashion. Our analysis demonstrates the benefits and limitations of the clustering of proteins with low sequence similarity indicating that each protein family requires its own distinct set of tools and parameters. All results, a tool prediction service, and additional supporting material is also...

  11. Innovative Development of Building Materials Industry of the Region Based on the Cluster Approach

    Directory of Open Access Journals (Sweden)

    Mottaeva Asiiat

    2016-01-01

    Full Text Available The article discusses the innovative development of the regional building materials industry based on the cluster approach. It establishes the significance of regional cluster development for the construction materials industry: an important part of strategies for strengthening innovation activity may be support for the formation and development of cluster structures, enabling an effective innovative breakthrough for the region. The article analyses the current state of innovation in the regional building materials industry from a cluster perspective. The study revealed a direct correlation between involvement in cluster-based innovative activities and the level of development of the construction materials industry. The research identified the factors that determine the innovation process, together with their systematization and classification, which underpin the sustainable functioning of the building materials industry during periods of active innovation. A grouping of innovations for the construction industry is proposed that takes into account industry-specific characteristics reflecting modern trends of scientific and technological progress in construction. The significance of the study lies in the fact that its proposals and practical recommendations can be used in forming a mechanism for the innovative development of the building materials industry and the overall regional construction complex of Russian regions by creating construction clusters.

  12. Priority Based Congestion Control Dynamic Clustering Protocol in Mobile Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    R. Beulah Jayakumari

    2015-01-01

    Full Text Available Wireless sensor networks are widely used to monitor natural phenomena, because natural disasters have increased globally, causing significant loss of life, economic setbacks, and disruption of social development. Saving energy in a wireless sensor network (WSN) is a critical factor to be considered. The sensor nodes are deployed to sense, compute, and communicate alerts in a WSN, which are used to prevent natural hazards. Since communication generally consumes more energy than sensing and computing, a cluster-based protocol is preferred. Even with clustering, multiclass traffic creates congested hotspots in the cluster, causing packet loss and delay. In order to conserve energy and to avoid congestion during multiclass traffic, a novel Priority Based Congestion Control Dynamic Clustering (PCCDC) protocol is developed. PCCDC is designed with mobile nodes which are organized dynamically into clusters to provide complete coverage and connectivity. PCCDC computes congestion at the intra- and intercluster level using linear and binary feedback methods. Each mobile node within the cluster has an appropriate queue model for scheduling prioritized packets during congestion without drop or delay. Simulation results have proven that packet drop, control overhead, and end-to-end delay are much lower in PCCDC, which in turn significantly increases packet delivery ratio, network lifetime, and residual energy when compared with the PASCC protocol.

  13. An Adaptive Density-Based Time Series Clustering Algorithm: A Case Study on Rainfall Patterns

    Directory of Open Access Journals (Sweden)

    Xiaomi Wang

    2016-11-01

    Full Text Available Current time series clustering algorithms fail to effectively mine the clustering distribution characteristics of time series data without sufficient prior knowledge. Furthermore, these algorithms fail to simultaneously consider the spatial attributes, non-spatial time series attribute values, and non-spatial time series attribute trends. This paper proposes an adaptive density-based time series clustering (DTSC) algorithm that simultaneously considers the three above-mentioned attributes to relieve these limitations. In this algorithm, the Delaunay triangulation is first utilized in combination with particle swarm optimization (PSO) to adaptively obtain objects with similar spatial attributes. An improved density-based clustering strategy is then adopted to detect clusters with similar non-spatial time series attribute values and time series attribute trends. The effectiveness and efficiency of the DTSC algorithm are validated by experiments on simulated datasets and real applications. The results indicate that the proposed DTSC algorithm effectively detects time series clusters with arbitrary shapes and similar attributes and densities while accounting for noise.
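
    A simplified sketch of the spatial step as read from the abstract: Delaunay edges supply a natural spatial neighbourhood scale, and DBSCAN (standing in for the paper's improved density-based strategy) groups objects on combined spatial and time-series features. The data and all parameters are illustrative.

```python
import numpy as np
from scipy.spatial import Delaunay
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(50, 2))          # hypothetical station locations
series = rng.normal(size=(50, 12))                 # hypothetical monthly rainfall series

tri = Delaunay(coords)
# Delaunay edges give each station its natural spatial neighbourhood
edges = set()
for simplex in tri.simplices:
    for a in range(3):
        for b in range(a + 1, 3):
            edges.add(tuple(sorted((simplex[a], simplex[b]))))
mean_edge = np.mean([np.linalg.norm(coords[a] - coords[b]) for a, b in edges])

# combine spatial position (scaled by the mean Delaunay edge length) with the
# time-series values so that both influence the density-based grouping
features = np.hstack([coords / mean_edge, series])
labels = DBSCAN(eps=5.0, min_samples=3).fit_predict(features)   # eps/min_samples need tuning
```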

  14. Chinese Text Clustering for Topic Detection Based on Word Pattern Relation

    Science.gov (United States)

    Yang, Yen-Ju; Yu, Su-Hsin

    This research adopts the method of word expansion to compose relevant features into the same semantic concept, then maps the corresponding documents to concept clusters, and finally merges the concepts with common documents into document clusters. We expect this mechanism, the use of semantic concepts to form a feature index, to reduce the problems of polysemy and synonymy. Frequent two- or three-noun sequences within the same sentence are used to form key patterns that replace keywords as the features of the text. The distributive strength of key patterns is measured by Pattern Frequency, Pattern Frequency-Inverse Document Frequency (PFIDF), Conditional Probability, Mutual Information, and Association Norm. According to this strength, an agglomerative hierarchical clustering technique is applied to cluster these key patterns into semantic concepts. Then, based on the documents shared between concepts, several semantic concepts are merged into a group, in which the corresponding texts are considered topic-related. The experimental results show that the proposed text clustering based on all five strength measures of key patterns outperforms traditional VSM clustering; PFIDF is the best, with an average F-measure of 97.5%.

  15. A Dirichlet Process Mixture Based Name Origin Clustering and Alignment Model for Transliteration

    Directory of Open Access Journals (Sweden)

    Chunyue Zhang

    2015-01-01

    Full Text Available In machine transliteration, it is common that the transliterated names in the target language come from multiple language origins. A conventional maximum-likelihood-based single model cannot deal with this issue very well and often suffers from overfitting. In this paper, we exploit a coupled Dirichlet process mixture model (cDPMM) to address overfitting and the multi-origin clustering of names simultaneously in the transliteration sequence alignment step over the name pairs. After the alignment step, the cDPMM automatically clusters name pairs into groups according to their origin information. In the decoding step, in order to make full use of the learned origin information, we use a cluster combination method (CCM) to build cluster-specific transliteration models by combining small clusters into large ones based on the perplexities of the name language model and the transliteration model, which ensures that each origin cluster has enough data for training a transliteration model. On three different Western-Chinese multi-origin name corpora, the cDPMM outperforms two state-of-the-art baseline models in terms of both top-1 accuracy and mean F-score, and the CCM further improves the cDPMM significantly.

  16. Trend analysis using non-stationary time series clustering based on the finite element method

    Science.gov (United States)

    Gorji Sefidmazgi, M.; Sayemuzzaman, M.; Homaifar, A.; Jha, M. K.; Liess, S.

    2014-05-01

    In order to analyze low-frequency variability of climate, it is useful to model the climatic time series with multiple linear trends and locate the times of significant changes. In this paper, we have used non-stationary time series clustering to find change points in the trends. Clustering in a multi-dimensional non-stationary time series is challenging, since the problem is mathematically ill-posed. Clustering based on the finite element method (FEM) is one of the methods that can analyze multidimensional time series. One important attribute of this method is that it is not dependent on any statistical assumption and does not need local stationarity in the time series. In this paper, it is shown how the FEM-clustering method can be used to locate change points in the trend of temperature time series from in situ observations. This method is applied to the temperature time series of North Carolina (NC) and the results represent region-specific climate variability despite higher frequency harmonics in climatic time series. Next, we investigated the relationship between the climatic indices with the clusters/trends detected based on this clustering method. It appears that the natural variability of climate change in NC during 1950-2009 can be explained mostly by AMO and solar activity.
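
    A minimal brute-force illustration (not the FEM-based method itself) of locating a single change point between two linear trends by minimising the total squared error of a piecewise linear fit; the synthetic series below stands in for a temperature record.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(120)
y = np.where(t < 60, 0.02 * t, 1.2 - 0.01 * (t - 60)) + rng.normal(0, 0.05, t.size)

def sse(tt, yy):
    """Squared error of the best single linear trend fit to (tt, yy)."""
    slope, intercept = np.polyfit(tt, yy, 1)
    return np.sum((yy - (slope * tt + intercept)) ** 2)

# try every admissible split and keep the one with the lowest combined error
errors = [sse(t[:k], y[:k]) + sse(t[k:], y[k:]) for k in range(10, t.size - 10)]
change_point = int(np.argmin(errors)) + 10
print("estimated change point at t =", change_point)
```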

  17. Progressive Amalgamation of Building Clusters for Map Generalization Based on Scaling Subgroups

    Directory of Open Access Journals (Sweden)

    Xianjin He

    2018-03-01

    Full Text Available Map generalization utilizes transformation operations to derive smaller-scale maps from larger-scale maps, and is a key procedure for the modelling and understanding of geographic space. Studies to date have largely applied a fixed tolerance to aggregate clustered buildings into a single object, resulting in the loss of details that meet cartographic constraints and may be of importance for users. This study aims to develop a method that amalgamates clustered buildings gradually without significant modification of geometry, while preserving the map details as much as possible under cartographic constraints. The amalgamation process consists of three key steps. First, individual buildings are grouped into distinct clusters by using the graph-based spatial clustering application with random forest (GSCARF) method. Second, building clusters are decomposed into scaling subgroups according to homogeneity with regard to the mean distance of subgroups. Thus, hierarchies of building clusters can be derived based on scaling subgroups. Finally, an amalgamation operation is progressively performed from the bottom-level subgroups to the top-level subgroups using the maximum distance of each subgroup as the amalgamating tolerance instead of using a fixed tolerance. As a consequence of this step, generalized intermediate scaling results are available, which can form the multi-scale representation of buildings. The experimental results show that the proposed method can generate amalgams with correct details, statistical area balance and orthogonal shape while satisfying cartographic constraints (e.g., minimum distance and minimum area).
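
    A rough sketch of the classic buffer-union-debuffer trick often used to amalgamate a group of neighbouring footprints, written with shapely. The footprints and tolerances are hypothetical, and the paper's scaling-subgroup hierarchy and shape-preservation steps are not reproduced; only the core amalgamation operation is shown.

```python
from shapely.geometry import box
from shapely.ops import unary_union

# hypothetical footprints of one building cluster
buildings = [box(0, 0, 10, 8), box(12, 0, 20, 8), box(0, 10, 10, 18)]

def amalgamate(geoms, tolerance):
    """Merge geometries closer than `tolerance`: buffer out, union, buffer back (mitred joins)."""
    merged = unary_union([g.buffer(tolerance / 2.0, join_style=2) for g in geoms])
    return merged.buffer(-tolerance / 2.0, join_style=2)

# a small tolerance keeps the detached groups; a larger one amalgamates the whole cluster
print(amalgamate(buildings, 1.0).geom_type)   # likely MultiPolygon
print(amalgamate(buildings, 4.0).geom_type)   # likely Polygon
```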

  18. Tin-oxo clusters based on aryl arsonate anions.

    Science.gov (United States)

    Xie, Yun-Peng; Yang, Jin; Ma, Jian-Fang; Zhang, Lai-Ping; Song, Shu-Yan; Su, Zhong-Min

    2008-01-01

    Reactions of Ph3SnOH or Ph3SnCl with aryl arsonic acids RAsO3H2, where R = C6H5 (1), 2-NH2C6H4 (2), 4-NH2C6H4 (3), 2-NO2C6H4 (4), 3-NO2C6H4 (5), 4-NO2C6H4 (6), 3-NO2-4-OHC6H3 (7), 2-ClC6H4 (8) and 2,4-Cl2C6H3 (9), gave 18 Sn-O cluster compounds. These compounds can be classified into four types: type A, [{(PhSn)3(RAsO3)3(μ3-O)(OH)(R'O)2}2Sn] (R = C6H5, 2-NH2C6H4, 4-NH2C6H4, 2-NO2C6H4, 3-NO2C6H4, 2-ClC6H4, 2,4-Cl2C6H3, and 3-NO2-4-OHC6H3; R' = Me or Et); type B, [{(PhSn)3(RAsO3)2(RAsO3H)(μ3-O)(R'O)2}2] (R = 4-NO2C6H4, R' = Me); type C, [{(PhSn)3(RAsO3)3(μ3-O)(R'O)3}2Sn] (R = 2,4-Cl2C6H3, R' = Me); and type D, [{Sn3Cl3(μ3-O)(R'O)3}2(RAsO3)4] (R = 2-NO2C6H4 and 4-NO2C6H4; R' = Me or Et). Structures of types A and B contain [Sn3(μ3-O)(μ2-OR')2] building blocks, while in types C and D the stannoxane cores are built from two [Sn3(μ3-O)(μ2-OR')3] building blocks. The reactions proceeded with partial or complete dearylation of the triphenyltin precursor. These various structural forms are realized by subtle changes in the nature of the organotin precursors and aryl arsonic acids. The syntheses, structures, and structural interrelationships of these organostannoxanes are discussed.

  19. Advances in Bayesian Model Based Clustering Using Particle Learning

    Energy Technology Data Exchange (ETDEWEB)

    Merl, D M

    2009-11-19

    implementation of Carvalho et al that allow us to retain the computational advantages of particle learning while improving the suitability of the methodology to the analysis of streaming data and simultaneously facilitating the real time discovery of latent cluster structures. Section 4 demonstrates our methodological enhancements in the context of several simulated and classical data sets, showcasing the use of particle learning methods for online anomaly detection, label generation, drift detection, and semi-supervised classification, none of which would be achievable through a standard MCMC approach. Section 5 concludes with a discussion of future directions for research.

  20. a Web-Based Interactive Platform for Co-Clustering Spatio-Temporal Data

    Science.gov (United States)

    Wu, X.; Poorthuis, A.; Zurita-Milla, R.; Kraak, M.-J.

    2017-09-01

    Since current studies on clustering analysis mainly focus on exploring spatial or temporal patterns separately, a co-clustering algorithm is utilized in this study to enable the concurrent analysis of spatio-temporal patterns. To allow users to adopt and adapt the algorithm for their own analysis, it is integrated within the server side of an interactive web-based platform. The client side of the platform, running within any modern browser, is a graphical user interface (GUI) with multiple linked visualizations that facilitates the understanding, exploration and interpretation of the raw dataset and co-clustering results. Users can also upload their own datasets and adjust clustering parameters within the platform. To illustrate the use of this platform, an annual temperature dataset from 28 weather stations over 20 years in the Netherlands is used. After the dataset is loaded, it is visualized in a set of linked visualizations: a geographical map, a timeline and a heatmap. This aids the user in understanding the nature of their dataset and the appropriate selection of co-clustering parameters. Once the dataset is processed by the co-clustering algorithm, the results are visualized in the small multiples, a heatmap and a timeline to provide various views for better understanding and also further interpretation. Since the visualization and analysis are integrated in a seamless platform, the user can explore different sets of co-clustering parameters and instantly view the results in order to do iterative, exploratory data analysis. As such, this interactive web-based platform allows users to analyze spatio-temporal data using the co-clustering method and also helps the understanding of the results using multiple linked visualizations.
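
    A small example of co-clustering a stations-by-years matrix with scikit-learn's SpectralCoclustering, standing in for the platform's server-side algorithm; the temperature matrix is synthetic and the number of co-clusters is arbitrary.

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering

rng = np.random.default_rng(0)
temps = rng.uniform(5, 25, size=(28, 20))      # 28 stations x 20 years of annual temperatures (synthetic)

model = SpectralCoclustering(n_clusters=4, random_state=0)
model.fit(temps)

print("station (row) cluster labels:", model.row_labels_)
print("year (column) cluster labels:", model.column_labels_)
```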

  1. Method of Selection of Bacteria Antibiotic Resistance Genes Based on Clustering of Similar Nucleotide Sequences.

    Science.gov (United States)

    Balashov, I S; Naumov, V A; Borovikov, P I; Gordeev, A B; Dubodelov, D V; Lyubasovskaya, L A; Rodchenko, Yu V; Bystritskii, A A; Aleksandrova, N V; Trofimov, D Yu; Priputnevich, T V

    2017-10-01

    A new method for the selection of bacterial antibiotic resistance genes is proposed and tested for solving problems related to the selection of primers for PCR assays. The method involves clustering of similar nucleotide sequences and selection of group primers for all genes of each cluster. Clustering of resistance genes for six groups of antibiotics (aminoglycosides, β-lactams, fluoroquinolones, glycopeptides, macrolides and lincosamides, and fusidic acid) was performed. The method was tested on 81 strains of bacteria of different genera isolated from patients (K. pneumoniae, Staphylococcus spp., S. agalactiae, E. faecalis, E. coli, and G. vaginalis). The results are comparable to those obtained when selecting primers for individual genes, while reducing the number of primers necessary for maximum coverage of the known antibiotic resistance genes during PCR analysis.

  2. Bearing performance degradation assessment based on a combination of empirical mode decomposition and k-medoids clustering

    Science.gov (United States)

    Rai, Akhand; Upadhyay, S. H.

    2017-09-01

    Bearing is the most critical component in rotating machinery since it is more susceptible to failure. The monitoring of degradation in bearings becomes of great concern for averting sudden machinery breakdown. In this study, a novel method for bearing performance degradation assessment (PDA) based on an amalgamation of empirical mode decomposition (EMD) and k-medoids clustering is proposed. The fault features are extracted from the bearing signals using the EMD process. The extracted features are then subjected to k-medoids based clustering for obtaining the normal-state and failure-state cluster centres. A confidence value (CV) curve based on the dissimilarity of the test data object to the normal state is obtained and employed as the degradation indicator for assessing the health of bearings. The proposed approach is applied to vibration signals collected in run-to-failure tests of bearings to assess its effectiveness in bearing PDA. To validate the superiority of the suggested approach, it is compared with the commonly used time-domain features RMS and kurtosis, the well-known fault diagnosis method envelope analysis (EA), and existing PDA classifiers, i.e., self-organizing maps (SOM) and fuzzy c-means (FCM). The results demonstrate that the recommended method outperforms the time-domain features and the SOM- and FCM-based PDA in detecting early-stage degradation more precisely. Moreover, EA can be used as an accompanying method to confirm the early-stage defect detected by the proposed bearing PDA approach. The study shows the potential application of k-medoids clustering as an effective tool for the PDA of bearings.
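
    A compact sketch of the assessment idea: per-record features are clustered with a tiny k-medoids routine, and a confidence value is derived from the distance to the normal-state medoid. RMS and kurtosis replace the EMD-based features, and the vibration signals are synthetic; all names and thresholds are illustrative.

```python
import numpy as np
from scipy.stats import kurtosis

def features(signal):
    """Simple stand-ins for the EMD-based features: RMS and kurtosis of one record."""
    return np.array([np.sqrt(np.mean(signal ** 2)), kurtosis(signal)])

def k_medoids(X, k=2, n_iter=50, seed=0):
    """A tiny PAM-style k-medoids: medoids are data points minimising intra-cluster distance."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None] - X[medoids][None, :], axis=2)
        labels = d.argmin(axis=1)
        new = []
        for j in range(k):
            members = np.where(labels == j)[0]
            if members.size == 0:
                new.append(medoids[j])
                continue
            intra = np.linalg.norm(X[members][:, None] - X[members][None, :], axis=2).sum(axis=1)
            new.append(members[intra.argmin()])
        if np.array_equal(new, medoids):
            break
        medoids = np.array(new)
    return medoids, labels

rng = np.random.default_rng(1)
healthy = [rng.normal(0, 1.0, 2048) for _ in range(20)]
faulty = [rng.normal(0, 1.0, 2048) + 6.0 * (rng.random(2048) < 0.01) for _ in range(20)]  # impulsive
X = np.array([features(s) for s in healthy + faulty])

medoids, labels = k_medoids(X, k=2)
normal_medoid = X[medoids[labels[0]]]            # cluster containing the first (healthy) record
dist = np.linalg.norm(X - normal_medoid, axis=1)
cv = np.exp(-dist / (dist.max() + 1e-12))        # CV near 1 = healthy, lower = degraded
```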

  3. HIV self-testing among female sex workers in Zambia: A cluster randomized controlled trial.

    Directory of Open Access Journals (Sweden)

    Michael M Chanda

    2017-11-01

    Full Text Available HIV self-testing (HIVST) may play a role in addressing gaps in HIV testing coverage and as an entry point for HIV prevention services. We conducted a cluster randomized trial of 2 HIVST distribution mechanisms compared to the standard of care among female sex workers (FSWs) in Zambia. Trained peer educators in Kapiri Mposhi, Chirundu, and Livingstone, Zambia, each recruited 6 FSW participants. Peer educator-FSW groups were randomized to 1 of 3 arms: (1) delivery (direct distribution of an oral HIVST from the peer educator), (2) coupon (a coupon for collection of an oral HIVST from a health clinic/pharmacy), or (3) standard-of-care HIV testing. Participants in the 2 HIVST arms received 2 kits: 1 at baseline and 1 at 10 weeks. The primary outcome was any self-reported HIV testing in the past month at the 1- and 4-month visits, as HIVST can replace other types of HIV testing. Secondary outcomes included linkage to care, HIVST use in the HIVST arms, and adverse events. Participants completed questionnaires at 1 and 4 months following peer educator interventions. In all, 965 participants were enrolled between September 16 and October 12, 2016 (delivery, N = 316; coupon, N = 329; standard of care, N = 320); 20% had never tested for HIV. Overall HIV testing at 1 month was 94.9% in the delivery arm, 84.4% in the coupon arm, and 88.5% in the standard-of-care arm (delivery versus standard of care risk ratio [RR] = 1.07, 95% CI 0.99-1.15, P = 0.10; coupon versus standard of care RR = 0.95, 95% CI 0.86-1.05, P = 0.29; delivery versus coupon RR = 1.13, 95% CI 1.04-1.22, P = 0.005). Four-month rates were 84.1% for the delivery arm, 79.8% for the coupon arm, and 75.1% for the standard-of-care arm (delivery versus standard of care RR = 1.11, 95% CI 0.98-1.27, P = 0.11; coupon versus standard of care RR = 1.06, 95% CI 0.92-1.22, P = 0.42; delivery versus coupon RR = 1.05, 95% CI 0.94-1.18, P = 0.40). At 1 month, the majority of HIV tests were self-tests (88.4%). HIV self-test
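
    For orientation, a hedged arithmetic sketch of how a risk ratio and its Wald 95% CI are computed from 2x2 counts; the counts below are hypothetical and are not the trial's raw data.

```python
import math

def risk_ratio(events_a, n_a, events_b, n_b):
    """Risk ratio of arm A vs arm B with a Wald 95% confidence interval on the log scale."""
    rr = (events_a / n_a) / (events_b / n_b)
    se_log = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lo = math.exp(math.log(rr) - 1.96 * se_log)
    hi = math.exp(math.log(rr) + 1.96 * se_log)
    return rr, (lo, hi)

# e.g. 300/316 tested in one arm versus 283/320 in another (illustrative counts only)
print(risk_ratio(300, 316, 283, 320))
```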

  4. Testing Secondary Models for the Origin of Radio Mini-Halos in Galaxy Clusters

    Science.gov (United States)

    ZuHone, J. A.; Brunetti, G.; Giacintucci, S.; Markevitch, M.

    2015-03-01

    We present an MHD simulation of the emergence of a radio minihalo in a galaxy cluster core in a “secondary” model, where the source of the synchrotron-emitting electrons is hadronic interactions between cosmic-ray protons with the thermal intracluster gas, an alternative to the “reacceleration model” where the cosmic ray electrons are reaccelerated by turbulence induced by core sloshing, which we discussed in an earlier work. We follow the evolution of cosmic-ray electron spectra and their radio emission using passive tracer particles, taking into account the time-dependent injection of electrons from hadronic interactions and their energy losses. We find that secondary electrons in a sloshing cluster core can generate diffuse synchrotron emission with luminosity and extent similar to observed radio minihalos. However, we also find important differences with our previous work. We find that the drop in radio emission at cold fronts is less prominent than that in our reacceleration-based simulations, indicating that in this flavor of the secondary model the emission is more spatially extended than in some observed minihalos. We also explore the effect of rapid changes in the magnetic field on the radio spectrum. While the resulting spectra in some regions are steeper than expected from stationary conditions, the change is marginal, with differences in the synchrotron spectral index of Δα ≲ 0.15-0.25, depending on the frequency band. This is a much narrower range than claimed in the best-observed minihalos and produced in the reacceleration model. Our results provide important suggestions to constrain these models with future observations.

  5. Clustering and firm performance in project-based industries: the case of the global video game industry, 1972-2007

    NARCIS (Netherlands)

    Vaan, M. de; Boschma, R.; Frenken, K.

    2013-01-01

    Explanations of spatial clustering based on localization externalities are being questioned by recent empirical evidence showing that firms in clusters do not outperform firms outside clusters. We propose that these findings may be driven by the particularities of the industrial settings chosen

  6. Statistical analysis of two-dimensional cluster structures composed of ferromagnetic particles based on a flexible chain model.

    Science.gov (United States)

    Morimoto, Hisao; Maekawa, Toru; Matsumoto, Yoichiro

    2003-12-01

    We investigate two-dimensional cluster structures composed of ferromagnetic colloidal particles, based on a flexible chain model, by the configurational-bias Monte Carlo method. We clarify the dependence of the probabilities of the creation of different types of clusters on the dipole-dipole interactive energy and the cluster size.

  7. Insight into acid-base nucleation experiments by comparison of the chemical composition of positive, negative, and neutral clusters.

    Science.gov (United States)

    Bianchi, Federico; Praplan, Arnaud P; Sarnela, Nina; Dommen, Josef; Kürten, Andreas; Ortega, Ismael K; Schobesberger, Siegfried; Junninen, Heikki; Simon, Mario; Tröstl, Jasmin; Jokinen, Tuija; Sipilä, Mikko; Adamov, Alexey; Amorim, Antonio; Almeida, Joao; Breitenlechner, Martin; Duplissy, Jonathan; Ehrhart, Sebastian; Flagan, Richard C; Franchin, Alessandro; Hakala, Jani; Hansel, Armin; Heinritzi, Martin; Kangasluoma, Juha; Keskinen, Helmi; Kim, Jaeseok; Kirkby, Jasper; Laaksonen, Ari; Lawler, Michael J; Lehtipalo, Katrianne; Leiminger, Markus; Makhmutov, Vladimir; Mathot, Serge; Onnela, Antti; Petäjä, Tuukka; Riccobono, Francesco; Rissanen, Matti P; Rondo, Linda; Tomé, António; Virtanen, Annele; Viisanen, Yrjö; Williamson, Christina; Wimmer, Daniela; Winkler, Paul M; Ye, Penglin; Curtius, Joachim; Kulmala, Markku; Worsnop, Douglas R; Donahue, Neil M; Baltensperger, Urs

    2014-12-02

    We investigated the nucleation of sulfuric acid together with two bases (ammonia and dimethylamine) at the CLOUD chamber at CERN. The chemical composition of positive, negative, and neutral clusters was studied using three Atmospheric Pressure interface-Time Of Flight (APi-TOF) mass spectrometers: two were operated in positive and negative mode to detect the chamber ions, while the third was equipped with a nitrate ion chemical ionization source allowing detection of neutral clusters. Taking into account the possible fragmentation that can happen during the charging of the ions or within the first stage of the mass spectrometer, we found that cluster formation proceeded via essentially one-to-one acid-base addition for all of the clusters, independent of the type of base. For the positive clusters, the charge is carried by one excess protonated base, while for the negative clusters it is carried by a deprotonated acid; the same is true for the neutral clusters after these have been ionized. During the experiments involving sulfuric acid and dimethylamine, it was possible to study the appearance time for all the clusters (positive, negative, and neutral). It appeared that, after the formation of the clusters containing three molecules of sulfuric acid, the clusters grew at a similar speed, independent of their charge. The growth rate is then probably limited by the arrival rate of sulfuric acid or by cluster-cluster collisions.

  8. Strain measurement based battery testing

    Science.gov (United States)

    Xu, Jeff Qiang; Steiber, Joe; Wall, Craig M.; Smith, Robert; Ng, Cheuk

    2017-05-23

    A method and system for strain-based estimation of the state of health of a battery, from an initial state to an aged state, is provided. A strain gauge is applied to the battery. A first strain measurement is performed on the battery, using the strain gauge, at a selected charge capacity of the battery and at the initial state of the battery. A second strain measurement is performed on the battery, using the strain gauge, at the selected charge capacity of the battery and at the aged state of the battery. The capacity degradation of the battery is estimated as the difference between the first and second strain measurements divided by the first strain measurement.
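
    The degradation estimate described in this record reduces to a single ratio of two strain readings taken at the same charge capacity. The sketch below illustrates that arithmetic only; the function name and the example microstrain values are invented for illustration and are not taken from the patent.

```python
# Minimal sketch of the strain-based degradation estimate described above.
# Sign convention follows the abstract's wording: (first - second) / first.
def estimate_capacity_degradation(strain_initial: float, strain_aged: float) -> float:
    """Relative capacity degradation from two strain readings at the same charge capacity."""
    if strain_initial == 0:
        raise ValueError("initial strain reading must be non-zero")
    return (strain_initial - strain_aged) / strain_initial

# Illustrative gauge readings (microstrain), not measured values.
initial_reading = 850.0   # battery at its initial state
aged_reading = 790.0      # same selected charge capacity, aged state
print(f"estimated degradation: {estimate_capacity_degradation(initial_reading, aged_reading):.1%}")
```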

  9. Biological consequences of potential repair intermediates of clustered base damage site in Escherichia coli

    International Nuclear Information System (INIS)

    Shikazono, Naoya; O'Neill, Peter

    2009-01-01

    Clustered DNA damage induced by a single radiation track is a unique feature of ionizing radiation. Using a plasmid-based assay in Escherichia coli, we previously found significantly higher mutation frequencies for bistranded clusters containing 7,8-dihydro-8-oxoguanine (8-oxoG) and 5,6-dihydrothymine (DHT) than for either a single 8-oxoG or a single DHT in wild type and in glycosylase-deficient strains of E. coli. This indicates that the removal of an 8-oxoG from a clustered damage site is most likely retarded compared to the removal of a single 8-oxoG. To gain further insights into the processing of bistranded base lesions, several potential repair intermediates following 8-oxoG removal were assessed. Clusters, such as DHT + apurinic/apyrimidinic (AP) and DHT + GAP have relatively low mutation frequencies, whereas clusters, such as AP + AP or GAP + AP, significantly reduce the number of transformed colonies, most probably through formation of a lethal double strand break (DSB). Bistranded AP sites placed 3' to each other with various interlesion distances also blocked replication. These results suggest that bistranded base lesions, i.e., single base lesions on each strand, but not clusters containing only AP sites and strand breaks, are repaired in a coordinated manner so that the formation of DSBs is avoided. We propose that, when either base lesion is initially excised from a bistranded base damage site, the remaining base lesion will only rarely be converted into an AP site or a single strand break in vivo.

  10. Nearest neighbor-density-based clustering methods for large hyperspectral images

    Science.gov (United States)

    Cariou, Claude; Chehdi, Kacem

    2017-10-01

    We address the problem of hyperspectral image (HSI) pixel partitioning using nearest neighbor - density-based (NN-DB) clustering methods. NN-DB methods are able to cluster objects without specifying the number of clusters to be found. Within the NN-DB approach, we focus on deterministic methods, e.g. ModeSeek, knnClust, and GWENN (standing for Graph WatershEd using Nearest Neighbors). These methods only require the availability of a k-nearest neighbor (kNN) graph based on a given distance metric. Recently, a new DB clustering method, called Density Peak Clustering (DPC), has received much attention, and kNN versions of it have quickly followed and showed their efficiency. However, NN-DB methods still suffer from the difficulty of obtaining the kNN graph due to the quadratic complexity with respect to the number of pixels. This is why GWENN was embedded into a multiresolution (MR) scheme to bypass the computation of the full kNN graph over the image pixels. In this communication, we propose to extend the MR-GWENN scheme on three aspects. Firstly, similarly to knnClust, the original labeling rule of GWENN is modified to account for local density values, in addition to the labels of previously processed objects. Secondly, we set up a modified NN search procedure within the MR scheme, in order to stabilize the number of clusters found from the coarsest to the finest spatial resolution. Finally, we show that these extensions can be easily adapted to the three other NN-DB methods (ModeSeek, knnClust, knnDPC) for pixel clustering in large HSIs. Experiments are conducted to compare the four NN-DB methods for pixel clustering in HSIs. We show that NN-DB methods can outperform a classical clustering method such as fuzzy c-means (FCM), in terms of classification accuracy, relevance of found clusters, and clustering speed. Finally, we demonstrate the feasibility and evaluate the performances of NN-DB methods on a very large image acquired by our AISA Eagle hyperspectral
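
    As a concrete illustration of the NN-DB idea (local density estimated from a kNN graph, clusters found without fixing their number in advance), here is a hedged sketch of a simple kNN mode-seeking step in the spirit of ModeSeek/knnDPC. It is not the authors' GWENN or multiresolution implementation; scikit-learn is assumed, and the random data stand in for pixel spectra.

```python
# Minimal kNN density-based clustering sketch: each point links to the densest
# point among its k nearest neighbors, and chains of links end at density modes.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_mode_seek(X, k=10):
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, idx = nn.kneighbors(X)                         # idx[:, 0] is the point itself
    density = 1.0 / (dist[:, 1:].mean(axis=1) + 1e-12)   # inverse mean kNN distance

    # Link each point to its highest-density neighbor (itself if it is the densest).
    parent = np.empty(len(X), dtype=int)
    for i in range(len(X)):
        neigh = idx[i]
        parent[i] = neigh[np.argmax(density[neigh])]

    # Follow the links until every point reaches a fixed point (its density mode).
    labels = parent.copy()
    while True:
        nxt = parent[labels]
        if np.array_equal(nxt, labels):
            break
        labels = nxt
    return labels        # points sharing a mode share a label

X = np.random.rand(500, 8)          # e.g. 500 pixels with 8 spectral bands
print(len(set(knn_mode_seek(X))), "clusters found")
```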

  11. A Smartphone Indoor Localization Algorithm Based on WLAN Location Fingerprinting with Feature Extraction and Clustering

    Directory of Open Access Journals (Sweden)

    Junhai Luo

    2017-06-01

    With the development of communication technology, the demand for location-based services is growing rapidly. This paper presents an algorithm for indoor localization based on Received Signal Strength (RSS), which is collected from Access Points (APs). The proposed localization algorithm contains an offline information acquisition phase and an online positioning phase. Firstly, the AP selection algorithm is reviewed and improved based on the stability of signals to remove unreliable APs; secondly, Kernel Principal Component Analysis (KPCA) is analyzed and used to remove data redundancy and maintain useful characteristics for nonlinear feature extraction; thirdly, the Affinity Propagation Clustering (APC) algorithm utilizes RSS values to classify data samples and narrow the positioning range. In the online positioning phase, the classified data are matched with the testing data to determine the position area, and the Maximum Likelihood (ML) estimate is employed for precise positioning. Eventually, the proposed algorithm is implemented in a real-world environment for performance evaluation. Experimental results demonstrate that the proposed algorithm improves the accuracy and the computational complexity.
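
    A hedged sketch of the offline phase described above (KPCA feature extraction followed by Affinity Propagation clustering of reference fingerprints) and of routing an online query to its cluster. scikit-learn is assumed, and the synthetic RSS matrix, AP count, and kernel parameters are invented placeholders rather than values from the paper.

```python
# Sketch of the offline phase: KPCA features + Affinity Propagation clustering of
# reference RSS fingerprints, then assigning an online query to the nearest exemplar.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import KernelPCA
from sklearn.cluster import AffinityPropagation

# 200 reference points x 25 APs (dBm-like values) with some built-in structure.
rss, _ = make_blobs(n_samples=200, n_features=25, centers=4, cluster_std=3.0,
                    center_box=(-90, -40), random_state=0)

kpca = KernelPCA(n_components=5, kernel="rbf", gamma=1e-4).fit(rss)
features = kpca.transform(rss)                          # nonlinear feature extraction

ap = AffinityPropagation(damping=0.9, random_state=0).fit(features)   # no cluster count required
print("clusters found:", len(ap.cluster_centers_indices_))

# Online phase (simplified): assign a new RSS vector to the nearest cluster exemplar,
# then restrict the fine-grained (e.g. maximum-likelihood) positioning to that cluster.
query = rss[:1] + np.random.default_rng(1).normal(0, 1.0, size=(1, 25))
q_feat = kpca.transform(query)
nearest_cluster = int(np.argmin(((ap.cluster_centers_ - q_feat) ** 2).sum(axis=1)))
print("query assigned to cluster", nearest_cluster)
```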

  12. Point-of-Care Hemostatic Testing in Cardiac Surgery: A Stepped-Wedge Clustered Randomized Controlled Trial.

    Science.gov (United States)

    Karkouti, Keyvan; Callum, Jeannie; Wijeysundera, Duminda N; Rao, Vivek; Crowther, Mark; Grocott, Hilary P; Pinto, Ruxandra; Scales, Damon C

    2016-10-18

    Cardiac surgery is frequently complicated by coagulopathic bleeding that is difficult to optimally manage using standard hemostatic testing. We hypothesized that point-of-care hemostatic testing within the context of an integrated transfusion algorithm would improve the management of coagulopathy in cardiac surgery and thereby reduce blood transfusions. We conducted a pragmatic multicenter stepped-wedge cluster randomized controlled trial of a point-of-care-based transfusion algorithm in consecutive patients undergoing cardiac surgery with cardiopulmonary bypass at 12 hospitals from October 6, 2014, to May 1, 2015. Following a 1-month data collection at all participating hospitals, a transfusion algorithm incorporating point-of-care hemostatic testing was sequentially implemented at 2 hospitals at a time in 1-month intervals, with the implementation order randomly assigned. No other aspects of care were modified. The primary outcome was red blood cell transfusion from surgery to postoperative day 7. Other outcomes included transfusion of other blood products, major bleeding, and major complications. The analysis adjusted for secular time trends, within-hospital clustering, and patient-level risk factors. All outcomes and analyses were prespecified before study initiation. Among the 7402 patients studied, 3555 underwent surgery during the control phase and 3847 during the intervention phase. Overall, 3329 (45.0%) received red blood cells, 1863 (25.2%) received platelets, 1645 (22.2%) received plasma, and 394 (5.3%) received cryoprecipitate. Major bleeding occurred in 1773 (24.1%) patients, and major complications occurred in 740 (10.2%) patients. The trial intervention reduced rates of red blood cell transfusion (adjusted relative risk, 0.91; 95% confidence interval, 0.85-0.98; P=0.02; number needed to treat, 24.7) and platelet transfusion (relative risk, 0.77; 95% confidence interval, 0.68-0.87). Point-of-care hemostatic testing within the context of an integrated

  13. Ecosystem health pattern analysis of urban clusters based on emergy synthesis: Results and implication for management

    International Nuclear Information System (INIS)

    Su, Meirong; Fath, Brian D.; Yang, Zhifeng; Chen, Bin; Liu, Gengyuan

    2013-01-01

    The evaluation of ecosystem health in urban clusters will help establish effective management that promotes sustainable regional development. To standardize the application of emergy synthesis and set pair analysis (EM–SPA) in ecosystem health assessment, a procedure for using EM–SPA models was established in this paper by combining the ability of emergy synthesis to reflect health status from a biophysical perspective with the ability of set pair analysis to describe extensive relationships among different variables. Based on the EM–SPA model, the relative health levels of selected urban clusters and their related ecosystem health patterns were characterized. The health states of three typical Chinese urban clusters – Jing-Jin-Tang, Yangtze River Delta, and Pearl River Delta – were investigated using the model. The results showed that the health status of the Pearl River Delta was relatively good, while that of the Yangtze River Delta was relatively poor. As for the specific health characteristics, the Pearl River Delta and Yangtze River Delta urban clusters were relatively strong in vigor, resilience, and urban ecosystem service function maintenance, while the Jing-Jin-Tang was relatively strong in organizational structure and environmental impact. Guidelines for managing these different urban clusters were put forward based on the analysis of the results of this study. - Highlights: • The use of the integrated emergy synthesis and set pair analysis model was standardized. • The integrated model was applied on the scale of an urban cluster. • Health patterns of different urban clusters were compared. • Policy suggestions were provided based on the health pattern analysis.

  14. The young star cluster population of M51 with LEGUS - II. Testing environmental dependencies

    Science.gov (United States)

    Messa, Matteo; Adamo, A.; Calzetti, D.; Reina-Campos, M.; Colombo, D.; Schinnerer, E.; Chandar, R.; Dale, D. A.; Gouliermis, D. A.; Grasha, K.; Grebel, E. K.; Elmegreen, B. G.; Fumagalli, M.; Johnson, K. E.; Kruijssen, J. M. D.; Östlin, G.; Shabani, F.; Smith, L. J.; Whitmore, B. C.

    2018-03-01

    It has recently been established that the properties of young star clusters (YSCs) can vary as a function of the galactic environment in which they are found. We use the cluster catalogue produced by the Legacy Extragalactic UV Survey (LEGUS) collaboration to investigate cluster properties in the spiral galaxy M51. We analyse the cluster population as a function of galactocentric distance and in arm and inter-arm regions. The cluster mass function exhibits a similar shape at all radial bins, described by a power law with a slope close to -2 and an exponential truncation around 10^5 M⊙. While the mass functions of the YSCs in the spiral arm and inter-arm regions have similar truncation masses, the inter-arm region mass function has a significantly steeper slope than the one in the arm region; a trend that is also observed in the giant molecular cloud mass function and predicted by simulations. The age distribution of clusters is dependent on the region considered, and is consistent with rapid disruption only in dense regions, while little disruption is observed at large galactocentric distances and in the inter-arm region. The fraction of stars forming in clusters does not show radial variations, despite the drop in the H2 surface density measured as a function of galactocentric distance. We suggest that the higher disruption rate observed in the inner part of the galaxy is likely at the origin of the observed flat radial profile of the cluster formation efficiency.

  15. Automation for a base station stability testing

    OpenAIRE

    Punnek, Elvis

    2016-01-01

    This Bachelor’s thesis was commissioned by Oy LM Ericsson Ab Oulu. Its aim was to help investigate and create a test automation solution for the stability testing of the LTE base station. The main objective was to create test automation for a predefined test set. This test automation solution had to be created for specific environments and equipment. The work included creating the automation for the test cases and putting them into daily test automation jobs. The key factor...

  16. Clustering Batik Images using Fuzzy C-Means Algorithm Based on Log-Average Luminance

    Directory of Open Access Journals (Sweden)

    Ahmad Sanmorino

    2012-06-01

    Batik is a fabric or garment that is made with a special staining technique called wax-resist dyeing and is part of a cultural heritage with high artistic value. In order to improve efficiency and give better semantics to the images, some researchers apply clustering algorithms to manage images before they can be retrieved. Image clustering is a process of grouping images based on their similarity. In this paper we attempt to provide an alternative method of grouping batik images using the fuzzy c-means (FCM) algorithm based on the log-average luminance of the batik. The FCM clustering algorithm works with fuzzy models that allow each data point to belong to every cluster with a degree of membership between 0 and 1. Log-average luminance (LAL) is the average value of the lighting in an image; it can be used to compare the lighting of one image with another. From the experiments that have been made, it can be concluded that the fuzzy c-means algorithm can be used for batik image clustering based on the log-average luminance of each image.
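
    To make the two ingredients concrete, the sketch below computes a log-average luminance per image and runs a small fuzzy c-means loop on those 1-D values. It is a simplified stand-in for the paper's pipeline: the LAL formula (exponential of the mean log-luminance) is standard, but the image data, cluster count, and fuzzifier value here are invented, and real batik images would first be converted to a luminance channel.

```python
# Sketch: log-average luminance per image + a compact fuzzy c-means loop (NumPy only).
import numpy as np

def log_average_luminance(lum, delta=1e-4):
    """LAL = exp(mean(log(delta + L))) over all pixels of a luminance image."""
    return float(np.exp(np.mean(np.log(delta + lum))))

def fuzzy_c_means_1d(x, c=3, m=2.0, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    u = rng.dirichlet(np.ones(c), size=len(x))            # memberships in [0, 1], rows sum to 1
    for _ in range(iters):
        um = u ** m
        centers = (um * x[:, None]).sum(0) / um.sum(0)    # membership-weighted centers
        d = np.abs(x[:, None] - centers[None, :]) + 1e-12
        w = d ** (2.0 / (m - 1.0))
        u = (1.0 / w) / (1.0 / w).sum(axis=1, keepdims=True)   # standard FCM update
    return centers, u

images = [np.random.rand(64, 64) for _ in range(30)]      # stand-ins for batik luminance images
lal = np.array([log_average_luminance(img) for img in images])
centers, memberships = fuzzy_c_means_1d(lal, c=3)
print("cluster centers (LAL):", centers.round(3))
```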

  17. A clustering-based graph Laplacian framework for value function approximation in reinforcement learning.

    Science.gov (United States)

    Xu, Xin; Huang, Zhenhua; Graves, Daniel; Pedrycz, Witold

    2014-12-01

    In order to deal with the sequential decision problems with large or continuous state spaces, feature representation and function approximation have been a major research topic in reinforcement learning (RL). In this paper, a clustering-based graph Laplacian framework is presented for feature representation and value function approximation (VFA) in RL. By making use of clustering-based techniques, that is, K-means clustering or fuzzy C-means clustering, a graph Laplacian is constructed by subsampling in Markov decision processes (MDPs) with continuous state spaces. The basis functions for VFA can be automatically generated from spectral analysis of the graph Laplacian. The clustering-based graph Laplacian is integrated with a class of approximation policy iteration algorithms called representation policy iteration (RPI) for RL in MDPs with continuous state spaces. Simulation and experimental results show that, compared with previous RPI methods, the proposed approach needs fewer sample points to compute an efficient set of basis functions and the learning control performance can be improved for a variety of parameter settings.
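
    A hedged sketch of the core construction: subsample a continuous state space with k-means, build a kNN graph over the resulting centers, and take the low-order eigenvectors of the graph Laplacian as basis functions for value function approximation. This is not the authors' RPI code; scikit-learn and SciPy are assumed, the state space is a toy 2-D box, and the nearest-center feature lookup is one simple choice among several.

```python
# Clustering-based graph Laplacian basis functions for a continuous state space.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import laplacian
from scipy.linalg import eigh

states = np.random.rand(5000, 2)                      # sampled 2-D continuous states
centers = KMeans(n_clusters=100, n_init=10, random_state=0).fit(states).cluster_centers_

# Symmetric kNN adjacency over the cluster centers, then the normalized Laplacian.
A = kneighbors_graph(centers, n_neighbors=8, mode="connectivity").toarray()
A = np.maximum(A, A.T)
L = laplacian(A, normed=True)

# Eigenvectors with the smallest eigenvalues are the smoothest basis functions.
vals, vecs = eigh(L)
basis = vecs[:, :10]                                  # 10 Laplacian basis functions

def features(state):
    """Feature vector of a state = basis row of its nearest cluster center."""
    i = np.argmin(((centers - state) ** 2).sum(axis=1))
    return basis[i]

print(features(np.array([0.3, 0.7])).shape)           # (10,)
```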

  18. Predictor-Year Subspace Clustering Based Ensemble Prediction of Indian Summer Monsoon

    Directory of Open Access Journals (Sweden)

    Moumita Saha

    2016-01-01

    Forecasting the Indian summer monsoon is a challenging task due to its complex and nonlinear behavior. A large number of global climatic variables with varying interaction patterns over the years influence the monsoon. Various statistical and neural prediction models have been proposed for forecasting the monsoon, but many of them fail to capture its variability over the years. The skill of the predictor variables of the monsoon also evolves over time. In this article, we propose a joint clustering of monsoon years and predictors for understanding and predicting the monsoon. This is achieved by a subspace clustering algorithm. It groups the years based on the prevailing global climatic conditions using a statistical clustering technique and subsequently, for each such group, identifies significant climatic predictor variables which assist in better prediction. A prediction model is designed for each individual cluster using a random forest of regression trees. Prediction of the aggregate and regional monsoon is attempted. A mean absolute error of 5.2% is obtained for forecasting the aggregate Indian summer monsoon. Errors in predicting the regional monsoons are also comparable, given the high variation of regional precipitation. The proposed joint-clustering-based ensemble model is observed to be superior to existing monsoon prediction models and also surpasses general non-clustering-based prediction models.
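
    A simplified sketch of the ensemble idea: group years by their climatic predictors (plain k-means here stands in for the paper's subspace clustering) and fit one random forest of regression trees per group, routing a new year to its group's model. Data are synthetic and scikit-learn is assumed.

```python
# Per-cluster random forest prediction: cluster years, fit one model per cluster.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 15))                        # 60 years x 15 climate predictors
y = X[:, 0] * 2 + rng.normal(scale=0.5, size=60)     # toy "monsoon rainfall" target

km = KMeans(n_clusters=3, n_init=10, random_state=1).fit(X)
models = {c: RandomForestRegressor(n_estimators=200, random_state=1).fit(
             X[km.labels_ == c], y[km.labels_ == c])
          for c in range(3)}

x_new = rng.normal(size=(1, 15))                     # predictors for an upcoming year
c_new = km.predict(x_new)[0]                         # route the year to its cluster...
print(models[c_new].predict(x_new))                  # ...and use that cluster's forest
```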

  19. An effective trust-based recommendation method using a novel graph clustering algorithm

    Science.gov (United States)

    Moradi, Parham; Ahmadian, Sajad; Akhlaghian, Fardin

    2015-10-01

    Recommender systems are programs that aim to provide personalized recommendations to users for specific items (e.g. music, books) in online sharing communities or on e-commerce sites. Collaborative filtering methods are important and widely accepted types of recommender systems that generate recommendations based on the ratings of like-minded users. On the other hand, these systems confront several inherent issues such as the data sparsity and cold start problems, caused by the scarcity of ratings relative to the unknowns that need to be predicted. Incorporating trust information into collaborative filtering systems is an attractive approach to resolving these problems. In this paper, we present a model-based collaborative filtering method that applies a novel graph clustering algorithm and also considers trust statements. In the proposed method, the problem space is first represented as a graph, and a sparsest-subgraph-finding algorithm is applied to the graph to find the initial cluster centers. Then, the proposed graph clustering algorithm is performed to obtain the appropriate user/item clusters. Finally, the identified clusters are used as a set of neighbors to recommend unseen items to the current active user. Experimental results based on three real-world datasets demonstrate that the proposed method outperforms several state-of-the-art recommender system methods.

  20. Design and Implementation of Streaming Media Server Cluster Based on FFMpeg

    Science.gov (United States)

    Zhao, Hong; Zhou, Chun-long; Jin, Bao-zhao

    2015-01-01

    Poor performance and network congestion are commonly observed in the streaming media single server system. This paper proposes a scheme to construct a streaming media server cluster system based on FFMpeg. In this scheme, different users are distributed to different servers according to their locations and the balance among servers is maintained by the dynamic load-balancing algorithm based on active feedback. Furthermore, a service redirection algorithm is proposed to improve the transmission efficiency of streaming media data. The experiment results show that the server cluster system has significantly alleviated the network congestion and improved the performance in comparison with the single server system. PMID:25734187
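
    The scheme above rests on two simple mechanisms: servers actively report their load, and new sessions are redirected to the least-loaded server. The toy sketch below illustrates only that decision logic; the server names, load values, and selection rule are illustrative and not the paper's actual algorithm.

```python
# Toy feedback-driven load balancing: redirect each new session to the server
# with the lowest load reported by its monitoring agent.
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    reported_load: float          # latest load reported by the server's agent (0..1)

def pick_server(servers):
    """Redirect a new streaming session to the least-loaded server."""
    return min(servers, key=lambda s: s.reported_load)

cluster = [Server("edge-1", 0.82), Server("edge-2", 0.35), Server("edge-3", 0.55)]
print("redirect client to:", pick_server(cluster).name)

# Active feedback: when a server reports a new load value, only its entry changes,
# so subsequent redirections immediately reflect the update.
cluster[1].reported_load = 0.91
print("redirect client to:", pick_server(cluster).name)
```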

  2. THE SLUGGS SURVEY: NGC 3115, A CRITICAL TEST CASE FOR METALLICITY BIMODALITY IN GLOBULAR CLUSTER SYSTEMS

    Energy Technology Data Exchange (ETDEWEB)

    Brodie, Jean P.; Conroy, Charlie; Arnold, Jacob A.; Romanowsky, Aaron J. [University of California Observatories and Department of Astronomy and Astrophysics, University of California, Santa Cruz, CA 95064 (United States); Usher, Christopher; Forbes, Duncan A. [Centre for Astrophysics and Supercomputing, Swinburne University, Hawthorn, VIC 3122 (Australia); Strader, Jay, E-mail: brodie@ucolick.org [Department of Physics and Astronomy, Michigan State University, East Lansing, MI 48824 (United States)

    2012-11-10

    Due to its proximity (9 Mpc) and the strongly bimodal color distribution of its spectroscopically well-sampled globular cluster (GC) system, the early-type galaxy NGC 3115 provides one of the best available tests of whether the color bimodality widely observed in GC systems generally reflects a true metallicity bimodality. Color bimodality has alternatively been attributed to a strongly nonlinear color-metallicity relation reflecting the influence of hot horizontal-branch stars. Here, we couple Subaru Suprime-Cam gi photometry with Keck/DEIMOS spectroscopy to accurately measure GC colors and a CaT index that measures the Ca II triplet. We find the NGC 3115 GC system to be unambiguously bimodal in both color and the CaT index. Using simple stellar population models, we show that the CaT index is essentially unaffected by variations in horizontal-branch morphology over the range of metallicities relevant to GC systems (and is thus a robust indicator of metallicity) and confirm bimodality in the metallicity distribution. We assess the existing evidence for and against multiple metallicity subpopulations in early- and late-type galaxies and conclude that metallicity bi/multimodality is common. We briefly discuss how this fundamental characteristic links directly to the star formation and assembly histories of galaxies.

  3. Formal And Informal Macro-Regional Transport Clusters As A Primary Step In The Design And Implementation Of Cluster-Based Strategies

    Directory of Open Access Journals (Sweden)

    Nežerenko Olga

    2015-09-01

    The aim of the study is the identification of a formal macro-regional transport and logistics cluster and its development trends on a macro-regional level in 2007-2011 by means of hierarchical cluster analysis. The central approach of the study is based on two concepts: (1) the concept of formal and informal macro-regions, and (2) the concept of clustering, which is based on the similarities shared by the countries of a macro-region and is tightly related to the concept of a macro-region. The authors seek to answer the question of whether the formation of a formal transport cluster could provide the BSR with a stable competitive position in the global transportation and logistics market.

  4. Evaluating the Relevance of Corporate Ontological Knowledge Base Documents Using Their Hierarchical Role Clustering

    Directory of Open Access Journals (Sweden)

    A. P. Karpenko

    2014-01-01

    The article considers a corporate knowledge base comprising a large number of semi-structured documents, which describe, in some detail, precedents, i.e., situations and the decisions made in those situations. The task is to find a decision in such a knowledge base by searching for the most suitable precedents and their corresponding documents. The work continues a series of the authors’ works that develop an original approach to the solution of this task. It is assumed that the metadata of the knowledge base documents are based on the ontology of the corresponding subject domain, specified as a semantic network. In contrast to the previous works, it is required that the subject-domain ontology, and thereby the corresponding semantic network, be hierarchically clustered based on the roles of its concepts. Documents in the knowledge base and search queries are represented as frames, called ‘patterns of design’ and ‘patterns of inquiry’, respectively. The slots of these patterns correspond to the roles of concepts in the ontology, and these roles partition the concepts of the ontology, the document, and the query into clusters. It is assumed that the semantic networks of the above-mentioned clusters are built by the technique for creating the semantic network of a document. Thus, the search images of the document and the query are represented as sets of semantic networks corresponding to the slots of a pattern of design and a pattern of inquiry. The work offers a model of the semantic network of an ontological knowledge base that uses hierarchical role clustering of the ontology’s concepts. The task of multi-criteria evaluation of the relevance of corporate ontological knowledge base documents using their hierarchical role clustering is formulated, and a method is offered to solve it using original measures of proximity between the role patterns of design of the candidate document and of the inquiry.

  5. The Ideal Base For Patch Testing

    Directory of Open Access Journals (Sweden)

    Ashok Kumar Bajaj

    1984-01-01

    A study was conducted to find a suitable vehicle for patch testing in India. The bases tested were petrolatum, propylene glycol, polyethylene glycol 400, lanolin, olive oil, and plastobase. The observations suggest that polyethylene glycol 400 is the most suitable vehicle for patch testing.

  6. Comparison of Skin Moisturizer: Consumer-Based Brand Equity (CBBE) Factors in Clusters Based on Consumer Ethnocentrism

    Directory of Open Access Journals (Sweden)

    Yossy Hanna Garlina

    2014-09-01

    This research aims to analyze relevant factors contributing to the four dimensions of consumer-based brand equity in the skin moisturizer industry. It is then followed by a clustering of female consumers of skin moisturizer based on ethnocentrism and a comparison of each cluster’s consumer-based brand equity dimensions towards a domestic skin moisturizer brand, Mustika Ratu. The research used a descriptive survey method. Primary data were obtained through questionnaires distributed to 70 female respondents for the factor analysis and 120 female respondents for the cluster analysis and one-way analysis of variance (ANOVA). Factor analysis was employed to obtain the relevant factors contributing to the dimensions of consumer-based brand equity in the skin moisturizer industry. Cluster analysis and one-way ANOVA were used to examine the difference in consumer-based brand equity between highly ethnocentric and low-ethnocentric consumers towards the same domestic brand, Mustika Ratu skin moisturizer. The research found that, in all individual dimension analyses, the variable means and individual means show a distinct difference between the high-ethnocentric and the low-ethnocentric consumers. The low-ethnocentric consumer cluster tends to have lower mean scores for Brand Loyalty, Perceived Quality, Brand Awareness, Brand Association, and Overall Brand Equity than the high-ethnocentric consumer cluster. The research concludes that consumer ethnocentrism is positively correlated with preferences towards domestic products and negatively correlated with foreign-made product preference. Highly ethnocentric consumers thus have a positive perception of domestic products.

  7. A WEB-BASED SOLUTION TO VISUALIZE OPERATIONAL MONITORING LINUX CLUSTER FOR THE PROTODUNE DATA QUALITY MONITORING CLUSTER

    CERN Document Server

    Mosesane, Badisa

    2017-01-01

    The Neutrino computing cluster, made of 300 Dell PowerEdge 1950 U1 nodes, plays an integral role in the CERN Neutrino Platform (CENF). It represents an effort to foster fundamental research in the field of neutrino physics, as it provides a data processing facility. We cannot overemphasize the need for data quality monitoring coupled with automated system configuration and remote monitoring of the cluster. To achieve these, a software stack has been chosen to implement automatic propagation of configurations across all the nodes in the cluster. The bulk of this report discusses and delves into the automated configuration management system on this cluster, which enables the fast online data processing and Data Quality Monitoring (DQM) process for the Neutrino Platform cluster (npcmp.cern.ch).

  8. Comparison of tests for spatial heterogeneity on data with global clustering patterns and outliers

    Directory of Open Access Journals (Sweden)

    Hachey Mark

    2009-10-01

    Abstract Background The ability to evaluate geographic heterogeneity of cancer incidence and mortality is important in cancer surveillance. Many statistical methods for evaluating global clustering and local cluster patterns have been developed and examined in many simulation studies. However, the performance of these methods in two extreme cases (global clustering evaluation and local anomaly (outlier) detection) has not been thoroughly investigated. Methods We compare methods for global clustering evaluation, including Tango's Index, Moran's I, and Oden's I*pop, and cluster detection methods such as local Moran's I and the SaTScan elliptic version, on simulated count data that mimic global clustering patterns and outliers for cancer cases in the continental United States. We examine the power and precision of the selected methods in the purely spatial analysis. We illustrate Tango's MEET and the SaTScan elliptic version on 1987-2004 HIV and 1950-1969 lung cancer mortality data in the United States. Results For simulated data with outlier patterns, Tango's MEET, Moran's I and I*pop had powers less than 0.2, and SaTScan had powers around 0.97. For simulated data with global clustering patterns, Tango's MEET and I*pop (with 50% of the total population as the maximum search window) had powers close to 1. SaTScan had powers around 0.7-0.8 and Moran's I had powers around 0.2-0.3. In the real data example, Tango's MEET indicated the existence of global clustering patterns in both the HIV and lung cancer mortality data. SaTScan found a large cluster for HIV mortality rates, which is consistent with the finding from Tango's MEET. SaTScan also found clusters and outliers in the lung cancer mortality data. Conclusion The SaTScan elliptic version is more efficient for outlier detection compared with the other methods evaluated in this article. Tango's MEET and Oden's I*pop perform best in global clustering scenarios among the selected methods. The use of SaTScan for
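
    For reference, one of the global statistics compared above can be written in a few lines. The sketch below computes Moran's I for a set of regional rates with a user-supplied spatial weight matrix; the toy rates and line-neighbour adjacency are invented, and a production analysis would normally use a dedicated package such as PySAL instead.

```python
# Minimal Moran's I: I = (n / sum(W)) * (x_c' W x_c) / (x_c' x_c), x_c = x - mean(x).
import numpy as np

def morans_i(x, W):
    """Global Moran's I for regional values x and spatial weight matrix W."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    n = len(x)
    return (n / W.sum()) * (xc @ W @ xc) / (xc @ xc)

# Toy example: 5 regions on a line, each adjacent to its immediate neighbours.
rates = np.array([1.0, 1.2, 1.1, 4.0, 3.8])           # e.g. mortality rates
W = np.zeros((5, 5))
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0
print(f"Moran's I = {morans_i(rates, W):.3f}")         # values well above -1/(n-1) suggest clustering
```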

  9. An Extension and Test of Sutherland's Concept of Differential Social Organization: The Geographic Clustering of Japanese Suicide and Homicide Rates

    Science.gov (United States)

    Baller, Robert D.; Shin, Dong-Joon; Richardson, Kelly K.

    2005-01-01

    In an effort to explain the spatial patterning of violence, we expanded Sutherland's (1947) concept of differential social organization to include the level of deviance exhibited by neighboring areas. To test the value of this extension, the geographic clustering of Japanese suicide and homicide rates is assessed using 1985 and 1995 data for…

  10. Analytical network process based optimum cluster head selection in wireless sensor network.

    Science.gov (United States)

    Farman, Haleem; Javed, Huma; Jan, Bilal; Ahmad, Jamil; Ali, Shaukat; Khalil, Falak Naz; Khan, Murad

    2017-01-01

    Wireless Sensor Networks (WSNs) are becoming ubiquitous in everyday life due to their applications in weather forecasting, surveillance, implantable sensors for health monitoring and a plethora of other applications. A WSN is equipped with hundreds or thousands of small sensor nodes. As the size of a sensor node decreases, critical issues such as limited energy, computation time and limited memory become even more highlighted. In such a case, network lifetime mainly depends on efficient use of the available resources. Organizing nearby nodes into clusters makes it convenient to efficiently manage each cluster as well as the overall network. In this paper, we extend our previous work on a grid-based hybrid network deployment approach, in which a merge-and-split technique was proposed to construct the network topology. Having constructed the topology through our proposed technique, in this paper we use the analytical network process (ANP) model for cluster head selection in WSN. Five distinct parameters: distance from nodes (DistNode), residual energy level (REL), distance from centroid (DistCent), number of times the node has been selected as cluster head (TCH) and merged node (MN) are considered for CH selection. The problem of CH selection based on these parameters is tackled as a multi-criteria decision system, for which the ANP method is used for optimum cluster head selection. The main contribution of this work is to check the applicability of the ANP model for cluster head selection in WSN. In addition, sensitivity analysis is carried out to check the stability of alternatives (available candidate nodes) and their ranking for different scenarios. The simulation results show that the proposed method outperforms existing energy-efficient clustering protocols in terms of optimum CH selection and minimizing the CH reselection process, which results in extending the overall network lifetime. This paper analyzes the ANP method used for CH selection with a better understanding of the dependencies of
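
    As a rough illustration of scoring candidate cluster heads over the five parameters listed above, the sketch below uses a plain weighted sum after min-max normalization. This deliberately simplifies the ANP model used in the paper (ANP additionally encodes dependencies among criteria); the weights, node values, and benefit/cost split are invented.

```python
# Simplified multi-criteria cluster-head scoring over the five listed parameters.
import numpy as np

criteria = ["DistNode", "REL", "DistCent", "TCH", "MN"]
# Benefit criterion (higher is better): REL. Cost criteria (lower is better): the rest.
benefit = np.array([False, True, False, False, False])
weights = np.array([0.25, 0.35, 0.20, 0.10, 0.10])       # assumed priorities

# Candidate nodes x criteria (raw, invented values).
nodes = {"n1": [12.0, 0.9, 5.0, 1, 0],
         "n2": [ 8.0, 0.6, 3.0, 3, 1],
         "n3": [15.0, 0.8, 9.0, 0, 0]}

X = np.array(list(nodes.values()), dtype=float)
X = (X - X.min(0)) / (X.max(0) - X.min(0) + 1e-12)       # min-max normalization per criterion
X[:, ~benefit] = 1.0 - X[:, ~benefit]                    # invert cost criteria
scores = X @ weights
best = list(nodes)[int(np.argmax(scores))]
print("selected cluster head:", best, dict(zip(nodes, scores.round(3))))
```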

  11. A magnetic nanoparticle-clustering biosensor for blu-ray based optical detection of small-molecules

    DEFF Research Database (Denmark)

    Yang, Jaeyoung; Donolato, Marco; Antunes, Paula Soares Martins

    2014-01-01

    In magnetic nanoparticle (MNP)-clustering assays, a target molecule is bound to multiple receptors tethered onto MNPs, triggering MNP-clustering and leading to changes in the size of clusters. However, sandwich-type clustering requires multiple binding-sites on a target molecule, which is often unavailable for small-molecules. Furthermore, measuring magnetic properties as signals is not intrinsically selective regarding MNP-cluster size. Thus, the detection of few MNP-clusters is readily interfered by background signals from predominantly-existing single MNPs. Additionally, bulky and high-cost instruments limit the advancement of MNP-based assays. We report here a novel MNP-clustering small-molecule assay on an optical readout platform to overcome the limitations aforementioned with the following improvements. First, a facile MNP-clustering assay applicable to diverse small-molecules was realized...

  12. Multispectral image compression algorithm based on spectral clustering and wavelet transform

    Science.gov (United States)

    Huang, Rong; Qiao, Weidong; Yang, Jianfeng; Wang, Hong; Xue, Bin; Tao, Jinyou

    2017-11-01

    In this paper, a method based on spectral clustering and the discrete wavelet transform (DWT) is proposed to address the high degree of space-time redundancy in current multispectral image compression algorithms. First, the spectral images are grouped by a spectral clustering method, with similar bands gathered into the same cluster to remove spectral redundancy. Then, the wavelet transform and coding are applied to each class representative to eliminate spatial redundancy, and the residual components are processed with the Karhunen-Loeve transform (KLT) and the wavelet transform. Experimental results show that, compared with JPEG2000 and the KLT + DWT algorithm, the proposed method achieves a better peak signal-to-noise ratio and compression ratio, and it is suitable for the compression of different spectral bands.

  13. Risk Assessment for Bridges Safety Management during Operation Based on Fuzzy Clustering Algorithm

    Directory of Open Access Journals (Sweden)

    Xia Hanyu

    2016-01-01

    In recent years, as long-span and sea-crossing bridges have been built, bridge accidents caused by improper operational management have occurred frequently. In order to explore better methods for risk assessment of bridge operation departments, a method based on a fuzzy clustering algorithm is selected. The implementation steps of the fuzzy clustering algorithm are described, a risk evaluation system is built, Taizhou Bridge is selected as an example, and the quantification of its risk factors is described. After that, the clustering algorithm based on fuzzy equivalence is computed in MATLAB 2010a. Finally, the Taizhou Bridge operation management departments are classified and sorted according to their degree of risk, and the safety situation of the operation departments is analyzed.
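
    Clustering based on fuzzy equivalence typically proceeds by building a fuzzy similarity relation, taking its max-min transitive closure, and cutting it at a chosen lambda level. The sketch below walks through those steps on invented risk-factor scores; the similarity definition and the lambda threshold are assumptions, not values from the paper.

```python
# Fuzzy-equivalence clustering: similarity matrix -> max-min transitive closure -> lambda-cut.
import numpy as np

scores = np.array([[0.8, 0.6, 0.7],      # risk-factor scores of 4 operation
                   [0.7, 0.5, 0.6],      # departments (rows) over 3 factors
                   [0.2, 0.3, 0.1],
                   [0.3, 0.2, 0.2]])

# Fuzzy similarity relation (here: 1 minus mean absolute difference of scores).
R = 1.0 - np.abs(scores[:, None, :] - scores[None, :, :]).mean(axis=2)

def maxmin(A, B):
    """Max-min composition of two fuzzy relations."""
    return np.max(np.minimum(A[:, :, None], B[None, :, :]), axis=1)

# Transitive closure: square the relation until it stops changing.
T = R.copy()
while True:
    T2 = maxmin(T, T)
    if np.allclose(T2, T):
        break
    T = T2

lam = 0.85                                # lambda-cut threshold (assumed)
same_risk_class = T >= lam                # departments i, j fall in the same risk class
print(same_risk_class.astype(int))
```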

  14. Address allocation for MANET merge and partition using cluster based routing.

    Science.gov (United States)

    Singh, Sugandha; Rajpal, Navin; Sharma, Ashok

    2014-01-01

    Network merges and partitions occur quite often in MANETs, wherein address auto-configuration is a critical requirement. There are various approaches for address auto-configuration in MANETs which allocate addresses to the nodes in a dynamic and distributed manner, in which the HOST ID and MANET ID are assigned on the basis of their Base value. MANET merges and partitions employing the Cluster Based Routing Protocol require a node to be assigned as the Cluster Head (CH). This paper presents an Election Algorithm which assigns a node as the Cluster Head on the basis of its weight. Through simulation using NS-2, it has been shown that the Election Algorithm improves the packet delivery ratio (PDR) significantly and decreases the packet delay to a great extent in comparison to the existing AODV protocol.

  15. SHAM beyond clustering: new tests of galaxy–halo abundance matching with galaxy groups

    Energy Technology Data Exchange (ETDEWEB)

    Hearin, Andrew P.; Zentner, Andrew R.; Berlind, Andreas A.; Newman, Jeffrey A.

    2013-05-27

    We construct mock catalogs of galaxy groups using subhalo abundance matching (SHAM) and undertake several new tests of the SHAM prescription for the galaxy-dark matter connection. All SHAM models we studied exhibit significant tension with galaxy groups observed in the Sloan Digital Sky Survey (SDSS). The SHAM prediction for the field galaxy luminosity function (LF) is systematically too dim, and the group galaxy LF systematically too bright, regardless of the details of the SHAM prescription. SHAM models connecting r-band luminosity, Mr, to Vacc, the maximum circular velocity of a subhalo at the time of accretion onto the host, faithfully reproduce galaxy group abundance as a function of richness, g(N). However, SHAM models connecting Mr with Vpeak, the peak value of Vmax over the entire merger history of the halo, over-predict galaxy group abundance. Our results suggest that no SHAM model can simultaneously reproduce the observed g(N) and two-point projected galaxy clustering. Nevertheless, we also report a new success of SHAM: an accurate prediction for Phi(m12), the abundance of galaxy groups as a function of magnitude gap m12, defined as the difference between the r-band absolute magnitude of the two brightest group members. We show that it may be possible to use joint measurements of g(N) and Phi(m12) to tightly constrain the details of the SHAM implementation. Additionally, we show that the hypothesis that the luminosity gap is constructed via random draws from a universal LF provides a poor description of the data, contradicting recent claims in the literature. Finally, we test a common assumption of the Conditional Luminosity Function (CLF) formalism, that the satellite LF need only be conditioned by the brightness of the central galaxy. We find this assumption to be well-supported by the observed Phi(m12).

  16. ADAPTIVE CLUSTER BASED ROUTING PROTOCOL WITH ANT COLONY OPTIMIZATION FOR MOBILE AD-HOC NETWORK IN DISASTER AREA

    Directory of Open Access Journals (Sweden)

    Enrico Budianto

    2012-07-01

    In post-disaster rehabilitation efforts, the availability of telecommunication facilities plays an important role. However, the process of restoring telecommunication facilities in a disaster area is risky if it is done by humans. Therefore, a network method that can work efficiently and effectively, and that is capable of reaching the widest possible area, is needed. This research introduces a cluster-based routing protocol named Adaptive Cluster Based Routing Protocol (ACBRP), equipped with an Ant Colony Optimization method, and its implementation in a simulator developed by the author. After data analysis and statistical tests, it can be concluded that the ACBRP routing protocol performs better than the AODV and DSR routing protocols.

  17. Density-Based Statistical Clustering: Enabling Sidefire Ultrasonic Traffic Sensing in Smart Cities

    Directory of Open Access Journals (Sweden)

    Volker Lücken

    2018-01-01

    Traffic routing is a central challenge in the context of urban areas, with a direct impact on personal mobility, traffic congestion, and air pollution. In the last decade, the possibilities for traffic flow control have improved together with the corresponding management systems. However, the lack of real-time traffic flow information with city-wide coverage is a major limiting factor for optimum operation. Smart City concepts seek to tackle these challenges in the future by combining sensing, communications, distributed information, and actuation. This paper presents an integrated approach that combines smart street lamps with traffic sensing technology. More specifically, infrastructure-based ultrasonic sensors, which are deployed together with a street light system, are used for multilane traffic participant detection and classification. Application of these sensors in time-varying reflective environments posed an unresolved problem for many ultrasonic sensing solutions in the past and therefore greatly limited the dissemination of this technology. We present a solution using an algorithmic approach that combines statistical standardization with clustering techniques from the field of unsupervised learning. By using a multilevel communication concept, centralized and decentralized traffic information fusion is possible. The evaluation is based on results from automotive test track measurements and several European real-world installations.
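
    The algorithmic core described above, statistical standardization followed by unsupervised clustering of echo measurements, can be illustrated with a small sketch. Here z-scoring plus DBSCAN separates synthetic vehicle echoes from a quasi-static background; this is only an illustration of the general idea, not the authors' exact density-based statistical clustering algorithm, and scikit-learn is assumed.

```python
# Standardize ultrasonic echo features, then cluster to separate vehicles from background.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(7)
background = np.column_stack([rng.normal(4.0, 0.05, 400),    # echo distance (m), road surface
                              rng.normal(0.2, 0.02, 400)])   # echo amplitude
vehicles = np.column_stack([rng.normal(1.5, 0.10, 40),       # closer, stronger echoes
                            rng.normal(0.8, 0.05, 40)])
echoes = np.vstack([background, vehicles])

z = StandardScaler().fit_transform(echoes)                   # statistical standardization
labels = DBSCAN(eps=0.3, min_samples=10).fit_predict(z)      # density-based clustering
print("clusters found:", set(labels) - {-1}, "| noise points:", int((labels == -1).sum()))
```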

  18. TESTING THE RELIABILITY OF CLUSTER MASS INDICATORS WITH A SYSTEMATICS LIMITED DATA SET

    International Nuclear Information System (INIS)

    Juett, Adrienne M.; Mushotzky, Richard; Davis, David S.

    2010-01-01

    We present the mass-X-ray observable scaling relationships for clusters of galaxies using the XMM-Newton cluster catalog of Snowden et al. Our results are roughly consistent with previous observational and theoretical work, with one major exception. We find two to three times the scatter around the best-fit mass scaling relationships as expected from cluster simulations or seen in other observational studies. We suggest that this is a consequence of using hydrostatic mass, as opposed to virial mass, and is due to the explicit dependence of the hydrostatic mass on the gradients of the temperature and gas density profiles. We find a larger range of slope in the cluster temperature profiles at r_500 than in previous observational studies. Additionally, we find only a weak dependence of the gas mass fraction on cluster mass, consistent with a constant. Our average gas mass fraction results argue for a closer study of the systematic errors due to instrumental calibration and analysis method variations. We suggest that a more careful study of the differences between various observational results and with cluster simulations is needed to understand sources of bias and scatter in cosmological studies of galaxy clusters.

  19. GALAXY CLUSTERING TOPOLOGY IN THE SLOAN DIGITAL SKY SURVEY MAIN GALAXY SAMPLE: A TEST FOR GALAXY FORMATION MODELS

    International Nuclear Information System (INIS)

    Choi, Yun-Young; Kim, Juhan; Kim, Sungsoo S.; Park, Changbom; Gott, J. Richard; Weinberg, David H.; Vogeley, Michael S.

    2010-01-01

    We measure the topology of the main galaxy distribution using the Seventh Data Release of the Sloan Digital Sky Survey, examining the dependence of galaxy clustering topology on galaxy properties. The observational results are used to test galaxy formation models. A volume-limited sample defined by an M_r threshold is analyzed at an h^-1 Mpc smoothing scale, with 4.8% uncertainty including all systematics and cosmic variance. The clustering topology over the smoothing length interval from 6 to 10 h^-1 Mpc reveals a mild scale dependence for the shift (Δν) and void abundance (A_V) parameters of the genus curve. We find substantial bias in the topology of galaxy clustering with respect to the predicted topology of the matter distribution, which varies with luminosity, morphology, color, and the smoothing scale of the density field. The distribution of relatively brighter galaxies shows a greater prevalence of isolated clusters and more percolated voids. Even though early (late)-type galaxies show topology similar to that of red (blue) galaxies, the morphology dependence of topology is not identical to the color dependence. In particular, the void abundance parameter A_V depends on morphology more strongly than on color. We test five galaxy assignment schemes applied to cosmological N-body simulations of a ΛCDM universe to generate mock galaxies: the halo-galaxy one-to-one correspondence model, the halo occupation distribution model, and three implementations of semi-analytic models (SAMs). None of the models reproduces all aspects of the observed clustering topology; the deviations vary from one model to another but include statistically significant discrepancies in the abundance of isolated voids or isolated clusters and the amplitude and overall shift of the genus curve. SAM predictions of the topology color dependence are usually correct in sign but incorrect in magnitude. Our topology tests indicate that, in these models, voids should be emptier and more connected and the threshold for

  20. Wavelet-based clustering of resting state MRI data in the rat

    Science.gov (United States)

    Medda, Alessio; Hoffmann, Lukas; Magnuson, Matthew; Thompson, Garth; Pan, Wen-Ju; Keilholz, Shella

    2015-01-01

    While functional connectivity has typically been calculated over the entire length of the scan (5-10 min), interest has been growing in dynamic analysis methods that can detect changes in connectivity on the order of cognitive processes (seconds). Previous work with sliding window correlation has shown that changes in functional connectivity can be observed on these time scales in the awake human and in anesthetized animals. This exciting advance creates a need for improved approaches to characterize dynamic functional networks in the brain. Previous studies were performed using sliding window analysis on regions of interest defined based on anatomy or obtained from traditional steady-state analysis methods. The parcellation of the brain may therefore be suboptimal, and the characteristics of the time-varying connectivity between regions are dependent upon the length of the sliding window chosen. This manuscript describes an algorithm based on wavelet decomposition that allows data-driven clustering of voxels into functional regions based on temporal and spectral properties. Previous work has shown that different networks have characteristic frequency fingerprints, and the use of wavelets ensures that both the frequency and the timing of the BOLD fluctuations are considered during the clustering process. The method was applied to resting state data acquired from anesthetized rats, and the resulting clusters agreed well with known anatomical areas. Clusters were highly reproducible across subjects. Wavelet cross-correlation values between clusters from a single scan were significantly higher than the values from randomly-matched clusters that shared no temporal information, indicating that wavelet-based analysis is sensitive to the relationship between areas. PMID:26481903
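
    To illustrate how wavelet decompositions can drive clustering of time series, the sketch below summarizes each synthetic voxel time course by its relative energy per wavelet level and clusters those fingerprints with k-means. This is a simplified stand-in for the paper's method, which works with the wavelet coefficients and their cross-correlations directly; PyWavelets and scikit-learn are assumed, and the wavelet, level count, and signals are invented.

```python
# Wavelet-energy fingerprints per time series, then k-means clustering of voxels.
import numpy as np
import pywt
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
t = np.arange(512) / 2.0                        # e.g. 512 samples at TR = 0.5 s
slow = np.sin(2 * np.pi * 0.02 * t)             # slow BOLD-like fluctuation
fast = np.sin(2 * np.pi * 0.15 * t)             # faster fluctuation
voxels = np.vstack([slow + 0.3 * rng.normal(size=512) for _ in range(50)] +
                   [fast + 0.3 * rng.normal(size=512) for _ in range(50)])

def wavelet_energy(ts, wavelet="db4", level=5):
    """Relative energy in each wavelet decomposition level (a spectral fingerprint)."""
    coeffs = pywt.wavedec(ts, wavelet, level=level)
    e = np.array([np.sum(c ** 2) for c in coeffs])
    return e / e.sum()

features = np.array([wavelet_energy(v) for v in voxels])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
print("cluster sizes:", np.bincount(labels))
```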

  1. Group analyses of connectivity-based cortical parcellation using repeated k-means clustering.

    Science.gov (United States)

    Nanetti, Luca; Cerliani, Leonardo; Gazzola, Valeria; Renken, Remco; Keysers, Christian

    2009-10-01

    K-means clustering has become a popular tool for connectivity-based cortical segmentation using Diffusion Weighted Imaging (DWI) data. A sometimes ignored issue is, however, that the output of the algorithm depends on the initial placement of starting points, and that different sets of starting points could therefore lead to different solutions. In this study we explore this issue. We apply k-means clustering a thousand times to the same DWI dataset collected in 10 individuals to segment two brain regions: the SMA-preSMA on the medial wall, and the insula. At the level of single subjects, we found that in both brain regions, repeatedly applying k-means indeed often leads to a variety of rather different cortical parcellations. By assessing the similarity and frequency of these different solutions, we show that approximately 256 k-means repetitions are needed to accurately estimate the distribution of possible solutions. Using nonparametric group statistics, we then propose a method to employ the variability of clustering solutions to assess the reliability with which certain voxels can be attributed to a particular cluster. In addition, we show that the proportion of voxels that can be attributed significantly to either cluster in the SMA and preSMA is relatively higher than in the insula and discuss how this difference may relate to differences in the anatomy of these regions.
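
    The repetition-and-stability idea generalizes beyond DWI data: run k-means many times with different random initializations, then quantify how much the solutions differ and how consistently each sample is co-assigned. The sketch below does this on random data using pairwise adjusted Rand indices and a co-assignment matrix; 64 repetitions, two clusters, and the toy features are arbitrary choices, not the study's settings.

```python
# Repeated k-means with different seeds, plus simple stability diagnostics.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

X = np.random.rand(300, 6)                     # e.g. connectivity profiles of voxels
runs = [KMeans(n_clusters=2, n_init=1, init="random", random_state=s).fit_predict(X)
        for s in range(64)]                    # 64 repetitions, one random init each

# Pairwise agreement between solutions: low values reveal unstable parcellations.
ari = [adjusted_rand_score(runs[i], runs[j])
       for i in range(len(runs)) for j in range(i + 1, len(runs))]
print(f"mean pairwise ARI over {len(runs)} runs: {np.mean(ari):.2f}")

# Co-assignment matrix: fraction of runs in which two samples share a cluster,
# a per-voxel basis for reliability-style statistics.
co = np.mean([np.equal.outer(r, r) for r in runs], axis=0)
print("co-assignment matrix shape:", co.shape)
```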

  2. Cluster-based Dynamic Energy Management for Collaborative Target Tracking in Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Dao-Wei Bi

    2007-07-01

    A primary criterion of wireless sensor networks is energy efficiency. Focused on the energy problem of target tracking in wireless sensor networks, this paper proposes a cluster-based dynamic energy management mechanism. The target tracking problem is formulated by the multi-sensor detection model as well as an energy consumption model. A distributed adaptive clustering approach is investigated to form a reasonable routing framework which has a uniform cluster head distribution. Dijkstra's algorithm is utilized to obtain optimal intra-cluster routing. The target position is predicted by a particle filter. The predicted target position is adopted to estimate the idle interval of sensor nodes. Hence, a dynamic awakening approach is exploited to prolong the sleep time of sensor nodes so that the operation energy consumption of the wireless sensor network can be reduced. The sensor nodes around the target wake up on time and act as sensing candidates. With the candidate sensor nodes and the predicted target position, the optimal sensor node selection is considered. Binary particle swarm optimization is proposed to minimize the total energy consumption during collaborative sensing and data reporting. Experimental results verify that the proposed clustering approach establishes a low-energy communication structure while the energy efficiency of the wireless sensor network is enhanced by cluster-based dynamic energy management.

  3. Virtual screening by a new Clustering-based Weighted Similarity Extreme Learning Machine approach.

    Science.gov (United States)

    Pasupa, Kitsuchart; Kudisthalert, Wasu

    2018-01-01

    Machine learning techniques are becoming popular in virtual screening tasks. One of the powerful machine learning algorithms is the Extreme Learning Machine (ELM), which has been applied to many applications and has recently been applied to virtual screening. We propose the Weighted Similarity ELM (WS-ELM), which is based on a single-layer feed-forward neural network in conjunction with 16 different similarity coefficients as activation functions in the hidden layer. It is known that the performance of a conventional ELM is not robust due to random weight selection in the hidden layer. Thus, we propose a Clustering-based WS-ELM (CWS-ELM) that deterministically assigns weights by utilising clustering algorithms, i.e. k-means clustering and support vector clustering. The experiments were conducted on one of the most challenging datasets, the Maximum Unbiased Validation Dataset, which contains 17 activity classes carefully selected from PubChem. The proposed algorithms were then compared with other machine learning techniques such as support vector machines, random forests, and similarity searching. The results show that CWS-ELM in conjunction with support vector clustering yields the best performance when utilised together with the Sokal/Sneath(1) coefficient. Furthermore, the ECFP_6 fingerprint presents the best results in our framework compared to the other types of fingerprints, namely ECFP_4, FCFP_4, and FCFP_6.

  4. Multi-fault clustering and diagnosis of gear system mined by spectrum entropy clustering based on higher order cumulants.

    Science.gov (United States)

    Shao, Renping; Li, Jing; Hu, Wentao; Dong, Feifei

    2013-02-01

    Higher order cumulants (HOC) analysis is a new kind of modern signal analysis theory and technology. Spectrum entropy clustering (SEC) is a statistical data mining method for extracting useful characteristics from a mass of nonlinear and non-stationary data. Following a discussion of the characteristics of HOC theory and the SEC method, this paper introduces the corresponding signal processing techniques and the unique merits of nonlinear coupling characteristic analysis for processing random and non-stationary signals. A new clustering analysis and diagnosis method is then proposed for detecting multiple damage on gears by introducing the combination of HOC and SEC into the damage detection and diagnosis of the gear system. The noise is restrained by HOC, and coupling features are extracted and the characteristic signal separated at different speeds and frequency bands. Under such circumstances, the weak signal characteristics in the system are emphasized and the characteristics of multiple faults are extracted. The SEC data mining method is used to analyze and diagnose, at various running states (300 r/min, 900 r/min, 1200 r/min, and 1500 r/min), the following six signals: no-fault, short crack-fault in the tooth root, long crack-fault in the tooth root, short crack-fault in the pitch circle, long crack-fault in the pitch circle, and wear-fault on the tooth. Research shows that this combined method of detection and diagnosis can also identify the degree of damage of some faults. On this basis, a virtual instrument for gear-system damage detection and fault diagnosis is developed by combining the advantages of MATLAB and VC++, employing Component Object Model technology, adopting mixed programming methods, and calling the program converted from an *.m file under VC++. This software system possesses functions for collecting and importing gear vibration signals, analyzing and processing signals, extracting features, visualizing graphics, detecting and

  6. Mindfulness-based prevention for eating disorders: A school-based cluster randomized controlled study.

    Science.gov (United States)

    Atkinson, Melissa J; Wade, Tracey D

    2015-11-01

    Successful prevention of eating disorders represents an important goal due to damaging long-term impacts on health and well-being, modest treatment outcomes, and low treatment seeking among individuals at risk. Mindfulness-based approaches have received early support in the treatment of eating disorders, but have not been evaluated as a prevention strategy. This study aimed to assess the feasibility, acceptability, and efficacy of a novel mindfulness-based intervention for reducing the risk of eating disorders among adolescent females, under both optimal (trained facilitator) and task-shifted (non-expert facilitator) conditions. A school-based cluster randomized controlled trial was conducted in which 19 classes of adolescent girls (N = 347) were allocated to a three-session mindfulness-based intervention, dissonance-based intervention, or classes as usual control. A subset of classes (N = 156) receiving expert facilitation were analyzed separately as a proxy for delivery under optimal conditions. Task-shifted facilitation showed no significant intervention effects across outcomes. Under optimal facilitation, students receiving mindfulness demonstrated significant reductions in weight and shape concern, dietary restraint, thin-ideal internalization, eating disorder symptoms, and psychosocial impairment relative to control by 6-month follow-up. Students receiving dissonance showed significant reductions in socio-cultural pressures. There were no statistically significant differences between the two interventions. Moderate intervention acceptability was reported by both students and teaching staff. Findings show promise for the application of mindfulness in the prevention of eating disorders; however, further work is required to increase both impact and acceptability, and to enable successful outcomes when delivered by less expert providers. © 2015 Wiley Periodicals, Inc.

  7. Testing the galaxy cluster mass-observable relations at z = 1 with XMM-Newton and Chandra observations of XLSSJ022403.9-041328

    Science.gov (United States)

    Maughan, B. J.; Jones, L. R.; Pierre, M.; Andreon, S.; Birkinshaw, M.; Bremer, M. N.; Pacaud, F.; Ponman, T. J.; Valtchanov, I.; Willis, J.

    2008-07-01

    We present an analysis of deep XMM-Newton and Chandra observations of the z = 1.05 galaxy cluster XLSSJ022403.9-041328 (hereafter XLSSC029), detected in the XMM-Newton Large Scale Structure survey. Density and temperature profiles of the X-ray emitting gas were used to perform a hydrostatic mass analysis of the system. This allowed us to measure the total mass and gas fraction in the cluster and define overdensity radii R500 and R2500. The global properties of XLSSC029 were measured within these radii and compared with those of the local population. The gas mass fraction was found to be consistent with local clusters. The mean metal abundance was 0.18 (+0.17/-0.15) Z_solar, with the cluster core regions excluded, consistent with the predicted and observed evolution. The properties of XLSSC029 were then used to investigate the position of the cluster on the M-kT, Y_X-M and L_X-M scaling relations. In all cases the observed properties of XLSSC029 agreed well with the simple self-similar evolution of the scaling relations. This is the first test of the evolution of these relations at z > 1 and supports the use of the scaling relations in cosmological studies with distant galaxy clusters. Based on observations obtained with XMM-Newton, an ESA science mission with instruments and contributions directly funded by ESA Member States and NASA.

  8. Are clusters of dietary patterns and cluster membership stable over time? Results of a longitudinal cluster analysis study.

    Science.gov (United States)

    Walthouwer, Michel Jean Louis; Oenema, Anke; Soetens, Katja; Lechner, Lilian; de Vries, Hein

    2014-11-01

    Developing nutrition education interventions based on clusters of dietary patterns can only be done adequately when it is clear whether distinctive clusters of dietary patterns can be derived and reproduced over time, whether cluster membership is stable, and whether it is predictable which type of people belong to a certain cluster. Hence, this study aimed to: (1) identify clusters of dietary patterns among Dutch adults, (2) test the reproducibility of these clusters and the stability of cluster membership over time, and (3) identify sociodemographic predictors of cluster membership and cluster transition. This study had a longitudinal design with online measurements at baseline (N=483) and 6 months follow-up (N=379). Dietary intake was assessed with a validated food frequency questionnaire. A hierarchical cluster analysis was performed, followed by a K-means cluster analysis. Multinomial logistic regression analyses were conducted to identify the sociodemographic predictors of cluster membership and cluster transition. At baseline and follow-up, a comparable three-cluster solution was derived, distinguishing a healthy, moderately healthy, and unhealthy dietary pattern. Male and lower-educated participants were significantly more likely to have a less healthy dietary pattern. Further, 251 (66.2%) participants remained in the same cluster, 45 (11.9%) participants changed to an unhealthier cluster, and 83 (21.9%) participants shifted to a healthier cluster. Men and people living alone were significantly more likely to shift toward a less healthy dietary pattern. Distinctive clusters of dietary patterns can be derived. Yet, cluster membership is unstable, and only a few sociodemographic factors were associated with cluster membership and cluster transition. These findings imply that clusters based on dietary intake may not be suitable as a basis for nutrition education interventions. Copyright © 2014 Elsevier Ltd. All rights reserved.
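
    A minimal sketch of the two-step clustering described above (hierarchical clustering followed by K-means), assuming Python with SciPy and scikit-learn and standardised food-frequency scores as input; the data below are random placeholders:

        import numpy as np
        from scipy.cluster.hierarchy import linkage, fcluster
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(0)
        X = rng.normal(size=(483, 20))            # participants x standardised FFQ items

        # step 1: Ward hierarchical clustering suggests a three-cluster solution
        Z = linkage(X, method="ward")
        initial = fcluster(Z, t=3, criterion="maxclust")

        # step 2: K-means refinement seeded with the hierarchical cluster means
        seeds = np.vstack([X[initial == k].mean(axis=0) for k in (1, 2, 3)])
        labels = KMeans(n_clusters=3, init=seeds, n_init=1).fit_predict(X)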

  9. Monte Carlo test of the convergence of cluster expansions in Jastrow correlated nuclei

    Energy Technology Data Exchange (ETDEWEB)

    Bosca, M.C.; Buendia, E.; Guardiola, R.

    1987-11-26

    By comparing the results of the variational Monte Carlo method with various orders of a multiplicative cluster expansion appropriate to finite nuclear systems, we draw conclusions about the quality of the latter.

  10. Possible world based consistency learning model for clustering and classifying uncertain data.

    Science.gov (United States)

    Liu, Han; Zhang, Xianchao; Zhang, Xiaotong

    2018-06-01

    Possible world has shown to be effective for handling various types of data uncertainty in uncertain data management. However, few uncertain data clustering and classification algorithms have been proposed based on possible worlds. Moreover, existing possible world based algorithms suffer from the following issues: (1) they deal with each possible world independently and ignore the consistency principle across different possible worlds; (2) they require an extra post-processing procedure to obtain the final result, so that the effectiveness depends heavily on the post-processing method and the efficiency is also poor. In this paper, we propose a novel possible world based consistency learning model for uncertain data, which can be extended both for clustering and classifying uncertain data. This model utilizes the consistency principle to learn a consensus affinity matrix for uncertain data, which can make full use of the information across different possible worlds and then improve the clustering and classification performance. Meanwhile, this model imposes a new rank constraint on the Laplacian matrix of the consensus affinity matrix, thereby ensuring that the number of connected components in the consensus affinity matrix is exactly equal to the number of classes. This also means that the clustering and classification results can be obtained directly without any post-processing procedure. Furthermore, for the clustering and classification tasks, we respectively derive efficient optimization methods to solve the proposed model. Experimental results on real benchmark datasets and real-world uncertain datasets show that the proposed model outperforms the state-of-the-art uncertain data clustering and classification algorithms in effectiveness and performs competitively in efficiency. Copyright © 2018 Elsevier Ltd. All rights reserved.

  11. Application of attribute weighting method based on clustering centers to discrimination of linearly non-separable medical datasets.

    Science.gov (United States)

    Polat, Kemal

    2012-08-01

    In this paper, an attribute weighting method based on cluster centers, aimed at increasing the discrimination between classes, is proposed and applied to nonlinearly separable datasets, including two medical datasets (the mammographic mass dataset and the BUPA liver disorders dataset) and a 2-D spiral dataset. The goal of this method is to gather the data points close to their cluster centers so that nonlinearly separable datasets are transformed into linearly separable ones. As clustering algorithms, k-means clustering, fuzzy c-means clustering, and subtractive clustering have been used. The proposed attribute weighting methods are k-means clustering based attribute weighting (KMCBAW), fuzzy c-means clustering based attribute weighting (FCMCBAW), and subtractive clustering based attribute weighting (SCBAW), and they are used prior to classifier algorithms including the C4.5 decision tree and the adaptive neuro-fuzzy inference system (ANFIS). To evaluate the proposed method, the recall, precision, true negative rate (TNR), G-mean1, G-mean2, f-measure, and classification accuracy have been used. The results show that the best attribute weighting method with respect to classification performance on the three datasets was the subtractive clustering based attribute weighting.
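
    As a rough illustration only (the paper's exact weighting formula is not reproduced here), one simple cluster-centre-based attribute weighting can be sketched in Python as the ratio of each attribute's overall mean to the mean of its cluster centres; the weighted data would then be passed to a C4.5- or ANFIS-style classifier:

        import numpy as np
        from sklearn.cluster import KMeans

        def cluster_centre_weights(X, n_clusters=2):
            # one plausible weighting: attribute mean divided by the mean of
            # that attribute over the k-means cluster centres (illustrative)
            centres = KMeans(n_clusters=n_clusters, n_init=10,
                             random_state=0).fit(X).cluster_centers_
            return X.mean(axis=0) / np.maximum(np.abs(centres).mean(axis=0), 1e-12)

        # X_weighted = X * cluster_centre_weights(X)   # input to the classifier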

  12. Environment-based selection effects of Planck clusters

    Energy Technology Data Exchange (ETDEWEB)

    Kosyra, R.; Gruen, D.; Seitz, S.; Mana, A.; Rozo, E.; Rykoff, E.; Sanchez, A.; Bender, R.

    2015-07-24

    We investigate whether the large-scale structure environment of galaxy clusters imprints a selection bias on Sunyaev–Zel'dovich (SZ) catalogues. Such a selection effect might be caused by line of sight (LoS) structures that add to the SZ signal or contain point sources that disturb the signal extraction in the SZ survey. We use the Planck PSZ1 union catalogue in the Sloan Digital Sky Survey (SDSS) region as our sample of SZ-selected clusters. We calculate the angular two-point correlation function (2pcf) for physically correlated, foreground and background structure in the RedMaPPer SDSS DR8 catalogue with respect to each cluster. We compare our results with an optically selected comparison cluster sample and with theoretical predictions. In contrast to the hypothesis of no environment-based selection, we find a mean 2pcf for background structures of -0.049 on scales of ≲40 arcmin, significantly non-zero at ~4σ, which means that Planck clusters are more likely to be detected in regions of low background density. We hypothesize this effect arises either from background estimation in the SZ survey or from radio sources in the background. We estimate the defect in SZ signal caused by this effect to be negligibly small, of the order of ~10^-4 of the signal of a typical Planck detection. Analogously, there are no implications on X-ray mass measurements. However, the environmental dependence has important consequences for weak lensing follow up of Planck galaxy clusters: we predict that projection effects account for half of the mass contained within a 15 arcmin radius of Planck galaxy clusters. We did not detect a background underdensity of CMASS LRGs, which also leaves a spatially varying redshift dependence of the Planck SZ selection function as a possible cause for our findings.

  13. PCA based clustering for brain tumor segmentation of T1w MRI images.

    Science.gov (United States)

    Kaya, Irem Ersöz; Pehlivanlı, Ayça Çakmak; Sekizkardeş, Emine Gezmez; Ibrikci, Turgay

    2017-03-01

    Medical images are huge collections of information that are difficult to store and process, consuming extensive computing time. Therefore, reduction techniques are commonly used as a data pre-processing step to make the image data less complex, so that high-dimensional data can be identified by an appropriate low-dimensional representation. PCA is one of the most popular multivariate methods for data reduction. This paper focuses on clustering T1-weighted MRI images for brain tumor segmentation, with dimension reduction by different common Principal Component Analysis (PCA) algorithms. Our primary aim is to present a comparison between different variations of PCA algorithms on MRIs for two clustering methods. The five most common PCA algorithms, namely the conventional PCA, Probabilistic Principal Component Analysis (PPCA), Expectation Maximization Based Principal Component Analysis (EM-PCA), the Generalized Hebbian Algorithm (GHA), and Adaptive Principal Component Extraction (APEX), were applied to reduce dimensionality in advance of two clustering algorithms, K-Means and Fuzzy C-Means. In the study, T1-weighted MRI images of the human brain with brain tumor were used for clustering. In addition to the original size of 512 lines and 512 pixels per line, three more sizes, 256 × 256, 128 × 128 and 64 × 64, were included in the study to examine their effect on the methods. The obtained results were compared in terms of both the reconstruction errors and the Euclidean distance errors among the clustered images containing the same number of principal components. According to the findings, the PPCA obtained the best results among all others. Furthermore, the EM-PCA and the PPCA helped the K-Means algorithm accomplish the best clustering performance in the majority of cases, as well as achieving significant results with both clustering algorithms for all sizes of T1w MRI images. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
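
    A compact sketch of the pipeline (dimension reduction followed by clustering), assuming Python with scikit-learn; the random array stands in for a T1-weighted slice, only conventional PCA is shown, and the patch size is an arbitrary choice:

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(1)
        image = rng.random((128, 128))            # placeholder for a T1w slice

        # cut the slice into non-overlapping 8x8 patches and treat them as samples
        p = 8
        patches = (image.reshape(128 // p, p, 128 // p, p)
                        .swapaxes(1, 2).reshape(-1, p * p))

        # PCA reduces each patch before clustering into, e.g., two tissue groups
        reduced = PCA(n_components=5).fit_transform(patches)
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)
        label_map = labels.reshape(128 // p, 128 // p)   # coarse segmentation map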

  14. Aromatic character of planar boron-based clusters revisited by ring current calculations

    NARCIS (Netherlands)

    Hung Tan Pham, [Unknown; Lim, Kie Zen; Havenith, Remco W. A.; Minh Tho Nguyen, [No Value

    2016-01-01

    The planarity of small boron-based clusters is the result of an interplay between geometry, electron delocalization, covalent bonding and stability. These compounds contain two different bonding patterns involving both sigma and pi delocalized bonds, and up to now, their aromaticity has been

  15. A novel analysis of spring phenological patterns over Europe based on co-clustering

    NARCIS (Netherlands)

    Wu, X.; Zurita-Milla, R.; Kraak, M.J.

    2016-01-01

    The study of phenological patterns and their dynamics provides insights into the impacts of climate change on terrestrial ecosystems. Here we present a novel analytical workflow, based on co-clustering, that enables the concurrent study of spatio-temporal patterns in spring phenology. The workflow

  16. The Home Care Crew Scheduling Problem: Preference-based visit clustering and temporal dependencies

    DEFF Research Database (Denmark)

    Rasmussen, Matias Sevel; Justesen, Tor Fog; Dohn, Anders Høeg

    2012-01-01

    branch-and-price solution algorithm, as this method has previously given solid results for classical vehicle routing problems. Temporal dependencies are modelled as generalised precedence constraints and enforced through the branching. We introduce a novel visit clustering approach based on the soft...

  17. Earthquakes clustering based on the magnitude and the depths in Molluca Province

    Energy Technology Data Exchange (ETDEWEB)

    Wattimanela, H. J., E-mail: hwattimaela@yahoo.com [Pattimura University, Ambon (Indonesia); Institute of Technology Bandung, Bandung (Indonesia); Pasaribu, U. S.; Indratno, S. W.; Puspito, A. N. T. [Institute of Technology Bandung, Bandung (Indonesia)

    2015-12-22

    In this paper, we present a model to classify the earthquakes that occurred in Molluca Province. We use the K-Means clustering method to classify the earthquakes based on their magnitude and depth. The result can be used for disaster mitigation and for designing evacuation routes in Molluca Province.
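
    The classification step amounts to standard K-means on two standardised variables; a minimal Python sketch with made-up catalogue values is given below:

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.preprocessing import StandardScaler

        # illustrative catalogue rows: [magnitude, depth in km]
        quakes = np.array([[4.1, 10.0], [5.6, 33.0], [6.2, 150.0], [4.8, 60.0],
                           [5.1, 12.0], [6.8, 200.0], [4.3, 35.0], [5.9, 90.0]])

        # scale both variables so that depth (km) does not dominate magnitude
        X = StandardScaler().fit_transform(quakes)
        labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)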

  18. Group analyses of connectivity-based cortical parcellation using repeated k-means clustering

    NARCIS (Netherlands)

    Nanetti, Luca; Cerliani, Leonardo; Gazzola, Valeria; Renken, Remco; Keysers, Christian

    2009-01-01

    K-means clustering has become a popular tool for connectivity-based cortical segmentation using Diffusion Weighted Imaging (DWI) data. A sometimes ignored issue is, however, that the output of the algorithm depends on the initial placement of starting points, and that different sets of starting

  19. Maximizing genetic differentiation in core collections by PCA-based clustering of molecular marker data

    NARCIS (Netherlands)

    Heerwaarden, van J.; Odong, T.L.; Eeuwijk, van F.A.

    2013-01-01

    Developing genetically diverse core sets is key to the effective management and use of crop genetic resources. Core selection increasingly uses molecular marker-based dissimilarity and clustering methods, under the implicit assumption that markers and genes of interest are genetically correlated. In

  20. Earthquakes clustering based on the magnitude and the depths in Molluca Province

    International Nuclear Information System (INIS)

    Wattimanela, H. J.; Pasaribu, U. S.; Indratno, S. W.; Puspito, A. N. T.

    2015-01-01

    In this paper, we present a model to classify the earthquakes that occurred in Molluca Province. We use the K-Means clustering method to classify the earthquakes based on their magnitude and depth. The result can be used for disaster mitigation and for designing evacuation routes in Molluca Province.

  1. Fuzzy modeling based on generalized neural networks and fuzzy clustering objective functions

    Science.gov (United States)

    Sun, Chuen-Tsai; Jang, Jyh-Shing

    1991-01-01

    An approach to the formulation of fuzzy if-then rules based on clustering objective functions is proposed. The membership functions are then calibrated with the generalized neural networks technique to achieve a desired input-output mapping. The learning procedure is basically a gradient-descent algorithm. A Kalman filter algorithm is used to improve the overall performance.

  2. Performance Evaluation of a Cluster-Based Service Discovery Protocol for Heterogeneous Wireless Sensor Networks

    NARCIS (Netherlands)

    Marin Perianu, Raluca; Scholten, Johan; Havinga, Paul J.M.; Hartel, Pieter H.

    2006-01-01

    This paper evaluates the performance in terms of resource consumption of a service discovery protocol proposed for heterogeneous Wireless Sensor Networks (WSNs). The protocol is based on a clustering structure, which facilitates the construction of a distributed directory. Nodes with higher

  3. Using Cluster Analysis to Segment Students Based on Self-Reported Emotionally Intelligent Leadership Behaviors

    Science.gov (United States)

    Facca, Tina M.; Allen, Scott J.

    2011-01-01

    Using emotionally intelligent leadership (EIL) as the model, the authors identify behaviors that three levels of leaders engage in based on a self-report inventory (Emotionally Intelligent Leadership for Students-Inventory). Three clusters of students are identified: those that are "Less-involved, Less Others-oriented,"…

  4. An octacobalt cluster based, (3,12)-connected, magnetic, porous coordination polymer.

    Science.gov (United States)

    Hou, Lei; Zhang, Wei-Xiong; Zhang, Jie-Peng; Xue, Wei; Zhang, Yue-Biao; Chen, Xiao-Ming

    2010-09-14

    Solvothermal reaction of CoSO4 with 2,6-di-p-carboxyphenyl-4,4'-bipyridine affords a novel, octacobalt cluster based, (3,12)-connected porous framework, which exhibits gas sorption and spin-glassy magnetic behaviour.

  5. The impact of semantic document expansion on cluster-based fusion for microblog search

    NARCIS (Netherlands)

    Liang, S.; Ren, Z.; de Rijke, M.; de Rijke, M.; Kenter, T.; de Vries, A.P.; Zhai, C.X.; de Jong, F.; Radinsky, K.; Hofmann, K.

    2014-01-01

    Searching microblog posts, with their limited length and creative language usage, is challenging. We frame the microblog search problem as a data fusion problem. We examine the effectiveness of a recent cluster-based fusion method on the task of retrieving microblog posts. We find that in the

  6. An integrated approach to fingerprint indexing using spectral clustering based on minutiae points

    CSIR Research Space (South Africa)

    Mngenge, NA

    2015-07-01

    Full Text Available. Conference paper, 28-30 July 2015, London, UK.

  7. Refined tropical curve counts and canonical bases for quantum cluster algebras

    DEFF Research Database (Denmark)

    Mandel, Travis

    We express the (quantizations of the) Gross-Hacking-Keel-Kontsevich canonical bases for cluster algebras in terms of certain (Block-Göttsche) weighted counts of tropical curves. In the process, we obtain via scattering diagram techniques a new invariance result for these Block-Göttsche counts....

  8. Clustering cliques for graph-based summarization of the biomedical research literature

    DEFF Research Database (Denmark)

    Zhang, Han; Fiszman, Marcelo; Shin, Dongwook

    2013-01-01

    Background: Graph-based notions are increasingly used in biomedical data mining and knowledge discovery tasks. In this paper, we present a clique-clustering method to automatically summarize graphs of semantic predications produced from PubMed citations (titles and abstracts). Results: Sem...

  9. Evaluation of BLAST-based edge-weighting metrics used for homology inference with the Markov Clustering algorithm.

    Science.gov (United States)

    Gibbons, Theodore R; Mount, Stephen M; Cooper, Endymion D; Delwiche, Charles F

    2015-07-10

    Clustering protein sequences according to inferred homology is a fundamental step in the analysis of many large data sets. Since the publication of the Markov Clustering (MCL) algorithm in 2002, it has been the centerpiece of several popular applications. Each of these approaches generates an undirected graph that represents sequences as nodes connected to each other by edges weighted with a BLAST-based metric. MCL is then used to infer clusters of homologous proteins by analyzing these graphs. The various approaches differ only by how they weight the edges, yet there has been very little direct examination of the relative performance of alternative edge-weighting metrics. This study compares the performance of four BLAST-based edge-weighting metrics: the bit score, bit score ratio (BSR), bit score over anchored length (BAL), and negative common log of the expectation value (NLE). Performance is tested using the Extended CEGMA KOGs (ECK) database, which we introduce here. All metrics performed similarly when analyzing full-length sequences, but dramatic differences emerged as progressively larger fractions of the test sequences were split into fragments. The BSR and BAL successfully rescued subsets of clusters by strengthening certain types of alignments between fragmented sequences, but also shifted the largest correct scores down near the range of scores generated from spurious alignments. This penalty outweighed the benefits in most test cases, and was greatly exacerbated by increasing the MCL inflation parameter, making these metrics less robust than the bit score or the more popular NLE. Notably, the bit score performed as well or better than the other three metrics in all scenarios. The results provide a strong case for use of the bit score, which appears to offer equivalent or superior performance to the more popular NLE. The insight that MCL-based clustering methods can be improved using a more tractable edge-weighting metric will greatly simplify future
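
    The four edge-weighting metrics compared above can be written down compactly; the sketch below assumes precomputed BLAST bit scores, self-scores and E-values, and the argument names and exact normalisation are illustrative rather than taken from the paper:

        import math

        def edge_weights(bit_ab, bit_aa, bit_bb, evalue, anchored_len=None):
            # bit score, bit score ratio (BSR), negative common log of the
            # E-value (NLE) and, if a length is supplied, bit score over
            # anchored length (BAL); tiny E-values are capped before the log
            w = {"bit": bit_ab,
                 "BSR": bit_ab / min(bit_aa, bit_bb),
                 "NLE": -math.log10(max(evalue, 1e-180))}
            if anchored_len:
                w["BAL"] = bit_ab / anchored_len
            return w

        print(edge_weights(bit_ab=240.0, bit_aa=410.0, bit_bb=395.0,
                           evalue=3e-62, anchored_len=180))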

  10. Simulation-based Testing of Control Software

    Energy Technology Data Exchange (ETDEWEB)

    Ozmen, Ozgur [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Nutaro, James J. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Sanyal, Jibonananda [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Olama, Mohammed M. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)

    2017-02-10

    It is impossible to adequately test complex software by examining its operation in a physical prototype of the system monitored. Adequate test coverage can require millions of test cases, and the cost of equipment prototypes combined with the real-time constraints of testing with them makes it infeasible to sample more than a small number of these tests. Model-based testing seeks to avoid this problem by allowing for large numbers of relatively inexpensive virtual prototypes that operate in simulation time at a speed limited only by the available computing resources. In this report, we describe how a computer system emulator can be used as part of a model-based testing environment; specifically, we show that a complete software stack, including the operating system and application software, can be deployed within a simulated environment, and that these simulations can proceed as fast as possible. To illustrate this approach to model-based testing, we describe how it is being used to test several building control systems that act to coordinate air conditioning loads for the purpose of reducing peak demand. These tests involve the use of ADEVS (A Discrete Event System Simulator) and QEMU (Quick Emulator) to host the operational software within the simulation, and a building model developed with the MODELICA programming language using the Buildings Library and packaged as an FMU (Functional Mock-up Unit) that serves as the virtual test environment.

  11. Model-based testing for embedded systems

    CERN Document Server

    Zander, Justyna; Mosterman, Pieter J

    2011-01-01

    What the experts have to say about Model-Based Testing for Embedded Systems: "This book is exactly what is needed at the exact right time in this fast-growing area. From its beginnings over 10 years ago of deriving tests from UML statecharts, model-based testing has matured into a topic with both breadth and depth. Testing embedded systems is a natural application of MBT, and this book hits the nail exactly on the head. Numerous topics are presented clearly, thoroughly, and concisely in this cutting-edge book. The authors are world-class leading experts in this area and teach us well-used

  12. A three-stage strategy for optimal price offering by a retailer based on clustering techniques

    International Nuclear Information System (INIS)

    Mahmoudi-Kohan, N.; Shayesteh, E.; Moghaddam, M. Parsa; Sheikh-El-Eslami, M.K.

    2010-01-01

    In this paper, an innovative strategy for optimal price offering to customers for maximizing the profit of a retailer is proposed. This strategy is based on load profile clustering techniques and includes three stages. For the purpose of clustering, an improved weighted fuzzy average K-means is proposed. Also, in this paper a new acceptance function for increasing the profit of the retailer is proposed. The new method is evaluated by implementation on a group of 300 customers of a 20 kV distribution network. (author)

  13. An Intrusion Detection System Based on Multi-Level Clustering for Hierarchical Wireless Sensor Networks.

    Science.gov (United States)

    Butun, Ismail; Ra, In-Ho; Sankar, Ravi

    2015-11-17

    In this work, an intrusion detection system (IDS) framework based on multi-level clustering for hierarchical wireless sensor networks is proposed. The framework employs two types of intrusion detection approaches: (1) "downward-IDS (D-IDS)" to detect the abnormal behavior (intrusion) of the subordinate (member) nodes; and (2) "upward-IDS (U-IDS)" to detect the abnormal behavior of the cluster heads. By using analytical calculations, the optimum parameters for the D-IDS (number of maximum hops) and U-IDS (monitoring group size) of the framework are evaluated and presented.

  14. An Intrusion Detection System Based on Multi-Level Clustering for Hierarchical Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Ismail Butun

    2015-11-01

    Full Text Available In this work, an intrusion detection system (IDS) framework based on multi-level clustering for hierarchical wireless sensor networks is proposed. The framework employs two types of intrusion detection approaches: (1) "downward-IDS (D-IDS)" to detect the abnormal behavior (intrusion) of the subordinate (member) nodes; and (2) "upward-IDS (U-IDS)" to detect the abnormal behavior of the cluster heads. By using analytical calculations, the optimum parameters for the D-IDS (number of maximum hops) and U-IDS (monitoring group size) of the framework are evaluated and presented.

  15. Selection of representative embankments based on rough set - fuzzy clustering method

    Science.gov (United States)

    Bin, Ou; Lin, Zhi-xiang; Fu, Shu-yan; Gao, Sheng-song

    2018-02-01

    The selection of representative unit embankments is a prerequisite for the comprehensive evaluation of embankment safety; on the basis of dividing the levee into unit embankments, the influencing factors and the classification of the unit embankments are drafted. Based on rough set - fuzzy clustering, the influencing factors of each unit embankment are measured by quantitative and qualitative indexes. A fuzzy similarity matrix of the unit embankments is constructed, and the fuzzy equivalence matrix is then calculated from the fuzzy similarity matrix by the squaring method. By setting a threshold on the fuzzy equivalence matrix, the unit embankments are clustered, and the representative unit embankments are selected from the resulting classification.
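
    A small Python sketch of the matrix steps described above: max-min "squaring" turns a fuzzy similarity matrix into a fuzzy equivalence matrix, and a threshold (lambda cut) then groups the unit embankments. The matrix R and the threshold value are placeholders:

        import numpy as np

        def fuzzy_equivalence(R, tol=1e-9):
            # repeated max-min composition R o R until the matrix stops changing
            while True:
                R2 = np.max(np.minimum(R[:, :, None], R[None, :, :]), axis=1)
                if np.allclose(R2, R, atol=tol):
                    return R2
                R = R2

        def lambda_cut_clusters(Req, lam):
            # units i and j fall in the same class when Req[i, j] >= lam
            labels = -np.ones(len(Req), dtype=int)
            current = 0
            for i in range(len(Req)):
                if labels[i] < 0:
                    labels[np.where(Req[i] >= lam)[0]] = current
                    current += 1
            return labels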

  16. Moving toward endotypes in atopic dermatitis: Identification of patient clusters based on serum biomarker analysis.

    Science.gov (United States)

    Thijs, Judith L; Strickland, Ian; Bruijnzeel-Koomen, Carla A F M; Nierkens, Stefan; Giovannone, Barbara; Csomor, Eszter; Sellman, Bret R; Mustelin, Tomas; Sleeman, Matthew A; de Bruin-Weller, Marjolein S; Herath, Athula; Drylewicz, Julia; May, Richard D; Hijnen, DirkJan

    2017-09-01

    Atopic dermatitis (AD) is a complex, chronic, inflammatory skin disease with a diverse clinical presentation. However, it is unclear whether this diversity exists at a biological level. We sought to test the hypothesis that AD is heterogeneous at the biological level of individual inflammatory mediators. Sera from 193 adult patients with moderate-to-severe AD (six area, six sign atopic dermatitis [SASSAD] score: geometric mean, 22.3 [95% CI, 21.3-23.3] and 39.1 [95% CI, 37.5-40.9], respectively) and 30 healthy control subjects without AD were analyzed for 147 serum mediators, total IgE levels, and 130 allergen-specific IgE levels. Population heterogeneity was assessed by using principal component analysis, followed by unsupervised k-means cluster analysis of the principal components. Patients with AD showed pronounced evidence of inflammation compared with healthy control subjects. Principal component analysis of data on sera from patients with AD revealed the presence of 4 potential clusters. Fifty-seven principal components described approximately 90% of the variance. Unsupervised k-means cluster analysis of the 57 largest principal components delivered 4 distinct clusters of patients with AD. Cluster 1 had high SASSAD scores and body surface areas with the highest levels of pulmonary and activation-regulated chemokine, tissue inhibitor of metalloproteinases 1, and soluble CD14. Cluster 2 had low SASSAD scores with the lowest levels of IFN-α, tissue inhibitor of metalloproteinases 1, and vascular endothelial growth factor. Cluster 3 had high SASSAD scores with the lowest levels of IFN-β, IL-1, and epithelial cytokines. Cluster 4 had low SASSAD scores but the highest levels of the inflammatory markers IL-1, IL-4, IL-13, and thymic stromal lymphopoietin. AD is a heterogeneous disease both clinically and biologically. Four distinct clusters of patients with AD have been identified that could represent endotypes with unique biological mechanisms. Elucidation of

  17. Creating multithemed ecological regions for macroscale ecology: Testing a flexible, repeatable, and accessible clustering method

    Science.gov (United States)

    Cheruvelil, Kendra Spence; Yuan, Shuai; Webster, Katherine E.; Tan, Pang-Ning; Lapierre, Jean-Francois; Collins, Sarah M.; Fergus, C. Emi; Scott, Caren E.; Norton Henry, Emily; Soranno, Patricia A.; Filstrup, Christopher T.; Wagner, Tyler

    2017-01-01

    Understanding broad-scale ecological patterns and processes often involves accounting for regional-scale heterogeneity. A common way to do so is to include ecological regions in sampling schemes and empirical models. However, most existing ecological regions were developed for specific purposes, using a limited set of geospatial features and irreproducible methods. Our study purpose was to: (1) describe a method that takes advantage of recent computational advances and increased availability of regional and global data sets to create customizable and reproducible ecological regions, (2) make this algorithm available for use and modification by others studying different ecosystems, variables of interest, study extents, and macroscale ecology research questions, and (3) demonstrate the power of this approach for the research question—How well do these regions capture regional-scale variation in lake water quality? To achieve our purpose we: (1) used a spatially constrained spectral clustering algorithm that balances geospatial homogeneity and region contiguity to create ecological regions using multiple terrestrial, climatic, and freshwater geospatial data for 17 northeastern U.S. states (~1,800,000 km2); (2) identified which of the 52 geospatial features were most influential in creating the resulting 100 regions; and (3) tested the ability of these ecological regions to capture regional variation in water nutrients and clarity for ~6,000 lakes. We found that: (1) a combination of terrestrial, climatic, and freshwater geospatial features influenced region creation, suggesting that the oft-ignored freshwater landscape provides novel information on landscape variability not captured by traditionally used climate and terrestrial metrics; and (2) the delineated regions captured macroscale heterogeneity in ecosystem properties not included in region delineation—approximately 40% of the variation in total phosphorus and water clarity among lakes was at the regional
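
    As a rough stand-in for the spatially constrained spectral clustering used above (not the authors' implementation), one can blend a feature-similarity kernel with a spatial adjacency matrix and cluster the combined affinity; all names and the blending weight are assumptions:

        import numpy as np
        from sklearn.cluster import SpectralClustering
        from sklearn.metrics.pairwise import rbf_kernel

        def constrained_regions(features, adjacency, n_regions=100, tradeoff=0.5):
            # trade off geospatial homogeneity (RBF similarity of the geospatial
            # features) against contiguity (0/1 spatial adjacency of map units)
            affinity = tradeoff * rbf_kernel(features) + (1 - tradeoff) * adjacency
            model = SpectralClustering(n_clusters=n_regions, affinity="precomputed",
                                       random_state=0)
            return model.fit_predict(affinity)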

  18. Investigation on IMCP based clustering in LTE-M communication for smart metering applications

    Directory of Open Access Journals (Sweden)

    Kartik Vishal Deshpande

    2017-06-01

    Full Text Available Machine to Machine (M2M) is foreseen as an emerging technology for smart metering applications, where devices communicate seamlessly for information transfer. M2M communication makes use of long term evolution (LTE) as its backbone network, and this results in the long-term evolution for machine-type communication (LTE-M) network. As a huge number of M2M devices must be handled by a single eNB (evolved Node B), clustering is exploited for efficient processing of the network. This paper investigates the proposed Improved M2M Clustering Process (IMCP) based clustering technique, and it is compared with two well-known clustering algorithms, namely the Low Energy Adaptive Clustering Hierarchical (LEACH) and Energy Aware Multihop Multipath Hierarchical (EAMMH) techniques. Further, the IMCP algorithm is analyzed with two-tier and three-tier M2M systems for various mobility conditions. The proposed IMCP algorithm improves the last node death by 63.15% and 51.61% compared to LEACH and EAMMH, respectively. Further, the average energy of each node in IMCP is increased by 89.85% and 81.15% compared to LEACH and EAMMH, respectively.

  19. An adaptive enhancement algorithm for infrared video based on modified k-means clustering

    Science.gov (United States)

    Zhang, Linze; Wang, Jingqi; Wu, Wen

    2016-09-01

    In this paper, we propose a video enhancement algorithm to improve the output video of an infrared camera. The video obtained by an infrared camera is sometimes very dark when there is no clear target. In this case, the infrared video is divided into frame images by frame extraction so that image enhancement can be carried out. The first frame image is divided into k sub-images by K-means clustering according to the gray intervals they occupy, and histogram equalization is then applied to each sub-image according to the amount of information it contains; we also use a method to solve the problem that the final cluster centers can lie close to each other in some cases. For the other frame images, the initial cluster centers are determined by the final cluster centers of the previous frame, and the histogram equalization of each sub-image is carried out after image segmentation based on K-means clustering. The histogram equalization maps the gray values of the image onto the whole gray-level range, and the gray levels allocated to each sub-image are determined by its ratio of pixels to the frame image. Experimental results show that this algorithm can improve the contrast of infrared video in which the night target is not obvious and the scene is therefore dim, and can adaptively reduce the negative effect of overexposed pixels within a certain range.
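
    A simplified Python sketch of the per-frame procedure (K-means over gray levels, equalisation inside each gray interval, and re-use of the previous frame's centres); the equalisation shown is a basic rank-based version, and details such as the information-based gray-level allocation are omitted:

        import numpy as np
        from sklearn.cluster import KMeans

        def enhance_frame(frame, k=3, prev_centres=None):
            pix = frame.reshape(-1, 1).astype(float)
            init = prev_centres if prev_centres is not None else "k-means++"
            km = KMeans(n_clusters=k, init=init,
                        n_init=1 if prev_centres is not None else 10,
                        random_state=0).fit(pix)
            out = np.empty_like(pix)
            for c in range(k):
                idx = km.labels_ == c
                vals = pix[idx, 0]
                lo, hi = vals.min(), vals.max()
                ranks = np.argsort(np.argsort(vals))       # rank-based equalisation
                out[idx, 0] = lo + (hi - lo) * ranks / max(len(vals) - 1, 1)
            return out.reshape(frame.shape), km.cluster_centers_

        # frames after the first pass the returned centres back in as prev_centres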

  20. Nonnegative Matrix Factorization-Based Spatial-Temporal Clustering for Multiple Sensor Data Streams

    Directory of Open Access Journals (Sweden)

    Di-Hua Sun

    2014-01-01

    Full Text Available Cyber physical systems have grown exponentially and have been attracting a lot of attention over the last few years. Retrieving and mining useful information from massive amounts of sensor data streams with spatial, temporal, and other multidimensional information has become an active research area. Moreover, recent research has shown that clusters of streams change with a comprehensive spatial-temporal viewpoint in real applications. In this paper, we propose a spatial-temporal clustering algorithm (STClu) based on nonnegative matrix trifactorization that utilizes time-series observational data streams and geospatial relationships for clustering multiple sensor data streams. Instead of directly clustering multiple data streams periodically, STClu incorporates the spatial relationship between two sensors in proximity and takes the historical information into consideration. Furthermore, we develop an efficient iterative updating optimization algorithm for STClu. The effectiveness and efficiency of the algorithm are both demonstrated in experiments on real and synthetic data sets. The results show that the proposed STClu algorithm outperforms existing methods for clustering sensor data streams.

  1. A genomics based discovery of secondary metabolite biosynthetic gene clusters in Aspergillus ustus.

    Directory of Open Access Journals (Sweden)

    Borui Pi

    Full Text Available Secondary metabolites (SMs) produced by Aspergillus have been extensively studied for their crucial roles in human health, medicine and industrial production. However, the resulting information is almost exclusively derived from a few model organisms, including A. nidulans and A. fumigatus, and little is known about rare pathogens. In this study, we performed a genomics based discovery of SM biosynthetic gene clusters in Aspergillus ustus, a rare human pathogen. A total of 52 gene clusters were identified in the draft genome of A. ustus 3.3904, such as the sterigmatocystin biosynthesis pathway that is commonly found in Aspergillus species. In addition, several SM biosynthetic gene clusters were identified in Aspergillus for the first time that were possibly acquired by horizontal gene transfer, including the vrt cluster that is responsible for viridicatumtoxin production. Comparative genomics revealed that A. ustus shared the largest number of SM biosynthetic gene clusters with A. nidulans, but much fewer with other Aspergilli like A. niger and A. oryzae. These findings will help to understand the diversity and evolution of SM biosynthesis pathways in the genus Aspergillus, and we hope they will also promote the development of fungal identification methodology in the clinic.

  2. Form gene clustering method about pan-ethnic-group products based on emotional semantic

    Science.gov (United States)

    Chen, Dengkai; Ding, Jingjing; Gao, Minzhuo; Ma, Danping; Liu, Donghui

    2016-09-01

    The use of pan-ethnic-group product form knowledge primarily depends on a designer's subjective experience, without user participation. The majority of studies focus primarily on detecting the perceptual demands of consumers for the target product category. Here, a form gene clustering method for pan-ethnic-group products based on emotional semantics is constructed. Consumers' perceptual images of the pan-ethnic-group products are obtained by means of product form gene extraction and coding and computer-aided product form clustering technology. A case of form gene clustering for typical pan-ethnic-group products is investigated, which indicates that the method is feasible. This paper opens up a new direction for the future development of product form design and improves the agility of the product design process in the era of Industry 4.0.

  3. 3D Building Models Segmentation Based on K-Means++ Cluster Analysis

    Science.gov (United States)

    Zhang, C.; Mao, B.

    2016-10-01

    3D mesh model segmentation has been drawing increasing attention from the digital geometry processing field in recent years. The original 3D mesh model needs to be divided into separate meaningful parts or surface patches based on certain standards to support reconstruction, compression, texture mapping, model retrieval and so on. Therefore, segmentation is a key problem in 3D mesh model processing. In this paper, we propose a method to segment Collada (a type of mesh model) 3D building models into meaningful parts using cluster analysis. Common clustering methods segment 3D mesh models by K-means, whose performance depends heavily on the randomized initial seed points (i.e., centroids), and different randomized centroids can give quite different results. Therefore, we improved the existing method and used the K-means++ clustering algorithm to solve this problem. Our experiments show that K-means++ improves both the speed and the accuracy of K-means and achieves good and meaningful results.
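
    The k-means++ seeding that replaces the randomized centroids can be sketched in a few lines of Python (a textbook version, independent of the Collada-specific pipeline above):

        import numpy as np

        def kmeanspp_init(X, k, seed=0):
            # each new centre is drawn with probability proportional to its
            # squared distance from the nearest centre chosen so far
            rng = np.random.default_rng(seed)
            centres = [X[rng.integers(len(X))]]
            for _ in range(k - 1):
                diff = X[:, None, :] - np.asarray(centres)[None, :, :]
                d2 = np.min((diff ** 2).sum(axis=-1), axis=1)
                centres.append(X[rng.choice(len(X), p=d2 / d2.sum())])
            return np.asarray(centres)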

  4. 3D BUILDING MODELS SEGMENTATION BASED ON K-MEANS++ CLUSTER ANALYSIS

    Directory of Open Access Journals (Sweden)

    C. Zhang

    2016-10-01

    Full Text Available 3D mesh model segmentation has been drawing increasing attention from the digital geometry processing field in recent years. The original 3D mesh model needs to be divided into separate meaningful parts or surface patches based on certain standards to support reconstruction, compression, texture mapping, model retrieval and so on. Therefore, segmentation is a key problem in 3D mesh model processing. In this paper, we propose a method to segment Collada (a type of mesh model) 3D building models into meaningful parts using cluster analysis. Common clustering methods segment 3D mesh models by K-means, whose performance depends heavily on the randomized initial seed points (i.e., centroids), and different randomized centroids can give quite different results. Therefore, we improved the existing method and used the K-means++ clustering algorithm to solve this problem. Our experiments show that K-means++ improves both the speed and the accuracy of K-means and achieves good and meaningful results.

  5. WebGimm: An integrated web-based platform for cluster analysis, functional analysis, and interactive visualization of results

    Directory of Open Access Journals (Sweden)

    Medvedovic Mario

    2011-01-01

    Full Text Available Abstract Cluster analysis methods have been extensively researched, but the adoption of new methods is often hindered by technical barriers in their implementation and use. WebGimm is a free cluster analysis web-service, and an open source general purpose clustering web-server infrastructure designed to facilitate easy deployment of integrated cluster analysis servers based on clustering and functional annotation algorithms implemented in R. Integrated functional analyses and interactive browsing of both, clustering structure and functional annotations provides a complete analytical environment for cluster analysis and interpretation of results. The Java Web Start client-based interface is modeled after the familiar cluster/treeview packages making its use intuitive to a wide array of biomedical researchers. For biomedical researchers, WebGimm provides an avenue to access state of the art clustering procedures. For Bioinformatics methods developers, WebGimm offers a convenient avenue to deploy their newly developed clustering methods. WebGimm server, software and manuals can be freely accessed at http://ClusterAnalysis.org/.

  6. WebGimm: An integrated web-based platform for cluster analysis, functional analysis, and interactive visualization of results.

    Science.gov (United States)

    Joshi, Vineet K; Freudenberg, Johannes M; Hu, Zhen; Medvedovic, Mario

    2011-01-17

    Cluster analysis methods have been extensively researched, but the adoption of new methods is often hindered by technical barriers in their implementation and use. WebGimm is a free cluster analysis web-service, and an open source general purpose clustering web-server infrastructure designed to facilitate easy deployment of integrated cluster analysis servers based on clustering and functional annotation algorithms implemented in R. Integrated functional analyses and interactive browsing of both, clustering structure and functional annotations provides a complete analytical environment for cluster analysis and interpretation of results. The Java Web Start client-based interface is modeled after the familiar cluster/treeview packages making its use intuitive to a wide array of biomedical researchers. For biomedical researchers, WebGimm provides an avenue to access state of the art clustering procedures. For Bioinformatics methods developers, WebGimm offers a convenient avenue to deploy their newly developed clustering methods. WebGimm server, software and manuals can be freely accessed at http://ClusterAnalysis.org/.

  7. A Combinational Clustering Based Method for cDNA Microarray Image Segmentation.

    Directory of Open Access Journals (Sweden)

    Guifang Shao

    Full Text Available Microarray technology plays an important role in drawing useful biological conclusions by analyzing thousands of gene expressions simultaneously. Especially, image analysis is a key step in microarray analysis and its accuracy strongly depends on segmentation. The pioneering works of clustering based segmentation have shown that k-means clustering algorithm and moving k-means clustering algorithm are two commonly used methods in microarray image processing. However, they usually face unsatisfactory results because the real microarray image contains noise, artifacts and spots that vary in size, shape and contrast. To improve the segmentation accuracy, in this article we present a combination clustering based segmentation approach that may be more reliable and able to segment spots automatically. First, this new method starts with a very simple but effective contrast enhancement operation to improve the image quality. Then, an automatic gridding based on the maximum between-class variance is applied to separate the spots into independent areas. Next, among each spot region, the moving k-means clustering is first conducted to separate the spot from background and then the k-means clustering algorithms are combined for those spots failing to obtain the entire boundary. Finally, a refinement step is used to replace the false segmentation and the inseparable ones of missing spots. In addition, quantitative comparisons between the improved method and the other four segmentation algorithms--edge detection, thresholding, k-means clustering and moving k-means clustering--are carried out on cDNA microarray images from six different data sets. Experiments on six different data sets, 1 Stanford Microarray Database (SMD, 2 Gene Expression Omnibus (GEO, 3 Baylor College of Medicine (BCM, 4 Swiss Institute of Bioinformatics (SIB, 5 Joe DeRisi's individual tiff files (DeRisi, and 6 University of California, San Francisco (UCSF, indicate that the improved

  8. A Combinational Clustering Based Method for cDNA Microarray Image Segmentation.

    Science.gov (United States)

    Shao, Guifang; Li, Tiejun; Zuo, Wangda; Wu, Shunxiang; Liu, Tundong

    2015-01-01

    Microarray technology plays an important role in drawing useful biological conclusions by analyzing thousands of gene expressions simultaneously. Especially, image analysis is a key step in microarray analysis and its accuracy strongly depends on segmentation. The pioneering works of clustering based segmentation have shown that k-means clustering algorithm and moving k-means clustering algorithm are two commonly used methods in microarray image processing. However, they usually face unsatisfactory results because the real microarray image contains noise, artifacts and spots that vary in size, shape and contrast. To improve the segmentation accuracy, in this article we present a combination clustering based segmentation approach that may be more reliable and able to segment spots automatically. First, this new method starts with a very simple but effective contrast enhancement operation to improve the image quality. Then, an automatic gridding based on the maximum between-class variance is applied to separate the spots into independent areas. Next, among each spot region, the moving k-means clustering is first conducted to separate the spot from background and then the k-means clustering algorithms are combined for those spots failing to obtain the entire boundary. Finally, a refinement step is used to replace the false segmentation and the inseparable ones of missing spots. In addition, quantitative comparisons between the improved method and the other four segmentation algorithms--edge detection, thresholding, k-means clustering and moving k-means clustering--are carried out on cDNA microarray images from six different data sets. Experiments on six different data sets, 1) Stanford Microarray Database (SMD), 2) Gene Expression Omnibus (GEO), 3) Baylor College of Medicine (BCM), 4) Swiss Institute of Bioinformatics (SIB), 5) Joe DeRisi's individual tiff files (DeRisi), and 6) University of California, San Francisco (UCSF), indicate that the improved approach is

  9. Cluster-based control of a separating flow over a smoothly contoured ramp

    Science.gov (United States)

    Kaiser, Eurika; Noack, Bernd R.; Spohn, Andreas; Cattafesta, Louis N.; Morzyński, Marek

    2017-12-01

    The ability to manipulate and control fluid flows is of great importance in many scientific and engineering applications. The proposed closed-loop control framework addresses a key issue of model-based control: The actuation effect often results from slow dynamics of strongly nonlinear interactions which the flow reveals at timescales much longer than the prediction horizon of any model. Hence, we employ a probabilistic approach based on a cluster-based discretization of the Liouville equation for the evolution of the probability distribution. The proposed methodology frames high-dimensional, nonlinear dynamics into low-dimensional, probabilistic, linear dynamics which considerably simplifies the optimal control problem while preserving nonlinear actuation mechanisms. The data-driven approach builds upon a state space discretization using a clustering algorithm which groups kinematically similar flow states into a low number of clusters. The temporal evolution of the probability distribution on this set of clusters is then described by a control-dependent Markov model. This Markov model can be used as predictor for the ergodic probability distribution for a particular control law. This probability distribution approximates the long-term behavior of the original system on which basis the optimal control law is determined. We examine how the approach can be used to improve the open-loop actuation in a separating flow dominated by Kelvin-Helmholtz shedding. For this purpose, the feature space, in which the model is learned, and the admissible control inputs are tailored to strongly oscillatory flows.
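
    A minimal Python sketch of the data-driven part of such a cluster-based model (clustering snapshots and estimating the cluster transition matrix); the snapshot matrix is a placeholder and the control dependence is omitted:

        import numpy as np
        from sklearn.cluster import KMeans

        def cluster_markov_model(snapshots, n_clusters=10):
            # snapshots: time-ordered array, one flow state (feature vector) per row
            labels = KMeans(n_clusters=n_clusters, n_init=10,
                            random_state=0).fit_predict(snapshots)
            P = np.zeros((n_clusters, n_clusters))
            for a, b in zip(labels[:-1], labels[1:]):
                P[a, b] += 1.0
            P /= np.maximum(P.sum(axis=1, keepdims=True), 1.0)   # row-stochastic
            return labels, P

        # the long-run (ergodic) distribution used to compare control laws is the
        # stationary distribution of P (left eigenvector for eigenvalue 1)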

  10. Cluster based statistical feature extraction method for automatic bleeding detection in wireless capsule endoscopy video.

    Science.gov (United States)

    Ghosh, Tonmoy; Fattah, Shaikh Anowarul; Wahid, Khan A; Zhu, Wei-Ping; Ahmad, M Omair

    2018-03-01

    Wireless capsule endoscopy (WCE) is capable of visualizing the entire gastrointestinal tract, at the expense of an exhaustive reviewing process for detecting bleeding disorders. The main objective is to develop an automatic method for identifying the bleeding frames and zones from WCE video. Different statistical features are extracted from the overlapping spatial blocks of the preprocessed WCE image in a transformed color plane containing the green-to-red pixel ratio. The unique idea of the proposed method is to first perform unsupervised clustering of the different blocks to obtain two clusters and then extract cluster-based features (CBFs). Finally, a global feature consisting of the CBFs and a differential CBF is used to detect bleeding frames via supervised classification. In order to handle continuous WCE video, a post-processing scheme is introduced that utilizes the feature trends in neighboring frames. The CBF, along with some morphological operations, is employed to identify bleeding zones. Based on extensive experimentation on several WCE videos, it is found that the proposed method offers significantly better performance than some existing methods in terms of bleeding detection accuracy, sensitivity, specificity and precision in bleeding zone detection. It is found that the bleeding detection performance obtained by using the proposed CBF-based global feature is better than that of the feature extracted from the non-clustered image. The proposed method can reduce the burden on physicians in investigating WCE video to detect bleeding frames and zones with a high level of accuracy. Copyright © 2018 Elsevier Ltd. All rights reserved.
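
    An illustrative Python sketch of the block-and-cluster feature idea (green-to-red ratio plane, per-block statistics, two-cluster K-means, cluster-based features); the block size, the choice of statistics and the downstream classifier are placeholders:

        import numpy as np
        from sklearn.cluster import KMeans

        def cluster_based_features(rgb, block=32):
            ratio = rgb[..., 1] / np.maximum(rgb[..., 0], 1.0)     # G/R plane
            h, w = ratio.shape
            feats = []
            for i in range(0, h - block + 1, block):
                for j in range(0, w - block + 1, block):
                    blk = ratio[i:i + block, j:j + block]
                    feats.append([blk.mean(), blk.std(), blk.min(), blk.max()])
            feats = np.asarray(feats)
            labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(feats)
            # CBF: the mean statistics of each of the two block clusters
            return np.concatenate([feats[labels == c].mean(axis=0) for c in (0, 1)])

        # the CBF vector (plus a differential CBF across frames) would feed a
        # supervised classifier for frame-level bleeding detection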

  11. Image Retrieval Based on Multiview Constrained Nonnegative Matrix Factorization and Gaussian Mixture Model Spectral Clustering Method

    Directory of Open Access Journals (Sweden)

    Qunyi Xie

    2016-01-01

    Full Text Available Content-based image retrieval has recently become an important research topic and has been widely used for managing images from repositories. In this article, we address an efficient technique, called MNGS, which integrates multiview constrained nonnegative matrix factorization (NMF) and Gaussian mixture model (GMM)-based spectral clustering for image retrieval. In the proposed methodology, the multiview NMF scheme provides competitive sparse representations of underlying images through decomposition of a similarity-preserving matrix that is formed by fusing multiple features from different visual aspects. In particular, the proposed method merges manifold constraints into the standard NMF objective function to impose an orthogonality constraint on the basis matrix and satisfy the structure preservation requirement of the coefficient matrix. To manipulate the clustering method on sparse representations, this paper has developed a GMM-based spectral clustering method in which the Gaussian components are regrouped in spectral space, which significantly improves the retrieval effectiveness. In this way, image retrieval of the whole database translates to a nearest-neighbour search in the cluster containing the query image. Simultaneously, this study investigates the proof of convergence of the objective function and the analysis of the computational complexity. Experimental results on three standard image datasets reveal the advantages that can be achieved with the proposed retrieval scheme.
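
    The GMM-based spectral clustering component can be sketched on its own: build a similarity matrix from image features, embed the images with the leading eigenvectors of the normalised graph Laplacian, and regroup them with a Gaussian mixture in that spectral space. The snippet below is a generic, hedged sketch of that step only (RBF similarity, cluster count and random features are illustrative assumptions); the multiview constrained NMF representation used in the paper is not reproduced here.

        # Generic sketch of GMM-based spectral clustering.
        import numpy as np
        from sklearn.metrics.pairwise import rbf_kernel
        from sklearn.mixture import GaussianMixture

        def gmm_spectral_clustering(X, n_clusters=5, gamma=1.0):
            S = rbf_kernel(X, gamma=gamma)                         # similarity matrix
            d = S.sum(axis=1)
            L_sym = np.eye(len(X)) - S / np.sqrt(np.outer(d, d))   # normalised Laplacian
            _, eigvecs = np.linalg.eigh(L_sym)
            U = eigvecs[:, :n_clusters]                            # smallest eigenvectors
            U /= np.linalg.norm(U, axis=1, keepdims=True) + 1e-12  # row-normalise
            gmm = GaussianMixture(n_components=n_clusters, random_state=0).fit(U)
            return gmm.predict(U)

        X = np.random.rand(300, 64)                                # placeholder image features
        print(np.bincount(gmm_spectral_clustering(X)))

    Retrieval then reduces to a nearest-neighbour search within the cluster assigned to the query image, as described in the abstract.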

  12. Implementation of a couple-based HIV prevention program: a cluster randomized trial comparing manual versus Web-based approaches.

    Science.gov (United States)

    Witte, Susan S; Wu, Elwin; El-Bassel, Nabila; Hunt, Timothy; Gilbert, Louisa; Medina, Katie Potocnik; Chang, Mingway; Kelsey, Ryan; Rowe, Jessica; Remien, Robert

    2014-09-11

    Despite great need, the number of HIV prevention implementation studies remains limited. The challenge for researchers, in this time of limited HIV services agency resources, is to conceptualize and test how to disseminate efficacious, practical, and sustainable prevention programs more rapidly, and to understand how to do so in the absence of additional agency resources. We tested whether training and technical assistance (TA) in a couple-based HIV prevention program using a Web-based modality would yield greater adoption of the program than training and TA in the same program in a manual-based modality among facilitators who delivered the interventions at 80 agencies in New York State. This study used a cluster randomized controlled design. Participants were HIV services agencies (N = 80) and up to 6 staff members at each agency (N = 253). Agencies were recruited, matched on key variables, and randomly assigned to the two conditions. Staff members participated in a four-day, face-to-face training session, followed by TA calls at two and four months, and follow-up assessments at 6, 12, and 18 months post-training and TA. The primary outcomes were the number of couples with whom staff implemented the program, the mean number of sessions implemented, and whether staff implemented at least one session or a complete intervention (all six sessions) of the program. Outcomes were measured at both the agency and participant level. Over 18 months following training and TA, at least one participant from 13 (33%) Web-based assigned agencies and 19 (48%) traditional agencies reported program use. Longitudinal multilevel analysis found no differences between groups on any outcomes at the agency or participant level, with one exception: Web-based agencies implemented the program with 35% fewer couples compared with staff at manual-based agencies (IRR 0.35, CI 0.13-0.94). Greater implementation of a Web-based program may require more

  13. Kernel-based tests for joint independence

    DEFF Research Database (Denmark)

    Pfister, Niklas; Bühlmann, Peter; Schölkopf, Bernhard

    2018-01-01

    We investigate the problem of testing whether $d$ random variables, which may or may not be continuous, are jointly (or mutually) independent. Our method builds on ideas of the two-variable Hilbert-Schmidt independence criterion (HSIC) but allows for an arbitrary number of variables. We embed the $d$-dimensional joint distribution and the product of the marginals into a reproducing kernel Hilbert space and define the $d$-variable Hilbert-Schmidt independence criterion (dHSIC) as the squared distance between the embeddings. In the population case, the value of dHSIC is zero if and only if the $d$ variables are jointly independent, as long as the kernel is characteristic. Based on an empirical estimate of dHSIC, we define three different non-parametric hypothesis tests: a permutation test, a bootstrap test and a test based on a Gamma approximation. We prove that the permutation test …
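
    For the two-variable case the HSIC-based permutation test is short enough to sketch; dHSIC extends the same construction to $d$ variables by taking the entrywise product of the $d$ Gram matrices. The snippet below is an illustrative sketch with Gaussian kernels and arbitrary bandwidths, not the authors' reference implementation.

        # Minimal sketch of an HSIC permutation test for two variables.
        import numpy as np
        from sklearn.metrics.pairwise import rbf_kernel

        def hsic(K, L):
            n = K.shape[0]
            H = np.eye(n) - np.ones((n, n)) / n      # centring matrix
            return np.trace(K @ H @ L @ H) / n**2    # biased HSIC estimate

        def hsic_permutation_test(x, y, n_perm=500, gamma=1.0, seed=0):
            rng = np.random.default_rng(seed)
            K, L = rbf_kernel(x, gamma=gamma), rbf_kernel(y, gamma=gamma)
            stat = hsic(K, L)
            null = np.empty(n_perm)
            for b in range(n_perm):
                p = rng.permutation(len(y))          # permuting y breaks dependence
                null[b] = hsic(K, L[np.ix_(p, p)])
            p_value = (1 + np.sum(null >= stat)) / (1 + n_perm)
            return stat, p_value

        rng = np.random.default_rng(1)
        x = rng.normal(size=(200, 1))
        y = x + 0.3 * rng.normal(size=(200, 1))      # dependent by construction
        print(hsic_permutation_test(x, y))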

  14. Researches on the Security of Cluster-based Communication Protocol for Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Yanhong Sun

    2014-08-01

    Full Text Available As sensor networks see ever deeper application, security issues have gradually become the bottleneck of wireless sensor applications. Providing a sound security scheme is a common concern not only of researchers but also of providers, integrators and users of wireless sensor networks. Motivated by this demand, this paper focuses on strengthening the security of cluster-based wireless sensor networks. Based on a systematic analysis of the clustering protocol and its security enhancement schemes, the paper introduces a broadcast authentication scheme and proposes the SA-LEACH network security enhancement protocol. Performance analysis and simulation experiments prove that the protocol consumes less energy under the same security requirements, and that when the base station is comparatively far from the network deployment area it is more advantageous in terms of energy consumption and thus more suitable for wireless sensor networks.
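
    SA-LEACH builds on the LEACH family of clustering protocols, in which nodes elect themselves cluster heads with a rotating probability threshold. The sketch below shows only that underlying cluster-head election (the broadcast-authentication and other security additions of SA-LEACH are not shown); the parameter names are illustrative.

        # Hedged sketch of LEACH-style cluster-head self-election.
        import random

        def leach_threshold(p, r, was_head_recently):
            """Threshold T(n) for becoming cluster head in round r;
            p is the desired fraction of cluster heads."""
            if was_head_recently:        # ineligible within the last 1/p rounds
                return 0.0
            return p / (1.0 - p * (r % int(round(1.0 / p))))

        def elect_heads(node_ids, p, r, recent_heads):
            return [n for n in node_ids
                    if random.random() < leach_threshold(p, r, n in recent_heads)]

        random.seed(0)
        print(elect_heads(range(100), p=0.05, r=3, recent_heads={4, 17}))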

  15. The gap in the color-magnitude diagram of NGC 2420: A test of convective overshoot and cluster age

    Science.gov (United States)

    Demarque, Pierre; Sarajedini, Ata; Guo, X.-J.

    1994-05-01

    Theoretical isochrones have been constructed using the OPAL opacities specifically to study the color-magnitude diagram of the open star cluster NGC 2420. This cluster provides a rare test of core convection in intermediate-mass stars. At the same time, its age is of interest because of its low metallicity and relatively high Galactic latitude for an open cluster. The excellent color-magnitude diagram constructed by Anthony-Twarog et al. (1990) allows a detailed fit of the isochrones to the photometric data. We discuss the importance of convective overshoot at the convective core edge in determining the morphology of the gap located near the main-sequence turnoff. We find that given the assumptions made in the models, a modest amount of overshoot (0.23 Hp) is required for the best fit. Good agreement is achieved with all features of the turnoff gap for a cluster age of 2.4 +/- 0.2 Gyr. We note that a photometrically complete luminosity function near the main-sequence turnoff and subgiant branch would also provide an important test of the overshoot models.

  16. Methodology for testing and validating knowledge bases

    Science.gov (United States)

    Krishnamurthy, C.; Padalkar, S.; Sztipanovits, J.; Purves, B. R.

    1987-01-01

    A test and validation toolset developed for artificial intelligence programs is described. The basic premises of this method are: (1) knowledge bases have a strongly declarative character and represent mostly structural information about different domains, (2) the conditions for integrity, consistency, and correctness can be transformed into structural properties of knowledge bases, and (3) structural information and structural properties can be uniformly represented by graphs and checked by graph algorithms. The interactive test and validation environment has been implemented on a SUN workstation.
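
    The premise that integrity and consistency conditions can be recast as structural properties of a graph can be made concrete with a toy example: represent a hypothetical rule base as a dependency graph and check that every referenced item is defined and that the dependencies contain no cycle. The rule names below are invented for illustration and do not come from the record.

        # Toy graph-based checks on a hypothetical rule-dependency graph.
        def undefined_references(rules, facts):
            defined = set(facts) | set(rules)
            return {(r, d) for r, deps in rules.items()
                    for d in deps if d not in defined}

        def has_cycle(rules):
            WHITE, GREY, BLACK = 0, 1, 2
            colour = {r: WHITE for r in rules}
            def visit(r):
                colour[r] = GREY
                for d in rules.get(r, ()):
                    if colour.get(d) == GREY or (colour.get(d) == WHITE and visit(d)):
                        return True
                colour[r] = BLACK
                return False
            return any(colour[r] == WHITE and visit(r) for r in rules)

        rules = {"diagnose": ["symptom_db", "classify"], "classify": ["diagnose"]}
        facts = ["symptom_db"]
        print(undefined_references(rules, facts), has_cycle(rules))   # set() True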

  17. Designing a VOIP Based Language Test

    Science.gov (United States)

    Garcia Laborda, Jesus; Magal Royo, Teresa; Otero de Juan, Nuria; Gimenez Lopez, Jose L.

    2015-01-01

    Assessing speaking is one of the most difficult tasks in computer based language testing. Many countries all over the world face the need to implement standardized language tests where speaking tasks are commonly included. However, a number of problems make them rather impractical such as the costs, the personnel involved, the length of time for…

  18. Testing a workplace physical activity intervention: a cluster randomized controlled trial.

    Science.gov (United States)

    McEachan, Rosemary R C; Lawton, Rebecca J; Jackson, Cath; Conner, Mark; Meads, David M; West, Robert M

    2011-04-11

    Increased physical activity levels benefit both an individual's health and their productivity at work. The purpose of the current study was to explore the impact and cost-effectiveness of a workplace physical activity intervention designed to increase physical activity levels. A total of 1260 participants from 44 UK worksites (based within 5 organizations) were recruited to a cluster randomized controlled trial with worksites randomly allocated to an intervention or control condition. Measurement of physical activity and other variables occurred at baseline, and at 0 months, 3 months and 9 months post-intervention. Health outcomes were measured during a 30 minute health check conducted in worksites at baseline and 9 months post intervention. The intervention consisted of a 3 month tool-kit of activities targeting components of the Theory of Planned Behavior, delivered in-house by nominated facilitators. Self-reported physical activity (measured using the IPAQ short-form) and health outcomes were assessed. Multilevel modelling found no significant effect of the intervention on MET minutes of activity (from the IPAQ) at any of the follow-up time points, controlling for baseline activity. However, the intervention did significantly reduce systolic blood pressure (B=-1.79 mm/Hg) and resting heart rate (B=-2.08 beats) and significantly increased body mass index (B=.18 units) compared to control. The intervention was not found to be cost-effective; however, the substantial variability around this estimate suggests that further research is warranted. The current study found mixed support for this worksite physical activity intervention. The paper discusses some of the tensions involved in conducting rigorous evaluations of large-scale randomized controlled trials in real-world settings. © 2011 McEachan et al; licensee BioMed Central Ltd.

  19. Interference Mitigation in IEEE 802.15.4-A Cluster Based Scheduling Approach

    OpenAIRE

    G. M. Tamilselvan; A. Shanmugam

    2011-01-01

    Problem statement: In universal networking environments, two or more heterogeneous communication systems coexist in a single place. In particular, Wireless Local Area Networks (WLANs) based on the IEEE 802.11b specification and Wireless Personal Area Networks (WPANs) based on the IEEE 802.15.4 specification need to coexist in the same Industrial, Scientific and Medical (ISM) band. If the WPAN communication coverage is expanded using a cluster-tree network topology, then the 802...

  20. A coherent graph-based semantic clustering and summarization approach for biomedical literature and a new summarization evaluation method.

    Science.gov (United States)

    Yoo, Illhoi; Hu, Xiaohua; Song, Il-Yeol

    2007-11-27

    A huge amount of biomedical textual information has been produced and collected in MEDLINE for decades. In order to easily utilize the biomedical information in free text, document clustering and text summarization together are used as a solution to the text information overload problem. In this paper, we introduce a coherent graph-based semantic clustering and summarization approach for biomedical literature. Our extensive experimental results show that the approach achieves a 45% improvement in cluster quality and a 72% improvement in clustering reliability, in terms of the misclassification index, over Bisecting K-means, a leading document clustering approach. In addition, our approach provides concise but rich text summaries in terms of key concepts and sentences. Our coherent biomedical literature clustering and summarization approach, which takes advantage of ontology-enriched graphical representations, significantly improves the quality of document clusters and the understandability of documents through summaries.

  1. Fuzzy clustering-based feature extraction method for mental task classification.

    Science.gov (United States)

    Gupta, Akshansh; Kumar, Dhirendra

    2017-06-01

    A brain computer interface (BCI) is a communication system by which a person can send messages or requests for basic necessities without using peripheral nerves and muscles. Response to mental task-based BCI is one of the privileged areas of investigation. Electroencephalography (EEG) signals are used to represent the brain activities in the BCI domain. For any mental task classification model, the performance of the learning model depends on the extraction of features from the EEG signal. In the literature, the wavelet transform and empirical mode decomposition are two popular feature extraction methods used to analyze signals with non-linear and non-stationary properties. Adopting the virtues of both techniques, an adaptive filter-based method for decomposing non-linear and non-stationary signals, known as the empirical wavelet transform (EWT), has recently been proposed. EWT does not work well for signals that overlap in the frequency and time domains and fails to provide good features for further classification. In this work, the fuzzy c-means algorithm is utilized along with EWT to handle this problem. It has been observed from the experimental results that EWT along with fuzzy clustering outperforms EWT alone for the EEG-based response to mental task problem. Further, in mental task classification the ratio of samples to features is very small. To handle the problem of this small ratio, we have also utilized three well-known multivariate feature selection methods, viz. Bhattacharyya distance (BD), ratio of scatter matrices (SR), and linear regression (LR). The experimental results demonstrate that the performance of mental task classification improves considerably with the aforesaid methods. A ranking method and Friedman's statistical test are also performed to rank and compare different combinations of feature extraction and feature selection methods, which endorses the efficacy of the proposed approach.
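
    The fuzzy c-means step used on the EWT components can be sketched with the standard membership and centroid updates; the random placeholder data and parameter choices below are illustrative only and do not reproduce the EWT feature-extraction chain described in the record.

        # Standard fuzzy c-means updates (membership and centroid).
        import numpy as np

        def fuzzy_c_means(X, c=3, m=2.0, n_iter=100, seed=0):
            rng = np.random.default_rng(seed)
            U = rng.dirichlet(np.ones(c), size=len(X))       # initial memberships
            for _ in range(n_iter):
                Um = U ** m
                centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
                d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
                U = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1)), axis=2)
            return U, centers

        X = np.random.rand(200, 5)                           # placeholder EEG features
        U, centers = fuzzy_c_means(X)
        print(U.sum(axis=1)[:3])      # memberships of each sample sum to 1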

  2. Conveyor Performance based on Motor DC 12 Volt Eg-530ad-2f using K-Means Clustering

    Science.gov (United States)

    Arifin, Zaenal; Artini, Sri DP; Much Ibnu Subroto, Imam

    2017-04-01

    To produce goods in industry, controlled tools that improve production are required. The separation process has become part of the production process; it is carried out according to certain criteria to obtain an optimum result. Knowing the performance characteristics of a controlled tool in the separation process also makes it possible to obtain optimum results. Cluster analysis is a popular method for partitioning data into smaller segments: it divides a group of objects into k groups whose members are homogeneous or similar, with similarity defined by certain criteria. The work in this paper uses the K-means method to cluster the loading performance of a conveyor driven by a 12 volt DC motor EG-530AD-2F. This technique gives a complete clustering of the data for a prototype conveyor driven by the DC motor that separates goods by height. The parameters involved are voltage, current and travelling time. These parameters yield two clusters, namely an optimal cluster with center 10.50 volt, 0.3 Ampere, 10.58 second, and a non-optimal cluster with center 10.88 volt, 0.28 Ampere and 40.43 second.
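
    The clustering itself is a plain two-cluster K-means on (voltage, current, travelling time) measurements, which can be sketched as follows; the sample values are made up for illustration, and the features are standardised because the units differ.

        # Minimal sketch: two-cluster K-means on conveyor measurements.
        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.preprocessing import StandardScaler

        # Columns: voltage [V], current [A], travelling time [s] (made-up values).
        measurements = np.array([
            [10.5, 0.30, 10.6], [10.6, 0.31, 11.0], [10.4, 0.29, 10.2],
            [10.9, 0.28, 40.4], [10.8, 0.27, 39.8], [10.9, 0.29, 41.1],
        ])

        X = StandardScaler().fit_transform(measurements)
        km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
        for k in range(2):
            print(f"cluster {k} centre:",
                  measurements[km.labels_ == k].mean(axis=0).round(2))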

  3. Structured free-water clusters near lubricating surfaces are essential in water-based lubrication.

    Science.gov (United States)

    Hou, Jiapeng; Veeregowda, Deepak H; de Vries, Joop; Van der Mei, Henny C; Busscher, Henk J

    2016-10-01

    Water-based lubrication provides cheap and environmentally friendly lubrication and, although hydrophilic surfaces are preferred in water-based lubrication, often lubricating surfaces do not retain water molecules during shear. We show here that hydrophilic (42° water contact angle) quartz surfaces facilitate water-based lubrication to the same extent as more hydrophobic Si crystal surfaces (61°), while lubrication by hydrophilic Ge crystal surfaces (44°) is best. Thus surface hydrophilicity is not sufficient for water-based lubrication. Surface-thermodynamic analyses demonstrated that all surfaces, regardless of their water-based lubrication, were predominantly electron donating, implying water binding with their hydrogen groups. X-ray photoelectron spectroscopy showed that Ge crystal surfaces providing optimal lubrication consisted of a mixture of -O and =O functionalities, while Si crystal and quartz surfaces solely possessed -O functionalities. Comparison of infrared absorption bands of the crystals in water indicated fewer bound-water layers on hydrophilic Ge than on hydrophobic Si crystal surfaces, while absorption bands for free water on the Ge crystal surface indicated a much more pronounced presence of structured, free-water clusters near the Ge crystal than near Si crystal surfaces. Accordingly, we conclude that the presence of structured, free-water clusters is essential for water-based lubrication. The prevalence of structured water clusters can be regulated by adjusting the ratio between surface electron-donating and electron-accepting groups and between -O and =O functionalities. © 2016 The Author(s).

  4. Risk based surveillance test interval optimization

    International Nuclear Information System (INIS)

    Cepin, M.; Mavko, B.

    1995-01-01

    The first step towards risk-based regulation is to determine optimal surveillance test intervals for the safety equipment that is tested during nuclear power plant operation. In the paper we present our perspective on the process of surveillance test interval optimization. It consists of three levels: the component level, the system level and the plant level. It is based on the results of the Probabilistic Safety Assessment and is focused on minimizing risk. At the component and system levels the risk measure is the component or system mean unavailability, respectively. At the plant level the risk measure is the core damage frequency. (author)
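
    At the component level, the trade-off behind test-interval optimization is often written with a first-order approximation of the mean unavailability of a periodically tested standby component. Under the usual assumptions (constant standby failure rate λ, test interval T, per-demand failure probability q_0 and mean test or repair downtime τ per test), one commonly quoted textbook form is

        \bar{q}(T) \;\approx\; q_0 + \tfrac{1}{2}\,\lambda T + \frac{\tau}{T},
        \qquad
        T^{*} \;=\; \sqrt{2\tau/\lambda},

    where T* balances the failure-accumulation term against the test-downtime term. This approximation is quoted here for orientation only; it is not a formula taken from the record itself.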

  5. Classification of high resolution satellite images using spatial constraints-based fuzzy clustering

    Science.gov (United States)

    Singh, Pankaj Pratap; Garg, Rahul Dev

    2014-01-01

    A spatial constraints-based fuzzy clustering technique is introduced in the paper and the target application is classification of high resolution multispectral satellite images. This fuzzy-C-means (FCM) technique enhances the classification results with the help of a weighted membership function (Wmf). Initially, spatial fuzzy clustering (FC) is used to segment the targeted vegetation areas with the surrounding low vegetation areas, which include the information of spatial constraints (SCs). The performance of the FCM image segmentation is subject to appropriate initialization of Wmf and SC. It is able to evolve directly from the initial segmentation by spatial fuzzy clustering. The controlling parameters in fuzziness of the FCM approach, Wmf and SC, help to estimate the segmented road results, then the Stentiford thinning algorithm is used to estimate the road network from the classified results. Such improvements facilitate FCM method manipulation and lead to segmentation that is more robust. The results confirm its effectiveness for satellite image classification, which extracts useful information in suburban and urban areas. The proposed approach, spatial constraint-based fuzzy clustering with a weighted membership function (SCFCWmf), has been used to extract the information of healthy trees with vegetation and shadows showing elevated features in satellite images. The performance values of quality assessment parameters show a good degree of accuracy for segmented roads using the proposed hybrid SCFCWmf-MO (morphological operations) approach which also occluded nonroad parts.

  6. Generating clustered scale-free networks using Poisson based localization of edges

    Science.gov (United States)

    Türker, İlker

    2018-05-01

    We introduce a variety of network models using a Poisson-based edge localization strategy, which result in clustered scale-free topologies. We first verify the success of our localization strategy by realizing a variant of the well-known Watts-Strogatz model with an inverse approach, implying a small-world regime of rewiring from a random network through a regular one. We then apply the rewiring strategy to a pure Barabasi-Albert model and successfully achieve a small-world regime, with a limited capacity of the scale-free property. To imitate the high clustering property of scale-free networks with higher accuracy, we adapted the Poisson-based wiring strategy to a growing network with the ingredients of both preferential attachment and local connectivity. To achieve the collocation of these properties, we used a routine of flattening the edges array, sorting it, and applying a mixing procedure to assemble both global connections with preferential attachment and local clusters. As a result, we achieved clustered scale-free networks in a computational fashion, diverging from recent studies by following a simple but efficient approach.

  7. A new multistage medical segmentation method based on superpixel and fuzzy clustering.

    Science.gov (United States)

    Ji, Shiyong; Wei, Benzheng; Yu, Zhen; Yang, Gongping; Yin, Yilong

    2014-01-01

    Medical image segmentation is a key image processing step for brain MRI. However, due to the visually complex appearance of image structures and the imaging characteristics, it is still challenging to segment brain MRI images automatically. A new multi-stage segmentation method based on superpixels and fuzzy clustering (MSFCM) is proposed to achieve good brain MRI segmentation results. MSFCM uses superpixels rather than pixels as the clustering objects, which increases the clustering granularity and effectively overcomes the influence of noise and bias. In the first stage, the MRI image is parsed into several atomic areas, namely superpixels, and a further parsing step is applied to areas whose gray-level variance exceeds a set threshold. Subsequently, fuzzy clustering is carried out to compute the fuzzy membership of each superpixel, and an iterative broadcast method based on the Butterworth function is used to redefine their classifications. Finally, the segmented image is obtained by merging the superpixels that share the same classification label. The simulated brain database from the BrainWeb site is used in the experiments, and the results demonstrate that the MSFCM method outperforms the traditional FCM algorithm in terms of segmentation accuracy and stability for MRI images.

  8. Hessian regularization based non-negative matrix factorization for gene expression data clustering.

    Science.gov (United States)

    Liu, Xiao; Shi, Jun; Wang, Congzhi

    2015-01-01

    Since a key step in the analysis of gene expression data is to detect groups of genes that have similar expression patterns, clustering techniques are commonly used to analyze such data. Data representation plays an important role in clustering analysis. Non-negative matrix factorization (NMF) is a widely used data representation method with great success in machine learning. Although the traditional manifold regularization method, Laplacian regularization (LR), can improve the performance of NMF, LR still suffers from weak extrapolating power. Hessian regularization (HR) is a newly developed manifold regularization method whose natural properties make it better at extrapolation, especially for small-sample data. In this work, we propose the HR-based NMF (HR-NMF) algorithm and apply it to represent gene expression data for a subsequent clustering task. Clustering experiments are conducted on five commonly used gene datasets, and the results indicate that the proposed HR-NMF outperforms LR-based NMF and the original NMF, which suggests the potential application of HR-NMF to gene expression data.
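
    For reference, the unregularised backbone of such methods is the classical multiplicative-update NMF; the Hessian (or Laplacian) regulariser adds a manifold penalty to the objective that modifies the coefficient update, which the sketch below deliberately omits. Matrix sizes and the random data are illustrative.

        # Classical multiplicative-update NMF (Frobenius objective).
        import numpy as np

        def nmf(V, rank=10, n_iter=200, eps=1e-9, seed=0):
            rng = np.random.default_rng(seed)
            n, m = V.shape
            W = rng.random((n, rank))
            H = rng.random((rank, m))
            for _ in range(n_iter):
                H *= (W.T @ V) / (W.T @ W @ H + eps)   # Lee-Seung updates
                W *= (V @ H.T) / (W @ H @ H.T + eps)
            return W, H

        V = np.abs(np.random.randn(100, 40))           # placeholder expression matrix
        W, H = nmf(V, rank=5)
        print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))

    Samples (columns of V) can then be clustered, for example, by the index of their largest coefficient in H.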

  9. Feature Selection and Kernel Learning for Local Learning-Based Clustering.

    Science.gov (United States)

    Zeng, Hong; Cheung, Yiu-ming

    2011-08-01

    The performance of most clustering algorithms relies heavily on the representation of the data in the input space or in the Hilbert space of kernel methods. This paper aims to obtain an appropriate data representation through feature selection or kernel learning within the framework of the Local Learning-Based Clustering (LLC) method (Wu and Schölkopf 2006), which can outperform global learning-based methods when dealing with high-dimensional data lying on a manifold. Specifically, we associate a weight with each feature or kernel and incorporate it into the built-in regularization of the LLC algorithm to take into account the relevance of each feature or kernel for the clustering. Accordingly, the weights are estimated iteratively during the clustering process. We show that the resulting weighted regularization with an additional constraint on the weights is equivalent to a known sparsity-promoting penalty; hence, the weights of irrelevant features or kernels can be shrunk toward zero. Extensive experiments show the efficacy of the proposed methods on benchmark data sets.

  10. Testing the Bose-Einstein Condensate dark matter model at galactic cluster scale

    Energy Technology Data Exchange (ETDEWEB)

    Harko, Tiberiu [Department of Mathematics, University College London, Gower Street, London, WC1E 6BT (United Kingdom); Liang, Pengxiang; Liang, Shi-Dong [State Key Laboratory of Optoelectronic Material and Technology, and Guangdong Province Key Laboratory of Display Material and Technology, School of Physics and Engineering, Sun Yat-Sen University, Guangzhou 510275 (China); Mocanu, Gabriela, E-mail: t.harko@ucl.ac.uk, E-mail: lpengx@mail2.sysu.edu.cn2, E-mail: stslsd@mail.sysu.edu.cn, E-mail: gabriela.mocanu@ubbcluj.ro [Astronomical Institute, Astronomical Observatory Cluj-Napoca, Romanian Academy, 15 Cireşilor Street, 400487 Cluj-Napoca (Romania)

    2015-11-01

    The possibility that dark matter may be in the form of a Bose-Einstein Condensate (BEC) has been extensively explored at galactic scale. In particular, good fits for the galactic rotations curves have been obtained, and upper limits for the dark matter particle mass and scattering length have been estimated. In the present paper we extend the investigation of the properties of the BEC dark matter to the galactic cluster scale, involving dark matter dominated astrophysical systems formed of thousands of galaxies each. By considering that one of the major components of a galactic cluster, the intra-cluster hot gas, is described by King's β-model, and that both intra-cluster gas and dark matter are in hydrostatic equilibrium, bound by the same total mass profile, we derive the mass and density profiles of the BEC dark matter. In our analysis we consider several theoretical models, corresponding to isothermal hot gas and zero temperature BEC dark matter, non-isothermal gas and zero temperature dark matter, and isothermal gas and finite temperature BEC, respectively. The properties of the finite temperature BEC dark matter cluster are investigated in detail numerically. We compare our theoretical results with the observational data of 106 galactic clusters. Using a least-squares fitting, as well as the observational results for the dark matter self-interaction cross section, we obtain some upper bounds for the mass and scattering length of the dark matter particle. Our results suggest that the mass of the dark matter particle is of the order of μeV, while the scattering length has values in the range of 10^{−7} fm.
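
    Two standard ingredients referred to in this record can be written out for orientation (these are the textbook forms, not expressions quoted from the paper): King's β-model for the intra-cluster gas density, and the hydrostatic-equilibrium mass estimate that ties the gas to the total mass profile,

        \rho_{g}(r) \;=\; \rho_{g0}\left[1+\left(\frac{r}{r_c}\right)^{2}\right]^{-3\beta/2},
        \qquad
        M_{\mathrm{tot}}(r) \;=\; -\,\frac{k_B T_g\, r}{\mu m_p G}
        \left(\frac{d\ln\rho_g}{d\ln r}+\frac{d\ln T_g}{d\ln r}\right),

    with r_c the core radius, β the slope parameter, T_g the gas temperature, μ the mean molecular weight and m_p the proton mass. The BEC dark matter density profile is then obtained by requiring the condensate to be in hydrostatic equilibrium in the same total mass profile.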

  11. Testing the Bose-Einstein Condensate dark matter model at galactic cluster scale

    International Nuclear Information System (INIS)

    Harko, Tiberiu; Liang, Pengxiang; Liang, Shi-Dong; Mocanu, Gabriela

    2015-01-01

    The possibility that dark matter may be in the form of a Bose-Einstein Condensate (BEC) has been extensively explored at galactic scale. In particular, good fits for the galactic rotations curves have been obtained, and upper limits for the dark matter particle mass and scattering length have been estimated. In the present paper we extend the investigation of the properties of the BEC dark matter to the galactic cluster scale, involving dark matter dominated astrophysical systems formed of thousands of galaxies each. By considering that one of the major components of a galactic cluster, the intra-cluster hot gas, is described by King's β-model, and that both intra-cluster gas and dark matter are in hydrostatic equilibrium, bound by the same total mass profile, we derive the mass and density profiles of the BEC dark matter. In our analysis we consider several theoretical models, corresponding to isothermal hot gas and zero temperature BEC dark matter, non-isothermal gas and zero temperature dark matter, and isothermal gas and finite temperature BEC, respectively. The properties of the finite temperature BEC dark matter cluster are investigated in detail numerically. We compare our theoretical results with the observational data of 106 galactic clusters. Using a least-squares fitting, as well as the observational results for the dark matter self-interaction cross section, we obtain some upper bounds for the mass and scattering length of the dark matter particle. Our results suggest that the mass of the dark matter particle is of the order of μeV, while the scattering length has values in the range of 10^{−7} fm.

  12. Testing Gravity on Cosmological Scales with the Observed Abundance of Galaxy Clusters

    DEFF Research Database (Denmark)

    Rapetti Serra, David Angelo

    2011-01-01

    … and obtain improved constraints on departures from General Relativity (GR) on cosmological scales. We parameterize the linear growth rate of cosmic structure as a power law of the mean matter density, with the exponent given by the growth index. Combining the X-ray cluster growth data with cluster gas-mass fraction, type Ia supernovae, baryon acoustic oscillations, and cosmic microwave background data, we find a tight correlation between the growth index and the normalization of the matter power spectrum. Allowing the growth index and the dark energy equation of state parameter to take any constant values, we find no evidence …
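
    The growth-index parameterization referred to in this abstract is usually written as

        f(a) \;\equiv\; \frac{d\ln\delta}{d\ln a} \;\simeq\; \Omega_m(a)^{\gamma},

    where δ is the linear matter density perturbation, Ω_m(a) the mean matter density in units of the critical density, and γ the growth index; γ ≈ 0.55 is the value expected for General Relativity with a cosmological constant. The form is quoted here as the standard convention, not copied from the record.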

  13. Traceability in Model-Based Testing

    Directory of Open Access Journals (Sweden)

    Mathew George

    2012-11-01

    Full Text Available The growing complexities of software and the demand for shorter time to market are two important challenges that face today’s IT industry. These challenges demand the increase of both productivity and quality of software. Model-based testing is a promising technique for meeting these challenges. Traceability modeling is a key issue and challenge in model-based testing. Relationships between the different models will help to navigate from one model to another, and trace back to the respective requirements and the design model when the test fails. In this paper, we present an approach for bridging the gaps between the different models in model-based testing. We propose relation definition markup language (RDML for defining the relationships between models.

  14. Adaptive Reliable Routing Based on Cluster Hierarchy for Wireless Multimedia Sensor Networks

    Directory of Open Access Journals (Sweden)

    Chen Min

    2010-01-01

    Full Text Available As a multimedia information acquisition and processing method, the wireless multimedia sensor network (WMSN) has great application potential in military and civilian areas. Compared with traditional wireless sensor networks, the routing design of a WMSN should pay more attention to the quality of transmission. This paper proposes an adaptive reliable routing scheme based on a clustering hierarchy, named ARCH, which includes energy prediction and power allocation mechanisms. To obtain better performance, the cluster structure is formed based on a cellular topology. The introduced prediction mechanism lets sensor nodes predict the remaining energy of other nodes, which dramatically reduces the overall information needed for energy balancing. ARCH can dynamically balance the energy consumption of nodes based on the predicted results provided by power allocation. The simulation results prove the efficiency of the proposed ARCH routing.

  15. Yeast homologous recombination-based promoter engineering for the activation of silent natural product biosynthetic gene clusters.

    Science.gov (United States)

    Montiel, Daniel; Kang, Hahk-Soo; Chang, Fang-Yuan; Charlop-Powers, Zachary; Brady, Sean F

    2015-07-21

    Large-scale sequencing of prokaryotic (meta)genomic DNA suggests that most bacterial natural product gene clusters are not expressed under common laboratory culture conditions. Silent gene clusters represent a promising resource for natural product discovery and the development of a new generation of therapeutics. Unfortunately, the characterization of molecules encoded by these clusters is hampered owing to our inability to express these gene clusters in the laboratory. To address this bottleneck, we have developed a promoter-engineering platform to transcriptionally activate silent gene clusters in a model heterologous host. Our approach uses yeast homologous recombination, an auxotrophy complementation-based yeast selection system and sequence orthogonal promoter cassettes to exchange all native promoters in silent gene clusters with constitutively active promoters. As part of this platform, we constructed and validated a set of bidirectional promoter cassettes consisting of orthogonal promoter sequences, Streptomyces ribosome binding sites, and yeast selectable marker genes. Using these tools we demonstrate the ability to simultaneously insert multiple promoter cassettes into a gene cluster, thereby expediting the reengineering process. We apply this method to model active and silent gene clusters (rebeccamycin and tetarimycin) and to the silent, cryptic pseudogene-containing, environmental DNA-derived Lzr gene cluster. Complete promoter refactoring and targeted gene exchange in this "dead" cluster led to the discovery of potent indolotryptoline antiproliferative agents, lazarimides A and B. This potentially scalable and cost-effective promoter reengineering platform should streamline the discovery of natural products from silent natural product biosynthetic gene clusters.

  16. A cluster randomized control field trial of the ABRACADABRA web-based literacy intervention: Replication and extension of basic findings.

    Directory of Open Access Journals (Sweden)

    Noella Angele Piquette

    2014-12-01

    Full Text Available The present paper reports a cluster randomized control trial evaluation of teaching using ABRACADABRA (ABRA), an evidence-based and web-based literacy intervention (http://abralite.concordia.ca), with 107 kindergarten and 96 grade 1 children in 24 classes (12 intervention, 12 control) from all 12 elementary schools in one school district in Canada. Children in the intervention condition received 10-12 hours of whole class instruction using ABRA between pre- and post-test. Hierarchical linear modeling of post-test results showed significant gains in letter-sound knowledge for intervention classrooms over control classrooms. In addition, medium effect sizes were evident for three of five outcome measures favoring the intervention: letter-sound knowledge (d = +.66), phonological blending (d = +.52), and word reading (d = +.52), over effect sizes for regular teaching. It is concluded that regular teaching with ABRA technology adds significantly to literacy in the early elementary years.

  17. Automatic detection of arterial input function in dynamic contrast enhanced MRI based on affinity propagation clustering.

    Science.gov (United States)

    Shi, Lin; Wang, Defeng; Liu, Wen; Fang, Kui; Wang, Yi-Xiang J; Huang, Wenhua; King, Ann D; Heng, Pheng Ann; Ahuja, Anil T

    2014-05-01

    To automatically and robustly detect the arterial input function (AIF) with high detection accuracy and low computational cost in dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI). In this study, we developed an automatic AIF detection method using an accelerated version (Fast-AP) of affinity propagation (AP) clustering. The validity of this Fast-AP-based method was proved on two DCE-MRI datasets, i.e., rat kidney and human head and neck. The detailed AIF detection performance of this proposed method was assessed in comparison with other clustering-based methods, namely original AP and K-means, as well as the manual AIF detection method. Both the automatic AP- and Fast-AP-based methods achieved satisfactory AIF detection accuracy, but the computational cost of Fast-AP could be reduced by 64.37-92.10% on rat dataset and 73.18-90.18% on human dataset compared with the cost of AP. The K-means yielded the lowest computational cost, but resulted in the lowest AIF detection accuracy. The experimental results demonstrated that both the AP- and Fast-AP-based methods were insensitive to the initialization of cluster centers, and had superior robustness compared with K-means method. The Fast-AP-based method enables automatic AIF detection with high accuracy and efficiency. Copyright © 2013 Wiley Periodicals, Inc.
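
    A rough sketch of the clustering step is given below: voxel time-intensity curves are grouped with affinity propagation, and the cluster whose mean curve peaks earliest and highest is taken as the AIF candidate. The synthetic curves and the peak-based selection rule are simplifying assumptions for illustration, not the criteria used in the paper.

        # Hedged sketch: affinity-propagation clustering of time-intensity curves.
        import numpy as np
        from sklearn.cluster import AffinityPropagation

        rng = np.random.default_rng(0)
        t = np.linspace(0, 60, 40)                   # seconds, placeholder grid
        curves = np.array([rng.uniform(0.5, 2.0) *
                           np.exp(-(t - rng.uniform(8, 30)) ** 2 /
                                  (2 * rng.uniform(3, 8) ** 2))
                           for _ in range(200)])     # synthetic enhancement curves

        ap = AffinityPropagation(damping=0.9, max_iter=1000,
                                 random_state=0).fit(curves)
        labels = ap.labels_

        best, best_score = None, -np.inf
        for k in np.unique(labels):
            mean_curve = curves[labels == k].mean(axis=0)
            score = mean_curve.max() - 0.05 * t[mean_curve.argmax()]  # high and early
            if score > best_score:
                best, best_score = k, score
        print("AIF candidate cluster:", best, "voxels:", int(np.sum(labels == best)))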

  18. The Formation of Competitive Advantages for Corporate Structures Based on the Cluster Integration

    Directory of Open Access Journals (Sweden)

    Ekaterina Vasilyevna Pustynnikova

    2017-06-01

    Full Text Available The article studies cluster forms of integration as well as the development of corporate and cluster connections. At present, economic research is largely focused on the development of integrated regional systems, recognized as one of the most effective forms of integration. In turn, processes based on the interdependence and cooperation of economic entities located in the same territory create the possibility of stable economic relations, synergetic effects and growth of the competitive advantages of these territories. Such development tendencies reflect corporate interests and define trends for the integration of corporations in the context of regional and industrial limitations. Thus, one of the main aspects of integration is the establishment of sustainable, cost-beneficial relationships between corporate entities. The dialectical unity of coordination and cooperation of corporate structures in economic clusters expands the traditional boundaries of economic benefit. Considering the governance of a corporate structure from an internal perspective, the benefits of fragmented leadership may be neutralized by uneven expenses. A corporate-cluster approach to corporate governance allows not only the coordination of actions at the micro-level but also the generation of more sustainable economic relations at the industrial, market and regional levels, which is reflected in a synergistic effect. The coordination of economic processes and geographic concentration contribute to system flexibility and adaptability under market conditions and stimulate economic processes. Therefore, all cluster participants benefit from mutually beneficial cooperation. This, in turn, contributes to decreasing total expenses and hastens the responses of entities to market changes. The authors' hypothesis assumes a coordination of interests in the economic cluster that allows to create

  19. A Web-based multidrug-resistant organisms surveillance and outbreak detection system with rule-based classification and clustering.

    Science.gov (United States)

    Tseng, Yi-Ju; Wu, Jung-Hsuan; Ping, Xiao-Ou; Lin, Hui-Chi; Chen, Ying-Yu; Shang, Rung-Ji; Chen, Ming-Yuan; Lai, Feipei; Chen, Yee-Chun

    2012-10-24

    The emergence and spread of multidrug-resistant organisms (MDROs) are causing a global crisis. Combating antimicrobial resistance requires prevention of transmission of resistant organisms and improved use of antimicrobials. To develop a Web-based information system for automatic integration, analysis, and interpretation of the antimicrobial susceptibility of all clinical isolates that incorporates rule-based classification and cluster analysis of MDROs and implements control chart analysis to facilitate outbreak detection. Electronic microbiological data from a 2200-bed teaching hospital in Taiwan were classified according to predefined criteria of MDROs. The numbers of organisms, patients, and incident patients in each MDRO pattern were presented graphically to describe spatial and time information in a Web-based user interface. Hierarchical clustering with 7 upper control limits (UCL) was used to detect suspicious outbreaks. The system's performance in outbreak detection was evaluated based on vancomycin-resistant enterococcal outbreaks determined by a hospital-wide prospective active surveillance database compiled by infection control personnel. The optimal UCL for MDRO outbreak detection was the upper 90% confidence interval (CI) using germ criterion with clustering (area under ROC curve (AUC) 0.93, 95% CI 0.91 to 0.95), upper 85% CI using patient criterion (AUC 0.87, 95% CI 0.80 to 0.93), and one standard deviation using incident patient criterion (AUC 0.84, 95% CI 0.75 to 0.92). The performance indicators of each UCL were statistically significantly higher with clustering than those without clustering in germ criterion (P < .001), patient criterion (P = .04), and incident patient criterion (P < .001). This system automatically identifies MDROs and accurately detects suspicious outbreaks of MDROs based on the antimicrobial susceptibility of all clinical isolates.
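
    The control-chart idea can be illustrated with a minimal sketch: a week is flagged when its incident count exceeds an upper control limit computed from a trailing baseline as the mean plus a multiple of the standard deviation. The baseline window, the z-value and the made-up counts below are illustrative choices, not the seven tuned limits evaluated in the paper.

        # Hedged sketch: control-chart flagging of weekly incident counts.
        import numpy as np

        def flag_outbreak_weeks(weekly_counts, baseline_weeks=12, z=1.645):
            counts = np.asarray(weekly_counts, dtype=float)
            flags = np.zeros(len(counts), dtype=bool)
            for i in range(baseline_weeks, len(counts)):
                base = counts[i - baseline_weeks:i]
                ucl = base.mean() + z * base.std(ddof=1)   # upper control limit
                flags[i] = counts[i] > ucl
            return flags

        weekly_vre = [2, 3, 1, 2, 4, 2, 3, 2, 1, 3, 2, 2, 9, 3, 2]  # made-up counts
        print(np.where(flag_outbreak_weeks(weekly_vre))[0])         # -> [12]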