WorldWideScience

Sample records for cluster detection method

  1. An Examination of Three Spatial Event Cluster Detection Methods

    Directory of Open Access Journals (Sweden)

    Hensley H. Mariathas

    2015-03-01

    Full Text Available In spatial disease surveillance, geographic areas with large numbers of disease cases are to be identified, so that targeted investigations can be pursued. Geographic areas with high disease rates are called disease clusters, and statistical cluster detection tests are used to identify geographic areas with higher disease rates than expected by chance alone. In some situations, disease-related events rather than individuals are of interest for geographical surveillance, and methods to detect clusters of disease-related events are called event cluster detection methods. In this paper, we examine three distributional assumptions for the events in cluster detection: compound Poisson, approximate normal and multiple hypergeometric (exact). The methods differ in the choice of distributional assumption for the potentially multiple correlated events per individual. The methods are illustrated on emergency department (ED) presentations by children and youth (age < 18 years) because of substance use in the province of Alberta, Canada, from 1 April 2007 to 31 March 2008. Simulation studies are conducted to investigate the Type I error and the power of the clustering methods.
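
    The abstract compares distributional assumptions rather than giving formulas, but the basic shape of an area-level event cluster test can be illustrated with the simplest of them. The sketch below flags areas whose observed event counts are improbably large under a plain Poisson model with known expected counts; the area names, counts and expectations are hypothetical, and the published methods additionally handle multiple correlated events per individual and multiple testing.

    ```python
    # Minimal sketch of a Poisson-based cluster screen; all numbers are hypothetical.
    from scipy.stats import poisson

    observed = {"Area A": 42, "Area B": 17, "Area C": 55}          # ED presentations
    expected = {"Area A": 30.0, "Area B": 16.5, "Area C": 31.0}    # expected under no clustering

    alpha = 0.05
    for area, obs in observed.items():
        # One-sided p-value: probability of seeing obs or more events by chance
        p = poisson.sf(obs - 1, expected[area])
        flag = "cluster candidate" if p < alpha else "unremarkable"
        print(f"{area}: observed={obs}, expected={expected[area]:.1f}, p={p:.4f} -> {flag}")
    ```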

  2. Multiple-Features-Based Semisupervised Clustering DDoS Detection Method

    Directory of Open Access Journals (Sweden)

    Yonghao Gu

    2017-01-01

    Full Text Available A DDoS attack stream converging at the victim host from many agent hosts can become very large, leading to system halt or network congestion. Therefore, it is necessary to propose an effective method to detect DDoS attack behaviour in massive data streams. To address the lack of large labeled data sets required by supervised learning methods, and the relatively low detection accuracy and convergence speed of the unsupervised k-means algorithm, this paper presents a semisupervised clustering detection method using multiple features. In this detection method, we first select three features according to the characteristics of DDoS attacks to form the detection feature vector. Then, the Multiple-Features-Based Constrained-K-Means (MF-CKM) algorithm is proposed based on semisupervised clustering. Finally, using the MIT Laboratory Scenario (DDoS) 1.0 data set, we verify that the proposed method can improve the convergence speed and accuracy of the algorithm while using only a small amount of labeled data.
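
    The abstract does not spell out the MF-CKM update rules, but the general idea of seeding k-means with a small labeled subset can be sketched as follows. The three-dimensional feature vectors, the labels and the data are placeholders, and this is a generic seeded k-means rather than the authors' exact algorithm.

    ```python
    import numpy as np

    def seeded_kmeans(X, X_labeled, y_labeled, n_iter=50):
        """Semi-supervised k-means: centroids are initialized from labeled samples
        (one per class) and labeled samples stay pinned to their class, which is
        the core idea behind seeded/constrained variants such as MF-CKM."""
        classes = np.unique(y_labeled)
        centroids = np.array([X_labeled[y_labeled == c].mean(axis=0) for c in classes])
        for _ in range(n_iter):
            # Assign every sample to the nearest centroid
            d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            assign = d.argmin(axis=1)
            # Recompute centroids, keeping the labeled samples in their own class
            for k, c in enumerate(classes):
                members = X[assign == k]
                pinned = X_labeled[y_labeled == c]
                pool = np.vstack([members, pinned]) if len(members) else pinned
                centroids[k] = pool.mean(axis=0)
        return centroids, assign

    # Hypothetical 3-feature flow statistics: label 0 = normal, 1 = DDoS
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (200, 3)), rng.normal(4, 1, (40, 3))])
    X_lab, y_lab = np.vstack([X[:5], X[-5:]]), np.array([0] * 5 + [1] * 5)
    centroids, labels = seeded_kmeans(X, X_lab, y_lab)
    print("cluster sizes:", np.bincount(labels))
    ```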

  3. Method for detecting clusters of possible uranium deposits

    International Nuclear Information System (INIS)

    Conover, W.J.; Bement, T.R.; Iman, R.L.

    1978-01-01

    When a two-dimensional map contains points that appear to be scattered somewhat at random, a question that often arises is whether groups of points that appear to cluster are merely exhibiting ordinary behavior, which one can expect with any random distribution of points, or whether the clusters are too pronounced to be attributable to chance alone. A method for detecting clusters along a straight line is applied to the two-dimensional map of ²¹⁴Bi anomalies observed as part of the National Uranium Resource Evaluation Program in the Lubbock, Texas, region. Some exact probabilities associated with this method are computed and compared with two approximate methods. The two methods for approximating probabilities work well in the cases examined and can be used when it is not feasible to obtain the exact probabilities.

  4. Comparison of Bayesian clustering and edge detection methods for inferring boundaries in landscape genetics

    Science.gov (United States)

    Safner, T.; Miller, M.P.; McRae, B.H.; Fortin, M.-J.; Manel, S.

    2011-01-01

    Recently, techniques available for identifying clusters of individuals or boundaries between clusters using genetic data from natural populations have expanded rapidly. Consequently, there is a need to evaluate these different techniques. We used spatially-explicit simulation models to compare three spatial Bayesian clustering programs and two edge detection methods. Spatially-structured populations were simulated where a continuous population was subdivided by barriers. We evaluated the ability of each method to correctly identify boundary locations while varying: (i) time after divergence, (ii) strength of isolation by distance, (iii) level of genetic diversity, and (iv) amount of gene flow across barriers. To further evaluate the methods' effectiveness to detect genetic clusters in natural populations, we used previously published data on North American pumas and a European shrub. Our results show that with simulated and empirical data, the Bayesian spatial clustering algorithms outperformed direct edge detection methods. All methods incorrectly detected boundaries in the presence of strong patterns of isolation by distance. Based on this finding, we support the application of Bayesian spatial clustering algorithms for boundary detection in empirical datasets, with necessary tests for the influence of isolation by distance. © 2011 by the authors; licensee MDPI, Basel, Switzerland.

  5. A Probabilistic Embedding Clustering Method for Urban Structure Detection

    Science.gov (United States)

    Lin, X.; Li, H.; Zhang, Y.; Gao, L.; Zhao, L.; Deng, M.

    2017-09-01

    Urban structure detection is a basic task in urban geography. Clustering is a core technology for detecting patterns in urban spatial structure, urban functional regions, and so on. In the big data era, diverse urban sensing datasets recording information such as human behaviour and social activity suffer from high dimensionality and high noise, and unfortunately the state-of-the-art clustering methods do not handle high dimensionality and high noise concurrently. In this paper, a probabilistic embedding clustering method is proposed. Firstly, we propose a Probabilistic Embedding Model (PEM) to find latent features in high dimensional urban sensing data by "learning" via a probabilistic model. The latent features capture the essential patterns hidden in high dimensional data, and the probabilistic model also reduces the uncertainty caused by high noise. Secondly, by tuning the parameters, our model can discover two kinds of urban structure, homophily and structural equivalence, i.e., communities with intensive interaction or nodes playing the same roles in urban structure. We evaluated the performance of our model with experiments on real-world data from Shanghai (China), which confirmed that our method discovers both kinds of urban structure.

  6. A PROBABILISTIC EMBEDDING CLUSTERING METHOD FOR URBAN STRUCTURE DETECTION

    Directory of Open Access Journals (Sweden)

    X. Lin

    2017-09-01

    Full Text Available Urban structure detection is a basic task in urban geography. Clustering is a core technology for detecting patterns in urban spatial structure, urban functional regions, and so on. In the big data era, diverse urban sensing datasets recording information such as human behaviour and social activity suffer from high dimensionality and high noise, and unfortunately the state-of-the-art clustering methods do not handle high dimensionality and high noise concurrently. In this paper, a probabilistic embedding clustering method is proposed. Firstly, we propose a Probabilistic Embedding Model (PEM) to find latent features in high dimensional urban sensing data by “learning” via a probabilistic model. The latent features capture the essential patterns hidden in high dimensional data, and the probabilistic model also reduces the uncertainty caused by high noise. Secondly, by tuning the parameters, our model can discover two kinds of urban structure, homophily and structural equivalence, i.e., communities with intensive interaction or nodes playing the same roles in urban structure. We evaluated the performance of our model with experiments on real-world data from Shanghai (China), which confirmed that our method discovers both kinds of urban structure.

  7. A semi-supervised method to detect seismic random noise with fuzzy GK clustering

    International Nuclear Information System (INIS)

    Hashemi, Hosein; Javaherian, Abdolrahim; Babuska, Robert

    2008-01-01

    We present a new method to detect random noise in seismic data using fuzzy Gustafson–Kessel (GK) clustering. First, using an adaptive distance norm, a matrix is constructed from the observed seismic amplitudes. The next step is to find the centres of ellipsoidal clusters and construct a partition matrix which determines the soft decision boundaries between seismic events and random noise. The GK algorithm updates the cluster centres in order to iteratively minimize the cluster variance. Multiplication of the fuzzy membership function with the values of each sample yields new sections; we name them 'clustered sections'. The seismic amplitude values of the clustered sections are given in a way that decreases the level of noise in the original noisy seismic input. In pre-stack data, it is essential to study the clustered sections in the f–k domain; finding the quantitative index for weighting the post-stack data needs a similar approach. Using the knowledge of a human specialist together with the fuzzy unsupervised clustering, the method constitutes a semi-supervised random noise detection. The efficiency of this method is investigated on synthetic and real seismic data for both pre- and post-stack data. The results show a significant improvement of the input noisy sections without harming the important amplitude and phase information of the original data. The procedure for finding the final weights of each clustered section should be done carefully in order to keep almost all the evident seismic amplitudes in the output section. The method interactively uses the knowledge of the seismic specialist in detecting the noise.

  8. Spatial cluster detection using dynamic programming

    Directory of Open Access Journals (Sweden)

    Sverchkov Yuriy

    2012-03-01

    Full Text Available Abstract Background The task of spatial cluster detection involves finding spatial regions where some property deviates from the norm or the expected value. In a probabilistic setting this task can be expressed as finding a region where some event is significantly more likely than usual. Spatial cluster detection is of interest in fields such as biosurveillance, mining of astronomical data, military surveillance, and analysis of fMRI images. In almost all such applications we are interested both in the question of whether a cluster exists in the data, and if it exists, we are interested in finding the most accurate characterization of the cluster. Methods We present a general dynamic programming algorithm for grid-based spatial cluster detection. The algorithm can be used for both Bayesian maximum a-posteriori (MAP estimation of the most likely spatial distribution of clusters and Bayesian model averaging over a large space of spatial cluster distributions to compute the posterior probability of an unusual spatial clustering. The algorithm is explained and evaluated in the context of a biosurveillance application, specifically the detection and identification of Influenza outbreaks based on emergency department visits. A relatively simple underlying model is constructed for the purpose of evaluating the algorithm, and the algorithm is evaluated using the model and semi-synthetic test data. Results When compared to baseline methods, tests indicate that the new algorithm can improve MAP estimates under certain conditions: the greedy algorithm we compared our method to was found to be more sensitive to smaller outbreaks, while as the size of the outbreaks increases, in terms of area affected and proportion of individuals affected, our method overtakes the greedy algorithm in spatial precision and recall. The new algorithm performs on-par with baseline methods in the task of Bayesian model averaging. Conclusions We conclude that the dynamic

  9. K2: A NEW METHOD FOR THE DETECTION OF GALAXY CLUSTERS BASED ON CANADA-FRANCE-HAWAII TELESCOPE LEGACY SURVEY MULTICOLOR IMAGES

    International Nuclear Information System (INIS)

    Thanjavur, Karun; Willis, Jon; Crampton, David

    2009-01-01

    We have developed a new method, K2, optimized for the detection of galaxy clusters in multicolor images. Based on the Red Sequence approach, K2 detects clusters using simultaneous enhancements in both colors and position. The detection significance is robustly determined through extensive Monte Carlo simulations and through comparison with available cluster catalogs based on two different optical methods, and also on X-ray data. K2 also provides quantitative estimates of the candidate clusters' richness and photometric redshifts. Initially, K2 was applied to the two color (gri) 161 deg² images of the Canada-France-Hawaii Telescope Legacy Survey Wide (CFHTLS-W) data. Our simulations show that the false detection rate for these data, at our selected threshold, is only ∼1%, and that the cluster catalogs are ∼80% complete up to a redshift of z = 0.6 for Fornax-like and richer clusters and to z ∼ 0.3 for poorer clusters. Based on the g-, r-, and i-band photometric catalogs of the Terapix T05 release, 35 clusters/deg² are detected, with 1-2 Fornax-like or richer clusters every 2 deg². Catalogs containing data for 6144 galaxy clusters have been prepared, of which 239 are rich clusters. These clusters, especially the latter, are being searched for gravitational lenses, one of our chief motivations for cluster detection in CFHTLS. The K2 method can be easily extended to use additional color information and thus improve overall cluster detection to higher redshifts. The complete set of K2 cluster catalogs, along with the supplementary catalogs for the member galaxies, are available on request from the authors.

  10. A novel intrusion detection method based on OCSVM and K-means recursive clustering

    Directory of Open Access Journals (Sweden)

    Leandros A. Maglaras

    2015-01-01

    Full Text Available In this paper we present an intrusion detection module capable of detecting malicious network traffic in a SCADA (Supervisory Control and Data Acquisition) system, based on the combination of a One-Class Support Vector Machine (OCSVM) with RBF kernel and recursive k-means clustering. Important parameters of OCSVM, such as the Gaussian width σ and the parameter ν, affect the performance of the classifier. Tuning of these parameters is of great importance in order to avoid false positives and overfitting. The combination of OCSVM with recursive k-means clustering allows the proposed intrusion detection module to distinguish real alarms from possible attacks regardless of the values of the parameters σ and ν, making it ideal for real-time intrusion detection mechanisms for SCADA systems. Extensive simulations have been conducted with datasets extracted from small and medium sized HTB SCADA testbeds, in order to compare the accuracy, false alarm rate and execution time against the baseline OCSVM method.
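
    As a rough illustration of the two-stage idea, the sketch below trains a one-class SVM on normal traffic and then clusters whatever it flags, so that alarms can be judged by cluster rather than point by point. The features, parameter values and the use of a single plain k-means pass (instead of the paper's recursive refinement) are assumptions for the example only.

    ```python
    import numpy as np
    from sklearn.svm import OneClassSVM
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(1)
    normal_traffic = rng.normal(0, 1, (500, 4))              # hypothetical SCADA features
    test_traffic = np.vstack([rng.normal(0, 1, (90, 4)),
                              rng.normal(5, 1, (10, 4))])     # 10 injected anomalies

    # Stage 1: one-class SVM with RBF kernel trained on normal traffic only
    ocsvm = OneClassSVM(kernel="rbf", gamma=0.1, nu=0.05).fit(normal_traffic)
    suspect = test_traffic[ocsvm.predict(test_traffic) == -1]
    print(f"{len(suspect)} of {len(test_traffic)} test samples flagged by OCSVM")

    # Stage 2: cluster the flagged samples; compact, distant clusters are treated
    # as real alarms and scattered singletons as likely false positives
    if len(suspect) >= 2:
        km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(suspect)
        print("suspect samples per cluster:", np.bincount(km.labels_))
    ```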

  11. An Energy-Efficient Cluster-Based Vehicle Detection on Road Network Using Intention Numeration Method

    Directory of Open Access Journals (Sweden)

    Deepa Devasenapathy

    2015-01-01

    Full Text Available The traffic in the road network is progressively increasing to a great extent. Good knowledge of network traffic can minimize congestion using information pertaining to the road network obtained with the aid of communal callers, pavement detectors, and so on. Using these methods, low-featured information is generated with respect to the user in the road network. Although the existing schemes obtain urban traffic information, they fail to calculate the energy drain rate of nodes and to locate an equilibrium between the overhead and the quality of the routing protocol, which poses a great challenge. Thus, an energy-efficient cluster-based vehicle detection in road networks using the intention numeration method (CVDRN-IN) is developed. Initially, sensor nodes that detect a vehicle are grouped into separate clusters. Further, we approximate the strength of the node drain rate for a cluster using a polynomial regression function. In addition, the total node energy is estimated by taking the integral over the area. Finally, enhanced data aggregation is performed to reduce the amount of data transmission using a digital signature tree. The experimental performance is evaluated with the Dodgers Loop Sensor data set from the UCI repository, and the proposed method outperforms existing work on energy consumption, clustering efficiency, and node drain rate.
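
    The drain-rate step lends itself to a small numerical example: fit a polynomial to a node's energy readings, differentiate it for the drain rate, and take the difference of the fitted curve over the window for the energy consumed. The readings below are synthetic placeholders, not data from the paper.

    ```python
    import numpy as np

    # Hypothetical battery level (J) of one cluster node sampled every 60 s
    t = np.arange(0, 600, 60, dtype=float)
    rng = np.random.default_rng(2)
    energy = 100.0 - 0.08 * t + 5e-5 * t**2 + rng.normal(0, 0.3, t.size)

    coeffs = np.polyfit(t, energy, deg=2)          # polynomial regression of E(t)
    drain_rate = -np.polyder(np.poly1d(coeffs))    # drain rate = -dE/dt
    print("estimated drain rate at t = 300 s:", round(drain_rate(300.0), 4), "J/s")

    # Energy consumed over the window = E(start) - E(end) of the fitted curve
    spent = np.polyval(coeffs, t[0]) - np.polyval(coeffs, t[-1])
    print("energy consumed over the window:", round(spent, 2), "J")
    ```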

  12. An energy-efficient cluster-based vehicle detection on road network using intention numeration method.

    Science.gov (United States)

    Devasenapathy, Deepa; Kannan, Kathiravan

    2015-01-01

    The traffic in the road network is progressively increasing to a great extent. Good knowledge of network traffic can minimize congestion using information pertaining to the road network obtained with the aid of communal callers, pavement detectors, and so on. Using these methods, low-featured information is generated with respect to the user in the road network. Although the existing schemes obtain urban traffic information, they fail to calculate the energy drain rate of nodes and to locate an equilibrium between the overhead and the quality of the routing protocol, which poses a great challenge. Thus, an energy-efficient cluster-based vehicle detection in road networks using the intention numeration method (CVDRN-IN) is developed. Initially, sensor nodes that detect a vehicle are grouped into separate clusters. Further, we approximate the strength of the node drain rate for a cluster using a polynomial regression function. In addition, the total node energy is estimated by taking the integral over the area. Finally, enhanced data aggregation is performed to reduce the amount of data transmission using a digital signature tree. The experimental performance is evaluated with the Dodgers Loop Sensor data set from the UCI repository, and the proposed method outperforms existing work on energy consumption, clustering efficiency, and node drain rate.

  13. Detecting space-time disease clusters with arbitrary shapes and sizes using a co-clustering approach

    Directory of Open Access Journals (Sweden)

    Sami Ullah

    2017-11-01

    Full Text Available The ability to detect potential space-time clusters in spatio-temporal data on disease occurrences is necessary for conducting surveillance and implementing disease prevention policies. Most existing techniques use geometrically shaped (circular, elliptical or square) scanning windows to discover disease clusters. In certain situations, where the disease occurrences tend to cluster in very irregularly shaped areas, these algorithms are not feasible in practice for the detection of space-time clusters. To address this problem, a new algorithm is proposed, which uses a co-clustering strategy to detect prospective and retrospective space-time disease clusters with no restriction on shape and size. The proposed method detects space-time disease clusters by tracking the changes in space-time occurrence structure instead of an in-depth search over space. This method was utilised to detect potential clusters in the annual and monthly malaria data in Khyber Pakhtunkhwa Province, Pakistan from 2012 to 2016, visualising the results on a heat map. The results of the annual data analysis showed that the most likely hotspot emerged in three sub-regions in the years 2013-2014. The most likely hotspots in the monthly data appeared in the months of July to October of each year and showed a strong periodic trend.

  14. Spatial Cluster Detection for Repeatedly Measured Outcomes while Accounting for Residential History

    OpenAIRE

    Cook, Andrea J.; Gold, Diane R.; Li, Yi

    2009-01-01

    Spatial cluster detection has become an important methodology in quantifying the effect of hazardous exposures. Previous methods have focused on cross-sectional outcomes that are binary or continuous. There are virtually no spatial cluster detection methods proposed for longitudinal outcomes. This paper proposes a new spatial cluster detection method for repeated outcomes using cumulative geographic residuals. A major advantage of this method is its ability to readily incorporate information ...

  15. The detection of neutron clusters

    Energy Technology Data Exchange (ETDEWEB)

    Marques, F.M.; Labiche, M.; Orr, N.A.; Angelique, J.C. [Caen Univ., 14 (France). Lab. de Physique Corpusculaire] [and others]

    2001-11-01

    A new approach to the production and detection of bound neutron clusters is presented. The technique is based on the breakup of beams of very neutron-rich nuclei and the subsequent detection of the recoiling proton in a liquid scintillator. The method has been tested in the breakup of {sup 11}Li, {sup 14}Be and {sup 15}B beams by a C target. Some 6 events were observed that exhibit the characteristics of a multi-neutron cluster liberated in the breakup of {sup 14}Be, most probably in the channel {sup 10}Be+{sup 4}n. The various backgrounds that may mimic such a signal are discussed in detail. (author)

  16. Spatial cluster detection for repeatedly measured outcomes while accounting for residential history.

    Science.gov (United States)

    Cook, Andrea J; Gold, Diane R; Li, Yi

    2009-10-01

    Spatial cluster detection has become an important methodology in quantifying the effect of hazardous exposures. Previous methods have focused on cross-sectional outcomes that are binary or continuous. There are virtually no spatial cluster detection methods proposed for longitudinal outcomes. This paper proposes a new spatial cluster detection method for repeated outcomes using cumulative geographic residuals. A major advantage of this method is its ability to readily incorporate information on study participants' relocation, which most cluster detection statistics cannot. Application of these methods is illustrated with the Home Allergens and Asthma prospective cohort study, analyzing the relationship between environmental exposures and a repeatedly measured outcome, occurrence of wheeze in the last 6 months, while taking mobile locations into account.

  17. [A cloud detection algorithm for MODIS images combining Kmeans clustering and multi-spectral threshold method].

    Science.gov (United States)

    Wang, Wei; Song, Wei-Guo; Liu, Shi-Xing; Zhang, Yong-Ming; Zheng, Hong-Yang; Tian, Wei

    2011-04-01

    An improved method for detecting clouds, combining K-means clustering and the multi-spectral threshold approach, is described. On the basis of landmark spectrum analysis, MODIS data are initially categorized into two major types by the K-means method. The first class includes clouds, smoke and snow, and the second class includes vegetation, water and land. Then a multi-spectral threshold detection is applied to eliminate interference such as smoke and snow from the first class. The method is tested with MODIS data at different times under different underlying surface conditions. Visual evaluation of the algorithm's performance shows that it can effectively detect small areas of cloud pixels and exclude the interference of the underlying surface, which provides a good foundation for the subsequent fire detection approach.
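
    A toy version of the two-stage screen can be written directly against flattened band values: K-means first splits the pixels into two coarse classes, then fixed thresholds prune snow- and smoke-like pixels from the bright class. The synthetic bands and threshold values below are illustrative stand-ins, not the calibrated MODIS bands or thresholds of the paper.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans

    # Stand-in for MODIS pixels: [visible reflectance, NIR reflectance, brightness temp (K)]
    rng = np.random.default_rng(3)
    clear  = rng.normal([0.10, 0.12, 290.0], [0.02, 0.02, 3.0], (4000, 3))
    bright = rng.normal([0.55, 0.60, 255.0], [0.05, 0.05, 5.0], (1000, 3))
    pixels = np.vstack([clear, bright])

    # Step 1: K-means into two coarse classes (cloud/smoke/snow vs. surface)
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(pixels)
    bright_class = km.cluster_centers_[:, 0].argmax()        # class with higher reflectance
    candidates = km.labels_ == bright_class

    # Step 2: multi-spectral thresholds to reject snow/smoke-like candidates
    is_cloud = candidates & (pixels[:, 0] > 0.4) & (pixels[:, 2] < 270.0)
    print("cloud pixels detected:", int(is_cloud.sum()), "of", len(pixels))
    ```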

  18. Weighted community detection and data clustering using message passing

    Science.gov (United States)

    Shi, Cheng; Liu, Yanchen; Zhang, Pan

    2018-03-01

    Grouping objects into clusters based on the similarities or weights between them is one of the most important problems in science and engineering. In this work, by extending message-passing algorithms and spectral algorithms proposed for an unweighted community detection problem, we develop a non-parametric method based on statistical physics, by mapping the problem to the Potts model at the critical temperature of spin-glass transition and applying belief propagation to solve the marginals corresponding to the Boltzmann distribution. Our algorithm is robust to over-fitting and gives a principled way to determine whether there are significant clusters in the data and how many clusters there are. We apply our method to different clustering tasks. In the community detection problem in weighted and directed networks, we show that our algorithm significantly outperforms existing algorithms. In the clustering problem, where the data were generated by mixture models in the sparse regime, we show that our method works all the way down to the theoretical limit of detectability and gives accuracy very close to that of the optimal Bayesian inference. In the semi-supervised clustering problem, our method only needs several labels to work perfectly in classic datasets. Finally, we further develop Thouless-Anderson-Palmer equations which heavily reduce the computation complexity in dense networks but give almost the same performance as belief propagation.

  19. Comparison of Molecular Typing Methods Useful for Detecting Clusters of Campylobacter jejuni and C. coli Isolates through Routine Surveillance

    Science.gov (United States)

    Taboada, Eduardo; Grant, Christopher C. R.; Blakeston, Connie; Pollari, Frank; Marshall, Barbara; Rahn, Kris; MacKinnon, Joanne; Daignault, Danielle; Pillai, Dylan; Ng, Lai-King

    2012-01-01

    Campylobacter spp. may be responsible for unreported outbreaks of food-borne disease. The detection of these outbreaks is made more difficult by the fact that appropriate methods for detecting clusters of Campylobacter have not been well defined. We have compared the characteristics of five molecular typing methods on Campylobacter jejuni and C. coli isolates obtained from human and nonhuman sources during sentinel site surveillance during a 3-year period. Comparative genomic fingerprinting (CGF) appears to be one of the optimal methods for the detection of clusters of cases, and it could be supplemented by the sequencing of the flaA gene short variable region (flaA SVR sequence typing), with or without subsequent multilocus sequence typing (MLST). Different methods may be optimal for uncovering different aspects of source attribution. Finally, the use of several different molecular typing or analysis methods for comparing individuals within a population reveals much more about that population than a single method. Similarly, comparing several different typing methods reveals a great deal about differences in how the methods group individuals within the population. PMID:22162562

  20. Semi-supervised spectral algorithms for community detection in complex networks based on equivalence of clustering methods

    Science.gov (United States)

    Ma, Xiaoke; Wang, Bingbo; Yu, Liang

    2018-01-01

    Community detection is fundamental for revealing the structure-functionality relationship in complex networks, and it involves two issues: a quantitative function for community quality as well as algorithms to discover communities. Despite significant research on either of them, few attempts have been made to establish the connection between the two issues. To attack this problem, a generalized quantification function is proposed for communities in weighted networks, which provides a framework that unifies several well-known measures. Then, we prove that the trace optimization of the proposed measure is equivalent to the objective functions of algorithms such as nonnegative matrix factorization, kernel K-means as well as spectral clustering. This serves as the theoretical foundation for designing algorithms for community detection. On the second issue, a semi-supervised spectral clustering algorithm is developed by exploiting the equivalence relation, combining nonnegative matrix factorization and spectral clustering. Different from traditional semi-supervised algorithms, the partial supervision is integrated into the objective of the spectral algorithm. Finally, through extensive experiments on both artificial and real world networks, we demonstrate that the proposed method improves the accuracy of the traditional spectral algorithms in community detection.

  1. Fault Detection Using the Clustering-kNN Rule for Gas Sensor Arrays

    Directory of Open Access Journals (Sweden)

    Jingli Yang

    2016-12-01

    Full Text Available The k-nearest neighbour (kNN) rule, which naturally handles the possible non-linearity of data, is introduced to solve the fault detection problem of gas sensor arrays. In traditional fault detection methods based on the kNN rule, the detection process of each new test sample involves all samples in the entire training sample set. Therefore, these methods can be computation intensive in monitoring processes with a large volume of variables and training samples and may be impossible for real-time monitoring. To address this problem, a novel clustering-kNN rule is presented. The landmark-based spectral clustering (LSC) algorithm, which has low computational complexity, is employed to divide the entire training sample set into several clusters. Further, the kNN rule is only conducted in the cluster that is nearest to the test sample; thus, the efficiency of the fault detection methods can be enhanced by reducing the number of training samples involved in the detection process of each test sample. The performance of the proposed clustering-kNN rule is fully verified in numerical simulations with both linear and non-linear models and a real gas sensor array experimental system with different kinds of faults. The results of simulations and experiments demonstrate that the clustering-kNN rule can greatly enhance both the accuracy and efficiency of fault detection methods and provide an excellent solution to reliable and real-time monitoring of gas sensor arrays.
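
    The clustering-kNN idea reduces to: pre-cluster the fault-free training data, keep one kNN model and one distance threshold per cluster, and at test time consult only the model of the nearest cluster. The sketch below substitutes plain k-means for the landmark-based spectral clustering and uses made-up sensor data and thresholds.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.neighbors import NearestNeighbors

    rng = np.random.default_rng(4)
    train = np.vstack([rng.normal(i, 0.5, (300, 6)) for i in range(3)])   # fault-free data
    test = np.vstack([rng.normal(1, 0.5, (20, 6)), rng.normal(8, 0.5, (5, 6))])

    n_clusters, k = 3, 5
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(train)

    # One kNN model per cluster, with a detection threshold from its own training data
    models, thresholds = [], []
    for c in range(n_clusters):
        members = train[km.labels_ == c]
        nn = NearestNeighbors(n_neighbors=k).fit(members)
        d, _ = nn.kneighbors(members)
        models.append(nn)
        thresholds.append(np.percentile(d.mean(axis=1), 99))   # 99th percentile kNN distance

    # Detection: only the nearest cluster's kNN model is consulted per test sample
    nearest_cluster = km.predict(test)
    for i, x in enumerate(test):
        c = nearest_cluster[i]
        d, _ = models[c].kneighbors(x.reshape(1, -1))
        if d.mean() > thresholds[c]:
            print(f"test sample {i}: mean kNN distance {d.mean():.2f} -> fault")
    ```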

  2. Fault Detection Using the Clustering-kNN Rule for Gas Sensor Arrays

    Science.gov (United States)

    Yang, Jingli; Sun, Zhen; Chen, Yinsheng

    2016-01-01

    The k-nearest neighbour (kNN) rule, which naturally handles the possible non-linearity of data, is introduced to solve the fault detection problem of gas sensor arrays. In traditional fault detection methods based on the kNN rule, the detection process of each new test sample involves all samples in the entire training sample set. Therefore, these methods can be computation intensive in monitoring processes with a large volume of variables and training samples and may be impossible for real-time monitoring. To address this problem, a novel clustering-kNN rule is presented. The landmark-based spectral clustering (LSC) algorithm, which has low computational complexity, is employed to divide the entire training sample set into several clusters. Further, the kNN rule is only conducted in the cluster that is nearest to the test sample; thus, the efficiency of the fault detection methods can be enhanced by reducing the number of training samples involved in the detection process of each test sample. The performance of the proposed clustering-kNN rule is fully verified in numerical simulations with both linear and non-linear models and a real gas sensor array experimental system with different kinds of faults. The results of simulations and experiments demonstrate that the clustering-kNN rule can greatly enhance both the accuracy and efficiency of fault detection methods and provide an excellent solution to reliable and real-time monitoring of gas sensor arrays. PMID:27929412

  3. Space-time clusters for early detection of grizzly bear predation.

    Science.gov (United States)

    Kermish-Wells, Joseph; Massolo, Alessandro; Stenhouse, Gordon B; Larsen, Terrence A; Musiani, Marco

    2018-01-01

    Accurate detection and classification of predation events is important to determine predation and consumption rates by predators. However, obtaining this information for large predators is constrained by the speed at which carcasses disappear and the cost of field data collection. To accurately detect predation events, researchers have used GPS collar technology combined with targeted site visits. However, kill sites are often investigated well after the predation event due to limited data retrieval options on GPS collars (VHF or UHF downloading) and to ensure crew safety when working with large predators. This can lead to missing information from small-prey (including young ungulates) kill sites due to scavenging and general site deterioration (e.g., vegetation growth). We used a space-time permutation scan statistic (STPSS) clustering method (SaTScan) to detect predation events of grizzly bears (Ursus arctos) fitted with satellite transmitting GPS collars. We used generalized linear mixed models to verify predation events and the size of carcasses using spatiotemporal characteristics as predictors. STPSS uses a probability model to compare expected cluster size (space and time) with the observed size. We applied this method retrospectively to data from 2006 to 2007 to compare our method to random GPS site selection. In 2013-2014, we applied our detection method to visit sites one week after their occupation. Both datasets were collected in the same study area. Our approach detected 23 of 27 predation sites verified by visiting 464 random grizzly bear locations in 2006-2007, 187 of which were within space-time clusters and 277 outside. Predation site detection increased by 2.75 times (54 predation events of 335 visited clusters) using 2013-2014 data. Our GLMMs showed that cluster size and duration predicted predation events and carcass size with high sensitivity (0.72 and 0.94, respectively). Coupling GPS satellite technology with clusters using a program based

  4. Document clustering methods, document cluster label disambiguation methods, document clustering apparatuses, and articles of manufacture

    Science.gov (United States)

    Sanfilippo, Antonio [Richland, WA; Calapristi, Augustin J [West Richland, WA; Crow, Vernon L [Richland, WA; Hetzler, Elizabeth G [Kennewick, WA; Turner, Alan E [Kennewick, WA

    2009-12-22

    Document clustering methods, document cluster label disambiguation methods, document clustering apparatuses, and articles of manufacture are described. In one aspect, a document clustering method includes providing a document set comprising a plurality of documents, providing a cluster comprising a subset of the documents of the document set, using a plurality of terms of the documents, providing a cluster label indicative of subject matter content of the documents of the cluster, wherein the cluster label comprises a plurality of word senses, and selecting one of the word senses of the cluster label.

  5. A model-based clustering method to detect infectious disease transmission outbreaks from sequence variation.

    Directory of Open Access Journals (Sweden)

    Rosemary M McCloskey

    2017-11-01

    Full Text Available Clustering infections by genetic similarity is a popular technique for identifying potential outbreaks of infectious disease, in part because sequences are now routinely collected for clinical management of many infections. A diverse number of nonparametric clustering methods have been developed for this purpose. These methods are generally intuitive, rapid to compute, and readily scale with large data sets. However, we have found that nonparametric clustering methods can be biased towards identifying clusters of diagnosis, where individuals are sampled sooner post-infection, rather than the clusters of rapid transmission that are meant to be potential foci for public health efforts. We develop a fundamentally new approach to genetic clustering based on fitting a Markov-modulated Poisson process (MMPP), which represents the evolution of transmission rates along the tree relating different infections. We evaluated this model-based method alongside five nonparametric clustering methods using both simulated and actual HIV sequence data sets. For simulated clusters of rapid transmission, the MMPP clustering method obtained higher mean sensitivity (85%) and specificity (91%) than the nonparametric methods. When we applied these clustering methods to published sequences from a study of HIV-1 genetic clusters in Seattle, USA, we found that the MMPP method categorized about half (46%) as many individuals to clusters compared to the other methods. Furthermore, the mean internal branch lengths that approximate transmission rates were significantly shorter in clusters extracted using MMPP, but not by other methods. We determined that the computing time for the MMPP method scaled linearly with the size of trees, requiring about 30 seconds for a tree of 1,000 tips and about 20 minutes for 50,000 tips on a single computer. This new approach to genetic clustering has significant implications for the application of pathogen sequence analysis to public health, where

  6. Detection of protein complex from protein-protein interaction network using Markov clustering

    International Nuclear Information System (INIS)

    Ochieng, P J; Kusuma, W A; Haryanto, T

    2017-01-01

    Detection of complexes, or groups of functionally related proteins, is an important challenge when analysing biological networks. However, existing algorithms to identify protein complexes are insufficient when applied to dense networks of experimentally derived interaction data. Therefore, we introduced a graph clustering method based on the Markov clustering algorithm to identify protein complexes within highly interconnected protein-protein interaction networks. A protein-protein interaction network was first constructed to develop a geometrical network; the network was then partitioned using Markov clustering to detect protein complexes. The interest of the proposed method is illustrated by its application to human proteins associated with type II diabetes mellitus. Flow simulation of the MCL algorithm was initially performed, and topological properties of the resultant network were analysed for detection of protein complexes. The results indicated that the proposed method successfully detects an overall of 34 complexes, with 11 complexes consisting of overlapping modules and 20 non-overlapping modules. The major complex consisted of 102 proteins and 521 interactions, with cluster modularity and density of 0.745 and 0.101, respectively. The comparative analysis revealed that MCL outperforms the AP, MCODE and SCPS algorithms, with a high clustering coefficient (0.751), network density and modularity index (0.630). This demonstrated that MCL was the most reliable and efficient graph clustering algorithm for detection of protein complexes from PPI networks. (paper)
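
    The core of Markov clustering is an alternation of expansion (matrix powers, which spread flow) and inflation (element-wise powers, which sharpen it) on a column-stochastic adjacency matrix. A bare-bones sketch on a toy interaction graph follows; the analysis in the paper runs on a full protein-protein interaction network and tunes the inflation parameter.

    ```python
    import numpy as np

    def markov_clustering(adj, expansion=2, inflation=2.0, n_iter=50):
        """Bare-bones MCL: alternate expansion (matrix power) and inflation
        (element-wise power plus column normalization) until the matrix settles;
        the attractor rows then define the clusters."""
        M = adj.astype(float) + np.eye(len(adj))       # add self-loops
        M /= M.sum(axis=0, keepdims=True)              # column-stochastic
        for _ in range(n_iter):
            M = np.linalg.matrix_power(M, expansion)   # expansion
            M = M ** inflation                         # inflation
            M /= M.sum(axis=0, keepdims=True)
        clusters = set()
        for i in np.where(np.diag(M) > 1e-6)[0]:       # attractor rows
            clusters.add(tuple(int(j) for j in np.where(M[i] > 1e-6)[0]))
        return sorted(clusters)

    # Toy interaction graph: two dense modules joined by a single edge
    adj = np.zeros((6, 6))
    for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
        adj[a, b] = adj[b, a] = 1
    print(markov_clustering(adj))   # typically two clusters: (0, 1, 2) and (3, 4, 5)
    ```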

  7. Fast clustering using adaptive density peak detection.

    Science.gov (United States)

    Wang, Xiao-Feng; Xu, Yifan

    2017-12-01

    Common limitations of clustering methods include slow algorithm convergence, instability with respect to the pre-specification of a number of intrinsic parameters, and lack of robustness to outliers. A recent clustering approach proposed a fast search algorithm for cluster centers based on their local densities. However, the selection of the key intrinsic parameters in the algorithm was not systematically investigated. It is relatively difficult to estimate the "optimal" parameters since the original definition of the local density in the algorithm is based on a truncated counting measure. In this paper, we propose a clustering procedure with adaptive density peak detection, where the local density is estimated through nonparametric multivariate kernel estimation. The model parameter can then be calculated from the equations with statistical theoretical justification. We also develop an automatic cluster centroid selection method by maximizing an average silhouette index. The advantage and flexibility of the proposed method are demonstrated through simulation studies and the analysis of a few benchmark gene expression data sets. The method needs only a single step without any iteration and thus is fast and has great potential for big data analysis. A user-friendly R package ADPclust has been developed for public use.
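
    The density-peak idea behind the proposal can be sketched with a kernel density estimate standing in for the adaptive density: compute a density rho for every point and a distance delta to the nearest denser point, and take points where both are large as cluster centres. The ADPclust package itself is in R; the Python sketch below also simplifies the assignment step by attaching each point to its nearest centre.

    ```python
    import numpy as np
    from scipy.stats import gaussian_kde

    rng = np.random.default_rng(5)
    X = np.vstack([rng.normal(0, 0.4, (100, 2)), rng.normal(3, 0.4, (100, 2))])

    rho = gaussian_kde(X.T)(X.T)                        # kernel density at each point
    dist = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    delta = np.empty(len(X))
    for i in range(len(X)):
        denser = np.where(rho > rho[i])[0]              # points with higher density
        delta[i] = dist[i, denser].min() if len(denser) else dist[i].max()

    # Cluster centres: the two points with the largest rho * delta product
    centres = np.argsort(rho * delta)[-2:]
    labels = dist[:, centres].argmin(axis=1)            # assign to the nearest centre
    print("points per cluster:", np.bincount(labels))
    ```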

  8. Detecting edges in the X-ray surface brightness of galaxy clusters

    Science.gov (United States)

    Sanders, J. S.; Fabian, A. C.; Russell, H. R.; Walker, S. A.; Blundell, K. M.

    2016-08-01

    The effects of many physical processes in the intracluster medium of galaxy clusters imprint themselves in X-ray surface brightness images. It is therefore important to choose optimal methods for extracting information from and enhancing the interpretability of such images. We describe in detail a gradient filtering edge detection method that we previously applied to images of the Centaurus cluster of galaxies. The Gaussian gradient filter measures the gradient in the surface brightness distribution on particular spatial scales. We apply this filter on different scales to Chandra X-ray observatory images of two clusters with active galactic nucleus feedback, the Perseus cluster and M 87, and a merging system, A 3667. By combining filtered images on different scales using radial filters spectacular images of the edges in a cluster are produced. We describe how to assess the significance of features in filtered images. We find the gradient filtering technique to have significant advantages for detecting many kinds of features compared to other analysis techniques, such as unsharp masking. Filtering cluster images in this way in a hard energy band allows shocks to be detected.
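
    The filtering step itself is available off the shelf: SciPy's Gaussian gradient magnitude filter measures surface-brightness gradients on a chosen scale. The snippet below applies it to a synthetic image with a circular edge purely as an illustration; a real analysis would use exposure-corrected Chandra images and combine the per-scale maps radially, as described in the abstract.

    ```python
    import numpy as np
    from scipy.ndimage import gaussian_filter, gaussian_gradient_magnitude

    # Synthetic "surface brightness" image: a bright disc of radius 60 px plus Poisson noise
    y, x = np.mgrid[0:256, 0:256]
    r = np.hypot(x - 128, y - 128)
    image = np.where(r < 60, 10.0, 3.0) + np.random.default_rng(6).poisson(2.0, (256, 256))

    for sigma in (2, 4, 8):                          # spatial scales in pixels
        smoothed = gaussian_filter(image, sigma)     # suppress pixel-scale noise first
        grad = gaussian_gradient_magnitude(smoothed, sigma)
        peak = np.unravel_index(grad.argmax(), grad.shape)
        print(f"sigma={sigma}: strongest gradient at radius {r[peak]:.0f} px")
    ```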

  9. Detection and quantification of solute clusters in a nanostructured ferritic alloy

    Energy Technology Data Exchange (ETDEWEB)

    Miller, M.K., E-mail: millermk@ornl.gov [Center for Nanophase Materials Sciences, Oak Ridge National Laboratory, Oak Ridge, TN 37831-6139 (United States); Reinhard, D., E-mail: David.Reinhard@ametek.com [CAMECA Instruments, Inc., 5500 Nobel Drive, Madison, WI 53711 (United States); Larson, D.J., E-mail: David.Larson@ametek.com [CAMECA Instruments, Inc., 5500 Nobel Drive, Madison, WI 53711 (United States)

    2015-07-15

    Highlights: • Simulated APT data indicate that solute clusters can be resolved at 80% detection efficiency. • Solute clusters containing 2–9 atoms were detected in a prototype ∼80% detection efficiency LEAP. • High densities, 1.8 × 10{sup 24} m{sup −3}, of solute clusters were detected in as-milled flakes of 14YWT. • Lower densities, 1.2 × 10{sup 24} m{sup −3}, were detected in the stir zone of a FSW. • Vacancies stabilize the clusters, which retard diffusion and confer excellent stability. - Abstract: A series of simulated atom probe datasets were examined with a friends-of-friends method to establish the detection efficiency required to resolve solute clusters in the ferrite phase of a 14YWT nanostructured ferritic alloy. The size and number densities of solute clusters in the ferrite of the as-milled mechanically-alloyed condition and the stir zone of a friction stir weld were estimated with a prototype high-detection-efficiency (∼80%) local electrode atom probe. High number densities, 1.8 × 10{sup 24} m{sup −3} and 1.2 × 10{sup 24} m{sup −3}, respectively, of solute clusters containing between 2 and 9 solute atoms of Ti, Y and O were detected for these two conditions. These results support first-principles calculations that predicted that vacancies stabilize these Ti–Y–O clusters, which retard diffusion and contribute to the excellent high temperature stability of the microstructure and radiation tolerance of nanostructured ferritic alloys.
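
    The friends-of-friends criterion used on the simulated atom probe data groups atoms that lie within a fixed linking length of any other member of the group. A compact illustration with a k-d tree and union-find on hypothetical atom positions is given below; the detection-efficiency study in the paper applies the same idea to full simulated and measured APT datasets.

    ```python
    import numpy as np
    from scipy.spatial import cKDTree

    def friends_of_friends(points, linking_length, min_size=2):
        """Group points so that every member lies within linking_length of
        at least one other member of its group (friends-of-friends)."""
        pairs = cKDTree(points).query_pairs(linking_length)
        parent = list(range(len(points)))            # union-find over linked pairs

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        for a, b in pairs:
            parent[find(a)] = find(b)
        groups = {}
        for i in range(len(points)):
            groups.setdefault(find(i), []).append(i)
        return [g for g in groups.values() if len(g) >= min_size]

    # Hypothetical solute-atom positions (nm): one tight cluster plus a dilute matrix
    rng = np.random.default_rng(7)
    atoms = np.vstack([rng.normal(5.0, 0.2, (6, 3)), rng.uniform(0, 50, (200, 3))])
    print("solute clusters found:", friends_of_friends(atoms, linking_length=0.8))
    ```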

  10. Segmentation of the Clustered Cells with Optimized Boundary Detection in Negative Phase Contrast Images.

    Science.gov (United States)

    Wang, Yuliang; Zhang, Zaicheng; Wang, Huimin; Bi, Shusheng

    2015-01-01

    Cell image segmentation plays a central role in numerous biology studies and clinical applications. As a result, the development of cell image segmentation algorithms with high robustness and accuracy is attracting more and more attention. In this study, an automated cell image segmentation algorithm is developed to get improved cell image segmentation with respect to cell boundary detection and segmentation of the clustered cells for all cells in the field of view in negative phase contrast images. A new method which combines the thresholding method and edge based active contour method was proposed to optimize cell boundary detection. In order to segment clustered cells, the geographic peaks of cell light intensity were utilized to detect numbers and locations of the clustered cells. In this paper, the working principles of the algorithms are described. The influence of parameters in cell boundary detection and the selection of the threshold value on the final segmentation results are investigated. At last, the proposed algorithm is applied to the negative phase contrast images from different experiments. The performance of the proposed method is evaluated. Results show that the proposed method can achieve optimized cell boundary detection and highly accurate segmentation for clustered cells.

  11. Performance improvement of haptic collision detection using subdivision surface and sphere clustering.

    Directory of Open Access Journals (Sweden)

    A Ram Choi

    Full Text Available Haptics applications such as surgery simulations require collision detection that is more precise than in other applications. An efficient collision detection method based on the clustering of bounding spheres was proposed in our prior study. This paper analyzes and compares the effects of the five most common subdivision surface methods applied to some 3D models for haptic collision detection. The five methods are Butterfly, Catmull-Clark, Mid-point, Loop, and LS3 (Least Squares Subdivision Surface). After performing a number of experiments, we concluded that the LS3 method is the most appropriate for haptic simulations. The more surface subdivision we applied, the more precise the collision detection results became. However, we observed that performance improves only up to a certain threshold and degrades afterward. In order to reduce the performance degradation, we adopted our prior work, a fast and precise collision detection method based on adaptive clustering. As a result, we obtained a notable improvement in the speed of collision detection.

  12. Semi-supervised clustering methods.

    Science.gov (United States)

    Bair, Eric

    2013-01-01

    Cluster analysis methods seek to partition a data set into homogeneous subgroups. It is useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning that there is no outcome variable nor is anything known about the relationship between the observations in the data set. In many situations, however, information about the clusters is available in addition to the values of the features. For example, the cluster labels of some observations may be known, or certain observations may be known to belong to the same cluster. In other cases, one may wish to identify clusters that are associated with a particular outcome variable. This review describes several clustering algorithms (known as "semi-supervised clustering" methods) that can be applied in these situations. The majority of these methods are modifications of the popular k-means clustering method, and several of them will be described in detail. A brief description of some other semi-supervised clustering algorithms is also provided.

  13. Uncertainty of a detected spatial cluster in 1D: quantification and visualization

    KAUST Repository

    Lee, Junho; Gangnon, Ronald E.; Zhu, Jun; Liang, Jingjing

    2017-01-01

    Spatial cluster detection is an important problem in a variety of scientific disciplines such as environmental sciences, epidemiology and sociology. However, there appears to be very limited statistical methodology for quantifying the uncertainty of a detected cluster. In this paper, we develop a new method for the quantification and visualization of uncertainty associated with a detected cluster. Our approach defines a confidence set for the true cluster and visualizes the confidence set, based on the maximum likelihood, in time or in one-dimensional space. We evaluate the pivotal property of the statistic used to construct the confidence set and the coverage rate for the true cluster via empirical distributions. For illustration, our methodology is applied to both simulated data and an Alaska boreal forest dataset. Copyright © 2017 John Wiley & Sons, Ltd.

  14. Uncertainty of a detected spatial cluster in 1D: quantification and visualization

    KAUST Repository

    Lee, Junho

    2017-10-19

    Spatial cluster detection is an important problem in a variety of scientific disciplines such as environmental sciences, epidemiology and sociology. However, there appears to be very limited statistical methodology for quantifying the uncertainty of a detected cluster. In this paper, we develop a new method for the quantification and visualization of uncertainty associated with a detected cluster. Our approach defines a confidence set for the true cluster and visualizes the confidence set, based on the maximum likelihood, in time or in one-dimensional space. We evaluate the pivotal property of the statistic used to construct the confidence set and the coverage rate for the true cluster via empirical distributions. For illustration, our methodology is applied to both simulated data and an Alaska boreal forest dataset. Copyright © 2017 John Wiley & Sons, Ltd.

  15. AMICO: optimized detection of galaxy clusters in photometric surveys

    Science.gov (United States)

    Bellagamba, Fabio; Roncarelli, Mauro; Maturi, Matteo; Moscardini, Lauro

    2018-02-01

    We present Adaptive Matched Identifier of Clustered Objects (AMICO), a new algorithm for the detection of galaxy clusters in photometric surveys. AMICO is based on the Optimal Filtering technique, which allows us to maximize the signal-to-noise ratio (S/N) of the clusters. In this work, we focus on the new iterative approach to the extraction of cluster candidates from the map produced by the filter. In particular, we provide a definition of membership probability for the galaxies close to any cluster candidate, which allows us to remove its imprint from the map, allowing the detection of smaller structures. As demonstrated in our tests, this method allows the deblending of close-by and aligned structures in more than 50 per cent of the cases for objects at a radial distance equal to 0.5 × R200 or a redshift distance equal to 2 × σz, where σz is the typical uncertainty of photometric redshifts. Running AMICO on mocks derived from N-body simulations and semi-analytical modelling of galaxy evolution, we obtain a consistent mass-amplitude relation through the redshift range probed, with a slope of ∼0.55 and a logarithmic scatter of ∼0.14. The fraction of false detections decreases steeply with S/N and is negligible at S/N > 5.

  16. A Cluster-based Approach Towards Detecting and Modeling Network Dictionary Attacks

    Directory of Open Access Journals (Sweden)

    A. Tajari Siahmarzkooh

    2016-12-01

    Full Text Available In this paper, we provide an approach to detect network dictionary attacks using a data set collected as flows, from which a clustered graph is derived. These flows provide an aggregated view of the network traffic, in which the packets exchanged in the network are considered so that more internally connected nodes are clustered together. We show that dictionary attacks can be detected through parameters such as the number and weight of clusters in a time series and their evolution over time. Additionally, a Markov model based on the average weight of the clusters is also created. Finally, by means of our suggested model, we demonstrate that artificial clusters of the flows are created for normal and malicious traffic. The results of the proposed approach on the CAIDA 2007 data set suggest high accuracy for the model; therefore, it provides a proper method for detecting dictionary attacks.

  17. Segmentation of the Clustered Cells with Optimized Boundary Detection in Negative Phase Contrast Images.

    Directory of Open Access Journals (Sweden)

    Yuliang Wang

    Full Text Available Cell image segmentation plays a central role in numerous biology studies and clinical applications. As a result, the development of cell image segmentation algorithms with high robustness and accuracy is attracting more and more attention. In this study, an automated cell image segmentation algorithm is developed to get improved cell image segmentation with respect to cell boundary detection and segmentation of the clustered cells for all cells in the field of view in negative phase contrast images. A new method which combines the thresholding method and edge based active contour method was proposed to optimize cell boundary detection. In order to segment clustered cells, the geographic peaks of cell light intensity were utilized to detect numbers and locations of the clustered cells. In this paper, the working principles of the algorithms are described. The influence of parameters in cell boundary detection and the selection of the threshold value on the final segmentation results are investigated. At last, the proposed algorithm is applied to the negative phase contrast images from different experiments. The performance of the proposed method is evaluated. Results show that the proposed method can achieve optimized cell boundary detection and highly accurate segmentation for clustered cells.

  18. Three-Dimensional Computer-Aided Detection of Microcalcification Clusters in Digital Breast Tomosynthesis

    Directory of Open Access Journals (Sweden)

    Ji-wook Jeong

    2016-01-01

    Full Text Available We propose a computer-aided detection (CADe) algorithm for microcalcification (MC) clusters in reconstructed digital breast tomosynthesis (DBT) images. The algorithm consists of prescreening, MC detection, clustering, and false-positive (FP) reduction steps. The DBT images containing the MC-like objects were enhanced by a multiscale Hessian-based three-dimensional (3D) objectness response function, and a connected-component segmentation method was applied to extract the cluster seed objects as potential clustering centers of MCs. Secondly, a signal-to-noise ratio (SNR) enhanced image was also generated to detect the individual MC candidates and prescreen the MC-like objects. Each cluster seed candidate was prescreened by counting the neighboring individual MC candidates near the cluster seed object according to several microcalcification clustering criteria. As a second step, we introduced bounding boxes for the accepted seed candidates, clustered all the overlapping cubes, and examined them. After the FP reduction step, the average number of FPs per case was estimated to be 2.47 per DBT volume with a sensitivity of 83.3%.

  19. Semi-supervised clustering methods

    Science.gov (United States)

    Bair, Eric

    2013-01-01

    Cluster analysis methods seek to partition a data set into homogeneous subgroups. It is useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning that there is no outcome variable nor is anything known about the relationship between the observations in the data set. In many situations, however, information about the clusters is available in addition to the values of the features. For example, the cluster labels of some observations may be known, or certain observations may be known to belong to the same cluster. In other cases, one may wish to identify clusters that are associated with a particular outcome variable. This review describes several clustering algorithms (known as “semi-supervised clustering” methods) that can be applied in these situations. The majority of these methods are modifications of the popular k-means clustering method, and several of them will be described in detail. A brief description of some other semi-supervised clustering algorithms is also provided. PMID:24729830

  20. An Improved Semisupervised Outlier Detection Algorithm Based on Adaptive Feature Weighted Clustering

    Directory of Open Access Journals (Sweden)

    Tingquan Deng

    2016-01-01

    Full Text Available Various approaches to outlier detection already exist, among which semisupervised methods achieve encouraging superiority due to the introduction of prior knowledge. In this paper, an adaptive feature weighted clustering-based semisupervised outlier detection strategy is proposed. This method maximizes the membership degree of a labeled normal object to the cluster it belongs to and minimizes the membership degrees of a labeled outlier to all clusters. In consideration of the distinct significance of features or components of a dataset in determining whether an object is an inlier or an outlier, each feature is adaptively assigned a different weight according to the deviation degree between this feature of all objects and that of a certain cluster prototype. A series of experiments on a synthetic dataset and several real-world datasets are implemented to verify the effectiveness and efficiency of the proposed method.

  1. Local Community Detection Algorithm Based on Minimal Cluster

    Directory of Open Access Journals (Sweden)

    Yong Zhou

    2016-01-01

    Full Text Available In order to discover the structure of local communities more effectively, this paper puts forward a new local community detection algorithm based on a minimal cluster. Most local community detection algorithms begin from one node. Because the agglomeration ability of a single node is necessarily weaker than that of multiple nodes, the community extension in this paper no longer starts from the initial node alone but from a node cluster that contains the initial node and whose nodes are relatively densely connected with each other. The algorithm mainly includes two phases. First it detects the minimal cluster, and then it finds the local community extended from the minimal cluster. Experimental results show that the quality of the local community detected by our algorithm is much better than that of other algorithms in both real and simulated networks.

  2. Fault detection of flywheel system based on clustering and principal component analysis

    Directory of Open Access Journals (Sweden)

    Wang Rixin

    2015-12-01

    Full Text Available Considering the nonlinear, multifunctional properties of a double-flywheel system with closed-loop control, a two-step method combining clustering and principal component analysis is proposed to detect the two faults in the multifunctional flywheels. In the first step of the proposed algorithm, clustering is used as feature recognition to check the instructions of the “integrated power and attitude control” system, such as attitude control, energy storage or energy discharge. These commands ask the flywheel system to work in different operation modes, so the relationship of the parameters in different operations defines the cluster structure of the training data. Ordering points to identify the clustering structure (OPTICS) can automatically identify these clusters from the reachability plot, and the K-means algorithm then divides the training data into the corresponding operations according to the reachability plot. Finally, the last step of the proposed model defines the relationship of the parameters in each operation through the principal component analysis (PCA) method. Compared with the PCA model, the proposed approach is capable of identifying new clusters and learning the new behavior of incoming data. The simulation results show that it can effectively detect the faults in the multifunctional flywheels system.
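    A minimal Python sketch of this two-step idea, using scikit-learn and not the authors' exact procedure: operating modes are first identified by a reachability-based clustering of the training data (OPTICS here stands in for the OPTICS-plus-K-means step), and a PCA model with a squared-prediction-error (SPE) limit is then fitted per mode; a new sample is flagged as faulty when it fits no mode. All thresholds, parameters and names are assumptions.

        import numpy as np
        from sklearn.cluster import OPTICS
        from sklearn.decomposition import PCA

        def fit_modes(X_train, min_samples=10, n_components=2):
            """Step 1: identify operating modes; Step 2: fit a PCA model per mode."""
            modes = OPTICS(min_samples=min_samples).fit_predict(X_train)
            models = {}
            for m in np.unique(modes[modes >= 0]):          # label -1 marks noise
                Xm = X_train[modes == m]
                pca = PCA(n_components=n_components).fit(Xm)
                resid = Xm - pca.inverse_transform(pca.transform(Xm))
                spe = (resid ** 2).sum(axis=1)
                models[m] = (pca, np.percentile(spe, 99))   # assumed SPE control limit
            return models

        def is_faulty(x, models):
            """Flag a sample as faulty if it exceeds the SPE limit of every mode."""
            for pca, limit in models.values():
                r = x - pca.inverse_transform(pca.transform(x.reshape(1, -1)))[0]
                if (r ** 2).sum() <= limit:
                    return False
            return True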

  3. The use of cluster analysis method for the localization of acoustic emission sources detected during the hydrotest of PWR pressure vessels

    International Nuclear Information System (INIS)

    Liska, J.; Svetlik, M.; Slama, K.

    1982-01-01

    The acoustic emission method is a promising tool for checking reactor pressure vessel integrity. Localization of emission sources is the first and most important step in processing emission signals. The paper describes an emission-source localization method based on cluster analysis of a set of points depicting the emission events in the coordinate plane of their occurrence. The method constructs the minimum spanning tree of this set of points and partitions it into fragments corresponding to point clusters. Furthermore, the probability distributions of the minimum spanning tree edge lengths for one and for several clusters are considered, with the aim of finding the optimum length of the critical edge used to partition the tree. Practical application of the method is demonstrated by localizing the emission sources detected during a hydrotest of a pressure vessel used for testing the reactor pressure vessel covers. (author)
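    The minimum-spanning-tree partitioning described here can be sketched in a few lines of Python with SciPy; the example below simply cuts every edge longer than a user-supplied critical length, whereas the paper derives the optimum critical length from the edge-length distributions. The event coordinates are synthetic.

        import numpy as np
        from scipy.sparse.csgraph import minimum_spanning_tree, connected_components
        from scipy.spatial.distance import pdist, squareform

        def mst_clusters(points, critical_edge):
            """Cluster 2-D event locations by cutting MST edges above a critical length."""
            d = squareform(pdist(points))                 # dense pairwise distances
            mst = minimum_spanning_tree(d).toarray()
            mst[mst > critical_edge] = 0.0                # cut overly long edges
            n, labels = connected_components((mst + mst.T) > 0, directed=False)
            return n, labels

        # Hypothetical emission-event coordinates: two dense groups plus scattered events.
        rng = np.random.default_rng(0)
        pts = np.vstack([rng.normal([0, 0], 0.3, (30, 2)),
                         rng.normal([5, 5], 0.3, (30, 2)),
                         rng.uniform(-2, 8, (10, 2))])
        n_clusters, labels = mst_clusters(pts, critical_edge=1.0)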

  4. Cluster detection methods applied to the Upper Cape Cod cancer data

    Directory of Open Access Journals (Sweden)

    Ozonoff David

    2005-09-01

    Full Text Available Abstract Background A variety of statistical methods have been suggested to assess the degree and/or the location of spatial clustering of disease cases. However, there is relatively little in the literature devoted to comparison and critique of different methods. Most of the available comparative studies rely on simulated data rather than real data sets. Methods We have chosen three methods currently used for examining spatial disease patterns: the M-statistic of Bonetti and Pagano; the Generalized Additive Model (GAM) method as applied by Webster; and Kulldorff's spatial scan statistic. We apply these statistics to analyze breast cancer data from the Upper Cape Cancer Incidence Study using three different latency assumptions. Results The three different latency assumptions produced three different spatial patterns of cases and controls. For 20 year latency, all three methods generally concur. However, for 15 year latency and no latency assumptions, the methods produce different results when testing for global clustering. Conclusion The comparative analyses of real data sets by different statistical methods provide insight into directions for further research. We suggest a research program designed around examining real data sets to guide focused investigation of relevant features using simulated data, for the purpose of understanding how to interpret statistical methods applied to epidemiological data with a spatial component.

  5. Comparative Investigation of Guided Fuzzy Clustering and Mean Shift Clustering for Edge Detection in Electrical Resistivity Tomography Images of Mineral Deposits

    Science.gov (United States)

    Ward, Wil; Wilkinson, Paul; Chambers, Jon; Bai, Li

    2014-05-01

    Geophysical surveying using electrical resistivity tomography (ERT) can be used as a rapid non-intrusive method to investigate mineral deposits [1]. One of the key challenges with this approach is to find a robust automated method to assess and characterise deposits on the basis of an ERT image. Recent research applying edge detection techniques has yielded a framework that can successfully locate geological interfaces in ERT images using a minimal-assumption data clustering technique, the guided fuzzy clustering method (gfcm) [2]. Non-parametric clustering techniques are statistically grounded methods of image segmentation that do not require any assumptions about the distribution of the data under investigation. This study is a comparison of two such methods for assessing geological structure based on the resistivity images. In addition to gfcm, a method called mean-shift clustering [3] is investigated, with comparisons directed at accuracy, computational expense, and degree of user interaction. Neither approach requires the number of clusters as input (a commonly required and often impractical parameter); rather, both rest on the similar idea that data can be clustered based on peaks in the probability density function (pdf) of the data. Each local maximum in these functions represents the modal value of a particular population corresponding to a cluster, and the data are assigned based on their relationships to these modal values. The two methods differ in that gfcm approximates the pdf using kernel density estimation and identifies population means, assigning cluster membership probabilities to each resistivity value in the model based on its distance from the distribution averages. In mean-shift clustering, by contrast, the density function is not calculated explicitly; instead, a gradient ascent method iteratively shifts each datum towards high-density regions using weighted kernels to identify locally dense regions. The only parameter needed in both methods

  6. Automatic detection of arterial input function in dynamic contrast enhanced MRI based on affinity propagation clustering.

    Science.gov (United States)

    Shi, Lin; Wang, Defeng; Liu, Wen; Fang, Kui; Wang, Yi-Xiang J; Huang, Wenhua; King, Ann D; Heng, Pheng Ann; Ahuja, Anil T

    2014-05-01

    To automatically and robustly detect the arterial input function (AIF) with high detection accuracy and low computational cost in dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI). In this study, we developed an automatic AIF detection method using an accelerated version (Fast-AP) of affinity propagation (AP) clustering. The validity of this Fast-AP-based method was proved on two DCE-MRI datasets, i.e., rat kidney and human head and neck. The detailed AIF detection performance of this proposed method was assessed in comparison with other clustering-based methods, namely original AP and K-means, as well as the manual AIF detection method. Both the automatic AP- and Fast-AP-based methods achieved satisfactory AIF detection accuracy, but the computational cost of Fast-AP could be reduced by 64.37-92.10% on the rat dataset and 73.18-90.18% on the human dataset compared with the cost of AP. K-means yielded the lowest computational cost, but resulted in the lowest AIF detection accuracy. The experimental results demonstrated that both the AP- and Fast-AP-based methods were insensitive to the initialization of cluster centers, and had superior robustness compared with the K-means method. The Fast-AP-based method enables automatic AIF detection with high accuracy and efficiency. Copyright © 2013 Wiley Periodicals, Inc.
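    As a hedged sketch of the general approach (clustering per-voxel time-intensity curves and selecting an arterial-like cluster), the fragment below uses scikit-learn's AffinityPropagation; the early-and-high-peak scoring rule and all parameter values are illustrative assumptions, not the criteria used in the paper, and Fast-AP itself is not reproduced.

        import numpy as np
        from sklearn.cluster import AffinityPropagation

        def detect_aif(curves):
            """curves: (n_voxels, n_timepoints) contrast-enhancement time courses."""
            ap = AffinityPropagation(damping=0.9, random_state=0).fit(curves)
            labels = ap.labels_
            best, best_score = None, -np.inf
            for c in np.unique(labels):
                mean_curve = curves[labels == c].mean(axis=0)
                # Heuristic: arterial curves tend to peak early and high (assumed rule).
                score = mean_curve.max() - 0.1 * mean_curve.argmax()
                if score > best_score:
                    best, best_score = c, score
            return curves[labels == best].mean(axis=0)     # candidate AIF curve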

  7. Detecting space-time cancer clusters using residential histories

    Science.gov (United States)

    Jacquez, Geoffrey M.; Meliker, Jaymie R.

    2007-04-01

    Methods for analyzing geographic clusters of disease typically ignore the space-time variability inherent in epidemiologic datasets, do not adequately account for known risk factors (e.g., smoking and education) or covariates (e.g., age, gender, and race), and do not permit investigation of the latency window between exposure and disease. Our research group recently developed Q-statistics for evaluating space-time clustering in cancer case-control studies with residential histories. This technique relies on time-dependent nearest neighbor relationships to examine clustering at any moment in the life-course of the residential histories of cases relative to that of controls. In addition, in place of the widely used null hypothesis of spatial randomness, each individual's probability of being a case is instead based on his/her risk factors and covariates. Case-control clusters will be presented using residential histories of 220 bladder cancer cases and 440 controls in Michigan. In preliminary analyses of this dataset, smoking, age, gender, race and education were sufficient to explain the majority of the clustering of residential histories of the cases. Clusters of unexplained risk, however, were identified surrounding the business address histories of 10 industries that emit known or suspected bladder cancer carcinogens. The clustering of 5 of these industries began in the 1970's and persisted through the 1990's. This systematic approach for evaluating space-time clustering has the potential to generate novel hypotheses about environmental risk factors. These methods may be extended to detect differences in space-time patterns of any two groups of people, making them valuable for security intelligence and surveillance operations.

  8. Membership determination of open clusters based on a spectral clustering method

    Science.gov (United States)

    Gao, Xin-Hua

    2018-06-01

    We present a spectral clustering (SC) method aimed at segregating reliable members of open clusters in multi-dimensional space. The SC method is a non-parametric clustering technique that performs cluster division using eigenvectors of the similarity matrix; no prior knowledge of the clusters is required. This method is more flexible in dealing with multi-dimensional data compared to other methods of membership determination. We use this method to segregate the cluster members of five open clusters (Hyades, Coma Ber, Pleiades, Praesepe, and NGC 188) in five-dimensional space; fairly clean cluster members are obtained. We find that the SC method can capture a small number of cluster members (weak signal) from a large number of field stars (heavy noise). Based on these cluster members, we compute the mean proper motions and distances for the Hyades, Coma Ber, Pleiades, and Praesepe clusters, and our results are in general quite consistent with the results derived by other authors. The test results indicate that the SC method is highly suitable for segregating cluster members of open clusters based on high-precision multi-dimensional astrometric data such as Gaia data.
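    The following Python fragment is a minimal sketch of applying spectral clustering to five-dimensional astrometric data with scikit-learn; the choice of two clusters (members versus field stars), the RBF affinity and the compactness rule for picking the member cluster are illustrative assumptions and do not reproduce the author's implementation.

        import numpy as np
        from sklearn.cluster import SpectralClustering
        from sklearn.preprocessing import StandardScaler

        def segregate_members(astrometry, n_clusters=2, gamma=1.0):
            """astrometry: (n_stars, 5) array, e.g. ra, dec, pmra, pmdec, parallax."""
            X = StandardScaler().fit_transform(astrometry)
            sc = SpectralClustering(n_clusters=n_clusters, affinity="rbf",
                                    gamma=gamma, assign_labels="kmeans",
                                    random_state=0)
            labels = sc.fit_predict(X)
            # Assumption: the more compact group in feature space is the open cluster.
            spreads = np.array([X[labels == k].std() for k in range(n_clusters)])
            member_label = int(np.argmin(spreads))
            return labels == member_label     # boolean membership mask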

  9. Unsupervised Learning —A Novel Clustering Method for Rolling Bearing Faults Identification

    Science.gov (United States)

    Kai, Li; Bo, Luo; Tao, Ma; Xuefeng, Yang; Guangming, Wang

    2017-12-01

    To promptly process massive fault data and automatically provide accurate diagnosis results, numerous studies have been conducted on intelligent fault diagnosis of rolling bearings. Among these studies, supervised learning methods such as artificial neural networks, support vector machines and decision trees are commonly used. These methods can detect rolling bearing failures effectively, but achieving better detection results often requires a large number of training samples. Based on the above, a novel clustering method is proposed in this paper. This novel method is able to find the correct number of clusters automatically. The effectiveness of the proposed method is validated using datasets from rolling element bearings. The diagnosis results show that the proposed method can accurately detect the fault types from small samples, and the diagnosis results also remain relatively accurate even for massive samples.

  10. Cluster Detection Tests in Spatial Epidemiology: A Global Indicator for Performance Assessment.

    Directory of Open Access Journals (Sweden)

    Aline Guttmann

    Full Text Available In cluster detection of disease, the use of local cluster detection tests (CDTs) is current. These methods aim both at locating likely clusters and testing for their statistical significance. New or improved CDTs are regularly proposed to epidemiologists and must be subjected to performance assessment. Because location accuracy has to be considered, performance assessment goes beyond the raw estimation of type I or II errors. As no consensus exists for performance evaluations, heterogeneous methods are used, and therefore studies are rarely comparable. A global indicator of performance, which assesses both spatial accuracy and usual power, would facilitate the exploration of CDTs behaviour and help between-studies comparisons. The Tanimoto coefficient (TC) is a well-known measure of similarity that can assess location accuracy but only for one detected cluster. In a simulation study, performance is measured for many tests. From the TC, we here propose two statistics, the averaged TC and the cumulated TC, as indicators able to provide a global overview of CDTs performance for both usual power and location accuracy. We evidence the properties of these two indicators and the superiority of the cumulated TC to assess performance. We tested these indicators to conduct a systematic spatial assessment displayed through performance maps.

  11. Cluster Detection Tests in Spatial Epidemiology: A Global Indicator for Performance Assessment

    Science.gov (United States)

    Guttmann, Aline; Li, Xinran; Feschet, Fabien; Gaudart, Jean; Demongeot, Jacques; Boire, Jean-Yves; Ouchchane, Lemlih

    2015-01-01

    In cluster detection of disease, the use of local cluster detection tests (CDTs) is current. These methods aim both at locating likely clusters and testing for their statistical significance. New or improved CDTs are regularly proposed to epidemiologists and must be subjected to performance assessment. Because location accuracy has to be considered, performance assessment goes beyond the raw estimation of type I or II errors. As no consensus exists for performance evaluations, heterogeneous methods are used, and therefore studies are rarely comparable. A global indicator of performance, which assesses both spatial accuracy and usual power, would facilitate the exploration of CDTs behaviour and help between-studies comparisons. The Tanimoto coefficient (TC) is a well-known measure of similarity that can assess location accuracy but only for one detected cluster. In a simulation study, performance is measured for many tests. From the TC, we here propose two statistics, the averaged TC and the cumulated TC, as indicators able to provide a global overview of CDTs performance for both usual power and location accuracy. We evidence the properties of these two indicators and the superiority of the cumulated TC to assess performance. We tested these indicators to conduct a systematic spatial assessment displayed through performance maps. PMID:26086911

  12. Performance Analysis of Unsupervised Clustering Methods for Brain Tumor Segmentation

    Directory of Open Access Journals (Sweden)

    Tushar H Jaware

    2013-10-01

    Full Text Available Medical image processing is a challenging and emerging field of neuroscience. The ultimate goal of medical image analysis in brain MRI is to extract important clinical features that would improve methods of diagnosis and treatment of disease. This paper focuses on methods to detect and extract brain tumours from brain MR images. MATLAB is used to design a software tool for locating brain tumours, based on unsupervised clustering methods. The K-means clustering algorithm is implemented and tested on a database of 30 images. A performance evaluation of the unsupervised clustering methods is presented.
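    The record describes a MATLAB tool; an equivalent minimal sketch in Python clusters voxel intensities with k-means and keeps the brightest cluster as the tumour candidate. The four-class split and the brightest-cluster rule are assumptions that depend on the MR sequence and pathology.

        import numpy as np
        from sklearn.cluster import KMeans

        def kmeans_segment(mr_slice, n_clusters=4):
            """Cluster voxel intensities of a 2-D MR slice into tissue classes."""
            intensities = mr_slice.reshape(-1, 1).astype(float)
            km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(intensities)
            label_img = km.labels_.reshape(mr_slice.shape)
            # Assumption: the cluster with the highest mean intensity is the
            # tumour/enhancing-region candidate.
            bright = int(np.argmax(km.cluster_centers_.ravel()))
            return label_img, label_img == bright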

  13. An automated three-dimensional detection and segmentation method for touching cells by integrating concave points clustering and random walker algorithm.

    Directory of Open Access Journals (Sweden)

    Yong He

    Full Text Available Characterizing cytoarchitecture is crucial for understanding brain functions and neural diseases. In neuroanatomy, it is an important task to accurately extract cell populations' centroids and contours. Recent advances have permitted imaging at single cell resolution for an entire mouse brain using the Nissl staining method. However, it is difficult to precisely segment numerous cells, especially those cells touching each other. As presented herein, we have developed an automated three-dimensional detection and segmentation method applied to the Nissl staining data, with the following two key steps: (1) concave points clustering to determine the seed points of touching cells; and (2) random walker segmentation to obtain cell contours. Also, we have evaluated the performance of our proposed method with several mouse brain datasets, which were captured with the micro-optical sectioning tomography imaging system, and the datasets include closely touching cells. Compared with traditional detection and segmentation methods, our approach shows promising detection accuracy and high robustness.

  14. Detection of CO emission in Hydra 1 cluster galaxies

    International Nuclear Information System (INIS)

    Huchtmeier, W.K.

    1990-01-01

    A survey of bright Hydra cluster spiral galaxies for the CO(1-0) transition at 115 GHz was performed with the 15m Swedish-ESO submillimeter telescope (SEST). Five out of the 15 galaxies observed were detected in the CO(1-0) line. The largest spiral galaxy in the cluster, NGC 3312, has more CO than any spiral of the Virgo cluster. This Sa-type galaxy is optically largely distorted and disrupted on one side, making it a good candidate for ram-pressure stripping while passing through the cluster's central region. A comparison with the global CO properties of Virgo cluster spirals shows relatively good agreement for the detected Hydra cluster galaxies.

  15. Fast EEG spike detection via eigenvalue analysis and clustering of spatial amplitude distribution

    Science.gov (United States)

    Fukami, Tadanori; Shimada, Takamasa; Ishikawa, Bunnoshin

    2018-06-01

    Objective. In the current study, we tested a proposed method for fast spike detection in electroencephalography (EEG). Approach. We performed eigenvalue analysis in two-dimensional space spanned by gradients calculated from two neighboring samples to detect high-amplitude negative peaks. We extracted the spike candidates by imposing restrictions on parameters regarding spike shape and eigenvalues reflecting detection characteristics of individual medical doctors. We subsequently performed clustering, classifying detected peaks by considering the amplitude distribution at 19 scalp electrodes. Clusters with a small number of candidates were excluded. We then defined a score for eliminating spike candidates for which the pattern of detected electrodes differed from the overall pattern in a cluster. Spikes were detected by setting the score threshold. Main results. Based on visual inspection by a psychiatrist experienced in EEG, we evaluated the proposed method using two statistical measures of precision and recall with respect to detection performance. We found that precision and recall exhibited a trade-off relationship. The average recall value was 0.708 in eight subjects with the score threshold that maximized the F-measure, with 58.6  ±  36.2 spikes per subject. Under this condition, the average precision was 0.390, corresponding to a false positive rate 2.09 times higher than the true positive rate. Analysis of the required processing time revealed that, using a general-purpose computer, our method could be used to perform spike detection in 12.1% of the recording time. The process of narrowing down spike candidates based on shape occupied most of the processing time. Significance. Although the average recall value was comparable with that of other studies, the proposed method significantly shortened the processing time.

  16. Automated detection of microcalcification clusters in mammograms

    Science.gov (United States)

    Karale, Vikrant A.; Mukhopadhyay, Sudipta; Singh, Tulika; Khandelwal, Niranjan; Sadhu, Anup

    2017-03-01

    Mammography is the most efficient modality for detection of breast cancer at an early stage. Microcalcifications are tiny bright spots in mammograms and can often be missed by the radiologist during diagnosis. The presence of microcalcification clusters in mammograms can act as an early sign of breast cancer. This paper presents a completely automated computer-aided detection (CAD) system for detection of microcalcification clusters in mammograms. Unsharp masking is used as a preprocessing step, which enhances the contrast between microcalcifications and the background. The preprocessed image is thresholded and various shape- and intensity-based features are extracted. A support vector machine (SVM) classifier is used to reduce the false positives while preserving the true microcalcification clusters. The proposed technique is applied to two different databases, i.e., DDSM and a private database. The proposed technique shows good sensitivity with moderate false positives (FPs) per image on both databases.
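    The preprocessing step can be sketched as follows in Python (an illustration of unsharp masking in general, with hypothetical parameters; the paper's feature extraction and SVM stage are not shown):

        import numpy as np
        from scipy.ndimage import gaussian_filter

        def unsharp_mask(mammogram, sigma=15.0, amount=1.0):
            """Enhance small bright structures (e.g. microcalcifications) by
            subtracting a blurred background estimate from the image."""
            img = mammogram.astype(float)
            background = gaussian_filter(img, sigma=sigma)
            enhanced = img + amount * (img - background)
            return np.clip(enhanced, 0, None)

        def candidate_mask(enhanced, k=3.0):
            """Simple global threshold at mean + k standard deviations (assumed rule)."""
            return enhanced > enhanced.mean() + k * enhanced.std()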

  17. Unsupervised Video Shot Detection Using Clustering Ensemble with a Color Global Scale-Invariant Feature Transform Descriptor

    Directory of Open Access Journals (Sweden)

    Yuchou Chang

    2008-02-01

    Full Text Available Scale-invariant feature transform (SIFT) transforms a grayscale image into scale-invariant coordinates of local features that are invariant to image scale, rotation, and changing viewpoints. Because of its scale-invariant properties, SIFT has been successfully used for object recognition and content-based image retrieval. The biggest drawback of SIFT is that it uses only grayscale information and misses important visual information regarding color. In this paper, we present the development of a novel color feature extraction algorithm that addresses this problem, and we also propose a new clustering strategy using clustering ensembles for video shot detection. Based on Fibonacci lattice-quantization, we develop a novel color global scale-invariant feature transform (CGSIFT) for better description of color contents in video frames for video shot detection. CGSIFT first quantizes a color image, representing it with a small number of color indices, and then uses SIFT to extract features from the quantized color index image. We also develop a new space description method using small image regions to represent global color features as the second step of CGSIFT. Clustering ensembles focusing on knowledge reuse are then applied to obtain better clustering results than using single clustering methods for video shot detection. Evaluation of the proposed feature extraction algorithm and the new clustering strategy using clustering ensembles reveals very promising results for video shot detection.

  18. Unsupervised Video Shot Detection Using Clustering Ensemble with a Color Global Scale-Invariant Feature Transform Descriptor

    Directory of Open Access Journals (Sweden)

    Hong Yi

    2008-01-01

    Full Text Available Abstract Scale-invariant feature transform (SIFT) transforms a grayscale image into scale-invariant coordinates of local features that are invariant to image scale, rotation, and changing viewpoints. Because of its scale-invariant properties, SIFT has been successfully used for object recognition and content-based image retrieval. The biggest drawback of SIFT is that it uses only grayscale information and misses important visual information regarding color. In this paper, we present the development of a novel color feature extraction algorithm that addresses this problem, and we also propose a new clustering strategy using clustering ensembles for video shot detection. Based on Fibonacci lattice-quantization, we develop a novel color global scale-invariant feature transform (CGSIFT) for better description of color contents in video frames for video shot detection. CGSIFT first quantizes a color image, representing it with a small number of color indices, and then uses SIFT to extract features from the quantized color index image. We also develop a new space description method using small image regions to represent global color features as the second step of CGSIFT. Clustering ensembles focusing on knowledge reuse are then applied to obtain better clustering results than using single clustering methods for video shot detection. Evaluation of the proposed feature extraction algorithm and the new clustering strategy using clustering ensembles reveals very promising results for video shot detection.

  19. Clustering methods for the optimization of atomic cluster structure

    Science.gov (United States)

    Bagattini, Francesco; Schoen, Fabio; Tigli, Luca

    2018-04-01

    In this paper, we propose a revised global optimization method and apply it to large scale cluster conformation problems. In the 1990s, the so-called clustering methods were considered among the most efficient general purpose global optimization techniques; however, their usage has quickly declined in recent years, mainly due to the inherent difficulties of clustering approaches in large dimensional spaces. Inspired by the machine learning literature, we redesigned clustering methods in order to deal with molecular structures in a reduced feature space. Our aim is to show that by suitably choosing a good set of geometrical features coupled with a very efficient descent method, an effective optimization tool is obtained which is capable of finding, with a very high success rate, all known putative optima for medium size clusters without any prior information, both for Lennard-Jones and Morse potentials. The main result is that, beyond being a reliable approach, the proposed method, based on the idea of starting a computationally expensive deep local search only when it seems worth doing so, is capable of saving a huge amount of searches with respect to an analogous algorithm which does not employ a clustering phase. In this paper, we are not claiming the superiority of the proposed method compared to specific, refined, state-of-the-art procedures, but rather indicating a quite straightforward way to save local searches by means of a clustering scheme working in a reduced variable space, which might prove useful when included in many modern methods.

  20. Integration K-Means Clustering Method and Elbow Method For Identification of The Best Customer Profile Cluster

    Science.gov (United States)

    Syakur, M. A.; Khotimah, B. K.; Rochman, E. M. S.; Satoto, B. D.

    2018-04-01

    Clustering is a data mining technique used to analyse data with large volume and variety. Clustering is the process of grouping data into clusters so that each cluster contains data that are as similar as possible to one another and as different as possible from the objects in other clusters. Indonesian SMEs have a variety of customers, but they do not have a mapping of these customers, so they do not know which customers are loyal and which are not. Customer mapping is a grouping of customer profiles that facilitates the analysis and policy of SMEs in the production of goods, especially batik sales. This study uses a combination of the K-means method with the elbow method to improve the efficiency and effectiveness of K-means in processing large amounts of data. K-means clustering is a local optimization method that is sensitive to the selection of the initial cluster centres; choosing bad initial centres leads to high error and poor cluster results. The K-means algorithm also has problems in determining the best number of clusters, so the elbow method is used to find the best number of clusters for K-means. The results show that the elbow method can produce the same best number of clusters K for different amounts of data, and the best number of clusters determined in this way becomes the default for the subsequent characterization process in the case study. Evaluation of the K-means results based on SSE values for the 500 batik-visitor records shows a sharp decrease in SSE at K = 3, so K = 3 is taken as the cut-off point and the best number of clusters.
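    A minimal Python sketch of the K-means-plus-elbow idea: the sum of squared errors (SSE, scikit-learn's inertia_) is computed for a range of K and the elbow is taken where the decrease in SSE slows down most. The second-difference heuristic stands in for the visual inspection used in the paper.

        import numpy as np
        from sklearn.cluster import KMeans

        def elbow_k(X, k_max=10):
            """Return SSE per k and a simple elbow estimate."""
            ks = range(1, k_max + 1)
            sse = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
                   for k in ks]
            # Crude elbow heuristic: k where the decrease in SSE slows down most.
            drops = np.diff(sse)                        # successive (negative) drops
            elbow = int(np.argmax(np.diff(drops))) + 2  # +2 because of two diffs, ks start at 1
            return dict(zip(ks, sse)), elbow

    On data like the batik-customer profiles described above, the reported elbow corresponds to the K at which the SSE curve bends most sharply.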

  1. A flexible spatial scan statistic with a restricted likelihood ratio for detecting disease clusters.

    Science.gov (United States)

    Tango, Toshiro; Takahashi, Kunihiko

    2012-12-30

    Spatial scan statistics are widely used tools for the detection of disease clusters. In particular, the circular spatial scan statistic proposed by Kulldorff (1997) has been utilized in a wide variety of epidemiological studies and disease surveillance. However, as it cannot detect noncircular, irregularly shaped clusters, many authors have proposed different spatial scan statistics, including the elliptic version of Kulldorff's scan statistic. The flexible spatial scan statistic proposed by Tango and Takahashi (2005) has also been used for detecting irregularly shaped clusters. However, this method imposes a practical limit of at most 30 nearest neighbors when searching candidate clusters, because of the heavy computational load. In this paper, we show that a flexible spatial scan statistic implemented with the restricted likelihood ratio proposed by Tango (2008) can (1) eliminate the limitation of 30 nearest neighbors and (2) require surprisingly much less computational time than the original flexible spatial scan statistic. As a side effect, Monte Carlo simulation shows that it is able to detect clusters of any shape reasonably well as the relative risk of the cluster becomes large. We illustrate the proposed spatial scan statistic with data on mortality from cerebrovascular disease in the Tokyo Metropolitan area, Japan. Copyright © 2012 John Wiley & Sons, Ltd.
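    For orientation, the Python fragment below sketches the standard Poisson log-likelihood ratio used by circular scan statistics of the Kulldorff type, scanning circles of increasing radius around each region centroid; it is not the restricted likelihood ratio or the flexible (irregular-shape) search of the paper, and significance would in practice be assessed by Monte Carlo replication, which is omitted here.

        import numpy as np

        def poisson_llr(c, e, C):
            """Log-likelihood ratio for a candidate high-rate cluster:
            c observed cases, e expected cases, C total cases in the study area."""
            if c <= e or e <= 0.0:
                return 0.0
            inside = c * np.log(c / e)
            outside = 0.0 if c >= C else (C - c) * np.log((C - c) / (C - e))
            return inside + outside

        def circular_scan(coords, cases, expected, max_radius):
            """Scan circles of growing radius around every region; return the best window."""
            C = cases.sum()
            best_llr, best_members = 0.0, []
            for centre in coords:
                d = np.linalg.norm(coords - centre, axis=1)
                c = e = 0.0
                members = []
                for j in np.argsort(d):
                    if d[j] > max_radius:
                        break
                    c += cases[j]
                    e += expected[j]
                    members.append(int(j))
                    llr = poisson_llr(c, e, C)
                    if llr > best_llr:
                        best_llr, best_members = llr, list(members)
            return best_llr, best_members   # p-value would come from Monte Carlo replicates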

  2. A Distributed Algorithm for the Cluster-Based Outlier Detection Using Unsupervised Extreme Learning Machines

    Directory of Open Access Journals (Sweden)

    Xite Wang

    2017-01-01

    Full Text Available Outlier detection is an important data mining task, whose target is to find the abnormal or atypical objects in a given dataset. The techniques for detecting outliers have many applications, such as credit card fraud detection and environment monitoring. Our previous work proposed the Cluster-Based (CB) outlier and gave a centralized method using unsupervised extreme learning machines to compute CB outliers. In this paper, we propose a new distributed algorithm for CB outlier detection (DACB). On the master node, we collect a small number of points from the slave nodes to obtain a threshold. On each slave node, we design a new filtering method that can use the threshold to efficiently speed up the computation. Furthermore, we also propose a ranking method to optimize the order of cluster scanning. Finally, the effectiveness and efficiency of the proposed approaches are verified through extensive simulation experiments.

  3. A Test for Cluster Bias: Detecting Violations of Measurement Invariance across Clusters in Multilevel Data

    Science.gov (United States)

    Jak, Suzanne; Oort, Frans J.; Dolan, Conor V.

    2013-01-01

    We present a test for cluster bias, which can be used to detect violations of measurement invariance across clusters in 2-level data. We show how measurement invariance assumptions across clusters imply measurement invariance across levels in a 2-level factor model. Cluster bias is investigated by testing whether the within-level factor loadings…

  4. Detecting and extracting clusters in atom probe data: A simple, automated method using Voronoi cells

    International Nuclear Information System (INIS)

    Felfer, P.; Ceguerra, A.V.; Ringer, S.P.; Cairney, J.M.

    2015-01-01

    The analysis of the formation of clusters in solid solutions is one of the most common uses of atom probe tomography. Here, we present a method where we use the Voronoi tessellation of the solute atoms and its geometric dual, the Delaunay triangulation to test for spatial/chemical randomness of the solid solution as well as extracting the clusters themselves. We show how the parameters necessary for cluster extraction can be determined automatically, i.e. without user interaction, making it an ideal tool for the screening of datasets and the pre-filtering of structures for other spatial analysis techniques. Since the Voronoi volumes are closely related to atomic concentrations, the parameters resulting from this analysis can also be used for other concentration based methods such as iso-surfaces. - Highlights: • Cluster analysis of atom probe data can be significantly simplified by using the Voronoi cell volumes of the atomic distribution. • Concentration fields are defined on a single atomic basis using Voronoi cells. • All parameters for the analysis are determined by optimizing the separation probability of bulk atoms vs clustered atoms

  5. A method of clustering observers with different visual characteristics

    Energy Technology Data Exchange (ETDEWEB)

    Niimi, Takanaga [Nagoya University School of Health Sciences, Department of Radiological Technology, 1-1-20 Daiko-minami, Higashi-ku, Nagoya 461-8673 (Japan); Imai, Kuniharu [Nagoya University School of Health Sciences, Department of Radiological Technology, 1-1-20 Daiko-minami, Higashi-ku, Nagoya 461-8673 (Japan); Ikeda, Mitsuru [Nagoya University School of Health Sciences, Department of Radiological Technology, 1-1-20 Daiko-minami, Higashi-ku, Nagoya 461-8673 (Japan); Maeda, Hisatoshi [Nagoya University School of Health Sciences, Department of Radiological Technology, 1-1-20 Daiko-minami, Higashi-ku, Nagoya 461-8673 (Japan)

    2006-01-15

    Evaluation of observers' image perception in medical images is important, yet it has not been performed because visual characteristics are difficult to quantify. In the present study, we investigated the observers' image perception by clustering a group of 20 observers. Images of a contrast-detail (C-D) phantom, which had cylinders of 10 rows and 10 columns with different diameters and lengths, were acquired with an X-ray screen-film system with fixed exposure conditions. A group of 10 films were prepared for visual evaluations. Sixteen radiological technicians, three radiologists and one medical physicist participated in the observation test. All observers read the phantom radiographs on a transillumination image viewer with room lights off. The detectability was defined as the shortest cylinder length whose border the observers could recognize against the background, and was recorded using the number of columns. The detectability was calculated as the average of 10 readings for each observer, and plotted for different phantom diameters. The unweighted pair-group method using arithmetic averages (UPGMA) was adopted for clustering. The observers were clustered into two groups: one group selected objects with a demarcation from the vicinity, and the other group searched for the objects with their eyes constrained. This study showed the usefulness of the clustering method for selecting personnel with a similar perceptual predisposition when a C-D phantom is used in image quality control.

  6. A method of clustering observers with different visual characteristics

    International Nuclear Information System (INIS)

    Niimi, Takanaga; Imai, Kuniharu; Ikeda, Mitsuru; Maeda, Hisatoshi

    2006-01-01

    Evaluation of observers' image perception in medical images is important, yet it has not been performed because visual characteristics are difficult to quantify. In the present study, we investigated the observers' image perception by clustering a group of 20 observers. Images of a contrast-detail (C-D) phantom, which had cylinders of 10 rows and 10 columns with different diameters and lengths, were acquired with an X-ray screen-film system with fixed exposure conditions. A group of 10 films were prepared for visual evaluations. Sixteen radiological technicians, three radiologists and one medical physicist participated in the observation test. All observers read the phantom radiographs on a transillumination image viewer with room lights off. The detectability was defined as the shortest cylinder length whose border the observers could recognize against the background, and was recorded using the number of columns. The detectability was calculated as the average of 10 readings for each observer, and plotted for different phantom diameters. The unweighted pair-group method using arithmetic averages (UPGMA) was adopted for clustering. The observers were clustered into two groups: one group selected objects with a demarcation from the vicinity, and the other group searched for the objects with their eyes constrained. This study showed the usefulness of the clustering method for selecting personnel with a similar perceptual predisposition when a C-D phantom is used in image quality control

  7. Clustering Scientific Publications Based on Citation Relations: A Systematic Comparison of Different Methods.

    Science.gov (United States)

    Šubelj, Lovro; van Eck, Nees Jan; Waltman, Ludo

    2016-01-01

    Clustering methods are applied regularly in the bibliometric literature to identify research areas or scientific fields. These methods are for instance used to group publications into clusters based on their relations in a citation network. In the network science literature, many clustering methods, often referred to as graph partitioning or community detection techniques, have been developed. Focusing on the problem of clustering the publications in a citation network, we present a systematic comparison of the performance of a large number of these clustering methods. Using a number of different citation networks, some of them relatively small and others very large, we extensively study the statistical properties of the results provided by different methods. In addition, we also carry out an expert-based assessment of the results produced by different methods. The expert-based assessment focuses on publications in the field of scientometrics. Our findings seem to indicate that there is a trade-off between different properties that may be considered desirable for a good clustering of publications. Overall, map equation methods appear to perform best in our analysis, suggesting that these methods deserve more attention from the bibliometric community.

  8. Clustering Scientific Publications Based on Citation Relations: A Systematic Comparison of Different Methods

    Science.gov (United States)

    Šubelj, Lovro; van Eck, Nees Jan; Waltman, Ludo

    2016-01-01

    Clustering methods are applied regularly in the bibliometric literature to identify research areas or scientific fields. These methods are for instance used to group publications into clusters based on their relations in a citation network. In the network science literature, many clustering methods, often referred to as graph partitioning or community detection techniques, have been developed. Focusing on the problem of clustering the publications in a citation network, we present a systematic comparison of the performance of a large number of these clustering methods. Using a number of different citation networks, some of them relatively small and others very large, we extensively study the statistical properties of the results provided by different methods. In addition, we also carry out an expert-based assessment of the results produced by different methods. The expert-based assessment focuses on publications in the field of scientometrics. Our findings seem to indicate that there is a trade-off between different properties that may be considered desirable for a good clustering of publications. Overall, map equation methods appear to perform best in our analysis, suggesting that these methods deserve more attention from the bibliometric community. PMID:27124610

  9. Automatic detection of multiple UXO-like targets using magnetic anomaly inversion and self-adaptive fuzzy c-means clustering

    Science.gov (United States)

    Yin, Gang; Zhang, Yingtang; Fan, Hongbo; Ren, Guoquan; Li, Zhining

    2017-12-01

    We have developed a method for automatically detecting UXO-like targets based on magnetic anomaly inversion and self-adaptive fuzzy c-means clustering. Magnetic anomaly inversion methods are used to estimate the initial locations of multiple UXO-like sources. Although these initial locations have some errors with respect to the real positions, they form dense clouds around the actual positions of the magnetic sources. We then use the self-adaptive fuzzy c-means clustering algorithm to cluster these initial locations. The estimated number of cluster centroids represents the number of targets, and the cluster centroids are regarded as the locations of the magnetic targets. The effectiveness of the method has been demonstrated using synthetic datasets. Computational results show that the proposed method can be applied to the case of several UXO-like targets randomly scattered within a confined, shallow subsurface volume. A field test was carried out to check the validity of the proposed method, and the experimental results show that the prearranged magnets can be detected unambiguously and located precisely.
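    A plain NumPy sketch of standard fuzzy c-means is given below; it is not the self-adaptive variant of the paper (which also selects the number of clusters automatically), but it illustrates how scattered location estimates from the magnetic inversions could be grouped, with the resulting centroids taken as target locations. The cluster count c is assumed known here, and the data are synthetic.

        import numpy as np

        def fuzzy_cmeans(X, c, m=2.0, n_iter=100, tol=1e-5, seed=0):
            """Standard fuzzy c-means: returns (centroids, membership matrix U)."""
            rng = np.random.default_rng(seed)
            U = rng.random((len(X), c))
            U /= U.sum(axis=1, keepdims=True)
            for _ in range(n_iter):
                W = U ** m
                centroids = (W.T @ X) / W.sum(axis=0)[:, None]
                d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
                d = np.fmax(d, 1e-12)
                # u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
                ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
                U_new = 1.0 / ratio.sum(axis=2)
                if np.abs(U_new - U).max() < tol:
                    U = U_new
                    break
                U = U_new
            return centroids, U

        # Hypothetical inversion output: noisy location estimates around two buried targets.
        rng = np.random.default_rng(2)
        estimates = np.vstack([rng.normal([1.0, 2.0, 0.5], 0.1, (40, 3)),
                               rng.normal([4.0, 1.0, 0.8], 0.1, (40, 3))])
        centroids, U = fuzzy_cmeans(estimates, c=2)
        hard_labels = U.argmax(axis=1)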

  10. Automatic video shot boundary detection using k-means clustering and improved adaptive dual threshold comparison

    Science.gov (United States)

    Sa, Qila; Wang, Zhihui

    2018-03-01

    At present, content-based video retrieval (CBVR) is the mainstream video retrieval method, using the video's own features to perform automatic identification and retrieval. This method involves a key technology, i.e. shot segmentation. In this paper, a method for automatic video shot boundary detection with K-means clustering and improved adaptive dual-threshold comparison is proposed. First, the visual features of every frame are extracted and divided into two categories using the K-means clustering algorithm, namely, frames with significant change and frames with no significant change. Then, based on the classification results, the improved adaptive dual-threshold comparison method is used to determine the abrupt as well as the gradual shot boundaries. Finally, an automatic video shot boundary detection system is achieved.
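    A minimal Python sketch of the first stage, assuming grayscale frames and a simple histogram-difference feature: k-means with two clusters separates "significant change" transitions from the rest, after which the paper's adaptive dual-threshold comparison (not reproduced here) would distinguish abrupt from gradual boundaries. Feature choice and parameters are assumptions.

        import numpy as np
        from sklearn.cluster import KMeans

        def frame_diff_features(frames, bins=32):
            """frames: (n, H, W) grayscale video; returns per-transition histogram distance."""
            hists = np.stack([np.histogram(f, bins=bins, range=(0, 256), density=True)[0]
                              for f in frames])
            return np.abs(np.diff(hists, axis=0)).sum(axis=1)     # shape (n - 1,)

        def candidate_boundaries(diffs):
            """Split transitions into 'significant change' vs 'no change' with k-means."""
            km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(diffs.reshape(-1, 1))
            change_label = int(np.argmax(km.cluster_centers_.ravel()))
            return np.flatnonzero(km.labels_ == change_label)      # indices of cut candidates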

  11. Efficient nonparametric and asymptotic Bayesian model selection methods for attributed graph clustering

    KAUST Repository

    Xu, Zhiqiang

    2017-02-16

    Attributed graph clustering, also known as community detection on attributed graphs, has attracted much interest recently due to the ubiquity of attributed graphs in real life. Many existing algorithms have been proposed for this problem, which are either distance based or model based. However, model selection in attributed graph clustering has not been well addressed; that is, most existing algorithms assume the cluster number to be known a priori. In this paper, we propose two efficient approaches for attributed graph clustering with automatic model selection. The first approach is a popular Bayesian nonparametric method, while the second approach is an asymptotic method based on a recently proposed model selection criterion, the factorized information criterion. Experimental results on both synthetic and real datasets demonstrate that our approaches for attributed graph clustering with automatic model selection significantly outperform the state-of-the-art algorithm.

  12. Efficient nonparametric and asymptotic Bayesian model selection methods for attributed graph clustering

    KAUST Repository

    Xu, Zhiqiang; Cheng, James; Xiao, Xiaokui; Fujimaki, Ryohei; Muraoka, Yusuke

    2017-01-01

    Attributed graph clustering, also known as community detection on attributed graphs, has attracted much interest recently due to the ubiquity of attributed graphs in real life. Many existing algorithms have been proposed for this problem, which are either distance based or model based. However, model selection in attributed graph clustering has not been well addressed; that is, most existing algorithms assume the cluster number to be known a priori. In this paper, we propose two efficient approaches for attributed graph clustering with automatic model selection. The first approach is a popular Bayesian nonparametric method, while the second approach is an asymptotic method based on a recently proposed model selection criterion, the factorized information criterion. Experimental results on both synthetic and real datasets demonstrate that our approaches for attributed graph clustering with automatic model selection significantly outperform the state-of-the-art algorithm.

  13. Regions of micro-calcifications clusters detection based on new features from imbalance data in mammograms

    Science.gov (United States)

    Wang, Keju; Dong, Min; Yang, Zhen; Guo, Yanan; Ma, Yide

    2017-02-01

    Breast cancer is the most common cancer among women. Micro-calcification clusters on X-ray mammograms are one of the most important abnormalities, and their detection is effective for early cancer diagnosis. The Surrounding Region Dependence Method (SRDM), a statistical texture analysis method, is applied for detecting Regions of Interest (ROIs) containing microcalcifications. Inspired by the SRDM, we present a method that extracts gray-level and other features which are effective for predicting the positive and negative regions of micro-calcification clusters in mammograms. By constructing a set of artificial images containing only micro-calcifications, we locate the suspicious calcification pixels of an SRDM matrix in the original image map. Features are extracted based on these pixels for the imbalanced data, and then the repeated random subsampling method and a Random Forest (RF) classifier are used for classification. The True Positive (TP) rate and False Positive (FP) rate reflect the quality of the result: the TP rate is 90% and the FP rate is 88.8% when the threshold q is 10. We draw the Receiver Operating Characteristic (ROC) curve, and the Area Under the ROC Curve (AUC) value reaches 0.9224. The experiment indicates that our method is effective. A novel method for detecting regions of micro-calcification clusters, based on new features for imbalanced data in mammography, is developed; it can be considered to help improve the accuracy of computer-aided diagnosis of breast cancer.

  14. A Novel Automatic Detection System for ECG Arrhythmias Using Maximum Margin Clustering with Immune Evolutionary Algorithm

    Directory of Open Access Journals (Sweden)

    Bohui Zhu

    2013-01-01

    Full Text Available This paper presents a novel maximum margin clustering method with immune evolution (IEMMC) for automatic diagnosis of electrocardiogram (ECG) arrhythmias. This diagnostic system consists of signal processing, feature extraction, and the IEMMC algorithm for clustering of ECG arrhythmias. First, the raw ECG signal is processed by an adaptive ECG filter based on wavelet transforms, and the waveform of the ECG signal is detected; then, features are extracted from the ECG signal to cluster different types of arrhythmias using the IEMMC algorithm. Three types of performance evaluation indicators are used to assess the effect of the IEMMC method for ECG arrhythmias: sensitivity, specificity, and accuracy. Compared with the K-means and iterSVR algorithms, the IEMMC algorithm shows better performance not only in clustering results but also in global search ability and convergence, which proves its effectiveness for the detection of ECG arrhythmias.

  15. Computerized detection method for asymptomatic white matter lesions in brain screening MR images using a clustering technique

    International Nuclear Information System (INIS)

    Kunieda, Takuya; Uchiyama, Yoshikazu; Hara, Takeshi

    2008-01-01

    Asymptomatic white matter lesions are frequently identified by the screening system known as Brain Dock, which is intended for the detection of asymptomatic brain diseases. The detection of asymptomatic white matter lesions is important because their presence is associated with an increased risk of stroke. Therefore, we have developed a computerized method for the detection of asymptomatic white matter lesions in order to assist radiologists in image interpretation as a "second opinion". Our database consisted of T1- and T2-weighted images obtained from 73 patients. The locations of the white matter lesions were determined by an experienced neuroradiologist. In order to restrict the area to be searched for white matter lesions, we first segmented the cerebral region in T1-weighted images by applying thresholding and region-growing techniques. To identify the initial candidate lesions, k-means clustering with pixel values in T1- and T2-weighted images was applied to the segmented cerebral region. To eliminate false positives (FPs), we determined the features, such as location, size, and circularity, of each of the initial candidate lesions. Finally, a rule-based scheme and a quadratic discriminant analysis with these features were employed to distinguish between white matter lesions and FPs. The results showed that the sensitivity for the detection of white matter lesions was 93.2%, with 4.3 FPs per image, suggesting that our computerized method may be useful for the detection of asymptomatic white matter lesions in T1- and T2-weighted images. (author)

  16. A comparison of three clustering methods for finding subgroups in MRI, SMS or clinical data: SPSS TwoStep Cluster analysis, Latent Gold and SNOB.

    Science.gov (United States)

    Kent, Peter; Jensen, Rikke K; Kongsted, Alice

    2014-10-02

    There are various methodological approaches to identifying clinically important subgroups and one method is to identify clusters of characteristics that differentiate people in cross-sectional and/or longitudinal data using Cluster Analysis (CA) or Latent Class Analysis (LCA). There is a scarcity of head-to-head comparisons that can inform the choice of which clustering method might be suitable for particular clinical datasets and research questions. Therefore, the aim of this study was to perform a head-to-head comparison of three commonly available methods (SPSS TwoStep CA, Latent Gold LCA and SNOB LCA). The performance of these three methods was compared: (i) quantitatively using the number of subgroups detected, the classification probability of individuals into subgroups, the reproducibility of results, and (ii) qualitatively using subjective judgments about each program's ease of use and interpretability of the presentation of results. We analysed five real datasets of varying complexity in a secondary analysis of data from other research projects. Three datasets contained only MRI findings (n = 2,060 to 20,810 vertebral disc levels), one dataset contained only pain intensity data collected for 52 weeks by text (SMS) messaging (n = 1,121 people), and the last dataset contained a range of clinical variables measured in low back pain patients (n = 543 people). Four artificial datasets (n = 1,000 each) containing subgroups of varying complexity were also analysed testing the ability of these clustering methods to detect subgroups and correctly classify individuals when subgroup membership was known. The results from the real clinical datasets indicated that the number of subgroups detected varied, the certainty of classifying individuals into those subgroups varied, the findings had perfect reproducibility, some programs were easier to use and the interpretability of the presentation of their findings also varied. The results from the artificial datasets

  17. Distribution-based fuzzy clustering of electrical resistivity tomography images for interface detection

    Science.gov (United States)

    Ward, W. O. C.; Wilkinson, P. B.; Chambers, J. E.; Oxby, L. S.; Bai, L.

    2014-04-01

    A novel method for the effective identification of bedrock subsurface elevation from electrical resistivity tomography images is described. Identifying subsurface boundaries in the topographic data can be difficult due to smoothness constraints used in inversion, so a statistical population-based approach is used that extends previous work in calculating isoresistivity surfaces. The analysis framework involves a procedure for guiding a clustering approach based on the fuzzy c-means algorithm. An approximation of the resistivity distributions, found using kernel density estimation, was utilized as a means of guiding the cluster centroids used to classify data. A fuzzy method was chosen over hard clustering due to uncertainty in hard edges in the topography data, and a measure of clustering uncertainty was identified based on the reciprocal of cluster membership. The algorithm was validated using a direct comparison of known observed bedrock depths at two 3-D survey sites, using real-time GPS information of bedrock exposed by quarrying on one site, and borehole logs at the other. Results show detection accuracy similar to that of a leading isosurface estimation method, and the proposed algorithm requires significantly less user input and prior site knowledge. Furthermore, the method is effectively dimension-independent and will scale to data of increased spatial dimensions without a significant effect on the runtime. A discussion of the results of automated versus supervised analysis is also presented.
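    The density-peak idea behind this kind of guided clustering can be illustrated with a few lines of Python: a kernel density estimate of the (log-)resistivity values is computed, its local maxima are used as guiding prototypes, and fuzzy memberships are assigned by distance to those prototypes. This is only a sketch of the guiding step under those assumptions, not the published gfcm algorithm.

        import numpy as np
        from scipy.stats import gaussian_kde
        from scipy.signal import find_peaks

        def density_peak_prototypes(log_res, grid_size=512):
            """Local maxima of the estimated pdf of log-resistivity values."""
            kde = gaussian_kde(log_res)
            grid = np.linspace(log_res.min(), log_res.max(), grid_size)
            pdf = kde(grid)
            peaks, _ = find_peaks(pdf)
            return grid[peaks]

        def fuzzy_membership(log_res, prototypes, m=2.0):
            """Fuzzy membership of every model cell to every density-peak prototype."""
            d = np.fmax(np.abs(log_res[:, None] - prototypes[None, :]), 1e-12)
            ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
            return 1.0 / ratio.sum(axis=2)

    Cells whose largest membership is small would then carry high clustering uncertainty, in the spirit of the reciprocal-membership measure mentioned in the abstract.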

  18. Prediction, Detection, and Validation of Isotope Clusters in Mass Spectrometry Data

    Directory of Open Access Journals (Sweden)

    Hendrik Treutler

    2016-10-01

    Full Text Available Mass spectrometry is a key analytical platform for metabolomics. The precise quantification and identification of small molecules is a prerequisite for elucidating metabolism, and the detection, validation, and evaluation of isotope clusters in LC-MS data are important for this task. Here, we present an approach for the improved detection of isotope clusters using chemical prior knowledge and the validation of detected isotope clusters depending on the substance mass using database statistics. We find remarkable improvements regarding the number of detected isotope clusters and are able to predict the correct molecular formula in the top three ranks in 92% of the cases. We make our methodology freely available as part of the Bioconductor packages xcms version 1.50.0 and CAMERA version 1.30.0.

  19. Agglomerative concentric hypersphere clustering applied to structural damage detection

    Science.gov (United States)

    Silva, Moisés; Santos, Adam; Santos, Reginaldo; Figueiredo, Eloi; Sales, Claudomiro; Costa, João C. W. A.

    2017-08-01

    The present paper proposes a novel cluster-based method, named as agglomerative concentric hypersphere (ACH), to detect structural damage in engineering structures. Continuous structural monitoring systems often require unsupervised approaches to automatically infer the health condition of a structure. However, when a structure is under linear and nonlinear effects caused by environmental and operational variability, data normalization procedures are also required to overcome these effects. The proposed approach aims, through a straightforward clustering procedure, to discover automatically the optimal number of clusters, representing the main state conditions of a structural system. Three initialization procedures are introduced to evaluate the impact of deterministic and stochastic initializations on the performance of this approach. The ACH is compared to state-of-the-art approaches, based on Gaussian mixture models and Mahalanobis squared distance, on standard data sets from a post-tensioned bridge located in Switzerland: the Z-24 Bridge. The proposed approach demonstrates more efficiency in modeling the normal condition of the structure and its corresponding main clusters. Furthermore, it reveals a better classification performance than the alternative ones in terms of false-positive and false-negative indications of damage, demonstrating a promising applicability in real-world structural health monitoring scenarios.

  20. Comprehensive cluster analysis with Transitivity Clustering.

    Science.gov (United States)

    Wittkop, Tobias; Emig, Dorothea; Truss, Anke; Albrecht, Mario; Böcker, Sebastian; Baumbach, Jan

    2011-03-01

    Transitivity Clustering is a method for the partitioning of biological data into groups of similar objects, such as genes, for instance. It provides integrated access to various functions addressing each step of a typical cluster analysis. To facilitate this, Transitivity Clustering is accessible online and offers three user-friendly interfaces: a powerful stand-alone version, a web interface, and a collection of Cytoscape plug-ins. In this paper, we describe three major workflows: (i) protein (super)family detection with Cytoscape, (ii) protein homology detection with incomplete gold standards and (iii) clustering of gene expression data. This protocol guides the user through the most important features of Transitivity Clustering and takes ∼1 h to complete.

  1. A Cluster-Based Fuzzy Fusion Algorithm for Event Detection in Heterogeneous Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    ZiQi Hao

    2015-01-01

    Full Text Available As limited energy is one of the tough challenges in wireless sensor networks (WSN), energy saving becomes important for increasing the lifecycle of the network. Data fusion enables information from several sources to be combined into a unified scenario, which can significantly save sensor energy and enhance sensing accuracy. In this paper, we propose a cluster-based data fusion algorithm for event detection. We use the k-means algorithm to form the nodes into clusters, which can significantly reduce the energy consumption of intracluster communication. The distances between cluster heads and the event, and the energy of the clusters, are fuzzified so that fuzzy logic can be used to select the clusters that will participate in data uploading and fusion. The fuzzy logic method is also used by cluster heads for local decisions, and the local decision results are then sent to the base station. Decision-level fusion for the final event decision is performed by the base station according to the uploaded local decisions and the fusion support degrees of the clusters calculated by the fuzzy logic method. The effectiveness of this algorithm is demonstrated by simulation results.

  2. A scan statistic for binary outcome based on hypergeometric probability model, with an application to detecting spatial clusters of Japanese encephalitis.

    Science.gov (United States)

    Zhao, Xing; Zhou, Xiao-Hua; Feng, Zijian; Guo, Pengfei; He, Hongyan; Zhang, Tao; Duan, Lei; Li, Xiaosong

    2013-01-01

    As a useful tool for geographical cluster detection of events, the spatial scan statistic is widely applied in many fields and plays an increasingly important role. The classic version of the spatial scan statistic for binary outcomes was developed by Kulldorff, based on the Bernoulli or the Poisson probability model. In this paper, we apply the hypergeometric probability model to construct the likelihood function under the null hypothesis. Compared with existing methods, the likelihood function under the null hypothesis offers an alternative, indirect way to identify the potential cluster, and the test statistic is the extreme value of the likelihood function. As in Kulldorff's methods, we adopt a Monte Carlo test for significance. Both methods are applied to detecting spatial clusters of Japanese encephalitis in Sichuan province, China, in 2009, and the detected clusters are identical. A simulation on independent benchmark data indicates that the test statistic based on the hypergeometric model outperforms Kulldorff's statistics for clusters of high population density or large size; otherwise Kulldorff's statistics are superior.
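    A simplified sketch in the spirit of the method: circular windows are formed from the nearest neighbours of each individual, each window is scored with a hypergeometric tail probability, and the most extreme window is referred to a Monte Carlo null obtained by permuting case labels. The window sizes, the synthetic point-level data and the window construction are assumptions, not the authors' exact formulation.

```python
"""Sketch: a simplified binary-outcome spatial scan scored with a
hypergeometric tail probability and tested by Monte Carlo permutation.
Point-level synthetic data and fixed window sizes are assumptions."""
import numpy as np
from scipy.stats import hypergeom
from scipy.spatial import cKDTree

rng = np.random.default_rng(1)
M = 600                                    # individuals
xy = rng.uniform(0, 10, size=(M, 2))       # locations
cases = rng.random(M) < 0.05               # background risk
hot = np.linalg.norm(xy - [7, 7], axis=1) < 1.0
cases |= hot & (rng.random(M) < 0.30)      # planted hot spot

tree = cKDTree(xy)
window_sizes = [15, 30, 60]                # candidate window populations

def scan_stat(case_mask):
    """Most extreme (smallest) hypergeometric tail probability over all
    circular windows formed by the k nearest neighbours of each point."""
    n_cases = int(case_mask.sum())
    best = 1.0
    for N in window_sizes:
        _, idx = tree.query(xy, k=N)
        k_in = case_mask[idx].sum(axis=1)
        p = hypergeom.sf(k_in - 1, M, n_cases, N)   # P(X >= k_in)
        best = min(best, float(p.min()))
    return best

obs = scan_stat(cases)
null = [scan_stat(rng.permutation(cases)) for _ in range(199)]
p_value = (1 + sum(s <= obs for s in null)) / 200
print(f"observed statistic {obs:.2e}, Monte Carlo p = {p_value:.3f}")
```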

  3. Detection of secondary structure elements in proteins by hydrophobic cluster analysis.

    Science.gov (United States)

    Woodcock, S; Mornon, J P; Henrissat, B

    1992-10-01

    Hydrophobic cluster analysis (HCA) is a protein sequence comparison method based on alpha-helical representations of the sequences where the size, shape and orientation of the clusters of hydrophobic residues are primarily compared. The effectiveness of HCA has been suggested to originate from its potential ability to focus on the residues forming the hydrophobic core of globular proteins. We have addressed the robustness of the bidimensional representation used for HCA in its ability to detect the regular secondary structure elements of proteins. Various parameters have been studied such as those governing cluster size and limits, the hydrophobic residues constituting the clusters as well as the potential shift of the cluster positions with respect to the position of the regular secondary structure elements. The following results have been found to support the alpha-helical bidimensional representation used in HCA: (i) there is a positive correlation (clearly above background noise) between the hydrophobic clusters and the regular secondary structure elements in proteins; (ii) the hydrophobic clusters are centred on the regular secondary structure elements; (iii) the pitch of the helical representation which gives the best correspondence is that of an alpha-helix. The correspondence between hydrophobic clusters and regular secondary structure elements suggests a way to implement variable gap penalties during the automatic alignment of protein sequences.

  4. Statistical method on nonrandom clustering with application to somatic mutations in cancer

    Directory of Open Access Journals (Sweden)

    Rejto Paul A

    2010-01-01

    Full Text Available Abstract Background Human cancer is caused by the accumulation of tumor-specific mutations in oncogenes and tumor suppressors that confer a selective growth advantage to cells. As a consequence of genomic instability and high levels of proliferation, many passenger mutations that do not contribute to the cancer phenotype arise alongside mutations that drive oncogenesis. While several approaches have been developed to separate driver mutations from passengers, few approaches can specifically identify activating driver mutations in oncogenes, which are more amenable for pharmacological intervention. Results We propose a new statistical method for detecting activating mutations in cancer by identifying nonrandom clusters of amino acid mutations in protein sequences. A probability model is derived using order statistics assuming that the location of amino acid mutations on a protein follows a uniform distribution. Our statistical measure is the difference between pairwise order statistics, which is equivalent to the size of an amino acid mutation cluster, and the probabilities are derived from exact and approximate distributions of the statistical measure. Using data in the Catalog of Somatic Mutations in Cancer (COSMIC) database, we have demonstrated that our method detects well-known clusters of activating mutations in KRAS, BRAF, PI3K, and β-catenin. The method can also identify new cancer targets as well as gain-of-function mutations in tumor suppressors. Conclusions Our proposed method is useful to discover activating driver mutations in cancer by identifying nonrandom clusters of somatic amino acid mutations in protein sequences.
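    The order-statistics argument can be made concrete with a short calculation: for n mutation positions uniform on a protein, the span between the i-th and j-th order statistics follows a Beta(j − i, n − j + i + 1) distribution, so the probability of an unusually tight window of k mutations can be read from a Beta CDF. The positions below are invented, and no correction for scanning over all window choices is applied.

```python
"""Sketch: probability of an observed amino-acid mutation cluster under
a uniform-position null, via the Beta distribution of differences
between order statistics. Positions are invented; no correction for
scanning over all window choices is applied."""
import numpy as np
from scipy.stats import beta

protein_length = 500
positions = np.array([12, 13, 13, 14, 61, 250, 251, 252, 253, 254, 400])
n = len(positions)

# Tightest window containing k mutations (difference of order statistics).
k = 5
order = np.sort(positions) / protein_length            # rescale to [0, 1]
spans = order[k - 1:] - order[:n - k + 1]
w = spans.min()

# For uniform positions, X_(j) - X_(i) ~ Beta(j - i, n - j + i + 1);
# for k consecutive order statistics, j - i = k - 1.
p = beta.cdf(w, k - 1, n - k + 2)
print(f"tightest window holding {k} mutations spans {w:.3f} of the protein")
print(f"probability of so tight a cluster under uniformity: {p:.2e}")
```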

  5. A Multidimensional and Multimembership Clustering Method for Social Networks and Its Application in Customer Relationship Management

    Directory of Open Access Journals (Sweden)

    Peixin Zhao

    2013-01-01

    Full Text Available Community detection in social networks plays an important role in cluster analysis. Many traditional techniques for one-dimensional problems have been proven inadequate for high-dimensional or mixed-type datasets due to data sparseness and attribute redundancy. In this paper we propose a graph-based clustering method for multidimensional datasets. This novel method has two distinguishing features: a nonbinary hierarchical tree and multimembership clusters. The nonbinary hierarchical tree clearly highlights meaningful clusters, while the multimembership feature may provide more useful service strategies. Experimental results on customer relationship management data confirm the effectiveness of the new method.

  6. Medical Imaging Lesion Detection Based on Unified Gravitational Fuzzy Clustering

    Directory of Open Access Journals (Sweden)

    Jean Marie Vianney Kinani

    2017-01-01

    Full Text Available We develop a swift, robust, and practical tool for detecting brain lesions with minimal user intervention to assist clinicians and researchers in the diagnosis process, radiosurgery planning, and assessment of the patient’s response to the therapy. We propose a unified gravitational fuzzy clustering-based segmentation algorithm, which integrates the Newtonian concept of gravity into fuzzy clustering. We first perform fuzzy rule-based image enhancement on our database, which comprises T1/T2-weighted magnetic resonance (MR) and fluid-attenuated inversion recovery (FLAIR) images, to facilitate a smoother segmentation. The scalar output obtained is fed into a gravitational fuzzy clustering algorithm, which separates healthy structures from the unhealthy. Finally, the lesion contour is automatically outlined through the initialization-free level set evolution method. An advantage of this lesion detection algorithm is its precision and its simultaneous use of features computed from the intensity properties of the MR scan in a cascading pattern, which makes the computation fast, robust, and self-contained. Furthermore, we validate our algorithm with large-scale experiments using clinical and synthetic brain lesion datasets. As a result, an 84%–93% overlap performance is obtained, with an emphasis on robustness with respect to different and heterogeneous types of lesion and a swift computation time.

  7. Quick detection of QRS complexes and R-waves using a wavelet transform and K-means clustering.

    Science.gov (United States)

    Xia, Yong; Han, Junze; Wang, Kuanquan

    2015-01-01

    Based on the idea of telemedicine, 24-hour uninterrupted monitoring of electrocardiograms (ECG) has started to be implemented. To create an intelligent ECG monitoring system, an efficient and quick detection algorithm for the characteristic waveforms is needed. This paper aims to give a quick and effective method for detecting QRS-complexes and R-waves in ECGs. The real ECG signal from the MIT-BIH Arrhythmia Database is used for the performance evaluation. The proposed method combines a wavelet transform and the K-means clustering algorithm. A wavelet transform is adopted in the data analysis and preprocessing. Then, based on the slope information of the filtered data, a segmented K-means clustering method is adopted to detect the QRS region. Detection of the R-peak is based on comparing the local amplitudes in each QRS region, which is different from other approaches, and the time cost of R-wave detection is reduced. For the 8 tested records (18,201 beats in total) from the MIT-BIH Arrhythmia Database, an average R-peak detection sensitivity of 99.72% and a positive predictive value of 99.80% were obtained; the average time needed to process a 30-min signal is 5.78 s, which is competitive with other methods.

  8. An Integrated Intrusion Detection Model of Cluster-Based Wireless Sensor Network.

    Science.gov (United States)

    Sun, Xuemei; Yan, Bo; Zhang, Xinzhong; Rong, Chuitian

    2015-01-01

    Considering wireless sensor network characteristics, this paper combines anomaly and misuse detection and proposes an integrated detection model for cluster-based wireless sensor networks, aiming to enhance the detection rate and reduce the false-alarm rate. An Adaboost algorithm with hierarchical structures is used for anomaly detection of sensor nodes, cluster-head nodes and Sink nodes. Back Propagation optimized by the Cultural Algorithm and the Artificial Fish Swarm Algorithm is applied to misuse detection at the Sink node. Extensive simulations demonstrate that this integrated model delivers strong intrusion-detection performance.

  9. Galaxy Clusters in the Swift/BAT era II: 10 more Clusters detected above 15 keV

    Energy Technology Data Exchange (ETDEWEB)

    Ajello, M.; /SLAC /KIPAC, Menlo Park; Rebusco, P.; /KIPAC, Menlo Park; Cappelluti, N.; /Garching, Max Planck Inst., MPE /Maryland U., Baltimore County; Reimer, O.; /SLAC /Palermo Observ.; Boehringer, H.; /Garching, Max Planck Inst., MPE; La Parola, V.; Cusumano, G.; /Palermo Observ.

    2010-10-27

    We report on the discovery of 10 additional galaxy clusters detected in the ongoing Swift/BAT all-sky survey. Among the newly BAT-discovered clusters there are: Bullet, Abell 85, Norma, and PKS 0745-19. Norma is the only cluster, among those presented here, which is resolved by BAT. For all the clusters we perform a detailed spectral analysis using XMM-Newton and Swift/BAT data to investigate the presence of a hard (non-thermal) X-ray excess. We find that in most cases the clusters' emission in the 0.3-200 keV band can be explained by a multi-temperature thermal model confirming our previous results. For two clusters (Bullet and Abell 3667) we find evidence for the presence of a hard X-ray excess. In the case of the Bullet cluster, our analysis confirms the presence of a non-thermal, power-law like, component with a 20-100 keV flux of 3.4 × 10^{-12} erg cm^{-2} s^{-1} as detected in previous studies. For Abell 3667 the excess emission can be successfully modeled as a hot component (kT ≈ 13 keV). We thus conclude that the hard X-ray emission from galaxy clusters (except the Bullet) most likely has a thermal origin.

  10. GALAXY CLUSTERS IN THE SWIFT/BAT ERA. II. 10 MORE CLUSTERS DETECTED ABOVE 15 keV

    International Nuclear Information System (INIS)

    Ajello, M.; Reimer, O.; Rebusco, P.; Cappelluti, N.; Boehringer, H.; La Parola, V.; Cusumano, G.

    2010-01-01

    We report on the discovery of 10 additional galaxy clusters detected in the ongoing Swift/Burst Alert Telescope (BAT) all-sky survey. Among the newly BAT-discovered clusters there are Bullet, A85, Norma, and PKS 0745-19. Norma is the only cluster, among those presented here, which is resolved by BAT. For all the clusters, we perform a detailed spectral analysis using XMM-Newton and Swift/BAT data to investigate the presence of a hard (non-thermal) X-ray excess. We find that in most cases the clusters' emission in the 0.3-200 keV band can be explained by a multi-temperature thermal model confirming our previous results. For two clusters (Bullet and A3667), we find evidence for the presence of a hard X-ray excess. In the case of the Bullet cluster, our analysis confirms the presence of a non-thermal, power-law-like, component with a 20-100 keV flux of 3.4 × 10^{-12} erg cm^{-2} s^{-1} as detected in previous studies. For A3667, the excess emission can be successfully modeled as a hot component (kT ∼ 13 keV). We thus conclude that the hard X-ray emission from galaxy clusters (except the Bullet) has most likely a thermal origin.

  11. CCM: A Text Classification Method by Clustering

    DEFF Research Database (Denmark)

    Nizamani, Sarwat; Memon, Nasrullah; Wiil, Uffe Kock

    2011-01-01

    In this paper, a new Cluster based Classification Model (CCM) for suspicious email detection and other text classification tasks is presented. Comparative experiments of the proposed model against traditional classification models and the boosting algorithm are also discussed. Experimental results show that the CCM outperforms traditional classification models as well as the boosting algorithm for the task of suspicious email detection on a terrorism domain email dataset and topic categorization on the Reuters-21578 and 20 Newsgroups datasets. The overall finding is that applying a cluster based...

  12. Multichannel response analysis on 2D projection views for detection of clustered microcalcifications in digital breast tomosynthesis

    International Nuclear Information System (INIS)

    Wei, Jun; Chan, Heang-Ping; Hadjiiski, Lubomir M.; Helvie, Mark A.; Lu, Yao; Zhou, Chuan; Samala, Ravi

    2014-01-01

    Purpose: To investigate the feasibility of a new two-dimensional (2D) multichannel response (MCR) analysis approach for the detection of clustered microcalcifications (MCs) in digital breast tomosynthesis (DBT). Methods: With IRB approval and informed consent, a data set of two-view DBTs from 42 breasts containing biopsy-proven MC clusters was collected in this study. The authors developed a 2D approach for MC detection using projection view (PV) images rather than the reconstructed three-dimensional (3D) DBT volume. Signal-to-noise ratio (SNR) enhancement processing was first applied to each PV to enhance the potential MCs. The locations of MC candidates were then identified with iterative thresholding. The individual MCs were decomposed with Hermite–Gaussian (HG) and Laguerre–Gaussian (LG) basis functions and the channelized Hotelling model was trained to produce the MCRs for each MC on the 2D images. The MCRs from the PVs were fused in 3D by a coincidence counting method that backprojects the MC candidates on the PVs and traces the coincidence of their ray paths in 3D. The 3D MCR was used to differentiate the true MCs from false positives (FPs). Finally a dynamic clustering method was used to identify the potential MC clusters in the DBT volume based on the fact that true MCs of clinical significance appear in clusters. Using two-fold cross validation, the performance of the 3D MCR for classification of true and false MCs was estimated by the area under the receiver operating characteristic (ROC) curve and the overall performance of the MCR approach for detection of clustered MCs was assessed by free response receiver operating characteristic (FROC) analysis. Results: When the HG basis function was used for MCR analysis, the detection of MC cluster achieved case-based test sensitivities of 80% and 90% at the average FP rates of 0.65 and 1.55 FPs per DBT volume, respectively. With LG basis function, the average FP rates were 0.62 and 1.57 per DBT volume at

  13. INTERSECTION DETECTION BASED ON QUALITATIVE SPATIAL REASONING ON STOPPING POINT CLUSTERS

    Directory of Open Access Journals (Sweden)

    S. Zourlidou

    2016-06-01

    Full Text Available The purpose of this research is to propose and test a method for detecting intersections by analysing collectively acquired trajectories of moving vehicles. Instead of solely relying on the geometric features of the trajectories, such as heading changes, which may indicate turning points and consequently intersections, we extract semantic features of the trajectories in form of sequences of stops and moves. Under this spatiotemporal prism, the extracted semantic information which indicates where vehicles stop can reveal important locations, such as junctions. The advantage of the proposed approach in comparison with existing turning-points oriented approaches is that it can detect intersections even when not all the crossing road segments are sampled and therefore no turning points are observed in the trajectories. The challenge with this approach is that, first, not all vehicles stop at the same location, so the stop-location is blurred along the direction of the road; second, this leads to the effect that nearby junctions can induce similar stop-locations. As a first step, a density-based clustering is applied on the layer of stop observations and clusters of stop events are found. Representative points of the clusters are determined (one per cluster) and in a last step the existence of an intersection is clarified based on spatial relational cluster reasoning, with which less informative geospatial clusters, in terms of whether a junction exists and where its centre lies, are transformed into more informative ones. Relational reasoning criteria, based on the relative orientation of the clusters with their adjacent ones, are discussed for making sense of the relation that connects them, and finally for forming groups of stop events that belong to the same junction.
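    A minimal sketch of the first steps, assuming made-up stop coordinates and thresholds: density-based clustering (DBSCAN) groups stop events, one representative point per cluster is computed, and a crude orientation check between two cluster representatives stands in for the relational reasoning step.

```python
"""Sketch: density-based clustering of vehicle stop events and a crude
relational check between cluster representatives. Coordinates, DBSCAN
parameters and the angle heuristic are all made up for illustration."""
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(2)
# Stop events near two approaches to a junction, plus scattered noise (metres).
stops = np.vstack([
    rng.normal([0, -12], 3.0, size=(80, 2)),    # approach from the south
    rng.normal([-12, 0], 3.0, size=(70, 2)),    # approach from the west
    rng.uniform(-60, 60, size=(20, 2)),         # spurious stops
])

labels = DBSCAN(eps=5.0, min_samples=10).fit_predict(stops)

# One representative point per stop cluster (noise carries label -1).
reps = {lab: stops[labels == lab].mean(axis=0)
        for lab in set(labels) if lab != -1}
print("stop-event clusters:", {k: np.round(v, 1) for k, v in reps.items()})

# Two nearby stop clusters whose approach directions are roughly
# perpendicular are taken as evidence for a shared junction.
if len(reps) >= 2:
    a, b = list(reps.values())[:2]
    cosang = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    angle = np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))
    print(f"angle between approach directions: {angle:.0f} degrees")
```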

  14. A relevance vector machine technique for the automatic detection of clustered microcalcifications (Honorable Mention Poster Award)

    Science.gov (United States)

    Wei, Liyang; Yang, Yongyi; Nishikawa, Robert M.

    2005-04-01

    Microcalcification (MC) clusters in mammograms can be important early signs of breast cancer in women. Accurate detection of MC clusters is an important but challenging problem. In this paper, we propose the use of a recently developed machine learning technique -- relevance vector machine (RVM) -- for automatic detection of MCs in digitized mammograms. RVM is based on Bayesian estimation theory, and as a feature it can yield a decision function that depends on only a very small number of so-called relevance vectors. We formulate MC detection as a supervised-learning problem, and use RVM to classify if an MC object is present or not at each location in a mammogram image. MC clusters are then identified by grouping the detected MC objects. The proposed method is tested using a database of 141 clinical mammograms, and compared with a support vector machine (SVM) classifier which we developed previously. The detection performance is evaluated using the free-response receiver operating characteristic (FROC) curves. It is demonstrated that the RVM classifier matches closely with the SVM classifier in detection performance, and does so with a much sparser kernel representation than the SVM classifier. Consequently, the RVM classifier greatly reduces the computational complexity, making it more suitable for real-time processing of MC clusters in mammograms.

  15. Automatic detection of erythemato-squamous diseases using k-means clustering.

    Science.gov (United States)

    Ubeyli, Elif Derya; Doğdu, Erdoğan

    2010-04-01

    A new approach based on the implementation of k-means clustering is presented for automated detection of erythemato-squamous diseases. The purpose of clustering techniques is to find a structure for the given data by finding similarities between data according to data characteristics. The studied domain contained records of patients with known diagnosis. The k-means clustering algorithm's task was to classify the data points, in this case the patients with attribute data, to one of the five clusters. The algorithm was used to detect the five erythemato-squamous diseases when 33 features defining five disease indications were used. The purpose is to determine an optimum classification scheme for this problem. The present research demonstrated that the features well represent the erythemato-squamous diseases and the k-means clustering algorithm's task achieved high classification accuracies for only five erythemato-squamous diseases.
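    A minimal sketch of the clustering step, with random stand-ins for the 33 dermatology features rather than the real patient records: k-means partitions standardized feature vectors into five clusters, and agreement with the simulated diagnoses is summarised by majority-label matching.

```python
"""Sketch: k-means grouping of patient feature vectors into five
clusters. The features are random stand-ins for the 33 dermatology
attributes, not the real records."""
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(9)
n_per_class, n_features, n_classes = 60, 33, 5

# Hypothetical feature vectors: each disease shifts the feature means.
X = np.vstack([rng.normal(loc=rng.uniform(-1, 1, n_features), scale=0.5,
                          size=(n_per_class, n_features))
               for _ in range(n_classes)])
y_true = np.repeat(np.arange(n_classes), n_per_class)

X_std = StandardScaler().fit_transform(X)
labels = KMeans(n_clusters=n_classes, n_init=10, random_state=0).fit_predict(X_std)

# Map each cluster to its majority diagnosis to gauge the agreement.
agree = sum(int((y_true[labels == c] ==
                 np.bincount(y_true[labels == c]).argmax()).sum())
            for c in range(n_classes))
print(f"majority-label agreement: {agree / len(y_true):.2%}")
```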

  16. Cluster temperature. Methods for its measurement and stabilization

    International Nuclear Information System (INIS)

    Makarov, G N

    2008-01-01

    Cluster temperature is an important material parameter essential to many physical and chemical processes involving clusters and cluster beams. Because of the diverse methods by which clusters can be produced, excited, and stabilized, and also because of the widely ranging values of atomic and molecular binding energies (approximately from 10 -5 to 10 eV) and numerous energy relaxation channels in clusters, cluster temperature (internal energy) ranges from 10 -3 to about 10 8 K. This paper reviews research on cluster temperature and describes methods for its measurement and stabilization. The role of cluster temperature in and its influence on physical and chemical processes is discussed. Results on the temperature dependence of cluster properties are presented. The way in which cluster temperature relates to cluster structure and to atomic and molecular interaction potentials in clusters is addressed. Methods for strong excitation of clusters and channels for their energy relaxation are discussed. Some applications of clusters and cluster beams are considered. (reviews of topical problems)

  17. Why so GLUMM? Detecting depression clusters through graphing lifestyle-environs using machine-learning methods (GLUMM).

    Science.gov (United States)

    Dipnall, J F; Pasco, J A; Berk, M; Williams, L J; Dodd, S; Jacka, F N; Meyer, D

    2017-01-01

    Key lifestyle-environ risk factors are operative for depression, but it is unclear how risk factors cluster. Machine-learning (ML) algorithms exist that learn, extract, identify and map underlying patterns to identify groupings of depressed individuals without constraints. The aim of this research was to use a large epidemiological study to identify and characterise depression clusters through "Graphing lifestyle-environs using machine-learning methods" (GLUMM). Two ML algorithms were implemented: unsupervised Self-organised mapping (SOM) to create GLUMM clusters and a supervised boosted regression algorithm to describe clusters. Ninety-six "lifestyle-environ" variables were used from the National health and nutrition examination study (2009-2010). Multivariate logistic regression validated clusters and controlled for possible sociodemographic confounders. The SOM identified two GLUMM cluster solutions. These solutions contained one dominant depressed cluster (GLUMM5-1, GLUMM7-1). Equal proportions of members in each cluster rated as highly depressed (17%). Alcohol consumption and demographics validated clusters. Boosted regression identified GLUMM5-1 as more informative than GLUMM7-1. Members were more likely to: have problems sleeping; unhealthy eating; ≤2 years in their home; an old home; perceive themselves underweight; exposed to work fumes; experienced sex at ≤14 years; not perform moderate recreational activities. A positive relationship between GLUMM5-1 and depression was found (OR: 7.50), with significant interactions for those married/living with a partner (P=0.001). Using ML-based GLUMM to form ordered depressive clusters from multitudinous lifestyle-environ variables enabled a deeper exploration of the heterogeneous data and uncovered a better understanding of the relationships between complex mental health factors. Copyright © 2016 Elsevier Masson SAS. All rights reserved.

  18. The detection of clusters in rare diseases

    Energy Technology Data Exchange (ETDEWEB)

    Besag, J. (Washington Univ., Seattle, WA (USA) Newcastle upon Tyne Univ. (UK)); Newell, J. (Newcastle upon Tyne Univ. (UK))

    1991-01-01

    Tests for clustering of rare diseases investigate whether an observed pattern of cases in one or more geographical regions could reasonably have arisen by chance alone, bearing in mind the variation in background population density. In contrast, tests for the detection of clusters are concerned with screening a large region for evidence of individual 'hot spots' of disease but without any preconception about their likely locations; the results of such tests may form the basis for subsequent small area investigations, statistical or non-statistical, but will rarely be an end in themselves. The main intention of the paper is to describe and illustrate a new technique for the identification of small clusters of disease. A secondary purpose is to discuss some common pitfalls in the application of tests of clustering to epidemiological data. (author).
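    The following sketch illustrates a Besag-and-Newell-style test around each observed case: accumulate the nearest regions until k further cases are reached, then ask how surprising that case count is for the accumulated population under a Poisson null. The region data, the choice of k and the lack of a multiple-testing adjustment are simplifications for illustration, not the authors' exact procedure.

```python
"""Sketch of a Besag-and-Newell-style test: around each case, accumulate
the nearest regions until k further cases are reached, then judge how
surprising that count is for the accumulated population under a Poisson
null. Region data and k are invented; no multiple-testing adjustment."""
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(3)
R = 200
centroids = rng.uniform(0, 100, size=(R, 2))
population = rng.integers(500, 5000, size=R)
rate = 20 / population.sum()                    # overall disease rate
cases = rng.poisson(rate * population)          # background cases
cases[17] += 6                                  # an artificial hot spot

k = 8                                           # cluster size sought
for i in np.flatnonzero(cases > 0):
    order = np.argsort(np.linalg.norm(centroids - centroids[i], axis=1))
    cum_cases = np.cumsum(cases[order])
    cum_pop = np.cumsum(population[order])
    j = np.searchsorted(cum_cases, k)           # regions needed to reach k cases
    if j >= R:
        continue
    expected = rate * cum_pop[j]
    p = poisson.sf(k - 1, expected)             # P(at least k cases)
    if p < 0.01:
        print(f"region {i}: {k} cases within {j + 1} regions "
              f"(expected {expected:.1f}), p = {p:.4f}")
```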

  19. AN EFFICIENT INITIALIZATION METHOD FOR K-MEANS CLUSTERING OF HYPERSPECTRAL DATA

    Directory of Open Access Journals (Sweden)

    A. Alizade Naeini

    2014-10-01

    Full Text Available K-means is definitely the most frequently used partitional clustering algorithm in the remote sensing community. Unfortunately, due to its gradient descent nature, this algorithm is highly sensitive to the initial placement of cluster centers. This problem deteriorates for high-dimensional data such as hyperspectral remotely sensed imagery. To tackle this problem, in this paper, the spectral signatures of the endmembers in the image scene are extracted and used as the initial positions of the cluster centers. For this purpose, in the first step, a Neyman–Pearson detection theory based eigen-thresholding method (i.e., the HFC method) has been employed to estimate the number of endmembers in the image. Afterwards, the spectral signatures of the endmembers are obtained using the Minimum Volume Enclosing Simplex (MVES) algorithm. Eventually, these spectral signatures are used to initialize the k-means clustering algorithm. The proposed method is implemented on a hyperspectral dataset acquired by the ROSIS sensor with 103 spectral bands over the Pavia University campus, Italy. For comparative evaluation, two other commonly used initialization methods (i.e., Bradley & Fayyad (BF) and Random methods) are implemented and compared. The confusion matrix, overall accuracy and Kappa coefficient are employed to assess the methods’ performance. The evaluations demonstrate that the proposed solution outperforms the other initialization methods and can be applied for unsupervised classification of hyperspectral imagery for landcover mapping.
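    The initialisation idea can be shown directly with scikit-learn, which accepts a fixed array of centres: here hypothetical endmember spectra (stand-ins for the HFC/MVES outputs) seed k-means and are compared against a random initialisation on simulated mixed pixels.

```python
"""Sketch: initialising scikit-learn's KMeans with fixed spectral
signatures instead of random centres. The 'endmember' spectra here are
synthetic stand-ins for the HFC/MVES outputs used in the paper."""
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
bands, n_pixels, n_end = 103, 5000, 4

# Hypothetical endmember signatures (rows) and noisy mixed pixels.
endmembers = rng.uniform(0.1, 0.9, size=(n_end, bands))
abundances = rng.dirichlet(np.full(n_end, 0.3), size=n_pixels)
pixels = abundances @ endmembers + rng.normal(0, 0.01, size=(n_pixels, bands))

# Endmember-seeded k-means versus the default random initialisation.
km_seeded = KMeans(n_clusters=n_end, init=endmembers, n_init=1).fit(pixels)
km_random = KMeans(n_clusters=n_end, init="random", n_init=1,
                   random_state=0).fit(pixels)
print("inertia, endmember-seeded:", round(float(km_seeded.inertia_), 2))
print("inertia, random init     :", round(float(km_random.inertia_), 2))
```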

  20. Global detection approach for clustered microcalcifications in mammograms using a deep learning network.

    Science.gov (United States)

    Wang, Juan; Nishikawa, Robert M; Yang, Yongyi

    2017-04-01

    In computerized detection of clustered microcalcifications (MCs) from mammograms, the traditional approach is to apply a pattern detector to locate the presence of individual MCs, which are subsequently grouped into clusters. Such an approach is often susceptible to the occurrence of false positives (FPs) caused by local image patterns that resemble MCs. We investigate the feasibility of a direct detection approach to determining whether an image region contains clustered MCs or not. Toward this goal, we develop a deep convolutional neural network (CNN) as the classifier model to which the input consists of a large image window ([Formula: see text] in size). The multiple layers in the CNN classifier are trained to automatically extract image features relevant to MCs at different spatial scales. In the experiments, we demonstrated this approach on a dataset consisting of both screen-film mammograms and full-field digital mammograms. We evaluated the detection performance both on classifying image regions of clustered MCs using a receiver operating characteristic (ROC) analysis and on detecting clustered MCs from full mammograms by a free-response receiver operating characteristic analysis. For comparison, we also considered a recently developed MC detector with FP suppression. In classifying image regions of clustered MCs, the CNN classifier achieved 0.971 in the area under the ROC curve, compared to 0.944 for the MC detector. In detecting clustered MCs from full mammograms, at 90% sensitivity, the CNN classifier obtained an FP rate of 0.69 clusters/image, compared to 1.17 clusters/image by the MC detector. These results indicate that using global image features can be more effective in discriminating clustered MCs from FPs caused by various sources, such as linear structures, thereby providing a more accurate detection of clustered MCs on mammograms.

  1. A spatial hazard model for cluster detection on continuous indicators of disease: application to somatic cell score.

    Science.gov (United States)

    Gay, Emilie; Senoussi, Rachid; Barnouin, Jacques

    2007-01-01

    Methods for spatial cluster detection dealing with diseases quantified by continuous variables are few, whereas several diseases are better approached by continuous indicators. For example, subclinical mastitis of the dairy cow is evaluated using a continuous marker of udder inflammation, the somatic cell score (SCS). Consequently, this study proposed to analyze spatialized risk and cluster components of herd SCS through a new method based on a spatial hazard model. The dataset included annual SCS for 34 142 French dairy herds for the year 2000, and important SCS risk factors: mean parity, percentage of winter and spring calvings, and herd size. The model allowed the simultaneous estimation of the effects of known risk factors and of potential spatial clusters on SCS, and the mapping of the estimated clusters and their range. Mean parity and winter and spring calvings were significantly associated with subclinical mastitis risk. The model with the presence of 3 clusters was highly significant, and the 3 clusters were attractive, i.e. closeness to cluster center increased the occurrence of high SCS. The three localizations were the following: close to the city of Troyes in the northeast of France; around the city of Limoges in the center-west; and in the southwest close to the city of Tarbes. The semi-parametric method based on spatial hazard modeling applies to continuous variables, and takes account of both risk factors and potential heterogeneity of the background population. This tool allows a quantitative detection but assumes a spatially specified form for clusters.

  2. Trend analysis using non-stationary time series clustering based on the finite element method

    Science.gov (United States)

    Gorji Sefidmazgi, M.; Sayemuzzaman, M.; Homaifar, A.; Jha, M. K.; Liess, S.

    2014-05-01

    In order to analyze low-frequency variability of climate, it is useful to model the climatic time series with multiple linear trends and locate the times of significant changes. In this paper, we have used non-stationary time series clustering to find change points in the trends. Clustering in a multi-dimensional non-stationary time series is challenging, since the problem is mathematically ill-posed. Clustering based on the finite element method (FEM) is one of the methods that can analyze multidimensional time series. One important attribute of this method is that it is not dependent on any statistical assumption and does not need local stationarity in the time series. In this paper, it is shown how the FEM-clustering method can be used to locate change points in the trend of temperature time series from in situ observations. This method is applied to the temperature time series of North Carolina (NC) and the results represent region-specific climate variability despite higher frequency harmonics in climatic time series. Next, we investigated the relationship between the climatic indices with the clusters/trends detected based on this clustering method. It appears that the natural variability of climate change in NC during 1950-2009 can be explained mostly by AMO and solar activity.

  3. Momentum-space cluster dual-fermion method

    Science.gov (United States)

    Iskakov, Sergei; Terletska, Hanna; Gull, Emanuel

    2018-03-01

    Recent years have seen the development of two types of nonlocal extensions to the single-site dynamical mean field theory. On one hand, cluster approximations, such as the dynamical cluster approximation, recover short-range momentum-dependent correlations nonperturbatively. On the other hand, diagrammatic extensions, such as the dual-fermion theory, recover long-ranged corrections perturbatively. The correct treatment of both strong short-ranged and weak long-ranged correlations within the same framework is therefore expected to lead to a quick convergence of results, and offers the potential of obtaining smooth self-energies in nonperturbative regimes of phase space. In this paper, we present an exact cluster dual-fermion method based on an expansion around the dynamical cluster approximation. Unlike previous formulations, our method does not employ a coarse-graining approximation to the interaction, which we show to be the leading source of error at high temperature, and converges to the exact result independently of the size of the underlying cluster. We illustrate the power of the method with results for the second-order cluster dual-fermion approximation to the single-particle self-energies and double occupancies.

  4. Improving local clustering based top-L link prediction methods via asymmetric link clustering information

    Science.gov (United States)

    Wu, Zhihao; Lin, Youfang; Zhao, Yiji; Yan, Hongyan

    2018-02-01

    Networks can represent a wide range of complex systems, such as social, biological and technological systems. Link prediction is one of the most important problems in network analysis, and has attracted much research interest recently. Many link prediction methods have been proposed to solve this problem with various techniques. We can note that clustering information plays an important role in solving the link prediction problem. In the previous literature, the node clustering coefficient appears frequently in many link prediction methods. However, the node clustering coefficient is limited in describing the role of a common neighbor in different local networks, because it cannot distinguish the different clustering abilities of a node with respect to different node pairs. In this paper, we shift our focus from nodes to links, and propose the concept of asymmetric link clustering (ALC) coefficient. Further, we improve three node clustering based link prediction methods via the concept of ALC. The experimental results demonstrate that ALC-based methods outperform node clustering based methods, especially achieving remarkable improvements on food web, hamster friendship and Internet networks. Besides, compared with other methods, the performance of ALC-based methods is very stable in both globalized and personalized top-L link prediction tasks.
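    For orientation, the sketch below implements the node-clustering-coefficient baseline that the ALC coefficient refines: each common neighbour of a candidate pair contributes its clustering coefficient to the link score. The ALC formulation itself is not reproduced, and the example graph is simply the Zachary karate club.

```python
"""Sketch of the node-clustering-coefficient baseline (CCLP): each
common neighbour of a candidate pair contributes its clustering
coefficient to the link score. The ALC refinement itself is not
reproduced; the example graph is the Zachary karate club."""
import itertools
import networkx as nx

G = nx.karate_club_graph()
cc = nx.clustering(G)

def cclp_score(g, u, v):
    """Sum of clustering coefficients over the common neighbours of (u, v)."""
    return sum(cc[z] for z in nx.common_neighbors(g, u, v))

# Score all currently unconnected pairs and list the strongest candidates.
candidates = [(u, v) for u, v in itertools.combinations(G.nodes, 2)
              if not G.has_edge(u, v)]
ranked = sorted(candidates, key=lambda pair: cclp_score(G, *pair), reverse=True)
for u, v in ranked[:5]:
    print(f"predicted link ({u}, {v}), score = {cclp_score(G, u, v):.3f}")
```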

  5. A speeded-up saliency region-based contrast detection method for small targets

    Science.gov (United States)

    Li, Zhengjie; Zhang, Haiying; Bai, Jiaojiao; Zhou, Zhongjun; Zheng, Huihuang

    2018-04-01

    To cope with the rapid development of real applications for infrared small targets, researchers have pursued more robust detection methods. At present, the contrast measure-based method has become a promising research branch. Following this framework, a speeded-up contrast measure scheme is proposed in this paper based on saliency detection and density clustering. First, the saliency region is segmented by a saliency detection method, and the multi-scale contrast calculation is then carried out on it instead of traversing the whole image. Second, the spatial "integrity" of the target is exploited to distinguish it from isolated noise by density clustering. Finally, the targets are detected by a self-adaptive threshold. Compared with the time-consuming MPCM (Multiscale Patch Contrast Map), the time cost of the speeded-up version is within a few seconds. Additionally, due to the use of clustering segmentation, false alarms caused by heavy noise can be suppressed to a lower level. The experiments show that our method achieves a satisfactory false-alarm suppression ratio (FASR) and real-time performance compared with state-of-the-art algorithms in both cloudy-sky and sea-sky backgrounds.

  6. Population clustering based on copy number variations detected from next generation sequencing data.

    Science.gov (United States)

    Duan, Junbo; Zhang, Ji-Gang; Wan, Mingxi; Deng, Hong-Wen; Wang, Yu-Ping

    2014-08-01

    Copy number variations (CNVs) can be used as significant bio-markers and next generation sequencing (NGS) provides high-resolution detection of these CNVs. But how to extract features from CNVs and further apply them to genomic studies such as population clustering has become a big challenge. In this paper, we propose a novel method for population clustering based on CNVs from NGS. First, CNVs are extracted from each sample to form a feature matrix. Then, this feature matrix is decomposed into the source matrix and weight matrix with non-negative matrix factorization (NMF). The source matrix consists of common CNVs that are shared by all the samples from the same group, and the weight matrix indicates the corresponding level of CNVs from each sample. Therefore, using NMF of CNVs one can differentiate samples from different ethnic groups, i.e. population clustering. To validate the approach, we applied it to the analysis of both simulation data and two real data sets from the 1000 Genomes Project. The results on simulation data demonstrate that the proposed method can recover the true common CNVs with high quality. The results on the first real data analysis show that the proposed method can cluster two family trios with different ancestries into two ethnic groups and the results on the second real data analysis show that the proposed method can be applied to the whole-genome with large sample size consisting of multiple groups. Both results demonstrate the potential of the proposed method for population clustering.
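    A compact sketch of the factorisation step, assuming a simulated sample-by-region CNV matrix rather than calls extracted from sequencing data: non-negative matrix factorisation yields a source matrix of shared CNV patterns and per-sample weights, and samples are grouped by their dominant weight.

```python
"""Sketch: non-negative matrix factorisation of a sample-by-CNV-region
matrix into shared sources and per-sample weights, then grouping by the
dominant weight. The matrix is simulated; the real method first extracts
CNV calls from sequencing data."""
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(5)
n_regions = 300
# Two groups share different sets of common CNVs.
source_a = rng.binomial(1, 0.08, n_regions) * 2.0
source_b = rng.binomial(1, 0.08, n_regions) * 2.0
samples = np.vstack([s + rng.poisson(0.05, n_regions)
                     for s in [source_a] * 40 + [source_b] * 40]).astype(float)

model = NMF(n_components=2, init="nndsvda", max_iter=500, random_state=0)
weights = model.fit_transform(samples)   # per-sample contribution of each source
sources = model.components_              # common CNV patterns shared within a group

groups = weights.argmax(axis=1)          # cluster samples by dominant source
print("samples assigned per group:", np.bincount(groups))
```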

  7. A statistical method (cross-validation) for bone loss region detection after spaceflight

    Science.gov (United States)

    Zhao, Qian; Li, Wenjun; Li, Caixia; Chu, Philip W.; Kornak, John; Lang, Thomas F.

    2010-01-01

    Astronauts experience bone loss after long spaceflight missions. Identifying specific regions that undergo the greatest losses (e.g. the proximal femur) could reveal information about the processes of bone loss in disuse and disease. Methods for detecting such regions, however, remain an open problem. This paper focuses on statistical methods to detect such regions. We perform statistical parametric mapping to get t-maps of changes in images, and propose a new cross-validation method to select an optimum suprathreshold for forming clusters of pixels. Once these candidate clusters are formed, we use permutation testing of longitudinal labels to derive significant changes. PMID:20632144
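    The cluster-forming step can be sketched as follows, with a simulated t-map and arbitrary candidate thresholds: pixels exceeding a suprathreshold are labelled into connected clusters whose sizes would then feed the cross-validation and permutation steps described above (not shown here).

```python
"""Sketch of the cluster-forming step: threshold a t-map at candidate
suprathresholds and label connected pixel clusters. The t-map is
simulated, and the cross-validation/permutation steps that pick the
threshold and assess significance are not shown."""
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(6)
t_map = rng.normal(0, 1, size=(64, 64))
t_map[20:28, 30:38] += 3.0                     # a region of genuine bone change

for thr in (2.0, 2.5, 3.0):                    # candidate suprathresholds
    labels, n_clusters = ndimage.label(t_map > thr)
    sizes = np.bincount(labels.ravel())[1:]    # pixels per cluster
    big = int((sizes >= 20).sum())
    print(f"threshold {thr}: {n_clusters} clusters, {big} of size >= 20 pixels")
```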

  8. System and Method for Outlier Detection via Estimating Clusters

    Science.gov (United States)

    Iverson, David J. (Inventor)

    2016-01-01

    An efficient method and system for real-time or offline analysis of multivariate sensor data for use in anomaly detection, fault detection, and system health monitoring is provided. Models automatically derived from training data, typically nominal system data acquired from sensors in normally operating conditions or from detailed simulations, are used to identify unusual, out of family data samples (outliers) that indicate possible system failure or degradation. Outliers are determined through analyzing a degree of deviation of current system behavior from the models formed from the nominal system data. The deviation of current system behavior is presented as an easy to interpret numerical score along with a measure of the relative contribution of each system parameter to any off-nominal deviation. The techniques described herein may also be used to "clean" the training data.

  9. Human population structure detection via multilocus genotype clustering

    Directory of Open Access Journals (Sweden)

    Starmer Joshua

    2007-06-01

    Full Text Available Abstract Background We describe a hierarchical clustering algorithm for using Single Nucleotide Polymorphism (SNP) genetic data to assign individuals to populations. The method does not assume Hardy-Weinberg equilibrium and linkage equilibrium among loci in sample population individuals. Results We show that the algorithm can assign sample individuals highly accurately to their corresponding ethnic groups in our tests using HapMap SNP data and it is also robust to admixed populations when tested with Perlegen SNP data. Moreover, it can detect fine-scale population structure as subtle as that between Chinese and Japanese by using genome-wide high-diversity SNP loci. Conclusion The algorithm provides an alternative approach to the popular STRUCTURE program, especially for fine-scale population structure detection in genome-wide association studies. This is the first successful separation of Chinese and Japanese samples using random SNP loci with high statistical support.

  10. Progeny Clustering: A Method to Identify Biological Phenotypes

    Science.gov (United States)

    Hu, Chenyue W.; Kornblau, Steven M.; Slater, John H.; Qutub, Amina A.

    2015-01-01

    Estimating the optimal number of clusters is a major challenge in applying cluster analysis to any type of dataset, especially to biomedical datasets, which are high-dimensional and complex. Here, we introduce an improved method, Progeny Clustering, which is stability-based and exceptionally efficient in computing, to find the ideal number of clusters. The algorithm employs a novel Progeny Sampling method to reconstruct cluster identity, a co-occurrence probability matrix to assess the clustering stability, and a set of reference datasets to overcome inherent biases in the algorithm and data space. Our method was shown to be successful and robust when applied to two synthetic datasets (a two-dimensional dataset and a ten-dimensional dataset containing eight dimensions of pure noise), two standard biological datasets (the Iris dataset and Rat CNS dataset) and two biological datasets (a cell phenotype dataset and an acute myeloid leukemia (AML) reverse phase protein array (RPPA) dataset). Progeny Clustering outperformed some popular clustering evaluation methods in the ten-dimensional synthetic dataset as well as in the cell phenotype dataset, and it was the only method that successfully discovered clinically meaningful patient groupings in the AML RPPA dataset. PMID:26267476

  11. Hybrid Tracking Algorithm Improvements and Cluster Analysis Methods.

    Science.gov (United States)

    1982-02-26

    UPGMA ), and Ward’s method. Ling’s papers describe a (k,r) clustering method. Each of these methods have individual characteristics which make them...Reference 7), UPGMA is probably the most frequently used clustering strategy. UPGMA tries to group new points into an existing cluster by using an

  12. Detecting Android Malwares with High-Efficient Hybrid Analyzing Methods

    Directory of Open Access Journals (Sweden)

    Yu Liu

    2018-01-01

    Full Text Available In order to tackle the security issues caused by Android malware, we propose a highly efficient hybrid detection scheme for Android malware. Our scheme employs different analysis methods (static and dynamic) to construct a flexible detection scheme. In this paper, we propose detection techniques such as the Com+ feature, built on traditional Permission and API call features, to improve the performance of static detection. The collapsing issue of traditional function call graph-based malware detection is also avoided, as we adopt feature selection and clustering to unify function call graph features of various dimensions into the same dimension. To verify the performance of our scheme, we built an open-access malware dataset for our experiments. The experimental results show that the proposed scheme achieves high malware-detection accuracy, and the scheme could be used to establish Android malware-detection cloud services, which can automatically select high-efficiency analysis methods according to the properties of the Android applications.

  13. Searching remote homology with spectral clustering with symmetry in neighborhood cluster kernels.

    Directory of Open Access Journals (Sweden)

    Ujjwal Maulik

    Full Text Available Remote homology detection among proteins utilizing only the unlabelled sequences is a central problem in comparative genomics. The existing cluster kernel methods based on neighborhoods and profiles and the Markov clustering algorithms are currently the most popular methods for protein family recognition. Their deviation from random walks with inflation, or their dependency on a hard threshold in the similarity measure, calls for enhancement when detecting homology among multi-domain proteins. We propose to combine spectral clustering with neighborhood kernels in Markov similarity for enhancing sensitivity in detecting homology independent of "recent" paralogs. The spectral clustering approach with new combined local alignment kernels more effectively exploits the unsupervised protein sequences globally, reducing inter-cluster walks. When combined with corrections based on a modified symmetry-based proximity norm that de-emphasizes outliers, the technique proposed in this article outperforms other state-of-the-art cluster kernels among all twelve implemented kernels. The comparison with the state-of-the-art string and mismatch kernels also shows the superior performance scores provided by the proposed kernels. A similar performance improvement is also found on an existing large dataset. Therefore the proposed spectral clustering framework over combined local alignment kernels with modified symmetry-based correction achieves superior performance for unsupervised remote homolog detection even in multi-domain and promiscuous domain proteins from Genolevures database families, with better biological relevance. Source code is available upon request (sarkar@labri.fr).

  14. Fuzzy C-means method for clustering microarray data.

    Science.gov (United States)

    Dembélé, Doulaye; Kastner, Philippe

    2003-05-22

    Clustering analysis of data from DNA microarray hybridization studies is essential for identifying biologically relevant groups of genes. Partitional clustering methods such as K-means or self-organizing maps assign each gene to a single cluster. However, these methods do not provide information about the influence of a given gene for the overall shape of clusters. Here we apply a fuzzy partitioning method, Fuzzy C-means (FCM), to attribute cluster membership values to genes. A major problem in applying the FCM method for clustering microarray data is the choice of the fuzziness parameter m. We show that the commonly used value m = 2 is not appropriate for some data sets, and that optimal values for m vary widely from one data set to another. We propose an empirical method, based on the distribution of distances between genes in a given data set, to determine an adequate value for m. By setting threshold levels for the membership values, genes which are tightly associated with a given cluster can be selected. Using a yeast cell cycle data set as an example, we show that this selection increases the overall biological significance of the genes within the cluster. Supplementary text and Matlab functions are available at http://www-igbmc.u-strasbg.fr/fcm/
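    A minimal fuzzy c-means loop makes the role of the fuzziness parameter m and of membership thresholding explicit; the expression profiles are simulated, m = 1.25 is an illustrative value rather than one derived from the distance distribution as the paper proposes, and the 0.7 membership cut-off is arbitrary.

```python
"""Sketch: a small fuzzy c-means loop with an explicit fuzziness
parameter m and membership thresholding. Expression profiles are
simulated; m = 1.25 and the 0.7 cut-off are illustrative values, not
ones derived from the distance distribution as the paper proposes."""
import numpy as np

def fuzzy_cmeans(X, c, m=1.25, n_iter=100, seed=0):
    """Return cluster centres and the membership matrix u (n_samples x c)."""
    rng = np.random.default_rng(seed)
    u = rng.dirichlet(np.ones(c), size=len(X))      # random initial memberships
    for _ in range(n_iter):
        w = u ** m
        centres = (w.T @ X) / w.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2) + 1e-12
        u = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1)), axis=2)
    return centres, u

rng = np.random.default_rng(7)
# Simulated log-ratio profiles for three groups of genes over 10 conditions.
X = np.vstack([rng.normal(mu, 0.4, size=(100, 10)) for mu in (-1.0, 0.0, 1.0)])

centres, u = fuzzy_cmeans(X, c=3, m=1.25)
core = u.max(axis=1) > 0.7            # genes tightly associated with one cluster
print("genes passing the membership threshold:", int(core.sum()), "of", len(X))
```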

  15. Comparison of cluster-based and source-attribution methods for estimating transmission risk using large HIV sequence databases.

    Science.gov (United States)

    Le Vu, Stéphane; Ratmann, Oliver; Delpech, Valerie; Brown, Alison E; Gill, O Noel; Tostevin, Anna; Fraser, Christophe; Volz, Erik M

    2018-06-01

    Phylogenetic clustering of HIV sequences from a random sample of patients can reveal epidemiological transmission patterns, but interpretation is hampered by limited theoretical support, and the statistical properties of clustering analysis remain poorly understood. Alternatively, source attribution methods allow fitting of HIV transmission models and thereby quantify aspects of disease transmission. A simulation study was conducted to assess error rates of clustering methods for detecting transmission risk factors. We modeled HIV epidemics among men having sex with men and generated phylogenies comparable to those that can be obtained from HIV surveillance data in the UK. Clustering and source attribution approaches were applied to evaluate their ability to identify patient attributes as transmission risk factors. We find that commonly used methods show a misleading association between cluster size or odds of clustering and covariates that are correlated with time since infection, regardless of their influence on transmission. Clustering methods usually have higher error rates and lower sensitivity than the source attribution method for identifying transmission risk factors. However, neither method provides robust estimates of transmission risk ratios. The source attribution method can alleviate drawbacks of phylogenetic clustering, but formal population genetic modeling may be required to estimate quantitative transmission risk factors. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  16. Form gene clustering method about pan-ethnic-group products based on emotional semantic

    Science.gov (United States)

    Chen, Dengkai; Ding, Jingjing; Gao, Minzhuo; Ma, Danping; Liu, Donghui

    2016-09-01

    The use of pan-ethnic-group product form knowledge primarily depends on a designer's subjective experience without user participation. The majority of studies focus primarily on detecting the perceptual demands of consumers for the target product category. Here, a form gene clustering method for pan-ethnic-group products based on emotional semantics is constructed. Consumers' perceptual images of the pan-ethnic-group products are obtained by means of product form gene extraction and coding and computer-aided product form clustering technology. A case study of form gene clustering for typical pan-ethnic-group products indicates that the method is feasible. This paper opens up a new direction for the future development of product form design, improving the agility of the product design process in the era of Industry 4.0.

  17. Vinayaka : A Semi-Supervised Projected Clustering Method Using Differential Evolution

    OpenAIRE

    Satish Gajawada; Durga Toshniwal

    2012-01-01

    Differential Evolution (DE) is an algorithm for evolutionary optimization. Clustering problems have been solved by using DE based clustering methods but these methods may fail to find clusters hidden in subspaces of high dimensional datasets. Subspace and projected clustering methods have been proposed in literature to find subspace clusters that are present in subspaces of dataset. In this paper we propose VINAYAKA, a semi-supervised projected clustering method based on DE. In this method DE opt...

  18. A method of detecting spatial clustering of disease

    International Nuclear Information System (INIS)

    Openshaw, S.; Wilkie, D.; Binks, K.; Wakeford, R.; Gerrard, M.H.; Croasdale, M.R.

    1989-01-01

    A statistical technique has been developed to identify extreme groupings of a disease and is being applied to childhood cancers, initially to acute lymphoblastic leukaemia incidence in the Northern and North-Western Regions of England. The method covers the area with a square grid, the size of which is varied over a wide range and whose origin is moved in small increments in two directions. The population at risk within any square is estimated using the 1971 and 1981 censuses. The significance of an excess of disease is determined by random simulation. In addition, tests to detect a general departure from a background Poisson process are carried out. Available results will be presented at the conference. (author)

  19. Single pass kernel k-means clustering method

    Indian Academy of Sciences (India)

    In unsupervised classification, the kernel k-means clustering method has been shown to perform better than the conventional k-means clustering method in ...

  20. Comparative analysis on the selection of number of clusters in community detection

    Science.gov (United States)

    Kawamoto, Tatsuro; Kabashima, Yoshiyuki

    2018-02-01

    We conduct a comparative analysis on various estimates of the number of clusters in community detection. An exhaustive comparison requires testing of all possible combinations of frameworks, algorithms, and assessment criteria. In this paper we focus on the framework based on a stochastic block model, and investigate the performance of greedy algorithms, statistical inference, and spectral methods. For the assessment criteria, we consider modularity, map equation, Bethe free energy, prediction errors, and isolated eigenvalues. From the analysis, the tendency of overfit and underfit that the assessment criteria and algorithms have becomes apparent. In addition, we propose that the alluvial diagram is a suitable tool to visualize statistical inference results and can be useful to determine the number of clusters.

  1. The smart cluster method. Adaptive earthquake cluster identification and analysis in strong seismic regions

    Science.gov (United States)

    Schaefer, Andreas M.; Daniell, James E.; Wenzel, Friedemann

    2017-07-01

    Earthquake clustering is an essential part of almost any statistical analysis of spatial and temporal properties of seismic activity. The nature of earthquake clusters and subsequent declustering of earthquake catalogues plays a crucial role in determining the magnitude-dependent earthquake return period and its respective spatial variation for probabilistic seismic hazard assessment. This study introduces the Smart Cluster Method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal cluster identification. It utilises the magnitude-dependent spatio-temporal earthquake density to adjust the search properties, subsequently analyses the identified clusters to determine directional variation and adjusts its search space with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010-2011 Darfield-Christchurch sequence, a reclassification procedure is applied to disassemble subsequent ruptures using near-field searches, nearest neighbour classification and temporal splitting. The method is capable of identifying and classifying earthquake clusters in space and time. It has been tested and validated using earthquake data from California and New Zealand. A total of more than 1500 clusters have been found in both regions since 1980 with M_min = 2.0. Utilising the knowledge of cluster classification, the method has been adjusted to provide an earthquake declustering algorithm, which has been compared to existing methods. Its performance is comparable to established methodologies. The analysis of earthquake clustering statistics leads to various new and updated correlation functions, e.g. for ratios between mainshock and strongest aftershock and general aftershock activity metrics.

  2. Locally adaptive decision in detection of clustered microcalcifications in mammograms

    Science.gov (United States)

    Sainz de Cea, María V.; Nishikawa, Robert M.; Yang, Yongyi

    2018-02-01

    In computer-aided detection or diagnosis of clustered microcalcifications (MCs) in mammograms, the performance often suffers from not only the presence of false positives (FPs) among the detected individual MCs but also large variability in detection accuracy among different cases. To address this issue, we investigate a locally adaptive decision scheme in MC detection by exploiting the noise characteristics in a lesion area. Instead of developing a new MC detector, we propose a decision scheme on how to best decide whether a detected object is an MC or not in the detector output. We formulate the individual MCs as statistical outliers compared to the many noisy detections in a lesion area so as to account for the local image characteristics. To identify the MCs, we first consider a parametric method for outlier detection, the Mahalanobis distance detector, which is based on a multi-dimensional Gaussian distribution on the noisy detections. We also consider a non-parametric method which is based on a stochastic neighbor graph model of the detected objects. We demonstrated the proposed decision approach with two existing MC detectors on a set of 188 full-field digital mammograms (95 cases). The results, evaluated using free response operating characteristic (FROC) analysis, showed a significant improvement in detection accuracy by the proposed outlier decision approach over traditional thresholding (the partial area under the FROC curve increased from 3.95 to 4.25), together with a reduction in the number of FPs at a given sensitivity level. The proposed adaptive decision approach could not only reduce the number of FPs in detected MCs but also improve case-to-case consistency in detection.
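
    As an illustration of the parametric route (not the authors' implementation), the sketch below applies a Mahalanobis-distance outlier test to a hypothetical feature matrix of detector outputs in one lesion area; the feature names and the chi-square cut-off are assumptions.

```python
# Sketch of a Mahalanobis-distance outlier test on candidate detections.
# `feats` is a hypothetical (n_objects, n_features) array of detector outputs
# (e.g. contrast, size, detector score) for one lesion area; objects whose
# squared distance exceeds a chi-square quantile are kept as likely MCs.
import numpy as np
from scipy.stats import chi2

def mahalanobis_outliers(feats, alpha=0.01):
    mu = feats.mean(axis=0)                                 # centre of the noisy detections
    inv_cov = np.linalg.pinv(np.cov(feats, rowvar=False))   # pseudo-inverse for stability
    diff = feats - mu
    d2 = np.einsum('ij,jk,ik->i', diff, inv_cov, diff)      # squared Mahalanobis distances
    cutoff = chi2.ppf(1.0 - alpha, df=feats.shape[1])       # Gaussian assumption
    return d2 > cutoff                                      # True = outlier = likely MC

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.normal(0.0, 1.0, size=(200, 3))             # many noisy detections
    mcs = rng.normal(5.0, 1.0, size=(5, 3))                 # a few strong outliers
    flags = mahalanobis_outliers(np.vstack([noise, mcs]))
    print(flags.sum(), "objects flagged as MCs")
```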

  3. Community detection in complex networks using proximate support vector clustering

    Science.gov (United States)

    Wang, Feifan; Zhang, Baihai; Chai, Senchun; Xia, Yuanqing

    2018-03-01

    Community structure, one of the most attention attracting properties in complex networks, has been a cornerstone in advances of various scientific branches. A number of tools have been involved in recent studies concentrating on the community detection algorithms. In this paper, we propose a support vector clustering method based on a proximity graph, owing to which the introduced algorithm surpasses the traditional support vector approach both in accuracy and complexity. Results of extensive experiments undertaken on computer generated networks and real world data sets illustrate competent performances in comparison with the other counterparts.

  4. Galaxy clusters in the cosmic web

    Science.gov (United States)

    Acebrón, A.; Durret, F.; Martinet, N.; Adami, C.; Guennou, L.

    2014-12-01

    Simulations of large scale structure formation in the universe predict that matter is essentially distributed along filaments at the intersection of which lie galaxy clusters. We have analysed 9 clusters in the redshift range 0.4 < z < 0.9 from the DAFT/FADA survey, which combines deep large field multi-band imaging and spectroscopic data, in order to detect filaments and/or structures around these clusters. Based on colour-magnitude diagrams, we have selected the galaxies likely to be in the cluster redshift range and studied their spatial distribution. We detect a number of structures and filaments around several clusters, proving that colour-magnitude diagrams are a reliable method to detect filaments around galaxy clusters. Since this method excludes blue (spiral) galaxies at the cluster redshift, we also apply the LePhare software to compute photometric redshifts from BVRIZ images to select galaxy cluster members and study their spatial distribution. We then find that, if only galaxies classified as early-type by LePhare are considered, we obtain the same distribution as with a red sequence selection, while taking into account late-type galaxies just pollutes the background level and deteriorates our detections. The photometric redshift based method therefore does not provide any additional information.

  5. A spatial scan statistic for nonisotropic two-level risk cluster.

    Science.gov (United States)

    Li, Xiao-Zhou; Wang, Jin-Feng; Yang, Wei-Zhong; Li, Zhong-Jie; Lai, Sheng-Jie

    2012-01-30

    Spatial scan statistic methods are commonly used for geographical disease surveillance and cluster detection. The standard spatial scan statistic does not model any variability in the underlying risks of subregions belonging to a detected cluster. For a multilevel risk cluster, the isotonic spatial scan statistic could model a centralized high-risk kernel in the cluster. Because variations in disease risks are anisotropic owing to different social, economic, or transport factors, the real high-risk kernel will not necessarily take the central place in a whole cluster area. We propose a spatial scan statistic for a nonisotropic two-level risk cluster, which could be used to detect a whole cluster and a noncentralized high-risk kernel within the cluster simultaneously. The performance of the three methods was evaluated through an intensive simulation study. Our proposed nonisotropic two-level method showed better power and geographical precision with two-level risk cluster scenarios, especially for a noncentralized high-risk kernel. Our proposed method is illustrated using the hand-foot-mouth disease data in Pingdu City, Shandong, China in May 2009, compared with two other methods. In this practical study, the nonisotropic two-level method is the only way to precisely detect a high-risk area in a detected whole cluster. Copyright © 2011 John Wiley & Sons, Ltd.
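
    The proposed two-level statistic builds on the standard Kulldorff Poisson scan; the sketch below shows only that baseline (circular windows scored by a log-likelihood ratio), with made-up region centroids, populations and case counts, and leaves the Monte Carlo significance step out.

```python
# Minimal sketch of a circular Kulldorff-style Poisson scan statistic.
# `xy`, `pop` and `cases` are hypothetical per-region centroids, populations
# and case counts; each circular window is scored by its log-likelihood ratio.
import numpy as np

def poisson_llr(c_in, e_in, c_tot, e_tot):
    """Log-likelihood ratio for observed vs. expected counts inside a window."""
    c_out, e_out = c_tot - c_in, e_tot - e_in
    if c_in <= e_in:                                   # only elevated-risk windows count
        return 0.0
    term_in = c_in * np.log(c_in / e_in) if c_in > 0 else 0.0
    term_out = c_out * np.log(c_out / e_out) if c_out > 0 else 0.0
    return term_in + term_out

def circular_scan(xy, cases, pop, max_frac=0.5):
    c_tot = cases.sum()
    expected = pop * (c_tot / pop.sum())
    best = (0.0, None)
    for i in range(len(xy)):                           # each region as a window centre
        order = np.argsort(np.linalg.norm(xy - xy[i], axis=1))
        for k in range(1, len(order) + 1):             # grow the circle region by region
            idx = order[:k]
            if pop[idx].sum() > max_frac * pop.sum():
                break
            llr = poisson_llr(cases[idx].sum(), expected[idx].sum(),
                              c_tot, expected.sum())
            if llr > best[0]:
                best = (llr, idx)
    return best            # significance would come from Monte Carlo replications

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xy = rng.uniform(0, 10, size=(60, 2))
    pop = rng.integers(500, 5000, size=60)
    cases = rng.poisson(pop * 0.01)
    hot = np.linalg.norm(xy - [5, 5], axis=1) < 1.5    # inject a spatial excess
    cases[hot] += rng.poisson(20, hot.sum())
    llr, cluster_idx = circular_scan(xy, cases, pop)
    print("best LLR:", round(llr, 2), "cluster size:", len(cluster_idx))
```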

  6. Penalized likelihood and multi-objective spatial scans for the detection and inference of irregular clusters

    Directory of Open Access Journals (Sweden)

    Fonseca Carlos M

    2010-10-01

    Full Text Available Abstract Background Irregularly shaped spatial clusters are difficult to delineate. A cluster found by an algorithm often spreads through large portions of the map, impacting its geographical meaning. Penalized likelihood methods for Kulldorff's spatial scan statistics have been used to control the excessive freedom of the shape of clusters. Penalty functions based on cluster geometry and non-connectivity have been proposed recently. Another approach involves the use of a multi-objective algorithm to maximize two objectives: the spatial scan statistics and the geometric penalty function. Results & Discussion We present a novel scan statistic algorithm employing a function based on the graph topology to penalize the presence of under-populated disconnection nodes in candidate clusters, the disconnection nodes cohesion function. A disconnection node is defined as a region within a cluster, such that its removal disconnects the cluster. By applying this function, the most geographically meaningful clusters are sifted out of the immense set of possible irregularly shaped candidate cluster solutions. To evaluate the statistical significance of solutions for multi-objective scans, a statistical approach based on the concept of attainment function is used. In this paper we compared different penalized likelihoods employing the geometric and non-connectivity regularity functions and the novel disconnection nodes cohesion function. We also build multi-objective scans using those three functions and compare them with the previous penalized likelihood scans. An application is presented using comprehensive state-wide data for Chagas' disease in puerperal women in Minas Gerais state, Brazil. Conclusions We show that, compared to the other single-objective algorithms, multi-objective scans present better performance, regarding power, sensitivity and positive predictive value. The multi-objective non-connectivity scan is faster and better suited for the

  7. Cluster size statistic and cluster mass statistic: two novel methods for identifying changes in functional connectivity between groups or conditions.

    Science.gov (United States)

    Ing, Alex; Schwarzbauer, Christian

    2014-01-01

    Functional connectivity has become an increasingly important area of research in recent years. At a typical spatial resolution, approximately 300 million connections link each voxel in the brain with every other. This pattern of connectivity is known as the functional connectome. Connectivity is often compared between experimental groups and conditions. Standard methods used to control the type 1 error rate are likely to be insensitive when comparisons are carried out across the whole connectome, due to the huge number of statistical tests involved. To address this problem, two new cluster-based methods--the cluster size statistic (CSS) and cluster mass statistic (CMS)--are introduced to control the family wise error rate across all connectivity values. These methods operate within a statistical framework similar to the cluster-based methods used in conventional task-based fMRI. Both methods are data driven, permutation based and require minimal statistical assumptions. Here, the performance of each procedure is evaluated in a receiver operator characteristic (ROC) analysis, utilising a simulated dataset. The relative sensitivity of each method is also tested on real data: BOLD (blood oxygen level dependent) fMRI scans were carried out on twelve subjects under normal conditions and during the hypercapnic state (induced through the inhalation of 6% CO2 in 21% O2 and 73% N2). Both CSS and CMS detected significant changes in connectivity between normal and hypercapnic states. A family wise error correction carried out at the individual connection level exhibited no significant changes in connectivity.

  8. A Novel Fusion-Based Ship Detection Method from Pol-SAR Images

    Directory of Open Access Journals (Sweden)

    Wenguang Wang

    2015-09-01

    Full Text Available A novel fusion-based ship detection method from polarimetric Synthetic Aperture Radar (Pol-SAR) images is proposed in this paper. After feature extraction and constant false alarm rate (CFAR) detection, the detection results of the HH channel, diplane scattering by Pauli decomposition and the helical factor by Barnes decomposition are fused together. The confirmed targets and potential target pixels can be obtained after the fusion process. Using the difference degree of the target, potential target pixels can be classified. The fusion-based ship detection method works accurately by utilizing three different features comprehensively. The result of applying the technique to measured Airborne Synthetic Aperture Radar (AIRSAR) data shows that the novel detection method can achieve better performance in both ship detection and ship shape preservation compared to the results of the K-means clustering method and the Notch Filter method.

  9. Lane Detection in Video-Based Intelligent Transportation Monitoring via Fast Extracting and Clustering of Vehicle Motion Trajectories

    Directory of Open Access Journals (Sweden)

    Jianqiang Ren

    2014-01-01

    Full Text Available Lane detection is a crucial process in video-based transportation monitoring systems. This paper proposes a novel method to detect the lane center via rapid extraction and high-accuracy clustering of vehicle motion trajectories. First, we use the activity map to automatically realize the extraction of the road region, the calibration of the dynamic camera, and the setting of three virtual detecting lines. Secondly, the three virtual detecting lines and a local background model with traffic flow feedback are used to extract and group vehicle feature points on a per-vehicle basis. Then, the feature point groups are described accurately by an edge-weighted dynamic graph and modified by a motion-similarity Kalman filter during the sparse feature point tracking. After obtaining the vehicle trajectories, a rough k-means incremental clustering with Hausdorff distance is designed to realize the rapid online extraction of the lane center with high accuracy. The use of rough sets effectively reduces the loss of accuracy caused by trajectories that run irregularly. Experimental results prove that the proposed method can detect the lane center position efficiently, the time required by subsequent tasks can be reduced noticeably, and the safety of traffic surveillance systems can be enhanced significantly.
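
    The clustering step relies on a trajectory-to-trajectory dissimilarity; a minimal sketch of the symmetric Hausdorff distance between two made-up trajectories, using SciPy, is shown below (the rough incremental k-means itself is not reproduced here).

```python
# Symmetric Hausdorff distance between two vehicle trajectories, the
# dissimilarity used when grouping trajectories into lane centres.
# The trajectories below are made-up (N, 2) arrays of image coordinates.
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def hausdorff(traj_a, traj_b):
    d_ab = directed_hausdorff(traj_a, traj_b)[0]
    d_ba = directed_hausdorff(traj_b, traj_a)[0]
    return max(d_ab, d_ba)

traj_1 = np.column_stack([np.linspace(0, 100, 50), np.full(50, 10.0)])
traj_2 = np.column_stack([np.linspace(0, 100, 60), np.full(60, 12.0)])
print("Hausdorff distance:", hausdorff(traj_1, traj_2))   # roughly 2 pixels
```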

  10. Indirect photometric detection of boron cluster anions electrophoretically separated in methanol.

    Science.gov (United States)

    Vítová, Lada; Fojt, Lukáš; Vespalec, Radim

    2014-04-18

    3,5-Dinitrobenzoate and picrate are light absorbing anions pertinent to indirect photometric detection of boron cluster anions in buffered methanolic background electrolytes (BGEs). Tris(hydroxymethyl)aminomethane and morpholine have been used as buffering bases, which eliminated baseline steps, and minimized the baseline noise. In methanolic BGEs, mobilities of boron cluster anions depend on both ionic constituents of the BGE buffer. This dependence can be explained by ion pair interaction of detected anions with BGE cations, which are not bonded into ion pairs with the BGE anions. The former ion pair interaction decreases sensitivity of the indirect photometric detection. Copyright © 2014 Elsevier B.V. All rights reserved.

  11. Threshold selection for classification of MR brain images by clustering method

    Energy Technology Data Exchange (ETDEWEB)

    Moldovanu, Simona [Faculty of Sciences and Environment, Department of Chemistry, Physics and Environment, Dunărea de Jos University of Galaţi, 47 Domnească St., 800008, Romania, Phone: +40 236 460 780 (Romania); Dumitru Moţoc High School, 15 Milcov St., 800509, Galaţi (Romania); Obreja, Cristian; Moraru, Luminita, E-mail: luminita.moraru@ugal.ro [Faculty of Sciences and Environment, Department of Chemistry, Physics and Environment, Dunărea de Jos University of Galaţi, 47 Domnească St., 800008, Romania, Phone: +40 236 460 780 (Romania)

    2015-12-07

    Given a grey-intensity image, our method detects the optimal threshold for a suitable binarization of MR brain images. In MR brain image processing, the grey levels of pixels belonging to the object are not substantially different from the grey levels belonging to the background. Threshold optimization is an effective tool for separating objects from the background and, further, for classification applications. This paper gives a detailed investigation of the selection of thresholds. Our method does not use the well-known methods for binarization. Instead, we perform a simple threshold optimization which, in turn, will allow the best classification of the analyzed images into healthy and multiple sclerosis classes. The dissimilarity (or the distance between classes) has been established using a clustering method based on dendrograms. We tested our method using two classes of images, consisting of 20 T2-weighted and 20 proton density (PD)-weighted scans from two healthy subjects and from two patients with multiple sclerosis. For each image and for each threshold, the number of white pixels (or the area of white objects in the binary image) has been determined. These pixel numbers represent the objects in the clustering operation. The following optimum threshold values are obtained: T = 80 for PD images and T = 30 for T2w images. Each of these thresholds clearly separates the clusters belonging to the studied groups, healthy subjects and patients with multiple sclerosis.
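
    A rough sketch of the selection idea follows, under the assumption that the white-pixel count per binarized image is the clustering feature and that a two-cluster cut of an average-linkage dendrogram should reproduce the healthy/MS grouping; the images below are placeholder noise, not real scans.

```python
# For each candidate threshold: binarize the images, use the white-pixel count
# per image as the feature, build a dendrogram, cut it into two clusters and
# check how well the cut matches the known healthy/MS grouping.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def white_pixel_counts(images, threshold):
    return np.array([(img > threshold).sum() for img in images], dtype=float)

def separation_at_threshold(images, labels, threshold):
    counts = white_pixel_counts(images, threshold).reshape(-1, 1)
    tree = linkage(counts, method="average")           # dendrogram structure
    pred = fcluster(tree, t=2, criterion="maxclust")   # cut into two clusters
    # agreement between the cut and the known grouping (labels are 0/1)
    return max(np.mean((pred == 1) == (labels == 0)),
               np.mean((pred == 2) == (labels == 0)))

rng = np.random.default_rng(1)
healthy = [rng.normal(60, 20, (64, 64)) for _ in range(10)]   # placeholder "scans"
ms = [rng.normal(80, 20, (64, 64)) for _ in range(10)]
images, labels = healthy + ms, np.array([0] * 10 + [1] * 10)
for t in (30, 50, 80, 120):
    print("threshold", t, "-> separation", separation_at_threshold(images, labels, t))
```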

  12. A Trajectory Regression Clustering Technique Combining a Novel Fuzzy C-Means Clustering Algorithm with the Least Squares Method

    Directory of Open Access Journals (Sweden)

    Xiangbing Zhou

    2018-04-01

    Full Text Available Rapidly growing GPS (Global Positioning System) trajectories hide much valuable information, such as city road planning, urban travel demand, and population migration. In order to mine the hidden information and to capture better clustering results, a trajectory regression clustering method (an unsupervised trajectory clustering method) is proposed to reduce local information loss of the trajectory and to avoid getting stuck in a local optimum. Using this method, we first define our new concept of trajectory clustering and construct a novel partitioning (angle-based partitioning) method for line segments; second, the Lagrange-based method and Hausdorff-based K-means++ are integrated into fuzzy C-means (FCM) clustering to maintain the stability and the robustness of the clustering process; finally, a least squares regression model is employed to achieve regression clustering of the trajectory. In our experiment, the performance and effectiveness of our method are validated against real-world taxi GPS data. When comparing our clustering algorithm with the partition-based clustering algorithms (K-means, K-median, and FCM), our experimental results demonstrate that the presented method is more effective and generates a more reasonable trajectory.

  13. A density-based clustering model for community detection in complex networks

    Science.gov (United States)

    Zhao, Xiang; Li, Yantao; Qu, Zehui

    2018-04-01

    Network clustering (or graph partitioning) is an important technique for uncovering the underlying community structures in complex networks, which has been widely applied in various fields including astronomy, bioinformatics, sociology, and bibliometrics. In this paper, we propose a density-based clustering model for community detection in complex networks (DCCN). The key idea is to find group centers with a higher density than their neighbors and a relatively large integrated distance from nodes with higher density. The experimental results indicate that our approach is efficient and effective for community detection in complex networks.
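
    The centre-selection rule (high local density plus a large distance to any denser node) is the same one used in density-peak clustering; a generic point-set version is sketched below, whereas the paper applies the idea to network nodes, so the density and distance definitions there differ.

```python
# Generic density-peak computation (rho/delta) behind the "high density plus
# large distance to denser points" centre-selection idea, shown on a point set.
import numpy as np

def density_peaks(points, cutoff):
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    rho = (d < cutoff).sum(axis=1) - 1          # local density: neighbour count
    delta = np.zeros(len(points))
    for i in range(len(points)):
        higher = np.where(rho > rho[i])[0]      # points denser than i
        delta[i] = d[i, higher].min() if len(higher) else d[i].max()
    return rho, delta                           # centres: large rho AND large delta

rng = np.random.default_rng(2)
pts = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
rho, delta = density_peaks(pts, cutoff=0.5)
centres = np.argsort(rho * delta)[-2:]          # pick the two strongest peaks
print("centre indices:", centres)
```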

  14. Clustering and Recurring Anomaly Identification: Recurring Anomaly Detection System (ReADS)

    Science.gov (United States)

    McIntosh, Dawn

    2006-01-01

    This viewgraph presentation reviews the Recurring Anomaly Detection System (ReADS). The Recurring Anomaly Detection System is a tool to analyze text reports, such as aviation reports and maintenance records: (1) text clustering algorithms group large quantities of reports and documents, reducing human error and fatigue; (2) it identifies interconnected reports, automating the discovery of possible recurring anomalies; (3) it provides a visualization of the clusters and recurring anomalies. We have illustrated our techniques on data from Shuttle and ISS discrepancy reports, as well as ASRS data. ReADS has been integrated with a secure online search

  15. Scale invariant SURF detector and automatic clustering segmentation for infrared small targets detection

    Science.gov (United States)

    Zhang, Haiying; Bai, Jiaojiao; Li, Zhengjie; Liu, Yan; Liu, Kunhong

    2017-06-01

    The detection and discrimination of infrared small dim targets is a challenge in automatic target recognition (ATR), because there is no salient information of size, shape and texture. Many researchers focus on mining more discriminative information of targets in the temporal-spatial domain. However, such information may not be available with changes in the imaging environment, and the target size and intensity keep changing at different imaging distances. So in this paper, we propose a novel research scheme using density-based clustering and a backtracking strategy. In this scheme, the speeded up robust feature (SURF) detector is applied to capture candidate targets in each single frame at first. And then, these points are mapped into one frame, so that target traces form a local aggregation pattern. In order to isolate the targets from noise, a newly proposed density-based clustering algorithm, fast search and find of density peaks (FSFDP for short), is employed to cluster targets by their spatially intensive distribution. Two important factors of the algorithm, percent and γ, are fully exploited to determine the clustering scale automatically, so as to extract the trace with the highest clutter suppression ratio. And at the final step, a backtracking algorithm is designed to detect and discriminate the target trace as well as to eliminate clutter. The consistence and continuity of the short-time target trajectory in the temporal-spatial domain is incorporated into the bounding function to speed up the pruning. Compared with several state-of-the-art methods, our algorithm is more effective for dim targets with a lower signal-to-clutter ratio (SCR). Furthermore, it avoids constructing the candidate target trajectory searching space, so its time complexity is limited to a polynomial level. The extensive experimental results show that it has superior performance in probability of detection (Pd) and false alarm suppression rate across a variety of complex backgrounds.

  16. A clustering based method to evaluate soil corrosivity for pipeline external integrity management

    International Nuclear Information System (INIS)

    Yajima, Ayako; Wang, Hui; Liang, Robert Y.; Castaneda, Homero

    2015-01-01

    One important category of transportation infrastructure is underground pipelines. Corrosion of these buried pipeline systems may cause pipeline failures with the attendant hazards of property loss and fatalities. Therefore, developing the capability to estimate the soil corrosivity is important for designing and preserving materials and for risk assessment. The deterioration rate of metal is highly influenced by the physicochemical characteristics of a material and the environment of its surroundings. In this study, the field data obtained from the southeast region of Mexico was examined using various data mining techniques to determine the usefulness of these techniques for clustering soil corrosivity level. Specifically, the soil was classified into different corrosivity level clusters by k-means and Gaussian mixture model (GMM). In terms of physical space, GMM shows better separability; therefore, the distributions of the material loss of the buried petroleum pipeline walls were estimated via the empirical density within GMM clusters. The soil corrosivity levels of the clusters were determined based on the medians of metal loss. The proposed clustering method was demonstrated to be capable of classifying the soil into different levels of corrosivity severity. - Highlights: • The clustering approach is applied to the data extracted from a real-life pipeline system. • Soil properties in the right-of-way are analyzed via clustering techniques to assess corrosivity. • GMM is selected as the preferred method for detecting the hidden pattern of in-situ data. • K–W test is performed for significant difference of corrosivity level between clusters
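
    A minimal sketch of the GMM clustering step with scikit-learn is given below; the three feature columns and the metal-loss values are made-up stand-ins for the soil physicochemical measurements and wall-loss data used in the study.

```python
# GMM clustering of soil samples, then ranking clusters by a surrogate
# severity measure (median metal loss). All values here are synthetic.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
soil = np.vstack([rng.normal([10.0, 5.5, 0.30], 0.5, (40, 3)),    # corrosive-ish samples
                  rng.normal([50.0, 7.0, 0.10], 0.5, (40, 3))])   # benign-ish samples
X = StandardScaler().fit_transform(soil)

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
cluster = gmm.fit_predict(X)

metal_loss = rng.normal(0.2 + 0.3 * (cluster == 0), 0.05)         # placeholder wall loss
for c in range(2):
    print(f"cluster {c}: median metal loss = {np.median(metal_loss[cluster == c]):.3f}")
```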

  17. THE DETECTION AND STATISTICS OF GIANT ARCS BEHIND CLASH CLUSTERS

    International Nuclear Information System (INIS)

    Xu, Bingxiao; Zheng, Wei; Postman, Marc; Bradley, Larry; Meneghetti, Massimo; Koekemoer, Anton; Seitz, Stella; Zitrin, Adi; Merten, Julian; Maoz, Dani; Frye, Brenda; Umetsu, Keiichi; Vega, Jesus

    2016-01-01

    We developed an algorithm to find and characterize gravitationally lensed galaxies (arcs) to perform a comparison of the observed and simulated arc abundance. Observations are from the Cluster Lensing And Supernova survey with Hubble (CLASH). Simulated CLASH images are created using the MOKA package and also clusters selected from the high-resolution, hydrodynamical simulations, MUSIC, over the same mass and redshift range as the CLASH sample. The algorithm's arc elongation accuracy, completeness, and false positive rate are determined and used to compute an estimate of the true arc abundance. We derive a lensing efficiency of 4 ± 1 arcs (with length ≥6″ and length-to-width ratio ≥7) per cluster for the X-ray-selected CLASH sample, 4 ± 1 arcs per cluster for the MOKA-simulated sample, and 3 ± 1 arcs per cluster for the MUSIC-simulated sample. The observed and simulated arc statistics are in full agreement. We measure the photometric redshifts of all detected arcs and find a median redshift z_s = 1.9 with 33% of the detected arcs having z_s > 3. We find that the arc abundance does not depend strongly on the source redshift distribution but is sensitive to the mass distribution of the dark matter halos (e.g., the c–M relation). Our results show that consistency between the observed and simulated distributions of lensed arc sizes and axial ratios can be achieved by using cluster-lensing simulations that are carefully matched to the selection criteria used in the observations

  18. The Detection and Statistics of Giant Arcs behind CLASH Clusters

    Science.gov (United States)

    Xu, Bingxiao; Postman, Marc; Meneghetti, Massimo; Seitz, Stella; Zitrin, Adi; Merten, Julian; Maoz, Dani; Frye, Brenda; Umetsu, Keiichi; Zheng, Wei; Bradley, Larry; Vega, Jesus; Koekemoer, Anton

    2016-02-01

    We developed an algorithm to find and characterize gravitationally lensed galaxies (arcs) to perform a comparison of the observed and simulated arc abundance. Observations are from the Cluster Lensing And Supernova survey with Hubble (CLASH). Simulated CLASH images are created using the MOKA package and also clusters selected from the high-resolution, hydrodynamical simulations, MUSIC, over the same mass and redshift range as the CLASH sample. The algorithm's arc elongation accuracy, completeness, and false positive rate are determined and used to compute an estimate of the true arc abundance. We derive a lensing efficiency of 4 ± 1 arcs (with length ≥6″ and length-to-width ratio ≥7) per cluster for the X-ray-selected CLASH sample, 4 ± 1 arcs per cluster for the MOKA-simulated sample, and 3 ± 1 arcs per cluster for the MUSIC-simulated sample. The observed and simulated arc statistics are in full agreement. We measure the photometric redshifts of all detected arcs and find a median redshift zs = 1.9 with 33% of the detected arcs having zs > 3. We find that the arc abundance does not depend strongly on the source redshift distribution but is sensitive to the mass distribution of the dark matter halos (e.g., the c-M relation). Our results show that consistency between the observed and simulated distributions of lensed arc sizes and axial ratios can be achieved by using cluster-lensing simulations that are carefully matched to the selection criteria used in the observations.

  19. Detection of wood failure by image processing method: influence of algorithm, adhesive and wood species

    Science.gov (United States)

    Lanying Lin; Sheng He; Feng Fu; Xiping Wang

    2015-01-01

    Wood failure percentage (WFP) is an important index for evaluating the bond strength of plywood. Currently, the method used for detecting WFP is visual inspection, which lacks efficiency. In order to improve it, image processing methods are applied to wood failure detection. The present study used thresholding and K-means clustering algorithms in wood failure detection...

  20. MANNER OF STOCKS SORTING USING CLUSTER ANALYSIS METHODS

    Directory of Open Access Journals (Sweden)

    Jana Halčinová

    2014-06-01

    Full Text Available The aim of the present article is to show the possibility of using the methods of cluster analysis in the classification of stocks of finished products. Cluster analysis creates groups (clusters) of finished products according to similarity in demand, i.e. customer requirements for each product. The manner of sorting stocks of finished products into clusters is described with a practical example. The resulting clusters are incorporated into the draft layout of the distribution warehouse.

  1. Clustering Methods Application for Customer Segmentation to Manage Advertisement Campaign

    Directory of Open Access Journals (Sweden)

    Maciej Kutera

    2010-10-01

    Full Text Available Clustering methods have become such advanced and elaborate algorithms for the analysis of large data collections that they are now counted among data mining methods. They form an ever larger group of methods, evolving quickly and finding more and more varied applications. In the article, our research concerning the usefulness of clustering methods for customer segmentation to manage an advertisement campaign is presented. We introduce results obtained by using four selected methods, which have been chosen because their peculiarities suggested their applicability to our purposes. One of the analysed methods, k-means clustering with randomly selected initial cluster seeds, gave very good results in customer segmentation to manage an advertisement campaign, and these results are presented in detail in the article. In contrast, one of the methods (hierarchical average linkage) was found useless in customer segmentation. Further investigation of the benefits of clustering methods in customer segmentation to manage advertisement campaigns is worth continuing, particularly since finding solutions in this field can give measurable profits for marketing activity.

  2. Big Data Clustering via Community Detection and Hyperbolic Network Embedding in IoT Applications.

    Science.gov (United States)

    Karyotis, Vasileios; Tsitseklis, Konstantinos; Sotiropoulos, Konstantinos; Papavassiliou, Symeon

    2018-04-15

    In this paper, we present a novel data clustering framework for big sensory data produced by IoT applications. Based on a network representation of the relations among multi-dimensional data, data clustering is mapped to node clustering over the produced data graphs. To address the potential very large scale of such datasets/graphs that test the limits of state-of-the-art approaches, we map the problem of data clustering to a community detection one over the corresponding data graphs. Specifically, we propose a novel computational approach for enhancing the traditional Girvan-Newman (GN) community detection algorithm via hyperbolic network embedding. The data dependency graph is embedded in the hyperbolic space via Rigel embedding, allowing more efficient computation of edge-betweenness centrality needed in the GN algorithm. This allows for more efficient clustering of the nodes of the data graph in terms of modularity, without sacrificing considerable accuracy. In order to study the operation of our approach with respect to enhancing GN community detection, we employ various representative types of artificial complex networks, such as scale-free, small-world and random geometric topologies, and frequently-employed benchmark datasets for demonstrating its efficacy in terms of data clustering via community detection. Furthermore, we provide a proof-of-concept evaluation by applying the proposed framework over multi-dimensional datasets obtained from an operational smart-city/building IoT infrastructure provided by the Federated Interoperable Semantic IoT/cloud Testbeds and Applications (FIESTA-IoT) testbed federation. It is shown that the proposed framework can be indeed used for community detection/data clustering and exploited in various other IoT applications, such as performing more energy-efficient smart-city/building sensing.
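
    For reference, the baseline that the framework accelerates is plain Girvan-Newman community detection; a small networkx example choosing the split with the highest modularity is sketched below (the hyperbolic Rigel embedding step is not reproduced).

```python
# Plain Girvan-Newman community detection with networkx, the baseline the
# paper speeds up via hyperbolic embedding of the data-dependency graph.
import networkx as nx
from networkx.algorithms.community import girvan_newman, modularity

G = nx.karate_club_graph()                      # stand-in for a data graph
hierarchy = girvan_newman(G)                    # generator of successive splits

best_partition, best_q = None, -1.0
for communities in hierarchy:
    partition = [set(c) for c in communities]
    q = modularity(G, partition)                # score each split by modularity
    if q > best_q:
        best_partition, best_q = partition, q
    if len(partition) >= 8:                     # stop once the split is fine enough
        break

print(f"{len(best_partition)} communities, modularity = {best_q:.3f}")
```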

  3. Progress on clustered DNA damage in radiation research

    International Nuclear Information System (INIS)

    Yang Li'na; Zhang Hong; Di Cuixia; Zhang Qiuning; Wang Xiaohu

    2012-01-01

    Clustered DNA damage, which is caused by high-LET heavy ion radiation, can lead to mutation, tumorigenesis and apoptosis. Promoting apoptosis of cancer cells is always the basis of cancer treatment. Clustered DNA damage has been a hot topic in radiobiology. Detection methods are diverse, but there is no detailed and complete protocol for analysing clustered DNA damage. In order to provide a reference for clustered DNA damage in radiotherapy studies, the characteristics of clustered DNA damage, the latest progress on clustered DNA damage and the detection methods are reviewed and discussed in detail in this paper. (authors)

  4. A simple and fast method to determine the parameters for fuzzy c-means cluster analysis

    DEFF Research Database (Denmark)

    Schwämmle, Veit; Jensen, Ole Nørregaard

    2010-01-01

    MOTIVATION: Fuzzy c-means clustering is widely used to identify cluster structures in high-dimensional datasets, such as those obtained in DNA microarray and quantitative proteomics experiments. One of its main limitations is the lack of a computationally fast method to set optimal values of algorithm parameters. Wrong parameter values may either lead to the inclusion of purely random fluctuations in the results or ignore potentially important data. The optimal solution has parameter values for which the clustering does not yield any results for a purely random dataset but which detects cluster formation with maximum resolution on the edge of randomness. RESULTS: Estimation of the optimal parameter values is achieved by evaluation of the results of the clustering procedure applied to randomized datasets. In this case, the optimal value of the fuzzifier follows common rules that depend only...
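
    A compact sketch of the underlying idea follows: run fuzzy c-means on the real data and on a column-shuffled (randomized) copy, and watch how quickly the cluster centres collapse as the fuzzifier m grows; the centre-spread measure used here is a crude stand-in for the authors' criterion, and the data are synthetic.

```python
# Compact fuzzy c-means plus the randomization idea: compare how separated the
# cluster centres remain on real data versus a column-shuffled copy for
# increasing values of the fuzzifier m.
import numpy as np

def fcm(X, c, m, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X)))
    U /= U.sum(axis=0)                                     # memberships sum to 1 per point
    for _ in range(iters):
        Um = U ** m
        centres = Um @ X / Um.sum(axis=1, keepdims=True)   # weighted cluster centres
        d = np.linalg.norm(X[None, :, :] - centres[:, None, :], axis=-1) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))                     # standard FCM membership update
        U /= U.sum(axis=0)
    return centres, U

def centre_spread(X, c, m):
    centres, _ = fcm(X, c, m)
    return np.linalg.norm(centres - centres.mean(axis=0), axis=1).mean()

rng = np.random.default_rng(4)
real = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(4, 1, (50, 5))])
shuffled = np.apply_along_axis(rng.permutation, 0, real)   # destroys cluster structure

for m in (1.2, 1.5, 2.0, 3.0, 5.0):
    print(f"m={m}: real spread={centre_spread(real, 2, m):.2f}  "
          f"random spread={centre_spread(shuffled, 2, m):.2f}")
```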

  5. Comparing the performance of biomedical clustering methods

    DEFF Research Database (Denmark)

    Wiwie, Christian; Baumbach, Jan; Röttger, Richard

    2015-01-01

    Identifying groups of similar objects is a popular first step in biomedical data analysis, but it is error-prone and impossible to perform manually. Many computational methods have been developed to tackle this problem. Here we assessed 13 well-known methods using 24 data sets ranging from gene expression to protein domains. Performance was judged on the basis of 13 common cluster validity indices. We developed a clustering analysis platform, ClustEval (http://clusteval.mpi-inf.mpg.de), to promote streamlined evaluation, comparison and reproducibility of clustering results in the future. This allowed us to objectively evaluate the performance of all tools on all data sets with up to 1,000 different parameter sets each, resulting in a total of more than 4 million calculated cluster validity indices. We observed that there was no universal best performer, but on the basis of this wide...

  6. The NIDS Cluster: Scalable, Stateful Network Intrusion Detection on Commodity Hardware

    Energy Technology Data Exchange (ETDEWEB)

    Tierney, Brian L; Vallentin, Matthias; Sommer, Robin; Lee, Jason; Leres, Craig; Paxson, Vern; Tierney, Brian

    2007-09-19

    In this work we present a NIDS cluster as a scalable solution for realizing high-performance, stateful network intrusion detection on commodity hardware. The design addresses three challenges: (i) distributing traffic evenly across an extensible set of analysis nodes in a fashion that minimizes the communication required for coordination, (ii) adapting the NIDS's operation to support coordinating its low-level analysis rather than just aggregating alerts; and (iii) validating that the cluster produces sound results. Prototypes of our NIDS cluster now operate at the Lawrence Berkeley National Laboratory and the University of California at Berkeley. In both environments the clusters greatly enhance the power of the network security monitoring.

  7. THE DETECTION AND STATISTICS OF GIANT ARCS BEHIND CLASH CLUSTERS

    Energy Technology Data Exchange (ETDEWEB)

    Xu, Bingxiao; Zheng, Wei [Department of Physics and Astronomy, The Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218 (United States); Postman, Marc; Bradley, Larry [Space Telescope Science Institute, 3700 San Martin Drive, Baltimore, MD 21208 (United States); Meneghetti, Massimo; Koekemoer, Anton [INAF, Osservatorio Astronomico di Bologna, and INFN, Sezione di Bologna, Via Ranzani 1, I-40127 Bologna (Italy); Seitz, Stella [Universitaets-Sternwarte, Fakultaet fuer Physik, Ludwig-Maximilians Universitaet Muenchen, Scheinerstr. 1, D-81679 Muenchen (Germany); Zitrin, Adi [California Institute of Technology, MC 249-17, Pasadena, CA 91125 (United States); Merten, Julian [University of Oxford, Department of Physics, Denys Wilkinson Building, Keble Road, Oxford, OX1 3RH (United Kingdom); Maoz, Dani [School of Physics and Astronomy, Tel Aviv University, Tel-Aviv 69978 (Israel); Frye, Brenda [Steward Observatory/Department of Astronomy, University of Arizona, 933 N. Cherry Ave., Tucson, AZ 85721 (United States); Umetsu, Keiichi [Institute of Astronomy and Astrophysics, Academia Sinica, P.O. Box 23-141, Taipei 10617, Taiwan (China); Vega, Jesus, E-mail: bxu6@jhu.edu [Universidad Autonoma de Madrid, Ciudad Universitaria de Cantoblanco, E-28049 Madrid (Spain)

    2016-02-01

    We developed an algorithm to find and characterize gravitationally lensed galaxies (arcs) to perform a comparison of the observed and simulated arc abundance. Observations are from the Cluster Lensing And Supernova survey with Hubble (CLASH). Simulated CLASH images are created using the MOKA package and also clusters selected from the high-resolution, hydrodynamical simulations, MUSIC, over the same mass and redshift range as the CLASH sample. The algorithm's arc elongation accuracy, completeness, and false positive rate are determined and used to compute an estimate of the true arc abundance. We derive a lensing efficiency of 4 ± 1 arcs (with length ≥6″ and length-to-width ratio ≥7) per cluster for the X-ray-selected CLASH sample, 4 ± 1 arcs per cluster for the MOKA-simulated sample, and 3 ± 1 arcs per cluster for the MUSIC-simulated sample. The observed and simulated arc statistics are in full agreement. We measure the photometric redshifts of all detected arcs and find a median redshift z_s = 1.9 with 33% of the detected arcs having z_s > 3. We find that the arc abundance does not depend strongly on the source redshift distribution but is sensitive to the mass distribution of the dark matter halos (e.g., the c–M relation). Our results show that consistency between the observed and simulated distributions of lensed arc sizes and axial ratios can be achieved by using cluster-lensing simulations that are carefully matched to the selection criteria used in the observations.

  8. Detection of the power lines in UAV remote sensed images using spectral-spatial methods.

    Science.gov (United States)

    Bhola, Rishav; Krishna, Nandigam Hari; Ramesh, K N; Senthilnath, J; Anand, Gautham

    2018-01-15

    In this paper, detection of the power lines on images acquired by Unmanned Aerial Vehicle (UAV) based remote sensing is carried out using spectral-spatial methods. Spectral clustering was performed using the K-means and Expectation Maximization (EM) algorithms to classify the pixels into power lines and non-power lines. The spectral clustering methods used in this study are parametric in nature; to automate the choice of the number of clusters, the Davies-Bouldin index (DBI) is used. The UAV remote sensed image is clustered into the number of clusters determined by the DBI. The k-clustered image is then merged into 2 clusters (power lines and non-power lines). Further, spatial segmentation was performed using morphological and geometric operations to eliminate the non-power-line regions. In this study, UAV images acquired at different altitudes and angles were analyzed to validate the robustness of the proposed method. It was observed that the EM with spatial segmentation (EM-Seg) performed better than the K-means with spatial segmentation (Kmeans-Seg) on most of the UAV images. Copyright © 2017 Elsevier Ltd. All rights reserved.
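
    A minimal sketch of the DBI-driven choice of the number of clusters with scikit-learn is shown below; the pixel features are synthetic stand-ins for the UAV image data.

```python
# Automating the choice of k via the Davies-Bouldin index: cluster the pixel
# features for a range of k and keep the k with the lowest DBI.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

rng = np.random.default_rng(5)
pixels = np.vstack([rng.normal(m, 5.0, (300, 3)) for m in (40, 120, 200)])  # fake pixel features

scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(pixels)
    scores[k] = davies_bouldin_score(pixels, labels)

best_k = min(scores, key=scores.get)          # lowest DBI wins
print("DBI per k:", {k: round(v, 2) for k, v in scores.items()}, "-> k =", best_k)
```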

  9. METHOD OF CONSTRUCTION OF GENETIC DATA CLUSTERS

    Directory of Open Access Journals (Sweden)

    N. A. Novoselova

    2016-01-01

    Full Text Available The paper presents a method for the construction of genetic data clusters (functional modules) using randomized matrices. To build the functional modules, the eigenvalues of the gene-profile correlation matrix are selected and analysed. The principal components corresponding to the eigenvalues that differ significantly from those obtained for a randomly generated correlation matrix are used for the analysis. Each selected principal component forms a gene cluster. In a comparative experiment with analogous methods, the proposed method shows an advantage in allocating statistically significant clusters of different sizes, the ability to filter non-informative genes, and the ability to extract biologically interpretable functional modules matching the real data structure.
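
    The component-selection step can be illustrated with a small randomization test: keep the eigenvalues of the gene-profile correlation matrix that exceed the largest eigenvalue observed for permuted data. The sketch below uses a synthetic expression matrix and a 95% null quantile, both of which are assumptions.

```python
# Keep correlation-matrix eigenvalues that exceed the randomized (permuted)
# null; each retained component would seed one functional module.
import numpy as np

def significant_components(expr, n_perm=50, seed=0):
    rng = np.random.default_rng(seed)
    eig_real = np.linalg.eigvalsh(np.corrcoef(expr, rowvar=False))
    null_max = []
    for _ in range(n_perm):
        shuffled = np.apply_along_axis(rng.permutation, 0, expr)  # break gene-gene links
        null_max.append(np.linalg.eigvalsh(np.corrcoef(shuffled, rowvar=False)).max())
    cutoff = np.quantile(null_max, 0.95)
    return np.where(eig_real > cutoff)[0], cutoff

rng = np.random.default_rng(6)
module = rng.normal(size=(100, 1)) @ np.ones((1, 10)) + 0.3 * rng.normal(size=(100, 10))
noise = rng.normal(size=(100, 30))
expr = np.hstack([module, noise])                 # 100 samples x 40 synthetic "genes"
idx, cutoff = significant_components(expr, n_perm=20)
print(len(idx), "component(s) exceed the randomized cutoff", round(cutoff, 2))
```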

  10. Big Data Clustering via Community Detection and Hyperbolic Network Embedding in IoT Applications

    Directory of Open Access Journals (Sweden)

    Vasileios Karyotis

    2018-04-01

    Full Text Available In this paper, we present a novel data clustering framework for big sensory data produced by IoT applications. Based on a network representation of the relations among multi-dimensional data, data clustering is mapped to node clustering over the produced data graphs. To address the potential very large scale of such datasets/graphs that test the limits of state-of-the-art approaches, we map the problem of data clustering to a community detection one over the corresponding data graphs. Specifically, we propose a novel computational approach for enhancing the traditional Girvan–Newman (GN) community detection algorithm via hyperbolic network embedding. The data dependency graph is embedded in the hyperbolic space via Rigel embedding, allowing more efficient computation of edge-betweenness centrality needed in the GN algorithm. This allows for more efficient clustering of the nodes of the data graph in terms of modularity, without sacrificing considerable accuracy. In order to study the operation of our approach with respect to enhancing GN community detection, we employ various representative types of artificial complex networks, such as scale-free, small-world and random geometric topologies, and frequently-employed benchmark datasets for demonstrating its efficacy in terms of data clustering via community detection. Furthermore, we provide a proof-of-concept evaluation by applying the proposed framework over multi-dimensional datasets obtained from an operational smart-city/building IoT infrastructure provided by the Federated Interoperable Semantic IoT/cloud Testbeds and Applications (FIESTA-IoT) testbed federation. It is shown that the proposed framework can be indeed used for community detection/data clustering and exploited in various other IoT applications, such as performing more energy-efficient smart-city/building sensing.

  11. Heartbeat detection from a hydraulic bed sensor using a clustering approach.

    Science.gov (United States)

    Rosales, Licet; Skubic, Marjorie; Heise, David; Devaney, Michael J; Schaumburg, Mark

    2012-01-01

    Encouraged by previous performance of a hydraulic bed sensor, this work presents a new hydraulic transducer configuration which improves the system's ability to capture a heartbeat signal from four subjects with different body weight and height, gender, age and cardiac history. It also proposes a new approach for detecting the occurrence of heartbeats from ballistocardiogram (BCG) signals through the use of the k-means clustering algorithm, based on finding the location of the J-peaks. Preliminary testing showed that the new transducer arrangement was able to capture the occurrence of heartbeats for all the participants, and the clustering approach achieved correct heartbeat detection ranging from 98.6 to 100% for three of them. Some considerations are discussed regarding adjustments that can be done in order to increase the correct detection of heartbeats for the participant whose percentage of correct detection ranged from 71.0 to 92.5%.
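
    A rough sketch of the clustering idea (not the authors' pipeline) is shown below: candidate peaks are extracted from a synthetic BCG-like signal, described by simple features, and k-means separates the J-peak cluster from low-amplitude noise peaks; the sampling rate, peak features and synthetic signal are all assumptions.

```python
# k-means separation of J-peaks from noise peaks in a synthetic BCG-like signal.
import numpy as np
from scipy.signal import find_peaks
from sklearn.cluster import KMeans

fs = 100                                            # Hz, assumed sampling rate
t = np.arange(0, 30, 1 / fs)
rng = np.random.default_rng(7)
bcg = 0.2 * rng.normal(size=t.size)                 # baseline noise
for beat in np.arange(0.5, 30, 0.8):                # ~75 bpm synthetic J-peaks
    bcg += 1.5 * np.exp(-((t - beat) ** 2) / (2 * 0.02 ** 2))

peaks, props = find_peaks(bcg, height=0.1, distance=int(0.2 * fs))
win = int(0.1 * fs)
local_rms = np.array([np.sqrt(np.mean(bcg[max(p - win, 0):p + win] ** 2))
                      for p in peaks])              # local energy around each peak
feats = np.column_stack([props["peak_heights"], local_rms])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(feats)
j_cluster = labels == np.argmax([feats[labels == k, 0].mean() for k in (0, 1)])
print("estimated heart rate:", 60 * j_cluster.sum() / 30, "bpm")
```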

  12. Fermi Detection of a Luminous gamma-ray Pulsar in a Globular Cluster

    Science.gov (United States)

    Freire, P. C. C.; Abdo, A. A.; Ajello, M.; Allafort, A.; Ballet, J.; Barbiellini, G.; Bastieri, D.; Bechtol, K.; Bellazzini, R.; Blandford, R. D.; hide

    2011-01-01

    We report the Fermi Large Area Telescope detection of gamma-ray (>100 mega-electron volts) pulsations from pulsar J1823--3021A in the globular cluster NGC 6624 with high significance (approx 7 sigma). Its gamma-ray luminosity, L_γ = (8.4 +/- 1.6) x 10^34 ergs per second, is the highest observed for any millisecond pulsar (MSP) to date, and it accounts for most of the cluster emission. The non-detection of the cluster in the off-pulse phase implies that it contains < 32 gamma-ray MSPs, not approx 100 as previously estimated. The gamma-ray luminosity indicates that the unusually large rate of change of its period is caused by its intrinsic spin-down. This implies that J1823--3021A has the largest magnetic field and is the youngest MSP ever detected, and that such anomalous objects might be forming at rates comparable to those of the more normal MSPs.

  13. Clustering Methods Application for Customer Segmentation to Manage Advertisement Campaign

    OpenAIRE

    Maciej Kutera; Mirosława Lasek

    2010-01-01

    Clustering methods have become such advanced and elaborate algorithms for the analysis of large data collections that they are now counted among data mining methods. They form an ever larger group of methods, evolving quickly and finding more and more varied applications. In the article, our research concerning the usefulness of clustering methods for customer segmentation to manage an advertisement campaign is presented. We introduce results obtained by using four sel...

  14. Hough transform for clustered microcalcifications detection in full-field digital mammograms

    Science.gov (United States)

    Fanizzi, A.; Basile, T. M. A.; Losurdo, L.; Amoroso, N.; Bellotti, R.; Bottigli, U.; Dentamaro, R.; Didonna, V.; Fausto, A.; Massafra, R.; Moschetta, M.; Tamborra, P.; Tangaro, S.; La Forgia, D.

    2017-09-01

    Many screening programs use mammography as the principal diagnostic tool for detecting breast cancer at a very early stage. Despite the efficacy of mammograms in highlighting breast diseases, the detection of some lesions is still doubtful for radiologists. In particular, the extremely minute and elongated salt-like particles of microcalcifications are sometimes no larger than 0.1 mm and represent approximately half of all cancers detected by means of mammograms. Hence the need for automatic tools able to support radiologists in their work. Here, we propose a computer assisted diagnostic tool to support radiologists in identifying microcalcifications in full (native) digital mammographic images. The proposed CAD system consists of a pre-processing step, which improves contrast and reduces noise by applying the Sobel edge detection algorithm and a Gaussian filter, followed by a microcalcification detection step performed by exploiting the circular Hough transform. The procedure performance was tested on 200 images coming from the Breast Cancer Digital Repository (BCDR), a publicly available database. The automatically detected clusters of microcalcifications were evaluated by skilled radiologists, who assessed the validity of the correctly identified regions of interest as well as the system error in case of missed clustered microcalcifications. The system performance was evaluated in terms of sensitivity and false positives per image (FPi) rate, with results comparable to state-of-the-art approaches. The proposed model was able to accurately predict the microcalcification clusters, obtaining performances (sensitivity = 91.78% and FPi rate = 3.99) which compare favorably with other state-of-the-art approaches.
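
    A minimal sketch of the same processing chain (Gaussian smoothing, Sobel edges, circular Hough transform) using scikit-image is given below; the input is a synthetic image with small bright blobs standing in for microcalcifications, and the radius range is an assumption.

```python
# Gaussian smoothing, Sobel edge map, then a circular Hough transform to
# locate small circular objects; blobs here are synthetic stand-ins for MCs.
import numpy as np
from skimage.filters import gaussian, sobel
from skimage.transform import hough_circle, hough_circle_peaks
from skimage.draw import disk

image = np.zeros((200, 200))
for centre in [(50, 60), (120, 150), (160, 40)]:
    rr, cc = disk(centre, 4)
    image[rr, cc] = 1.0
image += 0.05 * np.random.default_rng(8).normal(size=image.shape)

edges = sobel(gaussian(image, sigma=1.0))            # denoise, then edge map
radii = np.arange(3, 7)                              # assumed MC radii in pixels
accumulator = hough_circle(edges, radii)
_, cx, cy, r = hough_circle_peaks(accumulator, radii, total_num_peaks=3)
print(list(zip(cy, cx, r)))                          # detected (row, col, radius)
```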

  15. Clustering Multiple Sclerosis Subgroups with Multifractal Methods and Self-Organizing Map Algorithm

    Science.gov (United States)

    Karaca, Yeliz; Cattani, Carlo

    Magnetic resonance imaging (MRI) is the most sensitive method to detect chronic nervous system diseases such as multiple sclerosis (MS). In this paper, Brownian motion Hölder regularity functions (polynomial, periodic (sine), exponential) for 2D images, as multifractal methods, were applied to MR brain images, aiming to easily identify distressed regions in MS patients. With these regions, we have proposed an MS classification based on the multifractal method by using the Self-Organizing Map (SOM) algorithm. Thus, we obtained a cluster analysis by identifying pixels from distressed regions in MR images through multifractal methods and by diagnosing subgroups of MS patients through artificial neural networks.

  16. A comparison of three clustering methods for finding subgroups in MRI, SMS or clinical data

    DEFF Research Database (Denmark)

    Kent, Peter; Jensen, Rikke K; Kongsted, Alice

    2014-01-01

    There is a scarcity of head-to-head comparisons that can inform the choice of which clustering method might be suitable for particular clinical datasets and research questions. Therefore, the aim of this study was to perform a head-to-head comparison of three commonly available methods (SPSS TwoStep CA, Latent Gold LCA and SNOB LCA). METHODS: The performance of these three methods was compared: (i) quantitatively using the number of subgroups detected, the classification probability of individuals into subgroups, the reproducibility of results, and (ii) qualitatively using subjective judgments about each program ... classify individuals into those subgroups. CONCLUSIONS: Our subjective judgement was that Latent Gold offered the best balance of sensitivity to subgroups, ease of use and presentation of results with these datasets but we recognise that different clustering methods may suit other types of data...

  17. Determining wood chip size: image analysis and clustering methods

    Directory of Open Access Journals (Sweden)

    Paolo Febbi

    2013-09-01

    Full Text Available One of the standard methods for the determination of the size distribution of wood chips is the oscillating screen method (EN 15149-1:2010). Recent literature demonstrated how image analysis could return highly accurate measures of the dimensions defined for each individual particle, and could promote a new method, depending on the geometrical shape, to determine the chip size in a more accurate way. A sample of wood chips (8 litres) was sieved through horizontally oscillating sieves, using five different screen hole diameters (3.15, 8, 16, 45, 63 mm); the wood chips were sorted in decreasing size classes and the mass of all fractions was used to determine the size distribution of the particles. Since the chip shape and size influence the sieving results, Wang's theory, which concerns the geometric forms, was considered. A cluster analysis on the shape descriptors (Fourier descriptors) and size descriptors (area, perimeter, Feret diameters, eccentricity) was applied to observe the chip distribution. The UPGMA algorithm was applied on the Euclidean distance. The obtained dendrogram shows a group separation in accordance with the original three sieving fractions. A comparison has been made between the traditional sieve and clustering results. This preliminary result shows how the image analysis-based method has a high potential for the characterization of wood chip size distribution and could be further investigated. Moreover, this method could be implemented in an online detection machine for chip size characterization. An improvement of the results is expected by using supervised multivariate methods that utilize known class memberships. The main objective of the future activities will be to shift the analysis from a 2-dimensional method to a 3-dimensional acquisition process.

  18. Locating irregularly shaped clusters of infection intensity

    Directory of Open Access Journals (Sweden)

    Niko Yiannakoulias

    2010-05-01

    Full Text Available Patterns of disease may take on irregular geographic shapes, especially when features of the physical environment influence risk. Identifying these patterns can be important for planning, and also identifying new environmental or social factors associated with high or low risk of illness. Until recently, cluster detection methods were limited in their ability to detect irregular spatial patterns, and limited to finding clusters that were roughly circular in shape. This approach has less power to detect irregularly-shaped, yet important spatial anomalies, particularly at high spatial resolutions. We employ a new method of finding irregularly-shaped spatial clusters at micro-geographical scales using both simulated and real data on Schistosoma mansoni and hookworm infection intensities. This method, which we refer to as the “greedy growth scan”, is a modification of the spatial scan method for cluster detection. Real data are based on samples of hookworm and S. mansoni from Kitengei, Makueni district, Kenya. Our analysis of simulated data shows how methods able to find irregular shapes are more likely to identify clusters along rivers than methods constrained to fixed geometries. Our analysis of infection intensity identifies two small areas within the study region in which infection intensity is elevated, possibly due to local features of the physical or social environment. Collectively, our results show that the “greedy growth scan” is a suitable method for exploratory geographical analysis of infection intensity data when irregular shapes are suspected, especially at micro-geographical scales.

  19. Gold atomic cluster mediated electrochemical aptasensor for the detection of lipopolysaccharide.

    Science.gov (United States)

    Posha, Biyas; Nambiar, Sindhu R; Sandhyarani, N

    2018-03-15

    We have constructed an aptamer-immobilized, gold atomic cluster mediated, ultrasensitive electrochemical biosensor (Apt/AuAC/Au) for LPS detection without any additional signal amplification strategy. The aptamer self-assembled onto the gold atomic clusters makes Apt/AuAC/Au an excellent platform for LPS detection. Differential pulse voltammetry and electrochemical impedance spectroscopy (EIS) were used for quantitative LPS detection. The Apt/AuAC/Au sensor offers ultrasensitive and selective detection of LPS down to the 7.94 × 10⁻²¹ M level, with a wide dynamic range from 0.01 attomolar to 1 pM. The sensor exhibited excellent selectivity and stability. Real sample analysis was performed by spiking a diluted insulin sample with various concentrations of LPS, and recoveries were obtained within a 2% error. The sensor is found to be more sensitive than most literature reports. The simple and easy construction of this sensor provides efficient and promising detection of even trace amounts of LPS. Copyright © 2017 Elsevier B.V. All rights reserved.

  20. A possibilistic approach to clustering

    Science.gov (United States)

    Krishnapuram, Raghu; Keller, James M.

    1993-01-01

    Fuzzy clustering has been shown to be advantageous over crisp (or traditional) clustering methods in that total commitment of a vector to a given class is not required at each image pattern recognition iteration. Recently fuzzy clustering methods have shown spectacular ability to detect not only hypervolume clusters, but also clusters which are actually 'thin shells', i.e., curves and surfaces. Most analytic fuzzy clustering approaches are derived from the 'Fuzzy C-Means' (FCM) algorithm. The FCM uses the probabilistic constraint that the memberships of a data point across classes sum to one. This constraint was used to generate the membership update equations for an iterative algorithm. Recently, we cast the clustering problem into the framework of possibility theory using an approach in which the resulting partition of the data can be interpreted as a possibilistic partition, and the membership values may be interpreted as degrees of possibility of the points belonging to the classes. We show the ability of this approach to detect linear and quartic curves in the presence of considerable noise.
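    To make the contrast with FCM concrete, here is a minimal Python sketch of a possibilistic c-means style update, in which typicalities are computed per cluster without the sum-to-one constraint; the fuzzifier m, the scale parameters eta and the initialization are illustrative assumptions rather than the authors' exact formulation.

    import numpy as np

    def pcm_memberships(X, centers, eta, m=2.0):
        # squared distances of every point to every prototype
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        # possibilistic typicality: no constraint that rows sum to one across clusters
        return 1.0 / (1.0 + (d2 / eta[None, :]) ** (1.0 / (m - 1.0)))

    def pcm(X, centers, eta, m=2.0, n_iter=50):
        # alternate typicality and prototype updates; eta is usually estimated
        # from an initial fuzzy c-means run (e.g. mean intra-cluster distance)
        for _ in range(n_iter):
            W = pcm_memberships(X, centers, eta, m) ** m
            centers = (W.T @ X) / W.sum(axis=0)[:, None]
        return centers, pcm_memberships(X, centers, eta, m)

    X = np.random.rand(100, 2)
    centers0 = X[np.random.choice(len(X), 3, replace=False)]
    eta0 = np.full(3, 0.1)            # illustrative scale parameters
    centers, U = pcm(X, centers0, eta0)
    print(np.round(centers, 2))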

  1. A Web service substitution method based on service cluster nets

    Science.gov (United States)

    Du, YuYue; Gai, JunJing; Zhou, MengChu

    2017-11-01

    Service substitution is an important research topic in the fields of Web services and service-oriented computing. This work presents a novel method to analyse and substitute Web services. A new concept, called a Service Cluster Net Unit, is proposed based on Web service clusters. A service cluster is converted into a Service Cluster Net Unit. Then it is used to analyse whether the services in the cluster can satisfy some service requests. Meanwhile, the substitution methods of an atomic service and a composite service are proposed. The correctness of the proposed method is proved, and the effectiveness is shown and compared with the state-of-the-art method via an experiment. It can be readily applied to e-commerce service substitution to meet the business automation needs.

  2. Stability of maximum-likelihood-based clustering methods: exploring the backbone of classifications

    International Nuclear Information System (INIS)

    Mungan, Muhittin; Ramasco, José J

    2010-01-01

    Components of complex systems are often classified according to the way they interact with each other. In graph theory such groups are known as clusters or communities. Many different techniques have recently been proposed to detect them, some of which involve inference methods using either Bayesian or maximum likelihood approaches. In this paper, we study a statistical model designed for detecting clusters based on connection similarity. The basic assumption of the model is that the graph was generated by a certain grouping of the nodes, and an expectation maximization algorithm is employed to infer that grouping. We show that the method admits further development to yield a stability analysis of the groupings that quantifies the extent to which each node influences its neighbors' group membership. Our approach naturally allows for the identification of the key elements responsible for the grouping and their resilience to changes in the network. Given the generality of the assumptions underlying the statistical model, such nodes are likely to play special roles in the original system. We illustrate this point by analyzing several empirical networks for which further information about the properties of the nodes is available. The search for and identification of stabilizing nodes thus constitutes a novel technique to characterize the relevance of nodes in complex networks.

  3. What if LIGO's gravitational wave detections are strongly lensed by massive galaxy clusters?

    Science.gov (United States)

    Smith, Graham P.; Jauzac, Mathilde; Veitch, John; Farr, Will M.; Massey, Richard; Richard, Johan

    2018-04-01

    Motivated by the preponderance of so-called `heavy black holes' in the binary black hole (BBH) gravitational wave (GW) detections to date, and the role that gravitational lensing continues to play in discovering new galaxy populations, we explore the possibility that the GWs are strongly lensed by massive galaxy clusters. For example, if one of the GW sources were actually located at z = 1, then the rest-frame mass of the associated BHs would be reduced by a factor of ~2. Based on the known populations of BBH GW sources and strong-lensing clusters, we estimate a conservative lower limit on the number of BBH mergers detected per detector year at LIGO/Virgo's current sensitivity that are multiply-imaged, of R_detect ≃ 10⁻⁵ yr⁻¹. This is equivalent to rejecting the hypothesis that one of the BBH GWs detected to date was multiply-imaged at ≲4σ. It is therefore unlikely, but not impossible, that one of the GWs is multiply-imaged. We identify three spectroscopically confirmed strong-lensing clusters with well-constrained mass models within the 90 per cent credible sky localizations of the BBH GWs from LIGO's first observing run. In the event that one of these clusters multiply-imaged one of the BBH GWs, we predict that 20-60 per cent of the putative next appearances of the GWs would be detectable by LIGO, and that they would arrive at Earth within 3 yr of first detection.

  4. The relationship between supplier networks and industrial clusters: an analysis based on the cluster mapping method

    Directory of Open Access Journals (Sweden)

    Ichiro IWASAKI

    2010-06-01

    Full Text Available Michael Porter’s concept of competitive advantages emphasizes the importance of regional cooperation of various actors in order to gain competitiveness on globalized markets. Foreign investors may play an important role in forming such cooperation networks. Their local suppliers tend to concentrate regionally. They can form, together with local institutions of education, research, financial and other services, development agencies, the nucleus of cooperative clusters. This paper deals with the relationship between supplier networks and clusters. Two main issues are discussed in more detail: the interest of multinational companies in entering regional clusters and the spillover effects that may stem from their participation. After the discussion on the theoretical background, the paper introduces a relatively new analytical method: “cluster mapping” - a method that can spot regional hot spots of specific economic activities with cluster building potential. Experience with the method was gathered in the US and in the European Union. After the discussion on the existing empirical evidence, the authors introduce their own cluster mapping results, which they obtained by using a refined version of the original methodology.

  5. Single pass kernel k-means clustering method

    Indian Academy of Sciences (India)

    This paper proposes a simple and faster version of the kernel k-means clustering ... It has been considered as an important tool ... On the other hand, kernel-based clustering methods, like kernel k-means clustering, ... available at the UCI machine learning repository (Murphy 1994). ... All the data sets have only numeric valued features.

  6. Clustering Methods with Qualitative Data: a Mixed-Methods Approach for Prevention Research with Small Samples.

    Science.gov (United States)

    Henry, David; Dymnicki, Allison B; Mohatt, Nathaniel; Allen, James; Kelly, James G

    2015-10-01

    Qualitative methods potentially add depth to prevention research but can produce large amounts of complex data even with small samples. Studies conducted with culturally distinct samples often produce voluminous qualitative data but may lack sufficient sample sizes for sophisticated quantitative analysis. Currently lacking in mixed-methods research are methods allowing for more fully integrating qualitative and quantitative analysis techniques. Cluster analysis can be applied to coded qualitative data to clarify the findings of prevention studies by aiding efforts to reveal such things as the motives of participants for their actions and the reasons behind counterintuitive findings. By clustering groups of participants with similar profiles of codes in a quantitative analysis, cluster analysis can serve as a key component in mixed-methods research. This article reports two studies. In the first study, we conduct simulations to test the accuracy of cluster assignment using three different clustering methods with binary data as produced when coding qualitative interviews. Results indicated that hierarchical clustering, K-means clustering, and latent class analysis produced similar levels of accuracy with binary data and that the accuracy of these methods did not decrease with samples as small as 50. Whereas the first study explores the feasibility of using common clustering methods with binary data, the second study provides a "real-world" example using data from a qualitative study of community leadership connected with a drug abuse prevention project. We discuss the implications of this approach for conducting prevention research, especially with small samples and culturally distinct communities.
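    To illustrate the quantitative side of this approach, the Python sketch below clusters a binary participant-by-code matrix with two of the methods examined (average-linkage hierarchical clustering on a Jaccard distance, and k-means applied directly to the 0/1 data); the matrix, the code prevalence and the three-cluster solution are illustrative assumptions rather than the study's data.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import pdist
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    codes = (rng.random((50, 20)) < 0.3).astype(int)   # 50 interviews x 20 presence/absence codes

    # hierarchical clustering on a binary-appropriate (Jaccard) distance
    Z = linkage(pdist(codes, metric="jaccard"), method="average")
    hier_labels = fcluster(Z, t=3, criterion="maxclust")

    # k-means applied directly to the binary matrix, as in the simulation study
    km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(codes)
    print(np.bincount(hier_labels)[1:], np.bincount(km_labels))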

  7. Clustering Methods with Qualitative Data: A Mixed Methods Approach for Prevention Research with Small Samples

    Science.gov (United States)

    Henry, David; Dymnicki, Allison B.; Mohatt, Nathaniel; Allen, James; Kelly, James G.

    2016-01-01

    Qualitative methods potentially add depth to prevention research, but can produce large amounts of complex data even with small samples. Studies conducted with culturally distinct samples often produce voluminous qualitative data, but may lack sufficient sample sizes for sophisticated quantitative analysis. Currently lacking in mixed methods research are methods allowing for more fully integrating qualitative and quantitative analysis techniques. Cluster analysis can be applied to coded qualitative data to clarify the findings of prevention studies by aiding efforts to reveal such things as the motives of participants for their actions and the reasons behind counterintuitive findings. By clustering groups of participants with similar profiles of codes in a quantitative analysis, cluster analysis can serve as a key component in mixed methods research. This article reports two studies. In the first study, we conduct simulations to test the accuracy of cluster assignment using three different clustering methods with binary data as produced when coding qualitative interviews. Results indicated that hierarchical clustering, K-Means clustering, and latent class analysis produced similar levels of accuracy with binary data, and that the accuracy of these methods did not decrease with samples as small as 50. Whereas the first study explores the feasibility of using common clustering methods with binary data, the second study provides a “real-world” example using data from a qualitative study of community leadership connected with a drug abuse prevention project. We discuss the implications of this approach for conducting prevention research, especially with small samples and culturally distinct communities. PMID:25946969

  8. The polarizable embedding coupled cluster method

    DEFF Research Database (Denmark)

    Sneskov, Kristian; Schwabe, Tobias; Kongsted, Jacob

    2011-01-01

    We formulate a new combined quantum mechanics/molecular mechanics (QM/MM) method based on a self-consistent polarizable embedding (PE) scheme. For the description of the QM region, we apply the popular coupled cluster (CC) method detailing the inclusion of electrostatic and polarization effects...

  9. Computer aided detection of clusters of microcalcifications on full field digital mammograms

    International Nuclear Information System (INIS)

    Ge Jun; Sahiner, Berkman; Hadjiiski, Lubomir M.; Chan, H.-P.; Wei Jun; Helvie, Mark A.; Zhou Chuan

    2006-01-01

    We are developing a computer-aided detection (CAD) system to identify microcalcification clusters (MCCs) automatically on full field digital mammograms (FFDMs). The CAD system includes six stages: preprocessing; image enhancement; segmentation of microcalcification candidates; false positive (FP) reduction for individual microcalcifications; regional clustering; and FP reduction for clustered microcalcifications. At the stage of FP reduction for individual microcalcifications, a truncated sum-of-squares error function was used to improve the efficiency and robustness of the training of an artificial neural network in our CAD system for FFDMs. At the stage of FP reduction for clustered microcalcifications, morphological features and features derived from the artificial neural network outputs were extracted from each cluster. Stepwise linear discriminant analysis (LDA) was used to select the features. An LDA classifier was then used to differentiate clustered microcalcifications from FPs. A data set of 96 cases with 192 images was collected at the University of Michigan. This data set contained 96 MCCs, of which 28 clusters were proven by biopsy to be malignant and 68 were proven to be benign. The data set was separated into two independent data sets for training and testing of the CAD system in a cross-validation scheme. When one data set was used to train and validate the convolution neural network (CNN) in our CAD system, the other data set was used to evaluate the detection performance. With the use of a truncated error metric, the training of CNN could be accelerated and the classification performance was improved. The CNN in combination with an LDA classifier could substantially reduce FPs with a small tradeoff in sensitivity. By using the free-response receiver operating characteristic methodology, it was found that our CAD system can achieve a cluster-based sensitivity of 70, 80, and 90 % at 0.21, 0.61, and 1.49 FPs/image, respectively. For case

  10. Atomic and electronic structure of clusters from car-Parrinello method

    International Nuclear Information System (INIS)

    Kumar, V.

    1994-06-01

    With the development of the ab-initio molecular dynamics method, it has now become possible to study the static and dynamical properties of clusters containing up to a few tens of atoms. Here I present a review of the method within the framework of the density functional theory and the pseudopotential approach to represent the electron-ion interaction, and discuss some of its applications to clusters. Particular attention is focussed on the structure and bonding properties of clusters as a function of their size. Applications to clusters of alkali metals and Al, the non-metal to metal transition in divalent metal clusters, and molecular clusters of carbon and Sb are discussed in detail. Some results are also presented on mixed clusters. (author). 121 refs, 24 figs

  11. De novo clustering methods outperform reference-based methods for assigning 16S rRNA gene sequences to operational taxonomic units

    Directory of Open Access Journals (Sweden)

    Sarah L. Westcott

    2015-12-01

    VSEARCH have a high level of sensitivity to detect reference sequences, the specificity of those matches was poor relative to the true best match. Discussion: Our analysis calls into question the quality and stability of OTU assignments generated by the open- and closed-reference methods as implemented in the current version of QIIME. This study demonstrates that de novo methods are the optimal approach for assigning sequences to OTUs, and that the quality of these assignments needs to be assessed for multiple methods in order to identify the optimal clustering method for a particular dataset.

  12. Substructure in clusters of galaxies

    International Nuclear Information System (INIS)

    Fitchett, M.J.

    1988-01-01

    Optical observations suggesting the existence of substructure in clusters of galaxies are examined. Models of cluster formation and methods used to detect substructure in clusters are reviewed. Consideration is given to classification schemes based on a departure of bright cluster galaxies from a spherically symmetric distribution, evidence for statistically significant substructure, and various types of substructure, including velocity, spatial, and spatial-velocity substructure. The substructure observed in the galaxy distribution in clusters is discussed, focusing on observations from general cluster samples, the Virgo cluster, the Hydra cluster, Centaurus, the Coma cluster, and the Cancer cluster. 88 refs

  13. The rotation of galaxy clusters

    International Nuclear Information System (INIS)

    Tovmassian, H.M.

    2015-01-01

    A method for detecting galaxy cluster rotation, based on the distribution over the cluster image of member galaxies with velocities lower and higher than the cluster mean velocity, is proposed. The search for rotation is made for flat clusters with a/b > 1.8 and for BMI type clusters, which are expected to be rotating. For comparison, round clusters and clusters of NBMI type, in which the second-brightest galaxy does not differ significantly from the cluster cD galaxy, were also studied. Seventeen of the 65 studied clusters are found to be rotating. The detection rate is sufficiently high for flat clusters, over 60 per cent, and for clusters of BMI type with a dominant cD galaxy, ≈35 per cent. The obtained results indicate that clusters were formed from huge primordial gas clouds and preserved the rotation of those primordial clouds, provided they did not undergo mergers with other clusters and groups of galaxies, which would have destroyed the rotation

  14. Automated detection of very Low Surface Brightness galaxies in the Virgo Cluster

    Science.gov (United States)

    Prole, D. J.; Davies, J. I.; Keenan, O. C.; Davies, L. J. M.

    2018-04-01

    We report the automatic detection of a new sample of very low surface brightness (LSB) galaxies, likely members of the Virgo cluster. We introduce our new software, DeepScan, which has been designed specifically to detect extended LSB features automatically using the DBSCAN algorithm. We demonstrate the technique by applying it over a 5 deg² portion of the Next-Generation Virgo Survey (NGVS) data to reveal 53 low surface brightness galaxies that are candidate cluster members based on their sizes and colours. 30 of these sources are new detections despite the region having been searched specifically for LSB galaxies previously. Our final sample contains galaxies with 26.0 ≤ ⟨μe⟩ ≤ 28.5 and 19 ≤ mg ≤ 21, making them some of the faintest known in Virgo. The majority of them have colours consistent with the red sequence, and they have a mean stellar mass of 10^(6.3 ± 0.5) M⊙ assuming cluster membership. After using ProFit to fit Sérsic profiles to our detections, none of the new sources have effective radii larger than 1.5 kpc, and they do not meet the criteria for ultra-diffuse galaxy (UDG) classification, so we classify them as ultra-faint dwarfs.
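    The detection idea, grouping connected low-surface-brightness pixels with DBSCAN, can be sketched in a few lines of Python. The image, threshold and DBSCAN parameters below are illustrative assumptions; they do not reproduce DeepScan's actual interface or tuning.

    import numpy as np
    from sklearn.cluster import DBSCAN

    image = np.random.normal(0.0, 1.0, size=(200, 200))   # stand-in for a sky-subtracted frame
    threshold = 1.0                                        # ~1 sigma above sky (illustrative)

    # coordinates of pixels that exceed the surface-brightness threshold
    ys, xs = np.nonzero(image > threshold)
    coords = np.column_stack([xs, ys])

    # eps and min_samples control how diffuse and how small a detection may be
    labels = DBSCAN(eps=3.0, min_samples=10).fit_predict(coords)
    n_sources = labels.max() + 1   # label -1 marks unclustered (noise) pixels
    print(n_sources)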

  15. Performance Analysis of Entropy Methods on K Means in Clustering Process

    Science.gov (United States)

    Dicky Syahputra Lubis, Mhd.; Mawengkang, Herman; Suwilo, Saib

    2017-12-01

    K-means is a non-hierarchical data clustering method that attempts to partition data into one or more clusters / groups. The method partitions the data so that data with the same characteristics are grouped into the same cluster and data with different characteristics are grouped into other clusters. The purpose of this clustering is to minimize the objective function set in the clustering process, which generally attempts to minimize variation within a cluster and maximize the variation between clusters. However, the main disadvantage of this method is that the number k is often not known beforehand. Furthermore, a randomly chosen starting point may place two initial centroids very close to each other. Therefore, the entropy method is used to determine the starting point in K-means; entropy is a method that can be used to assign weights and to take a decision from a set of alternatives, and it is able to investigate the degree of discrimination among a multitude of data sets. Criteria with the highest variation receive the highest weight. The entropy method can thus help the K-means process determine the starting point, which is usually chosen at random, so that clustering converges in fewer iterations than the standard K-means procedure. Using the postoperative patient dataset from the UCI Machine Learning Repository, with only 12 records used as a worked example of the calculation, the entropy-based initialization reached the desired end result in only two iterations.
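    The abstract does not spell out how the entropy weights are coupled to the initialization, so the Python sketch below is a loose interpretation: the standard entropy weight method scores each record, and the top-scoring records seed the centroids. The dataset, the number of clusters and the seeding rule are assumptions for illustration only.

    import numpy as np
    from sklearn.cluster import KMeans

    X = np.random.rand(90, 8)                       # stand-in for the postoperative data

    # entropy weight method: features with more variation get larger weights
    P = X / X.sum(axis=0)
    entropy = -(P * np.log(P + 1e-12)).sum(axis=0) / np.log(len(X))
    weights = (1.0 - entropy) / (1.0 - entropy).sum()

    # score each record and use the highest-scoring records as initial centroids
    scores = X @ weights
    seeds = X[np.argsort(scores)[-3:]]
    labels = KMeans(n_clusters=3, init=seeds, n_init=1).fit_predict(X)
    print(np.bincount(labels))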

  16. Motif-Independent De Novo Detection of Secondary Metabolite Gene Clusters – Towards Identification of Novel Secondary Metabolisms from Filamentous Fungi -

    Directory of Open Access Journals (Sweden)

    Myco Umemura

    2015-05-01

    Full Text Available Secondary metabolites are produced mostly by clustered genes that are essential to their biosynthesis. The transcriptional expression of these genes is often cooperatively regulated by a transcription factor located inside or close to a cluster. Most of the secondary metabolism biosynthesis (SMB) gene clusters identified to date contain so-called core genes with distinctive sequence features, such as polyketide synthase (PKS) and non-ribosomal peptide synthetase (NRPS). Recent efforts in sequencing fungal genomes have revealed far more SMB gene clusters than expected based on the number of core genes in the genomes. Several bioinformatics tools have been developed to survey SMB gene clusters using the sequence motif information of the core genes, including SMURF and antiSMASH. More recently, accompanied by the development of sequencing techniques that allow large-scale genomic and transcriptomic data to be obtained, motif-independent prediction methods for SMB gene clusters, including MIDDAS-M, have been developed. Most of these methods detect clusters in which the genes are cooperatively regulated at the transcriptional level, thus allowing the identification of novel SMB gene clusters regardless of the presence of the core genes. Another type of method, MIPS-CG, uses the characteristics of SMB genes, which are highly enriched in non-syntenic blocks (NSBs), enabling prediction even without transcriptome data, although the results have not been evaluated in detail. Considering that a large portion of SMB gene clusters might be sufficiently expressed only in limited, uncommon conditions, it seems that prediction of SMB gene clusters by bioinformatics followed by experimental validation is the only way to efficiently uncover hidden SMB gene clusters. Here, we describe and discuss possible novel approaches for the determination of SMB gene clusters that have not been identified using conventional methods.

  17. A Clustering Method for Data in Cylindrical Coordinates

    Directory of Open Access Journals (Sweden)

    Kazuhisa Fujita

    2017-01-01

    Full Text Available We propose a new clustering method for data in cylindrical coordinates based on k-means. The goal of the k-means family is to maximize an optimization function, which requires a similarity. Thus, we need a new similarity to obtain the new clustering method for data in cylindrical coordinates. In this study, we first derive a new similarity for the new clustering method by assuming a particular probabilistic model. A data point in cylindrical coordinates has a radius, an azimuth, and a height. We assume that the azimuth is sampled from a von Mises distribution and that the radius and the height are independently generated from isotropic Gaussian distributions. We derive the new similarity from the log likelihood of the assumed probability distribution. Our experiments demonstrate that the proposed method using the new similarity can appropriately partition synthetic data defined in cylindrical coordinates. Furthermore, we apply the proposed method to color image quantization and show that the method successfully quantizes a color image with respect to the hue element.
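    The stated model (von Mises azimuth, isotropic Gaussian radius and height) implies a log-likelihood similarity that is easy to write down. The Python sketch below shows that similarity and a k-means-style assignment step; the concentration kappa, the scale sigma and the toy data are illustrative constants, not values from the paper.

    import numpy as np

    def cylindrical_similarity(point, center, kappa=2.0, sigma=1.0):
        # log-likelihood (up to constants) of a point (r, theta, z) under a cluster
        # with a von Mises azimuth and isotropic Gaussian radius and height
        r, theta, z = point
        mr, mtheta, mz = center
        return (kappa * np.cos(theta - mtheta)
                - ((r - mr) ** 2 + (z - mz) ** 2) / (2.0 * sigma ** 2))

    def assign(points, centers, **kw):
        # k-means-style assignment: each point joins its most similar cluster
        sims = np.array([[cylindrical_similarity(p, c, **kw) for c in centers]
                         for p in points])
        return sims.argmax(axis=1)

    points = np.column_stack([np.random.rand(50) * 2,           # radius
                              np.random.rand(50) * 2 * np.pi,   # azimuth
                              np.random.rand(50) * 5])          # height
    centers = points[:3]
    print(assign(points, centers))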

  18. A cluster merging method for time series microarray with production values.

    Science.gov (United States)

    Chira, Camelia; Sedano, Javier; Camara, Monica; Prieto, Carlos; Villar, Jose R; Corchado, Emilio

    2014-09-01

    A challenging task in time-course microarray data analysis is to cluster genes meaningfully by combining the information provided by multiple replicates covering the same key time points. This paper proposes a novel cluster merging method to accomplish this goal, obtaining groups of highly correlated genes. The main idea behind the proposed method is to generate a clustering starting from groups created from individual temporal series (representing different biological replicates measured at the same time points) and to merge them by taking into account the frequency with which two genes are grouped together in each clustering. The gene groups at the level of individual time series are generated using several shape-based clustering methods. This study is focused on a real-world time series microarray task with the aim of finding co-expressed genes related to the production and growth of a certain bacterium. The shape-based clustering methods used at the level of individual time series rely on identifying similar gene expression patterns over time which, in some models, are further matched to the pattern of production/growth. The proposed cluster merging method is able to produce meaningful gene groups which can be naturally ranked by the level of agreement on the clustering among individual time series. The list of clusters and genes is further sorted based on the information correlation coefficient and new problem-specific relevance measures. Computational experiments and results of the cluster merging method are analyzed from a biological perspective and further compared with the clustering generated from the mean value of the time series and the same shape-based algorithm.
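    The merging idea, counting how often two genes co-occur in the per-replicate clusterings and grouping genes that co-occur frequently, can be sketched as a consensus step in Python. The per-replicate label vectors, the average-linkage merge and the final number of groups are illustrative assumptions.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    def consensus_groups(label_sets, n_groups):
        labels = np.asarray(label_sets)            # shape: (n_replicates, n_genes)
        n_genes = labels.shape[1]
        # frequency with which each pair of genes falls in the same cluster
        co = np.zeros((n_genes, n_genes))
        for rep in labels:
            co += (rep[:, None] == rep[None, :])
        co /= len(labels)
        dist = 1.0 - co                            # frequent co-assignment -> small distance
        np.fill_diagonal(dist, 0.0)
        Z = linkage(squareform(dist, checks=False), method="average")
        return fcluster(Z, t=n_groups, criterion="maxclust")

    # three replicates, each clustered separately into three groups of 30 genes
    replicate_labels = [np.random.randint(0, 3, 30) for _ in range(3)]
    print(consensus_groups(replicate_labels, n_groups=3))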

  19. Temporal Data-Driven Sleep Scheduling and Spatial Data-Driven Anomaly Detection for Clustered Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Gang Li

    2016-09-01

    Full Text Available The spatial–temporal correlation is an important feature of sensor data in wireless sensor networks (WSNs. Most of the existing works based on the spatial–temporal correlation can be divided into two parts: redundancy reduction and anomaly detection. These two parts are pursued separately in existing works. In this work, the combination of temporal data-driven sleep scheduling (TDSS and spatial data-driven anomaly detection is proposed, where TDSS can reduce data redundancy. The TDSS model is inspired by transmission control protocol (TCP congestion control. Based on long and linear cluster structure in the tunnel monitoring system, cooperative TDSS and spatial data-driven anomaly detection are then proposed. To realize synchronous acquisition in the same ring for analyzing the situation of every ring, TDSS is implemented in a cooperative way in the cluster. To keep the precision of sensor data, spatial data-driven anomaly detection based on the spatial correlation and Kriging method is realized to generate an anomaly indicator. The experiment results show that cooperative TDSS can realize non-uniform sensing effectively to reduce the energy consumption. In addition, spatial data-driven anomaly detection is quite significant for maintaining and improving the precision of sensor data.

  20. Grouped fuzzy SVM with EM-based partition of sample space for clustered microcalcification detection.

    Science.gov (United States)

    Wang, Huiya; Feng, Jun; Wang, Hongyu

    2017-07-20

    Detection of clustered microcalcifications (MCs) from mammograms plays an essential role in computer-aided diagnosis of early-stage breast cancer. To tackle problems associated with the diversity of data structures of MC lesions and the variability of normal breast tissues, multi-pattern sample space learning is required. In this paper, a novel grouped fuzzy Support Vector Machine (SVM) algorithm with sample space partition based on Expectation-Maximization (EM) (called G-FSVM) is proposed for clustered MC detection. The diversified pattern of training data is partitioned into several groups based on the EM algorithm. Then a series of fuzzy SVMs are integrated for classification, with each group of samples drawn from the MC lesions and normal breast tissues. From the DDSM database, a total of 1,064 suspicious regions are selected from 239 mammograms, and the measured Accuracy, True Positive Rate (TPR), False Positive Rate (FPR) and EVL = TPR*(1-FPR) are 0.82, 0.78, 0.14 and 0.72, respectively. The proposed method incorporates the merits of fuzzy SVM and multi-pattern sample space learning, decomposing the MC detection problem into a series of simple two-class classifications. Experimental results from synthetic data and the DDSM database demonstrate that our integrated classification framework reduces the false positive rate significantly while maintaining the true positive rate.
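    The partition-then-classify structure described here can be sketched with standard tools: an EM-fitted Gaussian mixture splits the sample space into groups, and one classifier is trained per group. A plain SVC stands in for the paper's fuzzy SVM, and the data, group count and kernel are illustrative assumptions.

    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.random((400, 10))                       # stand-in features for suspicious regions
    y = (rng.random(400) > 0.5).astype(int)         # 1 = MC cluster, 0 = normal tissue

    # EM-based partition of the sample space into groups
    gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
    groups = gmm.predict(X)

    # one SVM expert per EM-derived group
    experts = {g: SVC(kernel="rbf").fit(X[groups == g], y[groups == g])
               for g in np.unique(groups)}

    def predict(sample):
        g = gmm.predict(sample.reshape(1, -1))[0]   # route the sample to its group's expert
        return experts[g].predict(sample.reshape(1, -1))[0]

    print(predict(X[0]))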

  1. Detection of Clostridium difficile infection clusters, using the temporal scan statistic, in a community hospital in southern Ontario, Canada, 2006-2011.

    Science.gov (United States)

    Faires, Meredith C; Pearl, David L; Ciccotelli, William A; Berke, Olaf; Reid-Smith, Richard J; Weese, J Scott

    2014-05-12

    In hospitals, Clostridium difficile infection (CDI) surveillance relies on unvalidated guidelines or threshold criteria to identify outbreaks. This can result in false-positive and -negative cluster alarms. The application of statistical methods to identify and understand CDI clusters may be a useful alternative or complement to standard surveillance techniques. The objectives of this study were to investigate the utility of the temporal scan statistic for detecting CDI clusters and determine if there are significant differences in the rate of CDI cases by month, season, and year in a community hospital. Bacteriology reports of patients identified with a CDI from August 2006 to February 2011 were collected. For patients detected with CDI from March 2010 to February 2011, stool specimens were obtained. Clostridium difficile isolates were characterized by ribotyping and investigated for the presence of toxin genes by PCR. CDI clusters were investigated using a retrospective temporal scan test statistic. Statistically significant clusters were compared to known CDI outbreaks within the hospital. A negative binomial regression model was used to identify associations between year, season, month and the rate of CDI cases. Overall, 86 CDI cases were identified. Eighteen specimens were analyzed and nine ribotypes were classified with ribotype 027 (n = 6) the most prevalent. The temporal scan statistic identified significant CDI clusters at the hospital (n = 5), service (n = 6), and ward (n = 4) levels (P ≤ 0.05). Three clusters were concordant with the one C. difficile outbreak identified by hospital personnel. Two clusters were identified as potential outbreaks. The negative binomial model indicated years 2007-2010 (P ≤ 0.05) had decreased CDI rates compared to 2006 and spring had an increased CDI rate compared to the fall (P = 0.023). Application of the temporal scan statistic identified several clusters, including potential outbreaks not detected by hospital
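    For readers unfamiliar with the retrospective temporal scan, the Python sketch below gives the flavour of the test: every candidate time window is scored with a Poisson likelihood ratio, and the maximum is compared with maxima obtained under Monte Carlo randomization. The uniform baseline, window limit and simulation count are illustrative assumptions and do not reproduce SaTScan's exact implementation.

    import numpy as np

    def temporal_scan(daily_counts, max_len=60, n_sim=99, seed=0):
        rng = np.random.default_rng(seed)
        counts = np.asarray(daily_counts, dtype=float)
        n_days, total = len(counts), counts.sum()

        def best_llr(c):
            best = 0.0
            for start in range(n_days):
                for length in range(1, min(max_len, n_days - start) + 1):
                    obs = c[start:start + length].sum()
                    exp = total * length / n_days
                    if obs > exp:
                        llr = obs * np.log(obs / exp)
                        if total > obs:
                            llr += (total - obs) * np.log((total - obs) / (total - exp))
                        best = max(best, llr)
            return best

        observed = best_llr(counts)
        # Monte Carlo: scatter the same number of cases uniformly over the study days
        sims = [best_llr(np.bincount(rng.integers(0, n_days, int(total)), minlength=n_days))
                for _ in range(n_sim)]
        p_value = (1 + sum(s >= observed for s in sims)) / (n_sim + 1)
        return observed, p_value

    obs, p = temporal_scan(np.random.poisson(0.05, 365))   # toy year of daily CDI counts
    print(obs, p)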

  2. The Local Maximum Clustering Method and Its Application in Microarray Gene Expression Data Analysis

    Directory of Open Access Journals (Sweden)

    Chen Yidong

    2004-01-01

    Full Text Available An unsupervised data clustering method, called the local maximum clustering (LMC) method, is proposed for identifying clusters in experimental data sets based on research interest. A magnitude property is defined according to research purposes, and data sets are clustered around each local maximum of the magnitude property. By properly defining a magnitude property, this method can overcome many difficulties in microarray data clustering, such as reduced projection in similarities, noise, and arbitrary gene distribution. To critically evaluate the performance of this clustering method in comparison with other methods, we designed three model data sets with known cluster distributions and applied the LMC method as well as the hierarchical clustering method, the k-means clustering method, and the self-organized map method to these model data sets. The results show that the LMC method produces the most accurate clustering results. As an example of application, we applied the method to cluster the leukemia samples reported in the microarray study of Golub et al. (1999).

  3. A ¹³CO Detection in a Brightest Cluster Galaxy

    Energy Technology Data Exchange (ETDEWEB)

    Vantyghem, A. N.; McNamara, B. R.; Hogan, M. T. [Department of Physics and Astronomy, University of Waterloo, Waterloo, ON N2L 3G1 (Canada); Edge, A. C. [Department of Physics, Durham University, Durham DH1 3LE (United Kingdom); Combes, F.; Salomé, P. [LERMA, Observatoire de Paris, CNRS, UPMC, PSL Univ., 61 avenue de l’Observatoire, F-75014 Paris (France); Russell, H. R.; Fabian, A. C. [Institute of Astronomy, Madingley Road, Cambridge CB3 0HA (United Kingdom); McDonald, M. [Kavli Institute for Astrophysics and Space Research, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139 (United States); Nulsen, P. E. J., E-mail: a2vantyg@uwaterloo.ca [Harvard-Smithsonian Center for Astrophysics, 60 Garden Street, Cambridge, MA 02138 (United States)

    2017-10-20

    We present ALMA Cycle 4 observations of CO(1-0), CO(3-2), and ¹³CO(3-2) line emission in the brightest cluster galaxy (BCG) of RXJ0821+0752. This is one of the first detections of ¹³CO line emission in a galaxy cluster. Half of the CO(3-2) line emission originates from two clumps of molecular gas that are spatially offset from the galactic center. These clumps are surrounded by diffuse emission that extends 8 kpc in length. The detected ¹³CO emission is confined entirely to the two bright clumps, with any emission outside of this region lying below our detection threshold. Two distinct velocity components with similar integrated fluxes are detected in the ¹²CO spectra. The narrower component (60 km s⁻¹ FWHM) is consistent in both velocity centroid and linewidth with the ¹³CO(3-2) emission, while the broader (130–160 km s⁻¹), slightly blueshifted wing has no associated ¹³CO(3-2) emission. A simple local thermodynamic model indicates that the ¹³CO emission traces 2.1 × 10⁹ M⊙ of molecular gas. Isolating the ¹²CO velocity component that accompanies the ¹³CO emission yields a CO-to-H₂ conversion factor of α_CO = 2.3 M⊙ (K km s⁻¹)⁻¹, which is a factor of two lower than the Galactic value. Adopting the Galactic CO-to-H₂ conversion factor in BCGs may therefore overestimate their molecular gas masses by a factor of two. This is within the object-to-object scatter from extragalactic sources, so calibrations in a larger sample of clusters are necessary in order to confirm a sub-Galactic conversion factor.

  4. Comparative analysis of clustering methods for gene expression time course data

    Directory of Open Access Journals (Sweden)

    Ivan G. Costa

    2004-01-01

    Full Text Available This work performs a data-driven comparative study of clustering methods used in the analysis of gene expression time courses (or time series). Five clustering methods found in the literature on gene expression analysis are compared: agglomerative hierarchical clustering, CLICK, dynamical clustering, k-means and self-organizing maps. In order to evaluate the methods, a k-fold cross-validation procedure adapted to unsupervised methods is applied. The accuracy of the results is assessed by comparing the partitions obtained in these experiments with gene annotation, such as protein function and series classification.

  5. Detecting Gravitational Lensing of the Cosmic Microwave Background by Galaxy Clusters

    Energy Technology Data Exchange (ETDEWEB)

    Baxter, Eric Jones [Univ. of Chicago, IL (United States)

    2014-08-01

    Clusters of galaxies gravitationally lens the Cosmic Microwave Background (CMB) leading to a distinct signal in the CMB on arcminute scales. Measurement of the cluster lensing effect offers the exciting possibility of constraining the masses of galaxy clusters using CMB data alone. Improved constraints on cluster masses are in turn essential to the use of clusters as cosmological probes: uncertainties in cluster masses are currently the dominant systematic affecting cluster abundance constraints on cosmology. To date, however, the CMB cluster lensing signal remains undetected because of its small magnitude and angular size. In this thesis, we develop a maximum likelihood approach to extracting the signal from CMB temperature data. We validate the technique by applying it to mock data designed to replicate as closely as possible real data from the South Pole Telescope’s (SPT) Sunyaev-Zel’dovich (SZ) survey: the effects of the SPT beam, transfer function, instrumental noise and cluster selection are incorporated. We consider the effects of foreground emission on the analysis and show that uncertainty in amount of foreground lensing results in a small systematic error on the lensing constraints. Additionally, we show that if unaccounted for, the SZ effect leads to unacceptably large biases on the lensing constraints and develop an approach for removing SZ contamination. The results of the mock analysis presented here suggest that a 4σ first detection of the cluster lensing effect can be achieved with current SPT-SZ data.

  6. bcl::Cluster : A method for clustering biological molecules coupled with visualization in the Pymol Molecular Graphics System.

    Science.gov (United States)

    Alexander, Nathan; Woetzel, Nils; Meiler, Jens

    2011-02-01

    Clustering algorithms are used as data analysis tools in a wide variety of applications in Biology. Clustering has become especially important in protein structure prediction and virtual high throughput screening methods. In protein structure prediction, clustering is used to structure the conformational space of thousands of protein models. In virtual high throughput screening, databases with millions of drug-like molecules are organized by structural similarity, e.g. common scaffolds. The tree-like dendrogram structure obtained from hierarchical clustering can provide a qualitative overview of the results, which is important for focusing detailed analysis. However, in practice it is difficult to relate specific components of the dendrogram directly back to the objects of which it is comprised and to display all desired information within the two dimensions of the dendrogram. The current work presents a hierarchical agglomerative clustering method termed bcl::Cluster. bcl::Cluster utilizes the Pymol Molecular Graphics System to graphically depict dendrograms in three dimensions. This allows simultaneous display of relevant biological molecules as well as additional information about the clusters and the members comprising them.

  7. THE RELATION BETWEEN COOL CLUSTER CORES AND HERSCHEL-DETECTED STAR FORMATION IN BRIGHTEST CLUSTER GALAXIES

    Energy Technology Data Exchange (ETDEWEB)

    Rawle, T. D.; Egami, E.; Rex, M.; Fiedler, A.; Haines, C. P.; Pereira, M. J.; Portouw, J.; Walth, G. [Steward Observatory, University of Arizona, 933 N. Cherry Ave., Tucson, AZ 85721 (United States); Edge, A. C. [Institute for Computational Cosmology, Durham University, South Road, Durham DH1 3LE (United Kingdom); Smith, G. P. [School of Physics and Astronomy, University of Birmingham, Edgbaston, Birmingham B15 2TT (United Kingdom); Altieri, B.; Valtchanov, I. [Herschel Science Centre, ESAC, ESA, P.O. Box 78, Villanueva de la Canada, 28691 Madrid (Spain); Perez-Gonzalez, P. G. [Departamento de Astrofisica, Facultad de CC. Fisicas, Universidad Complutense de Madrid, E-28040 Madrid (Spain); Van der Werf, P. P. [Sterrewacht Leiden, Leiden University, P.O. Box 9513, 2300 RA, Leiden (Netherlands); Zemcov, M., E-mail: trawle@as.arizona.edu [Department of Physics, Mathematics and Astronomy, California Institute of Technology, Pasadena, CA 91125 (United States)

    2012-03-01

    We present far-infrared (FIR) analysis of 68 brightest cluster galaxies (BCGs) at 0.08 < z < 1.0. Deriving total infrared luminosities directly from Spitzer and Herschel photometry spanning the peak of the dust component (24-500 μm), we calculate the obscured star formation rate (SFR). 22 (+6.2/-5.3) per cent of the BCGs are detected in the far-infrared, with SFR = 1-150 M⊙ yr⁻¹. The infrared luminosity is highly correlated with cluster X-ray gas cooling times for cool-core clusters (gas cooling time < 1 Gyr), strongly suggesting that the star formation in these BCGs is influenced by the cluster-scale cooling process. The occurrence of the molecular-gas-tracing Hα emission is also correlated with obscured star formation. For all but the most luminous BCGs (L_TIR > 2 × 10¹¹ L⊙), only a small (≲0.4 mag) reddening correction is required for SFR(Hα) to agree with SFR_FIR. The relatively low Hα extinction (dust obscuration), compared to values reported for the general star-forming population, lends further weight to an alternate (external) origin for the cold gas. Finally, we use a stacking analysis of non-cool-core clusters to show that the majority of the fuel for star formation in the FIR-bright BCGs is unlikely to originate from normal stellar mass loss.

  8. Evaluation of null-point detection methods on simulation data

    Science.gov (United States)

    Olshevsky, Vyacheslav; Fu, Huishan; Vaivads, Andris; Khotyaintsev, Yuri; Lapenta, Giovanni; Markidis, Stefano

    2014-05-01

    We model the measurements of artificial spacecraft that resemble the configuration of CLUSTER propagating through a particle-in-cell simulation of turbulent magnetic reconnection. The simulation domain contains multiple isolated X-type null-points, but the majority are O-type null-points. The simulations show that current pinches surrounded by twisted fields, analogous to laboratory pinches, are formed along the sequences of O-type nulls. In the simulation, the magnetic reconnection is mainly driven by the kinking of the pinches, at spatial scales of several ion inertial lengths. We compute the locations of magnetic null-points and detect their type. When the satellites are separated by fractions of an ion inertial length, as is the case for CLUSTER, they are able to locate both the isolated null-points and the pinches. We apply the method to real CLUSTER data and speculate on how common pinches are in the magnetosphere, and whether they play a dominant role in the dissipation of magnetic energy.

  9. Fermi detection of a luminous γ-ray pulsar in a globular cluster.

    Science.gov (United States)

    2011-11-25

    We report on the Fermi Large Area Telescope's detection of γ-ray (>100 mega-electron volts) pulsations from pulsar J1823-3021A in the globular cluster NGC 6624 with high significance (~7σ). Its γ-ray luminosity, L(γ) = (8.4 ± 1.6) × 10³⁴ ergs per second, is the highest observed for any millisecond pulsar (MSP) to date, and it accounts for most of the cluster emission. The nondetection of the cluster in the off-pulse phase implies that it contains <32 γ-ray MSPs, not ~100 as previously estimated. The γ-ray luminosity indicates that the unusually large rate of change of its period is caused by its intrinsic spin-down. This implies that J1823-3021A has the largest magnetic field and is the youngest MSP ever detected and that such anomalous objects might be forming at rates comparable to those of the more normal MSPs.

  10. Fast optimization of binary clusters using a novel dynamic lattice searching method

    International Nuclear Information System (INIS)

    Wu, Xia; Cheng, Wen

    2014-01-01

    Global optimization of binary clusters has been a difficult task despite much effort and many efficient methods. To address the presence of two types of elements in binary clusters (i.e., the homotop problem), two classes of virtual dynamic lattices are constructed and a modified dynamic lattice searching (DLS) method, i.e., the binary DLS (BDLS) method, is developed. However, it was found that the BDLS can only be utilized for the optimization of binary clusters of small sizes, because the homotop problem is hard to solve without an atomic exchange operation. Therefore, the iterated local search (ILS) method is adopted to solve the homotop problem, and an efficient method based on the BDLS method and ILS, named BDLS-ILS, is presented for global optimization of binary clusters. In order to assess the efficiency of the proposed method, binary Lennard-Jones clusters with up to 100 atoms are investigated. Results show that the method is efficient. Furthermore, the BDLS-ILS method is also adopted to study the geometrical structures of (AuPd)₇₉ clusters with DFT-fitted parameters of the Gupta potential

  11. Onto-clust--a methodology for combining clustering analysis and ontological methods for identifying groups of comorbidities for developmental disorders.

    Science.gov (United States)

    Peleg, Mor; Asbeh, Nuaman; Kuflik, Tsvi; Schertz, Mitchell

    2009-02-01

    Children with developmental disorders usually exhibit multiple developmental problems (comorbidities). Hence, diagnosis needs to revolve around groups of developmental disorders. Our objective is to systematically identify developmental disorder groups and represent them in an ontology. We developed a methodology that combines two methods: (1) a literature-based ontology that we created, which represents developmental disorders and potential developmental disorder groups, and (2) clustering for detecting comorbid developmental disorders in patient data. The ontology is used to interpret and improve the clustering results, and the clustering results are used to validate the ontology and suggest directions for its development. We evaluated our methodology by applying it to data from 1175 patients of a child development clinic. We demonstrated that the ontology improves the clustering results, bringing them closer to an expert-generated gold standard. We have shown that our methodology successfully combines an ontology with a clustering method to support the systematic identification and representation of developmental disorder groups.

  12. Influence of the input database in detecting fire space-time clusters

    Science.gov (United States)

    Pereira, Mário; Costa, Ricardo; Tonini, Marj; Vega Orozco, Carmen; Parente, Joana

    2015-04-01

    Fire incidence variability is influenced by local environmental variables such as topography, land use, vegetation and weather conditions. These induce a clustered pattern in the distribution of fire events. The space-time permutation scan statistics (STPSS) method developed by Kulldorff et al. (2005) and implemented in the SaTScanTM software (http://www.satscan.org/) has proved able to detect space-time clusters in many different fields, even when using incomplete and/or inaccurate input data. Nevertheless, the dependence of the STPSS method on the characteristics of different datasets describing the same environmental phenomenon has not been studied yet. In this sense, the objective of this study is to assess the robustness of the STPSS for detecting real clusters using different input datasets and to explain the obtained results. This study takes advantage of the existence of two very different official fire datasets currently available for Portugal, both provided by the Institute for the Conservation of Nature and Forests. The first one is the aggregated Portuguese Rural Fire Database (PRFD) (Pereira et al., 2011), which is based on ground measurements and provides detailed information about the ignition and extinction date/time and the area burnt by each fire in forest, scrub and agricultural areas. However, in the PRFD, the location of each fire is indicated by the name of the smallest administrative unit (the parish) where the ignition occurred. Consequently, since the application of the STPSS requires the geographic coordinates of the events, the centroid of each parish was considered. The second fire dataset is the national mapping of burnt areas (NMBA), which is based on satellite measurements and delivered in shapefile format. The NMBA provides detailed spatial information (the shape and size of each fire) but its temporal information is restricted to the year of occurrence. Besides these differences, the two datasets cover different periods, they

  13. Quantifying clutter: A comparison of four methods and their relationship to bat detection

    Science.gov (United States)

    Joy M. O’Keefe; Susan C. Loeb; Hoke S. Hill Jr.; J. Drew Lanham

    2014-01-01

    The degree of spatial complexity in the environment, or clutter, affects the quality of foraging habitats for bats and their detection with acoustic systems. Clutter has been assessed in a variety of ways but there are no standardized methods for measuring clutter. We compared four methods (Visual Clutter, Cluster, Single Variable, and Clutter Index) and related these...

  14. TRUSTWORTHY OPTIMIZED CLUSTERING BASED TARGET DETECTION AND TRACKING FOR WIRELESS SENSOR NETWORK

    Directory of Open Access Journals (Sweden)

    C. Jehan

    2016-06-01

    Full Text Available In this paper, an efficient approach is proposed to address the problem of target tracking in a wireless sensor network (WSN). The problem tackled here uses an adaptive dynamic clustering scheme for tracking the target; it is a specific problem in object tracking. The proposed adaptive dynamic clustering target tracking scheme uses three steps for target tracking. The first step deals with the identification of clusters and cluster heads using OGSAFCM. Here, kernel fuzzy c-means (KFCM) and the gravitational search algorithm (GSA) are combined to create clusters. At first, the oppositional gravitational search algorithm (OGSA) is used to optimize the initial clustering centers, and then the KFCM algorithm is used to guide the classification and the cluster formation process. In the OGSA, opposition-based population initialization is introduced into the basic GSA to improve the convergence profile. The identified clusters are changed dynamically. The second step deals with the data transmission to the cluster heads. The third step deals with the transmission of aggregated data to the base station as well as the detection of the target. The experimental results show that the proposed scheme identifies the target effectively and efficiently. As a result, the tracking error is minimized.

  15. Blood detection in wireless capsule endoscopy using expectation maximization clustering

    Science.gov (United States)

    Hwang, Sae; Oh, JungHwan; Cox, Jay; Tang, Shou Jiang; Tibbals, Harry F.

    2006-03-01

    Wireless Capsule Endoscopy (WCE) is a relatively new technology (FDA approved in 2002) allowing doctors to view most of the small intestine. Other endoscopies such as colonoscopy, upper gastrointestinal endoscopy, push enteroscopy, and intraoperative enteroscopy can be used to visualize the stomach, duodenum, colon, and terminal ileum, but there existed no method to view most of the small intestine without surgery. With the miniaturization of wireless and camera technologies came the ability to view the entire gastrointestinal tract with little effort. A tiny disposable video capsule is swallowed, transmitting two images per second to a small data receiver worn by the patient on a belt. Over a course of approximately 8 hours, more than 55,000 images are recorded to the worn device and then downloaded to a computer for later examination. Typically, a medical clinician spends more than two hours analyzing a WCE video. Research has attempted to automatically find abnormal regions (especially bleeding) to reduce the time needed to analyze the videos. The manufacturers also provide a software tool to detect bleeding, called the Suspected Blood Indicator (SBI), but its accuracy is not high enough to replace human examination. It was reported that the sensitivity and the specificity of SBI were about 72% and 85%, respectively. To address this problem, we propose a technique to detect the bleeding regions automatically utilizing the Expectation Maximization (EM) clustering algorithm. Our experimental results indicate that the proposed bleeding detection method achieves 92% sensitivity and 98% specificity, respectively.
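    EM clustering of pixel colours is readily sketched with a Gaussian mixture model. In the Python sketch below, the frame, the number of mixture components and the rule of flagging the most red-dominant component as suspected blood are illustrative assumptions, not the authors' exact pipeline.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    frame = np.random.rand(128, 128, 3)              # stand-in for an RGB capsule image
    pixels = frame.reshape(-1, 3)

    # EM clustering of pixel colours into a small number of Gaussian components
    gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0)
    labels = gmm.fit_predict(pixels)

    # flag the component whose mean colour is most red relative to green
    redness = gmm.means_[:, 0] - gmm.means_[:, 1]
    blood_mask = (labels == np.argmax(redness)).reshape(frame.shape[:2])
    print(blood_mask.sum(), "candidate bleeding pixels")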

  16. Short-Term Wind Power Forecasting Based on Clustering Pre-Calculated CFD Method

    Directory of Open Access Journals (Sweden)

    Yimei Wang

    2018-04-01

    Full Text Available To meet the increasing wind power forecasting (WPF) demands of newly built wind farms without historical data, physical WPF methods are widely used. WPF based on computational fluid dynamics (CFD) pre-calculated flow fields (CPFF) is a promising physical approach that can balance well the competing demands of computational efficiency and accuracy. To enhance its adaptability for wind farms in complex terrain, a WPF method combining wind turbine clustering with CPFF is first proposed, in which the wind turbines in the wind farm are clustered and a forecast is made for each cluster. K-means, hierarchical agglomerative and spectral clustering methods are used to establish the wind turbine clustering models. The Silhouette Coefficient, the Calinski-Harabasz index and the within-between index are proposed as criteria to evaluate the effectiveness of the established clustering models. Based on different clustering methods and schemes, various clustering databases are built for clustering pre-calculated CFD (CPCC)-based short-term WPF. For the wind farm case studied, the clustering evaluation criteria show that hierarchical agglomerative clustering gives reasonable results, spectral clustering is better and K-means gives the best performance. The WPF results produced by the different clustering databases also prove the effectiveness of the three evaluation criteria in turn. The newly developed CPCC model has a much higher WPF accuracy than the CPFF model without clustering techniques, on both temporal and spatial scales. The research provides support for both the development and the improvement of short-term physical WPF systems.
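    The model-selection step, comparing clustering algorithms with internal validity indices, can be reproduced with scikit-learn as sketched below. The turbine feature matrix and the cluster count are illustrative assumptions, and the within-between index is omitted because it is not part of scikit-learn.

    import numpy as np
    from sklearn.cluster import KMeans, AgglomerativeClustering, SpectralClustering
    from sklearn.metrics import silhouette_score, calinski_harabasz_score

    turbines = np.random.rand(40, 3)    # stand-in descriptors, e.g. coordinates and hub height

    models = {
        "k-means": KMeans(n_clusters=4, n_init=10, random_state=0),
        "hierarchical": AgglomerativeClustering(n_clusters=4),
        "spectral": SpectralClustering(n_clusters=4, random_state=0),
    }
    for name, model in models.items():
        labels = model.fit_predict(turbines)
        print(name,
              round(silhouette_score(turbines, labels), 3),
              round(calinski_harabasz_score(turbines, labels), 1))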

  17. Robust Pseudo-Hierarchical Support Vector Clustering

    DEFF Research Database (Denmark)

    Hansen, Michael Sass; Sjöstrand, Karl; Olafsdóttir, Hildur

    2007-01-01

    Support vector clustering (SVC) has proven an efficient algorithm for clustering of noisy and high-dimensional data sets, with applications within many fields of research. An inherent problem, however, has been setting the parameters of the SVC algorithm. Using the recent emergence of a method...... for calculating the entire regularization path of the support vector domain description, we propose a fast method for robust pseudo-hierarchical support vector clustering (HSVC). The method is demonstrated to work well on generated data, as well as for detecting ischemic segments from multidimensional myocardial...

  18. An Optimized Clustering Approach for Automated Detection of White Matter Lesions in MRI Brain Images

    Directory of Open Access Journals (Sweden)

    M. Anitha

    2012-04-01

    Full Text Available White Matter lesions (WMLs) are small areas of dead cells found in parts of the brain. In general, it is difficult for medical experts to accurately quantify WMLs due to the decreased contrast between White Matter (WM) and Grey Matter (GM). The aim of this paper is to automatically detect the White Matter Lesions present in the brains of elderly people. The WML detection process includes the following stages: 1. image preprocessing, 2. clustering (fuzzy c-means (FCM) clustering, geostatistical possibilistic clustering (GPC) and geostatistical fuzzy clustering (GFCM)), and 3. optimization using Particle Swarm Optimization (PSO). The proposed system is tested on a database of 208 MRI images. GFCM yields a high sensitivity of 89%, specificity of 94% and overall accuracy of 93%, outperforming FCM and GPC. The clustered brain images are then subjected to Particle Swarm Optimization (PSO). The optimized result obtained from GFCM-PSO provides a sensitivity of 90%, specificity of 94% and accuracy of 95%. The detection results reveal that GFCM and GFCM-PSO better localize the large regions of lesions and give a lower false positive rate when compared to GPC and GPC-PSO, which capture the largest loads of WMLs only in the upper ventral horns of the brain.
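    For reference, a compact NumPy sketch of standard fuzzy c-means, the first clustering stage named in the abstract; the geostatistical variants and the PSO step are not reproduced, and the toy intensity data and parameter values are assumptions.

```python
# A compact NumPy sketch of fuzzy c-means (FCM). Illustrative only.
import numpy as np

def fuzzy_c_means(X, c=3, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Return (centers, memberships) for data X of shape (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)          # fuzzy memberships sum to 1 per sample
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        new_U = 1.0 / (dist ** (2.0 / (m - 1.0)))
        new_U /= new_U.sum(axis=1, keepdims=True)
        if np.abs(new_U - U).max() < tol:
            U = new_U
            break
        U = new_U
    return centers, U

# Toy example: 1-D intensities loosely mimicking WM, GM and lesion-like voxels.
X = np.concatenate([np.random.normal(mu, 5, 200) for mu in (60, 100, 160)]).reshape(-1, 1)
centers, U = fuzzy_c_means(X, c=3)
print("cluster centers:", np.sort(centers.ravel()))
```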

  19. Comparative study of methods on outlying data detection in experimental results

    International Nuclear Information System (INIS)

    Oliveira, P.M.S.; Munita, C.S.; Hazenfratz, R.

    2009-01-01

    The interpretation of experimental results through multivariate statistical methods may reveal the existence of outliers, which is rarely taken into account by analysts. However, their presence can influence the interpretation of the results, generating false conclusions. This paper shows the importance of outlier determination for a database of 89 samples of ceramic fragments analyzed by neutron activation analysis. The results were submitted to five procedures to detect outliers: Mahalanobis distance, cluster analysis, principal component analysis, factor analysis, and standardized residuals. The results showed that although cluster analysis is one of the procedures most used to identify outliers, it can fail by not revealing samples that are easily identified as outliers by other methods. In general, the statistical procedures for the identification of outliers are little known by analysts. (author)
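    A hedged sketch of one of the listed procedures, the Mahalanobis distance: samples whose squared distance exceeds a chi-square cutoff are flagged as candidate outliers. The cutoff level and the simulated composition data are assumptions for illustration.

```python
# Mahalanobis-distance outlier screening on simulated two-variable data.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
X = rng.multivariate_normal(mean=[10, 50], cov=[[4, 1], [1, 9]], size=86)
X = np.vstack([X, [[25, 80], [30, 10], [2, 95]]])   # three artificial outliers

mean = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
d2 = np.einsum("ij,jk,ik->i", X - mean, cov_inv, X - mean)  # squared Mahalanobis distance

cutoff = chi2.ppf(0.975, df=X.shape[1])
print("flagged sample indices:", np.where(d2 > cutoff)[0])
```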

  20. A two-stage method for microcalcification cluster segmentation in mammography by deformable models

    International Nuclear Information System (INIS)

    Arikidis, N.; Kazantzi, A.; Skiadopoulos, S.; Karahaliou, A.; Costaridou, L.; Vassiou, K.

    2015-01-01

    Purpose: Segmentation of microcalcification (MC) clusters in x-ray mammography is a difficult task for radiologists. Accurate segmentation is a prerequisite for quantitative image analysis of MC clusters and subsequent feature extraction and classification in computer-aided diagnosis schemes. Methods: In this study, a two-stage semiautomated segmentation method of MC clusters is investigated. The first stage is targeted to accurate and time-efficient segmentation of the majority of the particles of an MC cluster, by means of a level set method. The second stage is targeted to shape refinement of selected individual MCs, by means of an active contour model. Both methods are applied in the framework of a rich scale-space representation, provided by the wavelet transform at integer scales. Segmentation reliability of the proposed method in terms of inter- and intraobserver agreements was evaluated in a case sample of 80 MC clusters originating from the Digital Database for Screening Mammography, corresponding to 4 morphology types (punctate: 22, fine linear branching: 16, pleomorphic: 18, and amorphous: 24) of MC clusters, assessing radiologists' segmentations quantitatively by two distance metrics (Hausdorff distance—HDIST_cluster, average of minimum distance—AMINDIST_cluster) and the area overlap measure (AOM_cluster). The effect of the proposed segmentation method on MC cluster characterization accuracy was evaluated in a case sample of 162 pleomorphic MC clusters (72 malignant and 90 benign). Ten MC cluster features, targeted to capture morphologic properties of individual MCs in a cluster (area, major length, perimeter, compactness, and spread), were extracted and a correlation-based feature selection method yielded a feature subset to feed into a support vector machine classifier. Classification performance of the MC cluster features was estimated by means of the area under the receiver operating characteristic curve (Az ± standard error) utilizing tenfold cross validation.

  1. A comparison of heuristic and model-based clustering methods for dietary pattern analysis.

    Science.gov (United States)

    Greve, Benjamin; Pigeot, Iris; Huybrechts, Inge; Pala, Valeria; Börnhorst, Claudia

    2016-02-01

    Cluster analysis is widely applied to identify dietary patterns. A new method based on Gaussian mixture models (GMM) seems to be more flexible compared with the commonly applied k-means and Ward's method. In the present paper, these clustering approaches are compared to find the most appropriate one for clustering dietary data. The clustering methods were applied to simulated data sets with different cluster structures to compare their performance knowing the true cluster membership of observations. Furthermore, the three methods were applied to FFQ data assessed in 1791 children participating in the IDEFICS (Identification and Prevention of Dietary- and Lifestyle-Induced Health Effects in Children and Infants) Study to explore their performance in practice. The GMM outperformed the other methods in the simulation study in 72% to 100% of cases, depending on the simulated cluster structure. Comparing the computationally less complex k-means and Ward's methods, the performance of k-means was better in 64-100% of cases. Applied to real data, all methods identified three similar dietary patterns which may be roughly characterized as a 'non-processed' cluster with a high consumption of fruits, vegetables and wholemeal bread, a 'balanced' cluster with only slight preferences for single foods and a 'junk food' cluster. The simulation study suggests that clustering via GMM should be preferred due to its higher flexibility regarding cluster volume, shape and orientation. k-means seems to be a good alternative, being easier to use while giving similar results when applied to real data.
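    An illustrative comparison of the three approaches on simulated data with elongated, correlated clusters; the simulation design is an assumption for demonstration, not the paper's protocol, and Ward's method is run via scikit-learn's agglomerative clustering.

```python
# Compare GMM, k-means and Ward clustering against known labels (adjusted Rand index).
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
true = np.repeat([0, 1, 2], 200)
means = np.array([[0, 0], [4, 0], [2, 5]])
cov = np.array([[3.0, 2.5], [2.5, 3.0]])          # elongated, correlated clusters
X = np.vstack([rng.multivariate_normal(means[k], cov, 200) for k in range(3)])

models = {
    "GMM":     GaussianMixture(n_components=3, covariance_type="full", random_state=0),
    "k-means": KMeans(n_clusters=3, n_init=10, random_state=0),
    "Ward":    AgglomerativeClustering(n_clusters=3, linkage="ward"),
}
for name, model in models.items():
    labels = model.fit_predict(X)
    print(f"{name:8s} adjusted Rand index = {adjusted_rand_score(true, labels):.3f}")
```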

  2. Synthesis of colloidal silver nanoparticle clusters and their application in ascorbic acid detection by SERS.

    Science.gov (United States)

    Cholula-Díaz, Jorge L; Lomelí-Marroquín, Diana; Pramanick, Bidhan; Nieto-Argüello, Alfonso; Cantú-Castillo, Luis A; Hwang, Hyundoo

    2018-03-01

    Ascorbic acid (vitamin C) has an essential role in the human body mainly due to its antioxidant function. In this work, metallic silver nanoparticle (AgNP) colloids were used in SERS experiments to detect ascorbic acid in aqueous solution. The AgNPs were synthesized by a green method using potato starch as reducing and stabilizing agent, and water as the solvent. The optical properties of the yellowish as-synthesized silver colloids were characterized by UV-vis spectroscopy, in which, besides a typical band at 410 nm related to the localized surface plasmon resonance of the silver nanoparticles, a shoulder band around 500 nm, due to silver nanoparticle cluster formation, appears when relatively high concentrations of starch are used in the synthesis. These starch-capped silver nanoparticles show an intrinsic Raman peak at 1386 cm^-1 assigned to deformation modes of the starch structure. The increase in the intensity of the SERS peak at 1386 cm^-1 with increasing ascorbic acid concentration is related to a decrease of the gap between dimers and trimers of the silver nanoparticle clusters produced by the presence of ascorbic acid in the colloid. The limit of detection of this technique for ascorbic acid is 0.02 mM with a measurement concentration range of 0.02-10 mM, which is relevant for the application of this method to detecting ascorbic acid in biological specimens. Copyright © 2017 Elsevier B.V. All rights reserved.

  3. Relevant Subspace Clustering

    DEFF Research Database (Denmark)

    Müller, Emmanuel; Assent, Ira; Günnemann, Stephan

    2009-01-01

    Subspace clustering aims at detecting clusters in any subspace projection of a high dimensional space. As the number of possible subspace projections is exponential in the number of dimensions, the result is often tremendously large. Recent approaches fail to reduce results to relevant subspace...... clusters. Their results are typically highly redundant, i.e. many clusters are detected multiple times in several projections. In this work, we propose a novel model for relevant subspace clustering (RESCU). We present a global optimization which detects the most interesting non-redundant subspace clusters...... achieves top clustering quality while competing approaches show greatly varying performance....

  4. Advanced cluster methods for correlated-electron systems

    Energy Technology Data Exchange (ETDEWEB)

    Fischer, Andre

    2015-04-27

    In this thesis, quantum cluster methods are used to calculate electronic properties of correlated-electron systems. A special focus lies in the determination of the ground state properties of a 3/4 filled triangular lattice within the one-band Hubbard model. At this filling, the electronic density of states exhibits a so-called van Hove singularity and the Fermi surface becomes perfectly nested, causing an instability towards a variety of spin-density-wave (SDW) and superconducting states. While chiral d+id-wave superconductivity has been proposed as the ground state in the weak coupling limit, the situation towards strong interactions is unclear. Additionally, quantum cluster methods are used here to investigate the interplay of Coulomb interactions and symmetry-breaking mechanisms within the nematic phase of iron-pnictide superconductors. The transition from a tetragonal to an orthorhombic phase is accompanied by a significant change in electronic properties, while long-range magnetic order is not established yet. The driving force of this transition may not only be phonons but also magnetic or orbital fluctuations. The signatures of these scenarios are studied with quantum cluster methods to identify the most important effects. Here, cluster perturbation theory (CPT) and its variational extension, the variational cluster approach (VCA), are used to treat the respective systems on a level beyond mean-field theory. Short-range correlations are incorporated numerically exactly by exact diagonalization (ED). In the VCA, long-range interactions are included by variational optimization of a fictitious symmetry-breaking field based on a self-energy functional approach. Due to limitations of ED, cluster sizes are limited to a small number of degrees of freedom. For the 3/4 filled triangular lattice, the VCA is performed for different cluster symmetries. A strong symmetry dependence and finite-size effects make a comparison of the results from different clusters difficult

  5. AutoSOME: a clustering method for identifying gene expression modules without prior knowledge of cluster number

    Directory of Open Access Journals (Sweden)

    Cooper James B

    2010-03-01

    Full Text Available Abstract Background Clustering the information content of large high-dimensional gene expression datasets has widespread application in "omics" biology. Unfortunately, the underlying structure of these natural datasets is often fuzzy, and the computational identification of data clusters generally requires knowledge about cluster number and geometry. Results We integrated strategies from machine learning, cartography, and graph theory into a new informatics method for automatically clustering self-organizing map ensembles of high-dimensional data. Our new method, called AutoSOME, readily identifies discrete and fuzzy data clusters without prior knowledge of cluster number or structure in diverse datasets including whole genome microarray data. Visualization of AutoSOME output using network diagrams and differential heat maps reveals unexpected variation among well-characterized cancer cell lines. Co-expression analysis of data from human embryonic and induced pluripotent stem cells using AutoSOME identifies >3400 up-regulated genes associated with pluripotency, and indicates that a recently identified protein-protein interaction network characterizing pluripotency was underestimated by a factor of four. Conclusions By effectively extracting important information from high-dimensional microarray data without prior knowledge or the need for data filtration, AutoSOME can yield systems-level insights from whole genome microarray expression studies. Due to its generality, this new method should also have practical utility for a variety of data-intensive applications, including the results of deep sequencing experiments. AutoSOME is available for download at http://jimcooperlab.mcdb.ucsb.edu/autosome.

  6. Detection of a Double Relic in the Torpedo Cluster: SPT-CL J0245-5302

    Science.gov (United States)

    Zheng, Q.; Johnston-Hollitt, M.; Duchesne, S. W.; Li, W. T.

    2018-06-01

    The Torpedo cluster, SPT-CL J0245-5302 (S0295), is a massive, merging cluster at a redshift of z = 0.300, which exhibits a strikingly similar morphology to the Bullet cluster 1E 0657-55.8 (z = 0.296), including a classic bow shock in the cluster's intra-cluster medium revealed by Chandra X-ray observations. We present Australia Telescope Compact Array data centred at 2.1 GHz and Murchison Widefield Array data at frequencies between 72 MHz and 231 MHz which we use to study the properties of the cluster. We characterise a number of discrete and diffuse radio sources in the cluster, including the detection of two previously unknown radio relics on the cluster periphery. The average spectral index of the diffuse emission between 70 MHz and 3.1 GHz is α = -1.63 ± 0.10, and a radio-derived Mach number for the shock in the west of the cluster is calculated as M = 2.04. The Torpedo cluster is thus a double relic system at moderate redshift.

  7. Scalable Density-Based Subspace Clustering

    DEFF Research Database (Denmark)

    Müller, Emmanuel; Assent, Ira; Günnemann, Stephan

    2011-01-01

    For knowledge discovery in high dimensional databases, subspace clustering detects clusters in arbitrary subspace projections. Scalability is a crucial issue, as the number of possible projections is exponential in the number of dimensions. We propose a scalable density-based subspace clustering...... method that steers mining to few selected subspace clusters. Our novel steering technique reduces subspace processing by identifying and clustering promising subspaces and their combinations directly. Thereby, it narrows down the search space while maintaining accuracy. Thorough experiments on real...... and synthetic databases show that steering is efficient and scalable, with high quality results. For future work, our steering paradigm for density-based subspace clustering opens research potential for speeding up other subspace clustering approaches as well....

  8. ICARES: a real-time automated detection tool for clusters of infectious diseases in the Netherlands.

    NARCIS (Netherlands)

    Groeneveld, Geert H; Dalhuijsen, Anton; Kara-Zaïtri, Chakib; Hamilton, Bob; de Waal, Margot W; van Dissel, Jaap T; van Steenbergen, Jim E

    2017-01-01

    Clusters of infectious diseases are frequently detected late. Real-time, detailed information about an evolving cluster and possible associated conditions is essential for local policy makers, travelers planning to visit the area, and the local population. This is currently illustrated in the Zika

  9. Prioritizing the risk of plant pests by clustering methods; self-organising maps, k-means and hierarchical clustering

    Directory of Open Access Journals (Sweden)

    Susan Worner

    2013-09-01

    Full Text Available For greater preparedness, pest risk assessors are required to prioritise long lists of pest species with potential to establish and cause significant impact in an endangered area. Such prioritization is often qualitative, subjective, and sometimes biased, relying mostly on expert and stakeholder consultation. In recent years, cluster-based analyses have been used to investigate regional pest species assemblages or pest profiles to indicate the risk of new organism establishment. Such an approach is based on the premise that the co-occurrence of well-known global invasive pest species in a region is not random, and that the pest species profile or assemblage integrates complex functional relationships that are difficult to tease apart. In other words, the assemblage can help identify and prioritise species that pose a threat in a target region. A computational intelligence method called a Kohonen self-organizing map (SOM), a type of artificial neural network, was the first clustering method applied to analyse assemblages of invasive pests. The SOM is a well-known dimension reduction and visualization method especially useful for high-dimensional data that more conventional clustering methods may not analyse suitably. Like all clustering algorithms, the SOM can give details of clusters that identify regions with similar pest assemblages, possible donor and recipient regions. More importantly, however, the SOM connection weights that result from the analysis can be used to rank the strength of association of each species within each regional assemblage. Species with high weights that are not already established in the target region are identified as high risk. However, the SOM analysis is only the first step in a process to assess risk to be used alongside or incorporated within other measures. Here we illustrate the application of SOM analyses in a range of contexts in invasive species risk assessment, and discuss other clustering methods such as k-means and hierarchical clustering.

  10. A NEW METHOD TO QUANTIFY X-RAY SUBSTRUCTURES IN CLUSTERS OF GALAXIES

    Energy Technology Data Exchange (ETDEWEB)

    Andrade-Santos, Felipe; Lima Neto, Gastao B.; Lagana, Tatiana F. [Departamento de Astronomia, Instituto de Astronomia, Geofisica e Ciencias Atmosfericas, Universidade de Sao Paulo, Geofisica e Ciencias Atmosfericas, Rua do Matao 1226, Cidade Universitaria, 05508-090 Sao Paulo, SP (Brazil)

    2012-02-20

    We present a new method to quantify substructures in clusters of galaxies, based on the analysis of the intensity of structures. This analysis is done in a residual image that is the result of the subtraction of a surface brightness model, obtained by fitting a two-dimensional analytical model (β-model or Sérsic profile) with elliptical symmetry, from the X-ray image. Our method is applied to 34 clusters observed by the Chandra Space Telescope that are in the redshift range z in [0.02, 0.2] and have a signal-to-noise ratio (S/N) greater than 100. We present the calibration of the method and the relations between the substructure level and physical quantities, such as the mass, X-ray luminosity, temperature, and cluster redshift. We use our method to separate the clusters into two sub-samples of high and low substructure levels. We conclude, using Monte Carlo simulations, that the method recovers the true amount of substructure very well for clusters with small angular core radii (with respect to the whole image size) and good S/N observations. We find no evidence of correlation between the substructure level and physical properties of the clusters such as gas temperature, X-ray luminosity, and redshift; however, the analysis suggests a trend between the substructure level and cluster mass. The scaling relations for the two sub-samples (high- and low-substructure-level clusters) are different (they present an offset, i.e., given a fixed mass or temperature, low-substructure clusters tend to be more X-ray luminous), which is an important result for cosmological tests using the mass-luminosity relation to obtain the cluster mass function, since they rely on the assumption that clusters do not present different scaling relations according to their dynamical state.

  11. Sensitivity evaluation of dynamic speckle activity measurements using clustering methods

    International Nuclear Information System (INIS)

    Etchepareborda, Pablo; Federico, Alejandro; Kaufmann, Guillermo H.

    2010-01-01

    We evaluate and compare the use of competitive neural networks, self-organizing maps, the expectation-maximization algorithm, K-means, and fuzzy C-means techniques as partitional clustering methods, when the sensitivity of the activity measurement of dynamic speckle images needs to be improved. The temporal history of the acquired intensity generated by each pixel is analyzed in a wavelet decomposition framework, and it is shown that the mean energy of its corresponding wavelet coefficients provides a suitable feature space for clustering purposes. The sensitivity obtained by using the evaluated clustering techniques is also compared with the well-known methods of Konishi-Fujii, weighted generalized differences, and wavelet entropy. The performance of the partitional clustering approach is evaluated using simulated dynamic speckle patterns and also experimental data.
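    A sketch of the feature construction described above: for each pixel's temporal intensity history, compute the mean energy of its wavelet detail coefficients and cluster the resulting feature vectors. The wavelet choice, decomposition level and synthetic data are assumptions made for illustration, and PyWavelets is assumed to be available.

```python
# Wavelet mean-energy features per pixel history, then partitional clustering.
import numpy as np
import pywt
from sklearn.cluster import KMeans

def wavelet_energy_features(signals, wavelet="db4", level=4):
    """signals: (n_pixels, n_frames) temporal intensity histories."""
    feats = []
    for s in signals:
        coeffs = pywt.wavedec(s, wavelet, level=level)
        feats.append([np.mean(c ** 2) for c in coeffs[1:]])  # mean energy per detail level
    return np.asarray(feats)

rng = np.random.default_rng(3)
# Synthetic speckle histories: "active" pixels fluctuate faster than "inactive" ones.
active = rng.normal(0, 1.0, size=(100, 256))
inactive = np.cumsum(rng.normal(0, 0.05, size=(100, 256)), axis=1)
signals = np.vstack([active, inactive])

features = wavelet_energy_features(signals)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
print("cluster sizes:", np.bincount(labels))
```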

  12. Efficient image duplicated region detection model using sequential block clustering

    Czech Academy of Sciences Publication Activity Database

    Sekeh, M. A.; Maarof, M. A.; Rohani, M. F.; Mahdian, Babak

    2013-01-01

    Roč. 10, č. 1 (2013), s. 73-84 ISSN 1742-2876 Institutional support: RVO:67985556 Keywords : Image forensic * Copy–paste forgery * Local block matching Subject RIV: IN - Informatics, Computer Science Impact factor: 0.986, year: 2013 http://library.utia.cas.cz/separaty/2013/ZOI/mahdian-efficient image duplicated region detection model using sequential block clustering.pdf

  13. Clustering and Candidate Motif Detection in Exosomal miRNAs by Application of Machine Learning Algorithms.

    Science.gov (United States)

    Gaur, Pallavi; Chaturvedi, Anoop

    2017-07-22

    The clustering pattern and motifs give immense information about any biological data. An application of machine learning algorithms for clustering and candidate motif detection in miRNAs derived from exosomes is depicted in this paper. Recent progress in exosome research, and more particularly regarding exosomal miRNAs, has spurred a large amount of bioinformatics-based work. The information on clustering patterns and candidate motifs in miRNAs of exosomal origin would help in analyzing existing as well as newly discovered miRNAs within exosomes. Along with obtaining clustering patterns and candidate motifs in exosomal miRNAs, this work also illustrates the usefulness of machine learning algorithms that can be efficiently used and executed on various programming languages/platforms. Data were clustered and sequence candidate motifs were detected successfully. The results were compared and validated with available web tools such as 'BLASTN' and the 'MEME suite'. The machine learning algorithms for the aforementioned objectives were applied successfully. This work elaborates the utility of machine learning algorithms and language platforms for achieving the tasks of clustering and candidate motif detection in exosomal miRNAs. With this information, deeper insight can be gained in analyses of newly discovered miRNAs in exosomes, which are considered to be circulating biomarkers. In addition, the execution of machine learning algorithms on various language platforms gives users more flexibility to try multiple iterations according to their requirements. This approach can be applied to other biological data-mining tasks as well.

  14. An Extended Affinity Propagation Clustering Method Based on Different Data Density Types

    Directory of Open Access Journals (Sweden)

    XiuLi Zhao

    2015-01-01

    Full Text Available Affinity propagation (AP), as a novel clustering method, does not require users to specify initial cluster centers in advance; it regards all data points equally as potential exemplars (cluster centers) and forms clusters solely from the degree of similarity among the data points. In many cases, however, regions of different density exist within the same data set, meaning that the data are not distributed homogeneously. In such situations the AP algorithm cannot group the data points into ideal clusters. In this paper, we propose an extended AP clustering algorithm to deal with this problem. There are two steps in our method: first, the data set is partitioned into several data density types according to the nearest distances of each data point; then AP clustering is applied separately within each density type to group the data points into clusters. Two experiments are carried out to evaluate the performance of our algorithm: one utilizes an artificial data set and the other uses a real seismic data set. The experimental results show that groups are obtained more accurately by our algorithm than by OPTICS and the AP clustering algorithm itself.
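    A hedged sketch of the two-step idea described above: first split the data into density types using each point's nearest-neighbour distance, then run affinity propagation separately within each type. The split rule (median threshold) and the synthetic data are assumptions for illustration, not the paper's exact partitioning scheme.

```python
# Density-type partition followed by per-partition affinity propagation.
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(4)
dense = rng.normal([0, 0], 0.2, size=(150, 2))        # dense region
sparse = rng.normal([5, 5], 1.5, size=(60, 2))        # sparse region
X = np.vstack([dense, sparse])

# Nearest-neighbour distance as a simple local-density proxy.
nn_dist = NearestNeighbors(n_neighbors=2).fit(X).kneighbors(X)[0][:, 1]
density_type = (nn_dist > np.median(nn_dist)).astype(int)

for t in np.unique(density_type):
    Xt = X[density_type == t]
    labels = AffinityPropagation(random_state=0).fit_predict(Xt)
    print(f"density type {t}: {len(Xt)} points -> {labels.max() + 1} clusters")
```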

  15. How to detect trap cluster systems?

    International Nuclear Information System (INIS)

    Mandowski, Arkadiusz

    2008-01-01

    Spatially correlated traps and recombination centres (trap-recombination centre pairs and larger clusters) are responsible for many anomalous phenomena that are difficult to explain in the framework of both classical models, i.e. model of localized transitions (LT) and the simple trap model (STM), even with a number of discrete energy levels. However, these 'anomalous' effects may provide a good platform for identifying trap cluster systems. This paper considers selected cluster-type effects, mainly relating to an anomalous dependence of TL on absorbed dose in the system of isolated clusters (ICs). Some consequences for interacting cluster (IAC) systems, involving both localized and delocalized transitions occurring simultaneously, are also discussed

  16. Minimal disease detection of B-cell lymphoproliferative disorders by flow cytometry: multidimensional cluster analysis.

    Science.gov (United States)

    Duque, Ricardo E

    2012-04-01

    Flow cytometric analysis of cell suspensions involves the sequential 'registration' of intrinsic and extrinsic parameters of thousands of cells in list mode files. Thus, it is almost irresistible to describe phenomena in numerical terms or by 'ratios' that have the appearance of 'accuracy' due to the presence of numbers obtained from thousands of cells. The concepts involved in the detection and characterization of B cell lymphoproliferative processes are revisited in this paper by identifying parameters that, when analyzed appropriately, are both necessary and sufficient. The neoplastic process (cluster) can be visualized easily because the parameters that distinguish it form a cluster in multidimensional space that is unique and distinguishable from neighboring clusters that are not of diagnostic interest but serve to provide a background. For B cell neoplasia it is operationally necessary to identify the multidimensional space occupied by a cluster whose kappa:lambda ratio is 100:0 or 0:100. Thus, the concept of kappa:lambda ratio is without meaning and would not detect B cell neoplasia in an unacceptably high number of cases.

  17. Comparison of floods non-stationarity detection methods: an Austrian case study

    Science.gov (United States)

    Salinas, Jose Luis; Viglione, Alberto; Blöschl, Günter

    2016-04-01

    Non-stationarities in flood regimes have a huge impact on any mid- and long-term flood management strategy. In particular, the estimation of design floods is very sensitive to any kind of flood non-stationarity, as design floods should be linked to a return period, a concept that can be ill defined in a non-stationary context. It is therefore crucial, when analyzing existing flood time series, to detect and, where possible, attribute flood non-stationarities to changing hydroclimatic and land-use processes. This work presents the preliminary results of applying different non-stationarity detection methods to annual peak discharge time series from more than 400 gauging stations in Austria. The kinds of non-stationarity analyzed include trends (linear and non-linear), breakpoints, clustering beyond stochastic randomness, and detection of flood-rich/flood-poor periods. Austria presents a large variety of landscapes, elevations and climates that allow us to interpret the spatial patterns obtained with the non-stationarity detection methods in terms of the dominant flood generation mechanisms.
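    A minimal sketch of one such non-stationarity check, a monotonic-trend test on an annual peak discharge series using Kendall's tau from SciPy; the synthetic series and the 5% significance level are illustrative assumptions, not the study's actual procedure.

```python
# Monotonic trend check on an annual flood peak series with Kendall's tau.
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(5)
years = np.arange(1960, 2016)
peaks = rng.gamma(shape=3.0, scale=50.0, size=years.size) + 0.8 * (years - years[0])  # weak upward trend

tau, p_value = kendalltau(years, peaks)
print(f"Kendall tau = {tau:.3f}, p = {p_value:.4f}",
      "-> trend detected" if p_value < 0.05 else "-> no significant trend")
```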

  18. Supported silver clusters as nanoplasmonic transducers for protein sensing

    DEFF Research Database (Denmark)

    Fojan, Peter; Hanif, Muhammad; Bartling, Stephen

    2015-01-01

    Transducers for optical sensing of proteins are prepared using cluster beam deposition on quartz substrates. Surface plasmon resonance phenomenon of the supported silver clusters is used for the detection. It is shown that surface immobilisation procedure providing adhesion of the silver clusters...... stages and protein immobilisation scheme the sensing of protein of interest can be assured using a relatively simple optical spectroscopy method....... an enhancement of the plasmon absorption band used for the detection. Atomic force microscopy study allows to suggest that immobilisation of antibodies on silver clusters has been achieved, thus giving a possibility to incubate and detect an antigen of interest. Hence, by applying the developed preparation...

  19. Swarm: robust and fast clustering method for amplicon-based studies

    Science.gov (United States)

    Rognes, Torbjørn; Quince, Christopher; de Vargas, Colomban; Dunthorn, Micah

    2014-01-01

    Popular de novo amplicon clustering methods suffer from two fundamental flaws: arbitrary global clustering thresholds, and input-order dependency induced by centroid selection. Swarm was developed to address these issues by first clustering nearly identical amplicons iteratively using a local threshold, and then by using clusters’ internal structure and amplicon abundances to refine its results. This fast, scalable, and input-order independent approach reduces the influence of clustering parameters and produces robust operational taxonomic units. PMID:25276506

  20. Swarm: robust and fast clustering method for amplicon-based studies

    Directory of Open Access Journals (Sweden)

    Frédéric Mahé

    2014-09-01

    Full Text Available Popular de novo amplicon clustering methods suffer from two fundamental flaws: arbitrary global clustering thresholds, and input-order dependency induced by centroid selection. Swarm was developed to address these issues by first clustering nearly identical amplicons iteratively using a local threshold, and then by using clusters’ internal structure and amplicon abundances to refine its results. This fast, scalable, and input-order independent approach reduces the influence of clustering parameters and produces robust operational taxonomic units.

  1. Detecting treatment-subgroup interactions in clustered data with generalized linear mixed-effects model trees.

    Science.gov (United States)

    Fokkema, M; Smits, N; Zeileis, A; Hothorn, T; Kelderman, H

    2017-10-25

    Identification of subgroups of patients for whom treatment A is more effective than treatment B, and vice versa, is of key importance to the development of personalized medicine. Tree-based algorithms are helpful tools for the detection of such interactions, but none of the available algorithms allow for taking into account clustered or nested dataset structures, which are particularly common in psychological research. Therefore, we propose the generalized linear mixed-effects model tree (GLMM tree) algorithm, which allows for the detection of treatment-subgroup interactions, while accounting for the clustered structure of a dataset. The algorithm uses model-based recursive partitioning to detect treatment-subgroup interactions, and a GLMM to estimate the random-effects parameters. In a simulation study, GLMM trees show higher accuracy in recovering treatment-subgroup interactions, higher predictive accuracy, and lower type II error rates than linear-model-based recursive partitioning and mixed-effects regression trees. Also, GLMM trees show somewhat higher predictive accuracy than linear mixed-effects models with pre-specified interaction effects, on average. We illustrate the application of GLMM trees on an individual patient-level data meta-analysis on treatments for depression. We conclude that GLMM trees are a promising exploratory tool for the detection of treatment-subgroup interactions in clustered datasets.

  2. An incremental DPMM-based method for trajectory clustering, modeling, and retrieval.

    Science.gov (United States)

    Hu, Weiming; Li, Xi; Tian, Guodong; Maybank, Stephen; Zhang, Zhongfei

    2013-05-01

    Trajectory analysis is the basis for many applications, such as indexing of motion events in videos, activity recognition, and surveillance. In this paper, the Dirichlet process mixture model (DPMM) is applied to trajectory clustering, modeling, and retrieval. We propose an incremental version of a DPMM-based clustering algorithm and apply it to cluster trajectories. An appropriate number of trajectory clusters is determined automatically. When trajectories belonging to new clusters arrive, the new clusters can be identified online and added to the model without any retraining using the previous data. A time-sensitive Dirichlet process mixture model (tDPMM) is applied to each trajectory cluster for learning the trajectory pattern which represents the time-series characteristics of the trajectories in the cluster. Then, a parameterized index is constructed for each cluster. A novel likelihood estimation algorithm for the tDPMM is proposed, and a trajectory-based video retrieval model is developed. The tDPMM-based probabilistic matching method and the DPMM-based model growing method are combined to make the retrieval model scalable and adaptable. Experimental comparisons with state-of-the-art algorithms demonstrate the effectiveness of our algorithm.
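    As a rough illustration of DPMM-style clustering where the effective number of components is inferred from the data, the sketch below uses scikit-learn's truncated Dirichlet-process mixture on simple trajectory summary features (start, end, mean velocity). These features and the truncation level are assumptions; the paper's incremental algorithm and time-sensitive tDPMM are not reproduced.

```python
# Dirichlet-process mixture clustering of trajectory summary features.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(6)

def trajectory_features(traj):
    """traj: (T, 2) array of positions -> fixed-length feature vector."""
    vel = np.diff(traj, axis=0).mean(axis=0)
    return np.concatenate([traj[0], traj[-1], vel])

# Two synthetic motion patterns: left-to-right and bottom-to-top.
trajs = [np.column_stack([np.linspace(0, 10, 20), rng.normal(0, 0.3, 20)]) for _ in range(30)]
trajs += [np.column_stack([rng.normal(5, 0.3, 20), np.linspace(0, 10, 20)]) for _ in range(30)]
X = np.array([trajectory_features(t) for t in trajs])

dpmm = BayesianGaussianMixture(n_components=10,
                               weight_concentration_prior_type="dirichlet_process",
                               weight_concentration_prior=0.1,
                               random_state=0, max_iter=500).fit(X)
labels = dpmm.predict(X)
print("effective clusters used:", len(np.unique(labels)))
```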

  3. Community detection by graph Voronoi diagrams

    Science.gov (United States)

    Deritei, Dávid; Lázár, Zsolt I.; Papp, István; Járai-Szabó, Ferenc; Sumi, Róbert; Varga, Levente; Ravasz Regan, Erzsébet; Ercsey-Ravasz, Mária

    2014-06-01

    Accurate and efficient community detection in networks is a key challenge for complex network theory and its applications. The problem is analogous to cluster analysis in data mining, a field rich in metric space-based methods. Common to these methods is a geometric, distance-based definition of clusters or communities. Here we propose a new geometric approach to graph community detection based on graph Voronoi diagrams. Our method serves as proof of principle that the definition of appropriate distance metrics on graphs can bring a rich set of metric space-based clustering methods to network science. We employ a simple edge metric that reflects the intra- or inter-community character of edges, and a graph density-based rule to identify seed nodes of Voronoi cells. Our algorithm outperforms most network community detection methods applicable to large networks on benchmark as well as real-world networks. In addition to offering a computationally efficient alternative for community detection, our method opens new avenues for adapting a wide range of data mining algorithms to complex networks from the class of centroid- and density-based clustering methods.
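    A proof-of-concept sketch of the graph Voronoi idea: choose seed nodes and assign every node to the seed reachable by the shortest path, yielding Voronoi-cell communities. The seed-selection rule here (highest-degree nodes) and the unit edge lengths are simplifying assumptions; the paper uses a graph density-based seed rule and a dedicated edge metric.

```python
# Graph Voronoi cells as communities (simplified: unit edge lengths, degree-based seeds).
import networkx as nx

G = nx.karate_club_graph()
n_seeds = 2
seeds = [n for n, _ in sorted(G.degree, key=lambda kv: kv[1], reverse=True)[:n_seeds]]

# Shortest-path distance from every seed, then nearest-seed assignment.
dist = {s: nx.single_source_shortest_path_length(G, s) for s in seeds}
communities = {node: min(seeds, key=lambda s: dist[s].get(node, float("inf"))) for node in G}

for s in seeds:
    members = [n for n, c in communities.items() if c == s]
    print(f"Voronoi cell of seed {s}: {len(members)} nodes")
```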

  4. Transmitted ion energy loss distributions to detect cluster formation in silicon

    International Nuclear Information System (INIS)

    Selen, L.J.M.; Loon, A. van; IJzendoorn, L.J. van; Voigt, M.J.A. de

    2002-01-01

    The energy loss distribution of ions transmitted through a 5.7±0.2 μm thick Si crystal was measured and simulated with the Monte Carlo channeling simulation code FLUX. A general resemblance between the measured and simulated energy loss distributions was obtained after incorporation of an energy dependent energy loss in the simulation program. The energy loss calculations are used to investigate the feasibility to detect the presence of light element dopant clusters in a host crystal from the shape of the energy loss distribution, with transmission ion channeling. A curved crystal structure is used as a model for a region in the host crystal with clusters. The presence of the curvature does have a large influence on the transmitted energy distribution, which offers the possibility to determine the presence of dopant clusters in a host crystal with transmission ion channeling

  5. Relation between financial market structure and the real economy: comparison between clustering methods.

    Science.gov (United States)

    Musmeci, Nicoló; Aste, Tomaso; Di Matteo, T

    2015-01-01

    We quantify the amount of information filtered by different hierarchical clustering methods on correlations between stock returns comparing the clustering structure with the underlying industrial activity classification. We apply, for the first time to financial data, a novel hierarchical clustering approach, the Directed Bubble Hierarchical Tree and we compare it with other methods including the Linkage and k-medoids. By taking the industrial sector classification of stocks as a benchmark partition, we evaluate how the different methods retrieve this classification. The results show that the Directed Bubble Hierarchical Tree can outperform other methods, being able to retrieve more information with fewer clusters. Moreover, we show that the economic information is hidden at different levels of the hierarchical structures depending on the clustering method. The dynamical analysis on a rolling window also reveals that the different methods show different degrees of sensitivity to events affecting financial markets, like crises. These results can be of interest for all the applications of clustering methods to portfolio optimization and risk hedging [corrected].

  6. Relation between financial market structure and the real economy: comparison between clustering methods.

    Directory of Open Access Journals (Sweden)

    Nicoló Musmeci

    Full Text Available We quantify the amount of information filtered by different hierarchical clustering methods on correlations between stock returns comparing the clustering structure with the underlying industrial activity classification. We apply, for the first time to financial data, a novel hierarchical clustering approach, the Directed Bubble Hierarchical Tree and we compare it with other methods including the Linkage and k-medoids. By taking the industrial sector classification of stocks as a benchmark partition, we evaluate how the different methods retrieve this classification. The results show that the Directed Bubble Hierarchical Tree can outperform other methods, being able to retrieve more information with fewer clusters. Moreover, we show that the economic information is hidden at different levels of the hierarchical structures depending on the clustering method. The dynamical analysis on a rolling window also reveals that the different methods show different degrees of sensitivity to events affecting financial markets, like crises. These results can be of interest for all the applications of clustering methods to portfolio optimization and risk hedging [corrected].
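    For orientation, a generic sketch of hierarchical clustering on a stock-return correlation matrix, using the common distance d = sqrt(2(1 - rho)) and average linkage. This is a standard construction for illustration, not the Directed Bubble Hierarchical Tree of the papers above; the simulated returns and sector structure are assumptions.

```python
# Hierarchical clustering of a (simulated) stock correlation matrix.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(7)
n_days, n_per_sector = 500, 10
sector_factors = rng.normal(size=(n_days, 3))
returns = np.hstack([sector_factors[:, [k]] + 0.8 * rng.normal(size=(n_days, n_per_sector))
                     for k in range(3)])                       # 30 correlated "stocks"

rho = np.corrcoef(returns, rowvar=False)
dist = np.sqrt(np.clip(2.0 * (1.0 - rho), 0.0, None))          # correlation distance
np.fill_diagonal(dist, 0.0)
Z = linkage(squareform(dist, checks=False), method="average")
labels = fcluster(Z, t=3, criterion="maxclust")
print("recovered cluster sizes:", np.bincount(labels)[1:])
```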

  7. Detection of structural defects in lecithin membranes by the small-angle neutron scattering method

    International Nuclear Information System (INIS)

    Bezzabotnov, V.Yu.; Gordelij, V.I.; Ostanevich, Yu.M.; Yaguzhinskij, L.S.

    1989-01-01

    Irregularities interpreted as interdomain defects have been detected in model lipid membranes of dipalmitoyl lecithin in the liquid Lα phase by the method of small-angle scattering (lateral diffraction). The dimensions and concentrations of the defects were approximately those expected within the dynamic cluster model of the bilayer (Ivkov, 1984). No irregularities were detected in the solid Lβ′ phase (the diffuse scattering intensity was at least ten times lower)

  8. Symmetrized partial-wave method for density-functional cluster calculations

    International Nuclear Information System (INIS)

    Averill, F.W.; Painter, G.S.

    1994-01-01

    The computational advantage and accuracy of the Harris method is linked to the simplicity and adequacy of the reference-density model. In an earlier paper, we investigated one way the Harris functional could be extended to systems outside the limits of weakly interacting atoms by making the charge density of the interacting atoms self-consistent within the constraints of overlapping spherical atomic densities. In the present study, a method is presented for augmenting the interacting atom charge densities with symmetrized partial-wave expansions on each atomic site. The added variational freedom of the partial waves leads to a scheme capable of giving exact results within a given exchange-correlation approximation while maintaining many of the desirable convergence and stability properties of the original Harris method. Incorporation of the symmetry of the cluster in the partial-wave construction further reduces the level of computational effort. This partial-wave cluster method is illustrated by its application to the dimer C2, the hypothetical atomic cluster Fe6Al8, and the benzene molecule

  9. Maximum-likelihood model averaging to profile clustering of site types across discrete linear sequences.

    Directory of Open Access Journals (Sweden)

    Zhang Zhang

    2009-06-01

    Full Text Available A major analytical challenge in computational biology is the detection and description of clusters of specified site types, such as polymorphic or substituted sites within DNA or protein sequences. Progress has been stymied by a lack of suitable methods to detect clusters and to estimate the extent of clustering in discrete linear sequences, particularly when there is no a priori specification of cluster size or cluster count. Here we derive and demonstrate a maximum likelihood method of hierarchical clustering. Our method incorporates a tripartite divide-and-conquer strategy that models sequence heterogeneity, delineates clusters, and yields a profile of the level of clustering associated with each site. The clustering model may be evaluated via model selection using the Akaike Information Criterion, the corrected Akaike Information Criterion, and the Bayesian Information Criterion. Furthermore, model averaging using weighted model likelihoods may be applied to incorporate model uncertainty into the profile of heterogeneity across sites. We evaluated our method by examining its performance on a number of simulated datasets as well as on empirical polymorphism data from diverse natural alleles of the Drosophila alcohol dehydrogenase gene. Our method yielded greater power for the detection of clustered sites across a breadth of parameter ranges, and achieved better accuracy and precision of estimation of clusters, than did the existing empirical cumulative distribution function statistics.
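    A minimal sketch of the model-selection step described above: compare candidate clustering models by an information criterion and form Akaike weights for model averaging. The "models" here are Gaussian mixtures with different component counts, a stand-in example rather than the sequence-clustering models of the paper.

```python
# Information-criterion comparison and Akaike weights over candidate cluster counts.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(8)
X = np.concatenate([rng.normal(0, 1, 150), rng.normal(6, 1, 150)]).reshape(-1, 1)

aic = []
for k in range(1, 5):
    gm = GaussianMixture(n_components=k, random_state=0).fit(X)
    aic.append(gm.aic(X))
aic = np.array(aic)

delta = aic - aic.min()
akaike_weights = np.exp(-0.5 * delta) / np.exp(-0.5 * delta).sum()
for k, w in enumerate(akaike_weights, start=1):
    print(f"k={k}: AIC={aic[k - 1]:.1f}, Akaike weight={w:.3f}")
```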

  10. Fluorescence detection of a protein-bound 2Fe2S cluster.

    Science.gov (United States)

    Hoff, Kevin G; Goodlitt, Rochelle; Li, Rui; Smolke, Christina D; Silberg, Jonathan J

    2009-03-02

    A fluorescent biosensor is described for 2Fe2S clusters that is composed of green fluorescent protein (GFP) fused to glutaredoxin 2 (Grx2), as illustrated here. 2Fe2S detection is based on the reduction of GFP fluorescence upon the 2Fe2S-induced dimerization of GFP-Grx2. This assay is sufficiently sensitive to detect submicromolar changes in 2Fe2S levels, thus making it suitable for high-throughput measurements of metallocluster degradation and synthesis reactions.

  11. Lagrangian based methods for coherent structure detection

    Energy Technology Data Exchange (ETDEWEB)

    Allshouse, Michael R., E-mail: mallshouse@chaos.utexas.edu [Center for Nonlinear Dynamics and Department of Physics, University of Texas at Austin, Austin, Texas 78712 (United States); Peacock, Thomas, E-mail: tomp@mit.edu [Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139 (United States)

    2015-09-15

    There has been a proliferation in the development of Lagrangian analytical methods for detecting coherent structures in fluid flow transport, yielding a variety of qualitatively different approaches. We present a review of four approaches and demonstrate the utility of these methods via their application to the same sample analytic model, the canonical double-gyre flow, highlighting the pros and cons of each approach. Two of the methods, the geometric and probabilistic approaches, are well established and require velocity field data over the time interval of interest to identify particularly important material lines and surfaces, and influential regions, respectively. The other two approaches, implementing tools from cluster and braid theory, seek coherent structures based on limited trajectory data, attempting to partition the flow transport into distinct regions. All four of these approaches share the common trait that they are objective methods, meaning that their results do not depend on the frame of reference used. For each method, we also present a number of example applications ranging from blood flow and chemical reactions to ocean and atmospheric flows.

  12. An unexpected detection of bifurcated blue straggler sequences in the young globular cluster NGC 2173

    OpenAIRE

    Li, Chengyuan; Deng, Licai; de Grijs, Richard; Jiang, Dengkai; Xin, Yu

    2018-01-01

    Bifurcated patterns of blue straggler stars in their color–magnitude diagrams have attracted significant attention. This type of special (but rare) pattern of two distinct blue straggler sequences is commonly interpreted as evidence of cluster core-collapse-driven stellar collisions as an efficient formation mechanism. Here, we report the detection of a bifurcated blue straggler distribution in a young Large Magellanic Cloud cluster, NGC 2173. Because of the cluster's low central stellar number...

  13. Toward the detection of pure carbon clusters in the Interstellar Medium (ISM)

    Science.gov (United States)

    Heath, J. R.; Van Orden, A.; Hwang, H. J.; Kuo, E. W.; Tanaka, K.; Saykally, R. J.

    1995-01-01

    Determination of the form and distribution of carbon in the universe is critical to understanding the origin of life on Earth and elsewhere. Two potentially large reservoirs of carbon in the interstellar medium (ISM) remain unexplored. These are polycyclic aromatic hydrocarbons (PAHs) and pure carbon clusters. Little information exists on the structures, properties, and transition frequencies of pure carbon clusters. The work described is designed to provide a specific inventory of laboratory frequencies and physical properties of these carbon clusters so that efforts can be made to detect them in cold interstellar sources by far-infrared astronomy. Data are given from infrared laser spectroscopic determination of the structures of C3, C4, C5, C6, C7, and C9.

  14. A Latent Variable Clustering Method for Wireless Sensor Networks

    DEFF Research Database (Denmark)

    Vasilev, Vladislav; Iliev, Georgi; Poulkov, Vladimir

    2016-01-01

    In this paper we derive a clustering method based on the Hidden Conditional Random Field (HCRF) model in order to maximize the performance of a wireless sensor. Our novel approach to clustering in this paper is in the application of an index invariant graph that we defined in a previous work and

  15. The Views of Turkish Pre-Service Teachers about Effectiveness of Cluster Method as a Teaching Writing Method

    Science.gov (United States)

    Kitis, Emine; Türkel, Ali

    2017-01-01

    The aim of this study is to find out Turkish pre-service teachers' views on the effectiveness of the cluster method as a writing teaching method. The Cluster Method can be defined as a connotative creative writing method. The way the method works is that the person brainstorms on connotations of a word or a concept in the absence of any kind of…

  16. Improvement of economic potential estimation methods for enterprise with potential branch clusters use

    Directory of Open Access Journals (Sweden)

    V.Ya. Nusinov

    2017-08-01

    Full Text Available The research finds that the existing methods of estimating an enterprise's economic potential are based on the use of additive, multiplicative and rating models. These methods have a number of shortcomings; for example, not all of them take into account the branch (industry) features of the analysis, or the level of development of the enterprise relative to other enterprises. It is suggested that such shortcomings be remedied by taking into account, when estimating the integral level of potential, not only the branch features of the enterprises' activity but also the economic clusterization of such enterprises. Scientific works connected with the use of clusters for estimating economic potential are generalized. According to the results of this generalization, nine scientific approaches in this direction can be distinguished: the use of natural clusterization of enterprises to estimate and increase regional potential; the use of natural clusterization of enterprises to estimate and increase industry potential; the use of artificial clusterization of enterprises to estimate and increase regional potential; the use of artificial clusterization of enterprises to estimate and increase industry potential; the use of artificial clusterization of enterprises to estimate clustering potential; the use of artificial clusterization of enterprises to estimate the competitiveness potential of clustering; the use of natural (artificial) clusterization to estimate clustering efficiency; the use of natural (artificial) clusterization to raise the level of regional (industry) development; and the use of methods for estimating the economic potential of a region (industry), or its constituents, to construct the clusters. It is determined that the use of the clusterization method in

  17. Detection of enhancement in number densities of background galaxies due to magnification by massive galaxy clusters

    Energy Technology Data Exchange (ETDEWEB)

    Chiu, I.; Dietrich, J. P.; Mohr, J.; Applegate, D. E.; Benson, B. A.; Bleem, L. E.; Bayliss, M. B.; Bocquet, S.; Carlstrom, J. E.; Capasso, R.; Desai, S.; Gangkofner, C.; Gonzalez, A. H.; Gupta, N.; Hennig, C.; Hoekstra, H.; von der Linden, A.; Liu, J.; McDonald, M.; Reichardt, C. L.; Saro, A.; Schrabback, T.; Strazzullo, V.; Stubbs, C. W.; Zenteno, A.

    2016-02-18

    We present a detection of the enhancement in the number densities of background galaxies induced from lensing magnification and use it to test the Sunyaev-Zel'dovich effect (SZE)-inferred masses in a sample of 19 galaxy clusters with median redshift z ≃ 0.42 selected from the South Pole Telescope SPT-SZ survey. These clusters are observed by the Megacam on the Magellan Clay Telescope through gri filters. Two background galaxy populations are selected for this study through their photometric colours; they have median redshifts z_median ≃ 0.9 (low-z background) and z_median ≃ 1.8 (high-z background). Stacking these populations, we detect the magnification bias effect at 3.3 sigma and 1.3 sigma for the low- and high-z backgrounds, respectively. We fit Navarro, Frenk and White models simultaneously to all observed magnification bias profiles to estimate the multiplicative factor η that describes the ratio of the weak lensing mass to the mass inferred from the SZE observable-mass relation. We further quantify systematic uncertainties in η resulting from the photometric noise and bias, the cluster galaxy contamination and the estimations of the background properties. The resulting η for the combined background populations with 1 sigma uncertainties is 0.83 ± 0.24(stat) ± 0.074(sys), indicating good consistency between the lensing and the SZE-inferred masses. We use our best-fitting η to predict the weak lensing shear profiles and compare these predictions with observations, showing agreement between the magnification and shear mass constraints. This work demonstrates the promise of using the magnification as a complementary method to estimate cluster masses in large surveys.

  18. Kernel method for clustering based on optimal target vector

    International Nuclear Information System (INIS)

    Angelini, Leonardo; Marinazzo, Daniele; Pellicoro, Mario; Stramaglia, Sebastiano

    2006-01-01

    We introduce Ising models, suitable for dichotomic clustering, with couplings that are (i) both ferro- and antiferromagnetic and (ii) dependent on the whole data set and not only on pairs of samples. Couplings are determined by exploiting the notion of the optimal target vector, introduced here, which provides a link between kernel supervised and unsupervised learning. The effectiveness of the method is shown in the case of the well-known iris data set and in benchmarks of gene expression levels, where it works better than existing methods for dichotomic clustering

  19. Developing cluster strategy of apples dodol SMEs by integration K-means clustering and analytical hierarchy process method

    Science.gov (United States)

    Mustaniroh, S. A.; Effendi, U.; Silalahi, R. L. R.; Sari, T.; Ala, M.

    2018-03-01

    The purposes of this research were to determine the grouping of apples dodol small and medium enterprises (SMEs) in Batu City and to determine an appropriate development strategy for each cluster. The method used for clustering the SMEs was k-means. The Analytical Hierarchy Process (AHP) approach was then applied to determine the development strategy priority for each cluster. The variables used in grouping include production capacity per month, length of operation, investment value, average sales revenue per month, amount of SME assets, and the number of workers. Several factors were considered in the AHP, including the industry cluster, government, as well as related and supporting industries. Data were collected using questionnaires and interviews. SME respondents were selected among apples dodol SMEs in Batu City using purposive sampling. The results showed that two clusters were formed from five apples dodol SMEs. The 1st cluster of apples dodol SMEs, classified as small enterprises, included SME A, SME C, and SME D. The 2nd cluster, classified as medium enterprises, consisted of SME B and SME E. The AHP results indicated that the priority development strategy for the 1st cluster was improving quality and product standardisation, while for the 2nd cluster it was increasing marketing access.

  20. In vivo fluorescent detection of Fe-S clusters coordinated by human GRX2.

    Science.gov (United States)

    Hoff, Kevin G; Culler, Stephanie J; Nguyen, Peter Q; McGuire, Ryan M; Silberg, Jonathan J; Smolke, Christina D

    2009-12-24

    A major challenge to studying Fe-S cluster biosynthesis in higher eukaryotes is the lack of simple tools for imaging metallocluster binding to proteins. We describe the first fluorescent approach for in vivo detection of 2Fe2S clusters that is based upon the complementation of Venus fluorescent protein fragments via human glutaredoxin 2 (GRX2) coordination of a 2Fe2S cluster. We show that Escherichia coli and mammalian cells expressing Venus fragments fused to GRX2 exhibit greater fluorescence than cells expressing fragments fused to a C37A mutant that cannot coordinate a metallocluster. In addition, we find that maximal fluorescence in the cytosol of mammalian cells requires the iron-sulfur cluster assembly proteins ISCU and NFS1. These findings provide evidence that glutaredoxins can dimerize within mammalian cells through coordination of a 2Fe2S cluster as observed with purified recombinant proteins. Copyright 2009 Elsevier Ltd. All rights reserved.

  1. Feature selection for anomaly–based network intrusion detection using cluster validity indices

    CSIR Research Space (South Africa)

    Naidoo, T

    2015-09-01

    Full Text Available. Feature Selection for Anomaly-Based Network Intrusion Detection Using Cluster Validity Indices. Tyrone Naidoo, Jules-Raymond Tapamo, Andre McDonald; Modelling and Digital Science, Council for Scientific and Industrial Research, South Africa.

  2. Developing a Clustering-Based Empirical Bayes Analysis Method for Hotspot Identification

    Directory of Open Access Journals (Sweden)

    Yajie Zou

    2017-01-01

    Full Text Available Hotspot identification (HSID) is a critical part of network-wide safety evaluations. Typical methods for ranking sites are often rooted in using the Empirical Bayes (EB) method to estimate safety from both observed crash records and predicted crash frequency based on similar sites. The performance of the EB method is highly related to the selection of a reference group of sites (i.e., roadway segments or intersections) similar to the target site, from which the safety performance functions (SPFs) used to predict crash frequency will be developed. As crash data often contain underlying heterogeneity that, in essence, can make them appear to be generated from distinct subpopulations, methods are needed to select similar sites in a principled manner. To overcome this possible heterogeneity problem, EB-based HSID methods that use common clustering methodologies (e.g., mixture models, K-means, and hierarchical clustering) to select “similar” sites for building SPFs are developed. Performance of the clustering-based EB methods is then compared using real crash data. Here, HSID results, when computed on Texas undivided rural highway crash data, suggest that all three clustering-based EB analysis methods are preferred over the conventional statistical methods. Thus, properly classifying the road segments for heterogeneous crash data can further improve HSID accuracy.
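
    A minimal sketch of the clustering-based EB idea is given below, assuming a Hauer-style EB weight and a fixed overdispersion parameter; the synthetic traffic and crash data, and the use of a cluster mean in place of a fitted SPF, are simplifications rather than the authors' implementation.

```python
# Minimal sketch (not the paper's code) of a clustering-based EB estimate:
# segments are first grouped by k-means, then each segment's expected crash
# frequency is shrunk toward the mean of its own cluster (a crude stand-in
# for a cluster-specific SPF). phi is an assumed overdispersion parameter.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
traffic = rng.uniform(1_000, 20_000, 200)           # AADT, illustrative
length = rng.uniform(0.2, 2.0, 200)                  # segment length (miles)
crashes = rng.poisson(0.0003 * traffic * length)     # observed crash counts

features = np.column_stack([traffic, length])
cluster = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)

phi = 2.0                                            # assumed NB dispersion
eb = np.empty_like(crashes, dtype=float)
for c in np.unique(cluster):
    idx = cluster == c
    mu = crashes[idx].mean()                         # cluster-level expectation
    w = 1.0 / (1.0 + mu / phi)                       # EB weight (Hauer-style)
    eb[idx] = w * mu + (1.0 - w) * crashes[idx]

hotspots = np.argsort(eb)[::-1][:10]                 # top-ranked sites
print(hotspots)
```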

  3. Mixture model-based clustering and logistic regression for automatic detection of microaneurysms in retinal images

    Science.gov (United States)

    Sánchez, Clara I.; Hornero, Roberto; Mayo, Agustín; García, María

    2009-02-01

    Diabetic Retinopathy is one of the leading causes of blindness and vision defects in developed countries. An early detection and diagnosis is crucial to avoid visual complication. Microaneurysms are the first ocular signs of the presence of this ocular disease. Their detection is of paramount importance for the development of a computer-aided diagnosis technique which permits a prompt diagnosis of the disease. However, the detection of microaneurysms in retinal images is a difficult task due to the wide variability that these images usually present in screening programs. We propose a statistical approach based on mixture model-based clustering and logistic regression which is robust to the changes in the appearance of retinal fundus images. The method is evaluated on the public database proposed by the Retinal Online Challenge in order to obtain an objective performance measure and to allow a comparative study with other proposed algorithms.

  4. Consensus of satellite cluster flight using an energy-matching optimal control method

    Science.gov (United States)

    Luo, Jianjun; Zhou, Liang; Zhang, Bo

    2017-11-01

    This paper presents an optimal control method for consensus of satellite cluster flight under a kind of energy-matching condition. Firstly, the relation between energy matching and periodically bounded satellite relative motion is analyzed, and the satellite energy-matching principle is applied to configure the initial conditions. Then, period-delayed errors are adopted as state variables to establish the period-delayed error dynamics models of a single satellite and of the cluster. Next, a novel satellite cluster feedback control protocol with coupling gain is designed, so that the satellite cluster periodically bounded relative motion consensus problem (the period-delayed error state consensus problem) is transformed into the stability of a set of matrices with the same low dimension. Based on the consensus region theory from research on multi-agent system consensus, the coupling gain can be obtained to satisfy the requirement of the consensus region and to decouple the satellite cluster information topology from the feedback control gain matrix, which can be determined by the linear quadratic regulator (LQR) optimal method. This method can realize consensus of the satellite cluster period-delayed errors, leading to consistency of the semi-major axes (SMA) and energy matching of the satellite cluster. The satellites can then exhibit globally coordinated cluster behavior. Finally, the feasibility and effectiveness of the present energy-matching optimal consensus for satellite cluster flight are verified through numerical simulations.

  5. Kinetic methods for measuring the temperature of clusters and nanoparticles in molecular beams

    International Nuclear Information System (INIS)

    Makarov, Grigorii N

    2011-01-01

    The temperature (internal energy) of clusters and nanoparticles is an important physical parameter which affects many of their properties and the character of processes they are involved in. At the same time, determining the temperature of free clusters and nanoparticles in molecular beams is a rather complicated problem because the temperature of small particles depends on their size. In this paper, recently developed kinetic methods for measuring the temperature of clusters and nanoparticles in molecular beams are reviewed. The definition of temperature in the present context is given, and how the temperature affects the properties of and the processes involving the particles is discussed. The temperature behavior of clusters and nanoparticles near a phase transition point is analyzed. Early methods for measuring the temperature of large clusters are briefly described. It is shown that, compared to other methods, new kinetic methods are more universal and applicable for determining the temperature of clusters and nanoparticles of practically any size and composition. The future development and applications of these methods are outlined. (reviews of topical problems)

  6. A semantics-based method for clustering of Chinese web search results

    Science.gov (United States)

    Zhang, Hui; Wang, Deqing; Wang, Li; Bi, Zhuming; Chen, Yong

    2014-01-01

    Information explosion is a critical challenge to the development of modern information systems. In particular, when the application of an information system is over the Internet, the amount of information on the web has been increasing exponentially and rapidly. Search engines, such as Google and Baidu, are essential tools for people to find information on the Internet. Valuable information, however, is still likely submerged in the ocean of search results from those tools. By automatically clustering the results into different groups based on subjects, a search engine with a clustering feature allows users to select the most relevant results quickly. In this paper, we propose an online semantics-based method to cluster Chinese web search results. First, we employ the generalised suffix tree to extract the longest common substrings (LCSs) from search snippets. Second, we use the HowNet to calculate the similarities of the words derived from the LCSs, and extract the most representative features by constructing the vocabulary chain. Third, we construct a vector of text features and calculate snippets' semantic similarities. Finally, we improve the Chameleon algorithm to cluster snippets. Extensive experimental results show that the proposed algorithm outperforms the suffix tree clustering method and other traditional clustering methods.

  7. A Negative Selection Algorithm Based on Hierarchical Clustering of Self Set and its Application in Anomaly Detection

    Directory of Open Access Journals (Sweden)

    Wen Chen

    2011-08-01

    Full Text Available A negative selection algorithm based on hierarchical clustering of the self set, HC-RNSA, is introduced in this paper. Several strategies are applied to improve the algorithm performance. First, the self data set is replaced by the self cluster centers when comparing with the detector candidates at each cluster level. As the number of self clusters is much smaller than the self set size, the detector generation efficiency is improved. Second, during the detector generation process, the detector candidates are restricted to the lower-coverage space to reduce detector redundancy. The article analyzes the problem that distances between antigens converge to a constant value in high-dimensional space; accordingly, the Principal Component Analysis (PCA) method is used to reduce the data dimension, and a fractional distance function is employed to enhance the distinctiveness between self and non-self antigens. The detector generation procedure is terminated when the expected non-self coverage is reached. The theoretical analysis and experimental results demonstrate that the detection rate of HC-RNSA is higher than that of traditional negative selection algorithms, while the false alarm rate and time cost are reduced.
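
    The fractional distance function mentioned in the record can be illustrated as a Minkowski distance with exponent p < 1; the choice p = 0.5 below is an assumption, not the value used in the paper.

```python
# Illustrative fractional (Minkowski, p < 1) distance, which spreads out
# pairwise distances better than Euclidean distance in high-dimensional spaces.
# The exponent p = 0.5 is an assumption.
import numpy as np

def fractional_distance(a: np.ndarray, b: np.ndarray, p: float = 0.5) -> float:
    """d(a, b) = (sum_i |a_i - b_i|^p)^(1/p) with 0 < p < 1."""
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

rng = np.random.default_rng(1)
x, y = rng.random(100), rng.random(100)
print(fractional_distance(x, y), np.linalg.norm(x - y))
```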

  8. Dynamic Trajectory Extraction from Stereo Vision Using Fuzzy Clustering

    Science.gov (United States)

    Onishi, Masaki; Yoda, Ikushi

    In recent years, many human tracking methods have been proposed in order to analyze human dynamic trajectories. These are general technologies applicable to various fields, such as customer purchase analysis in a shopping environment and safety control at a (railroad) crossing. In this paper, we present a new approach for tracking human positions from stereo images. We use a framework of two-step clustering, with the k-means method and fuzzy clustering, to detect human regions. In the initial clustering, the k-means method forms middle clusters at high speed from object features extracted by stereo vision. In the final clustering, the fuzzy c-means method clusters the middle clusters into human regions based on their attributes. By expressing ambiguity through fuzzy clustering, the proposed method can cluster correctly even when many people are close to each other. The validity of our technique was evaluated by extracting the trajectories of doctors and nurses in the emergency room of a hospital.
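
    A rough sketch of the two-step idea (k-means to form many middle clusters, then fuzzy c-means to merge them) is given below with synthetic 2-D points; the parameters and the plain numpy fuzzy c-means are illustrative, not the authors' implementation.

```python
# Sketch of the two-step idea: k-means first forms many small "middle"
# clusters from stereo features, then fuzzy c-means merges them into a few
# human regions with soft memberships. Data and parameters are illustrative.
import numpy as np
from sklearn.cluster import KMeans

def fuzzy_cmeans(X, c, m=2.0, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)              # random fuzzy memberships
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))
        U /= U.sum(axis=1, keepdims=True)          # normalize rows
    return centers, U

rng = np.random.default_rng(0)
points = np.vstack([rng.normal(loc, 0.3, (150, 2)) for loc in ([0, 0], [2, 2], [2.6, 2.2])])
middle = KMeans(n_clusters=20, n_init=10, random_state=0).fit(points).cluster_centers_
centers, U = fuzzy_cmeans(middle, c=3)             # soft grouping of middle clusters
print(U.argmax(axis=1))                            # hard labels of the middle clusters
```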

  9. A novel clustering and supervising users' profiles method

    Institute of Scientific and Technical Information of China (English)

    Zhu Mingfu; Zhang Hongbin; Song Fangyun

    2005-01-01

    To better understand different users' accessing intentions, a novel clustering and supervising method based on accessing paths is presented. This method divides the users' interest space to express the distribution of users' interests, and directly guides the construction of web page indexing for improved performance.

  10. Agent-based method for distributed clustering of textual information

    Science.gov (United States)

    Potok, Thomas E [Oak Ridge, TN; Reed, Joel W [Knoxville, TN; Elmore, Mark T [Oak Ridge, TN; Treadwell, Jim N [Louisville, TN

    2010-09-28

    A computer method and system for storing, retrieving and displaying information has a multiplexing agent (20) that calculates a new document vector (25) for a new document (21) to be added to the system and transmits the new document vector (25) to master cluster agents (22) and cluster agents (23) for evaluation. These agents (22, 23) perform the evaluation and return values upstream to the multiplexing agent (20) based on the similarity of the document to documents stored under their control. The multiplexing agent (20) then sends the document (21) and the document vector (25) to the master cluster agent (22), which then forwards it to a cluster agent (23) or creates a new cluster agent (23) to manage the document (21). The system also searches for stored documents according to a search query having at least one term and identifying the documents found in the search, and displays the documents in a clustering display (80) of similarity so as to indicate similarity of the documents to each other.

  11. Communication: Time-dependent optimized coupled-cluster method for multielectron dynamics

    Science.gov (United States)

    Sato, Takeshi; Pathak, Himadri; Orimo, Yuki; Ishikawa, Kenichi L.

    2018-02-01

    The time-dependent coupled-cluster method with time-varying orbital functions, called the time-dependent optimized coupled-cluster (TD-OCC) method, is formulated for multielectron dynamics in an intense laser field. We have successfully derived the equations of motion for the CC amplitudes and orthonormal orbital functions based on the real action functional, and implemented the method including double excitations (TD-OCCD) and double and triple excitations (TD-OCCDT) within the optimized active orbitals. The present method is size extensive and gauge invariant, and a polynomial cost-scaling alternative to the time-dependent multiconfiguration self-consistent-field method. The first application of the TD-OCC method to intense-laser-driven correlated electron dynamics in the Ar atom is reported.

  12. Radionuclide identification using subtractive clustering method

    International Nuclear Information System (INIS)

    Farias, Marcos Santana; Mourelle, Luiza de Macedo

    2011-01-01

    Radionuclide identification is crucial to planning protective measures in emergency situations. This paper presents the application of a method for a classification system of radioactive elements with a fast and efficient response. To achieve this goal, the application of the subtractive clustering algorithm is proposed. The proposed application can be implemented in reconfigurable hardware, a flexible medium for implementing digital hardware circuits. (author)
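
    A compact sketch of Chiu-style subtractive clustering, the algorithm named in this record, is shown below; the radii r_a, r_b and the stopping ratio are common default-style choices, not values from the paper.

```python
# Compact sketch of subtractive clustering: every point gets a density
# potential; the point with the highest potential becomes a cluster center,
# the potential around it is suppressed, and the process repeats.
import numpy as np

def subtractive_clustering(X, r_a=0.5, r_b=0.75, stop_ratio=0.15):
    alpha, beta = 4.0 / r_a**2, 4.0 / r_b**2
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    potential = np.exp(-alpha * d2).sum(axis=1)           # density potential per point
    centers = []
    first_peak = potential.max()
    while True:
        i = int(potential.argmax())
        if potential[i] < stop_ratio * first_peak:
            break
        centers.append(X[i])
        # suppress the potential around the newly accepted center
        potential -= potential[i] * np.exp(-beta * d2[i])
    return np.array(centers)

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(m, 0.05, (80, 2)) for m in ([0.2, 0.2], [0.8, 0.3], [0.5, 0.8])])
print(subtractive_clustering(data))
```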

  13. Cluster analysis for DNA methylation profiles having a detection threshold

    Directory of Open Access Journals (Sweden)

    Siegmund Kimberly D

    2006-07-01

    Full Text Available Abstract Background DNA methylation, a molecular feature used to investigate tumor heterogeneity, can be measured on many genomic regions using the MethyLight technology. Due to the combination of the underlying biology of DNA methylation and the MethyLight technology, the measurements, while being generated on a continuous scale, have a large number of 0 values. This suggests that conventional clustering methodology may not perform well on this data. Results We compare the performance of existing methodology (such as k-means) with two novel methods that explicitly allow for the preponderance of values at 0. We also consider how the ability to successfully cluster such data depends upon the number of informative genes for which methylation is measured and the correlation structure of the methylation values for those genes. We show that when data are collected for a sufficient number of genes, our models do improve clustering performance compared to methods, such as k-means, that do not explicitly respect the supposed biological realities of the situation. Conclusion The performance of analysis methods depends upon how well the assumptions of those methods reflect the properties of the data being analyzed. Differing technologies will lead to data with differing properties, and should therefore be analyzed differently. Consequently, it is prudent to give thought to what the properties of the data are likely to be, and which analysis method might therefore be likely to best capture those properties.

  14. Orthology detection combining clustering and synteny for very large datasets

    OpenAIRE

    Lechner, Marcus; Hernandez-Rosales, Maribel; Doerr, Daniel; Wieseke, Nicolas; Thévenin, Annelyse; Stoye, Jens; Hartmann, Roland K.; Prohaska, Sonja J.; Stadler, Peter F.

    2014-01-01

    The elucidation of orthology relationships is an important step both in gene function prediction as well as towards understanding patterns of sequence evolution. Orthology assignments are usually derived directly from sequence similarities for large data because more exact approaches exhibit too high computational costs. Here we present PoFF, an extension for the standalone tool Proteinortho, which enhances orthology detection by combining clustering, sequence similarity, and synteny. In the ...

  15. Recent advances in coupled-cluster methods

    CERN Document Server

    Bartlett, Rodney J

    1997-01-01

    Today, coupled-cluster (CC) theory has emerged as the most accurate, widely applicable approach for the correlation problem in molecules. Furthermore, the correct scaling of the energy and wavefunction with size (i.e. extensivity) recommends it for studies of polymers and crystals as well as molecules. CC methods have also paid dividends for nuclei, and for certain strongly correlated systems of interest in field theory.In order for CC methods to have achieved this distinction, it has been necessary to formulate new, theoretical approaches for the treatment of a variety of essential quantities

  16. N-body modeling of globular clusters: detecting intermediate-mass black holes by non-equipartition in HST proper motions

    Science.gov (United States)

    Trenti, Michele

    2010-09-01

    Intermediate-mass black holes (IMBHs) are objects of considerable astrophysical significance. They have been invoked as possible remnants of Population III stars, precursors of supermassive black holes, sources of ultra-luminous X-ray emission, and emitters of gravitational waves. The centers of globular clusters, where they may have formed through runaway collapse of massive stars, may be our best chance of detecting them. HST studies of velocity dispersions have provided tentative evidence, but the measurements are difficult and the results have been disputed. It is thus important to explore and develop additional indicators of the presence of an IMBH in these systems. In a Cycle 16 theory project we focused on the fingerprints of an IMBH derived from HST photometry. We showed that an IMBH leads to a detectable quenching of mass segregation. Analysis of HST-ACS data for NGC 2298 validated the method, and ruled out an IMBH of more than 300 solar masses. We propose here to extend the search for IMBH signatures from photometry to kinematics. The velocity dispersion of stars in collisionally relaxed stellar systems such as globular clusters scales with main-sequence mass as sigma ∝ m^alpha. A value alpha = -0.5 corresponds to equipartition. Mass-dependent kinematics can now be measured from HST proper motion studies (e.g., alpha = -0.21 for Omega Cen). Preliminary analysis shows that the value of alpha can be used as an indicator of the presence of an IMBH. In fact, the quenching of mass segregation is a result of the degree of equipartition that the system attains. However, detailed numerical simulations are required to quantify this. Therefore we propose (a) to carry out a new, larger set of realistic N-body simulations of star clusters with IMBHs, primordial binaries and stellar evolution to predict in detail the expected kinematic signatures and (b) to compare these predictions to datasets that are becoming available. Considerable HST resources have been invested in
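
    As a small aside, the equipartition index alpha in sigma ∝ m^alpha can be estimated from (mass, dispersion) pairs by a log-log fit; the sketch below uses synthetic numbers, not HST proper-motion data.

```python
# Tiny illustration of estimating alpha in sigma ∝ m^alpha via a log-log fit.
# The masses and dispersions are synthetic.
import numpy as np

mass = np.array([0.3, 0.5, 0.7, 0.8])             # main-sequence masses (solar units)
sigma = 10.0 * mass ** -0.21                       # dispersions with alpha = -0.21 built in
sigma *= 1 + np.random.default_rng(0).normal(0, 0.02, mass.size)

alpha, log_sigma0 = np.polyfit(np.log(mass), np.log(sigma), 1)
print(f"fitted alpha = {alpha:.2f}")               # close to the injected -0.21
```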

  17. Detection of gold cluster ions by ion-to-ion conversion using a CsI-converter

    International Nuclear Information System (INIS)

    Nguyen, V.-T.; Novilkov, A.C.; Obnorskii, V.V.

    1997-01-01

    Gold cluster ions in the m/z range of 10^4 - 2 x 10^6 u were produced by bombarding a thin film of gold with 252Cf fission fragments. The gold, covering a C-Al substrate, formed islets having a mean diameter of 44 Å. Their size and mass distribution was determined by means of electron microscopy. The main task was to measure the m/z distribution of the cluster ions ejected from the sample surface. For this purpose we built a time-of-flight (TOF) mass spectrometer, which could be used as a linear TOF instrument or, alternatively, as a tandem-TOF instrument equipped with an ion-to-ion converter. Combining the results obtained in both modes, it turned out that the linear TOF instrument equipped with micro-channel plates had a mean detection efficiency for 20 keV cluster ions of about 40%. In the tandem mode, the cluster ions hit a CsI converter with energies of 40z keV (z = charge state), from where secondary ions - mainly Cs+ and (CsI)nCs+ cluster ions - were ejected. These ions were used to measure the TOF spectrum of the gold cluster ions. The detection efficiency of the cluster ions was found to vary in the available mass range from 99.7% to 96.5%. The complete mass distribution between 4 x 10^4 and 4 x 10^6 u was determined and compared with the corresponding mass distribution of the gold islets covering the substrate. (orig.)

  18. Multishell method: Exact treatment of a cluster in an effective medium

    International Nuclear Information System (INIS)

    Gonis, A.; Garland, J.W.

    1977-01-01

    A method is presented for the exact determination of the Green's function of a cluster embedded in a given effective medium. This method, the multishell method, is applicable even to systems with off-diagonal disorder, extended-range hopping, multiple bands, and/or hybridization, and is computationally practicable for any system described by a tight-binding or interpolation-scheme Hamiltonian. It allows one to examine the effects of local environment on the densities of states and site spectral weight functions of disordered systems. For any given analytic effective medium characterized by a non-negative density of states the method yields analytic cluster Green's functions and non-negative site spectral weight functions. Previous methods used for the calculation of the Green's function of a cluster embedded in a given effective medium have not been exact. The results of numerical calculations for model systems show that even the best of these previous methods can lead to substantial errors, at least for small clusters in two- and three-dimensional lattices. These results also show that fluctuations in local environment have large effects on site spectral weight functions, even in cases in which the single-site coherent-potential approximation yields an accurate overall density of states

  19. Novel Clustering Method Based on K-Medoids and Mobility Metric

    Directory of Open Access Journals (Sweden)

    Y. Hamzaoui

    2018-06-01

    Full Text Available The structure and constraints of MANETs negatively influence QoS performance; moreover, the main routing protocols proposed generally operate with flat routing. Hence, this structure gives poor QoS results when the network becomes larger and denser. To solve this problem we use one of the most popular methods, namely clustering. The present paper falls within the framework of research to improve QoS in MANETs. In this paper we propose a new clustering algorithm based on a new mobility metric and K-medoids to distribute the nodes into several clusters. Intuitively, our algorithm can give good results in terms of cluster stability, and can also extend the lifetime of cluster heads.

  20. Galaxy clusters in the SDSS Stripe 82 based on photometric redshifts

    International Nuclear Information System (INIS)

    Durret, F.; Adami, C.; Bertin, E.; Hao, J.; Márquez, I.

    2015-01-01

    Based on a recent photometric redshift galaxy catalogue, we have searched for galaxy clusters in the Stripe 82 region of the Sloan Digital Sky Survey by applying the Adami & MAzure Cluster FInder (AMACFI). Extensive tests were made to fine-tune the AMACFI parameters and make the cluster detection as reliable as possible. The same method was applied to the Millennium simulation to estimate our detection efficiency and the approximate masses of the detected clusters. Considering all the cluster galaxies (i.e. within a 1 Mpc radius of the cluster to which they belong and with a photo-z differing by less than 0.05 from that of the cluster), we stacked clusters in various redshift bins to derive colour-magnitude diagrams and galaxy luminosity functions (GLFs). For each galaxy with absolute magnitude brighter than -19.0 in the r band, we computed the disk and spheroid components by applying SExtractor, and by stacking clusters we determined how the disk-to-spheroid flux ratio varies with cluster redshift and mass. We detected 3663 clusters in the redshift range 0.15 < z < 0.70, with estimated mean masses between 10^13 and a few 10^14 solar masses. Furthermore, by stacking the cluster galaxies in various redshift bins, we find a clear red sequence in the (g'-r') versus r' colour-magnitude diagrams, and the GLFs are typical of clusters, though with a possible contamination from field galaxies. The morphological analysis of the cluster galaxies shows that the fraction of late-type to early-type galaxies increases with redshift (particularly in high-mass clusters) and decreases with detection level, i.e. cluster mass. From the properties of the cluster galaxies, the majority of the candidate clusters detected here seem to be real clusters with typical cluster properties.

  1. A Dimensionality Reduction-Based Multi-Step Clustering Method for Robust Vessel Trajectory Analysis

    Directory of Open Access Journals (Sweden)

    Huanhuan Li

    2017-08-01

    Full Text Available The Shipboard Automatic Identification System (AIS) is crucial for navigation safety and maritime surveillance; data mining and pattern analysis of AIS information have attracted considerable attention in terms of both basic research and practical applications. Clustering of spatio-temporal AIS trajectories can be used to identify abnormal patterns and mine customary route data for transportation safety. Thus, the capacities of navigation safety and maritime traffic monitoring could be enhanced correspondingly. However, trajectory clustering is often sensitive to undesirable outliers and is essentially more complex compared with traditional point clustering. To overcome this limitation, a multi-step trajectory clustering method is proposed in this paper for robust AIS trajectory clustering. In particular, Dynamic Time Warping (DTW), a similarity measurement method, is introduced in the first step to measure the distances between different trajectories. The calculated distances, inversely proportional to the similarities, constitute a distance matrix in the second step. Furthermore, as a widely used dimensionality reduction method, Principal Component Analysis (PCA) is exploited to decompose the obtained distance matrix. In particular, the top k principal components with above 95% accumulative contribution rate are extracted by PCA, and the number of centers k is chosen. The k centers are found by the improved automatic center selection algorithm. In the last step, the improved center clustering algorithm with k clusters is implemented on the distance matrix to achieve the final AIS trajectory clustering results. In order to improve the accuracy of the proposed multi-step clustering algorithm, an automatic algorithm for choosing the k clusters is developed according to the similarity distance. Numerous experiments on realistic AIS trajectory datasets in the bridge area waterway and Mississippi River have been implemented to compare our

  2. A Dimensionality Reduction-Based Multi-Step Clustering Method for Robust Vessel Trajectory Analysis.

    Science.gov (United States)

    Li, Huanhuan; Liu, Jingxian; Liu, Ryan Wen; Xiong, Naixue; Wu, Kefeng; Kim, Tai-Hoon

    2017-08-04

    The Shipboard Automatic Identification System (AIS) is crucial for navigation safety and maritime surveillance; data mining and pattern analysis of AIS information have attracted considerable attention in terms of both basic research and practical applications. Clustering of spatio-temporal AIS trajectories can be used to identify abnormal patterns and mine customary route data for transportation safety. Thus, the capacities of navigation safety and maritime traffic monitoring could be enhanced correspondingly. However, trajectory clustering is often sensitive to undesirable outliers and is essentially more complex compared with traditional point clustering. To overcome this limitation, a multi-step trajectory clustering method is proposed in this paper for robust AIS trajectory clustering. In particular, Dynamic Time Warping (DTW), a similarity measurement method, is introduced in the first step to measure the distances between different trajectories. The calculated distances, inversely proportional to the similarities, constitute a distance matrix in the second step. Furthermore, as a widely used dimensionality reduction method, Principal Component Analysis (PCA) is exploited to decompose the obtained distance matrix. In particular, the top k principal components with above 95% accumulative contribution rate are extracted by PCA, and the number of centers k is chosen. The k centers are found by the improved automatic center selection algorithm. In the last step, the improved center clustering algorithm with k clusters is implemented on the distance matrix to achieve the final AIS trajectory clustering results. In order to improve the accuracy of the proposed multi-step clustering algorithm, an automatic algorithm for choosing the k clusters is developed according to the similarity distance. Numerous experiments on realistic AIS trajectory datasets in the bridge area waterway and Mississippi River have been implemented to compare our proposed method with
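
    The multi-step pipeline described in the two records above (DTW distances, distance matrix, PCA, center-based clustering) can be condensed as follows; the synthetic 1-D trajectories, the plain k-means step, and the fixed k are simplifications of the paper's method.

```python
# Condensed sketch of the multi-step pipeline: DTW distances between
# trajectories -> distance matrix -> PCA -> k-means. Synthetic 1-D
# "trajectories" stand in for AIS tracks; k is fixed here rather than
# chosen automatically as in the paper.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def dtw(a, b):
    """Classic O(len(a)*len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

rng = np.random.default_rng(0)
trajs = [np.sin(np.linspace(0, 6, 60)) + rng.normal(0, 0.1, 60) for _ in range(10)] + \
        [np.cos(np.linspace(0, 6, 60)) + rng.normal(0, 0.1, 60) for _ in range(10)]

n = len(trajs)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = dtw(trajs[i], trajs[j])

embedding = PCA(n_components=0.95).fit_transform(dist)   # keep 95% of variance
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embedding)
print(labels)
```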

  3. Intrinsic alignment in redMaPPer clusters - II. Radial alignment of satellites towards cluster centres

    Science.gov (United States)

    Huang, Hung-Jin; Mandelbaum, Rachel; Freeman, Peter E.; Chen, Yen-Chi; Rozo, Eduardo; Rykoff, Eli

    2018-03-01

    We study the orientations of satellite galaxies in redMaPPer clusters constructed from the Sloan Digital Sky Survey at redshifts above 0.1, using three shape measurement methods (re-Gaussianization, de Vaucouleurs, and isophotal shapes), which trace galaxy light profiles at different radii. The measured satellite alignment (SA) signal depends on these shape measurement methods. We detect the strongest SA signal in isophotal shapes, followed by de Vaucouleurs shapes. While no net SA signal is detected using re-Gaussianization shapes across the entire sample, the observed SA signal reaches a statistically significant level when limiting to a subsample of higher-luminosity satellites. We further investigate the impact of noise, systematics, and real physical isophotal twisting effects in the comparison between the SA signals detected via different shape measurement methods. Unlike previous studies, which only consider the dependence of SA on a few parameters, here we explore a total of 17 galaxy and cluster properties, using a statistical model averaging technique to naturally account for parameter correlations and identify significant SA predictors. We find that the measured SA signal is strongest for satellites with the following characteristics: higher luminosity, smaller distance to the cluster centre, rounder shape, higher bulge fraction, and distribution preferentially along the major-axis directions of their centrals. Finally, we provide physical explanations for the identified dependences and discuss the connection to theories of SA.

  4. A parameter-free community detection method based on centrality and dispersion of nodes in complex networks

    Science.gov (United States)

    Li, Yafang; Jia, Caiyan; Yu, Jian

    2015-11-01

    K-means is a simple and efficient clustering algorithm for detecting communities in networks. However, it may suffer from a bad choice of initial seeds (also called centers), which seriously affects the clustering accuracy and the convergence rate. Additionally, in K-means, the number of communities must be specified in advance. How to select initial seeds and how to determine the number of communities remain open problems. In this study, a new parameter-free community detection method (named K-rank-D) is proposed. First, based on the fact that good initial seeds usually have high importance and are dispersedly located in a network, a modified PageRank centrality is proposed to evaluate the importance of a node, and a decision graph is drawn to depict the importance and the dispersion of nodes. Then, the initial seeds and the number of communities are selected from the decision graph actively and intuitively as the 'start' parameters of K-means. Experimental results on synthetic and real-world networks demonstrate the superior performance of our approach over competing methods for community detection.
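
    The decision-graph idea can be sketched as below: rank nodes by a centrality score and, for each node, compute its graph distance to the nearest higher-ranked node, then pick nodes that score high on both measures as seeds. Plain PageRank is used here instead of the paper's modified variant, and the seed-scoring rule is an assumption.

```python
# Sketch of a decision graph for seed selection: importance (PageRank) on one
# axis, distance to the nearest higher-ranked node on the other. Nodes that
# are both important and far from better-ranked nodes are seed candidates.
import networkx as nx

G = nx.karate_club_graph()
rank = nx.pagerank(G)                                  # importance of each node
order = sorted(G.nodes, key=rank.get, reverse=True)

dispersion = {}
for idx, v in enumerate(order):
    lengths = nx.shortest_path_length(G, v)            # distances from v
    if idx == 0:
        dispersion[v] = max(lengths.values())          # top node: farthest distance
    else:
        dispersion[v] = min(lengths[u] for u in order[:idx])

# "decision graph": combine rank and dispersion and pick the outliers as seeds
seeds = sorted(G.nodes, key=lambda v: rank[v] * dispersion[v], reverse=True)[:2]
print(seeds)
```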

  5. CORECLUSTER: A Degeneracy Based Graph Clustering Framework

    OpenAIRE

    Giatsidis , Christos; Malliaros , Fragkiskos; Thilikos , Dimitrios M. ,; Vazirgiannis , Michalis

    2014-01-01

    International audience; Graph clustering or community detection constitutes an important task for investigating the internal structure of graphs, with a plethora of applications in several domains. Traditional tools for graph clustering, such as spectral methods, typically suffer from high time and space complexity. In this article, we present CoreCluster, an efficient graph clustering framework based on the concept of graph degeneracy, that can be used along with any known graph clusteri...

  6. Development of techniques using DNA analysis method for detection/analysis of radiation-induced mutation. Development of an useful probe/primer and improvement of detection efficacy

    International Nuclear Information System (INIS)

    Maekawa, Hideaki; Tsuchida, Kozo; Hashido, Kazuo; Takada, Naoko; Kameoka, Yosuke; Hirata, Makoto

    1999-01-01

    Previously, it was demonstrated that detection of centromeres became easy and reliable through fluorescent staining by the FISH method using a probe of the sequence preserved in α-satellite DNA. However, this was found inappropriate for detecting dicentrics because the relative amount of DNA probe differs on each chromosome, so a probe which allows homogeneous detection of α-satellite DNA on each chromosome was constructed. A presumed kinetochore-specific sequence, the CENP-B box, was amplified by the PCR method and the product DNA was used as a probe. However, the variation in the amounts of probe DNA among chromosomes was decreased by only about 20%. A program was then constructed for image processing of the results obtained from FISH using α-satellite DNA as a marker for centromeres. Compared with detection of abnormal chromosomes stained by the conventional method, the calculation efficacy for detection of centromeres alone was improved by the use of this program; the calculation to discriminate normal from abnormal chromosomes was still complicated and its detection efficacy was little improved. Chromosomal abnormalities in lymphocytes were used to detect the effects of radiation. In this method, the cells need to be brought into metaphase, and mutations induced by radiation might often be repaired during this process. To exclude this possibility, DNA extraction was conducted at a low temperature immediately after exposure to 137Cs, and a rapid genome detection method was established using the genomic DNA. As model genomes, the following three were used: 1) long-chain repeated sequences widely dispersed over the chromosomes, 2) cluster genes, 3) single-copy genes. The effects of radiation were detectable at 1-2 Gy for the long repeated sequences and at 7 Gy for the cluster genes, respectively, whereas no significant effects were observed at any dose tested for the single-copy genes. Amplification was marked in cells exposed at 1-10 Gy (peak at 4 Gy), suggesting that these regions had

  7. A PVC/polypyrrole sensor designed for beef taste detection using electrochemical methods and sensory evaluation.

    Science.gov (United States)

    Zhu, Lingtao; Wang, Xiaodan; Han, Yunxiu; Cai, Yingming; Jin, Jiahui; Wang, Hongmei; Xu, Liping; Wu, Ruijia

    2018-03-01

    An electrochemical sensor for the detection of beef taste was designed in this study. The sensor was based on a polyvinyl chloride/polypyrrole (PVC/PPy) structure, which was polymerized onto the surface of a platinum (Pt) electrode to form a Pt-PPy-PVC film. Using electrochemical detection methods, the sensor was well characterized by electrochemical impedance spectroscopy (EIS) and cyclic voltammetry (CV). The sensor was applied to detect 10 rib-eye beef samples, and the accuracy of the new sensor was validated by sensory evaluation and ion sensor detection. Several cluster analysis methods were used in the study to distinguish the beef samples. According to the obtained results, the designed sensor showed a high degree of association between electrochemical detection and sensory evaluation, which proved it to be a fast and precise sensor for beef taste detection. Copyright © 2017 Elsevier Ltd. All rights reserved.

  8. A novel community detection method in bipartite networks

    Science.gov (United States)

    Zhou, Cangqi; Feng, Liang; Zhao, Qianchuan

    2018-02-01

    Community structure is a common and important feature in many complex networks, including bipartite networks, which are used as a standard model for many empirical networks comprised of two types of nodes. In this paper, we propose a two-stage method for detecting community structure in bipartite networks. Firstly, we extend the widely used Louvain algorithm to bipartite networks. The effectiveness and efficiency of the Louvain algorithm have been proved by many applications; however, a Louvain-like algorithm specially adapted for bipartite networks has been lacking. Based on bipartite modularity, a measure that extends unipartite modularity and quantifies the strength of partitions in bipartite networks, we fill the gap by developing the Bi-Louvain algorithm, which iteratively groups the nodes in each part by turns. This algorithm often produces a balanced network structure with equal numbers of the two types of nodes. Secondly, for the balanced network yielded by the first algorithm, we use an agglomerative clustering method to further cluster the network. We demonstrate that the calculation of the gain of modularity of each aggregation, and the operation of joining two communities, can be carried out compactly by matrix operations for all pairs of communities simultaneously. Finally, a complete hierarchical community structure is unfolded. We apply our method to two benchmark data sets and a large-scale data set from an e-commerce company, showing that it effectively identifies community structure in bipartite networks.

  9. Advanced defect detection algorithm using clustering in ultrasonic NDE

    Science.gov (United States)

    Gongzhang, Rui; Gachagan, Anthony

    2016-02-01

    A range of materials used in industry exhibit scattering properties which limit ultrasonic NDE. Many algorithms have been proposed to enhance defect detection ability, such as the well-known Split Spectrum Processing (SSP) technique. Scattering noise usually cannot be fully removed, and the remaining noise can easily be confused with real feature signals, hence becoming artefacts during the image interpretation stage. This paper presents an advanced algorithm to further reduce the influence of artefacts remaining in A-scan data after processing with a conventional defect detection algorithm. The raw A-scan data can be acquired from either traditional single-transducer or phased array configurations. The proposed algorithm uses unsupervised machine learning to cluster segmental defect signals from pre-processed A-scans into different classes. The distinction and similarity between each class and an ensemble of randomly selected noise segments can be observed by applying a classification algorithm. Each class is then labelled as 'legitimate reflector' or 'artefact' based on this observation, and the expected probability of detection (PoD) and probability of false alarm (PFA) are determined. To facilitate data collection and validate the proposed algorithm, a 5 MHz linear array transducer is used to collect A-scans from both austenitic steel and Inconel samples. Each pulse-echo A-scan is pre-processed using SSP, and the subsequent application of the proposed clustering algorithm provides an additional reduction in PFA while maintaining PoD for both samples compared with SSP results alone.

  10. Optimization of Scat Detection Methods for a Social Ungulate, the Wild Pig, and Experimental Evaluation of Factors Affecting Detection of Scat.

    Directory of Open Access Journals (Sweden)

    David A Keiter

    Full Text Available Collection of scat samples is common in wildlife research, particularly for genetic capture-mark-recapture applications. Due to high degradation rates of genetic material in scat, large numbers of samples must be collected to generate robust estimates. Optimization of sampling approaches to account for taxa-specific patterns of scat deposition is, therefore, necessary to ensure sufficient sample collection. While scat collection methods have been widely studied in carnivores, research to maximize scat collection and noninvasive sampling efficiency for social ungulates is lacking. Further, environmental factors or scat morphology may influence detection of scat by observers. We contrasted performance of novel radial search protocols with existing adaptive cluster sampling protocols to quantify differences in observed amounts of wild pig (Sus scrofa) scat. We also evaluated the effects of environmental factors (percentage of vegetative ground cover and occurrence of rain immediately prior to sampling) and scat characteristics (fecal pellet size and number) on the detectability of scat by observers. We found that 15- and 20-m radial search protocols resulted in greater numbers of scats encountered than the previously used adaptive cluster sampling approach across habitat types, and that fecal pellet size, number of fecal pellets, percent vegetative ground cover, and recent rain events were significant predictors of scat detection. Our results suggest that use of a fixed-width radial search protocol may increase the number of scats detected for wild pigs, or other social ungulates, allowing more robust estimation of population metrics using noninvasive genetic sampling methods. Further, as fecal pellet size affected scat detection, juvenile or smaller-sized animals may be less detectable than adult or large animals, which could introduce bias into abundance estimates. Knowledge of relationships between environmental variables and scat detection may allow

  11. An effective trust-based recommendation method using a novel graph clustering algorithm

    Science.gov (United States)

    Moradi, Parham; Ahmadian, Sajad; Akhlaghian, Fardin

    2015-10-01

    Recommender systems are programs that aim to provide personalized recommendations to users for specific items (e.g. music, books) in online sharing communities or on e-commerce sites. Collaborative filtering methods are important and widely accepted types of recommender systems that generate recommendations based on the ratings of like-minded users. On the other hand, these systems confront several inherent issues such as the data sparsity and cold start problems, caused by fewer ratings against the unknowns that need to be predicted. Incorporating trust information into collaborative filtering systems is an attractive approach to resolve these problems. In this paper, we present a model-based collaborative filtering method that applies a novel graph clustering algorithm and also considers trust statements. In the proposed method, the problem space is first represented as a graph, and then a sparsest-subgraph-finding algorithm is applied on the graph to find the initial cluster centers. Then, the proposed graph clustering algorithm is performed to obtain the appropriate user/item clusters. Finally, the identified clusters are used as a set of neighbors to recommend unseen items to the current active user. Experimental results based on three real-world datasets demonstrate that the proposed method outperforms several state-of-the-art recommender system methods.

  12. Differences Between Ward's and UPGMA Methods of Cluster Analysis: Implications for School Psychology.

    Science.gov (United States)

    Hale, Robert L.; Dougherty, Donna

    1988-01-01

    Compared the efficacy of two methods of cluster analysis, the unweighted pair-groups method using arithmetic averages (UPGMA) and Ward's method, for students grouped on intelligence, achievement, and social adjustment by both clustering methods. Found UPGMA more efficacious based on output, on cophenetic correlation coefficients generated by each…
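
    The comparison can be reproduced in outline with SciPy: cluster the same profiles with UPGMA (average linkage) and Ward's method and compare their cophenetic correlation coefficients. The student scores below are fabricated for illustration.

```python
# Small sketch of the comparison described above: UPGMA ("average" linkage)
# vs Ward's method on the same profiles, judged by cophenetic correlation.
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
# columns: intelligence, achievement, social adjustment (standardized, made up)
profiles = rng.normal(size=(30, 3))

d = pdist(profiles)
for method in ("average", "ward"):                 # UPGMA vs Ward
    Z = linkage(profiles, method=method)
    c, _ = cophenet(Z, d)
    print(f"{method:8s} cophenetic correlation = {c:.3f}")
```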

  13. A dynamic lattice searching method with rotation operation for optimization of large clusters

    International Nuclear Information System (INIS)

    Wu Xia; Cai Wensheng; Shao Xueguang

    2009-01-01

    Global optimization of large clusters has been a difficult task, though much effort has been paid and many efficient methods have been proposed. In this work, a rotation operation (RO) is designed to realize the structural transformation from decahedra to icosahedra for the optimization of large clusters, by rotating the atoms below the center atom through a definite angle around the fivefold axis. Based on the RO, a development of the previous dynamic lattice searching method with constructed core (DLSc), named DLSc-RO, is presented. With an investigation of the method for the optimization of Lennard-Jones (LJ) clusters, i.e., LJ500, LJ561, LJ600, LJ665-667, LJ670, LJ685, and LJ923, Morse clusters, silver clusters with the Gupta potential, and aluminum clusters with the NP-B potential, it was found that the global minima with both icosahedral and decahedral motifs can be obtained, and the method is proved to be efficient and universal.

  14. Clustering method to process signals from a CdZnTe detector

    International Nuclear Information System (INIS)

    Zhang, Lan; Takahashi, Hiroyuki; Fukuda, Daiji; Nakazawa, Masaharu

    2001-01-01

    The poor mobility of holes in a compound semiconductor detector results in imperfect collection of the primary charge deposited in the detector. Furthermore, the fluctuation of the charge loss efficiency due to the change in the hole collection path length seriously degrades the energy resolution of the detector. Since the charge collection efficiency varies with the signal waveform, we can expect an improvement of the energy resolution through a proper waveform signal processing method. We developed a new digital signal processing technique, a clustering method which derives typical patterns containing information on the real situation inside a detector from measured signals. The obtained typical patterns for the detector are then used for the pattern matching method. Measured signals are classified by analyzing the practical waveform variation due to charge trapping, the electric field, crystal defects, etc. Signals with similar shapes are placed into the same cluster. For each cluster we calculate an average waveform as a reference pattern. Using these reference patterns obtained from all the clusters, we can classify other measured signal waveforms from the same detector. The signals are then independently processed according to their classified category and form corresponding spectra. Finally, these spectra are merged into one spectrum by multiplying normalization coefficients. The effectiveness of this method was verified with a CdZnTe detector 2 mm thick and a 137Cs gamma-ray source. The obtained energy resolution was improved to about 8 keV (FWHM). Because the clustering method is only related to the measured waveforms, it can be applied to any type and size of detector and is compatible with any type of filtering method. (author)

  15. Detection of sensor degradation using K-means clustering and support vector regression in nuclear power plant

    International Nuclear Information System (INIS)

    Seo, Inyong; Ha, Bokam; Lee, Sungwoo; Shin, Changhoon; Lee, Jaeyong; Kim, Seongjun

    2011-01-01

    In a nuclear power plant (NPP), periodic sensor calibrations are required to assure that sensors are operating correctly. However, only a few sensors are typically found to be faulty and in need of rectification. For the safe operation of an NPP and the reduction of unnecessary calibration, on-line calibration monitoring is needed. In this study, an on-line calibration monitoring method called KPCSVR, using k-means clustering and principal component based auto-associative support vector regression (PCSVR), is proposed for nuclear power plants. To reduce the training time of the model, the k-means clustering method was used. Response surface methodology is employed to efficiently determine the optimal values of the support vector regression hyperparameters. The proposed KPCSVR model was confirmed with actual plant data of Kori Nuclear Power Plant Unit 3, which were measured from the primary and secondary systems of the plant, and compared with the PCSVR model. By using data clustering, the average accuracy of PCSVR improved from 1.228×10^-4 to 0.472×10^-4 and the average sensitivity of PCSVR from 0.0930 to 0.0909, which results in good detection of sensor drift. Moreover, the training time is greatly reduced from 123.5 to 31.5 sec. (author)
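
    A simplified sketch of the cluster-then-regress idea is shown below, with plain SVR standing in for the PCA-based auto-associative SVR and synthetic signals standing in for plant data; routing new observations to their cluster's model and inspecting residuals is the drift-detection step.

```python
# Hypothetical sketch: split the training data with k-means and fit one
# regressor per cluster so each model trains on a smaller, more homogeneous
# subset. Plain SVR stands in for PCA-based auto-associative SVR.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 5))                     # correlated sensor channels (synthetic)
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.05, 600)   # target sensor to be estimated

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
models = {c: SVR(C=10.0, gamma="scale").fit(X[km.labels_ == c], y[km.labels_ == c])
          for c in range(3)}

# at monitoring time, route each observation to its cluster's model
X_new = rng.normal(size=(5, 5))
pred = np.array([models[c].predict(x[None, :])[0]
                 for x, c in zip(X_new, km.predict(X_new))])
residual = pred - (X_new[:, 0] + 0.5 * X_new[:, 1])       # drift indicator
print(residual)
```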

  16. Method for Determining Appropriate Clustering Criteria of Location-Sensing Data

    Directory of Open Access Journals (Sweden)

    Youngmin Lee

    2016-08-01

    Full Text Available Large quantities of location-sensing data are generated from location-based social network services. These data are provided as point features with location coordinates acquired from a global positioning system or Wi-Fi signal. To show the point data on multi-scale map services, the data should be represented by clusters following a grid-based clustering method, in which an appropriate grid size has to be determined. Currently, there are no criteria for determining the proper grid size, and the modifiable areal unit problem has been formulated to address this issue. The method proposed in this paper applies a hexagonal grid to geotagged Twitter point data, considering the grid size in terms of both quantity and quality to minimize the limitations associated with the modifiable areal unit problem. Quantitatively, we reduced the original Twitter point data by an appropriate amount using Töpfer's radical law. Qualitatively, we maintained the original distribution characteristics using Moran's I. Finally, we determined the appropriate sizes of clusters for zoom levels 9-13 by analyzing the distribution of data on the graphs. Based on the visualized clustering results, we confirm that the original distribution pattern is effectively maintained using the proposed method.
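
    Two ingredients named in this record, Töpfer's radical law for deciding how many points to keep at a coarser scale and assignment of points to hexagonal grid cells, can be sketched as follows; the hex size and the scale denominators are assumptions for illustration.

```python
# Sketch of two ingredients: Töpfer's radical law (n_t = n_s * sqrt(M_s / M_t),
# with M the scale denominators) and binning of points into hexagonal cells.
import numpy as np

def topfer_count(n_source, scale_source, scale_target):
    """Radical law: keep n_source * sqrt(scale_source / scale_target) objects,
    with scales given as denominators (larger = coarser map)."""
    return int(round(n_source * np.sqrt(scale_source / scale_target)))

def hex_cell(x, y, size):
    """Axial hex coordinates (pointy-top) of the cell containing (x, y)."""
    q = (np.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    # cube rounding to the nearest hexagon
    cx, cz = q, r
    cy = -cx - cz
    rx, ry, rz = np.round(cx), np.round(cy), np.round(cz)
    dx, dy, dz = abs(rx - cx), abs(ry - cy), abs(rz - cz)
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return int(rx), int(rz)

print(topfer_count(10_000, scale_source=500_000, scale_target=2_000_000))
rng = np.random.default_rng(0)
pts = rng.uniform(0, 100, (1000, 2))
cells = {}
for x, y in pts:
    cells.setdefault(hex_cell(x, y, size=5.0), []).append((x, y))
print(len(cells), "hex clusters")
```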

  17. Alerts Visualization and Clustering in Network-based Intrusion Detection

    Energy Technology Data Exchange (ETDEWEB)

    Yang, Dr. Li [University of Tennessee; Gasior, Wade C [ORNL; Dasireddy, Swetha [University of Tennessee

    2010-04-01

    Today's intrusion detection systems, when deployed on a busy network, overload the network with a huge number of alerts. This behavior of producing too much raw information makes them less effective. We propose a system which takes both raw data and Snort alerts to visualize and analyze possible intrusions in a network. We then present two models for the visualization of clustered alerts. Our first model provides the network administrator with the logical topology of the network and detailed information for each node, including its associated alerts and connections. The second model, a flocking model, presents the network administrator with a visual representation of IDS data in which each alert is represented in a different color and alerts with maximum similarity move together. This gives the network administrator a way of detecting various intrusions by visualizing the alert patterns.

  18. Intra Cluster Light properties in the CLASH-VLT cluster MACS J1206.2-0847

    CERN Document Server

    Presotto, V; Nonino, M; Mercurio, A; Grillo, C; Rosati, P; Biviano, A; Annunziatella, M; Balestra, I; Cui, W; Sartoris, B; Lemze, D; Ascaso, B; Moustakas, J; Ford, H; Fritz, A; Czoske, O; Ettori, S; Kuchner, U; Lombardi, M; Maier, C; Medezinski, E; Molino, A; Scodeggio, M; Strazzullo, V; Tozzi, P; Ziegler, B; Bartelmann, M; Benitez, N; Bradley, L; Brescia, M; Broadhurst, T; Coe, D; Donahue, M; Gobat, R; Graves, G; Kelson, D; Koekemoer, A; Melchior, P; Meneghetti, M; Merten, J; Moustakas, L; Munari, E; Postman, M; Regős, E; Seitz, S; Umetsu, K; Zheng, W; Zitrin, A

    2014-01-01

    We aim to constrain the assembly history of clusters by studying the intra cluster light (ICL) properties, estimating its contribution to the fraction of baryons in stars, f*, and understanding possible systematics/biases arising from different ICL detection techniques. We developed an automated method, GALtoICL, based on the software GALAPAGOS, to obtain a refined version of typical BCG+ICL maps. We applied this method to our test case MACS J1206.2-0847, a massive cluster located at z=0.44 that is part of the CLASH sample. Using deep multi-band SUBARU images, we extracted the surface brightness (SB) profile of the BCG+ICL and we studied the ICL morphology, color, and contribution to f* out to R500. We repeated the same analysis using a different definition of the ICL, the SB-limit method, i.e., an SB cut-off level, to compare the results. The most peculiar feature of the ICL in MACS1206 is its asymmetric radial distribution, with an excess in the SE direction and extending towards the 2nd brightest cluster galaxy which i...

  19. A cluster approximation for the transfer-matrix method

    International Nuclear Information System (INIS)

    Surda, A.

    1990-08-01

    A cluster approximation for the transfer-matrix method is formulated. The calculation of the partition function of lattice models is transformed into a nonlinear mapping problem. The method yields the free energy, correlation functions, and phase diagrams for a large class of lattice models. The high accuracy of the method is exemplified by the calculation of the critical temperature of the Ising model. (author). 14 refs, 2 figs, 1 tab

  20. Interactive K-Means Clustering Method Based on User Behavior for Different Analysis Target in Medicine.

    Science.gov (United States)

    Lei, Yang; Yu, Dai; Bin, Zhang; Yang, Yang

    2017-01-01

    Clustering algorithms, as a basis of data analysis, are widely used in analysis systems. However, for high-dimensional data, a clustering algorithm may overlook the business relations between dimensions, especially in the medical field. As a result, the clustering result often fails to meet the business goals of the users. If the clustering process can incorporate the knowledge of the users, that is, the doctor's knowledge or the analysis intent, the clustering result can be more satisfactory. In this paper, we propose an interactive K-means clustering method to improve the user's satisfaction with the result. The core of this method is to use the user's feedback on the clustering result to optimize it. A particle swarm optimization algorithm is then used to optimize the parameters, especially the weight settings in the clustering algorithm, so that they reflect the user's business preference as much as possible. After this parameter optimization and adjustment, the clustering result can be closer to the user's requirements. Finally, we take an example in breast cancer to test our method. The experiments show the good performance of our algorithm.
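
    The heart of the method, a feature-weighted K-means whose weights encode the user's preference, can be sketched as below; in the paper the weights are tuned by particle swarm optimization from user feedback, whereas here they are set by hand.

```python
# Minimal sketch of feature-weighted k-means: each dimension contributes to
# the distance in proportion to a user-preference weight.
import numpy as np

def weighted_kmeans(X, k, w, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        # weighted squared Euclidean distance to each center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2 * w).sum(axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels, centers

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
w = np.array([1.0, 1.0, 0.1, 0.1])     # e.g. the user cares mostly about the first two attributes
labels, centers = weighted_kmeans(X, k=3, w=w)
print(np.bincount(labels))
```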

  1. Health-related hot topic detection in online communities using text clustering.

    Directory of Open Access Journals (Sweden)

    Yingjie Lu

    Full Text Available Recently, health-related social media services, especially online health communities, have rapidly emerged. Patients with various health conditions participate in online health communities to share their experiences and exchange healthcare knowledge. Exploring hot topics in online health communities helps us better understand patients' needs and interest in health-related knowledge. However, the statistical topic analysis employed in previous studies is becoming impractical for processing the rapidly increasing amount of online data. Automatic topic detection based on document clustering is an alternative approach for extracting health-related hot topics in online communities. In addition to the keyword-based features used in traditional text clustering, we integrate medical domain-specific features to represent the messages posted in online health communities. Three disease discussion boards, including boards devoted to lung cancer, breast cancer and diabetes, from an online health community are used to test the effectiveness of topic detection. Experiment results demonstrate that health-related hot topics primarily include symptoms, examinations, drugs, procedures and complications. Further analysis reveals that there also exist some significant differences among the hot topics discussed on different types of disease discussion boards.
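
    The keyword-based part of this approach can be illustrated with a TF-IDF plus k-means sketch; the invented posts and the omission of the medical domain-specific features are simplifications of the study's pipeline.

```python
# Rough sketch of keyword-based topic detection by document clustering.
# The posts are invented and the medical domain-specific features used in
# the study are omitted here.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
import numpy as np

posts = [
    "shortness of breath after chemo, is this a common symptom?",
    "CT scan scheduled next week, what should I expect from the exam?",
    "metformin dosage changed, anyone had side effects with this drug?",
    "post-surgery complication: swelling around the incision",
    "my oncologist ordered another biopsy examination",
    "insulin vs metformin, which drug worked better for you?",
]

tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(posts)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

terms = np.array(tfidf.get_feature_names_out())
for c in range(3):                       # top keywords summarize each hot topic
    top = km.cluster_centers_[c].argsort()[::-1][:4]
    print(f"topic {c}:", ", ".join(terms[top]))
```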

  2. Using hierarchical clustering methods to classify motor activities of COPD patients from wearable sensor data

    Directory of Open Access Journals (Sweden)

    Reilly John J

    2005-06-01

    Full Text Available Abstract Background Advances in miniature sensor technology have led to the development of wearable systems that allow one to monitor motor activities in the field. A variety of classifiers have been proposed in the past, but little has been done toward developing systematic approaches to assess the feasibility of discriminating the motor tasks of interest and to guide the choice of the classifier architecture. Methods A technique is introduced to address this problem according to a hierarchical framework and its use is demonstrated for the application of detecting motor activities in patients with chronic obstructive pulmonary disease (COPD undergoing pulmonary rehabilitation. Accelerometers were used to collect data for 10 different classes of activity. Features were extracted to capture essential properties of the data set and reduce the dimensionality of the problem at hand. Cluster measures were utilized to find natural groupings in the data set and then construct a hierarchy of the relationships between clusters to guide the process of merging clusters that are too similar to distinguish reliably. It provides a means to assess whether the benefits of merging for performance of a classifier outweigh the loss of resolution incurred through merging. Results Analysis of the COPD data set demonstrated that motor tasks related to ambulation can be reliably discriminated from tasks performed in a seated position with the legs in motion or stationary using two features derived from one accelerometer. Classifying motor tasks within the category of activities related to ambulation requires more advanced techniques. While in certain cases all the tasks could be accurately classified, in others merging clusters associated with different motor tasks was necessary. When merging clusters, it was found that the proposed method could lead to more than 12% improvement in classifier accuracy while retaining resolution of 4 tasks. Conclusion Hierarchical
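    The general workflow can be illustrated roughly as below: build a hierarchy over accelerometer-derived feature vectors and cut it at a chosen distance, so that activity classes too similar to separate reliably end up merged. The features and class names are invented placeholders, not the authors' pipeline or data.

      import numpy as np
      from scipy.cluster.hierarchy import linkage, fcluster

      rng = np.random.default_rng(1)
      class_means = {"walk": [1.0, 0.9], "stairs": [1.1, 1.0],
                     "sit_still": [0.1, 0.1], "sit_legs_moving": [0.3, 0.2]}
      X = np.vstack([rng.normal(m, 0.05, size=(20, 2)) for m in class_means.values()])

      Z = linkage(X, method="ward")
      # Cutting the tree at a larger distance merges classes that are hard to separate
      merged = fcluster(Z, t=2.0, criterion="distance")
      print("clusters retained after merging:", len(set(merged)))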

  3. Detecting groups of coevolving positions in a molecule: a clustering approach

    Directory of Open Access Journals (Sweden)

    Galtier Nicolas

    2007-11-01

    Full Text Available Abstract Background Although the patterns of co-substitutions in RNA is now well characterized, detection of coevolving positions in proteins remains a difficult task. It has been recognized that the signal is typically weak, due to the fact that (i amino-acid are characterized by various biochemical properties, so that distinct amino acids changes are not functionally equivalent, and (ii a given mutation can be compensated by more than one mutation, at more than one position. Results We present a new method based on phylogenetic substitution mapping. The two above-mentioned problems are addressed by (i the introduction of a weighted mapping, which accounts for the biochemical effects (volume, polarity, charge of amino-acid changes, (ii the use of a clustering approach to detect groups of coevolving sites of virtually any size, and (iii the distinction between biochemical compensation and other coevolutionary mechanisms. We apply this methodology to a previously studied data set of bacterial ribosomal RNA, and to three protein data sets (myoglobin of vertebrates, S-locus Receptor Kinase and Methionine Amino-Peptidase. Conclusion We succeed in detecting groups of sites which significantly depart the null hypothesis of independence. Group sizes range from pairs to groups of size ≃ 10, depending on the substitution weights used. The structural and functional relevance of these groups of sites are assessed, and the various evolutionary processes potentially generating correlated substitution patterns are discussed.

  4. A 3D clustering approach for point clouds to detect and quantify changes at a rock glacier front

    Science.gov (United States)

    Micheletti, Natan; Tonini, Marj; Lane, Stuart N.

    2016-04-01

    Terrestrial Laser Scanners (TLS) are extensively used in geomorphology to remotely-sense landforms and surfaces of any type and to derive digital elevation models (DEMs). Modern devices are able to collect many millions of points, so that working on the resulting dataset is often troublesome in terms of computational effort. Indeed, it is not unusual that raw point clouds are filtered prior to DEM creation, so that only a subset of points is retained and the interpolation process becomes less of a burden. Whilst this procedure is in many cases necessary, it entails a considerable loss of valuable information. First, and even without eliminating points, the common interpolation of points to a regular grid causes a loss of potentially useful detail. Second, it inevitably causes the transition from 3D information to only 2.5D data where each (x,y) pair must have a unique z-value. Vector-based DEMs (e.g. triangulated irregular networks) partially mitigate these issues, but still require a set of parameters to be set and a considerable burden in terms of calculation and storage. Because of the reasons above, being able to perform geomorphological research directly on point clouds would be profitable. Here, we propose an approach to identify erosion and deposition patterns on a very active rock glacier front in the Swiss Alps to monitor sediment dynamics. The general aim is to set up a semiautomatic method to isolate mass movements using 3D-feature identification directly from LiDAR data. An ultra-long range LiDAR RIEGL VZ-6000 scanner was employed to acquire point clouds during three consecutive summers. In order to isolate single clusters of erosion and deposition we applied Density-Based Spatial Clustering of Applications with Noise (DBSCAN), previously successfully employed by Tonini and Abellan (2014) in a similar case for rockfall detection. DBSCAN requires two input parameters, strongly influencing the number, shape and size of the detected clusters: the minimum number of
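    A hedged sketch of the clustering step only: DBSCAN applied to a synthetic 3D point set, with eps and min_samples standing in for the two input parameters discussed above. The values and data are illustrative, not those used in the study.

      import numpy as np
      from sklearn.cluster import DBSCAN

      rng = np.random.default_rng(2)
      deposit = rng.normal([5.0, 5.0, 1.0], 0.3, size=(150, 3))   # dense blob of change points
      noise = rng.uniform(0, 10, size=(100, 3))                   # scattered background points
      points = np.vstack([deposit, noise])

      labels = DBSCAN(eps=0.5, min_samples=10).fit_predict(points)
      print("clusters found:", len(set(labels)) - (1 if -1 in labels else 0))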

  5. An improved K-means clustering method for cDNA microarray image segmentation.

    Science.gov (United States)

    Wang, T N; Li, T J; Shao, G F; Wu, S X

    2015-07-14

    Microarray technology is a powerful tool for human genetic research and other biomedical applications. Numerous improvements to the standard K-means algorithm have been carried out to complete the image segmentation step. However, most of the previous studies classify the image into two clusters. In this paper, we propose a novel K-means algorithm, which first classifies the image into three clusters, and then takes one of the three clusters as the background region and the other two clusters as the foreground region. The proposed method was evaluated on six different data sets. The analyses of accuracy, efficiency, expression values, special gene spots, and noise images demonstrate the effectiveness of our method in improving the segmentation quality.
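    The three-cluster idea can be sketched on a synthetic intensity sample, assuming the cluster with the lowest mean intensity is background and the other two clusters jointly form the foreground; this is an illustration, not the authors' implementation.

      import numpy as np
      from sklearn.cluster import KMeans

      rng = np.random.default_rng(3)
      pixels = np.concatenate([rng.normal(20, 5, 700),     # background pixels
                               rng.normal(120, 15, 200),   # dim spot pixels
                               rng.normal(220, 10, 100)])  # bright spot pixels
      km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(pixels.reshape(-1, 1))
      background_cluster = np.argmin(km.cluster_centers_.ravel())
      foreground_mask = km.labels_ != background_cluster
      print("foreground fraction:", round(foreground_mask.mean(), 3))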

  6. An Efficient Method for Detection of Outliers in Tracer Curves Derived from Dynamic Contrast-Enhanced Imaging

    Directory of Open Access Journals (Sweden)

    Linning Ye

    2018-01-01

    Full Text Available Presence of outliers in tracer concentration-time curves derived from dynamic contrast-enhanced imaging can adversely affect the analysis of the tracer curves by model-fitting. A computationally efficient method for detecting outliers in tracer concentration-time curves is presented in this study. The proposed method is based on a piecewise linear model and implemented using a robust clustering algorithm. The method is noniterative and all the parameters are automatically estimated. To compare the proposed method with existing Gaussian model based and robust regression-based methods, simulation studies were performed by simulating tracer concentration-time curves using the generalized Tofts model and kinetic parameters derived from different tissue types. Results show that the proposed method and the robust regression-based method achieve better detection performance than the Gaussian model based method. Compared with the robust regression-based method, the proposed method can achieve similar detection performance with much faster computation speed.

  7. Expanding Comparative Literature into Comparative Sciences Clusters with Neutrosophy and Quad-stage Method

    Directory of Open Access Journals (Sweden)

    Fu Yuhua

    2016-08-01

    Full Text Available By using Neutrosophy and Quad-stage Method, the expansions of comparative literature include: comparative social sciences clusters, comparative natural sciences clusters, comparative interdisciplinary sciences clusters, and so on. Among them, comparative social sciences clusters include: comparative literature, comparative history, comparative philosophy, and so on; comparative natural sciences clusters include: comparative mathematics, comparative physics, comparative chemistry, comparative medicine, comparative biology, and so on.

  8. Analytical Energy Gradients for Excited-State Coupled-Cluster Methods

    Science.gov (United States)

    Wladyslawski, Mark; Nooijen, Marcel

    The equation-of-motion coupled-cluster (EOM-CC) and similarity transformed equation-of-motion coupled-cluster (STEOM-CC) methods have been firmly established as accurate and routinely applicable extensions of single-reference coupled-cluster theory to describe electronically excited states. An overview of these methods is provided, with emphasis on the many-body similarity transform concept that is the key to a rationalization of their accuracy. The main topic of the paper is the derivation of analytical energy gradients for such non-variational electronic structure approaches, with an ultimate focus on obtaining their detailed algebraic working equations. A general theoretical framework using Lagrange's method of undetermined multipliers is presented, and the method is applied to formulate the EOM-CC and STEOM-CC gradients in abstract operator terms, following the previous work in [P.G. Szalay, Int. J. Quantum Chem. 55 (1995) 151] and [S.R. Gwaltney, R.J. Bartlett, M. Nooijen, J. Chem. Phys. 111 (1999) 58]. Moreover, the systematics of the Lagrange multiplier approach is suitable for automation by computer, enabling the derivation of the detailed derivative equations through a standardized and direct procedure. To this end, we have developed the SMART (Symbolic Manipulation and Regrouping of Tensors) package of automated symbolic algebra routines, written in the Mathematica programming language. The SMART toolkit provides the means to expand, differentiate, and simplify equations by manipulation of the detailed algebraic tensor expressions directly. The Lagrangian multiplier formulation establishes a uniform strategy to perform the automated derivation in a standardized manner: A Lagrange multiplier functional is constructed from the explicit algebraic equations that define the energy in the electronic method; the energy functional is then made fully variational with respect to all of its parameters, and the symbolic differentiations directly yield the explicit

  9. A Saliency Guided Semi-Supervised Building Change Detection Method for High Resolution Remote Sensing Images

    Directory of Open Access Journals (Sweden)

    Bin Hou

    2016-08-01

    Full Text Available Characterization of up-to-date information on the Earth's surface is an important application, providing insights for urban planning, resources monitoring and environmental studies. A large number of change detection (CD) methods have been developed to address this task by utilizing remote sensing (RS) images. The advent of high resolution (HR) remote sensing images further poses challenges to traditional CD methods and opportunities for object-based CD methods. While several kinds of geospatial objects are recognized, this manuscript mainly focuses on buildings. Specifically, we propose a novel automatic approach combining pixel-based strategies with object-based ones for detecting building changes with HR remote sensing images. A multiresolution contextual morphological transformation called extended morphological attribute profiles (EMAPs) allows the extraction of geometrical features related to the structures within the scene at different scales. Pixel-based post-classification is executed on EMAPs using hierarchical fuzzy clustering. Subsequently, hierarchical fuzzy frequency vector histograms are formed based on the image-objects acquired by simple linear iterative clustering (SLIC) segmentation. Then, saliency and the morphological building index (MBI) extracted on difference images are used to generate a pseudo training set. Ultimately, object-based semi-supervised classification is implemented on this training set by applying a random forest (RF). Most of the important changes are detected by the proposed method in our experiments. The effectiveness of this approach was checked using visual and numerical evaluation.

  10. ICGE: an R package for detecting relevant clusters and atypical units in gene expression

    Directory of Open Access Journals (Sweden)

    Irigoien Itziar

    2012-02-01

    Full Text Available Abstract Background Gene expression technologies have opened up new ways to diagnose and treat cancer and other diseases. Clustering algorithms are a useful approach with which to analyze genome expression data. They attempt to partition the genes into groups exhibiting similar patterns of variation in expression level. An important problem associated with gene classification is to discern whether the clustering process can find a relevant partition, as well as the identification of new gene classes. There are two key aspects to classification: the estimation of the number of clusters, and the decision as to whether a new unit (gene, tumor sample...) belongs to one of these previously identified clusters or to a new group. Results ICGE is a user-friendly R package which provides many functions related to this problem: identify the number of clusters using mixed variables of the kind usually found by applied biomedical researchers; detect whether the data have a cluster structure; identify whether a new unit belongs to one of the pre-identified clusters or to a novel group; and classify new units into the corresponding cluster. The functions in the ICGE package are accompanied by help files and easy examples to facilitate its use. Conclusions We demonstrate the utility of ICGE by analyzing simulated and real data sets. The results show that ICGE could be very useful to a broad research community.

  11. Enhancing spatial detection accuracy for syndromic surveillance with street level incidence data

    Directory of Open Access Journals (Sweden)

    Alemi Farrokh

    2010-01-01

    Full Text Available Abstract Background The Department of Defense Military Health System operates a syndromic surveillance system that monitors medical records at more than 450 non-combat Military Treatment Facilities (MTFs) worldwide. The Electronic Surveillance System for Early Notification of Community-based Epidemics (ESSENCE) uses both temporal and spatial algorithms to detect disease outbreaks. This study focuses on spatial detection and attempts to improve the effectiveness of the ESSENCE implementation of the spatial scan statistic by increasing the spatial resolution of incidence data from zip codes to street address level. Methods Influenza-Like Illness (ILI) was used as a test syndrome to develop methods to improve the spatial accuracy of detected alerts. Simulated incident clusters of various sizes were superimposed on real ILI incidents from the 2008/2009 influenza season. Clusters were detected using the spatial scan statistic and their displacement from simulated loci was measured. Detected cluster size distributions were also evaluated for compliance with simulated cluster sizes. Results Relative to the ESSENCE zip code based method, clusters detected using street level incidents were displaced on average 65% less for 2 and 5 mile radius clusters and 31% less for 10 mile radius clusters. Detected cluster size distributions for the street address method were quasi-normal and sizes tended to slightly exceed simulated radii. ESSENCE methods yielded fragmented distributions and had high rates of zero-radius and oversized clusters. Conclusions Spatial detection accuracy improved notably with regard to both location and size when incidents were geocoded to street addresses rather than zip code centroids. Since street address geocoding success rates were only 73.5%, zip codes were still used for more than one quarter of ILI cases. Thus, further advances in spatial detection accuracy are dependent on systematic improvements in the collection of individual

  12. DETECTION OF SOLAR-LIKE OSCILLATIONS FROM KEPLER PHOTOMETRY OF THE OPEN CLUSTER NGC 6819

    International Nuclear Information System (INIS)

    Stello, Dennis; Bedding, Timothy R.; Huber, Daniel; Basu, Sarbani; Bruntt, Hans; Mosser, BenoIt; Barban, Caroline; Goupil, Marie-Jo; Stevens, Ian R.; Chaplin, William J.; Elsworth, Yvonne P.; Hekker, Saskia; Brown, Timothy M.; Christensen-Dalsgaard, Joergen; Kjeldsen, Hans; Arentoft, Torben; Gilliland, Ronald L.; Ballot, Jerome; GarcIa, Rafael A.; Mathur, Savita

    2010-01-01

    Asteroseismology of stars in clusters has been a long-sought goal because the assumption of a common age, distance, and initial chemical composition allows strong tests of the theory of stellar evolution. We report results from the first 34 days of science data from the Kepler Mission for the open cluster NGC 6819 - one of the four clusters in the field of view. We obtain the first clear detections of solar-like oscillations in the cluster red giants and are able to measure the large frequency separation, Δν, and the frequency of maximum oscillation power, ν_max. We find that the asteroseismic parameters allow us to test cluster membership of the stars, and even with the limited seismic data in hand, we can already identify four possible non-members despite their having a better than 80% membership probability from radial velocity measurements. We are also able to determine the oscillation amplitudes for stars that span about 2 orders of magnitude in luminosity and find good agreement with the prediction that oscillation amplitudes scale as the luminosity to the power of 0.7. These early results demonstrate the unique potential of asteroseismology of the stellar clusters observed by Kepler.

  13. Vertebra identification using template matching model and K-means clustering.

    Science.gov (United States)

    Larhmam, Mohamed Amine; Benjelloun, Mohammed; Mahmoudi, Saïd

    2014-03-01

    Accurate vertebra detection and segmentation are essential steps for automating the diagnosis of spinal disorders. This study is dedicated to vertebra alignment measurement, the first step in a computer-aided diagnosis tool for cervical spine trauma. Automated vertebral segment alignment determination is a challenging task due to low contrast imaging and noise. A software tool for segmenting vertebrae and detecting subluxations has clinical significance. A robust method was developed and tested for cervical vertebra identification and segmentation that extracts parameters used for vertebra alignment measurement. Our contribution involves a novel combination of a template matching method and an unsupervised clustering algorithm. In this method, we build a geometric vertebra mean model. To achieve vertebra detection, manual selection of the region of interest is performed initially on the input image. Subsequent preprocessing is done to enhance image contrast and detect edges. Candidate vertebra localization is then carried out by using a modified generalized Hough transform (GHT). Next, an adapted cost function is used to compute local voted centers and filter boundary data. Thereafter, a K-means clustering algorithm is applied to obtain a cluster distribution corresponding to the targeted vertebrae. These clusters are combined with the vote parameters to detect vertebra centers. Rigid segmentation is then carried out by using GHT parameters. Finally, cervical spine curves are extracted to measure vertebra alignment. The proposed approach was successfully applied to a set of 66 high-resolution X-ray images. Robust detection was achieved in 97.5% of the 330 tested cervical vertebrae. An automated vertebral identification method was developed and demonstrated to be robust to noise and occlusion. This work presents a first step toward an automated computer-aided diagnosis system for cervical spine trauma detection.

  14. Locating irregularly shaped clusters of infection intensity

    DEFF Research Database (Denmark)

    Yiannakoulias, Niko; Wilson, Shona; Kariuki, H. Curtis

    2010-01-01

    Real data are based on samples of hookworm and S. mansoni from Kitengei, Makueni district, Kenya. Our analysis of simulated data shows how methods able to find irregular shapes are more likely to identify clusters along rivers than methods constrained to fixed geometries. Our analysis ... of infection intensity identifies two small areas within the study region in which infection intensity is elevated, possibly due to local features of the physical or social environment. Collectively, our results show that the "greedy growth scan" is a suitable method for exploratory geographical analysis ... for cluster detection.

  15. Statistical Significance for Hierarchical Clustering

    Science.gov (United States)

    Kimes, Patrick K.; Liu, Yufeng; Hayes, D. Neil; Marron, J. S.

    2017-01-01

    Summary Cluster analysis has proved to be an invaluable tool for the exploratory and unsupervised analysis of high dimensional datasets. Among methods for clustering, hierarchical approaches have enjoyed substantial popularity in genomics and other fields for their ability to simultaneously uncover multiple layers of clustering structure. A critical and challenging question in cluster analysis is whether the identified clusters represent important underlying structure or are artifacts of natural sampling variation. Few approaches have been proposed for addressing this problem in the context of hierarchical clustering, for which the problem is further complicated by the natural tree structure of the partition, and the multiplicity of tests required to parse the layers of nested clusters. In this paper, we propose a Monte Carlo based approach for testing statistical significance in hierarchical clustering which addresses these issues. The approach is implemented as a sequential testing procedure guaranteeing control of the family-wise error rate. Theoretical justification is provided for our approach, and its power to detect true clustering structure is illustrated through several simulation studies and applications to two cancer gene expression datasets. PMID:28099990
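    The Monte Carlo ingredient can be sketched under strong simplifications: an observed 2-means cluster index is compared with its distribution under a single-Gaussian null fitted to the data. The sequential, tree-wise testing and family-wise error control of the actual procedure are not reproduced here, and the data are synthetic.

      import numpy as np
      from sklearn.cluster import KMeans

      def cluster_index(X):
          km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
          return km.inertia_ / np.sum((X - X.mean(axis=0)) ** 2)   # within-SS / total-SS

      rng = np.random.default_rng(5)
      X = np.vstack([rng.normal(-2, 1, size=(50, 3)), rng.normal(2, 1, size=(50, 3))])

      observed = cluster_index(X)
      null_stats = [cluster_index(rng.multivariate_normal(X.mean(0), np.cov(X.T), len(X)))
                    for _ in range(200)]
      # Smaller index means tighter clusters, so the p-value is the fraction of null
      # draws that produce an index at least as small as the observed one.
      p_value = np.mean(np.array(null_stats) <= observed)
      print(round(observed, 3), round(p_value, 3))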

  16. Robustness of serial clustering of extratropical cyclones to the choice of tracking method

    Directory of Open Access Journals (Sweden)

    Joaquim G. Pinto

    2016-07-01

    Full Text Available Cyclone clusters are a frequent synoptic feature in the Euro-Atlantic area. Recent studies have shown that serial clustering of cyclones generally occurs on both flanks and downstream regions of the North Atlantic storm track, while cyclones tend to occur more regularly on the western side of the North Atlantic basin near Newfoundland. This study explores the sensitivity of serial clustering to the choice of cyclone tracking method using cyclone track data from 15 methods derived from ERA-Interim data (1979–2010). Clustering is estimated by the dispersion (ratio of variance to mean) of winter [December–February (DJF)] cyclone passages near each grid point over the Euro-Atlantic area. The mean number of cyclone counts and their variance are compared between methods, revealing considerable differences, particularly for the latter. Results show that all different tracking methods qualitatively capture similar large-scale spatial patterns of underdispersion and overdispersion over the study region. The quantitative differences can primarily be attributed to the differences in the variance of cyclone counts between the methods. Nevertheless, overdispersion is statistically significant for almost all methods over parts of the eastern North Atlantic and Western Europe, and is therefore considered a robust feature. The influence of the North Atlantic Oscillation (NAO) on cyclone clustering displays a similar pattern for all tracking methods, with one maximum near Iceland and another between the Azores and Iberia. The differences in variance between methods are not related to different sensitivities to the NAO, which can account for over 50% of the clustering in some regions. We conclude that the general features of underdispersion and overdispersion of extratropical cyclones over the North Atlantic and Western Europe are robust to the choice of tracking method. The same is true for the influence of the NAO on cyclone dispersion.
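    The clustering diagnostic itself is simple to state; a minimal sketch, assuming yearly DJF cyclone counts at one grid point (the counts below are invented): the dispersion is the variance-to-mean ratio, with values above 1 indicating overdispersion (serial clustering) and values below 1 indicating regularity.

      import numpy as np

      def dispersion(counts):
          counts = np.asarray(counts, dtype=float)
          return counts.var(ddof=1) / counts.mean()

      winter_counts = [12, 30, 8, 25, 10, 28, 9, 31]   # made-up DJF cyclone counts per winter
      print(round(dispersion(winter_counts), 2))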

  17. Cosmological analysis of galaxy clusters surveys in X-rays

    International Nuclear Information System (INIS)

    Clerc, N.

    2012-01-01

    Clusters of galaxies are the most massive objects in equilibrium in our Universe. Their study allows to test cosmological scenarios of structure formation with precision, bringing constraints complementary to those stemming from the cosmological background radiation, supernovae or galaxies. They are identified through the X-ray emission of their heated gas, thus facilitating their mapping at different epochs of the Universe. This report presents two surveys of galaxy clusters detected in X-rays and puts forward a method for their cosmological interpretation. Thanks to its multi-wavelength coverage extending over 10 sq. deg. and after one decade of expertise, the XMM-LSS allows a systematic census of clusters in a large volume of the Universe. In the framework of this survey, the first part of this report describes the techniques developed to the purpose of characterizing the detected objects. A particular emphasis is placed on the most distant ones (z ≥ 1) through the complementarity of observations in X-ray, optical and infrared bands. Then the X-CLASS survey is fully described. Based on XMM archival data, it provides a new catalogue of 800 clusters detected in X-rays. A cosmological analysis of this survey is performed thanks to 'CR-HR' diagrams. This new method self-consistently includes selection effects and scaling relations and provides a means to bypass the computation of individual cluster masses. Propositions are made for applying this method to future surveys as XMM-XXL and eRosita. (author) [fr

  18. Comparison of tests for spatial heterogeneity on data with global clustering patterns and outliers

    Directory of Open Access Journals (Sweden)

    Hachey Mark

    2009-10-01

    Full Text Available Abstract Background The ability to evaluate geographic heterogeneity of cancer incidence and mortality is important in cancer surveillance. Many statistical methods for evaluating global clustering and local cluster patterns have been developed and examined in simulation studies. However, the performance of these methods in two extreme cases (global clustering evaluation and local anomaly (outlier) detection) has not been thoroughly investigated. Methods We compare methods for global clustering evaluation, including Tango's Index, Moran's I, and Oden's I*pop, and cluster detection methods such as local Moran's I and the SaTScan elliptic version, on simulated count data that mimic global clustering patterns and outliers for cancer cases in the continental United States. We examine the power and precision of the selected methods in the purely spatial analysis. We illustrate Tango's MEET and the SaTScan elliptic version on 1987-2004 HIV and 1950-1969 lung cancer mortality data in the United States. Results For simulated data with outlier patterns, Tango's MEET, Moran's I and I*pop had powers less than 0.2, and SaTScan had powers around 0.97. For simulated data with global clustering patterns, Tango's MEET and I*pop (with 50% of the total population as the maximum search window) had powers close to 1. SaTScan had powers around 0.7-0.8 and Moran's I had powers around 0.2-0.3. In the real data example, Tango's MEET indicated the existence of global clustering patterns in both the HIV and lung cancer mortality data. SaTScan found a large cluster for HIV mortality rates, which is consistent with the finding from Tango's MEET. SaTScan also found clusters and outliers in the lung cancer mortality data. Conclusion The SaTScan elliptic version is more efficient for outlier detection compared with the other methods evaluated in this article. Tango's MEET and Oden's I*pop perform best in global clustering scenarios among the selected methods. The use of SaTScan for
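    As one concrete example of the global statistics compared above, here is a minimal sketch of Moran's I for regional rates with a row-standardized binary adjacency matrix; the four-region example is invented and not related to the study's data.

      import numpy as np

      def morans_i(x, W):
          x = np.asarray(x, dtype=float)
          z = x - x.mean()
          W = W / W.sum(axis=1, keepdims=True)          # row-standardize the weights
          return (len(x) / W.sum()) * (z @ W @ z) / (z @ z)

      # Toy example: four regions on a line, neighbours share an edge
      W = np.array([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]], dtype=float)
      rates = [1.0, 1.2, 3.5, 3.8]                       # spatially clustered high values
      print(round(morans_i(rates, W), 3))                # positive value indicates clustering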

  19. Evaluation of hierarchical agglomerative cluster analysis methods for discrimination of primary biological aerosol

    Directory of Open Access Journals (Sweden)

    I. Crawford

    2015-11-01

    Full Text Available In this paper we present improved methods for discriminating and quantifying primary biological aerosol particles (PBAPs) by applying hierarchical agglomerative cluster analysis to multi-parameter ultraviolet-light-induced fluorescence (UV-LIF) spectrometer data. The methods employed in this study can be applied to data sets in excess of 1 × 10^6 points on a desktop computer, allowing each fluorescent particle in a data set to be explicitly clustered. This reduces the potential for misattribution found in subsampling and comparative attribution methods used in previous approaches, improving our capacity to discriminate and quantify PBAP meta-classes. We evaluate the performance of several hierarchical agglomerative cluster analysis linkages and data normalisation methods using laboratory samples of known particle types and an ambient data set. Fluorescent and non-fluorescent polystyrene latex spheres were sampled with a Wideband Integrated Bioaerosol Spectrometer (WIBS-4), where the optical size, asymmetry factor and fluorescence measurements were used as inputs to the analysis package. It was found that the Ward linkage with z-score or range normalisation performed best, correctly attributing 98 and 98.1% of the data points respectively. The best-performing methods were applied to the BEACHON-RoMBAS (Bio–hydro–atmosphere interactions of Energy, Aerosols, Carbon, H2O, Organics and Nitrogen–Rocky Mountain Biogenic Aerosol Study) ambient data set, where it was found that the z-score and range normalisation methods yield similar results, with each method producing clusters representative of fungal spores and bacterial aerosol, consistent with previous results. The z-score result was compared to clusters generated with previous approaches (WIBS AnalysiS Program, WASP), where we observe that the subsampling and comparative attribution method employed by WASP results in the overestimation of the fungal spore concentration by a factor of 1.5 and the
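    A rough sketch of the best-performing configuration reported here (Ward linkage with z-score normalisation), applied to made-up particle features standing in for size, asymmetry and fluorescence channels; this is not the study's code or data.

      import numpy as np
      from scipy.cluster.hierarchy import linkage, fcluster
      from scipy.stats import zscore

      rng = np.random.default_rng(6)
      spores = rng.normal([3.0, 10.0, 800.0, 50.0], [0.3, 2.0, 80.0, 10.0], (300, 4))
      bacteria = rng.normal([1.0, 25.0, 100.0, 400.0], [0.2, 4.0, 20.0, 60.0], (300, 4))
      X = zscore(np.vstack([spores, bacteria]), axis=0)   # z-score normalisation per feature

      Z = linkage(X, method="ward")                        # Ward linkage hierarchy
      labels = fcluster(Z, t=2, criterion="maxclust")      # cut into two meta-classes
      print(np.bincount(labels)[1:])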

  20. Cluster cosmological analysis with X-ray instrumental observables: introduction and testing of AsPIX method

    International Nuclear Information System (INIS)

    Valotti, Andrea

    2016-01-01

    Cosmology is one of the fundamental pillars of astrophysics, and as such it contains many unsolved puzzles. To investigate some of those puzzles, we analyze X-ray surveys of galaxy clusters. These surveys are possible thanks to the bremsstrahlung emission of the intra-cluster medium. The simultaneous fit of cluster counts as a function of mass and distance provides an independent measure of cosmological parameters such as Ω_m, σ_8, and the dark energy equation of state w_0. A novel approach to cosmological analysis using galaxy cluster data, called top-down, was developed in N. Clerc et al. (2012). This top-down approach is based purely on instrumental observables that are considered in a two-dimensional X-ray color-magnitude diagram. The method self-consistently includes selection effects and scaling relationships. It also provides a means of bypassing the computation of individual cluster masses. My work presents an extension of the top-down method by introducing the apparent size of the cluster, creating a three-dimensional X-ray cluster diagram. The size of a cluster is sensitive to both the cluster mass and its angular diameter, so it must also be included in the assessment of selection effects. The performance of this new method is investigated using a Fisher analysis. In parallel, I have studied the effects of the intrinsic scatter in the cluster size scaling relation on the sample selection as well as on the obtained cosmological parameters. To validate the method, I estimate uncertainties of cosmological parameters with an MCMC method and an Amoeba minimization routine, using two simulated XMM surveys that have an increasing level of complexity. The first simulated survey is a set of toy catalogues of 100 and 10000 deg^2, whereas the second is a 1000 deg^2 catalogue that was generated using an Aardvark semi-analytical N-body simulation. This comparison corroborates the conclusions of the Fisher analysis. In conclusion, I find that a cluster diagram that accounts

  1. Unbiased methods for removing systematics from galaxy clustering measurements

    Science.gov (United States)

    Elsner, Franz; Leistedt, Boris; Peiris, Hiranya V.

    2016-02-01

    Measuring the angular clustering of galaxies as a function of redshift is a powerful method for extracting information from the three-dimensional galaxy distribution. The precision of such measurements will dramatically increase with ongoing and future wide-field galaxy surveys. However, these are also increasingly sensitive to observational and astrophysical contaminants. Here, we study the statistical properties of three methods proposed for controlling such systematics - template subtraction, basic mode projection, and extended mode projection - all of which make use of externally supplied template maps, designed to characterize and capture the spatial variations of potential systematic effects. Based on a detailed mathematical analysis, and in agreement with simulations, we find that the template subtraction method in its original formulation returns biased estimates of the galaxy angular clustering. We derive closed-form expressions that should be used to correct results for this shortcoming. Turning to the basic mode projection algorithm, we prove it to be free of any bias, whereas we conclude that results computed with extended mode projection are biased. Within a simplified setup, we derive analytical expressions for the bias and discuss the options for correcting it in more realistic configurations. Common to all three methods is an increased estimator variance induced by the cleaning process, albeit at different levels. These results enable unbiased high-precision clustering measurements in the presence of spatially varying systematics, an essential step towards realizing the full potential of current and planned galaxy surveys.
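    The basic mode-projection idea can be sketched, under simplifying assumptions (pixel-space maps, no masking or weighting), as a least-squares projection of the observed overdensity map onto the systematics templates followed by subtraction; the templates, signal and contamination coefficients below are synthetic.

      import numpy as np

      rng = np.random.default_rng(7)
      n_pix = 5000
      templates = rng.normal(size=(n_pix, 3))             # e.g. dust, seeing, depth template maps
      true_signal = rng.normal(size=n_pix)
      observed = true_signal + templates @ np.array([0.5, -0.3, 0.2])  # contaminated map

      coeffs, *_ = np.linalg.lstsq(templates, observed, rcond=None)    # fit template amplitudes
      cleaned = observed - templates @ coeffs                          # project out the templates
      print(np.round(coeffs, 2))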

  2. A New Soft Computing Method for K-Harmonic Means Clustering.

    Science.gov (United States)

    Yeh, Wei-Chang; Jiang, Yunzhi; Chen, Yee-Fen; Chen, Zhe

    2016-01-01

    The K-harmonic means clustering algorithm (KHM) is a new clustering method used to group data such that the sum of the harmonic averages of the distances between each entity and all cluster centroids is minimized. Because it is less sensitive to initialization than K-means (KM), many researchers have recently been attracted to studying KHM. In this study, the proposed iSSO-KHM is based on an improved simplified swarm optimization (iSSO) and integrates a variable neighborhood search (VNS) for KHM clustering. As evidence of the utility of the proposed iSSO-KHM, we present extensive computational results on eight benchmark problems. From the computational results, the comparison appears to support the superiority of the proposed iSSO-KHM over previously developed algorithms for all experiments in the literature.
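    For concreteness, the quantity that KHM-type algorithms (including the iSSO-KHM above) seek to minimise can be written down directly; the sketch below evaluates the sum over points of the harmonic mean of the k point-to-centroid distances (raised to a power p) for random data and centroids.

      import numpy as np

      def khm_objective(X, centroids, p=2.0):
          # distances from every point to every centroid, shape (n_points, k)
          d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
          d = np.maximum(d, 1e-12)                 # avoid division by zero
          k = centroids.shape[0]
          return np.sum(k / np.sum(d ** (-p), axis=1))

      rng = np.random.default_rng(8)
      X = rng.normal(size=(100, 2))
      centroids = rng.normal(size=(3, 2))
      print(round(khm_objective(X, centroids), 2))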

  3. Characteristics of Clusters of Salmonella and Escherichia coli O157 Detected by Pulsed-Field Gel Electrophoresis that Predict Identification of Outbreaks.

    Science.gov (United States)

    Jones, Timothy F; Sashti, Nupur; Ingram, Amanda; Phan, Quyen; Booth, Hillary; Rounds, Joshua; Nicholson, Cyndy S; Cosgrove, Shaun; Crocker, Kia; Gould, L Hannah

    2016-12-01

    Molecular subtyping of pathogens is critical for foodborne disease outbreak detection and investigation. Many clusters initially identified by pulsed-field gel electrophoresis (PFGE) are not confirmed as point-source outbreaks. We evaluated characteristics of clusters that can help prioritize investigations to maximize effective use of limited resources. A multiagency collaboration (FoodNet) collected data on Salmonella and Escherichia coli O157 clusters for 3 years. Cluster size, timing, extent, and nature of epidemiologic investigations were analyzed to determine associations with whether the cluster was identified as a confirmed outbreak. During the 3-year study period, 948 PFGE clusters were identified; 849 (90%) were Salmonella and 99 (10%) were E. coli O157. Of those, 192 (20%) were ultimately identified as outbreaks (154 [18%] of Salmonella and 38 [38%] of E. coli O157 clusters). Successful investigation was significantly associated with larger cluster size, more rapid submission of isolates (e.g., for Salmonella, 6 days for outbreaks vs. 8 days for nonoutbreaks) and PFGE result reporting to investigators (16 days vs. 29 days, respectively), and performance of analytic studies (completed in 33% of Salmonella outbreaks vs. 1% of nonoutbreaks) and environmental investigations (40% and 1%, respectively). Intervals between first and second cases in a cluster did not differ significantly between outbreaks and nonoutbreaks. Molecular subtyping of pathogens is a rapidly advancing technology, and successfully identifying outbreaks will vary by pathogen and methods used. Understanding criteria for successfully investigating outbreaks is critical for efficiently using limited resources.

  4. Substructures in DAFT/FADA survey clusters based on XMM and optical data

    Science.gov (United States)

    Durret, F.; DAFT/FADA Team

    2014-07-01

    The DAFT/FADA survey was initiated to perform weak lensing tomography on a sample of 90 massive clusters in the redshift range [0.4,0.9] with HST imaging available. The complementary deep multiband imaging constitutes a high quality imaging data base for these clusters. In X-rays, we have analysed the XMM-Newton and/or Chandra data available for 32 clusters, and for 23 clusters we fit the X-ray emissivity with a beta-model and subtract it to search for substructures in the X-ray gas. This study was coupled with a dynamical analysis for the 18 clusters with at least 15 spectroscopic galaxy redshifts in the cluster range, based on a Serna & Gerbal (SG) analysis. We detected ten substructures in eight clusters by both methods (X-rays and SG). The percentage of mass included in substructures is found to be roughly constant with redshift, with values of 5-15%. Most of the substructures detected both in X-rays and with the SG method are found to be relatively recent infalls, probably at their first cluster pericenter approach.

  5. A Hybrid Spectral Clustering and Deep Neural Network Ensemble Algorithm for Intrusion Detection in Sensor Networks.

    Science.gov (United States)

    Ma, Tao; Wang, Fen; Cheng, Jianjun; Yu, Yang; Chen, Xiaoyun

    2016-10-13

    The development of intrusion detection systems (IDS) that are adapted to allow routers and network defence systems to detect malicious network traffic disguised as network protocols or normal access is a critical challenge. This paper proposes a novel approach called SCDNN, which combines spectral clustering (SC) and deep neural network (DNN) algorithms. First, the dataset is divided into k subsets based on sample similarity using cluster centres, as in SC. Next, the distance between data points in a testing set and the training set is measured based on similarity features and is fed into the deep neural network algorithm for intrusion detection. Six KDD-Cup99 and NSL-KDD datasets and a sensor network dataset were employed to test the performance of the model. These experimental results indicate that the SCDNN classifier not only performs better than backpropagation neural network (BPNN), support vector machine (SVM), random forest (RF) and Bayes tree models in detection accuracy and in the types of abnormal attacks found, but also provides an effective tool for the study and analysis of intrusion detection in large networks.
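    A much-simplified sketch of the two-stage idea (cluster the training data, route each test point to the sub-model of its nearest cluster, classify with a neural network): K-means stands in for spectral clustering so test points can be assigned to centres, a small MLP stands in for the deep network, and the data are synthetic rather than KDD-Cup99 or NSL-KDD.

      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.neural_network import MLPClassifier

      rng = np.random.default_rng(9)
      X_train = rng.normal(size=(600, 10)); y_train = rng.integers(0, 2, 600)
      X_test = rng.normal(size=(100, 10))

      k = 3
      splitter = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_train)
      # one classifier per training-data cluster
      models = {c: MLPClassifier(hidden_layer_sizes=(32,), max_iter=300, random_state=0)
                    .fit(X_train[splitter.labels_ == c], y_train[splitter.labels_ == c])
                for c in range(k)}

      test_clusters = splitter.predict(X_test)       # route each test point to a sub-model
      y_pred = np.array([models[c].predict(x.reshape(1, -1))[0]
                         for c, x in zip(test_clusters, X_test)])
      print(np.bincount(y_pred))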

  6. Using Clustering Techniques To Detect Usage Patterns in a Web-based Information System.

    Science.gov (United States)

    Chen, Hui-Min; Cooper, Michael D.

    2001-01-01

    This study developed an analytical approach to detecting groups with homogenous usage patterns in a Web-based information system. Principal component analysis was used for data reduction, cluster analysis for categorizing usage into groups. The methodology was demonstrated and tested using two independent samples of user sessions from the…

  7. Analysis of genetic association using hierarchical clustering and cluster validation indices.

    Science.gov (United States)

    Pagnuco, Inti A; Pastore, Juan I; Abras, Guillermo; Brun, Marcel; Ballarin, Virginia L

    2017-10-01

    It is usually assumed that co-expressed genes suggest co-regulation in the underlying regulatory network. Determining sets of co-expressed genes is an important task, based on some criteria of similarity. This task is usually performed by clustering algorithms, where the genes are clustered into meaningful groups based on their expression values in a set of experiments. In this work, we propose a method to find sets of co-expressed genes, based on cluster validation indices as a measure of similarity for individual gene groups, and a combination of variants of hierarchical clustering to generate the candidate groups. We evaluated its ability to retrieve significant sets on simulated correlated and real genomics data, where performance is measured by its ability to detect co-regulated sets compared against a full search. Additionally, we analyzed the quality of the best-ranked groups using an online bioinformatics tool that provides network information for the selected genes. Copyright © 2017 Elsevier Inc. All rights reserved.
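    The role of a validation index can be sketched as below, using the silhouette score as one such index over cuts of an average-linkage, correlation-distance hierarchy; the simulated expression matrix contains five co-expressed groups, so the score should peak near five clusters. This illustrates the idea only, not the authors' algorithm.

      import numpy as np
      from scipy.cluster.hierarchy import linkage, fcluster
      from sklearn.metrics import silhouette_score

      rng = np.random.default_rng(11)
      base = rng.normal(size=(5, 30))                       # 5 underlying expression profiles
      genes = np.vstack([base[i] + rng.normal(0, 0.2, (10, 30)) for i in range(5)])

      Z = linkage(genes, method="average", metric="correlation")
      for k in (2, 5, 8):
          labels = fcluster(Z, t=k, criterion="maxclust")
          print(k, round(silhouette_score(genes, labels, metric="correlation"), 3))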

  8. Hierarchical modeling of cluster size in wildlife surveys

    Science.gov (United States)

    Royle, J. Andrew

    2008-01-01

    Clusters or groups of individuals are the fundamental unit of observation in many wildlife sampling problems, including aerial surveys of waterfowl, marine mammals, and ungulates. Explicit accounting of cluster size in models for estimating abundance is necessary because detection of individuals within clusters is not independent and detectability of clusters is likely to increase with cluster size. This induces a cluster size bias in which the average cluster size in the sample is larger than in the population at large. Thus, failure to account for the relationship between detectability and cluster size will tend to yield a positive bias in estimates of abundance or density. I describe a hierarchical modeling framework for accounting for cluster-size bias in animal sampling. The hierarchical model consists of models for the observation process conditional on the cluster size distribution and the cluster size distribution conditional on the total number of clusters. Optionally, a spatial model can be specified that describes variation in the total number of clusters per sample unit. Parameter estimation, model selection, and criticism may be carried out using conventional likelihood-based methods. An extension of the model is described for the situation where measurable covariates at the level of the sample unit are available. Several candidate models within the proposed class are evaluated for aerial survey data on mallard ducks (Anas platyrhynchos).
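    The bias itself is easy to demonstrate; the toy simulation below (not the hierarchical model) lets detection probability rise with group size, so the mean size of detected groups exceeds the population mean. All parameter values are invented.

      import numpy as np

      rng = np.random.default_rng(12)
      group_sizes = rng.poisson(lam=3.0, size=10_000) + 1       # population of groups
      p_detect = 1 - np.exp(-0.4 * group_sizes)                 # larger groups are easier to see
      detected = group_sizes[rng.random(group_sizes.size) < p_detect]

      print("population mean group size:", round(group_sizes.mean(), 2))
      print("detected-sample mean group size:", round(detected.mean(), 2))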

  9. Prediction of Solvent Physical Properties using the Hierarchical Clustering Method

    Science.gov (United States)

    Recently a QSAR (Quantitative Structure Activity Relationship) method, the hierarchical clustering method, was developed to estimate acute toxicity values for large, diverse datasets. This methodology has now been applied to estimate solvent physical properties including sur...

  10. A quasiparticle-based multi-reference coupled-cluster method.

    Science.gov (United States)

    Rolik, Zoltán; Kállay, Mihály

    2014-10-07

    The purpose of this paper is to introduce a quasiparticle-based multi-reference coupled-cluster (MRCC) approach. The quasiparticles are introduced via a unitary transformation which allows us to represent a complete active space reference function and other elements of an orthonormal multi-reference (MR) basis in a determinant-like form. The quasiparticle creation and annihilation operators satisfy the fermion anti-commutation relations. On the basis of these quasiparticles, a generalization of the normal-ordered operator products for the MR case can be introduced as an alternative to the approach of Mukherjee and Kutzelnigg [Recent Prog. Many-Body Theor. 4, 127 (1995); Mukherjee and Kutzelnigg, J. Chem. Phys. 107, 432 (1997)]. Based on the new normal ordering any quasiparticle-based theory can be formulated using the well-known diagram techniques. Beyond the general quasiparticle framework we also present a possible realization of the unitary transformation. The suggested transformation has an exponential form where the parameters, holding exclusively active indices, are defined in a form similar to the wave operator of the unitary coupled-cluster approach. The definition of our quasiparticle-based MRCC approach strictly follows the form of the single-reference coupled-cluster method and retains several of its beneficial properties. Test results for small systems are presented using a pilot implementation of the new approach and compared to those obtained by other MR methods.

  11. Cluster analysis of European Y-chromosomal STR haplotypes using the discrete Laplace method

    DEFF Research Database (Denmark)

    Andersen, Mikkel Meyer; Eriksen, Poul Svante; Morling, Niels

    2014-01-01

    The European Y-chromosomal short tandem repeat (STR) haplotype distribution has previously been analysed in various ways. Here, we introduce a new way of analysing population substructure using a new method based on clustering within the discrete Laplace exponential family that models the probability distribution of the Y-STR haplotypes. Creating a consistent statistical model of the haplotypes enables us to perform a wide range of analyses. Previously, haplotype frequency estimation using the discrete Laplace method has been validated. In this paper we investigate how the discrete Laplace method can be used for cluster analysis to further validate the discrete Laplace method. A very important practical fact is that the calculations can be performed on a normal computer. We identified two sub-clusters of the Eastern and Western European Y-STR haplotypes similar to results of previous...

  12. The Atacama Cosmology Telescope: Cosmology from Galaxy Clusters Detected via the Sunyaev-Zeldovich Effect

    International Nuclear Information System (INIS)

    Sehgal, N.

    2011-01-01

    We present constraints on cosmological parameters based on a sample of Sunyaev-Zeldovich-selected galaxy clusters detected in a millimeter-wave survey by the Atacama Cosmology Telescope. The cluster sample used in this analysis consists of 9 optically-confirmed high-mass clusters comprising the high-significance end of the total cluster sample identified in 455 square degrees of sky surveyed during 2008 at 148 GHz. We focus on the most massive systems to reduce the degeneracy between unknown cluster astrophysics and cosmology derived from SZ surveys. We describe the scaling relation between cluster mass and SZ signal with a 4-parameter fit. Marginalizing over the values of the parameters in this fit with conservative priors gives σ_8 = 0.851 ± 0.115 and w = -1.14 ± 0.35 for a spatially-flat wCDM cosmological model with WMAP 7-year priors on cosmological parameters. This gives a modest improvement in statistical uncertainty over WMAP 7-year constraints alone. Fixing the scaling relation between cluster mass and SZ signal to a fiducial relation obtained from numerical simulations and calibrated by X-ray observations, we find σ_8 = 0.821 ± 0.044 and w = -1.05 ± 0.20. These results are consistent with constraints from WMAP 7 plus baryon acoustic oscillations plus type Ia supernovae, which give σ_8 = 0.802 ± 0.038 and w = -0.98 ± 0.053. A stacking analysis of the clusters in this sample compared to clusters simulated assuming the fiducial model also shows good agreement. These results suggest that, given the sample of clusters used here, both the astrophysics of massive clusters and the cosmological parameters derived from them are broadly consistent with current models.

  13. Propensity score to detect baseline imbalance in cluster randomized trials: the role of the c-statistic.

    Science.gov (United States)

    Leyrat, Clémence; Caille, Agnès; Foucher, Yohann; Giraudeau, Bruno

    2016-01-22

    Despite randomization, baseline imbalance and confounding bias may occur in cluster randomized trials (CRTs). Covariate imbalance may jeopardize the validity of statistical inferences if it occurs on prognostic factors. Thus, the diagnosis of such an imbalance is essential so that statistical analyses can be adjusted if required. We developed a tool based on the c-statistic of the propensity score (PS) model to detect global baseline covariate imbalance in CRTs and assess the risk of confounding bias. We performed a simulation study to assess the performance of the proposed tool and applied this method to analyze the data from 2 published CRTs. The proposed method had good performance for large sample sizes (n = 500 per arm) and when the number of unbalanced covariates was not too small as compared with the total number of baseline covariates (≥40% of unbalanced covariates). We also provide a strategy for preselection of the covariates needed to be included in the PS model to enhance imbalance detection. The proposed tool could be useful in deciding whether covariate adjustment is required before performing statistical analyses of CRTs.
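    A minimal sketch of the diagnostic, assuming a logistic propensity-score model: fit treatment arm on baseline covariates and report the model's c-statistic (ROC AUC); values well above 0.5 suggest global imbalance. The simulated data and the induced covariate shift are illustrative only and ignore the cluster-level structure of a real CRT.

      import numpy as np
      from sklearn.linear_model import LogisticRegression
      from sklearn.metrics import roc_auc_score

      rng = np.random.default_rng(13)
      n = 1000
      X = rng.normal(size=(n, 6))                      # baseline covariates
      arm = rng.integers(0, 2, n)                      # randomized arm
      X[arm == 1, 0] += 0.4                            # induce imbalance on one covariate

      ps_model = LogisticRegression(max_iter=1000).fit(X, arm)
      c_statistic = roc_auc_score(arm, ps_model.predict_proba(X)[:, 1])
      print(round(c_statistic, 3))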

  14. Segmentation and clustering as complementary sources of information

    Science.gov (United States)

    Dale, Michael B.; Allison, Lloyd; Dale, Patricia E. R.

    2007-03-01

    This paper examines the effects of using a segmentation method to identify change-points or edges in vegetation. It identifies coherence (spatial or temporal) in place of unconstrained clustering. The segmentation method involves change-point detection along a sequence of observations so that each cluster formed is composed of adjacent samples; this is a form of constrained clustering. The protocol identifies one or more models, one for each section identified, and the quality of each is assessed using a minimum message length criterion, which provides a rational basis for selecting an appropriate model. Although the segmentation is less efficient than clustering, it does provide other information because it incorporates textural similarity as well as homogeneity. In addition it can be useful in determining various scales of variation that may apply to the data, providing a general method of small-scale pattern analysis.

  15. Application of a Light-Front Coupled Cluster Method

    International Nuclear Information System (INIS)

    Chabysheva, S.S.; Hiller, J.R.

    2012-01-01

    As a test of the new light-front coupled-cluster method in a gauge theory, we apply it to the nonperturbative construction of the dressed-electron state in QED, for an arbitrary covariant gauge, and compute the electron's anomalous magnetic moment. The construction illustrates the spectator and Fock-sector independence of vertex and self-energy contributions and indicates resolution of the difficulties with uncanceled divergences that plague methods based on Fock-space truncation. (author)

  16. A Three-Step Spatial-Temporal Clustering Method for Human Activity Pattern Analysis

    Science.gov (United States)

    Huang, W.; Li, S.; Xu, S.

    2016-06-01

    How people move in cities and what they do in various locations at different times form human activity patterns. Human activity patterns play a key role in urban planning, traffic forecasting, public health and safety, emergency response, friend recommendation, and so on. Therefore, scholars from different fields, such as social science, geography, transportation, physics and computer science, have made great efforts in modelling and analysing human activity patterns or human mobility patterns. One of the essential tasks in such studies is to find the locations or places where individuals stay to perform some kind of activity before further activity pattern analysis. In the era of Big Data, the emergence of social media along with wearable devices enables human activity data to be collected more easily and efficiently. Furthermore, the dimension of the accessible human activity data has been extended from two or three dimensions (space or space-time) to four (space, time and semantics). More specifically, not only the locations and times at which people stay are collected, but also what people "say" at a location and time can be obtained. The characteristics of these datasets shed new light on the analysis of human mobility, where some new methodologies should be accordingly developed to handle them. Traditional methods such as neural networks, statistics and clustering have been applied to study human activity patterns using geosocial media data. Among them, clustering methods have been widely used to analyse spatiotemporal patterns. However, to the best of our knowledge, few clustering algorithms are specifically developed for handling datasets that contain spatial, temporal and semantic aspects all together. In this work, we propose a three-step human activity clustering method based on space, time and semantics to fill this gap. One-year Twitter data, posted in Toronto, Canada, is used to test the clustering-based method. The results show that the

  17. Near-Duplicate Web Page Detection: An Efficient Approach Using Clustering, Sentence Feature and Fingerprinting

    Directory of Open Access Journals (Sweden)

    J. Prasanna Kumar

    2013-02-01

    Full Text Available Duplicate and near-duplicate web pages are a chief concern for web search engines. In reality, they consume enormous space to store the indexes, ultimately slowing down and increasing the cost of serving results. A variety of techniques have been developed to identify pairs of web pages that are "similar" to each other. The problem of finding near-duplicate web pages has been a subject of research in the database and web-search communities for some years. In order to identify near-duplicate web pages, we make use of sentence-level features along with a fingerprinting method. When a large number of web documents are under consideration, K-mode clustering is applied first, and subsequently sentence feature and fingerprint comparison are used. Using these steps, we exactly identify the near-duplicate web pages in an efficient manner. The experimentation is carried out on web page collections and the results confirm the efficiency of the proposed approach in detecting near-duplicate web pages.
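    The fingerprint-comparison step can be sketched in a few lines, assuming word-shingle hashes and Jaccard similarity; the K-mode clustering used to pre-partition large collections and the sentence-level features are omitted, and the two pages are toy strings.

      from hashlib import md5

      def fingerprints(text, shingle_size=3):
          # hash overlapping word shingles to short hexadecimal fingerprints
          words = text.lower().split()
          shingles = [" ".join(words[i:i + shingle_size])
                      for i in range(max(1, len(words) - shingle_size + 1))]
          return {md5(s.encode()).hexdigest()[:8] for s in shingles}

      def jaccard(a, b):
          return len(a & b) / len(a | b)

      page_a = "the quick brown fox jumps over the lazy dog"
      page_b = "the quick brown fox leaps over the lazy dog"
      print(round(jaccard(fingerprints(page_a), fingerprints(page_b)), 2))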

  18. Automated detection of microcalcification clusters in digital mammograms based on wavelet domain hidden Markov tree modeling

    International Nuclear Information System (INIS)

    Regentova, E.; Zhang, L.; Veni, G.; Zheng, J.

    2007-01-01

    A system is designed for detecting microcalcification clusters (MCC) in digital mammograms. The system is intended for computer-aided diagnostic prompting. Further discrimination of MCC as benign or malignant is assumed to be performed by radiologists. Processing of mammograms is based on statistical modeling by means of wavelet domain hidden Markov trees (WHMT). Segmentation is performed by weighted likelihood evaluation, followed by classification based on spatial filters for single microcalcification (MC) and MC cluster detection. The analysis is carried out on FROC curves for 40 mammograms from the mini-MIAS database and for 100 mammograms with 50 cancerous and 50 benign cases from the DDSM database. The designed system is capable of detecting 100% of true positive cases in these sets. The rate of false positives is 2.9 per case for the mini-MIAS dataset and 0.01 for the DDSM images. (orig.)

  19. CutL: an alternative to Kulldorff's scan statistics for cluster detection with a specified cut-off level.

    Science.gov (United States)

    Więckowska, Barbara; Marcinkowska, Justyna

    2017-11-06

    When searching for epidemiological clusters, an important tool can be to compare one's own data against an incidence rate from the literature used as the reference level. Values exceeding this level may indicate the presence of a cluster in that location. This paper presents a method of searching for clusters that have significantly higher incidence rates than a cut-off level specified by the investigator. The proposed method uses the classic exact binomial test for one proportion and an algorithm that joins areas with potential clusters while reducing the number of multiple comparisons needed. Sensitivity and specificity are preserved by this new method, which avoids the Monte Carlo approach while still delivering results comparable to the commonly used Kulldorff's scan statistics and other similar methods of localising clusters. A further advantage of the accompanying statistical software is that it allows the results to be analysed and presented cartographically.
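
    The core test the abstract describes, comparing an area's observed case count against a reference incidence rate with the exact binomial test for one proportion, can be sketched as follows using SciPy; the rate, counts and significance level are illustrative values, not data from the paper.

```python
from scipy.stats import binomtest

reference_rate = 0.002            # incidence rate taken from the literature (illustrative)
cases, population = 18, 5000      # observed cases in one candidate area (illustrative)

result = binomtest(cases, population, p=reference_rate, alternative="greater")
print(f"p-value = {result.pvalue:.4f}")
if result.pvalue < 0.05:
    print("incidence significantly exceeds the reference level -> possible cluster")
```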

  20. A Review of Subsequence Time Series Clustering

    Directory of Open Access Journals (Sweden)

    Seyedjamal Zolhavarieh

    2014-01-01

    Full Text Available Clustering of subsequence time series remains an open issue in time series clustering. Subsequence time series clustering is used in different fields, such as e-commerce, outlier detection, speech recognition, biological systems, DNA recognition, and text mining. One of the useful fields in the domain of subsequence time series clustering is pattern recognition. To improve this field, a sequence of time series data is used. This paper reviews some definitions and backgrounds related to subsequence time series clustering. The categorization of the literature reviews is divided into three groups: preproof, interproof, and postproof period. Moreover, various state-of-the-art approaches in performing subsequence time series clustering are discussed under each of the following categories. The strengths and weaknesses of the employed methods are evaluated as potential issues for future studies.
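
    As a minimal sketch of the basic subsequence-clustering setup the review surveys, the snippet below slides a fixed-length window over a series and clusters the resulting subsequences with k-means; the window length, cluster count and synthetic series are arbitrary choices for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

def subsequences(series, window):
    """All overlapping fixed-length windows of the series."""
    return np.array([series[i:i + window] for i in range(len(series) - window + 1)])

t = np.linspace(0, 20 * np.pi, 2000)
series = np.sin(t) + 0.1 * np.random.default_rng(0).standard_normal(t.size)

subs = subsequences(series, window=50)
subs = subs - subs.mean(axis=1, keepdims=True)            # remove each window's offset
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(subs)
print(np.bincount(labels))                                # subsequences per cluster
```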

  1. A review of subsequence time series clustering.

    Science.gov (United States)

    Zolhavarieh, Seyedjamal; Aghabozorgi, Saeed; Teh, Ying Wah

    2014-01-01

    Clustering of subsequence time series remains an open issue in time series clustering. Subsequence time series clustering is used in different fields, such as e-commerce, outlier detection, speech recognition, biological systems, DNA recognition, and text mining. One of the useful fields in the domain of subsequence time series clustering is pattern recognition. To improve this field, a sequence of time series data is used. This paper reviews some definitions and backgrounds related to subsequence time series clustering. The categorization of the literature reviews is divided into three groups: preproof, interproof, and postproof period. Moreover, various state-of-the-art approaches in performing subsequence time series clustering are discussed under each of the following categories. The strengths and weaknesses of the employed methods are evaluated as potential issues for future studies.

  2. A Review of Subsequence Time Series Clustering

    Science.gov (United States)

    Teh, Ying Wah

    2014-01-01

    Clustering of subsequence time series remains an open issue in time series clustering. Subsequence time series clustering is used in different fields, such as e-commerce, outlier detection, speech recognition, biological systems, DNA recognition, and text mining. One of the useful fields in the domain of subsequence time series clustering is pattern recognition. To improve this field, a sequence of time series data is used. This paper reviews some definitions and backgrounds related to subsequence time series clustering. The categorization of the literature reviews is divided into three groups: preproof, interproof, and postproof period. Moreover, various state-of-the-art approaches in performing subsequence time series clustering are discussed under each of the following categories. The strengths and weaknesses of the employed methods are evaluated as potential issues for future studies. PMID:25140332

  3. The Atacama Cosmology Telescope: Cosmology from Galaxy Clusters Detected Via the Sunyaev-Zel'dovich Effect

    Science.gov (United States)

    Sehgal, Neelima; Trac, Hy; Acquaviva, Viviana; Ade, Peter A. R.; Aguirre, Paula; Amiri, Mandana; Appel, John W.; Barrientos, L. Felipe; Battistelli, Elia S.; Bond, J. Richard

    2010-01-01

    We present constraints on cosmological parameters based on a sample of Sunyaev-Zel'dovich-selected galaxy clusters detected in a millimeter-wave survey by the Atacama Cosmology Telescope. The cluster sample used in this analysis consists of 9 optically-confirmed high-mass clusters comprising the high-significance end of the total cluster sample identified in 455 square degrees of sky surveyed during 2008 at 148 GHz. We focus on the most massive systems to reduce the degeneracy between unknown cluster astrophysics and cosmology derived from SZ surveys. We describe the scaling relation between cluster mass and SZ signal with a 4-parameter fit. Marginalizing over the values of the parameters in this fit with conservative priors gives σ8 = 0.851 +/- 0.115 and w = -1.14 +/- 0.35 for a spatially-flat wCDM cosmological model with WMAP 7-year priors on cosmological parameters. This gives a modest improvement in statistical uncertainty over WMAP 7-year constraints alone. Fixing the scaling relation between cluster mass and SZ signal to a fiducial relation obtained from numerical simulations and calibrated by X-ray observations, we find σ8 = 0.821 +/- 0.044 and w = -1.05 +/- 0.20. These results are consistent with constraints from WMAP 7 plus baryon acoustic oscillations plus type Ia supernovae, which give σ8 = 0.802 +/- 0.038 and w = -0.98 +/- 0.053. A stacking analysis of the clusters in this sample compared to clusters simulated assuming the fiducial model also shows good agreement. These results suggest that, given the sample of clusters used here, both the astrophysics of massive clusters and the cosmological parameters derived from them are broadly consistent with current models.

  4. Dynamic analysis of clustered building structures using substructures methods

    International Nuclear Information System (INIS)

    Leimbach, K.R.; Krutzik, N.J.

    1989-01-01

    The dynamic substructure approach to the building cluster on a common base mat starts with the generation of Ritz-vectors for each building on a rigid foundation. The base mat plus the foundation soil is subjected to kinematic constraint modes, for example constant, linear, quadratic or cubic constraints. These constraint modes are also imposed on the buildings. By enforcing kinematic compatibility of the complete structural system on the basis of the constraint modes a reduced Ritz model of the complete cluster is obtained. This reduced model can now be analyzed by modal time history or response spectrum methods

  5. Voting-based consensus clustering for combining multiple clusterings of chemical structures

    Directory of Open Access Journals (Sweden)

    Saeed Faisal

    2012-12-01

    Full Text Available Abstract Background Although many consensus clustering methods have been successfully used for combining multiple classifiers in many areas such as machine learning, applied statistics, pattern recognition and bioinformatics, few consensus clustering methods have been applied for combining multiple clusterings of chemical structures. It is known that any individual clustering method will not always give the best results for all types of applications. So, in this paper, three voting and graph-based consensus clusterings were used for combining multiple clusterings of chemical structures to enhance the ability of separating biologically active molecules from inactive ones in each cluster. Results The cumulative voting-based aggregation algorithm (CVAA), cluster-based similarity partitioning algorithm (CSPA) and hyper-graph partitioning algorithm (HGPA) were examined. The F-measure and Quality Partition Index (QPI) method were used to evaluate the clusterings, and the results were compared to Ward's clustering method. The MDL Drug Data Report (MDDR) dataset was used for experiments and was represented by two 2D fingerprints, ALOGP and ECFP_4. The voting-based consensus clustering method outperformed Ward's method in terms of both the F-measure and QPI for the ALOGP and ECFP_4 fingerprints, while the graph-based consensus clustering methods outperformed Ward's method only for ALOGP using QPI. The Jaccard and Euclidean distance measures were the methods of choice to generate the ensembles, giving the highest values for both criteria. Conclusions The results of the experiments show that consensus clustering methods can improve the effectiveness of clusterings of chemical structures. The cumulative voting-based aggregation algorithm (CVAA) was the method of choice among the consensus clustering methods.
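
    For readers unfamiliar with consensus (ensemble) clustering, the sketch below combines several base clusterings through a co-association matrix and a final hierarchical cut, one common consensus strategy in the same spirit as the voting and graph-based methods compared above; the synthetic data and parameters are illustrative, and this is not the CVAA/CSPA/HGPA code used in the paper.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])   # two obvious groups

# an ensemble of base clusterings with different seeds and cluster counts
ensemble = [KMeans(n_clusters=k, n_init=5, random_state=s).fit_predict(X)
            for s, k in [(1, 2), (2, 3), (3, 4), (4, 2)]]

# co-association matrix: fraction of base clusterings in which two points share a cluster
n = X.shape[0]
co = np.zeros((n, n))
for labels in ensemble:
    co += labels[:, None] == labels[None, :]
co /= len(ensemble)

# consensus partition from average-linkage clustering of the (1 - co) "distance"
Z = linkage(squareform(1.0 - co, checks=False), method="average")
consensus = fcluster(Z, t=2, criterion="maxclust")
print(np.bincount(consensus))   # e.g. [ 0 50 50]
```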

  6. Trend analysis using non-stationary time series clustering based on the finite element method

    OpenAIRE

    Gorji Sefidmazgi, M.; Sayemuzzaman, M.; Homaifar, A.; Jha, M. K.; Liess, S.

    2014-01-01

    In order to analyze low-frequency variability of climate, it is useful to model the climatic time series with multiple linear trends and locate the times of significant changes. In this paper, we have used non-stationary time series clustering to find change points in the trends. Clustering in a multi-dimensional non-stationary time series is challenging, since the problem is mathematically ill-posed. Clustering based on the finite element method (FEM) is one of the methods ...

  7. CHANDRA DETECTION OF A NEW DIFFUSE X-RAY COMPONENT FROM THE GLOBULAR CLUSTER 47 TUCANAE

    Energy Technology Data Exchange (ETDEWEB)

    Wu, E. M. H.; Cheng, K. S. [Department of Physics, University of Hong Kong, Pokfulam Road (Hong Kong); Hui, C. Y. [Department of Astronomy and Space Science, Chungnam National University, Daejeon (Korea, Republic of); Kong, A. K. H.; Tam, P. H. T. [Institute of Astronomy and Department of Physics, National Tsing Hua University, Hsinchu, Taiwan (China); Dogiel, V. A., E-mail: cyhui@cnu.ac.kr [I. E. Tamm Theoretical Physics Division of P. N. Lebedev Institute of Physics, Leninskii pr. 53, 119991 Moscow (Russian Federation)

    2014-06-20

    In re-analyzing the archival Chandra data of the globular cluster 47 Tucanae, we have detected a new diffuse X-ray emission feature within the half-mass radius of the cluster. The spectrum of the diffuse emission can be described by a power-law model plus a plasma component with photon index Γ ∼ 1.0 and plasma temperature kT ∼ 0.2 keV. While the thermal component is apparently uniform, the non-thermal contribution falls off exponentially from the core. The observed properties could possibly be explained in the context of multiple shocks resulting from collisions among the stellar winds in the cluster and from inverse Compton scattering between the pulsar wind and the relic photons.

  8. Protein complex detection in PPI networks based on data integration and supervised learning method.

    Science.gov (United States)

    Yu, Feng; Yang, Zhi; Hu, Xiao; Sun, Yuan; Lin, Hong; Wang, Jian

    2015-01-01

    Revealing protein complexes is important for understanding principles of cellular organization and function. High-throughput experimental techniques have produced a large number of protein interactions, which makes it possible to predict protein complexes from protein-protein interaction (PPI) networks. However, the small amount of known physical interactions may limit protein complex detection. New PPI networks are constructed by integrating PPI datasets with the large and readily available PPI data from the biomedical literature, and then the less reliable PPIs between two proteins are filtered out based on the semantic similarity and topological similarity of the two proteins. Finally, supervised learning protein complex detection (SLPC), which can make full use of the information in available known complexes, is applied to detect protein complexes on the new PPI networks. The experimental results of SLPC on two different categories of yeast PPI networks demonstrate the effectiveness of the approach: compared with the original PPI networks, the best average improvements of 4.76, 6.81 and 15.75 percentage units in the F-score, accuracy and maximum matching ratio (MMR) are achieved, respectively; compared with the denoised PPI networks, the best average improvements of 3.91, 4.61 and 12.10 percentage units in the F-score, accuracy and MMR are achieved, respectively; and compared with ClusterONE, the state-of-the-art complex detection method, on the denoised extended PPI networks, average improvements of 26.02 and 22.40 percentage units in the F-score and MMR are achieved, respectively. The experimental results show that the performance of SLPC improves substantially when newly available PPI data from the biomedical literature are integrated into the original and denoised PPI networks. In addition, our protein complex detection method achieves better performance than ClusterONE.

  9. A similarity based agglomerative clustering algorithm in networks

    Science.gov (United States)

    Liu, Zhiyuan; Wang, Xiujuan; Ma, Yinghong

    2018-04-01

    The detection of clusters is beneficial for understanding the organization and function of networks. Clusters, or communities, are usually groups of nodes that are densely interconnected but sparsely linked with other clusters. To identify communities, an efficient and effective agglomerative community algorithm based on node similarity is proposed. The proposed method initially calculates similarities between each pair of nodes and forms pre-partitions according to the principle that each node is in the same community as its most similar neighbor. After that, each partition is checked to see whether it satisfies the community criterion. Pre-partitions that do not satisfy it are merged with the partitions to which they have the greatest attraction, until no further changes occur. To measure the attraction ability of a partition, we propose an attraction index based on the importance of the linked nodes in the network. Therefore, our proposed method can better exploit the nodes' properties and the network's structure. To test the performance of our algorithm, both synthetic and empirical networks of different scales are tested. Simulation results show that the proposed algorithm can obtain superior clustering results compared with six other widely used community detection algorithms.
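
    The pre-partitioning principle described above (each node joins the community of its most similar neighbour) can be sketched in a few lines; here neighbourhood Jaccard similarity stands in for the paper's similarity measure, and the attraction-based merging step is omitted, so this is only an illustrative fragment.

```python
import networkx as nx

def jaccard(G, u, v):
    nu, nv = set(G[u]), set(G[v])
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0

def pre_partition(G):
    """Each node joins the group of its most similar neighbour (union-find merge)."""
    parent = {n: n for n in G}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for n in G:
        nbrs = list(G[n])
        if nbrs:
            best = max(nbrs, key=lambda m: jaccard(G, n, m))
            parent[find(n)] = find(best)
    groups = {}
    for n in G:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())

G = nx.karate_club_graph()
print([sorted(c) for c in pre_partition(G)])
```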

  10. Orthology detection combining clustering and synteny for very large datasets.

    Science.gov (United States)

    Lechner, Marcus; Hernandez-Rosales, Maribel; Doerr, Daniel; Wieseke, Nicolas; Thévenin, Annelyse; Stoye, Jens; Hartmann, Roland K; Prohaska, Sonja J; Stadler, Peter F

    2014-01-01

    The elucidation of orthology relationships is an important step both in gene function prediction as well as towards understanding patterns of sequence evolution. Orthology assignments are usually derived directly from sequence similarities for large data because more exact approaches exhibit too high computational costs. Here we present PoFF, an extension for the standalone tool Proteinortho, which enhances orthology detection by combining clustering, sequence similarity, and synteny. In the course of this work, FFAdj-MCS, a heuristic that assesses pairwise gene order using adjacencies (a similarity measure related to the breakpoint distance) was adapted to support multiple linear chromosomes and extended to detect duplicated regions. PoFF largely reduces the number of false positives and enables more fine-grained predictions than purely similarity-based approaches. The extension maintains the low memory requirements and the efficient concurrency options of its basis Proteinortho, making the software applicable to very large datasets.

  11. Charge exchange in galaxy clusters

    Science.gov (United States)

    Gu, Liyi; Mao, Junjie; de Plaa, Jelle; Raassen, A. J. J.; Shah, Chintan; Kaastra, Jelle S.

    2018-03-01

    Context. Though theoretically expected, the charge exchange emission from galaxy clusters has never been confidently detected. Accumulating hints were reported recently, including a rather marginal detection with the Hitomi data of the Perseus cluster. As previously suggested, a detection of charge exchange line emission from galaxy clusters would not only impact the interpretation of the newly discovered 3.5 keV line, but also open up a new research topic on the interaction between hot and cold matter in clusters. Aim. We aim to perform the most systematic search for the O VIII charge exchange line in cluster spectra using the RGS on board XMM-Newton. Methods: We introduce a sample of 21 clusters observed with the RGS. In order to search for O VIII charge exchange, the sample selection criterion is a >35σ detection of the O VIII Lyα line in the archival RGS spectra. The dominating thermal plasma emission is modeled and subtracted with a two-temperature thermal component, and the residuals are stacked for the line search. The systematic uncertainties in the fits are quantified by refitting the spectra with a varying continuum and line broadening. Results: By the residual stacking, we do find a hint of a line-like feature at 14.82 Å, the characteristic wavelength expected for oxygen charge exchange. This feature has a marginal significance of 2.8σ, and the average equivalent width is 2.5 × 10-4 keV. We further demonstrate that the putative feature can be barely affected by the systematic errors from continuum modeling and instrumental effects, or the atomic uncertainties of the neighboring thermal lines. Conclusions: Assuming a realistic temperature and abundance pattern, the physical model implied by the possible oxygen line agrees well with the theoretical model proposed previously to explain the reported 3.5 keV line. If the charge exchange source indeed exists, we expect that the oxygen abundance could have been overestimated by 8-22% in previous X

  12. Study of methods to increase cluster/dislocation loop densities in electrodes

    Science.gov (United States)

    Yang, Xiaoling; Miley, George H.

    2009-03-01

    Recent research has developed a technique for embedding ultra-high density deuterium "clusters" (50 to 100 atoms per cluster) in various metals such as palladium (Pd), beryllium (Be) and lithium (Li). It was found that thermally dehydrogenated PdHx retained the clusters and exhibited up to 12 percent lower resistance compared to the virgin Pd samples [A. G. Lipson, et al., Phys. Solid State 39 (1997) 1891]. SQUID measurements showed that in Pd these condensed matter clusters approach metallic conditions, exhibiting superconducting properties [A. Lipson, et al., Phys. Rev. B 72, 212507 (2005); A. G. Lipson, et al., Phys. Lett. A 339 (2005) 414-423]. If the fabrication methods under study are successful, a large packing fraction of nuclear reactive clusters can be developed in the electrodes by electrolyte or high-pressure gas loading. This will provide a much higher low-energy-nuclear-reaction (LENR) rate than achieved with earlier electrodes [Castano, C.H., et al., Proc. ICCF-9, Beijing, China, 19-24 May 2002].

  13. Don't spin the pen: two alternative methods for second-stage sampling in urban cluster surveys

    Directory of Open Access Journals (Sweden)

    Rose Angela MC

    2007-06-01

    Full Text Available Abstract In two-stage cluster surveys, the traditional method used in second-stage sampling (in which the first household in a cluster is selected) is time-consuming and may result in biased estimates of the indicator of interest. Firstly, a random direction from the center of the cluster is selected, usually by spinning a pen. The houses along that direction are then counted out to the boundary of the cluster, and one is then selected at random to be the first household surveyed. This process favors households towards the center of the cluster, but it could easily be improved. During a recent meningitis vaccination coverage survey in Maradi, Niger, we compared this method of first household selection to two alternatives in urban zones: (1) using a superimposed grid on the map of the cluster area and randomly selecting an intersection; and (2) drawing the perimeter of the cluster area using a Global Positioning System (GPS) and randomly selecting one point within the perimeter. Although we only compared a limited number of clusters using each method, we found the sampling grid method to be the fastest and easiest for field survey teams, although it does require a map of the area. Selecting a random GPS point was also found to be a good method, once adequate training can be provided. Spinning the pen and counting households to the boundary was the most complicated and time-consuming. The two methods tested here represent simpler, quicker and potentially more robust alternatives to spinning the pen for cluster surveys in urban areas. However, in rural areas, these alternatives would favor initial household selection from lower density (or even potentially empty) areas. Bearing in mind these limitations, as well as available resources and feasibility, investigators should choose the most appropriate method for their particular survey context.
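
    The "random GPS point" alternative amounts to drawing a uniform random point inside the cluster polygon; a minimal sketch, assuming the boundary is available as a list of coordinates, is rejection sampling from the bounding box with a ray-casting point-in-polygon test. The coordinates below are made up.

```python
import random

def point_in_polygon(x, y, poly):
    """Ray-casting test: is (x, y) inside the polygon given as a list of (x, y) vertices?"""
    inside = False
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def random_point_in_cluster(poly, seed=None):
    rng = random.Random(seed)
    xs, ys = zip(*poly)
    while True:   # rejection sampling from the bounding box
        x, y = rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, poly):
            return x, y   # survey the household nearest to this point first

cluster_boundary = [(0, 0), (4, 0), (5, 3), (2, 5), (-1, 3)]   # made-up cluster perimeter
print(random_point_in_cluster(cluster_boundary, seed=42))
```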

  14. Anomaly-based Network Intrusion Detection Methods

    Directory of Open Access Journals (Sweden)

    Pavel Nevlud

    2013-01-01

    Full Text Available The article deals with the detection of network anomalies. Network anomalies include everything that differs markedly from normal operation. Machine learning systems were used for the detection of anomalies. Machine learning can be considered a supporting or limited type of artificial intelligence. A machine learning system usually starts with some knowledge and a corresponding knowledge organization so that it can interpret, analyse, and test the knowledge acquired. There are several machine learning techniques available; we tested decision tree learning and Bayesian networks. The open-source data-mining framework WEKA was the tool we used for testing classification, clustering and association algorithms and for visualizing our results. WEKA is a collection of machine learning algorithms for data mining tasks.

  15. The Atacama Cosmology Telescope (ACT): Beam Profiles and First SZ Cluster Maps

    Science.gov (United States)

    Hincks, A. D.; Acquaviva, V.; Ade, P. A.; Aguirre, P.; Amiri, M.; Appel, J. W.; Barrientos, L. F.; Battistelli, E. S.; Bond, J. R.; Brown, B.

    2010-01-01

    The Atacama Cosmology Telescope (ACT) is currently observing the cosmic microwave background with arcminute resolution at 148 GHz, 218 GHz, and 277 GHz. In this paper, we present ACT's first results. Data have been analyzed using a maximum-likelihood map-making method which uses B-splines to model and remove the atmospheric signal. It has been used to make high-precision beam maps from which we determine the experiment's window functions. This beam information directly impacts all subsequent analyses of the data. We also used the method to map a sample of galaxy clusters via the Sunyaev-Zel'dovich (SZ) effect, and show five clusters previously detected with X-ray or SZ observations. We provide integrated Compton-y measurements for each cluster. Of particular interest is our detection of the z = 0.44 component of A3128 and our current non-detection of the low-redshift part, providing strong evidence that the further cluster is more massive as suggested by X-ray measurements. This is a compelling example of the redshift-independent mass selection of the SZ effect.

  16. Cost-effectiveness of intensive multifactorial treatment compared with routine care for individuals with screen-detected Type 2 diabetes : analysis of the ADDITION-UK cluster-randomized controlled trial

    NARCIS (Netherlands)

    Tao, L.; Wilson, E. C. F.; Wareham, N. J.; Sandbaek, A.; Rutten, G. E. H. M.; Lauritzen, T.; Khunti, K.; Davies, M. J.; Borch-Johnsen, K.; Griffin, S. J.; Simmons, R. K.

    Aims To examine the short- and long-term cost-effectiveness of intensive multifactorial treatment compared with routine care among people with screen-detected Type 2 diabetes. Methods Cost-utility analysis in ADDITION-UK, a cluster-randomized controlled trial of early intensive treatment in people

  17. Hybrid Clustering And Boundary Value Refinement for Tumor Segmentation using Brain MRI

    Science.gov (United States)

    Gupta, Anjali; Pahuja, Gunjan

    2017-08-01

    Brain tumor segmentation is the separation of the tumor area from brain Magnetic Resonance (MR) images. A number of methods already exist for segmenting brain tumors efficiently; however, it is a tedious task to identify the brain tumor from MR images. The segmentation process extracts different tumor tissues, such as active tumor, necrosis and edema, from the normal brain tissues, such as gray matter (GM), white matter (WM) and cerebrospinal fluid (CSF). According to the survey, most of the time brain tumors are detected easily from brain MR images using a region-based approach, but the required level of accuracy and a reliable classification of abnormalities are not guaranteed. The segmentation of a brain tumor consists of many stages, and manually segmenting the tumor from brain MR images is very time consuming, hence manual segmentation faces many challenges. In this research paper, our main goal is to present a hybrid clustering approach, which consists of Fuzzy C-Means clustering (for accurate tumor detection) and the level set method (for handling complex shapes), for detecting the exact shape of the tumor in minimal computational time. Using this approach we observe that, for a certain set of images, 0.9412 s is taken to detect the tumor, which is much less than a recent existing algorithm, i.e. hybrid clustering (Fuzzy C-Means and K-Means clustering).
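
    As a rough illustration of the fuzzy C-means half of the hybrid approach, the sketch below implements plain FCM on pixel intensities and derives a hard segmentation from the memberships; the level set refinement step is not shown, and the toy image and parameters are illustrative only.

```python
import numpy as np

def fuzzy_c_means(x, n_clusters=3, m=2.0, n_iter=100, seed=0):
    """Plain fuzzy c-means on a 1-D array of pixel intensities."""
    rng = np.random.default_rng(seed)
    u = rng.random((n_clusters, x.size))
    u /= u.sum(axis=0)                                  # memberships sum to 1 per pixel
    for _ in range(n_iter):
        um = u ** m
        centers = (um @ x) / um.sum(axis=1)             # weighted cluster centres
        d = np.abs(x[None, :] - centers[:, None]) + 1e-12
        u = 1.0 / d ** (2.0 / (m - 1.0))
        u /= u.sum(axis=0)
    return centers, u

# toy "image": two bright regions on a dark background
img = np.zeros((64, 64))
img[10:20, 10:20] = 0.9
img[40:50, 30:45] = 0.6
centers, u = fuzzy_c_means(img.ravel(), n_clusters=3)
labels = u.argmax(axis=0).reshape(img.shape)            # hard segmentation from memberships
print(np.round(np.sort(centers), 2))
```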

  18. Structure and substructure analysis of DAFT/FADA galaxy clusters in the [0.4-0.9] redshift range

    Science.gov (United States)

    Guennou, L.; Adami, C.; Durret, F.; Lima Neto, G. B.; Ulmer, M. P.; Clowe, D.; LeBrun, V.; Martinet, N.; Allam, S.; Annis, J.; Basa, S.; Benoist, C.; Biviano, A.; Cappi, A.; Cypriano, E. S.; Gavazzi, R.; Halliday, C.; Ilbert, O.; Jullo, E.; Just, D.; Limousin, M.; Márquez, I.; Mazure, A.; Murphy, K. J.; Plana, H.; Rostagni, F.; Russeil, D.; Schirmer, M.; Slezak, E.; Tucker, D.; Zaritsky, D.; Ziegler, B.

    2014-01-01

    Context. The DAFT/FADA survey is based on the study of ~90 rich (masses found in the literature >2 × 10^14 M⊙) and moderately distant clusters (redshifts 0.4 < z < 0.9). Aims: We analyse the clusters of the DAFT/FADA survey for which XMM-Newton data and/or a sufficient number of galaxy redshifts in the cluster range are available, with the aim of detecting substructures and evidence for merging events. These properties are discussed in the framework of standard cold dark matter (ΛCDM) cosmology. Methods: In X-rays, we analysed the XMM-Newton data available, fit a β-model, and subtracted it to identify residuals. We used Chandra data, when available, to identify point sources. In the optical, we applied a Serna & Gerbal (SG) analysis to clusters with at least 15 spectroscopic galaxy redshifts available in the cluster range. We discuss the substructure detection efficiencies of both methods. Results: XMM-Newton data were available for 32 clusters, for which we derive the X-ray luminosity, and a global X-ray temperature for 25 of them. For 23 clusters we were able to fit the X-ray emissivity with a β-model and subtract it to detect substructures in the X-ray gas. A dynamical analysis based on the SG method was applied to the clusters having at least 15 spectroscopic galaxy redshifts in the cluster range: 18 X-ray clusters and 11 clusters with no X-ray data. The choice of a minimum number of 15 redshifts implies that only major substructures will be detected. Ten substructures were detected both in X-rays and by the SG method. Most of the substructures detected both in X-rays and with the SG method are probably at their first cluster pericentre approach and are relatively recent infalls. We also find hints of a decreasing X-ray gas density profile core radius with redshift. Conclusions: The percentage of mass included in substructures was found to be roughly constant with redshift, with values of 5-15%, in agreement both with the general CDM framework and with the results of numerical simulations. Galaxies in substructures

  19. Clustering and training set selection methods for improving the accuracy of quantitative laser induced breakdown spectroscopy

    International Nuclear Information System (INIS)

    Anderson, Ryan B.; Bell, James F.; Wiens, Roger C.; Morris, Richard V.; Clegg, Samuel M.

    2012-01-01

    We investigated five clustering and training set selection methods to improve the accuracy of quantitative chemical analysis of geologic samples by laser induced breakdown spectroscopy (LIBS) using partial least squares (PLS) regression. The LIBS spectra were previously acquired for 195 rock slabs and 31 pressed powder geostandards under 7 Torr CO2 at a stand-off distance of 7 m at 17 mJ per pulse to simulate the operational conditions of the ChemCam LIBS instrument on the Mars Science Laboratory Curiosity rover. The clustering and training set selection methods, which do not require prior knowledge of the chemical composition of the test-set samples, are based on grouping similar spectra and selecting appropriate training spectra for the partial least squares (PLS2) model. These methods were: (1) hierarchical clustering of the full set of training spectra and selection of a subset for use in training; (2) k-means clustering of all spectra and generation of PLS2 models based on the training samples within each cluster; (3) iterative use of PLS2 to predict sample composition and k-means clustering of the predicted compositions to subdivide the groups of spectra; (4) soft independent modeling of class analogy (SIMCA) classification of spectra, and generation of PLS2 models based on the training samples within each class; (5) use of Bayesian information criteria (BIC) to determine an optimal number of clusters and generation of PLS2 models based on the training samples within each cluster. The iterative method and the k-means method using 5 clusters showed the best performance, improving the absolute quadrature root mean squared error (RMSE) by ∼ 3 wt.%. The statistical significance of these improvements was ∼ 85%. Our results show that although clustering methods can modestly improve results, a large and diverse training set is the most reliable way to improve the accuracy of quantitative LIBS. In particular, additional sulfate standards and specifically
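
    Method (2) above, clustering the spectra and fitting a separate PLS2 model per cluster, can be sketched roughly as follows with scikit-learn; the random stand-in "spectra", cluster count and component count are assumptions for illustration, not the ChemCam data or settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.random((200, 50))                                   # stand-in "spectra": 200 samples x 50 channels
y = X[:, :5].sum(axis=1) + 0.1 * rng.standard_normal(200)   # stand-in composition values

km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
models = {k: PLSRegression(n_components=3).fit(X[km.labels_ == k], y[km.labels_ == k])
          for k in range(5)}                                # one PLS model per spectral cluster

x_new = rng.random((1, 50))                                 # an "unknown" spectrum
k = km.predict(x_new)[0]                                    # route it to the nearest cluster's model
print(models[k].predict(x_new))
```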

  20. Detection of an unidentified emission line in the stacked X-ray spectrum of galaxy clusters

    Energy Technology Data Exchange (ETDEWEB)

    Bulbul, Esra; Foster, Adam; Smith, Randall K.; Randall, Scott W. [Harvard-Smithsonian Center for Astrophysics, 60 Garden Street, Cambridge, MA 02138 (United States); Markevitch, Maxim [NASA Goddard Space Flight Center, Greenbelt, MD 20771 (United States); Loewenstein, Michael, E-mail: ebulbul@cfa.harvard.edu [CRESST and X-ray Astrophysics Laboratory, NASA Goddard Space Flight Center, Greenbelt, MD 20771 (United States)

    2014-07-01

    We detect a weak unidentified emission line at E = (3.55-3.57) ± 0.03 keV in a stacked XMM-Newton spectrum of 73 galaxy clusters spanning a redshift range 0.01-0.35. When the full sample is divided into three subsamples (Perseus, Centaurus+Ophiuchus+Coma, and all others), the line is seen at >3σ statistical significance in all three independent MOS spectra and the PN 'all others' spectrum. It is also detected in the Chandra spectra of the Perseus Cluster. However, it is very weak and located within 50-110 eV of several known lines. The detection is at the limit of the current instrument capabilities. We argue that there should be no atomic transitions in thermal plasma at this energy. An intriguing possibility is the decay of sterile neutrino, a long-sought dark matter particle candidate. Assuming that all dark matter is in sterile neutrinos with m_s = 2E = 7.1 keV, our detection corresponds to a neutrino decay rate consistent with previous upper limits. However, based on the cluster masses and distances, the line in Perseus is much brighter than expected in this model, significantly deviating from other subsamples. This appears to be because of an anomalously bright line at E = 3.62 keV in Perseus, which could be an Ar XVII dielectronic recombination line, although its emissivity would have to be 30 times the expected value and physically difficult to understand. Another alternative is the above anomaly in the Ar line combined with the nearby 3.51 keV K line also exceeding expectation by a factor of 10-20. Confirmation with Astro-H will be critical to determine the nature of this new line.

  1. A GMBCG GALAXY CLUSTER CATALOG OF 55,424 RICH CLUSTERS FROM SDSS DR7

    International Nuclear Information System (INIS)

    Hao Jiangang; Annis, James; Johnston, David E.; McKay, Timothy A.; Evrard, August; Siegel, Seth R.; Gerdes, David; Koester, Benjamin P.; Rykoff, Eli S.; Rozo, Eduardo; Wechsler, Risa H.; Busha, Michael; Becker, Matthew; Sheldon, Erin

    2010-01-01

    We present a large catalog of optically selected galaxy clusters from the application of a new Gaussian Mixture Brightest Cluster Galaxy (GMBCG) algorithm to SDSS Data Release 7 data. The algorithm detects clusters by identifying the red-sequence plus brightest cluster galaxy (BCG) feature, which is unique for galaxy clusters and does not exist among field galaxies. Red-sequence clustering in color space is detected using an Error Corrected Gaussian Mixture Model. We run GMBCG on 8240 deg^2 of photometric data from SDSS DR7 to assemble the largest ever optical galaxy cluster catalog, consisting of over 55,000 rich clusters across the redshift range 0.1 < z < 0.55. We present Monte Carlo tests of completeness and purity and perform cross-matching with X-ray clusters and with the maxBCG sample at low redshift. These tests indicate high completeness and purity across the full redshift range for clusters with 15 or more members.
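
    As a toy illustration of picking out a red sequence as the tight component of a mixture model in colour space, loosely in the spirit of the GMBCG idea (though not its Error Corrected Gaussian Mixture Model), one might fit a two-component Gaussian mixture to simulated colours; the simulated values are arbitrary.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
red_sequence = rng.normal(1.5, 0.05, size=80)      # tight ridge of cluster members in g-r colour
field = rng.normal(1.0, 0.4, size=300)             # broad field-galaxy population
colours = np.concatenate([red_sequence, field]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(colours)
narrow = int(np.argmin(gmm.covariances_.ravel()))  # the tight component is the red-sequence candidate
members = gmm.predict(colours) == narrow
print(f"red-sequence colour ~ {gmm.means_[narrow, 0]:.2f}, {members.sum()} candidate members")
```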

  2. A GMBCG galaxy cluster catalog of 55,880 rich clusters from SDSS DR7

    Energy Technology Data Exchange (ETDEWEB)

    Hao, Jiangang; McKay, Timothy A.; Koester, Benjamin P.; Rykoff, Eli S.; Rozo, Eduardo; Annis, James; Wechsler, Risa H.; Evrard, August; Siegel, Seth R.; Becker, Matthew; Busha, Michael; /Fermilab /Michigan U. /Chicago U., Astron. Astrophys. Ctr. /UC, Santa Barbara /KICP, Chicago /KIPAC, Menlo Park /SLAC /Caltech /Brookhaven

    2010-08-01

    We present a large catalog of optically selected galaxy clusters from the application of a new Gaussian Mixture Brightest Cluster Galaxy (GMBCG) algorithm to SDSS Data Release 7 data. The algorithm detects clusters by identifying the red sequence plus Brightest Cluster Galaxy (BCG) feature, which is unique for galaxy clusters and does not exist among field galaxies. Red sequence clustering in color space is detected using an Error Corrected Gaussian Mixture Model. We run GMBCG on 8240 square degrees of photometric data from SDSS DR7 to assemble the largest ever optical galaxy cluster catalog, consisting of over 55,000 rich clusters across the redshift range 0.1 < z < 0.55. We present Monte Carlo tests of completeness and purity and perform cross-matching with X-ray clusters and with the maxBCG sample at low redshift. These tests indicate high completeness and purity across the full redshift range for clusters with 15 or more members.

  3. Fuzzy Kernel k-Medoids algorithm for anomaly detection problems

    Science.gov (United States)

    Rustam, Z.; Talita, A. S.

    2017-07-01

    An Intrusion Detection System (IDS) is an essential part of security systems for strengthening the security of information systems. An IDS can be used to detect abuse by intruders who try to get into the network system in order to access and utilize the available data sources in the system. There are two approaches to IDS: Misuse Detection and Anomaly Detection (behavior-based intrusion detection). Fuzzy clustering-based methods have been widely used to solve Anomaly Detection problems. Besides using the fuzzy membership concept to assign an object to a cluster, other approaches, such as combining fuzzy and possibilistic memberships or feature-weighted methods, are also used. We propose Fuzzy Kernel k-Medoids, which combines fuzzy and possibilistic membership, as a powerful method for the anomaly detection problem, since in numerical experiments it is able to classify IDS benchmark data into five different classes simultaneously. We classify the KDDCup'99 IDS benchmark data set into five different classes simultaneously; the best performance was achieved using 30% of the training data, with a clustering accuracy of 90.28 percent.
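
    For orientation, a plain (non-fuzzy, non-kernel) k-medoids loop on a precomputed distance matrix looks roughly as follows; the proposed Fuzzy Kernel k-Medoids additionally uses kernel distances and fuzzy/possibilistic memberships, which are not shown, and the random "connection features" are illustrative.

```python
import numpy as np

def k_medoids(D, k, n_iter=50, seed=0):
    """Alternating k-medoids on a precomputed distance matrix D (n x n)."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(D.shape[0], size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)          # assign each point to its nearest medoid
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if members.size:
                costs = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[j] = members[np.argmin(costs)]  # member minimising within-cluster distance
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids, labels

X = np.random.default_rng(0).random((300, 4))              # stand-in connection features
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
medoids, labels = k_medoids(D, k=5)
print(medoids, np.bincount(labels))
```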

  4. Multiresolution edge detection using enhanced fuzzy c-means clustering for ultrasound image speckle reduction

    Energy Technology Data Exchange (ETDEWEB)

    Tsantis, Stavros [Department of Medical Physics, School of Medicine, University of Patras, Rion, GR 26504 (Greece); Spiliopoulos, Stavros; Karnabatidis, Dimitrios [Department of Radiology, School of Medicine, University of Patras, Rion, GR 26504 (Greece); Skouroliakou, Aikaterini [Department of Energy Technology Engineering, Technological Education Institute of Athens, Athens 12210 (Greece); Hazle, John D. [Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030 (United States); Kagadis, George C., E-mail: gkagad@gmail.com, E-mail: George.Kagadis@med.upatras.gr, E-mail: GKagadis@mdanderson.org [Department of Medical Physics, School of Medicine, University of Patras, Rion, GR 26504, Greece and Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030 (United States)

    2014-07-15

    Purpose: Speckle suppression in ultrasound (US) images of various anatomic structures via a novel speckle noise reduction algorithm. Methods: The proposed algorithm employs an enhanced fuzzy c-means (EFCM) clustering and multiresolution wavelet analysis to distinguish edges from speckle noise in US images. The edge detection procedure involves a coarse-to-fine strategy with spatial and interscale constraints so as to classify wavelet local maxima distribution at different frequency bands. As an outcome, an edge map across scales is derived whereas the wavelet coefficients that correspond to speckle are suppressed in the inverse wavelet transform acquiring the denoised US image. Results: A total of 34 thyroid, liver, and breast US examinations were performed on a Logiq 9 US system. Each of these images was subjected to the proposed EFCM algorithm and, for comparison, to commercial speckle reduction imaging (SRI) software and another well-known denoising approach, Pizurica's method. The quantification of the speckle suppression performance in the selected set of US images was carried out via Speckle Suppression Index (SSI) with results of 0.61, 0.71, and 0.73 for EFCM, SRI, and Pizurica's methods, respectively. Peak signal-to-noise ratios of 35.12, 33.95, and 29.78 and edge preservation indices of 0.94, 0.93, and 0.86 were found for the EFCM, SRI, and Pizurica's methods, respectively, demonstrating that the proposed method achieves superior speckle reduction performance and edge preservation properties. Based on two independent radiologists’ qualitative evaluation, the proposed method significantly improved image characteristics over standard baseline B-mode images and those processed with Pizurica's method. Furthermore, it yielded results similar to those for SRI for breast and thyroid images and significantly better results than SRI for liver imaging, thus improving diagnostic accuracy in both superficial and in-depth structures. Conclusions: A

  5. Multiresolution edge detection using enhanced fuzzy c-means clustering for ultrasound image speckle reduction

    International Nuclear Information System (INIS)

    Tsantis, Stavros; Spiliopoulos, Stavros; Karnabatidis, Dimitrios; Skouroliakou, Aikaterini; Hazle, John D.; Kagadis, George C.

    2014-01-01

    Purpose: Speckle suppression in ultrasound (US) images of various anatomic structures via a novel speckle noise reduction algorithm. Methods: The proposed algorithm employs an enhanced fuzzy c-means (EFCM) clustering and multiresolution wavelet analysis to distinguish edges from speckle noise in US images. The edge detection procedure involves a coarse-to-fine strategy with spatial and interscale constraints so as to classify wavelet local maxima distribution at different frequency bands. As an outcome, an edge map across scales is derived whereas the wavelet coefficients that correspond to speckle are suppressed in the inverse wavelet transform acquiring the denoised US image. Results: A total of 34 thyroid, liver, and breast US examinations were performed on a Logiq 9 US system. Each of these images was subjected to the proposed EFCM algorithm and, for comparison, to commercial speckle reduction imaging (SRI) software and another well-known denoising approach, Pizurica's method. The quantification of the speckle suppression performance in the selected set of US images was carried out via Speckle Suppression Index (SSI) with results of 0.61, 0.71, and 0.73 for EFCM, SRI, and Pizurica's methods, respectively. Peak signal-to-noise ratios of 35.12, 33.95, and 29.78 and edge preservation indices of 0.94, 0.93, and 0.86 were found for the EFCM, SRI, and Pizurica's methods, respectively, demonstrating that the proposed method achieves superior speckle reduction performance and edge preservation properties. Based on two independent radiologists’ qualitative evaluation, the proposed method significantly improved image characteristics over standard baseline B-mode images and those processed with Pizurica's method. Furthermore, it yielded results similar to those for SRI for breast and thyroid images and significantly better results than SRI for liver imaging, thus improving diagnostic accuracy in both superficial and in-depth structures. Conclusions: A

  6. Nonlinear Multiantenna Detection Methods

    Directory of Open Access Journals (Sweden)

    Chen Sheng

    2004-01-01

    Full Text Available A nonlinear detection technique designed for multiple-antenna assisted receivers employed in space-division multiple-access systems is investigated. We derive the optimal solution of the nonlinear spatial-processing assisted receiver for binary phase shift keying signalling, which we refer to as the Bayesian detector. It is shown that this optimal Bayesian receiver significantly outperforms the standard linear beamforming assisted receiver in terms of a reduced bit error rate, at the expense of an increased complexity, while the achievable system capacity is substantially enhanced by employing nonlinear detection. Specifically, when the spatial separation expressed in terms of the angle of arrival between the desired and interfering signals is below a certain threshold, a linear beamformer would fail to separate them, while a nonlinear detection assisted receiver is still capable of performing adequately. The adaptive implementation of the optimal Bayesian detector can be realized using a radial basis function network. Two techniques are presented for constructing block-data-based adaptive nonlinear multiple-antenna assisted receivers. One of them is based on the relevance vector machine invoked for classification, while the other on the orthogonal forward selection procedure combined with the Fisher ratio class-separability measure. A recursive sample-by-sample adaptation procedure is also proposed for training nonlinear detectors based on an amalgam of enhanced k-means clustering techniques and the recursive least squares algorithm.

  7. HOTSPOTS DETECTION FROM TRAJECTORY DATA BASED ON SPATIOTEMPORAL DATA FIELD CLUSTERING

    Directory of Open Access Journals (Sweden)

    K. Qin

    2017-09-01

    Full Text Available City hotspots are areas that residents visit frequently and where large traffic flows exist; they reflect people's travel patterns and the distribution of urban functional areas. Taxi trajectory data contain abundant information about urban functions and citizen activities, and extracting interesting city hotspots from them can be important for urban planning, traffic command, public travel services, etc. To detect city hotspots and discover a variety of changing patterns among them, we apply a data-field-based cluster analysis technique to the pick-up and drop-off points of taxi trajectory data and improve the method by introducing a normalized time weight when estimating the potential value in the data field. Under the new potential function, short distances and short time differences play a powerful role, so a region full of trajectory points, which is regarded as a hotspot area, has a high potential value, while a region with sparse trajectory points has a low potential value. The taxi trajectory data of Wuhan city in China on May 1, 6 and 9, 2015, are taken as the experimental data. From the results, we find both sustained and transient hotspot areas in Wuhan city based on the spatiotemporal data field method. Further study will focus on parameter optimization and the interaction among hotspot areas.
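
    The time-weighted potential described above can be sketched as a sum of distance and time-difference kernels over the trajectory points; the Gaussian form, bandwidths and synthetic points below are illustrative assumptions, not the exact potential function of the paper.

```python
import numpy as np

def potential(points, times, query_xy, query_t, sigma_d=500.0, sigma_t=1800.0):
    """Data-field potential at a query location/time: each trajectory point contributes a
    distance kernel weighted by a normalised time-difference kernel."""
    d = np.linalg.norm(points - query_xy, axis=1)           # metres
    dt = np.abs(times - query_t)                            # seconds
    return float(np.sum(np.exp(-(d / sigma_d) ** 2) * np.exp(-(dt / sigma_t) ** 2)))

rng = np.random.default_rng(0)
pickups = rng.normal([5000.0, 5000.0], 300.0, size=(1000, 2))   # a dense pick-up cluster
stamps = rng.uniform(0.0, 3600.0, size=1000)

print(potential(pickups, stamps, np.array([5000.0, 5000.0]), 1800.0))   # high inside the hotspot
print(potential(pickups, stamps, np.array([9000.0, 9000.0]), 1800.0))   # near zero far away
```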

  8. Orthology detection combining clustering and synteny for very large datasets.

    Directory of Open Access Journals (Sweden)

    Marcus Lechner

    Full Text Available The elucidation of orthology relationships is an important step both in gene function prediction as well as towards understanding patterns of sequence evolution. Orthology assignments are usually derived directly from sequence similarities for large data because more exact approaches exhibit too high computational costs. Here we present PoFF, an extension for the standalone tool Proteinortho, which enhances orthology detection by combining clustering, sequence similarity, and synteny. In the course of this work, FFAdj-MCS, a heuristic that assesses pairwise gene order using adjacencies (a similarity measure related to the breakpoint distance) was adapted to support multiple linear chromosomes and extended to detect duplicated regions. PoFF largely reduces the number of false positives and enables more fine-grained predictions than purely similarity-based approaches. The extension maintains the low memory requirements and the efficient concurrency options of its basis Proteinortho, making the software applicable to very large datasets.

  9. Mainshock-Aftershocks Clustering Detection in Volcanic Regions

    Science.gov (United States)

    Garza Giron, R.; Brodsky, E. E.; Prejean, S. G.

    2017-12-01

    Crustal earthquakes tend to break their general Poissonian process behavior by gathering into two main kinds of seismic bursts: swarms and mainshock-aftershocks sequences. The former is commonly related to volcanic or geothermal processes whereas the latter is a characteristic feature of tectonically driven seismicity. We explore the mainshock-aftershock clustering behavior of different active volcanic regions in Japan and compare it to non-volcanic regions. We find that aftershock production in volcanoes shows mainshock-aftershocks clustering similar to what is observed in non-volcanic areas. The ratio of volcanic areas that cluster in mainshock-aftershocks sequences vs the areas that do not is comparable to the ratio of non-volcanic regions that show clustering vs the ones that do not. Furthermore, the level of production of aftershocks for most volcanic areas where clustering is present seems to be of the same order of magnitude as, or slightly higher than, the median of the non-volcanic regions. An interesting example of highly aftershock-productive volcanoes emerges from the 2000 Miyakejima dike intrusion. A big seismic cluster started to build up rapidly on the south-west flank of Miyakejima to later propagate to the north-west towards the Kozushima and Niijima volcanoes. In Miyakejima the seismicity showed a swarm-like signature with a constant earthquake rate, whereas Kozushima and Niijima both had expressions of highly productive mainshock-aftershocks sequences. These findings are surprising given the alternative mechanisms available in volcanic systems for releasing deviatoric strain. We speculate that aftershock behavior may be related to the rheological properties of the rocks of each system and to the capacity of a system to accumulate or release the internal pressures caused by magmatic or hydrothermal systems.

  10. A Centralized Detection of Sinkhole Attacks Based on Energy Level of the Nodes on Cluster-Based Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Merve Nilay Aydın

    2017-10-01

    Full Text Available Wireless sensor networks (WSNs) consist of thousands of small, low-cost devices that communicate over a wireless medium. Because they are deployed in harsh environments and have limited resources, WSNs are prone to various attacks. One of the most dangerous attacks threatening WSNs is the sinkhole attack. In this paper, a sinkhole attack is modelled on a cluster-based WSN, and a centralized detection algorithm based on the remaining energies of the nodes is proposed. The simulations were run for different values of the energy threshold and various numbers of nodes. The performance of the system was investigated in terms of the total energy consumption in the system, the number of packets arriving at the base station and the true detection rate of the sinkhole node(s). The results showed that the proposed method is energy-efficient and detects the malicious nodes with 100% accuracy for all numbers of nodes.
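
    A minimal sketch of the general idea, flagging nodes whose residual-energy drop stands out within their cluster at the base station, is given below; the field names, threshold rule and toy readings are hypothetical and are not the algorithm's exact criteria.

```python
from statistics import median

def flag_suspects(prev_energy, curr_energy, clusters, factor=2.0):
    """Flag nodes whose energy drop is far above the typical drop in their cluster."""
    suspects = []
    for cluster_id, nodes in clusters.items():
        drops = {n: prev_energy[n] - curr_energy[n] for n in nodes}
        typical = median(drops.values())
        for n, d in drops.items():
            # a sinkhole attracts extra traffic, so its energy drain stands out
            if typical > 0 and d > factor * typical:
                suspects.append((cluster_id, n))
    return suspects

prev = {"n1": 10.0, "n2": 10.0, "n3": 10.0, "n4": 10.0}
curr = {"n1": 9.6, "n2": 9.5, "n3": 8.0, "n4": 9.6}        # n3 is draining unusually fast
print(flag_suspects(prev, curr, {"c1": ["n1", "n2", "n3", "n4"]}))   # [('c1', 'n3')]
```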

  11. Motif-independent prediction of a secondary metabolism gene cluster using comparative genomics: application to sequenced genomes of Aspergillus and ten other filamentous fungal species.

    Science.gov (United States)

    Takeda, Itaru; Umemura, Myco; Koike, Hideaki; Asai, Kiyoshi; Machida, Masayuki

    2014-08-01

    Despite their biological importance, a significant number of genes for secondary metabolite biosynthesis (SMB) remain undetected due largely to the fact that they are highly diverse and are not expressed under a variety of cultivation conditions. Several software tools including SMURF and antiSMASH have been developed to predict fungal SMB gene clusters by finding core genes encoding polyketide synthase, nonribosomal peptide synthetase and dimethylallyltryptophan synthase as well as several others typically present in the cluster. In this work, we have devised a novel comparative genomics method to identify SMB gene clusters that is independent of motif information of the known SMB genes. The method detects SMB gene clusters by searching for a similar order of genes and their presence in nonsyntenic blocks. With this method, we were able to identify many known SMB gene clusters with the core genes in the genomic sequences of 10 filamentous fungi. Furthermore, we have also detected SMB gene clusters without core genes, including the kojic acid biosynthesis gene cluster of Aspergillus oryzae. By varying the detection parameters of the method, a significant difference in the sequence characteristics was detected between the genes residing inside the clusters and those outside the clusters. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  12. ClusterTAD: an unsupervised machine learning approach to detecting topologically associated domains of chromosomes from Hi-C data.

    Science.gov (United States)

    Oluwadare, Oluwatosin; Cheng, Jianlin

    2017-11-14

    With the development of chromosomal conformation capturing techniques, particularly, the Hi-C technique, the study of the spatial conformation of a genome is becoming an important topic in bioinformatics and computational biology. The Hi-C technique can generate genome-wide chromosomal interaction (contact) data, which can be used to investigate the higher-level organization of chromosomes, such as Topologically Associated Domains (TAD), i.e., locally packed chromosome regions bound together by intra-chromosomal contacts. The identification of the TADs for a genome is useful for studying gene regulation, genomic interaction, and genome function. Here, we formulate the TAD identification problem as an unsupervised machine learning (clustering) problem, and develop a new TAD identification method called ClusterTAD. We introduce a novel method to represent chromosomal contacts as features to be used by the clustering algorithm. Our results show that ClusterTAD can accurately predict the TADs on simulated Hi-C data. Our method is also largely complementary and consistent with existing methods on the real Hi-C datasets of two mouse cells. The validation with the chromatin immunoprecipitation (ChIP) sequencing (ChIP-Seq) data shows that the domain boundaries identified by ClusterTAD have a high enrichment of CTCF binding sites, promoter-related marks, and enhancer-related histone modifications. As ClusterTAD is based on a proven clustering approach, it opens a new avenue to apply a large array of clustering methods developed in the machine learning field to the TAD identification problem. The source code, the results, and the TADs generated for the simulated and real Hi-C datasets are available here: https://github.com/BDM-Lab/ClusterTAD .

  13. Convex Clustering: An Attractive Alternative to Hierarchical Clustering

    Science.gov (United States)

    Chen, Gary K.; Chi, Eric C.; Ranola, John Michael O.; Lange, Kenneth

    2015-01-01

    The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominant clustering method in bioinformatics. Biologists find the trees constructed by hierarchical clustering visually appealing and in tune with their evolutionary perspective. Hierarchical clustering operates on multiple scales simultaneously. This is essential, for instance, in transcriptome data, where one may be interested in making qualitative inferences about how lower-order relationships like gene modules lead to higher-order relationships like pathways or biological processes. The recently developed method of convex clustering preserves the visual appeal of hierarchical clustering while ameliorating its propensity to make false inferences in the presence of outliers and noise. The solution paths generated by convex clustering reveal relationships between clusters that are hidden by static methods such as k-means clustering. The current paper derives and tests a novel proximal distance algorithm for minimizing the objective function of convex clustering. The algorithm separates parameters, accommodates missing data, and supports prior information on relationships. Our program CONVEXCLUSTER incorporating the algorithm is implemented on ATI and nVidia graphics processing units (GPUs) for maximal speed. Several biological examples illustrate the strengths of convex clustering and the ability of the proximal distance algorithm to handle high-dimensional problems. CONVEXCLUSTER can be freely downloaded from the UCLA Human Genetics web site at http://www.genetics.ucla.edu/software/ PMID:25965340

  14. A mathematical programming approach for sequential clustering of dynamic networks

    Science.gov (United States)

    Silva, Jonathan C.; Bennett, Laura; Papageorgiou, Lazaros G.; Tsoka, Sophia

    2016-02-01

    A common analysis performed on dynamic networks is community structure detection, a challenging problem that aims to track the temporal evolution of network modules. An emerging area in this field is evolutionary clustering, where the community structure of a network snapshot is identified by taking into account both its current state as well as previous time points. Based on this concept, we have developed a mixed integer non-linear programming (MINLP) model, SeqMod, that sequentially clusters each snapshot of a dynamic network. The modularity metric is used to determine the quality of community structure of the current snapshot and the historical cost is accounted for by optimising the number of node pairs co-clustered at the previous time point that remain so in the current snapshot partition. Our method is tested on social networks of interactions among high school students, college students and members of the Brazilian Congress. We show that, for an adequate parameter setting, our algorithm detects the classes to which these students belong more accurately than partitioning each time step individually or partitioning the aggregated snapshots. Our method also detects drastic discontinuities in interaction patterns across network snapshots. Finally, we present comparative results with similar community detection methods for time-dependent networks from the literature. Overall, we illustrate the applicability of mathematical programming as a flexible, adaptable and systematic approach for these community detection problems. Contribution to the Topical Issue "Temporal Network Theory and Applications", edited by Petter Holme.

  15. Heuristic methods using grasp, path relinking and variable neighborhood search for the clustered traveling salesman problem

    Directory of Open Access Journals (Sweden)

    Mário Mestria

    2013-08-01

    Full Text Available The Clustered Traveling Salesman Problem (CTSP) is a generalization of the Traveling Salesman Problem (TSP) in which the set of vertices is partitioned into disjoint clusters and the objective is to find a minimum cost Hamiltonian cycle such that the vertices of each cluster are visited contiguously. The CTSP is NP-hard and, in this context, we propose heuristic methods for the CTSP using GRASP, Path Relinking and Variable Neighborhood Descent (VND). The heuristic methods were tested using Euclidean instances with up to 2000 vertices and clusters varying from 4 to 150 vertices. The computational tests were performed to compare the performance of the heuristic methods with an exact algorithm using the Parallel CPLEX software. The computational results showed that the hybrid heuristic method using VND outperforms the other heuristic methods.
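
    As a rough illustration of the constructive phase of a GRASP-style heuristic for the CTSP (the Path Relinking and VND components of the paper are not reproduced here), the sketch below builds tours that visit each cluster contiguously and picks the next vertex from a restricted candidate list; the instance, cluster order, and parameters are made up.

```python
# Toy GRASP-style construction for the CTSP on Euclidean points: clusters are
# visited one after another (so the contiguity constraint holds by design) and,
# inside each cluster, the next vertex is drawn at random from a restricted
# candidate list of the nearest unvisited vertices.
import math
import random

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def grasp_ctsp(points, clusters, rcl_size=3, seed=0):
    rng = random.Random(seed)
    tour, current = [], None
    for cluster in clusters:                 # fixed cluster order for simplicity
        remaining = list(cluster)
        while remaining:
            if current is None:
                nxt = rng.choice(remaining)
            else:
                remaining.sort(key=lambda i: dist(points[current], points[i]))
                nxt = rng.choice(remaining[:rcl_size])
            tour.append(nxt)
            remaining.remove(nxt)
            current = nxt
    return tour

def tour_length(points, tour):
    return sum(dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
    clus = [[0, 1, 2], [3, 4, 5]]
    best = min((grasp_ctsp(pts, clus, seed=s) for s in range(20)),
               key=lambda t: tour_length(pts, t))
    print(best, round(tour_length(pts, best), 2))
```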

  16. Compact Gaussian quantum computation by multi-pixel homodyne detection

    International Nuclear Information System (INIS)

    Ferrini, G; Fabre, C; Treps, N; Gazeau, J P; Coudreau, T

    2013-01-01

    We study the possibility of producing and detecting continuous variable cluster states in an extremely compact optical setup. This method is based on a multi-pixel homodyne detection system recently demonstrated experimentally, which includes classical data post-processing. It allows the incorporation of the linear optics network, usually employed in standard experiments for the production of cluster states, in the stage of the measurement. After giving an example of cluster state generation by this method, we further study how this procedure can be generalized to perform Gaussian quantum computation. (paper)

  17. A THREE-STEP SPATIAL-TEMPORAL-SEMANTIC CLUSTERING METHOD FOR HUMAN ACTIVITY PATTERN ANALYSIS

    Directory of Open Access Journals (Sweden)

    W. Huang

    2016-06-01

    Full Text Available How people move in cities and what they do in various locations at different times form human activity patterns. Human activity patterns play a key role in urban planning, traffic forecasting, public health and safety, emergency response, friend recommendation, and so on. Therefore, scholars from different fields, such as social science, geography, transportation, physics and computer science, have made great efforts in modelling and analysing human activity patterns or human mobility patterns. One of the essential tasks in such studies is to find the locations or places where individuals stay to perform some kind of activity before further activity pattern analysis. In the era of Big Data, the emergence of social media along with wearable devices enables human activity data to be collected more easily and efficiently. Furthermore, the dimension of the accessible human activity data has been extended from two or three (space or space-time) to four (space, time and semantics). More specifically, not only the location and time at which people stay are collected, but also what people “say” at a location and time can be obtained. The characteristics of these datasets shed new light on the analysis of human mobility, and new methodologies should accordingly be developed to handle them. Traditional methods such as neural networks, statistics and clustering have been applied to study human activity patterns using geosocial media data. Among them, clustering methods have been widely used to analyse spatiotemporal patterns. However, to the best of our knowledge, few clustering algorithms have been developed specifically for handling datasets that contain spatial, temporal and semantic aspects all together. In this work, we propose a three-step human activity clustering method based on space, time and semantics to fill this gap. One-year Twitter data, posted in Toronto, Canada, is used to test the clustering-based method. The

  18. Fast clustering algorithm for large ECG data sets based on CS theory in combination with PCA and K-NN methods.

    Science.gov (United States)

    Balouchestani, Mohammadreza; Krishnan, Sridhar

    2014-01-01

    Long-term recording of Electrocardiogram (ECG) signals plays an important role in health care systems for diagnostic and treatment purposes of heart diseases. Clustering and classification of the collected data are essential parts for detecting concealed information of P-QRS-T waves in the long-term ECG recording. Currently used algorithms do have their share of drawbacks: 1) clustering and classification cannot be done in real time; 2) they suffer from huge energy consumption and load of sampling. These drawbacks motivated us to develop a novel optimized clustering algorithm that can easily scan large ECG datasets to establish low-power long-term ECG recording. In this paper, we present an advanced K-means clustering algorithm based on Compressed Sensing (CS) theory as a random sampling procedure. Then, two dimensionality reduction methods, Principal Component Analysis (PCA) and Linear Correlation Coefficient (LCC), followed by classification of the data with the K-Nearest Neighbours (K-NN) and Probabilistic Neural Network (PNN) classifiers, are applied to the proposed algorithm. We show that our algorithm based on PCA features in combination with the K-NN classifier performs better than the other methods. The proposed algorithm outperforms existing algorithms by increasing classification accuracy by 11%. In addition, the proposed algorithm yields classification accuracies for the K-NN and PNN classifiers and a Receiver Operating Characteristic (ROC) area of 99.98%, 99.83%, and 99.75%, respectively.
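
    The dimensionality-reduction and classification stage can be sketched with scikit-learn as below; random column sub-sampling stands in loosely for the compressed-sensing measurement step, and the synthetic data, feature counts, and parameters are purely illustrative, not the paper's algorithm or results.

```python
# Sketch of a PCA + K-NN classification stage on synthetic "beat feature" data.
# Random column sub-sampling crudely mimics a random measurement step; it is
# not the paper's CS-based algorithm.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=600, n_features=200, n_informative=20,
                           n_classes=3, random_state=0)
keep = rng.choice(X.shape[1], size=60, replace=False)   # crude random "sampling"
X = X[:, keep]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = make_pipeline(PCA(n_components=10), KNeighborsClassifier(n_neighbors=5))
model.fit(X_tr, y_tr)
print("test accuracy:", round(model.score(X_te, y_te), 3))
```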

  19. PARTIAL TRAINING METHOD FOR HEURISTIC ALGORITHM OF POSSIBLE CLUSTERIZATION UNDER UNKNOWN NUMBER OF CLASSES

    Directory of Open Access Journals (Sweden)

    D. A. Viattchenin

    2009-01-01

    Full Text Available A method for constructing a subset of labeled objects which is used in a heuristic algorithm of possible  clusterization with partial  training is proposed in the  paper.  The  method  is  based  on  data preprocessing by the heuristic algorithm of possible clusterization using a transitive closure of a fuzzy tolerance. Method efficiency is demonstrated by way of an illustrative example.

  20. Segmentation of clustered cells in negative phase contrast images with integrated light intensity and cell shape information.

    Science.gov (United States)

    Wang, Y; Wang, C; Zhang, Z

    2018-05-01

    Automated cell segmentation plays a key role in the characterisation of cell behaviours for both biological research and clinical practice. Currently, the segmentation of clustered cells still remains a challenge and is the main reason for false segmentation. In this study, the emphasis was put on the segmentation of clustered cells in negative phase contrast images. A new method was proposed to combine both light intensity and cell shape information through the construction of a grey-weighted distance transform (GWDT) within preliminarily segmented areas. With the constructed GWDT, the clustered cells can be detected and then separated with a modified region skeleton-based method. Moreover, a contour expansion operation was applied to obtain optimised detection of cell boundaries. In this paper, the working principle and detailed procedure of the proposed method are described, followed by the evaluation of the method on clustered cell segmentation. Results show that the proposed method achieves an improved performance in clustered cell segmentation compared with other methods, with accuracy rates of 85.8% and 97.16% for clustered cells and all cells, respectively. © 2017 The Authors Journal of Microscopy © 2017 Royal Microscopical Society.

  1. The potential of clustering methods to define intersection test scenarios: Assessing real-life performance of AEB.

    Science.gov (United States)

    Sander, Ulrich; Lubbe, Nils

    2018-04-01

    Intersection accidents are frequent and harmful. The accident types 'straight crossing path' (SCP), 'left turn across path - oncoming direction' (LTAP/OD), and 'left-turn across path - lateral direction' (LTAP/LD) represent around 95% of all intersection accidents and one-third of all police-reported car-to-car accidents in Germany. The European New Car Assessment Program (Euro NCAP) has announced that intersection scenarios will be included in their rating from 2020; however, how these scenarios are to be tested has not been defined. This study investigates whether clustering methods can be used to identify a small number of test scenarios sufficiently representative of the accident dataset to evaluate Intersection Automated Emergency Braking (AEB). Data from the German In-Depth Accident Study (GIDAS) and the GIDAS-based Pre-Crash Matrix (PCM) from 1999 to 2016, containing 784 SCP and 453 LTAP/OD accidents, were analyzed with principal component methods to identify variables that account for the relevant total variances of the sample. Three different methods for data clustering were applied to each of the accident types: two similarity-based approaches, namely Hierarchical Clustering (HC) and Partitioning Around Medoids (PAM), and the probability-based Latent Class Clustering (LCC). The optimum number of clusters was derived for HC and PAM with the silhouette method. The PAM algorithm was initiated both with randomly selected start medoids and with medoids from HC. For LCC, the Bayesian Information Criterion (BIC) was used to determine the optimal number of clusters. Test scenarios were defined from optimal cluster medoids weighted by their real-life representation in GIDAS. The set of variables for clustering was further varied to investigate the influence of variable type and character. We quantified how accurately each cluster variation represents real-life AEB performance using pre-crash simulations with PCM data and a generic algorithm for AEB intervention. The
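
    The cluster-count selection step can be illustrated with a short scikit-learn sketch: agglomerative (Ward) clustering is used here as a stand-in for the HC/PAM/LCC variants compared in the paper, and the number of clusters is chosen by the silhouette criterion; the synthetic "scenario" features are assumptions.

```python
# Choose the number of clusters with the silhouette criterion, using
# agglomerative (Ward) clustering on synthetic pre-crash "scenario" features
# (speeds, angles, ...). Illustrative stand-in, not the study's pipeline.
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=400, centers=4, n_features=5, random_state=1)
X = StandardScaler().fit_transform(X)

scores = {}
for k in range(2, 9):
    labels = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print({k: round(s, 3) for k, s in scores.items()})
print("silhouette-optimal number of clusters:", best_k)
# Cluster medoids (the member closest to each cluster mean) could then serve
# as representative test scenarios, weighted by cluster size.
```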

  2. Investigation of the cluster formation in lithium niobate crystals by computer modeling method

    Energy Technology Data Exchange (ETDEWEB)

    Voskresenskii, V. M.; Starodub, O. R., E-mail: ol-star@mail.ru; Sidorov, N. V.; Palatnikov, M. N. [Russian Academy of Sciences, Tananaev Institute of Chemistry and Technology of Rare Earth Elements and Mineral Raw Materials, Kola Science Centre (Russian Federation)

    2017-03-15

    The processes occurring upon the formation of energetically equilibrium oxygen-octahedral clusters in the ferroelectric phase of a stoichiometric lithium niobate (LiNbO₃) crystal have been investigated by the computer modeling method within the semiclassical atomistic model. An energetically favorable cluster size (at which a structure similar to that of a congruent crystal is organized) is shown to exist. A stoichiometric cluster cannot exist because of the electroneutrality loss. The most energetically favorable cluster is that with a Li/Nb ratio of about 0.945, a value close to the lithium-to-niobium ratio for a congruent crystal.

  3. Clustering and training set selection methods for improving the accuracy of quantitative laser induced breakdown spectroscopy

    Energy Technology Data Exchange (ETDEWEB)

    Anderson, Ryan B., E-mail: randerson@astro.cornell.edu [Cornell University Department of Astronomy, 406 Space Sciences Building, Ithaca, NY 14853 (United States); Bell, James F., E-mail: Jim.Bell@asu.edu [Arizona State University School of Earth and Space Exploration, Bldg.: INTDS-A, Room: 115B, Box 871404, Tempe, AZ 85287 (United States); Wiens, Roger C., E-mail: rwiens@lanl.gov [Los Alamos National Laboratory, P.O. Box 1663 MS J565, Los Alamos, NM 87545 (United States); Morris, Richard V., E-mail: richard.v.morris@nasa.gov [NASA Johnson Space Center, 2101 NASA Parkway, Houston, TX 77058 (United States); Clegg, Samuel M., E-mail: sclegg@lanl.gov [Los Alamos National Laboratory, P.O. Box 1663 MS J565, Los Alamos, NM 87545 (United States)

    2012-04-15

    We investigated five clustering and training set selection methods to improve the accuracy of quantitative chemical analysis of geologic samples by laser induced breakdown spectroscopy (LIBS) using partial least squares (PLS) regression. The LIBS spectra were previously acquired for 195 rock slabs and 31 pressed powder geostandards under 7 Torr CO₂ at a stand-off distance of 7 m at 17 mJ per pulse to simulate the operational conditions of the ChemCam LIBS instrument on the Mars Science Laboratory Curiosity rover. The clustering and training set selection methods, which do not require prior knowledge of the chemical composition of the test-set samples, are based on grouping similar spectra and selecting appropriate training spectra for the partial least squares (PLS2) model. These methods were: (1) hierarchical clustering of the full set of training spectra and selection of a subset for use in training; (2) k-means clustering of all spectra and generation of PLS2 models based on the training samples within each cluster; (3) iterative use of PLS2 to predict sample composition and k-means clustering of the predicted compositions to subdivide the groups of spectra; (4) soft independent modeling of class analogy (SIMCA) classification of spectra, and generation of PLS2 models based on the training samples within each class; (5) use of Bayesian information criteria (BIC) to determine an optimal number of clusters and generation of PLS2 models based on the training samples within each cluster. The iterative method and the k-means method using 5 clusters showed the best performance, improving the absolute quadrature root mean squared error (RMSE) by ~3 wt.%. The statistical significance of these improvements was ~85%. Our results show that although clustering methods can modestly improve results, a large and diverse training set is the most reliable way to improve the accuracy of quantitative LIBS. In particular, additional sulfate standards and
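
    Method (2) above, k-means clustering of the spectra followed by a separate PLS model per cluster, can be sketched with scikit-learn as follows; the synthetic spectra, number of clusters, and PLS components are illustrative choices, not the values or data used in the study.

```python
# Sketch of k-means clustering of spectra plus one PLS regression model per
# cluster, on synthetic stand-ins for LIBS spectra and oxide compositions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
n_train, n_test, n_channels, n_oxides = 200, 30, 300, 5
X_train = rng.normal(size=(n_train, n_channels))
Y_train = X_train[:, :n_oxides] @ rng.normal(size=(n_oxides, n_oxides)) \
          + 0.1 * rng.normal(size=(n_train, n_oxides))
X_test = rng.normal(size=(n_test, n_channels))

km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X_train)
models = {c: PLSRegression(n_components=5).fit(X_train[km.labels_ == c],
                                               Y_train[km.labels_ == c])
          for c in range(km.n_clusters)}

# Each test spectrum is predicted by the PLS model of its nearest cluster.
test_clusters = km.predict(X_test)
Y_pred = np.vstack([models[c].predict(x[None, :])
                    for c, x in zip(test_clusters, X_test)])
print(Y_pred.shape)
```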

  4. Remote detection device and detection method therefor

    International Nuclear Information System (INIS)

    Kogure, Sumio; Yoshida, Yoji; Matsuo, Takashiro; Takehara, Hidetoshi; Kojima, Shinsaku.

    1997-01-01

    The present invention provides a non-destructive inspection device for collectively, efficiently and effectively conducting maintenance and inspection to confirm the integrity of a nuclear reactor through a shielding member that shields the radiation emitted from the portion to be inspected. Namely, direct visual inspection using an underwater TV camera as a sensor, eddy current testing using a coil as a sensor, and magnetic particle flaw detection are integrated and applied collectively. Specifically, visual inspection with the TV camera and eddy current flaw detection are adopted together, and magnetic particle flaw detection is applied as a means of confirming, by another method, the results of the two other kinds of inspection. With such procedures, detection techniques based on different physical principles are combined, thereby enhancing the accuracy of the evaluation of the inspection. (I.S.)

  5. Point Cluster Analysis Using a 3D Voronoi Diagram with Applications in Point Cloud Segmentation

    Directory of Open Access Journals (Sweden)

    Shen Ying

    2015-08-01

    Full Text Available Three-dimensional (3D) point analysis and visualization is one of the most effective methods of point cluster detection and segmentation in geospatial datasets. However, serious scattering and clotting characteristics interfere with the visual detection of 3D point clusters. To overcome this problem, this study proposes the use of 3D Voronoi diagrams to analyze and visualize 3D points instead of the original data items. The proposed algorithm computes the cluster of 3D points by applying a set of 3D Voronoi cells to describe and quantify 3D points. The decompositions of the point clouds of 3D models are guided by the 3D Voronoi cell parameters. The parameter values are mapped from the Voronoi cells to 3D points to show the spatial pattern and relationships; thus, a 3D point cluster pattern can be highlighted and easily recognized. To capture different cluster patterns, continuous progressive clusters and segmentations are tested. The 3D spatial relationship is shown to facilitate cluster detection. Furthermore, the generated segmentations of real 3D data cases are exploited to demonstrate the feasibility of our approach in detecting different spatial clusters for continuous point cloud segmentation.
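
    A minimal scipy sketch of the underlying idea, using the volume of each bounded 3D Voronoi cell as a per-point descriptor (small cells indicate dense, cluster-like regions); the synthetic points and the 40th-percentile threshold are illustrative assumptions rather than the paper's parameterization.

```python
# Use 3D Voronoi cells as per-point descriptors: a point whose Voronoi cell
# has small volume sits in a dense region, so thresholding the cell volume
# highlights cluster points. Unbounded boundary cells are skipped.
import numpy as np
from scipy.spatial import ConvexHull, Voronoi

rng = np.random.default_rng(0)
cluster = rng.normal(loc=0.5, scale=0.03, size=(60, 3))    # a tight 3D cluster
background = rng.uniform(0.0, 1.0, size=(60, 3))           # scattered points
pts = np.vstack([cluster, background])

vor = Voronoi(pts)
volumes = np.full(len(pts), np.nan)
for i, region_index in enumerate(vor.point_region):
    region = vor.regions[region_index]
    if -1 in region or len(region) == 0:      # unbounded cell on the hull
        continue
    volumes[i] = ConvexHull(vor.vertices[region]).volume

threshold = np.nanpercentile(volumes, 40)
dense = np.where(volumes < threshold)[0]
print("points flagged as cluster members:", len(dense))
```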

  6. Measurement of Galaxy Cluster Integrated Comptonization and Mass Scaling Relations with the South Pole Telescope

    Energy Technology Data Exchange (ETDEWEB)

    Saliwanchik, B. R.; et al.

    2015-01-22

    We describe a method for measuring the integrated Comptonization (Y (SZ)) of clusters of galaxies from measurements of the Sunyaev-Zel'dovich (SZ) effect in multiple frequency bands and use this method to characterize a sample of galaxy clusters detected in the South Pole Telescope (SPT) data. We use a Markov Chain Monte Carlo method to fit a β-model source profile and integrate Y (SZ) within an angular aperture on the sky. In simulated observations of an SPT-like survey that include cosmic microwave background anisotropy, point sources, and atmospheric and instrumental noise at typical SPT-SZ survey levels, we show that we can accurately recover β-model parameters for input clusters. We measure Y (SZ) for simulated semi-analytic clusters and find that Y (SZ) is most accurately determined in an angular aperture comparable to the SPT beam size. We demonstrate the utility of this method to measure Y (SZ) and to constrain mass scaling relations using X-ray mass estimates for a sample of 18 galaxy clusters from the SPT-SZ survey. Measuring Y (SZ) within a 0.75′ radius aperture, we find an intrinsic log-normal scatter of 21% ± 11% in Y (SZ) at a fixed mass. Measuring Y (SZ) within a 0.3 Mpc projected radius (equivalent to 0.75′ at the survey median redshift z = 0.6), we find a scatter of 26% ± 9%. Prior to this study, the SPT observable found to have the lowest scatter with mass was cluster detection significance. We demonstrate, from both simulations and SPT-observed clusters, that Y (SZ) measured within an aperture comparable to the SPT beam size is equivalent, in terms of scatter with cluster mass, to SPT cluster detection significance.

  7. Grey Wolf Optimizer Based on Powell Local Optimization Method for Clustering Analysis

    Directory of Open Access Journals (Sweden)

    Sen Zhang

    2015-01-01

    Full Text Available One heuristic evolutionary algorithm recently proposed is the grey wolf optimizer (GWO), inspired by the leadership hierarchy and hunting mechanism of grey wolves in nature. This paper presents an extended GWO algorithm based on the Powell local optimization method, which we call PGWO. The PGWO algorithm significantly improves the original GWO in solving complex optimization problems. Clustering is a popular data analysis and data mining technique. Hence, PGWO could be applied in solving clustering problems. In this study, first the PGWO algorithm is tested on seven benchmark functions. Second, the PGWO algorithm is used for data clustering on nine data sets. Compared to other state-of-the-art evolutionary algorithms, the benchmark and data clustering results demonstrate the superior performance of the PGWO algorithm.

  8. Application Of WIMS Code To Calculation Kartini Reactor Parameters By Pin-Cell And Cluster Method

    International Nuclear Information System (INIS)

    Sumarsono, Bambang; Tjiptono, T.W.

    1996-01-01

    An analysis of UZrH fuel element parameters in the Kartini Reactor has been carried out with the WIMS code. The analysis is done by the pin-cell and cluster methods. The pin-cell method is applied as a function of percent burn-up with an 8-group, 3-region analysis, and the cluster method with an 8-group, 12-region analysis. The calculations yielded K∼ = 1.3687 by the pin-cell method and K∼ = 1.3162 by the cluster method, a deviation of 3.83%. The pin-cell analysis as a function of percent burn-up shows that above 59.50% burn-up the multiplication factor is less than one (K∼ < 1), which means that the fuel element reactivity is negative

  9. Supersonic copper clusters

    International Nuclear Information System (INIS)

    Powers, D.E.; Hansen, S.G.; Geusic, M.E.; Michalopoulos, D.L.; Smalley, R.E.

    1983-01-01

    Copper clusters ranging in size from 1 to 29 atoms have been prepared in a supersonic beam by laser vaporization of a rotating copper target rod within the throat of a pulsed supersonic nozzle using helium for the carrier gas. The clusters were cooled extensively in the supersonic expansion [T(translational) = 1 to 4 K, T(rotational) = 4 K, T(vibrational) = 20 to 70 K]. These clusters were detected in the supersonic beam by laser photoionization with time-of-flight mass analysis. Using a number of fixed frequency outputs of an exciplex laser, the threshold behavior of the photoionization cross section was monitored as a function of cluster size. Resonance two-photon ionization (R2PI) with mass-selective detection allowed the detection of five new electronic band systems in the region between 2690 and 3200 A, for each of the three naturally occurring isotopic forms of Cu2. In the process of scanning the R2PI spectrum of these new electronic states, the ionization potential of the copper dimer was determined to be 7.894 ± 0.015 eV

  10. Puzzle of magnetic moments of Ni clusters revisited using quantum Monte Carlo method.

    Science.gov (United States)

    Lee, Hung-Wen; Chang, Chun-Ming; Hsing, Cheng-Rong

    2017-02-28

    The puzzle of the magnetic moments of small nickel clusters arises from the discrepancy between values predicted using density functional theory (DFT) and experimental measurements. Traditional DFT approaches underestimate the magnetic moments of nickel clusters. Two fundamental problems are associated with this puzzle, namely, calculating the exchange-correlation interaction accurately and determining the global minimum structures of the clusters. Theoretically, the two problems can be solved using quantum Monte Carlo (QMC) calculations and the ab initio random structure searching (AIRSS) method correspondingly. Therefore, we combined the fixed-moment AIRSS and QMC methods to investigate the magnetic properties of Ni n (n = 5-9) clusters. The spin moments of the diffusion Monte Carlo (DMC) ground states are higher than those of the Perdew-Burke-Ernzerhof ground states and, in the case of Ni 8-9 , two new ground-state structures have been discovered using the DMC calculations. The predicted results are closer to the experimental findings, unlike the results predicted in previous standard DFT studies.

  11. Anomalous properties of technetium clusters

    International Nuclear Information System (INIS)

    Kryuchkov, S.V.

    1985-01-01

    On the basis of a critical evaluation of literature data on the chemistry of technetium cluster compounds with weak-field ligands, a conclusion is drawn about the specific, ''anomalous'' properties of technetium cluster complexes, which consist in an increased ability of this element to form a series of binuclear and multinuclear clusters that are similar in composition and structure and easily transform into one another. The majority of technetium clusters, unlike similar compounds of other elements, are paramagnetic, with one unpaired electron on an antibonding (''loosening''-type) ''metallic'' MO. All theoretical conceptions known today on the electronic structure of technetium clusters are considered. It is pointed out that the best results in explaining the ''anomalous'' properties of technetium clusters can be obtained in the framework of non-empirical self-consistent-field methods taking configuration interaction into account. It is also shown that certain properties of technetium clusters can be explained on the basis of a qualitative model of Coulomb repulsion between metal atoms in clusters. The conclusion is drawn that the position of technetium in the Periodic Table, as well as the recently detected tendency of technetium to decrease the effective charge on its atoms during M-M bond formation, promotes the high ability of this element to form clusters both with weak-field ligands and with strong-field ones

  12. Structure and physical properties of silicon clusters and of vacancy clusters in bulk silicon

    International Nuclear Information System (INIS)

    Sieck, A.

    2000-01-01

    In this thesis the growth pattern of free silicon clusters and of vacancy clusters in bulk silicon is investigated. The aim is to describe and to better understand the cluster-to-bulk transition. Silicon structures in between clusters and solids feature new, interesting physical properties. The structure and physical properties of silicon clusters can be revealed only by a combination of theory and experiment. Low-energy clusters are determined with different optimization techniques and a density-functional based tight-binding method. Additionally, infrared and Raman spectra, and polarizabilities calculated within self-consistent field density-functional theory are provided for the smaller clusters. For clusters with 25 to 35 atoms an analysis of the shape of the clusters and the related mobilities in a buffer gas is given. Finally, the clusters observed in low-temperature experiments are identified via the best match between calculated properties and experimental data. Silicon clusters with 10 to 15 atoms have a tricapped trigonal prism as a common subunit. Clusters with up to about 25 atoms follow a prolate growth path. In the range from 24 to 30 atoms the geometry of the clusters undergoes a transition towards compact spherical structures. Low-energy clusters with up to 240 atoms feature a bonding pattern strikingly different from the tetrahedral bonding in the solid. It follows that structures with dimensions of several Angstroem have electrical and optical properties different from those of the solid. The calculated stabilities and positron lifetimes of vacancy clusters in bulk silicon indicate that the positron lifetimes of about 435 ps detected in irradiated silicon are related to clusters of 9 or 10 vacancies. The vacancies in these clusters form neighboring hexa-rings and, therefore, minimize the number of dangling bonds. (orig.)

  13. The Atacama Cosmology Telescope: Cosmology from Galaxy Clusters Detected via the Sunyaev-Zel'dovich Effect

    Energy Technology Data Exchange (ETDEWEB)

    Sehgal, Neelima; Trac, Hy; Acquaviva, Viviana; Ade, Peter A.R.; Aguirre, Paula; Amiri, Mandana; Appel, John W.; Barrientos, L.Felipe; Battistelli, Elia S.; Bond, J.Richard; Brown, Ben; Burger, Bryce; Chervenak, Jay; Das, Sudeep; Devlin, Mark J.; Dicker, Simon R.; Doriese, W.Bertrand; Dunkley, Joanna; Dunner, Rolando; Essinger-Hileman, Thomas; Fisher, Ryan P.

    2011-08-18

    We present constraints on cosmological parameters based on a sample of Sunyaev-Zeldovich-selected galaxy clusters detected in a millimeter-wave survey by the Atacama Cosmology Telescope. The cluster sample used in this analysis consists of 9 optically-confirmed high-mass clusters comprising the high-significance end of the total cluster sample identified in 455 square degrees of sky surveyed during 2008 at 148 GHz. We focus on the most massive systems to reduce the degeneracy between unknown cluster astrophysics and cosmology derived from SZ surveys. We describe the scaling relation between cluster mass and SZ signal with a 4-parameter fit. Marginalizing over the values of the parameters in this fit with conservative priors gives σ₈ = 0.851 ± 0.115 and w = -1.14 ± 0.35 for a spatially-flat wCDM cosmological model with WMAP 7-year priors on cosmological parameters. This gives a modest improvement in statistical uncertainty over WMAP 7-year constraints alone. Fixing the scaling relation between cluster mass and SZ signal to a fiducial relation obtained from numerical simulations and calibrated by X-ray observations, we find σ₈ = 0.821 ± 0.044 and w = -1.05 ± 0.20. These results are consistent with constraints from WMAP 7 plus baryon acoustic oscillations plus type Ia supernovae, which give σ₈ = 0.802 ± 0.038 and w = -0.98 ± 0.053. A stacking analysis of the clusters in this sample compared to clusters simulated assuming the fiducial model also shows good agreement. These results suggest that, given the sample of clusters used here, both the astrophysics of massive clusters and the cosmological parameters derived from them are broadly consistent with current models.

  14. Analysis of cost data in a cluster-randomized, controlled trial: comparison of methods

    DEFF Research Database (Denmark)

    Sokolowski, Ineta; Ørnbøl, Eva; Rosendal, Marianne

    Studies have used non-valid analyses of skewed data. We propose two different methods to compare mean costs in two groups. Firstly, we use a non-parametric bootstrap method where the re-sampling takes place on two levels in order to take the cluster effect into account. Secondly, we proceed with a log-transformation of the cost data and apply normal theory to these data, again trying to account for the cluster effect. The performance of these two methods is investigated in a simulation study. The advantages and disadvantages of the different approaches are discussed. We consider health care data from a cluster-randomized intervention study in primary care to test whether the average health care costs among study patients differ between the two groups. The problem with analysing cost data is that most data are severely skewed. Median instead of mean...
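
    A minimal sketch of the first approach, a two-level (cluster, then patient-within-cluster) bootstrap of the difference in mean costs; the simulated log-normal costs, cluster sizes, and variable names are illustrative assumptions, not the trial data.

```python
# Two-stage cluster bootstrap for comparing mean costs between two trial arms:
# resample clusters (e.g. practices) with replacement, then resample patients
# within each sampled cluster. Data and parameters are made up.
import numpy as np

rng = np.random.default_rng(0)

def simulate_arm(n_clusters, mean):
    # Skewed (log-normal) costs with a cluster-level random effect.
    return [rng.lognormal(mean + rng.normal(0, 0.2), 1.0, size=rng.integers(20, 40))
            for _ in range(n_clusters)]

def arm_mean(clusters):
    return np.concatenate(clusters).mean()

def cluster_bootstrap_diff(arm_a, arm_b, n_boot=2000):
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        means = []
        for arm in (arm_a, arm_b):
            idx = rng.integers(0, len(arm), size=len(arm))       # level 1: clusters
            boot = [rng.choice(arm[i], size=len(arm[i]), replace=True)  # level 2: patients
                    for i in idx]
            means.append(arm_mean(boot))
        diffs[b] = means[0] - means[1]
    return diffs

arm_a, arm_b = simulate_arm(15, mean=6.0), simulate_arm(15, mean=6.2)
diffs = cluster_bootstrap_diff(arm_a, arm_b)
print("observed difference in mean cost:", round(arm_mean(arm_a) - arm_mean(arm_b), 1))
print("95% bootstrap CI:", np.round(np.percentile(diffs, [2.5, 97.5]), 1))
```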

  15. Dynamic Fuzzy Clustering Method for Decision Support in Electricity Markets Negotiation

    Directory of Open Access Journals (Sweden)

    Ricardo FAIA

    2016-10-01

    Full Text Available Artificial Intelligence (AI) methods contribute to the construction of systems where there is a need to automate tasks. They are typically used for problems that have a large response time, or when a mathematical method cannot be used to solve the problem. However, the application of AI brings an added complexity to the development of such applications. AI has been frequently applied in the power systems field, namely in Electricity Markets (EM). In this area, AI applications are essentially used to forecast / estimate the prices of electricity or to search for the best opportunity to sell the product. This paper proposes a clustering methodology that is combined with fuzzy logic in order to perform the estimation of EM prices. The proposed method is based on the application of a clustering methodology that groups historic energy contracts according to their prices’ similarity. The optimal number of groups is automatically calculated taking into account the preference for the balance between the estimation error and the number of groups. The centroids of each cluster are used to define a dynamic fuzzy variable that approximates the tendency of contracts’ history. The resulting fuzzy variable allows estimating expected prices for contracts instantaneously and approximating missing values in the historic contracts.

  16. CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks.

    Science.gov (United States)

    Li, Min; Li, Dongyan; Tang, Yu; Wu, Fangxiang; Wang, Jianxin

    2017-08-31

    Nowadays, cluster analysis of biological networks has become one of the most important approaches to identifying functional modules as well as predicting protein complexes and network biomarkers. Furthermore, the visualization of clustering results is crucial to display the structure of biological networks. Here we present CytoCluster, a cytoscape plugin integrating six clustering algorithms, HC-PIN (Hierarchical Clustering algorithm in Protein Interaction Networks), OH-PIN (identifying Overlapping and Hierarchical modules in Protein Interaction Networks), IPCA (Identifying Protein Complex Algorithm), ClusterONE (Clustering with Overlapping Neighborhood Expansion), DCU (Detecting Complexes based on Uncertain graph model), IPC-MCE (Identifying Protein Complexes based on Maximal Complex Extension), and BinGO (the Biological networks Gene Ontology) function. Users can select different clustering algorithms according to their requirements. The main function of these six clustering algorithms is to detect protein complexes or functional modules. In addition, BinGO is used to determine which Gene Ontology (GO) categories are statistically overrepresented in a set of genes or a subgraph of a biological network. CytoCluster can be easily expanded, so that more clustering algorithms and functions can be added to this plugin. Since it was created in July 2013, CytoCluster has been downloaded more than 9700 times in the Cytoscape App store and has already been applied to the analysis of different biological networks. CytoCluster is available from http://apps.cytoscape.org/apps/cytocluster.

  17. Applying Hotspot Detection Methods in Forestry: A Case Study of Chestnut Oak Regeneration

    International Nuclear Information System (INIS)

    Fei, S.

    2010-01-01

    Hotspot detection has been widely adopted in health sciences for disease surveillance, but rarely in natural resource disciplines. In this paper, two spatial scan statistics (SaTScan and ClusterSeer) and a nonspatial classification and regression trees method were evaluated as techniques for identifying chestnut oak (Quercus montana) regeneration hotspots among 50 mixed-oak stands in the central Appalachian region of the eastern United States. Hotspots defined by the three methods had a moderate level of conformity and revealed similar chestnut oak regeneration site affinity. Chestnut oak regeneration hotspots were positively associated with the abundance of chestnut oak trees in the overstory and a moderate cover of heather species (Vaccinium and Gaylussacia spp.) but were negatively associated with the abundance of hayscented fern (Dennstaedtia punctilobula) and mountain laurel (Kalmia latifolia). In general, hotspot detection is a viable tool for assisting natural resource managers with identifying areas possessing significantly high or low tree regeneration.

  18. Applying Hotspot Detection Methods in Forestry: A Case Study of Chestnut Oak Regeneration

    Directory of Open Access Journals (Sweden)

    Songlin Fei

    2010-01-01

    Full Text Available Hotspot detection has been widely adopted in health sciences for disease surveillance, but rarely in natural resource disciplines. In this paper, two spatial scan statistics (SaTScan and ClusterSeer) and a nonspatial classification and regression trees method were evaluated as techniques for identifying chestnut oak (Quercus montana) regeneration hotspots among 50 mixed-oak stands in the central Appalachian region of the eastern United States. Hotspots defined by the three methods had a moderate level of conformity and revealed similar chestnut oak regeneration site affinity. Chestnut oak regeneration hotspots were positively associated with the abundance of chestnut oak trees in the overstory and a moderate cover of heather species (Vaccinium and Gaylussacia spp.) but were negatively associated with the abundance of hayscented fern (Dennstaedtia punctilobula) and mountain laurel (Kalmia latifolia). In general, hotspot detection is a viable tool for assisting natural resource managers with identifying areas possessing significantly high or low tree regeneration.

  19. Pseudo-potential method for taking into account the Pauli principle in cluster systems

    International Nuclear Information System (INIS)

    Krasnopol'skii, V.M.; Kukulin, V.I.

    1975-01-01

    In order to take account of the Pauli principle in cluster systems (such as 3α, α + α + n) a convenient method of renormalization of the cluster-cluster deep attractive potentials with forbidden states is suggested. The renormalization consists of adding projectors upon the occupied states with an infinite coupling constant to the initial deep potential which means that we pass to pseudo-potentials. The pseudo-potential approach in projecting upon the noneigenstates is shown to be equivalent to the orthogonality condition model of Saito et al. The orthogonality of the many-particle wave function to the forbidden states of each two-cluster sub-system is clearly demonstrated

  20. A diabetic retinopathy detection method using an improved pillar K-means algorithm.

    Science.gov (United States)

    Gogula, Susmitha Valli; Divakar, Ch; Satyanarayana, Ch; Rao, Allam Appa

    2014-01-01

    The paper presents a new approach for medical image segmentation. Exudates are a visible sign of diabetic retinopathy, which is the major reason for vision loss in patients with diabetes. If the exudates extend into the macular area, blindness may occur. Automated detection of exudates will assist ophthalmologists in early diagnosis. This segmentation process includes a new mechanism for clustering the elements of high-resolution images in order to improve precision and reduce computation time. The system applies K-means clustering to the image segmentation after it is optimized by the Pillar algorithm; pillars are constructed in such a way that they can withstand the pressure. The improved Pillar algorithm can optimize K-means clustering for image segmentation with respect to precision and computation time. We evaluate the proposed approach for image segmentation by comparing it with K-means and Fuzzy C-means on a medical image. Using this method, identification of dark spots in the retina becomes easier, and the proposed algorithm is applied to diabetic retinal images of all stages to identify hard and soft exudates, whereas the existing Pillar K-means is more appropriate for brain MRI images. The proposed system helps doctors to identify the problem at an early stage and can suggest a better drug for preventing further retinal damage.
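
    Plain k-means intensity clustering (with scikit-learn's default k-means++ initialization standing in for the Pillar initialization described above) can be sketched on a synthetic fundus-like image; the image, cluster count, and thresholds are illustrative.

```python
# K-means intensity clustering on a synthetic fundus-like image, showing how
# bright exudate-like spots fall into their own cluster. The Pillar
# initialization of the paper is not reproduced here.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
img = rng.normal(0.35, 0.05, size=(128, 128))          # background tissue
yy, xx = np.mgrid[0:128, 0:128]
img[(yy - 40) ** 2 + (xx - 90) ** 2 < 25] = 0.90       # bright "exudate" blobs
img[(yy - 85) ** 2 + (xx - 30) ** 2 < 16] = 0.85
img = np.clip(img, 0, 1)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(img.reshape(-1, 1))
labels = kmeans.labels_.reshape(img.shape)
bright_cluster = int(np.argmax(kmeans.cluster_centers_.ravel()))
print("pixels flagged as exudate-like:", int((labels == bright_cluster).sum()))
```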

  1. Unsupervised machine-learning method for improving the performance of ambulatory fall-detection systems

    Directory of Open Access Journals (Sweden)

    Yuwono Mitchell

    2012-02-01

    Full Text Available Abstract Background Falls can cause trauma, disability and death among older people. Ambulatory accelerometer devices are currently capable of detecting falls in a controlled environment. However, research suggests that most current approaches can tend to have insufficient sensitivity and specificity in non-laboratory environments, in part because impacts can be experienced as part of ordinary daily living activities. Method We used a waist-worn wireless tri-axial accelerometer combined with digital signal processing, clustering and neural network classifiers. The method includes the application of Discrete Wavelet Transform, Regrouping Particle Swarm Optimization, Gaussian Distribution of Clustered Knowledge and an ensemble of classifiers including a multilayer perceptron and Augmented Radial Basis Function (ARBF) neural networks. Results Preliminary testing with 8 healthy individuals in a home environment yields 98.6% sensitivity to falls and 99.6% specificity for routine Activities of Daily Living (ADL) data. Single ARBF and MLP classifiers were compared with a combined classifier. The combined classifier offers the greatest sensitivity, with a slight reduction in specificity for routine ADL and an increased specificity for exercise activities. In preliminary tests, the approach achieves 100% sensitivity on in-group falls, 97.65% on out-group falls, 99.33% specificity on routine ADL, and 96.59% specificity on exercise ADL. Conclusion The pre-processing and feature-extraction steps appear to simplify the signal while successfully extracting the essential features that are required to characterize a fall. The results suggest this combination of classifiers can perform better than MLP alone. Preliminary testing suggests these methods may be useful for researchers who are attempting to improve the performance of ambulatory fall-detection systems.

  2. Test computations on the dynamical evolution of star clusters. [Fluid dynamic method

    Energy Technology Data Exchange (ETDEWEB)

    Angeletti, L; Giannone, P. (Rome Univ. (Italy))

    1977-01-01

    Test calculations have been carried out on the evolution of star clusters using the fluid-dynamical method devised by Larson (1970). Large systems of stars have been considered, with specific attention to globular clusters. With reference to the analogous 'standard' model by Larson, the influence on the results of varying, in turn, the various free parameters (cluster mass, star mass, tidal radius, mass concentration of the initial model) has been studied. Furthermore, the partial release of some simplifying assumptions regarding the relaxation time and the distribution of the 'target' stars has been considered. The change in the structural properties is discussed, and the variation of the evolutionary time scale is outlined. An indicative agreement of the results obtained here with the structural properties of globular clusters, as deduced from previous theoretical models, is pointed out.

  3. The use of the temporal scan statistic to detect methicillin-resistant Staphylococcus aureus clusters in a community hospital.

    Science.gov (United States)

    Faires, Meredith C; Pearl, David L; Ciccotelli, William A; Berke, Olaf; Reid-Smith, Richard J; Weese, J Scott

    2014-07-08

    In healthcare facilities, conventional surveillance techniques using rule-based guidelines may result in under- or over-reporting of methicillin-resistant Staphylococcus aureus (MRSA) outbreaks, as these guidelines are generally unvalidated. The objectives of this study were to investigate the utility of the temporal scan statistic for detecting MRSA clusters, validate clusters using molecular techniques and hospital records, and determine significant differences in the rate of MRSA cases using regression models. Patients admitted to a community hospital between August 2006 and February 2011, and identified with MRSA >48 hours following hospital admission, were included in this study. Between March 2010 and February 2011, MRSA specimens were obtained for spa typing. MRSA clusters were investigated using a retrospective temporal scan statistic. Tests were conducted on a monthly scale and significant clusters were compared to MRSA outbreaks identified by hospital personnel. Associations between the rate of MRSA cases and the variables year, month, and season were investigated using a negative binomial regression model. During the study period, 735 MRSA cases were identified and 167 MRSA isolates were spa typed. Nine different spa types were identified, with spa type 2/t002 (88.6%) the most prevalent. The temporal scan statistic identified significant MRSA clusters at the hospital (n=2), service (n=16), and ward (n=10) levels (P ≤ 0.05). Seven clusters were concordant with nine MRSA outbreaks identified by hospital staff. For the remaining clusters, seven events may have been equivalent to true outbreaks and six clusters demonstrated possible transmission events. The regression analysis indicated that years 2009-2011, compared to 2006, and months March and April, compared to January, were associated with an increase in the rate of MRSA cases (P ≤ 0.05). The application of the temporal scan statistic identified several MRSA clusters that were not detected by hospital
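
    A minimal sketch of a retrospective temporal scan over monthly counts, using the Poisson log-likelihood ratio of Kulldorff's scan statistic and a Monte Carlo p-value; the counts, window lengths, and number of replications are made up, and this is not the software used in the study.

```python
# Retrospective temporal scan over monthly case counts: for every candidate
# window compute the Poisson log-likelihood ratio, then obtain a Monte Carlo
# p-value by redistributing the cases uniformly over the study months.
import numpy as np

def max_llr(counts, max_window=6):
    C, T = counts.sum(), len(counts)
    best_llr, best_window = 0.0, None
    for length in range(1, max_window + 1):
        for start in range(T - length + 1):
            c = counts[start:start + length].sum()
            e = C * length / T                  # expected cases under the null
            if c <= e:
                continue
            llr = c * np.log(c / e) + (C - c) * np.log((C - c) / (C - e))
            if llr > best_llr:
                best_llr, best_window = llr, (start, start + length)
    return best_llr, best_window

rng = np.random.default_rng(0)
counts = rng.poisson(12, size=36)
counts[20:23] += 15                             # an injected cluster

obs_llr, window = max_llr(counts)
C, T = counts.sum(), len(counts)
sims = [max_llr(np.bincount(rng.integers(0, T, size=C), minlength=T))[0]
        for _ in range(499)]
p = (1 + sum(s >= obs_llr for s in sims)) / (1 + len(sims))
print("most likely cluster (month indices):", window, " p =", round(p, 3))
```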

  4. Environmental data processing by clustering methods for energy forecast and planning

    Energy Technology Data Exchange (ETDEWEB)

    Di Piazza, Annalisa [Dipartimento di Ingegneria Idraulica e Applicazioni Ambientali (DIIAA), viale delle Scienze, Universita degli Studi di Palermo, 90128 Palermo (Italy); Di Piazza, Maria Carmela; Ragusa, Antonella; Vitale, Gianpaolo [Consiglio Nazionale delle Ricerche Istituto di Studi sui Sistemi Intelligenti per l' Automazione (ISSIA - CNR), sezione di Palermo, Via Dante, 12, 90141 Palermo (Italy)

    2011-03-15

    This paper presents a statistical approach based on the k-means clustering technique for managing environmental sampled data to evaluate and forecast the energy deliverable by different renewable sources at a given site. In particular, sampled wind speed and solar irradiance data are studied in association with the energy capability of a wind generator and a photovoltaic (PV) plant, respectively. The proposed method allows the sub-sets of useful data, describing the energy capability of a site, to be extracted from a set of experimental observations belonging to the considered site. The data collection is performed in Sicily, in the south of Italy, as a case study. As far as wind generation is concerned, a suitable generator, matching the wind profile of the studied sites, has been selected for the evaluation of the producible energy. With respect to photovoltaic generation, the irradiance data have been taken from the acquisition system of an actual installation. It is demonstrated, in both cases, that the use of the k-means clustering method allows data that do not contribute to the produced energy to be grouped into a cluster; moreover, it simplifies the problem of energy assessment, since it makes it possible to obtain the desired information on energy capability by managing a reduced amount of experimental samples. In the studied cases, the proposed method permitted a reduction of 50% of the data with a maximum discrepancy of 10% in energy estimation compared to the classical statistical approach. Therefore, the adopted k-means clustering technique represents a useful tool for appropriate and less demanding energy forecasting and planning in distributed generation systems. (author)
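
    The core idea can be sketched in a few lines of scikit-learn: cluster the wind-speed samples with k-means, discard the cluster(s) whose centroids lie below the turbine cut-in speed, and estimate energy from the remaining samples; the Weibull-distributed samples, turbine figures, and cubic power curve are illustrative assumptions, not values from the paper.

```python
# Cluster wind-speed samples with k-means, drop the non-productive cluster(s)
# (centroid below the cut-in speed), and estimate yearly energy from the rest
# with a simplified power curve. All figures are illustrative.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
wind = rng.weibull(2.0, size=8760) * 6.0                # hourly wind speeds [m/s]

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(wind.reshape(-1, 1))
centers = km.cluster_centers_.ravel()
cut_in, rated_speed, rated_power = 3.0, 12.0, 2000.0    # illustrative turbine [m/s, m/s, kW]

useful = np.isin(km.labels_, np.where(centers >= cut_in)[0])
v = np.clip(wind[useful], 0, rated_speed)
power = rated_power * (v / rated_speed) ** 3            # crude cubic power curve [kW]
print("samples kept:", int(useful.sum()), "of", wind.size)
print("estimated yearly energy: %.0f MWh" % (power.sum() / 1000))
```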

  5. The use of different clustering methods in the evaluation of genetic diversity in upland cotton

    Directory of Open Access Journals (Sweden)

    Laíse Ferreira de Araújo

    Full Text Available The continuous development and evaluation of new genotypes through crop breeding is essential in order to obtain new cultivars. The objective of this work was to evaluate the genetic divergences between cultivars of upland cotton (Gossypium hirsutum L.) using the agronomic and technological characteristics of the fibre, in order to select superior parent plants. The experiment was set up during 2010 at the Federal University of Ceará in Fortaleza, Ceará, Brazil. Eleven cultivars of upland cotton were used in an experimental design of randomised blocks with three replications. In order to evaluate the genetic diversity among cultivars, the generalised Mahalanobis distance matrix was calculated, with cluster analysis then being applied, employing various methods: single linkage, Ward, complete linkage, median, average linkage within a cluster and average linkage between clusters. Genetic variability exists among the evaluated genotypes. The most consistent clustering method was that employing average linkage between clusters. Among the characteristics assessed, mean boll weight presented the highest contribution to genetic diversity, followed by elongation at rupture. Employing the method of average linkage between clusters, the cultivars with greater genetic divergence were BRS Acacia and LD Frego; those of greater similarity were BRS Itaúba and BRS Araripe.

  6. Comparison Of Keyword Based Clustering Of Web Documents By Using Openstack 4j And By Traditional Method

    Directory of Open Access Journals (Sweden)

    Shiza Anand

    2015-08-01

    Full Text Available The number of hypertext documents on the world wide web is increasing continuously day by day. Therefore, clustering methods are required to bind documents into cluster repositories according to the similarity between the documents. Various clustering methods exist, such as hierarchical-based, K-means, fuzzy-logic-based, centroid-based, etc. These keyword-based clustering methods take a large amount of time to create containers and put documents into their respective containers. These traditional methods use file-handling techniques of different programming languages for creating repositories and transferring web documents into these containers. In contrast, the openstack4j SDK is a new technique for creating containers and shifting web documents into these containers according to similarity in much less time than the traditional methods. Another benefit of this technique is that the SDK understands and reads all types of files, such as jpg, html, pdf, doc, etc. This paper compares the time required for clustering documents using openstack4j and using traditional methods, and suggests that search engines adopt this technique for clustering so that they return results to user queries in less time.

  7. Application of clustering methods: Regularized Markov clustering (R-MCL) for analyzing dengue virus similarity

    Science.gov (United States)

    Lestari, D.; Raharjo, D.; Bustamam, A.; Abdillah, B.; Widhianto, W.

    2017-07-01

    Dengue virus consists of 10 different constituent proteins and is classified into 4 major serotypes (DEN 1 - DEN 4). This study was designed to cluster 30 protein sequences of dengue virus taken from the Virus Pathogen Database and Analysis Resource (VIPR) using the Regularized Markov Clustering (R-MCL) algorithm and then analyze the result. Using a Python 3.4 program, the R-MCL algorithm produces 8 clusters, with more than one centroid in several clusters. The number of centroids shows the density level of interaction. Protein interactions that are connected in a tissue form a protein complex that serves as the unit of a specific biological process. The analysis of the results shows that R-MCL clustering produces clusters of the dengue virus family based on the similar roles of their constituent proteins, regardless of serotype.
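
    For orientation, the sketch below implements standard Markov Clustering (expansion and inflation) in numpy on a toy similarity matrix; the regularized variant (R-MCL) used in the study differs in the expansion step, as noted in the comment, and the matrix and thresholds are illustrative.

```python
# Minimal Markov Clustering (MCL): alternate expansion and inflation of a
# column-stochastic similarity matrix until it converges, then read clusters
# off the rows of the limit matrix. R-MCL replaces the expansion step with
# multiplication by the normalized *original* matrix (see the commented line).
import numpy as np

def mcl(sim, inflation=2.0, n_iter=100, tol=1e-6):
    M = sim + np.eye(len(sim))                 # add self-loops
    M = M / M.sum(axis=0)                      # make columns stochastic
    M_reg = M.copy()
    for _ in range(n_iter):
        prev = M
        M = M @ M                              # expansion (R-MCL: M = M_reg @ M)
        M = M ** inflation                     # inflation
        M = M / M.sum(axis=0)
        if np.abs(M - prev).max() < tol:
            break
    clusters = []
    for i in np.where(M.max(axis=1) > 1e-3)[0]:          # attractor rows
        members = frozenset(np.where(M[i] > 1e-3)[0])
        if members not in clusters:
            clusters.append(members)
    return clusters

if __name__ == "__main__":
    # Toy similarity matrix: two blocks of related sequences, one weak cross link.
    A = np.array([[0,   1, 1, 0.1, 0, 0],
                  [1,   0, 1, 0,   0, 0],
                  [1,   1, 0, 0,   0, 0],
                  [0.1, 0, 0, 0,   1, 1],
                  [0,   0, 0, 1,   0, 1],
                  [0,   0, 0, 1,   1, 0]], dtype=float)
    print(mcl(A))
```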

  8. Doppler method leak detection for LMFBR steam generators. Pt. 3. Investigation of detection sensitivity and method

    International Nuclear Information System (INIS)

    Kumagai, Hiromichi; Kinoshita, Izumi

    2001-01-01

    To prevent the expansion of tube damage and to maintain structural integrity in the steam generators (SGs) of a fast breeder reactor (FBR), it is necessary to detect precisely and immediately any leakage of water from heat transfer tubes. Therefore, the Doppler method was developed. Previous studies have revealed that, in the SG full-sector model that simulates actual SGs, the Doppler method can detect bubbles of 0.4 l/s within a few seconds. However in consideration of the dissolution rate of hydrogen generated by a sodium-water reaction even from a small water leak, it is necessary to detect smaller leakages of water from the heat transfer tubes. The detection sensitivity of the Doppler method and the influence of background noise were experimentally investigated. In-water experiments were performed using the SG model. The results show that the Doppler method can detect bubbles of 0.01 l/s (equivalent to a water leak rate of about 0.01 g/s) within a few seconds and that the background noise has little effect on water leak detection performance. The Doppler method thus has great potential for the detection of water leakage in SGs. (author)

  9. Characterization and detection of a widely distributed gene cluster that predicts anaerobic choline utilization by human gut bacteria.

    Science.gov (United States)

    Martínez-del Campo, Ana; Bodea, Smaranda; Hamer, Hilary A; Marks, Jonathan A; Haiser, Henry J; Turnbaugh, Peter J; Balskus, Emily P

    2015-04-14

    Elucidation of the molecular mechanisms underlying the human gut microbiota's effects on health and disease has been complicated by difficulties in linking metabolic functions associated with the gut community as a whole to individual microorganisms and activities. Anaerobic microbial choline metabolism, a disease-associated metabolic pathway, exemplifies this challenge, as the specific human gut microorganisms responsible for this transformation have not yet been clearly identified. In this study, we established the link between a bacterial gene cluster, the choline utilization (cut) cluster, and anaerobic choline metabolism in human gut isolates by combining transcriptional, biochemical, bioinformatic, and cultivation-based approaches. Quantitative reverse transcription-PCR analysis and in vitro biochemical characterization of two cut gene products linked the entire cluster to growth on choline and supported a model for this pathway. Analyses of sequenced bacterial genomes revealed that the cut cluster is present in many human gut bacteria, is predictive of choline utilization in sequenced isolates, and is widely but discontinuously distributed across multiple bacterial phyla. Given that bacterial phylogeny is a poor marker for choline utilization, we were prompted to develop a degenerate PCR-based method for detecting the key functional gene choline TMA-lyase (cutC) in genomic and metagenomic DNA. Using this tool, we found that new choline-metabolizing gut isolates universally possessed cutC. We also demonstrated that this gene is widespread in stool metagenomic data sets. Overall, this work represents a crucial step toward understanding anaerobic choline metabolism in the human gut microbiota and underscores the importance of examining this microbial community from a function-oriented perspective. Anaerobic choline utilization is a bacterial metabolic activity that occurs in the human gut and is linked to multiple diseases. While bacterial genes responsible for

  10. Cluster Based Text Classification Model

    DEFF Research Database (Denmark)

    Nizamani, Sarwat; Memon, Nasrullah; Wiil, Uffe Kock

    2011-01-01

    We propose a cluster-based classification model for suspicious email detection and other text classification tasks. The text classification tasks comprise many training examples that require a complex classification model. Using clusters for classification makes the model simpler and increases the accuracy at the same time. The test example is classified using a simpler and smaller model. The training examples in a particular cluster share a common vocabulary. At the time of clustering, we do not take into account the labels of the training examples. After the clusters have been created, the classifier is trained on each cluster with reduced dimensionality and fewer examples. The experimental results show that the proposed model outperforms the existing classification models for the task of suspicious email detection and topic categorization on the Reuters-21578 and 20 Newsgroups...
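
    A compact scikit-learn sketch of the idea, assuming a made-up corpus and labels: documents are clustered on TF-IDF vectors without using their labels, one small classifier is trained per cluster, and a test document is routed to the classifier of its nearest cluster.

```python
# Cluster-based classification sketch: k-means on TF-IDF vectors (labels
# ignored), one classifier per cluster, test documents routed to the
# classifier of their nearest cluster. Corpus and labels are illustrative.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = [
    "urgent transfer of funds needed today", "please transfer the funds urgently",
    "wire the money to this account now",    "account funds wire request",
    "meeting agenda for the project review", "project review meeting on friday",
    "agenda attached for the weekly meeting", "weekly project status and agenda",
]
labels = [1, 1, 1, 0, 0, 0, 0, 0]             # 1 = suspicious, 0 = ordinary

vec = TfidfVectorizer()
X = vec.fit_transform(docs)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

classifiers = {}
for c in range(km.n_clusters):
    idx = [i for i, lab in enumerate(km.labels_) if lab == c]
    y_c = [labels[i] for i in idx]
    if len(set(y_c)) == 1:                    # single-class cluster: constant rule
        classifiers[c] = y_c[0]
    else:
        classifiers[c] = MultinomialNB().fit(X[idx], y_c)

def classify(text):
    x = vec.transform([text])
    c = int(km.predict(x)[0])
    clf = classifiers[c]
    return clf if isinstance(clf, int) else int(clf.predict(x)[0])

print(classify("urgent wire transfer of money"))
print(classify("status agenda for friday meeting"))
```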

  11. The cosmological analysis of X-ray cluster surveys - I. A new method for interpreting number counts

    Science.gov (United States)

    Clerc, N.; Pierre, M.; Pacaud, F.; Sadibekova, T.

    2012-07-01

    We present a new method aimed at simplifying the cosmological analysis of X-ray cluster surveys. It is based on purely instrumental observable quantities considered in a two-dimensional X-ray colour-magnitude diagram (hardness ratio versus count rate). The basic principle is that even in rather shallow surveys, substantial information on cluster redshift and temperature is present in the raw X-ray data and can be statistically extracted; in parallel, such diagrams can be readily predicted from an ab initio cosmological modelling. We illustrate the methodology for the case of a 100-deg² XMM survey having a sensitivity of ~10⁻¹⁴ erg s⁻¹ cm⁻² and fit, at the same time, the survey selection function, the cluster evolutionary scaling relations and the cosmology; our sole assumption - driven by the limited size of the sample considered in the case study - is that the local cluster scaling relations are known. We devote special attention to the realistic modelling of the count-rate measurement uncertainties and evaluate the potential of the method via a Fisher analysis. In the absence of individual cluster redshifts, the count rate and hardness ratio (CR-HR) method appears to be much more efficient than the traditional approach based on cluster counts (i.e. dn/dz, requiring redshifts). In the case where redshifts are available, our method performs similarly to the traditional mass function (dn/dM/dz) for the purely cosmological parameters, but better constrains the parameters defining the cluster scaling relations and their evolution. A further practical advantage of the CR-HR method is its simplicity: this fully top-down approach totally bypasses the tedious steps of deriving cluster masses from X-ray temperature measurements.

  12. STAR FORMATION AND RELAXATION IN 379 NEARBY GALAXY CLUSTERS

    International Nuclear Information System (INIS)

    Cohen, Seth A.; Hickox, Ryan C.; Wegner, Gary A.

    2015-01-01

    We investigate the relationship between star formation (SF) and level of relaxation in a sample of 379 galaxy clusters at z < 0.2. We use data from the Sloan Digital Sky Survey to measure cluster membership and level of relaxation, and to select star-forming galaxies based on mid-infrared emission detected with the Wide-field Infrared Survey Explorer. For galaxies with absolute magnitudes M_r < −19.5, we find an inverse correlation between SF fraction and cluster relaxation: as a cluster becomes less relaxed, its SF fraction increases. Furthermore, in general, the subtracted SF fraction in all unrelaxed clusters (0.117 ± 0.003) is higher than that in all relaxed clusters (0.097 ± 0.005). We verify the validity of our SF calculation methods and membership criteria through analysis of previous work. Our results agree with previous findings that a weak correlation exists between cluster SF and dynamical state, possibly because unrelaxed clusters are less evolved relative to relaxed clusters

  13. Alignment and integration of complex networks by hypergraph-based spectral clustering

    Science.gov (United States)

    Michoel, Tom; Nachtergaele, Bruno

    2012-11-01

    Complex networks possess a rich, multiscale structure reflecting the dynamical and functional organization of the systems they model. Often there is a need to analyze multiple networks simultaneously, to model a system by more than one type of interaction, or to go beyond simple pairwise interactions, but currently there is a lack of theoretical and computational methods to address these problems. Here we introduce a framework for clustering and community detection in such systems using hypergraph representations. Our main result is a generalization of the Perron-Frobenius theorem from which we derive spectral clustering algorithms for directed and undirected hypergraphs. We illustrate our approach with applications for local and global alignment of protein-protein interaction networks between multiple species, for tripartite community detection in folksonomies, and for detecting clusters of overlapping regulatory pathways in directed networks.

  14. Smoothed Particle Inference: A Kilo-Parametric Method for X-ray Galaxy Cluster Modeling

    Energy Technology Data Exchange (ETDEWEB)

    Peterson, John R.; Marshall, P.J.; /KIPAC, Menlo Park; Andersson, K.; /Stockholm U. /SLAC

    2005-08-05

    We propose an ambitious new method that models the intracluster medium in clusters of galaxies as a set of X-ray emitting smoothed particles of plasma. Each smoothed particle is described by a handful of parameters including temperature, location, size, and elemental abundances. Hundreds to thousands of these particles are used to construct a model cluster of galaxies, with the appropriate complexity estimated from the data quality. This model is then compared iteratively with X-ray data in the form of adaptively binned photon lists via a two-sample likelihood statistic and iterated via Markov Chain Monte Carlo. The complex cluster model is propagated through the X-ray instrument response using direct sampling Monte Carlo methods. Using this approach the method can reproduce many of the features observed in the X-ray emission in a less assumption-dependent way than traditional analyses, and it allows for a more detailed characterization of the density, temperature, and metal abundance structure of clusters. Multi-instrument X-ray analyses and simultaneous X-ray, Sunyaev-Zeldovich (SZ), and lensing analyses are a straightforward extension of this methodology. Significant challenges still exist in understanding the degeneracy in these models and the statistical noise induced by the complexity of the models.

  15. Family-based clusters of cognitive test performance in familial schizophrenia

    Directory of Open Access Journals (Sweden)

    Partonen Timo

    2004-07-01

    Full Text Available Abstract Background Cognitive traits derived from neuropsychological test data are considered to be potential endophenotypes of schizophrenia. Previously, these traits have been found to form a valid basis for clustering samples of schizophrenia patients into homogeneous subgroups. We set out to identify such clusters but, unlike previous studies, we included both schizophrenia patients and family members in the cluster analysis. The aim of the study was to detect family clusters with similar cognitive test performance. Methods Test scores from 54 randomly selected families comprising at least two siblings with schizophrenia spectrum disorders, and at least two unaffected family members were included in a complete-linkage cluster analysis with interactive data visualization. Results A well-performing, an impaired, and an intermediate family cluster emerged from the analysis. While the neuropsychological test scores differed significantly between the clusters, only minor differences were observed in the clinical variables. Conclusions The visually aided clustering algorithm was successful in identifying family clusters comprising both schizophrenia patients and their relatives. The present classification method may serve as a basis for selecting phenotypically more homogeneous groups of families in subsequent genetic analyses.

  16. Shocks and cold fronts in merging and massive galaxy clusters: new detections with Chandra

    Science.gov (United States)

    Botteon, A.; Gastaldello, F.; Brunetti, G.

    2018-06-01

    A number of merging galaxy clusters show the presence of shocks and cold fronts, i.e. sharp discontinuities in surface brightness and temperature. The observation of these features requires an X-ray telescope with high spatial resolution like Chandra, and allows us to study important aspects concerning the physics of the intracluster medium (ICM), such as its thermal conduction and viscosity, as well as to provide information on the physical conditions leading to the acceleration of cosmic rays and magnetic field amplification in the cluster environment. In this work we search for new discontinuities in 15 merging and massive clusters observed with Chandra by using different imaging and spectral techniques of X-ray observations. Our analysis led to the discovery of 22 edges: six shocks, eight cold fronts, and eight with uncertain origin. All the six shocks detected have M … diverse approaches aimed at identifying edges in the ICM. A radio follow-up of the shocks discovered in this paper will be useful to study the connection between weak shocks and radio relics.

  17. Implementation of K-Means Clustering Method for Electronic Learning Model

    Science.gov (United States)

    Latipa Sari, Herlina; Suranti Mrs., Dewi; Natalia Zulita, Leni

    2017-12-01

    The teaching and learning process at SMK Negeri 2 Bengkulu Tengah has applied an e-learning system for teachers and students. The e-learning was based on the classification of normative, productive, and adaptive subjects. SMK Negeri 2 Bengkulu Tengah consisted of 394 students and 60 teachers with 16 subjects. The records of the e-learning database were used in this research to observe students’ activity patterns in attending class. The K-Means algorithm was used to classify students’ learning activities in e-learning, so as to obtain clusters of student activity and improvement of student ability. Implementation of the K-Means clustering method for the electronic learning model at SMK Negeri 2 Bengkulu Tengah was conducted by observing 10 students’ activities, namely participation of students in the classroom, submit assignment, view assignment, add discussion, view discussion, add comment, download course materials, view article, view test, and submit test. In the e-learning model, the testing was conducted on 10 students and yielded 2 clusters of membership data (C1 and C2). Cluster 1: membership percentage of 70%, consisting of 6 members, namely 1112438 Anggi Julian, 1112439 Anis Maulita, 1112441 Ardi Febriansyah, 1112452 Berlian Sinurat, 1112460 Dewi Anugrah Anwar and 1112467 Eka Tri Oktavia Sari. Cluster 2: membership percentage of 30%, consisting of 4 members, namely 1112463 Dosita Afriyani, 1112471 Erda Novita, 1112474 Eskardi and 1112477 Fachrur Rozi.
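
    The grouping of students by their e-learning activity counts can be reproduced in outline with a few lines of scikit-learn. The activity matrix below is invented for illustration (rows are students, columns the ten logged activities); it is not the SMK Negeri 2 data, and the preprocessing choice is an assumption.

        # Sketch: partition students into 2 clusters from counts of 10 logged e-learning activities.
        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.preprocessing import StandardScaler

        # rows = students; columns = [participation, submit assignment, view assignment, add discussion,
        # view discussion, add comment, download materials, view article, view test, submit test]
        activity = np.array([
            [12, 8, 15, 3, 9, 4, 10, 6, 5, 5],
            [10, 7, 12, 2, 8, 3,  9, 5, 4, 4],
            [ 3, 2,  4, 0, 1, 0,  2, 1, 1, 1],
            [11, 9, 14, 4, 7, 5, 11, 7, 5, 5],
            [ 2, 1,  3, 0, 2, 1,  1, 0, 1, 1],
            [ 4, 3,  5, 1, 2, 1,  3, 2, 2, 2],
            [13, 8, 16, 3, 8, 4, 12, 6, 5, 5],
            [ 3, 2,  3, 1, 1, 0,  2, 1, 1, 1],
            [12, 9, 13, 2, 9, 5, 10, 7, 5, 5],
            [11, 8, 12, 3, 8, 4,  9, 6, 4, 4],
        ])

        X = StandardScaler().fit_transform(activity)      # put counts on a comparable scale
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

        for c in (0, 1):
            members = np.where(labels == c)[0]
            print(f"cluster {c}: students {members.tolist()}, "
                  f"{100 * len(members) / len(labels):.0f}% of the group")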

  18. A comparison of latent class, K-means, and K-median methods for clustering dichotomous data.

    Science.gov (United States)

    Brusco, Michael J; Shireman, Emilie; Steinley, Douglas

    2017-09-01

    The problem of partitioning a collection of objects based on their measurements on a set of dichotomous variables is a well-established problem in psychological research, with applications including clinical diagnosis, educational testing, cognitive categorization, and choice analysis. Latent class analysis and K-means clustering are popular methods for partitioning objects based on dichotomous measures in the psychological literature. The K-median clustering method has recently been touted as a potentially useful tool for psychological data and might be preferable to its close neighbor, K-means, when the variable measures are dichotomous. We conducted simulation-based comparisons of the latent class, K-means, and K-median approaches for partitioning dichotomous data. Although all 3 methods proved capable of recovering cluster structure, K-median clustering yielded the best average performance, followed closely by latent class analysis. We also report results for the 3 methods within the context of an application to transitive reasoning data, in which it was found that the 3 approaches can exhibit profound differences when applied to real data. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
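
    For dichotomous measures, K-median clustering replaces squared Euclidean distance with the L1 (city-block) distance and cluster means with coordinate-wise medians, which for 0/1 data amounts to a per-variable majority vote. The sketch below is a bare-bones NumPy illustration of that update rule on synthetic binary data, not the simulation code used in the study; the function name and data are assumptions.

        # Sketch: K-median clustering of 0/1 data (L1 distance, coordinate-wise median update).
        import numpy as np

        def kmedian_binary(X, k, n_iter=50, seed=0):
            rng = np.random.default_rng(seed)
            centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
            for _ in range(n_iter):
                # assign each object to the center with the smallest city-block distance
                d = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)
                labels = d.argmin(axis=1)
                new_centers = centers.copy()
                for j in range(k):
                    if np.any(labels == j):
                        new_centers[j] = np.median(X[labels == j], axis=0)  # majority vote for 0/1 data
                if np.allclose(new_centers, centers):
                    break
                centers = new_centers
            return labels, centers

        rng = np.random.default_rng(1)
        X = np.vstack([rng.binomial(1, 0.8, size=(20, 8)),   # toy data: two latent response profiles
                       rng.binomial(1, 0.2, size=(20, 8))])
        labels, centers = kmedian_binary(X, k=2)
        print(labels)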

  19. THE SWIFT AGN AND CLUSTER SURVEY. II. CLUSTER CONFIRMATION WITH SDSS DATA

    International Nuclear Information System (INIS)

    Griffin, Rhiannon D.; Dai, Xinyu; Kochanek, Christopher S.; Bregman, Joel N.

    2016-01-01

    We study 203 (of 442) Swift AGN and Cluster Survey extended X-ray sources located in the SDSS DR8 footprint to search for galaxy over-densities in three-dimensional space using SDSS galaxy photometric redshifts and positions near the Swift cluster candidates. We find 104 Swift clusters with a >3σ galaxy over-density. The remaining targets are potentially located at higher redshifts and require deeper optical follow-up observations for confirmation as galaxy clusters. We present a series of cluster properties including the redshift, brightest cluster galaxy (BCG) magnitude, BCG-to-X-ray center offset, optical richness, and X-ray luminosity. We also detect red sequences in ∼85% of the 104 confirmed clusters. The X-ray luminosity and optical richness for the SDSS confirmed Swift clusters are correlated and follow previously established relations. The distribution of the separations between the X-ray centroids and the most likely BCG is also consistent with expectation. We compare the observed redshift distribution of the sample with a theoretical model, and find that our sample is complete for z ≲ 0.3 and is still 80% complete up to z ≃ 0.4, consistent with the SDSS survey depth. These analysis results suggest that our Swift cluster selection algorithm has yielded a statistically well-defined cluster sample for further study of cluster evolution and cosmology. We also match our SDSS confirmed Swift clusters to existing cluster catalogs, and find 42, 23, and 1 matches in optical, X-ray, and Sunyaev–Zel’dovich catalogs, respectively, and so the majority of these clusters are new detections

  20. Cluster-level statistical inference in fMRI datasets: The unexpected behavior of random fields in high dimensions.

    Science.gov (United States)

    Bansal, Ravi; Peterson, Bradley S

    2018-06-01

    FWERs. Those rejected clusters were outlying values in the distribution of cluster size but cannot be distinguished from true positive findings without further analyses, including assessing whether fMRI signal in those regions correlates with other clinical, behavioral, or cognitive measures. Rejecting the large clusters, however, significantly reduced the statistical power of nonparametric methods in detecting true findings compared with parametric methods, which would have detected most true findings that are essential for making valid biological inferences in MRI data. Parametric analyses, in contrast, detected most true findings while generating relatively few false positives: on average, less than one of those very large clusters would be deemed a true finding in each brain-wide analysis. We therefore recommend the continued use of parametric methods that model nonstationary smoothness for cluster-level, familywise control of false positives, particularly when using a Cluster Defining Threshold of 2.5 or higher, and subsequently assessing rigorously the biological plausibility of the findings, even for large clusters. Finally, because nonparametric methods yielded a large reduction in statistical power to detect true positive findings, we conclude that the modest reduction in false positive findings that nonparametric analyses afford does not warrant a re-analysis of previously published fMRI studies using nonparametric techniques. Copyright © 2018 Elsevier Inc. All rights reserved.

  1. Open-Source Sequence Clustering Methods Improve the State Of the Art.

    Science.gov (United States)

    Kopylova, Evguenia; Navas-Molina, Jose A; Mercier, Céline; Xu, Zhenjiang Zech; Mahé, Frédéric; He, Yan; Zhou, Hong-Wei; Rognes, Torbjørn; Caporaso, J Gregory; Knight, Rob

    2016-01-01

    Sequence clustering is a common early step in amplicon-based microbial community analysis, when raw sequencing reads are clustered into operational taxonomic units (OTUs) to reduce the run time of subsequent analysis steps. Here, we evaluated the performance of recently released state-of-the-art open-source clustering software products, namely, OTUCLUST, Swarm, SUMACLUST, and SortMeRNA, against current principal options (UCLUST and USEARCH) in QIIME, hierarchical clustering methods in mothur, and USEARCH's most recent clustering algorithm, UPARSE. All the latest open-source tools showed promising results, reporting up to 60% fewer spurious OTUs than UCLUST, indicating that the underlying clustering algorithm can vastly reduce the number of these derived OTUs. Furthermore, we observed that stringent quality filtering, such as is done in UPARSE, can cause a significant underestimation of species abundance and diversity, leading to incorrect biological results. Swarm, SUMACLUST, and SortMeRNA have been included in the QIIME 1.9.0 release. IMPORTANCE Massive collections of next-generation sequencing data call for fast, accurate, and easily accessible bioinformatics algorithms to perform sequence clustering. A comprehensive benchmark is presented, including open-source tools and the popular USEARCH suite. Simulated, mock, and environmental communities were used to analyze sensitivity, selectivity, species diversity (alpha and beta), and taxonomic composition. The results demonstrate that recent clustering algorithms can significantly improve accuracy and preserve estimated diversity without the application of aggressive filtering. Moreover, these tools are all open source, apply multiple levels of multithreading, and scale to the demands of modern next-generation sequencing data, which is essential for the analysis of massive multidisciplinary studies such as the Earth Microbiome Project (EMP) (J. A. Gilbert, J. K. Jansson, and R. Knight, BMC Biol 12:69, 2014, http

  2. A method for determining the radius of an open cluster from stellar proper motions

    Science.gov (United States)

    Sánchez, Néstor; Alfaro, Emilio J.; López-Martínez, Fátima

    2018-04-01

    We propose a method for calculating the radius of an open cluster in an objective way from an astrometric catalogue containing, at least, positions and proper motions. It uses the minimum spanning tree in the proper motion space to discriminate cluster stars from field stars and it quantifies the strength of the cluster-field separation by means of a statistical parameter defined for the first time in this paper. This is done for a range of different sampling radii from where the cluster radius is obtained as the size at which the best cluster-field separation is achieved. The novelty of this strategy is that the cluster radius is obtained independently of how its stars are spatially distributed. We test the reliability and robustness of the method with both simulated and real data from a well-studied open cluster (NGC 188), and apply it to UCAC4 data for five other open clusters with different catalogued radius values. NGC 188, NGC 1647, NGC 6603, and Ruprecht 155 yielded unambiguous radius values of 15.2 ± 1.8, 29.4 ± 3.4, 4.2 ± 1.7, and 7.0 ± 0.3 arcmin, respectively. ASCC 19 and Collinder 471 showed more than one possible solution, but it is not possible to know whether this is due to the involved uncertainties or due to the presence of complex patterns in their proper motion distributions, something that could be inherent to the physical object or due to the way in which the catalogue was sampled.
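
    One way to see the core ingredient of this approach — a minimum spanning tree built in proper-motion space, whose long edges separate a comoving clump from the field — is the toy sketch below, which cuts the longest MST edges and keeps the largest connected component. It assumes SciPy and synthetic proper motions; the cut threshold is an arbitrary assumption, and the authors' full procedure (a cluster-field separation statistic evaluated over many sampling radii) is not reproduced here.

        # Sketch: MST in (pmRA, pmDec) space; cutting long edges isolates the comoving cluster.
        import numpy as np
        from scipy.spatial.distance import pdist, squareform
        from scipy.sparse.csgraph import minimum_spanning_tree, connected_components

        rng = np.random.default_rng(0)
        cluster_pm = rng.normal(loc=(4.0, -2.0), scale=0.15, size=(60, 2))   # tight clump
        field_pm = rng.normal(loc=(0.0, 0.0), scale=5.0, size=(200, 2))      # broad field
        pm = np.vstack([cluster_pm, field_pm])

        dist = squareform(pdist(pm))                     # pairwise distances in proper-motion space
        mst = minimum_spanning_tree(dist).toarray()

        edge_lengths = mst[mst > 0]
        cut = np.percentile(edge_lengths, 75)            # hypothetical threshold for "long" edges
        mst[mst > cut] = 0                               # remove long edges

        n_comp, comp = connected_components(mst, directed=False)
        sizes = np.bincount(comp)
        members = comp == sizes.argmax()                 # largest component = candidate cluster
        print(f"{members.sum()} candidate cluster members out of {len(pm)} stars")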

  3. TreeCluster: Massively scalable transmission clustering using phylogenetic trees

    OpenAIRE

    Moshiri, Alexander

    2018-01-01

    Background: The ability to infer transmission clusters from molecular data is critical to designing and evaluating viral control strategies. Viral sequencing datasets are growing rapidly, but standard methods of transmission cluster inference do not scale well beyond thousands of sequences. Results: I present TreeCluster, a cross-platform tool that performs transmission cluster inference on a given phylogenetic tree orders of magnitude faster than existing inference methods and supports multi...

  4. Measurement Error Correction Formula for Cluster-Level Group Differences in Cluster Randomized and Observational Studies

    Science.gov (United States)

    Cho, Sun-Joo; Preacher, Kristopher J.

    2016-01-01

    Multilevel modeling (MLM) is frequently used to detect cluster-level group differences in cluster randomized trial and observational studies. Group differences on the outcomes (posttest scores) are detected by controlling for the covariate (pretest scores) as a proxy variable for unobserved factors that predict future attributes. The pretest and…

  5. A hybrid method based on a new clustering technique and multilayer perceptron neural networks for hourly solar radiation forecasting

    International Nuclear Information System (INIS)

    Azimi, R.; Ghayekhloo, M.; Ghofrani, M.

    2016-01-01

    Highlights: • A novel clustering approach is proposed based on the data transformation approach. • A novel cluster selection method based on correlation analysis is presented. • The proposed hybrid clustering approach leads to deep learning for MLPNN. • A hybrid forecasting method is developed to predict solar radiations. • The evaluation results show superior performance of the proposed forecasting model. - Abstract: Accurate forecasting of renewable energy sources plays a key role in their integration into the grid. This paper proposes a hybrid solar irradiance forecasting framework using a Transformation based K-means algorithm, named TB K-means, to increase the forecast accuracy. The proposed clustering method is a combination of a new initialization technique, K-means algorithm and a new gradual data transformation approach. Unlike the other K-means based clustering methods which are not capable of providing a fixed and definitive answer due to the selection of different cluster centroids for each run, the proposed clustering provides constant results for different runs of the algorithm. The proposed clustering is combined with a time-series analysis, a novel cluster selection algorithm and a multilayer perceptron neural network (MLPNN) to develop the hybrid solar radiation forecasting method for different time horizons (1 h ahead, 2 h ahead, …, 48 h ahead). The performance of the proposed TB K-means clustering is evaluated using several different datasets and compared with different variants of K-means algorithm. Solar datasets with different solar radiation characteristics are also used to determine the accuracy and processing speed of the developed forecasting method with the proposed TB K-means and other clustering techniques. The results of direct comparison with other well-established forecasting models demonstrate the superior performance of the proposed hybrid forecasting method. Furthermore, a comparative analysis with the benchmark solar
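
    A stripped-down version of the cluster-then-forecast pipeline — group similar historical situations, then train one multilayer perceptron per group and route new inputs to their cluster's model — can be sketched with ordinary K-means and scikit-learn's MLPRegressor on synthetic data, as below. The TB K-means initialization/transformation and the correlation-based cluster selection of the paper are not reproduced; all data and settings are assumptions.

        # Sketch: cluster hourly samples, then fit one MLP per cluster for 1-hour-ahead forecasting.
        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.neural_network import MLPRegressor

        rng = np.random.default_rng(0)
        hours = np.arange(24 * 200) % 24
        radiation = np.clip(np.sin((hours - 6) / 12 * np.pi), 0, None) + rng.normal(0, 0.05, hours.size)

        # features: [hour of day, current radiation]; target: radiation one hour ahead
        X = np.column_stack([hours[:-1], radiation[:-1]])
        y = radiation[1:]

        km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
        models = {}
        for c in range(4):
            idx = km.labels_ == c
            models[c] = MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000,
                                     random_state=0).fit(X[idx], y[idx])

        x_new = np.array([[12, 0.9]])                    # noon, high current radiation
        c = int(km.predict(x_new)[0])
        print(f"1-hour-ahead forecast from cluster {c}: {models[c].predict(x_new)[0]:.2f}")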

  6. Cluster-based spectrum sensing for cognitive radios with imperfect channel to cluster-head

    KAUST Repository

    Ben Ghorbel, Mahdi

    2012-04-01

    Spectrum sensing is considered as the first and main step for cognitive radio systems to achieve an efficient use of spectrum. Cooperation and clustering among cognitive radio users are two techniques that can be employed with spectrum sensing in order to improve the sensing performance by reducing miss-detection and false alarm. In this paper, within the framework of a clustering-based cooperative spectrum sensing scheme, we study the effect of errors in transmitting the local decisions from the secondary users to the cluster heads (or the fusion center), while considering non-identical channel conditions between the secondary users. Closed-form expressions for the global probabilities of detection and false alarm at the cluster head are derived. © 2012 IEEE.

  7. Cluster-based spectrum sensing for cognitive radios with imperfect channel to cluster-head

    KAUST Repository

    Ben Ghorbel, Mahdi; Nam, Haewoon; Alouini, Mohamed-Slim

    2012-01-01

    Spectrum sensing is considered as the first and main step for cognitive radio systems to achieve an efficient use of spectrum. Cooperation and clustering among cognitive radio users are two techniques that can be employed with spectrum sensing in order to improve the sensing performance by reducing miss-detection and false alarm. In this paper, within the framework of a clustering-based cooperative spectrum sensing scheme, we study the effect of errors in transmitting the local decisions from the secondary users to the cluster heads (or the fusion center), while considering non-identical channel conditions between the secondary users. Closed-form expressions for the global probabilities of detection and false alarm at the cluster head are derived. © 2012 IEEE.
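
    To make the role of imperfect reporting channels concrete, the short sketch below computes global detection and false-alarm probabilities at a cluster head under an OR fusion rule, where each secondary user's one-bit decision is flipped with a user-specific error probability. This is a simplified, textbook-style illustration with assumed numbers, not the closed-form expressions derived in the paper.

        # Sketch: OR-rule fusion at the cluster head with binary-symmetric reporting errors.
        import numpy as np

        pd_local = np.array([0.90, 0.85, 0.80, 0.88])   # local detection probabilities (assumed)
        pf_local = np.array([0.05, 0.08, 0.10, 0.06])   # local false-alarm probabilities (assumed)
        err      = np.array([0.01, 0.05, 0.02, 0.10])   # reporting-channel bit-error probabilities

        # probability that the cluster head *receives* a "1" from user i
        pd_rx = pd_local * (1 - err) + (1 - pd_local) * err
        pf_rx = pf_local * (1 - err) + (1 - pf_local) * err

        # OR rule: declare "signal present" if at least one received decision is "1"
        Qd = 1 - np.prod(1 - pd_rx)
        Qf = 1 - np.prod(1 - pf_rx)
        print(f"global Pd = {Qd:.4f}, global Pf = {Qf:.4f}")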

  8. Symptom Clusters in Advanced Cancer Patients: An Empirical Comparison of Statistical Methods and the Impact on Quality of Life.

    Science.gov (United States)

    Dong, Skye T; Costa, Daniel S J; Butow, Phyllis N; Lovell, Melanie R; Agar, Meera; Velikova, Galina; Teckle, Paulos; Tong, Allison; Tebbutt, Niall C; Clarke, Stephen J; van der Hoek, Kim; King, Madeleine T; Fayers, Peter M

    2016-01-01

    Symptom clusters in advanced cancer can influence patient outcomes. There is large heterogeneity in the methods used to identify symptom clusters. To investigate the consistency of symptom cluster composition in advanced cancer patients using different statistical methodologies for all patients across five primary cancer sites, and to examine which clusters predict functional status, a global assessment of health and global quality of life. Principal component analysis and exploratory factor analysis (with different rotation and factor selection methods) and hierarchical cluster analysis (with different linkage and similarity measures) were used on a data set of 1562 advanced cancer patients who completed the European Organization for the Research and Treatment of Cancer Quality of Life Questionnaire-Core 30. Four clusters consistently formed for many of the methods and cancer sites: tense-worry-irritable-depressed (emotional cluster), fatigue-pain, nausea-vomiting, and concentration-memory (cognitive cluster). The emotional cluster was a stronger predictor of overall quality of life than the other clusters. Fatigue-pain was a stronger predictor of overall health than the other clusters. The cognitive cluster and fatigue-pain predicted physical functioning, role functioning, and social functioning. The four identified symptom clusters were consistent across statistical methods and cancer types, although there were some noteworthy differences. Statistical derivation of symptom clusters is in need of greater methodological guidance. A psychosocial pathway in the management of symptom clusters may improve quality of life. Biological mechanisms underpinning symptom clusters need to be delineated by future research. A framework for evidence-based screening, assessment, treatment, and follow-up of symptom clusters in advanced cancer is essential. Copyright © 2016 American Academy of Hospice and Palliative Medicine. Published by Elsevier Inc. All rights reserved.

  9. Comparison of Methods for Oscillation Detection

    DEFF Research Database (Denmark)

    Odgaard, Peter Fogh; Trangbæk, Klaus

    2006-01-01

    This paper compares a selection of methods for detecting oscillations in control loops. The methods are tested on measurement data from a coal-fired power plant, where some oscillations are occurring. Emphasis is put on being able to detect oscillations without having a system model and without using process knowledge. The tested methods show potential for detecting the oscillations; however, transient components in the signals cause false detections as well, motivating the use of models in order to remove the expected signal behavior…

  10. Fuzzy Clustering Methods and their Application to Fuzzy Modeling

    DEFF Research Database (Denmark)

    Kroszynski, Uri; Zhou, Jianjun

    1999-01-01

    Fuzzy modeling techniques based upon the analysis of measured input/output data sets result in a set of rules that allow one to predict system outputs from given inputs. Fuzzy clustering methods for system modeling and identification result in relatively small rule-bases, allowing fast, yet accurate … An illustrative synthetic example is analyzed, and prediction accuracy measures are compared between the different variants…

  11. Joint two-view information for computerized detection of microcalcifications on mammograms

    International Nuclear Information System (INIS)

    Sahiner, Berkman; Chan, H.-P.; Hadjiiski, Lubomir M.; Helvie, Mark A.; Paramagul, Chinatana; Ge Jun; Wei Jun; Zhou Chuan

    2006-01-01

    We are developing new techniques to improve the accuracy of computerized microcalcification detection by using the joint two-view information on craniocaudal (CC) and mediolateral-oblique (MLO) views. After cluster candidates were detected using a single-view detection technique, candidates on CC and MLO views were paired using their radial distances from the nipple. Candidate pairs were classified with a similarity classifier that used the joint information from both views. Each cluster candidate was also characterized by its single-view features. The outputs of the similarity classifier and the single-view classifier were fused and the cluster candidate was classified as a true microcalcification cluster or a false-positive (FP) using the fused two-view information. A data set of 116 pairs of mammograms containing microcalcification clusters and 203 pairs of normal images from the University of South Florida (USF) public database was used for training the two-view detection algorithm. The trained method was tested on an independent test set of 167 pairs of mammograms, which contained 71 normal pairs and 96 pairs with microcalcification clusters collected at the University of Michigan (UM). The similarity classifier had a very low FP rate for the test set at low and medium levels of sensitivity. However, the highest mammogram-based sensitivity that could be reached by the similarity classifier was 69%. The single-view classifier had a higher FP rate compared to the similarity classifier, but it could reach a maximum mammogram-based sensitivity of 93%. The fusion method combined the scores of these two classifiers so that the number of FPs was substantially reduced at relatively low and medium sensitivities, and a relatively high maximum sensitivity was maintained. For the malignant microcalcification clusters, at a mammogram-based sensitivity of 80%, the FP rates were 0.18 and 0.35 with the two-view fusion and single-view detection methods, respectively. When the

  12. The Atacama Cosmology Telescope: Sunyaev-Zel'dovich-Selected Galaxy Clusters AT 148 GHz in the 2008 Survey

    Science.gov (United States)

    Marriage, Tobias A.; Acquaviva, Viviana; Ade, Peter A. R.; Aguirre, Paula; Amiri, Mandana; Appel, John William; Barrientos, L. Felipe; Battistelli, Elia S.; Bond, J. Richard; Brown, Ben; hide

    2011-01-01

    We report on 23 clusters detected blindly as Sunyaev-Zel'dovich (SZ) decrements in a 148 GHz, 455 deg^2 map of the southern sky made with data from the Atacama Cosmology Telescope 2008 observing season. All SZ detections announced in this work have confirmed optical counterparts. Ten of the clusters are new discoveries. One newly discovered cluster, ACT-CL J0102-4915, with a redshift of 0.75 (photometric), has an SZ decrement comparable to the most massive systems at lower redshifts. Simulations of the cluster recovery method reproduce the sample purity measured by optical follow-up. In particular, for clusters detected with a signal-to-noise ratio greater than six, simulations are consistent with optical follow-up that demonstrated this subsample is 100% pure. The simulations further imply that the total sample is 80% complete for clusters with mass in excess of 6 × 10^14 solar masses referenced to the cluster volume characterized by 500 times the critical density. The Compton y-X-ray luminosity mass comparison for the 11 best-detected clusters visually agrees with both self-similar and non-adiabatic, simulation-derived scaling laws.

  13. A Voltage Quality Detection Method

    DEFF Research Database (Denmark)

    Chen, Zhe; Wei, Mu

    2008-01-01

    This paper presents a voltage quality detection method based on a phase-locked loop (PLL) technique. The technique can detect the voltage magnitude and phase angle of each individual phase under both normal and fault power system conditions. The proposed method has the potential to evaluate various...

  14. Applying Clustering Methods in Drawing Maps of Science: Case Study of the Map For Urban Management Science

    Directory of Open Access Journals (Sweden)

    Mohammad Abuei Ardakan

    2010-04-01

    Full Text Available The present paper offers a basic introduction to data clustering and demonstrates the application of clustering methods in drawing maps of science. All approaches towards classification and clustering of information are briefly discussed. Their application to the process of visualizing conceptual information and drawing science maps is illustrated by reviewing similar research in this field. By implementing an agglomerative hierarchical clustering algorithm based on the complete-link method, the map for urban management science as an emerging, interdisciplinary scientific field is analyzed and reviewed.
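
    The complete-link agglomerative clustering mentioned above is available directly in SciPy; the sketch below clusters a small random item-feature matrix and cuts the dendrogram into a chosen number of groups. The data and the number of clusters are placeholders, not the urban-management corpus of the paper.

        # Sketch: complete-linkage hierarchical clustering and a flat cut of the dendrogram.
        import numpy as np
        from scipy.cluster.hierarchy import linkage, fcluster

        rng = np.random.default_rng(0)
        X = rng.random((30, 10))                                # 30 items, 10 features (placeholder)

        Z = linkage(X, method='complete', metric='euclidean')   # complete-link merge tree
        labels = fcluster(Z, t=4, criterion='maxclust')         # cut into 4 flat clusters
        print(np.bincount(labels)[1:])                          # cluster sizes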

  15. A Cyber-Attack Detection Model Based on Multivariate Analyses

    Science.gov (United States)

    Sakai, Yuto; Rinsaka, Koichiro; Dohi, Tadashi

    In the present paper, we propose a novel cyber-attack detection model that applies two multivariate-analysis methods to the audit data observed on a host machine. The statistical techniques used here are Hayashi's well-known quantification method IV and a cluster analysis method. We quantify the observed qualitative audit event sequence via quantification method IV, and collect similar audit event sequences in the same groups based on the cluster analysis. It is shown in simulation experiments that our model can improve the cyber-attack detection accuracy in some realistic cases where both normal and attack activities are intermingled.

  16. A study of several CAD methods for classification of clustered microcalcifications

    Science.gov (United States)

    Wei, Liyang; Yang, Yongyi; Nishikawa, Robert M.; Jiang, Yulei

    2005-04-01

    In this paper we investigate several state-of-the-art machine-learning methods for automated classification of clustered microcalcifications (MCs), aimed at assisting radiologists with more accurate diagnosis of breast cancer in a computer-aided diagnosis (CADx) scheme. The methods we consider include: support vector machine (SVM), kernel Fisher discriminant (KFD), and committee machines (ensemble averaging and AdaBoost), most of which have been developed recently in statistical learning theory. We formulate differentiation of malignant from benign MCs as a supervised learning problem, and apply these learning methods to develop the classification algorithms. As input, these methods use image features automatically extracted from clustered MCs. We test these methods using a database of 697 clinical mammograms from 386 cases, which include a wide spectrum of difficult-to-classify cases. We use receiver operating characteristic (ROC) analysis to evaluate and compare the classification performance by the different methods. In addition, we also investigate how to combine information from multiple-view mammograms of the same case so that the best decision can be made by a classifier. In our experiments, the kernel-based methods (i.e., SVM, KFD) yield the best performance, significantly outperforming a well-established CADx approach based on neural network learning.

  17. DETECTION OF MICROCALCIFICATION IN DIGITAL MAMMOGRAMS USING ONE DIMENSIONAL WAVELET TRANSFORM

    Directory of Open Access Journals (Sweden)

    T. Balakumaran

    2010-11-01

    Full Text Available Mammography is the most efficient method for early detection of breast cancer. Clusters of microcalcifications are an early sign of breast cancer, and their detection is the key to improving its prognosis. Microcalcifications appear in mammogram images as tiny localized granular points, which are often difficult to detect by the naked eye because of their small size. Automatic and accurate detection of microcalcifications has received much attention from radiologists and physicians. An efficient approach for automatic detection of clustered microcalcifications in digitized mammograms is the use of Computer Aided Diagnosis (CAD) systems. This paper presents a one-dimensional wavelet-based multiscale products scheme for microcalcification detection in mammogram images. The detection of microcalcifications was achieved by decomposing each line of the mammogram with a 1D wavelet transform into different frequency sub-bands, suppressing the low-frequency subband, and finally reconstructing the mammogram from the subbands containing only significant high-frequency features. The significant features are obtained by multiscale products. Preliminary results indicate that the proposed scheme is better at suppressing the background and detecting microcalcification clusters than other wavelet decomposition methods.
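
    The multiscale-product idea — multiply the detail coefficients of an undecimated (stationary) wavelet transform across scales so that isolated spikes reinforce while smooth background and noise cancel — can be sketched for a single signal line as follows. It assumes the PyWavelets package and a synthetic signal; the wavelet choice and threshold are illustrative assumptions, not those of the paper.

        # Sketch: 1-D stationary wavelet transform, multiscale product of detail coefficients,
        # and a simple threshold to flag spike-like (microcalcification-like) samples.
        import numpy as np
        import pywt

        rng = np.random.default_rng(0)
        line = rng.normal(0, 0.2, 256) + np.linspace(0, 1, 256)   # noisy background + slow trend
        line[[60, 61, 180]] += 3.0                                 # tiny bright spots (spikes)

        level = 3
        coeffs = pywt.swt(line, 'db2', level=level)        # undecimated: all bands keep length 256
        details = [cD for _, cD in coeffs]

        product = np.abs(np.prod(details, axis=0))         # multiscale product of detail bands
        thresh = product.mean() + 4 * product.std()        # illustrative threshold
        print("flagged positions:", np.where(product > thresh)[0])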

  18. Leak detection by vibrational diagnostic methods

    International Nuclear Information System (INIS)

    Siklossy, P.

    1983-01-01

    The possibilities and methods of detecting leaks due to mechanical failures in nuclear power plants are reviewed on the basis of the literature. Great importance is attributed to vibrational diagnostic methods for their advantageous characteristics, which enable them to serve as definitive leak detection methods. The problems of noise analysis, e.g. leak detection by impact sound measurements, probe characteristics, gain problems, probe selection, off-line analysis and correlation functions, types of leak noises, etc., are summarized. Leak detection based on noise analysis can be installed as an addition to existing power plants. Its maintenance and testing are simple. On the other hand, it requires special training and measuring methods. (Sz.J.)

  19. CHANDRA CLUSTER COSMOLOGY PROJECT III: COSMOLOGICAL PARAMETER CONSTRAINTS

    International Nuclear Information System (INIS)

    Vikhlinin, A.; Forman, W. R.; Jones, C.; Murray, S. S.; Kravtsov, A. V.; Burenin, R. A.; Voevodkin, A.; Ebeling, H.; Hornstrup, A.; Nagai, D.; Quintana, H.

    2009-01-01

    Chandra observations of large samples of galaxy clusters detected in X-rays by ROSAT provide a new, robust determination of the cluster mass functions at low and high redshifts. Statistical and systematic errors are now sufficiently small, and the redshift leverage sufficiently large, for the mass function evolution to be used as a useful growth-of-structure-based dark energy probe. In this paper, we present cosmological parameter constraints obtained from Chandra observations of 37 clusters with ⟨z⟩ = 0.55 derived from the 400 deg^2 ROSAT serendipitous survey and 49 brightest z ∼ 0.05 clusters detected in the All-Sky Survey. Evolution of the mass function between these redshifts requires Ω_Λ > 0 with a ∼5σ significance, and constrains the dark energy equation-of-state parameter to w_0 = -1.14 ± 0.21, assuming a constant w and a flat universe. Cluster information also significantly improves constraints when combined with other methods. Fitting our cluster data jointly with the latest supernovae, Wilkinson Microwave Anisotropy Probe, and baryonic acoustic oscillation measurements, we obtain w_0 = -0.991 ± 0.045 (stat) ± 0.039 (sys), a factor of 1.5 reduction in statistical uncertainties, and nearly a factor of 2 improvement in systematics compared with constraints that can be obtained without clusters. The joint analysis of these four data sets puts a conservative upper limit on the masses of light neutrinos, Σm_ν …, and constrains Ω_M h and σ_8 from the low-redshift cluster mass function.

  20. Clustering Dycom

    KAUST Repository

    Minku, Leandro L.

    2017-10-06

    Background: Software Effort Estimation (SEE) can be formulated as an online learning problem, where new projects are completed over time and may become available for training. In this scenario, a Cross-Company (CC) SEE approach called Dycom can drastically reduce the number of Within-Company (WC) projects needed for training, saving the high cost of collecting such training projects. However, Dycom relies on splitting CC projects into different subsets in order to create its CC models. Such splitting can have a significant impact on Dycom's predictive performance. Aims: This paper investigates whether clustering methods can be used to help find good CC splits for Dycom. Method: Dycom is extended to use clustering methods for creating the CC subsets. Three different clustering methods are investigated, namely Hierarchical Clustering, K-Means, and Expectation-Maximisation (EM). Clustering Dycom is compared against the original Dycom with CC subsets of different sizes, based on four SEE databases. A baseline WC model is also included in the analysis. Results: Clustering Dycom with K-Means can potentially help to split the CC projects, managing to achieve similar or better predictive performance than Dycom. However, K-Means still requires the number of CC subsets to be pre-defined, and a poor choice can negatively affect predictive performance. EM enables Dycom to automatically set the number of CC subsets while still maintaining or improving predictive performance with respect to the baseline WC model. Clustering Dycom with Hierarchical Clustering did not offer a significant advantage in terms of predictive performance. Conclusion: Clustering methods can be an effective way to automatically generate Dycom's CC subsets.

  1. Water Quality Evaluation of the Yellow River Basin Based on Gray Clustering Method

    Science.gov (United States)

    Fu, X. Q.; Zou, Z. H.

    2018-03-01

    We comprehensively evaluated the water quality of 12 monitoring sections in the Yellow River Basin with the grey clustering method, based on the water quality monitoring data from the Ministry of Environmental Protection of China in May 2016 and the environmental quality standard for surface water. The results can reflect the water quality of the Yellow River Basin objectively. Furthermore, the evaluation results are basically the same as those obtained with the fuzzy comprehensive evaluation method. The results also show that the overall water quality of the Yellow River Basin is good, which is consistent with the actual situation of the basin. Overall, the grey clustering method for water quality evaluation is reasonable, feasible, and convenient to compute.

  2. Applications of Cluster Analysis to the Creation of Perfectionism Profiles: A Comparison of two Clustering Approaches

    Directory of Open Access Journals (Sweden)

    Jocelyn H Bolin

    2014-04-01

    Full Text Available Although traditional clustering methods (e.g., K-means) have been shown to be useful in the social sciences it is often difficult for such methods to handle situations where clusters in the population overlap or are ambiguous. Fuzzy clustering, a method already recognized in many disciplines, provides a more flexible alternative to these traditional clustering methods. Fuzzy clustering differs from other traditional clustering methods in that it allows for a case to belong to multiple clusters simultaneously. Unfortunately, fuzzy clustering techniques remain relatively unused in the social and behavioral sciences. The purpose of this paper is to introduce fuzzy clustering to these audiences who are currently relatively unfamiliar with the technique. In order to demonstrate the advantages associated with this method, cluster solutions of a common perfectionism measure were created using both fuzzy clustering and K-means clustering, and the results compared. Results of these analyses reveal that different cluster solutions are found by the two methods, and the similarity between the different clustering solutions depends on the amount of cluster overlap allowed for in fuzzy clustering.

  3. Applications of cluster analysis to the creation of perfectionism profiles: a comparison of two clustering approaches.

    Science.gov (United States)

    Bolin, Jocelyn H; Edwards, Julianne M; Finch, W Holmes; Cassady, Jerrell C

    2014-01-01

    Although traditional clustering methods (e.g., K-means) have been shown to be useful in the social sciences it is often difficult for such methods to handle situations where clusters in the population overlap or are ambiguous. Fuzzy clustering, a method already recognized in many disciplines, provides a more flexible alternative to these traditional clustering methods. Fuzzy clustering differs from other traditional clustering methods in that it allows for a case to belong to multiple clusters simultaneously. Unfortunately, fuzzy clustering techniques remain relatively unused in the social and behavioral sciences. The purpose of this paper is to introduce fuzzy clustering to these audiences who are currently relatively unfamiliar with the technique. In order to demonstrate the advantages associated with this method, cluster solutions of a common perfectionism measure were created using both fuzzy clustering and K-means clustering, and the results compared. Results of these analyses reveal that different cluster solutions are found by the two methods, and the similarity between the different clustering solutions depends on the amount of cluster overlap allowed for in fuzzy clustering.
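
    The key difference the two records above highlight — fuzzy clustering assigns each case a degree of membership in every cluster instead of a single hard label — can be seen in a compact NumPy implementation of the standard fuzzy c-means update equations. This is a generic sketch on synthetic data with an assumed fuzzifier m = 2; it is not the perfectionism analysis itself.

        # Sketch: fuzzy c-means; each row of U gives a case's memberships across all clusters.
        import numpy as np

        def fuzzy_cmeans(X, c, m=2.0, n_iter=100, tol=1e-5, seed=0):
            rng = np.random.default_rng(seed)
            U = rng.dirichlet(np.ones(c), size=len(X))            # random memberships, rows sum to 1
            for _ in range(n_iter):
                Um = U ** m
                centers = (Um.T @ X) / Um.sum(axis=0)[:, None]    # membership-weighted centers
                d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
                U_new = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1)), axis=2)
                if np.abs(U_new - U).max() < tol:
                    U = U_new
                    break
                U = U_new
            return U, centers

        rng = np.random.default_rng(1)
        X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
        U, centers = fuzzy_cmeans(X, c=2)
        hard = U.argmax(axis=1)                 # K-means-style hard labels, for comparison
        print(U[:3].round(2))                   # soft memberships of the first three cases
        print(np.bincount(hard))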

  4. Clustering self-organizing maps (SOM) method for human papillomavirus (HPV) DNA as the main cause of cervical cancer disease

    Science.gov (United States)

    Bustamam, A.; Aldila, D.; Fatimah, Arimbi, M. D.

    2017-07-01

    One of the most widely used clustering methods, owing to its robustness, is the Self-Organizing Map (SOM) method. This paper discusses the application of the SOM method to Human Papillomavirus (HPV) DNA, the main cause of cervical cancer, the most dangerous cancer in developing countries. We use 18 types of HPV DNA based on the newest complete genomes. Using the open-source program R, the clustering process can separate the 18 HPV types into two different clusters. There are two HPV types in the first cluster, while the 16 others are in the second cluster. The 18 HPV types are then analyzed according to the malignancy of the virus (how difficult it is to cure). The two HPV types in the first cluster can be classified as tame HPV, while the 16 others in the second cluster are classified as vicious HPV.

  5. Comparison of Molecular and Phenotypic Methods for the Detection and Characterization of Carbapenem Resistant Enterobacteriaceae.

    Science.gov (United States)

    Somily, Ali M; Garaween, Ghada A; Abukhalid, Norah; Absar, Muhammad M; Senok, Abiola C

    2016-03-01

    In recent years, there has been a rapid dissemination of carbapenem resistant Enterobacteriaceae (CRE). This study aimed to compare phenotypic and molecular methods for detection and characterization of CRE isolates at a large tertiary care hospital in Saudi Arabia. The study was carried out between January 2011 and November 2013 at the King Khalid University Hospital (KKUH) in Saudi Arabia. Determination of the presence of extended-spectrum beta-lactamases (ESBL) and carbapenem resistance was in accordance with Clinical and Laboratory Standards Institute (CLSI) guidelines. Phenotypic classification was done by the MASTDISCS(TM) ID inhibitor combination disk method. Genotypic characterization of ESBL and carbapenemase genes was performed with the Check-MDR CT102. Diversilab rep-PCR was used for the determination of clonal relationships. Of the 883 ESBL-positive Enterobacteriaceae detected during the study period, 14 (1.6%) isolates were carbapenem resistant. Both the molecular genotypic characterization and the phenotypic testing were in agreement in the detection of all 8 metallo-beta-lactamase (MBL) producing isolates. Of these 8 MBL producers, 5 were positive for the blaNDM gene and 3 were positive for the blaVIM gene. The molecular method identified additional isolates carrying the blaOXA gene, while MASTDISCS(TM) ID detected one AmpC-producing isolate. Both methods agreed in identifying 2 carbapenem resistant isolates which were negative for carbapenemase genes. Diversilab rep-PCR analysis of the 9 Klebsiella pneumoniae isolates revealed a polyclonal distribution into eight clusters. MASTDISCS(TM) ID is a reliable, simple and inexpensive phenotypic method for detecting the majority of carbapenemase genes, with the exception of the blaOXA gene. We recommend the use of such methods in the clinical laboratory.

  6. A New Waveform Signal Processing Method Based on Adaptive Clustering-Genetic Algorithms

    International Nuclear Information System (INIS)

    Noha Shaaban; Fukuzo Masuda; Hidetsugu Morota

    2006-01-01

    We present a fast digital signal processing method for the numerical analysis of individual pulses from CdZnTe compound semiconductor detectors, using the Maxi-Mini Distance Algorithm and a Genetic Algorithms-based discrimination technique. A parametric approach has been used for classifying the discriminated waveforms into a set of clusters, each having a similar signal shape with a corresponding pulse height spectrum. A corrected total pulse height spectrum was obtained by applying a normalization factor to the full energy peak of each cluster, with marked improvements in the energy spectrum characteristics. This method was applied successfully to both simulated and real measured data; it can be applied to any detector that suffers from signal shape variation. (authors)

  7. Data Stream Clustering With Affinity Propagation

    KAUST Repository

    Zhang, Xiangliang

    2014-07-09

    Data stream clustering provides insights into the underlying patterns of data flows. This paper focuses on selecting the best representatives from clusters of streaming data. There are two main challenges: how to cluster with the best representatives and how to handle the evolving patterns that are important characteristics of streaming data with dynamic distributions. We employ the Affinity Propagation (AP) algorithm presented in 2007 by Frey and Dueck for the first challenge, as it offers good guarantees of clustering optimality for selecting exemplars. The second challenging problem is solved by change detection. The presented StrAP algorithm combines AP with a statistical change point detection test; the clustering model is rebuilt whenever the test detects a change in the underlying data distribution. Besides the validation on two benchmark data sets, the presented algorithm is validated on a real-world application, monitoring the data flow of jobs submitted to the EGEE grid.

  8. Data Stream Clustering With Affinity Propagation

    KAUST Repository

    Zhang, Xiangliang; Furtlehner, Cyril; Germain-Renaud, Cecile; Sebag, Michele

    2014-01-01

    Data stream clustering provides insights into the underlying patterns of data flows. This paper focuses on selecting the best representatives from clusters of streaming data. There are two main challenges: how to cluster with the best representatives and how to handle the evolving patterns that are important characteristics of streaming data with dynamic distributions. We employ the Affinity Propagation (AP) algorithm presented in 2007 by Frey and Dueck for the first challenge, as it offers good guarantees of clustering optimality for selecting exemplars. The second challenging problem is solved by change detection. The presented StrAP algorithm combines AP with a statistical change point detection test; the clustering model is rebuilt whenever the test detects a change in the underlying data distribution. Besides the validation on two benchmark data sets, the presented algorithm is validated on a real-world application, monitoring the data flow of jobs submitted to the EGEE grid.
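
    A much-simplified version of the rebuild-on-change idea in the two records above — cluster a window of data with Affinity Propagation to obtain exemplars, monitor how well new points are represented by those exemplars, and re-cluster when representation quality degrades — is sketched below with scikit-learn. The drift trigger here is a crude distance-quantile rule and the data are synthetic; this is stated as an assumption and is not the statistical change-point test used by StrAP.

        # Sketch: Affinity Propagation exemplars + a naive drift trigger that rebuilds the model.
        import numpy as np
        from sklearn.cluster import AffinityPropagation

        def fit_exemplars(window):
            ap = AffinityPropagation(damping=0.9, random_state=0).fit(window)
            return window[ap.cluster_centers_indices_]

        def median_nearest_distance(points, exemplars):
            d = np.linalg.norm(points[:, None] - exemplars[None], axis=2).min(axis=1)
            return np.median(d)

        rng = np.random.default_rng(0)
        stream = np.vstack([rng.normal(0, 1, (300, 2)),          # initial regime
                            rng.normal(6, 1, (300, 2))])          # distribution change half-way

        window = stream[:100]
        exemplars = fit_exemplars(window)
        baseline = median_nearest_distance(window, exemplars)

        recent = []
        for t, x in enumerate(stream[100:], start=100):
            recent.append(np.linalg.norm(exemplars - x, axis=1).min())
            if len(recent) == 50:                                 # sliding check every 50 points
                if np.median(recent) > 3 * baseline:              # crude "model no longer fits" rule
                    print(f"change detected near t={t}; rebuilding clustering model")
                    window = stream[t - 99:t + 1]
                    exemplars = fit_exemplars(window)
                    baseline = median_nearest_distance(window, exemplars)
                recent = []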

  9. Cosmology with cluster surveys

    Indian Academy of Sciences (India)

    Abstract. Surveys of clusters of galaxies provide us with a powerful probe of the density and nature of the dark energy. The red-shift distribution of detected clusters is highly sensitive to the dark energy equation of state parameter w. Upcoming Sunyaev–Zel'dovich (SZ) surveys would provide us large yields of clusters to ...

  10. Clustering method for counting passengers getting in a bus with single camera

    Science.gov (United States)

    Yang, Tao; Zhang, Yanning; Shao, Dapei; Li, Ying

    2010-03-01

    Automatic counting of passengers is very important for both business and security applications. We present a single-camera-based vision system that is able to count passengers in a highly crowded situation at the entrance of a traffic bus. The unique characteristics of the proposed system include the following. First, a novel feature-point-tracking- and online-clustering-based passenger counting framework, which performs much better than background-modeling- and foreground-blob-tracking-based methods. Second, a simple and highly accurate clustering algorithm is developed that projects the high-dimensional feature point trajectories into a 2-D feature space by their appearance and disappearance times and counts the number of people through online clustering. Finally, all test video sequences in the experiment are captured from a real traffic bus in Shanghai, China. The results show that the system can process two 320×240 video sequences at a frame rate of 25 fps simultaneously, and can count passengers reliably in various difficult scenarios with complex interaction and occlusion among people. The method achieves high accuracy rates of up to 96.5%.

  11. DLTAP: A Network-efficient Scheduling Method for Distributed Deep Learning Workload in Containerized Cluster Environment

    Directory of Open Access Journals (Sweden)

    Qiao Wei

    2017-01-01

    Full Text Available Deep neural networks (DNNs) have recently yielded strong results on a range of applications. Training these DNNs using a cluster of commodity machines is a promising approach since training is time consuming and compute-intensive. Furthermore, putting DNN tasks into containers of clusters would enable broader and easier deployment of DNN-based algorithms. Toward this end, this paper addresses the problem of scheduling DNN tasks in a containerized cluster environment. Efficiently scheduling data-parallel computation jobs like DNNs over containerized clusters is critical for job performance, system throughput, and resource utilization. It becomes even more challenging with complex workloads. We propose a scheduling method called Deep Learning Task Allocation Priority (DLTAP), which performs scheduling decisions in a distributed manner; each scheduling decision takes the aggregation degree of parameter server tasks and worker tasks into account, in particular to reduce cross-node network transmission traffic and, correspondingly, decrease the DNN training time. We evaluate the DLTAP scheduling method using a state-of-the-art distributed DNN training framework on 3 benchmarks. The results show that the proposed method can reduce cross-node network traffic by 12% on average and decrease the DNN training time even with a cluster of low-end servers.

  12. Clustering analysis

    International Nuclear Information System (INIS)

    Romli

    1997-01-01

    Cluster analysis is the name of a group of multivariate techniques whose principal purpose is to distinguish similar entities from the characteristics they possess. Several algorithms can be used for this analysis. This topic therefore focuses on those algorithms, namely similarity measures and hierarchical clustering, which includes the single linkage, complete linkage and average linkage methods. The non-hierarchical clustering method, popularly known as the K-means method, will also be discussed. Finally, this paper describes the advantages and disadvantages of each method

  13. Improved Density Based Spatial Clustering of Applications of Noise Clustering Algorithm for Knowledge Discovery in Spatial Data

    Directory of Open Access Journals (Sweden)

    Arvind Sharma

    2016-01-01

    Full Text Available Many techniques are available in the field of data mining and its subfield, spatial data mining, to understand relationships between data objects. Collections of data objects related to spatial features are called spatial databases. These relationships can be used for prediction and trend detection between spatial and nonspatial objects for social and scientific purposes. A huge data set may be collected from different sources such as satellite images, X-rays, medical images, traffic cameras, and GIS systems. Handling this large amount of data and establishing relationships among the objects, in a certain manner and with certain results, is the primary purpose of this paper. This paper gives a complete process for understanding how spatial data differ from other kinds of data sets and how they are refined to obtain useful results and set trends for prediction in geographic information systems and the spatial data mining process. In this paper a new improved clustering algorithm is designed, because the role of clustering is indispensable in the spatial data mining process. Clustering methods are useful in various fields of human life such as GIS (Geographic Information System), GPS (Global Positioning System), weather forecasting, air traffic control, water treatment, area selection, cost estimation, planning of rural and urban areas, remote sensing, and VLSI design. This paper presents a study of various clustering methods and algorithms and an improved DBSCAN algorithm called IDBSCAN (Improved Density-Based Spatial Clustering of Applications with Noise). The algorithm is designed by the addition of some important attributes which are responsible for the generation of better clusters from existing data sets in comparison with other methods.
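
    For readers unfamiliar with the baseline that IDBSCAN extends, the sketch below shows plain DBSCAN on synthetic spatial points: density-reachable points form clusters of arbitrary shape, and sparse points are labeled as noise (-1). The eps and min_samples values are illustrative assumptions, not those of the paper.

        # Sketch: baseline DBSCAN — arbitrary-shape clusters plus a noise label (-1) for outliers.
        import numpy as np
        from sklearn.cluster import DBSCAN

        rng = np.random.default_rng(0)
        angle = rng.uniform(0, 2 * np.pi, 200)
        ring = np.column_stack([np.cos(angle), np.sin(angle)]) * 5 + rng.normal(0, 0.2, (200, 2))
        blob = rng.normal(0, 0.5, (100, 2))
        scatter = rng.uniform(-8, 8, (20, 2))
        X = np.vstack([ring, blob, scatter])

        labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)
        n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
        print(f"{n_clusters} clusters found, {np.sum(labels == -1)} points labeled as noise")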

  14. A fast density-based clustering algorithm for real-time Internet of Things stream.

    Science.gov (United States)

    Amini, Amineh; Saboohi, Hadi; Wah, Teh Ying; Herawan, Tutut

    2014-01-01

    Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based methods are a prominent class of data stream clustering algorithms. They can detect arbitrarily shaped clusters, handle outliers, and do not need the number of clusters in advance. Therefore, a density-based clustering algorithm is a proper choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has a fast processing time, making it applicable to real-time IoT applications. Experimental results show that the proposed approach obtains high quality results with low computation time on real and synthetic datasets.

  15. Clustering Methods; Part IV of Scientific Report No. ISR-18, Information Storage and Retrieval...

    Science.gov (United States)

    Cornell Univ., Ithaca, NY. Dept. of Computer Science.

    Two papers are included as Part Four of this report on Salton's Magical Automatic Retriever of Texts (SMART) project report. The first paper: "A Controlled Single Pass Classification Algorithm with Application to Multilevel Clustering" by D. B. Johnson and J. M. Laferente presents a single pass clustering method which compares favorably…

  16. Detection of the YORP Effect for Small Asteroids in the Karin Cluster

    Science.gov (United States)

    Carruba, V.; Nesvorný, D.; Vokrouhlický, D.

    2016-06-01

    The Karin cluster is a young asteroid family thought to have formed only ≃5.75 Myr ago. The young age can be demonstrated by numerically integrating the orbits of Karin cluster members backward in time and showing the convergence of the perihelion and nodal longitudes (as well as other orbital elements). Previous work has pointed out that the convergence is not ideal if the backward integration only accounts for the gravitational perturbations from the solar system planets. It improves when the thermal radiation force known as the Yarkovsky effect is accounted for. This argument can be used to estimate the spin obliquities of the Karin cluster members. Here we take advantage of the fast growing membership of the Karin cluster and show that the obliquity distribution of diameter D ≃ 1-2 km Karin asteroids is bimodal, as expected if the YORP effect acted to move obliquities toward extreme values (0° or 180°). The measured magnitude of the effect is consistent with the standard YORP model. The surface thermal conductivity is inferred to be 0.07-0.2 W m-1 K-1 (thermal inertia ≃ 300-500 J m-2 K-1 s-1/2). We find that the strength of the YORP effect is roughly ≃0.7 of the nominal strength obtained for a collection of random Gaussian spheroids. These results are consistent with a surface composed of rough, rocky regolith. The obliquity values predicted here for 480 members of the Karin cluster can be validated by the light-curve inversion method.

  17. A scan statistic to extract causal gene clusters from case-control genome-wide rare CNV data

    Directory of Open Access Journals (Sweden)

    Scherer Stephen W

    2011-05-01

    Full Text Available Abstract Background Several statistical tests have been developed for analyzing genome-wide association data by incorporating gene pathway information in terms of gene sets. Using these methods, hundreds of gene sets are typically tested, and the tested gene sets often overlap. This overlapping greatly increases the probability of generating false positives, and the results obtained are difficult to interpret, particularly when many gene sets show statistical significance. Results We propose a flexible statistical framework to circumvent these problems. Inspired by spatial scan statistics for detecting clustering of disease occurrence in the field of epidemiology, we developed a scan statistic to extract disease-associated gene clusters from a whole gene pathway. Extracting one or a few significant gene clusters from a global pathway limits the overall false positive probability, which results in increased statistical power, and facilitates the interpretation of test results. In the present study, we applied our method to genome-wide association data for rare copy-number variations, which have been strongly implicated in common diseases. Application of our method to a simulated dataset demonstrated the high accuracy of this method in detecting disease-associated gene clusters in a whole gene pathway. Conclusions The scan statistic approach proposed here shows a high level of accuracy in detecting gene clusters in a whole gene pathway. This study has provided a sound statistical framework for analyzing genome-wide rare CNV data by incorporating topological information on the gene pathway.

  18. A comparison of moving object detection methods for real-time moving object detection

    Science.gov (United States)

    Roshan, Aditya; Zhang, Yun

    2014-06-01

    Moving object detection has a wide variety of applications, from traffic monitoring, site monitoring, automatic theft identification and face detection to military surveillance. Many methods have been developed across the globe for moving object detection, but it is very difficult to find one which can work globally in all situations and with different types of videos. The purpose of this paper is to evaluate existing moving object detection methods which can be implemented in software on a desktop or laptop for real-time object detection. There are several moving object detection methods noted in the literature, but few of them are suitable for real-time moving object detection. Most of the methods which provide for real-time movement are further limited by the number of objects and the scene complexity. This paper evaluates the four most commonly used moving object detection methods, namely the background subtraction technique, Gaussian mixture model, wavelet-based and optical flow-based methods. The work is based on an evaluation of these four moving object detection methods using two (2) different sets of cameras and two (2) different scenes. The moving object detection methods have been implemented using MatLab and the results are compared based on completeness of detected objects, noise, light change sensitivity, processing time, etc. After comparison, it is observed that the optical flow-based method took the least processing time and successfully detected the boundaries of moving objects, which also implies that it can be implemented for real-time moving object detection.
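
    A toy running-average background subtraction on synthetic frames, as a crude stand-in for the background-subtraction family compared in the paper; the threshold, learning rate and synthetic "object" are assumptions for illustration only.

        import numpy as np

        def detect_moving(frames, alpha=0.05, thresh=30):
            """Return a boolean foreground mask per frame using a running-average background."""
            background = frames[0].astype(float)
            masks = []
            for frame in frames[1:]:
                diff = np.abs(frame.astype(float) - background)
                masks.append(diff > thresh)                              # foreground where the change is large
                background = (1 - alpha) * background + alpha * frame    # slowly adapt the background
            return masks

        # Synthetic 64x64 grayscale frames with a bright 8x8 "object" moving to the right.
        frames = []
        for t in range(10):
            f = np.full((64, 64), 20, dtype=np.uint8)
            f[28:36, 5 + 4 * t:13 + 4 * t] = 200
            frames.append(f)

        masks = detect_moving(frames)
        print([int(m.sum()) for m in masks])   # number of foreground pixels per frame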

  19. Detection of major climatic and environmental predictors of liver fluke exposure risk in Ireland using spatial cluster analysis.

    Science.gov (United States)

    Selemetas, Nikolaos; de Waal, Theo

    2015-04-30

    Fasciolosis caused by Fasciola hepatica (liver fluke) can cause significant economic and production losses in dairy cow farms. The aim of the current study was to identify important weather and environmental predictors of the exposure risk to liver fluke by detecting clusters of fasciolosis in Ireland. During autumn 2012, bulk-tank milk (BTM) samples from 4365 dairy farms were collected throughout Ireland. Using an in-house antibody-detection ELISA, the analysis of BTM samples showed that 83% (n=3602) of dairy farms had been exposed to liver fluke. The Getis-Ord Gi* statistic identified 74 high-risk and 130 low-risk significant (P < 0.05) clusters. Climatic variables (monthly and seasonal mean rainfall and temperatures, total wet days and rain days) and environmental datasets (soil types, enhanced vegetation index and normalised difference vegetation index) were used to investigate dissimilarities in the exposure to liver fluke between clusters. Rainfall, total wet days and rain days, and soil type were the significant classes of climatic and environmental variables explaining the differences between significant clusters. A discriminant function analysis was used to predict the exposure risk to liver fluke using 80% of the data for modelling and the remaining 20% for post hoc model validation. The most significant predictors of the model risk function were total rainfall in August and September and total wet days. The risk model presented 100% sensitivity, 91% specificity and an accuracy of 95% correctly classified cases. A risk map of exposure to liver fluke was constructed, with a higher probability of exposure in western and north-western regions. The results of this study identified differences between clusters of fasciolosis in Ireland regarding climatic and environmental variables and detected significant predictors of the exposure risk to liver fluke. Copyright © 2015 Elsevier B.V. All rights reserved.

  20. A rapid ATR-FTIR spectroscopic method for detection of sibutramine adulteration in tea and coffee based on hierarchical cluster and principal component analyses.

    Science.gov (United States)

    Cebi, Nur; Yilmaz, Mustafa Tahsin; Sagdic, Osman

    2017-08-15

    Sibutramine may be illicitly included in herbal slimming foods and supplements marketed as "100% natural" to enhance weight loss. Considering public health and legal regulations, there is an urgent need for effective, rapid and reliable techniques to detect sibutramine in dietetic herbal foods, teas and dietary supplements. This research comprehensively explored, for the first time, detection of sibutramine in green tea, green coffee and mixed herbal tea using the ATR-FTIR spectroscopic technique combined with chemometrics. Hierarchical cluster analysis and principal component analysis (PCA) techniques were employed in the spectral range of 2746-2656 cm-1 for classification and discrimination through Euclidean distance and Ward's algorithm. Unadulterated and adulterated samples were classified and discriminated with respect to their sibutramine contents with perfect accuracy without any false prediction. The results suggest that the existence of the active substance could be successfully determined at levels in the range of 0.375-12 mg in a total of 1.75 g of green tea, green coffee and mixed herbal tea by using the FTIR-ATR technique combined with chemometrics. Copyright © 2017 Elsevier Ltd. All rights reserved.
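
    A sketch of the chemometric step only, assuming Python with scikit-learn and scipy: PCA scores followed by Ward-linkage hierarchical clustering with Euclidean distance. The "spectra" below are random stand-ins, not ATR-FTIR intensities from the 2746-2656 cm-1 window used in the study.

        import numpy as np
        from sklearn.decomposition import PCA
        from scipy.cluster.hierarchy import linkage, fcluster

        rng = np.random.default_rng(2)
        pure = rng.normal(0.0, 0.01, (15, 90))        # simulated unadulterated samples
        adulterated = pure[:10] + 0.05                 # constant offset mimics an extra absorbance band
        spectra = np.vstack([pure, adulterated])

        scores = PCA(n_components=2).fit_transform(spectra)
        Z = linkage(scores, method="ward", metric="euclidean")
        labels = fcluster(Z, t=2, criterion="maxclust")
        print(labels)                                  # the two groups should separate cleanly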

  1. Brightest Cluster Galaxies in REXCESS Clusters

    Science.gov (United States)

    Haarsma, Deborah B.; Leisman, L.; Bruch, S.; Donahue, M.

    2009-01-01

    Most galaxy clusters contain a Brightest Cluster Galaxy (BCG) which is larger than the other cluster ellipticals and has a more extended profile. In the hierarchical model, the BCG forms through many galaxy mergers in the crowded center of the cluster, and thus its properties give insight into the assembly of the cluster as a whole. In this project, we are working with the Representative XMM-Newton Cluster Structure Survey (REXCESS) team (Boehringer et al 2007) to study BCGs in 33 X-ray luminous galaxy clusters, 0.055 < z < 0.183. We are imaging the BCGs in R band at the Southern Observatory for Astrophysical Research (SOAR) in Chile. In this poster, we discuss our methods and give preliminary measurements of the BCG magnitudes, morphology, and stellar mass. We compare these BCG properties with the properties of their host clusters, particularly of the X-ray emitting gas.

  2. Rapid methods for detection of bacteria

    DEFF Research Database (Denmark)

    Corfitzen, Charlotte B.; Andersen, B.Ø.; Miller, M.

    2006-01-01

    Traditional methods for detection of bacteria in drinking water, e.g. Heterotrophic Plate Counts (HPC) or Most Probable Number (MPN), take 48-72 hours to give the result. New rapid methods for detection of bacteria are needed to protect consumers against contamination. Two rapid methods...

  3. A robust automatic leukocyte recognition method based on island-clustering texture

    Directory of Open Access Journals (Sweden)

    Xiaoshun Li

    2016-01-01

    Full Text Available A leukocyte recognition method for human peripheral blood smears based on island-clustering texture (ICT) is proposed. By analyzing the features of the five typical classes of leukocyte images, a new ICT model is established. Firstly, some feature points are extracted from a gray leukocyte image by mean-shift clustering to serve as the centers of islands. Secondly, region growing is employed to create the island regions, using these feature points as seeds. The distribution of these islands describes a new texture. Finally, a distinguishing parameter vector of these islands is created as the ICT features and combined with the geometric features of the leukocyte. The five typical classes of leukocytes can then be recognized successfully at a correct recognition rate of more than 92.3% on a total sample of 1310 leukocytes. Experimental results show the feasibility of the proposed method. Further analysis reveals that the method is robust and the results can provide important information for disease diagnosis.
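
    A hedged sketch of only the first step described above, assuming Python with scikit-learn: mean-shift clustering of pixel coordinates to obtain candidate "island" centres. The toy coordinates and the bandwidth are illustrative, not the authors' data or settings.

        import numpy as np
        from sklearn.cluster import MeanShift

        rng = np.random.default_rng(3)
        # Coordinates of dark pixels in a toy image: three blob-like groups.
        coords = np.vstack([rng.normal((10, 10), 1.5, (40, 2)),
                            rng.normal((30, 12), 1.5, (40, 2)),
                            rng.normal((20, 30), 1.5, (40, 2))])

        ms = MeanShift(bandwidth=4.0).fit(coords)
        print("island centres:")
        print(ms.cluster_centers_)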

  4. Three-dimensional reconstruction of clustered microcalcifications from two digitized mammograms

    Science.gov (United States)

    Stotzka, Rainer; Mueller, Tim O.; Epper, Wolfgang; Gemmeke, Hartmut

    1998-06-01

    X-ray mammography is one of the most significant diagnostic methods in the early detection of breast cancer. Usually two X-ray images from different angles are taken of each mamma to make even overlapping structures visible. X-ray mammography has a very high spatial resolution and can show microcalcifications of 50-200 microns in size. Clusters of microcalcifications are one of the most important, and often the only, indicators of malignant tumors. These calcifications are in some cases extremely difficult to detect. Computer-assisted diagnosis of digitized mammograms may improve the detection and interpretation of microcalcifications and yield more reliable diagnostic findings. We built a low-cost mammography workstation to detect and classify clusters of microcalcifications and tissue densities automatically. New in this approach is the estimation of the 3D formation of segmented microcalcifications and its visualization, which puts additional diagnostic information at the radiologist's disposal. The real problem in using only two or three projections for reconstruction is the large loss of volume information. Therefore, the arrangement of a cluster is estimated using only the positions of segmented microcalcifications. The arrangement of microcalcifications is visualized for the physician by rotation.

  5. Relativistic rise measurement by cluster counting method in time expansion chamber

    International Nuclear Information System (INIS)

    Rehak, P.; Walenta, A.H.

    1979-10-01

    A new approach to the measurement of the ionization energy loss for charged particle identification in the region of the relativistic rise was tested experimentally. The method consists of determining, in a special drift chamber (TEC), the number of clusters of the primary ionization. The method gives almost the full relativistic rise and a narrower Landau distribution. The consequences for a practical detector are discussed

  6. Detection of oral HPV infection - Comparison of two different specimen collection methods and two HPV detection methods.

    Science.gov (United States)

    de Souza, Marjorie M A; Hartel, Gunter; Whiteman, David C; Antonsson, Annika

    2018-04-01

    Very little is known about the natural history of oral HPV infection. Several different methods exist to collect oral specimens and detect HPV, but their respective performance characteristics are unknown. We compared two different methods for oral specimen collection (oral saline rinse and commercial saliva kit) from 96 individuals and then analyzed the samples for HPV by two different PCR detection methods (single GP5+/6+ PCR and nested MY09/11 and GP5+/6+ PCR). For the oral rinse samples, the oral HPV prevalence was 10.4% (GP+ PCR; 10% repeatability) vs 11.5% (nested PCR method; 100% repeatability). For the commercial saliva kit samples, the prevalences were 3.1% vs 16.7% with the GP+ PCR vs the nested PCR method (repeatability 100% for both detection methods). Overall the agreement was fair or poor between samples and methods (kappa 0.06-0.36). Standardizing methods of oral sample collection and HPV detection would ensure comparability between future oral HPV studies. Copyright © 2017 Elsevier Inc. All rights reserved.
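
    The fair-to-poor agreement quoted above is a Cohen's kappa; a quick sketch of how such a value is computed in Python with scikit-learn, using made-up positive/negative calls rather than the study's data.

        from sklearn.metrics import cohen_kappa_score

        # Hypothetical HPV-positive (1) / negative (0) calls from the two PCR methods on the same samples.
        gp_pcr     = [0, 0, 1, 0, 1, 0, 0, 0, 1, 0]
        nested_pcr = [0, 1, 1, 0, 0, 0, 1, 0, 1, 0]
        print(round(cohen_kappa_score(gp_pcr, nested_pcr), 2))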

  7. Spatial cluster modelling

    CERN Document Server

    Lawson, Andrew B

    2002-01-01

    Research has generated a number of advances in methods for spatial cluster modelling in recent years, particularly in the area of Bayesian cluster modelling. Along with these advances has come an explosion of interest in the potential applications of this work, especially in epidemiology and genome research. In one integrated volume, this book reviews the state-of-the-art in spatial clustering and spatial cluster modelling, bringing together research and applications previously scattered throughout the literature. It begins with an overview of the field, then presents a series of chapters that illuminate the nature and purpose of cluster modelling within different application areas, including astrophysics, epidemiology, ecology, and imaging. The focus then shifts to methods, with discussions on point and object process modelling, perfect sampling of cluster processes, partitioning in space and space-time, spatial and spatio-temporal process modelling, nonparametric methods for clustering, and spatio-temporal ...

  8. Spanning Tree Based Attribute Clustering

    DEFF Research Database (Denmark)

    Zeng, Yifeng; Jorge, Cordero Hernandez

    2009-01-01

    Attribute clustering has been previously employed to detect statistical dependence between subsets of variables. We propose a novel attribute clustering algorithm motivated by research of complex networks, called the Star Discovery algorithm. The algorithm partitions and indirectly discards...... inconsistent edges from a maximum spanning tree by starting appropriate initial modes, therefore generating stable clusters. It discovers sound clusters through simple graph operations and achieves significant computational savings. We compare the Star Discovery algorithm against earlier attribute clustering...

  9. Pre-crash scenarios at road junctions: A clustering method for car crash data.

    Science.gov (United States)

    Nitsche, Philippe; Thomas, Pete; Stuetz, Rainer; Welsh, Ruth

    2017-10-01

    Given the recent advancements in autonomous driving functions, one of the main challenges is safe and efficient operation in complex traffic situations such as road junctions. There is a need for comprehensive testing, either in virtual simulation environments or on real-world test tracks. This paper presents a novel data analysis method including the preparation, analysis and visualization of car crash data, to identify the critical pre-crash scenarios at T- and four-legged junctions as a basis for testing the safety of automated driving systems. The presented method employs k-medoids to cluster historical junction crash data into distinct partitions and then applies the association rules algorithm to each cluster to specify the driving scenarios in more detail. The dataset used consists of 1056 junction crashes in the UK, which were exported from the in-depth "On-the-Spot" database. The study resulted in thirteen crash clusters for T-junctions, and six crash clusters for crossroads. Association rules revealed common crash characteristics, which were the basis for the scenario descriptions. The results support existing findings on road junction accidents and provide benchmark situations for safety performance tests in order to reduce the possible number of parameter combinations. Copyright © 2017 Elsevier Ltd. All rights reserved.
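
    A compact, illustrative k-medoids loop (not the authors' implementation) of the kind used to partition crash records before the association-rule step; the toy categorical attributes and the simple matching distance are assumptions.

        import numpy as np

        def kmedoids(D, k, n_iter=100, seed=0):
            """Naive k-medoids on a precomputed dissimilarity matrix D (n x n)."""
            rng = np.random.default_rng(seed)
            n = D.shape[0]
            medoids = rng.choice(n, size=k, replace=False)
            for _ in range(n_iter):
                labels = np.argmin(D[:, medoids], axis=1)               # assign each record to its nearest medoid
                new_medoids = medoids.copy()
                for j in range(k):
                    members = np.where(labels == j)[0]
                    if len(members) == 0:
                        continue
                    costs = D[np.ix_(members, members)].sum(axis=1)     # total within-cluster dissimilarity
                    new_medoids[j] = members[np.argmin(costs)]          # most central member becomes the medoid
                if np.array_equal(new_medoids, medoids):
                    break
                medoids = new_medoids
            return medoids, labels

        # Toy categorical crash attributes with a simple matching (Hamming-style) dissimilarity.
        rng = np.random.default_rng(4)
        X = rng.integers(0, 3, size=(30, 5))                 # 30 crashes, 5 categorical attributes
        D = (X[:, None, :] != X[None, :, :]).mean(axis=2)
        medoids, labels = kmedoids(D, k=3)
        print("medoid rows:", medoids, "cluster sizes:", np.bincount(labels))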

  10. Cluster Physics with Merging Galaxy Clusters

    Directory of Open Access Journals (Sweden)

    Sandor M. Molnar

    2016-02-01

    Full Text Available Collisions between galaxy clusters provide a unique opportunity to study matter in a parameter space which cannot be explored in our laboratories on Earth. In the standard LCDM model, where the total density is dominated by the cosmological constant (Λ) and the matter density by cold dark matter (CDM), structure formation is hierarchical, and clusters grow mostly by merging. Mergers of two massive clusters are the most energetic events in the universe after the Big Bang, hence they provide a unique laboratory to study cluster physics. The two main mass components in clusters behave differently during collisions: the dark matter is nearly collisionless, responding only to gravity, while the gas is subject to pressure forces and dissipation, and shocks and turbulence are developed during collisions. In the present contribution we review the different methods used to derive the physical properties of merging clusters. Different physical processes leave their signatures on different wavelengths, thus our review is based on a multifrequency analysis. In principle, the best way to analyze multifrequency observations of merging clusters is to model them using N-body/HYDRO numerical simulations. We discuss the results of such detailed analyses. New high spatial and spectral resolution ground and space based telescopes will come online in the near future. Motivated by these new opportunities, we briefly discuss methods which will be feasible in the near future in studying merging clusters.

  11. Abnormal Activity Detection Using Pyroelectric Infrared Sensors

    Directory of Open Access Journals (Sweden)

    Xiaomu Luo

    2016-06-01

    Full Text Available Healthy aging is one of the most important social issues. In this paper, we propose a method for abnormal activity detection without any manual labeling of the training samples. By leveraging the Field of View (FOV) modulation, the spatio-temporal characteristic of human activity is encoded into a low-dimension data stream generated by the ceiling-mounted Pyroelectric Infrared (PIR) sensors. The similarity between normal training samples is measured based on the Kullback-Leibler (KL) divergence of each pair of them. The natural clustering of normal activities is discovered through a self-tuning spectral clustering algorithm with unsupervised model selection on the eigenvectors of a modified similarity matrix. Hidden Markov Models (HMMs) are employed to model each cluster of normal activities and form feature vectors. One-Class Support Vector Machines (OSVMs) are used to profile the normal activities and detect abnormal activities. To validate the efficacy of our method, we conducted experiments in real indoor environments. The encouraging results show that our method is able to detect abnormal activities given only the normal training samples, which aims to avoid the laborious and inconsistent data labeling process.
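
    A minimal sketch of two of the steps above, assuming Python with scipy and scikit-learn: a symmetrised Kullback-Leibler similarity between activity samples followed by spectral clustering. The per-sensor histograms are synthetic, and the exponential affinity is an assumption rather than the paper's exact similarity.

        import numpy as np
        from scipy.stats import entropy
        from sklearn.cluster import SpectralClustering

        rng = np.random.default_rng(5)
        # Each row: a normalised activation histogram over 8 PIR sensor channels.
        walk = rng.dirichlet(np.ones(8) * 5, size=10)
        sit = rng.dirichlet(np.array([10, 10, 1, 1, 1, 1, 1, 1]), size=10)
        samples = np.vstack([walk, sit])

        n = len(samples)
        sim = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                kl = 0.5 * (entropy(samples[i], samples[j]) + entropy(samples[j], samples[i]))
                sim[i, j] = np.exp(-kl)          # turn the divergence into an affinity

        labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                                    random_state=0).fit_predict(sim)
        print(labels)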

  12. Some methods for the detection of fissionable matter; Quelques methodes de detection des corps fissiles

    Energy Technology Data Exchange (ETDEWEB)

    Guery, M [Commissariat a l' Energie Atomique, Saclay (France). Centre d' Etudes Nucleaires

    1967-03-01

    Several pieces of equipment and processes that allow the detection of uranium or plutonium in industrial plants, and in particular the measurement of solution concentrations, are studied here. Each method has its own field of application and its own performance, which we have tried to define by calculation and by experiment. The following topics have been treated: {gamma} absorptiometer with an Am source, detection tests by neutron multiplication, an apparatus for the measurement of the {alpha} activity of a solution, fissionable matter detection by {gamma} emission, and fissionable matter detection by neutron emission. (author)

  13. IP2P K-means: an efficient method for data clustering on sensor networks

    Directory of Open Access Journals (Sweden)

    Peyman Mirhadi

    2013-03-01

    Full Text Available Many wireless sensor network applications require data gathering as the most important part of their operations. There are increasing demands for innovative methods to improve energy efficiency and to prolong the network lifetime. Clustering is considered an efficient topology control method in wireless sensor networks, which can increase network scalability and lifetime. This paper presents a method, IP2P K-means – Improved P2P K-means, which uses efficient leveling in its clustering approach, reduces false labeling and restricts the necessary communication among various sensors, which saves more energy. The proposed method is examined in Network Simulator Ver.2 (NS2) and the preliminary results show that the algorithm works effectively and relatively more precisely.

  14. An Entropy-Based Network Anomaly Detection Method

    Directory of Open Access Journals (Sweden)

    Przemysław Bereziński

    2015-04-01

    Full Text Available Data mining is an interdisciplinary subfield of computer science involving methods at the intersection of artificial intelligence, machine learning and statistics. One of the data mining tasks is anomaly detection, which is the analysis of large quantities of data to identify items, events or observations which do not conform to an expected pattern. Anomaly detection is applicable in a variety of domains, e.g., fraud detection, fault detection, system health monitoring, but this article focuses on the application of anomaly detection in the field of network intrusion detection. The main goal of the article is to prove that an entropy-based approach is suitable to detect modern botnet-like malware based on anomalous patterns in the network. This aim is achieved by realization of the following points: (i) preparation of a concept of an original entropy-based network anomaly detection method, (ii) implementation of the method, (iii) preparation of an original dataset, (iv) evaluation of the method.
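
    A hedged sketch of the core idea only: the Shannon entropy of a traffic feature (here destination ports per time window) collapses when one port dominates, flagging a botnet-like pattern; the port numbers, window size and threshold are all illustrative assumptions.

        import numpy as np
        from collections import Counter

        def shannon_entropy(values):
            counts = np.array(list(Counter(values).values()), dtype=float)
            p = counts / counts.sum()
            return float(-(p * np.log2(p)).sum())

        rng = np.random.default_rng(6)
        normal_window = rng.integers(1, 1024, size=500)       # varied destination ports
        attack_window = np.full(500, 445)                      # one port hammered repeatedly
        attack_window[:20] = rng.integers(1, 1024, size=20)    # a little legitimate background traffic

        for name, window in [("normal", normal_window), ("attack", attack_window)]:
            h = shannon_entropy(window.tolist())
            print(name, round(h, 2), "-> anomaly" if h < 2.0 else "-> ok")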

  15. Searching for filaments and large-scale structure around DAFT/FADA clusters

    Science.gov (United States)

    Durret, F.; Márquez, I.; Acebrón, A.; Adami, C.; Cabrera-Lavers, A.; Capelato, H.; Martinet, N.; Sarron, F.; Ulmer, M. P.

    2016-04-01

    Context. Clusters of galaxies are located at the intersection of cosmic filaments and are still accreting galaxies and groups along these preferential directions. However, because of their relatively low contrast on the sky, filaments are difficult to detect (unless a large amount of spectroscopic data are available), and unambiguous detections have been limited until now to relatively low redshifts (zDAFT/FADA survey for which we had deep wide field photometric data. For each cluster, based on a colour-magnitude diagram, we selected galaxies that were likely to belong to the red sequence, and hence to be at the cluster redshift, and built density maps. By computing the background for each of these maps and drawing 3σ contours, we estimated the elongations of the structures detected in this way. Whenever possible, we identified the other structures detected on the density maps with clusters listed in NED. Results: We find clear elongations in twelve clusters out of thirty, with sizes that can reach up to 7.6 Mpc. Eleven other clusters have neighbouring structures, but the zones linking them are not detected in the density maps at a 3σ level. Three clusters show no extended structure and no neighbours, and four clusters are of too low contrast to be clearly visible on our density maps. Conclusions: The simple method we have applied appears to work well to show the existence of filaments and/or extensions around a number of clusters in the redshift range 0.4 cluster samples such as the clusters detected in the CFHTLS and SDSS-Stripe 82 surveys in the near future. Based on our own data (see Guennou et al. 2014) and archive data obtained with MegaPrime/MegaCam, a joint project of CFHT and CEA/DAPNIA, at the Canada-France-Hawaii Telescope (CFHT) which is operated by the National Research Council (NRC) of Canada, the Institut National des Sciences de l'Univers of the Centre National de la Recherche Scientifique of France, and

  16. Detecting method for crude oil price fluctuation mechanism under different periodic time series

    International Nuclear Information System (INIS)

    Gao, Xiangyun; Fang, Wei; An, Feng; Wang, Yue

    2017-01-01

    Highlights: • We proposed the concept of autoregressive modes to indicate the fluctuation patterns. • We constructed transmission networks for studying the fluctuation mechanism. • There are different fluctuation mechanisms under different periodic time series. • Only a few types of autoregressive modes control the fluctuations in crude oil price. • There are cluster effects during the fluctuation mechanism of autoregressive modes. - Abstract: Existing literature can characterize the long-term fluctuation of crude oil price time series; however, it is difficult to detect the fluctuation mechanism specifically over the short term. Each fluctuation pattern for a short period contained in a long-term crude oil price time series has diverse dynamic characteristics; in other words, various fluctuation patterns appear in different short periods and transmit to each other, which reflects the reputedly complicated and chaotic oil market. Thus, we proposed an incorporated method to detect the fluctuation mechanism, namely the evolution of the different fluctuation patterns over time, from the complex network perspective. We divided the crude oil price time series into segments using sliding time windows, and defined autoregressive modes based on regression models to indicate the fluctuation patterns of each segment. Hence, the transmissions between different types of autoregressive modes over time form a transmission network that contains rich dynamic information. We then capture the transmission characteristics of autoregressive modes under different periodic time series through the structural features of the transmission networks. The results indicate that there are various autoregressive modes with significantly different statistical characteristics under different periodic time series. However, only a few types of autoregressive modes and transmission patterns play a major role in the fluctuation mechanism of the crude oil price, and these
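
    A rough illustration (not the authors' exact procedure) of turning a price series into a sequence of autoregressive modes, assuming Python with numpy: fit a first-order regression coefficient in each sliding window, discretise it into a mode label, and count mode-to-mode transitions as the edges of a transmission network. The window length, step and cut-off values are assumptions.

        import numpy as np
        from collections import Counter

        rng = np.random.default_rng(7)
        price = np.cumsum(rng.normal(0, 1, 400)) + 60.0       # synthetic crude-oil-like series

        window, step = 20, 5
        modes = []
        for start in range(0, len(price) - window + 1, step):
            seg = price[start:start + window]
            coef = np.polyfit(seg[:-1], seg[1:], 1)[0]        # AR(1)-style regression coefficient
            # Discretise the fitted coefficient into a small set of fluctuation modes.
            modes.append("persistent" if coef > 0.9 else "mean-reverting" if coef < 0.5 else "mixed")

        # Transitions between consecutive modes form the edges of a transmission network.
        edges = Counter(zip(modes[:-1], modes[1:]))
        for (a, b), w in edges.most_common(5):
            print(f"{a} -> {b}: {w}")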

  17. A Comparison of Methods for Player Clustering via Behavioral Telemetry

    DEFF Research Database (Denmark)

    Drachen, Anders; Thurau, C.; Sifa, R.

    2013-01-01

    The analysis of user behavior in digital games has been aided by the introduction of user telemetry in game development, which provides unprecedented access to quantitative data on user behavior from the installed game clients of the entire population of players. Player behavior telemetry datasets ... patterns in the behavioral data, and developing profiles that are actionable to game developers. There are numerous methods for unsupervised clustering of user behavior, e.g. k-means/c-means, Nonnegative Matrix Factorization, or Principal Component Analysis. Although all yield behavior categorizations ..., interpretation of the resulting categories in terms of actual play behavior can be difficult if not impossible. In this paper, a range of unsupervised techniques are applied together with Archetypal Analysis to develop behavioral clusters from playtime data of 70,014 World of Warcraft players, covering a five ...

  18. Fourth-order perturbative extension of the single-double excitation coupled-cluster method

    International Nuclear Information System (INIS)

    Derevianko, Andrei; Emmons, Erik D.

    2002-01-01

    Fourth-order many-body corrections to matrix elements for atoms with one valence electron are derived. The obtained diagrams are classified using coupled-cluster-inspired separation into contributions from n-particle excitations from the lowest-order wave function. The complete set of fourth-order diagrams involves only connected single, double, and triple excitations and disconnected quadruple excitations. Approximately half of the fourth-order diagrams are not accounted for by the popular coupled-cluster method truncated at single and double excitations (CCSD). Explicit formulas are tabulated for the entire set of fourth-order diagrams missed by the CCSD method and its linearized version, i.e., contributions from connected triple and disconnected quadruple excitations. A partial summation scheme of the derived fourth-order contributions to all orders of perturbation theory is proposed

  19. Functional connectivity analysis of the neural bases of emotion regulation: A comparison of independent component method with density-based k-means clustering method.

    Science.gov (United States)

    Zou, Ling; Guo, Qian; Xu, Yi; Yang, Biao; Jiao, Zhuqing; Xiang, Jianbo

    2016-04-29

    Functional magnetic resonance imaging (fMRI) is an important tool in neuroscience for assessing connectivity and interactions between distant areas of the brain. To find and characterize the coherent patterns of brain activity as a means of identifying brain systems for the cognitive reappraisal of the emotion task, both density-based k-means clustering and independent component analysis (ICA) methods can be applied to characterize the interactions between brain regions involved in cognitive reappraisal of emotion. Our results reveal that, compared with the ICA method, the density-based k-means clustering method provides a higher sensitivity of polymerization. In addition, it is more sensitive to relatively weak functional connection regions. Thus, the study concludes that in the process of receiving emotional stimuli, the relatively obvious activation areas are mainly distributed in the frontal lobe, cingulum and near the hypothalamus. Furthermore, the density-based k-means clustering method provides a more reliable basis for follow-up studies of brain functional connectivity.

  20. Covariance descriptor fusion for target detection

    Science.gov (United States)

    Cukur, Huseyin; Binol, Hamidullah; Bal, Abdullah; Yavuz, Fatih

    2016-05-01

    Target detection is one of the most important topics for military or civilian applications. In order to address such detection tasks, hyperspectral imaging sensors provide useful image data containing both spatial and spectral information. Target detection has various challenging scenarios for hyperspectral images. To overcome these challenges, the covariance descriptor presents many advantages. The detection capability of the conventional covariance descriptor technique can be improved by fusion methods. In this paper, hyperspectral bands are clustered according to inter-band correlation. Target detection is then realized by fusion of covariance descriptor results based on the band clusters. The proposed combination technique is denoted Covariance Descriptor Fusion (CDF). The efficiency of the CDF is evaluated by applying it to hyperspectral imagery to detect man-made objects. The obtained results show that the CDF presents better performance than the conventional covariance descriptor.
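
    A sketch of a single region covariance descriptor of the kind being fused above, assuming Python with numpy; the toy hyperspectral patch and its five bands are made up for illustration.

        import numpy as np

        rng = np.random.default_rng(11)
        patch = rng.normal(0, 1, (8, 8, 5))             # 8x8 pixels, 5 spectral bands
        features = patch.reshape(-1, 5)                  # one feature vector per pixel
        descriptor = np.cov(features, rowvar=False)      # 5x5 covariance descriptor of the region
        print(descriptor.shape)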

  1. Potential use of SERS-assisted theranostic strategy based on Fe{sub 3}O{sub 4}/Au cluster/shell nanocomposites for bio-detection, MRI, and magnetic hyperthermia

    Energy Technology Data Exchange (ETDEWEB)

    Han, Yu; Lei, Sheng-lan [Department of Biomaterials, College of Materials, Xiamen University, Xiamen 361005 (China); Lu, Jian-hua [Department of Electronic Science, College of Physical Science and Technology, Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance Research, Xiamen University, Xiamen 361005 (China); He, Yuan [Department of Biomaterials, College of Materials, Xiamen University, Xiamen 361005 (China); Chen, Zhi-wei, E-mail: chenzhiwei@xmu.edu.cn [Department of Electronic Science, College of Physical Science and Technology, Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance Research, Xiamen University, Xiamen 361005 (China); Ren, Lei, E-mail: renlei@xmu.edu.cn [Department of Biomaterials, College of Materials, Xiamen University, Xiamen 361005 (China); State Key Laboratory for Physical Chemistry of Solid Surfaces, Xiamen University, Xiamen 361005 (China); Fujian Collaborative Innovation Center for Exploitation and Utilization of Marine Biological Resources, Xiamen 361005 (China); Zhou, Xi [Department of Biomaterials, College of Materials, Xiamen University, Xiamen 361005 (China); Fujian Provincial Key Laboratory of Fire Retardant Materials, Xiamen University, Xiamen 361005 (China); Fujian Collaborative Innovation Center for Exploitation and Utilization of Marine Biological Resources, Xiamen 361005 (China)

    2016-07-01

    A surface-enhanced Raman scattering (SERS)-assisted theranostic strategy was designed based on a synthesized multifunctional Fe{sub 3}O{sub 4}/Au cluster/shell nanocomposite. This theranostic strategy was used for free prostate specific antigen (free-PSA) detection, magnetic resonance imaging (MRI), and magnetic hyperthermia. The lowest protein concentration detected was 1 ng mL{sup −1}, and the calculated limit of detection (LOD) of PSA was 0.75 ng mL{sup −1}. Then, MRI was carried out to visualize the tumor cell. Lastly, magnetic hyperthermia was employed and revealed a favorable killing effect for the tumor cells. Thus, this SERS-assisted strategy based on a Fe{sub 3}O{sub 4}/Au cluster/shell nanocomposite showed great advantages in theranostic treatment. - Graphical abstract: Fe{sub 3}O{sub 4}/Au cluster/shell composite can be used for specific protein detection, magnetic resonance imaging and magnetic hyperthermia therapy. - Highlights: • We designed a SERS-assisted theranostic strategy based on multifunctional nanocomposites using gold-shelled Fe{sub 3}O{sub 4} clusters. • Fe{sub 3}O{sub 4}/Au nanoparticles with theranostics and SERS for early diagnosis of PSA were reported for the first time. • The LOD for PSA was as low as 0.75 ng mL{sup −1}, and the total detection time was shortened to less than 1 h. • Fe{sub 3}O{sub 4} clusters had spin-spin (T{sub 2}) contrast enhancement and increased magnetic response. • Gold nanoshells supplied excellent chemical stability, biocompatibility, and better heating properties for magnetic hyperthermia.

  2. Cluster monte carlo method for nuclear criticality safety calculation

    International Nuclear Information System (INIS)

    Pei Lucheng

    1984-01-01

    One of the most important applications of the Monte Carlo method is the calculation of nuclear criticality safety. The fair source game problem was presented at almost the same time as the Monte Carlo method was first applied to calculating nuclear criticality safety. The source iteration cost may be reduced as much as possible, or no source iteration may be needed at all. These kinds of problems all belong to the fair source game problems, among which the optimal source game is the one without any source iteration. Although the single neutron Monte Carlo method solved the problem without source iteration, there is still quite an apparent shortcoming in it, namely, that it solves the problem without source iteration only in the asymptotic sense. In this work, a new Monte Carlo method called the cluster Monte Carlo method is given to solve the problem further

  3. A support vector machine approach for detection of microcalcifications.

    Science.gov (United States)

    El-Naqa, Issam; Yang, Yongyi; Wernick, Miles N; Galatsanos, Nikolas P; Nishikawa, Robert M

    2002-12-01

    In this paper, we investigate an approach based on support vector machines (SVMs) for detection of microcalcification (MC) clusters in digital mammograms, and propose a successive enhancement learning scheme for improved performance. SVM is a machine-learning method, based on the principle of structural risk minimization, which performs well when applied to data outside the training set. We formulate MC detection as a supervised-learning problem and apply SVM to develop the detection algorithm. We use the SVM to detect at each location in the image whether an MC is present or not. We tested the proposed method using a database of 76 clinical mammograms containing 1120 MCs. We use free-response receiver operating characteristic curves to evaluate detection performance, and compare the proposed algorithm with several existing methods. In our experiments, the proposed SVM framework outperformed all the other methods tested. In particular, a sensitivity as high as 94% was achieved by the SVM method at an error rate of one false-positive cluster per image. The ability of SVM to outperform several well-known methods developed for the widely studied problem of MC detection suggests that SVM is a promising technique for object detection in a medical imaging application.
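
    A hedged sketch of the supervised-learning formulation, assuming Python with scikit-learn: an SVM separates patch-level feature vectors into microcalcification vs background. The two toy features and the synthetic training points are assumptions; the paper used real mammogram patches.

        import numpy as np
        from sklearn.svm import SVC
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(8)
        # Each "patch" summarised by two toy features: local contrast and spot size.
        background = rng.normal([0.2, 1.0], 0.15, size=(200, 2))
        mc_patches = rng.normal([0.8, 3.0], 0.25, size=(200, 2))
        X = np.vstack([background, mc_patches])
        y = np.r_[np.zeros(200), np.ones(200)]

        Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
        clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(Xtr, ytr)
        print("test accuracy:", round(clf.score(Xte, yte), 3))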

  4. A Fast Density-Based Clustering Algorithm for Real-Time Internet of Things Stream

    Science.gov (United States)

    Ying Wah, Teh

    2014-01-01

    Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based methods are a prominent class of data stream clustering algorithms. They have the ability to detect arbitrary-shape clusters, to handle outliers, and they do not need the number of clusters in advance. Therefore, density-based clustering is a proper choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has a fast processing time, making it applicable to real-time applications of IoT devices. Experimental results show that the proposed approach obtains high-quality results with low computation time on real and synthetic datasets. PMID:25110753

  5. Infrared Multiple Photon Dissociation Spectroscopy Of Metal Cluster-Adducts

    Science.gov (United States)

    Cox, D. M.; Kaldor, A.; Zakin, M. R.

    1987-01-01

    Recent development of the laser vaporization technique combined with mass-selective detection has made possible new studies of the fundamental chemical and physical properties of unsupported transition metal clusters as a function of the number of constituent atoms. A variety of experimental techniques have been developed in our laboratory to measure ionization threshold energies, magnetic moments, and gas phase reactivity of clusters. However, studies have so far been unable to determine the cluster structure or the chemical state of chemisorbed species on gas phase clusters. The application of infrared multiple photon dissociation (IRMPD) to obtain the IR absorption properties of metal cluster-adsorbate species in a molecular beam is described here. Specifically, using a high power, pulsed CO2 laser as the infrared source, the IRMPD spectrum for methanol chemisorbed on small iron clusters is measured as a function of the number of both iron atoms and methanol molecules in the complex, for different methanol isotopes. Both the feasibility and potential utility of IRMPD for characterizing metal cluster-adsorbate interactions are demonstrated. The method is generally applicable to any cluster or cluster-adsorbate system, dependent only upon the availability of appropriate high power infrared sources.

  6. A Preliminary Study Application Clustering System in Acoustic Emission Monitoring

    Directory of Open Access Journals (Sweden)

    Saiful Bahari Nur Amira Afiza

    2017-01-01

    Full Text Available Acoustic Emission (AE) is a non-destructive testing technique used for damage detection and assessment in structural engineering. It can also be used to discriminate the different types of damage occurring in composite materials. The main problem associated with the data analysis is the discrimination between the different AE sources and the analysis of the AE signal in order to identify the most critical damage mechanism. Clustering analysis is a technique in which a set of objects is assigned to groups called clusters. The objective of cluster analysis is to separate a set of data into several classes that reflect the internal structure of the data. In this paper, the k-means algorithm was used as the partitional clustering method; numerous efforts have been made to improve the performance of the k-means clustering algorithm. This paper presents a current review of the application of clustering systems in Acoustic Emission.

  7. Single-cluster dynamics for the random-cluster model

    NARCIS (Netherlands)

    Deng, Y.; Qian, X.; Blöte, H.W.J.

    2009-01-01

    We formulate a single-cluster Monte Carlo algorithm for the simulation of the random-cluster model. This algorithm is a generalization of the Wolff single-cluster method for the q-state Potts model to noninteger values q>1. Its results for static quantities are in a satisfactory agreement with those

  8. GMDD: a database of GMO detection methods.

    Science.gov (United States)

    Dong, Wei; Yang, Litao; Shen, Kailin; Kim, Banghyun; Kleter, Gijs A; Marvin, Hans J P; Guo, Rong; Liang, Wanqi; Zhang, Dabing

    2008-06-04

    Since more than one hundred events of genetically modified organisms (GMOs) have been developed and approved for commercialization globally, GMO analysis methods are essential for the enforcement of GMO labelling regulations. Protein and nucleic acid-based detection techniques have been developed and utilized for GMO identification and quantification. However, information for the harmonization and standardization of GMO analysis methods at the global level is needed. The GMO Detection method Database (GMDD) has collected almost all the previously developed and reported GMO detection methods, which have been grouped by different strategies (screen-, gene-, construct-, and event-specific), and also provides a user-friendly search service of the detection methods by GMO event name, exogenous gene, or protein information, etc. In this database, users can obtain the sequences of exogenous integration, which will facilitate PCR primer and probe design. Also included is information on endogenous genes, certified reference materials, reference molecules, and the validation status of developed methods. Furthermore, registered users can also submit new detection methods and sequences to this database, and the newly submitted information will be released soon after being checked. GMDD contains comprehensive information on GMO detection methods. The database will make GMO analysis much easier.

  9. Data Clustering

    Science.gov (United States)

    Wagstaff, Kiri L.

    2012-03-01

    On obtaining a new data set, the researcher is immediately faced with the challenge of obtaining a high-level understanding from the observations. What does a typical item look like? What are the dominant trends? How many distinct groups are included in the data set, and how is each one characterized? Which observable values are common, and which rarely occur? Which items stand out as anomalies or outliers from the rest of the data? This challenge is exacerbated by the steady growth in data set size [11] as new instruments push into new frontiers of parameter space, via improvements in temporal, spatial, and spectral resolution, or by the desire to "fuse" observations from different modalities and instruments into a larger-picture understanding of the same underlying phenomenon. Data clustering algorithms provide a variety of solutions for this task. They can generate summaries, locate outliers, compress data, identify dense or sparse regions of feature space, and build data models. It is useful to note up front that "clusters" in this context refer to groups of items within some descriptive feature space, not (necessarily) to "galaxy clusters" which are dense regions in physical space. The goal of this chapter is to survey a variety of data clustering methods, with an eye toward their applicability to astronomical data analysis. In addition to improving the individual researcher’s understanding of a given data set, clustering has led directly to scientific advances, such as the discovery of new subclasses of stars [14] and gamma-ray bursts (GRBs) [38]. All clustering algorithms seek to identify groups within a data set that reflect some observed, quantifiable structure. Clustering is traditionally an unsupervised approach to data analysis, in the sense that it operates without any direct guidance about which items should be assigned to which clusters. There has been a recent trend in the clustering literature toward supporting semisupervised or constrained

  10. Consumers' Kansei Needs Clustering Method for Product Emotional Design Based on Numerical Design Structure Matrix and Genetic Algorithms.

    Science.gov (United States)

    Yang, Yan-Pu; Chen, Deng-Kai; Gu, Rong; Gu, Yu-Feng; Yu, Sui-Huai

    2016-01-01

    Consumers' Kansei needs reflect their perception about a product and always consist of a large number of adjectives. Reducing the dimension complexity of these needs to extract primary words not only enables the target product to be explicitly positioned, but also provides a convenient design basis for designers engaging in design work. Accordingly, this study employs a numerical design structure matrix (NDSM) by parameterizing a conventional DSM and integrating genetic algorithms to find optimum Kansei clusters. A four-point scale method is applied to assign link weights of every two Kansei adjectives as values of cells when constructing an NDSM. Genetic algorithms are used to cluster the Kansei NDSM and find optimum clusters. Furthermore, the process of the proposed method is presented. The details of the proposed approach are illustrated using an example of electronic scooter for Kansei needs clustering. The case study reveals that the proposed method is promising for clustering Kansei needs adjectives in product emotional design.

  11. A Spectrum Sensing Method Based on Signal Feature and Clustering Algorithm in Cognitive Wireless Multimedia Sensor Networks

    Directory of Open Access Journals (Sweden)

    Yongwei Zhang

    2017-01-01

    Full Text Available In order to solve the problem of difficulty in determining the threshold in spectrum sensing technologies based on random matrix theory, a spectrum sensing method based on a clustering algorithm and signal features is proposed for Cognitive Wireless Multimedia Sensor Networks. Firstly, the wireless communication signal features are obtained from the covariance matrix of the sampled signal. Then, a clustering algorithm is used to classify and test the signal features. Different signal features and clustering algorithms are compared in this paper. The experimental results show that the proposed method has better sensing performance.
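
    An illustrative sketch of the sensing idea, assuming Python with numpy and scikit-learn: extract simple features from the sample covariance matrix of the received signal and separate "signal present" from "noise only" slots by clustering the feature points. The antenna count, feature choices and simulated signals are assumptions, not the paper's setup.

        import numpy as np
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(10)

        def cov_features(samples):
            C = np.cov(samples)                             # sample covariance across receive channels
            eig = np.linalg.eigvalsh(C)
            return [eig[-1] / eig.sum(), eig[-1] / eig[0]]  # energy-concentration features

        feats = []
        for _ in range(40):                                 # noise-only sensing slots
            feats.append(cov_features(rng.normal(0, 1, (4, 200))))
        for _ in range(40):                                 # slots with a correlated primary signal
            s = rng.normal(0, 1, 200)
            feats.append(cov_features(np.outer(np.ones(4), s) + rng.normal(0, 1, (4, 200))))

        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(np.array(feats))
        print(np.bincount(labels[:40]), np.bincount(labels[40:]))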

  12. Digital breast tomosynthesis: computer-aided detection of clustered microcalcifications on planar projection images

    International Nuclear Information System (INIS)

    Samala, Ravi K; Chan, Heang-Ping; Lu, Yao; Hadjiiski, Lubomir M; Wei, Jun; Helvie, Mark A

    2014-01-01

    This paper describes a new approach to detect microcalcification clusters (MCs) in digital breast tomosynthesis (DBT) via its planar projection (PPJ) image. With IRB approval, two-view (cranio-caudal and mediolateral oblique views) DBTs of human subject breasts were obtained with a GE GEN2 prototype DBT system that acquires 21 projection angles spanning 60° in 3° increments. A data set of 307 volumes (154 human subjects) was divided by case into independent training (127 with MCs) and test sets (104 with MCs and 76 free of MCs). A simultaneous algebraic reconstruction technique with multiscale bilateral filtering (MSBF) regularization was used to enhance microcalcifications and suppress noise. During the MSBF regularized reconstruction, the DBT volume was separated into high frequency (HF) and low frequency components representing microcalcifications and larger structures. At the final iteration, maximum intensity projection was applied to the regularized HF volume to generate a PPJ image that contained MCs with increased contrast-to-noise ratio (CNR) and reduced search space. High CNR objects in the PPJ image were extracted and labeled as microcalcification candidates. Convolution neural network trained to recognize the image pattern of microcalcifications was used to classify the candidates into true calcifications and tissue structures and artifacts. The remaining microcalcification candidates were grouped into MCs by dynamic conditional clustering based on adaptive CNR threshold and radial distance criteria. False positive (FP) clusters were further reduced using the number of candidates in a cluster, CNR and size of microcalcification candidates. At 85% sensitivity an FP rate of 0.71 and 0.54 was achieved for view- and case-based sensitivity, respectively, compared to 2.16 and 0.85 achieved in DBT. The improvement was significant (p-value = 0.003) by JAFROC analysis. (paper)

  13. Multi-Optimisation Consensus Clustering

    Science.gov (United States)

    Li, Jian; Swift, Stephen; Liu, Xiaohui

    Ensemble Clustering has been developed to provide an alternative way of obtaining more stable and accurate clustering results. It aims to avoid the biases of individual clustering algorithms. However, it is still a challenge to develop an efficient and robust method for Ensemble Clustering. Based on an existing ensemble clustering method, Consensus Clustering (CC), this paper introduces an advanced Consensus Clustering algorithm called Multi-Optimisation Consensus Clustering (MOCC), which utilises an optimised Agreement Separation criterion and a Multi-Optimisation framework to improve the performance of CC. Fifteen different data sets are used for evaluating the performance of MOCC. The results reveal that MOCC can generate more accurate clustering results than the original CC algorithm.
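
    A minimal co-association consensus step of the kind CC and MOCC build on (illustrative only, not the MOCC optimisation itself): combine several k-means runs into an agreement matrix, then cluster that matrix; the toy data and ensemble size are assumptions.

        import numpy as np
        from sklearn.cluster import KMeans
        from scipy.cluster.hierarchy import linkage, fcluster
        from scipy.spatial.distance import squareform

        rng = np.random.default_rng(9)
        X = np.vstack([rng.normal(0, 0.4, (25, 2)), rng.normal(3, 0.4, (25, 2))])

        n = len(X)
        coassoc = np.zeros((n, n))
        for seed in range(10):                                           # ensemble of base clusterings
            labels = KMeans(n_clusters=2, n_init=1, random_state=seed).fit_predict(X)
            coassoc += (labels[:, None] == labels[None, :])
        coassoc /= 10.0

        # Consensus partition: average-linkage clustering of the co-association distances.
        dist = squareform(1.0 - coassoc, checks=False)
        consensus = fcluster(linkage(dist, method="average"), t=2, criterion="maxclust")
        print(np.bincount(consensus)[1:])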

  14. Label-free colorimetric detection of mercury via Hg2+ ions-accelerated structural transformation of nanoscale metal-oxo clusters

    Science.gov (United States)

    Chen, Kun; She, Shan; Zhang, Jiangwei; Bayaguud, Aruuhan; Wei, Yongge

    2015-11-01

    Mercury and its compounds are known to be extremely toxic but are widely distributed in the environment. Although many works on the efficient detection of mercury have been reported, simple and convenient sensors for quick analysis of mercury in water are still needed. In this work, a nanoscale metal-oxo cluster, (n-Bu4N)2[Mo5NaO13(OCH3)4(NO)] (MLPOM), organically derivatized from monolacunary Lindqvist-type polyoxomolybdate, is found to specifically react with Hg2+ in methanol/water via structural transformation. The MLPOM methanol solution displays a color change from purple to brown within seconds after being mixed with an aqueous solution containing Hg2+. By comparing the structure of the polyoxomolybdate before and after the reaction, the color change is revealed to arise essentially from the structural transformation of MLPOM accelerated by Hg2+. Based on this discovery, MLPOM could be utilized as a colorimetric sensor to sense the existence of Hg2+, and a simple and label-free method is developed to selectively detect aqueous Hg2+. Furthermore, the colorimetric sensor has been applied to indicating mercury contamination in industrial sewage.

  15. Statistical issues in galaxy cluster cosmology

    DEFF Research Database (Denmark)

    Mantz, Adam; Allen, Steven W.; Rapetti Serra, David Angelo

    2013-01-01

    The number and growth of massive galaxy clusters is a sensitive probe of cosmological structure formation and dark energy. Surveys at various wavelengths can detect clusters to high redshift, but the fact that cluster mass is not directly observable complicates matters, requiring us to simultaneo...

  16. A Novel Double Cluster and Principal Component Analysis-Based Optimization Method for the Orbit Design of Earth Observation Satellites

    Directory of Open Access Journals (Sweden)

    Yunfeng Dong

    2017-01-01

    Full Text Available The weighted sum and genetic algorithm-based hybrid method (WSGA-based HM, which has been applied to multiobjective orbit optimizations, is negatively influenced by human factors through the artificial choice of the weight coefficients in weighted sum method and the slow convergence of GA. To address these two problems, a cluster and principal component analysis-based optimization method (CPC-based OM is proposed, in which many candidate orbits are gradually randomly generated until the optimal orbit is obtained using a data mining method, that is, cluster analysis based on principal components. Then, the second cluster analysis of the orbital elements is introduced into CPC-based OM to improve the convergence, developing a novel double cluster and principal component analysis-based optimization method (DCPC-based OM. In DCPC-based OM, the cluster analysis based on principal components has the advantage of reducing the human influences, and the cluster analysis based on six orbital elements can reduce the search space to effectively accelerate convergence. The test results from a multiobjective numerical benchmark function and the orbit design results of an Earth observation satellite show that DCPC-based OM converges more efficiently than WSGA-based HM. And DCPC-based OM, to some degree, reduces the influence of human factors presented in WSGA-based HM.

  17. A SOM clustering pattern sequence-based next symbol prediction method for day-ahead direct electricity load and price forecasting

    International Nuclear Information System (INIS)

    Jin, Cheng Hao; Pok, Gouchol; Lee, Yongmi; Park, Hyun-Woo; Kim, Kwang Deuk; Yun, Unil; Ryu, Keun Ho

    2015-01-01

    Highlights: • A novel pattern sequence-based direct time series forecasting method is proposed. • Thanks to SOM's topology-preserving property, only SOM can be applied. • SCPSNSP deals only with cluster patterns, not with each specific time series value. • SCPSNSP performs better than recently developed forecasting algorithms. - Abstract: In this paper, we propose a new day-ahead direct time series forecasting method for competitive electricity markets based on clustering and next-symbol prediction. In the clustering step, pattern sequences and their topology relations are obtained from self-organizing map (SOM) time series clustering. In the next-symbol prediction step, with each cluster label in the pattern sequence represented as a pair of its topologically identical coordinates, an artificial neural network is used to predict the topological coordinates of the next day by training on the relationship between the previous daily pattern sequence and its next-day pattern. According to the obtained topology relations, the nearest nonzero-hits pattern is assigned to the next day, so that the whole time series can be forecasted directly from the assigned cluster pattern. The proposed method was evaluated on the Spanish, Australian and New York electricity markets and compared with PSF and some of the most recently published forecasting methods. Experimental results show that the proposed method outperforms the best of these forecasting methods by at least 3.64%
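
    The pattern-sequence idea above can be sketched with a deliberately simplified stand-in: here daily profiles are clustered with k-means instead of a SOM, the recent sequence of cluster labels is matched against history, and the centroid of the most frequent follower cluster is used as the next-day forecast. This is an assumed illustration of pattern-sequence forecasting in general, not the paper's SOM plus neural-network scheme.

```python
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans


def forecast_next_day(daily_profiles, n_clusters=8, window=3):
    """daily_profiles: (n_days, n_hours) array of past daily load curves."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(daily_profiles)
    labels = km.labels_
    recent = tuple(labels[-window:])
    # collect the cluster that historically followed the same label window
    followers = [labels[i + window]
                 for i in range(len(labels) - window)
                 if tuple(labels[i:i + window]) == recent]
    next_label = Counter(followers).most_common(1)[0][0] if followers else labels[-1]
    return km.cluster_centers_[next_label]
```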

  18. Cluster fusion algorithm: application to Lennard-Jones clusters

    DEFF Research Database (Denmark)

    Solov'yov, Ilia; Solov'yov, Andrey V.; Greiner, Walter

    2006-01-01

    We present a new general theoretical framework for modelling the cluster structure and apply it to the description of the Lennard-Jones clusters. Starting from the initial tetrahedral cluster configuration, adding new atoms to the system and absorbing its energy at each step, we find cluster growing paths up to the cluster size of 150 atoms. We demonstrate that in this way all known global minima structures of the Lennard-Jones clusters can be found. Our method provides an efficient tool for the calculation and analysis of atomic cluster structure. With its use we justify the magic number sequence for the clusters of noble gas atoms and compare it with experimental observations. We report the striking correspondence of the peaks in the dependence of the second derivative of the binding energy per atom on cluster size calculated for the chain of the Lennard-Jones clusters based on the icosahedral symmetry...
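
    The quantity being optimised in work like this is the total pairwise Lennard-Jones energy of a cluster; the sketch below evaluates it in reduced units. It is a standard textbook function offered only as background, not the paper's cluster fusion algorithm, and the function name and units are assumptions.

```python
import numpy as np


def lennard_jones_energy(coords, epsilon=1.0, sigma=1.0):
    """Total pairwise Lennard-Jones energy of a cluster.

    coords: (n_atoms, 3) array of positions in reduced units.
    """
    energy = 0.0
    n = len(coords)
    for i in range(n - 1):
        for j in range(i + 1, n):
            r = np.linalg.norm(coords[i] - coords[j])
            sr6 = (sigma / r) ** 6
            energy += 4.0 * epsilon * (sr6 ** 2 - sr6)
    return energy

# The binding energy per atom, E(n)/n, and its second finite difference
# E(n+1) + E(n-1) - 2*E(n) are the quantities whose peaks trace magic numbers.
```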

  19. Cluster fusion algorithm: application to Lennard-Jones clusters

    DEFF Research Database (Denmark)

    Solov'yov, Ilia; Solov'yov, Andrey V.; Greiner, Walter

    2008-01-01

    We present a new general theoretical framework for modelling the cluster structure and apply it to the description of the Lennard-Jones clusters. Starting from the initial tetrahedral cluster configuration, adding new atoms to the system and absorbing its energy at each step, we find cluster growing paths up to the cluster size of 150 atoms. We demonstrate that in this way all known global minima structures of the Lennard-Jones clusters can be found. Our method provides an efficient tool for the calculation and analysis of atomic cluster structure. With its use we justify the magic number sequence for the clusters of noble gas atoms and compare it with experimental observations. We report the striking correspondence of the peaks in the dependence of the second derivative of the binding energy per atom on cluster size calculated for the chain of the Lennard-Jones clusters based on the icosahedral symmetry...

  20. The IMACS Cluster Building Survey. I. Description of the Survey and Analysis Methods

    Science.gov (United States)

    Oemler Jr., Augustus; Dressler, Alan; Gladders, Michael G.; Rigby, Jane R.; Bai, Lei; Kelson, Daniel; Villanueva, Edward; Fritz, Jacopo; Rieke, George; Poggianti, Bianca M.

    2013-01-01

    The IMACS Cluster Building Survey uses the wide field spectroscopic capabilities of the IMACS spectrograph on the 6.5 m Baade Telescope to survey the large-scale environment surrounding rich intermediate-redshift clusters of galaxies. The goal is to understand the processes which may be transforming star-forming field galaxies into quiescent cluster members as groups and individual galaxies fall into the cluster from the surrounding supercluster. This first paper describes the survey: the data taking and reduction methods. We provide new calibrations of star formation rates (SFRs) derived from optical and infrared spectroscopy and photometry. We demonstrate that there is a tight relation between the observed SFR per unit B luminosity, and the ratio of the extinctions of the stellar continuum and the optical emission lines. With this, we can obtain accurate extinction-corrected colors of galaxies. Using these colors as well as other spectral measures, we determine new criteria for the existence of ongoing and recent starbursts in galaxies.

  1. THE IMACS CLUSTER BUILDING SURVEY. I. DESCRIPTION OF THE SURVEY AND ANALYSIS METHODS

    Energy Technology Data Exchange (ETDEWEB)

    Oemler, Augustus Jr.; Dressler, Alan; Kelson, Daniel; Villanueva, Edward [Observatories of the Carnegie Institution for Science, 813 Santa Barbara St., Pasadena, CA 91101-1292 (United States); Gladders, Michael G. [Department of Astronomy and Astrophysics, University of Chicago, Chicago, IL 60637 (United States); Rigby, Jane R. [Observational Cosmology Lab, NASA Goddard Space Flight Center, Greenbelt, MD 20771 (United States); Bai Lei [Department of Astronomy and Astrophysics, University of Toronto, 50 St. George Street, Toronto, ON M5S 3H4 (Canada); Fritz, Jacopo [Sterrenkundig Observatorium, Universiteit Gent, Krijgslaan 281 S9, B-9000 Gent (Belgium); Rieke, George [Steward Observatory, University of Arizona, Tucson, AZ 8572 (United States); Poggianti, Bianca M.; Vulcani, Benedetta, E-mail: oemler@obs.carnegiescience.edu [INAF-Osservatorio Astronomico di Padova, Vicolo dell' Osservatorio 5, I-35122 Padova (Italy)

    2013-06-10

    The IMACS Cluster Building Survey uses the wide field spectroscopic capabilities of the IMACS spectrograph on the 6.5 m Baade Telescope to survey the large-scale environment surrounding rich intermediate-redshift clusters of galaxies. The goal is to understand the processes which may be transforming star-forming field galaxies into quiescent cluster members as groups and individual galaxies fall into the cluster from the surrounding supercluster. This first paper describes the survey: the data taking and reduction methods. We provide new calibrations of star formation rates (SFRs) derived from optical and infrared spectroscopy and photometry. We demonstrate that there is a tight relation between the observed SFR per unit B luminosity, and the ratio of the extinctions of the stellar continuum and the optical emission lines. With this, we can obtain accurate extinction-corrected colors of galaxies. Using these colors as well as other spectral measures, we determine new criteria for the existence of ongoing and recent starbursts in galaxies.

  2. A new fault detection method for computer networks

    International Nuclear Information System (INIS)

    Lu, Lu; Xu, Zhengguo; Wang, Wenhai; Sun, Youxian

    2013-01-01

    Over the past few years, fault detection for computer networks has attracted extensive attention because of its importance in network management. Most existing fault detection methods are based on active probing techniques, which can detect the occurrence of faults quickly and precisely. However, these methods suffer from the traffic overhead they induce, especially in large-scale networks. To relieve the traffic overhead induced by active probing-based methods, a new fault detection method is proposed in this paper; its key idea is to divide the detection process into multiple stages. During each stage, only a small region of the network is examined using a small set of probes, while ensuring that the entire network is covered after all detection stages. This method guarantees that the probe traffic in each detection stage is sufficiently small that the network can operate without severe disturbance from the probes. Several simulation results verify the effectiveness of the proposed method

  3. CLUSTER LENSING PROFILES DERIVED FROM A REDSHIFT ENHANCEMENT OF MAGNIFIED BOSS-SURVEY GALAXIES

    International Nuclear Information System (INIS)

    Coupon, Jean; Umetsu, Keiichi; Broadhurst, Tom

    2013-01-01

    We report the first detection of a redshift-depth enhancement of background galaxies magnified by foreground clusters. Using 300,000 BOSS survey galaxies with accurate spectroscopic redshifts, we measure their mean redshift depth behind four large samples of optically selected clusters from the Sloan Digital Sky Survey (SDSS) surveys, totaling 5000-15,000 clusters. A clear trend of increasing mean redshift toward the cluster centers is found, averaged over each of the four cluster samples. In addition, we find similar but noisier behavior for an independent X-ray sample of 158 clusters lying in the foreground of the current BOSS sky area. By adopting the mass-richness relationships appropriate for each survey, we compare our results with theoretical predictions for each of the four SDSS cluster catalogs. The radial form of this redshift enhancement is well fitted by a richness-to-mass weighted composite Navarro-Frenk-White profile with an effective mass ranging between M200 ∼ 1.4-1.8 × 10^14 M☉ for the optically detected cluster samples, and M200 ∼ 5.0 × 10^14 M☉ for the X-ray sample. This lensing detection helps to establish the credibility of these SDSS cluster surveys, and provides a normalization for their respective mass-richness relations. In the context of the upcoming bigBOSS, Subaru Prime Focus Spectrograph, and EUCLID-NISP spectroscopic surveys, this method represents an independent means of deriving the masses of cluster samples for examining the cosmological evolution, and provides a relatively clean consistency check of weak-lensing measurements, free from the systematic limitations of shear calibration

  4. Unsupervised Learning (Clustering) of Odontocete Echolocation Clicks

    Science.gov (United States)

    2015-09-30

    develop methods for clustering of marine mammal echolocation clicks to learn about species assemblages where little or no prior knowledge exists about ... Mexico or the Atlantic. APPROACH: Acoustic encounters with odontocetes are detected automatically and noise-corrected cepstral features ... Estimation of Marine Mammals Using Passive Acoustic Monitoring (DCLDE). KL divergence maps were created for all known species, but the sperm whale
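
    The fragment above mentions Kullback-Leibler divergence maps between species. The sketch below shows the standard KL divergence between two discrete distributions, for example histograms of click features; the smoothing constant and function name are assumptions, and the symmetrised variant noted in the comment is a common convention rather than the report's stated choice.

```python
import numpy as np


def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(P || Q) between two discrete
    distributions, e.g. feature histograms for two click types."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

# A symmetrised variant, 0.5 * (D(P||Q) + D(Q||P)), is often used when the
# divergence is mapped pairwise across species or click clusters.
```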

  5. Scalable Static and Dynamic Community Detection Using Grappolo

    Energy Technology Data Exchange (ETDEWEB)

    Halappanavar, Mahantesh; Lu, Hao; Kalyanaraman, Anantharaman; Tumeo, Antonino

    2017-09-12

    Graph clustering, popularly known as community detection, is a fundamental kernel for several applications of relevance to the Defense Advanced Research Projects Agency’s (DARPA) Hierarchical Identify Verify Exploit (HIVE) Program. Clusters or communities represent natural divisions within a network that are densely connected within a cluster and sparsely connected to the rest of the network. The need to compute clustering on large scale data necessitates the development of efficient algorithms that can exploit modern architectures that are fundamentally parallel in nature. However, due to their irregular and inherently sequential nature, many of the current algorithms for community detection are challenging to parallelize. In response to the HIVE Graph Challenge, we present several parallelization heuristics for fast community detection using the Louvain method as the serial template. We implement all the heuristics in a software library called Grappolo. Using the inputs from the HIVE Challenge, we demonstrate superior performance and high quality solutions based on four parallelization heuristics. We use Grappolo on static graphs as the first step towards community detection on streaming graphs.
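
    For readers unfamiliar with the serial template mentioned above, the snippet below runs the Louvain method on a small example graph using the implementation shipped with recent networkx releases (2.8+). This is only a serial illustration under those assumptions, not Grappolo or its parallel heuristics.

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities, modularity

# Serial Louvain community detection on a toy graph
G = nx.karate_club_graph()
communities = louvain_communities(G, seed=42)
print(f"{len(communities)} communities, "
      f"modularity = {modularity(G, communities):.3f}")
```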

  6. Applying spatial analysis tools in public health: an example using SaTScan to detect geographic targets for colorectal cancer screening interventions.

    Science.gov (United States)

    Sherman, Recinda L; Henry, Kevin A; Tannenbaum, Stacey L; Feaster, Daniel J; Kobetz, Erin; Lee, David J

    2014-03-20

    Epidemiologists are gradually incorporating spatial analysis into health-related research as geocoded cases of disease become widely available and health-focused geospatial computer applications are developed. One health-focused application of spatial analysis is cluster detection. Using cluster detection to identify geographic areas with high-risk populations and then screening those populations for disease can improve cancer control. SaTScan is a free cluster-detection software application used by epidemiologists around the world to describe spatial clusters of infectious and chronic disease, as well as disease vectors and risk factors. The objectives of this article are to describe how spatial analysis can be used in cancer control to detect geographic areas in need of colorectal cancer screening intervention, identify issues commonly encountered by SaTScan users, detail how to select the appropriate methods for using SaTScan, and explain how method selection can affect results. As an example, we used various methods to detect areas in Florida where the population is at high risk for late-stage diagnosis of colorectal cancer. We found that much of our analysis was underpowered and that no single method detected all clusters of statistical or public health significance. However, all methods detected 1 area as high risk; this area is potentially a priority area for a screening intervention. Cluster detection can be incorporated into routine public health operations, but the challenge is to identify areas in which the burden of disease can be alleviated through public health intervention. Reliance on SaTScan's default settings does not always produce pertinent results.
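
    The spatial scan statistic that SaTScan maximises over candidate zones can be illustrated with Kulldorff's Poisson model. The sketch below evaluates one common form of the log-likelihood ratio for a single candidate zone; the argument names are invented for the example, this is not SaTScan's code, and in practice significance is assessed by Monte Carlo replication over many zones.

```python
import numpy as np


def poisson_scan_llr(c, e, C, E):
    """Log-likelihood ratio of a Poisson spatial scan statistic for one
    candidate zone.

    c, e: observed and expected case counts inside the candidate zone
    C, E: observed and expected case counts in the whole study region
    """
    if e <= 0 or E <= e or c / e <= (C - c) / (E - e):
        return 0.0                      # not a high-rate zone
    inside = c * np.log(c / e)
    outside = 0.0 if C == c else (C - c) * np.log((C - c) / (E - e))
    return inside + outside - C * np.log(C / E)
```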

  7. Epidemiological analysis of Salmonella clusters identified by whole genome sequencing, England and Wales 2014.

    Science.gov (United States)

    Waldram, Alison; Dolan, Gayle; Ashton, Philip M; Jenkins, Claire; Dallman, Timothy J

    2018-05-01

    The unprecedented level of bacterial strain discrimination provided by whole genome sequencing (WGS) presents new challenges with respect to the utility and interpretation of the data. Whole genome sequences from 1445 isolates of Salmonella belonging to the most commonly identified serotypes in England and Wales isolated between April and August 2014 were analysed. Single linkage single nucleotide polymorphism thresholds at the 10, 5 and 0 level were explored for evidence of epidemiological links between clustered cases. Analysis of the WGS data organised 566 of the 1445 isolates into 32 clusters of five or more. A statistically significant epidemiological link was identified for 17 clusters. The clusters were associated with foreign travel (n = 8), consumption of Chinese takeaways (n = 4), chicken eaten at home (n = 2), and one each of the following; eating out, contact with another case in the home and contact with reptiles. In the same time frame, one cluster was detected using traditional outbreak detection methods. WGS can be used for the highly specific and highly sensitive detection of biologically related isolates when epidemiological links are obscured. Improvements in the collection of detailed, standardised exposure information would enhance cluster investigations.
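
    The single-linkage SNP-threshold clustering described above can be sketched as follows, given a symmetric matrix of pairwise SNP distances between isolates. The 10, 5 and 0 SNP thresholds come from the abstract; the use of scipy's hierarchical clustering and the function name are assumptions, not the authors' pipeline.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform


def snp_clusters(snp_distance_matrix, threshold=10):
    """Single-linkage clusters of isolates at a fixed SNP threshold."""
    condensed = squareform(np.asarray(snp_distance_matrix), checks=False)
    Z = linkage(condensed, method="single")
    # isolates joined by chains of <= threshold SNPs share a cluster id
    return fcluster(Z, t=threshold, criterion="distance")
```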

  8. An Unexpected Detection of Bifurcated Blue Straggler Sequences in the Young Globular Cluster NGC 2173

    Science.gov (United States)

    Li, Chengyuan; Deng, Licai; de Grijs, Richard; Jiang, Dengkai; Xin, Yu

    2018-03-01

    The bifurcated patterns in the color–magnitude diagrams of blue straggler stars (BSSs) have attracted significant attention. This type of special (but rare) pattern of two distinct blue straggler sequences is commonly interpreted as evidence that cluster core-collapse-driven stellar collisions are an efficient formation mechanism. Here, we report the detection of a bifurcated blue straggler distribution in a young Large Magellanic Cloud cluster, NGC 2173. Because of the cluster’s low central stellar number density and its young age, dynamical analysis shows that stellar collisions alone cannot explain the observed BSSs. Therefore, binary evolution is instead the most viable explanation of the origin of these BSSs. However, the reason why binary evolution would render the color–magnitude distribution of BSSs bifurcated remains unclear. C. Li, L. Deng, and R. de Grijs jointly designed this project.

  9. A comparison of confidence interval methods for the intraclass correlation coefficient in community-based cluster randomization trials with a binary outcome.

    Science.gov (United States)

    Braschel, Melissa C; Svec, Ivana; Darlington, Gerarda A; Donner, Allan

    2016-04-01

    Many investigators rely on previously published point estimates of the intraclass correlation coefficient rather than on their associated confidence intervals to determine the required size of a newly planned cluster randomized trial. Although confidence interval methods for the intraclass correlation coefficient that can be applied to community-based trials have been developed for a continuous outcome variable, fewer methods exist for a binary outcome variable. The aim of this study is to evaluate confidence interval methods for the intraclass correlation coefficient applied to binary outcomes in community intervention trials enrolling a small number of large clusters. Existing methods for confidence interval construction are examined and compared to a new ad hoc approach based on dividing clusters into a large number of smaller sub-clusters and subsequently applying existing methods to the resulting data. Monte Carlo simulation is used to assess the width and coverage of confidence intervals for the intraclass correlation coefficient based on Smith's large sample approximation of the standard error of the one-way analysis of variance estimator, an inverted modified Wald test for the Fleiss-Cuzick estimator, and intervals constructed using a bootstrap-t applied to a variance-stabilizing transformation of the intraclass correlation coefficient estimate. In addition, a new approach is applied in which clusters are randomly divided into a large number of smaller sub-clusters with the same methods applied to these data (with the exception of the bootstrap-t interval, which assumes large cluster sizes). These methods are also applied to a cluster randomized trial on adolescent tobacco use for illustration. When applied to a binary outcome variable in a small number of large clusters, existing confidence interval methods for the intraclass correlation coefficient provide poor coverage. However, confidence intervals constructed using the new approach combined with Smith
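
    The point estimator around which the confidence intervals above are built is the one-way analysis-of-variance estimator of the intraclass correlation coefficient, which also applies to a binary outcome. The sketch below computes that estimator for unequal cluster sizes; attaching an interval (Smith's large-sample variance, the bootstrap-t, and so on) is not shown, and the function name and input format are assumptions.

```python
import numpy as np


def anova_icc(cluster_outcomes):
    """One-way ANOVA estimator of the intraclass correlation coefficient.

    cluster_outcomes: list of 1-D arrays, one array of 0/1 values per cluster.
    """
    ys = [np.asarray(y, dtype=float) for y in cluster_outcomes]
    k = len(ys)
    sizes = np.array([len(y) for y in ys], dtype=float)
    N = sizes.sum()
    grand_mean = np.concatenate(ys).mean()
    msb = sum(n * (y.mean() - grand_mean) ** 2 for n, y in zip(sizes, ys)) / (k - 1)
    msw = sum(((y - y.mean()) ** 2).sum() for y in ys) / (N - k)
    m0 = (N - (sizes ** 2).sum() / N) / (k - 1)   # adjusted average cluster size
    return (msb - msw) / (msb + (m0 - 1.0) * msw)
```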

  10. Temporal Methods to Detect Content-Based Anomalies in Social Media

    Energy Technology Data Exchange (ETDEWEB)

    Skryzalin, Jacek [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Field, Jr., Richard [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Fisher, Andrew N. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Bauer, Travis L. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2017-11-01

    Here, we develop a method for time-dependent topic tracking and meme trending in social media. Our objective is to identify time periods whose content differs significantly from normal, and we utilize two techniques to do so. The first is an information-theoretic analysis of the distributions of terms emitted during different periods of time. In the second, we cluster documents from each time period and analyze the tightness of each clustering. We also discuss a method of combining the scores created by each technique, and we provide ample empirical analysis of our methodology on various Twitter datasets.
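
    The second technique above, scoring the tightness of each period's clustering, could be sketched roughly as follows: documents from a time window are clustered and summarised by a mean silhouette score, and windows whose score departs strongly from the overall mean are flagged. The choice of k-means, the silhouette measure, the z-score rule, and the names are all assumptions made for the sketch, not the authors' exact scoring.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def period_tightness(doc_vectors, n_clusters=5):
    """Mean silhouette score of a clustering of one time period's documents."""
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=0).fit_predict(doc_vectors)
    return silhouette_score(doc_vectors, labels)


def flag_anomalous_periods(scores, z=2.0):
    """Indices of periods whose tightness deviates by more than z std devs."""
    scores = np.asarray(scores, dtype=float)
    mu, sd = scores.mean(), scores.std()
    return np.where(np.abs(scores - mu) > z * sd)[0]
```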

  11. Unequal cluster sizes in stepped-wedge cluster randomised trials: a systematic review.

    Science.gov (United States)

    Kristunas, Caroline; Morris, Tom; Gray, Laura

    2017-11-15

    To investigate the extent to which cluster sizes vary in stepped-wedge cluster randomised trials (SW-CRT) and whether any variability is accounted for during the sample size calculation and analysis of these trials. Setting: any, not limited to healthcare. Participants: any cluster taking part in an SW-CRT published up to March 2016. The primary outcome is the variability in cluster sizes, measured by the coefficient of variation (CV) in cluster size. Secondary outcomes include the difference between the cluster sizes assumed during the sample size calculation and those observed during the trial, any reported variability in cluster sizes and whether the methods of sample size calculation and methods of analysis accounted for any variability in cluster sizes. Of the 101 included SW-CRTs, 48% mentioned that the included clusters were known to vary in size, yet only 13% of these accounted for this during the calculation of the sample size. However, 69% of the trials did use a method of analysis appropriate for when clusters vary in size. Full trial reports were available for 53 trials. The CV was calculated for 23 of these: the median CV was 0.41 (IQR: 0.22-0.52). Actual cluster sizes could be compared with those assumed during the sample size calculation for 14 (26%) of the trial reports; the cluster sizes were between 29% and 480% of that which had been assumed. Cluster sizes often vary in SW-CRTs. Reporting of SW-CRTs also remains suboptimal. The effect of unequal cluster sizes on the statistical power of SW-CRTs needs further exploration and methods appropriate to studies with unequal cluster sizes need to be employed.
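
    The coefficient of variation of cluster sizes used as the primary outcome above is simply the standard deviation of the cluster sizes divided by their mean. A one-function sketch, with made-up example sizes, is given below.

```python
import numpy as np


def cluster_size_cv(cluster_sizes):
    """Coefficient of variation of cluster sizes: sd / mean."""
    sizes = np.asarray(cluster_sizes, dtype=float)
    return sizes.std(ddof=1) / sizes.mean()

# Example (hypothetical sizes): [40, 55, 120, 80, 30] gives a CV of about 0.55.
print(cluster_size_cv([40, 55, 120, 80, 30]))
```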

  12. Thermodynamics of non-ideal QGP using Mayers cluster expansion method

    International Nuclear Information System (INIS)

    Prasanth, J.P; Simji, P.; Bannur, Vishnu M.

    2013-01-01

    The quark-gluon plasma (QGP) is the state in which individual hadrons dissolve into a system of free (or almost free) quarks and gluons in a strongly compressed system at high temperature. The present paper aims to calculate the critical temperature at which a non-ideal three-quark plasma condenses into droplets of three quarks (i.e., into a liquid of baryons) using Mayer's cluster expansion method
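
    As general background on the expansion machinery named above, the sketch below numerically evaluates the lowest-order Mayer cluster integral, the second virial coefficient B2(T) = -2π ∫ (exp(-u(r)/kT) - 1) r² dr, for a generic Lennard-Jones-like pair potential in reduced units. This is a standard illustration under those assumptions, not the paper's three-quark calculation or its interaction potential.

```python
import numpy as np
from scipy.integrate import quad


def second_virial(potential, temperature, r_max=50.0):
    """Second virial coefficient from the lowest-order Mayer cluster integral."""
    integrand = lambda r: (np.exp(-potential(r) / temperature) - 1.0) * r ** 2
    value, _ = quad(integrand, 1e-6, r_max, limit=200)
    return -2.0 * np.pi * value


# generic Lennard-Jones-like pair potential in reduced units (an assumption)
lj = lambda r, eps=1.0, sig=1.0: 4.0 * eps * ((sig / r) ** 12 - (sig / r) ** 6)
print(second_virial(lj, temperature=1.5))
```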

  13. Statistical Issues in Galaxy Cluster Cosmology

    Science.gov (United States)

    Mantz, Adam

    2013-01-01

    The number and growth of massive galaxy clusters are sensitive probes of cosmological structure formation. Surveys at various wavelengths can detect clusters to high redshift, but the fact that cluster mass is not directly observable complicates matters, requiring us to simultaneously constrain scaling relations of observable signals with mass. The problem can be cast as one of regression, in which the data set is truncated, the (cosmology-dependent) underlying population must be modeled, and strong, complex correlations between measurements often exist. Simulations of cosmological structure formation provide a robust prediction for the number of clusters in the Universe as a function of mass and redshift (the mass function), but they cannot reliably predict the observables used to detect clusters in sky surveys (e.g. X-ray luminosity). Consequently, observers must constrain observable-mass scaling relations using additional data, and use the scaling relation model in conjunction with the mass function to predict the number of clusters as a function of redshift and luminosity.

  14. Cosmology with clusters in the CMB

    International Nuclear Information System (INIS)

    Majumdar, Subhabrata

    2008-01-01

    Ever since the seminal work by Sunyaev and Zel'dovich describing the distortion of the CMB spectrum due to photons passing through the hot intracluster gas on their way to us from the surface of last scattering (the so-called Sunyaev-Zel'dovich effect, SZE), small-scale distortions of the CMB by clusters have been used both to detect clusters and to do cosmology with them. Cosmology with clusters in the CMB can be divided into three distinct regimes: (a) when the clusters are completely unresolved and contribute to the secondary CMB distortion power spectrum at small angular scales; (b) when we can just about resolve the clusters, so that they can be detected through their total SZE flux and hence tagged and counted for cosmology; and (c) when we can completely resolve the clusters, so that we can measure their sizes and other structural properties and their evolution with redshift. In this article, we take a look at these three aspects of SZE cluster studies and their implications for using clusters as cosmological probes. We show that clusters can be used as effective probes of cosmology when, in all three cases, one explores the synergy between cluster physics and cosmology and takes clues about cluster physics from the latest high-precision cluster observations (for example, from Chandra and XMM-Newton). As a specific case, we show how an observationally motivated cluster SZ template can explain the CBI excess without the need for a high σ8. We also briefly discuss 'self-calibration' in cluster surveys and the prospect of using clusters as an ensemble of cosmic rulers to break degeneracies arising in cluster cosmology.

  15. Clusters of Multidrug-Resistant Mycobacterium tuberculosis Cases, Europe

    Science.gov (United States)

    Kremer, Kristin; Heersma, Herre; Van Soolingen, Dick

    2009-01-01

    Molecular surveillance of multidrug-resistant tuberculosis (MDR TB) was implemented in Europe as case reporting in 2005. For all new MDR TB cases detected from January 2003 through June 2007, countries reported case-based epidemiologic data and DNA fingerprint patterns of MDR TB strains when available. International clusters were detected and analyzed. From 2003 through mid-2007 in Europe, 2,494 cases of MDR TB were reported from 24 European countries. Epidemiologic and molecular data were linked for 593 (39%) cases, and 672 insertion sequence 6110 DNA fingerprint patterns were reported from 19 countries. Of these patterns, 288 (43%) belonged to 18 European clusters; 7 clusters (242/288 cases, 84%) were characterized by strains of the Beijing genotype family, including the largest cluster (175/288 cases, 61%). Both clustering and the Beijing genotype were associated with strains originating in eastern European countries. Molecular cluster detection contributes to identification of transmission profile, risk factors, and control measures. PMID:19624920

  16. Mass spectrometric production of heterogeneous metal clusters using Knudsen cell

    Directory of Open Access Journals (Sweden)

    Veljković Filip M.

    2016-01-01

    Full Text Available Knudsen effusion mass spectrometry, a high-temperature mass spectrometric method, has for decades provided new information about the saturated vapor of low-volatility compounds and has been important in the discovery of many new molecules, radicals, ions and clusters present in the gas phase. Since the pioneering works until now, this method has been successfully applied to a large number of systems (ores, oxides, ceramics, glass materials, borides, carbides, sulfides, nitrates, metals, fullerenes, etc.), which has led to the establishment of various research branches such as the chemistry of clusters. This paper describes the basic principles of Knudsen cell use, both for the identification of chemical species created in the process of evaporation and for the determination of their ionization energies. Depending on the intensities of the detected ions and the partial pressure of each gaseous component, as well as on the changes in partial pressure with temperature, Knudsen cell mass spectrometry enables the determination of thermodynamic parameters of the studied system. Special attention is paid to its application in the field of small heterogeneous and homogeneous clusters of alkali metals. Furthermore, experimental results for the thermodynamic parameters of some clusters, as well as the capabilities of non-standard ways of using Knudsen cells in the synthesis of new clusters, are presented herein. [Project of the Ministry of Science of the Republic of Serbia, no. 172019]

  17. Research of the Space Clustering Method for the Airport Noise Data Minings

    Directory of Open Access Journals (Sweden)

    Jiwen Xie

    2014-03-01

    Full Text Available Mining the distribution pattern and evolution of airport noise from the airport noise data and the geographic information of the monitoring points is of great significance for the scientific and rational governance of the airport noise pollution problem. However, most traditional clustering methods are based either on the closeness of spatial location or on the similarity of non-spatial features, which splits the duality of spatial elements, so that the clustering result has difficulty satisfying both the closeness of spatial location and the similarity of non-spatial features. This paper therefore proposes a spatial clustering algorithm based on a dual distance. The algorithm uses a distance function that combines spatial and non-spatial features as the similarity measure. The experimental results show that the proposed algorithm can discover the noise distribution pattern around the airport effectively.
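
    One simple way to realise a dual-distance similarity of the kind described above is a weighted combination of the spatial distance between monitoring points and the distance between their non-spatial noise attributes, as sketched below. The weight alpha, the standardisation, and the use of hierarchical clustering are assumptions for the sketch, not the paper's exact algorithm.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster


def dual_distance_clusters(coords, attributes, alpha=0.5, n_clusters=5):
    """Cluster points on a weighted mix of spatial and attribute distances.

    coords: (n, 2) point locations; attributes: (n, d) non-spatial features.
    """
    d_space = pdist(StandardScaler().fit_transform(coords))
    d_attr = pdist(StandardScaler().fit_transform(attributes))
    combined = alpha * d_space + (1.0 - alpha) * d_attr
    Z = linkage(combined, method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")
```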

  18. New Target for an Old Method: Hubble Measures Globular Cluster Parallax

    Science.gov (United States)

    Hensley, Kerry

    2018-05-01

    Measuring precise distances to faraway objects has long been a challenge in astrophysics. Now, one of the earliest techniques used to measure the distance to astrophysical objects has been applied to a metal-poor globular cluster for the first time. A Classic Technique. [Figure caption: An artist's impression of the European Space Agency's Gaia spacecraft. Gaia is on track to map the positions and motions of a billion stars. (ESA)] Distances to nearby stars are often measured using the parallax technique: tracing the tiny apparent motion of a target star against the background of more distant stars as Earth orbits the Sun. This technique has come a long way since it was first used in the 1800s to measure the distance to stars a few tens of light-years away; with the advent of space observatories like Hipparcos and Gaia, parallax can now be used to map the positions of stars out to thousands of light-years. Precise distance measurements aren't only important for setting the scale of the universe, however; they can also help us better understand stellar evolution over the course of cosmic history. Stellar evolution models are often anchored to a reference star cluster, the properties of which must be known precisely. These precise properties can be readily determined for young, nearby open clusters using parallax measurements. But stellar evolution models that anchor on the more distant, ancient, metal-poor globular clusters have been hampered by the less precise indirect methods used to measure distances to these faraway clusters, until now. [Figure caption: Top: An image of NGC 6397 overlaid with the area scanned by Hubble (dashed green) and the footprint of the camera (solid green). The blue ellipse represents the parallax motion of a star in the cluster, exaggerated by a factor of ten thousand. Bottom: An example scan from this field. Adapted from Brown et al. 2018] New Measurement to an Old Cluster. Thomas Brown (Space Telescope Science Institute) and collaborators used the Hubble Space Telescope to determine the

  19. Simple method to calculate percolation, Ising and Potts clusters

    International Nuclear Information System (INIS)

    Tsallis, C.

    1981-01-01

    A procedure (the 'break-collapse method') is introduced which considerably simplifies the calculation of two- or multi-rooted clusters like those commonly appearing in real-space renormalization group (RG) treatments of bond percolation and of pure and random Ising and Potts problems. The method is illustrated through two applications for the q-state Potts ferromagnet. The first concerns an RG calculation of the critical exponent ν for the isotropic square lattice: numerical consistency is obtained (particularly for q→0) with den Nijs' conjecture. The second application is a compact reformulation of the standard star-triangle and duality transformations, which provide the exact critical temperature for the anisotropic triangular and honeycomb lattices. (Author)

  20. Android Malware Classification Using K-Means Clustering Algorithm

    Science.gov (United States)

    Hamid, Isredza Rahmi A.; Syafiqah Khalid, Nur; Azma Abdullah, Nurul; Rahman, Nurul Hidayah Ab; Chai Wen, Chuah

    2017-08-01

    Malware is designed to gain access to or damage a computer system without the user's knowledge, and attackers exploit malware to commit crime or fraud. This paper proposes an Android malware classification approach based on the K-Means clustering algorithm. We evaluate the proposed model in terms of accuracy using machine learning algorithms. Two datasets, the VirusTotal and Malgenome datasets, were selected to demonstrate the application of the K-Means clustering algorithm. We classify the Android malware into three clusters: ransomware, scareware and goodware. Nine features were considered for each type of dataset: Lock Detected, Text Detected, Text Score, Encryption Detected, Threat, Porn, Law, Copyright and Moneypak. We used IBM SPSS Statistics software for data classification and WEKA tools to evaluate the resulting clusters. The proposed K-Means clustering approach shows promising results, with high accuracy when tested using the Random Forest algorithm.
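
    A minimal scikit-learn equivalent of the clustering step described above is sketched below: nine numeric features per sample (the feature names listed in the abstract) are standardised and grouped into three clusters intended to correspond to ransomware, scareware and goodware. The random feature matrix is only a placeholder for the VirusTotal and Malgenome data, which are not reproduced here, and the paper itself used SPSS and WEKA rather than scikit-learn.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

X = np.random.rand(300, 9)                     # stand-in for the 9-feature matrix
X_scaled = StandardScaler().fit_transform(X)   # put features on a common scale
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)
print(np.bincount(labels))                     # samples assigned to each cluster
```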