WorldWideScience

Sample records for temporal clustering analysis

  1. Hierarchical Aligned Cluster Analysis for Temporal Clustering of Human Motion.

    Science.gov (United States)

    Zhou, Feng; De la Torre, Fernando; Hodgins, Jessica K

    2013-03-01

    Temporal segmentation of human motion into plausible motion primitives is central to understanding and building computational models of human motion. Several issues contribute to the challenge of discovering motion primitives: the exponential nature of all possible movement combinations, the variability in the temporal scale of human actions, and the complexity of representing articulated motion. We pose the problem of learning motion primitives as one of temporal clustering, and derive an unsupervised hierarchical bottom-up framework called hierarchical aligned cluster analysis (HACA). HACA finds a partition of a given multidimensional time series into m disjoint segments such that each segment belongs to one of k clusters. HACA combines kernel k-means with the generalized dynamic time alignment kernel to cluster time series data. Moreover, it provides a natural framework to find a low-dimensional embedding for time series. HACA is efficiently optimized with a coordinate descent strategy and dynamic programming. Experimental results on motion capture and video data demonstrate the effectiveness of HACA for segmenting complex motions and as a visualization tool. We also compare the performance of HACA to state-of-the-art algorithms for temporal clustering on data of a honey bee dance. The HACA code is available online.

  2. Spatial-Temporal Clustering of Tornadoes

    Science.gov (United States)

    Malamud, Bruce D.; Turcotte, Donald L.; Brooks, Harold E.

    2017-04-01

    The standard measure of the intensity of a tornado is the Enhanced Fujita scale, which is based qualitatively on the damage caused by a tornado. An alternative measure of tornado intensity is the tornado path length, L. Here we examine the spatial-temporal clustering of severe tornadoes, which we define as having path lengths L ≥ 10 km. Of particular concern are tornado outbreaks, when a large number of severe tornadoes occur in a day in a restricted region. We apply a spatial-temporal clustering analysis developed for earthquakes. We take all pairs of severe tornadoes in observed and modelled outbreaks, and for each pair plot the spatial lag (distance between touchdown points) against the temporal lag (time between touchdown points). We apply our spatial-temporal lag methodology to the intense tornado outbreaks in the central United States on 26 and 27 April 2011, which resulted in over 300 fatalities and produced 109 severe (L ≥ 10 km) tornadoes. The patterns of spatial-temporal lag correlations that we obtain for the 2 days are strikingly different. On 26 April 2011, there were 45 severe tornadoes and our clustering analysis is dominated by a complex sequence of linear features. We associate the linear patterns with the tornadoes generated in either a single cell thunderstorm or a closely spaced cluster of single cell thunderstorms moving at a near-constant velocity. Our study of a derecho tornado outbreak of six severe tornadoes on 4 April 2011 along with modelled outbreak scenarios confirms this association. On 27 April 2011, there were 64 severe tornadoes and our clustering analysis is predominantly random with virtually no embedded linear patterns. We associate this pattern with a large number of interacting supercell thunderstorms generating tornadoes randomly in space and time. In order to better understand these associations, we also applied our approach to the Great Plains tornado outbreak of 3 May 1999. Careful studies by others have associated

  3. Cluster Oriented Spatio Temporal Multidimensional Data Visualization of Earthquakes in Indonesia

    Directory of Open Access Journals (Sweden)

    Mohammad Nur Shodiq

    2016-03-01

    Full Text Available Spatio temporal data clustering is challenge task. The result of clustering data are utilized to investigate the seismic parameters. Seismic parameters are used to describe the characteristics of earthquake behavior. One of the effective technique to study multidimensional spatio temporal data is visualization. But, visualization of multidimensional data is complicated problem. Because, this analysis consists of observed data cluster and seismic parameters. In this paper, we propose a visualization system, called as IES (Indonesia Earthquake System, for cluster analysis, spatio temporal analysis, and visualize the multidimensional data of seismic parameters. We analyze the cluster analysis by using automatic clustering, that consists of get optimal number of cluster and Hierarchical K-means clustering. We explore the visual cluster and multidimensional data in low dimensional space visualization. We made experiment with observed data, that consists of seismic data around Indonesian archipelago during 2004 to 2014. Keywords: Clustering, visualization, multidimensional data, seismic parameters.

  4. Ananke: temporal clustering reveals ecological dynamics of microbial communities

    Directory of Open Access Journals (Sweden)

    Michael W. Hall

    2017-09-01

    Full Text Available Taxonomic markers such as the 16S ribosomal RNA gene are widely used in microbial community analysis. A common first step in marker-gene analysis is grouping genes into clusters to reduce data sets to a more manageable size and potentially mitigate the effects of sequencing error. Instead of clustering based on sequence identity, marker-gene data sets collected over time can be clustered based on temporal correlation to reveal ecologically meaningful associations. We present Ananke, a free and open-source algorithm and software package that complements existing sequence-identity-based clustering approaches by clustering marker-gene data based on time-series profiles and provides interactive visualization of clusters, including highlighting of internal OTU inconsistencies. Ananke is able to cluster distinct temporal patterns from simulations of multiple ecological patterns, such as periodic seasonal dynamics and organism appearances/disappearances. We apply our algorithm to two longitudinal marker gene data sets: faecal communities from the human gut of an individual sampled over one year, and communities from a freshwater lake sampled over eleven years. Within the gut, the segregation of the bacterial community around a food-poisoning event was immediately clear. In the freshwater lake, we found that high sequence identity between marker genes does not guarantee similar temporal dynamics, and Ananke time-series clusters revealed patterns obscured by clustering based on sequence identity or taxonomy. Ananke is free and open-source software available at https://github.com/beiko-lab/ananke.

  5. Mortality in Danish Swine herds: Spatio-temporal clusters and risk factors

    DEFF Research Database (Denmark)

    Lopes Antunes, Ana Carolina; Ersbøll, Annette Kjær; Bihrmann, Kristine

    2017-01-01

    -temporal analysis included data description for spatial, temporal, and spatio-temporal cluster analysis for three age groups: weaners (up to 30 kg), sows and finishers. Logistic regression models were used to assess the potential factors associated with finisher and weaner herds being included within multiple...

  6. a Web-Based Interactive Platform for Co-Clustering Spatio-Temporal Data

    Science.gov (United States)

    Wu, X.; Poorthuis, A.; Zurita-Milla, R.; Kraak, M.-J.

    2017-09-01

    Since current studies on clustering analysis mainly focus on exploring spatial or temporal patterns separately, a co-clustering algorithm is utilized in this study to enable the concurrent analysis of spatio-temporal patterns. To allow users to adopt and adapt the algorithm for their own analysis, it is integrated within the server side of an interactive web-based platform. The client side of the platform, running within any modern browser, is a graphical user interface (GUI) with multiple linked visualizations that facilitates the understanding, exploration and interpretation of the raw dataset and co-clustering results. Users can also upload their own datasets and adjust clustering parameters within the platform. To illustrate the use of this platform, an annual temperature dataset from 28 weather stations over 20 years in the Netherlands is used. After the dataset is loaded, it is visualized in a set of linked visualizations: a geographical map, a timeline and a heatmap. This aids the user in understanding the nature of their dataset and the appropriate selection of co-clustering parameters. Once the dataset is processed by the co-clustering algorithm, the results are visualized in the small multiples, a heatmap and a timeline to provide various views for better understanding and also further interpretation. Since the visualization and analysis are integrated in a seamless platform, the user can explore different sets of co-clustering parameters and instantly view the results in order to do iterative, exploratory data analysis. As such, this interactive web-based platform allows users to analyze spatio-temporal data using the co-clustering method and also helps the understanding of the results using multiple linked visualizations.

  7. Spatio-temporal clustering of wildfires in Portugal

    Science.gov (United States)

    Costa, R.; Pereira, M. G.; Caramelo, L.; Vega Orozco, C.; Kanevski, M.

    2012-04-01

    Several studies have shown that wildfires in Portugal presenthigh temporal as well as high spatial variability (Pereira et al., 2005, 2011). The identification and characterization of spatio-temporal clusters contributes to a comprehensivecharacterization of the fire regime and to improve the efficiency of fire prevention and combat activities. The main goalsin this studyare: (i) to detect the spatio-temporal clusters of burned area; and, (ii) to characterize these clusters along with the role of human and environmental factors. The data were supplied by the National Forest Authority(AFN, 2011) and comprises: (a)the Portuguese Rural Fire Database, PRFD, (Pereira et al., 2011) for the 1980-2007period; and, (b) the national mapping burned areas between 1990 and 2009. In this work, in order to complement the more common cluster analysis algorithms, an alternative approach based onscan statistics and on the permutation modelwas used. This statistical methodallows the detection of local excess events and to test if such an excess can reasonably have occurred by chance.Results obtained for different simulations performed for different spatial and temporal windows are presented, compared and interpreted.The influence of several fire factors such as (climate, vegetation type, etc.) is also assessed. Pereira, M.G., Trigo, R.M., DaCamara, C.C., Pereira, J.M.C., Leite, S.M., 2005:"Synoptic patterns associated with large summer forest fires in Portugal".Agricultural and Forest Meteorology. 129, 11-25. Pereira, M. G., Malamud, B. D., Trigo, R. M., and Alves, P. I.: The history and characteristics of the 1980-2005 Portuguese rural fire database, Nat. Hazards Earth Syst. Sci., 11, 3343-3358, doi:10.5194/nhess-11-3343-2011, 2011 AFN, 2011: AutoridadeFlorestalNacional (National Forest Authority). Available at http://www.afn.min-agricultura.pt/portal.

  8. a Three-Step Spatial-Temporal Clustering Method for Human Activity Pattern Analysis

    Science.gov (United States)

    Huang, W.; Li, S.; Xu, S.

    2016-06-01

    How people move in cities and what they do in various locations at different times form human activity patterns. Human activity pattern plays a key role in in urban planning, traffic forecasting, public health and safety, emergency response, friend recommendation, and so on. Therefore, scholars from different fields, such as social science, geography, transportation, physics and computer science, have made great efforts in modelling and analysing human activity patterns or human mobility patterns. One of the essential tasks in such studies is to find the locations or places where individuals stay to perform some kind of activities before further activity pattern analysis. In the era of Big Data, the emerging of social media along with wearable devices enables human activity data to be collected more easily and efficiently. Furthermore, the dimension of the accessible human activity data has been extended from two to three (space or space-time) to four dimensions (space, time and semantics). More specifically, not only a location and time that people stay and spend are collected, but also what people "say" for in a location at a time can be obtained. The characteristics of these datasets shed new light on the analysis of human mobility, where some of new methodologies should be accordingly developed to handle them. Traditional methods such as neural networks, statistics and clustering have been applied to study human activity patterns using geosocial media data. Among them, clustering methods have been widely used to analyse spatiotemporal patterns. However, to our best knowledge, few of clustering algorithms are specifically developed for handling the datasets that contain spatial, temporal and semantic aspects all together. In this work, we propose a three-step human activity clustering method based on space, time and semantics to fill this gap. One-year Twitter data, posted in Toronto, Canada, is used to test the clustering-based method. The results show that the

  9. The smart cluster method. Adaptive earthquake cluster identification and analysis in strong seismic regions

    Science.gov (United States)

    Schaefer, Andreas M.; Daniell, James E.; Wenzel, Friedemann

    2017-07-01

    Earthquake clustering is an essential part of almost any statistical analysis of spatial and temporal properties of seismic activity. The nature of earthquake clusters and subsequent declustering of earthquake catalogues plays a crucial role in determining the magnitude-dependent earthquake return period and its respective spatial variation for probabilistic seismic hazard assessment. This study introduces the Smart Cluster Method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal cluster identification. It utilises the magnitude-dependent spatio-temporal earthquake density to adjust the search properties, subsequently analyses the identified clusters to determine directional variation and adjusts its search space with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010-2011 Darfield-Christchurch sequence, a reclassification procedure is applied to disassemble subsequent ruptures using near-field searches, nearest neighbour classification and temporal splitting. The method is capable of identifying and classifying earthquake clusters in space and time. It has been tested and validated using earthquake data from California and New Zealand. A total of more than 1500 clusters have been found in both regions since 1980 with M m i n = 2.0. Utilising the knowledge of cluster classification, the method has been adjusted to provide an earthquake declustering algorithm, which has been compared to existing methods. Its performance is comparable to established methodologies. The analysis of earthquake clustering statistics lead to various new and updated correlation functions, e.g. for ratios between mainshock and strongest aftershock and general aftershock activity metrics.

  10. A THREE-STEP SPATIAL-TEMPORAL-SEMANTIC CLUSTERING METHOD FOR HUMAN ACTIVITY PATTERN ANALYSIS

    Directory of Open Access Journals (Sweden)

    W. Huang

    2016-06-01

    Full Text Available How people move in cities and what they do in various locations at different times form human activity patterns. Human activity pattern plays a key role in in urban planning, traffic forecasting, public health and safety, emergency response, friend recommendation, and so on. Therefore, scholars from different fields, such as social science, geography, transportation, physics and computer science, have made great efforts in modelling and analysing human activity patterns or human mobility patterns. One of the essential tasks in such studies is to find the locations or places where individuals stay to perform some kind of activities before further activity pattern analysis. In the era of Big Data, the emerging of social media along with wearable devices enables human activity data to be collected more easily and efficiently. Furthermore, the dimension of the accessible human activity data has been extended from two to three (space or space-time to four dimensions (space, time and semantics. More specifically, not only a location and time that people stay and spend are collected, but also what people “say” for in a location at a time can be obtained. The characteristics of these datasets shed new light on the analysis of human mobility, where some of new methodologies should be accordingly developed to handle them. Traditional methods such as neural networks, statistics and clustering have been applied to study human activity patterns using geosocial media data. Among them, clustering methods have been widely used to analyse spatiotemporal patterns. However, to our best knowledge, few of clustering algorithms are specifically developed for handling the datasets that contain spatial, temporal and semantic aspects all together. In this work, we propose a three-step human activity clustering method based on space, time and semantics to fill this gap. One-year Twitter data, posted in Toronto, Canada, is used to test the clustering-based method. The

  11. Clustering Vehicle Temporal and Spatial Travel Behavior Using License Plate Recognition Data

    Directory of Open Access Journals (Sweden)

    Huiyu Chen

    2017-01-01

    Full Text Available Understanding travel patterns of vehicle can support the planning and design of better services. In addition, vehicle clustering can improve management efficiency through more targeted access to groups of interest and facilitate planning by more specific survey design. This paper clustered 854,712 vehicles in a week using K-means clustering algorithm based on license plate recognition (LPR data obtained in Shenzhen, China. Firstly, several travel characteristics related to temporal and spatial variability and activity patterns are used to identify homogeneous clusters. Then, Davies-Bouldin index (DBI and Silhouette Coefficient (SC are applied to capture the optimal number of groups and, consequently, six groups are classified in weekdays and three groups are sorted in weekends, including commuting vehicles and some other occasional leisure travel vehicles. Moreover, a detailed analysis of the characteristics of each group in terms of spatial travel patterns and temporal changes are presented. This study highlights the possibility of applying LPR data for discovering the underlying factor in vehicle travel patterns and examining the characteristic of some groups specifically.

  12. Big Data GPU-Driven Parallel Processing Spatial and Spatio-Temporal Clustering Algorithms

    Science.gov (United States)

    Konstantaras, Antonios; Skounakis, Emmanouil; Kilty, James-Alexander; Frantzeskakis, Theofanis; Maravelakis, Emmanuel

    2016-04-01

    Advances in graphics processing units' technology towards encompassing parallel architectures [1], comprised of thousands of cores and multiples of parallel threads, provide the foundation in terms of hardware for the rapid processing of various parallel applications regarding seismic big data analysis. Seismic data are normally stored as collections of vectors in massive matrices, growing rapidly in size as wider areas are covered, denser recording networks are being established and decades of data are being compiled together [2]. Yet, many processes regarding seismic data analysis are performed on each seismic event independently or as distinct tiles [3] of specific grouped seismic events within a much larger data set. Such processes, independent of one another can be performed in parallel narrowing down processing times drastically [1,3]. This research work presents the development and implementation of three parallel processing algorithms using Cuda C [4] for the investigation of potentially distinct seismic regions [5,6] present in the vicinity of the southern Hellenic seismic arc. The algorithms, programmed and executed in parallel comparatively, are the: fuzzy k-means clustering with expert knowledge [7] in assigning overall clusters' number; density-based clustering [8]; and a selves-developed spatio-temporal clustering algorithm encompassing expert [9] and empirical knowledge [10] for the specific area under investigation. Indexing terms: GPU parallel programming, Cuda C, heterogeneous processing, distinct seismic regions, parallel clustering algorithms, spatio-temporal clustering References [1] Kirk, D. and Hwu, W.: 'Programming massively parallel processors - A hands-on approach', 2nd Edition, Morgan Kaufman Publisher, 2013 [2] Konstantaras, A., Valianatos, F., Varley, M.R. and Makris, J.P.: 'Soft-Computing Modelling of Seismicity in the Southern Hellenic Arc', Geoscience and Remote Sensing Letters, vol. 5 (3), pp. 323-327, 2008 [3] Papadakis, S. and

  13. Multi-temporal clustering of continental floods and associated atmospheric circulations

    Science.gov (United States)

    Liu, Jianyu; Zhang, Yongqiang

    2017-12-01

    Investigating clustering of floods has important social, economic and ecological implications. This study examines the clustering of Australian floods at different temporal scales and its possible physical mechanisms. Flood series with different severities are obtained by peaks-over-threshold (POT) sampling in four flood thresholds. At intra-annual scale, Cox regression and monthly frequency methods are used to examine whether and when the flood clustering exists, respectively. At inter-annual scale, dispersion indices with four-time variation windows are applied to investigate the inter-annual flood clustering and its variation. Furthermore, the Kernel occurrence rate estimate and bootstrap resampling methods are used to identify flood-rich/flood-poor periods. Finally, seasonal variation of horizontal wind at 850 hPa and vertical wind velocity at 500 hPa are used to investigate the possible mechanisms causing the temporal flood clustering. Our results show that: (1) flood occurrences exhibit clustering at intra-annual scale, which are regulated by climate indices representing the impacts of the Pacific and Indian Oceans; (2) the flood-rich months occur from January to March over northern Australia, and from July to September over southwestern and southeastern Australia; (3) stronger inter-annual clustering takes place across southern Australia than northern Australia; and (4) Australian floods are characterised by regional flood-rich and flood-poor periods, with 1987-1992 identified as the flood-rich period across southern Australia, but the flood-poor period across northern Australia, and 2001-2006 being the flood-poor period across most regions of Australia. The intra-annual and inter-annual clustering and temporal variation of flood occurrences are in accordance with the variation of atmospheric circulation. These results provide relevant information for flood management under the influence of climate variability, and, therefore, are helpful for developing

  14. Cluster analysis of activity-time series in motor learning

    DEFF Research Database (Denmark)

    Balslev, Daniela; Nielsen, Finn Å; Futiger, Sally A

    2002-01-01

    Neuroimaging studies of learning focus on brain areas where the activity changes as a function of time. To circumvent the difficult problem of model selection, we used a data-driven analytic tool, cluster analysis, which extracts representative temporal and spatial patterns from the voxel......-time series. The optimal number of clusters was chosen using a cross-validated likelihood method, which highlights the clustering pattern that generalizes best over the subjects. Data were acquired with PET at different time points during practice of a visuomotor task. The results from cluster analysis show...

  15. Spatial and temporal clustering of dengue virus transmission in Thai villages.

    Science.gov (United States)

    Mammen, Mammen P; Pimgate, Chusak; Koenraadt, Constantianus J M; Rothman, Alan L; Aldstadt, Jared; Nisalak, Ananda; Jarman, Richard G; Jones, James W; Srikiatkhachorn, Anon; Ypil-Butac, Charity Ann; Getis, Arthur; Thammapalo, Suwich; Morrison, Amy C; Libraty, Daniel H; Green, Sharone; Scott, Thomas W

    2008-11-04

    Transmission of dengue viruses (DENV), the leading cause of arboviral disease worldwide, is known to vary through time and space, likely owing to a combination of factors related to the human host, virus, mosquito vector, and environment. An improved understanding of variation in transmission patterns is fundamental to conducting surveillance and implementing disease prevention strategies. To test the hypothesis that DENV transmission is spatially and temporally focal, we compared geographic and temporal characteristics within Thai villages where DENV are and are not being actively transmitted. Cluster investigations were conducted within 100 m of homes where febrile index children with (positive clusters) and without (negative clusters) acute dengue lived during two seasons of peak DENV transmission. Data on human infection and mosquito infection/density were examined to precisely (1) define the spatial and temporal dimensions of DENV transmission, (2) correlate these factors with variation in DENV transmission, and (3) determine the burden of inapparent and symptomatic infections. Among 556 village children enrolled as neighbors of 12 dengue-positive and 22 dengue-negative index cases, all 27 DENV infections (4.9% of enrollees) occurred in positive clusters (p availability of piped water in negative clusters (p < 0.01) and greater number of Ae. aegypti pupae per person in positive clusters (p = 0.04). During primarily DENV-4 transmission seasons, the ratio of inapparent to symptomatic infections was nearly 1:1 among child enrollees. Study limitations included inability to sample all children and mosquitoes within each cluster and our reliance on serologic rather than virologic evidence of interval infections in enrollees given restrictions on the frequency of blood collections in children. Our data reveal the remarkably focal nature of DENV transmission within a hyperendemic rural area of Thailand. These data suggest that active school-based dengue case detection

  16. Spatial and Temporal Assessment on Drug Addiction Using Multivariate Analysis and GIS

    International Nuclear Information System (INIS)

    Mohd Ekhwan Toriman; Mohd Ekhwan Toriman; Siti Nor Fazillah Abdullah; Izwan Arif Azizan; Mohd Khairul Amri Kamarudin; Roslan Umar; Nasir Mohamad

    2015-01-01

    There is a need for managing and displaying drug addiction phenomena and trend at both spatial and temporal scales. Spatial and temporal assessment on drug addiction in Terengganu was undertaken to understand the geographical area of district in the same cluster, in addition, identify the hot spot area of this problem and analysis the trend of drug addiction. Data used were topography map of Terengganu and number of drug addicted person in Terengganu by district within 10 years (2004-2013). Number of drug addicted person by district were mapped using Geographic Information system and analysed using a combination of multivariate analysis which is cluster analysis were applied to the database in order to validate the correlation between data in the same cluster. Result showed a cluster analysis for number of drug addiction by district generated three clusters which are Besut and Kuala Terengganu in cluster 1 named moderate drug addicted person (MDA), Dungun, Marang, Setiu and Hulu Terengganu in cluster 2 named lower drug addicted person (LDA) and Kemaman in cluster 3 named high drug addicted person(HDA). This analysis indicates that cluster 3 which is Kemaman is a hot spot area. These results were beneficial for stakeholder to monitor and manage this problem especially in the hot spot area which needs to be emphasized. (author)

  17. Cluster analysis of rural, urban, and curbside atmospheric particle size data.

    Science.gov (United States)

    Beddows, David C S; Dall'Osto, Manuel; Harrison, Roy M

    2009-07-01

    Particle size is a key determinant of the hazard posed by airborne particles. Continuous multivariate particle size data have been collected using aerosol particle size spectrometers sited at four locations within the UK: Harwell (Oxfordshire); Regents Park (London); British Telecom Tower (London); and Marylebone Road (London). These data have been analyzed using k-means cluster analysis, deduced to be the preferred cluster analysis technique, selected from an option of four partitional cluster packages, namelythe following: Fuzzy; k-means; k-median; and Model-Based clustering. Using cluster validation indices k-means clustering was shown to produce clusters with the smallest size, furthest separation, and importantly the highest degree of similarity between the elements within each partition. Using k-means clustering, the complexity of the data set is reduced allowing characterization of the data according to the temporal and spatial trends of the clusters. At Harwell, the rural background measurement site, the cluster analysis showed that the spectra may be differentiated by their modal-diameters and average temporal trends showing either high counts during the day-time or night-time hours. Likewise for the urban sites, the cluster analysis differentiated the spectra into a small number of size distributions according their modal-diameter, the location of the measurement site, and time of day. The responsible aerosol emission, formation, and dynamic processes can be inferred according to the cluster characteristics and correlation to concurrently measured meteorological, gas phase, and particle phase measurements.

  18. The absoluteness of semantic processing: lessons from the analysis of temporal clusters in phonemic verbal fluency.

    Directory of Open Access Journals (Sweden)

    Isabelle Vonberg

    Full Text Available For word production, we may consciously pursue semantic or phonological search strategies, but it is uncertain whether we can retrieve the different aspects of lexical information independently from each other. We therefore studied the spread of semantic information into words produced under exclusively phonemic task demands.42 subjects participated in a letter verbal fluency task, demanding the production of as many s-words as possible in two minutes. Based on curve fittings for the time courses of word production, output spurts (temporal clusters considered to reflect rapid lexical retrieval based on automatic activation spread, were identified. Semantic and phonemic word relatedness within versus between these clusters was assessed by respective scores (0 meaning no relation, 4 maximum relation.Subjects produced 27.5 (±9.4 words belonging to 6.7 (±2.4 clusters. Both phonemically and semantically words were more related within clusters than between clusters (phon: 0.33±0.22 vs. 0.19±0.17, p<.01; sem: 0.65±0.29 vs. 0.37±0.29, p<.01. Whereas the extent of phonemic relatedness correlated with high task performance, the contrary was the case for the extent of semantic relatedness.The results indicate that semantic information spread occurs, even if the consciously pursued word search strategy is purely phonological. This, together with the negative correlation between semantic relatedness and verbal output suits the idea of a semantic default mode of lexical search, acting against rapid task performance in the given scenario of phonemic verbal fluency. The simultaneity of enhanced semantic and phonemic word relatedness within the same temporal cluster boundaries suggests an interaction between content and sound-related information whenever a new semantic field has been opened.

  19. Spatial and temporal clustering of dengue virus transmission in Thai villages.

    Directory of Open Access Journals (Sweden)

    Mammen P Mammen

    2008-11-01

    Full Text Available Transmission of dengue viruses (DENV, the leading cause of arboviral disease worldwide, is known to vary through time and space, likely owing to a combination of factors related to the human host, virus, mosquito vector, and environment. An improved understanding of variation in transmission patterns is fundamental to conducting surveillance and implementing disease prevention strategies. To test the hypothesis that DENV transmission is spatially and temporally focal, we compared geographic and temporal characteristics within Thai villages where DENV are and are not being actively transmitted.Cluster investigations were conducted within 100 m of homes where febrile index children with (positive clusters and without (negative clusters acute dengue lived during two seasons of peak DENV transmission. Data on human infection and mosquito infection/density were examined to precisely (1 define the spatial and temporal dimensions of DENV transmission, (2 correlate these factors with variation in DENV transmission, and (3 determine the burden of inapparent and symptomatic infections. Among 556 village children enrolled as neighbors of 12 dengue-positive and 22 dengue-negative index cases, all 27 DENV infections (4.9% of enrollees occurred in positive clusters (p < 0.01; attributable risk [AR] = 10.4 per 100; 95% confidence interval 1-19.8 per 100]. In positive clusters, 12.4% of enrollees became infected in a 15-d period and DENV infections were aggregated centrally near homes of index cases. As only 1 of 217 pairs of serologic specimens tested in positive clusters revealed a recent DENV infection that occurred prior to cluster initiation, we attribute the observed DENV transmission subsequent to cluster investigation to recent DENV transmission activity. Of the 1,022 female adult Ae. aegypti collected, all eight (0.8% dengue-infected mosquitoes came from houses in positive clusters; none from control clusters or schools. Distinguishing features between

  20. Temporal fringe pattern analysis with parallel computing

    International Nuclear Information System (INIS)

    Tuck Wah Ng; Kar Tien Ang; Argentini, Gianluca

    2005-01-01

    Temporal fringe pattern analysis is invaluable in transient phenomena studies but necessitates long processing times. Here we describe a parallel computing strategy based on the single-program multiple-data model and hyperthreading processor technology to reduce the execution time. In a two-node cluster workstation configuration we found that execution periods were reduced by 1.6 times when four virtual processors were used. To allow even lower execution times with an increasing number of processors, the time allocated for data transfer, data read, and waiting should be minimized. Parallel computing is found here to present a feasible approach to reduce execution times in temporal fringe pattern analysis

  1. Sirenomelia in Argentina: Prevalence, geographic clusters and temporal trends analysis.

    Science.gov (United States)

    Groisman, Boris; Liascovich, Rosa; Gili, Juan Antonio; Barbero, Pablo; Bidondo, María Paz

    2016-07-01

    Sirenomelia is a severe malformation of the lower body characterized by a single medial lower limb and a variable combination of visceral abnormalities. Given that Sirenomelia is a very rare birth defect, epidemiological studies are scarce. The aim of this study is to evaluate prevalence, geographic clusters and time trends of sirenomelia in Argentina, using data from the National Network of Congenital Anomalies of Argentina (RENAC) from November 2009 until December 2014. This is a descriptive study using data from the RENAC, a hospital-based surveillance system for newborns affected with major morphological congenital anomalies. We calculated sirenomelia prevalence throughout the period, searched for geographical clusters, and evaluated time trends. The prevalence of confirmed cases of sirenomelia throughout the period was 2.35 per 100,000 births. Cluster analysis showed no statistically significant geographical aggregates. Time-trends analysis showed that the prevalence was higher in years 2009 to 2010. The observed prevalence was higher than the observed in previous epidemiological studies in other geographic regions. We observed a likely real increase in the initial period of our study. We used strict diagnostic criteria, excluding cases that only had clinical diagnosis of sirenomelia. Therefore, real prevalence could be even higher. This study did not show any geographic clusters. Because etiology of sirenomelia has not yet been established, studies of epidemiological features of this defect may contribute to define its causes. Birth Defects Research (Part A) 106:604-611, 2016. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.

  2. The use of the temporal scan statistic to detect methicillin-resistant Staphylococcus aureus clusters in a community hospital.

    Science.gov (United States)

    Faires, Meredith C; Pearl, David L; Ciccotelli, William A; Berke, Olaf; Reid-Smith, Richard J; Weese, J Scott

    2014-07-08

    In healthcare facilities, conventional surveillance techniques using rule-based guidelines may result in under- or over-reporting of methicillin-resistant Staphylococcus aureus (MRSA) outbreaks, as these guidelines are generally unvalidated. The objectives of this study were to investigate the utility of the temporal scan statistic for detecting MRSA clusters, validate clusters using molecular techniques and hospital records, and determine significant differences in the rate of MRSA cases using regression models. Patients admitted to a community hospital between August 2006 and February 2011, and identified with MRSA>48 hours following hospital admission, were included in this study. Between March 2010 and February 2011, MRSA specimens were obtained for spa typing. MRSA clusters were investigated using a retrospective temporal scan statistic. Tests were conducted on a monthly scale and significant clusters were compared to MRSA outbreaks identified by hospital personnel. Associations between the rate of MRSA cases and the variables year, month, and season were investigated using a negative binomial regression model. During the study period, 735 MRSA cases were identified and 167 MRSA isolates were spa typed. Nine different spa types were identified with spa type 2/t002 (88.6%) the most prevalent. The temporal scan statistic identified significant MRSA clusters at the hospital (n=2), service (n=16), and ward (n=10) levels (P ≤ 0.05). Seven clusters were concordant with nine MRSA outbreaks identified by hospital staff. For the remaining clusters, seven events may have been equivalent to true outbreaks and six clusters demonstrated possible transmission events. The regression analysis indicated years 2009-2011, compared to 2006, and months March and April, compared to January, were associated with an increase in the rate of MRSA cases (P ≤ 0.05). The application of the temporal scan statistic identified several MRSA clusters that were not detected by hospital

  3. Temporal Clustering and Sequencing in Short-Term Memory and Episodic Memory

    Science.gov (United States)

    Farrell, Simon

    2012-01-01

    A model of short-term memory and episodic memory is presented, with the core assumptions that (a) people parse their continuous experience into episodic clusters and (b) items are clustered together in memory as episodes by binding information within an episode to a common temporal context. Along with the additional assumption that information…

  4. Groundwater Quality: Analysis of Its Temporal and Spatial Variability in a Karst Aquifer.

    Science.gov (United States)

    Pacheco Castro, Roger; Pacheco Ávila, Julia; Ye, Ming; Cabrera Sansores, Armando

    2018-01-01

    This study develops an approach based on hierarchical cluster analysis for investigating the spatial and temporal variation of water quality governing processes. The water quality data used in this study were collected in the karst aquifer of Yucatan, Mexico, the only source of drinking water for a population of nearly two million people. Hierarchical cluster analysis was applied to the quality data of all the sampling periods lumped together. This was motivated by the observation that, if water quality does not vary significantly in time, two samples from the same sampling site will belong to the same cluster. The resulting distribution maps of clusters and box-plots of the major chemical components reveal the spatial and temporal variability of groundwater quality. Principal component analysis was used to verify the results of cluster analysis and to derive the variables that explained most of the variation of the groundwater quality data. Results of this work increase the knowledge about how precipitation and human contamination impact groundwater quality in Yucatan. Spatial variability of groundwater quality in the study area is caused by: a) seawater intrusion and groundwater rich in sulfates at the west and in the coast, b) water rock interactions and the average annual precipitation at the middle and east zones respectively, and c) human contamination present in two localized zones. Changes in the amount and distribution of precipitation cause temporal variation by diluting groundwater in the aquifer. This approach allows to analyze the variation of groundwater quality controlling processes efficiently and simultaneously. © 2017, National Ground Water Association.

  5. Spontaneous brain network activity: Analysis of its temporal complexity

    Directory of Open Access Journals (Sweden)

    Mangor Pedersen

    2017-06-01

    Full Text Available The brain operates in a complex way. The temporal complexity underlying macroscopic and spontaneous brain network activity is still to be understood. In this study, we explored the brain’s complexity by combining functional connectivity, graph theory, and entropy analyses in 25 healthy people using task-free functional magnetic resonance imaging. We calculated the pairwise instantaneous phase synchrony between 8,192 brain nodes for a total of 200 time points. This resulted in graphs for which time series of clustering coefficients (the “cliquiness” of a node and participation coefficients (the between-module connectivity of a node were estimated. For these two network metrics, sample entropy was calculated. The procedure produced a number of results: (1 Entropy is higher for the participation coefficient than for the clustering coefficient. (2 The average clustering coefficient is negatively related to its associated entropy, whereas the average participation coefficient is positively related to its associated entropy. (3 The level of entropy is network-specific to the participation coefficient, but not to the clustering coefficient. High entropy for the participation coefficient was observed in the default-mode, visual, and motor networks. These results were further validated using an independent replication dataset. Our work confirms that brain networks are temporally complex. Entropy is a good candidate metric to explore temporal network alterations in diseases with paroxysmal brain disruptions, including schizophrenia and epilepsy. In recent years, connectomics has provided significant insights into the topological complexity of brain networks. However, the temporal complexity of brain networks still remains somewhat poorly understood. In this study we used entropy analysis to demonstrate that the properties of network segregation (the clustering coefficient and integration (the participation coefficient are temporally complex

  6. Spatial-temporal cluster analysis of mortality from road traffic injuries using geographic information systems in West of Iran during 2009-2014.

    Science.gov (United States)

    Zangeneh, Alireza; Najafi, Farid; Karimi, Saeed; Saeidi, Shahram; Izadi, Neda

    2018-04-01

    Road traffic injuries (RTIs) are considered as one of the most important health problems endangering people's life. The examination of the geographical distribution of RTIs could help policymakers in better planning to reduce RTIs. This study, therefore, aimed to determine the spatial-temporal clustering of mortality from RTIs in West of Iran. Deaths from RTIs, registered in Forensic Medicine Organization of Kermanshah province over a period of six years (2009-2014), were used. Using negative binomial regression, the mortality trend was investigated. In order to investigate the spatial distribution of RTIs, we used ArcGIS. (Version 10.3). The median age of the 3231 people died in RTIs was 37 (IQR = 31) year, 78.4% were male. The 6-year average mortality rate from RTIs was 27.8/100,000 deaths, and the average rate had a declining trend. The dispersion of RTIs showed that most deaths occurred in Kermanshah, Islamabad, Bisotun, and Harsin road axes, respectively. The mean center of all deaths from RTIs occurred in Kermanshah province, the central area of Kermanshah district. The spatial trend of such deaths has moved to the northeast-southwest, and such deaths were geographically centralized. Results of Moran's I with respect to cluster analysis also indicated positive spatial autocorrelations. The results showed that the mortality rate from RTIs, despite the decline in recent years, is still high when compared with other countries. The clustering of accidents raises the concern that road infrastructure in certain locations may also be a factor. Regarding the results related to the temporal analysis, it is suggested that the enforcement of traffic rules be stricter at rush hours. Copyright © 2018 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.

  7. Clustering Vehicle Temporal and Spatial Travel Behavior Using License Plate Recognition Data

    OpenAIRE

    Huiyu Chen; Chao Yang; Xiangdong Xu

    2017-01-01

    Understanding travel patterns of vehicle can support the planning and design of better services. In addition, vehicle clustering can improve management efficiency through more targeted access to groups of interest and facilitate planning by more specific survey design. This paper clustered 854,712 vehicles in a week using K-means clustering algorithm based on license plate recognition (LPR) data obtained in Shenzhen, China. Firstly, several travel characteristics related to temporal and spati...

  8. Parallel Multivariate Spatio-Temporal Clustering of Large Ecological Datasets on Hybrid Supercomputers

    Energy Technology Data Exchange (ETDEWEB)

    Sreepathi, Sarat [ORNL; Kumar, Jitendra [ORNL; Mills, Richard T. [Argonne National Laboratory; Hoffman, Forrest M. [ORNL; Sripathi, Vamsi [Intel Corporation; Hargrove, William Walter [United States Department of Agriculture (USDA), United States Forest Service (USFS)

    2017-09-01

    A proliferation of data from vast networks of remote sensing platforms (satellites, unmanned aircraft systems (UAS), airborne etc.), observational facilities (meteorological, eddy covariance etc.), state-of-the-art sensors, and simulation models offer unprecedented opportunities for scientific discovery. Unsupervised classification is a widely applied data mining approach to derive insights from such data. However, classification of very large data sets is a complex computational problem that requires efficient numerical algorithms and implementations on high performance computing (HPC) platforms. Additionally, increasing power, space, cooling and efficiency requirements has led to the deployment of hybrid supercomputing platforms with complex architectures and memory hierarchies like the Titan system at Oak Ridge National Laboratory. The advent of such accelerated computing architectures offers new challenges and opportunities for big data analytics in general and specifically, large scale cluster analysis in our case. Although there is an existing body of work on parallel cluster analysis, those approaches do not fully meet the needs imposed by the nature and size of our large data sets. Moreover, they had scaling limitations and were mostly limited to traditional distributed memory computing platforms. We present a parallel Multivariate Spatio-Temporal Clustering (MSTC) technique based on k-means cluster analysis that can target hybrid supercomputers like Titan. We developed a hybrid MPI, CUDA and OpenACC implementation that can utilize both CPU and GPU resources on computational nodes. We describe performance results on Titan that demonstrate the scalability and efficacy of our approach in processing large ecological data sets.

  9. CLUSTERING ANALYSIS OF OFFICER'S BEHAVIOURS IN LONDON POLICE FOOT PATROL ACTIVITIES

    Directory of Open Access Journals (Sweden)

    J. Shen

    2015-07-01

    Full Text Available In this small paper we aim at presenting a framework of conceptual representation and clustering analysis of police officers’ patrol pattern obtained from mining their raw movement trajectory data. This have been achieved by a model developed to accounts for the spatio-temporal dynamics human movements by incorporating both the behaviour features of the travellers and the semantic meaning of the environment they are moving in. Hence, the similarity metric of traveller behaviours is jointly defined according to the stay time allocation in each Spatio-temporal region of interests (ST-ROI to support clustering analysis of patrol behaviours. The proposed framework enables the analysis of behaviour and preferences on higher level based on raw moment trajectories. The model is firstly applied to police patrol data provided by the Metropolitan Police and will be tested by other type of dataset afterwards.

  10. Multipoint analysis of the spatio-temporal coherence of dayside O+ outflows with Cluster

    Directory of Open Access Journals (Sweden)

    P. Puhl-Quinn

    2004-07-01

    Full Text Available The spatial distribution of ionospheric ion outflow from the dayside cusp/cleft has previously been studied in great detail with numerous satellite missions, but only statistically. Between July and November 2001, the orbit configuration of the Cluster multi-satellite system close to its perigee (4 Earth radii allows for delay times between spacecraft of about 4 and 35min in crossing the cusp/cleft. This enables for the first time to assess the spatial and temporal coherence of O+ ion outflow on time scales of the order of the satellite time lag. After presenting two contrasting events in detail, O+ velocities and outflow intensities from three spacecraft, available on 18 events, all with a similar orbit, have been cross-correlated to quantify the degree of coherence in the outflow. The main result from the analysis is that, although dayside outflows are a permanent feature, steady-state conditions are surprisingly never achieved. In particular, a significant variability is found for convection drift and local outflow intensities on small time scales. This variability of local intensities is not found to depend on the total strenghth of the outflow, which is much more stable and increases with the dynamic solar wind pressure.

  11. Clustering of Multi-Temporal Fully Polarimetric L-Band SAR Data for Agricultural Land Cover Mapping

    Science.gov (United States)

    Tamiminia, H.; Homayouni, S.; Safari, A.

    2015-12-01

    Recently, the unique capabilities of Polarimetric Synthetic Aperture Radar (PolSAR) sensors make them an important and efficient tool for natural resources and environmental applications, such as land cover and crop classification. The aim of this paper is to classify multi-temporal full polarimetric SAR data using kernel-based fuzzy C-means clustering method, over an agricultural region. This method starts with transforming input data into the higher dimensional space using kernel functions and then clustering them in the feature space. Feature space, due to its inherent properties, has the ability to take in account the nonlinear and complex nature of polarimetric data. Several SAR polarimetric features extracted using target decomposition algorithms. Features from Cloude-Pottier, Freeman-Durden and Yamaguchi algorithms used as inputs for the clustering. This method was applied to multi-temporal UAVSAR L-band images acquired over an agricultural area near Winnipeg, Canada, during June and July in 2012. The results demonstrate the efficiency of this approach with respect to the classical methods. In addition, using multi-temporal data in the clustering process helped to investigate the phenological cycle of plants and significantly improved the performance of agricultural land cover mapping.

  12. Temporal clustering of floods in Germany: Do flood-rich and flood-poor periods exist?

    Science.gov (United States)

    Merz, Bruno; Nguyen, Viet Dung; Vorogushyn, Sergiy

    2016-10-01

    The repeated occurrence of exceptional floods within a few years, such as the Rhine floods in 1993 and 1995 and the Elbe and Danube floods in 2002 and 2013, suggests that floods in Central Europe may be organized in flood-rich and flood-poor periods. This hypothesis is studied by testing the significance of temporal clustering in flood occurrence (peak-over-threshold) time series for 68 catchments across Germany for the period 1932-2005. To assess the robustness of the results, different methods are used: Firstly, the index of dispersion, which quantifies the departure from a homogeneous Poisson process, is investigated. Further, the time-variation of the flood occurrence rate is derived by non-parametric kernel implementation and the significance of clustering is evaluated via parametric and non-parametric tests. Although the methods give consistent overall results, the specific results differ considerably. Hence, we recommend applying different methods when investigating flood clustering. For flood estimation and risk management, it is of relevance to understand whether clustering changes with flood severity and time scale. To this end, clustering is assessed for different thresholds and time scales. It is found that the majority of catchments show temporal clustering at the 5% significance level for low thresholds and time scales of one to a few years. However, clustering decreases substantially with increasing threshold and time scale. We hypothesize that flood clustering in Germany is mainly caused by catchment memory effects along with intra- to inter-annual climate variability, and that decadal climate variability plays a minor role.

  13. Cluster analysis for applications

    CERN Document Server

    Anderberg, Michael R

    1973-01-01

    Cluster Analysis for Applications deals with methods and various applications of cluster analysis. Topics covered range from variables and scales to measures of association among variables and among data units. Conceptual problems in cluster analysis are discussed, along with hierarchical and non-hierarchical clustering methods. The necessary elements of data analysis, statistics, cluster analysis, and computer implementation are integrated vertically to cover the complete path from raw data to a finished analysis.Comprised of 10 chapters, this book begins with an introduction to the subject o

  14. Self-organization of spatio-temporal earthquake clusters

    Directory of Open Access Journals (Sweden)

    S. Hainzl

    2000-01-01

    Full Text Available Cellular automaton versions of the Burridge-Knopoff model have been shown to reproduce the power law distribution of event sizes; that is, the Gutenberg-Richter law. However, they have failed to reproduce the occurrence of foreshock and aftershock sequences correlated with large earthquakes. We show that in the case of partial stress recovery due to transient creep occurring subsequently to earthquakes in the crust, such spring-block systems self-organize into a statistically stationary state characterized by a power law distribution of fracture sizes as well as by foreshocks and aftershocks accompanying large events. In particular, the increase of foreshock and the decrease of aftershock activity can be described by, aside from a prefactor, the same Omori law. The exponent of the Omori law depends on the relaxation time and on the spatial scale of transient creep. Further investigations concerning the number of aftershocks, the temporal variation of aftershock magnitudes, and the waiting time distribution support the conclusion that this model, even "more realistic" physics in missed, captures in some ways the origin of the size distribution as well as spatio-temporal clustering of earthquakes.

  15. Spatio-temporal analysis of smear-positive tuberculosis in the Sidama Zone, southern Ethiopia.

    Directory of Open Access Journals (Sweden)

    Mesay Hailu Dangisso

    Full Text Available Tuberculosis (TB is a disease of public health concern, with a varying distribution across settings depending on socio-economic status, HIV burden, availability and performance of the health system. Ethiopia is a country with a high burden of TB, with regional variations in TB case notification rates (CNRs. However, TB program reports are often compiled and reported at higher administrative units that do not show the burden at lower units, so there is limited information about the spatial distribution of the disease. We therefore aim to assess the spatial distribution and presence of the spatio-temporal clustering of the disease in different geographic settings over 10 years in the Sidama Zone in southern Ethiopia.A retrospective space-time and spatial analysis were carried out at the kebele level (the lowest administrative unit within a district to identify spatial and space-time clusters of smear-positive pulmonary TB (PTB. Scan statistics, Global Moran's I, and Getis and Ordi (Gi* statistics were all used to help analyze the spatial distribution and clusters of the disease across settings.A total of 22,545 smear-positive PTB cases notified over 10 years were used for spatial analysis. In a purely spatial analysis, we identified the most likely cluster of smear-positive PTB in 192 kebeles in eight districts (RR= 2, p<0.001, with 12,155 observed and 8,668 expected cases. The Gi* statistic also identified the clusters in the same areas, and the spatial clusters showed stability in most areas in each year during the study period. The space-time analysis also detected the most likely cluster in 193 kebeles in the same eight districts (RR= 1.92, p<0.001, with 7,584 observed and 4,738 expected cases in 2003-2012.The study found variations in CNRs and significant spatio-temporal clusters of smear-positive PTB in the Sidama Zone. The findings can be used to guide TB control programs to devise effective TB control strategies for the geographic areas

  16. Clustering analysis

    International Nuclear Information System (INIS)

    Romli

    1997-01-01

    Cluster analysis is the name of group of multivariate techniques whose principal purpose is to distinguish similar entities from the characteristics they process.To study this analysis, there are several algorithms that can be used. Therefore, this topic focuses to discuss the algorithms, such as, similarity measures, and hierarchical clustering which includes single linkage, complete linkage and average linkage method. also, non-hierarchical clustering method, which is popular name K -mean method ' will be discussed. Finally, this paper will be described the advantages and disadvantages of every methods

  17. Cluster analysis

    CERN Document Server

    Everitt, Brian S; Leese, Morven; Stahl, Daniel

    2011-01-01

    Cluster analysis comprises a range of methods for classifying multivariate data into subgroups. By organizing multivariate data into such subgroups, clustering can help reveal the characteristics of any structure or patterns present. These techniques have proven useful in a wide range of areas such as medicine, psychology, market research and bioinformatics.This fifth edition of the highly successful Cluster Analysis includes coverage of the latest developments in the field and a new chapter dealing with finite mixture models for structured data.Real life examples are used throughout to demons

  18. Characterizing the Spatio-Temporal Pattern of Land Surface Temperature through Time Series Clustering: Based on the Latent Pattern and Morphology

    Directory of Open Access Journals (Sweden)

    Huimin Liu

    2018-04-01

    Full Text Available Land Surface Temperature (LST is a critical component to understand the impact of urbanization on the urban thermal environment. Previous studies were inclined to apply only one snapshot to analyze the pattern and dynamics of LST without considering the non-stationarity in the temporal domain, or focus on the diurnal, seasonal, and annual pattern analysis of LST which has limited support for the understanding of how LST varies with the advancing of urbanization. This paper presents a workflow to extract the spatio-temporal pattern of LST through time series clustering by focusing on the LST of Wuhan, China, from 2002 to 2017 with a 3-year time interval with 8-day MODerate-resolution Imaging Spectroradiometer (MODIS satellite image products. The Latent pattern of LST (LLST generated by non-parametric Multi-Task Gaussian Process Modeling (MTGP and the Multi-Scale Shape Index (MSSI which characterizes the morphology of LLST are coupled for pattern recognition. Specifically, spatio-temporal patterns are discovered after the extraction of spatial patterns conducted by the incorporation of k -means and the Back-Propagation neural networks (BP-Net. The spatial patterns of the 6 years form a basic understanding about the corresponding temporal variances. For spatio-temporal pattern recognition, LLSTs and MSSIs of the 6 years are regarded as geo-referenced time series. Multiple algorithms including traditional k -means with Euclidean Distance (ED, shape-based k -means with the constrained Dynamic Time Warping ( c DTW distance measure, and the Dynamic Time Warping Barycenter Averaging (DBA centroid computation method ( k - c DBA and k -shape are applied. Ten external indexes are employed to evaluate the performance of the three algorithms and reveal k - c DBA as the optimal time series clustering algorithm for our study. The study area is divided into 17 geographical time series clusters which respectively illustrate heterogeneous temporal dynamics of LST

  19. Integrating Data Clustering and Visualization for the Analysis of 3D Gene Expression Data

    Energy Technology Data Exchange (ETDEWEB)

    Data Analysis and Visualization (IDAV) and the Department of Computer Science, University of California, Davis, One Shields Avenue, Davis CA 95616, USA,; nternational Research Training Group ``Visualization of Large and Unstructured Data Sets,' ' University of Kaiserslautern, Germany; Computational Research Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA; Genomics Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley CA 94720, USA; Life Sciences Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley CA 94720, USA,; Computer Science Division,University of California, Berkeley, CA, USA,; Computer Science Department, University of California, Irvine, CA, USA,; All authors are with the Berkeley Drosophila Transcription Network Project, Lawrence Berkeley National Laboratory,; Rubel, Oliver; Weber, Gunther H.; Huang, Min-Yu; Bethel, E. Wes; Biggin, Mark D.; Fowlkes, Charless C.; Hendriks, Cris L. Luengo; Keranen, Soile V. E.; Eisen, Michael B.; Knowles, David W.; Malik, Jitendra; Hagen, Hans; Hamann, Bernd

    2008-05-12

    The recent development of methods for extracting precise measurements of spatial gene expression patterns from three-dimensional (3D) image data opens the way for new analyses of the complex gene regulatory networks controlling animal development. We present an integrated visualization and analysis framework that supports user-guided data clustering to aid exploration of these new complex datasets. The interplay of data visualization and clustering-based data classification leads to improved visualization and enables a more detailed analysis than previously possible. We discuss (i) integration of data clustering and visualization into one framework; (ii) application of data clustering to 3D gene expression data; (iii) evaluation of the number of clusters k in the context of 3D gene expression clustering; and (iv) improvement of overall analysis quality via dedicated post-processing of clustering results based on visualization. We discuss the use of this framework to objectively define spatial pattern boundaries and temporal profiles of genes and to analyze how mRNA patterns are controlled by their regulatory transcription factors.

  20. Differential Spatio-temporal Multiband Satellite Image Clustering using K-means Optimization With Reinforcement Programming

    Directory of Open Access Journals (Sweden)

    Irene Erlyn Wina Rachmawan

    2015-06-01

    Full Text Available Deforestration is one of the crucial issues in Indonesia because now Indonesia has world's highest deforestation rate. In other hand, multispectral image delivers a great source of data for studying spatial and temporal changeability of the environmental such as deforestration area. This research present differential image processing methods for detecting nature change of deforestration. Our differential image processing algorithms extract and indicating area automatically. The feature of our proposed idea produce extracted information from multiband satellite image and calculate the area of deforestration by years with calculating data using temporal dataset. Yet, multiband satellite image consists of big data size that were difficult to be handled for segmentation. Commonly, K- Means clustering is considered to be a powerfull clustering algorithm because of its ability to clustering big data. However K-Means has sensitivity of its first generated centroids, which could lead into a bad performance. In this paper we propose a new approach to optimize K-Means clustering using Reinforcement Programming in order to clustering multispectral image. We build a new mechanism for generating initial centroids by implementing exploration and exploitation knowledge from Reinforcement Programming. This optimization will lead a better result for K-means data cluster. We select multispectral image from Landsat 7 in past ten years in Medawai, Borneo, Indonesia, and apply two segmentation areas consist of deforestration land and forest field. We made series of experiments and compared the experimental results of K-means using Reinforcement Programming as optimizing initiate centroid and normal K-means without optimization process. Keywords: Deforestration, Multispectral images, landsat, automatic clustering, K-means.

  1. Hanseniaspora uvarum from winemaking environments show spatial and temporal genetic clustering

    Directory of Open Access Journals (Sweden)

    Warren eAlbertin

    2016-01-01

    Full Text Available Hanseniaspora uvarum is one of the most abundant yeast species found on grapes and in grape must, at least before the onset of alcoholic fermentation which is usually performed by Saccharomyces species. The aim of this study was to characterise the genetic and phenotypic variability within the H. uvarum species. One hundred and fifteen strains isolated from winemaking environments in different geographical origins were analysed using 11 microsatellite markers and a subset of 47 strains were analysed by AFLP. H. uvarum isolates clustered mainly on the basis of their geographical localisation as revealed by microsatellites. In addition, a strong clustering based on year of isolation was evidenced, indicating that the genetic diversity of Hanseniaspora uvarum isolates was related to both spatial and temporal variations. Conversely, clustering analysis based on AFLP data provided a different picture with groups showing no particular characteristics, but provided higher strain discrimination. This result indicated that AFLP approaches are inadequate to establish the genetic relationship between individuals, but allowed good strain discrimination. At the phenotypic level, several extracellular enzymatic activities of enological relevance (pectinase, chitinase, protease, β-glucosidase were measured but showed low diversity. The impact of environmental factors of enological interest (temperature, anaerobia and copper addition on growth was also assessed and showed poor variation. Altogether, this work provided both new analytical tool (microsatellites and new insights into the genetic and phenotypic diversity of H. uvarum, a yeast species that has previously been identified as a potential candidate for co-inoculation in grape must, but whose intraspecific variability had never been fully assessed.

  2. Somatosensory nociceptive characteristics differentiate subgroups in people with chronic low back pain: a cluster analysis.

    Science.gov (United States)

    Rabey, Martin; Slater, Helen; OʼSullivan, Peter; Beales, Darren; Smith, Anne

    2015-10-01

    The objectives of this study were to explore the existence of subgroups in a cohort with chronic low back pain (n = 294) based on the results of multimodal sensory testing and profile subgroups on demographic, psychological, lifestyle, and general health factors. Bedside (2-point discrimination, brush, vibration and pinprick perception, temporal summation on repeated monofilament stimulation) and laboratory (mechanical detection threshold, pressure, heat and cold pain thresholds, conditioned pain modulation) sensory testing were examined at wrist and lumbar sites. Data were entered into principal component analysis, and 5 component scores were entered into latent class analysis. Three clusters, with different sensory characteristics, were derived. Cluster 1 (31.9%) was characterised by average to high temperature and pressure pain sensitivity. Cluster 2 (52.0%) was characterised by average to high pressure pain sensitivity. Cluster 3 (16.0%) was characterised by low temperature and pressure pain sensitivity. Temporal summation occurred significantly more frequently in cluster 1. Subgroups were profiled on pain intensity, disability, depression, anxiety, stress, life events, fear avoidance, catastrophizing, perception of the low back region, comorbidities, body mass index, multiple pain sites, sleep, and activity levels. Clusters 1 and 2 had a significantly greater proportion of female participants and higher depression and sleep disturbance scores than cluster 3. The proportion of participants undertaking Low back pain, therefore, does not appear to be homogeneous. Pain mechanisms relating to presentations of each subgroup were postulated. Future research may investigate prognoses and interventions tailored towards these subgroups.

  3. The AIDS epidemic and economic input impact factors in Chongqing, China, from 2006 to 2012: a spatial-temporal analysis.

    Science.gov (United States)

    Zhang, Yanqi; Xiao, Qin; Zhou, Liang; Ma, Dihui; Liu, Ling; Lu, Rongrong; Yi, Dali; Yi, Dong

    2015-03-27

    To analyse the spatial-temporal clustering of the HIV/AIDS epidemic in Chongqing and to explore its association with the economic indices of AIDS prevention and treatment. Data on the HIV/AIDS epidemic and economic indices of AIDS prevention and treatment were obtained from the annual reports of the Chongqing Municipal Center for Disease Control for 2006-2012. Spatial clustering analysis, temporal-spatial clustering analysis, and spatial regression were used to conduct statistical analysis. The annual average new HIV infection rate, incidence rate for new AIDS cases, and rate of people living with HIV in Chongqing were 5.97, 2.42 and 28.12 per 100,000, respectively, for 2006-2012. The HIV/AIDS epidemic showed a non-random spatial distribution (Moran's I≥0.310; p<0.05). The epidemic hotspots were distributed in the 15 mid-western counties. The most likely clusters were primarily located in the central region and southwest of Chongqing and occurred in 2010-2012. The regression coefficients of the total amount of special funds allocated to AIDS and to the public awareness unit for the numbers of new HIV cases, new AIDS cases, and people living with HIV were 0.775, 0.976 and 0.816, and -0.188, -0.259 and -0.215 (p<0.002), respectively. The Chongqing HIV/AIDS epidemic showed temporal-spatial clustering and was mainly clustered in the mid-western and south-western counties, showing an upward trend over time. The amount of special funds dedicated to AIDS and to the public awareness unit showed positive and negative relationships with HIV/AIDS spatial clustering, respectively. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.

  4. Marketing research cluster analysis

    OpenAIRE

    Marić Nebojša

    2002-01-01

    One area of applications of cluster analysis in marketing is identification of groups of cities and towns with similar demographic profiles. This paper considers main aspects of cluster analysis by an example of clustering 12 cities with the use of Minitab software.

  5. Down's syndrome clusters in Germany in close temporal relationship to the Chernobyl accident

    International Nuclear Information System (INIS)

    Grosche, B.; Schoetzau, A.; Burkart, W.

    1997-01-01

    In two independent studies using different approaches and covering West Berlin and Bavaria, respectively, highly significant temporal clusters of Down's syndrome were found. Both sharp increases occurred in areas receiving relatively low Chernobyl fallout and concomitant radiation exposures. Only for the Berlin cluster was fallout present at the time of the affected meioses, whereas the Nuremberg cluster preceded the radioactive contamination for one month. Hypotheses on possible causal relationships are compared. Radiation from the Chernobyl accident is an unlikely factor, also, because the associated cumulative dose was so low in comparison with natural background. Given the lack of understanding of what causes Down's syndrome, other than factors associated with increased maternal age, additional research into environmental and infectious risk factors is warranted. (author)

  6. Spatial and temporal characteristics of poloidal waves in the terrestrial plasmasphere: a CLUSTER case study

    Directory of Open Access Journals (Sweden)

    S. Schäfer

    2007-05-01

    Full Text Available Oscillating magnetic field lines are frequently observed by spacecraft in the terrestrial and other planetary magnetospheres. The CLUSTER mission is a very suitable tool to further study these Alfvén waves as the four CLUSTER spacecraft provide for an opportunity to separate spatial and temporal structures in the terrestrial magnetosphere. Using a large scaled configuration formed by the four spacecraft we are able to detect a poloidal Ultra-Low-Frequency (ULF pulsation of the magnetic and electric field in order to analyze its temporal and spatial structures. For this purpose the measurements are transformed into a specific field line related coordinate system to investigate their specific amplitude pattern depending on the path of the CLUSTER spacecraft across oscillating field lines. These measurements are then compared with modeled spacecraft observations across a localized poloidal wave resonator in the dayside plasmasphere. A detailed investigation of theoretically expected poloidal eigenfrequencies allows us to specify the observed 16 mHz pulsation as a third harmonic oscillation. Based on this we perform a case study providing a clear identification of wave properties such as an spatial scale structure of about 0.67 RE, the azimuthal wave number m≈30, temporal evolution, and energy transport in the detected ULF pulsations.

  7. Marketing research cluster analysis

    Directory of Open Access Journals (Sweden)

    Marić Nebojša

    2002-01-01

    Full Text Available One area of applications of cluster analysis in marketing is identification of groups of cities and towns with similar demographic profiles. This paper considers main aspects of cluster analysis by an example of clustering 12 cities with the use of Minitab software.

  8. Correlation-based iterative clustering methods for time course data: The identification of temporal gene response modules for influenza infection in humans

    Directory of Open Access Journals (Sweden)

    Michelle Carey

    2016-10-01

    Full Text Available Many pragmatic clustering methods have been developed to group data vectors or objects into clusters so that the objects in one cluster are very similar and objects in different clusters are distinct based on some similarity measure. The availability of time course data has motivated researchers to develop methods, such as mixture and mixed-effects modelling approaches, that incorporate the temporal information contained in the shape of the trajectory of the data. However, there is still a need for the development of time-course clustering methods that can adequately deal with inhomogeneous clusters (some clusters are quite large and others are quite small. Here we propose two such methods, hierarchical clustering (IHC and iterative pairwise-correlation clustering (IPC. We evaluate and compare the proposed methods to the Markov Cluster Algorithm (MCL and the generalised mixed-effects model (GMM using simulation studies and an application to a time course gene expression data set from a study containing human subjects who were challenged by a live influenza virus. We identify four types of temporal gene response modules to influenza infection in humans, i.e., single-gene modules (SGM, small-size modules (SSM, medium-size modules (MSM and large-size modules (LSM. The LSM contain genes that perform various fundamental biological functions that are consistent across subjects. The SSM and SGM contain genes that perform either different or similar biological functions that have complex temporal responses to the virus and are unique to each subject. We show that the temporal response of the genes in the LSM have either simple patterns with a single peak or trough a consequence of the transient stimuli sustained or state-transitioning patterns pertaining to developmental cues and that these modules can differentiate the severity of disease outcomes. Additionally, the size of gene response modules follows a power-law distribution with a consistent

  9. Comprehensive cluster analysis with Transitivity Clustering.

    Science.gov (United States)

    Wittkop, Tobias; Emig, Dorothea; Truss, Anke; Albrecht, Mario; Böcker, Sebastian; Baumbach, Jan

    2011-03-01

    Transitivity Clustering is a method for the partitioning of biological data into groups of similar objects, such as genes, for instance. It provides integrated access to various functions addressing each step of a typical cluster analysis. To facilitate this, Transitivity Clustering is accessible online and offers three user-friendly interfaces: a powerful stand-alone version, a web interface, and a collection of Cytoscape plug-ins. In this paper, we describe three major workflows: (i) protein (super)family detection with Cytoscape, (ii) protein homology detection with incomplete gold standards and (iii) clustering of gene expression data. This protocol guides the user through the most important features of Transitivity Clustering and takes ∼1 h to complete.

  10. Assymetry of temporal artery diameters during spontaneous attacks of cluster headache

    DEFF Research Database (Denmark)

    Nielsen, Thue H; Tfelt-Hansen, Peer; Iversen, Helle K

    2009-01-01

    BACKGROUND: Cluster headache is characterized by strictly unilateral head pain associated with symptoms of cranial autonomic features. Transcranial Doppler studies showed in most studies a bilateral decreased blood flow velocity in the middle cerebral artery. OBJECTIVE: To investigate whether the...... = .67). CONCLUSIONS: What was observed is most likely a general pain-induced arterial vasoconstriction (confer the decrease in diameter on the pain-free side) with an unchanged superficial temporal artery on the pain side because of some vasodilator influence....

  11. [Cluster analysis in biomedical researches].

    Science.gov (United States)

    Akopov, A S; Moskovtsev, A A; Dolenko, S A; Savina, G D

    2013-01-01

    Cluster analysis is one of the most popular methods for the analysis of multi-parameter data. The cluster analysis reveals the internal structure of the data, group the separate observations on the degree of their similarity. The review provides a definition of the basic concepts of cluster analysis, and discusses the most popular clustering algorithms: k-means, hierarchical algorithms, Kohonen networks algorithms. Examples are the use of these algorithms in biomedical research.

  12. Time series analysis of temporal networks

    Science.gov (United States)

    Sikdar, Sandipan; Ganguly, Niloy; Mukherjee, Animesh

    2016-01-01

    A common but an important feature of all real-world networks is that they are temporal in nature, i.e., the network structure changes over time. Due to this dynamic nature, it becomes difficult to propose suitable growth models that can explain the various important characteristic properties of these networks. In fact, in many application oriented studies only knowing these properties is sufficient. For instance, if one wishes to launch a targeted attack on a network, this can be done even without the knowledge of the full network structure; rather an estimate of some of the properties is sufficient enough to launch the attack. We, in this paper show that even if the network structure at a future time point is not available one can still manage to estimate its properties. We propose a novel method to map a temporal network to a set of time series instances, analyze them and using a standard forecast model of time series, try to predict the properties of a temporal network at a later time instance. To our aim, we consider eight properties such as number of active nodes, average degree, clustering coefficient etc. and apply our prediction framework on them. We mainly focus on the temporal network of human face-to-face contacts and observe that it represents a stochastic process with memory that can be modeled as Auto-Regressive-Integrated-Moving-Average (ARIMA). We use cross validation techniques to find the percentage accuracy of our predictions. An important observation is that the frequency domain properties of the time series obtained from spectrogram analysis could be used to refine the prediction framework by identifying beforehand the cases where the error in prediction is likely to be high. This leads to an improvement of 7.96% (for error level ≤20%) in prediction accuracy on an average across all datasets. As an application we show how such prediction scheme can be used to launch targeted attacks on temporal networks. Contribution to the Topical Issue

  13. Detection of Clostridium difficile infection clusters, using the temporal scan statistic, in a community hospital in southern Ontario, Canada, 2006-2011.

    Science.gov (United States)

    Faires, Meredith C; Pearl, David L; Ciccotelli, William A; Berke, Olaf; Reid-Smith, Richard J; Weese, J Scott

    2014-05-12

    In hospitals, Clostridium difficile infection (CDI) surveillance relies on unvalidated guidelines or threshold criteria to identify outbreaks. This can result in false-positive and -negative cluster alarms. The application of statistical methods to identify and understand CDI clusters may be a useful alternative or complement to standard surveillance techniques. The objectives of this study were to investigate the utility of the temporal scan statistic for detecting CDI clusters and determine if there are significant differences in the rate of CDI cases by month, season, and year in a community hospital. Bacteriology reports of patients identified with a CDI from August 2006 to February 2011 were collected. For patients detected with CDI from March 2010 to February 2011, stool specimens were obtained. Clostridium difficile isolates were characterized by ribotyping and investigated for the presence of toxin genes by PCR. CDI clusters were investigated using a retrospective temporal scan test statistic. Statistically significant clusters were compared to known CDI outbreaks within the hospital. A negative binomial regression model was used to identify associations between year, season, month and the rate of CDI cases. Overall, 86 CDI cases were identified. Eighteen specimens were analyzed and nine ribotypes were classified with ribotype 027 (n = 6) the most prevalent. The temporal scan statistic identified significant CDI clusters at the hospital (n = 5), service (n = 6), and ward (n = 4) levels (P ≤ 0.05). Three clusters were concordant with the one C. difficile outbreak identified by hospital personnel. Two clusters were identified as potential outbreaks. The negative binomial model indicated years 2007-2010 (P ≤ 0.05) had decreased CDI rates compared to 2006 and spring had an increased CDI rate compared to the fall (P = 0.023). Application of the temporal scan statistic identified several clusters, including potential outbreaks not detected by hospital

  14. Cluster analysis of activity-time series in motor learning

    DEFF Research Database (Denmark)

    Balslev, Daniela; Nielsen, Finn Årup; Frutiger, Sally A.

    2002-01-01

    Neuroimaging studies of learning focus on brain areas where the activity changes as a function of time. To circumvent the difficult problem of model selection, we used a data-driven analytic tool, cluster analysis, which extracts representative temporal and spatial patterns from the voxel...... practice-related activity in a fronto-parieto-cerebellar network, in agreement with previous studies of motor learning. These voxels were separated from a group of voxels showing an unspecific time-effect and another group of voxels, whose activation was an artifact from smoothing. Hum. Brain Mapping 15...

  15. Temporal features of word-initial /s/+stop clusters in bilingual Mandarin-English children and monolingual English children and adults.

    Science.gov (United States)

    Yang, Jing

    2018-03-01

    This study investigated the durational features of English word-initial /s/+stop clusters produced by bilingual Mandarin (L1)-English (L2) children and monolingual English children and adults. The participants included two groups of five- to six-year-old bilingual children: low proficiency in the L2 (Bi-low) and high proficiency in the L2 (Bi-high), one group of age-matched English children, and one group of English adults. Each participant produced a list of English words containing /sp, st, sk/ at the word-initial position followed by /a, i, u/, respectively. The absolute durations of the clusters and cluster elements and the durational proportions of elements to the overall cluster were measured. The results revealed that Bi-high children behaved similarly to the English monolinguals whereas Bi-low children used a different strategy of temporal organization to coordinate the cluster components in comparison to the English monolinguals and Bi-high children. The influence of language experience and continuing development of temporal features in children were discussed.

  16. [Spatial and temporal clustering characteristics of typhoid and paratyphoid fever and its change pattern in 3 provinces in southwestern China, 2001-2012].

    Science.gov (United States)

    Wang, L X; Yang, B; Yan, M Y; Tang, Y Q; Liu, Z C; Wang, R Q; Li, S; Ma, L; Kan, B

    2017-11-10

    Objective: To analyze the spatial and temporal clustering characteristics of typhoid and paratyphoid fever and its change pattern in Yunnan, Guizhou and Guangxi provinces in southwestern China in recent years. Methods: The incidence data of typhoid and paratyphoid fever cases at county level in 3 provinces during 2001-2012 were collected from China Information System for Diseases Control and Prevention and analyzed by the methods of descriptive epidemiology and geographic informatics. And the map showing the spatial and temporal clustering characters of typhoid and paratyphoid fever cases in three provinces was drawn. SaTScan statistics was used to identify the typhoid and paratyphoid fever clustering areas of three provinces in each year from 2001 to 2012. Results: During the study period, the reported cases of typhoid and paratyphoid fever declined with year. The reported incidence decreased from 30.15 per 100 000 in 2001 to 10.83 per 100 000 in 2006(annual incidence 21.12 per 100 000); while during 2007-2012, the incidence became stable, ranging from 4.75 per 100 000 to 6.83 per 100 000 (annual incidence 5.73 per 100 000). The seasonal variation of the incidence was consistent in three provinces, with majority of cases occurred in summer and autumn. The spatial and temporal clustering of typhoid and paratyphoid fever was demonstrated by the incidence map. Most high-incidence counties were located in a zonal area extending from Yuxi of Yunnan to Guiyang of Guizhou, but were concentrated in Guilin in Guangxi. Temporal and spatial scan statistics identified the positional shifting of class Ⅰ clustering area from Guizhou to Yunnan. Class Ⅰ clustering area was located around the central and western areas (Zunyi and Anshun) of Guizhou during 2001-2003, and moved to the central area of Yunnan during 2004-2012. Conclusion: Spatial and temporal clustering of typhoid and paratyphoid fever existed in the endemic areas of southwestern China, and the clustering area

  17. Changing cluster composition in cluster randomised controlled trials: design and analysis considerations

    Science.gov (United States)

    2014-01-01

    Background There are many methodological challenges in the conduct and analysis of cluster randomised controlled trials, but one that has received little attention is that of post-randomisation changes to cluster composition. To illustrate this, we focus on the issue of cluster merging, considering the impact on the design, analysis and interpretation of trial outcomes. Methods We explored the effects of merging clusters on study power using standard methods of power calculation. We assessed the potential impacts on study findings of both homogeneous cluster merges (involving clusters randomised to the same arm of a trial) and heterogeneous merges (involving clusters randomised to different arms of a trial) by simulation. To determine the impact on bias and precision of treatment effect estimates, we applied standard methods of analysis to different populations under analysis. Results Cluster merging produced a systematic reduction in study power. This effect depended on the number of merges and was most pronounced when variability in cluster size was at its greatest. Simulations demonstrate that the impact on analysis was minimal when cluster merges were homogeneous, with impact on study power being balanced by a change in observed intracluster correlation coefficient (ICC). We found a decrease in study power when cluster merges were heterogeneous, and the estimate of treatment effect was attenuated. Conclusions Examples of cluster merges found in previously published reports of cluster randomised trials were typically homogeneous rather than heterogeneous. Simulations demonstrated that trial findings in such cases would be unbiased. However, simulations also showed that any heterogeneous cluster merges would introduce bias that would be hard to quantify, as well as having negative impacts on the precision of estimates obtained. Further methodological development is warranted to better determine how to analyse such trials appropriately. Interim recommendations

  18. Integrative cluster analysis in bioinformatics

    CERN Document Server

    Abu-Jamous, Basel; Nandi, Asoke K

    2015-01-01

    Clustering techniques are increasingly being put to use in the analysis of high-throughput biological datasets. Novel computational techniques to analyse high throughput data in the form of sequences, gene and protein expressions, pathways, and images are becoming vital for understanding diseases and future drug discovery. This book details the complete pathway of cluster analysis, from the basics of molecular biology to the generation of biological knowledge. The book also presents the latest clustering methods and clustering validation, thereby offering the reader a comprehensive review o

  19. Temporal and spatial analysis of psittacosis in association with poultry farming in the Netherlands, 2000-2015.

    Science.gov (United States)

    Hogerwerf, Lenny; Holstege, Manon M C; Benincà, Elisa; Dijkstra, Frederika; van der Hoek, Wim

    2017-07-26

    Human psittacosis is a highly under diagnosed zoonotic disease, commonly linked to psittacine birds. Psittacosis in birds, also known as avian chlamydiosis, is endemic in poultry, but the risk for people living close to poultry farms is unknown. Therefore, our study aimed to explore the temporal and spatial patterns of human psittacosis infections and identify possible associations with poultry farming in the Netherlands. We analysed data on 700 human cases of psittacosis notified between 01-01-2000 and 01-09-2015. First, we studied the temporal behaviour of psittacosis notifications by applying wavelet analysis. Then, to identify possible spatial patterns, we applied spatial cluster analysis. Finally, we investigated the possible spatial association between psittacosis notifications and data on the Dutch poultry sector at municipality level using a multivariable model. We found a large spatial cluster that covered a highly poultry-dense area but additional clusters were found in areas that had a low poultry density. There were marked geographical differences in the awareness of psittacosis and the amount and the type of laboratory diagnostics used for psittacosis, making it difficult to draw conclusions about the correlation between the large cluster and poultry density. The multivariable model showed that the presence of chicken processing plants and slaughter duck farms in a municipality was associated with a higher rate of human psittacosis notifications. The significance of the associations was influenced by the inclusion or exclusion of farm density in the model. Our temporal and spatial analyses showed weak associations between poultry-related variables and psittacosis notifications. Because of the low number of psittacosis notifications available for analysis, the power of our analysis was relative low. Because of the exploratory nature of this research, the associations found cannot be interpreted as evidence for airborne transmission of psittacosis from

  20. Earthquake Clusters and Spatio-temporal Migration of earthquakes in Northeastern Tibetan Plateau: a Finite Element Modeling

    Science.gov (United States)

    Sun, Y.; Luo, G.

    2017-12-01

    Seismicity in a region is usually characterized by earthquake clusters and earthquake migration along its major fault zones. However, we do not fully understand why and how earthquake clusters and spatio-temporal migration of earthquakes occur. The northeastern Tibetan Plateau is a good example for us to investigate these problems. In this study, we construct and use a three-dimensional viscoelastoplastic finite-element model to simulate earthquake cycles and spatio-temporal migration of earthquakes along major fault zones in northeastern Tibetan Plateau. We calculate stress evolution and fault interactions, and explore effects of topographic loading and viscosity of middle-lower crust and upper mantle on model results. Model results show that earthquakes and fault interactions increase Coulomb stress on the neighboring faults or segments, accelerating the future earthquakes in this region. Thus, earthquakes occur sequentially in a short time, leading to regional earthquake clusters. Through long-term evolution, stresses on some seismogenic faults, which are far apart, may almost simultaneously reach the critical state of fault failure, probably also leading to regional earthquake clusters and earthquake migration. Based on our model synthetic seismic catalog and paleoseismic data, we analyze probability of earthquake migration between major faults in northeastern Tibetan Plateau. We find that following the 1920 M 8.5 Haiyuan earthquake and the 1927 M 8.0 Gulang earthquake, the next big event (M≥7) in northeastern Tibetan Plateau would be most likely to occur on the Haiyuan fault.

  1. A two-stage approach to estimate spatial and spatio-temporal disease risks in the presence of local discontinuities and clusters.

    Science.gov (United States)

    Adin, A; Lee, D; Goicoa, T; Ugarte, María Dolores

    2018-01-01

    Disease risk maps for areal unit data are often estimated from Poisson mixed models with local spatial smoothing, for example by incorporating random effects with a conditional autoregressive prior distribution. However, one of the limitations is that local discontinuities in the spatial pattern are not usually modelled, leading to over-smoothing of the risk maps and a masking of clusters of hot/coldspot areas. In this paper, we propose a novel two-stage approach to estimate and map disease risk in the presence of such local discontinuities and clusters. We propose approaches in both spatial and spatio-temporal domains, where for the latter the clusters can either be fixed or allowed to vary over time. In the first stage, we apply an agglomerative hierarchical clustering algorithm to training data to provide sets of potential clusters, and in the second stage, a two-level spatial or spatio-temporal model is applied to each potential cluster configuration. The superiority of the proposed approach with regard to a previous proposal is shown by simulation, and the methodology is applied to two important public health problems in Spain, namely stomach cancer mortality across Spain and brain cancer incidence in the Navarre and Basque Country regions of Spain.

  2. Temporal Data-Driven Sleep Scheduling and Spatial Data-Driven Anomaly Detection for Clustered Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Gang Li

    2016-09-01

    Full Text Available The spatial–temporal correlation is an important feature of sensor data in wireless sensor networks (WSNs. Most of the existing works based on the spatial–temporal correlation can be divided into two parts: redundancy reduction and anomaly detection. These two parts are pursued separately in existing works. In this work, the combination of temporal data-driven sleep scheduling (TDSS and spatial data-driven anomaly detection is proposed, where TDSS can reduce data redundancy. The TDSS model is inspired by transmission control protocol (TCP congestion control. Based on long and linear cluster structure in the tunnel monitoring system, cooperative TDSS and spatial data-driven anomaly detection are then proposed. To realize synchronous acquisition in the same ring for analyzing the situation of every ring, TDSS is implemented in a cooperative way in the cluster. To keep the precision of sensor data, spatial data-driven anomaly detection based on the spatial correlation and Kriging method is realized to generate an anomaly indicator. The experiment results show that cooperative TDSS can realize non-uniform sensing effectively to reduce the energy consumption. In addition, spatial data-driven anomaly detection is quite significant for maintaining and improving the precision of sensor data.

  3. A Dimensionality Reduction-Based Multi-Step Clustering Method for Robust Vessel Trajectory Analysis

    Directory of Open Access Journals (Sweden)

    Huanhuan Li

    2017-08-01

    Full Text Available The Shipboard Automatic Identification System (AIS is crucial for navigation safety and maritime surveillance, data mining and pattern analysis of AIS information have attracted considerable attention in terms of both basic research and practical applications. Clustering of spatio-temporal AIS trajectories can be used to identify abnormal patterns and mine customary route data for transportation safety. Thus, the capacities of navigation safety and maritime traffic monitoring could be enhanced correspondingly. However, trajectory clustering is often sensitive to undesirable outliers and is essentially more complex compared with traditional point clustering. To overcome this limitation, a multi-step trajectory clustering method is proposed in this paper for robust AIS trajectory clustering. In particular, the Dynamic Time Warping (DTW, a similarity measurement method, is introduced in the first step to measure the distances between different trajectories. The calculated distances, inversely proportional to the similarities, constitute a distance matrix in the second step. Furthermore, as a widely-used dimensional reduction method, Principal Component Analysis (PCA is exploited to decompose the obtained distance matrix. In particular, the top k principal components with above 95% accumulative contribution rate are extracted by PCA, and the number of the centers k is chosen. The k centers are found by the improved center automatically selection algorithm. In the last step, the improved center clustering algorithm with k clusters is implemented on the distance matrix to achieve the final AIS trajectory clustering results. In order to improve the accuracy of the proposed multi-step clustering algorithm, an automatic algorithm for choosing the k clusters is developed according to the similarity distance. Numerous experiments on realistic AIS trajectory datasets in the bridge area waterway and Mississippi River have been implemented to compare our

  4. A Dimensionality Reduction-Based Multi-Step Clustering Method for Robust Vessel Trajectory Analysis.

    Science.gov (United States)

    Li, Huanhuan; Liu, Jingxian; Liu, Ryan Wen; Xiong, Naixue; Wu, Kefeng; Kim, Tai-Hoon

    2017-08-04

    The Shipboard Automatic Identification System (AIS) is crucial for navigation safety and maritime surveillance, data mining and pattern analysis of AIS information have attracted considerable attention in terms of both basic research and practical applications. Clustering of spatio-temporal AIS trajectories can be used to identify abnormal patterns and mine customary route data for transportation safety. Thus, the capacities of navigation safety and maritime traffic monitoring could be enhanced correspondingly. However, trajectory clustering is often sensitive to undesirable outliers and is essentially more complex compared with traditional point clustering. To overcome this limitation, a multi-step trajectory clustering method is proposed in this paper for robust AIS trajectory clustering. In particular, the Dynamic Time Warping (DTW), a similarity measurement method, is introduced in the first step to measure the distances between different trajectories. The calculated distances, inversely proportional to the similarities, constitute a distance matrix in the second step. Furthermore, as a widely-used dimensional reduction method, Principal Component Analysis (PCA) is exploited to decompose the obtained distance matrix. In particular, the top k principal components with above 95% accumulative contribution rate are extracted by PCA, and the number of the centers k is chosen. The k centers are found by the improved center automatically selection algorithm. In the last step, the improved center clustering algorithm with k clusters is implemented on the distance matrix to achieve the final AIS trajectory clustering results. In order to improve the accuracy of the proposed multi-step clustering algorithm, an automatic algorithm for choosing the k clusters is developed according to the similarity distance. Numerous experiments on realistic AIS trajectory datasets in the bridge area waterway and Mississippi River have been implemented to compare our proposed method with

  5. CC_TRS: Continuous Clustering of Trajectory Stream Data Based on Micro Cluster Life

    Directory of Open Access Journals (Sweden)

    Musaab Riyadh

    2017-01-01

    Full Text Available The rapid spreading of positioning devices leads to the generation of massive spatiotemporal trajectories data. In some scenarios, spatiotemporal data are received in stream manner. Clustering of stream data is beneficial for different applications such as traffic management and weather forecasting. In this article, an algorithm for Continuous Clustering of Trajectory Stream Data Based on Micro Cluster Life is proposed. The algorithm consists of two phases. There is the online phase where temporal micro clusters are used to store summarized spatiotemporal information for each group of similar segments. The clustering task in online phase is based on temporal micro cluster lifetime instead of time window technique which divides stream data into time bins and clusters each bin separately. For offline phase, a density based clustering approach is used to generate macro clusters depending on temporal micro clusters. The evaluation of the proposed algorithm on real data sets shows the efficiency and the effectiveness of the proposed algorithm and proved it is efficient alternative to time window technique.

  6. Cluster analysis of residential heat load profiles and the role of technical and household characteristics

    DEFF Research Database (Denmark)

    Carmo, Carolina; Christensen, Toke Haunstrup

    2016-01-01

    of the temporality of the energy demand is needed. This paper contributes to this by focusing on the daily load profiles of energy demand for heating of Danish dwellings with heat pumps. Based on hourly recordings from 139 dwellings and employing cluster and regression analysis, the paper explores patterns...... (typologies) in daily heating load profiles and how these relate to socio-economic and technical characteristics of the included households. The study shows that the load profiles vary according to the external load conditions. Two main clusters were identified for both weekdays and weekends and across load...

  7. A spatio-temporal analysis of suicide in El Salvador.

    Science.gov (United States)

    Carcach, Carlos

    2017-04-20

    In 2012, international statistics showed El Salvador's suicide rate as 40th in the world and the highest in Latin America. Over the last 15 years, national statistics show the suicide death rate declining as opposed to an increasing rate of homicide. Though completed suicide is an important social and health issue, little is known about its prevalence, incidence, etiology and spatio-temporal behavior. The primary objective of this study was to examine completed suicide and homicide using the stream analogy to lethal violence within a spatio-temporal framework. A Bayesian model was applied to examine the spatio-temporal evolution of the tendency of completed suicide over homicide in El Salvador. Data on numbers of suicides and homicides at the municipal level were obtained from the Instituto de Medicina Legal (IML) and population counts, from the Dirección General de Estadística y Censos (DIGESTYC), for the period of 2002 to 2012. Data on migration were derived from the 2007 Population Census, and inequality data were obtained from a study by Damianović, Valenzuela and Vera. The data reveal a stable standardized rate of total lethal violence (completed suicide plus homicide) across municipalities over time; a decline in suicide; and a standardized suicide rate decreasing with income inequality but increasing with social isolation. Municipalities clustered in terms of both total lethal violence and suicide standardized rates. Spatial effects for suicide were stronger among municipalities located in the north-east and center-south sides of the country. New clusters of municipalities with large suicide standardized rates were detected in the north-west, south-west and center-south regions, all of which are part of time-stable clusters of homicide. Prevention efforts to reduce income inequality and mitigate the negative effects of weak relational systems should focus upon municipalities forming time-persistent clusters with a large rate of death by suicide. In

  8. A spatio-temporal analysis of suicide in El Salvador

    Directory of Open Access Journals (Sweden)

    Carlos Carcach

    2017-04-01

    Full Text Available Abstract Background In 2012, international statistics showed El Salvador’s suicide rate as 40th in the world and the highest in Latin America. Over the last 15 years, national statistics show the suicide death rate declining as opposed to an increasing rate of homicide. Though completed suicide is an important social and health issue, little is known about its prevalence, incidence, etiology and spatio-temporal behavior. The primary objective of this study was to examine completed suicide and homicide using the stream analogy to lethal violence within a spatio-temporal framework. Methods A Bayesian model was applied to examine the spatio-temporal evolution of the tendency of completed suicide over homicide in El Salvador. Data on numbers of suicides and homicides at the municipal level were obtained from the Instituto de Medicina Legal (IML and population counts, from the Dirección General de Estadística y Censos (DIGESTYC, for the period of 2002 to 2012. Data on migration were derived from the 2007 Population Census, and inequality data were obtained from a study by Damianović, Valenzuela and Vera. Results The data reveal a stable standardized rate of total lethal violence (completed suicide plus homicide across municipalities over time; a decline in suicide; and a standardized suicide rate decreasing with income inequality but increasing with social isolation. Municipalities clustered in terms of both total lethal violence and suicide standardized rates. Conclusions Spatial effects for suicide were stronger among municipalities located in the north-east and center-south sides of the country. New clusters of municipalities with large suicide standardized rates were detected in the north-west, south-west and center-south regions, all of which are part of time-stable clusters of homicide. Prevention efforts to reduce income inequality and mitigate the negative effects of weak relational systems should focus upon municipalities forming time

  9. Spatial and spatio-temporal analysis of malaria in the state of Acre, western Amazon, Brazil

    Directory of Open Access Journals (Sweden)

    Leonardo Augusto Kohara Melchior

    2016-11-01

    Full Text Available Since 2005, the State of Acre, western Amazon, Brazil, has reported the highest annual parasite incidence (API of malaria among the Brazilian states. This study examines malaria incidence in Acre using spatial and spatio-temporal analysis based on an ecological time series study analyzing malaria cases and deaths for the time period 1992- 2014 and using secondary data. API indexes were calculated by age, sex, parasite species, ratio of Plasmodium vivax to P. falciparum malaria, malaria mortality rate and case fatality rate. SaTScan was used to detect spatial and spatio-temporal clusters of malaria cases and data were represented in the form of choropleth maps. A high-risk cluster of malaria was detected in Vale do Juruá and three low-risk clusters in Vale do Acre for both parasite species. Those younger than 19 years of age and females showed a high incidence of malaria in Vale do Juruá, but working-age males were the most affected in Vale do Acre. The malaria mortality rate showed a decreasing trend across the state, while the case fatality rate increased only in the micro-region of Rio Branco during the study period. We conclude that malaria is a focal disease in Acre showing different spatial and spatio-temporal patterns of cases and deaths that vary by age, sex, and parasite species. Malaria incidence is thought to be influenced by factors related to regional characteristics; therefore, appropriate disease and vector control strategies must be implemented at each locality.

  10. Cluster analysis in phenotyping a Portuguese population.

    Science.gov (United States)

    Loureiro, C C; Sa-Couto, P; Todo-Bom, A; Bousquet, J

    2015-09-03

    Unbiased cluster analysis using clinical parameters has identified asthma phenotypes. Adding inflammatory biomarkers to this analysis provided a better insight into the disease mechanisms. This approach has not yet been applied to asthmatic Portuguese patients. To identify phenotypes of asthma using cluster analysis in a Portuguese asthmatic population treated in secondary medical care. Consecutive patients with asthma were recruited from the outpatient clinic. Patients were optimally treated according to GINA guidelines and enrolled in the study. Procedures were performed according to a standard evaluation of asthma. Phenotypes were identified by cluster analysis using Ward's clustering method. Of the 72 patients enrolled, 57 had full data and were included for cluster analysis. Distribution was set in 5 clusters described as follows: cluster (C) 1, early onset mild allergic asthma; C2, moderate allergic asthma, with long evolution, female prevalence and mixed inflammation; C3, allergic brittle asthma in young females with early disease onset and no evidence of inflammation; C4, severe asthma in obese females with late disease onset, highly symptomatic despite low Th2 inflammation; C5, severe asthma with chronic airflow obstruction, late disease onset and eosinophilic inflammation. In our study population, the identified clusters were mainly coincident with other larger-scale cluster analysis. Variables such as age at disease onset, obesity, lung function, FeNO (Th2 biomarker) and disease severity were important for cluster distinction. Copyright © 2015. Published by Elsevier España, S.L.U.

  11. An Efficient Data Compression Model Based on Spatial Clustering and Principal Component Analysis in Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Yihang Yin

    2015-08-01

    Full Text Available Wireless sensor networks (WSNs have been widely used to monitor the environment, and sensors in WSNs are usually power constrained. Because inner-node communication consumes most of the power, efficient data compression schemes are needed to reduce the data transmission to prolong the lifetime of WSNs. In this paper, we propose an efficient data compression model to aggregate data, which is based on spatial clustering and principal component analysis (PCA. First, sensors with a strong temporal-spatial correlation are grouped into one cluster for further processing with a novel similarity measure metric. Next, sensor data in one cluster are aggregated in the cluster head sensor node, and an efficient adaptive strategy is proposed for the selection of the cluster head to conserve energy. Finally, the proposed model applies principal component analysis with an error bound guarantee to compress the data and retain the definite variance at the same time. Computer simulations show that the proposed model can greatly reduce communication and obtain a lower mean square error than other PCA-based algorithms.

  12. An Efficient Data Compression Model Based on Spatial Clustering and Principal Component Analysis in Wireless Sensor Networks.

    Science.gov (United States)

    Yin, Yihang; Liu, Fengzheng; Zhou, Xiang; Li, Quanzhong

    2015-08-07

    Wireless sensor networks (WSNs) have been widely used to monitor the environment, and sensors in WSNs are usually power constrained. Because inner-node communication consumes most of the power, efficient data compression schemes are needed to reduce the data transmission to prolong the lifetime of WSNs. In this paper, we propose an efficient data compression model to aggregate data, which is based on spatial clustering and principal component analysis (PCA). First, sensors with a strong temporal-spatial correlation are grouped into one cluster for further processing with a novel similarity measure metric. Next, sensor data in one cluster are aggregated in the cluster head sensor node, and an efficient adaptive strategy is proposed for the selection of the cluster head to conserve energy. Finally, the proposed model applies principal component analysis with an error bound guarantee to compress the data and retain the definite variance at the same time. Computer simulations show that the proposed model can greatly reduce communication and obtain a lower mean square error than other PCA-based algorithms.

  13. Modeling Temporal-Spatial Earthquake and Volcano Clustering at Yucca Mountain, Nevada

    International Nuclear Information System (INIS)

    T. Parsons; G.A. Thompson; A.H. Cogbill

    2006-01-01

    The proposed national high-level nuclear repository at Yucca Mountain is close to Quaternary faults and cinder cones. The frequency of these events is low, with indications of spatial and temporal clustering, making probabilistic assessments difficult. In an effort to identify the most likely intrusion sites, we based a 3D finite element model on the expectation that faulting and basalt intrusions are primarily sensitive to the magnitude and orientation of the least principal stress in extensional terranes. We found that in the absence of fault slip, variation in overburden pressure caused a stress state that preferentially favored intrusions at Crater Flat. However, when we allowed central Yucca Mountain faults to slip in the model, we found that magmatic clustering was not favored at Crater Flat or in the central Yucca Mountain block. Instead, we calculated that the stress field was most encouraging to intrusions near fault terminations, consistent with the location of the most recent volcanism at Yucca Mountain, the Lathrop Wells cone. We found this linked fault and magmatic system to be mutually reinforcing in the model in that dike inflation favored renewed fault slip

  14. fMR-adaptation indicates selectivity to audiovisual content congruency in distributed clusters in human superior temporal cortex.

    Science.gov (United States)

    van Atteveldt, Nienke M; Blau, Vera C; Blomert, Leo; Goebel, Rainer

    2010-02-02

    Efficient multisensory integration is of vital importance for adequate interaction with the environment. In addition to basic binding cues like temporal and spatial coherence, meaningful multisensory information is also bound together by content-based associations. Many functional Magnetic Resonance Imaging (fMRI) studies propose the (posterior) superior temporal cortex (STC) as the key structure for integrating meaningful multisensory information. However, a still unanswered question is how superior temporal cortex encodes content-based associations, especially in light of inconsistent results from studies comparing brain activation to semantically matching (congruent) versus nonmatching (incongruent) multisensory inputs. Here, we used fMR-adaptation (fMR-A) in order to circumvent potential problems with standard fMRI approaches, including spatial averaging and amplitude saturation confounds. We presented repetitions of audiovisual stimuli (letter-speech sound pairs) and manipulated the associative relation between the auditory and visual inputs (congruent/incongruent pairs). We predicted that if multisensory neuronal populations exist in STC and encode audiovisual content relatedness, adaptation should be affected by the manipulated audiovisual relation. The results revealed an occipital-temporal network that adapted independently of the audiovisual relation. Interestingly, several smaller clusters distributed over superior temporal cortex within that network, adapted stronger to congruent than to incongruent audiovisual repetitions, indicating sensitivity to content congruency. These results suggest that the revealed clusters contain multisensory neuronal populations that encode content relatedness by selectively responding to congruent audiovisual inputs, since unisensory neuronal populations are assumed to be insensitive to the audiovisual relation. These findings extend our previously revealed mechanism for the integration of letters and speech sounds and

  15. fMR-adaptation indicates selectivity to audiovisual content congruency in distributed clusters in human superior temporal cortex

    Directory of Open Access Journals (Sweden)

    Blomert Leo

    2010-02-01

    Full Text Available Abstract Background Efficient multisensory integration is of vital importance for adequate interaction with the environment. In addition to basic binding cues like temporal and spatial coherence, meaningful multisensory information is also bound together by content-based associations. Many functional Magnetic Resonance Imaging (fMRI studies propose the (posterior superior temporal cortex (STC as the key structure for integrating meaningful multisensory information. However, a still unanswered question is how superior temporal cortex encodes content-based associations, especially in light of inconsistent results from studies comparing brain activation to semantically matching (congruent versus nonmatching (incongruent multisensory inputs. Here, we used fMR-adaptation (fMR-A in order to circumvent potential problems with standard fMRI approaches, including spatial averaging and amplitude saturation confounds. We presented repetitions of audiovisual stimuli (letter-speech sound pairs and manipulated the associative relation between the auditory and visual inputs (congruent/incongruent pairs. We predicted that if multisensory neuronal populations exist in STC and encode audiovisual content relatedness, adaptation should be affected by the manipulated audiovisual relation. Results The results revealed an occipital-temporal network that adapted independently of the audiovisual relation. Interestingly, several smaller clusters distributed over superior temporal cortex within that network, adapted stronger to congruent than to incongruent audiovisual repetitions, indicating sensitivity to content congruency. Conclusions These results suggest that the revealed clusters contain multisensory neuronal populations that encode content relatedness by selectively responding to congruent audiovisual inputs, since unisensory neuronal populations are assumed to be insensitive to the audiovisual relation. These findings extend our previously revealed mechanism for

  16. Incidence rate and spatio-temporal clustering of type 1 diabetes in Santiago, Chile, from 1997 to 1998 Taxa de incidência e agrupamento espaço-temporal de diabetes tipo 1 em Santiago, Chile, de 1997 a 1998

    Directory of Open Access Journals (Sweden)

    JL Santos

    2001-02-01

    Full Text Available OBJECTIVE: To estimate the incidence rate of type 1 diabetes in the urban area of Santiago, Chile, from March 21, 1997 to March 20, 1998, and to assess the spatio-temporal clustering of cases during that period. METHODS: All sixty-one incident cases were located temporally (day of diagnosis and spatially (place of residence in the area of study. Knox's method was used to assess spatio-temporal clustering of incident cases. RESULTS: The overall incidence rate of type 1 diabetes was 4.11 cases per 100,000 children aged less than 15 years per year (95% confidence interval: 3.06--5.14. The incidence rate seems to have increased since the last estimate of the incidence calculated for the years 1986--1992 in the metropolitan region of Santiago. Different combinations of space-time intervals have been evaluated to assess spatio-temporal clustering. The smallest p-value was found for the combination of critical distances of 750 meters and 60 days (uncorrected p-value = 0.048. CONCLUSIONS: Although these are preliminary results regarding space-time clustering in Santiago, exploratory analysis of the data method would suggest a possible aggregation of incident cases in space-time coordinates.OBJETIVO: Estimar a taxa de incidência de diabetes tipo 1 na área urbana de Santiago, Chile, entre os dias 21 de março de 1997 e 20 de março 1998, assim como a avaliação do agrupamento espaço-temporal dos casos incidentes no período. MÉTODOS: Foram localizados 61 casos incidentes no tempo (dia do diagnóstico e no espaço (lugar de residência na área do estudo. O método de Knox foi usado para avaliar o agrupamento dos casos no espaço e no tempo. RESULTADOS: A taxa de diabetes tipo 1 foi estimada em 4,11 casos por 100.000 menores de 15 anos por ano (Intervalo de confiança 95%: 3,06 -- 5,14. Essa taxa de incidência parece ter aumentado desde a última estimativa realizada na região metropolitana de Santiago, nos anos 1986-1992. Foram constru

  17. Spatial cluster modelling

    CERN Document Server

    Lawson, Andrew B

    2002-01-01

    Research has generated a number of advances in methods for spatial cluster modelling in recent years, particularly in the area of Bayesian cluster modelling. Along with these advances has come an explosion of interest in the potential applications of this work, especially in epidemiology and genome research. In one integrated volume, this book reviews the state-of-the-art in spatial clustering and spatial cluster modelling, bringing together research and applications previously scattered throughout the literature. It begins with an overview of the field, then presents a series of chapters that illuminate the nature and purpose of cluster modelling within different application areas, including astrophysics, epidemiology, ecology, and imaging. The focus then shifts to methods, with discussions on point and object process modelling, perfect sampling of cluster processes, partitioning in space and space-time, spatial and spatio-temporal process modelling, nonparametric methods for clustering, and spatio-temporal ...

  18. Spatial and temporal structure of typhoid outbreaks in Washington, D.C., 1906–1909: evaluating local clustering with the Gi* statistic

    Directory of Open Access Journals (Sweden)

    Curtis Andrew

    2006-03-01

    Full Text Available Abstract Background To better understand the distribution of typhoid outbreaks in Washington, D.C., the U.S. Public Health Service (PHS conducted four investigations of typhoid fever. These studies included maps of cases reported between 1 May – 31 October 1906 – 1909. These data were entered into a GIS database and analyzed using Ripley's K-function followed by the Gi* statistic in yearly intervals to evaluate spatial clustering, the scale of clustering, and the temporal stability of these clusters. Results The Ripley's K-function indicated no global spatial autocorrelation. The Gi* statistic indicated clustering of typhoid at multiple scales across the four year time period, refuting the conclusions drawn in all four PHS reports concerning the distribution of cases. While the PHS reports suggested an even distribution of the disease, this study quantified both areas of localized disease clustering, as well as mobile larger regions of clustering. Thus, indicating both highly localized and periodic generalized sources of infection within the city. Conclusion The methodology applied in this study was useful for evaluating the spatial distribution and annual-level temporal patterns of typhoid outbreaks in Washington, D.C. from 1906 to 1909. While advanced spatial analyses of historical data sets must be interpreted with caution, this study does suggest that there is utility in these types of analyses and that they provide new insights into the urban patterns of typhoid outbreaks during the early part of the twentieth century.

  19. Automated classification of mouse pup isolation syllables: from cluster analysis to an Excel based ‘mouse pup syllable classification calculator’

    Directory of Open Access Journals (Sweden)

    Jasmine eGrimsley

    2013-01-01

    Full Text Available Mouse pups vocalize at high rates when they are cold or isolated from the nest. The proportions of each syllable type produced carry information about disease state and are being used as behavioral markers for the internal state of animals. Manual classifications of these vocalizations identified ten syllable types based on their spectro-temporal features. However, manual classification of mouse syllables is time consuming and vulnerable to experimenter bias. This study uses an automated cluster analysis to identify acoustically distinct syllable types produced by CBA/CaJ mouse pups, and then compares the results to prior manual classification methods. The cluster analysis identified two syllable types, based on their frequency bands, that have continuous frequency-time structure, and two syllable types featuring abrupt frequency transitions. Although cluster analysis computed fewer syllable types than manual classification, the clusters represented well the probability distributions of the acoustic features within syllables. These probability distributions indicate that some of the manually classified syllable types are not statistically distinct. The characteristics of the four classified clusters were used to generate a Microsoft Excel-based mouse syllable classifier that rapidly categorizes syllables, with over a 90% match, into the syllable types determined by cluster analysis.

  20. User Activity Recognition in Smart Homes Using Pattern Clustering Applied to Temporal ANN Algorithm.

    Science.gov (United States)

    Bourobou, Serge Thomas Mickala; Yoo, Younghwan

    2015-05-21

    This paper discusses the possibility of recognizing and predicting user activities in the IoT (Internet of Things) based smart environment. The activity recognition is usually done through two steps: activity pattern clustering and activity type decision. Although many related works have been suggested, they had some limited performance because they focused only on one part between the two steps. This paper tries to find the best combination of a pattern clustering method and an activity decision algorithm among various existing works. For the first step, in order to classify so varied and complex user activities, we use a relevant and efficient unsupervised learning method called the K-pattern clustering algorithm. In the second step, the training of smart environment for recognizing and predicting user activities inside his/her personal space is done by utilizing the artificial neural network based on the Allen's temporal relations. The experimental results show that our combined method provides the higher recognition accuracy for various activities, as compared with other data mining classification algorithms. Furthermore, it is more appropriate for a dynamic environment like an IoT based smart home.

  1. User Activity Recognition in Smart Homes Using Pattern Clustering Applied to Temporal ANN Algorithm

    Directory of Open Access Journals (Sweden)

    Serge Thomas Mickala Bourobou

    2015-05-01

    Full Text Available This paper discusses the possibility of recognizing and predicting user activities in the IoT (Internet of Things based smart environment. The activity recognition is usually done through two steps: activity pattern clustering and activity type decision. Although many related works have been suggested, they had some limited performance because they focused only on one part between the two steps. This paper tries to find the best combination of a pattern clustering method and an activity decision algorithm among various existing works. For the first step, in order to classify so varied and complex user activities, we use a relevant and efficient unsupervised learning method called the K-pattern clustering algorithm. In the second step, the training of smart environment for recognizing and predicting user activities inside his/her personal space is done by utilizing the artificial neural network based on the Allen’s temporal relations. The experimental results show that our combined method provides the higher recognition accuracy for various activities, as compared with other data mining classification algorithms. Furthermore, it is more appropriate for a dynamic environment like an IoT based smart home.

  2. Cluster analysis of track structure

    International Nuclear Information System (INIS)

    Michalik, V.

    1991-01-01

    One of the possibilities of classifying track structures is application of conventional partition techniques of analysis of multidimensional data to the track structure. Using these cluster algorithms this paper attempts to find characteristics of radiation reflecting the spatial distribution of ionizations in the primary particle track. An absolute frequency distribution of clusters of ionizations giving the mean number of clusters produced by radiation per unit of deposited energy can serve as this characteristic. General computation techniques used as well as methods of calculations of distributions of clusters for different radiations are discussed. 8 refs.; 5 figs

  3. Are clusters of dietary patterns and cluster membership stable over time? Results of a longitudinal cluster analysis study.

    Science.gov (United States)

    Walthouwer, Michel Jean Louis; Oenema, Anke; Soetens, Katja; Lechner, Lilian; de Vries, Hein

    2014-11-01

    Developing nutrition education interventions based on clusters of dietary patterns can only be done adequately when it is clear if distinctive clusters of dietary patterns can be derived and reproduced over time, if cluster membership is stable, and if it is predictable which type of people belong to a certain cluster. Hence, this study aimed to: (1) identify clusters of dietary patterns among Dutch adults, (2) test the reproducibility of these clusters and stability of cluster membership over time, and (3) identify sociodemographic predictors of cluster membership and cluster transition. This study had a longitudinal design with online measurements at baseline (N=483) and 6 months follow-up (N=379). Dietary intake was assessed with a validated food frequency questionnaire. A hierarchical cluster analysis was performed, followed by a K-means cluster analysis. Multinomial logistic regression analyses were conducted to identify the sociodemographic predictors of cluster membership and cluster transition. At baseline and follow-up, a comparable three-cluster solution was derived, distinguishing a healthy, moderately healthy, and unhealthy dietary pattern. Male and lower educated participants were significantly more likely to have a less healthy dietary pattern. Further, 251 (66.2%) participants remained in the same cluster, 45 (11.9%) participants changed to an unhealthier cluster, and 83 (21.9%) participants shifted to a healthier cluster. Men and people living alone were significantly more likely to shift toward a less healthy dietary pattern. Distinctive clusters of dietary patterns can be derived. Yet, cluster membership is unstable and only few sociodemographic factors were associated with cluster membership and cluster transition. These findings imply that clusters based on dietary intake may not be suitable as a basis for nutrition education interventions. Copyright © 2014 Elsevier Ltd. All rights reserved.

  4. CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks.

    Science.gov (United States)

    Li, Min; Li, Dongyan; Tang, Yu; Wu, Fangxiang; Wang, Jianxin

    2017-08-31

    Nowadays, cluster analysis of biological networks has become one of the most important approaches to identifying functional modules as well as predicting protein complexes and network biomarkers. Furthermore, the visualization of clustering results is crucial to display the structure of biological networks. Here we present CytoCluster, a cytoscape plugin integrating six clustering algorithms, HC-PIN (Hierarchical Clustering algorithm in Protein Interaction Networks), OH-PIN (identifying Overlapping and Hierarchical modules in Protein Interaction Networks), IPCA (Identifying Protein Complex Algorithm), ClusterONE (Clustering with Overlapping Neighborhood Expansion), DCU (Detecting Complexes based on Uncertain graph model), IPC-MCE (Identifying Protein Complexes based on Maximal Complex Extension), and BinGO (the Biological networks Gene Ontology) function. Users can select different clustering algorithms according to their requirements. The main function of these six clustering algorithms is to detect protein complexes or functional modules. In addition, BinGO is used to determine which Gene Ontology (GO) categories are statistically overrepresented in a set of genes or a subgraph of a biological network. CytoCluster can be easily expanded, so that more clustering algorithms and functions can be added to this plugin. Since it was created in July 2013, CytoCluster has been downloaded more than 9700 times in the Cytoscape App store and has already been applied to the analysis of different biological networks. CytoCluster is available from http://apps.cytoscape.org/apps/cytocluster.

  5. MANNER OF STOCKS SORTING USING CLUSTER ANALYSIS METHODS

    Directory of Open Access Journals (Sweden)

    Jana Halčinová

    2014-06-01

    Full Text Available The aim of the present article is to show the possibility of using the methods of cluster analysis in classification of stocks of finished products. Cluster analysis creates groups (clusters of finished products according to similarity in demand i.e. customer requirements for each product. Manner stocks sorting of finished products by clusters is described a practical example. The resultants clusters are incorporated into the draft layout of the distribution warehouse.

  6. A particle swarm optimized kernel-based clustering method for crop mapping from multi-temporal polarimetric L-band SAR observations

    Science.gov (United States)

    Tamiminia, Haifa; Homayouni, Saeid; McNairn, Heather; Safari, Abdoreza

    2017-06-01

    Polarimetric Synthetic Aperture Radar (PolSAR) data, thanks to their specific characteristics such as high resolution, weather and daylight independence, have become a valuable source of information for environment monitoring and management. The discrimination capability of observations acquired by these sensors can be used for land cover classification and mapping. The aim of this paper is to propose an optimized kernel-based C-means clustering algorithm for agriculture crop mapping from multi-temporal PolSAR data. Firstly, several polarimetric features are extracted from preprocessed data. These features are linear polarization intensities, and several statistical and physical based decompositions such as Cloude-Pottier, Freeman-Durden and Yamaguchi techniques. Then, the kernelized version of hard and fuzzy C-means clustering algorithms are applied to these polarimetric features in order to identify crop types. The kernel function, unlike the conventional partitioning clustering algorithms, simplifies the non-spherical and non-linearly patterns of data structure, to be clustered easily. In addition, in order to enhance the results, Particle Swarm Optimization (PSO) algorithm is used to tune the kernel parameters, cluster centers and to optimize features selection. The efficiency of this method was evaluated by using multi-temporal UAVSAR L-band images acquired over an agricultural area near Winnipeg, Manitoba, Canada, during June and July in 2012. The results demonstrate more accurate crop maps using the proposed method when compared to the classical approaches, (e.g. 12% improvement in general). In addition, when the optimization technique is used, greater improvement is observed in crop classification, e.g. 5% in overall. Furthermore, a strong relationship between Freeman-Durden volume scattering component, which is related to canopy structure, and phenological growth stages is observed.

  7. Exact WKB analysis and cluster algebras

    International Nuclear Information System (INIS)

    Iwaki, Kohei; Nakanishi, Tomoki

    2014-01-01

    We develop the mutation theory in the exact WKB analysis using the framework of cluster algebras. Under a continuous deformation of the potential of the Schrödinger equation on a compact Riemann surface, the Stokes graph may change the topology. We call this phenomenon the mutation of Stokes graphs. Along the mutation of Stokes graphs, the Voros symbols, which are monodromy data of the equation, also mutate due to the Stokes phenomenon. We show that the Voros symbols mutate as variables of a cluster algebra with surface realization. As an application, we obtain the identities of Stokes automorphisms associated with periods of cluster algebras. The paper also includes an extensive introduction of the exact WKB analysis and the surface realization of cluster algebras for nonexperts. This article is part of a special issue of Journal of Physics A: Mathematical and Theoretical devoted to ‘Cluster algebras in mathematical physics’. (paper)

  8. Spatio-temporal pattern analysis for evaluation of the spread of human infections with avian influenza A(H7N9) virus in China, 2013-2014.

    Science.gov (United States)

    Dong, Wen; Yang, Kun; Xu, Quanli; Liu, Lin; Chen, Juan

    2017-10-24

    A large number (n = 460) of A(H7N9) human infections have been reported in China from March 2013 through December 2014, and H7N9 outbreaks in humans became an emerging issue for China health, which have caused numerous disease outbreaks in domestic poultry and wild bird populations, and threatened human health severely. The aims of this study were to investigate the directional trend of the epidemic and to identify the significant presence of spatial-temporal clustering of influenza A(H7N9) human cases between March 2013 and December 2014. Three distinct epidemic phases of A(H7N9) human infections were identified in this study. In each phase, standard deviational ellipse analysis was conducted to examine the directional trend of disease spreading, and retrospective space-time permutation scan statistic was then used to identify the spatio-temporal cluster patterns of H7N9 outbreaks in humans. The ever-changing location and the increasing size of the three identified standard deviational ellipses showed that the epidemic moved from east to southeast coast, and hence to some central regions, with a future epidemiological trend of continue dispersing to more central regions of China, and a few new human cases might also appear in parts of the western China. Furthermore, A(H7N9) human infections were clustering in space and time in the first two phases with five significant spatio-temporal clusters (p < 0.05), but there was no significant cluster identified in phase III. There was a new epidemiologic pattern that the decrease in significant spatio-temporal cluster of A(H7N9) human infections was accompanied with an obvious spatial expansion of the outbreaks during the study period, and identification of the spatio-temporal patterns of the epidemic can provide valuable insights for better understanding the spreading dynamics of the disease in China.

  9. From virtual clustering analysis to self-consistent clustering analysis: a mathematical study

    Science.gov (United States)

    Tang, Shaoqiang; Zhang, Lei; Liu, Wing Kam

    2018-03-01

    In this paper, we propose a new homogenization algorithm, virtual clustering analysis (VCA), as well as provide a mathematical framework for the recently proposed self-consistent clustering analysis (SCA) (Liu et al. in Comput Methods Appl Mech Eng 306:319-341, 2016). In the mathematical theory, we clarify the key assumptions and ideas of VCA and SCA, and derive the continuous and discrete Lippmann-Schwinger equations. Based on a key postulation of "once response similarly, always response similarly", clustering is performed in an offline stage by machine learning techniques (k-means and SOM), and facilitates substantial reduction of computational complexity in an online predictive stage. The clear mathematical setup allows for the first time a convergence study of clustering refinement in one space dimension. Convergence is proved rigorously, and found to be of second order from numerical investigations. Furthermore, we propose to suitably enlarge the domain in VCA, such that the boundary terms may be neglected in the Lippmann-Schwinger equation, by virtue of the Saint-Venant's principle. In contrast, they were not obtained in the original SCA paper, and we discover these terms may well be responsible for the numerical dependency on the choice of reference material property. Since VCA enhances the accuracy by overcoming the modeling error, and reduce the numerical cost by avoiding an outer loop iteration for attaining the material property consistency in SCA, its efficiency is expected even higher than the recently proposed SCA algorithm.

  10. Quantitative and temporal proteome analysis of butyrate-treated colorectal cancer cells.

    Science.gov (United States)

    Tan, Hwee Tong; Tan, Sandra; Lin, Qingsong; Lim, Teck Kwang; Hew, Choy Leong; Chung, Maxey C M

    2008-06-01

    Colorectal cancer is one of the most common cancers in developed countries, and its incidence is negatively associated with high dietary fiber intake. Butyrate, a short-chain fatty acid fermentation by-product of fiber induces cell maturation with the promotion of growth arrest, differentiation, and/or apoptosis of cancer cells. The stimulation of cell maturation by butyrate in colonic cancer cells follows a temporal progression from the early phase of growth arrest to the activation of apoptotic cascades. Previously we performed two-dimensional DIGE to identify differentially expressed proteins induced by 24-h butyrate treatment of HCT-116 colorectal cancer cells. Herein we used quantitative proteomics approaches using iTRAQ (isobaric tags for relative and absolute quantitation), a stable isotope labeling methodology that enables multiplexing of four samples, for a temporal study of HCT-116 cells treated with butyrate. In addition, cleavable ICAT, which selectively tags cysteine-containing proteins, was also used, and the results complemented those obtained from the iTRAQ strategy. Selected protein targets were validated by real time PCR and Western blotting. A model is proposed to illustrate our findings from this temporal analysis of the butyrate-responsive proteome that uncovered several integrated cellular processes and pathways involved in growth arrest, apoptosis, and metastasis. These signature clusters of butyrate-regulated pathways are potential targets for novel chemopreventive and therapeutic drugs for treatment of colorectal cancer.

  11. Cluster Analysis of Time-Dependent Crystallographic Data: Direct Identification of Time-Independent Structural Intermediates

    Science.gov (United States)

    Kostov, Konstantin S.; Moffat, Keith

    2011-01-01

    The initial output of a time-resolved macromolecular crystallography experiment is a time-dependent series of difference electron density maps that displays the time-dependent changes in underlying structure as a reaction progresses. The goal is to interpret such data in terms of a small number of crystallographically refinable, time-independent structures, each associated with a reaction intermediate; to establish the pathways and rate coefficients by which these intermediates interconvert; and thereby to elucidate a chemical kinetic mechanism. One strategy toward achieving this goal is to use cluster analysis, a statistical method that groups objects based on their similarity. If the difference electron density at a particular voxel in the time-dependent difference electron density (TDED) maps is sensitive to the presence of one and only one intermediate, then its temporal evolution will exactly parallel the concentration profile of that intermediate with time. The rationale is therefore to cluster voxels with respect to the shapes of their TDEDs, so that each group or cluster of voxels corresponds to one structural intermediate. Clusters of voxels whose TDEDs reflect the presence of two or more specific intermediates can also be identified. From such groupings one can then infer the number of intermediates, obtain their time-independent difference density characteristics, and refine the structure of each intermediate. We review the principles of cluster analysis and clustering algorithms in a crystallographic context, and describe the application of the method to simulated and experimental time-resolved crystallographic data for the photocycle of photoactive yellow protein. PMID:21244840

  12. ENTREPRENEURIAL ACTIVITY IN ROMANIA – A TIME SERIES CLUSTERING ANALYSIS AT THE NUTS3 LEVEL

    Directory of Open Access Journals (Sweden)

    Sipos-Gug Sebastian

    2013-07-01

    Full Text Available Entrepreneurship is an active field of research, having known a major increase in interest and publication levels in the last years (Landström et al., 2012. Within this field recently there has been an increasing interest in understanding why some regions seem to have a significantly higher entrepreneurship activity compared to others. In line with this research field, we would like to investigate the differences in entrepreneurial activity among the Romanian counties (NUTS 3 regions. While the classical research paradigm in this field is to conduct a temporally stationary analysis, we choose to use a time series clustering analysis to better understanding the dynamics of entrepreneurial activity between counties. Our analysis showed that if we use the total number of new privately owned companies that are founded each year in the last decade (2002-2012 we can distinguish between 5 clusters, one with high total entrepreneurial activity (18 counties, one with above average activity (8 counties, two clusters with average and slightly below average activity (total of 18 counties and one cluster with low and declining activity (2 counties. If we are interested in the entrepreneurial activity rate, that is the number of new privately owned companies founded each year adjusted by the population of the respective county, we obtain 4 clusters, one with a very high entrepreneurial rate (1 county, one with average rate (10 counties, and two clusters with below average entrepreneurial rate (total of 31 counties. In conclusion, our research shows that Romania is far from being a homogeneous geographical area in respect to entrepreneurial activity. Depending on what we are interested in, it can be divided in 5 or 4 clusters of counties, which behave differently as a function of time. Further research should be focused on explaining these regional differences, on studying the high performance clusters and trying to improve the low performing ones.

  13. Robust cluster analysis and variable selection

    CERN Document Server

    Ritter, Gunter

    2014-01-01

    Clustering remains a vibrant area of research in statistics. Although there are many books on this topic, there are relatively few that are well founded in the theoretical aspects. In Robust Cluster Analysis and Variable Selection, Gunter Ritter presents an overview of the theory and applications of probabilistic clustering and variable selection, synthesizing the key research results of the last 50 years. The author focuses on the robust clustering methods he found to be the most useful on simulated data and real-time applications. The book provides clear guidance for the varying needs of bot

  14. Spatial and Temporal Analysis of Eruption Locations, Compositions, and Styles in Northern Harrat Rahat, Kingdom of Saudi Arabia

    Science.gov (United States)

    Dietterich, H. R.; Stelten, M. E.; Downs, D. T.; Champion, D. E.

    2017-12-01

    Harrat Rahat is a predominantly mafic, 20,000 km2 volcanic field in western Saudi Arabia with an elongate volcanic axis extending 310 km north-south. Prior mapping suggests that the youngest eruptions were concentrated in northernmost Harrat Rahat, where our new geologic mapping and geochronology reveal >300 eruptive vents with ages ranging from 1.2 Ma to a historic eruption in 1256 CE. Eruption compositions and styles vary spatially and temporally within the volcanic field, where extensive alkali basaltic lavas dominate, but more evolved compositions erupted episodically as clusters of trachytic domes and small-volume pyroclastic flows. Analysis of vent locations, compositions, and eruption styles shows the evolution of the volcanic field and allows assessment of the spatio-temporal probabilities of vent opening and eruption styles. We link individual vents and fissures to eruptions and their deposits using field relations, petrography, geochemistry, paleomagnetism, and 40Ar/39Ar and 36Cl geochronology. Eruption volumes and deposit extents are derived from geologic mapping and topographic analysis. Spatial density analysis with kernel density estimation captures vent densities of up to 0.2 %/km2 along the north-south running volcanic axis, decaying quickly away to the east but reaching a second, lower high along a secondary axis to the west. Temporal trends show slight younging of mafic eruption ages to the north in the past 300 ka, as well as clustered eruptions of trachytes over the past 150 ka. Vent locations, timing, and composition are integrated through spatial probability weighted by eruption age for each compositional range to produce spatio-temporal models of vent opening probability. These show that the next mafic eruption is most probable within the north end of the main (eastern) volcanic axis, whereas more evolved compositions are most likely to erupt within the trachytic centers further to the south. These vent opening probabilities, combined with

  15. Spatio-Temporal Analysis to Predict Environmental Influence on Malaria

    Science.gov (United States)

    Baig, S.; Sarfraz, M. S.

    2018-05-01

    Malaria is a vector borne disease which is a major cause of morbidity and mortality. It is one of the major diseases in the category of infectious diseases. The survival and bionomics of malaria is affected by environmental factors such as climatic, demographic and land-use/land-cover etc. Currently, a very few under developing countries are using Geo-informatics approaches to control this disease. Gujrat a district of Pakistan, is still under threat of malaria disease. Current research is carried on malaria incidents obtained from District Executive Officer of Health Gujrat. The objective of this study was to explore the spatio-temporal patterns of malaria in district Gujrat and to identify the areas being affected by Malaria. Furthermore, it has been also analyzed the relationship between malaria incident and environmental factors in highly favorable zones. Data is analyzed based on spatial and temporal patterns using (Moran's I). Moreover cluster and hot spots analysis were performed on the incident data. This study shows positive correlation with rainfall, vegetation index, population density and water bodies; while it shows positive and negative correlation with temperature in different seasons. However, variation between amount of vegetation and water bodies were observed. Finding of this research can help the decision makers to take preventive measures and reduce the morbidity and mortality related with malaria in Gujrat, Pakistan.

  16. Cluster analysis

    OpenAIRE

    Mucha, Hans-Joachim; Sofyan, Hizir

    2000-01-01

    As an explorative technique, duster analysis provides a description or a reduction in the dimension of the data. It classifies a set of observations into two or more mutually exclusive unknown groups based on combinations of many variables. Its aim is to construct groups in such a way that the profiles of objects in the same groups are relatively homogenous whereas the profiles of objects in different groups are relatively heterogeneous. Clustering is distinct from classification techniques, ...

  17. Heart morphogenesis gene regulatory networks revealed by temporal expression analysis.

    Science.gov (United States)

    Hill, Jonathon T; Demarest, Bradley; Gorsi, Bushra; Smith, Megan; Yost, H Joseph

    2017-10-01

    During embryogenesis the heart forms as a linear tube that then undergoes multiple simultaneous morphogenetic events to obtain its mature shape. To understand the gene regulatory networks (GRNs) driving this phase of heart development, during which many congenital heart disease malformations likely arise, we conducted an RNA-seq timecourse in zebrafish from 30 hpf to 72 hpf and identified 5861 genes with altered expression. We clustered the genes by temporal expression pattern, identified transcription factor binding motifs enriched in each cluster, and generated a model GRN for the major gene batteries in heart morphogenesis. This approach predicted hundreds of regulatory interactions and found batteries enriched in specific cell and tissue types, indicating that the approach can be used to narrow the search for novel genetic markers and regulatory interactions. Subsequent analyses confirmed the GRN using two mutants, Tbx5 and nkx2-5 , and identified sets of duplicated zebrafish genes that do not show temporal subfunctionalization. This dataset provides an essential resource for future studies on the genetic/epigenetic pathways implicated in congenital heart defects and the mechanisms of cardiac transcriptional regulation. © 2017. Published by The Company of Biologists Ltd.

  18. Electricity Consumption Clustering Using Smart Meter Data

    Directory of Open Access Journals (Sweden)

    Alexander Tureczek

    2018-04-01

    Full Text Available Electricity smart meter consumption data is enabling utilities to analyze consumption information at unprecedented granularity. Much focus has been directed towards consumption clustering for diversifying tariffs; through modern clustering methods, cluster analyses have been performed. However, the clusters developed exhibit a large variation with resulting shadow clusters, making it impossible to truly identify the individual clusters. Using clearly defined dwelling types, this paper will present methods to improve clustering by harvesting inherent structure from the smart meter data. This paper clusters domestic electricity consumption using smart meter data from the Danish city of Esbjerg. Methods from time series analysis and wavelets are applied to enable the K-Means clustering method to account for autocorrelation in data and thereby improve the clustering performance. The results show the importance of data knowledge and we identify sub-clusters of consumption within the dwelling types and enable K-Means to produce satisfactory clustering by accounting for a temporal component. Furthermore our study shows that careful preprocessing of the data to account for intrinsic structure enables better clustering performance by the K-Means method.

  19. Analysis of earthquake clustering and source spectra in the Salton Sea Geothermal Field

    Science.gov (United States)

    Cheng, Y.; Chen, X.

    2015-12-01

    The Salton Sea Geothermal field is located within the tectonic step-over between San Andreas Fault and Imperial Fault. Since the 1980s, geothermal energy exploration has resulted with step-like increase of microearthquake activities, which mirror the expansion of geothermal field. Distinguishing naturally occurred and induced seismicity, and their corresponding characteristics (e.g., energy release) is important for hazard assessment. Between 2008 and 2014, seismic data recorded by a local borehole array were provided public access from CalEnergy through SCEC data center; and the high quality local recording of over 7000 microearthquakes provides unique opportunity to sort out characteristics of induced versus natural activities. We obtain high-resolution earthquake location using improved S-wave picks, waveform cross-correlation and a new 3D velocity model. We then develop method to identify spatial-temporally isolated earthquake clusters. These clusters are classified into aftershock-type, swarm-type, and mixed-type (aftershock-like, with low skew, low magnitude and shorter duration), based on the relative timing of largest earthquakes and moment-release. The mixed-type clusters are mostly located at 3 - 4 km depth near injection well; while aftershock-type clusters and swarm-type clusters also occur further from injection well. By counting number of aftershocks within 1day following mainshock in each cluster, we find that the mixed-type clusters have much higher aftershock productivity compared with other types and historic M4 earthquakes. We analyze detailed spatial variation of 'b-value'. We find that the mixed-type clusters are mostly located within high b-value patches, while large (M>3) earthquakes and other types of clusters are located within low b-value patches. We are currently processing P and S-wave spectra to analyze the spatial-temporal correlation of earthquake stress parameter and seismicity characteristics. Preliminary results suggest that the

  20. A Temporal Extension to Traditional Empirical Orthogonal Function Analysis

    DEFF Research Database (Denmark)

    Nielsen, Allan Aasbjerg; Hilger, Klaus Baggesen; Andersen, Ole Baltazar

    2002-01-01

    (EOF) analysis, which provides a non-temporal analysis of one variable over time. The temporal extension proves its strength in separating the signals at different periods in an analysis of relevant oceanographic properties related to one of the largest El Niño events ever recorded....

  1. A Model-Based Analysis of Chemical and Temporal Patterns of Cuticular Hydrocarbons in Male Drosophila melanogaster

    Science.gov (United States)

    Kent, Clement; Azanchi, Reza; Smith, Ben; Chu, Adrienne; Levine, Joel

    2007-01-01

    Drosophila Cuticular Hydrocarbons (CH) influence courtship behaviour, mating, aggregation, oviposition, and resistance to desiccation. We measured levels of 24 different CH compounds of individual male D. melanogaster hourly under a variety of environmental (LD/DD) conditions. Using a model-based analysis of CH variation, we developed an improved normalization method for CH data, and show that CH compounds have reproducible cyclic within-day temporal patterns of expression which differ between LD and DD conditions. Multivariate clustering of expression patterns identified 5 clusters of co-expressed compounds with common chemical characteristics. Turnover rate estimates suggest CH production may be a significant metabolic cost. Male cuticular hydrocarbon expression is a dynamic trait influenced by light and time of day; since abundant hydrocarbons affect male sexual behavior, males may present different pheromonal profiles at different times and under different conditions. PMID:17896002

  2. A model-based analysis of chemical and temporal patterns of cuticular hydrocarbons in male Drosophila melanogaster.

    Directory of Open Access Journals (Sweden)

    Clement Kent

    Full Text Available Drosophila Cuticular Hydrocarbons (CH influence courtship behaviour, mating, aggregation, oviposition, and resistance to desiccation. We measured levels of 24 different CH compounds of individual male D. melanogaster hourly under a variety of environmental (LD/DD conditions. Using a model-based analysis of CH variation, we developed an improved normalization method for CH data, and show that CH compounds have reproducible cyclic within-day temporal patterns of expression which differ between LD and DD conditions. Multivariate clustering of expression patterns identified 5 clusters of co-expressed compounds with common chemical characteristics. Turnover rate estimates suggest CH production may be a significant metabolic cost. Male cuticular hydrocarbon expression is a dynamic trait influenced by light and time of day; since abundant hydrocarbons affect male sexual behavior, males may present different pheromonal profiles at different times and under different conditions.

  3. Cluster analysis and quality assessment of logged water at an irrigation project, eastern Saudi Arabia.

    Science.gov (United States)

    Hussain, Mahbub; Ahmed, Syed Munaf; Abderrahman, Walid

    2008-01-01

    A multivariate statistical technique, cluster analysis, was used to assess the logged surface water quality at an irrigation project at Al-Fadhley, Eastern Province, Saudi Arabia. The principal idea behind using the technique was to utilize all available hydrochemical variables in the quality assessment including trace elements and other ions which are not considered in conventional techniques for water quality assessments like Stiff and Piper diagrams. Furthermore, the area belongs to an irrigation project where water contamination associated with the use of fertilizers, insecticides and pesticides is expected. This quality assessment study was carried out on a total of 34 surface/logged water samples. To gain a greater insight in terms of the seasonal variation of water quality, 17 samples were collected from both summer and winter seasons. The collected samples were analyzed for a total of 23 water quality parameters including pH, TDS, conductivity, alkalinity, sulfate, chloride, bicarbonate, nitrate, phosphate, bromide, fluoride, calcium, magnesium, sodium, potassium, arsenic, boron, copper, cobalt, iron, lithium, manganese, molybdenum, nickel, selenium, mercury and zinc. Cluster analysis in both Q and R modes was used. Q-mode analysis resulted in three distinct water types for both the summer and winter seasons. Q-mode analysis also showed the spatial as well as temporal variation in water quality. R-mode cluster analysis led to the conclusion that there are two major sources of contamination for the surface/shallow groundwater in the area: fertilizers, micronutrients, pesticides, and insecticides used in agricultural activities, and non-point natural sources.

  4. Data Clustering

    Science.gov (United States)

    Wagstaff, Kiri L.

    2012-03-01

    On obtaining a new data set, the researcher is immediately faced with the challenge of obtaining a high-level understanding from the observations. What does a typical item look like? What are the dominant trends? How many distinct groups are included in the data set, and how is each one characterized? Which observable values are common, and which rarely occur? Which items stand out as anomalies or outliers from the rest of the data? This challenge is exacerbated by the steady growth in data set size [11] as new instruments push into new frontiers of parameter space, via improvements in temporal, spatial, and spectral resolution, or by the desire to "fuse" observations from different modalities and instruments into a larger-picture understanding of the same underlying phenomenon. Data clustering algorithms provide a variety of solutions for this task. They can generate summaries, locate outliers, compress data, identify dense or sparse regions of feature space, and build data models. It is useful to note up front that "clusters" in this context refer to groups of items within some descriptive feature space, not (necessarily) to "galaxy clusters" which are dense regions in physical space. The goal of this chapter is to survey a variety of data clustering methods, with an eye toward their applicability to astronomical data analysis. In addition to improving the individual researcher’s understanding of a given data set, clustering has led directly to scientific advances, such as the discovery of new subclasses of stars [14] and gamma-ray bursts (GRBs) [38]. All clustering algorithms seek to identify groups within a data set that reflect some observed, quantifiable structure. Clustering is traditionally an unsupervised approach to data analysis, in the sense that it operates without any direct guidance about which items should be assigned to which clusters. There has been a recent trend in the clustering literature toward supporting semisupervised or constrained

  5. Quantitative and qualitative analysis of semantic verbal fluency in patients with temporal lobe epilepsy.

    Science.gov (United States)

    Jaimes-Bautista, A G; Rodríguez-Camacho, M; Martínez-Juárez, I E; Rodríguez-Agudelo, Y

    2017-08-29

    Patients with temporal lobe epilepsy (TLE) perform poorly on semantic verbal fluency (SVF) tasks. Completing these tasks successfully involves multiple cognitive processes simultaneously. Therefore, quantitative analysis of SVF (number of correct words in one minute), conducted in most studies, has been found to be insufficient to identify cognitive dysfunction underlying SVF difficulties in TLE. To determine whether a sample of patients with TLE had SVF difficulties compared with a control group (CG), and to identify the cognitive components associated with SVF difficulties using quantitative and qualitative analysis. SVF was evaluated in 25 patients with TLE and 24 healthy controls; the semantic verbal fluency test included 5 semantic categories: animals, fruits, occupations, countries, and verbs. All 5 categories were analysed quantitatively (number of correct words per minute and interval of execution: 0-15, 16-30, 31-45, and 46-60seconds); the categories animals and fruits were also analysed qualitatively (clusters, cluster size, switches, perseverations, and intrusions). Patients generated fewer words for all categories and intervals and fewer clusters and switches for animals and fruits than the CG (Psize and number of intrusions and perseverations (P>.05). Our results suggest an association between SVF difficulties in TLE and difficulty activating semantic networks, impaired strategic search, and poor cognitive flexibility. Attention, inhibition, and working memory are preserved in these patients. Copyright © 2017 Sociedad Española de Neurología. Publicado por Elsevier España, S.L.U. All rights reserved.

  6. Clustering Trajectories by Relevant Parts for Air Traffic Analysis.

    Science.gov (United States)

    Andrienko, Gennady; Andrienko, Natalia; Fuchs, Georg; Garcia, Jose Manuel Cordero

    2018-01-01

    Clustering of trajectories of moving objects by similarity is an important technique in movement analysis. Existing distance functions assess the similarity between trajectories based on properties of the trajectory points or segments. The properties may include the spatial positions, times, and thematic attributes. There may be a need to focus the analysis on certain parts of trajectories, i.e., points and segments that have particular properties. According to the analysis focus, the analyst may need to cluster trajectories by similarity of their relevant parts only. Throughout the analysis process, the focus may change, and different parts of trajectories may become relevant. We propose an analytical workflow in which interactive filtering tools are used to attach relevance flags to elements of trajectories, clustering is done using a distance function that ignores irrelevant elements, and the resulting clusters are summarized for further analysis. We demonstrate how this workflow can be useful for different analysis tasks in three case studies with real data from the domain of air traffic. We propose a suite of generic techniques and visualization guidelines to support movement data analysis by means of relevance-aware trajectory clustering.

  7. Functional Principal Component Analysis and Randomized Sparse Clustering Algorithm for Medical Image Analysis

    Science.gov (United States)

    Lin, Nan; Jiang, Junhai; Guo, Shicheng; Xiong, Momiao

    2015-01-01

    Due to the advancement in sensor technology, the growing large medical image data have the ability to visualize the anatomical changes in biological tissues. As a consequence, the medical images have the potential to enhance the diagnosis of disease, the prediction of clinical outcomes and the characterization of disease progression. But in the meantime, the growing data dimensions pose great methodological and computational challenges for the representation and selection of features in image cluster analysis. To address these challenges, we first extend the functional principal component analysis (FPCA) from one dimension to two dimensions to fully capture the space variation of image the signals. The image signals contain a large number of redundant features which provide no additional information for clustering analysis. The widely used methods for removing the irrelevant features are sparse clustering algorithms using a lasso-type penalty to select the features. However, the accuracy of clustering using a lasso-type penalty depends on the selection of the penalty parameters and the threshold value. In practice, they are difficult to determine. Recently, randomized algorithms have received a great deal of attentions in big data analysis. This paper presents a randomized algorithm for accurate feature selection in image clustering analysis. The proposed method is applied to both the liver and kidney cancer histology image data from the TCGA database. The results demonstrate that the randomized feature selection method coupled with functional principal component analysis substantially outperforms the current sparse clustering algorithms in image cluster analysis. PMID:26196383

  8. Simultaneous Two-Way Clustering of Multiple Correspondence Analysis

    Science.gov (United States)

    Hwang, Heungsun; Dillon, William R.

    2010-01-01

    A 2-way clustering approach to multiple correspondence analysis is proposed to account for cluster-level heterogeneity of both respondents and variable categories in multivariate categorical data. Specifically, in the proposed method, multiple correspondence analysis is combined with k-means in a unified framework in which "k"-means is…

  9. Pharmacokinetic analysis and k-means clustering of DCEMR images for radiotherapy outcome prediction of advanced cervical cancers.

    Science.gov (United States)

    Andersen, Erlend K F; Kristensen, Gunnar B; Lyng, Heidi; Malinen, Eirik

    2011-08-01

    Pharmacokinetic analysis of dynamic contrast enhanced magnetic resonance images (DCEMRI) allows for quantitative characterization of vascular properties of tumors. The aim of this study is twofold, first to determine if tumor regions with similar vascularization could be labeled by clustering methods, second to determine if the identified regions can be associated with local cancer relapse. Eighty-one patients with locally advanced cervical cancer treated with chemoradiotherapy underwent DCEMRI with Gd-DTPA prior to external beam radiotherapy. The median follow-up time after treatment was four years, in which nine patients had primary tumor relapse. By fitting a pharmacokinetic two-compartment model function to the temporal contrast enhancement in the tumor, two pharmacokinetic parameters, K(trans) and ύ(e), were estimated voxel by voxel from the DCEMR-images. Intratumoral regions with similar vascularization were identified by k-means clustering of the two pharmacokinetic parameter estimates over all patients. The volume fraction of each cluster was used to evaluate the prognostic value of the clusters. Three clusters provided a sufficient reduction of the cluster variance to label different vascular properties within the tumors. The corresponding median volume fraction of each cluster was 38%, 46% and 10%. The second cluster was significantly associated with primary tumor control in a log-rank survival test (p-value: 0.042), showing a decreased risk of treatment failure for patients with high volume fraction of voxels. Intratumoral regions showing similar vascular properties could successfully be labeled in three distinct clusters and the volume fraction of one cluster region was associated with primary tumor control.

  10. Pharmacokinetic analysis and k-means clustering of DCEMR images for radiotherapy outcome prediction of advanced cervical cancers

    International Nuclear Information System (INIS)

    Andersen, Erlend K. F.; Kristensen, Gunnar B.; Lyng, Heidi; Malinen, Eirik

    2011-01-01

    Introduction. Pharmacokinetic analysis of dynamic contrast enhanced magnetic resonance images (DCEMRI) allows for quantitative characterization of vascular properties of tumors. The aim of this study is twofold, first to determine if tumor regions with similar vascularization could be labeled by clustering methods, second to determine if the identified regions can be associated with local cancer relapse. Materials and methods. Eighty-one patients with locally advanced cervical cancer treated with chemoradiotherapy underwent DCEMRI with Gd-DTPA prior to external beam radiotherapy. The median follow-up time after treatment was four years, in which nine patients had primary tumor relapse. By fitting a pharmacokinetic two-compartment model function to the temporal contrast enhancement in the tumor, two pharmacokinetic parameters, K trans and u e , were estimated voxel by voxel from the DCEMR-images. Intratumoral regions with similar vascularization were identified by k-means clustering of the two pharmacokinetic parameter estimates over all patients. The volume fraction of each cluster was used to evaluate the prognostic value of the clusters. Results. Three clusters provided a sufficient reduction of the cluster variance to label different vascular properties within the tumors. The corresponding median volume fraction of each cluster was 38%, 46% and 10%. The second cluster was significantly associated with primary tumor control in a log-rank survival test (p-value: 0.042), showing a decreased risk of treatment failure for patients with high volume fraction of voxels. Conclusions. Intratumoral regions showing similar vascular properties could successfully be labeled in three distinct clusters and the volume fraction of one cluster region was associated with primary tumor control

  11. Pharmacokinetic analysis and k-means clustering of DCEMR images for radiotherapy outcome prediction of advanced cervical cancers

    Energy Technology Data Exchange (ETDEWEB)

    Andersen, Erlend K. F. (Dept. of Medical Physics, The Norwegian Radium Hospital, Oslo Univ. Hospital, Oslo (Norway)), e-mail: eirik.malinen@fys.uio.no; Kristensen, Gunnar B. (Section for Gynaecological Oncology, The Norwegian Radium Hospital, Oslo Univ. Hospital, Oslo (Norway)); Lyng, Heidi (Dept. of Radiation Biology, The Norwegian Radium Hospital, Oslo Univ. Hospital, Oslo (Norway)); Malinen, Eirik (Dept. of Medical Physics, The Norwegian Radium Hospital, Oslo Univ. Hospital, Oslo (Norway); Dept. of Physics, Univ. of Oslo, Oslo (Norway))

    2011-08-15

    Introduction. Pharmacokinetic analysis of dynamic contrast enhanced magnetic resonance images (DCEMRI) allows for quantitative characterization of vascular properties of tumors. The aim of this study is twofold, first to determine if tumor regions with similar vascularization could be labeled by clustering methods, second to determine if the identified regions can be associated with local cancer relapse. Materials and methods. Eighty-one patients with locally advanced cervical cancer treated with chemoradiotherapy underwent DCEMRI with Gd-DTPA prior to external beam radiotherapy. The median follow-up time after treatment was four years, in which nine patients had primary tumor relapse. By fitting a pharmacokinetic two-compartment model function to the temporal contrast enhancement in the tumor, two pharmacokinetic parameters, Ktrans and u{sub e}, were estimated voxel by voxel from the DCEMR-images. Intratumoral regions with similar vascularization were identified by k-means clustering of the two pharmacokinetic parameter estimates over all patients. The volume fraction of each cluster was used to evaluate the prognostic value of the clusters. Results. Three clusters provided a sufficient reduction of the cluster variance to label different vascular properties within the tumors. The corresponding median volume fraction of each cluster was 38%, 46% and 10%. The second cluster was significantly associated with primary tumor control in a log-rank survival test (p-value: 0.042), showing a decreased risk of treatment failure for patients with high volume fraction of voxels. Conclusions. Intratumoral regions showing similar vascular properties could successfully be labeled in three distinct clusters and the volume fraction of one cluster region was associated with primary tumor control

  12. Systematic analysis of whistler-mode emissions below the lower hybrid frequency based on the data of the Cluster project.

    Science.gov (United States)

    Nemec, F.; Santolik, O.; Gereova, K.; Macusova, E.; Cornilleau-Wehrlin, N.

    2003-12-01

    We report results of a systematic analysis of equatorial noise below the local lower hybrid frequency. Our analysis is based on the entire data set collected by the STAFF-SA instruments on board the Cluster spacecraft during the first two years of operation (2001 - 2002). We compare intensities of equatorial noise with other whistler-mode emissions, for example with chorus or hiss. The results indicate that these emissions can play a significant role in the dynamics of the inner magnetosphere. Using the multipoint measurement we show considerable spatio-temporal variations of the wave intensity.

  13. Visualizing Confidence in Cluster-Based Ensemble Weather Forecast Analyses.

    Science.gov (United States)

    Kumpf, Alexander; Tost, Bianca; Baumgart, Marlene; Riemer, Michael; Westermann, Rudiger; Rautenhaus, Marc

    2018-01-01

    In meteorology, cluster analysis is frequently used to determine representative trends in ensemble weather predictions in a selected spatio-temporal region, e.g., to reduce a set of ensemble members to simplify and improve their analysis. Identified clusters (i.e., groups of similar members), however, can be very sensitive to small changes of the selected region, so that clustering results can be misleading and bias subsequent analyses. In this article, we - a team of visualization scientists and meteorologists-deliver visual analytics solutions to analyze the sensitivity of clustering results with respect to changes of a selected region. We propose an interactive visual interface that enables simultaneous visualization of a) the variation in composition of identified clusters (i.e., their robustness), b) the variability in cluster membership for individual ensemble members, and c) the uncertainty in the spatial locations of identified trends. We demonstrate that our solution shows meteorologists how representative a clustering result is, and with respect to which changes in the selected region it becomes unstable. Furthermore, our solution helps to identify those ensemble members which stably belong to a given cluster and can thus be considered similar. In a real-world application case we show how our approach is used to analyze the clustering behavior of different regions in a forecast of "Tropical Cyclone Karl", guiding the user towards the cluster robustness information required for subsequent ensemble analysis.

  14. Clustering stock market companies via chaotic map synchronization

    Science.gov (United States)

    Basalto, N.; Bellotti, R.; De Carlo, F.; Facchi, P.; Pascazio, S.

    2005-01-01

    A pairwise clustering approach is applied to the analysis of the Dow Jones index companies, in order to identify similar temporal behavior of the traded stock prices. To this end, the chaotic map clustering algorithm is used, where a map is associated to each company and the correlation coefficients of the financial time series to the coupling strengths between maps. The simulation of a chaotic map dynamics gives rise to a natural partition of the data, as companies belonging to the same industrial branch are often grouped together. The identification of clusters of companies of a given stock market index can be exploited in the portfolio optimization strategies.

  15. Semi-supervised consensus clustering for gene expression data analysis

    OpenAIRE

    Wang, Yunli; Pan, Youlian

    2014-01-01

    Background Simple clustering methods such as hierarchical clustering and k-means are widely used for gene expression data analysis; but they are unable to deal with noise and high dimensionality associated with the microarray gene expression data. Consensus clustering appears to improve the robustness and quality of clustering results. Incorporating prior knowledge in clustering process (semi-supervised clustering) has been shown to improve the consistency between the data partitioning and do...

  16. [Epidemiologic and spatio-temporal characteristics of hepatitis E in China, 2004-2014].

    Science.gov (United States)

    Liu, Z Q; Zuo, J L; Yan, Q; Fang, Q W; Zhang, T J

    2017-10-10

    Objective: To describe and analyze the epidemiologic and spatio-temporal characteristics of hepatitis E in China from 2004 to 2014. Methods: Data on the incidence of hepatitis E in 31 provinces (municipality and autonomous region) from 2004 to 2014, were collected. Empirical Mode Decomposition (EMD) was applied to decompose the time-series data to accurately describe the trend of hepatitis E incidence. Mathematic model was used to estimate the annual change of incidence in each age group and the whole province. Software ArcGIS 10.1 and SaTScan 9.01 were used to analyze the spatio-temporal clusters. Results: During 2004-2014, a total of 245 414 hepatitis E cases were reported in China. The overall incidence showed a slight increase ( OR =1.05, 95 %CI : 1.03-1.10). Incidence rates on hepatitis E were discovered different across the provinces, with significant increase appearing in the southern, central and northwestern areas. The highest increase was seen in the elderly, especially in the 65-69 and 70-74 year-olds. Results from the Local spatial autocorrelation analysis showed that the "high-high cluster" was moving from the north to the south and the "low-low cluster" disappeared as time went by. Data from Spatio-temporal scanning showed that there were five spatio-temporal clustering areas across the country. Conclusion: The overall incidence of hepatitis E was on the rise from 2004 to 2014, in China, but with differences seen across the areas and age groups.

  17. Allergen Sensitization Pattern by Sex: A Cluster Analysis in Korea.

    Science.gov (United States)

    Ohn, Jungyoon; Paik, Seung Hwan; Doh, Eun Jin; Park, Hyun-Sun; Yoon, Hyun-Sun; Cho, Soyun

    2017-12-01

    Allergens tend to sensitize simultaneously. Etiology of this phenomenon has been suggested to be allergen cross-reactivity or concurrent exposure. However, little is known about specific allergen sensitization patterns. To investigate the allergen sensitization characteristics according to gender. Multiple allergen simultaneous test (MAST) is widely used as a screening tool for detecting allergen sensitization in dermatologic clinics. We retrospectively reviewed the medical records of patients with MAST results between 2008 and 2014 in our Department of Dermatology. A cluster analysis was performed to elucidate the allergen-specific immunoglobulin (Ig)E cluster pattern. The results of MAST (39 allergen-specific IgEs) from 4,360 cases were analyzed. By cluster analysis, 39items were grouped into 8 clusters. Each cluster had characteristic features. When compared with female, the male group tended to be sensitized more frequently to all tested allergens, except for fungus allergens cluster. The cluster and comparative analysis results demonstrate that the allergen sensitization is clustered, manifesting allergen similarity or co-exposure. Only the fungus cluster allergens tend to sensitize female group more frequently than male group.

  18. WebGimm: An integrated web-based platform for cluster analysis, functional analysis, and interactive visualization of results.

    Science.gov (United States)

    Joshi, Vineet K; Freudenberg, Johannes M; Hu, Zhen; Medvedovic, Mario

    2011-01-17

    Cluster analysis methods have been extensively researched, but the adoption of new methods is often hindered by technical barriers in their implementation and use. WebGimm is a free cluster analysis web-service, and an open source general purpose clustering web-server infrastructure designed to facilitate easy deployment of integrated cluster analysis servers based on clustering and functional annotation algorithms implemented in R. Integrated functional analyses and interactive browsing of both, clustering structure and functional annotations provides a complete analytical environment for cluster analysis and interpretation of results. The Java Web Start client-based interface is modeled after the familiar cluster/treeview packages making its use intuitive to a wide array of biomedical researchers. For biomedical researchers, WebGimm provides an avenue to access state of the art clustering procedures. For Bioinformatics methods developers, WebGimm offers a convenient avenue to deploy their newly developed clustering methods. WebGimm server, software and manuals can be freely accessed at http://ClusterAnalysis.org/.

  19. Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale.

    Science.gov (United States)

    Emmons, Scott; Kobourov, Stephen; Gallant, Mike; Börner, Katy

    2016-01-01

    Notions of community quality underlie the clustering of networks. While studies surrounding network clustering are increasingly common, a precise understanding of the realtionship between different cluster quality metrics is unknown. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms-Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters.

  20. Detecting space-time disease clusters with arbitrary shapes and sizes using a co-clustering approach

    Directory of Open Access Journals (Sweden)

    Sami Ullah

    2017-11-01

    Full Text Available Ability to detect potential space-time clusters in spatio-temporal data on disease occurrences is necessary for conducting surveillance and implementing disease prevention policies. Most existing techniques use geometrically shaped (circular, elliptical or square scanning windows to discover disease clusters. In certain situations, where the disease occurrences tend to cluster in very irregularly shaped areas, these algorithms are not feasible in practise for the detection of space-time clusters. To address this problem, a new algorithm is proposed, which uses a co-clustering strategy to detect prospective and retrospective space-time disease clusters with no restriction on shape and size. The proposed method detects space-time disease clusters by tracking the changes in space–time occurrence structure instead of an in-depth search over space. This method was utilised to detect potential clusters in the annual and monthly malaria data in Khyber Pakhtunkhwa Province, Pakistan from 2012 to 2016 visualising the results on a heat map. The results of the annual data analysis showed that the most likely hotspot emerged in three sub-regions in the years 2013-2014. The most likely hotspots in monthly data appeared in the month of July to October in each year and showed a strong periodic trend.

  1. Effective and efficient analysis of spatio-temporal data

    Science.gov (United States)

    Zhang, Zhongnan

    Spatio-temporal data mining, i.e., mining knowledge from large amount of spatio-temporal data, is a highly demanding field because huge amounts of spatio-temporal data have been collected in various applications, ranging from remote sensing, to geographical information systems (GIS), computer cartography, environmental assessment and planning, etc. The collection data far exceeded human's ability to analyze which make it crucial to develop analysis tools. Recent studies on data mining have extended to the scope of data mining from relational and transactional datasets to spatial and temporal datasets. Among the various forms of spatio-temporal data, remote sensing images play an important role, due to the growing wide-spreading of outer space satellites. In this dissertation, we proposed two approaches to analyze the remote sensing data. The first one is about applying association rules mining onto images processing. Each image was divided into a number of image blocks. We built a spatial relationship for these blocks during the dividing process. This made a large number of images into a spatio-temporal dataset since each image was shot in time-series. The second one implemented co-occurrence patterns discovery from these images. The generated patterns represent subsets of spatial features that are located together in space and time. A weather analysis is composed of individual analysis of several meteorological variables. These variables include temperature, pressure, dew point, wind, clouds, visibility and so on. Local-scale models provide detailed analysis and forecasts of meteorological phenomena ranging from a few kilometers to about 100 kilometers in size. When some of above meteorological variables have some special change tendency, some kind of severe weather will happen in most cases. Using the discovery of association rules, we found that some special meteorological variables' changing has tight relation with some severe weather situation that will happen

  2. Advanced analysis of forest fire clustering

    Science.gov (United States)

    Kanevski, Mikhail; Pereira, Mario; Golay, Jean

    2017-04-01

    Analysis of point pattern clustering is an important topic in spatial statistics and for many applications: biodiversity, epidemiology, natural hazards, geomarketing, etc. There are several fundamental approaches used to quantify spatial data clustering using topological, statistical and fractal measures. In the present research, the recently introduced multi-point Morisita index (mMI) is applied to study the spatial clustering of forest fires in Portugal. The data set consists of more than 30000 fire events covering the time period from 1975 to 2013. The distribution of forest fires is very complex and highly variable in space. mMI is a multi-point extension of the classical two-point Morisita index. In essence, mMI is estimated by covering the region under study by a grid and by computing how many times more likely it is that m points selected at random will be from the same grid cell than it would be in the case of a complete random Poisson process. By changing the number of grid cells (size of the grid cells), mMI characterizes the scaling properties of spatial clustering. From mMI, the data intrinsic dimension (fractal dimension) of the point distribution can be estimated as well. In this study, the mMI of forest fires is compared with the mMI of random patterns (RPs) generated within the validity domain defined as the forest area of Portugal. It turns out that the forest fires are highly clustered inside the validity domain in comparison with the RPs. Moreover, they demonstrate different scaling properties at different spatial scales. The results obtained from the mMI analysis are also compared with those of fractal measures of clustering - box counting and sand box counting approaches. REFERENCES Golay J., Kanevski M., Vega Orozco C., Leuenberger M., 2014: The multipoint Morisita index for the analysis of spatial patterns. Physica A, 406, 191-202. Golay J., Kanevski M. 2015: A new estimator of intrinsic dimension based on the multipoint Morisita index

  3. Physicochemical properties of different corn varieties by principal components analysis and cluster analysis

    International Nuclear Information System (INIS)

    Zeng, J.; Li, G.; Sun, J.

    2013-01-01

    Principal components analysis and cluster analysis were used to investigate the properties of different corn varieties. The chemical compositions and some properties of corn flour which processed by drying milling were determined. The results showed that the chemical compositions and physicochemical properties were significantly different among twenty six corn varieties. The quality of corn flour was concerned with five principal components from principal component analysis and the contribution rate of starch pasting properties was important, which could account for 48.90%. Twenty six corn varieties could be classified into four groups by cluster analysis. The consistency between principal components analysis and cluster analysis indicated that multivariate analyses were feasible in the study of corn variety properties. (author)

  4. A model of photon cell killing based on the spatio-temporal clustering of DNA damage in higher order chromatin structures.

    Directory of Open Access Journals (Sweden)

    Lisa Herr

    Full Text Available We present a new approach to model dose rate effects on cell killing after photon radiation based on the spatio-temporal clustering of DNA double strand breaks (DSBs within higher order chromatin structures of approximately 1-2 Mbp size, so called giant loops. The main concept of this approach consists of a distinction of two classes of lesions, isolated and clustered DSBs, characterized by the number of double strand breaks induced in a giant loop. We assume a low lethality and fast component of repair for isolated DSBs and a high lethality and slow component of repair for clustered DSBs. With appropriate rates, the temporal transition between the different lesion classes is expressed in terms of five differential equations. These allow formulating the dynamics involved in the competition of damage induction and repair for arbitrary dose rates and fractionation schemes. Final cell survival probabilities are computable with a cell line specific set of three parameters: The lethality for isolated DSBs, the lethality for clustered DSBs and the half-life time of isolated DSBs. By comparison with larger sets of published experimental data it is demonstrated that the model describes the cell line dependent response to treatments using either continuous irradiation at a constant dose rate or to split dose irradiation well. Furthermore, an analytic investigation of the formulation concerning single fraction treatments with constant dose rates in the limiting cases of extremely high or low dose rates is presented. The approach is consistent with the Linear-Quadratic model extended by the Lea-Catcheside factor up to the second moment in dose. Finally, it is shown that the model correctly predicts empirical findings about the dose rate dependence of incidence probabilities for deterministic radiation effects like pneumonitis and the bone marrow syndrome. These findings further support the general concepts on which the approach is based.

  5. Taxonomical analysis of the Cancer cluster of galaxies

    International Nuclear Information System (INIS)

    Perea, J.; Olmo, A. del; Moles, M.

    1986-01-01

    A description is presented of the Cancer cluster of galaxies, based on a taxonomical analysis in (α,delta, Vsub(r)) space. Earlier results by previous authors on the lack of dynamical entity of the cluster are confirmed. The present analysis points out the existence of a binary structure in the most populated region of the complex. (author)

  6. Spatio-temporal cluster analysis of the incidence of Campylobacter cases and patients with general diarrhea in a Danish county, 1995–2004

    Directory of Open Access Journals (Sweden)

    Simonsen Jacob

    2009-02-01

    Full Text Available Abstract Campylobacter infections are the main cause of bacterial gastroenteritis in Denmark. While primarily foodborne, Campylobacter infections are also to some degree acquired through other sources which may include contact with animals or the environment, locally contaminated drinking water and more. We analyzed Campylobacter cases for clustering in space and time for the large Danish island of Funen in the period 1995–2003, under the assumption that infections caused by 'environmental' factors may show persistent clustering while foodborne infections will occur randomly in space. Input data were geo-coded datasets of the addresses of laboratory-confirmed Campylobacter cases and of the background population of Funen County. The dataset had a spatial extent of 4.900 km2. Data were aggregated into units of analysis (so-called features of 5 km by 5 km times 1 year, and the Campylobacter incidence calculated. We used a modified form of local Moran's I to test if features with similar incidence rates occurred next to each other in space and time, and compared the observed clusters with simulated clusters. Because clusters may be caused by a high tendency among local GPs to submit stool samples, we also analyzed a dataset of all submitted stool samples for comparison. The results showed a significant persisting clustering of Campylobacter incidence rates in the Western part of Funen. Results were visualized using the Netlogo software. The underlying causes of the observed clustering are not known and will require further examination, but may be partially explained by an increased rate of stool samples submissions by physicians in the area. We hope, by this approach, to have developed a tool which will allow for analyses of geographical clusters which may in turn form a basis for further epidemiological examinations to cast light on the sources of infection.

  7. A Resting-State Brain Functional Network Study in MDD Based on Minimum Spanning Tree Analysis and the Hierarchical Clustering

    Directory of Open Access Journals (Sweden)

    Xiaowei Li

    2017-01-01

    Full Text Available A large number of studies demonstrated that major depressive disorder (MDD is characterized by the alterations in brain functional connections which is also identifiable during the brain’s “resting-state.” But, in the present study, the approach of constructing functional connectivity is often biased by the choice of the threshold. Besides, more attention was paid to the number and length of links in brain networks, and the clustering partitioning of nodes was unclear. Therefore, minimum spanning tree (MST analysis and the hierarchical clustering were first used for the depression disease in this study. Resting-state electroencephalogram (EEG sources were assessed from 15 healthy and 23 major depressive subjects. Then the coherence, MST, and the hierarchical clustering were obtained. In the theta band, coherence analysis showed that the EEG coherence of the MDD patients was significantly higher than that of the healthy controls especially in the left temporal region. The MST results indicated the higher leaf fraction in the depressed group. Compared with the normal group, the major depressive patients lost clustering in frontal regions. Our findings suggested that there was a stronger brain interaction in the MDD group and a left-right functional imbalance in the frontal regions for MDD controls.

  8. Assessment of surface water quality using hierarchical cluster analysis

    Directory of Open Access Journals (Sweden)

    Dheeraj Kumar Dabgerwal

    2016-02-01

    Full Text Available This study was carried out to assess the physicochemical quality river Varuna inVaranasi,India. Water samples were collected from 10 sites during January-June 2015. Pearson correlation analysis was used to assess the direction and strength of relationship between physicochemical parameters. Hierarchical Cluster analysis was also performed to determine the sources of pollution in the river Varuna. The result showed quite high value of DO, Nitrate, BOD, COD and Total Alkalinity, above the BIS permissible limit. The results of correlation analysis identified key water parameters as pH, electrical conductivity, total alkalinity and nitrate, which influence the concentration of other water parameters. Cluster analysis identified three major clusters of sampling sites out of total 10 sites, according to the similarity in water quality. This study illustrated the usefulness of correlation and cluster analysis for getting better information about the river water quality.International Journal of Environment Vol. 5 (1 2016,  pp: 32-44

  9. Spatial-Temporal Event Detection from Geo-Tagged Tweets

    Directory of Open Access Journals (Sweden)

    Yuqian Huang

    2018-04-01

    Full Text Available As one of the most popular social networking services in the world, Twitter allows users to post messages along with their current geographic locations. Such georeferenced or geo-tagged Twitter datasets can benefit location-based services, targeted advertising and geosocial studies. Our study focused on the detection of small-scale spatial-temporal events and their textual content. First, we used Spatial-Temporal Density-Based Spatial Clustering of Applications with Noise (ST-DBSCAN to spatially-temporally cluster the tweets. Then, the word frequencies were summarized for each cluster and the potential topics were modeled by the Latent Dirichlet Allocation (LDA algorithm. Using two years of Twitter data from four college cities in the U.S., we were able to determine the spatial-temporal patterns of two known events, two unknown events and one recurring event, which then were further explored and modeled to identify the semantic content about the events. This paper presents our process and recommendations for both finding event-related tweets as well as understanding the spatial-temporal behaviors and semantic natures of the detected events.

  10. Comparison of Three Plot Selection Methods for Estimating Change in Temporally Variable, Spatially Clustered Populations.

    Energy Technology Data Exchange (ETDEWEB)

    Thompson, William L. [Bonneville Power Administration, Portland, OR (US). Environment, Fish and Wildlife

    2001-07-01

    Monitoring population numbers is important for assessing trends and meeting various legislative mandates. However, sampling across time introduces a temporal aspect to survey design in addition to the spatial one. For instance, a sample that is initially representative may lose this attribute if there is a shift in numbers and/or spatial distribution in the underlying population that is not reflected in later sampled plots. Plot selection methods that account for this temporal variability will produce the best trend estimates. Consequently, I used simulation to compare bias and relative precision of estimates of population change among stratified and unstratified sampling designs based on permanent, temporary, and partial replacement plots under varying levels of spatial clustering, density, and temporal shifting of populations. Permanent plots produced more precise estimates of change than temporary plots across all factors. Further, permanent plots performed better than partial replacement plots except for high density (5 and 10 individuals per plot) and 25% - 50% shifts in the population. Stratified designs always produced less precise estimates of population change for all three plot selection methods, and often produced biased change estimates and greatly inflated variance estimates under sampling with partial replacement. Hence, stratification that remains fixed across time should be avoided when monitoring populations that are likely to exhibit large changes in numbers and/or spatial distribution during the study period. Key words: bias; change estimation; monitoring; permanent plots; relative precision; sampling with partial replacement; temporary plots.

  11. Analysis of Aspects of Innovation in a Brazilian Cluster

    Directory of Open Access Journals (Sweden)

    Adriana Valélia Saraceni

    2012-09-01

    Full Text Available Innovation through clustering has become very important on the increased significance that interaction represents on innovation and learning process concept. This study aims to identify whereas a case analysis on innovation process in a cluster represents on the learning process. Therefore, this study is developed in two stages. First, we used a preliminary case study verifying a cluster innovation analysis and it Innovation Index, for further, exploring a combined body of theory and practice. Further, the second stage is developed by exploring the learning process concept. Both stages allowed us building a theory model for the learning process development in clusters. The main results of the model development come up with a mechanism of improvement implementation on clusters when case studies are applied.

  12. Effects of Group Size and Lack of Sphericity on the Recovery of Clusters in K-Means Cluster Analysis

    Science.gov (United States)

    de Craen, Saskia; Commandeur, Jacques J. F.; Frank, Laurence E.; Heiser, Willem J.

    2006-01-01

    K-means cluster analysis is known for its tendency to produce spherical and equally sized clusters. To assess the magnitude of these effects, a simulation study was conducted, in which populations were created with varying departures from sphericity and group sizes. An analysis of the recovery of clusters in the samples taken from these…

  13. Identification and characterization of earthquake clusters: a comparative analysis for selected sequences in Italy

    Science.gov (United States)

    Peresan, Antonella; Gentili, Stefania

    2017-04-01

    Identification and statistical characterization of seismic clusters may provide useful insights about the features of seismic energy release and their relation to physical properties of the crust within a given region. Moreover, a number of studies based on spatio-temporal analysis of main-shocks occurrence require preliminary declustering of the earthquake catalogs. Since various methods, relying on different physical/statistical assumptions, may lead to diverse classifications of earthquakes into main events and related events, we aim to investigate the classification differences among different declustering techniques. Accordingly, a formal selection and comparative analysis of earthquake clusters is carried out for the most relevant earthquakes in North-Eastern Italy, as reported in the local OGS-CRS bulletins, compiled at the National Institute of Oceanography and Experimental Geophysics since 1977. The comparison is then extended to selected earthquake sequences associated with a different seismotectonic setting, namely to events that occurred in the region struck by the recent Central Italy destructive earthquakes, making use of INGV data. Various techniques, ranging from classical space-time windows methods to ad hoc manual identification of aftershocks, are applied for detection of earthquake clusters. In particular, a statistical method based on nearest-neighbor distances of events in space-time-energy domain, is considered. Results from clusters identification by the nearest-neighbor method turn out quite robust with respect to the time span of the input catalogue, as well as to minimum magnitude cutoff. The identified clusters for the largest events reported in North-Eastern Italy since 1977 are well consistent with those reported in earlier studies, which were aimed at detailed manual aftershocks identification. The study shows that the data-driven approach, based on the nearest-neighbor distances, can be satisfactorily applied to decompose the seismic

  14. Two-Way Regularized Fuzzy Clustering of Multiple Correspondence Analysis.

    Science.gov (United States)

    Kim, Sunmee; Choi, Ji Yeh; Hwang, Heungsun

    2017-01-01

    Multiple correspondence analysis (MCA) is a useful tool for investigating the interrelationships among dummy-coded categorical variables. MCA has been combined with clustering methods to examine whether there exist heterogeneous subclusters of a population, which exhibit cluster-level heterogeneity. These combined approaches aim to classify either observations only (one-way clustering of MCA) or both observations and variable categories (two-way clustering of MCA). The latter approach is favored because its solutions are easier to interpret by providing explicitly which subgroup of observations is associated with which subset of variable categories. Nonetheless, the two-way approach has been built on hard classification that assumes observations and/or variable categories to belong to only one cluster. To relax this assumption, we propose two-way fuzzy clustering of MCA. Specifically, we combine MCA with fuzzy k-means simultaneously to classify a subgroup of observations and a subset of variable categories into a common cluster, while allowing both observations and variable categories to belong partially to multiple clusters. Importantly, we adopt regularized fuzzy k-means, thereby enabling us to decide the degree of fuzziness in cluster memberships automatically. We evaluate the performance of the proposed approach through the analysis of simulated and real data, in comparison with existing two-way clustering approaches.

  15. Phenotypes Determined by Cluster Analysis in Moderate to Severe Bronchial Asthma.

    Science.gov (United States)

    Youroukova, Vania M; Dimitrova, Denitsa G; Valerieva, Anna D; Lesichkova, Spaska S; Velikova, Tsvetelina V; Ivanova-Todorova, Ekaterina I; Tumangelova-Yuzeir, Kalina D

    2017-06-01

    Bronchial asthma is a heterogeneous disease that includes various subtypes. They may share similar clinical characteristics, but probably have different pathological mechanisms. To identify phenotypes using cluster analysis in moderate to severe bronchial asthma and to compare differences in clinical, physiological, immunological and inflammatory data between the clusters. Forty adult patients with moderate to severe bronchial asthma out of exacerbation were included. All underwent clinical assessment, anthropometric measurements, skin prick testing, standard spirometry and measurement fraction of exhaled nitric oxide. Blood eosinophilic count, serum total IgE and periostin levels were determined. Two-step cluster approach, hierarchical clustering method and k-mean analysis were used for identification of the clusters. We have identified four clusters. Cluster 1 (n=14) - late-onset, non-atopic asthma with impaired lung function, Cluster 2 (n=13) - late-onset, atopic asthma, Cluster 3 (n=6) - late-onset, aspirin sensitivity, eosinophilic asthma, and Cluster 4 (n=7) - early-onset, atopic asthma. Our study is the first in Bulgaria in which cluster analysis is applied to asthmatic patients. We identified four clusters. The variables with greatest force for differentiation in our study were: age of asthma onset, duration of diseases, atopy, smoking, blood eosinophils, nonsteroidal anti-inflammatory drugs hypersensitivity, baseline FEV1/FVC and symptoms severity. Our results support the concept of heterogeneity of bronchial asthma and demonstrate that cluster analysis can be an useful tool for phenotyping of disease and personalized approach to the treatment of patients.

  16. Termination of seizure clusters is related to the duration of focal seizures.

    Science.gov (United States)

    Ferastraoaru, Victor; Schulze-Bonhage, Andreas; Lipton, Richard B; Dümpelmann, Matthias; Legatt, Alan D; Blumberg, Julie; Haut, Sheryl R

    2016-06-01

    Clustered seizures are characterized by shorter than usual interseizure intervals and pose increased morbidity risk. This study examines the characteristics of seizures that cluster, with special attention to the final seizure in a cluster. This is a retrospective analysis of long-term inpatient monitoring data from the EPILEPSIAE project. Patients underwent presurgical evaluation from 2002 to 2009. Seizure clusters were defined by the occurrence of at least two consecutive seizures with interseizure intervals of <4 h. Other definitions of seizure clustering were examined in a sensitivity analysis. Seizures were classified into three contextually defined groups: isolated seizures (not meeting clustering criteria), terminal seizure (last seizure in a cluster), and intracluster seizures (any other seizures within a cluster). Seizure characteristics were compared among the three groups in terms of duration, type (focal seizures remaining restricted to one hemisphere vs. evolving bilaterally), seizure origin, and localization concordance among pairs of consecutive seizures. Among 92 subjects, 77 (83%) had at least one seizure cluster. The intracluster seizures were significantly shorter than the last seizure in a cluster (p = 0.011), whereas the last seizure in a cluster resembled the isolated seizures in terms of duration. Although focal only (unilateral), seizures were shorter than seizures that evolved bilaterally and there was no correlation between the seizure type and the seizure position in relation to a cluster (p = 0.762). Frontal and temporal lobe seizures were more likely to cluster compared with other localizations (p = 0.009). Seizure pairs that are part of a cluster were more likely to have a concordant origin than were isolated seizures. Results were similar for the 2 h definition of clustering, but not for the 8 h definition of clustering. We demonstrated that intracluster seizures are short relative to isolated seizures and terminal seizures. Frontal

  17. Evaluation of secular trend and the existence of cases of clusters of bladder cancer in Goiania: descriptive study population-based; Avaliacao da tendencia temporal e da existencia de casos de clusters de cancer de bexiga em Goiania: estudo descritivo de base populacional

    Energy Technology Data Exchange (ETDEWEB)

    Antonio, Gisele Guimaraes Daflon

    2008-07-01

    More than 20 years after the radiological accident with cesium-137 in the city of Goiania, there is still a feeling in local population that the number of cases of cancer in the city is growing up due to the past radiation exposure and that the number of people contaminated or exposed was higher than the number reported. The present study aims to evaluate the temporal trend and the space-time distribution of bladder cancer cases in Goiania from 1988 and 2003, taking into account that bladder cancer presents the highest risk coefficients per unit of radiation dose among solid cancers. The study population was composed of all incident cases of bladder cancer registered in the Population-Based Cancer Registry of Goiania, between 1988 and 2003.Temporal trend of bladder cancer incidence was analyzed by sex and age groups ( < 60 and {>=} 60 years of age) through polynomial regression using age standardized incidence rates of bladder cancer (world population). SaTscan was used to determine whether statistical significant geographic clusters of high incidence of bladder cancer cases can be located in the city. The results showed a significant increase of bladder cancer incidence rates in males of all ages (p= 0.025) and for age group higher or equal to 60 years old (p=O.022), and a stability in trends for female sex. In the space-time analysis, a cluster was identified, however without statistical significance (p=0.278) and its location has no relationship with the main focuses of contamination of the radiological accident in 1987. We concluded that, despite of the increase of incidence rates in males, this can be explained by the improvement in diagnostic procedures throughout time, being this increase still not perceived in females considering the small number of cases. As chance can not be ruled out as the explanation of the identified cluster, we do not suggest any further detailed investigation in this cluster, as the occurrence of cluster diseases in space can occur

  18. Evaluating spatial and temporal relationships between an earthquake cluster near Entiat, central Washington, and the large December 1872 Entiat earthquake

    Science.gov (United States)

    Brocher, Thomas M.; Blakely, Richard J.; Sherrod, Brian

    2017-01-01

    We investigate spatial and temporal relations between an ongoing and prolific seismicity cluster in central Washington, near Entiat, and the 14 December 1872 Entiat earthquake, the largest historic crustal earthquake in Washington. A fault scarp produced by the 1872 earthquake lies within the Entiat cluster; the locations and areas of both the cluster and the estimated 1872 rupture surface are comparable. Seismic intensities and the 1–2 m of coseismic displacement suggest a magnitude range between 6.5 and 7.0 for the 1872 earthquake. Aftershock forecast models for (1) the first several hours following the 1872 earthquake, (2) the largest felt earthquakes from 1900 to 1974, and (3) the seismicity within the Entiat cluster from 1976 through 2016 are also consistent with this magnitude range. Based on this aftershock modeling, most of the current seismicity in the Entiat cluster could represent aftershocks of the 1872 earthquake. Other earthquakes, especially those with long recurrence intervals, have long‐lived aftershock sequences, including the Mw">MwMw 7.5 1891 Nobi earthquake in Japan, with aftershocks continuing 100 yrs after the mainshock. Although we do not rule out ongoing tectonic deformation in this region, a long‐lived aftershock sequence can account for these observations.

  19. THE RELEVANT TEMPORAL MARKET DEFINITION IN ANTITRUST ANALYSIS

    Directory of Open Access Journals (Sweden)

    Anzhelika Gerasymenko

    2018-01-01

    Full Text Available The purpose of the paper is to compare various theoretical approaches to the relevant temporal market definition, collecting the arguments for their implementation under the different kinds of antitrust cases. It is vital for the markets with peak demand (transport, electricity, markets of intergenerational products or discreet supply (agriculture. Methodology. The survey is based on the theoretical and graphical modelling of product space perception by consumers. It investigates changes of the latter under different marketing strategies of a seller. Statistical methods are used to analyse trends of demand and prices for iPhones’ changes, as well as dynamics of electricity consumption. Results. The paper reveals two facing approaches to the definition of relevant temporal market: 1 the discrete one that provides a short-run analysis of products’ substitutability and combines only those time periods that are characterized by a stable balance of demand and supply, as well as stable market equilibrium; 2 the coherent one that provides a long-run analysis of cyclical variation of the market. This cycling is based on the awareness of consumers and producers of intertemporal substitutability of products. The authors use the model of intertemporal competition to explain principles of these approaches and apply it to the iPhone market analysis. They conclude that the coherent approach must be applied to the temporal market definition for the products with elastic demand. Inelastic demand brings the necessity to apply the discrete approach to the temporal market definition. These conclusions cannot be applied to regulated markets. The system of government regulation is the main determinant of the temporal boundaries of such markets. Practical implications. The results of this research can be used by competition agencies in antitrust cases to define the relevant temporal market, where the violations of antitrust legislation can occur. The correct

  20. Incremental temporal pattern mining using efficient batch-free stream clustering

    NARCIS (Netherlands)

    Lu, Y.; Hassani, M.; Seidl, T.

    2017-01-01

    This paper address the problem of temporal pattern mining from multiple data streams containing temporal events. Temporal events are considered as real world events aligned with comprehensive starting and ending timing information rather than simple integer timestamps. Predefined relations, such as

  1. The Home Care Crew Scheduling Problem: Preference-based visit clustering and temporal dependencies

    DEFF Research Database (Denmark)

    Rasmussen, Matias Sevel; Justesen, Tor Fog; Dohn, Anders Høeg

    2012-01-01

    In the Home Care Crew Scheduling Problem a staff of home carers has to be assigned a number of visits to patients’ homes, such that the overall service level is maximised. The problem is a generalisation of the vehicle routing problem with time windows. Required travel time between visits and time...... preference constraints. The algorithm is tested both on real-life problem instances and on generated test instances inspired by realistic settings. The use of the specialised branching scheme on real-life problems is novel. The visit clustering decreases run times significantly, and only gives a loss...... windows of the visits must be respected. The challenge when assigning visits to home carers lies in the existence of soft preference constraints and in temporal dependencies between the start times of visits.We model the problem as a set partitioning problem with side constraints and develop an exact...

  2. Cluster analysis for determining distribution center location

    Science.gov (United States)

    Lestari Widaningrum, Dyah; Andika, Aditya; Murphiyanto, Richard Dimas Julian

    2017-12-01

    Determination of distribution facilities is highly important to survive in the high level of competition in today’s business world. Companies can operate multiple distribution centers to mitigate supply chain risk. Thus, new problems arise, namely how many and where the facilities should be provided. This study examines a fast-food restaurant brand, which located in the Greater Jakarta. This brand is included in the category of top 5 fast food restaurant chain based on retail sales. There were three stages in this study, compiling spatial data, cluster analysis, and network analysis. Cluster analysis results are used to consider the location of the additional distribution center. Network analysis results show a more efficient process referring to a shorter distance to the distribution process.

  3. Clustering Analysis for Credit Default Probabilities in a Retail Bank Portfolio

    Directory of Open Access Journals (Sweden)

    Elena ANDREI (DRAGOMIR

    2012-08-01

    Full Text Available Methods underlying cluster analysis are very useful in data analysis, especially when the processed volume of data is very large, so that it becomes impossible to extract essential information, unless specific instruments are used to summarize and structure the gross information. In this context, cluster analysis techniques are used particularly, for systematic information analysis. The aim of this article is to build an useful model for banking field, based on data mining techniques, by dividing the groups of borrowers into clusters, in order to obtain a profile of the customers (debtors and good payers. We assume that a class is appropriate if it contains members that have a high degree of similarity and the standard method for measuring the similarity within a group shows the lowest variance. After clustering, data mining techniques are implemented on the cluster with bad debtors, reaching a very high accuracy after implementation. The paper is structured as follows: Section 2 describes the model for data analysis based on a specific scoring model that we proposed. In section 3, we present a cluster analysis using K-means algorithm and the DM models are applied on a specific cluster. Section 4 shows the conclusions.

  4. Cluster Analysis as an Analytical Tool of Population Policy

    Directory of Open Access Journals (Sweden)

    Oksana Mikhaylovna Shubat

    2017-12-01

    Full Text Available The predicted negative trends in Russian demography (falling birth rates, population decline actualize the need to strengthen measures of family and population policy. Our research purpose is to identify groups of Russian regions with similar characteristics in the family sphere using cluster analysis. The findings should make an important contribution to the field of family policy. We used hierarchical cluster analysis based on the Ward method and the Euclidean distance for segmentation of Russian regions. Clustering is based on four variables, which allowed assessing the family institution in the region. The authors used the data of Federal State Statistics Service from 2010 to 2015. Clustering and profiling of each segment has allowed forming a model of Russian regions depending on the features of the family institution in these regions. The authors revealed four clusters grouping regions with similar problems in the family sphere. This segmentation makes it possible to develop the most relevant family policy measures in each group of regions. Thus, the analysis has shown a high degree of differentiation of the family institution in the regions. This suggests that a unified approach to population problems’ solving is far from being effective. To achieve greater results in the implementation of family policy, a differentiated approach is needed. Methods of multidimensional data classification can be successfully applied as a relevant analytical toolkit. Further research could develop the adaptation of multidimensional classification methods to the analysis of the population problems in Russian regions. In particular, the algorithms of nonparametric cluster analysis may be of relevance in future studies.

  5. Clinical Characteristics of Exacerbation-Prone Adult Asthmatics Identified by Cluster Analysis.

    Science.gov (United States)

    Kim, Mi Ae; Shin, Seung Woo; Park, Jong Sook; Uh, Soo Taek; Chang, Hun Soo; Bae, Da Jeong; Cho, You Sook; Park, Hae Sim; Yoon, Ho Joo; Choi, Byoung Whui; Kim, Yong Hoon; Park, Choon Sik

    2017-11-01

    Asthma is a heterogeneous disease characterized by various types of airway inflammation and obstruction. Therefore, it is classified into several subphenotypes, such as early-onset atopic, obese non-eosinophilic, benign, and eosinophilic asthma, using cluster analysis. A number of asthmatics frequently experience exacerbation over a long-term follow-up period, but the exacerbation-prone subphenotype has rarely been evaluated by cluster analysis. This prompted us to identify clusters reflecting asthma exacerbation. A uniform cluster analysis method was applied to 259 adult asthmatics who were regularly followed-up for over 1 year using 12 variables, selected on the basis of their contribution to asthma phenotypes. After clustering, clinical profiles and exacerbation rates during follow-up were compared among the clusters. Four subphenotypes were identified: cluster 1 was comprised of patients with early-onset atopic asthma with preserved lung function, cluster 2 late-onset non-atopic asthma with impaired lung function, cluster 3 early-onset atopic asthma with severely impaired lung function, and cluster 4 late-onset non-atopic asthma with well-preserved lung function. The patients in clusters 2 and 3 were identified as exacerbation-prone asthmatics, showing a higher risk of asthma exacerbation. Two different phenotypes of exacerbation-prone asthma were identified among Korean asthmatics using cluster analysis; both were characterized by impaired lung function, but the age at asthma onset and atopic status were different between the two. Copyright © 2017 The Korean Academy of Asthma, Allergy and Clinical Immunology · The Korean Academy of Pediatric Allergy and Respiratory Disease

  6. Temporal diagnostic analysis of the SWAT model to detect dominant periods of poor model performance

    Science.gov (United States)

    Guse, Björn; Reusser, Dominik E.; Fohrer, Nicola

    2013-04-01

    Hydrological models generally include thresholds and non-linearities, such as snow-rain-temperature thresholds, non-linear reservoirs, infiltration thresholds and the like. When relating observed variables to modelling results, formal methods often calculate performance metrics over long periods, reporting model performance with only few numbers. Such approaches are not well suited to compare dominating processes between reality and model and to better understand when thresholds and non-linearities are driving model results. We present a combination of two temporally resolved model diagnostic tools to answer when a model is performing (not so) well and what the dominant processes are during these periods. We look at the temporal dynamics of parameter sensitivities and model performance to answer this question. For this, the eco-hydrological SWAT model is applied in the Treene lowland catchment in Northern Germany. As a first step, temporal dynamics of parameter sensitivities are analyzed using the Fourier Amplitude Sensitivity test (FAST). The sensitivities of the eight model parameters investigated show strong temporal variations. High sensitivities were detected for two groundwater (GW_DELAY, ALPHA_BF) and one evaporation parameters (ESCO) most of the time. The periods of high parameter sensitivity can be related to different phases of the hydrograph with dominances of the groundwater parameters in the recession phases and of ESCO in baseflow and resaturation periods. Surface runoff parameters show high parameter sensitivities in phases of a precipitation event in combination with high soil water contents. The dominant parameters give indication for the controlling processes during a given period for the hydrological catchment. The second step included the temporal analysis of model performance. For each time step, model performance was characterized with a "finger print" consisting of a large set of performance measures. These finger prints were clustered into

  7. Spatio-Temporal Characteristics of Resident Trip Based on Poi and OD Data of Float CAR in Beijing

    Science.gov (United States)

    Mou, N.; Li, J.; Zhang, L.; Liu, W.; Xu, Y.

    2017-09-01

    Due to the influence of the urban inherent regional functional distribution, the daily activities of the residents presented some spatio-temporal patterns (periodic patterns, gathering patterns, etc.). In order to further understand the spatial and temporal characteristics of urban residents, this paper research takes the taxi trajectory data of Beijing as a sample data and studies the spatio-temporal characteristics of the residents' activities on the weekdays. At first, according to the characteristics of the taxi trajectory data distributed along the road network, it takes the Voronoi generated by the road nodes as the research unit. This paper proposes a hybrid clustering method - based on grid density, which is used to cluster the OD (origin and destination) data of taxi at different times. Then combining with the POI data of Beijing, this research calculated the density of the POI data in the clustering results, and analyzed the relationship between the activities of residents in different periods and the functional types of the region. The final results showed that the residents were mainly commuting on weekdays. And it found that the distribution of travel density showed a concentric circle of the characteristics, focusing on residential areas and work areas. The results of cluster analysis and POI analysis showed that the residents' travel had experienced the process of "spatial relative dispersion - spatial aggregation - spatial relative dispersion" in one day.

  8. Automated analysis of organic particles using cluster SIMS

    Energy Technology Data Exchange (ETDEWEB)

    Gillen, Greg; Zeissler, Cindy; Mahoney, Christine; Lindstrom, Abigail; Fletcher, Robert; Chi, Peter; Verkouteren, Jennifer; Bright, David; Lareau, Richard T.; Boldman, Mike

    2004-06-15

    Cluster primary ion bombardment combined with secondary ion imaging is used on an ion microscope secondary ion mass spectrometer for the spatially resolved analysis of organic particles on various surfaces. Compared to the use of monoatomic primary ion beam bombardment, the use of a cluster primary ion beam (SF{sub 5}{sup +} or C{sub 8}{sup -}) provides significant improvement in molecular ion yields and a reduction in beam-induced degradation of the analyte molecules. These characteristics of cluster bombardment, along with automated sample stage control and custom image analysis software are utilized to rapidly characterize the spatial distribution of trace explosive particles, narcotics and inkjet-printed microarrays on a variety of surfaces.

  9. Segmentation and clustering as complementary sources of information

    Science.gov (United States)

    Dale, Michael B.; Allison, Lloyd; Dale, Patricia E. R.

    2007-03-01

    This paper examines the effects of using a segmentation method to identify change-points or edges in vegetation. It identifies coherence (spatial or temporal) in place of unconstrained clustering. The segmentation method involves change-point detection along a sequence of observations so that each cluster formed is composed of adjacent samples; this is a form of constrained clustering. The protocol identifies one or more models, one for each section identified, and the quality of each is assessed using a minimum message length criterion, which provides a rational basis for selecting an appropriate model. Although the segmentation is less efficient than clustering, it does provide other information because it incorporates textural similarity as well as homogeneity. In addition it can be useful in determining various scales of variation that may apply to the data, providing a general method of small-scale pattern analysis.

  10. Second-order analysis of structured inhomogeneous spatio-temporal point processes

    DEFF Research Database (Denmark)

    Møller, Jesper; Ghorbani, Mohammad

    Statistical methodology for spatio-temporal point processes is in its infancy. We consider second-order analysis based on pair correlation functions and K-functions for first general inhomogeneous spatio-temporal point processes and second inhomogeneous spatio-temporal Cox processes. Assuming...... spatio-temporal separability of the intensity function, we clarify different meanings of second-order spatio-temporal separability. One is second-order spatio-temporal independence and relates e.g. to log-Gaussian Cox processes with an additive covariance structure of the underlying spatio......-temporal Gaussian process. Another concerns shot-noise Cox processes with a separable spatio-temporal covariance density. We propose diagnostic procedures for checking hypotheses of second-order spatio-temporal separability, which we apply on simulated and real data (the UK 2001 epidemic foot and mouth disease data)....

  11. Network Analysis Tools: from biological networks to clusters and pathways.

    Science.gov (United States)

    Brohée, Sylvain; Faust, Karoline; Lima-Mendez, Gipsi; Vanderstocken, Gilles; van Helden, Jacques

    2008-01-01

    Network Analysis Tools (NeAT) is a suite of computer tools that integrate various algorithms for the analysis of biological networks: comparison between graphs, between clusters, or between graphs and clusters; network randomization; analysis of degree distribution; network-based clustering and path finding. The tools are interconnected to enable a stepwise analysis of the network through a complete analytical workflow. In this protocol, we present a typical case of utilization, where the tasks above are combined to decipher a protein-protein interaction network retrieved from the STRING database. The results returned by NeAT are typically subnetworks, networks enriched with additional information (i.e., clusters or paths) or tables displaying statistics. Typical networks comprising several thousands of nodes and arcs can be analyzed within a few minutes. The complete protocol can be read and executed in approximately 1 h.

  12. Performance analysis of clustering techniques over microarray data: A case study

    Science.gov (United States)

    Dash, Rasmita; Misra, Bijan Bihari

    2018-03-01

    Handling big data is one of the major issues in the field of statistical data analysis. In such investigation cluster analysis plays a vital role to deal with the large scale data. There are many clustering techniques with different cluster analysis approach. But which approach suits a particular dataset is difficult to predict. To deal with this problem a grading approach is introduced over many clustering techniques to identify a stable technique. But the grading approach depends on the characteristic of dataset as well as on the validity indices. So a two stage grading approach is implemented. In this study the grading approach is implemented over five clustering techniques like hybrid swarm based clustering (HSC), k-means, partitioning around medoids (PAM), vector quantization (VQ) and agglomerative nesting (AGNES). The experimentation is conducted over five microarray datasets with seven validity indices. The finding of grading approach that a cluster technique is significant is also established by Nemenyi post-hoc hypothetical test.

  13. Cluster analysis of typhoid cases in Kota Bharu, Kelantan, Malaysia

    Directory of Open Access Journals (Sweden)

    Nazarudin Safian

    2008-09-01

    Full Text Available Typhoid fever is still a major public health problem globally as well as in Malaysia. This study was done to identify the spatial epidemiology of typhoid fever in the Kota Bharu District of Malaysia as a first step to developing more advanced analysis of the whole country. The main characteristic of the epidemiological pattern that interested us was whether typhoid cases occurred in clusters or whether they were evenly distributed throughout the area. We also wanted to know at what spatial distances they were clustered. All confirmed typhoid cases that were reported to the Kota Bharu District Health Department from the year 2001 to June of 2005 were taken as the samples. From the home address of the cases, the location of the house was traced and a coordinate was taken using handheld GPS devices. Spatial statistical analysis was done to determine the distribution of typhoid cases, whether clustered, random or dispersed. The spatial statistical analysis was done using CrimeStat III software to determine whether typhoid cases occur in clusters, and later on to determine at what distances it clustered. From 736 cases involved in the study there was significant clustering for cases occurring in the years 2001, 2002, 2003 and 2005. There was no significant clustering in year 2004. Typhoid clustering also occurred strongly for distances up to 6 km. This study shows that typhoid cases occur in clusters, and this method could be applicable to describe spatial epidemiology for a specific area. (Med J Indones 2008; 17: 175-82Keywords: typhoid, clustering, spatial epidemiology, GIS

  14. Cluster analysis of Southeastern U.S. climate stations

    Science.gov (United States)

    Stooksbury, D. E.; Michaels, P. J.

    1991-09-01

    A two-step cluster analysis of 449 Southeastern climate stations is used to objectively determine general climate clusters (groups of climate stations) for eight southeastern states. The purpose is objectively to define regions of climatic homogeneity that should perform more robustly in subsequent climatic impact models. This type of analysis has been successfully used in many related climate research problems including the determination of corn/climate districts in Iowa (Ortiz-Valdez, 1985) and the classification of synoptic climate types (Davis, 1988). These general climate clusters may be more appropriate for climate research than the standard climate divisions (CD) groupings of climate stations, which are modifications of the agro-economic United States Department of Agriculture crop reporting districts. Unlike the CD's, these objectively determined climate clusters are not restricted by state borders and thus have reduced multicollinearity which makes them more appropriate for the study of the impact of climate and climatic change.

  15. Cluster analysis by optimal decomposition of induced fuzzy sets

    Energy Technology Data Exchange (ETDEWEB)

    Backer, E

    1978-01-01

    Nonsupervised pattern recognition is addressed and the concept of fuzzy sets is explored in order to provide the investigator (data analyst) additional information supplied by the pattern class membership values apart from the classical pattern class assignments. The basic ideas behind the pattern recognition problem, the clustering problem, and the concept of fuzzy sets in cluster analysis are discussed, and a brief review of the literature of the fuzzy cluster analysis is given. Some mathematical aspects of fuzzy set theory are briefly discussed; in particular, a measure of fuzziness is suggested. The optimization-clustering problem is characterized. Then the fundamental idea behind affinity decomposition is considered. Next, further analysis takes place with respect to the partitioning-characterization functions. The iterative optimization procedure is then addressed. The reclassification function is investigated and convergence properties are examined. Finally, several experiments in support of the method suggested are described. Four object data sets serve as appropriate test cases. 120 references, 70 figures, 11 tables. (RWR)

  16. Graph analysis of cell clusters forming vascular networks

    Science.gov (United States)

    Alves, A. P.; Mesquita, O. N.; Gómez-Gardeñes, J.; Agero, U.

    2018-03-01

    This manuscript describes the experimental observation of vasculogenesis in chick embryos by means of network analysis. The formation of the vascular network was observed in the area opaca of embryos from 40 to 55 h of development. In the area opaca endothelial cell clusters self-organize as a primitive and approximately regular network of capillaries. The process was observed by bright-field microscopy in control embryos and in embryos treated with Bevacizumab (Avastin), an antibody that inhibits the signalling of the vascular endothelial growth factor (VEGF). The sequence of images of the vascular growth were thresholded, and used to quantify the forming network in control and Avastin-treated embryos. This characterization is made by measuring vessels density, number of cell clusters and the largest cluster density. From the original images, the topology of the vascular network was extracted and characterized by means of the usual network metrics such as: the degree distribution, average clustering coefficient, average short path length and assortativity, among others. This analysis allows to monitor how the largest connected cluster of the vascular network evolves in time and provides with quantitative evidence of the disruptive effects that Avastin has on the tree structure of vascular networks.

  17. application of single-linkage clustering method in the analysis of ...

    African Journals Online (AJOL)

    Admin

    ANALYSIS OF GROWTH RATE OF GROSS DOMESTIC PRODUCT. (GDP) AT ... The end result of the algorithm is a tree of clusters called a dendrogram, which shows how the clusters are ..... Number of cluster sum from from observations of ...

  18. A Flocking Based algorithm for Document Clustering Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Cui, Xiaohui [ORNL; Gao, Jinzhu [ORNL; Potok, Thomas E [ORNL

    2006-01-01

    Social animals or insects in nature often exhibit a form of emergent collective behavior known as flocking. In this paper, we present a novel Flocking based approach for document clustering analysis. Our Flocking clustering algorithm uses stochastic and heuristic principles discovered from observing bird flocks or fish schools. Unlike other partition clustering algorithm such as K-means, the Flocking based algorithm does not require initial partitional seeds. The algorithm generates a clustering of a given set of data through the embedding of the high-dimensional data items on a two-dimensional grid for easy clustering result retrieval and visualization. Inspired by the self-organized behavior of bird flocks, we represent each document object with a flock boid. The simple local rules followed by each flock boid result in the entire document flock generating complex global behaviors, which eventually result in a clustering of the documents. We evaluate the efficiency of our algorithm with both a synthetic dataset and a real document collection that includes 100 news articles collected from the Internet. Our results show that the Flocking clustering algorithm achieves better performance compared to the K- means and the Ant clustering algorithm for real document clustering.

  19. Spatio-temporal analysis of Modified Omori law in Bayesian framework

    Science.gov (United States)

    Rezanezhad, V.; Narteau, C.; Shebalin, P.; Zoeller, G.; Holschneider, M.

    2017-12-01

    This work presents a study of the spatio temporal evolution of the modified Omori parameters in southern California in then time period of 1981-2016. A nearest-neighbor approach is applied for earthquake clustering. This study targets small mainshocks and corresponding big aftershocks ( 2.5 ≤ mmainshocks ≤ 4.5 and 1.8 ≤ maftershocks ≤ 2.8 ). We invert for the spatio temporal behavior of c and p values (especially c) all over the area using a MCMC based maximum likelihood estimator. As parameterizing families we use Voronoi cells with randomly distributed cell centers. Considering that c value represents a physical character like stress change we expect to see a coherent c value pattern over seismologically coacting areas. This correlation of c valus can actually be seen for the San Andreas, San Jacinto and Elsinore faults. Moreover, the depth dependency of c value is studied which shows a linear behavior of log(c) with respect to aftershock's depth within 5 to 15 km depth.

  20. Topic modeling for cluster analysis of large biological and medical datasets.

    Science.gov (United States)

    Zhao, Weizhong; Zou, Wen; Chen, James J

    2014-01-01

    The big data moniker is nowhere better deserved than to describe the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracies and effectiveness of traditional clustering methods diminish for large and hyper dimensional datasets. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering or overcoming clustering difficulties in large biological and medical datasets. In this study, three topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, are proposed and tested on the cluster analysis of three large datasets: Salmonella pulsed-field gel electrophoresis (PFGE) dataset, lung cancer dataset, and breast cancer dataset, which represent various types of large biological or medical datasets. All three various methods are shown to improve the efficacy/effectiveness of clustering results on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, yield clustering improvements for the three different data types. Clusters more efficaciously represent truthful groupings and subgroupings in the data than traditional methods, suggesting

  1. Acoustic Cluster Therapy: In Vitro and Ex Vivo Measurement of Activated Bubble Size Distribution and Temporal Dynamics.

    Science.gov (United States)

    Healey, Andrew John; Sontum, Per Christian; Kvåle, Svein; Eriksen, Morten; Bendiksen, Ragnar; Tornes, Audun; Østensen, Jonny

    2016-05-01

    Acoustic cluster technology (ACT) is a two-component, microparticle formulation platform being developed for ultrasound-mediated drug delivery. Sonazoid microbubbles, which have a negative surface charge, are mixed with micron-sized perfluoromethylcyclopentane droplets stabilized with a positively charged surface membrane to form microbubble/microdroplet clusters. On exposure to ultrasound, the oil undergoes a phase change to the gaseous state, generating 20- to 40-μm ACT bubbles. An acoustic transmission technique is used to measure absorption and velocity dispersion of the ACT bubbles. An inversion technique computes bubble size population with temporal resolution of seconds. Bubble populations are measured both in vitro and in vivo after activation within the cardiac chambers of a dog model, with catheter-based flow through an extracorporeal measurement flow chamber. Volume-weighted mean diameter in arterial blood after activation in the left ventricle was 22 μm, with no bubbles >44 μm in diameter. After intravenous administration, 24.4% of the oil is activated in the cardiac chambers. Copyright © 2016 World Federation for Ultrasound in Medicine & Biology. Published by Elsevier Inc. All rights reserved.

  2. CLUSTER ANALYSIS UKRAINIAN REGIONAL DISTRIBUTION BY LEVEL OF INNOVATION

    Directory of Open Access Journals (Sweden)

    Roman Shchur

    2016-07-01

    Full Text Available   SWOT-analysis of the threats and benefits of innovation development strategy of Ivano-Frankivsk region in the context of financial support was сonducted. Methodical approach to determine of public-private partnerships potential that is tool of innovative economic development financing was identified. Cluster analysis of possibilities of forming public-private partnership in a particular region was carried out. Optimal set of problem areas that require urgent solutions and financial security is defined on the basis of cluster approach. It will help to form practical recommendations for the formation of an effective financial mechanism in the regions of Ukraine. Key words: the mechanism of innovation development financial provision, innovation development, public-private partnerships, cluster analysis, innovative development strategy.

  3. Multiscale visual quality assessment for cluster analysis with self-organizing maps

    Science.gov (United States)

    Bernard, Jürgen; von Landesberger, Tatiana; Bremm, Sebastian; Schreck, Tobias

    2011-01-01

    Cluster analysis is an important data mining technique for analyzing large amounts of data, reducing many objects to a limited number of clusters. Cluster visualization techniques aim at supporting the user in better understanding the characteristics and relationships among the found clusters. While promising approaches to visual cluster analysis already exist, these usually fall short of incorporating the quality of the obtained clustering results. However, due to the nature of the clustering process, quality plays an important aspect, as for most practical data sets, typically many different clusterings are possible. Being aware of clustering quality is important to judge the expressiveness of a given cluster visualization, or to adjust the clustering process with refined parameters, among others. In this work, we present an encompassing suite of visual tools for quality assessment of an important visual cluster algorithm, namely, the Self-Organizing Map (SOM) technique. We define, measure, and visualize the notion of SOM cluster quality along a hierarchy of cluster abstractions. The quality abstractions range from simple scalar-valued quality scores up to the structural comparison of a given SOM clustering with output of additional supportive clustering methods. The suite of methods allows the user to assess the SOM quality on the appropriate abstraction level, and arrive at improved clustering results. We implement our tools in an integrated system, apply it on experimental data sets, and show its applicability.

  4. On the potential for the Partial Triadic Analysis to grasp the spatio-temporal variability of groundwater hydrochemistry

    Science.gov (United States)

    Gourdol, L.; Hissler, C.; Pfister, L.

    2012-04-01

    The Luxembourg sandstone aquifer is of major relevance for the national supply of drinking water in Luxembourg. The city of Luxembourg (20% of the country's population) gets almost 2/3 of its drinking water from this aquifer. As a consequence, the study of both the groundwater hydrochemistry, as well as its spatial and temporal variations, are considered as of highest priority. Since 2005, a monitoring network has been implemented by the Water Department of Luxembourg City, with a view to a more sustainable management of this strategic water resource. The data collected to date forms a large and complex dataset, describing spatial and temporal variations of many hydrochemical parameters. The data treatment issue is tightly connected to this kind of water monitoring programs and complex databases. Standard multivariate statistical techniques, such as principal components analysis and hierarchical cluster analysis, have been widely used as unbiased methods for extracting meaningful information from groundwater quality data and are now classically used in many hydrogeological studies, in particular to characterize temporal or spatial hydrochemical variations induced by natural and anthropogenic factors. But these classical multivariate methods deal with two-way matrices, usually parameters/sites or parameters/time, while often the dataset resulting from qualitative water monitoring programs should be seen as a datacube parameters/sites/time. Three-way matrices, such as the one we propose here, are difficult to handle and to analyse by classical multivariate statistical tools and thus should be treated with approaches dealing with three-way data structures. One possible analysis approach consists in the use of partial triadic analysis (PTA). The PTA was previously used with success in many ecological studies but never to date in the domain of hydrogeology. Applied to the dataset of the Luxembourg Sandstone aquifer, the PTA appears as a new promising statistical

  5. Bayesian Modeling of Temporal Coherence in Videos for Entity Discovery and Summarization.

    Science.gov (United States)

    Mitra, Adway; Biswas, Soma; Bhattacharyya, Chiranjib

    2017-03-01

    A video is understood by users in terms of entities present in it. Entity Discovery is the task of building appearance model for each entity (e.g., a person), and finding all its occurrences in the video. We represent a video as a sequence of tracklets, each spanning 10-20 frames, and associated with one entity. We pose Entity Discovery as tracklet clustering, and approach it by leveraging Temporal Coherence (TC): the property that temporally neighboring tracklets are likely to be associated with the same entity. Our major contributions are the first Bayesian nonparametric models for TC at tracklet-level. We extend Chinese Restaurant Process (CRP) to TC-CRP, and further to Temporally Coherent Chinese Restaurant Franchise (TC-CRF) to jointly model entities and temporal segments using mixture components and sparse distributions. For discovering persons in TV serial videos without meta-data like scripts, these methods show considerable improvement over state-of-the-art approaches to tracklet clustering in terms of clustering accuracy, cluster purity and entity coverage. The proposed methods can perform online tracklet clustering on streaming videos unlike existing approaches, and can automatically reject false tracklets. Finally we discuss entity-driven video summarization- where temporal segments of the video are selected based on the discovered entities, to create a semantically meaningful summary.

  6. A Probabilistic Framework for Constructing Temporal Relations in Replica Exchange Molecular Trajectories.

    Science.gov (United States)

    Chattopadhyay, Aditya; Zheng, Min; Waller, Mark Paul; Priyakumar, U Deva

    2018-05-23

    Knowledge of the structure and dynamics of biomolecules is essential for elucidating the underlying mechanisms of biological processes. Given the stochastic nature of many biological processes, like protein unfolding, it's almost impossible that two independent simulations will generate the exact same sequence of events, which makes direct analysis of simulations difficult. Statistical models like Markov Chains, transition networks etc. help in shedding some light on the mechanistic nature of such processes by predicting long-time dynamics of these systems from short simulations. However, such methods fall short in analyzing trajectories with partial or no temporal information, for example, replica exchange molecular dynamics or Monte Carlo simulations. In this work we propose a probabilistic algorithm, borrowing concepts from graph theory and machine learning, to extract reactive pathways from molecular trajectories in the absence of temporal data. A suitable vector representation was chosen to represent each frame in the macromolecular trajectory (as a series of interaction and conformational energies) and dimensionality reduction was performed using principal component analysis (PCA). The trajectory was then clustered using a density-based clustering algorithm, where each cluster represents a metastable state on the potential energy surface (PES) of the biomolecule under study. A graph was created with these clusters as nodes with the edges learnt using an iterative expectation maximization algorithm. The most reactive path is conceived as the widest path along this graph. We have tested our method on RNA hairpin unfolding trajectory in aqueous urea solution. Our method makes the understanding of the mechanism of unfolding in RNA hairpin molecule more tractable. As this method doesn't rely on temporal data it can be used to analyze trajectories from Monte Carlo sampling techniques and replica exchange molecular dynamics (REMD).

  7. Cluster Analysis of Clinical Data Identifies Fibromyalgia Subgroups

    Science.gov (United States)

    Docampo, Elisa; Collado, Antonio; Escaramís, Geòrgia; Carbonell, Jordi; Rivera, Javier; Vidal, Javier; Alegre, José

    2013-01-01

    Introduction Fibromyalgia (FM) is mainly characterized by widespread pain and multiple accompanying symptoms, which hinder FM assessment and management. In order to reduce FM heterogeneity we classified clinical data into simplified dimensions that were used to define FM subgroups. Material and Methods 48 variables were evaluated in 1,446 Spanish FM cases fulfilling 1990 ACR FM criteria. A partitioning analysis was performed to find groups of variables similar to each other. Similarities between variables were identified and the variables were grouped into dimensions. This was performed in a subset of 559 patients, and cross-validated in the remaining 887 patients. For each sample and dimension, a composite index was obtained based on the weights of the variables included in the dimension. Finally, a clustering procedure was applied to the indexes, resulting in FM subgroups. Results Variables clustered into three independent dimensions: “symptomatology”, “comorbidities” and “clinical scales”. Only the two first dimensions were considered for the construction of FM subgroups. Resulting scores classified FM samples into three subgroups: low symptomatology and comorbidities (Cluster 1), high symptomatology and comorbidities (Cluster 2), and high symptomatology but low comorbidities (Cluster 3), showing differences in measures of disease severity. Conclusions We have identified three subgroups of FM samples in a large cohort of FM by clustering clinical data. Our analysis stresses the importance of family and personal history of FM comorbidities. Also, the resulting patient clusters could indicate different forms of the disease, relevant to future research, and might have an impact on clinical assessment. PMID:24098674

  8. Cluster analysis of clinical data identifies fibromyalgia subgroups.

    Directory of Open Access Journals (Sweden)

    Elisa Docampo

    Full Text Available INTRODUCTION: Fibromyalgia (FM is mainly characterized by widespread pain and multiple accompanying symptoms, which hinder FM assessment and management. In order to reduce FM heterogeneity we classified clinical data into simplified dimensions that were used to define FM subgroups. MATERIAL AND METHODS: 48 variables were evaluated in 1,446 Spanish FM cases fulfilling 1990 ACR FM criteria. A partitioning analysis was performed to find groups of variables similar to each other. Similarities between variables were identified and the variables were grouped into dimensions. This was performed in a subset of 559 patients, and cross-validated in the remaining 887 patients. For each sample and dimension, a composite index was obtained based on the weights of the variables included in the dimension. Finally, a clustering procedure was applied to the indexes, resulting in FM subgroups. RESULTS: VARIABLES CLUSTERED INTO THREE INDEPENDENT DIMENSIONS: "symptomatology", "comorbidities" and "clinical scales". Only the two first dimensions were considered for the construction of FM subgroups. Resulting scores classified FM samples into three subgroups: low symptomatology and comorbidities (Cluster 1, high symptomatology and comorbidities (Cluster 2, and high symptomatology but low comorbidities (Cluster 3, showing differences in measures of disease severity. CONCLUSIONS: We have identified three subgroups of FM samples in a large cohort of FM by clustering clinical data. Our analysis stresses the importance of family and personal history of FM comorbidities. Also, the resulting patient clusters could indicate different forms of the disease, relevant to future research, and might have an impact on clinical assessment.

  9. Aspects of second-order analysis of structured inhomogeneous spatio-temporal processes

    DEFF Research Database (Denmark)

    Møller, Jesper; Ghorbani, Mohammad

    2012-01-01

    Statistical methodology for spatio-temporal point processes is in its infancy. We consider second-order analysis based on pair correlation functions and K-functions for general inhomogeneous spatio-temporal point processes and for inhomogeneous spatio-temporal Cox processes. Assuming spatio......-temporal separability of the intensity function, we clarify different meanings of second-order spatio-temporal separability. One is second-order spatio-temporal independence and relates to log-Gaussian Cox processes with an additive covariance structure of the underlying spatio-temporal Gaussian process. Another...... concerns shot-noise Cox processes with a separable spatio-temporal covariance density. We propose diagnostic procedures for checking hypotheses of second-order spatio-temporal separability, which we apply on simulated and real data....

  10. How do childhood diagnoses of type 1 diabetes cluster in time?

    Directory of Open Access Journals (Sweden)

    Colin R Muirhead

    Full Text Available BACKGROUND: Previous studies have indicated that type 1 diabetes may have an infectious origin. The presence of temporal clustering-an irregular temporal distribution of cases--would provide additional evidence that occurrence may be linked with an agent that displays epidemicity. We tested for the presence and form of temporal clustering using population- based data from northeast England. MATERIALS AND METHODS: The study analysed data on children aged 0-14 years diagnosed with type 1 diabetes during the period 1990-2007 and resident in a defined geographical region of northeast England (Northumberland, Newcastle upon Tyne, and North Tyneside. Tests for temporal clustering by time of diagnosis were applied using a modified version of the Potthoff-Whittinghill method. RESULTS: The study analysed 468 cases of children diagnosed with type 1 diabetes. There was highly statistically significant evidence of temporal clustering over periods of a few months and over longer time intervals (p<0.001. The clustering within years did not show a consistent seasonal pattern. CONCLUSIONS: The study adds to the growing body of literature that supports the involvement of infectious agents in the aetiology of type 1 diabetes in children. Specifically it suggests that the precipitating agent or agents involved might be an infection that occurs in "mini-epidemics".

  11. Multiscale recurrence analysis of spatio-temporal data

    Science.gov (United States)

    Riedl, M.; Marwan, N.; Kurths, J.

    2015-12-01

    The description and analysis of spatio-temporal dynamics is a crucial task in many scientific disciplines. In this work, we propose a method which uses the mapogram as a similarity measure between spatially distributed data instances at different time points. The resulting similarity values of the pairwise comparison are used to construct a recurrence plot in order to benefit from established tools of recurrence quantification analysis and recurrence network analysis. In contrast to other recurrence tools for this purpose, the mapogram approach allows the specific focus on different spatial scales that can be used in a multi-scale analysis of spatio-temporal dynamics. We illustrate this approach by application on mixed dynamics, such as traveling parallel wave fronts with additive noise, as well as more complicate examples, pseudo-random numbers and coupled map lattices with a semi-logistic mapping rule. Especially the complicate examples show the usefulness of the multi-scale consideration in order to take spatial pattern of different scales and with different rhythms into account. So, this mapogram approach promises new insights in problems of climatology, ecology, or medicine.

  12. Development of small scale cluster computer for numerical analysis

    Science.gov (United States)

    Zulkifli, N. H. N.; Sapit, A.; Mohammed, A. N.

    2017-09-01

    In this study, two units of personal computer were successfully networked together to form a small scale cluster. Each of the processor involved are multicore processor which has four cores in it, thus made this cluster to have eight processors. Here, the cluster incorporate Ubuntu 14.04 LINUX environment with MPI implementation (MPICH2). Two main tests were conducted in order to test the cluster, which is communication test and performance test. The communication test was done to make sure that the computers are able to pass the required information without any problem and were done by using simple MPI Hello Program where the program written in C language. Additional, performance test was also done to prove that this cluster calculation performance is much better than single CPU computer. In this performance test, four tests were done by running the same code by using single node, 2 processors, 4 processors, and 8 processors. The result shows that with additional processors, the time required to solve the problem decrease. Time required for the calculation shorten to half when we double the processors. To conclude, we successfully develop a small scale cluster computer using common hardware which capable of higher computing power when compare to single CPU processor, and this can be beneficial for research that require high computing power especially numerical analysis such as finite element analysis, computational fluid dynamics, and computational physics analysis.

  13. Comprehensive analysis of tornado statistics in comparison to earthquakes: intensity and temporal behaviour

    Directory of Open Access Journals (Sweden)

    L. Schielicke

    2013-01-01

    Full Text Available Tornadoes and earthquakes are characterised by a high variability in their properties concerning intensity, geometric properties and temporal behaviour. Earthquakes are known for power-law behaviour in their intensity (Gutenberg–Richter law and temporal statistics (e.g. Omori law and interevent waiting times. The observed similarity of high variability of these two phenomena motivated us to compare the statistical behaviour of tornadoes using seismological methods and quest for power-law behaviour. In general, the statistics of tornadoes show power-law behaviour partly coextensive with characteristic scales when the temporal resolution is high (10 to 60 min. These characteristic scales match with the typical diurnal behaviour of tornadoes, which is characterised by a maximum of tornado occurrences in the late afternoon hours. Furthermore, the distributions support the observation that tornadoes cluster in time. Finally, we shortly discuss a possible similar underlying structure composed of heterogeneous, coupled, interactive threshold oscillators that possibly explains the observed behaviour.

  14. SPATIO-TEMPORAL CHARACTERISTICS OF RESIDENT TRIP BASED ON POI AND OD DATA OF FLOAT CAR IN BEIJING

    Directory of Open Access Journals (Sweden)

    N. Mou

    2017-09-01

    Full Text Available Due to the influence of the urban inherent regional functional distribution, the daily activities of the residents presented some spatio-temporal patterns (periodic patterns, gathering patterns, etc.. In order to further understand the spatial and temporal characteristics of urban residents, this paper research takes the taxi trajectory data of Beijing as a sample data and studies the spatio-temporal characteristics of the residents' activities on the weekdays. At first, according to the characteristics of the taxi trajectory data distributed along the road network, it takes the Voronoi generated by the road nodes as the research unit. This paper proposes a hybrid clustering method – based on grid density, which is used to cluster the OD (origin and destination data of taxi at different times. Then,combining with the POI data of Beijing, this research calculated the density of the POI data in the clustering results, and analyzed the relationship between the activities of residents in different periods and the functional types of the region. The final results showed that the residents were mainly commuting on weekdays. And it found that the distribution of travel density showed a concentric circle of the characteristics, focusing on residential areas and work areas. The results of cluster analysis and POI analysis showed that the residents' travel had experienced the process of "spatial relative dispersion – spatial aggregation – spatial relative dispersion" in one day.

  15. Using Cluster Analysis for Data Mining in Educational Technology Research

    Science.gov (United States)

    Antonenko, Pavlo D.; Toy, Serkan; Niederhauser, Dale S.

    2012-01-01

    Cluster analysis is a group of statistical methods that has great potential for analyzing the vast amounts of web server-log data to understand student learning from hyperlinked information resources. In this methodological paper we provide an introduction to cluster analysis for educational technology researchers and illustrate its use through…

  16. [Typologies of Madrid's citizens (Spain) at the end-of-life: cluster analysis].

    Science.gov (United States)

    Ortiz-Gonçalves, Belén; Perea-Pérez, Bernardo; Labajo González, Elena; Albarrán Juan, Elena; Santiago-Sáez, Andrés

    2018-03-06

    To establish typologies within Madrid's citizens (Spain) with regard to end-of-life by cluster analysis. The SPAD 8 programme was implemented in a sample from a health care centre in the autonomous region of Madrid (Spain). A multiple correspondence analysis technique was used, followed by a cluster analysis to create a dendrogram. A cross-sectional study was made beforehand with the results of the questionnaire. Five clusters stand out. Cluster 1: a group who preferred not to answer numerous questions (5%). Cluster 2: in favour of receiving palliative care and euthanasia (40%). Cluster 3: would oppose assisted suicide and would not ask for spiritual assistance (15%). Cluster 4: would like to receive palliative care and assisted suicide (16%). Cluster 5: would oppose assisted suicide and would ask for spiritual assistance (24%). The following four clusters stood out. Clusters 2 and 4 would like to receive palliative care, euthanasia (2) and assisted suicide (4). Clusters 4 and 5 regularly practiced their faith and their family members did not receive palliative care. Clusters 3 and 5 would be opposed to euthanasia and assisted suicide in particular. Clusters 2, 4 and 5 had not completed an advance directive document (2, 4 and 5). Clusters 2 and 3 seldom practiced their faith. This study could be taken into consideration to improve the quality of end-of-life care choices. Copyright © 2017 SESPAS. Publicado por Elsevier España, S.L.U. All rights reserved.

  17. Suicide Clusters: A Review of Risk Factors and Mechanisms

    Science.gov (United States)

    Haw, Camilla; Hawton, Keith; Niedzwiedz, Claire; Platt, Steve

    2013-01-01

    Suicide clusters, although uncommon, cause great concern in the communities in which they occur. We searched the world literature on suicide clusters and describe the risk factors and proposed psychological mechanisms underlying the spatio-temporal clustering of suicides (point clusters). Potential risk factors include male gender, being an…

  18. Using cluster analysis to organize and explore regional GPS velocities

    Science.gov (United States)

    Simpson, Robert W.; Thatcher, Wayne; Savage, James C.

    2012-01-01

    Cluster analysis offers a simple visual exploratory tool for the initial investigation of regional Global Positioning System (GPS) velocity observations, which are providing increasingly precise mappings of actively deforming continental lithosphere. The deformation fields from dense regional GPS networks can often be concisely described in terms of relatively coherent blocks bounded by active faults, although the choice of blocks, their number and size, can be subjective and is often guided by the distribution of known faults. To illustrate our method, we apply cluster analysis to GPS velocities from the San Francisco Bay Region, California, to search for spatially coherent patterns of deformation, including evidence of block-like behavior. The clustering process identifies four robust groupings of velocities that we identify with four crustal blocks. Although the analysis uses no prior geologic information other than the GPS velocities, the cluster/block boundaries track three major faults, both locked and creeping.

  19. Review of multi-physics temporal coupling methods for analysis of nuclear reactors

    International Nuclear Information System (INIS)

    Zerkak, Omar; Kozlowski, Tomasz; Gajev, Ivan

    2015-01-01

    Highlights: • Review of the numerical methods used for the multi-physics temporal coupling. • Review of high-order improvements to the Operator Splitting coupling method. • Analysis of truncation error due to the temporal coupling. • Recommendations on best-practice approaches for multi-physics temporal coupling. - Abstract: The advanced numerical simulation of a realistic physical system typically involves multi-physics problem. For example, analysis of a LWR core involves the intricate simulation of neutron production and transport, heat transfer throughout the structures of the system and the flowing, possibly two-phase, coolant. Such analysis involves the dynamic coupling of multiple simulation codes, each one devoted to the solving of one of the coupled physics. Multiple temporal coupling methods exist, yet the accuracy of such coupling is generally driven by the least accurate numerical scheme. The goal of this paper is to review in detail the approaches and numerical methods that can be used for the multi-physics temporal coupling, including a comprehensive discussion of the issues associated with the temporal coupling, and define approaches that can be used to perform multi-physics analysis. The paper is not limited to any particular multi-physics process or situation, but is intended to provide a generic description of multi-physics temporal coupling schemes for any development stage of the individual (single-physics) tools and methods. This includes a wide spectrum of situation, where the individual (single-physics) solvers are based on pre-existing computation codes embedded as individual components, or a new development where the temporal coupling can be developed and implemented as a part of code development. The discussed coupling methods are demonstrated in the framework of LWR core analysis

  20. Discovery of temporal and disease association patterns in condition-specific hospital utilization rates.

    Directory of Open Access Journals (Sweden)

    Julian S Haimovich

    Full Text Available Identifying temporal variation in hospitalization rates may provide insights about disease patterns and thereby inform research, policy, and clinical care. However, the majority of medical conditions have not been studied for their potential seasonal variation. The objective of this study was to apply a data-driven approach to characterize temporal variation in condition-specific hospitalizations. Using a dataset of 34 million inpatient discharges gathered from hospitals in New York State from 2008-2011, we grouped all discharges into 263 clinical conditions based on the principal discharge diagnosis using Clinical Classification Software in order to mitigate the limitation that administrative claims data reflect clinical conditions to varying specificity. After applying Seasonal-Trend Decomposition by LOESS, we estimated the periodicity of the seasonal component using spectral analysis and applied harmonic regression to calculate the amplitude and phase of the condition's seasonal utilization pattern. We also introduced four new indices of temporal variation: mean oscillation width, seasonal coefficient, trend coefficient, and linearity of the trend. Finally, K-means clustering was used to group conditions across these four indices to identify common temporal variation patterns. Of all 263 clinical conditions considered, 164 demonstrated statistically significant seasonality. Notably, we identified conditions for which seasonal variation has not been previously described such as ovarian cancer, tuberculosis, and schizophrenia. Clustering analysis yielded three distinct groups of conditions based on multiple measures of seasonal variation. Our study was limited to New York State and results may not directly apply to other regions with distinct climates and health burden. A substantial proportion of medical conditions, larger than previously described, exhibit seasonal variation in hospital utilization. Moreover, the application of clustering

  1. Methodology сomparative statistical analysis of Russian industry based on cluster analysis

    Directory of Open Access Journals (Sweden)

    Sergey S. Shishulin

    2017-01-01

    Full Text Available The article is devoted to researching of the possibilities of applying multidimensional statistical analysis in the study of industrial production on the basis of comparing its growth rates and structure with other developed and developing countries of the world. The purpose of this article is to determine the optimal set of statistical methods and the results of their application to industrial production data, which would give the best access to the analysis of the result.Data includes such indicators as output, output, gross value added, the number of employed and other indicators of the system of national accounts and operational business statistics. The objects of observation are the industry of the countrys of the Customs Union, the United States, Japan and Erope in 2005-2015. As the research tool used as the simplest methods of transformation, graphical and tabular visualization of data, and methods of statistical analysis. In particular, based on a specialized software package (SPSS, the main components method, discriminant analysis, hierarchical methods of cluster analysis, Ward’s method and k-means were applied.The application of the method of principal components to the initial data makes it possible to substantially and effectively reduce the initial space of industrial production data. Thus, for example, in analyzing the structure of industrial production, the reduction was from fifteen industries to three basic, well-interpreted factors: the relatively extractive industries (with a low degree of processing, high-tech industries and consumer goods (medium-technology sectors. At the same time, as a result of comparison of the results of application of cluster analysis to the initial data and data obtained on the basis of the principal components method, it was established that clustering industrial production data on the basis of new factors significantly improves the results of clustering.As a result of analyzing the parameters of

  2. Foot and mouth disease in Zambia: Spatial and temporal distributions of outbreaks, assessment of clusters and implications for control

    Directory of Open Access Journals (Sweden)

    Yona Sinkala

    2014-04-01

    Full Text Available Zambia has been experiencing low livestock productivity as well as trade restrictions owing to the occurrence of foot and mouth disease (FMD, but little is known about the epidemiology of the disease in these endemic settings. The fundamental questions relate to the spatio-temporal distribution of FMD cases and what determines their occurrence. A retrospective review of FMD cases in Zambia from 1981 to 2012 was conducted using geographical information systems and the SaTScan software package. Information was collected from peer-reviewed journal articles, conference proceedings, laboratory reports, unpublished scientific reports and grey literature. A space–time permutation probability model using a varying time window of one year was used to scan for areas with high infection rates. The spatial scan statistic detected a significant purely spatial cluster around the Mbala–Isoka area between 2009 and 2012, with secondary clusters in Sesheke–Kazungula in 2007 and 2008, the Kafue flats in 2004 and 2005 and Livingstone in 2012. This study provides evidence of the existence of statistically significant FMD clusters and an increase in occurrence in Zambia between 2004 and 2012. The identified clusters agree with areas known to be at high risk of FMD. The FMD virus transmission dynamics and the heterogeneous variability in risk within these locations may need further investigation.

  3. Genome-scale analysis of positional clustering of mouse testis-specific genes

    Directory of Open Access Journals (Sweden)

    Lee Bernett TK

    2005-01-01

    Full Text Available Abstract Background Genes are not randomly distributed on a chromosome as they were thought even after removal of tandem repeats. The positional clustering of co-expressed genes is known in prokaryotes and recently reported in several eukaryotic organisms such as Caenorhabditis elegans, Drosophila melanogaster, and Homo sapiens. In order to further investigate the mode of tissue-specific gene clustering in higher eukaryotes, we have performed a genome-scale analysis of positional clustering of the mouse testis-specific genes. Results Our computational analysis shows that a large proportion of testis-specific genes are clustered in groups of 2 to 5 genes in the mouse genome. The number of clusters is much higher than expected by chance even after removal of tandem repeats. Conclusion Our result suggests that testis-specific genes tend to cluster on the mouse chromosomes. This provides another piece of evidence for the hypothesis that clusters of tissue-specific genes do exist.

  4. Análisis temporal de la mortalidad por violencia del compañero íntimo en España Temporal analysis of mortality due to intimate partner violence in Spain

    Directory of Open Access Journals (Sweden)

    Carmen Vives

    2004-10-01

    due to violence by intimate partners (VIP and to identify possible temporal clusters in women deaths by VIP in Spain. Methods: We performed a descriptive epidemiological study based on the VIP deaths included in the database of the Federation of Divorced and Separated Women (1998-2003. The epidemic index (EI was calculated as the ratio between the actual number of VIP deaths in a given month from January to July 2003 and the median number in the same month in the five preceding years. A Poisson model was used to analyze the distribution by years (1998-2002, seasons, months, and days. Simple regression analysis was performed with three-monthly means. A temporal cluster analysis was also carried out. Results: In 2003, the EI of VIP mortality was high in January (EI = 1.6, March (EI = 1.2, May (EI = 1.5, June (EI = 2, and July (EI = 2.5. Compared with 1998 and Sundays, respectively, mortality due to VIP was significantly increased in 2001 (relative risk, RR = 1.52; 95% confidence interval [CI], 1.05-2.20 and on Mondays (RR = 1.77; 95%CI, 1.13-2.76. The regression analyses confirmed an increase between the first three-month period of 1998 and the last three-month period of 2001. There were no differences between seasons and months. No temporal clusters of deaths were detected. Conclusions: VIP is currently an increasing epidemic in Spain with no clear temporal pattern. Political and legal efforts to reduce this problem do not seem to be successful.

  5. Pattern recognition in menstrual bleeding diaries by statistical cluster analysis

    Directory of Open Access Journals (Sweden)

    Wessel Jens

    2009-07-01

    Full Text Available Abstract Background The aim of this paper is to empirically identify a treatment-independent statistical method to describe clinically relevant bleeding patterns by using bleeding diaries of clinical studies on various sex hormone containing drugs. Methods We used the four cluster analysis methods single, average and complete linkage as well as the method of Ward for the pattern recognition in menstrual bleeding diaries. The optimal number of clusters was determined using the semi-partial R2, the cubic cluster criterion, the pseudo-F- and the pseudo-t2-statistic. Finally, the interpretability of the results from a gynecological point of view was assessed. Results The method of Ward yielded distinct clusters of the bleeding diaries. The other methods successively chained the observations into one cluster. The optimal number of distinctive bleeding patterns was six. We found two desirable and four undesirable bleeding patterns. Cyclic and non cyclic bleeding patterns were well separated. Conclusion Using this cluster analysis with the method of Ward medications and devices having an impact on bleeding can be easily compared and categorized.

  6. Comparative analysis of clustering methods for gene expression time course data

    Directory of Open Access Journals (Sweden)

    Ivan G. Costa

    2004-01-01

    Full Text Available This work performs a data driven comparative study of clustering methods used in the analysis of gene expression time courses (or time series. Five clustering methods found in the literature of gene expression analysis are compared: agglomerative hierarchical clustering, CLICK, dynamical clustering, k-means and self-organizing maps. In order to evaluate the methods, a k-fold cross-validation procedure adapted to unsupervised methods is applied. The accuracy of the results is assessed by the comparison of the partitions obtained in these experiments with gene annotation, such as protein function and series classification.

  7. The Productivity Analysis of Chennai Automotive Industry Cluster

    Science.gov (United States)

    Bhaskaran, E.

    2014-07-01

    Chennai, also called the Detroit of India, is India's second fastest growing auto market and exports auto components and vehicles to US, Germany, Japan and Brazil. For inclusive growth and sustainable development, 250 auto component industries in Ambattur, Thirumalisai and Thirumudivakkam Industrial Estates located in Chennai have adopted the Cluster Development Approach called Automotive Component Cluster. The objective is to study the Value Chain, Correlation and Data Envelopment Analysis by determining technical efficiency, peer weights, input and output slacks of 100 auto component industries in three estates. The methodology adopted is using Data Envelopment Analysis of Output Oriented Banker Charnes Cooper model by taking net worth, fixed assets, employment as inputs and gross output as outputs. The non-zero represents the weights for efficient clusters. The higher slack obtained reveals the excess net worth, fixed assets, employment and shortage in gross output. To conclude, the variables are highly correlated and the inefficient industries should increase their gross output or decrease the fixed assets or employment. Moreover for sustainable development, the cluster should strengthen infrastructure, technology, procurement, production and marketing interrelationships to decrease costs and to increase productivity and efficiency to compete in the indigenous and export market.

  8. Water quality, Multivariate statistical techniques, submarine out fall, spatial variation, temporal variation

    International Nuclear Information System (INIS)

    Garcia, Francisco; Palacio, Carlos; Garcia, Uriel

    2012-01-01

    Multivariate statistical techniques were used to investigate the temporal and spatial variations of water quality at the Santa Marta coastal area where a submarine out fall that discharges 1 m3/s of domestic wastewater is located. Two-way analysis of variance (ANOVA), cluster and principal component analysis and Krigging interpolation were considered for this report. Temporal variation showed two heterogeneous periods. From December to April, and July, where the concentration of the water quality parameters is higher; the rest of the year (May, June, August-November) were significantly lower. The spatial variation reported two areas where the water quality is different, this difference is related to the proximity to the submarine out fall discharge.

  9. Extracting the Textual and Temporal Structure of Supercomputing Logs

    Energy Technology Data Exchange (ETDEWEB)

    Jain, S; Singh, I; Chandra, A; Zhang, Z; Bronevetsky, G

    2009-05-26

    Supercomputers are prone to frequent faults that adversely affect their performance, reliability and functionality. System logs collected on these systems are a valuable resource of information about their operational status and health. However, their massive size, complexity, and lack of standard format makes it difficult to automatically extract information that can be used to improve system management. In this work we propose a novel method to succinctly represent the contents of supercomputing logs, by using textual clustering to automatically find the syntactic structures of log messages. This information is used to automatically classify messages into semantic groups via an online clustering algorithm. Further, we describe a methodology for using the temporal proximity between groups of log messages to identify correlated events in the system. We apply our proposed methods to two large, publicly available supercomputing logs and show that our technique features nearly perfect accuracy for online log-classification and extracts meaningful structural and temporal message patterns that can be used to improve the accuracy of other log analysis techniques.

  10. MMPI profiles of males accused of severe crimes: a cluster analysis

    NARCIS (Netherlands)

    Spaans, M.; Barendregt, M.; Muller, E.; Beurs, E. de; Nijman, H.L.I.; Rinne, T.

    2009-01-01

    In studies attempting to classify criminal offenders by cluster analysis of Minnesota Multiphasic Personality Inventory-2 (MMPI-2) data, the number of clusters found varied between 10 (the Megargee System) and two (one cluster indicating no psychopathology and one exhibiting serious

  11. Statistical parametric mapping analysis of the relationship between regional cerebral blood flow and symptom clusters of the depressive mood in patients with pre-dialytic chronic kidney disease

    International Nuclear Information System (INIS)

    Kim, Seong-Jang; Song, Sang Heon; Kim, Ji Hoon; Kwak, Ihm Soo

    2008-01-01

    The aim of this study is to investigate the relationship between regional cerebral blood flow (rCBF) and symptom clusters of depressive mood in pre-dialytic chronic kidney disease (CKD). Twenty-seven patients with stage 4-5 CKD were subjected to statistical parametric mapping analysis of brain single-photon emission computed tomography. Correlation analyses between separate symptom clusters of depressive mood and rCBF were done. The first factor (depressive mood) was negatively correlated with rCBF in the right insula, posterior cingulate gyrus, and left superior temporal gyrus, and positively correlated with rCBF in the left fusiform gyrus. The second factor (insomnia) was negatively correlated with rCBF in the right middle frontal gyrus, bilateral cingulate gyri, right insula, right putamen, and right inferior parietal lobule, and positively correlated with rCBF in left fusiform gyrus and bilateral cerebellar tonsils. The third factor (anxiety and psychomotor aspects) was negatively correlated with rCBF in the left inferior frontal gyms, right superior frontal gyms, right middle temporal gyrus, right superior temporal gyrus, and left superior frontal gyrus, and positively correlated with rCBF in the right ligual gyrus and right parahippocampal gyrus. In this study, the separate symptom clusters were correlated with specific rCBF patterns similar to those in major depressive disorder patients without CKD. However, some areas with discordant rCBF patterns were also noted when compared with major depressive disorder patients. Further larger scale investigations are needed. (author)

  12. ANALYSIS OF DEVELOPING BATIK INDUSTRY CLUSTER IN BAKARAN VILLAGE CENTRAL JAVA PROVINCE

    Directory of Open Access Journals (Sweden)

    Hermanto Hermanto

    2017-06-01

    Full Text Available SMEs grow in a cluster in a certain geographical area. The entrepreneurs grow and thrive through the business cluster. Central Java Province has a lot of business clusters in improving the regional economy, one of which is batik industry cluster. Pati Regency is one of regencies / city in Central Java that has the lowest turnover. Batik industy cluster in Pati develops quite well, which can be seen from the increasing number of batik industry incorporated in the cluster. This research examines the strategy of developing the batik industry cluster in Pati Regency. The purpose of this research is to determine the proper strategy for developing the batik industry clusters in Pati. The method of research is quantitative. The analysis tool of this research is the Strengths, Weakness, Opportunity, Threats (SWOT analysis. The result of SWOT analysis in this research shows that the proper strategy for developing the batik industry cluster in Pati is optimizing the management of batik business cluster in Bakaran Village; the local government provides information of the facility of business capital loans; the utilization of labors from Bakaran Village while improving the quality of labors by training, and marketing the Bakaran batik to the broader markets while maintaining the quality of batik. Advice that can be given from this research is that the parties who have a role in batik industry cluster development in Bakaran Village, Pati Regency, such as the Local Government.

  13. Investigation of Spatial and Temporal Trends in Water Quality in Daya Bay, South China Sea

    Science.gov (United States)

    Wu, Mei-Lin; Wang, You-Shao; Dong, Jun-De; Sun, Cui-Ci; Wang, Yu-Tu; Sun, Fu-Lin; Cheng, Hao

    2011-01-01

    The objective is to identify the spatial and temporal variability of the hydrochemical quality of the water column in a subtropical coastal system, Daya Bay, China. Water samples were collected in four seasons at 12 monitoring sites. The Southeast Asian monsoons, northeasterly from October to the next April and southwesterly from May to September have also an important influence on water quality in Daya Bay. In the spatial pattern, two groups have been identified, with the help of multidimensional scaling analysis and cluster analysis. Cluster I consisted of the sites S3, S8, S10 and S11 in the west and north coastal parts of Daya Bay. Cluster I is mainly related to anthropogenic activities such as fish-farming. Cluster II consisted of the rest of the stations in the center, east and south parts of Daya Bay. Cluster II is mainly related to seawater exchange from South China Sea. PMID:21776234

  14. Water quality assessment with hierarchical cluster analysis based on Mahalanobis distance.

    Science.gov (United States)

    Du, Xiangjun; Shao, Fengjing; Wu, Shunyao; Zhang, Hanlin; Xu, Si

    2017-07-01

    Water quality assessment is crucial for assessment of marine eutrophication, prediction of harmful algal blooms, and environment protection. Previous studies have developed many numeric modeling methods and data driven approaches for water quality assessment. The cluster analysis, an approach widely used for grouping data, has also been employed. However, there are complex correlations between water quality variables, which play important roles in water quality assessment but have always been overlooked. In this paper, we analyze correlations between water quality variables and propose an alternative method for water quality assessment with hierarchical cluster analysis based on Mahalanobis distance. Further, we cluster water quality data collected form coastal water of Bohai Sea and North Yellow Sea of China, and apply clustering results to evaluate its water quality. To evaluate the validity, we also cluster the water quality data with cluster analysis based on Euclidean distance, which are widely adopted by previous studies. The results show that our method is more suitable for water quality assessment with many correlated water quality variables. To our knowledge, it is the first attempt to apply Mahalanobis distance for coastal water quality assessment.

  15. A SURVEY ON DOCUMENT CLUSTERING APPROACH FOR COMPUTER FORENSIC ANALYSIS

    OpenAIRE

    Monika Raghuvanshi*, Rahul Patel

    2016-01-01

    In a forensic analysis, large numbers of files are examined. Much of the information comprises of in unstructured format, so it’s quite difficult task for computer forensic to perform such analysis. That’s why to do the forensic analysis of document within a limited period of time require a special approach such as document clustering. This paper review different document clustering algorithms methodologies for example K-mean, K-medoid, single link, complete link, average link in accorandance...

  16. Cluster Analysis in Rapeseed (Brassica Napus L.)

    International Nuclear Information System (INIS)

    Mahasi, J.M

    2002-01-01

    With widening edible deficit, Kenya has become increasingly dependent on imported edible oils. Many oilseed crops (e.g. sunflower, soya beans, rapeseed/mustard, sesame, groundnuts etc) can be grown in Kenya. But oilseed rape is preferred because it very high yielding (1.5 tons-4.0 tons/ha) with oil content of 42-46%. Other uses include fitting in various cropping systems as; relay/inter crops, rotational crops, trap crops and fodder. It is soft seeded hence oil extraction is relatively easy. The meal is high in protein and very useful in livestock supplementation. Rapeseed can be straight combined using adjusted wheat combines. The priority is to expand domestic oilseed production, hence the need to introduce improved rapeseed germplasm from other countries. The success of any crop improvement programme depends on the extent of genetic diversity in the material. Hence, it is essential to understand the adaptation of introduced genotypes and the similarities if any among them. Evaluation trials were carried out on 17 rapeseed genotypes (nine Canadian origin and eight of European origin) grown at 4 locations namely Endebess, Njoro, Timau and Mau Narok in three years (1992, 1993 and 1994). Results for 1993 were discarded due to severe drought. An analysis of variance was carried out only on seed yields and the treatments were found to be significantly different. Cluster analysis was then carried out on mean seed yields and based on this analysis; only one major group exists within the material. In 1992, varieties 2,3,8 and 9 didn't fall in the same cluster as the rest. Variety 8 was the only one not classified with the rest of the Canadian varieties. Three European varieties (2,3 and 9) were however not classified with the others. In 1994, varieties 10 and 6 didn't fall in the major cluster. Of these two, variety 10 is of Canadian origin. Varieties were more similar in 1994 than 1992 due to favorable weather. It is evident that, genotypes from different geographical

  17. Extending the Functionality of Behavioural Change-Point Analysis with k-Means Clustering: A Case Study with the Little Penguin (Eudyptula minor)

    Science.gov (United States)

    Zhang, Jingjing; Dennis, Todd E.

    2015-01-01

    We present a simple framework for classifying mutually exclusive behavioural states within the geospatial lifelines of animals. This method involves use of three sequentially applied statistical procedures: (1) behavioural change point analysis to partition movement trajectories into discrete bouts of same-state behaviours, based on abrupt changes in the spatio-temporal autocorrelation structure of movement parameters; (2) hierarchical multivariate cluster analysis to determine the number of different behavioural states; and (3) k-means clustering to classify inferred bouts of same-state location observations into behavioural modes. We demonstrate application of the method by analysing synthetic trajectories of known ‘artificial behaviours’ comprised of different correlated random walks, as well as real foraging trajectories of little penguins (Eudyptula minor) obtained by global-positioning-system telemetry. Our results show that the modelling procedure correctly classified 92.5% of all individual location observations in the synthetic trajectories, demonstrating reasonable ability to successfully discriminate behavioural modes. Most individual little penguins were found to exhibit three unique behavioural states (resting, commuting/active searching, area-restricted foraging), with variation in the timing and locations of observations apparently related to ambient light, bathymetry, and proximity to coastlines and river mouths. Addition of k-means clustering extends the utility of behavioural change point analysis, by providing a simple means through which the behaviours inferred for the location observations comprising individual movement trajectories can be objectively classified. PMID:25922935

  18. Extending the Functionality of Behavioural Change-Point Analysis with k-Means Clustering: A Case Study with the Little Penguin (Eudyptula minor).

    Science.gov (United States)

    Zhang, Jingjing; O'Reilly, Kathleen M; Perry, George L W; Taylor, Graeme A; Dennis, Todd E

    2015-01-01

    We present a simple framework for classifying mutually exclusive behavioural states within the geospatial lifelines of animals. This method involves use of three sequentially applied statistical procedures: (1) behavioural change point analysis to partition movement trajectories into discrete bouts of same-state behaviours, based on abrupt changes in the spatio-temporal autocorrelation structure of movement parameters; (2) hierarchical multivariate cluster analysis to determine the number of different behavioural states; and (3) k-means clustering to classify inferred bouts of same-state location observations into behavioural modes. We demonstrate application of the method by analysing synthetic trajectories of known 'artificial behaviours' comprised of different correlated random walks, as well as real foraging trajectories of little penguins (Eudyptula minor) obtained by global-positioning-system telemetry. Our results show that the modelling procedure correctly classified 92.5% of all individual location observations in the synthetic trajectories, demonstrating reasonable ability to successfully discriminate behavioural modes. Most individual little penguins were found to exhibit three unique behavioural states (resting, commuting/active searching, area-restricted foraging), with variation in the timing and locations of observations apparently related to ambient light, bathymetry, and proximity to coastlines and river mouths. Addition of k-means clustering extends the utility of behavioural change point analysis, by providing a simple means through which the behaviours inferred for the location observations comprising individual movement trajectories can be objectively classified.

  19. Extending the Functionality of Behavioural Change-Point Analysis with k-Means Clustering: A Case Study with the Little Penguin (Eudyptula minor.

    Directory of Open Access Journals (Sweden)

    Jingjing Zhang

    Full Text Available We present a simple framework for classifying mutually exclusive behavioural states within the geospatial lifelines of animals. This method involves use of three sequentially applied statistical procedures: (1 behavioural change point analysis to partition movement trajectories into discrete bouts of same-state behaviours, based on abrupt changes in the spatio-temporal autocorrelation structure of movement parameters; (2 hierarchical multivariate cluster analysis to determine the number of different behavioural states; and (3 k-means clustering to classify inferred bouts of same-state location observations into behavioural modes. We demonstrate application of the method by analysing synthetic trajectories of known 'artificial behaviours' comprised of different correlated random walks, as well as real foraging trajectories of little penguins (Eudyptula minor obtained by global-positioning-system telemetry. Our results show that the modelling procedure correctly classified 92.5% of all individual location observations in the synthetic trajectories, demonstrating reasonable ability to successfully discriminate behavioural modes. Most individual little penguins were found to exhibit three unique behavioural states (resting, commuting/active searching, area-restricted foraging, with variation in the timing and locations of observations apparently related to ambient light, bathymetry, and proximity to coastlines and river mouths. Addition of k-means clustering extends the utility of behavioural change point analysis, by providing a simple means through which the behaviours inferred for the location observations comprising individual movement trajectories can be objectively classified.

  20. Measuring temporal liking simultaneously to Temporal Dominance of Sensations in several intakes. An application to Gouda cheeses in 6 Europeans countries.

    Science.gov (United States)

    Thomas, A; Chambault, M; Dreyfuss, L; Gilbert, C C; Hegyi, A; Henneberg, S; Knippertz, A; Kostyra, E; Kremer, S; Silva, A P; Schlich, P

    2017-09-01

    The idea of having untrained consumers performing Temporal Dominance of Sensations (TDS) and dynamic liking in the same session was recently introduced (Thomas, van der Stelt, Prokop, Lawlor, & Schlich, 2016). In the present study, a variation of the data acquisition protocol was done, aiming to record TDS and liking simultaneously on the same screen in a single session during multiple product intakes. This method, called Simultaneous Temporal Drivers of Liking (S-TDL), was used to describe samples of Gouda cheese in an international experiment. To test this idea, consumers from six European countries (n=667) assessed 4 Gouda cheeses with different ages and fat contents during one sensory evaluation session. Ten sensory attributes and a 9-point hedonic scale were presented simultaneously on the computer screen. While performing TDS, consumers could reassess their liking score as often as they wanted. This new type of sensory data was coded by individual average liking scores while a given attribute was perceived as dominant (Liking While Dominant; LWD). Although significant differences in preference were observed among countries, there were global preferences for a longer dominance of melting, fatty and tender textures. The cheese flavour attribute was the best positive TDL, whereas bitter was a strong negative TDL. A cluster analysis of the 667 consumers identified three significant liking clusters, each with different most and least preferred samples. For the TDL computation by cluster, significant specific TDL were observed. These results showed the importance of overall liking segmentation before TDL analysis to determine which attributes should have a longer dominance duration in order to please specific consumer targets. Copyright © 2017 Elsevier Ltd. All rights reserved.

  1. The Quantitative Analysis of Chennai Automotive Industry Cluster

    Science.gov (United States)

    Bhaskaran, Ethirajan

    2016-07-01

    Chennai, also called as Detroit of India due to presence of Automotive Industry producing over 40 % of the India's vehicle and components. During 2001-2002, the Automotive Component Industries (ACI) in Ambattur, Thirumalizai and Thirumudivakkam Industrial Estate, Chennai has faced problems on infrastructure, technology, procurement, production and marketing. The objective is to study the Quantitative Performance of Chennai Automotive Industry Cluster before (2001-2002) and after the CDA (2008-2009). The methodology adopted is collection of primary data from 100 ACI using quantitative questionnaire and analyzing using Correlation Analysis (CA), Regression Analysis (RA), Friedman Test (FMT), and Kruskall Wallis Test (KWT).The CA computed for the different set of variables reveals that there is high degree of relationship between the variables studied. The RA models constructed establish the strong relationship between the dependent variable and a host of independent variables. The models proposed here reveal the approximate relationship in a closer form. KWT proves, there is no significant difference between three locations clusters with respect to: Net Profit, Production Cost, Marketing Costs, Procurement Costs and Gross Output. This supports that each location has contributed for development of automobile component cluster uniformly. The FMT proves, there is no significant difference between industrial units in respect of cost like Production, Infrastructure, Technology, Marketing and Net Profit. To conclude, the Automotive Industries have fully utilized the Physical Infrastructure and Centralised Facilities by adopting CDA and now exporting their products to North America, South America, Europe, Australia, Africa and Asia. The value chain analysis models have been implemented in all the cluster units. This Cluster Development Approach (CDA) model can be implemented in industries of under developed and developing countries for cost reduction and productivity

  2. Clusters of Insomnia Disorder: An Exploratory Cluster Analysis of Objective Sleep Parameters Reveals Differences in Neurocognitive Functioning, Quantitative EEG, and Heart Rate Variability.

    Science.gov (United States)

    Miller, Christopher B; Bartlett, Delwyn J; Mullins, Anna E; Dodds, Kirsty L; Gordon, Christopher J; Kyle, Simon D; Kim, Jong Won; D'Rozario, Angela L; Lee, Rico S C; Comas, Maria; Marshall, Nathaniel S; Yee, Brendon J; Espie, Colin A; Grunstein, Ronald R

    2016-11-01

    To empirically derive and evaluate potential clusters of Insomnia Disorder through cluster analysis from polysomnography (PSG). We hypothesized that clusters would differ on neurocognitive performance, sleep-onset measures of quantitative ( q )-EEG and heart rate variability (HRV). Research volunteers with Insomnia Disorder (DSM-5) completed a neurocognitive assessment and overnight PSG measures of total sleep time (TST), wake time after sleep onset (WASO), and sleep onset latency (SOL) were used to determine clusters. From 96 volunteers with Insomnia Disorder, cluster analysis derived at least two clusters from objective sleep parameters: Insomnia with normal objective sleep duration (I-NSD: n = 53) and Insomnia with short sleep duration (I-SSD: n = 43). At sleep onset, differences in HRV between I-NSD and I-SSD clusters suggest attenuated parasympathetic activity in I-SSD (P insomnia clusters derived from cluster analysis differ in sleep onset HRV. Preliminary data suggest evidence for three clusters in insomnia with differences for sustained attention and sleep-onset q -EEG. Insomnia 100 sleep study: Australia New Zealand Clinical Trials Registry (ANZCTR) identification number 12612000049875. URL: https://www.anzctr.org.au/Trial/Registration/TrialReview.aspx?id=347742. © 2016 Associated Professional Sleep Societies, LLC.

  3. Clusters of Insomnia Disorder: An Exploratory Cluster Analysis of Objective Sleep Parameters Reveals Differences in Neurocognitive Functioning, Quantitative EEG, and Heart Rate Variability

    Science.gov (United States)

    Miller, Christopher B.; Bartlett, Delwyn J.; Mullins, Anna E.; Dodds, Kirsty L.; Gordon, Christopher J.; Kyle, Simon D.; Kim, Jong Won; D'Rozario, Angela L.; Lee, Rico S.C.; Comas, Maria; Marshall, Nathaniel S.; Yee, Brendon J.; Espie, Colin A.; Grunstein, Ronald R.

    2016-01-01

    Study Objectives: To empirically derive and evaluate potential clusters of Insomnia Disorder through cluster analysis from polysomnography (PSG). We hypothesized that clusters would differ on neurocognitive performance, sleep-onset measures of quantitative (q)-EEG and heart rate variability (HRV). Methods: Research volunteers with Insomnia Disorder (DSM-5) completed a neurocognitive assessment and overnight PSG measures of total sleep time (TST), wake time after sleep onset (WASO), and sleep onset latency (SOL) were used to determine clusters. Results: From 96 volunteers with Insomnia Disorder, cluster analysis derived at least two clusters from objective sleep parameters: Insomnia with normal objective sleep duration (I-NSD: n = 53) and Insomnia with short sleep duration (I-SSD: n = 43). At sleep onset, differences in HRV between I-NSD and I-SSD clusters suggest attenuated parasympathetic activity in I-SSD (P insomnia clusters derived from cluster analysis differ in sleep onset HRV. Preliminary data suggest evidence for three clusters in insomnia with differences for sustained attention and sleep-onset q-EEG. Clinical Trial Registration: Insomnia 100 sleep study: Australia New Zealand Clinical Trials Registry (ANZCTR) identification number 12612000049875. URL: https://www.anzctr.org.au/Trial/Registration/TrialReview.aspx?id=347742. Citation: Miller CB, Bartlett DJ, Mullins AE, Dodds KL, Gordon CJ, Kyle SD, Kim JW, D'Rozario AL, Lee RS, Comas M, Marshall NS, Yee BJ, Espie CA, Grunstein RR. Clusters of Insomnia Disorder: an exploratory cluster analysis of objective sleep parameters reveals differences in neurocognitive functioning, quantitative EEG, and heart rate variability. SLEEP 2016;39(11):1993–2004. PMID:27568796

  4. Characterization of spatial and temporal variability in hydrochemistry of Johor Straits, Malaysia.

    Science.gov (United States)

    Abdullah, Pauzi; Abdullah, Sharifah Mastura Syed; Jaafar, Othman; Mahmud, Mastura; Khalik, Wan Mohd Afiq Wan Mohd

    2015-12-15

    Characterization of hydrochemistry changes in Johor Straits within 5 years of monitoring works was successfully carried out. Water quality data sets (27 stations and 19 parameters) collected in this area were interpreted subject to multivariate statistical analysis. Cluster analysis grouped all the stations into four clusters ((Dlink/Dmax) × 1001) that explained 82.6% of the total variance of the data set. Classification matrix of discriminant analysis assigned 88.9-92.6% and 83.3-100% correctness in spatial and temporal variability, respectively. Times series analysis then confirmed that only four parameters were not significant over time change. Therefore, it is imperative that the environmental impact of reclamation and dredging works, municipal or industrial discharge, marine aquaculture and shipping activities in this area be effectively controlled and managed. Copyright © 2015 Elsevier Ltd. All rights reserved.

  5. Assessment of genetic divergence in tomato through agglomerative hierarchical clustering and principal component analysis

    International Nuclear Information System (INIS)

    Iqbal, Q.; Saleem, M.Y.; Hameed, A.; Asghar, M.

    2014-01-01

    For the improvement of qualitative and quantitative traits, existence of variability has prime importance in plant breeding. Data on different morphological and reproductive traits of 47 tomato genotypes were analyzed for correlation,agglomerative hierarchical clustering and principal component analysis (PCA) to select genotypes and traits for future breeding program. Correlation analysis revealed significant positive association between yield and yield components like fruit diameter, single fruit weight and number of fruits plant-1. Principal component (PC) analysis depicted first three PCs with Eigen-value higher than 1 contributing 81.72% of total variability for different traits. The PC-I showed positive factor loadings for all the traits except number of fruits plant-1. The contribution of single fruit weight and fruit diameter was highest in PC-1. Cluster analysis grouped all genotypes into five divergent clusters. The genotypes in cluster-II and cluster-V exhibited uniform maturity and higher yield. The D2 statistics confirmed highest distance between cluster- III and cluster-V while maximum similarity was observed in cluster-II and cluster-III. It is therefore suggested that crosses between genotypes of cluster-II and cluster-V with those of cluster-I and cluster-III may exhibit heterosis in F1 for hybrid breeding and for selection of superior genotypes in succeeding generations for cross breeding programme. (author)

  6. A Distributed Flocking Approach for Information Stream Clustering Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Cui, Xiaohui [ORNL; Potok, Thomas E [ORNL

    2006-01-01

    Intelligence analysts are currently overwhelmed with the amount of information streams generated everyday. There is a lack of comprehensive tool that can real-time analyze the information streams. Document clustering analysis plays an important role in improving the accuracy of information retrieval. However, most clustering technologies can only be applied for analyzing the static document collection because they normally require a large amount of computation resource and long time to get accurate result. It is very difficult to cluster a dynamic changed text information streams on an individual computer. Our early research has resulted in a dynamic reactive flock clustering algorithm which can continually refine the clustering result and quickly react to the change of document contents. This character makes the algorithm suitable for cluster analyzing dynamic changed document information, such as text information stream. Because of the decentralized character of this algorithm, a distributed approach is a very natural way to increase the clustering speed of the algorithm. In this paper, we present a distributed multi-agent flocking approach for the text information stream clustering and discuss the decentralized architectures and communication schemes for load balance and status information synchronization in this approach.

  7. Genome cluster database. A sequence family analysis platform for Arabidopsis and rice.

    Science.gov (United States)

    Horan, Kevin; Lauricha, Josh; Bailey-Serres, Julia; Raikhel, Natasha; Girke, Thomas

    2005-05-01

    The genome-wide protein sequences from Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa) spp. japonica were clustered into families using sequence similarity and domain-based clustering. The two fundamentally different methods resulted in separate cluster sets with complementary properties to compensate the limitations for accurate family analysis. Functional names for the identified families were assigned with an efficient computational approach that uses the description of the most common molecular function gene ontology node within each cluster. Subsequently, multiple alignments and phylogenetic trees were calculated for the assembled families. All clustering results and their underlying sequences were organized in the Web-accessible Genome Cluster Database (http://bioinfo.ucr.edu/projects/GCD) with rich interactive and user-friendly sequence family mining tools to facilitate the analysis of any given family of interest for the plant science community. An automated clustering pipeline ensures current information for future updates in the annotations of the two genomes and clustering improvements. The analysis allowed the first systematic identification of family and singlet proteins present in both organisms as well as those restricted to one of them. In addition, the established Web resources for mining these data provide a road map for future studies of the composition and structure of protein families between the two species.

  8. Cluster analysis of obesity and asthma phenotypes.

    Directory of Open Access Journals (Sweden)

    E Rand Sutherland

    Full Text Available Asthma is a heterogeneous disease with variability among patients in characteristics such as lung function, symptoms and control, body weight, markers of inflammation, and responsiveness to glucocorticoids (GC. Cluster analysis of well-characterized cohorts can advance understanding of disease subgroups in asthma and point to unsuspected disease mechanisms. We utilized an hypothesis-free cluster analytical approach to define the contribution of obesity and related variables to asthma phenotype.In a cohort of clinical trial participants (n = 250, minimum-variance hierarchical clustering was used to identify clinical and inflammatory biomarkers important in determining disease cluster membership in mild and moderate persistent asthmatics. In a subset of participants, GC sensitivity was assessed via expression of GC receptor alpha (GCRα and induction of MAP kinase phosphatase-1 (MKP-1 expression by dexamethasone. Four asthma clusters were identified, with body mass index (BMI, kg/m(2 and severity of asthma symptoms (AEQ score the most significant determinants of cluster membership (F = 57.1, p<0.0001 and F = 44.8, p<0.0001, respectively. Two clusters were composed of predominantly obese individuals; these two obese asthma clusters differed from one another with regard to age of asthma onset, measures of asthma symptoms (AEQ and control (ACQ, exhaled nitric oxide concentration (F(ENO and airway hyperresponsiveness (methacholine PC(20 but were similar with regard to measures of lung function (FEV(1 (% and FEV(1/FVC, airway eosinophilia, IgE, leptin, adiponectin and C-reactive protein (hsCRP. Members of obese clusters demonstrated evidence of reduced expression of GCRα, a finding which was correlated with a reduced induction of MKP-1 expression by dexamethasoneObesity is an important determinant of asthma phenotype in adults. There is heterogeneity in expression of clinical and inflammatory biomarkers of asthma across obese individuals

  9. Cluster: A New Application for Spatial Analysis of Pixelated Data for Epiphytotics.

    Science.gov (United States)

    Nelson, Scot C; Corcoja, Iulian; Pethybridge, Sarah J

    2017-12-01

    Spatial analysis of epiphytotics is essential to develop and test hypotheses about pathogen ecology, disease dynamics, and to optimize plant disease management strategies. Data collection for spatial analysis requires substantial investment in time to depict patterns in various frames and hierarchies. We developed a new approach for spatial analysis of pixelated data in digital imagery and incorporated the method in a stand-alone desktop application called Cluster. The user isolates target entities (clusters) by designating up to 24 pixel colors as nontargets and moves a threshold slider to visualize the targets. The app calculates the percent area occupied by targeted pixels, identifies the centroids of targeted clusters, and computes the relative compass angle of orientation for each cluster. Users can deselect anomalous clusters manually and/or automatically by specifying a size threshold value to exclude smaller targets from the analysis. Up to 1,000 stochastic simulations randomly place the centroids of each cluster in ranked order of size (largest to smallest) within each matrix while preserving their calculated angles of orientation for the long axes. A two-tailed probability t test compares the mean inter-cluster distances for the observed versus the values derived from randomly simulated maps. This is the basis for statistical testing of the null hypothesis that the clusters are randomly distributed within the frame of interest. These frames can assume any shape, from natural (e.g., leaf) to arbitrary (e.g., a rectangular or polygonal field). Cluster summarizes normalized attributes of clusters, including pixel number, axis length, axis width, compass orientation, and the length/width ratio, available to the user as a downloadable spreadsheet. Each simulated map may be saved as an image and inspected. Provided examples demonstrate the utility of Cluster to analyze patterns at various spatial scales in plant pathology and ecology and highlight the

  10. Hierarchical organization in the temporal structure of infant-direct speech and song.

    Science.gov (United States)

    Falk, Simone; Kello, Christopher T

    2017-06-01

    Caregivers alter the temporal structure of their utterances when talking and singing to infants compared with adult communication. The present study tested whether temporal variability in infant-directed registers serves to emphasize the hierarchical temporal structure of speech. Fifteen German-speaking mothers sang a play song and told a story to their 6-months-old infants, or to an adult. Recordings were analyzed using a recently developed method that determines the degree of nested clustering of temporal events in speech. Events were defined as peaks in the amplitude envelope, and clusters of various sizes related to periods of acoustic speech energy at varying timescales. Infant-directed speech and song clearly showed greater event clustering compared with adult-directed registers, at multiple timescales of hundreds of milliseconds to tens of seconds. We discuss the relation of this newly discovered acoustic property to temporal variability in linguistic units and its potential implications for parent-infant communication and infants learning the hierarchical structures of speech and language. Copyright © 2017 Elsevier B.V. All rights reserved.

  11. Cluster analysis as a prediction tool for pregnancy outcomes.

    Science.gov (United States)

    Banjari, Ines; Kenjerić, Daniela; Šolić, Krešimir; Mandić, Milena L

    2015-03-01

    Considering specific physiology changes during gestation and thinking of pregnancy as a "critical window", classification of pregnant women at early pregnancy can be considered as crucial. The paper demonstrates the use of a method based on an approach from intelligent data mining, cluster analysis. Cluster analysis method is a statistical method which makes possible to group individuals based on sets of identifying variables. The method was chosen in order to determine possibility for classification of pregnant women at early pregnancy to analyze unknown correlations between different variables so that the certain outcomes could be predicted. 222 pregnant women from two general obstetric offices' were recruited. The main orient was set on characteristics of these pregnant women: their age, pre-pregnancy body mass index (BMI) and haemoglobin value. Cluster analysis gained a 94.1% classification accuracy rate with three branch- es or groups of pregnant women showing statistically significant correlations with pregnancy outcomes. The results are showing that pregnant women both of older age and higher pre-pregnancy BMI have a significantly higher incidence of delivering baby of higher birth weight but they gain significantly less weight during pregnancy. Their babies are also longer, and these women have significantly higher probability for complications during pregnancy (gestosis) and higher probability of induced or caesarean delivery. We can conclude that the cluster analysis method can appropriately classify pregnant women at early pregnancy to predict certain outcomes.

  12. ANALYSIS OF SPATIO-TEMPORAL TRAFFIC PATTERNS BASED ON PEDESTRIAN TRAJECTORIES

    Directory of Open Access Journals (Sweden)

    S. Busch

    2016-06-01

    Full Text Available For driver assistance and autonomous driving systems, it is essential to predict the behaviour of other traffic participants. Usually, standard filter approaches are used to this end, however, in many cases, these are not sufficient. For example, pedestrians are able to change their speed or direction instantly. Also, there may be not enough observation data to determine the state of an object reliably, e.g. in case of occlusions. In those cases, it is very useful if a prior model exists, which suggests certain outcomes. For example, it is useful to know that pedestrians are usually crossing the road at a certain location and at certain times. This information can then be stored in a map which then can be used as a prior in scene analysis, or in practical terms to reduce the speed of a vehicle in advance in order to minimize critical situations. In this paper, we present an approach to derive such a spatio-temporal map automatically from the observed behaviour of traffic participants in everyday traffic situations. In our experiments, we use one stationary camera to observe a complex junction, where cars, public transportation and pedestrians interact. We concentrate on the pedestrians trajectories to map traffic patterns. In the first step, we extract trajectory segments from the video data. These segments are then clustered in order to derive a spatial model of the scene, in terms of a spatially embedded graph. In the second step, we analyse the temporal patterns of pedestrian movement on this graph. We are able to derive traffic light sequences as well as the timetables of nearby public transportation. To evaluate our approach, we used a 4 hour video sequence. We show that we are able to derive traffic light sequences as well as time tables of nearby public transportation.

  13. Analysis of Spatio-Temporal Traffic Patterns Based on Pedestrian Trajectories

    Science.gov (United States)

    Busch, S.; Schindler, T.; Klinger, T.; Brenner, C.

    2016-06-01

    For driver assistance and autonomous driving systems, it is essential to predict the behaviour of other traffic participants. Usually, standard filter approaches are used to this end, however, in many cases, these are not sufficient. For example, pedestrians are able to change their speed or direction instantly. Also, there may be not enough observation data to determine the state of an object reliably, e.g. in case of occlusions. In those cases, it is very useful if a prior model exists, which suggests certain outcomes. For example, it is useful to know that pedestrians are usually crossing the road at a certain location and at certain times. This information can then be stored in a map which then can be used as a prior in scene analysis, or in practical terms to reduce the speed of a vehicle in advance in order to minimize critical situations. In this paper, we present an approach to derive such a spatio-temporal map automatically from the observed behaviour of traffic participants in everyday traffic situations. In our experiments, we use one stationary camera to observe a complex junction, where cars, public transportation and pedestrians interact. We concentrate on the pedestrians trajectories to map traffic patterns. In the first step, we extract trajectory segments from the video data. These segments are then clustered in order to derive a spatial model of the scene, in terms of a spatially embedded graph. In the second step, we analyse the temporal patterns of pedestrian movement on this graph. We are able to derive traffic light sequences as well as the timetables of nearby public transportation. To evaluate our approach, we used a 4 hour video sequence. We show that we are able to derive traffic light sequences as well as time tables of nearby public transportation.

  14. Natural Time and Nowcasting Earthquakes: Are Large Global Earthquakes Temporally Clustered?

    Science.gov (United States)

    Luginbuhl, Molly; Rundle, John B.; Turcotte, Donald L.

    2018-02-01

    The objective of this paper is to analyze the temporal clustering of large global earthquakes with respect to natural time, or interevent count, as opposed to regular clock time. To do this, we use two techniques: (1) nowcasting, a new method of statistically classifying seismicity and seismic risk, and (2) time series analysis of interevent counts. We chose the sequences of M_{λ } ≥ 7.0 and M_{λ } ≥ 8.0 earthquakes from the global centroid moment tensor (CMT) catalog from 2004 to 2016 for analysis. A significant number of these earthquakes will be aftershocks of the largest events, but no satisfactory method of declustering the aftershocks in clock time is available. A major advantage of using natural time is that it eliminates the need for declustering aftershocks. The event count we utilize is the number of small earthquakes that occur between large earthquakes. The small earthquake magnitude is chosen to be as small as possible, such that the catalog is still complete based on the Gutenberg-Richter statistics. For the CMT catalog, starting in 2004, we found the completeness magnitude to be M_{σ } ≥ 5.1. For the nowcasting method, the cumulative probability distribution of these interevent counts is obtained. We quantify the distribution using the exponent, β, of the best fitting Weibull distribution; β = 1 for a random (exponential) distribution. We considered 197 earthquakes with M_{λ } ≥ 7.0 and found β = 0.83 ± 0.08. We considered 15 earthquakes with M_{λ } ≥ 8.0, but this number was considered too small to generate a meaningful distribution. For comparison, we generated synthetic catalogs of earthquakes that occur randomly with the Gutenberg-Richter frequency-magnitude statistics. We considered a synthetic catalog of 1.97 × 10^5 M_{λ } ≥ 7.0 earthquakes and found β = 0.99 ± 0.01. The random catalog converted to natural time was also random. We then generated 1.5 × 10^4 synthetic catalogs with 197 M_{λ } ≥ 7.0 in each catalog and

  15. Phenotypes of asthma in low-income children and adolescents: cluster analysis

    Directory of Open Access Journals (Sweden)

    Anna Lucia Barros Cabral

    Full Text Available ABSTRACT Objective: Studies characterizing asthma phenotypes have predominantly included adults or have involved children and adolescents in developed countries. Therefore, their applicability in other populations, such as those of developing countries, remains indeterminate. Our objective was to determine how low-income children and adolescents with asthma in Brazil are distributed across a cluster analysis. Methods: We included 306 children and adolescents (6-18 years of age with a clinical diagnosis of asthma and under medical treatment for at least one year of follow-up. At enrollment, all the patients were clinically stable. For the cluster analysis, we selected 20 variables commonly measured in clinical practice and considered important in defining asthma phenotypes. Variables with high multicollinearity were excluded. A cluster analysis was applied using a twostep agglomerative test and log-likelihood distance measure. Results: Three clusters were defined for our population. Cluster 1 (n = 94 included subjects with normal pulmonary function, mild eosinophil inflammation, few exacerbations, later age at asthma onset, and mild atopy. Cluster 2 (n = 87 included those with normal pulmonary function, a moderate number of exacerbations, early age at asthma onset, more severe eosinophil inflammation, and moderate atopy. Cluster 3 (n = 108 included those with poor pulmonary function, frequent exacerbations, severe eosinophil inflammation, and severe atopy. Conclusions: Asthma was characterized by the presence of atopy, number of exacerbations, and lung function in low-income children and adolescents in Brazil. The many similarities with previous cluster analyses of phenotypes indicate that this approach shows good generalizability.

  16. Similarity, Clustering, and Scaling Analyses for the Foreign Exchange Market ---Comprehensive Analysis on States of Market Participants with High-Frequency Financial Data---

    Science.gov (United States)

    Sato, A.; Sakai, H.; Nishimura, M.; Holyst, J. A.

    This article proposes mathematical methods to quantify states of marketparticipants in the foreign exchange market (FX market) and conduct comprehensive analysis on behavior of market participants by means of high-frequency financial data. Based on econophysics tools and perspectives we study similarity measures for both rate movements and quotation activities among various currency pairs. We perform also clustering analysis on market states for observation days, and find scaling relationship between mean values of quotation activities and their standard deviations. Using these mathematical methods we can visualize states of the FX market comprehensively. Finally we conclude that states of market participants temporally vary due to both external and internal factors.

  17. Hierarchical Star Formation in Turbulent Media: Evidence from Young Star Clusters

    Energy Technology Data Exchange (ETDEWEB)

    Grasha, K.; Calzetti, D. [Astronomy Department, University of Massachusetts, Amherst, MA 01003 (United States); Elmegreen, B. G. [IBM Research Division, T.J. Watson Research Center, Yorktown Heights, NY (United States); Adamo, A.; Messa, M. [Department of Astronomy, The Oskar Klein Centre, Stockholm University, Stockholm (Sweden); Aloisi, A.; Bright, S. N.; Lee, J. C.; Ryon, J. E.; Ubeda, L. [Space Telescope Science Institute, Baltimore, MD (United States); Cook, D. O. [California Institute of Technology, 1200 East California Boulevard, Pasadena, CA (United States); Dale, D. A. [Department of Physics and Astronomy, University of Wyoming, Laramie, WY (United States); Fumagalli, M. [Institute for Computational Cosmology and Centre for Extragalactic Astronomy, Department of Physics, Durham University, Durham (United Kingdom); Gallagher III, J. S. [Department of Astronomy, University of Wisconsin–Madison, Madison, WI (United States); Gouliermis, D. A. [Zentrum für Astronomie der Universität Heidelberg, Institut für Theoretische Astrophysik, Albert-Ueberle-Str. 2, D-69120 Heidelberg (Germany); Grebel, E. K. [Astronomisches Rechen-Institut, Zentrum für Astronomie der Universität Heidelberg, Mönchhofstr. 12-14, D-69120, Heidelberg (Germany); Kahre, L. [Department of Astronomy, New Mexico State University, Las Cruces, NM (United States); Kim, H. [Gemini Observatory, La Serena (Chile); Krumholz, M. R., E-mail: kgrasha@astro.umass.edu [Research School of Astronomy and Astrophysics, Australian National University, Canberra, ACT 2611 (Australia)

    2017-06-10

    We present an analysis of the positions and ages of young star clusters in eight local galaxies to investigate the connection between the age difference and separation of cluster pairs. We find that star clusters do not form uniformly but instead are distributed so that the age difference increases with the cluster pair separation to the 0.25–0.6 power, and that the maximum size over which star formation is physically correlated ranges from ∼200 pc to ∼1 kpc. The observed trends between age difference and separation suggest that cluster formation is hierarchical both in space and time: clusters that are close to each other are more similar in age than clusters born further apart. The temporal correlations between stellar aggregates have slopes that are consistent with predictions of turbulence acting as the primary driver of star formation. The velocity associated with the maximum size is proportional to the galaxy’s shear, suggesting that the galactic environment influences the maximum size of the star-forming structures.

  18. Reproducibility of Cognitive Profiles in Psychosis Using Cluster Analysis.

    Science.gov (United States)

    Lewandowski, Kathryn E; Baker, Justin T; McCarthy, Julie M; Norris, Lesley A; Öngür, Dost

    2018-04-01

    Cognitive dysfunction is a core symptom dimension that cuts across the psychoses. Recent findings support classification of patients along the cognitive dimension using cluster analysis; however, data-derived groupings may be highly determined by sampling characteristics and the measures used to derive the clusters, and so their interpretability must be established. We examined cognitive clusters in a cross-diagnostic sample of patients with psychosis and associations with clinical and functional outcomes. We then compared our findings to a previous report of cognitive clusters in a separate sample using a different cognitive battery. Participants with affective or non-affective psychosis (n=120) and healthy controls (n=31) were administered the MATRICS Consensus Cognitive Battery, and clinical and community functioning assessments. Cluster analyses were performed on cognitive variables, and clusters were compared on demographic, cognitive, and clinical measures. Results were compared to findings from our previous report. A four-cluster solution provided a good fit to the data; profiles included a neuropsychologically normal cluster, a globally impaired cluster, and two clusters of mixed profiles. Cognitive burden was associated with symptom severity and poorer community functioning. The patterns of cognitive performance by cluster were highly consistent with our previous findings. We found evidence of four cognitive subgroups of patients with psychosis, with cognitive profiles that map closely to those produced in our previous work. Clusters were associated with clinical and community variables and a measure of premorbid functioning, suggesting that they reflect meaningful groupings: replicable, and related to clinical presentation and functional outcomes. (JINS, 2018, 24, 382-390).

  19. rEMM: Extensible Markov Model for Data Stream Clustering in R

    Directory of Open Access Journals (Sweden)

    Michael Hahsler

    2010-10-01

    Full Text Available Clustering streams of continuously arriving data has become an important application of data mining in recent years and efficient algorithms have been proposed by several researchers. However, clustering alone neglects the fact that data in a data stream is not only characterized by the proximity of data points which is used by clustering, but also by a temporal component. The extensible Markov model (EMM adds the temporal component to data stream clustering by superimposing a dynamically adapting Markov chain. In this paper we introduce the implementation of the R extension package rEMM which implements EMM and we discuss some examples and applications.

  20. Spatial and temporal characterizations of water quality in Kuwait Bay.

    Science.gov (United States)

    Al-Mutairi, N; Abahussain, A; El-Battay, A

    2014-06-15

    The spatial and temporal patterns of water quality in Kuwait Bay have been investigated using data from six stations between 2009 and 2011. The results showed that most of water quality parameters such as phosphorus (PO4), nitrate (NO3), dissolved oxygen (DO), and Total Suspended Solids (TSS) fluctuated over time and space. Based on Water Quality Index (WQI) data, six stations were significantly clustered into two main classes using cluster analysis, one group located in western side of the Bay, and other in eastern side. Three principal components are responsible for water quality variations in the Bay. The first component included DO and pH. The second included PO4, TSS and NO3, and the last component contained seawater temperature and turbidity. The spatial and temporal patterns of water quality in Kuwait Bay are mainly controlled by seasonal variations and discharges from point sources of pollution along Kuwait Bay's coast as well as from Shatt Al-Arab River. Copyright © 2014 Elsevier Ltd. All rights reserved.

  1. Identifying novel phenotypes of acute heart failure using cluster analysis of clinical variables.

    Science.gov (United States)

    Horiuchi, Yu; Tanimoto, Shuzou; Latif, A H M Mahbub; Urayama, Kevin Y; Aoki, Jiro; Yahagi, Kazuyuki; Okuno, Taishi; Sato, Yu; Tanaka, Tetsu; Koseki, Keita; Komiyama, Kota; Nakajima, Hiroyoshi; Hara, Kazuhiro; Tanabe, Kengo

    2018-07-01

    Acute heart failure (AHF) is a heterogeneous disease caused by various cardiovascular (CV) pathophysiology and multiple non-CV comorbidities. We aimed to identify clinically important subgroups to improve our understanding of the pathophysiology of AHF and inform clinical decision-making. We evaluated detailed clinical data of 345 consecutive AHF patients using non-hierarchical cluster analysis of 77 variables, including age, sex, HF etiology, comorbidities, physical findings, laboratory data, electrocardiogram, echocardiogram and treatment during hospitalization. Cox proportional hazards regression analysis was performed to estimate the association between the clusters and clinical outcomes. Three clusters were identified. Cluster 1 (n=108) represented "vascular failure". This cluster had the highest average systolic blood pressure at admission and lung congestion with type 2 respiratory failure. Cluster 2 (n=89) represented "cardiac and renal failure". They had the lowest ejection fraction (EF) and worst renal function. Cluster 3 (n=148) comprised mostly older patients and had the highest prevalence of atrial fibrillation and preserved EF. Death or HF hospitalization within 12-month occurred in 23% of Cluster 1, 36% of Cluster 2 and 36% of Cluster 3 (p=0.034). Compared with Cluster 1, risk of death or HF hospitalization was 1.74 (95% CI, 1.03-2.95, p=0.037) for Cluster 2 and 1.82 (95% CI, 1.13-2.93, p=0.014) for Cluster 3. Cluster analysis may be effective in producing clinically relevant categories of AHF, and may suggest underlying pathophysiology and potential utility in predicting clinical outcomes. Copyright © 2018 Elsevier B.V. All rights reserved.

  2. Identification and validation of asthma phenotypes in Chinese population using cluster analysis.

    Science.gov (United States)

    Wang, Lei; Liang, Rui; Zhou, Ting; Zheng, Jing; Liang, Bing Miao; Zhang, Hong Ping; Luo, Feng Ming; Gibson, Peter G; Wang, Gang

    2017-10-01

    Asthma is a heterogeneous airway disease, so it is crucial to clearly identify clinical phenotypes to achieve better asthma management. To identify and prospectively validate asthma clusters in a Chinese population. Two hundred eighty-four patients were consecutively recruited and 18 sociodemographic and clinical variables were collected. Hierarchical cluster analysis was performed by the Ward method followed by k-means cluster analysis. Then, a prospective 12-month cohort study was used to validate the identified clusters. Five clusters were successfully identified. Clusters 1 (n = 71) and 3 (n = 81) were mild asthma phenotypes with slight airway obstruction and low exacerbation risk, but with a sex differential. Cluster 2 (n = 65) described an "allergic" phenotype, cluster 4 (n = 33) featured a "fixed airflow limitation" phenotype with smoking, and cluster 5 (n = 34) was a "low socioeconomic status" phenotype. Patients in clusters 2, 4, and 5 had distinctly lower socioeconomic status and more psychological symptoms. Cluster 2 had a significantly increased risk of exacerbations (risk ratio [RR] 1.13, 95% confidence interval [CI] 1.03-1.25), unplanned visits for asthma (RR 1.98, 95% CI 1.07-3.66), and emergency visits for asthma (RR 7.17, 95% CI 1.26-40.80). Cluster 4 had an increased risk of unplanned visits (RR 2.22, 95% CI 1.02-4.81), and cluster 5 had increased emergency visits (RR 12.72, 95% CI 1.95-69.78). Kaplan-Meier analysis confirmed that cluster grouping was predictive of time to the first asthma exacerbation, unplanned visit, emergency visit, and hospital admission (P clusters as "allergic asthma," "fixed airflow limitation," and "low socioeconomic status" phenotypes that are at high risk of severe asthma exacerbations and that have management implications for clinical practice in developing countries. Copyright © 2017 American College of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.

  3. Comparison of population-averaged and cluster-specific models for the analysis of cluster randomized trials with missing binary outcomes: a simulation study

    Directory of Open Access Journals (Sweden)

    Ma Jinhui

    2013-01-01

    Full Text Available Abstracts Background The objective of this simulation study is to compare the accuracy and efficiency of population-averaged (i.e. generalized estimating equations (GEE and cluster-specific (i.e. random-effects logistic regression (RELR models for analyzing data from cluster randomized trials (CRTs with missing binary responses. Methods In this simulation study, clustered responses were generated from a beta-binomial distribution. The number of clusters per trial arm, the number of subjects per cluster, intra-cluster correlation coefficient, and the percentage of missing data were allowed to vary. Under the assumption of covariate dependent missingness, missing outcomes were handled by complete case analysis, standard multiple imputation (MI and within-cluster MI strategies. Data were analyzed using GEE and RELR. Performance of the methods was assessed using standardized bias, empirical standard error, root mean squared error (RMSE, and coverage probability. Results GEE performs well on all four measures — provided the downward bias of the standard error (when the number of clusters per arm is small is adjusted appropriately — under the following scenarios: complete case analysis for CRTs with a small amount of missing data; standard MI for CRTs with variance inflation factor (VIF 50. RELR performs well only when a small amount of data was missing, and complete case analysis was applied. Conclusion GEE performs well as long as appropriate missing data strategies are adopted based on the design of CRTs and the percentage of missing data. In contrast, RELR does not perform well when either standard or within-cluster MI strategy is applied prior to the analysis.

  4. Cluster analysis of radionuclide concentrations in beach sand

    NARCIS (Netherlands)

    de Meijer, R.J.; James, I.; Jennings, P.J.; Keoyers, J.E.

    This paper presents a method in which natural radionuclide concentrations of beach sand minerals are traced along a stretch of coast by cluster analysis. This analysis yields two groups of mineral deposit with different origins. The method deviates from standard methods of following dispersal of

  5. Spatiotemporal Analysis of the Ebola Hemorrhagic Fever in West Africa in 2014

    Science.gov (United States)

    Xu, M.; Cao, C. X.; Guo, H. F.

    2017-09-01

    Ebola hemorrhagic fever (EHF) is an acute hemorrhagic diseases caused by the Ebola virus, which is highly contagious. This paper aimed to explore the possible gathering area of EHF cases in West Africa in 2014, and identify endemic areas and their tendency by means of time-space analysis. We mapped distribution of EHF incidences and explored statistically significant space, time and space-time disease clusters. We utilized hotspot analysis to find the spatial clustering pattern on the basis of the actual outbreak cases. spatial-temporal cluster analysis is used to analyze the spatial or temporal distribution of agglomeration disease, examine whether its distribution is statistically significant. Local clusters were investigated using Kulldorff's scan statistic approach. The result reveals that the epidemic mainly gathered in the western part of Africa near north Atlantic with obvious regional distribution. For the current epidemic, we have found areas in high incidence of EVD by means of spatial cluster analysis.

  6. Mortality from Suicide in the Municipalities of Mainland Portugal: Spatio-Temporal Evolution between 1980 and 2015

    Directory of Open Access Journals (Sweden)

    Adriana Loureiro

    2018-01-01

    Full Text Available Introduction: Suicide is considered a public health priority. It is a complex phenomenon resulting from the interaction of several factors, which do not depend solely on individual conditions. This study analyzes the spatio-temporal evolution of suicide mortality between 1980 and 2015, identifying areas of high risk, and their variation, in the 278 municipalities of Continental Portugal. Material and Methods: Based on the number of self-inflicted injuries and deaths from suicide and the resident population, the spatio-temporal evolution of the suicide mortality rate was assessed via: i a Poisson joinpoint regression model, and ii spatio-temporal clustering methods. Results: The suicide mortality rate evolution showed statistically significant increases over three periods (1980 - 1984; 1999 - 2002 and 2006 - 2015 and two statistically significant periods of decrease (1984 - 1995 and 1995 - 1999. The spatio-temporal analysis identified five clusters of high suicide risk (relative risk >1 and four clusters of low suicide risk (relative risk < 1. Discussion: The periods when suicide mortality increases seem to overlap with times of economic and financial instability. The geographical pattern of suicide risk has changed: presently, the suicide rates from the municipalities in the Center and North are showing more similarity with those seen in the South, thus increasing the ruralization of the phenomenon of suicide. Conclusion: Between 1980 and 2015 the spacio-temporal pattern of mortality from suicide has been changing and is a phenomenon that is currently experiencing a growing trend (since 2006 and is of higher risk in rural areas.

  7. Genome-scale cluster analysis of replicated microarrays using shrinkage correlation coefficient.

    Science.gov (United States)

    Yao, Jianchao; Chang, Chunqi; Salmi, Mari L; Hung, Yeung Sam; Loraine, Ann; Roux, Stanley J

    2008-06-18

    Currently, clustering with some form of correlation coefficient as the gene similarity metric has become a popular method for profiling genomic data. The Pearson correlation coefficient and the standard deviation (SD)-weighted correlation coefficient are the two most widely-used correlations as the similarity metrics in clustering microarray data. However, these two correlations are not optimal for analyzing replicated microarray data generated by most laboratories. An effective correlation coefficient is needed to provide statistically sufficient analysis of replicated microarray data. In this study, we describe a novel correlation coefficient, shrinkage correlation coefficient (SCC), that fully exploits the similarity between the replicated microarray experimental samples. The methodology considers both the number of replicates and the variance within each experimental group in clustering expression data, and provides a robust statistical estimation of the error of replicated microarray data. The value of SCC is revealed by its comparison with two other correlation coefficients that are currently the most widely-used (Pearson correlation coefficient and SD-weighted correlation coefficient) using statistical measures on both synthetic expression data as well as real gene expression data from Saccharomyces cerevisiae. Two leading clustering methods, hierarchical and k-means clustering were applied for the comparison. The comparison indicated that using SCC achieves better clustering performance. Applying SCC-based hierarchical clustering to the replicated microarray data obtained from germinating spores of the fern Ceratopteris richardii, we discovered two clusters of genes with shared expression patterns during spore germination. Functional analysis suggested that some of the genetic mechanisms that control germination in such diverse plant lineages as mosses and angiosperms are also conserved among ferns. This study shows that SCC is an alternative to the Pearson

  8. Application of microarray analysis on computer cluster and cloud platforms.

    Science.gov (United States)

    Bernau, C; Boulesteix, A-L; Knaus, J

    2013-01-01

    Analysis of recent high-dimensional biological data tends to be computationally intensive as many common approaches such as resampling or permutation tests require the basic statistical analysis to be repeated many times. A crucial advantage of these methods is that they can be easily parallelized due to the computational independence of the resampling or permutation iterations, which has induced many statistics departments to establish their own computer clusters. An alternative is to rent computing resources in the cloud, e.g. at Amazon Web Services. In this article we analyze whether a selection of statistical projects, recently implemented at our department, can be efficiently realized on these cloud resources. Moreover, we illustrate an opportunity to combine computer cluster and cloud resources. In order to compare the efficiency of computer cluster and cloud implementations and their respective parallelizations we use microarray analysis procedures and compare their runtimes on the different platforms. Amazon Web Services provide various instance types which meet the particular needs of the different statistical projects we analyzed in this paper. Moreover, the network capacity is sufficient and the parallelization is comparable in efficiency to standard computer cluster implementations. Our results suggest that many statistical projects can be efficiently realized on cloud resources. It is important to mention, however, that workflows can change substantially as a result of a shift from computer cluster to cloud computing.

  9. GLOBULAR CLUSTER ABUNDANCES FROM HIGH-RESOLUTION, INTEGRATED-LIGHT SPECTROSCOPY. II. EXPANDING THE METALLICITY RANGE FOR OLD CLUSTERS AND UPDATED ANALYSIS TECHNIQUES

    Energy Technology Data Exchange (ETDEWEB)

    Colucci, Janet E.; Bernstein, Rebecca A.; McWilliam, Andrew [The Observatories of the Carnegie Institution for Science, 813 Santa Barbara St., Pasadena, CA 91101 (United States)

    2017-01-10

    We present abundances of globular clusters (GCs) in the Milky Way and Fornax from integrated-light (IL) spectra. Our goal is to evaluate the consistency of the IL analysis relative to standard abundance analysis for individual stars in those same clusters. This sample includes an updated analysis of seven clusters from our previous publications and results for five new clusters that expand the metallicity range over which our technique has been tested. We find that the [Fe/H] measured from IL spectra agrees to ∼0.1 dex for GCs with metallicities as high as [Fe/H] = −0.3, but the abundances measured for more metal-rich clusters may be underestimated. In addition we systematically evaluate the accuracy of abundance ratios, [X/Fe], for Na i, Mg i, Al i, Si i, Ca i, Ti i, Ti ii, Sc ii, V i, Cr i, Mn i, Co i, Ni i, Cu i, Y ii, Zr i, Ba ii, La ii, Nd ii, and Eu ii. The elements for which the IL analysis gives results that are most similar to analysis of individual stellar spectra are Fe i, Ca i, Si i, Ni i, and Ba ii. The elements that show the greatest differences include Mg i and Zr i. Some elements show good agreement only over a limited range in metallicity. More stellar abundance data in these clusters would enable more complete evaluation of the IL results for other important elements.

  10. Temporal analysis of static priority preemptive scheduled cyclic streaming applications using CSDF models

    NARCIS (Netherlands)

    Kurtin, Philip Sebastian; Bekooij, Marco Jan Gerrit

    2016-01-01

    Real-time streaming applications with cyclic data dependencies that are executed on multiprocessor systems with processor sharing usually require a temporal analysis to give guarantees on their temporal behavior at design time. Current accurate analysis techniques for cyclic applications that are

  11. Spatio-temporal trends and risk factors for Shigella from 2001 to 2011 in Jiangsu Province, People's Republic of China.

    Science.gov (United States)

    Tang, Fenyang; Cheng, Yuejia; Bao, Changjun; Hu, Jianli; Liu, Wendong; Liang, Qi; Wu, Ying; Norris, Jessie; Peng, Zhihang; Yu, Rongbin; Shen, Hongbing; Chen, Feng

    2014-01-01

    This study aimed to describe the spatial and temporal trends of Shigella incidence rates in Jiangsu Province, People's Republic of China. It also intended to explore complex risk modes facilitating Shigella transmission. County-level incidence rates were obtained for analysis using geographic information system (GIS) tools. Trend surface and incidence maps were established to describe geographic distributions. Spatio-temporal cluster analysis and autocorrelation analysis were used for detecting clusters. Based on the number of monthly Shigella cases, an autoregressive integrated moving average (ARIMA) model successfully established a time series model. A spatial correlation analysis and a case-control study were conducted to identify risk factors contributing to Shigella transmissions. The far southwestern and northwestern areas of Jiangsu were the most infected. A cluster was detected in southwestern Jiangsu (LLR = 11674.74, PHighways and water sources potentially caused spatial variation in Shigella development in Jiangsu. The case-control study confirmed not washing hands before dinner (OR = 3.64) and not having access to a safe water source (OR = 2.04) as the main causes of Shigella in Jiangsu Province. Improvement of sanitation and hygiene should be strengthened in economically developed counties, while access to a safe water supply in impoverished areas should be increased at the same time.

  12. A Novel Divisive Hierarchical Clustering Algorithm for Geospatial Analysis

    Directory of Open Access Journals (Sweden)

    Shaoning Li

    2017-01-01

    Full Text Available In the fields of geographic information systems (GIS and remote sensing (RS, the clustering algorithm has been widely used for image segmentation, pattern recognition, and cartographic generalization. Although clustering analysis plays a key role in geospatial modelling, traditional clustering methods are limited due to computational complexity, noise resistant ability and robustness. Furthermore, traditional methods are more focused on the adjacent spatial context, which makes it hard for the clustering methods to be applied to multi-density discrete objects. In this paper, a new method, cell-dividing hierarchical clustering (CDHC, is proposed based on convex hull retraction. The main steps are as follows. First, a convex hull structure is constructed to describe the global spatial context of geospatial objects. Then, the retracting structure of each borderline is established in sequence by setting the initial parameter. The objects are split into two clusters (i.e., “sub-clusters” if the retracting structure intersects with the borderlines. Finally, clusters are repeatedly split and the initial parameter is updated until the terminate condition is satisfied. The experimental results show that CDHC separates the multi-density objects from noise sufficiently and also reduces complexity compared to the traditional agglomerative hierarchical clustering algorithm.

  13. Temporal and spatial assessment of river surface water quality using multivariate statistical techniques: a study in Can Tho City, a Mekong Delta area, Vietnam.

    Science.gov (United States)

    Phung, Dung; Huang, Cunrui; Rutherford, Shannon; Dwirahmadi, Febi; Chu, Cordia; Wang, Xiaoming; Nguyen, Minh; Nguyen, Nga Huy; Do, Cuong Manh; Nguyen, Trung Hieu; Dinh, Tuan Anh Diep

    2015-05-01

    The present study is an evaluation of temporal/spatial variations of surface water quality using multivariate statistical techniques, comprising cluster analysis (CA), principal component analysis (PCA), factor analysis (FA) and discriminant analysis (DA). Eleven water quality parameters were monitored at 38 different sites in Can Tho City, a Mekong Delta area of Vietnam from 2008 to 2012. Hierarchical cluster analysis grouped the 38 sampling sites into three clusters, representing mixed urban-rural areas, agricultural areas and industrial zone. FA/PCA resulted in three latent factors for the entire research location, three for cluster 1, four for cluster 2, and four for cluster 3 explaining 60, 60.2, 80.9, and 70% of the total variance in the respective water quality. The varifactors from FA indicated that the parameters responsible for water quality variations are related to erosion from disturbed land or inflow of effluent from sewage plants and industry, discharges from wastewater treatment plants and domestic wastewater, agricultural activities and industrial effluents, and contamination by sewage waste with faecal coliform bacteria through sewer and septic systems. Discriminant analysis (DA) revealed that nephelometric turbidity units (NTU), chemical oxygen demand (COD) and NH₃ are the discriminating parameters in space, affording 67% correct assignation in spatial analysis; pH and NO₂ are the discriminating parameters according to season, assigning approximately 60% of cases correctly. The findings suggest a possible revised sampling strategy that can reduce the number of sampling sites and the indicator parameters responsible for large variations in water quality. This study demonstrates the usefulness of multivariate statistical techniques for evaluation of temporal/spatial variations in water quality assessment and management.

  14. Clustering of users of digital libraries through log file analysis

    Directory of Open Access Journals (Sweden)

    Juan Antonio Martínez-Comeche

    2017-09-01

    Full Text Available This study analyzes how users perform information retrieval tasks when introducing queries to the Hispanic Digital Library. Clusters of users are differentiated based on their distinct information behavior. The study used the log files collected by the server over a year and different possible clustering algorithms are compared. The k-means algorithm is found to be a suitable clustering method for the analysis of large log files from digital libraries. In the case of the Hispanic Digital Library the results show three clusters of users and the characteristic information behavior of each group is described.

  15. Feasibility Study of Parallel Finite Element Analysis on Cluster-of-Clusters

    Science.gov (United States)

    Muraoka, Masae; Okuda, Hiroshi

    With the rapid growth of WAN infrastructure and development of Grid middleware, it's become a realistic and attractive methodology to connect cluster machines on wide-area network for the execution of computation-demanding applications. Many existing parallel finite element (FE) applications have been, however, designed and developed with a single computing resource in mind, since such applications require frequent synchronization and communication among processes. There have been few FE applications that can exploit the distributed environment so far. In this study, we explore the feasibility of FE applications on the cluster-of-clusters. First, we classify FE applications into two types, tightly coupled applications (TCA) and loosely coupled applications (LCA) based on their communication pattern. A prototype of each application is implemented on the cluster-of-clusters. We perform numerical experiments executing TCA and LCA on both the cluster-of-clusters and a single cluster. Thorough these experiments, by comparing the performances and communication cost in each case, we evaluate the feasibility of FEA on the cluster-of-clusters.

  16. Time fluctuation analysis of forest fire sequences

    Science.gov (United States)

    Vega Orozco, Carmen D.; Kanevski, Mikhaïl; Tonini, Marj; Golay, Jean; Pereira, Mário J. G.

    2013-04-01

    Forest fires are complex events involving both space and time fluctuations. Understanding of their dynamics and pattern distribution is of great importance in order to improve the resource allocation and support fire management actions at local and global levels. This study aims at characterizing the temporal fluctuations of forest fire sequences observed in Portugal, which is the country that holds the largest wildfire land dataset in Europe. This research applies several exploratory data analysis measures to 302,000 forest fires occurred from 1980 to 2007. The applied clustering measures are: Morisita clustering index, fractal and multifractal dimensions (box-counting), Ripley's K-function, Allan Factor, and variography. These algorithms enable a global time structural analysis describing the degree of clustering of a point pattern and defining whether the observed events occur randomly, in clusters or in a regular pattern. The considered methods are of general importance and can be used for other spatio-temporal events (i.e. crime, epidemiology, biodiversity, geomarketing, etc.). An important contribution of this research deals with the analysis and estimation of local measures of clustering that helps understanding their temporal structure. Each measure is described and executed for the raw data (forest fires geo-database) and results are compared to reference patterns generated under the null hypothesis of randomness (Poisson processes) embedded in the same time period of the raw data. This comparison enables estimating the degree of the deviation of the real data from a Poisson process. Generalizations to functional measures of these clustering methods, taking into account the phenomena, were also applied and adapted to detect time dependences in a measured variable (i.e. burned area). The time clustering of the raw data is compared several times with the Poisson processes at different thresholds of the measured function. Then, the clustering measure value

  17. Classification of behavior using unsupervised temporal neural networks

    International Nuclear Information System (INIS)

    Adair, K.L.

    1998-03-01

    Adding recurrent connections to unsupervised neural networks used for clustering creates a temporal neural network which clusters a sequence of inputs as they appear over time. The model presented combines the Jordan architecture with the unsupervised learning technique Adaptive Resonance Theory, Fuzzy ART. The combination yields a neural network capable of quickly clustering sequential pattern sequences as the sequences are generated. The applicability of the architecture is illustrated through a facility monitoring problem

  18. Full text clustering and relationship network analysis of biomedical publications.

    Directory of Open Access Journals (Sweden)

    Renchu Guan

    Full Text Available Rapid developments in the biomedical sciences have increased the demand for automatic clustering of biomedical publications. In contrast to current approaches to text clustering, which focus exclusively on the contents of abstracts, a novel method is proposed for clustering and analysis of complete biomedical article texts. To reduce dimensionality, Cosine Coefficient is used on a sub-space of only two vectors, instead of computing the Euclidean distance within the space of all vectors. Then a strategy and algorithm is introduced for Semi-supervised Affinity Propagation (SSAP to improve analysis efficiency, using biomedical journal names as an evaluation background. Experimental results show that by avoiding high-dimensional sparse matrix computations, SSAP outperforms conventional k-means methods and improves upon the standard Affinity Propagation algorithm. In constructing a directed relationship network and distribution matrix for the clustering results, it can be noted that overlaps in scope and interests among BioMed publications can be easily identified, providing a valuable analytical tool for editors, authors and readers.

  19. Full text clustering and relationship network analysis of biomedical publications.

    Science.gov (United States)

    Guan, Renchu; Yang, Chen; Marchese, Maurizio; Liang, Yanchun; Shi, Xiaohu

    2014-01-01

    Rapid developments in the biomedical sciences have increased the demand for automatic clustering of biomedical publications. In contrast to current approaches to text clustering, which focus exclusively on the contents of abstracts, a novel method is proposed for clustering and analysis of complete biomedical article texts. To reduce dimensionality, Cosine Coefficient is used on a sub-space of only two vectors, instead of computing the Euclidean distance within the space of all vectors. Then a strategy and algorithm is introduced for Semi-supervised Affinity Propagation (SSAP) to improve analysis efficiency, using biomedical journal names as an evaluation background. Experimental results show that by avoiding high-dimensional sparse matrix computations, SSAP outperforms conventional k-means methods and improves upon the standard Affinity Propagation algorithm. In constructing a directed relationship network and distribution matrix for the clustering results, it can be noted that overlaps in scope and interests among BioMed publications can be easily identified, providing a valuable analytical tool for editors, authors and readers.

  20. The Review of Visual Analysis Methods of Multi-modal Spatio-temporal Big Data

    Directory of Open Access Journals (Sweden)

    ZHU Qing

    2017-10-01

    Full Text Available The visual analysis of spatio-temporal big data is not only the state-of-art research direction of both big data analysis and data visualization, but also the core module of pan-spatial information system. This paper reviews existing visual analysis methods at three levels:descriptive visual analysis, explanatory visual analysis and exploratory visual analysis, focusing on spatio-temporal big data's characteristics of multi-source, multi-granularity, multi-modal and complex association.The technical difficulties and development tendencies of multi-modal feature selection, innovative human-computer interaction analysis and exploratory visual reasoning in the visual analysis of spatio-temporal big data were discussed. Research shows that the study of descriptive visual analysis for data visualizationis is relatively mature.The explanatory visual analysis has become the focus of the big data analysis, which is mainly based on interactive data mining in a visual environment to diagnose implicit reason of problem. And the exploratory visual analysis method needs a major break-through.

  1. Steady state subchannel analysis of AHWR fuel cluster

    International Nuclear Information System (INIS)

    Dasgupta, A.; Chandraker, D.K.; Vijayan, P.K.; Saha, D.

    2006-09-01

    Subchannel analysis is a technique used to predict the thermal hydraulic behavior of reactor fuel assemblies. The rod cluster is subdivided into a number of parallel interacting flow subchannels. The conservation equations are solved for each of these subchannels, taking into account subchannel interactions. Subchannel analysis of AHWR D-5 fuel cluster has been carried out to determine the variations in thermal hydraulic conditions of coolant and fuel temperatures along the length of the fuel bundle. The hottest regions within the AHWR fuel bundle have been identified. The effect of creep on the fuel performance has also been studied. MCHFR has been calculated using Jansen-Levy correlation. The calculations have been backed by sensitivity analysis for parameters whose values are not known accurately. The sensitivity analysis showed the calculations to have a very low sensitivity to these parameters. Apart from the analysis, the report also includes a brief introduction of a few subchannel codes. A brief description of the equations and solution methodology used in COBRA-IIIC and COBRA-IV-I is also given. (author)

  2. Mobility in Europe: Recent Trends from a Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Ioana Manafi

    2017-08-01

    Full Text Available During the past decade, Europe was confronted with major changes and events offering large opportunities for mobility. The EU enlargement process, the EU policies regarding youth, the economic crisis affecting national economies on different levels, political instabilities in some European countries, high rates of unemployment or the increasing number of refugees are only a few of the factors influencing net migration in Europe. Based on a set of socio-economic indicators for EU/EFTA countries and cluster analysis, the paper provides an overview of regional differences across European countries, related to migration magnitude in the identified clusters. The obtained clusters are in accordance with previous studies in migration, and appear stable during the period of 2005-2013, with only some exceptions. The analysis revealed three country clusters: EU/EFTA center-receiving countries, EU/EFTA periphery-sending countries and EU/EFTA outlier countries, the names suggesting not only the geographical position within Europe, but the trends in net migration flows during the years. Therewith, the results provide evidence for the persistence of a movement from periphery to center countries, which is correlated with recent flows of mobility in Europe.

  3. Cluster analysis for portfolio optimization

    OpenAIRE

    Vincenzo Tola; Fabrizio Lillo; Mauro Gallegati; Rosario N. Mantegna

    2005-01-01

    We consider the problem of the statistical uncertainty of the correlation matrix in the optimization of a financial portfolio. We show that the use of clustering algorithms can improve the reliability of the portfolio in terms of the ratio between predicted and realized risk. Bootstrap analysis indicates that this improvement is obtained in a wide range of the parameters N (number of assets) and T (investment horizon). The predicted and realized risk level and the relative portfolio compositi...

  4. Hierarchical cluster analysis of progression patterns in open-angle glaucoma patients with medical treatment.

    Science.gov (United States)

    Bae, Hyoung Won; Rho, Seungsoo; Lee, Hye Sun; Lee, Naeun; Hong, Samin; Seong, Gong Je; Sung, Kyung Rim; Kim, Chan Yun

    2014-04-29

    To classify medically treated open-angle glaucoma (OAG) by the pattern of progression using hierarchical cluster analysis, and to determine OAG progression characteristics by comparing clusters. Ninety-five eyes of 95 OAG patients who received medical treatment, and who had undergone visual field (VF) testing at least once per year for 5 or more years. OAG was classified into subgroups using hierarchical cluster analysis based on the following five variables: baseline mean deviation (MD), baseline visual field index (VFI), MD slope, VFI slope, and Glaucoma Progression Analysis (GPA) printout. After that, other parameters were compared between clusters. Two clusters were made after a hierarchical cluster analysis. Cluster 1 showed -4.06 ± 2.43 dB baseline MD, 92.58% ± 6.27% baseline VFI, -0.28 ± 0.38 dB per year MD slope, -0.52% ± 0.81% per year VFI slope, and all "no progression" cases in GPA printout, whereas cluster 2 showed -8.68 ± 3.81 baseline MD, 77.54 ± 12.98 baseline VFI, -0.72 ± 0.55 MD slope, -2.22 ± 1.89 VFI slope, and seven "possible" and four "likely" progression cases in GPA printout. There were no significant differences in age, sex, mean IOP, central corneal thickness, and axial length between clusters. However, cluster 2 included more high-tension glaucoma patients and used a greater number of antiglaucoma eye drops significantly compared with cluster 1. Hierarchical cluster analysis of progression patterns divided OAG into slow and fast progression groups, evidenced by assessing the parameters of glaucomatous progression in VF testing. In the fast progression group, the prevalence of high-tension glaucoma was greater and the number of antiglaucoma medications administered was increased versus the slow progression group. Copyright 2014 The Association for Research in Vision and Ophthalmology, Inc.

  5. Cluster analysis of spontaneous preterm birth phenotypes identifies potential associations among preterm birth mechanisms.

    Science.gov (United States)

    Esplin, M Sean; Manuck, Tracy A; Varner, Michael W; Christensen, Bryce; Biggio, Joseph; Bukowski, Radek; Parry, Samuel; Zhang, Heping; Huang, Hao; Andrews, William; Saade, George; Sadovsky, Yoel; Reddy, Uma M; Ilekis, John

    2015-09-01

    We sought to use an innovative tool that is based on common biologic pathways to identify specific phenotypes among women with spontaneous preterm birth (SPTB) to enhance investigators' ability to identify and to highlight common mechanisms and underlying genetic factors that are responsible for SPTB. We performed a secondary analysis of a prospective case-control multicenter study of SPTB. All cases delivered a preterm singleton at SPTB ≤34.0 weeks' gestation. Each woman was assessed for the presence of underlying SPTB causes. A hierarchic cluster analysis was used to identify groups of women with homogeneous phenotypic profiles. One of the phenotypic clusters was selected for candidate gene association analysis with the use of VEGAS software. One thousand twenty-eight women with SPTB were assigned phenotypes. Hierarchic clustering of the phenotypes revealed 5 major clusters. Cluster 1 (n = 445) was characterized by maternal stress; cluster 2 (n = 294) was characterized by premature membrane rupture; cluster 3 (n = 120) was characterized by familial factors, and cluster 4 (n = 63) was characterized by maternal comorbidities. Cluster 5 (n = 106) was multifactorial and characterized by infection (INF), decidual hemorrhage (DH), and placental dysfunction (PD). These 3 phenotypes were correlated highly by χ(2) analysis (PD and DH, P cluster 3 of SPTB. We identified 5 major clusters of SPTB based on a phenotype tool and hierarch clustering. There was significant correlation between several of the phenotypes. The INS gene was associated with familial factors that were underlying SPTB. Copyright © 2015 Elsevier Inc. All rights reserved.

  6. SERA Scenarios of Early Market Fuel Cell Electric Vehicle Introductions: Modeling Framework, Regional Markets, and Station Clustering

    Energy Technology Data Exchange (ETDEWEB)

    Bush, B. [National Renewable Energy Lab. (NREL), Golden, CO (United States); Melaina, M. [National Renewable Energy Lab. (NREL), Golden, CO (United States); Penev, M. [National Renewable Energy Lab. (NREL), Golden, CO (United States); Daniel, W. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2013-09-01

    This report describes the development and analysis of detailed temporal and spatial scenarios for early market hydrogen fueling infrastructure clustering and fuel cell electric vehicle rollout using the Scenario Evaluation, Regionalization and Analysis (SERA) model. The report provides an overview of the SERA scenario development framework and discusses the approach used to develop the nationwidescenario.

  7. Cluster and Double Star observations of dipolarization

    Directory of Open Access Journals (Sweden)

    R. Nakamura

    2005-11-01

    Full Text Available We studied two types of dipolarization events with different IMF conditions when Cluster and Double Star (TC-1 were located in the same local time sector: 7 August 2004, 18:00-24:00 UT, during a disturbed southward/northward IMF interval, and 14 August 2004, 21:00-24:00 UT, when the IMF was stably northward. Cluster observed dipolarization as well as fast flows during both intervals, but this was not the case for TC-1. For both events the satellites crossed near the conjugate location of the MIRACLE stations. By using multi-point analysis techniques, the direction/speed of the propagation is determined using Cluster and is then compared with the disturbances at TC-1 to discuss its spatial/temporal scale. The propagation direction of the BZ disturbance at Cluster was mainly dawnward with a tailward component for 7 August and with a significant Earthward component for 14 August associated with fast flows. We suggest that the role of the midtail fast flows can be quite different in the dissipation process depending on the condition of the IMF and resultant configuration of the tail.

  8. A critical cluster analysis of 44 indicators of author-level performance

    DEFF Research Database (Denmark)

    Wildgaard, Lorna Elizabeth

    2016-01-01

    -four indicators of individual researcher performance were computed using the data. The clustering solution was supported by continued reference to the researcher’s curriculum vitae, an effect analysis and a risk analysis. Disciplinary appropriate indicators were identified and used to divide the researchers......This paper explores a 7-stage cluster methodology as a process to identify appropriate indicators for evaluation of individual researchers at a disciplinary and seniority level. Publication and citation data for 741 researchers from 4 disciplines was collected in Web of Science. Forty...... of statistics in research evaluation. The strength of the 7-stage cluster methodology is that it makes clear that in the evaluation of individual researchers, statistics cannot stand alone. The methodology is reliant on contextual information to verify the bibliometric values and cluster solution...

  9. Tweets clustering using latent semantic analysis

    Science.gov (United States)

    Rasidi, Norsuhaili Mahamed; Bakar, Sakhinah Abu; Razak, Fatimah Abdul

    2017-04-01

    Social media are becoming overloaded with information due to the increasing number of information feeds. Unlike other social media, Twitter users are allowed to broadcast a short message called as `tweet". In this study, we extract tweets related to MH370 for certain of time. In this paper, we present overview of our approach for tweets clustering to analyze the users' responses toward tragedy of MH370. The tweets were clustered based on the frequency of terms obtained from the classification process. The method we used for the text classification is Latent Semantic Analysis. As a result, there are two types of tweets that response to MH370 tragedy which is emotional and non-emotional. We show some of our initial results to demonstrate the effectiveness of our approach.

  10. Symptom Cluster Research With Biomarkers and Genetics Using Latent Class Analysis.

    Science.gov (United States)

    Conley, Samantha

    2017-12-01

    The purpose of this article is to provide an overview of latent class analysis (LCA) and examples from symptom cluster research that includes biomarkers and genetics. A review of LCA with genetics and biomarkers was conducted using Medline, Embase, PubMed, and Google Scholar. LCA is a robust latent variable model used to cluster categorical data and allows for the determination of empirically determined symptom clusters. Researchers should consider using LCA to link empirically determined symptom clusters to biomarkers and genetics to better understand the underlying etiology of symptom clusters. The full potential of LCA in symptom cluster research has not yet been realized because it has been used in limited populations, and researchers have explored limited biologic pathways.

  11. The composite sequential clustering technique for analysis of multispectral scanner data

    Science.gov (United States)

    Su, M. Y.

    1972-01-01

    The clustering technique consists of two parts: (1) a sequential statistical clustering which is essentially a sequential variance analysis, and (2) a generalized K-means clustering. In this composite clustering technique, the output of (1) is a set of initial clusters which are input to (2) for further improvement by an iterative scheme. This unsupervised composite technique was employed for automatic classification of two sets of remote multispectral earth resource observations. The classification accuracy by the unsupervised technique is found to be comparable to that by traditional supervised maximum likelihood classification techniques. The mathematical algorithms for the composite sequential clustering program and a detailed computer program description with job setup are given.

  12. Indexing, Query Processing, and Clustering of Spatio-Temporal Text Objects

    DEFF Research Database (Denmark)

    Skovsgaard, Anders

    With the increasing mobile use of the web from geo-positioned devices, the Internet is increasingly acquiring a spatial aspect, with still more types of content being geo-tagged. As a result of this development, a wide range of location-aware queries and applications have emerged. The large amounts...... of data available coupled with the increasing number of location-aware queries calls for efficient indexing and query processing techniques. This dissertation investigates how to manage geo-tagged text content to support these workloads in three specific areas: (i) grouping of spatio-textual objects, (ii......, the grouping of spatio-textual objects is done without considering query locations, and a clustering approach is proposed that takes into account both the spatial and textual attributes of the objects. The technique expands clusters based on a proposed quality function that enables clusters of arbitrary shape...

  13. Cluster-based analysis of multi-model climate ensembles

    Science.gov (United States)

    Hyde, Richard; Hossaini, Ryan; Leeson, Amber A.

    2018-06-01

    Clustering - the automated grouping of similar data - can provide powerful and unique insight into large and complex data sets, in a fast and computationally efficient manner. While clustering has been used in a variety of fields (from medical image processing to economics), its application within atmospheric science has been fairly limited to date, and the potential benefits of the application of advanced clustering techniques to climate data (both model output and observations) has yet to be fully realised. In this paper, we explore the specific application of clustering to a multi-model climate ensemble. We hypothesise that clustering techniques can provide (a) a flexible, data-driven method of testing model-observation agreement and (b) a mechanism with which to identify model development priorities. We focus our analysis on chemistry-climate model (CCM) output of tropospheric ozone - an important greenhouse gas - from the recent Atmospheric Chemistry and Climate Model Intercomparison Project (ACCMIP). Tropospheric column ozone from the ACCMIP ensemble was clustered using the Data Density based Clustering (DDC) algorithm. We find that a multi-model mean (MMM) calculated using members of the most-populous cluster identified at each location offers a reduction of up to ˜ 20 % in the global absolute mean bias between the MMM and an observed satellite-based tropospheric ozone climatology, with respect to a simple, all-model MMM. On a spatial basis, the bias is reduced at ˜ 62 % of all locations, with the largest bias reductions occurring in the Northern Hemisphere - where ozone concentrations are relatively large. However, the bias is unchanged at 9 % of all locations and increases at 29 %, particularly in the Southern Hemisphere. The latter demonstrates that although cluster-based subsampling acts to remove outlier model data, such data may in fact be closer to observed values in some locations. We further demonstrate that clustering can provide a viable and

  14. Spatiotemporal analysis of single-trial EEG of emotional pictures based on independent component analysis and source location

    Science.gov (United States)

    Liu, Jiangang; Tian, Jie

    2007-03-01

    The present study combined the Independent Component Analysis (ICA) and low-resolution brain electromagnetic tomography (LORETA) algorithms to identify the spatial distribution and time course of single-trial EEG record differences between neural responses to emotional stimuli vs. the neutral. Single-trial multichannel (129-sensor) EEG records were collected from 21 healthy, right-handed subjects viewing the emotion emotional (pleasant/unpleasant) and neutral pictures selected from International Affective Picture System (IAPS). For each subject, the single-trial EEG records of each emotional pictures were concatenated with the neutral, and a three-step analysis was applied to each of them in the same way. First, the ICA was performed to decompose each concatenated single-trial EEG records into temporally independent and spatially fixed components, namely independent components (ICs). The IC associated with artifacts were isolated. Second, the clustering analysis classified, across subjects, the temporally and spatially similar ICs into the same clusters, in which nonparametric permutation test for Global Field Power (GFP) of IC projection scalp maps identified significantly different temporal segments of each emotional condition vs. neutral. Third, the brain regions accounted for those significant segments were localized spatially with LORETA analysis. In each cluster, a voxel-by-voxel randomization test identified significantly different brain regions between each emotional condition vs. the neutral. Compared to the neutral, both emotional pictures elicited activation in the visual, temporal, ventromedial and dorsomedial prefrontal cortex and anterior cingulated gyrus. In addition, the pleasant pictures activated the left middle prefrontal cortex and the posterior precuneus, while the unpleasant pictures activated the right orbitofrontal cortex, posterior cingulated gyrus and somatosensory region. Our results were well consistent with other functional imaging

  15. Identifying probable suicide clusters in wales using national mortality data.

    Directory of Open Access Journals (Sweden)

    Phillip Jones

    Full Text Available Up to 2% of suicides in young people may occur in clusters i.e., close together in time and space. In early 2008 unprecedented attention was given by national and international news media to a suspected suicide cluster among young people living in Bridgend, Wales. This paper investigates the strength of statistical evidence for this apparent cluster, its size, and temporal and geographical limits.The analysis is based on official mortality statistics for Wales for 2000-2009 provided by the UK's Office for National Statistics (ONS. Temporo-spatial analysis was performed using Space Time Permutation Scan Statistics with SaTScan v9.1 for suicide deaths aged 15 and over, with a sub-group analysis focussing on cases aged 15-34 years. These analyses were conducted for deaths coded by ONS as: (i suicide or of undetermined intent (probable suicides and (ii for a combination of suicide, undetermined, and accidental poisoning and hanging (possible suicides. The temporo-spatial analysis did not identify any clusters of suicide or undetermined intent deaths (probable suicides. However, analysis of all deaths by suicide, undetermined intent, accidental poisoning and accidental hanging (possible suicides identified a temporo-spatial cluster (p = 0.029 involving 10 deaths amongst 15-34 year olds centred on the County Borough of Bridgend for the period 27(th December 2007 to 19(th February 2008. Less than 1% of possible suicides in younger people in Wales in the ten year period were identified as being cluster-related.There was a possible suicide cluster in young people in Bridgend between December 2007 and February 2008. This cluster was smaller, shorter in duration, and predominantly later than the phenomenon that was reported in national and international print media. Further investigation of factors leading to the onset and termination of this series of deaths, in particular the role of the media, is required.

  16. Statistical analysis of long term spatial and temporal trends of ...

    Indian Academy of Sciences (India)

    Statistical analysis of long term spatial and temporal trends of temperature ... CGCM3; HadCM3; modified Mann–Kendall test; statistical analysis; Sutlej basin. ... Water Resources Systems Division, National Institute of Hydrology, Roorkee 247 ...

  17. Clusters of galaxies as tools in observational cosmology : results from x-ray analysis

    International Nuclear Information System (INIS)

    Weratschnig, J.M.

    2009-01-01

    Clusters of galaxies are the largest gravitationally bound structures in the universe. They can be used as ideal tools to study large scale structure formation (e.g. when studying merger clusters) and provide highly interesting environments to analyse several characteristic interaction processes (like ram pressure stripping of galaxies, magnetic fields). In this dissertation thesis, we have studied several clusters of galaxies using X-ray observations. To obtain scientific results, we have applied different data reduction and analysis methods. With a combination of morphological and spectral analysis, the merger cluster Abell 514 was studied in much detail. It has a highly interesting morphology and shows signs for an ongoing merger as well as a shock. using a new method to detect substructure, we have analysed several clusters to determine whether any substructure is present in the X-ray image. This hints towards a real structure in the distribution of the intra-cluster medium (ICM) and is evidence for ongoing mergers. The results from this analysis are extensively used with the cluster of galaxies Abell S1136. Here, we study the ICM distribution and compare its structure with the spatial distribution of star forming galaxies. Cluster magnetic fields are another important topic of my thesis. They can be studied in Radio observations, which can be put into relation with results from X-ray observations. using observational data from several clusters, we could support the theory that cluster magnetic fields are frozen into the ICM. (author)

  18. Characterizing Heterogeneity within Head and Neck Lesions Using Cluster Analysis of Multi-Parametric MRI Data.

    Directory of Open Access Journals (Sweden)

    Marco Borri

    Full Text Available To describe a methodology, based on cluster analysis, to partition multi-parametric functional imaging data into groups (or clusters of similar functional characteristics, with the aim of characterizing functional heterogeneity within head and neck tumour volumes. To evaluate the performance of the proposed approach on a set of longitudinal MRI data, analysing the evolution of the obtained sub-sets with treatment.The cluster analysis workflow was applied to a combination of dynamic contrast-enhanced and diffusion-weighted imaging MRI data from a cohort of squamous cell carcinoma of the head and neck patients. Cumulative distributions of voxels, containing pre and post-treatment data and including both primary tumours and lymph nodes, were partitioned into k clusters (k = 2, 3 or 4. Principal component analysis and cluster validation were employed to investigate data composition and to independently determine the optimal number of clusters. The evolution of the resulting sub-regions with induction chemotherapy treatment was assessed relative to the number of clusters.The clustering algorithm was able to separate clusters which significantly reduced in voxel number following induction chemotherapy from clusters with a non-significant reduction. Partitioning with the optimal number of clusters (k = 4, determined with cluster validation, produced the best separation between reducing and non-reducing clusters.The proposed methodology was able to identify tumour sub-regions with distinct functional properties, independently separating clusters which were affected differently by treatment. This work demonstrates that unsupervised cluster analysis, with no prior knowledge of the data, can be employed to provide a multi-parametric characterization of functional heterogeneity within tumour volumes.

  19. Latent cluster analysis of ALS phenotypes identifies prognostically differing groups.

    Directory of Open Access Journals (Sweden)

    Jeban Ganesalingam

    2009-09-01

    Full Text Available Amyotrophic lateral sclerosis (ALS is a degenerative disease predominantly affecting motor neurons and manifesting as several different phenotypes. Whether these phenotypes correspond to different underlying disease processes is unknown. We used latent cluster analysis to identify groupings of clinical variables in an objective and unbiased way to improve phenotyping for clinical and research purposes.Latent class cluster analysis was applied to a large database consisting of 1467 records of people with ALS, using discrete variables which can be readily determined at the first clinic appointment. The model was tested for clinical relevance by survival analysis of the phenotypic groupings using the Kaplan-Meier method.The best model generated five distinct phenotypic classes that strongly predicted survival (p<0.0001. Eight variables were used for the latent class analysis, but a good estimate of the classification could be obtained using just two variables: site of first symptoms (bulbar or limb and time from symptom onset to diagnosis (p<0.00001.The five phenotypic classes identified using latent cluster analysis can predict prognosis. They could be used to stratify patients recruited into clinical trials and generating more homogeneous disease groups for genetic, proteomic and risk factor research.

  20. Global classification of human facial healthy skin using PLS discriminant analysis and clustering analysis.

    Science.gov (United States)

    Guinot, C; Latreille, J; Tenenhaus, M; Malvy, D J

    2001-04-01

    Today's classifications of healthy skin are predominantly based on a very limited number of skin characteristics, such as skin oiliness or susceptibility to sun exposure. The aim of the present analysis was to set up a global classification of healthy facial skin, using mathematical models. This classification is based on clinical, biophysical skin characteristics and self-reported information related to the skin, as well as the results of a theoretical skin classification assessed separately for the frontal and the malar zones of the face. In order to maximize the predictive power of the models with a minimum of variables, the Partial Least Square (PLS) discriminant analysis method was used. The resulting PLS components were subjected to clustering analyses to identify the plausible number of clusters and to group the individuals according to their proximities. Using this approach, four PLS components could be constructed and six clusters were found relevant. So, from the 36 hypothetical combinations of the theoretical skin types classification, we tended to a strengthened six classes proposal. Our data suggest that the association of the PLS discriminant analysis and the clustering methods leads to a valid and simple way to classify healthy human skin and represents a potentially useful tool for cosmetic and dermatological research.

  1. Acoustic and temporal analysis of speech: A potential biomarker for schizophrenia.

    LENUS (Irish Health Repository)

    Rapcan, Viliam

    2010-11-01

    Currently, there are no established objective biomarkers for the diagnosis or monitoring of schizophrenia. It has been previously reported that there are notable qualitative differences in the speech of schizophrenics. The objective of this study was to determine whether a quantitative acoustic and temporal analysis of speech may be a potential biomarker for schizophrenia. In this study, 39 schizophrenic patients and 18 controls were digitally recorded reading aloud an emotionally neutral text passage from a children\\'s story. Temporal, energy and vocal pitch features were automatically extracted from the recordings. A classifier based on linear discriminant analysis was employed to differentiate between controls and schizophrenic subjects. Processing the recordings with the algorithm developed demonstrated that it is possible to differentiate schizophrenic patients and controls with a classification accuracy of 79.4% (specificity=83.6%, sensitivity=75.2%) based on speech pause related parameters extracted from recordings carried out in standard office (non-studio) environments. Acoustic and temporal analysis of speech may represent a potential tool for the objective analysis in schizophrenia.

  2. Application of cluster analysis and unsupervised learning to multivariate tissue characterization

    International Nuclear Information System (INIS)

    Momenan, R.; Insana, M.F.; Wagner, R.F.; Garra, B.S.; Loew, M.H.

    1987-01-01

    This paper describes a procedure for classifying tissue types from unlabeled acoustic measurements (data type unknown) using unsupervised cluster analysis. These techniques are being applied to unsupervised ultrasonic image segmentation and tissue characterization. The performance of a new clustering technique is measured and compared with supervised methods, such as a linear Bayes classifier. In these comparisons two objectives are sought: a) How well does the clustering method group the data?; b) Do the clusters correspond to known tissue classes? The first question is investigated by a measure of cluster similarity and dispersion. The second question involves a comparison with a supervised technique using labeled data

  3. Clustering analysis for muon tomography data elaboration in the Muon Portal project

    Science.gov (United States)

    Bandieramonte, M.; Antonuccio-Delogu, V.; Becciani, U.; Costa, A.; La Rocca, P.; Massimino, P.; Petta, C.; Pistagna, C.; Riggi, F.; Riggi, S.; Sciacca, E.; Vitello, F.

    2015-05-01

    Clustering analysis is one of multivariate data analysis techniques which allows to gather statistical data units into groups, in order to minimize the logical distance within each group and to maximize the one between different groups. In these proceedings, the authors present a novel approach to the muontomography data analysis based on clustering algorithms. As a case study we present the Muon Portal project that aims to build and operate a dedicated particle detector for the inspection of harbor containers to hinder the smuggling of nuclear materials. Clustering techniques, working directly on scattering points, help to detect the presence of suspicious items inside the container, acting, as it will be shown, as a filter for a preliminary analysis of the data.

  4. Subtypes of autism by cluster analysis based on structural MRI data.

    Science.gov (United States)

    Hrdlicka, Michal; Dudova, Iva; Beranova, Irena; Lisy, Jiri; Belsan, Tomas; Neuwirth, Jiri; Komarek, Vladimir; Faladova, Ludvika; Havlovicova, Marketa; Sedlacek, Zdenek; Blatny, Marek; Urbanek, Tomas

    2005-05-01

    The aim of our study was to subcategorize Autistic Spectrum Disorders (ASD) using a multidisciplinary approach. Sixty four autistic patients (mean age 9.4+/-5.6 years) were entered into a cluster analysis. The clustering analysis was based on MRI data. The clusters obtained did not differ significantly in the overall severity of autistic symptomatology as measured by the total score on the Childhood Autism Rating Scale (CARS). The clusters could be characterized as showing significant differences: Cluster 1: showed the largest sizes of the genu and splenium of the corpus callosum (CC), the lowest pregnancy order and the lowest frequency of facial dysmorphic features. Cluster 2: showed the largest sizes of the amygdala and hippocampus (HPC), the least abnormal visual response on the CARS, the lowest frequency of epilepsy and the least frequent abnormal psychomotor development during the first year of life. Cluster 3: showed the largest sizes of the caput of the nucleus caudatus (NC), the smallest sizes of the HPC and facial dysmorphic features were always present. Cluster 4: showed the smallest sizes of the genu and splenium of the CC, as well as the amygdala, and caput of the NC, the most abnormal visual response on the CARS, the highest frequency of epilepsy, the highest pregnancy order, abnormal psychomotor development during the first year of life was always present and facial dysmorphic features were always present. This multidisciplinary approach seems to be a promising method for subtyping autism.

  5. Symptom Clusters in People Living with HIV Attending Five Palliative Care Facilities in Two Sub-Saharan African Countries: A Hierarchical Cluster Analysis.

    Science.gov (United States)

    Moens, Katrien; Siegert, Richard J; Taylor, Steve; Namisango, Eve; Harding, Richard

    2015-01-01

    Symptom research across conditions has historically focused on single symptoms, and the burden of multiple symptoms and their interactions has been relatively neglected especially in people living with HIV. Symptom cluster studies are required to set priorities in treatment planning, and to lessen the total symptom burden. This study aimed to identify and compare symptom clusters among people living with HIV attending five palliative care facilities in two sub-Saharan African countries. Data from cross-sectional self-report of seven-day symptom prevalence on the 32-item Memorial Symptom Assessment Scale-Short Form were used. A hierarchical cluster analysis was conducted using Ward's method applying squared Euclidean Distance as the similarity measure to determine the clusters. Contingency tables, X2 tests and ANOVA were used to compare the clusters by patient specific characteristics and distress scores. Among the sample (N=217) the mean age was 36.5 (SD 9.0), 73.2% were female, and 49.1% were on antiretroviral therapy (ART). The cluster analysis produced five symptom clusters identified as: 1) dermatological; 2) generalised anxiety and elimination; 3) social and image; 4) persistently present; and 5) a gastrointestinal-related symptom cluster. The patients in the first three symptom clusters reported the highest physical and psychological distress scores. Patient characteristics varied significantly across the five clusters by functional status (worst functional physical status in cluster one, ppeople living with HIV with longitudinally collected symptom data to test cluster stability and identify common symptom trajectories is recommended.

  6. Patterns of urban violent injury: a spatio-temporal analysis.

    Directory of Open Access Journals (Sweden)

    Michael Cusimano

    2010-01-01

    Full Text Available Injury related to violent acts is a problem in every society. Although some authors have examined the geography of violent crime, few have focused on the spatio-temporal patterns of violent injury and none have used an ambulance dataset to explore the spatial characteristics of injury. The purpose of this study was to describe the combined spatial and temporal characteristics of violent injury in a large urban centre.Using a geomatics framework and geographic information systems software, we studied 4,587 ambulance dispatches and 10,693 emergency room admissions for violent injury occurrences among adults (aged 18-64 in Toronto, Canada, during 2002 and 2004, using population-based datasets. We created kernel density and choropleth maps for 24-hour periods and four-hour daily time periods and compared location of ambulance dispatches and patient residences with local land use and socioeconomic characteristics. We used multivariate regressions to control for confounding factors. We found the locations of violent injury and the residence locations of those injured were both closely related to each other and clearly clustered in certain parts of the city characterised by high numbers of bars, social housing units, and homeless shelters, as well as lower household incomes. The night and early morning showed a distinctive peak in injuries and a shift in the location of injuries to a "nightlife" district. The locational pattern of patient residences remained unchanged during those times.Our results demonstrate that there is a distinctive spatio-temporal pattern in violent injury reflected in the ambulance data. People injured in this urban centre more commonly live in areas of social deprivation. During the day, locations of injury and locations of residences are similar. However, later at night, the injury location of highest density shifts to a "nightlife" district, whereas the residence locations of those most at risk of injury do not change.

  7. Spatio-temporal patterns of gun violence in Syracuse, New York 2009-2015.

    Science.gov (United States)

    Larsen, David A; Lane, Sandra; Jennings-Bey, Timothy; Haygood-El, Arnett; Brundage, Kim; Rubinstein, Robert A

    2017-01-01

    Gun violence in the United States of America is a large public health problem that disproportionately affects urban areas. The epidemiology of gun violence reflects various aspects of an infectious disease including spatial and temporal clustering. We examined the spatial and temporal trends of gun violence in Syracuse, New York, a city of 145,000. We used a spatial scan statistic to reveal spatio-temporal clusters of gunshots investigated and corroborated by Syracuse City Police Department for the years 2009-2015. We also examined predictors of areas with increased gun violence using a multi-level zero-inflated Poisson regression with data from the 2010 census. Two space-time clusters of gun violence were revealed in the city. Higher rates of segregation, poverty and the summer months were all associated with increased risk of gun violence. Previous gunshots in the area were associated with a 26.8% increase in the risk of gun violence. Gun violence in Syracuse, NY is both spatially and temporally stable, with some neighborhoods of the city greatly afflicted.

  8. SOMFlow: Guided Exploratory Cluster Analysis with Self-Organizing Maps and Analytic Provenance.

    Science.gov (United States)

    Sacha, Dominik; Kraus, Matthias; Bernard, Jurgen; Behrisch, Michael; Schreck, Tobias; Asano, Yuki; Keim, Daniel A

    2018-01-01

    Clustering is a core building block for data analysis, aiming to extract otherwise hidden structures and relations from raw datasets, such as particular groups that can be effectively related, compared, and interpreted. A plethora of visual-interactive cluster analysis techniques has been proposed to date, however, arriving at useful clusterings often requires several rounds of user interactions to fine-tune the data preprocessing and algorithms. We present a multi-stage Visual Analytics (VA) approach for iterative cluster refinement together with an implementation (SOMFlow) that uses Self-Organizing Maps (SOM) to analyze time series data. It supports exploration by offering the analyst a visual platform to analyze intermediate results, adapt the underlying computations, iteratively partition the data, and to reflect previous analytical activities. The history of previous decisions is explicitly visualized within a flow graph, allowing to compare earlier cluster refinements and to explore relations. We further leverage quality and interestingness measures to guide the analyst in the discovery of useful patterns, relations, and data partitions. We conducted two pair analytics experiments together with a subject matter expert in speech intonation research to demonstrate that the approach is effective for interactive data analysis, supporting enhanced understanding of clustering results as well as the interactive process itself.

  9. Assessment of Random Assignment in Training and Test Sets using Generalized Cluster Analysis Technique

    Directory of Open Access Journals (Sweden)

    Sorana D. BOLBOACĂ

    2011-06-01

    Full Text Available Aim: The properness of random assignment of compounds in training and validation sets was assessed using the generalized cluster technique. Material and Method: A quantitative Structure-Activity Relationship model using Molecular Descriptors Family on Vertices was evaluated in terms of assignment of carboquinone derivatives in training and test sets during the leave-many-out analysis. Assignment of compounds was investigated using five variables: observed anticancer activity and four structure descriptors. Generalized cluster analysis with K-means algorithm was applied in order to investigate if the assignment of compounds was or not proper. The Euclidian distance and maximization of the initial distance using a cross-validation with a v-fold of 10 was applied. Results: All five variables included in analysis proved to have statistically significant contribution in identification of clusters. Three clusters were identified, each of them containing both carboquinone derivatives belonging to training as well as to test sets. The observed activity of carboquinone derivatives proved to be normal distributed on every. The presence of training and test sets in all clusters identified using generalized cluster analysis with K-means algorithm and the distribution of observed activity within clusters sustain a proper assignment of compounds in training and test set. Conclusion: Generalized cluster analysis using the K-means algorithm proved to be a valid method in assessment of random assignment of carboquinone derivatives in training and test sets.

  10. Cluster analysis in severe emphysema subjects using phenotype and genotype data: an exploratory investigation

    Directory of Open Access Journals (Sweden)

    Martinez Fernando J

    2010-03-01

    Full Text Available Abstract Background Numerous studies have demonstrated associations between genetic markers and COPD, but results have been inconsistent. One reason may be heterogeneity in disease definition. Unsupervised learning approaches may assist in understanding disease heterogeneity. Methods We selected 31 phenotypic variables and 12 SNPs from five candidate genes in 308 subjects in the National Emphysema Treatment Trial (NETT Genetics Ancillary Study cohort. We used factor analysis to select a subset of phenotypic variables, and then used cluster analysis to identify subtypes of severe emphysema. We examined the phenotypic and genotypic characteristics of each cluster. Results We identified six factors accounting for 75% of the shared variability among our initial phenotypic variables. We selected four phenotypic variables from these factors for cluster analysis: 1 post-bronchodilator FEV1 percent predicted, 2 percent bronchodilator responsiveness, and quantitative CT measurements of 3 apical emphysema and 4 airway wall thickness. K-means cluster analysis revealed four clusters, though separation between clusters was modest: 1 emphysema predominant, 2 bronchodilator responsive, with higher FEV1; 3 discordant, with a lower FEV1 despite less severe emphysema and lower airway wall thickness, and 4 airway predominant. Of the genotypes examined, membership in cluster 1 (emphysema-predominant was associated with TGFB1 SNP rs1800470. Conclusions Cluster analysis may identify meaningful disease subtypes and/or groups of related phenotypic variables even in a highly selected group of severe emphysema subjects, and may be useful for genetic association studies.

  11. Transcriptional analysis of exopolysaccharides biosynthesis gene clusters in Lactobacillus plantarum.

    Science.gov (United States)

    Vastano, Valeria; Perrone, Filomena; Marasco, Rosangela; Sacco, Margherita; Muscariello, Lidia

    2016-04-01

    Exopolysaccharides (EPS) from lactic acid bacteria contribute to specific rheology and texture of fermented milk products and find applications also in non-dairy foods and in therapeutics. Recently, four clusters of genes (cps) associated with surface polysaccharide production have been identified in Lactobacillus plantarum WCFS1, a probiotic and food-associated lactobacillus. These clusters are involved in cell surface architecture and probably in release and/or exposure of immunomodulating bacterial molecules. Here we show a transcriptional analysis of these clusters. Indeed, RT-PCR experiments revealed that the cps loci are organized in five operons. Moreover, by reverse transcription-qPCR analysis performed on L. plantarum WCFS1 (wild type) and WCFS1-2 (ΔccpA), we demonstrated that expression of three cps clusters is under the control of the global regulator CcpA. These results, together with the identification of putative CcpA target sequences (catabolite responsive element CRE) in the regulatory region of four out of five transcriptional units, strongly suggest for the first time a role of the master regulator CcpA in EPS gene transcription among lactobacilli.

  12. Annotating spatio-temporal datasets for meaningful analysis in the Web

    Science.gov (United States)

    Stasch, Christoph; Pebesma, Edzer; Scheider, Simon

    2014-05-01

    More and more environmental datasets that vary in space and time are available in the Web. This comes along with an advantage of using the data for other purposes than originally foreseen, but also with the danger that users may apply inappropriate analysis procedures due to lack of important assumptions made during the data collection process. In order to guide towards a meaningful (statistical) analysis of spatio-temporal datasets available in the Web, we have developed a Higher-Order-Logic formalism that captures some relevant assumptions in our previous work [1]. It allows to proof on meaningful spatial prediction and aggregation in a semi-automated fashion. In this poster presentation, we will present a concept for annotating spatio-temporal datasets available in the Web with concepts defined in our formalism. Therefore, we have defined a subset of the formalism as a Web Ontology Language (OWL) pattern. It allows capturing the distinction between the different spatio-temporal variable types, i.e. point patterns, fields, lattices and trajectories, that in turn determine whether a particular dataset can be interpolated or aggregated in a meaningful way using a certain procedure. The actual annotations that link spatio-temporal datasets with the concepts in the ontology pattern are provided as Linked Data. In order to allow data producers to add the annotations to their datasets, we have implemented a Web portal that uses a triple store at the backend to store the annotations and to make them available in the Linked Data cloud. Furthermore, we have implemented functions in the statistical environment R to retrieve the RDF annotations and, based on these annotations, to support a stronger typing of spatio-temporal datatypes guiding towards a meaningful analysis in R. [1] Stasch, C., Scheider, S., Pebesma, E., Kuhn, W. (2014): "Meaningful spatial prediction and aggregation", Environmental Modelling & Software, 51, 149-165.

  13. Cluster Analysis of Maize Inbred Lines

    Directory of Open Access Journals (Sweden)

    Jiban Shrestha

    2016-12-01

    Full Text Available The determination of diversity among inbred lines is important for heterosis breeding. Sixty maize inbred lines were evaluated for their eight agro morphological traits during winter season of 2011 to analyze their genetic diversity. Clustering was done by average linkage method. The inbred lines were grouped into six clusters. Inbred lines grouped into Clusters II had taller plants with maximum number of leaves. The cluster III was characterized with shorter plants with minimum number of leaves. The inbred lines categorized into cluster V had early flowering whereas the group into cluster VI had late flowering time. The inbred lines grouped into the cluster III were characterized by higher value of anthesis silking interval (ASI and those of cluster VI had lower value of ASI. These results showed that the inbred lines having widely divergent clusters can be utilized in hybrid breeding programme.

  14. [Principal component analysis and cluster analysis of inorganic elements in sea cucumber Apostichopus japonicus].

    Science.gov (United States)

    Liu, Xiao-Fang; Xue, Chang-Hu; Wang, Yu-Ming; Li, Zhao-Jie; Xue, Yong; Xu, Jie

    2011-11-01

    The present study is to investigate the feasibility of multi-elements analysis in determination of the geographical origin of sea cucumber Apostichopus japonicus, and to make choice of the effective tracers in sea cucumber Apostichopus japonicus geographical origin assessment. The content of the elements such as Al, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, As, Se, Mo, Cd, Hg and Pb in sea cucumber Apostichopus japonicus samples from seven places of geographical origin were determined by means of ICP-MS. The results were used for the development of elements database. Cluster analysis(CA) and principal component analysis (PCA) were applied to differentiate the sea cucumber Apostichopus japonicus geographical origin. Three principal components which accounted for over 89% of the total variance were extracted from the standardized data. The results of Q-type cluster analysis showed that the 26 samples could be clustered reasonably into five groups, the classification results were significantly associated with the marine distribution of the sea cucumber Apostichopus japonicus samples. The CA and PCA were the effective methods for elements analysis of sea cucumber Apostichopus japonicus samples. The content of the mineral elements in sea cucumber Apostichopus japonicus samples was good chemical descriptors for differentiating their geographical origins.

  15. Statistical Techniques Applied to Aerial Radiometric Surveys (STAARS): cluster analysis. National Uranium Resource Evaluation

    International Nuclear Information System (INIS)

    Pirkle, F.L.; Stablein, N.K.; Howell, J.A.; Wecksung, G.W.; Duran, B.S.

    1982-11-01

    One objective of the aerial radiometric surveys flown as part of the US Department of Energy's National Uranium Resource Evaluation (NURE) program was to ascertain the regional distribution of near-surface radioelement abundances. Some method for identifying groups of observations with similar radioelement values was therefore required. It is shown in this report that cluster analysis can identify such groups even when no a priori knowledge of the geology of an area exists. A method of convergent k-means cluster analysis coupled with a hierarchical cluster analysis is used to classify 6991 observations (three radiometric variables at each observation location) from the Precambrian rocks of the Copper Mountain, Wyoming, area. Another method, one that combines a principal components analysis with a convergent k-means analysis, is applied to the same data. These two methods are compared with a convergent k-means analysis that utilizes available geologic knowledge. All three methods identify four clusters. Three of the clusters represent background values for the Precambrian rocks of the area, and one represents outliers (anomalously high 214 Bi). A segmentation of the data corresponding to geologic reality as discovered by other methods has been achieved based solely on analysis of aerial radiometric data. The techniques employed are composites of classical clustering methods designed to handle the special problems presented by large data sets. 20 figures, 7 tables

  16. GPI-anchored proteins are confined in subdiffraction clusters at the apical surface of polarized epithelial cells.

    Science.gov (United States)

    Paladino, Simona; Lebreton, Stéphanie; Lelek, Mickaël; Riccio, Patrizia; De Nicola, Sergio; Zimmer, Christophe; Zurzolo, Chiara

    2017-12-01

    Spatio-temporal compartmentalization of membrane proteins is critical for the regulation of diverse vital functions in eukaryotic cells. It was previously shown that, at the apical surface of polarized MDCK cells, glycosylphosphatidylinositol (GPI)-anchored proteins (GPI-APs) are organized in small cholesterol-independent clusters of single GPI-AP species (homoclusters), which are required for the formation of larger cholesterol-dependent clusters formed by multiple GPI-AP species (heteroclusters). This clustered organization is crucial for the biological activities of GPI-APs; hence, understanding the spatio-temporal properties of their membrane organization is of fundamental importance. Here, by using direct stochastic optical reconstruction microscopy coupled to pair correlation analysis (pc-STORM), we were able to visualize and measure the size of these clusters. Specifically, we show that they are non-randomly distributed and have an average size of 67 nm. We also demonstrated that polarized MDCK and non-polarized CHO cells have similar cluster distribution and size, but different sensitivity to cholesterol depletion. Finally, we derived a model that allowed a quantitative characterization of the cluster organization of GPI-APs at the apical surface of polarized MDCK cells for the first time. Experimental FRET (fluorescence resonance energy transfer)/FLIM (fluorescence-lifetime imaging microscopy) data were correlated to the theoretical predictions of the model. © 2017 The Author(s).

  17. Induction of nitrate tolerance is not a useful treatment in cluster headache

    DEFF Research Database (Denmark)

    Christiansen, I; Iversen, Helle Klingenberg; Olesen, J

    2000-01-01

    UNLABELLED: The aims of the present study were to investigate whether induction of nitrate tolerance is a useful treatment in cluster headache and to correlate any changes in attack frequency of cluster headache and nitrate-induced headache to the vascular adaptation during continuous nitrate...... attacks and interval headaches over time. RESULTS: Tolerance was complete within 24 h in the middle cerebral arteries and after 7 days in the symptomatic temporal artery, while tolerance of the radial artery was not observed within this period. The time profiles of tolerance were almost identical...... to the time profiles observed in healthy subjects. A close temporal association between the disappearance of nitrate-induced headache and tolerance of the temporal artery was observed but tolerance had no effect on cluster headache attack frequency. CONCLUSIONS: Induction of tolerance to nitrates cannot...

  18. Temporality in British young women's magazines: food, cooking and weight loss.

    Science.gov (United States)

    Spencer, Rosemary J; Russell, Jean M; Barker, Margo E

    2014-10-01

    The present study examines seasonal and temporal patterns in food-related content of two UK magazines for young women focusing on food types, cooking and weight loss. Content analysis of magazines from three time blocks between 1999 and 2011. Desk-based study. Ninety-seven magazines yielding 590 advertisements and 148 articles. Cluster analysis of type of food advertising produced three clusters of magazines, which reflected recognised food behaviours of young women: vegetarianism, convenience eating and weight control. The first cluster of magazines was associated with Christmas and Millennium time periods, with advertising of alcohol, coffee, cheese, vegetarian meat substitutes and weight-loss pills. Recipes were prominent in article content and tended to be for cakes/desserts, luxury meals and party food. The second cluster was associated with summer months and 2010 issues. There was little advertising for conventional foods in cluster 2, but strong representation of diet plans and foods for weight loss. Weight-loss messages in articles focused on short-term aesthetic goals, emphasising speedy weight loss without giving up nice foods or exercising. Cluster 3 magazines were associated with post-New Year and 2005 periods. Food advertising was for everyday foods and convenience products, with fewer weight-loss products than other clusters; conversely, article content had a greater prevalence of weight-loss messages. The cyclical nature of magazine content - indulgence and excess encouraged at Christmas, restraint recommended post-New Year and severe dieting advocated in the summer months - endorses yo-yo dieting behaviour and may not be conducive to public health.

  19. Spatio-Temporal Variation and Prediction of Ischemic Heart Disease Hospitalizations in Shenzhen, China

    Directory of Open Access Journals (Sweden)

    Yanxia Wang

    2014-05-01

    Full Text Available Ischemic heart disease (IHD is a leading cause of death worldwide. Urban public health and medical management in Shenzhen, an international city in the developing country of China, is challenged by an increasing burden of IHD. This study analyzed the spatio-temporal variation of IHD hospital admissions from 2003 to 2012 utilizing spatial statistics, spatial analysis, and space-time scan statistics. The spatial statistics and spatial analysis measured the incidence rate (hospital admissions per 1,000 residents and the standardized rate (the observed cases standardized by the expected cases of IHD at the district level to determine the spatio-temporal distribution and identify patterns of change. The space-time scan statistics was used to identify spatio-temporal clusters of IHD hospital admissions at the district level. The other objective of this study was to forecast the IHD hospital admissions over the next three years (2013–2015 to predict the IHD incidence rates and the varying burdens of IHD-related medical services among the districts in Shenzhen. The results show that the highest hospital admissions, incidence rates, and standardized rates of IHD are in Futian. From 2003 to 2012, the IHD hospital admissions exhibited similar mean centers and directional distributions, with a slight increase in admissions toward the north in accordance with the movement of the total population. The incidence rates of IHD exhibited a gradual increase from 2003 to 2012 for all districts in Shenzhen, which may be the result of the rapid development of the economy and the increasing traffic pollution. In addition, some neighboring areas exhibited similar temporal change patterns, which were also detected by the spatio-temporal cluster analysis. Futian and Dapeng would have the highest and the lowest hospital admissions, respectively, although these districts have the highest incidence rates among all of the districts from 2013 to 2015 based on the prediction

  20. PARALLEL SPATIOTEMPORAL SPECTRAL CLUSTERING WITH MASSIVE TRAJECTORY DATA

    Directory of Open Access Journals (Sweden)

    Y. Z. Gu

    2017-09-01

    Full Text Available Massive trajectory data contains wealth useful information and knowledge. Spectral clustering, which has been shown to be effective in finding clusters, becomes an important clustering approaches in the trajectory data mining. However, the traditional spectral clustering lacks the temporal expansion on the algorithm and limited in its applicability to large-scale problems due to its high computational complexity. This paper presents a parallel spatiotemporal spectral clustering based on multiple acceleration solutions to make the algorithm more effective and efficient, the performance is proved due to the experiment carried out on the massive taxi trajectory dataset in Wuhan city, China.

  1. Geographical, temporal and racial disparities in late-stage prostate cancer incidence across Florida: A multiscale joinpoint regression analysis

    Directory of Open Access Journals (Sweden)

    Goovaerts Pierre

    2011-12-01

    Full Text Available Abstract Background Although prostate cancer-related incidence and mortality have declined recently, striking racial/ethnic differences persist in the United States. Visualizing and modelling temporal trends of prostate cancer late-stage incidence, and how they vary according to geographic locations and race, should help explaining such disparities. Joinpoint regression is increasingly used to identify the timing and extent of changes in time series of health outcomes. Yet, most analyses of temporal trends are aspatial and conducted at the national level or for a single cancer registry. Methods Time series (1981-2007 of annual proportions of prostate cancer late-stage cases were analyzed for non-Hispanic Whites and non-Hispanic Blacks in each county of Florida. Noise in the data was first filtered by binomial kriging and results were modelled using joinpoint regression. A similar analysis was also conducted at the state level and for groups of metropolitan and non-metropolitan counties. Significant racial differences were detected using tests of parallelism and coincidence of time trends. A new disparity statistic was introduced to measure spatial and temporal changes in the frequency of racial disparities. Results State-level percentage of late-stage diagnosis decreased 50% since 1981; a decline that accelerated in the 90's when Prostate Specific Antigen (PSA screening was introduced. Analysis at the metropolitan and non-metropolitan levels revealed that the frequency of late-stage diagnosis increased recently in urban areas, and this trend was significant for white males. The annual rate of decrease in late-stage diagnosis and the onset years for significant declines varied greatly among counties and racial groups. Most counties with non-significant average annual percent change (AAPC were located in the Florida Panhandle for white males, whereas they clustered in South-eastern Florida for black males. The new disparity statistic indicated

  2. Geographical, temporal and racial disparities in late-stage prostate cancer incidence across Florida: a multiscale joinpoint regression analysis.

    Science.gov (United States)

    Goovaerts, Pierre; Xiao, Hong

    2011-12-05

    Although prostate cancer-related incidence and mortality have declined recently, striking racial/ethnic differences persist in the United States. Visualizing and modelling temporal trends of prostate cancer late-stage incidence, and how they vary according to geographic locations and race, should help explaining such disparities. Joinpoint regression is increasingly used to identify the timing and extent of changes in time series of health outcomes. Yet, most analyses of temporal trends are aspatial and conducted at the national level or for a single cancer registry. Time series (1981-2007) of annual proportions of prostate cancer late-stage cases were analyzed for non-Hispanic Whites and non-Hispanic Blacks in each county of Florida. Noise in the data was first filtered by binomial kriging and results were modelled using joinpoint regression. A similar analysis was also conducted at the state level and for groups of metropolitan and non-metropolitan counties. Significant racial differences were detected using tests of parallelism and coincidence of time trends. A new disparity statistic was introduced to measure spatial and temporal changes in the frequency of racial disparities. State-level percentage of late-stage diagnosis decreased 50% since 1981; a decline that accelerated in the 90's when Prostate Specific Antigen (PSA) screening was introduced. Analysis at the metropolitan and non-metropolitan levels revealed that the frequency of late-stage diagnosis increased recently in urban areas, and this trend was significant for white males. The annual rate of decrease in late-stage diagnosis and the onset years for significant declines varied greatly among counties and racial groups. Most counties with non-significant average annual percent change (AAPC) were located in the Florida Panhandle for white males, whereas they clustered in South-eastern Florida for black males. The new disparity statistic indicated that the spatial extent of racial disparities reached a

  3. Performance Evaluation of Hadoop-based Large-scale Network Traffic Analysis Cluster

    Directory of Open Access Journals (Sweden)

    Tao Ran

    2016-01-01

    Full Text Available As Hadoop has gained popularity in big data era, it is widely used in various fields. The self-design and self-developed large-scale network traffic analysis cluster works well based on Hadoop, with off-line applications running on it to analyze the massive network traffic data. On purpose of scientifically and reasonably evaluating the performance of analysis cluster, we propose a performance evaluation system. Firstly, we set the execution times of three benchmark applications as the benchmark of the performance, and pick 40 metrics of customized statistical resource data. Then we identify the relationship between the resource data and the execution times by a statistic modeling analysis approach, which is composed of principal component analysis and multiple linear regression. After training models by historical data, we can predict the execution times by current resource data. Finally, we evaluate the performance of analysis cluster by the validated predicting of execution times. Experimental results show that the predicted execution times by trained models are within acceptable error range, and the evaluation results of performance are accurate and reliable.

  4. Development and optimization of SPECT gated blood pool cluster analysis for the prediction of CRT outcome

    Energy Technology Data Exchange (ETDEWEB)

    Lalonde, Michel, E-mail: mlalonde15@rogers.com; Wassenaar, Richard [Department of Physics, Carleton University, Ottawa, Ontario K1S 5B6 (Canada); Wells, R. Glenn; Birnie, David; Ruddy, Terrence D. [Division of Cardiology, University of Ottawa Heart Institute, Ottawa, Ontario K1Y 4W7 (Canada)

    2014-07-15

    Purpose: Phase analysis of single photon emission computed tomography (SPECT) radionuclide angiography (RNA) has been investigated for its potential to predict the outcome of cardiac resynchronization therapy (CRT). However, phase analysis may be limited in its potential at predicting CRT outcome as valuable information may be lost by assuming that time-activity curves (TAC) follow a simple sinusoidal shape. A new method, cluster analysis, is proposed which directly evaluates the TACs and may lead to a better understanding of dyssynchrony patterns and CRT outcome. Cluster analysis algorithms were developed and optimized to maximize their ability to predict CRT response. Methods: About 49 patients (N = 27 ischemic etiology) received a SPECT RNA scan as well as positron emission tomography (PET) perfusion and viability scans prior to undergoing CRT. A semiautomated algorithm sampled the left ventricle wall to produce 568 TACs from SPECT RNA data. The TACs were then subjected to two different cluster analysis techniques, K-means, and normal average, where several input metrics were also varied to determine the optimal settings for the prediction of CRT outcome. Each TAC was assigned to a cluster group based on the comparison criteria and global and segmental cluster size and scores were used as measures of dyssynchrony and used to predict response to CRT. A repeated random twofold cross-validation technique was used to train and validate the cluster algorithm. Receiver operating characteristic (ROC) analysis was used to calculate the area under the curve (AUC) and compare results to those obtained for SPECT RNA phase analysis and PET scar size analysis methods. Results: Using the normal average cluster analysis approach, the septal wall produced statistically significant results for predicting CRT results in the ischemic population (ROC AUC = 0.73;p < 0.05 vs. equal chance ROC AUC = 0.50) with an optimal operating point of 71% sensitivity and 60% specificity. Cluster

  5. Development and optimization of SPECT gated blood pool cluster analysis for the prediction of CRT outcome

    International Nuclear Information System (INIS)

    Lalonde, Michel; Wassenaar, Richard; Wells, R. Glenn; Birnie, David; Ruddy, Terrence D.

    2014-01-01

    Purpose: Phase analysis of single photon emission computed tomography (SPECT) radionuclide angiography (RNA) has been investigated for its potential to predict the outcome of cardiac resynchronization therapy (CRT). However, phase analysis may be limited in its potential at predicting CRT outcome as valuable information may be lost by assuming that time-activity curves (TAC) follow a simple sinusoidal shape. A new method, cluster analysis, is proposed which directly evaluates the TACs and may lead to a better understanding of dyssynchrony patterns and CRT outcome. Cluster analysis algorithms were developed and optimized to maximize their ability to predict CRT response. Methods: About 49 patients (N = 27 ischemic etiology) received a SPECT RNA scan as well as positron emission tomography (PET) perfusion and viability scans prior to undergoing CRT. A semiautomated algorithm sampled the left ventricle wall to produce 568 TACs from SPECT RNA data. The TACs were then subjected to two different cluster analysis techniques, K-means, and normal average, where several input metrics were also varied to determine the optimal settings for the prediction of CRT outcome. Each TAC was assigned to a cluster group based on the comparison criteria and global and segmental cluster size and scores were used as measures of dyssynchrony and used to predict response to CRT. A repeated random twofold cross-validation technique was used to train and validate the cluster algorithm. Receiver operating characteristic (ROC) analysis was used to calculate the area under the curve (AUC) and compare results to those obtained for SPECT RNA phase analysis and PET scar size analysis methods. Results: Using the normal average cluster analysis approach, the septal wall produced statistically significant results for predicting CRT results in the ischemic population (ROC AUC = 0.73;p < 0.05 vs. equal chance ROC AUC = 0.50) with an optimal operating point of 71% sensitivity and 60% specificity. Cluster

  6. Statistical indicators of collective behavior and functional clusters in gene networks of yeast

    Science.gov (United States)

    Živković, J.; Tadić, B.; Wick, N.; Thurner, S.

    2006-03-01

    We analyze gene expression time-series data of yeast (S. cerevisiae) measured along two full cell-cycles. We quantify these data by using q-exponentials, gene expression ranking and a temporal mean-variance analysis. We construct gene interaction networks based on correlation coefficients and study the formation of the corresponding giant components and minimum spanning trees. By coloring genes according to their cell function we find functional clusters in the correlation networks and functional branches in the associated trees. Our results suggest that a percolation point of functional clusters can be identified on these gene expression correlation networks.

  7. Analysis of RXTE data on Clusters of Galaxies

    Science.gov (United States)

    Petrosian, Vahe

    2004-01-01

    This grant provided support for the reduction, analysis and interpretation of of hard X-ray (HXR, for short) observations of the cluster of galaxies RXJO658--5557 scheduled for the week of August 23, 2002 under the RXTE Cycle 7 program (PI Vahe Petrosian, Obs. ID 70165). The goal of the observation was to search for and characterize the shape of the HXR component beyond the well established thermal soft X-ray (SXR) component. Such hard components have been detected in several nearby clusters. distant cluster would provide information on the characteristics of this radiation at a different epoch in the evolution of the imiverse and shed light on its origin. We (Petrosian, 2001) have argued that thermal bremsstrahlung, as proposed earlier, cannot be the mechanism for the production of the HXRs and that the most likely mechanism is Compton upscattering of the cosmic microwave radiation by relativistic electrons which are known to be present in the clusters and be responsible for the observed radio emission. Based on this picture we estimated that this cluster, in spite of its relatively large distance, will have HXR signal comparable to the other nearby ones. The planned observation of a relatively The proposed RXTE observations were carried out and the data have been analyzed. We detect a hard X-ray tail in the spectrum of this cluster with a flux very nearly equal to our predicted value. This has strengthen the case for the Compton scattering model. We intend the data obtained via this observation to be a part of a larger data set. We have identified other clusters of galaxies (in archival RXTE and other instrument data sets) with sufficiently high quality data where we can search for and measure (or at least put meaningful limits) on the strength of the hard component. With these studies we expect to clarify the mechanism for acceleration of particles in the intercluster medium and provide guidance for future observations of this intriguing phenomenon by instrument

  8. A temporal analysis of the spatial clustering of food outlets around schools in Christchurch, New Zealand, 1966 to 2006.

    Science.gov (United States)

    Day, Peter L; Pearce, Jamie R; Pearson, Amber L

    2015-01-01

    To explore changes in urban food environments near schools, as potential contributors to the rising prevalence of overweight and obesity among children. Addresses of food premises and schools in 1966, 1976, 1986, 1996 and 2006 were geo-coded. For each year, the number and proportion of outlets by category (supermarket/grocery; convenience; fast-food outlet) within 800 m of schools were calculated. The degree of spatial clustering of outlets was assessed using a bivariate K-function analysis. Food outlet categories, school level and school social deprivation quintiles were compared. Christchurch, New Zealand. All schools and food outlets at 10-year snapshots from 1966 to 2006. Between 1966 and 2006, the median number of supermarkets/grocery stores within 800 m of schools decreased from 5 to 1, convenience stores decreased from 2 to 1, and fast-food outlets increased from 1 to 4. The ratio of fast-food outlets to total outlets increased from 0·10 to 0·67. The clustering of fast-food outlets was greatest within 800 m of schools and around the most socially deprived schools. Over the 40-year study period, school food environments in Christchurch can be characterized by increased densities of fast-food outlets within walking distance of schools, especially around the most deprived schools. Since the 1960s, there have been substantial changes to the food environments around schools which may increasingly facilitate away-from-home food consumption for children and provide easily accessible, cheap energy-dense foods, a recognized contributor to the rise in prevalence of overweight and obesity among young people.

  9. k-t PCA: temporally constrained k-t BLAST reconstruction using principal component analysis

    DEFF Research Database (Denmark)

    Pedersen, Henrik; Kozerke, Sebastian; Ringgaard, Steffen

    2009-01-01

    in applications exhibiting a broad range of temporal frequencies such as free-breathing myocardial perfusion imaging. We show that temporal basis functions calculated by subjecting the training data to principal component analysis (PCA) can be used to constrain the reconstruction such that the temporal resolution...... is improved. The presented method is called k-t PCA....

  10. Profitability and efficiency of Italian utilities: cluster analysis of financial statement ratios

    International Nuclear Information System (INIS)

    Linares, E.

    2008-01-01

    The last ten years have witnessed conspicuous changes in European and Italian regulation of public utility services and in the strategies of the major players in these fields. In response to these changes Italian utilities have made a variety of choices regarding size, presence in more or less capital-intensive stages of different value chains, and diversification. These choices have been implemented both through internal growth and by means of mergers and acquisitions. In this context it is interesting to try to establish whether there is a nexus between these choices and the performance of Italian utilities in terms of profitability and efficiency. Therefore statistical multivariate analysis techniques (cluster analysis and factor analysis) have been applied to several ratios obtained from the 2005 financial statement of 34 utilities. First, a hierarchical cluster analysis method has been applied to financial statement data in order to identify homogeneous groups based on several indicators of the incidence of costs (external costs, personnel costs, depreciation and amortization), profitability (return on sales, return on assets, return on equity) and efficiency (in the utilization of personnel, of total assets, of property, plant and equipment). Five clusters have been found. Then the clusters have been characterized in terms of the aforementioned indicators, the presence in different stages of the energy value chains (electricity and gas) and other descriptive variables (such as turnover, number of employees, assets, percentage of property, plant and equipment on total assets, sales revenues from electricity, gas, water supply and sanitation, waste collection and treatment and other services). In a second round cluster analysis has been preceded by factor analysis, in order to find a smaller set of variables. This procedure has revealed three not directly observable factors that can be interpreted as follows: i) efficiency in ordinary and financial management

  11. Eating or meeting? Cluster analysis reveals intricacies of white shark (Carcharodon carcharias migration and offshore behavior.

    Directory of Open Access Journals (Sweden)

    Salvador J Jorgensen

    Full Text Available Elucidating how mobile ocean predators utilize the pelagic environment is vital to understanding the dynamics of oceanic species and ecosystems. Pop-up archival transmitting (PAT tags have emerged as an important tool to describe animal migrations in oceanic environments where direct observation is not feasible. Available PAT tag data, however, are for the most part limited to geographic position, swimming depth and environmental temperature, making effective behavioral observation challenging. However, novel analysis approaches have the potential to extend the interpretive power of these limited observations. Here we developed an approach based on clustering analysis of PAT daily time-at-depth histogram records to distinguish behavioral modes in white sharks (Carcharodon carcharias. We found four dominant and distinctive behavioral clusters matching previously described behavioral patterns, including two distinctive offshore diving modes. Once validated, we mapped behavior mode occurrence in space and time. Our results demonstrate spatial, temporal and sex-based structure in the diving behavior of white sharks in the northeastern Pacific previously unrecognized including behavioral and migratory patterns resembling those of species with lek mating systems. We discuss our findings, in combination with available life history and environmental data, and propose specific testable hypotheses to distinguish between mating and foraging in northeastern Pacific white sharks that can provide a framework for future work. Our methodology can be applied to similar datasets from other species to further define behaviors during unobservable phases.

  12. GRASS GIS: The first Open Source Temporal GIS

    Science.gov (United States)

    Gebbert, Sören; Leppelt, Thomas

    2015-04-01

    over temporal aggregation, temporal accumulation, spatio-temporal statistics, spatio-temporal sampling, temporal algebra, temporal topology analysis, time series animation and temporal topology visualization to time series import and export capabilities with support for NetCDF and VTK data formats. We will present several temporal modules that support parallel processing of raster and 3D raster time series. [1] GRASS GIS Open Source Approaches in Spatial Data Handling In Open Source Approaches in Spatial Data Handling, Vol. 2 (2008), pp. 171-199, doi:10.1007/978-3-540-74831-19 by M. Neteler, D. Beaudette, P. Cavallini, L. Lami, J. Cepicky edited by G. Brent Hall, Michael G. Leahy [2] Gebbert, S., Pebesma, E., 2014. A temporal GIS for field based environmental modeling. Environ. Model. Softw. 53, 1-12. [3] Zambelli, P., Gebbert, S., Ciolli, M., 2013. Pygrass: An Object Oriented Python Application Programming Interface (API) for Geographic Resources Analysis Support System (GRASS) Geographic Information System (GIS). ISPRS Intl Journal of Geo-Information 2, 201-219. [4] Löwe, P., Klump, J., Thaler, J. (2012): The FOSS GIS Workbench on the GFZ Load Sharing Facility compute cluster, (Geophysical Research Abstracts Vol. 14, EGU2012-4491, 2012), General Assembly European Geosciences Union (Vienna, Austria 2012). [5] Akhter, S., Aida, K., Chemin, Y., 2010. "GRASS GIS on High Performance Computing with MPI, OpenMP and Ninf-G Programming Framework". ISPRS Conference, Kyoto, 9-12 August 2010

  13. A formal concept analysis approach to consensus clustering of multi-experiment expression data

    Science.gov (United States)

    2014-01-01

    Background Presently, with the increasing number and complexity of available gene expression datasets, the combination of data from multiple microarray studies addressing a similar biological question is gaining importance. The analysis and integration of multiple datasets are expected to yield more reliable and robust results since they are based on a larger number of samples and the effects of the individual study-specific biases are diminished. This is supported by recent studies suggesting that important biological signals are often preserved or enhanced by multiple experiments. An approach to combining data from different experiments is the aggregation of their clusterings into a consensus or representative clustering solution which increases the confidence in the common features of all the datasets and reveals the important differences among them. Results We propose a novel generic consensus clustering technique that applies Formal Concept Analysis (FCA) approach for the consolidation and analysis of clustering solutions derived from several microarray datasets. These datasets are initially divided into groups of related experiments with respect to a predefined criterion. Subsequently, a consensus clustering algorithm is applied to each group resulting in a clustering solution per group. These solutions are pooled together and further analysed by employing FCA which allows extracting valuable insights from the data and generating a gene partition over all the experiments. In order to validate the FCA-enhanced approach two consensus clustering algorithms are adapted to incorporate the FCA analysis. Their performance is evaluated on gene expression data from multi-experiment study examining the global cell-cycle control of fission yeast. The FCA results derived from both methods demonstrate that, although both algorithms optimize different clustering characteristics, FCA is able to overcome and diminish these differences and preserve some relevant biological

  14. Interactive K-Means Clustering Method Based on User Behavior for Different Analysis Target in Medicine.

    Science.gov (United States)

    Lei, Yang; Yu, Dai; Bin, Zhang; Yang, Yang

    2017-01-01

    Clustering algorithm as a basis of data analysis is widely used in analysis systems. However, as for the high dimensions of the data, the clustering algorithm may overlook the business relation between these dimensions especially in the medical fields. As a result, usually the clustering result may not meet the business goals of the users. Then, in the clustering process, if it can combine the knowledge of the users, that is, the doctor's knowledge or the analysis intent, the clustering result can be more satisfied. In this paper, we propose an interactive K -means clustering method to improve the user's satisfactions towards the result. The core of this method is to get the user's feedback of the clustering result, to optimize the clustering result. Then, a particle swarm optimization algorithm is used in the method to optimize the parameters, especially the weight settings in the clustering algorithm to make it reflect the user's business preference as possible. After that, based on the parameter optimization and adjustment, the clustering result can be closer to the user's requirement. Finally, we take an example in the breast cancer, to testify our method. The experiments show the better performance of our algorithm.

  15. Improving estimation of kinetic parameters in dynamic force spectroscopy using cluster analysis

    Science.gov (United States)

    Yen, Chi-Fu; Sivasankar, Sanjeevi

    2018-03-01

    Dynamic Force Spectroscopy (DFS) is a widely used technique to characterize the dissociation kinetics and interaction energy landscape of receptor-ligand complexes with single-molecule resolution. In an Atomic Force Microscope (AFM)-based DFS experiment, receptor-ligand complexes, sandwiched between an AFM tip and substrate, are ruptured at different stress rates by varying the speed at which the AFM-tip and substrate are pulled away from each other. The rupture events are grouped according to their pulling speeds, and the mean force and loading rate of each group are calculated. These data are subsequently fit to established models, and energy landscape parameters such as the intrinsic off-rate (koff) and the width of the potential energy barrier (xβ) are extracted. However, due to large uncertainties in determining mean forces and loading rates of the groups, errors in the estimated koff and xβ can be substantial. Here, we demonstrate that the accuracy of fitted parameters in a DFS experiment can be dramatically improved by sorting rupture events into groups using cluster analysis instead of sorting them according to their pulling speeds. We test different clustering algorithms including Gaussian mixture, logistic regression, and K-means clustering, under conditions that closely mimic DFS experiments. Using Monte Carlo simulations, we benchmark the performance of these clustering algorithms over a wide range of koff and xβ, under different levels of thermal noise, and as a function of both the number of unbinding events and the number of pulling speeds. Our results demonstrate that cluster analysis, particularly K-means clustering, is very effective in improving the accuracy of parameter estimation, particularly when the number of unbinding events are limited and not well separated into distinct groups. Cluster analysis is easy to implement, and our performance benchmarks serve as a guide in choosing an appropriate method for DFS data analysis.

  16. Disordered semantic representation in schizophrenic temporal cortex revealed by neuromagnetic response patterns

    Directory of Open Access Journals (Sweden)

    Silberman Yaron

    2006-05-01

    Full Text Available Abstract Background Loosening of associations and thought disruption are key features of schizophrenic psychopathology. Alterations in neural networks underlying this basic abnormality have not yet been sufficiently identified. Previously, we demonstrated that spatio-temporal clustering of magnetic brain responses to pictorial stimuli map categorical representations in temporal cortex. This result has opened the possibility to quantify associative strength within and across semantic categories in schizophrenic patients. We hypothesized that in contrast to controls, schizophrenic patients exhibit disordered representations of semantic categories. Methods The spatio-temporal clusters of brain magnetic activities elicited by object pictures related to super-ordinate (flowers, animals, furniture, clothes and base-level (e.g. tulip, rose, orchid, sunflower categories were analysed in the source space for the time epochs 170–210 and 210–450 ms following stimulus onset and were compared between 10 schizophrenic patients and 10 control subjects. Results Spatio-temporal correlations of responses elicited by base-level concepts and the difference of within vs. across super-ordinate categories were distinctly lower in patients than in controls. Additionally, in contrast to the well-defined categorical representation in control subjects, unsupervised clustering indicated poorly defined representation of semantic categories in patients. Within the patient group, distinctiveness of categorical representation in the temporal cortex was positively related to negative symptoms and tended to be inversely related to positive symptoms. Conclusion Schizophrenic patients show a less organized representation of semantic categories in clusters of magnetic brain responses than healthy adults. This atypical neural network architecture may be a correlate of loosening of associations, promoting positive symptoms.

  17. Fatigue Feature Extraction Analysis based on a K-Means Clustering Approach

    Directory of Open Access Journals (Sweden)

    M.F.M. Yunoh

    2015-06-01

    Full Text Available This paper focuses on clustering analysis using a K-means approach for fatigue feature dataset extraction. The aim of this study is to group the dataset as closely as possible (homogeneity for the scattered dataset. Kurtosis, the wavelet-based energy coefficient and fatigue damage are calculated for all segments after the extraction process using wavelet transform. Kurtosis, the wavelet-based energy coefficient and fatigue damage are used as input data for the K-means clustering approach. K-means clustering calculates the average distance of each group from the centroid and gives the objective function values. Based on the results, maximum values of the objective function can be seen in the two centroid clusters, with a value of 11.58. The minimum objective function value is found at 8.06 for five centroid clusters. It can be seen that the objective function with the lowest value for the number of clusters is equal to five; which is therefore the best cluster for the dataset.

  18. The dynamics of cyclone clustering in re-analysis and a high-resolution climate model

    Science.gov (United States)

    Priestley, Matthew; Pinto, Joaquim; Dacre, Helen; Shaffrey, Len

    2017-04-01

    Extratropical cyclones have a tendency to occur in groups (clusters) in the exit of the North Atlantic storm track during wintertime, potentially leading to widespread socioeconomic impacts. The Winter of 2013/14 was the stormiest on record for the UK and was characterised by the recurrent clustering of intense extratropical cyclones. This clustering was associated with a strong, straight and persistent North Atlantic 250 hPa jet with Rossby wave-breaking (RWB) on both flanks, pinning the jet in place. Here, we provide for the first time an analysis of all clustered events in 36 years of the ERA-Interim Re-analysis at three latitudes (45˚ N, 55˚ N, 65˚ N) encompassing various regions of Western Europe. The relationship between the occurrence of RWB and cyclone clustering is studied in detail. Clustering at 55˚ N is associated with an extended and anomalously strong jet flanked on both sides by RWB. However, clustering at 65(45)˚ N is associated with RWB to the south (north) of the jet, deflecting the jet northwards (southwards). A positive correlation was found between the intensity of the clustering and RWB occurrence to the north and south of the jet. However, there is considerable spread in these relationships. Finally, analysis has shown that the relationships identified in the re-analysis are also present in a high-resolution coupled global climate model (HiGEM). In particular, clustering is associated with the same dynamical conditions at each of our three latitudes in spite of the identified biases in frequency and intensity of RWB.

  19. Pattern of language-related potential maps in cluster and noncluster initial consonants in consonant-vowel (CV syllables

    Directory of Open Access Journals (Sweden)

    Naiphinich Kotchabhakdi

    2006-09-01

    Full Text Available Mismatch negativity (MMN was used to investigate the processing of cluster and noncluster initial consonants in consonant vowel syllables in the human brain. The MMN was elicited by either syllable with cluster or noncluster initial consonant, phonetic contrasts being identical in both syllables. Compared to the noncluster consonant, the cluster consonant elicited a more prominent MMN. The MMN to the cluster consonant occurred later than that of the noncluster consonant. The topography of the mismatch responses showed clear left-hemispheric laterality in both syllables. However, the syllable with an initial noncluster consonant stimulus produced MMN maximum over the middle temporal gyrus, whereas maximum of the MMN activated by the syllable with initial cluster consonant was observed over the superior temporal gyrus. We suggest that the MMN component in consonant-vowel syllables is more sensitive to cluster compared to noncluster initial consonants. Spatial and temporal features of the cluster consonant indicate delayed activation of left-lateralized perisylvian cell assemblies that function as cortical memory traces of cluster initial consonant in consonant-vowel syllables.

  20. Cluster analysis of HZE particle tracks as applied to space radiobiology problems

    International Nuclear Information System (INIS)

    Batmunkh, M.; Bayarchimeg, L.; Lkhagva, O.; Belov, O.

    2013-01-01

    A cluster analysis is performed of ionizations in tracks produced by the most abundant nuclei in the charge and energy spectra of the galactic cosmic rays. The frequency distribution of clusters is estimated for cluster sizes comparable to the DNA molecule at different packaging levels. For this purpose, an improved K-mean-based algorithm is suggested. This technique allows processing particle tracks containing a large number of ionization events without setting the number of clusters as an input parameter. Using this method, the ionization distribution pattern is analyzed depending on the cluster size and particle's linear energy transfer

  1. Cluster analysis of autoantibodies in 852 patients with systemic lupus erythematosus from a single center.

    Science.gov (United States)

    Artim-Esen, Bahar; Çene, Erhan; Şahinkaya, Yasemin; Ertan, Semra; Pehlivan, Özlem; Kamali, Sevil; Gül, Ahmet; Öcal, Lale; Aral, Orhan; Inanç, Murat

    2014-07-01

    Associations between autoantibodies and clinical features have been described in systemic lupus erythematosus (SLE). Herein, we aimed to define autoantibody clusters and their clinical correlations in a large cohort of patients with SLE. We analyzed 852 patients with SLE who attended our clinic. Seven autoantibodies were selected for cluster analysis: anti-DNA, anti-Sm, anti-RNP, anticardiolipin (aCL) immunoglobulin (Ig)G or IgM, lupus anticoagulant (LAC), anti-Ro, and anti-La. Two-step clustering and Kaplan-Meier survival analyses were used. Five clusters were identified. A cluster consisted of patients with only anti-dsDNA antibodies, a cluster of anti-Sm and anti-RNP, a cluster of aCL IgG/M and LAC, and a cluster of anti-Ro and anti-La antibodies. Analysis revealed 1 more cluster that consisted of patients who did not belong to any of the clusters formed by antibodies chosen for cluster analysis. Sm/RNP cluster had significantly higher incidence of pulmonary hypertension and Raynaud phenomenon. DsDNA cluster had the highest incidence of renal involvement. In the aCL/LAC cluster, there were significantly more patients with neuropsychiatric involvement, antiphospholipid syndrome, autoimmune hemolytic anemia, and thrombocytopenia. According to the Systemic Lupus International Collaborating Clinics damage index, the highest frequency of damage was in the aCL/LAC cluster. Comparison of 10 and 20 years survival showed reduced survival in the aCL/LAC cluster. This study supports the existence of autoantibody clusters with distinct clinical features in SLE and shows that forming clinical subsets according to autoantibody clusters may be useful in predicting the outcome of the disease. Autoantibody clusters in SLE may exhibit differences according to the clinical setting or population.

  2. Application of cluster analysis to geochemical compositional data for identifying ore-related geochemical anomalies

    Science.gov (United States)

    Zhou, Shuguang; Zhou, Kefa; Wang, Jinlin; Yang, Genfang; Wang, Shanshan

    2017-12-01

    Cluster analysis is a well-known technique that is used to analyze various types of data. In this study, cluster analysis is applied to geochemical data that describe 1444 stream sediment samples collected in northwestern Xinjiang with a sample spacing of approximately 2 km. Three algorithms (the hierarchical, k-means, and fuzzy c-means algorithms) and six data transformation methods (the z-score standardization, ZST; the logarithmic transformation, LT; the additive log-ratio transformation, ALT; the centered log-ratio transformation, CLT; the isometric log-ratio transformation, ILT; and no transformation, NT) are compared in terms of their effects on the cluster analysis of the geochemical compositional data. The study shows that, on the one hand, the ZST does not affect the results of column- or variable-based (R-type) cluster analysis, whereas the other methods, including the LT, the ALT, and the CLT, have substantial effects on the results. On the other hand, the results of the row- or observation-based (Q-type) cluster analysis obtained from the geochemical data after applying NT and the ZST are relatively poor. However, we derive some improved results from the geochemical data after applying the CLT, the ILT, the LT, and the ALT. Moreover, the k-means and fuzzy c-means clustering algorithms are more reliable than the hierarchical algorithm when they are used to cluster the geochemical data. We apply cluster analysis to the geochemical data to explore for Au deposits within the study area, and we obtain a good correlation between the results retrieved by combining the CLT or the ILT with the k-means or fuzzy c-means algorithms and the potential zones of Au mineralization. Therefore, we suggest that the combination of the CLT or the ILT with the k-means or fuzzy c-means algorithms is an effective tool to identify potential zones of mineralization from geochemical data.

  3. Influence of birth cohort on age of onset cluster analysis in bipolar I disorder

    DEFF Research Database (Denmark)

    Bauer, M; Glenn, T; Alda, M

    2015-01-01

    Purpose: Two common approaches to identify subgroups of patients with bipolar disorder are clustering methodology (mixture analysis) based on the age of onset, and a birth cohort analysis. This study investigates if a birth cohort effect will influence the results of clustering on the age of onset...... cohort. Model-based clustering (mixture analysis) was then performed on the age of onset data using the residuals. Clinical variables in subgroups were compared. Results: There was a strong birth cohort effect. Without adjusting for the birth cohort, three subgroups were found by clustering. After...... on the age of onset, and that there is a birth cohort effect. Including the birth cohort adjustment altered the number and characteristics of subgroups detected when clustering by age of onset. Further investigation is needed to determine if combining both approaches will identify subgroups that are more...

  4. MMPI-2: Cluster Analysis of Personality Profiles in Perinatal Depression—Preliminary Evidence

    Directory of Open Access Journals (Sweden)

    Valentina Meuti

    2014-01-01

    Full Text Available Background. To assess personality characteristics of women who develop perinatal depression. Methods. The study started with a screening of a sample of 453 women in their third trimester of pregnancy, to which was administered a survey data form, the Edinburgh Postnatal Depression Scale (EPDS and the Minnesota Multiphasic Personality Inventory 2 (MMPI-2. A clinical group of subjects with perinatal depression (PND, 55 subjects was selected; clinical and validity scales of MMPI-2 were used as predictors in hierarchical cluster analysis carried out. Results. The analysis identified three clusters of personality profile: two “clinical” clusters (1 and 3 and an “apparently common” one (cluster 2. The first cluster (39.5% collects structures of personality with prevalent obsessive or dependent functioning tending to develop a “psychasthenic” depression; the third cluster (13.95% includes women with prevalent borderline functioning tending to develop “dysphoric” depression; the second cluster (46.5% shows a normal profile with a “defensive” attitude, probably due to the presence of defense mechanisms or to the fear of stigma. Conclusion. Characteristics of personality have a key role in clinical manifestations of perinatal depression; it is important to detect them to identify mothers at risk and to plan targeted therapeutic interventions.

  5. MMPI-2: Cluster Analysis of Personality Profiles in Perinatal Depression—Preliminary Evidence

    Science.gov (United States)

    Grillo, Alessandra; Lauriola, Marco; Giacchetti, Nicoletta

    2014-01-01

    Background. To assess personality characteristics of women who develop perinatal depression. Methods. The study started with a screening of a sample of 453 women in their third trimester of pregnancy, to which was administered a survey data form, the Edinburgh Postnatal Depression Scale (EPDS) and the Minnesota Multiphasic Personality Inventory 2 (MMPI-2). A clinical group of subjects with perinatal depression (PND, 55 subjects) was selected; clinical and validity scales of MMPI-2 were used as predictors in hierarchical cluster analysis carried out. Results. The analysis identified three clusters of personality profile: two “clinical” clusters (1 and 3) and an “apparently common” one (cluster 2). The first cluster (39.5%) collects structures of personality with prevalent obsessive or dependent functioning tending to develop a “psychasthenic” depression; the third cluster (13.95%) includes women with prevalent borderline functioning tending to develop “dysphoric” depression; the second cluster (46.5%) shows a normal profile with a “defensive” attitude, probably due to the presence of defense mechanisms or to the fear of stigma. Conclusion. Characteristics of personality have a key role in clinical manifestations of perinatal depression; it is important to detect them to identify mothers at risk and to plan targeted therapeutic interventions. PMID:25574499

  6. A hierarchical cluster analysis of normal-tension glaucoma using spectral-domain optical coherence tomography parameters.

    Science.gov (United States)

    Bae, Hyoung Won; Ji, Yongwoo; Lee, Hye Sun; Lee, Naeun; Hong, Samin; Seong, Gong Je; Sung, Kyung Rim; Kim, Chan Yun

    2015-01-01

    Normal-tension glaucoma (NTG) is a heterogenous disease, and there is still controversy about subclassifications of this disorder. On the basis of spectral-domain optical coherence tomography (SD-OCT), we subdivided NTG with hierarchical cluster analysis using optic nerve head (ONH) parameters and retinal nerve fiber layer (RNFL) thicknesses. A total of 200 eyes of 200 NTG patients between March 2011 and June 2012 underwent SD-OCT scans to measure ONH parameters and RNFL thicknesses. We classified NTG into homogenous subgroups based on these variables using a hierarchical cluster analysis, and compared clusters to evaluate diverse NTG characteristics. Three clusters were found after hierarchical cluster analysis. Cluster 1 (62 eyes) had the thickest RNFL and widest rim area, and showed early glaucoma features. Cluster 2 (60 eyes) was characterized by the largest cup/disc ratio and cup volume, and showed advanced glaucomatous damage. Cluster 3 (78 eyes) had small disc areas in SD-OCT and were comprised of patients with significantly younger age, longer axial length, and greater myopia than the other 2 groups. A hierarchical cluster analysis of SD-OCT scans divided NTG patients into 3 groups based upon ONH parameters and RNFL thicknesses. It is anticipated that the small disc area group comprised of younger and more myopic patients may show unique features unlike the other 2 groups.

  7. Principal Component Clustering Approach to Teaching Quality Discriminant Analysis

    Science.gov (United States)

    Xian, Sidong; Xia, Haibo; Yin, Yubo; Zhai, Zhansheng; Shang, Yan

    2016-01-01

    Teaching quality is the lifeline of the higher education. Many universities have made some effective achievement about evaluating the teaching quality. In this paper, we establish the Students' evaluation of teaching (SET) discriminant analysis model and algorithm based on principal component clustering analysis. Additionally, we classify the SET…

  8. Evaluation of Portland cement from X-ray diffraction associated with cluster analysis

    International Nuclear Information System (INIS)

    Gobbo, Luciano de Andrade; Montanheiro, Tarcisio Jose; Montanheiro, Filipe; Sant'Agostino, Lilia Mascarenhas

    2013-01-01

    The Brazilian cement industry produced 64 million tons of cement in 2012, with noteworthy contribution of CP-II (slag), CP-III (blast furnace) and CP-IV (pozzolanic) cements. The industrial pole comprises about 80 factories that utilize raw materials of different origins and chemical compositions that require enhanced analytical technologies to optimize production in order to gain space in the growing consumer market in Brazil. This paper assesses the sensitivity of mineralogical analysis by X-ray diffraction associated with cluster analysis to distinguish different kinds of cements with different additions. This technique can be applied, for example, in the prospection of different types of limestone (calcitic, dolomitic and siliceous) as well as in the qualification of different clinkers. The cluster analysis does not require any specific knowledge of the mineralogical composition of the diffractograms to be clustered; rather, it is based on their similarity. The materials tested for addition have different origins: fly ashes from different power stations from South Brazil and slag from different steel plants in the Southeast. Cement with different additions of limestone and white Portland cement were also used. The Rietveld method of qualitative and quantitative analysis was used for measuring the results generated by the cluster analysis technique. (author)

  9. Using Cluster Analysis to Group Countries for Cost-effectiveness Analysis: An Application to Sub-Saharan Africa.

    Science.gov (United States)

    Russell, Louise B; Bhanot, Gyan; Kim, Sun-Young; Sinha, Anushua

    2018-02-01

    To explore the use of cluster analysis to define groups of similar countries for the purpose of evaluating the cost-effectiveness of a public health intervention-maternal immunization-within the constraints of a project budget originally meant for an overall regional analysis. We used the most common cluster analysis algorithm, K-means, and the most common measure of distance, Euclidean distance, to group 37 low-income, sub-Saharan African countries on the basis of 24 measures of economic development, general health resources, and past success in public health programs. The groups were tested for robustness and reviewed by regional disease experts. We explored 2-, 3- and 4-group clustering. Public health performance was consistently important in determining the groups. For the 2-group clustering, for example, infant mortality in Group 1 was 81 per 1,000 live births compared with 51 per 1,000 in Group 2, and 67% of children in Group 1 received DPT immunization compared with 87% in Group 2. The experts preferred four groups to fewer, on the ground that national decision makers would more readily recognize their country among four groups. Clusters defined by K-means clustering made sense to subject experts and allowed a more detailed evaluation of the cost-effectiveness of maternal immunization within the constraint of the project budget. The method may be useful for other evaluations that, without having the resources to conduct separate analyses for each unit, seek to inform decision makers in numerous countries or subdivisions within countries, such as states or counties.

  10. Spatio-Temporal Diffusion Pattern and Hotspot Detection of Dengue in Chachoengsao Province, Thailand

    Directory of Open Access Journals (Sweden)

    Phaisarn Jeefoo

    2010-12-01

    Full Text Available In recent years, dengue has become a major international public health concern. In Thailand it is also an important concern as several dengue outbreaks were reported in last decade. This paper presents a GIS approach to analyze the spatial and temporal dynamics of dengue epidemics. The major objective of this study was to examine spatial diffusion patterns and hotspot identification for reported dengue cases. Geospatial diffusion pattern of the 2007 dengue outbreak was investigated. Map of daily cases was generated for the 153 days of the outbreak. Epidemiological data from Chachoengsao province, Thailand (reported dengue cases for the years 1999–2007 was used for this study. To analyze the dynamic space-time pattern of dengue outbreaks, all cases were positioned in space at a village level. After a general statistical analysis (by gender and age group, data was subsequently analyzed for temporal patterns and correlation with climatic data (especially rainfall, spatial patterns and cluster analysis, and spatio-temporal patterns of hotspots during epidemics. The results revealed spatial diffusion patterns during the years 1999–2007 representing spatially clustered patterns with significant differences by village. Villages on the urban fringe reported higher incidences. The space and time of the cases showed outbreak movement and spread patterns that could be related to entomologic and epidemiologic factors. The hotspots showed the spatial trend of dengue diffusion. This study presents useful information related to the dengue outbreak patterns in space and time and may help public health departments to plan strategies to control the spread of disease. The methodology is general for space-time analysis and can be applied for other infectious diseases as well.

  11. A cluster analysis investigation of workaholism as a syndrome.

    Science.gov (United States)

    Aziz, Shahnaz; Zickar, Michael J

    2006-01-01

    Workaholism has been conceptualized as a syndrome although there have been few tests that explicitly consider its syndrome status. The authors analyzed a three-dimensional scale of workaholism developed by Spence and Robbins (1992) using cluster analysis. The authors identified three clusters of individuals, one of which corresponded to Spence and Robbins's profile of the workaholic (high work involvement, high drive to work, low work enjoyment). Consistent with previously conjectured relations with workaholism, individuals in the workaholic cluster were more likely to label themselves as workaholics, more likely to have acquaintances label them as workaholics, and more likely to have lower life satisfaction and higher work-life imbalance. The importance of considering workaholism as a syndrome and the implications for effective interventions are discussed. Copyright 2006 APA.

  12. Sejong Open Cluster Survey (SOS). 0. Target Selection and Data Analysis

    Science.gov (United States)

    Sung, Hwankyung; Lim, Beomdu; Bessell, Michael S.; Kim, Jinyoung S.; Hur, Hyeonoh; Chun, Moo-Young; Park, Byeong-Gon

    2013-06-01

    Star clusters are superb astrophysical laboratories containing cospatial and coeval samples of stars with similar chemical composition. We initiate the Sejong Open cluster Survey (SOS) - a project dedicated to providing homogeneous photometry of a large number of open clusters in the SAAO Johnson-Cousins' UBVI system. To achieve our main goal, we pay much attention to the observation of standard stars in order to reproduce the SAAO standard system. Many of our targets are relatively small sparse clusters that escaped previous observations. As clusters are considered building blocks of the Galactic disk, their physical properties such as the initial mass function, the pattern of mass segregation, etc. give valuable information on the formation and evolution of the Galactic disk. The spatial distribution of young open clusters will be used to revise the local spiral arm structure of the Galaxy. In addition, the homogeneous data can also be used to test stellar evolutionary theory, especially concerning rare massive stars. In this paper we present the target selection criteria, the observational strategy for accurate photometry, and the adopted calibrations for data analysis such as color-color relations, zero-age main sequence relations, Sp - M_V relations, Sp - T_{eff} relations, Sp - color relations, and T_{eff} - BC relations. Finally we provide some data analysis such as the determination of the reddening law, the membership selection criteria, and distance determination.

  13. Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering.

    Science.gov (United States)

    Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor; Essex, M

    2015-05-01

    To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice.

  14. Fault detection of flywheel system based on clustering and principal component analysis

    Directory of Open Access Journals (Sweden)

    Wang Rixin

    2015-12-01

    Full Text Available Considering the nonlinear, multifunctional properties of double-flywheel with closed-loop control, a two-step method including clustering and principal component analysis is proposed to detect the two faults in the multifunctional flywheels. At the first step of the proposed algorithm, clustering is taken as feature recognition to check the instructions of “integrated power and attitude control” system, such as attitude control, energy storage or energy discharge. These commands will ask the flywheel system to work in different operation modes. Therefore, the relationship of parameters in different operations can define the cluster structure of training data. Ordering points to identify the clustering structure (OPTICS can automatically identify these clusters by the reachability-plot. K-means algorithm can divide the training data into the corresponding operations according to the reachability-plot. Finally, the last step of proposed model is used to define the relationship of parameters in each operation through the principal component analysis (PCA method. Compared with the PCA model, the proposed approach is capable of identifying the new clusters and learning the new behavior of incoming data. The simulation results show that it can effectively detect the faults in the multifunctional flywheels system.

  15. Technology Clusters Exploration for Patent Portfolio through Patent Abstract Analysis

    Directory of Open Access Journals (Sweden)

    Gabjo Kim

    2016-12-01

    Full Text Available This study explores technology clusters through patent analysis. The aim of exploring technology clusters is to grasp competitors’ levels of sustainable research and development (R&D and establish a sustainable strategy for entering an industry. To achieve this, we first grouped the patent documents with similar technologies by applying affinity propagation (AP clustering, which is effective while grouping large amounts of data. Next, in order to define the technology clusters, we adopted the term frequency-inverse document frequency (TF-IDF weight, which lists the terms in order of importance. We collected the patent data of Korean electric car companies from the United States Patent and Trademark Office (USPTO to verify our proposed methodology. As a result, our proposed methodology presents more detailed information on the Korean electric car industry than previous studies.

  16. Spatio-temporal modeling for residential burglary

    NARCIS (Netherlands)

    Mahfoud, M.; Bhulai, Sandjai; van der Mei, R.D.; Bhulai, Sandjai; Kardaras, Dimitris

    2017-01-01

    Spatio-temporal modeling is widely recognized as a promising means for predicting crime patterns. Despite their enormous potential, the available methods are still in their infancy. A lot of research focuses on crime hotspot detection and geographic crime clusters, while a systematic approach to

  17. Real-Time EEG Signal Enhancement Using Canonical Correlation Analysis and Gaussian Mixture Clustering

    Directory of Open Access Journals (Sweden)

    Chin-Teng Lin

    2018-01-01

    Full Text Available Electroencephalogram (EEG signals are usually contaminated with various artifacts, such as signal associated with muscle activity, eye movement, and body motion, which have a noncerebral origin. The amplitude of such artifacts is larger than that of the electrical activity of the brain, so they mask the cortical signals of interest, resulting in biased analysis and interpretation. Several blind source separation methods have been developed to remove artifacts from the EEG recordings. However, the iterative process for measuring separation within multichannel recordings is computationally intractable. Moreover, manually excluding the artifact components requires a time-consuming offline process. This work proposes a real-time artifact removal algorithm that is based on canonical correlation analysis (CCA, feature extraction, and the Gaussian mixture model (GMM to improve the quality of EEG signals. The CCA was used to decompose EEG signals into components followed by feature extraction to extract representative features and GMM to cluster these features into groups to recognize and remove artifacts. The feasibility of the proposed algorithm was demonstrated by effectively removing artifacts caused by blinks, head/body movement, and chewing from EEG recordings while preserving the temporal and spectral characteristics of the signals that are important to cognitive research.

  18. Application of Cluster Analysis in Assessment of Dietary Habits of Secondary School Students

    Directory of Open Access Journals (Sweden)

    Zalewska Magdalena

    2014-12-01

    Full Text Available Maintenance of proper health and prevention of diseases of civilization are now significant public health problems. Nutrition is an important factor in the development of youth, as well as the current and future state of health. The aim of the study was to show the benefits of the application of cluster analysis to assess the dietary habits of high school students. The survey was carried out on 1,631 eighteen-year-old students in seven randomly selected secondary schools in Bialystok using a self-prepared anonymous questionnaire. An evaluation of the time of day meals were eaten and the number of meals consumed was made for the surveyed students. The cluster analysis allowed distinguishing characteristic structures of dietary habits in the observed population. Four clusters were identified, which were characterized by relative internal homogeneity and substantial variation in terms of the number of meals during the day and the time of their consumption. The most important characteristics of cluster 1 were cumulated food ration in 2 or 3 meals and long intervals between meals. Cluster 2 was characterized by eating the recommended number of 4 or 5 meals a day. In the 3rd cluster, students ate 3 meals a day with large intervals between them, and in the 4th they had four meals a day while maintaining proper intervals between them. In all clusters dietary mistakes occurred, but most of them were related to clusters 1 and 3. Cluster analysis allowed for the identification of major flaws in nutrition, which may include irregular eating and skipping meals, and indicated possible connections between eating patterns and disturbances of body weight in the examined population.

  19. Profiling physical activity motivation based on self-determination theory: a cluster analysis approach.

    Science.gov (United States)

    Friederichs, Stijn Ah; Bolman, Catherine; Oenema, Anke; Lechner, Lilian

    2015-01-01

    In order to promote physical activity uptake and maintenance in individuals who do not comply with physical activity guidelines, it is important to increase our understanding of physical activity motivation among this group. The present study aimed to examine motivational profiles in a large sample of adults who do not comply with physical activity guidelines. The sample for this study consisted of 2473 individuals (31.4% male; age 44.6 ± 12.9). In order to generate motivational profiles based on motivational regulation, a cluster analysis was conducted. One-way analyses of variance were then used to compare the clusters in terms of demographics, physical activity level, motivation to be active and subjective experience while being active. Three motivational clusters were derived based on motivational regulation scores: a low motivation cluster, a controlled motivation cluster and an autonomous motivation cluster. These clusters differed significantly from each other with respect to physical activity behavior, motivation to be active and subjective experience while being active. Overall, the autonomous motivation cluster displayed more favorable characteristics compared to the other two clusters. The results of this study provide additional support for the importance of autonomous motivation in the context of physical activity behavior. The three derived clusters may be relevant in the context of physical activity interventions as individuals within the different clusters might benefit most from different intervention approaches. In addition, this study shows that cluster analysis is a useful method for differentiating between motivational profiles in large groups of individuals who do not comply with physical activity guidelines.

  20. Deconstructing Bipolar Disorder and Schizophrenia: A cross-diagnostic cluster analysis of cognitive phenotypes.

    Science.gov (United States)

    Lee, Junghee; Rizzo, Shemra; Altshuler, Lori; Glahn, David C; Miklowitz, David J; Sugar, Catherine A; Wynn, Jonathan K; Green, Michael F

    2017-02-01

    Bipolar disorder (BD) and schizophrenia (SZ) show substantial overlap. It has been suggested that a subgroup of patients might contribute to these overlapping features. This study employed a cross-diagnostic cluster analysis to identify subgroups of individuals with shared cognitive phenotypes. 143 participants (68 BD patients, 39 SZ patients and 36 healthy controls) completed a battery of EEG and performance assessments on perception, nonsocial cognition and social cognition. A K-means cluster analysis was conducted with all participants across diagnostic groups. Clinical symptoms, functional capacity, and functional outcome were assessed in patients. A two-cluster solution across 3 groups was the most stable. One cluster including 44 BD patients, 31 controls and 5 SZ patients showed better cognition (High cluster) than the other cluster with 24 BD patients, 35 SZ patients and 5 controls (Low cluster). BD patients in the High cluster performed better than BD patients in the Low cluster across cognitive domains. Within each cluster, participants with different clinical diagnoses showed different profiles across cognitive domains. All patients are in the chronic phase and out of mood episode at the time of assessment and most of the assessment were behavioral measures. This study identified two clusters with shared cognitive phenotype profiles that were not proxies for clinical diagnoses. The finding of better social cognitive performance of BD patients than SZ patients in the Lowe cluster suggest that relatively preserved social cognition may be important to identify disease process distinct to each disorder. Copyright © 2016 Elsevier B.V. All rights reserved.

  1. Schedulability Analysis and Optimization for the Synthesis of Multi-Cluster Distributed Embedded Systems

    DEFF Research Database (Denmark)

    Pop, Paul; Eles, Petru; Peng, Zebo

    2003-01-01

    We present an approach to schedulability analysis for the synthesis of multi-cluster distributed embedded systems consisting of time-triggered and event-triggered clusters, interconnected via gateways. We have also proposed a buffer size and worst case queuing delay analysis for the gateways......, responsible for routing inter-cluster traffic. Optimization heuristics for the priority assignment and synthesis of bus access parameters aimed at producing a schedulable system with minimal buffer needs have been proposed. Extensive experiments and a real-life example show the efficiency of our approaches....

  2. Schedulability Analysis and Optimization for the Synthesis of Multi-Cluster Distributed Embedded Systems

    DEFF Research Database (Denmark)

    Pop, Paul; Eles, Petru; Peng, Zebo

    2003-01-01

    An approach to schedulability analysis for the synthesis of multi-cluster distributed embedded systems consisting of time-triggered and event-triggered clusters, interconnected via gateways, is presented. A buffer size and worst case queuing delay analysis for the gateways, responsible for routing...... inter-cluster traffic, is also proposed. Optimisation heuristics for the priority assignment and synthesis of bus access parameters aimed at producing a schedulable system with minimal buffer needs have been proposed. Extensive experiments and a real-life example show the efficiency of the approaches....

  3. DGA Clustering and Analysis: Mastering Modern, Evolving Threats, DGALab

    Directory of Open Access Journals (Sweden)

    Alexander Chailytko

    2016-05-01

    Full Text Available Domain Generation Algorithms (DGA is a basic building block used in almost all modern malware. Malware researchers have attempted to tackle the DGA problem with various tools and techniques, with varying degrees of success. We present a complex solution to populate DGA feed using reversed DGAs, third-party feeds, and a smart DGA extraction and clustering based on emulation of a large number of samples. Smart DGA extraction requires no reverse engineering and works regardless of the DGA type or initialization vector, while enabling a cluster-based analysis. Our method also automatically allows analysis of the whole malware family, specific campaign, etc. We present our system and demonstrate its abilities on more than 20 malware families. This includes showing connections between different campaigns, as well as comparing results. Most importantly, we discuss how to utilize the outcome of the analysis to create smarter protections against similar malware.

  4. Genetic Diversity and Relationships of Neolamarckia cadamba (Roxb. Bosser progenies through cluster analysis

    Directory of Open Access Journals (Sweden)

    M. Preethi Shree

    2018-04-01

    Full Text Available Genetic diversity analysis was conducted for biometric attributes in 20 progenies of Neolamarckia cadamba. The application of D2 clustering technique in Neolamarckia cadamba genetic resources resolved the 20 progenies into five clusters. The maximum intra cluster distance was shown by the cluster II. The maximum inter cluster distance was recorded between cluster III and V which indicated the presence of wider genetic distance between Neolamarckia cadamba progenies. Among the growth attributes, volume (36.84 % contributed maximum towards genetic divergence followed by bole height, basal diameter, tree height, number of branches in Neolamarckia cadamba progenies.

  5. Predicting healthcare outcomes in prematurely born infants using cluster analysis.

    Science.gov (United States)

    MacBean, Victoria; Lunt, Alan; Drysdale, Simon B; Yarzi, Muska N; Rafferty, Gerrard F; Greenough, Anne

    2018-05-23

    Prematurely born infants are at high risk of respiratory morbidity following neonatal unit discharge, though prediction of outcomes is challenging. We have tested the hypothesis that cluster analysis would identify discrete groups of prematurely born infants with differing respiratory outcomes during infancy. A total of 168 infants (median (IQR) gestational age 33 (31-34) weeks) were recruited in the neonatal period from consecutive births in a tertiary neonatal unit. The baseline characteristics of the infants were used to classify them into hierarchical agglomerative clusters. Rates of viral lower respiratory tract infections (LRTIs) were recorded for 151 infants in the first year after birth. Infants could be classified according to birth weight and duration of neonatal invasive mechanical ventilation (MV) into three clusters. Cluster one (MV ≤5 days) had few LRTIs. Clusters two and three (both MV ≥6 days, but BW ≥or <882 g respectively), had significantly higher LRTI rates. Cluster two had a higher proportion of infants experiencing respiratory syncytial virus LRTIs (P = 0.01) and cluster three a higher proportion of rhinovirus LRTIs (P < 0.001) CONCLUSIONS: Readily available clinical data allowed classification of prematurely born infants into one of three distinct groups with differing subsequent respiratory morbidity in infancy. © 2018 Wiley Periodicals, Inc.

  6. Analysis and comparison of very large metagenomes with fast clustering and functional annotation

    Directory of Open Access Journals (Sweden)

    Li Weizhong

    2009-10-01

    Full Text Available Abstract Background The remarkable advance of metagenomics presents significant new challenges in data analysis. Metagenomic datasets (metagenomes are large collections of sequencing reads from anonymous species within particular environments. Computational analyses for very large metagenomes are extremely time-consuming, and there are often many novel sequences in these metagenomes that are not fully utilized. The number of available metagenomes is rapidly increasing, so fast and efficient metagenome comparison methods are in great demand. Results The new metagenomic data analysis method Rapid Analysis of Multiple Metagenomes with a Clustering and Annotation Pipeline (RAMMCAP was developed using an ultra-fast sequence clustering algorithm, fast protein family annotation tools, and a novel statistical metagenome comparison method that employs a unique graphic interface. RAMMCAP processes extremely large datasets with only moderate computational effort. It identifies raw read clusters and protein clusters that may include novel gene families, and compares metagenomes using clusters or functional annotations calculated by RAMMCAP. In this study, RAMMCAP was applied to the two largest available metagenomic collections, the "Global Ocean Sampling" and the "Metagenomic Profiling of Nine Biomes". Conclusion RAMMCAP is a very fast method that can cluster and annotate one million metagenomic reads in only hundreds of CPU hours. It is available from http://tools.camera.calit2.net/camera/rammcap/.

  7. Study on Adaptive Parameter Determination of Cluster Analysis in Urban Management Cases

    Science.gov (United States)

    Fu, J. Y.; Jing, C. F.; Du, M. Y.; Fu, Y. L.; Dai, P. P.

    2017-09-01

    The fine management for cities is the important way to realize the smart city. The data mining which uses spatial clustering analysis for urban management cases can be used in the evaluation of urban public facilities deployment, and support the policy decisions, and also provides technical support for the fine management of the city. Aiming at the problem that DBSCAN algorithm which is based on the density-clustering can not realize parameter adaptive determination, this paper proposed the optimizing method of parameter adaptive determination based on the spatial analysis. Firstly, making analysis of the function Ripley's K for the data set to realize adaptive determination of global parameter MinPts, which means setting the maximum aggregation scale as the range of data clustering. Calculating every point object's highest frequency K value in the range of Eps which uses K-D tree and setting it as the value of clustering density to realize the adaptive determination of global parameter MinPts. Then, the R language was used to optimize the above process to accomplish the precise clustering of typical urban management cases. The experimental results based on the typical case of urban management in XiCheng district of Beijing shows that: The new DBSCAN clustering algorithm this paper presents takes full account of the data's spatial and statistical characteristic which has obvious clustering feature, and has a better applicability and high quality. The results of the study are not only helpful for the formulation of urban management policies and the allocation of urban management supervisors in XiCheng District of Beijing, but also to other cities and related fields.

  8. STUDY ON ADAPTIVE PARAMETER DETERMINATION OF CLUSTER ANALYSIS IN URBAN MANAGEMENT CASES

    Directory of Open Access Journals (Sweden)

    J. Y. Fu

    2017-09-01

    Full Text Available The fine management for cities is the important way to realize the smart city. The data mining which uses spatial clustering analysis for urban management cases can be used in the evaluation of urban public facilities deployment, and support the policy decisions, and also provides technical support for the fine management of the city. Aiming at the problem that DBSCAN algorithm which is based on the density-clustering can not realize parameter adaptive determination, this paper proposed the optimizing method of parameter adaptive determination based on the spatial analysis. Firstly, making analysis of the function Ripley's K for the data set to realize adaptive determination of global parameter MinPts, which means setting the maximum aggregation scale as the range of data clustering. Calculating every point object’s highest frequency K value in the range of Eps which uses K-D tree and setting it as the value of clustering density to realize the adaptive determination of global parameter MinPts. Then, the R language was used to optimize the above process to accomplish the precise clustering of typical urban management cases. The experimental results based on the typical case of urban management in XiCheng district of Beijing shows that: The new DBSCAN clustering algorithm this paper presents takes full account of the data’s spatial and statistical characteristic which has obvious clustering feature, and has a better applicability and high quality. The results of the study are not only helpful for the formulation of urban management policies and the allocation of urban management supervisors in XiCheng District of Beijing, but also to other cities and related fields.

  9. HICOSMO - X-ray analysis of a complete sample of galaxy clusters

    Science.gov (United States)

    Schellenberger, G.; Reiprich, T.

    2017-10-01

    Galaxy clusters are known to be the largest virialized objects in the Universe. Based on the theory of structure formation one can use them as cosmological probes, since they originate from collapsed overdensities in the early Universe and witness its history. The X-ray regime provides the unique possibility to measure in detail the most massive visible component, the intra cluster medium. Using Chandra observations of a local sample of 64 bright clusters (HIFLUGCS) we provide total (hydrostatic) and gas mass estimates of each cluster individually. Making use of the completeness of the sample we quantify two interesting cosmological parameters by a Bayesian cosmological likelihood analysis. We find Ω_{M}=0.3±0.01 and σ_{8}=0.79±0.03 (statistical uncertainties) using our default analysis strategy combining both, a mass function analysis and the gas mass fraction results. The main sources of biases that we discuss and correct here are (1) the influence of galaxy groups (higher incompleteness in parent samples and a differing behavior of the L_{x} - M relation), (2) the hydrostatic mass bias (as determined by recent hydrodynamical simulations), (3) the extrapolation of the total mass (comparing various methods), (4) the theoretical halo mass function and (5) other cosmological (non-negligible neutrino mass), and instrumental (calibration) effects.

  10. Cataloging the Praesepe Cluster: Identifying Interlopers and Binary Systems

    Science.gov (United States)

    Lucey, Madeline R.; Gosnell, Natalie M.; Mann, Andrew; Douglas, Stephanie

    2018-01-01

    We present radial velocity measurements from an ongoing survey of the Praesepe open cluster using the WIYN 3.5m Telescope. Our target stars include 229 early-K to mid-M dwarfs with proper motion memberships that have been observed by the repurposed Kepler mission, K2. With this survey, we will provide a well-constrained membership list of the cluster. By removing interloping stars and determining the cluster binary frequency we can avoid systematic errors in our analysis of the K2 findings and more accurately determine exoplanet properties in the Praesepe cluster. Obtaining accurate exoplanet parameters in open clusters allows us to study the temporal dimension of exoplanet parameter space. We find Praesepe to have a mean radial velocity of 34.09 km/s and a velocity dispersion of 1.13 km/s, which is consistent with previous studies. We derive radial velocity membership probabilities for stars with ≥3 radial velocity measurements and compare against published membership probabilities. We also identify radial velocity variables and potential double-lined spectroscopic binaries. We plan to obtain more observations to determine the radial velocity membership of all the stars in our sample, as well as follow up on radial velocity variables to determine binary orbital solutions.

  11. Forecasting Antarctic Sea Ice Concentrations Using Results of Temporal Mixture Analysis

    Science.gov (United States)

    Chi, Junhwa; Kim, Hyun-Cheol

    2016-06-01

    Sea ice concentration (SIC) data acquired by passive microwave sensors at daily temporal frequencies over extended areas provide seasonal characteristics of sea ice dynamics and play a key role as an indicator of global climate trends; however, it is typically challenging to study long-term time series. Of the various advanced remote sensing techniques that address this issue, temporal mixture analysis (TMA) methods are often used to investigate the temporal characteristics of environmental factors, including SICs in the case of the present study. This study aims to forecast daily SICs for one year using a combination of TMA and time series modeling in two stages. First, we identify temporally meaningful sea ice signatures, referred to as temporal endmembers, using machine learning algorithms, and then we decompose each pixel into a linear combination of temporal endmembers. Using these corresponding fractional abundances of endmembers, we apply a autoregressive model that generally fits all Antarctic SIC data for 1979 to 2013 to forecast SIC values for 2014. We compare our results using the proposed approach based on daily SIC data reconstructed from real fractional abundances derived from a pixel unmixing method and temporal endmember signatures. The proposed method successfully forecasts new fractional abundance values, and the resulting images are qualitatively and quantitatively similar to the reference data.

  12. Temporal abstraction for the analysis of intensive care information

    Science.gov (United States)

    Hadad, Alejandro J.; Evin, Diego A.; Drozdowicz, Bartolomé; Chiotti, Omar

    2007-11-01

    This paper proposes a scheme for the analysis of time-stamped series data from multiple monitoring devices of intensive care units, using Temporal Abstraction concepts. This scheme is oriented to obtain a description of the patient state evolution in an unsupervised way. The case of study is based on a dataset clinically classified with Pulmonary Edema. For this dataset a trends based Temporal Abstraction mechanism is proposed, by means of a Behaviours Base of time-stamped series and then used in a classification step. Combining this approach with the introduction of expert knowledge, using Fuzzy Logic, and multivariate analysis by means of Self-Organizing Maps, a states characterization model is obtained. This model is feasible of being extended to different patients groups and states. The proposed scheme allows to obtain intermediate states descriptions through which it is passing the patient and that could be used to anticipate alert situations.

  13. Temporal abstraction for the analysis of intensive care information

    International Nuclear Information System (INIS)

    Hadad, Alejandro J; Evin, Diego A; Drozdowicz, Bartolome; Chiotti, Omar

    2007-01-01

    This paper proposes a scheme for the analysis of time-stamped series data from multiple monitoring devices of intensive care units, using Temporal Abstraction concepts. This scheme is oriented to obtain a description of the patient state evolution in an unsupervised way. The case of study is based on a dataset clinically classified with Pulmonary Edema. For this dataset a trends based Temporal Abstraction mechanism is proposed, by means of a Behaviours Base of time-stamped series and then used in a classification step. Combining this approach with the introduction of expert knowledge, using Fuzzy Logic, and multivariate analysis by means of Self-Organizing Maps, a states characterization model is obtained. This model is feasible of being extended to different patients groups and states. The proposed scheme allows to obtain intermediate states descriptions through which it is passing the patient and that could be used to anticipate alert situations

  14. CHOOSING A HEALTH INSTITUTION WITH MULTIPLE CORRESPONDENCE ANALYSIS AND CLUSTER ANALYSIS IN A POPULATION BASED STUDY

    Directory of Open Access Journals (Sweden)

    ASLI SUNER

    2013-06-01

    Full Text Available Multiple correspondence analysis is a method making easy to interpret the categorical variables given in contingency tables, showing the similarities, associations as well as divergences among these variables via graphics on a lower dimensional space. Clustering methods are helped to classify the grouped data according to their similarities and to get useful summarized data from them. In this study, interpretations of multiple correspondence analysis are supported by cluster analysis; factors affecting referred health institute such as age, disease group and health insurance are examined and it is aimed to compare results of the methods.

  15. Identifying influential individuals on intensive care units: using cluster analysis to explore culture.

    Science.gov (United States)

    Fong, Allan; Clark, Lindsey; Cheng, Tianyi; Franklin, Ella; Fernandez, Nicole; Ratwani, Raj; Parker, Sarah Henrickson

    2017-07-01

    The objective of this paper is to identify attribute patterns of influential individuals in intensive care units using unsupervised cluster analysis. Despite the acknowledgement that culture of an organisation is critical to improving patient safety, specific methods to shift culture have not been explicitly identified. A social network analysis survey was conducted and an unsupervised cluster analysis was used. A total of 100 surveys were gathered. Unsupervised cluster analysis was used to group individuals with similar dimensions highlighting three general genres of influencers: well-rounded, knowledge and relational. Culture is created locally by individual influencers. Cluster analysis is an effective way to identify common characteristics among members of an intensive care unit team that are noted as highly influential by their peers. To change culture, identifying and then integrating the influencers in intervention development and dissemination may create more sustainable and effective culture change. Additional studies are ongoing to test the effectiveness of utilising these influencers to disseminate patient safety interventions. This study offers an approach that can be helpful in both identifying and understanding influential team members and may be an important aspect of developing methods to change organisational culture. © 2017 John Wiley & Sons Ltd.

  16. Short-Term Wind Power Forecasting Based on Clustering Pre-Calculated CFD Method

    Directory of Open Access Journals (Sweden)

    Yimei Wang

    2018-04-01

    Full Text Available To meet the increasing wind power forecasting (WPF demands of newly built wind farms without historical data, physical WPF methods are widely used. The computational fluid dynamics (CFD pre-calculated flow fields (CPFF-based WPF is a promising physical approach, which can balance well the competing demands of computational efficiency and accuracy. To enhance its adaptability for wind farms in complex terrain, a WPF method combining wind turbine clustering with CPFF is first proposed where the wind turbines in the wind farm are clustered and a forecasting is undertaken for each cluster. K-means, hierarchical agglomerative and spectral analysis methods are used to establish the wind turbine clustering models. The Silhouette Coefficient, Calinski-Harabaz index and within-between index are proposed as criteria to evaluate the effectiveness of the established clustering models. Based on different clustering methods and schemes, various clustering databases are built for clustering pre-calculated CFD (CPCC-based short-term WPF. For the wind farm case studied, clustering evaluation criteria show that hierarchical agglomerative clustering has reasonable results, spectral clustering is better and K-means gives the best performance. The WPF results produced by different clustering databases also prove the effectiveness of the three evaluation criteria in turn. The newly developed CPCC model has a much higher WPF accuracy than the CPFF model without using clustering techniques, both on temporal and spatial scales. The research provides supports for both the development and improvement of short-term physical WPF systems.

  17. Diagnostics of subtropical plants functional state by cluster analysis

    Directory of Open Access Journals (Sweden)

    Oksana Belous

    2016-05-01

    Full Text Available The article presents an application example of statistical methods for data analysis on diagnosis of the adaptive capacity of subtropical plants varieties. We depicted selection indicators and basic physiological parameters that were defined as diagnostic. We used evaluation on a set of parameters of water regime, there are: determination of water deficit of the leaves, determining the fractional composition of water and detection parameters of the concentration of cell sap (CCS (for tea culture flushes. These settings are characterized by high liability and high responsiveness to the effects of many abiotic factors that determined the particular care in the selection of plant material for analysis and consideration of the impact on sustainability. On the basis of the experimental data calculated the coefficients of pair correlation between climatic factors and used physiological indicators. The result was a selection of physiological and biochemical indicators proposed to assess the adaptability and included in the basis of methodical recommendations on diagnostics of the functional state of the studied cultures. Analysis of complex studies involving a large number of indicators is quite difficult, especially does not allow to quickly identify the similarity of new varieties for their adaptive responses to adverse factors, and, therefore, to set general requirements to conditions of cultivation. Use of cluster analysis suggests that in the analysis of only quantitative data; define a set of variables used to assess varieties (and the more sampling, the more accurate the clustering will happen, be sure to ascertain the measure of similarity (or difference between objects. It is shown that the identification of diagnostic features, which are subjected to statistical processing, impact the accuracy of the varieties classification. Selection in result of the mono-clusters analysis (variety tea Kolhida; hazelnut Lombardsky red; variety kiwi Monty

  18. Spatio-temporal trends and risk factors for Shigella from 2001 to 2011 in Jiangsu Province, People's Republic of China.

    Directory of Open Access Journals (Sweden)

    Fenyang Tang

    Full Text Available This study aimed to describe the spatial and temporal trends of Shigella incidence rates in Jiangsu Province, People's Republic of China. It also intended to explore complex risk modes facilitating Shigella transmission.County-level incidence rates were obtained for analysis using geographic information system (GIS tools. Trend surface and incidence maps were established to describe geographic distributions. Spatio-temporal cluster analysis and autocorrelation analysis were used for detecting clusters. Based on the number of monthly Shigella cases, an autoregressive integrated moving average (ARIMA model successfully established a time series model. A spatial correlation analysis and a case-control study were conducted to identify risk factors contributing to Shigella transmissions.The far southwestern and northwestern areas of Jiangsu were the most infected. A cluster was detected in southwestern Jiangsu (LLR = 11674.74, P<0.001. The time series model was established as ARIMA (1, 12, 0, which predicted well for cases from August to December, 2011. Highways and water sources potentially caused spatial variation in Shigella development in Jiangsu. The case-control study confirmed not washing hands before dinner (OR = 3.64 and not having access to a safe water source (OR = 2.04 as the main causes of Shigella in Jiangsu Province.Improvement of sanitation and hygiene should be strengthened in economically developed counties, while access to a safe water supply in impoverished areas should be increased at the same time.

  19. The reflection of hierarchical cluster analysis of co-occurrence matrices in SPSS

    NARCIS (Netherlands)

    Zhou, Q.; Leng, F.; Leydesdorff, L.

    2015-01-01

    Purpose: To discuss the problems arising from hierarchical cluster analysis of co-occurrence matrices in SPSS, and the corresponding solutions. Design/methodology/approach: We design different methods of using the SPSS hierarchical clustering module for co-occurrence matrices in order to compare

  20. Poisson cluster analysis of cardiac arrest incidence in Columbus, Ohio.

    Science.gov (United States)

    Warden, Craig; Cudnik, Michael T; Sasson, Comilla; Schwartz, Greg; Semple, Hugh

    2012-01-01

    Scarce resources in disease prevention and emergency medical services (EMS) need to be focused on high-risk areas of out-of-hospital cardiac arrest (OHCA). Cluster analysis using geographic information systems (GISs) was used to find these high-risk areas and test potential predictive variables. This was a retrospective cohort analysis of EMS-treated adults with OHCAs occurring in Columbus, Ohio, from April 1, 2004, through March 31, 2009. The OHCAs were aggregated to census tracts and incidence rates were calculated based on their adult populations. Poisson cluster analysis determined significant clusters of high-risk census tracts. Both census tract-level and case-level characteristics were tested for association with high-risk areas by multivariate logistic regression. A total of 2,037 eligible OHCAs occurred within the city limits during the study period. The mean incidence rate was 0.85 OHCAs/1,000 population/year. There were five significant geographic clusters with 76 high-risk census tracts out of the total of 245 census tracts. In the case-level analysis, being in a high-risk cluster was associated with a slightly younger age (-3 years, adjusted odds ratio [OR] 0.99, 95% confidence interval [CI] 0.99-1.00), not being white, non-Hispanic (OR 0.54, 95% CI 0.45-0.64), cardiac arrest occurring at home (OR 1.53, 95% CI 1.23-1.71), and not receiving bystander cardiopulmonary resuscitation (CPR) (OR 0.77, 95% CI 0.62-0.96), but with higher survival to hospital discharge (OR 1.78, 95% CI 1.30-2.46). In the census tract-level analysis, high-risk census tracts were also associated with a slightly lower average age (-0.1 years, OR 1.14, 95% CI 1.06-1.22) and a lower proportion of white, non-Hispanic patients (-0.298, OR 0.04, 95% CI 0.01-0.19), but also a lower proportion of high-school graduates (-0.184, OR 0.00, 95% CI 0.00-0.00). This analysis identified high-risk census tracts and associated census tract-level and case-level characteristics that can be used to

  1. Extending the input–output energy balance methodology in agriculture through cluster analysis

    International Nuclear Information System (INIS)

    Bojacá, Carlos Ricardo; Casilimas, Héctor Albeiro; Gil, Rodrigo; Schrevens, Eddie

    2012-01-01

    The input–output balance methodology has been applied to characterize the energy balance of agricultural systems. This study proposes to extend this methodology with the inclusion of multivariate analysis to reveal particular patterns in the energy use of a system. The objective was to demonstrate the usefulness of multivariate exploratory techniques to analyze the variability found in a farming system and, establish efficiency categories that can be used to improve the energy balance of the system. To this purpose an input–output analysis was applied to the major greenhouse tomato production area in Colombia. Individual energy profiles were built and the k-means clustering method was applied to the production factors. On average, the production system in the study zone consumes 141.8 GJ ha −1 to produce 96.4 GJ ha −1 , resulting in an energy efficiency of 0.68. With the k-means clustering analysis, three clusters of farmers were identified with energy efficiencies of 0.54, 0.67 and 0.78. The most energy efficient cluster grouped 56.3% of the farmers. It is possible to optimize the production system by improving the management practices of those with the lowest energy use efficiencies. Multivariate analysis techniques demonstrated to be a complementary pathway to improve the energy efficiency of a system. -- Highlights: ► An input–output energy balance was estimated for greenhouse tomatoes in Colombia. ► We used the k-means clustering method to classify growers based on their energy use. ► Three clusters of growers were found with energy efficiencies of 0.54, 0.67 and 0.78. ► Overall system optimization is possible by improving the energy use of the less efficient.

  2. Centrality measures in temporal networks with time series analysis

    Science.gov (United States)

    Huang, Qiangjuan; Zhao, Chengli; Zhang, Xue; Wang, Xiaojie; Yi, Dongyun

    2017-05-01

    The study of identifying important nodes in networks has a wide application in different fields. However, the current researches are mostly based on static or aggregated networks. Recently, the increasing attention to networks with time-varying structure promotes the study of node centrality in temporal networks. In this paper, we define a supra-evolution matrix to depict the temporal network structure. With using of the time series analysis, the relationships between different time layers can be learned automatically. Based on the special form of the supra-evolution matrix, the eigenvector centrality calculating problem is turned into the calculation of eigenvectors of several low-dimensional matrices through iteration, which effectively reduces the computational complexity. Experiments are carried out on two real-world temporal networks, Enron email communication network and DBLP co-authorship network, the results of which show that our method is more efficient at discovering the important nodes than the common aggregating method.

  3. Spatial and temporal analysis of mass movement using dendrochronology

    NARCIS (Netherlands)

    Braam, R.R.; Weiss, E.E.J.; Burrough, P.A.

    1987-01-01

    Tree growth and inclination on sloping land is affected by mass movement. Suitable analysis of tree growth and tree form can therefore provide considerable information on mass movement activity. This paper reports a new, automated method for studying the temporal and spatial aspects of mass

  4. Temporal and spatial distribution of human cryptosporidiosis in the west of Ireland 2004-2007.

    LENUS (Irish Health Repository)

    Callaghan, Mary

    2009-01-01

    BACKGROUND: Cryptosporidiosis is increasingly recognised as a cause of gastrointestinal infection in Ireland and has been implicated in several outbreaks. This study aimed to investigate the spatial and temporal distribution of human cryptosporidiosis in the west of Ireland in order to identify high risk seasons and areas and to compare Classically Calculated (CC) and Empirical Bayesian (EB) incidence rates. Two spatial scales of analysis were used with a view to identifying the best one in assessing geographical patterns of infection. Global Moran\\'s I and Local Moran\\'s I tests of autocorrelation were used to test for evidence of global and local spatial clustering. RESULTS: There were statistically significant seasonal patterns of cryptosporidiosis with peaks in spring and an increasing temporal trend. Significant (p < 0.05) global spatial clustering was observed in CC rates at the Electoral Division (ED) level but not in EB rates at the same level. Despite variations in disease, ED level was found to provide the most accurate account of distribution of cryptosporidiosis in the West of Ireland but required spatial EB smoothing of cases. There were a number of areas identified with significant local clustering of cryptosporidiosis rates. CONCLUSION: This study identified spatial and temporal patterns in cryptosporidiosis distribution. The study also showed benefit in performing spatial analyses at more than one spatial scale to assess geographical patterns in disease distribution and that smoothing of disease rates for mapping in small areas enhances visualisation of spatial patterns. These findings are relevant in guiding policy decisions on disease control strategies.

  5. Distributed Similarity based Clustering and Compressed Forwarding for wireless sensor networks.

    Science.gov (United States)

    Arunraja, Muruganantham; Malathi, Veluchamy; Sakthivel, Erulappan

    2015-11-01

    Wireless sensor networks are engaged in various data gathering applications. The major bottleneck in wireless data gathering systems is the finite energy of sensor nodes. By conserving the on board energy, the life span of wireless sensor network can be well extended. Data communication being the dominant energy consuming activity of wireless sensor network, data reduction can serve better in conserving the nodal energy. Spatial and temporal correlation among the sensor data is exploited to reduce the data communications. Data similar cluster formation is an effective way to exploit spatial correlation among the neighboring sensors. By sending only a subset of data and estimate the rest using this subset is the contemporary way of exploiting temporal correlation. In Distributed Similarity based Clustering and Compressed Forwarding for wireless sensor networks, we construct data similar iso-clusters with minimal communication overhead. The intra-cluster communication is reduced using adaptive-normalized least mean squares based dual prediction framework. The cluster head reduces the inter-cluster data payload using a lossless compressive forwarding technique. The proposed work achieves significant data reduction in both the intra-cluster and the inter-cluster communications, with the optimal data accuracy of collected data. Copyright © 2015 ISA. Published by Elsevier Ltd. All rights reserved.

  6. The quantitative analysis of silicon carbide surface smoothing by Ar and Xe cluster ions

    Science.gov (United States)

    Ieshkin, A. E.; Kireev, D. S.; Ermakov, Yu. A.; Trifonov, A. S.; Presnov, D. E.; Garshev, A. V.; Anufriev, Yu. V.; Prokhorova, I. G.; Krupenin, V. A.; Chernysh, V. S.

    2018-04-01

    The gas cluster ion beam technique was used for the silicon carbide crystal surface smoothing. The effect of processing by two inert cluster ions, argon and xenon, was quantitatively compared. While argon is a standard element for GCIB, results for xenon clusters were not reported yet. Scanning probe microscopy and high resolution transmission electron microscopy techniques were used for the analysis of the surface roughness and surface crystal layer quality. The gas cluster ion beam processing results in surface relief smoothing down to average roughness about 1 nm for both elements. It was shown that xenon as the working gas is more effective: sputtering rate for xenon clusters is 2.5 times higher than for argon at the same beam energy. High resolution transmission electron microscopy analysis of the surface defect layer gives values of 7 ± 2 nm and 8 ± 2 nm for treatment with argon and xenon clusters.

  7. Temporal expression-based analysis of metabolism.

    Directory of Open Access Journals (Sweden)

    Sara B Collins

    Full Text Available Metabolic flux is frequently rerouted through cellular metabolism in response to dynamic changes in the intra- and extra-cellular environment. Capturing the mechanisms underlying these metabolic transitions in quantitative and predictive models is a prominent challenge in systems biology. Progress in this regard has been made by integrating high-throughput gene expression data into genome-scale stoichiometric models of metabolism. Here, we extend previous approaches to perform a Temporal Expression-based Analysis of Metabolism (TEAM. We apply TEAM to understanding the complex metabolic dynamics of the respiratorily versatile bacterium Shewanella oneidensis grown under aerobic, lactate-limited conditions. TEAM predicts temporal metabolic flux distributions using time-series gene expression data. Increased predictive power is achieved by supplementing these data with a large reference compendium of gene expression, which allows us to take into account the unique character of the distribution of expression of each individual gene. We further propose a straightforward method for studying the sensitivity of TEAM to changes in its fundamental free threshold parameter θ, and reveal that discrete zones of distinct metabolic behavior arise as this parameter is changed. By comparing the qualitative characteristics of these zones to additional experimental data, we are able to constrain the range of θ to a small, well-defined interval. In parallel, the sensitivity analysis reveals the inherently difficult nature of dynamic metabolic flux modeling: small errors early in the simulation propagate to relatively large changes later in the simulation. We expect that handling such "history-dependent" sensitivities will be a major challenge in the future development of dynamic metabolic-modeling techniques.

  8. Multisource Images Analysis Using Collaborative Clustering

    Directory of Open Access Journals (Sweden)

    Pierre Gançarski

    2008-04-01

    Full Text Available The development of very high-resolution (VHR satellite imagery has produced a huge amount of data. The multiplication of satellites which embed different types of sensors provides a lot of heterogeneous images. Consequently, the image analyst has often many different images available, representing the same area of the Earth surface. These images can be from different dates, produced by different sensors, or even at different resolutions. The lack of machine learning tools using all these representations in an overall process constraints to a sequential analysis of these various images. In order to use all the information available simultaneously, we propose a framework where different algorithms can use different views of the scene. Each one works on a different remotely sensed image and, thus, produces different and useful information. These algorithms work together in a collaborative way through an automatic and mutual refinement of their results, so that all the results have almost the same number of clusters, which are statistically similar. Finally, a unique result is produced, representing a consensus among the information obtained by each clustering method on its own image. The unified result and the complementarity of the single results (i.e., the agreement between the clustering methods as well as the disagreement lead to a better understanding of the scene. The experiments carried out on multispectral remote sensing images have shown that this method is efficient to extract relevant information and to improve the scene understanding.

  9. Sensory over responsivity and obsessive compulsive symptoms: A cluster analysis.

    Science.gov (United States)

    Ben-Sasson, Ayelet; Podoly, Tamar Yonit

    2017-02-01

    Several studies have examined the sensory component in Obsesseive Compulsive Disorder (OCD) and described an OCD subtype which has a unique profile, and that Sensory Phenomena (SP) is a significant component of this subtype. SP has some commonalities with Sensory Over Responsivity (SOR) and might be in part a characteristic of this subtype. Although there are some studies that have examined SOR and its relation to Obsessive Compulsive Symptoms (OCS), literature lacks sufficient data on this interplay. First to further examine the correlations between OCS and SOR, and to explore the correlations between SOR modalities (i.e. smell, touch, etc.) and OCS subscales (i.e. washing, ordering, etc.). Second, to investigate the cluster analysis of SOR and OCS dimensions in adults, that is, to classify the sample using the sensory scores to find whether a sensory OCD subtype can be specified. Our third goal was to explore the psychometric features of a new sensory questionnaire: the Sensory Perception Quotient (SPQ). A sample of non clinical adults (n=350) was recruited via e-mail, social media and social networks. Participants completed questionnaires for measuring SOR, OCS, and anxiety. SOR and OCI-F scores were moderately significantly correlated (n=274), significant correlations between all SOR modalities and OCS subscales were found with no specific higher correlation between one modality to one OCS subscale. Cluster analysis revealed four distinct clusters: (1) No OC and SOR symptoms (NONE; n=100), (2) High OC and SOR symptoms (BOTH; n=28), (3) Moderate OC symptoms (OCS; n=63), (4) Moderate SOR symptoms (SOR; n=83). The BOTH cluster had significantly higher anxiety levels than the other clusters, and shared OC subscales scores with the OCS cluster. The BOTH cluster also reported higher SOR scores across tactile, vision, taste and olfactory modalities. The SPQ was found reliable and suitable to detect SOR, the sample SPQ scores was normally distributed (n=350). SOR is a

  10. Mental State Talk Structure in Children’s Narratives: A Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Giuliana Pinto

    2017-01-01

    Full Text Available This study analysed children’s Theory of Mind (ToM as assessed by mental state talk in oral narratives. We hypothesized that the children’s mental state talk in narratives has an underlying structure, with specific terms organized in clusters. Ninety-eight children attending the last year of kindergarten were asked to tell a story twice, at the beginning and at the end of the school year. Mental state talk was analysed by identifying terms and expressions referring to perceptual, physiological, emotional, willingness, cognitive, moral, and sociorelational states. The cluster analysis showed that children’s mental state talk is organized in two main clusters: perceptual states and affective states. Results from the study confirm the feasibility of narratives as an outlet to inquire mental state talk and offer a more fine-grained analysis of mental state talk structure.

  11. Propagation Characteristics of Electromagnetic Waves Recorded by the Four CLUSTER Satellites

    International Nuclear Information System (INIS)

    Parrot, M.; Santolik, O.; Cornilleau-Wehrlin, N.; Maksimovic, M.; Harvey, Ch.

    2001-01-01

    This paper will describe the methods we use to determine the propagation characteristics of electromagnetic waves observed by the four CLUSTER satellites. The data is recorded aboard CLUSTER by the STAFF (Spatio-Temporal Analysis of Field Fluctuations) spectrum analyser. This instrument has several modes of operation, and can provide the spectral matrix of three magnetic and two electric components. This spectral matrix is processed by a dedicated software (PRASSADCO: Propagation Analysis of STAFF-SA Data with Coherency Tests) in order to determine the wave normal directions with respect to the DC magnetic field. PRASSADCO also provides a number of alternative methods to estimate wave polarisation and propagation parameters, such as the Poynting vector, and the refractive index. It is therefore possible to detect the source extension of various electromagnetic waves using the 4 satellites. Results of this data processing will be shown here for one event observed by one satellite. (author)

  12. The Assessment of Hydrogen Energy Systems for Fuel Cell Vehicles Using Principal Componenet Analysis and Cluster Analysis

    DEFF Research Database (Denmark)

    Ren, Jingzheng; Tan, Shiyu; Dong, Lichun

    2012-01-01

    and analysis of the hydrogen systems is meaningful for decision makers to select the best scenario. principal component analysis (PCA) has been used to evaluate the integrated performance of different hydrogen energy systems and select the best scenario, and hierarchical cluster analysis (CA) has been used...... for transportation of hydrogen, hydrogen gas tank for the storage of hydrogen at refueling stations, and gaseous hydrogen as power energy for fuel cell vehicles has been recognized as the best scenario. Also, the clustering results calculated by CA are consistent with those determined by PCA, denoting...

  13. Common Factor Analysis Versus Principal Component Analysis: Choice for Symptom Cluster Research

    Directory of Open Access Journals (Sweden)

    Hee-Ju Kim, PhD, RN

    2008-03-01

    Conclusion: If the study purpose is to explain correlations among variables and to examine the structure of the data (this is usual for most cases in symptom cluster research, CFA provides a more accurate result. If the purpose of a study is to summarize data with a smaller number of variables, PCA is the choice. PCA can also be used as an initial step in CFA because it provides information regarding the maximum number and nature of factors. In using factor analysis for symptom cluster research, several issues need to be considered, including subjectivity of solution, sample size, symptom selection, and level of measure.

  14. Spatio-Temporal Patterns and Source Identification of Water Pollution in Lake Taihu (China

    Directory of Open Access Journals (Sweden)

    Yan Chen

    2016-03-01

    Full Text Available Various multivariate methods were used to analyze datasets of river water quality for 11 variables measured at 20 different sites surrounding Lake Taihu from 2006 to 2010 (13,200 observations, to determine temporal and spatial variations in river water quality and to identify potential pollution sources. Hierarchical cluster analysis (CA grouped the 12 months into two periods (May to November, December to the next April and the 20 sampling sites into two groups (A and B based on similarities in river water quality characteristics. Discriminant analysis (DA was important in data reduction because it used only three variables (water temperature, dissolved oxygen (DO and five-day biochemical oxygen demand (BOD5 to correctly assign about 94% of the cases and five variables (petroleum, volatile phenol, dissolved oxygen, ammonium nitrogen and total phosphorus to correctly assign >88.6% of the cases. In addition, principal component analysis (PCA identified four potential pollution sources for Clusters A and B: industrial source (chemical-related, petroleum-related or N-related, domestic source, combination of point and non-point sources and natural source. The Cluster A area received more industrial and domestic pollution-related agricultural runoff, whereas Cluster B was mainly influenced by the combination of point and non-point sources. The results imply that comprehensive analysis by using multiple methods could be more effective for facilitating effective management for the Lake Taihu Watershed in the future.

  15. On the use of Cox regression to examine the temporal clustering of flooding and heavy precipitation across the central United States

    Science.gov (United States)

    Mallakpour, Iman; Villarini, Gabriele; Jones, Michael P.; Smith, James A.

    2017-08-01

    The central United States is plagued by frequent catastrophic flooding, such as the flood events of 1993, 2008, 2011, 2013, 2014 and 2016. The goal of this study is to examine whether it is possible to describe the occurrence of flood and heavy precipitation events at the sub-seasonal scale in terms of variations in the climate system. Daily streamflow and precipitation time series over the central United States (defined here to include North Dakota, South Dakota, Nebraska, Kansas, Missouri, Iowa, Minnesota, Wisconsin, Illinois, West Virginia, Kentucky, Ohio, Indiana, and Michigan) are used in this study. We model the occurrence/non-occurrence of a flood and heavy precipitation event over time using regression models based on Cox processes, which can be viewed as a generalization of Poisson processes. Rather than assuming that an event (i.e., flooding or precipitation) occurs independently of the occurrence of the previous one (as in Poisson processes), Cox processes allow us to account for the potential presence of temporal clustering, which manifests itself in an alternation of quiet and active periods. Here we model the occurrence/non-occurrence of flood and heavy precipitation events using two climate indices as time-varying covariates: the Arctic Oscillation (AO) and the Pacific-North American pattern (PNA). We find that AO and/or PNA are important predictors in explaining the temporal clustering in flood occurrences in over 78% of the stream gages we considered. Similar results are obtained when working with heavy precipitation events. Analyses of the sensitivity of the results to different thresholds used to identify events lead to the same conclusions. The findings of this work highlight that variations in the climate system play a critical role in explaining the occurrence of flood and heavy precipitation events at the sub-seasonal scale over the central United States.

  16. Outcome-Driven Cluster Analysis with Application to Microarray Data.

    Directory of Open Access Journals (Sweden)

    Jessie J Hsu

    Full Text Available One goal of cluster analysis is to sort characteristics into groups (clusters so that those in the same group are more highly correlated to each other than they are to those in other groups. An example is the search for groups of genes whose expression of RNA is correlated in a population of patients. These genes would be of greater interest if their common level of RNA expression were additionally predictive of the clinical outcome. This issue arose in the context of a study of trauma patients on whom RNA samples were available. The question of interest was whether there were groups of genes that were behaving similarly, and whether each gene in the cluster would have a similar effect on who would recover. For this, we develop an algorithm to simultaneously assign characteristics (genes into groups of highly correlated genes that have the same effect on the outcome (recovery. We propose a random effects model where the genes within each group (cluster equal the sum of a random effect, specific to the observation and cluster, and an independent error term. The outcome variable is a linear combination of the random effects of each cluster. To fit the model, we implement a Markov chain Monte Carlo algorithm based on the likelihood of the observed data. We evaluate the effect of including outcome in the model through simulation studies and describe a strategy for prediction. These methods are applied to trauma data from the Inflammation and Host Response to Injury research program, revealing a clustering of the genes that are informed by the recovery outcome.

  17. Detection of secondary structure elements in proteins by hydrophobic cluster analysis.

    Science.gov (United States)

    Woodcock, S; Mornon, J P; Henrissat, B

    1992-10-01

    Hydrophobic cluster analysis (HCA) is a protein sequence comparison method based on alpha-helical representations of the sequences where the size, shape and orientation of the clusters of hydrophobic residues are primarily compared. The effectiveness of HCA has been suggested to originate from its potential ability to focus on the residues forming the hydrophobic core of globular proteins. We have addressed the robustness of the bidimensional representation used for HCA in its ability to detect the regular secondary structure elements of proteins. Various parameters have been studied such as those governing cluster size and limits, the hydrophobic residues constituting the clusters as well as the potential shift of the cluster positions with respect to the position of the regular secondary structure elements. The following results have been found to support the alpha-helical bidimensional representation used in HCA: (i) there is a positive correlation (clearly above background noise) between the hydrophobic clusters and the regular secondary structure elements in proteins; (ii) the hydrophobic clusters are centred on the regular secondary structure elements; (iii) the pitch of the helical representation which gives the best correspondence is that of an alpha-helix. The correspondence between hydrophobic clusters and regular secondary structure elements suggests a way to implement variable gap penalties during the automatic alignment of protein sequences.

  18. A mathematical programming approach for sequential clustering of dynamic networks

    Science.gov (United States)

    Silva, Jonathan C.; Bennett, Laura; Papageorgiou, Lazaros G.; Tsoka, Sophia

    2016-02-01

    A common analysis performed on dynamic networks is community structure detection, a challenging problem that aims to track the temporal evolution of network modules. An emerging area in this field is evolutionary clustering, where the community structure of a network snapshot is identified by taking into account both its current state as well as previous time points. Based on this concept, we have developed a mixed integer non-linear programming (MINLP) model, SeqMod, that sequentially clusters each snapshot of a dynamic network. The modularity metric is used to determine the quality of community structure of the current snapshot and the historical cost is accounted for by optimising the number of node pairs co-clustered at the previous time point that remain so in the current snapshot partition. Our method is tested on social networks of interactions among high school students, college students and members of the Brazilian Congress. We show that, for an adequate parameter setting, our algorithm detects the classes that these students belong more accurately than partitioning each time step individually or by partitioning the aggregated snapshots. Our method also detects drastic discontinuities in interaction patterns across network snapshots. Finally, we present comparative results with similar community detection methods for time-dependent networks from the literature. Overall, we illustrate the applicability of mathematical programming as a flexible, adaptable and systematic approach for these community detection problems. Contribution to the Topical Issue "Temporal Network Theory and Applications", edited by Petter Holme.

  19. Identifying patterns of general practitioner service utilisation and their relationship with potentially preventable hospitalisations in people with diabetes: The utility of a cluster analysis approach.

    Science.gov (United States)

    Ha, Ninh Thi; Harris, Mark; Preen, David; Robinson, Suzanne; Moorin, Rachael

    2018-04-01

    We aimed to characterise use of general practitioners (GP) simultaneously across multiple attributes in people with diabetes and examine its impact on diabetes related potentially preventable hospitalisations (PPHs). Five-years of panel data from 40,625 adults with diabetes were sourced from Western Australian administrative health records. Cluster analysis (CA) was used to group individuals with similar patterns of GP utilisation characterised by frequency and recency of services. The relationship between GP utilisation cluster and the risk of PPHs was examined using multivariable random-effects negative binomial regression. CA categorised GP utilisation into three clusters: moderate; high and very high usage, having distinct patient characteristics. After adjusting for potential confounders, the rate of PPHs was significantly lower across all GP usage clusters compared with those with no GP usage; IRR = 0.67 (95%CI: 0.62-0.71) among the moderate, IRR = 0.70 (95%CI 0.66-0.73) high and IRR = 0.76 (95%CI 0.72-0.80) very high GP usage clusters. Combination of temporal factors with measures of frequency of use of GP services revealed patterns of primary health care utilisation associated with different underlying patient characteristics. Incorporation of multiple attributes, that go beyond frequency-based approaches may better characterise the complex relationship between use of GP services and diabetes-related hospitalisation. Copyright © 2018 Elsevier B.V. All rights reserved.

  20. Some Linguistic-based and temporal analysis on Wikipedia

    International Nuclear Information System (INIS)

    Yasseri, T.

    2010-01-01

    Wikipedia as a web-based, collaborative, multilingual encyclopaedia project is a very suitable field to carry out research on social dynamics and to investigate the complex concepts of conflict, collaboration, competition, dispute, etc in a large community (∼26 Million) of Wikipedia users. The other face of Wikipedia as a productive society, is its output, consisting of (∼17) Millions of articles written unsupervised by unprofessional editors in more than 270 different languages. In this talk we report some analysis performed on Wikipedia in two different approaches: temporal analysis to characterize disputes and controversies among users and linguistic-based analysis to characterize linguistic features of English texts in Wikipedia. (author)

  1. Person mobility in the design and analysis of cluster-randomized cohort prevention trials.

    Science.gov (United States)

    Vuchinich, Sam; Flay, Brian R; Aber, Lawrence; Bickman, Leonard

    2012-06-01

    Person mobility is an inescapable fact of life for most cluster-randomized (e.g., schools, hospitals, clinic, cities, state) cohort prevention trials. Mobility rates are an important substantive consideration in estimating the effects of an intervention. In cluster-randomized trials, mobility rates are often correlated with ethnicity, poverty and other variables associated with disparity. This raises the possibility that estimated intervention effects may generalize to only the least mobile segments of a population and, thus, create a threat to external validity. Such mobility can also create threats to the internal validity of conclusions from randomized trials. Researchers must decide how to deal with persons who leave study clusters during a trial (dropouts), persons and clusters that do not comply with an assigned intervention, and persons who enter clusters during a trial (late entrants), in addition to the persons who remain for the duration of a trial (stayers). Statistical techniques alone cannot solve the key issues of internal and external validity raised by the phenomenon of person mobility. This commentary presents a systematic, Campbellian-type analysis of person mobility in cluster-randomized cohort prevention trials. It describes four approaches for dealing with dropouts, late entrants and stayers with respect to data collection, analysis and generalizability. The questions at issue are: 1) From whom should data be collected at each wave of data collection? 2) Which cases should be included in the analyses of an intervention effect? and 3) To what populations can trial results be generalized? The conclusions lead to recommendations for the design and analysis of future cluster-randomized cohort prevention trials.

  2. Characterizing Suicide in Toronto: An Observational Study and Cluster Analysis

    Science.gov (United States)

    Sinyor, Mark; Schaffer, Ayal; Streiner, David L

    2014-01-01

    Objective: To determine whether people who have died from suicide in a large epidemiologic sample form clusters based on demographic, clinical, and psychosocial factors. Method: We conducted a coroner’s chart review for 2886 people who died in Toronto, Ontario, from 1998 to 2010, and whose death was ruled as suicide by the Office of the Chief Coroner of Ontario. A cluster analysis using known suicide risk factors was performed to determine whether suicide deaths separate into distinct groups. Clusters were compared according to person- and suicide-specific factors. Results: Five clusters emerged. Cluster 1 had the highest proportion of females and nonviolent methods, and all had depression and a past suicide attempt. Cluster 2 had the highest proportion of people with a recent stressor and violent suicide methods, and all were married. Cluster 3 had mostly males between the ages of 20 and 64, and all had either experienced recent stressors, suffered from mental illness, or had a history of substance abuse. Cluster 4 had the youngest people and the highest proportion of deaths by jumping from height, few were married, and nearly one-half had bipolar disorder or schizophrenia. Cluster 5 had all unmarried people with no prior suicide attempts, and were the least likely to have an identified mental illness and most likely to leave a suicide note. Conclusions: People who die from suicide assort into different patterns of demographic, clinical, and death-specific characteristics. Identifying and studying subgroups of suicides may advance our understanding of the heterogeneous nature of suicide and help to inform development of more targeted suicide prevention strategies. PMID:24444321

  3. Characterization of the Temporal Clustering of Flood Events across the Central United States in terms of Climate States

    Science.gov (United States)

    Mallakpour, Iman; Villarini, Gabriele; Jones, Michael; Smith, James

    2016-04-01

    The central United States is a region of the country that has been plagued by frequent catastrophic flooding (e.g., flood events of 1993, 2008, 2013, and 2014), with large economic and social repercussions (e.g., fatalities, agricultural losses, flood losses, water quality issues). The goal of this study is to examine whether it is possible to describe the occurrence of flood events at the sub-seasonal scale in terms of variations in the climate system. Daily streamflow time series from 774 USGS stream gage stations over the central United States (defined here to include North Dakota, South Dakota, Nebraska, Kansas, Missouri, Iowa, Minnesota, Wisconsin, Illinois, West Virginia, Kentucky, Ohio, Indiana, and Michigan) with a record of at least 50 years and ending no earlier than 2011 are used for this study. We use a peak-over-threshold (POT) approach to identify flood peaks so that we have, on average two events per year. We model the occurrence/non-occurrence of a flood event over time using regression models based on Cox processes. Cox processes are widely used in biostatistics and can be viewed as a generalization of Poisson processes. Rather than assuming that flood events occur independently of the occurrence of previous events (as in Poisson processes), Cox processes allow us to account for the potential presence of temporal clustering, which manifests itself in an alternation of quiet and active periods. Here we model the occurrence/non-occurrence of flood events using two climate indices as climate time-varying covariates: the North Atlantic Oscillation (NAO) and the Pacific-North American pattern (PNA). The results of this study show that NAO and/or PNA can explain the temporal clustering in flood occurrences in over 90% of the stream gage stations we considered. Analyses of the sensitivity of the results to different average numbers of flood events per year (from one to five) are also performed and lead to the same conclusions. The findings of this work

  4. FLOCK cluster analysis of mast cell event clustering by high-sensitivity flow cytometry predicts systemic mastocytosis.

    Science.gov (United States)

    Dorfman, David M; LaPlante, Charlotte D; Pozdnyakova, Olga; Li, Betty

    2015-11-01

    In our high-sensitivity flow cytometric approach for systemic mastocytosis (SM), we identified mast cell event clustering as a new diagnostic criterion for the disease. To objectively characterize mast cell gated event distributions, we performed cluster analysis using FLOCK, a computational approach to identify cell subsets in multidimensional flow cytometry data in an unbiased, automated fashion. FLOCK identified discrete mast cell populations in most cases of SM (56/75 [75%]) but only a minority of non-SM cases (17/124 [14%]). FLOCK-identified mast cell populations accounted for 2.46% of total cells on average in SM cases and 0.09% of total cells on average in non-SM cases (P < .0001) and were predictive of SM, with a sensitivity of 75%, a specificity of 86%, a positive predictive value of 76%, and a negative predictive value of 85%. FLOCK analysis provides useful diagnostic information for evaluating patients with suspected SM, and may be useful for the analysis of other hematopoietic neoplasms. Copyright© by the American Society for Clinical Pathology.

  5. Cluster Analysis of International Information and Social Development.

    Science.gov (United States)

    Lau, Jesus

    1990-01-01

    Analyzes information activities in relation to socioeconomic characteristics in low, middle, and highly developed economies for the years 1960 and 1977 through the use of cluster analysis. Results of data from 31 countries suggest that information development is achieved mainly by countries that have also achieved social development. (26…

  6. Transcriptional analysis of ESAT-6 cluster 3 in Mycobacterium smegmatis

    Directory of Open Access Journals (Sweden)

    Riccardi Giovanna

    2009-03-01

    Full Text Available Abstract Background The ESAT-6 (early secreted antigenic target, 6 kDa family collects small mycobacterial proteins secreted by Mycobacterium tuberculosis, particularly in the early phase of growth. There are 23 ESAT-6 family members in M. tuberculosis H37Rv. In a previous work, we identified the Zur- dependent regulation of five proteins of the ESAT-6/CFP-10 family (esxG, esxH, esxQ, esxR, and esxS. esxG and esxH are part of ESAT-6 cluster 3, whose expression was already known to be induced by iron starvation. Results In this research, we performed EMSA experiments and transcriptional analysis of ESAT-6 cluster 3 in Mycobacterium smegmatis (msmeg0615-msmeg0625 and M. tuberculosis. In contrast to what we had observed in M. tuberculosis, we found that in M. smegmatis ESAT-6 cluster 3 responds only to iron and not to zinc. In both organisms we identified an internal promoter, a finding which suggests the presence of two transcriptional units and, by consequence, a differential expression of cluster 3 genes. We compared the expression of msmeg0615 and msmeg0620 in different growth and stress conditions by means of relative quantitative PCR. The expression of msmeg0615 and msmeg0620 genes was essentially similar; they appeared to be repressed in most of the tested conditions, with the exception of acid stress (pH 4.2 where msmeg0615 was about 4-fold induced, while msmeg0620 was repressed. Analysis revealed that in acid stress conditions M. tuberculosis rv0282 gene was 3-fold induced too, while rv0287 induction was almost insignificant. Conclusion In contrast with what has been reported for M. tuberculosis, our results suggest that in M. smegmatis only IdeR-dependent regulation is retained, while zinc has no effect on gene expression. The role of cluster 3 in M. tuberculosis virulence is still to be defined; however, iron- and zinc-dependent expression strongly suggests that cluster 3 is highly expressed in the infective process, and that the cluster

  7. The Flemish frozen-vegetable industry as an example of cluster analysis : Flanders Vegetable Valley

    NARCIS (Netherlands)

    Vanhaverbeke, W.P.M.; Larosse, J.; Winnen, W.; Hulsink, W.; Dons, J.J.M.

    2008-01-01

    In this contribution we present a strategic analysis of the cluster dynamics in the frozen-vegetable industry in Flanders (Belgium)1. The main purpose of this case is twofold. First, we determine the added value of using data about customer and supplier relationships in cluster analysis. Second, we

  8. A spatial and temporal analysis of Japanese encephalitis in mainland China, 1963-1975: a period without Japanese encephalitis vaccination.

    Science.gov (United States)

    Li, Xiaolong; Gao, Xiaoyan; Ren, Zhoupeng; Cao, Yuxi; Wang, Jinfeng; Liang, Guodong

    2014-01-01

    More than a million Japanese encephalitis (JE) cases occurred in mainland China from the 1960s to 1970s without vaccine interventions. The aim of this study is to analyze the spatial and temporal pattern of JE cases reported in mainland China from 1965 to 1973 in the absence of JE vaccination, and to discuss the impacts of climatic and geographical factors on JE during that period. Thus, the data of reported JE cases at provincial level and monthly precipitation and monthly mean temperature from 1963 to 1975 in mainland China were collected. Local Indicators of Spatial Association analysis was performed to identify spatial clusters at the province level. During that period, The epidemic peaked in 1966 and 1971 and the JE incidence reached up to 20.58/100000 and 20.92/100000, respectively. The endemic regions can be divided into three classes including high, medium, and low prevalence regions. Through spatial cluster analysis, JE epidemic hot spots were identified; most were located in the Yangtze River Plain which lies in the southeast of China. In addition, JE incidence was shown to vary among eight geomorphic units in China. Also, the JE incidence in the Loess Plateau and the North China Plain was showed to increase with the rise of temperature. Likewise, JE incidence in the Loess Plateau and the Yangtze River Plain was observed a same trend with the increase of rainfall. In conclusion, the JE cases clustered geographically during the epidemic period. Besides, the JE incidence was markedly higher on the plains than plateaus. These results may provide an insight into the epidemiological characteristics of JE in the absence of vaccine interventions and assist health authorities, both in China and potentially in Europe and Americas, in JE prevention and control strategies.

  9. Spatial-temporal data model and fractal analysis of transportation network in GIS environment

    Science.gov (United States)

    Feng, Yongjiu; Tong, Xiaohua; Li, Yangdong

    2008-10-01

    How to organize transportation data characterized by multi-time, multi-scale, multi-resolution and multi-source is one of the fundamental problems of GIS-T development. A spatial-temporal data model for GIS-T is proposed based on Spatial-temporal- Object Model. Transportation network data is systemically managed using dynamic segmentation technologies. And then a spatial-temporal database is built to integrally store geographical data of multi-time for transportation. Based on the spatial-temporal database, functions of spatial analysis of GIS-T are substantively extended. Fractal module is developed to improve the analyzing in intensity, density, structure and connectivity of transportation network based on the validation and evaluation of topologic relation. Integrated fractal with GIS-T strengthens the functions of spatial analysis and enriches the approaches of data mining and knowledge discovery of transportation network. Finally, the feasibility of the model and methods are tested thorough Guangdong Geographical Information Platform for Highway Project.

  10. Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data

    Science.gov (United States)

    Hallac, David; Vare, Sagar; Boyd, Stephen; Leskovec, Jure

    2018-01-01

    Subsequence clustering of multivariate time series is a useful tool for discovering repeated patterns in temporal data. Once these patterns have been discovered, seemingly complicated datasets can be interpreted as a temporal sequence of only a small number of states, or clusters. For example, raw sensor data from a fitness-tracking application can be expressed as a timeline of a select few actions (i.e., walking, sitting, running). However, discovering these patterns is challenging because it requires simultaneous segmentation and clustering of the time series. Furthermore, interpreting the resulting clusters is difficult, especially when the data is high-dimensional. Here we propose a new method of model-based clustering, which we call Toeplitz Inverse Covariance-based Clustering (TICC). Each cluster in the TICC method is defined by a correlation network, or Markov random field (MRF), characterizing the interdependencies between different observations in a typical subsequence of that cluster. Based on this graphical representation, TICC simultaneously segments and clusters the time series data. We solve the TICC problem through alternating minimization, using a variation of the expectation maximization (EM) algorithm. We derive closed-form solutions to efficiently solve the two resulting subproblems in a scalable way, through dynamic programming and the alternating direction method of multipliers (ADMM), respectively. We validate our approach by comparing TICC to several state-of-the-art baselines in a series of synthetic experiments, and we then demonstrate on an automobile sensor dataset how TICC can be used to learn interpretable clusters in real-world scenarios. PMID:29770257

  11. Human processing of short temporal intervals as revealed by an ERP waveform analysis

    Directory of Open Access Journals (Sweden)

    Yoshitaka eNakajima

    2011-12-01

    Full Text Available To clarify the time course over which the human brain processes information about durations up to ~300 ms, we reanalyzed the data that were previously reported by Mitsudo et al. (2009 using a multivariate analysis method. Event-related potentials were recorded from 19 scalp electrodes on 11 (9 original and 2 additional participants while they judged whether two neighboring empty time intervals—called t1 and t2 and marked by three tone bursts—had equal durations. There was also a control condition in which the participants were presented the same temporal patterns but without a judgment task. In the present reanalysis, we sought to visualize how the temporal patterns were represented in the brain over time. A correlation matrix across channels was calculated for each temporal pattern. Geometric separations between the correlation matrices were calculated, and subjected to multidimensional scaling. We performed such analyses for a moving 100-ms time window after the t1 presentations. In the windows centered at < 100 ms after the t2 presentation, the analyses revealed the local maxima of categorical separation between temporal patterns of perceptually equal durations versus perceptually unequal durations, both in the judgment condition and in the control condition. Such categorization of the temporal patterns was prominent only in narrow temporal regions. The analysis indicated that the participants determined whether the two neighboring time intervals were of equal duration mostly within 100 ms after the presentation of the temporal patterns. A very fast brain activity was related to the perception of elementary temporal patterns without explicit judgments. This is consistent with the findings of Mitsudo et al., and it is in line with the processing time hypothesis proposed by Nakajima et al. (2004. The validity of the correlation matrix analyses turned out to be an effective tool to grasp the overall responses of the brain to temporal

  12. Factor-cluster analysis and enrichment study of Mangrove sediments - An example from Mengkabong, Sabah

    International Nuclear Information System (INIS)

    Praveena, S.M.; Ahmed, A.; Radojevic, M.; Mohd Harun Abdullah; Aris, A.Z.

    2007-01-01

    This paper examines the tidal effects in the sediment of Mengkabong mangrove forest, Sabah. Generally, all the studied parameters showed high value at high tide compared to low tide. Factor-cluster analyses were adopted to allow the identification of controlling factors at high and low tides. Factor analysis extracted six controlling factors at high tide and seven controlling factors at low tide. Cluster analysis extracted two district clusters at high and low tides. The study showed that factor-cluster analysis application is a useful tool to single out the controlling factors at high and low tides. this will provide a basis for describing the tidal effects in the mangrove sediment. The salinity and electrical conductivity clusters as well as component loadings at high and low tide explained the tidal process where there is high contribution of seawater to mangrove sediments that controls the sediment chemistry. The geo accumulation index (T geo ) values suggest the mangrove sediments are having background concentrations for Al, Cu, Fe and Zn and unpolluted for Pb. (author)

  13. Evaluating the Effect of Virtual Reality Temporal Bone Simulation on Mastoidectomy Performance: A Meta-analysis.

    Science.gov (United States)

    Lui, Justin T; Hoy, Monica Y

    2017-06-01

    Background The increasing prevalence of virtual reality simulation in temporal bone surgery warrants an investigation to assess training effectiveness. Objectives To determine if temporal bone simulator use improves mastoidectomy performance. Data Sources Ovid Medline, Embase, and PubMed databases were systematically searched per the PRISMA guidelines. Review Methods Inclusion criteria were peer-reviewed publications that utilized quantitative data of mastoidectomy performance following the use of a temporal bone simulator. The search was restricted to human studies published in English. Studies were excluded if they were in non-peer-reviewed format, were descriptive in nature, or failed to provide surgical performance outcomes. Meta-analysis calculations were then performed. Results A meta-analysis based on the random-effects model revealed an improvement in overall mastoidectomy performance following training on the temporal bone simulator. A standardized mean difference of 0.87 (95% CI, 0.38-1.35) was generated in the setting of a heterogeneous study population ( I 2 = 64.3%, P virtual reality simulation temporal bone surgery studies, meta-analysis calculations demonstrate an improvement in trainee mastoidectomy performance with virtual simulation training.

  14. Spatial and temporal dynamics of the microbial community in the Hanford unconfined aquifer

    Energy Technology Data Exchange (ETDEWEB)

    Lin, Xueju; McKinley, James P.; Resch, Charles T.; Kaluzny, Rachael M.; Lauber, C.; Fredrickson, Jim K.; Knight, Robbie C.; Konopka, Allan

    2012-03-29

    Pyrosequencing analysis of 16S rRNA genes was used to study temporal dynamics of groundwater Bacteria and Archaea over 10 months within 3 well clusters separated by ~30 m and located 250 m from the Columbia River on the Hanford Site, WA. Each cluster contained 3 wells screened at different depths ranging from 10 to 17 m that differed in hydraulic conductivities. Representative samples were selected for analyses of prokaryotic 16S and eukaryotic 18S rRNA gene copy numbers. Temporal changes in community composition occurred in all 9 wells over the 10 month sampling period. However, there were particularly strong effects near the top of the water table when the seasonal rise in the Columbia River caused river water intrusion at the top of the aquifer. The occurrence and disappearance of some microbial assemblages (such as Actinobacteria ACK-M1) were correlated to river water intrusion. This seasonal impact on microbial community structure was greater in the shallow saturated zone than deeper in the aquifer. Spatial and temporal patterns for several 16S rRNA gene operational taxonomic units associated with particular physiological functions (e.g.methane oxidizers and metal reducers) suggests dynamic changes in fluxes of electron donors and acceptors over an annual cycle. In addition, temporal dynamics in eukaryotic 18S rRNA gene copies and the dominance of protozoa in 18S clone libraries suggest that bacterial community dynamics could be affected not only by the physical and chemical environment, but also by top-down biological control.

  15. Convex Clustering: An Attractive Alternative to Hierarchical Clustering

    Science.gov (United States)

    Chen, Gary K.; Chi, Eric C.; Ranola, John Michael O.; Lange, Kenneth

    2015-01-01

    The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominant clustering method in bioinformatics. Biologists find the trees constructed by hierarchical clustering visually appealing and in tune with their evolutionary perspective. Hierarchical clustering operates on multiple scales simultaneously. This is essential, for instance, in transcriptome data, where one may be interested in making qualitative inferences about how lower-order relationships like gene modules lead to higher-order relationships like pathways or biological processes. The recently developed method of convex clustering preserves the visual appeal of hierarchical clustering while ameliorating its propensity to make false inferences in the presence of outliers and noise. The solution paths generated by convex clustering reveal relationships between clusters that are hidden by static methods such as k-means clustering. The current paper derives and tests a novel proximal distance algorithm for minimizing the objective function of convex clustering. The algorithm separates parameters, accommodates missing data, and supports prior information on relationships. Our program CONVEXCLUSTER incorporating the algorithm is implemented on ATI and nVidia graphics processing units (GPUs) for maximal speed. Several biological examples illustrate the strengths of convex clustering and the ability of the proximal distance algorithm to handle high-dimensional problems. CONVEXCLUSTER can be freely downloaded from the UCLA Human Genetics web site at http://www.genetics.ucla.edu/software/ PMID:25965340

  16. Applying Clustering to Statistical Analysis of Student Reasoning about Two-Dimensional Kinematics

    Science.gov (United States)

    Springuel, R. Padraic; Wittman, Michael C.; Thompson, John R.

    2007-01-01

    We use clustering, an analysis method not presently common to the physics education research community, to group and characterize student responses to written questions about two-dimensional kinematics. Previously, clustering has been used to analyze multiple-choice data; we analyze free-response data that includes both sketches of vectors and…

  17. The CERN analysis facility-a PROOF cluster for day-one physics analysis

    International Nuclear Information System (INIS)

    Grosse-Oetringhaus, J F

    2008-01-01

    ALICE (A Large Ion Collider Experiment) at the LHC plans to use a PROOF cluster at CERN (CAF - CERN Analysis Facility) for analysis. The system is especially aimed at the prototyping phase of analyses that need a high number of development iterations and thus require a short response time. Typical examples are the tuning of cuts during the development of an analysis as well as calibration and alignment. Furthermore, the use of an interactive system with very fast response will allow ALICE to extract physics observables out of first data quickly. An additional use case is fast event simulation and reconstruction. A test setup consisting of 40 machines is used for evaluation since May 2006. The PROOF system enables the parallel processing and xrootd the access to files distributed on the test cluster. An automatic staging system for files either catalogued in the ALICE file catalog or stored in the CASTOR mass storage system has been developed. The current setup and ongoing development towards disk quotas and CPU fairshare are described. Furthermore, the integration of PROOF into ALICE's software framework (AliRoot) is discussed

  18. Cluster Analysis of Acute Care Use Yields Insights for Tailored Pediatric Asthma Interventions.

    Science.gov (United States)

    Abir, Mahshid; Truchil, Aaron; Wiest, Dawn; Nelson, Daniel B; Goldstick, Jason E; Koegel, Paul; Lozon, Marie M; Choi, Hwajung; Brenner, Jeffrey

    2017-09-01

    We undertake this study to understand patterns of pediatric asthma-related acute care use to inform interventions aimed at reducing potentially avoidable hospitalizations. Hospital claims data from 3 Camden city facilities for 2010 to 2014 were used to perform cluster analysis classifying patients aged 0 to 17 years according to their asthma-related hospital use. Clusters were based on 2 variables: asthma-related ED visits and hospitalizations. Demographics and a number of sociobehavioral and use characteristics were compared across clusters. Children who met the criteria (3,170) were included in the analysis. An examination of a scree plot showing the decline in within-cluster heterogeneity as the number of clusters increased confirmed that clusters of pediatric asthma patients according to hospital use exist in the data. Five clusters of patients with distinct asthma-related acute care use patterns were observed. Cluster 1 (62% of patients) showed the lowest rates of acute care use. These patients were least likely to have a mental health-related diagnosis, were less likely to have visited multiple facilities, and had no hospitalizations for asthma. Cluster 2 (19% of patients) had a low number of asthma ED visits and onetime hospitalization. Cluster 3 (11% of patients) had a high number of ED visits and low hospitalization rates, and the highest rates of multiple facility use. Cluster 4 (7% of patients) had moderate ED use for both asthma and other illnesses, and high rates of asthma hospitalizations; nearly one quarter received care at all facilities, and 1 in 10 had a mental health diagnosis. Cluster 5 (1% of patients) had extreme rates of acute care use. Differences observed between groups across multiple sociobehavioral factors suggest these clusters may represent children who differ along multiple dimensions, in addition to patterns of service use, with implications for tailored interventions. Copyright © 2017 American College of Emergency Physicians

  19. GibbsCluster: unsupervised clustering and alignment of peptide sequences

    DEFF Research Database (Denmark)

    Andreatta, Massimo; Alvarez, Bruno; Nielsen, Morten

    2017-01-01

    motif characterizing each cluster. Several parameters are available to customize cluster analysis, including adjustable penalties for small clusters and overlapping groups and a trash cluster to remove outliers. As an example application, we used the server to deconvolute multiple specificities in large......-scale peptidome data generated by mass spectrometry. The server is available at http://www.cbs.dtu.dk/services/GibbsCluster-2.0....

  20. High-dimensional cluster analysis with the Masked EM Algorithm

    Science.gov (United States)

    Kadir, Shabnam N.; Goodman, Dan F. M.; Harris, Kenneth D.

    2014-01-01

    Cluster analysis faces two problems in high dimensions: first, the “curse of dimensionality” that can lead to overfitting and poor generalization performance; and second, the sheer time taken for conventional algorithms to process large amounts of high-dimensional data. We describe a solution to these problems, designed for the application of “spike sorting” for next-generation high channel-count neural probes. In this problem, only a small subset of features provide information about the cluster member-ship of any one data vector, but this informative feature subset is not the same for all data points, rendering classical feature selection ineffective. We introduce a “Masked EM” algorithm that allows accurate and time-efficient clustering of up to millions of points in thousands of dimensions. We demonstrate its applicability to synthetic data, and to real-world high-channel-count spike sorting data. PMID:25149694

  1. Phenotypes of asthma in low-income children and adolescents: cluster analysis.

    Science.gov (United States)

    Cabral, Anna Lucia Barros; Sousa, Andrey Wirgues; Mendes, Felipe Augusto Rodrigues; Carvalho, Celso Ricardo Fernandes de

    2017-01-01

    Studies characterizing asthma phenotypes have predominantly included adults or have involved children and adolescents in developed countries. Therefore, their applicability in other populations, such as those of developing countries, remains indeterminate. Our objective was to determine how low-income children and adolescents with asthma in Brazil are distributed across a cluster analysis. We included 306 children and adolescents (6-18 years of age) with a clinical diagnosis of asthma and under medical treatment for at least one year of follow-up. At enrollment, all the patients were clinically stable. For the cluster analysis, we selected 20 variables commonly measured in clinical practice and considered important in defining asthma phenotypes. Variables with high multicollinearity were excluded. A cluster analysis was applied using a twostep agglomerative test and log-likelihood distance measure. Three clusters were defined for our population. Cluster 1 (n = 94) included subjects with normal pulmonary function, mild eosinophil inflammation, few exacerbations, later age at asthma onset, and mild atopy. Cluster 2 (n = 87) included those with normal pulmonary function, a moderate number of exacerbations, early age at asthma onset, more severe eosinophil inflammation, and moderate atopy. Cluster 3 (n = 108) included those with poor pulmonary function, frequent exacerbations, severe eosinophil inflammation, and severe atopy. Asthma was characterized by the presence of atopy, number of exacerbations, and lung function in low-income children and adolescents in Brazil. The many similarities with previous cluster analyses of phenotypes indicate that this approach shows good generalizability. Estudos que caracterizam fenótipos de asma predominantemente incluem adultos ou foram realizados em crianças e adolescentes de países desenvolvidos; portanto, sua aplicabilidade em outras populações, tais como as de países em desenvolvimento, permanece indeterminada. Nosso

  2. Identification and comparative analysis of the protocadherin cluster in a reptile, the green anole lizard.

    Directory of Open Access Journals (Sweden)

    Xiao-Juan Jiang

    Full Text Available BACKGROUND: The vertebrate protocadherins are a subfamily of cell adhesion molecules that are predominantly expressed in the nervous system and are believed to play an important role in establishing the complex neural network during animal development. Genes encoding these molecules are organized into a cluster in the genome. Comparative analysis of the protocadherin subcluster organization and gene arrangements in different vertebrates has provided interesting insights into the history of vertebrate genome evolution. Among tetrapods, protocadherin clusters have been fully characterized only in mammals. In this study, we report the identification and comparative analysis of the protocadherin cluster in a reptile, the green anole lizard (Anolis carolinensis. METHODOLOGY/PRINCIPAL FINDINGS: We show that the anole protocadherin cluster spans over a megabase and encodes a total of 71 genes. The number of genes in the anole protocadherin cluster is significantly higher than that in the coelacanth (49 genes and mammalian (54-59 genes clusters. The anole protocadherin genes are organized into four subclusters: the delta, alpha, beta and gamma. This subcluster organization is identical to that of the coelacanth protocadherin cluster, but differs from the mammalian clusters which lack the delta subcluster. The gene number expansion in the anole protocadherin cluster is largely due to the extensive gene duplication in the gammab subgroup. Similar to coelacanth and elephant shark protocadherin genes, the anole protocadherin genes have experienced a low frequency of gene conversion. CONCLUSIONS/SIGNIFICANCE: Our results suggest that similar to the protocadherin clusters in other vertebrates, the evolution of anole protocadherin cluster is driven mainly by lineage-specific gene duplications and degeneration. Our analysis also shows that loss of the protocadherin delta subcluster in the mammalian lineage occurred after the divergence of mammals and reptiles

  3. Cluster Cooperation in Wireless-Powered Sensor Networks: Modeling and Performance Analysis.

    Science.gov (United States)

    Zhang, Chao; Zhang, Pengcheng; Zhang, Weizhan

    2017-09-27

    A wireless-powered sensor network (WPSN) consisting of one hybrid access point (HAP), a near cluster and the corresponding far cluster is investigated in this paper. These sensors are wireless-powered and they transmit information by consuming the harvested energy from signal ejected by the HAP. Sensors are able to harvest energy as well as store the harvested energy. We propose that if sensors in near cluster do not have their own information to transmit, acting as relays, they can help the sensors in a far cluster to forward information to the HAP in an amplify-and-forward (AF) manner. We use a finite Markov chain to model the dynamic variation process of the relay battery, and give a general analyzing model for WPSN with cluster cooperation. Though the model, we deduce the closed-form expression for the outage probability as the metric of this network. Finally, simulation results validate the start point of designing this paper and correctness of theoretical analysis and show how parameters have an effect on system performance. Moreover, it is also known that the outage probability of sensors in far cluster can be drastically reduced without sacrificing the performance of sensors in near cluster if the transmit power of HAP is fairly high. Furthermore, in the aspect of outage performance of far cluster, the proposed scheme significantly outperforms the direct transmission scheme without cooperation.

  4. Language Learner Motivational Types: A Cluster Analysis Study

    Science.gov (United States)

    Papi, Mostafa; Teimouri, Yasser

    2014-01-01

    The study aimed to identify different second language (L2) learner motivational types drawing on the framework of the L2 motivational self system. A total of 1,278 secondary school students learning English in Iran completed a questionnaire survey. Cluster analysis yielded five different groups based on the strength of different variables within…

  5. Objective Classification of Rainfall in Northern Europe for Online Operation of Urban Water Systems Based on Clustering Techniques

    DEFF Research Database (Denmark)

    Löwe, Roland; Madsen, Henrik; McSharry, Patrick

    2016-01-01

    operators to change modes of control of their facilities. A k-means clustering technique was applied to group events retrospectively and was able to distinguish events with clearly different temporal and spatial correlation properties. For online applications, techniques based on k-means clustering...... and quadratic discriminant analysis both provided a fast and reliable identification of rain events of "high" variability, while the k-means provided the smallest number of rain events falsely identified as being of "high" variability (false hits). A simple classification method based on a threshold...

  6. The use of a cluster analysis in across herd genetic evaluation for ...

    African Journals Online (AJOL)

    To investigate the possibility of a genotype x environment interaction in Bonsmara cattle, a cluster analysis was performed on weaning weight records of 72 811 Bonsmara calves, the progeny of 1 434 sires and 24 186 dams in 35 herds. The following environmental factors were used to classify herds into clusters: solution ...

  7. Analysis of candidates for interacting galaxy clusters. I. A1204 and A2029/A2033

    Science.gov (United States)

    Gonzalez, Elizabeth Johana; de los Rios, Martín; Oio, Gabriel A.; Lang, Daniel Hernández; Tagliaferro, Tania Aguirre; Domínguez R., Mariano J.; Castellón, José Luis Nilo; Cuevas L., Héctor; Valotto, Carlos A.

    2018-04-01

    Context. Merging galaxy clusters allow for the study of different mass components, dark and baryonic, separately. Also, their occurrence enables to test the ΛCDM scenario, which can be used to put constraints on the self-interacting cross-section of the dark-matter particle. Aim. It is necessary to perform a homogeneous analysis of these systems. Hence, based on a recently presented sample of candidates for interacting galaxy clusters, we present the analysis of two of these cataloged systems. Methods: In this work, the first of a series devoted to characterizing galaxy clusters in merger processes, we perform a weak lensing analysis of clusters A1204 and A2029/A2033 to derive the total masses of each identified interacting structure together with a dynamical study based on a two-body model. We also describe the gas and the mass distributions in the field through a lensing and an X-ray analysis. This is the first of a series of works which will analyze these type of system in order to characterize them. Results: Neither merging cluster candidate shows evidence of having had a recent merger event. Nevertheless, there is dynamical evidence that these systems could be interacting or could interact in the future. Conclusions: It is necessary to include more constraints in order to improve the methodology of classifying merging galaxy clusters. Characterization of these clusters is important in order to properly understand the nature of these systems and their connection with dynamical studies.

  8. Spatio-temporal analysis of regional PV generation

    DEFF Research Database (Denmark)

    Nuño Martinez, Edgar; Cutululis, Nicolaos Antonio

    2016-01-01

    Photovoltaic (PV) power is growing in importance worldwide and hence needs to be represented in operation and planning of power system. As opposed to traditional generation technologies, it is characterized by exhibiting both a high variability and a significant spatial dependence. This paper...... presents a fundamental analysis of regional solar generation time series, aiming to potentially facilitate large-scale solar integration. It will focus on characterizing the underlying dependence structure at the system level as well as describing both statistical and temporal properties of regional PV...

  9. Evaluation of hierarchical agglomerative cluster analysis methods for discrimination of primary biological aerosol

    Directory of Open Access Journals (Sweden)

    I. Crawford

    2015-11-01

    Full Text Available In this paper we present improved methods for discriminating and quantifying primary biological aerosol particles (PBAPs by applying hierarchical agglomerative cluster analysis to multi-parameter ultraviolet-light-induced fluorescence (UV-LIF spectrometer data. The methods employed in this study can be applied to data sets in excess of 1 × 106 points on a desktop computer, allowing for each fluorescent particle in a data set to be explicitly clustered. This reduces the potential for misattribution found in subsampling and comparative attribution methods used in previous approaches, improving our capacity to discriminate and quantify PBAP meta-classes. We evaluate the performance of several hierarchical agglomerative cluster analysis linkages and data normalisation methods using laboratory samples of known particle types and an ambient data set. Fluorescent and non-fluorescent polystyrene latex spheres were sampled with a Wideband Integrated Bioaerosol Spectrometer (WIBS-4 where the optical size, asymmetry factor and fluorescent measurements were used as inputs to the analysis package. It was found that the Ward linkage with z-score or range normalisation performed best, correctly attributing 98 and 98.1 % of the data points respectively. The best-performing methods were applied to the BEACHON-RoMBAS (Bio–hydro–atmosphere interactions of Energy, Aerosols, Carbon, H2O, Organics and Nitrogen–Rocky Mountain Biogenic Aerosol Study ambient data set, where it was found that the z-score and range normalisation methods yield similar results, with each method producing clusters representative of fungal spores and bacterial aerosol, consistent with previous results. The z-score result was compared to clusters generated with previous approaches (WIBS AnalysiS Program, WASP where we observe that the subsampling and comparative attribution method employed by WASP results in the overestimation of the fungal spore concentration by a factor of 1.5 and the

  10. Temporal Noise Analysis of Charge-Domain Sampling Readout Circuits for CMOS Image Sensors

    Directory of Open Access Journals (Sweden)

    Xiaoliang Ge

    2018-02-01

    Full Text Available This paper presents a temporal noise analysis of charge-domain sampling readout circuits for Complementary Metal-Oxide Semiconductor (CMOS image sensors. In order to address the trade-off between the low input-referred noise and high dynamic range, a Gm-cell-based pixel together with a charge-domain correlated-double sampling (CDS technique has been proposed to provide a way to efficiently embed a tunable conversion gain along the read-out path. Such readout topology, however, operates in a non-stationery large-signal behavior, and the statistical properties of its temporal noise are a function of time. Conventional noise analysis methods for CMOS image sensors are based on steady-state signal models, and therefore cannot be readily applied for Gm-cell-based pixels. In this paper, we develop analysis models for both thermal noise and flicker noise in Gm-cell-based pixels by employing the time-domain linear analysis approach and the non-stationary noise analysis theory, which help to quantitatively evaluate the temporal noise characteristic of Gm-cell-based pixels. Both models were numerically computed in MATLAB using design parameters of a prototype chip, and compared with both simulation and experimental results. The good agreement between the theoretical and measurement results verifies the effectiveness of the proposed noise analysis models.

  11. Patterns of Brucellosis Infection Symptoms in Azerbaijan: A Latent Class Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Rita Ismayilova

    2014-01-01

    Full Text Available Brucellosis infection is a multisystem disease, with a broad spectrum of symptoms. We investigated the existence of clusters of infected patients according to their clinical presentation. Using national surveillance data from the Electronic-Integrated Disease Surveillance System, we applied a latent class cluster (LCC analysis on symptoms to determine clusters of brucellosis cases. A total of 454 cases reported between July 2011 and July 2013 were analyzed. LCC identified a two-cluster model and the Vuong-Lo-Mendell-Rubin likelihood ratio supported the cluster model. Brucellosis cases in the second cluster (19% reported higher percentages of poly-lymphadenopathy, hepatomegaly, arthritis, myositis, and neuritis and changes in liver function tests compared to cases of the first cluster. Patients in the second cluster had a severe brucellosis disease course and were associated with longer delay in seeking medical attention. Moreover, most of them were from Beylagan, a region focused on sheep and goat livestock production in south-central Azerbaijan. Patients in cluster 2 accounted for one-quarter of brucellosis cases and had a more severe clinical presentation. Delay in seeking medical care may explain severe illness. Future work needs to determine the factors that influence brucellosis case seeking and identify brucellosis species, particularly among cases from Beylagan.

  12. Space-time analysis of pneumonia hospitalisations in the Netherlands.

    Science.gov (United States)

    Benincà, Elisa; van Boven, Michiel; Hagenaars, Thomas; van der Hoek, Wim

    2017-01-01

    Community acquired pneumonia is a major global public health problem. In the Netherlands there are 40,000-50,000 hospital admissions for pneumonia per year. In the large majority of these hospital admissions the etiologic agent is not determined and a real-time surveillance system is lacking. Localised and temporal increases in hospital admissions for pneumonia are therefore only detected retrospectively and the etiologic agents remain unknown. Here, we perform spatio-temporal analyses of pneumonia hospital admission data in the Netherlands. To this end, we scanned for spatial clusters on yearly and seasonal basis, and applied wavelet cluster analysis on the time series of five main regions. The pneumonia hospital admissions show strong clustering in space and time superimposed on a regular yearly cycle with high incidence in winter and low incidence in summer. Cluster analysis reveals a heterogeneous pattern, with most significant clusters occurring in the western, highly urbanised, and in the eastern, intensively farmed, part of the Netherlands. Quantitatively, the relative risk (RR) of the significant clusters for the age-standardised incidence varies from a minimum of 1.2 to a maximum of 2.2. We discuss possible underlying causes for the patterns observed, such as variations in air pollution.

  13. Analysis of Temporal-spatial Co-variation within Gene Expression Microarray Data in an Organogenesis Model

    Science.gov (United States)

    Ehler, Martin; Rajapakse, Vinodh; Zeeberg, Barry; Brooks, Brian; Brown, Jacob; Czaja, Wojciech; Bonner, Robert F.

    The gene networks underlying closure of the optic fissure during vertebrate eye development are poorly understood. We used a novel clustering method based on Laplacian Eigenmaps, a nonlinear dimension reduction method, to analyze microarray data from laser capture microdissected (LCM) cells at the site and developmental stages (days 10.5 to 12.5) of optic fissure closure. Our new method provided greater biological specificity than classical clustering algorithms in terms of identifying more biological processes and functions related to eye development as defined by Gene Ontology at lower false discovery rates. This new methodology builds on the advantages of LCM to isolate pure phenotypic populations within complex tissues and allows improved ability to identify critical gene products expressed at lower copy number. The combination of LCM of embryonic organs, gene expression microarrays, and extracting spatial and temporal co-variations appear to be a powerful approach to understanding the gene regulatory networks that specify mammalian organogenesis.

  14. Semiparametric Bayesian analysis of accelerated failure time models with cluster structures.

    Science.gov (United States)

    Li, Zhaonan; Xu, Xinyi; Shen, Junshan

    2017-11-10

    In this paper, we develop a Bayesian semiparametric accelerated failure time model for survival data with cluster structures. Our model allows distributional heterogeneity across clusters and accommodates their relationships through a density ratio approach. Moreover, a nonparametric mixture of Dirichlet processes prior is placed on the baseline distribution to yield full distributional flexibility. We illustrate through simulations that our model can greatly improve estimation accuracy by effectively pooling information from multiple clusters, while taking into account the heterogeneity in their random error distributions. We also demonstrate the implementation of our method using analysis of Mayo Clinic Trial in Primary Biliary Cirrhosis. Copyright © 2017 John Wiley & Sons, Ltd.

  15. Clustering gene expression time series data using an infinite Gaussian process mixture model.

    Science.gov (United States)

    McDowell, Ian C; Manandhar, Dinesh; Vockley, Christopher M; Schmid, Amy K; Reddy, Timothy E; Engelhardt, Barbara E

    2018-01-01

    Transcriptome-wide time series expression profiling is used to characterize the cellular response to environmental perturbations. The first step to analyzing transcriptional response data is often to cluster genes with similar responses. Here, we present a nonparametric model-based method, Dirichlet process Gaussian process mixture model (DPGP), which jointly models data clusters with a Dirichlet process and temporal dependencies with Gaussian processes. We demonstrate the accuracy of DPGP in comparison to state-of-the-art approaches using hundreds of simulated data sets. To further test our method, we apply DPGP to published microarray data from a microbial model organism exposed to stress and to novel RNA-seq data from a human cell line exposed to the glucocorticoid dexamethasone. We validate our clusters by examining local transcription factor binding and histone modifications. Our results demonstrate that jointly modeling cluster number and temporal dependencies can reveal shared regulatory mechanisms. DPGP software is freely available online at https://github.com/PrincetonUniversity/DP_GP_cluster.

  16. Clustering gene expression time series data using an infinite Gaussian process mixture model.

    Directory of Open Access Journals (Sweden)

    Ian C McDowell

    2018-01-01

    Full Text Available Transcriptome-wide time series expression profiling is used to characterize the cellular response to environmental perturbations. The first step to analyzing transcriptional response data is often to cluster genes with similar responses. Here, we present a nonparametric model-based method, Dirichlet process Gaussian process mixture model (DPGP, which jointly models data clusters with a Dirichlet process and temporal dependencies with Gaussian processes. We demonstrate the accuracy of DPGP in comparison to state-of-the-art approaches using hundreds of simulated data sets. To further test our method, we apply DPGP to published microarray data from a microbial model organism exposed to stress and to novel RNA-seq data from a human cell line exposed to the glucocorticoid dexamethasone. We validate our clusters by examining local transcription factor binding and histone modifications. Our results demonstrate that jointly modeling cluster number and temporal dependencies can reveal shared regulatory mechanisms. DPGP software is freely available online at https://github.com/PrincetonUniversity/DP_GP_cluster.

  17. Extended-range forecast for the temporal distribution of clustering tropical cyclogenesis over the western North Pacific

    Science.gov (United States)

    Zhu, Zhiwei; Li, Tim; Bai, Long; Gao, Jianyun

    2017-11-01

    Based on outgoing longwave radiation (OLR), an index for clustering tropical cyclogenesis (CTC) over the western North Pacific (WNP) was defined. Around 76 % of total CTC events were generated during the active phase of the CTC index, and 38 % of the total active phase was concurrent with CTC events. For its continuous property, the CTC index was used as the representative predictand for extended-range forecasting the temporal distribution of CTC events. The predictability sources for CTC events were detected via correlation analyses of the previous 35-5-day lead atmospheric fields against the CTC index. The results showed that the geopotential height at different levels and the 200 hPa zonal wind over the global tropics possessed large predictability sources, whereas the predictability sources of other variables, e.g., OLR, zonal wind, and relatively vorticity at 850 hPa and relatively humility at 700 hPa, were mainly confined to the tropical Indian Ocean and western Pacific Ocean. Several spatial-temporal projection model (STPM) sets were constructed to carry out the extended-range forecast for the CTC index. By combining the output of STPMs separately conducted for the two dominant modes of intraseasonal variability, e.g., the 10-30 and the 30-80 day mode, useful forecast skill could be achieved for a 30-day lead time. The combined output successfully captured both the 10-30 and 30-80 day mode at least 10 days in advance. With a relatively low rate of false alarm, the STPM achieved hits for 80 % (69 %) of 54 CTC events during 2003-2014 at the 10-day (20-day) lead time, suggesting a practical value of the STPM for real-time forecasting WNP CTC events at an extended range.

  18. Investigating the usefulness of a cluster-based trend analysis to detect visual field progression in patients with open-angle glaucoma.

    Science.gov (United States)

    Aoki, Shuichiro; Murata, Hiroshi; Fujino, Yuri; Matsuura, Masato; Miki, Atsuya; Tanito, Masaki; Mizoue, Shiro; Mori, Kazuhiko; Suzuki, Katsuyoshi; Yamashita, Takehiro; Kashiwagi, Kenji; Hirasawa, Kazunori; Shoji, Nobuyuki; Asaoka, Ryo

    2017-12-01

    To investigate the usefulness of the Octopus (Haag-Streit) EyeSuite's cluster trend analysis in glaucoma. Ten visual fields (VFs) with the Humphrey Field Analyzer (Carl Zeiss Meditec), spanning 7.7 years on average were obtained from 728 eyes of 475 primary open angle glaucoma patients. Mean total deviation (mTD) trend analysis and EyeSuite's cluster trend analysis were performed on various series of VFs (from 1st to 10th: VF1-10 to 6th to 10th: VF6-10). The results of the cluster-based trend analysis, based on different lengths of VF series, were compared against mTD trend analysis. Cluster-based trend analysis and mTD trend analysis results were significantly associated in all clusters and with all lengths of VF series. Between 21.2% and 45.9% (depending on VF series length and location) of clusters were deemed to progress when the mTD trend analysis suggested no progression. On the other hand, 4.8% of eyes were observed to progress using the mTD trend analysis when cluster trend analysis suggested no progression in any two (or more) clusters. Whole field trend analysis can miss local VF progression. Cluster trend analysis appears as robust as mTD trend analysis and useful to assess both sectorial and whole field progression. Cluster-based trend analyses, in particular the definition of two or more progressing cluster, may help clinicians to detect glaucomatous progression in a timelier manner than using a whole field trend analysis, without significantly compromising specificity. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  19. [Space-time suicide clustering in the community of Antequera (Spain)].

    Science.gov (United States)

    Pérez-Costillas, Lucía; Blasco-Fontecilla, Hilario; Benítez, Nicolás; Comino, Raquel; Antón, José Miguel; Ramos-Medina, Valentín; Lopez, Amalia; Palomo, José Luis; Madrigal, Lucía; Alcalde, Javier; Perea-Millá, Emilio; Artieda-Urrutia, Paula; de León-Martínez, Victoria; de Diego Otero, Yolanda

    2015-01-01

    Approximately 3,500 people commit suicide every year in Spain. The main aim of this study is to explore if a spatial and temporal clustering of suicide exists in the region of Antequera (Málaga, España). Sample and procedure: All suicides from January 1, 2004 to December 31, 2008 were identified using data from the Forensic Pathology Department of the Institute of Legal Medicine, Málaga (España). Geolocalisation. Google Earth was used to calculate the coordinates for each suicide decedent's address. Statistical analysis. A spatiotemporal permutation scan statistic and the Ripley's K function were used to explore spatiotemporal clustering. Pearson's chi-squared was used to determine whether there were differences between suicides inside and outside the spatiotemporal clusters. A total of 120 individuals committed suicide within the region of Antequera, of which 96 (80%) were included in our analyses. Statistically significant evidence for 7 spatiotemporal suicide clusters emerged within critical limits for the 0-2.5 km distance and for the first and second semanas (P<.05 in both cases) after suicide. There was not a single subject diagnosed with a current psychotic disorder, among suicides within clusters, whereas outside clusters, 20% had this diagnosis (X2=4.13; df=1; P<.05). There are spatiotemporal suicide clusters in the area surrounding Antequera. Patients diagnosed with current psychotic disorder are less likely to be influenced by the factors explaining suicide clustering. Copyright © 2013 SEP y SEPB. Published by Elsevier España. All rights reserved.

  20. [Optimization of cluster analysis based on drug resistance profiles of MRSA isolates].

    Science.gov (United States)

    Tani, Hiroya; Kishi, Takahiko; Gotoh, Minehiro; Yamagishi, Yuka; Mikamo, Hiroshige

    2015-12-01

    We examined 402 methicillin-resistant Staphylococcus aureus (MRSA) strains isolated from clinical specimens in our hospital between November 19, 2010 and December 27, 2011 to evaluate the similarity between cluster analysis of drug susceptibility tests and pulsed-field gel electrophoresis (PFGE). The results showed that the 402 strains tested were classified into 27 PFGE patterns (151 subtypes of patterns). Cluster analyses of drug susceptibility tests with the cut-off distance yielding a similar classification capability showed favorable results--when the MIC method was used, and minimum inhibitory concentration (MIC) values were used directly in the method, the level of agreement with PFGE was 74.2% when 15 drugs were tested. The Unweighted Pair Group Method with Arithmetic mean (UPGMA) method was effective when the cut-off distance was 16. Using the SIR method in which susceptible (S), intermediate (I), and resistant (R) were coded as 0, 2, and 3, respectively, according to the Clinical and Laboratory Standards Institute (CLSI) criteria, the level of agreement with PFGE was 75.9% when the number of drugs tested was 17, the method used for clustering was the UPGMA, and the cut-off distance was 3.6. In addition, to assess the reproducibility of the results, 10 strains were randomly sampled from the overall test and subjected to cluster analysis. This was repeated 100 times under the same conditions. The results indicated good reproducibility of the results, with the level of agreement with PFGE showing a mean of 82.0%, standard deviation of 12.1%, and mode of 90.0% for the MIC method and a mean of 80.0%, standard deviation of 13.4%, and mode of 90.0% for the SIR method. In summary, cluster analysis for drug susceptibility tests is useful for the epidemiological analysis of MRSA.

  1. Temporal and spatial distribution characteristics in the natural plague foci of Chinese Mongolian gerbils based on spatial autocorrelation.

    Science.gov (United States)

    Du, Hai-Wen; Wang, Yong; Zhuang, Da-Fang; Jiang, Xiao-San

    2017-08-07

    The nest flea index of Meriones unguiculatus is a critical indicator for the prevention and control of plague, which can be used not only to detect the spatial and temporal distributions of Meriones unguiculatus, but also to reveal its cluster rule. This research detected the temporal and spatial distribution characteristics of the plague natural foci of Mongolian gerbils by body flea index from 2005 to 2014, in order to predict plague outbreaks. Global spatial autocorrelation was used to describe the entire spatial distribution pattern of the body flea index in the natural plague foci of typical Chinese Mongolian gerbils. Cluster and outlier analysis and hot spot analysis were also used to detect the intensity of clusters based on geographic information system methods. The quantity of M. unguiculatus nest fleas in the sentinel surveillance sites from 2005 to 2014 and host density data of the study area from 2005 to 2010 used in this study were provided by Chinese Center for Disease Control and Prevention. The epidemic focus regions of the Mongolian gerbils remain the same as the hot spot regions relating to the body flea index. High clustering areas possess a similar pattern as the distribution pattern of the body flea index indicating that the transmission risk of plague is relatively high. In terms of time series, the area of the epidemic focus gradually increased from 2005 to 2007, declined rapidly in 2008 and 2009, and then decreased slowly and began trending towards stability from 2009 to 2014. For the spatial change, the epidemic focus regions began moving northward from the southwest epidemic focus of the Mongolian gerbils from 2005 to 2007, and then moved from north to south in 2007 and 2008. The body flea index of Chinese gerbil foci reveals significant spatial and temporal aggregation characteristics through the employing of spatial autocorrelation. The diversity of temporary and spatial distribution is mainly affected by seasonal variation, the human

  2. Cosmological analysis of galaxy clusters surveys in X-rays

    International Nuclear Information System (INIS)

    Clerc, N.

    2012-01-01

    Clusters of galaxies are the most massive objects in equilibrium in our Universe. Their study allows to test cosmological scenarios of structure formation with precision, bringing constraints complementary to those stemming from the cosmological background radiation, supernovae or galaxies. They are identified through the X-ray emission of their heated gas, thus facilitating their mapping at different epochs of the Universe. This report presents two surveys of galaxy clusters detected in X-rays and puts forward a method for their cosmological interpretation. Thanks to its multi-wavelength coverage extending over 10 sq. deg. and after one decade of expertise, the XMM-LSS allows a systematic census of clusters in a large volume of the Universe. In the framework of this survey, the first part of this report describes the techniques developed to the purpose of characterizing the detected objects. A particular emphasis is placed on the most distant ones (z ≥ 1) through the complementarity of observations in X-ray, optical and infrared bands. Then the X-CLASS survey is fully described. Based on XMM archival data, it provides a new catalogue of 800 clusters detected in X-rays. A cosmological analysis of this survey is performed thanks to 'CR-HR' diagrams. This new method self-consistently includes selection effects and scaling relations and provides a means to bypass the computation of individual cluster masses. Propositions are made for applying this method to future surveys as XMM-XXL and eRosita. (author) [fr

  3. The spatial analysis on hemorrhagic fever with renal syndrome in Jiangsu province, China based on geographic information system.

    Directory of Open Access Journals (Sweden)

    Changjun Bao

    Full Text Available Hemorrhagic fever with renal syndrome (HFRS is endemic in mainland China, accounting for 90% of total reported cases worldwide, and Jiangsu is one of the most severely affected provinces. In this study, the authors conducted GIS-based spatial analyses in order to determine the spatial distribution of the HFRS cases, identify key areas and explore risk factors for public health planning and resource allocation.Interpolation maps by inverse distance weighting were produced to detect the spatial distribution of HFRS cases in Jiangsu from 2001 to 2011. Spatio-temporal clustering was applied to identify clusters at the county level. Spatial correlation analysis was conducted to detect influencing factors of HFRS in Jiangsu.HFRS cases in Jiangsu from 2001 to 2011 were mapped and the results suggested that cases in Jiangsu were not distributed randomly. Cases were mainly distributed in northeastern and southwestern Jiangsu, especially in Dafeng and Sihong counties. It was notable that prior to this study, Sihong county had rarely been reported as a high-risk area of HFRS. With the maximum spatial size of 50% of the total population and the maximum temporal size of 50% of the total population, spatio-temporal clustering showed that there was one most likely cluster (LLR = 624.52, P<0.0001, RR = 8.19 and one second-most likely cluster (LLR = 553.97, P<0.0001, RR = 8.25, and both of these clusters appeared from 2001 to 2004. Spatial correlation analysis showed that the incidence of HFRS in Jiangsu was influenced by distances to highways, railways, rivers and lakes.The application of GIS together with spatial interpolation, spatio-temporal clustering and spatial correlation analysis can effectively identify high-risk areas and factors influencing HFRS incidence to lay a foundation for researching its pathogenesis.

  4. Analysis of plasmaspheric plumes: CLUSTER and IMAGE observations

    Directory of Open Access Journals (Sweden)

    F. Darrouzet

    2006-07-01

    Full Text Available Plasmaspheric plumes have been routinely observed by CLUSTER and IMAGE. The CLUSTER mission provides high time resolution four-point measurements of the plasmasphere near perigee. Total electron density profiles have been derived from the electron plasma frequency identified by the WHISPER sounder supplemented, in-between soundings, by relative variations of the spacecraft potential measured by the electric field instrument EFW; ion velocity is also measured onboard these satellites. The EUV imager onboard the IMAGE spacecraft provides global images of the plasmasphere with a spatial resolution of 0.1 RE every 10 min; such images acquired near apogee from high above the pole show the geometry of plasmaspheric plumes, their evolution and motion. We present coordinated observations of three plume events and compare CLUSTER in-situ data with global images of the plasmasphere obtained by IMAGE. In particular, we study the geometry and the orientation of plasmaspheric plumes by using four-point analysis methods. We compare several aspects of plume motion as determined by different methods: (i inner and outer plume boundary velocity calculated from time delays of this boundary as observed by the wave experiment WHISPER on the four spacecraft, (ii drift velocity measured by the electron drift instrument EDI onboard CLUSTER and (iii global velocity determined from successive EUV images. These different techniques consistently indicate that plasmaspheric plumes rotate around the Earth, with their foot fully co-rotating, but with their tip rotating slower and moving farther out.

  5. Confidence in Government and Attitudes toward Bribery: A Country-Cluster Analysis of Demographic and Religiosity Perspectives

    Directory of Open Access Journals (Sweden)

    Serkan Benk

    2017-01-01

    Full Text Available In this study, we try to classify the countries by the levels of confidence in government and attitudes toward accepting bribery by using the data of the sixth wave (2010–2014 of the World Values Survey (WVS. We are also interested in which demographic, attitudinal, and religiosity variables affect each class of countries. For these purposes cluster analysis, linear regression analysis, and ordered logistic regression analysis were used. The study found that countries could be grouped into two clusters which had varying levels of opposition to bribe taking and confidence in government. Another finding was that certain demographic, attitudinal, and religiosity variables that were significant in one cluster might not be significant in another cluster.

  6. Cluster Cooperation in Wireless-Powered Sensor Networks: Modeling and Performance Analysis

    Directory of Open Access Journals (Sweden)

    Chao Zhang

    2017-09-01

    Full Text Available A wireless-powered sensor network (WPSN consisting of one hybrid access point (HAP, a near cluster and the corresponding far cluster is investigated in this paper. These sensors are wireless-powered and they transmit information by consuming the harvested energy from signal ejected by the HAP. Sensors are able to harvest energy as well as store the harvested energy. We propose that if sensors in near cluster do not have their own information to transmit, acting as relays, they can help the sensors in a far cluster to forward information to the HAP in an amplify-and-forward (AF manner. We use a finite Markov chain to model the dynamic variation process of the relay battery, and give a general analyzing model for WPSN with cluster cooperation. Though the model, we deduce the closed-form expression for the outage probability as the metric of this network. Finally, simulation results validate the start point of designing this paper and correctness of theoretical analysis and show how parameters have an effect on system performance. Moreover, it is also known that the outage probability of sensors in far cluster can be drastically reduced without sacrificing the performance of sensors in near cluster if the transmit power of HAP is fairly high. Furthermore, in the aspect of outage performance of far cluster, the proposed scheme significantly outperforms the direct transmission scheme without cooperation.

  7. Performance Analysis of Unsupervised Clustering Methods for Brain Tumor Segmentation

    Directory of Open Access Journals (Sweden)

    Tushar H Jaware

    2013-10-01

    Full Text Available Medical image processing is the most challenging and emerging field of neuroscience. The ultimate goal of medical image analysis in brain MRI is to extract important clinical features that would improve methods of diagnosis & treatment of disease. This paper focuses on methods to detect & extract brain tumour from brain MR images. MATLAB is used to design, software tool for locating brain tumor, based on unsupervised clustering methods. K-Means clustering algorithm is implemented & tested on data base of 30 images. Performance evolution of unsupervised clusteringmethods is presented.

  8. Identifying clinical course patterns in SMS data using cluster analysis

    DEFF Research Database (Denmark)

    Kent, Peter; Kongsted, Alice

    2012-01-01

    ABSTRACT: BACKGROUND: Recently, there has been interest in using the short message service (SMS or text messaging), to gather frequent information on the clinical course of individual patients. One possible role for identifying clinical course patterns is to assist in exploring clinically important...... showed that clinical course patterns can be identified by cluster analysis using all SMS time points as cluster variables. This method is simple, intuitive and does not require a high level of statistical skill. However, there are alternative ways of managing SMS data and many different methods...

  9. Cluster fusion algorithm: application to Lennard-Jones clusters

    DEFF Research Database (Denmark)

    Solov'yov, Ilia; Solov'yov, Andrey V.; Greiner, Walter

    2006-01-01

    paths up to the cluster size of 150 atoms. We demonstrate that in this way all known global minima structures of the Lennard-Jones clusters can be found. Our method provides an efficient tool for the calculation and analysis of atomic cluster structure. With its use we justify the magic number sequence......We present a new general theoretical framework for modelling the cluster structure and apply it to description of the Lennard-Jones clusters. Starting from the initial tetrahedral cluster configuration, adding new atoms to the system and absorbing its energy at each step, we find cluster growing...... for the clusters of noble gas atoms and compare it with experimental observations. We report the striking correspondence of the peaks in the dependence of the second derivative of the binding energy per atom on cluster size calculated for the chain of the Lennard-Jones clusters based on the icosahedral symmetry...

  10. Cluster fusion algorithm: application to Lennard-Jones clusters

    DEFF Research Database (Denmark)

    Solov'yov, Ilia; Solov'yov, Andrey V.; Greiner, Walter

    2008-01-01

    paths up to the cluster size of 150 atoms. We demonstrate that in this way all known global minima structures of the Lennard-Jones clusters can be found. Our method provides an efficient tool for the calculation and analysis of atomic cluster structure. With its use we justify the magic number sequence......We present a new general theoretical framework for modelling the cluster structure and apply it to description of the Lennard-Jones clusters. Starting from the initial tetrahedral cluster configuration, adding new atoms to the system and absorbing its energy at each step, we find cluster growing...... for the clusters of noble gas atoms and compare it with experimental observations. We report the striking correspondence of the peaks in the dependence of the second derivative of the binding energy per atom on cluster size calculated for the chain of the Lennard-Jones clusters based on the icosahedral symmetry...

  11. Statistical analysis of the spatial distribution of galaxies and clusters

    International Nuclear Information System (INIS)

    Cappi, Alberto

    1993-01-01

    This thesis deals with the analysis of the distribution of galaxies and clusters, describing some observational problems and statistical results. First chapter gives a theoretical introduction, aiming to describe the framework of the formation of structures, tracing the history of the Universe from the Planck time, t_p = 10"-"4"3 sec and temperature corresponding to 10"1"9 GeV, to the present epoch. The most usual statistical tools and models of the galaxy distribution, with their advantages and limitations, are described in chapter two. A study of the main observed properties of galaxy clustering, together with a detailed statistical analysis of the effects of selecting galaxies according to apparent magnitude or diameter, is reported in chapter three. Chapter four delineates some properties of groups of galaxies, explaining the reasons of discrepant results on group distributions. Chapter five is a study of the distribution of galaxy clusters, with different statistical tools, like correlations, percolation, void probability function and counts in cells; it is found the same scaling-invariant behaviour of galaxies. Chapter six describes our finding that rich galaxy clusters too belong to the fundamental plane of elliptical galaxies, and gives a discussion of its possible implications. Finally chapter seven reviews the possibilities offered by multi-slit and multi-fibre spectrographs, and I present some observational work on nearby and distant galaxy clusters. In particular, I show the opportunities offered by ongoing surveys of galaxies coupled with multi-object fibre spectrographs, focusing on the ESO Key Programme A galaxy redshift survey in the south galactic pole region to which I collaborate and on MEFOS, a multi-fibre instrument with automatic positioning. Published papers related to the work described in this thesis are reported in the last appendix. (author) [fr

  12. Molecular-dynamics analysis of mobile helium cluster reactions near surfaces of plasma-exposed tungsten

    Energy Technology Data Exchange (ETDEWEB)

    Hu, Lin; Maroudas, Dimitrios, E-mail: maroudas@ecs.umass.edu [Department of Chemical Engineering, University of Massachusetts, Amherst, Massachusetts 01003-9303 (United States); Hammond, Karl D. [Department of Chemical Engineering, University of Missouri, Columbia, Missouri 65211 (United States); Wirth, Brian D. [Department of Nuclear Engineering, University of Tennessee, Knoxville, Tennessee 37996 (United States)

    2015-10-28

    We report the results of a systematic atomic-scale analysis of the reactions of small mobile helium clusters (He{sub n}, 4 ≤ n ≤ 7) near low-Miller-index tungsten (W) surfaces, aiming at a fundamental understanding of the near-surface dynamics of helium-carrying species in plasma-exposed tungsten. These small mobile helium clusters are attracted to the surface and migrate to the surface by Fickian diffusion and drift due to the thermodynamic driving force for surface segregation. As the clusters migrate toward the surface, trap mutation (TM) and cluster dissociation reactions are activated at rates higher than in the bulk. TM produces W adatoms and immobile complexes of helium clusters surrounding W vacancies located within the lattice planes at a short distance from the surface. These reactions are identified and characterized in detail based on the analysis of a large number of molecular-dynamics trajectories for each such mobile cluster near W(100), W(110), and W(111) surfaces. TM is found to be the dominant cluster reaction for all cluster and surface combinations, except for the He{sub 4} and He{sub 5} clusters near W(100) where cluster partial dissociation following TM dominates. We find that there exists a critical cluster size, n = 4 near W(100) and W(111) and n = 5 near W(110), beyond which the formation of multiple W adatoms and vacancies in the TM reactions is observed. The identified cluster reactions are responsible for important structural, morphological, and compositional features in the plasma-exposed tungsten, including surface adatom populations, near-surface immobile helium-vacancy complexes, and retained helium content, which are expected to influence the amount of hydrogen re-cycling and tritium retention in fusion tokamaks.

  13. A cluster merging method for time series microarray with production values.

    Science.gov (United States)

    Chira, Camelia; Sedano, Javier; Camara, Monica; Prieto, Carlos; Villar, Jose R; Corchado, Emilio

    2014-09-01

    A challenging task in time-course microarray data analysis is to cluster genes meaningfully combining the information provided by multiple replicates covering the same key time points. This paper proposes a novel cluster merging method to accomplish this goal obtaining groups with highly correlated genes. The main idea behind the proposed method is to generate a clustering starting from groups created based on individual temporal series (representing different biological replicates measured in the same time points) and merging them by taking into account the frequency by which two genes are assembled together in each clustering. The gene groups at the level of individual time series are generated using several shape-based clustering methods. This study is focused on a real-world time series microarray task with the aim to find co-expressed genes related to the production and growth of a certain bacteria. The shape-based clustering methods used at the level of individual time series rely on identifying similar gene expression patterns over time which, in some models, are further matched to the pattern of production/growth. The proposed cluster merging method is able to produce meaningful gene groups which can be naturally ranked by the level of agreement on the clustering among individual time series. The list of clusters and genes is further sorted based on the information correlation coefficient and new problem-specific relevant measures. Computational experiments and results of the cluster merging method are analyzed from a biological perspective and further compared with the clustering generated based on the mean value of time series and the same shape-based algorithm.

  14. Facebook Applications' Installation and Removal: A Temporal Analysis

    OpenAIRE

    Kagan, Dima; Fire, Michael; Elyashar, Aviad; Elovici, Yuval

    2013-01-01

    Facebook applications are one of the reasons for Facebook attractiveness. Unfortunately, numerous users are not aware of the fact that many malicious Facebook applications exist. To educate users, to raise users' awareness and to improve Facebook users' security and privacy, we developed a Firefox add-on that alerts users to the number of installed applications on their Facebook profiles. In this study, we present the temporal analysis of the Facebook applications' installation and removal da...

  15. WHY DO SOME NATIONS SUCCEED AND OTHERS FAIL IN INTERNATIONAL COMPETITION? FACTOR ANALYSIS AND CLUSTER ANALYSIS AT EUROPEAN LEVEL

    Directory of Open Access Journals (Sweden)

    Popa Ion

    2015-07-01

    Full Text Available As stated by Michael Porter (1998: 57, 'this is perhaps the most frequently asked economic question of our times.' However, a widely accepted answer is still missing. The aim of this paper is not to provide the BIG answer for such a BIG question, but rather to provide a different perspective on the competitiveness at the national level. In this respect, we followed a two step procedure, called “tandem analysis”. (OECD, 2008. First we employed a Factor Analysis in order to reveal the underlying factors of the initial dataset followed by a Cluster Analysis which aims classifying the 35 countries according to the main characteristics of competitiveness resulting from Factor Analysis. The findings revealed that clustering the 35 states after the first two factors: Smart Growth and Market Development, which recovers almost 76% of common variability of the twelve original variables, are highlighted four clusters as well as a series of useful information in order to analyze the characteristics of the four clusters and discussions on them.

  16. Cluster and principal component analysis based on SSR markers of Amomum tsao-ko in Jinping County of Yunnan Province

    Science.gov (United States)

    Ma, Mengli; Lei, En; Meng, Hengling; Wang, Tiantao; Xie, Linyan; Shen, Dong; Xianwang, Zhou; Lu, Bingyue

    2017-08-01

    Amomum tsao-ko is a commercial plant that used for various purposes in medicinal and food industries. For the present investigation, 44 germplasm samples were collected from Jinping County of Yunnan Province. Clusters analysis and 2-dimensional principal component analysis (PCA) was used to represent the genetic relations among Amomum tsao-ko by using simple sequence repeat (SSR) markers. Clustering analysis clearly distinguished the samples groups. Two major clusters were formed; first (Cluster I) consisted of 34 individuals, the second (Cluster II) consisted of 10 individuals, Cluster I as the main group contained multiple sub-clusters. PCA also showed 2 groups: PCA Group 1 included 29 individuals, PCA Group 2 included 12 individuals, consistent with the results of cluster analysis. The purpose of the present investigation was to provide information on genetic relationship of Amomum tsao-ko germplasm resources in main producing areas, also provide a theoretical basis for the protection and utilization of Amomum tsao-ko resources.

  17. Space-time clustering of childhood malaria at the household level: a dynamic cohort in a Mali village

    Directory of Open Access Journals (Sweden)

    Ouattara Amed

    2006-11-01

    Full Text Available Abstract Background Spatial and temporal heterogeneities in the risk of malaria have led the WHO to recommend fine-scale stratification of the epidemiological situation, making it possible to set up actions and clinical or basic researches targeting high-risk zones. Before initiating such studies it is necessary to define local patterns of malaria transmission and infection (in time and in space in order to facilitate selection of the appropriate study population and the intervention allocation. The aim of this study was to identify, spatially and temporally, high-risk zones of malaria, at the household level (resolution of 1 to 3 m. Methods This study took place in a Malian village with hyperendemic seasonal transmission as part of Mali-Tulane Tropical Medicine Research Center (NIAID/NIH. The study design was a dynamic cohort (22 surveys, from June 1996 to June 2001 on about 1300 children (Plasmodium falciparum, P. malariae and P. ovale infection and P. falciparum gametocyte carriage by means of time series and Kulldorff's scan statistic for space-time cluster detection. Results The time series analysis determined that malaria parasitemia (primarily P. falciparum was persistently present throughout the population with the expected seasonal variability pattern and a downward temporal trend. We identified six high-risk clusters of P. falciparum infection, some of which persisted despite an overall tendency towards a decrease in risk. The first high-risk cluster of P. falciparum infection (rate ratio = 14.161 was detected from September 1996 to October 1996, in the north of the village. Conclusion This study showed that, although infection proportions tended to decrease, high-risk zones persisted in the village particularly near temporal backwaters. Analysis of this heterogeneity at the household scale by GIS methods lead to target preventive actions more accurately on the high-risk zones identified. This mapping of malaria risk makes it possible

  18. Weighted Clustering

    DEFF Research Database (Denmark)

    Ackerman, Margareta; Ben-David, Shai; Branzei, Simina

    2012-01-01

    We investigate a natural generalization of the classical clustering problem, considering clustering tasks in which different instances may have different weights.We conduct the first extensive theoretical analysis on the influence of weighted data on standard clustering algorithms in both...... the partitional and hierarchical settings, characterizing the conditions under which algorithms react to weights. Extending a recent framework for clustering algorithm selection, we propose intuitive properties that would allow users to choose between clustering algorithms in the weighted setting and classify...

  19. Going beyond clustering in MD trajectory analysis: an application to villin headpiece folding.

    Directory of Open Access Journals (Sweden)

    Aruna Rajan

    2010-04-01

    Full Text Available Recent advances in computing technology have enabled microsecond long all-atom molecular dynamics (MD simulations of biological systems. Methods that can distill the salient features of such large trajectories are now urgently needed. Conventional clustering methods used to analyze MD trajectories suffer from various setbacks, namely (i they are not data driven, (ii they are unstable to noise and changes in cut-off parameters such as cluster radius and cluster number, and (iii they do not reduce the dimensionality of the trajectories, and hence are unsuitable for finding collective coordinates. We advocate the application of principal component analysis (PCA and a non-metric multidimensional scaling (nMDS method to reduce MD trajectories and overcome the drawbacks of clustering. To illustrate the superiority of nMDS over other methods in reducing data and reproducing salient features, we analyze three complete villin headpiece folding trajectories. Our analysis suggests that the folding process of the villin headpiece is structurally heterogeneous.

  20. Marketing Mix Formulation for Higher Education: An Integrated Analysis Employing Analytic Hierarchy Process, Cluster Analysis and Correspondence Analysis

    Science.gov (United States)

    Ho, Hsuan-Fu; Hung, Chia-Chi

    2008-01-01

    Purpose: The purpose of this paper is to examine how a graduate institute at National Chiayi University (NCYU), by using a model that integrates analytic hierarchy process, cluster analysis and correspondence analysis, can develop effective marketing strategies. Design/methodology/approach: This is primarily a quantitative study aimed at…

  1. Feature-Space Clustering for fMRI Meta-Analysis

    DEFF Research Database (Denmark)

    Goutte, Cyril; Hansen, Lars Kai; Liptrot, Mathew G.

    2001-01-01

    MRI sequences containing several hundreds of images, it is sometimes necessary to invoke feature extraction to reduce the dimensionality of the data space. A second interesting application is in the meta-analysis of fMRI experiment, where features are obtained from a possibly large number of single......-voxel analyses. In particular this allows the checking of the differences and agreements between different methods of analysis. Both approaches are illustrated on a fMRI data set involving visual stimulation, and we show that the feature space clustering approach yields nontrivial results and, in particular......, shows interesting differences between individual voxel analysis performed with traditional methods. © 2001 Wiley-Liss, Inc....

  2. Grey Wolf Optimizer Based on Powell Local Optimization Method for Clustering Analysis

    Directory of Open Access Journals (Sweden)

    Sen Zhang

    2015-01-01

    Full Text Available One heuristic evolutionary algorithm recently proposed is the grey wolf optimizer (GWO, inspired by the leadership hierarchy and hunting mechanism of grey wolves in nature. This paper presents an extended GWO algorithm based on Powell local optimization method, and we call it PGWO. PGWO algorithm significantly improves the original GWO in solving complex optimization problems. Clustering is a popular data analysis and data mining technique. Hence, the PGWO could be applied in solving clustering problems. In this study, first the PGWO algorithm is tested on seven benchmark functions. Second, the PGWO algorithm is used for data clustering on nine data sets. Compared to other state-of-the-art evolutionary algorithms, the results of benchmark and data clustering demonstrate the superior performance of PGWO algorithm.

  3. Spatial and temporal estimation of soil loss for the sustainable management of a wet semi-arid watershed cluster.

    Science.gov (United States)

    Rejani, R; Rao, K V; Osman, M; Srinivasa Rao, Ch; Reddy, K Sammi; Chary, G R; Pushpanjali; Samuel, Josily

    2016-03-01

    The ungauged wet semi-arid watershed cluster, Seethagondi, lies in the Adilabad district of Telangana in India and is prone to severe erosion and water scarcity. The runoff and soil loss data at watershed, catchment, and field level are necessary for planning soil and water conservation interventions. In this study, an attempt was made to develop a spatial soil loss estimation model for Seethagondi cluster using RUSLE coupled with ARCGIS and was used to estimate the soil loss spatially and temporally. The daily rainfall data of Aphrodite for the period from 1951 to 2007 was used, and the annual rainfall varied from 508 to 1351 mm with a mean annual rainfall of 950 mm and a mean erosivity of 6789 MJ mm ha(-1) h(-1) year(-1). Considerable variation in land use land cover especially in crop land and fallow land was observed during normal and drought years, and corresponding variation in the erosivity, C factor, and soil loss was also noted. The mean value of C factor derived from NDVI for crop land was 0.42 and 0.22 in normal year and drought years, respectively. The topography is undulating and major portion of the cluster has slope less than 10°, and 85.3% of the cluster has soil loss below 20 t ha(-1) year(-1). The soil loss from crop land varied from 2.9 to 3.6 t ha(-1) year(-1) in low rainfall years to 31.8 to 34.7 t ha(-1) year(-1) in high rainfall years with a mean annual soil loss of 12.2 t ha(-1) year(-1). The soil loss from crop land was higher in the month of August with an annual soil loss of 13.1 and 2.9 t ha(-1) year(-1) in normal and drought year, respectively. Based on the soil loss in a normal year, the interventions recommended for 85.3% of area of the watershed includes agronomic measures such as contour cultivation, graded bunds, strip cropping, mixed cropping, crop rotations, mulching, summer plowing, vegetative bunds, agri-horticultural system, and management practices such as broad bed furrow, raised sunken beds, and harvesting available water

  4. Performance Analysis of a Cluster-Based MAC Protocol for Wireless Ad Hoc Networks

    Directory of Open Access Journals (Sweden)

    Jesús Alonso-Zárate

    2010-01-01

    Full Text Available An analytical model to evaluate the non-saturated performance of the Distributed Queuing Medium Access Control Protocol for Ad Hoc Networks (DQMANs in single-hop networks is presented in this paper. DQMAN is comprised of a spontaneous, temporary, and dynamic clustering mechanism integrated with a near-optimum distributed queuing Medium Access Control (MAC protocol. Clustering is executed in a distributed manner using a mechanism inspired by the Distributed Coordination Function (DCF of the IEEE 802.11. Once a station seizes the channel, it becomes the temporary clusterhead of a spontaneous cluster and it coordinates the peer-to-peer communications between the clustermembers. Within each cluster, a near-optimum distributed queuing MAC protocol is executed. The theoretical performance analysis of DQMAN in single-hop networks under non-saturation conditions is presented in this paper. The approach integrates the analysis of the clustering mechanism into the MAC layer model. Up to the knowledge of the authors, this approach is novel in the literature. In addition, the performance of an ad hoc network using DQMAN is compared to that obtained when using the DCF of the IEEE 802.11, as a benchmark reference.

  5. The identification of credit card encoders by hierarchical cluster analysis of the jitters of magnetic stripes.

    Science.gov (United States)

    Leung, S C; Fung, W K; Wong, K H

    1999-01-01

    The relative bit density variation graphs of 207 specimen credit cards processed by 12 encoding machines were examined first visually, and then classified by means of hierarchical cluster analysis. Twenty-nine credit cards being treated as 'questioned' samples were tested by way of cluster analysis against 'controls' derived from known encoders. It was found that hierarchical cluster analysis provided a high accuracy of identification with all 29 'questioned' samples classified correctly. On the other hand, although visual comparison of jitter graphs was less discriminating, it was nevertheless capable of giving a reasonably accurate result.

  6. Clustering applications in financial and economic analysis of the crop production in the Russian regions

    Directory of Open Access Journals (Sweden)

    Gromov Vladislav Vladimirovich

    2013-08-01

    Full Text Available We used the complex mathematical modeling, multivariate statistical-analysis, fuzzy sets to analyze the financial and economic state of the crop production in Russian regions. We developed a system of indicators, detecting the state agricultural sector in the region, based on the results of correlation, factor, cluster analysis and statistics of the Federal State Statistics Service. We performed clustering analyses to divide regions of Russia on selected factors into five groups. A qualitative and quantitative characteristics of each cluster was received.

  7. Meteor localization via statistical analysis of spatially temporal fluctuations in image sequences

    Science.gov (United States)

    Kukal, Jaromír.; Klimt, Martin; Šihlík, Jan; Fliegel, Karel

    2015-09-01

    Meteor detection is one of the most important procedures in astronomical imaging. Meteor path in Earth's atmosphere is traditionally reconstructed from double station video observation system generating 2D image sequences. However, the atmospheric turbulence and other factors cause spatially-temporal fluctuations of image background, which makes the localization of meteor path more difficult. Our approach is based on nonlinear preprocessing of image intensity using Box-Cox and logarithmic transform as its particular case. The transformed image sequences are then differentiated along discrete coordinates to obtain statistical description of sky background fluctuations, which can be modeled by multivariate normal distribution. After verification and hypothesis testing, we use the statistical model for outlier detection. Meanwhile the isolated outlier points are ignored, the compact cluster of outliers indicates the presence of meteoroids after ignition.

  8. Ecosystem health pattern analysis of urban clusters based on emergy synthesis: Results and implication for management

    International Nuclear Information System (INIS)

    Su, Meirong; Fath, Brian D.; Yang, Zhifeng; Chen, Bin; Liu, Gengyuan

    2013-01-01

    The evaluation of ecosystem health in urban clusters will help establish effective management that promotes sustainable regional development. To standardize the application of emergy synthesis and set pair analysis (EM–SPA) in ecosystem health assessment, a procedure for using EM–SPA models was established in this paper by combining the ability of emergy synthesis to reflect health status from a biophysical perspective with the ability of set pair analysis to describe extensive relationships among different variables. Based on the EM–SPA model, the relative health levels of selected urban clusters and their related ecosystem health patterns were characterized. The health states of three typical Chinese urban clusters – Jing-Jin-Tang, Yangtze River Delta, and Pearl River Delta – were investigated using the model. The results showed that the health status of the Pearl River Delta was relatively good; the health for the Yangtze River Delta was poor. As for the specific health characteristics, the Pearl River Delta and Yangtze River Delta urban clusters were relatively strong in Vigor, Resilience, and Urban ecosystem service function maintenance, while the Jing-Jin-Tang was relatively strong in organizational structure and environmental impact. Guidelines for managing these different urban clusters were put forward based on the analysis of the results of this study. - Highlights: • The use of integrated emergy synthesis and set pair analysis model was standardized. • The integrated model was applied on the scale of an urban cluster. • Health patterns of different urban clusters were compared. • Policy suggestions were provided based on the health pattern analysis

  9. Differences Between Ward's and UPGMA Methods of Cluster Analysis: Implications for School Psychology.

    Science.gov (United States)

    Hale, Robert L.; Dougherty, Donna

    1988-01-01

    Compared the efficacy of two methods of cluster analysis, the unweighted pair-groups method using arithmetic averages (UPGMA) and Ward's method, for students grouped on intelligence, achievement, and social adjustment by both clustering methods. Found UPGMA more efficacious based on output, on cophenetic correlation coefficients generated by each…

  10. Identifying At-Risk Students in General Chemistry via Cluster Analysis of Affective Characteristics

    Science.gov (United States)

    Chan, Julia Y. K.; Bauer, Christopher F.

    2014-01-01

    The purpose of this study is to identify academically at-risk students in first-semester general chemistry using affective characteristics via cluster analysis. Through the clustering of six preselected affective variables, three distinct affective groups were identified: low (at-risk), medium, and high. Students in the low affective group…

  11. Approximate fuzzy C-means (AFCM) cluster analysis of medical magnetic resonance image (MRI) data

    International Nuclear Information System (INIS)

    DelaPaz, R.L.; Chang, P.J.; Bernstein, R.; Dave, J.V.

    1987-01-01

    The authors describe the application of an approximate fuzzy C-means (AFCM) clustering algorithm as a data dimension reduction approach to medical magnetic resonance images (MRI). Image data consisted of one T1-weighted, two T2-weighted, and one T2*-weighted (magnetic susceptibility) image for each cranial study and a matrix of 10 images generated from 10 combinations of TE and TR for each body lymphoma study. All images were obtained with a 1.5 Tesla imaging system (GE Signa). Analyses were performed on over 100 MR image sets with a variety of pathologies. The cluster analysis was operated in an unsupervised mode and computational overhead was minimized by utilizing a table look-up approach without adversely affecting accuracy. Image data were first segmented into 2 coarse clusters, each of which was then subdivided into 16 fine clusters. The final tissue classifications were presented as color-coded anatomically-mapped images and as two and three dimensional displays of cluster center data in selected feature space (minimum spanning tree). Fuzzy cluster analysis appears to be a clinically useful dimension reduction technique which results in improved diagnostic specificity of medical magnetic resonance images

  12. Applications of Cluster Analysis to the Creation of Perfectionism Profiles: A Comparison of two Clustering Approaches

    Directory of Open Access Journals (Sweden)

    Jocelyn H Bolin

    2014-04-01

    Full Text Available Although traditional clustering methods (e.g., K-means have been shown to be useful in the social sciences it is often difficult for such methods to handle situations where clusters in the population overlap or are ambiguous. Fuzzy clustering, a method already recognized in many disciplines, provides a more flexible alternative to these traditional clustering methods. Fuzzy clustering differs from other traditional clustering methods in that it allows for a case to belong to multiple clusters simultaneously. Unfortunately, fuzzy clustering techniques remain relatively unused in the social and behavioral sciences. The purpose of this paper is to introduce fuzzy clustering to these audiences who are currently relatively unfamiliar with the technique. In order to demonstrate the advantages associated with this method, cluster solutions of a common perfectionism measure were created using both fuzzy clustering and K-means clustering, and the results compared. Results of these analyses reveal that different cluster solutions are found by the two methods, and the similarity between the different clustering solutions depends on the amount of cluster overlap allowed for in fuzzy clustering.

  13. Applications of cluster analysis to the creation of perfectionism profiles: a comparison of two clustering approaches.

    Science.gov (United States)

    Bolin, Jocelyn H; Edwards, Julianne M; Finch, W Holmes; Cassady, Jerrell C

    2014-01-01

    Although traditional clustering methods (e.g., K-means) have been shown to be useful in the social sciences it is often difficult for such methods to handle situations where clusters in the population overlap or are ambiguous. Fuzzy clustering, a method already recognized in many disciplines, provides a more flexible alternative to these traditional clustering methods. Fuzzy clustering differs from other traditional clustering methods in that it allows for a case to belong to multiple clusters simultaneously. Unfortunately, fuzzy clustering techniques remain relatively unused in the social and behavioral sciences. The purpose of this paper is to introduce fuzzy clustering to these audiences who are currently relatively unfamiliar with the technique. In order to demonstrate the advantages associated with this method, cluster solutions of a common perfectionism measure were created using both fuzzy clustering and K-means clustering, and the results compared. Results of these analyses reveal that different cluster solutions are found by the two methods, and the similarity between the different clustering solutions depends on the amount of cluster overlap allowed for in fuzzy clustering.

  14. Analysis of risk factors for cluster behavior of dental implant failures.

    Science.gov (United States)

    Chrcanovic, Bruno Ramos; Kisch, Jenö; Albrektsson, Tomas; Wennerberg, Ann

    2017-08-01

    Some studies indicated that implant failures are commonly concentrated in few patients. To identify and analyze cluster behavior of dental implant failures among subjects of a retrospective study. This retrospective study included patients receiving at least three implants only. Patients presenting at least three implant failures were classified as presenting a cluster behavior. Univariate and multivariate logistic regression models and generalized estimating equations analysis evaluated the effect of explanatory variables on the cluster behavior. There were 1406 patients with three or more implants (8337 implants, 592 failures). Sixty-seven (4.77%) patients presented cluster behavior, with 56.8% of all implant failures. The intake of antidepressants and bruxism were identified as potential negative factors exerting a statistically significant influence on a cluster behavior at the patient-level. The negative factors at the implant-level were turned implants, short implants, poor bone quality, age of the patient, the intake of medicaments to reduce the acid gastric production, smoking, and bruxism. A cluster pattern among patients with implant failure is highly probable. Factors of interest as predictors for implant failures could be a number of systemic and local factors, although a direct causal relationship cannot be ascertained. © 2017 Wiley Periodicals, Inc.

  15. Fourier analysis of temporal NDVI in the Southern African and American continents

    NARCIS (Netherlands)

    Azzali, S.; Menenti, M.

    1996-01-01

    Results of applying Fourier analysis of temporal NDVI in southern Africa and southern America are summarized. The decomposition of complex time series of images in simpler periodic components by Fourier analysis allowed the factors that affect the vegetation cover to be analysed much easier. The

  16. Analysis of genetic association using hierarchical clustering and cluster validation indices.

    Science.gov (United States)

    Pagnuco, Inti A; Pastore, Juan I; Abras, Guillermo; Brun, Marcel; Ballarin, Virginia L

    2017-10-01

    It is usually assumed that co-expressed genes suggest co-regulation in the underlying regulatory network. Determining sets of co-expressed genes is an important task, based on some criteria of similarity. This task is usually performed by clustering algorithms, where the genes are clustered into meaningful groups based on their expression values in a set of experiment. In this work, we propose a method to find sets of co-expressed genes, based on cluster validation indices as a measure of similarity for individual gene groups, and a combination of variants of hierarchical clustering to generate the candidate groups. We evaluated its ability to retrieve significant sets on simulated correlated and real genomics data, where the performance is measured based on its detection ability of co-regulated sets against a full search. Additionally, we analyzed the quality of the best ranked groups using an online bioinformatics tool that provides network information for the selected genes. Copyright © 2017 Elsevier Inc. All rights reserved.

  17. A K-means multivariate approach for clustering independent components from magnetoencephalographic data.

    Science.gov (United States)

    Spadone, Sara; de Pasquale, Francesco; Mantini, Dante; Della Penna, Stefania

    2012-09-01

    Independent component analysis (ICA) is typically applied on functional magnetic resonance imaging, electroencephalographic and magnetoencephalographic (MEG) data due to its data-driven nature. In these applications, ICA needs to be extended from single to multi-session and multi-subject studies for interpreting and assigning a statistical significance at the group level. Here a novel strategy for analyzing MEG independent components (ICs) is presented, Multivariate Algorithm for Grouping MEG Independent Components K-means based (MAGMICK). The proposed approach is able to capture spatio-temporal dynamics of brain activity in MEG studies by running ICA at subject level and then clustering the ICs across sessions and subjects. Distinctive features of MAGMICK are: i) the implementation of an efficient set of "MEG fingerprints" designed to summarize properties of MEG ICs as they are built on spatial, temporal and spectral parameters; ii) the implementation of a modified version of the standard K-means procedure to improve its data-driven character. This algorithm groups the obtained ICs automatically estimating the number of clusters through an adaptive weighting of the parameters and a constraint on the ICs independence, i.e. components coming from the same session (at subject level) or subject (at group level) cannot be grouped together. The performances of MAGMICK are illustrated by analyzing two sets of MEG data obtained during a finger tapping task and median nerve stimulation. The results demonstrate that the method can extract consistent patterns of spatial topography and spectral properties across sessions and subjects that are in good agreement with the literature. In addition, these results are compared to those from a modified version of affinity propagation clustering method. The comparison, evaluated in terms of different clustering validity indices, shows that our methodology often outperforms the clustering algorithm. Eventually, these results are

  18. Tracking Undergraduate Student Achievement in a First-Year Physiology Course Using a Cluster Analysis Approach

    Science.gov (United States)

    Brown, S. J.; White, S.; Power, N.

    2015-01-01

    A cluster analysis data classification technique was used on assessment scores from 157 undergraduate nursing students who passed 2 successive compulsory courses in human anatomy and physiology. Student scores in five summative assessment tasks, taken in each of the courses, were used as inputs for a cluster analysis procedure. We aimed to group…

  19. Melodic pattern discovery by structural analysis via wavelets and clustering techniques

    DEFF Research Database (Denmark)

    Velarde, Gissel; Meredith, David

    We present an automatic method to support melodic pattern discovery by structural analysis of symbolic representations by means of wavelet analysis and clustering techniques. In previous work, we used the method to recognize the parent works of melodic segments, or to classify tunes into tune......-means to cluster melodic segments into groups of measured similarity and obtain a raking of the most prototypical melodic segments or patterns and their occurrences. We test the method on the JKU Patterns Development Database and evaluate it based on the ground truth defined by the MIREX 2013 Discovery of Repeated...... Themes & Sections task. We compare the results of our method to the output of geometric approaches. Finally, we discuss about the relevance of our wavelet-based analysis in relation to structure, pattern discovery, similarity and variation, and comment about the considerations of the method when used...

  20. Cluster cosmological analysis with X ray instrumental observables: introduction and testing of AsPIX method

    International Nuclear Information System (INIS)

    Valotti, Andrea

    2016-01-01

    Cosmology is one of the fundamental pillars of astrophysics, as such it contains many unsolved puzzles. To investigate some of those puzzles, we analyze X-ray surveys of galaxy clusters. These surveys are possible thanks to the bremsstrahlung emission of the intra-cluster medium. The simultaneous fit of cluster counts as a function of mass and distance provides an independent measure of cosmological parameters such as Ω m , σ s , and the dark energy equation of state w0. A novel approach to cosmological analysis using galaxy cluster data, called top-down, was developed in N. Clerc et al. (2012). This top-down approach is based purely on instrumental observables that are considered in a two-dimensional X-ray color-magnitude diagram. The method self-consistently includes selection effects and scaling relationships. It also provides a means of bypassing the computation of individual cluster masses. My work presents an extension of the top-down method by introducing the apparent size of the cluster, creating a three-dimensional X-ray cluster diagram. The size of a cluster is sensitive to both the cluster mass and its angular diameter, so it must also be included in the assessment of selection effects. The performance of this new method is investigated using a Fisher analysis. In parallel, I have studied the effects of the intrinsic scatter in the cluster size scaling relation on the sample selection as well as on the obtained cosmological parameters. To validate the method, I estimate uncertainties of cosmological parameters with MCMC method Amoeba minimization routine and using two simulated XMM surveys that have an increasing level of complexity. The first simulated survey is a set of toy catalogues of 100 and 10000 deg 2 , whereas the second is a 1000 deg 2 catalogue that was generated using an Aardvark semi-analytical N-body simulation. This comparison corroborates the conclusions of the Fisher analysis. In conclusion, I find that a cluster diagram that accounts

  1. HORIZONTAL BRANCH MORPHOLOGY OF GLOBULAR CLUSTERS: A MULTIVARIATE STATISTICAL ANALYSIS

    International Nuclear Information System (INIS)

    Jogesh Babu, G.; Chattopadhyay, Tanuka; Chattopadhyay, Asis Kumar; Mondal, Saptarshi

    2009-01-01

    The proper interpretation of horizontal branch (HB) morphology is crucial to the understanding of the formation history of stellar populations. In the present study a multivariate analysis is used (principal component analysis) for the selection of appropriate HB morphology parameter, which, in our case, is the logarithm of effective temperature extent of the HB (log T effHB ). Then this parameter is expressed in terms of the most significant observed independent parameters of Galactic globular clusters (GGCs) separately for coherent groups, obtained in a previous work, through a stepwise multiple regression technique. It is found that, metallicity ([Fe/H]), central surface brightness (μ v ), and core radius (r c ) are the significant parameters to explain most of the variations in HB morphology (multiple R 2 ∼ 0.86) for GGC elonging to the bulge/disk while metallicity ([Fe/H]) and absolute magnitude (M v ) are responsible for GGC belonging to the inner halo (multiple R 2 ∼ 0.52). The robustness is tested by taking 1000 bootstrap samples. A cluster analysis is performed for the red giant branch (RGB) stars of the GGC belonging to Galactic inner halo (Cluster 2). A multi-episodic star formation is preferred for RGB stars of GGC belonging to this group. It supports the asymptotic giant branch (AGB) model in three episodes instead of two as suggested by Carretta et al. for halo GGC while AGB model is suggested to be revisited for bulge/disk GGC.

  2. Using Cluster Analysis and ICP-MS to Identify Groups of Ecstasy Tablets in Sao Paulo State, Brazil.

    Science.gov (United States)

    Maione, Camila; de Oliveira Souza, Vanessa Cristina; Togni, Loraine Rezende; da Costa, José Luiz; Campiglia, Andres Dobal; Barbosa, Fernando; Barbosa, Rommel Melgaço

    2017-11-01

    The variations found in the elemental composition in ecstasy samples result in spectral profiles with useful information for data analysis, and cluster analysis of these profiles can help uncover different categories of the drug. We provide a cluster analysis of ecstasy tablets based on their elemental composition. Twenty-five elements were determined by ICP-MS in tablets apprehended by Sao Paulo's State Police, Brazil. We employ the K-means clustering algorithm along with C4.5 decision tree to help us interpret the clustering results. We found a better number of two clusters within the data, which can refer to the approximated number of sources of the drug which supply the cities of seizures. The C4.5 model was capable of differentiating the ecstasy samples from the two clusters with high prediction accuracy using the leave-one-out cross-validation. The model used only Nd, Ni, and Pb concentration values in the classification of the samples. © 2017 American Academy of Forensic Sciences.

  3. What you see is what you remember : Visual chunking by temporal integration enhances working memory

    NARCIS (Netherlands)

    Akyürek, Elkan G.; Kappelmann, Nils; Volkert, Marc; van Rijn, Hedderik

    2017-01-01

    Human memory benefits from information clustering, which can be accomplished by chunking. Chunking typically relies on expertise and strategy and it is unknown whether perceptual clustering over time, through temporal integration, can also enhance working memory. The current study examined the

  4. Using Spatial Clustering in Forecasting Groundwater Quality Parameters by ANFIS

    Directory of Open Access Journals (Sweden)

    MohammadTaghi Alami

    2016-07-01

    Full Text Available Groundwater is a major source of water supply for domestic, agricultural, and industrial uses; hence, its quality modeling is an important task in hydro-environmental studies. While many data-based models have been developed for this purpose, the performance of such data-based models can be drastically enhanced if they are based on temporal and spatial pre-processing. In this study, geostatistics tools (e.g., Co-Kriging, as spatial estimators, and self-organizing map (SOM, as a clustering technique, were employed in conjunction with Adaptive Neuro-Fuzzy Inference System (ANFIS for the temporal forecasting of such quality parameters as electrical conductivity (EC and total dissolved solids (TDS of the groundwater in Ardabil Plain. Using the results thus obtained, the impact of spatial data clustering was also investigated on the same parameters. The results showed that, if propoer input data are selected, the proposed spatial clustering technique is capable of imporving groundwater quality forecasts made by ANFIS.

  5. Social Media Use and Depression and Anxiety Symptoms: A Cluster Analysis.

    Science.gov (United States)

    Shensa, Ariel; Sidani, Jaime E; Dew, Mary Amanda; Escobar-Viera, César G; Primack, Brian A

    2018-03-01

    Individuals use social media with varying quantity, emotional, and behavioral at- tachment that may have differential associations with mental health outcomes. In this study, we sought to identify distinct patterns of social media use (SMU) and to assess associations between those patterns and depression and anxiety symptoms. In October 2014, a nationally-representative sample of 1730 US adults ages 19 to 32 completed an online survey. Cluster analysis was used to identify patterns of SMU. Depression and anxiety were measured using respective 4-item Patient-Reported Outcome Measurement Information System (PROMIS) scales. Multivariable logistic regression models were used to assess associations between clus- ter membership and depression and anxiety. Cluster analysis yielded a 5-cluster solu- tion. Participants were characterized as "Wired," "Connected," "Diffuse Dabblers," "Concentrated Dabblers," and "Unplugged." Membership in 2 clusters - "Wired" and "Connected" - increased the odds of elevated depression and anxiety symptoms (AOR = 2.7, 95% CI = 1.5-4.7; AOR = 3.7, 95% CI = 2.1-6.5, respectively, and AOR = 2.0, 95% CI = 1.3-3.2; AOR = 2.0, 95% CI = 1.3-3.1, respectively). SMU pattern characterization of a large population suggests 2 pat- terns are associated with risk for depression and anxiety. Developing educational interventions that address use patterns rather than single aspects of SMU (eg, quantity) would likely be useful.

  6. Spatio-temporal variability of soil respiration in a spruce-dominated headwater catchment in western Germany

    Science.gov (United States)

    Bossa, A. Y.; Diekkrüger, B.

    2014-08-01

    CO2 production and transport from forest floors is an important component of the carbon cycle and is closely related to the global atmosphere CO2 concentration. If we are to understand the feedback between soil processes and atmospheric CO2, we need to know more about the spatio-temporal variability of this soil respiration under different environmental conditions. In this study, long-term measurements were conducted in a spruce-dominated forest ecosystem in western Germany. Multivariate analysis-based similarities between different measurement sites led to the detection of site clusters along two CO2 emission axes: (1) mainly controlled by soil temperature and moisture condition, and (2) mainly controlled by root biomass and the forest floor litter. The combined effects of soil temperature and soil moisture were used as a time-dependent rating factor affecting the optimal CO2 production and transport at cluster level. High/moderate/weak time-dependent rating factors were associated with the different clusters. The process-based, most distant clusters were identified using specified pattern characteristics: the reaction rates in the soil layers, the activation energy for bio-chemical reactions, the soil moisture dependency parameter, the root biomass factor, the litter layer factor and the organic matter factor. A HYDRUS-1D model system was inversely used to compute soil hydraulic parameters from soil moisture measurements. Heat transport parameters were calibrated based on observed soil temperatures. The results were used to adjust CO2 productions by soil microorganisms and plant roots under optimal conditions for each cluster. Although the uncertainty associated with the HYDRUS-1D simulations is higher, the results were consistent with both the multivariate clustering and the time-dependent rating of site production. Finally, four clusters with significantly different environmental conditions (i.e. permanent high soil moisture condition, accumulated litter amount

  7. clusterMaker: a multi-algorithm clustering plugin for Cytoscape

    Directory of Open Access Journals (Sweden)

    Morris John H

    2011-11-01

    Full Text Available Abstract Background In the post-genomic era, the rapid increase in high-throughput data calls for computational tools capable of integrating data of diverse types and facilitating recognition of biologically meaningful patterns within them. For example, protein-protein interaction data sets have been clustered to identify stable complexes, but scientists lack easily accessible tools to facilitate combined analyses of multiple data sets from different types of experiments. Here we present clusterMaker, a Cytoscape plugin that implements several clustering algorithms and provides network, dendrogram, and heat map views of the results. The Cytoscape network is linked to all of the other views, so that a selection in one is immediately reflected in the others. clusterMaker is the first Cytoscape plugin to implement such a wide variety of clustering algorithms and visualizations, including the only implementations of hierarchical clustering, dendrogram plus heat map visualization (tree view, k-means, k-medoid, SCPS, AutoSOME, and native (Java MCL. Results Results are presented in the form of three scenarios of use: analysis of protein expression data using a recently published mouse interactome and a mouse microarray data set of nearly one hundred diverse cell/tissue types; the identification of protein complexes in the yeast Saccharomyces cerevisiae; and the cluster analysis of the vicinal oxygen chelate (VOC enzyme superfamily. For scenario one, we explore functionally enriched mouse interactomes specific to particular cellular phenotypes and apply fuzzy clustering. For scenario two, we explore the prefoldin complex in detail using both physical and genetic interaction clusters. For scenario three, we explore the possible annotation of a protein as a methylmalonyl-CoA epimerase within the VOC superfamily. Cytoscape session files for all three scenarios are provided in the Additional Files section. Conclusions The Cytoscape plugin cluster

  8. Graph theoretical analysis reveals disrupted topological properties of whole brain functional networks in temporal lobe epilepsy.

    Science.gov (United States)

    Wang, Junjing; Qiu, Shijun; Xu, Yong; Liu, Zhenyin; Wen, Xue; Hu, Xiangshu; Zhang, Ruibin; Li, Meng; Wang, Wensheng; Huang, Ruiwang

    2014-09-01

    Temporal lobe epilepsy (TLE) is one of the most common forms of drug-resistant epilepsy. Previous studies have indicated that the TLE-related impairments existed in extensive local functional networks. However, little is known about the alterations in the topological properties of whole brain functional networks. In this study, we acquired resting-state BOLD-fMRI (rsfMRI) data from 26 TLE patients and 25 healthy controls, constructed their whole brain functional networks, compared the differences in topological parameters between the TLE patients and the controls, and analyzed the correlation between the altered topological properties and the epilepsy duration. The TLE patients showed significant increases in clustering coefficient and characteristic path length, but significant decrease in global efficiency compared to the controls. We also found altered nodal parameters in several regions in the TLE patients, such as the bilateral angular gyri, left middle temporal gyrus, right hippocampus, triangular part of left inferior frontal gyrus, left inferior parietal but supramarginal and angular gyri, and left parahippocampus gyrus. Further correlation analysis showed that the local efficiency of the TLE patients correlated positively with the epilepsy duration. Our results indicated the disrupted topological properties of whole brain functional networks in TLE patients. Our findings indicated the TLE-related impairments in the whole brain functional networks, which may help us to understand the clinical symptoms of TLE patients and offer a clue for the diagnosis and treatment of the TLE patients. Copyright © 2014 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.

  9. Integrating PROOF Analysis in Cloud and Batch Clusters

    International Nuclear Information System (INIS)

    Rodríguez-Marrero, Ana Y; Fernández-del-Castillo, Enol; López García, Álvaro; Marco de Lucas, Jesús; Matorras Weinig, Francisco; González Caballero, Isidro; Cuesta Noriega, Alberto

    2012-01-01

    High Energy Physics (HEP) analysis are becoming more complex and demanding due to the large amount of data collected by the current experiments. The Parallel ROOT Facility (PROOF) provides researchers with an interactive tool to speed up the analysis of huge volumes of data by exploiting parallel processing on both multicore machines and computing clusters. The typical PROOF deployment scenario is a permanent set of cores configured to run the PROOF daemons. However, this approach is incapable of adapting to the dynamic nature of interactive usage. Several initiatives seek to improve the use of computing resources by integrating PROOF with a batch system, such as Proof on Demand (PoD) or PROOF Cluster. These solutions are currently in production at Universidad de Oviedo and IFCA and are positively evaluated by users. Although they are able to adapt to the computing needs of users, they must comply with the specific configuration, OS and software installed at the batch nodes. Furthermore, they share the machines with other workloads, which may cause disruptions in the interactive service for users. These limitations make PROOF a typical use-case for cloud computing. In this work we take profit from Cloud Infrastructure at IFCA in order to provide a dynamic PROOF environment where users can control the software configuration of the machines. The Proof Analysis Framework (PAF) facilitates the development of new analysis and offers a transparent access to PROOF resources. Several performance measurements are presented for the different scenarios (PoD, SGE and Cloud), showing a speed improvement closely correlated with the number of cores used.

  10. Segmentation of Residential Gas Consumers Using Clustering Analysis

    Directory of Open Access Journals (Sweden)

    Marta P. Fernandes

    2017-12-01

    Full Text Available The growing environmental concerns and liberalization of energy markets have resulted in an increased competition between utilities and a strong focus on efficiency. To develop new energy efficiency measures and optimize operations, utilities seek new market-related insights and customer engagement strategies. This paper proposes a clustering-based methodology to define the segmentation of residential gas consumers. The segments of gas consumers are obtained through a detailed clustering analysis using smart metering data. Insights are derived from the segmentation, where the segments result from the clustering process and are characterized based on the consumption profiles, as well as according to information regarding consumers’ socio-economic and household key features. The study is based on a sample of approximately one thousand households over one year. The representative load profiles of consumers are essentially characterized by two evident consumption peaks, one in the morning and the other in the evening, and an off-peak consumption. Significant insights can be derived from this methodology regarding typical consumption curves of the different segments of consumers in the population. This knowledge can assist energy utilities and policy makers in the development of consumer engagement strategies, demand forecasting tools and in the design of more sophisticated tariff systems.

  11. Comparing 3 dietary pattern methods--cluster analysis, factor analysis, and index analysis--With colorectal cancer risk: The NIH-AARP Diet and Health Study.

    Science.gov (United States)

    Reedy, Jill; Wirfält, Elisabet; Flood, Andrew; Mitrou, Panagiota N; Krebs-Smith, Susan M; Kipnis, Victor; Midthune, Douglas; Leitzmann, Michael; Hollenbeck, Albert; Schatzkin, Arthur; Subar, Amy F

    2010-02-15

    The authors compared dietary pattern methods-cluster analysis, factor analysis, and index analysis-with colorectal cancer risk in the National Institutes of Health (NIH)-AARP Diet and Health Study (n = 492,306). Data from a 124-item food frequency questionnaire (1995-1996) were used to identify 4 clusters for men (3 clusters for women), 3 factors, and 4 indexes. Comparisons were made with adjusted relative risks and 95% confidence intervals, distributions of individuals in clusters by quintile of factor and index scores, and health behavior characteristics. During 5 years of follow-up through 2000, 3,110 colorectal cancer cases were ascertained. In men, the vegetables and fruits cluster, the fruits and vegetables factor, the fat-reduced/diet foods factor, and all indexes were associated with reduced risk; the meat and potatoes factor was associated with increased risk. In women, reduced risk was found with the Healthy Eating Index-2005 and increased risk with the meat and potatoes factor. For men, beneficial health characteristics were seen with all fruit/vegetable patterns, diet foods patterns, and indexes, while poorer health characteristics were found with meat patterns. For women, findings were similar except that poorer health characteristics were seen with diet foods patterns. Similarities were found across methods, suggesting basic qualities of healthy diets. Nonetheless, findings vary because each method answers a different question.

  12. Characterization of population exposure to organochlorines: A cluster analysis application

    NARCIS (Netherlands)

    R.M. Guimarães (Raphael Mendonça); S. Asmus (Sven); A. Burdorf (Alex)

    2013-01-01

    textabstractThis study aimed to show the results from a cluster analysis application in the characterization of population exposure to organochlorines through variables related to time and exposure dose. Characteristics of 354 subjects in a population exposed to organochlorine pesticides residues

  13. Cluster Analysis of the International Stellarator Confinement Database

    International Nuclear Information System (INIS)

    Kus, A.; Dinklage, A.; Preuss, R.; Ascasibar, E.; Harris, J. H.; Okamura, S.; Yamada, H.; Sano, F.; Stroth, U.; Talmadge, J.

    2008-01-01

    Heterogeneous structure of collected data is one of the problems that occur during derivation of scalings for energy confinement time, and whose analysis tourns out to be wide and complicated matter. The International Stellarator Confinement Database [1], shortly ISCDB, comprises in its latest version 21 a total of 3647 observations from 8 experimental devices, 2067 therefrom beeing so far completed for upcoming analyses. For confinement scaling studies 1933 observation were chosen as the standard dataset. Here we describe a statistical method of cluster analysis for identification of possible cohesive substructures in ISDCB and present some preliminary results

  14. Reliable Collaborative Filtering on Spatio-Temporal Privacy Data

    Directory of Open Access Journals (Sweden)

    Zhen Liu

    2017-01-01

    Full Text Available Lots of multilayer information, such as the spatio-temporal privacy check-in data, is accumulated in the location-based social network (LBSN. When using the collaborative filtering algorithm for LBSN location recommendation, one of the core issues is how to improve recommendation performance by combining the traditional algorithm with the multilayer information. The existing approaches of collaborative filtering use only the sparse user-item rating matrix. It entails high computational complexity and inaccurate results. A novel collaborative filtering-based location recommendation algorithm called LGP-CF, which takes spatio-temporal privacy information into account, is proposed in this paper. By mining the users check-in behavior pattern, the dataset is segmented semantically to reduce the data size that needs to be computed. Then the clustering algorithm is used to obtain and narrow the set of similar users. User-location bipartite graph is modeled using the filtered similar user set. Then LGP-CF can quickly locate the location and trajectory of users through message propagation and aggregation over the graph. Through calculating users similarity by spatio-temporal privacy data on the graph, we can finally calculate the rating of recommendable locations. Experiments results on the physical clusters indicate that compared with the existing algorithms, the proposed LGP-CF algorithm can make recommendations more accurately.

  15. Paternal age related schizophrenia (PARS): Latent subgroups detected by k-means clustering analysis.

    Science.gov (United States)

    Lee, Hyejoo; Malaspina, Dolores; Ahn, Hongshik; Perrin, Mary; Opler, Mark G; Kleinhaus, Karine; Harlap, Susan; Goetz, Raymond; Antonius, Daniel

    2011-05-01

    Paternal age related schizophrenia (PARS) has been proposed as a subgroup of schizophrenia with distinct etiology, pathophysiology and symptoms. This study uses a k-means clustering analysis approach to generate hypotheses about differences between PARS and other cases of schizophrenia. We studied PARS (operationally defined as not having any family history of schizophrenia among first and second-degree relatives and fathers' age at birth ≥ 35 years) in a series of schizophrenia cases recruited from a research unit. Data were available on demographic variables, symptoms (Positive and Negative Syndrome Scale; PANSS), cognitive tests (Wechsler Adult Intelligence Scale-Revised; WAIS-R) and olfaction (University of Pennsylvania Smell Identification Test; UPSIT). We conducted a series of k-means clustering analyses to identify clusters of cases containing high concentrations of PARS. Two analyses generated clusters with high concentrations of PARS cases. The first analysis (N=136; PARS=34) revealed a cluster containing 83% PARS cases, in which the patients showed a significant discrepancy between verbal and performance intelligence. The mean paternal and maternal ages were 41 and 33, respectively. The second analysis (N=123; PARS=30) revealed a cluster containing 71% PARS cases, of which 93% were females; the mean age of onset of psychosis, at 17.2, was significantly early. These results strengthen the evidence that PARS cases differ from other patients with schizophrenia. Hypothesis-generating findings suggest that features of PARS may include a discrepancy between verbal and performance intelligence, and in females, an early age of onset. These findings provide a rationale for separating these phenotypes from others in future clinical, genetic and pathophysiologic studies of schizophrenia and in considering responses to treatment. Copyright © 2011 Elsevier B.V. All rights reserved.

  16. Standardized Effect Size Measures for Mediation Analysis in Cluster-Randomized Trials

    Science.gov (United States)

    Stapleton, Laura M.; Pituch, Keenan A.; Dion, Eric

    2015-01-01

    This article presents 3 standardized effect size measures to use when sharing results of an analysis of mediation of treatment effects for cluster-randomized trials. The authors discuss 3 examples of mediation analysis (upper-level mediation, cross-level mediation, and cross-level mediation with a contextual effect) with demonstration of the…

  17. Fast multidimensional ensemble empirical mode decomposition for the analysis of big spatio-temporal datasets.

    Science.gov (United States)

    Wu, Zhaohua; Feng, Jiaxin; Qiao, Fangli; Tan, Zhe-Min

    2016-04-13

    In this big data era, it is more urgent than ever to solve two major issues: (i) fast data transmission methods that can facilitate access to data from non-local sources and (ii) fast and efficient data analysis methods that can reveal the key information from the available data for particular purposes. Although approaches in different fields to address these two questions may differ significantly, the common part must involve data compression techniques and a fast algorithm. This paper introduces the recently developed adaptive and spatio-temporally local analysis method, namely the fast multidimensional ensemble empirical mode decomposition (MEEMD), for the analysis of a large spatio-temporal dataset. The original MEEMD uses ensemble empirical mode decomposition to decompose time series at each spatial grid and then pieces together the temporal-spatial evolution of climate variability and change on naturally separated timescales, which is computationally expensive. By taking advantage of the high efficiency of the expression using principal component analysis/empirical orthogonal function analysis for spatio-temporally coherent data, we design a lossy compression method for climate data to facilitate its non-local transmission. We also explain the basic principles behind the fast MEEMD through decomposing principal components instead of original grid-wise time series to speed up computation of MEEMD. Using a typical climate dataset as an example, we demonstrate that our newly designed methods can (i) compress data with a compression rate of one to two orders; and (ii) speed-up the MEEMD algorithm by one to two orders. © 2016 The Authors.

  18. The spatial analysis on hemorrhagic fever with renal syndrome in Jiangsu province, China based on geographic information system.

    Science.gov (United States)

    Bao, Changjun; Liu, Wanwan; Zhu, Yefei; Liu, Wendong; Hu, Jianli; Liang, Qi; Cheng, Yuejia; Wu, Ying; Yu, Rongbin; Zhou, Minghao; Shen, Hongbing; Chen, Feng; Tang, Fenyang; Peng, Zhihang

    2014-01-01

    Hemorrhagic fever with renal syndrome (HFRS) is endemic in mainland China, accounting for 90% of total reported cases worldwide, and Jiangsu is one of the most severely affected provinces. In this study, the authors conducted GIS-based spatial analyses in order to determine the spatial distribution of the HFRS cases, identify key areas and explore risk factors for public health planning and resource allocation. Interpolation maps by inverse distance weighting were produced to detect the spatial distribution of HFRS cases in Jiangsu from 2001 to 2011. Spatio-temporal clustering was applied to identify clusters at the county level. Spatial correlation analysis was conducted to detect influencing factors of HFRS in Jiangsu. HFRS cases in Jiangsu from 2001 to 2011 were mapped and the results suggested that cases in Jiangsu were not distributed randomly. Cases were mainly distributed in northeastern and southwestern Jiangsu, especially in Dafeng and Sihong counties. It was notable that prior to this study, Sihong county had rarely been reported as a high-risk area of HFRS. With the maximum spatial size of 50% of the total population and the maximum temporal size of 50% of the total population, spatio-temporal clustering showed that there was one most likely cluster (LLR = 624.52, Phighways, railways, rivers and lakes. The application of GIS together with spatial interpolation, spatio-temporal clustering and spatial correlation analysis can effectively identify high-risk areas and factors influencing HFRS incidence to lay a foundation for researching its pathogenesis.

  19. INTER-TEMPORAL ANALYSIS OF HOUSEHOLD CAR AND MOTORCYCLE OWNERSHIP BEHAVIORS

    Directory of Open Access Journals (Sweden)

    Nobuhiro SANKO

    2009-01-01

    Full Text Available This study investigates household car and motorcycle ownership in Nagoya metropolitan area of Japan. Bivariate ordered probit models of household vehicle ownership were developed using the data from the case study area at three time points, 1981, 1991, and 2001. The accessibility that is generally known to be correlated with vehicle ownership decisions is incorporated as an input for the proposed vehicle ownership model to investigate the potential relationship between them. The mode choice models for the area were first estimated to quantify the accessibility indexes that were later integrated into the vehicle ownership models. Inter-temporal comparison and temporal transferability analysis were conducted. Some of the major findings suggest: 1 that age and gender differences have become less important in modal choices and car ownership as motorization proceeds; 2 that the accessibility seems to have a significant correlation with vehicle ownership; 3 that car and motorcycle ownership may not be independent and may have a complementary relationship; and 4 that the deep insights concerning the model selection are obtained from the viewpoints of the temporal transferability.

  20. Spatial-Temporal Correlation Properties of the 3GPP Spatial Channel Model and the Kronecker MIMO Channel Model

    Directory of Open Access Journals (Sweden)

    Cheng-Xiang Wang

    2007-02-01

    Full Text Available The performance of multiple-input multiple-output (MIMO systems is greatly influenced by the spatial-temporal correlation properties of the underlying MIMO channels. This paper investigates the spatial-temporal correlation characteristics of the spatial channel model (SCM in the Third Generation Partnership Project (3GPP and the Kronecker-based stochastic model (KBSM at three levels, namely, the cluster level, link level, and system level. The KBSM has both the spatial separability and spatial-temporal separability at all the three levels. The spatial-temporal separability is observed for the SCM only at the system level, but not at the cluster and link levels. The SCM shows the spatial separability at the link and system levels, but not at the cluster level since its spatial correlation is related to the joint distribution of the angle of arrival (AoA and angle of departure (AoD. The KBSM with the Gaussian-shaped power azimuth spectrum (PAS is found to fit best the 3GPP SCM in terms of the spatial correlations. Despite its simplicity and analytical tractability, the KBSM is restricted to model only the average spatial-temporal behavior of MIMO channels. The SCM provides more insights of the variations of different MIMO channel realizations, but the implementation complexity is relatively high.

  1. piRNA analysis framework from small RNA-Seq data by a novel cluster prediction tool - PILFER.

    Science.gov (United States)

    Ray, Rishav; Pandey, Priyanka

    2017-12-19

    With the increasing number of studies focusing on PIWI-interacting RNA (piRNAs), it is now pertinent to develop efficient tools dedicated towards piRNA analysis. We have developed a novel cluster prediction tool called PILFER (PIrna cLuster FindER), which can accurately predict piRNA clusters from small RNA sequencing data. PILFER is an open source, easy to use tool, and can be executed even on a personal computer with minimum resources. It uses a sliding-window mechanism by integrating the expression of the reads along with the spatial information to predict the piRNA clusters. We have additionally defined a piRNA analysis pipeline incorporating PILFER to detect and annotate piRNAs and their clusters from raw small RNA sequencing data and implemented it on publicly available data from healthy germline and somatic tissues. We compared PILFER with other existing piRNA cluster prediction tools and found it to be statistically more accurate and superior in many aspects such as the robustness of PILFER clusters is higher and memory efficiency is more. Overall, PILFER provides a fast and accurate solution to piRNA cluster prediction. Copyright © 2017 Elsevier Inc. All rights reserved.

  2. Network clustering coefficient approach to DNA sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Gerhardt, Guenther J.L. [Universidade Federal do Rio Grande do Sul-Hospital de Clinicas de Porto Alegre, Rua Ramiro Barcelos 2350/sala 2040/90035-003 Porto Alegre (Brazil); Departamento de Fisica e Quimica da Universidade de Caxias do Sul, Rua Francisco Getulio Vargas 1130, 95001-970 Caxias do Sul (Brazil); Lemke, Ney [Programa Interdisciplinar em Computacao Aplicada, Unisinos, Av. Unisinos, 950, 93022-000 Sao Leopoldo, RS (Brazil); Corso, Gilberto [Departamento de Biofisica e Farmacologia, Centro de Biociencias, Universidade Federal do Rio Grande do Norte, Campus Universitario, 59072 970 Natal, RN (Brazil)]. E-mail: corso@dfte.ufrn.br

    2006-05-15

    In this work we propose an alternative DNA sequence analysis tool based on graph theoretical concepts. The methodology investigates the path topology of an organism genome through a triplet network. In this network, triplets in DNA sequence are vertices and two vertices are connected if they occur juxtaposed on the genome. We characterize this network topology by measuring the clustering coefficient. We test our methodology against two main bias: the guanine-cytosine (GC) content and 3-bp (base pairs) periodicity of DNA sequence. We perform the test constructing random networks with variable GC content and imposed 3-bp periodicity. A test group of some organisms is constructed and we investigate the methodology in the light of the constructed random networks. We conclude that the clustering coefficient is a valuable tool since it gives information that is not trivially contained in 3-bp periodicity neither in the variable GC content.

  3. Epidemic characteristics and spatio-temporal patterns of scrub typhus during 2006-2013 in Tai'an, Northern China.

    Science.gov (United States)

    Zheng, L; Yang, H-L; Bi, Z-W; Kou, Z-Q; Zhang, L-Y; Zhang, A-H; Yang, L; Zhao, Z-T

    2015-08-01

    Tai'an, a famous cultural tourist district, is a new endemic foci of scrub typhus in northern China. Frequent reports of travel-acquired cases and absence of effective vaccine indicated a significant health problem of scrub typhus in Tai'an. Thus, descriptive epidemiological methods and spatial-temporal scan statistics were used to describe the epidemic characteristics and detect the significant clusters of the high incidence of scrub typhus at the town level in Tai'an. Results of descriptive epidemiological analysis showed a total of 490 cases were reported in Tai'an with the annual average incidence ranging from 0·48 to 2·27/100 000 during 2006-2013. Females, the elderly and farmers are the high-risk groups. Monthly changes of scrub typhus cases indicated an obvious epidemic period in autumn. Spatial-temporal distribution analysis, showed significant clusters of high incidence mainly located in eastern and northern Tai'an. Our study suggests that more effective, targeted measures for local residents should be implemented in the eastern and northern areas of Tai'an in autumn. Meanwhile, it may prove beneficial for health policy makers to advise travellers to take preventive measures in order to minimize the risk of infection of scrub typhus in Tai'an.

  4. Robustness in cluster analysis in the presence of anomalous observations

    NARCIS (Netherlands)

    Zhuk, EE

    Cluster analysis of multivariate observations in the presence of "outliers" (anomalous observations) in a sample is studied. The expected (mean) fraction of erroneous decisions for the decision rule is computed analytically by minimizing the intraclass scatter. A robust decision rule (stable to

  5. Particle Simulation of Oxidation Induced Band 3 Clustering in Human Erythrocytes.

    Directory of Open Access Journals (Sweden)

    Hanae Shimo

    2015-06-01

    Full Text Available Oxidative stress mediated clustering of membrane protein band 3 plays an essential role in the clearance of damaged and aged red blood cells (RBCs from the circulation. While a number of previous experimental studies have observed changes in band 3 distribution after oxidative treatment, the details of how these clusters are formed and how their properties change under different conditions have remained poorly understood. To address these issues, a framework that enables the simultaneous monitoring of the temporal and spatial changes following oxidation is needed. In this study, we established a novel simulation strategy that incorporates deterministic and stochastic reactions with particle reaction-diffusion processes, to model band 3 cluster formation at single molecule resolution. By integrating a kinetic model of RBC antioxidant metabolism with a model of band 3 diffusion, we developed a model that reproduces the time-dependent changes of glutathione and clustered band 3 levels, as well as band 3 distribution during oxidative treatment, observed in prior studies. We predicted that cluster formation is largely dependent on fast reverse reaction rates, strong affinity between clustering molecules, and irreversible hemichrome binding. We further predicted that under repeated oxidative perturbations, clusters tended to progressively grow and shift towards an irreversible state. Application of our model to simulate oxidation in RBCs with cytoskeletal deficiency also suggested that oxidation leads to more enhanced clustering compared to healthy RBCs. Taken together, our model enables the prediction of band 3 spatio-temporal profiles under various situations, thus providing valuable insights to potentially aid understanding mechanisms for removing senescent and premature RBCs.

  6. Particle Simulation of Oxidation Induced Band 3 Clustering in Human Erythrocytes.

    Science.gov (United States)

    Shimo, Hanae; Arjunan, Satya Nanda Vel; Machiyama, Hiroaki; Nishino, Taiko; Suematsu, Makoto; Fujita, Hideaki; Tomita, Masaru; Takahashi, Koichi

    2015-06-01

    Oxidative stress mediated clustering of membrane protein band 3 plays an essential role in the clearance of damaged and aged red blood cells (RBCs) from the circulation. While a number of previous experimental studies have observed changes in band 3 distribution after oxidative treatment, the details of how these clusters are formed and how their properties change under different conditions have remained poorly understood. To address these issues, a framework that enables the simultaneous monitoring of the temporal and spatial changes following oxidation is needed. In this study, we established a novel simulation strategy that incorporates deterministic and stochastic reactions with particle reaction-diffusion processes, to model band 3 cluster formation at single molecule resolution. By integrating a kinetic model of RBC antioxidant metabolism with a model of band 3 diffusion, we developed a model that reproduces the time-dependent changes of glutathione and clustered band 3 levels, as well as band 3 distribution during oxidative treatment, observed in prior studies. We predicted that cluster formation is largely dependent on fast reverse reaction rates, strong affinity between clustering molecules, and irreversible hemichrome binding. We further predicted that under repeated oxidative perturbations, clusters tended to progressively grow and shift towards an irreversible state. Application of our model to simulate oxidation in RBCs with cytoskeletal deficiency also suggested that oxidation leads to more enhanced clustering compared to healthy RBCs. Taken together, our model enables the prediction of band 3 spatio-temporal profiles under various situations, thus providing valuable insights to potentially aid understanding mechanisms for removing senescent and premature RBCs.

  7. SPECT image analysis using statistical parametric mapping in patients with temporal lobe epilepsy associated with hippocampal sclerosis

    International Nuclear Information System (INIS)

    Shiraki, Junko

    2004-01-01

    The author examined interictal 123 I-IMP SPECT images using statistical parametric mapping (SPM) in 19 temporal lobe epilepsy patients who revealed hippocampal sclerosis with MRI. Decreased regional cerebral blood flow (rCBF) were shown for eight patients in the medial temporal lobe, six patients in the lateral temporal lobe and five patients in the both medial and lateral temporal lobe. These patients were classified into two types; medial type and lateral type, the former decreased rCBF only in medial and the latter decreased rCBF in the other temporal area. Correlation of rCBF and clinical parameters in the lateral type, age at seizure onset was significantly older (p=0.0098, t-test) than those of patients in the medial type. SPM analysis for interictal SPECT of temporal lobe epilepsy clarified location of decreased rCBF and find correlations with clinical characteristics. In addition, SPM analysis of SPECT was useful to understand pathophysiology of the epilepsy. (author)

  8. A comparison of heuristic and model-based clustering methods for dietary pattern analysis.

    Science.gov (United States)

    Greve, Benjamin; Pigeot, Iris; Huybrechts, Inge; Pala, Valeria; Börnhorst, Claudia

    2016-02-01

    Cluster analysis is widely applied to identify dietary patterns. A new method based on Gaussian mixture models (GMM) seems to be more flexible compared with the commonly applied k-means and Ward's method. In the present paper, these clustering approaches are compared to find the most appropriate one for clustering dietary data. The clustering methods were applied to simulated data sets with different cluster structures to compare their performance knowing the true cluster membership of observations. Furthermore, the three methods were applied to FFQ data assessed in 1791 children participating in the IDEFICS (Identification and Prevention of Dietary- and Lifestyle-Induced Health Effects in Children and Infants) Study to explore their performance in practice. The GMM outperformed the other methods in the simulation study in 72 % up to 100 % of cases, depending on the simulated cluster structure. Comparing the computationally less complex k-means and Ward's methods, the performance of k-means was better in 64-100 % of cases. Applied to real data, all methods identified three similar dietary patterns which may be roughly characterized as a 'non-processed' cluster with a high consumption of fruits, vegetables and wholemeal bread, a 'balanced' cluster with only slight preferences of single foods and a 'junk food' cluster. The simulation study suggests that clustering via GMM should be preferred due to its higher flexibility regarding cluster volume, shape and orientation. The k-means seems to be a good alternative, being easier to use while giving similar results when applied to real data.

  9. Spatio-Temporal Trends and Identification of Correlated Variables with Water Quality for Drinking-Water Reservoirs.

    Science.gov (United States)

    Gu, Qing; Wang, Ke; Li, Jiadan; Ma, Ligang; Deng, Jinsong; Zheng, Kefeng; Zhang, Xiaobin; Sheng, Li

    2015-10-20

    It is widely accepted that characterizing the spatio-temporal trends of water quality parameters and identifying correlated variables with water quality are indispensable for the management and protection of water resources. In this study, cluster analysis was used to classify 56 typical drinking water reservoirs in Zhejiang Province into three groups representing different water quality levels, using data of four water quality parameters for the period 2006-2010. Then, the spatio-temporal trends in water quality were analyzed, assisted by geographic information systems (GIS) technology and statistical analysis. The results indicated that the water quality showed a trend of degradation from southwest to northeast, and the overall water quality level was exacerbated during the study period. Correlation analysis was used to evaluate the relationships between water quality parameters and ten independent variables grouped into four categories (land use, socio-economic factors, geographical features, and reservoir attributes). According to the correlation coefficients, land use and socio-economic indicators were identified as the most significant factors related to reservoir water quality. The results offer insights into the spatio-temporal variations of water quality parameters and factors impacting the water quality of drinking water reservoirs in Zhejiang Province, and they could assist managers in making effective strategies to better protect water resources.

  10. Spatio-Temporal Trends and Identification of Correlated Variables with Water Quality for Drinking-Water Reservoirs

    Directory of Open Access Journals (Sweden)

    Qing Gu

    2015-10-01

    Full Text Available It is widely accepted that characterizing the spatio-temporal trends of water quality parameters and identifying correlated variables with water quality are indispensable for the management and protection of water resources. In this study, cluster analysis was used to classify 56 typical drinking water reservoirs in Zhejiang Province into three groups representing different water quality levels, using data of four water quality parameters for the period 2006–2010. Then, the spatio-temporal trends in water quality were analyzed, assisted by geographic information systems (GIS technology and statistical analysis. The results indicated that the water quality showed a trend of degradation from southwest to northeast, and the overall water quality level was exacerbated during the study period. Correlation analysis was used to evaluate the relationships between water quality parameters and ten independent variables grouped into four categories (land use, socio-economic factors, geographical features, and reservoir attributes. According to the correlation coefficients, land use and socio-economic indicators were identified as the most significant factors related to reservoir water quality. The results offer insights into the spatio-temporal variations of water quality parameters and factors impacting the water quality of drinking water reservoirs in Zhejiang Province, and they could assist managers in making effective strategies to better protect water resources.

  11. Analysis of precipitation data in Bangladesh through hierarchical clustering and multidimensional scaling

    Science.gov (United States)

    Rahman, Md. Habibur; Matin, M. A.; Salma, Umma

    2017-12-01

    The precipitation patterns of seventeen locations in Bangladesh from 1961 to 2014 were studied using a cluster analysis and metric multidimensional scaling. In doing so, the current research applies four major hierarchical clustering methods to precipitation in conjunction with different dissimilarity measures and metric multidimensional scaling. A variety of clustering algorithms were used to provide multiple clustering dendrograms for a mixture of distance measures. The dendrogram of pre-monsoon rainfall for the seventeen locations formed five clusters. The pre-monsoon precipitation data for the areas of Srimangal and Sylhet were located in two clusters across the combination of five dissimilarity measures and four hierarchical clustering algorithms. The single linkage algorithm with Euclidian and Manhattan distances, the average linkage algorithm with the Minkowski distance, and Ward's linkage algorithm provided similar results with regard to monsoon precipitation. The results of the post-monsoon and winter precipitation data are shown in different types of dendrograms with disparate combinations of sub-clusters. The schematic geometrical representations of the precipitation data using metric multidimensional scaling showed that the post-monsoon rainfall of Cox's Bazar was located far from those of the other locations. The results of a box-and-whisker plot, different clustering techniques, and metric multidimensional scaling indicated that the precipitation behaviour of Srimangal and Sylhet during the pre-monsoon season, Cox's Bazar and Sylhet during the monsoon season, Maijdi Court and Cox's Bazar during the post-monsoon season, and Cox's Bazar and Khulna during the winter differed from those at other locations in Bangladesh.

  12. Time series clustering analysis of health-promoting behavior

    Science.gov (United States)

    Yang, Chi-Ta; Hung, Yu-Shiang; Deng, Guang-Feng

    2013-10-01

    Health promotion must be emphasized to achieve the World Health Organization goal of health for all. Since the global population is aging rapidly, ComCare elder health-promoting service was developed by the Taiwan Institute for Information Industry in 2011. Based on the Pender health promotion model, ComCare service offers five categories of health-promoting functions to address the everyday needs of seniors: nutrition management, social support, exercise management, health responsibility, stress management. To assess the overall ComCare service and to improve understanding of the health-promoting behavior of elders, this study analyzed health-promoting behavioral data automatically collected by the ComCare monitoring system. In the 30638 session records collected for 249 elders from January, 2012 to March, 2013, behavior patterns were identified by fuzzy c-mean time series clustering algorithm combined with autocorrelation-based representation schemes. The analysis showed that time series data for elder health-promoting behavior can be classified into four different clusters. Each type reveals different health-promoting needs, frequencies, function numbers and behaviors. The data analysis result can assist policymakers, health-care providers, and experts in medicine, public health, nursing and psychology and has been provided to Taiwan National Health Insurance Administration to assess the elder health-promoting behavior.

  13. Unequal cluster sizes in stepped-wedge cluster randomised trials: a systematic review.

    Science.gov (United States)

    Kristunas, Caroline; Morris, Tom; Gray, Laura

    2017-11-15

    To investigate the extent to which cluster sizes vary in stepped-wedge cluster randomised trials (SW-CRT) and whether any variability is accounted for during the sample size calculation and analysis of these trials. Any, not limited to healthcare settings. Any taking part in an SW-CRT published up to March 2016. The primary outcome is the variability in cluster sizes, measured by the coefficient of variation (CV) in cluster size. Secondary outcomes include the difference between the cluster sizes assumed during the sample size calculation and those observed during the trial, any reported variability in cluster sizes and whether the methods of sample size calculation and methods of analysis accounted for any variability in cluster sizes. Of the 101 included SW-CRTs, 48% mentioned that the included clusters were known to vary in size, yet only 13% of these accounted for this during the calculation of the sample size. However, 69% of the trials did use a method of analysis appropriate for when clusters vary in size. Full trial reports were available for 53 trials. The CV was calculated for 23 of these: the median CV was 0.41 (IQR: 0.22-0.52). Actual cluster sizes could be compared with those assumed during the sample size calculation for 14 (26%) of the trial reports; the cluster sizes were between 29% and 480% of that which had been assumed. Cluster sizes often vary in SW-CRTs. Reporting of SW-CRTs also remains suboptimal. The effect of unequal cluster sizes on the statistical power of SW-CRTs needs further exploration and methods appropriate to studies with unequal cluster sizes need to be employed. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  14. Case clustering in pityriasis rosea: a multicenter epidemiologic study in primary care settings in Hong Kong.

    Science.gov (United States)

    Chuh, Antonio A T; Lee, Albert; Molinari, Nicolas

    2003-04-01

    To investigate the epidemiology of pityriasis rosea in primary care settings in Hong Kong and to analyze for temporal clustering. Retrospective epidemiologic study. Six primary care teaching practices affiliated with a university. Patients Forty-one patients with pityriasis rosea, 564 patients with atopic dermatitis (negative control condition), and 35 patients with scabies (positive control condition). We retrieved all records of patients with pityriasis rosea, atopic dermatitis, or scabies diagnosed in 3 years. We analyzed temporal clustering by a method based on a regression model. The monthly incidence of pityriasis rosea is negatively but insignificantly correlated with mean air temperature (gamma s = -0.41, P =.19) and mean total rainfall (gamma s = -0.34, P =.27). Three statistically significant clusters with 7, 6, and 7 cases were identified (P =.03), occurring in the second coldest month in the year (February), the second hottest month (July), and a temperate month (April), respectively. For atopic dermatitis (negative control condition), the nonclustering regression model was selected by Akaike information criteria. For scabies (positive control condition), 1 cluster of 20 cases was detected (P =.03). Significant temporal clustering independent of seasonal variation occurred in our series of patients with pityriasis rosea. This may be indicative of an infectious cause.

  15. Unsupervised clustering with spiking neurons by sparse temporal coding and multi-layer RBF networks

    NARCIS (Netherlands)

    S.M. Bohte (Sander); J.A. La Poutré (Han); J.N. Kok (Joost)

    2000-01-01

    textabstractWe demonstrate that spiking neural networks encoding information in spike times are capable of computing and learning clusters from realistic data. We show how a spiking neural network based on spike-time coding and Hebbian learning can successfully perform unsupervised clustering on

  16. Locality-Aware CTA Clustering For Modern GPUs

    Energy Technology Data Exchange (ETDEWEB)

    Li, Ang; Song, Shuaiwen; Liu, Weifeng; Liu, Xu; Kumar, Akash; Corporaal, Henk

    2017-04-08

    In this paper, we proposed a novel clustering technique for tapping into the performance potential of a largely ignored type of locality: inter-CTA locality. We first demonstrated the capability of the existing GPU hardware to exploit such locality, both spatially and temporally, on L1 or L1/Tex unified cache. To verify the potential of this locality, we quantified its existence in a broad spectrum of applications and discussed its sources of origin. Based on these insights, we proposed the concept of CTA-Clustering and its associated software techniques. Finally, We evaluated these techniques on all modern generations of NVIDIA GPU architectures. The experimental results showed that our proposed clustering techniques could significantly improve on-chip cache performance.

  17. Shifting patterns of Aedes aegypti fine scale spatial clustering in Iquitos, Peru.

    Science.gov (United States)

    LaCon, Genevieve; Morrison, Amy C; Astete, Helvio; Stoddard, Steven T; Paz-Soldan, Valerie A; Elder, John P; Halsey, Eric S; Scott, Thomas W; Kitron, Uriel; Vazquez-Prokopec, Gonzalo M

    2014-08-01

    Empiric evidence shows that Aedes aegypti abundance is spatially heterogeneous and that some areas and larval habitats produce more mosquitoes than others. There is a knowledge gap, however, with regards to the temporal persistence of such Ae. aegypti abundance hotspots. In this study, we used a longitudinal entomologic dataset from the city of Iquitos, Peru, to (1) quantify the spatial clustering patterns of adult Ae. aegypti and pupae counts per house, (2) determine overlap between clusters, (3) quantify the temporal stability of clusters over nine entomologic surveys spaced four months apart, and (4) quantify the extent of clustering at the household and neighborhood levels. Data from 13,662 household entomological visits performed in two Iquitos neighborhoods differing in Ae. aegypti abundance and dengue virus transmission was analyzed using global and local spatial statistics. The location and extent of Ae. aegypti pupae and adult hotspots (i.e., small groups of houses with significantly [pentomologic surveys. The extent of clustering was used to quantify the probability of finding spatially correlated populations. Our analyses indicate that Ae. aegypti distribution was highly focal (most clusters do not extend beyond 30 meters) and that hotspots of high vector abundance were common on every survey date, but they were temporally unstable over the period of study. Our findings have implications for understanding Ae. aegypti distribution and for the design of surveillance and control activities relying on household-level data. In settings like Iquitos, where there is a relatively low percentage of Ae. aegypti in permanent water-holding containers, identifying and targeting key premises will be significantly challenged by shifting hotspots of Ae. aegypti infestation. Focusing efforts in large geographic areas with historically high levels of transmission may be more effective than targeting Ae. aegypti hotspots.

  18. Shifting patterns of Aedes aegypti fine scale spatial clustering in Iquitos, Peru.

    Directory of Open Access Journals (Sweden)

    Genevieve LaCon

    2014-08-01

    Full Text Available Empiric evidence shows that Aedes aegypti abundance is spatially heterogeneous and that some areas and larval habitats produce more mosquitoes than others. There is a knowledge gap, however, with regards to the temporal persistence of such Ae. aegypti abundance hotspots. In this study, we used a longitudinal entomologic dataset from the city of Iquitos, Peru, to (1 quantify the spatial clustering patterns of adult Ae. aegypti and pupae counts per house, (2 determine overlap between clusters, (3 quantify the temporal stability of clusters over nine entomologic surveys spaced four months apart, and (4 quantify the extent of clustering at the household and neighborhood levels.Data from 13,662 household entomological visits performed in two Iquitos neighborhoods differing in Ae. aegypti abundance and dengue virus transmission was analyzed using global and local spatial statistics. The location and extent of Ae. aegypti pupae and adult hotspots (i.e., small groups of houses with significantly [p<0.05] high mosquito abundance were calculated for each of the 9 entomologic surveys. The extent of clustering was used to quantify the probability of finding spatially correlated populations. Our analyses indicate that Ae. aegypti distribution was highly focal (most clusters do not extend beyond 30 meters and that hotspots of high vector abundance were common on every survey date, but they were temporally unstable over the period of study.Our findings have implications for understanding Ae. aegypti distribution and for the design of surveillance and control activities relying on household-level data. In settings like Iquitos, where there is a relatively low percentage of Ae. aegypti in permanent water-holding containers, identifying and targeting key premises will be significantly challenged by shifting hotspots of Ae. aegypti infestation. Focusing efforts in large geographic areas with historically high levels of transmission may be more effective than

  19. Applying clustering to statistical analysis of student reasoning about two-dimensional kinematics

    Directory of Open Access Journals (Sweden)

    R. Padraic Springuel

    2007-12-01

    Full Text Available We use clustering, an analysis method not presently common to the physics education research community, to group and characterize student responses to written questions about two-dimensional kinematics. Previously, clustering has been used to analyze multiple-choice data; we analyze free-response data that includes both sketches of vectors and written elements. The primary goal of this paper is to describe the methodology itself; we include a brief overview of relevant results.

  20. Comparison of Outputs for Variable Combinations Used in Cluster Analysis on Polarmetric Imagery

    National Research Council Canada - National Science Library

    Petre, Melinda

    2008-01-01

    .... More specifically, two techniques, Cluster Analysis (CA) and Principle Component Analysis (PCA) can be combined to process Stoke s imagery by distinguishing between pixels, and producing groups of pixels with similar characteristics...

  1. An analysis of hospital brand mark clusters.

    Science.gov (United States)

    Vollmers, Stacy M; Miller, Darryl W; Kilic, Ozcan

    2010-07-01

    This study analyzed brand mark clusters (i.e., various types of brand marks displayed in combination) used by hospitals in the United States. The brand marks were assessed against several normative criteria for creating brand marks that are memorable and that elicit positive affect. Overall, results show a reasonably high level of adherence to many of these normative criteria. Many of the clusters exhibited pictorial elements that reflected benefits and that were conceptually consistent with the verbal content of the cluster. Also, many clusters featured icons that were balanced and moderately complex. However, only a few contained interactive imagery or taglines communicating benefits.

  2. Partitional clustering algorithms

    CERN Document Server

    2015-01-01

    This book summarizes the state-of-the-art in partitional clustering. Clustering, the unsupervised classification of patterns into groups, is one of the most important tasks in exploratory data analysis. Primary goals of clustering include gaining insight into, classifying, and compressing data. Clustering has a long and rich history that spans a variety of scientific disciplines including anthropology, biology, medicine, psychology, statistics, mathematics, engineering, and computer science. As a result, numerous clustering algorithms have been proposed since the early 1950s. Among these algorithms, partitional (nonhierarchical) ones have found many applications, especially in engineering and computer science. This book provides coverage of consensus clustering, constrained clustering, large scale and/or high dimensional clustering, cluster validity, cluster visualization, and applications of clustering. Examines clustering as it applies to large and/or high-dimensional data sets commonly encountered in reali...

  3. Cluster analysis for the probability of DSB site induced by electron tracks

    Energy Technology Data Exchange (ETDEWEB)

    Yoshii, Y. [Biological Research, Education and Instrumentation Center, Sapporo Medical University, Sapporo 060-8556 (Japan); Graduate School of Health Sciences, Hokkaido University, Sapporo 060-0812 (Japan); Sasaki, K. [Faculty of Health Sciences, Hokkaido University of Science, Sapporo 006-8585 (Japan); Matsuya, Y. [Graduate School of Health Sciences, Hokkaido University, Sapporo 060-0812 (Japan); Date, H., E-mail: date@hs.hokudai.ac.jp [Faculty of Health Sciences, Hokkaido University, Sapporo 060-0812 (Japan)

    2015-05-01

    To clarify the influence of bio-cells exposed to ionizing radiations, the densely populated pattern of the ionization in the cell nucleus is of importance because it governs the extent of DNA damage which may lead to cell lethality. In this study, we have conducted a cluster analysis of ionization and excitation events to estimate the number of double-strand breaks (DSBs) induced by electron tracks. A Monte Carlo simulation for electrons in liquid water was performed to determine the spatial location of the ionization and excitation events. The events were divided into clusters by using the density-based spatial clustering of applications with noise (DBSCAN) algorithm. The algorithm enables us to sort out the events into the groups (clusters) in which a minimum number of neighboring events are contained within a given radius. For evaluating the number of DSBs in the extracted clusters, we have introduced an aggregation index (AI). The computational results show that a sub-keV electron produces DSBs in a dense formation more effectively than higher energy electrons. The root-mean square radius (RMSR) of the cluster size is below 5 nm, which is smaller than the chromatin fiber thickness. It was found that this size of clustering events has a high possibility to cause lesions in DNA within the chromatin fiber site.

  4. Cluster Analysis on Longitudinal Data of Patients with Adult-Onset Asthma.

    Science.gov (United States)

    Ilmarinen, Pinja; Tuomisto, Leena E; Niemelä, Onni; Tommola, Minna; Haanpää, Jussi; Kankaanranta, Hannu

    Previous cluster analyses on asthma are based on cross-sectional data. To identify phenotypes of adult-onset asthma by using data from baseline (diagnostic) and 12-year follow-up visits. The Seinäjoki Adult Asthma Study is a 12-year follow-up study of patients with new-onset adult asthma. K-means cluster analysis was performed by using variables from baseline and follow-up visits on 171 patients to identify phenotypes. Five clusters were identified. Patients in cluster 1 (n = 38) were predominantly nonatopic males with moderate smoking history at baseline. At follow-up, 40% of these patients had developed persistent obstruction but the number of patients with uncontrolled asthma (5%) and rhinitis (10%) was the lowest. Cluster 2 (n = 19) was characterized by older men with heavy smoking history, poor lung function, and persistent obstruction at baseline. At follow-up, these patients were mostly uncontrolled (84%) despite daily use of inhaled corticosteroid (ICS) with add-on therapy. Cluster 3 (n = 50) consisted mostly of nonsmoking females with good lung function at diagnosis/follow-up and well-controlled/partially controlled asthma at follow-up. Cluster 4 (n = 25) had obese and symptomatic patients at baseline/follow-up. At follow-up, these patients had several comorbidities (40% psychiatric disease) and were treated daily with ICS and add-on therapy. Patients in cluster 5 (n = 39) were mostly atopic and had the earliest onset of asthma, the highest blood eosinophils, and FEV 1 reversibility at diagnosis. At follow-up, these patients used the lowest ICS dose but 56% were well controlled. Results can be used to predict outcomes of patients with adult-onset asthma and to aid in development of personalized therapy (NCT02733016 at ClinicalTrials.gov). Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  5. Parkinson's Disease Subtypes Identified from Cluster Analysis of Motor and Non-motor Symptoms.

    Science.gov (United States)

    Mu, Jesse; Chaudhuri, Kallol R; Bielza, Concha; de Pedro-Cuesta, Jesus; Larrañaga, Pedro; Martinez-Martin, Pablo

    2017-01-01

    Parkinson's disease is now considered a complex, multi-peptide, central, and peripheral nervous system disorder with considerable clinical heterogeneity. Non-motor symptoms play a key role in the trajectory of Parkinson's disease, from prodromal premotor to end stages. To understand the clinical heterogeneity of Parkinson's disease, this study used cluster analysis to search for subtypes from a large, multi-center, international, and well-characterized cohort of Parkinson's disease patients across all motor stages, using a combination of cardinal motor features (bradykinesia, rigidity, tremor, axial signs) and, for the first time, specific validated rater-based non-motor symptom scales. Two independent international cohort studies were used: (a) the validation study of the Non-Motor Symptoms Scale ( n = 411) and (b) baseline data from the global Non-Motor International Longitudinal Study ( n = 540). k -means cluster analyses were performed on the non-motor and motor domains (domains clustering) and the 30 individual non-motor symptoms alone (symptoms clustering), and hierarchical agglomerative clustering was performed to group symptoms together. Four clusters are identified from the domains clustering supporting previous studies: mild, non-motor dominant, motor-dominant, and severe. In addition, six new smaller clusters are identified from the symptoms clustering, each characterized by clinically-relevant non-motor symptoms. The clusters identified in this study present statistical confirmation of the increasingly important role of non-motor symptoms (NMS) in Parkinson's disease heterogeneity and take steps toward subtype-specific treatment packages.

  6. A comparison of hierarchical cluster analysis and league table rankings as methods for analysis and presentation of district health system performance data in Uganda.

    Science.gov (United States)

    Tashobya, Christine K; Dubourg, Dominique; Ssengooba, Freddie; Speybroeck, Niko; Macq, Jean; Criel, Bart

    2016-03-01

    In 2003, the Uganda Ministry of Health introduced the district league table for district health system performance assessment. The league table presents district performance against a number of input, process and output indicators and a composite index to rank districts. This study explores the use of hierarchical cluster analysis for analysing and presenting district health systems performance data and compares this approach with the use of the league table in Uganda. Ministry of Health and district plans and reports, and published documents were used to provide information on the development and utilization of the Uganda district league table. Quantitative data were accessed from the Ministry of Health databases. Statistical analysis using SPSS version 20 and hierarchical cluster analysis, utilizing Wards' method was used. The hierarchical cluster analysis was conducted on the basis of seven clusters determined for each year from 2003 to 2010, ranging from a cluster of good through moderate-to-poor performers. The characteristics and membership of clusters varied from year to year and were determined by the identity and magnitude of performance of the individual variables. Criticisms of the league table include: perceived unfairness, as it did not take into consideration district peculiarities; and being oversummarized and not adequately informative. Clustering organizes the many data points into clusters of similar entities according to an agreed set of indicators and can provide the beginning point for identifying factors behind the observed performance of districts. Although league table ranking emphasize summation and external control, clustering has the potential to encourage a formative, learning approach. More research is required to shed more light on factors behind observed performance of the different clusters. Other countries especially low-income countries that share many similarities with Uganda can learn from these experiences. © The Author 2015

  7. Local wavelet correlation: applicationto timing analysis of multi-satellite CLUSTER data

    Directory of Open Access Journals (Sweden)

    J. Soucek

    2004-12-01

    Full Text Available Multi-spacecraft space observations, such as those of CLUSTER, can be used to infer information about local plasma structures by exploiting the timing differences between subsequent encounters of these structures by individual satellites. We introduce a novel wavelet-based technique, the Local Wavelet Correlation (LWC, which allows one to match the corresponding signatures of large-scale structures in the data from multiple spacecraft and determine the relative time shifts between the crossings. The LWC is especially suitable for analysis of strongly non-stationary time series, where it enables one to estimate the time lags in a more robust and systematic way than ordinary cross-correlation techniques. The technique, together with its properties and some examples of its application to timing analysis of bow shock and magnetopause crossing observed by CLUSTER, are presented. We also compare the performance and reliability of the technique with classical discontinuity analysis methods. Key words. Radio science (signal processing – Space plasma physics (discontinuities; instruments and techniques

  8. Cluster analysis as a method for determining size ranges for spinal implants: disc lumbar replacement prosthesis dimensions from magnetic resonance images.

    Science.gov (United States)

    Lei, Dang; Holder, Roger L; Smith, Francis W; Wardlaw, Douglas; Hukins, David W L

    2006-12-01

    Statistical analysis of clinical radiologic data. To develop an objective method for finding the number of sizes for a lumbar disc replacement. Cluster analysis is a well-established technique for sorting observations into clusters so that the "similarity level" is maximal if they belong to the same cluster and minimal otherwise. Magnetic resonance scans from 69 patients, with no abnormal discs, yielded 206 sagittal and transverse images of 206 discs (levels L3-L4-L5-S1). Anteroposterior and lateral dimensions were measured from vertebral margins on transverse images; disc heights were measured from sagittal images. Hierarchical cluster analysis was performed to determine the number of clusters followed by nonhierarchical (K-means) cluster analysis. Discriminant analysis was used to determine how well the clusters could be used to classify an observation. The most successful method of clustering the data involved the following parameters: anteroposterior dimension; lateral dimension (both were the mean of results from the superior and inferior margins of a vertebral body, measured on transverse images); and maximum disc height (from a midsagittal image). These were grouped into 7 clusters so that a discriminant analysis was capable of correctly classifying 97.1% of the observations. The mean and standard deviations for the parameter values in each cluster were determined. Cluster analysis has been successfully used to find the dimensions of the minimum number of prosthesis sizes required to replace L3-L4 to L5-S1 discs; the range of sizes would enable them to be used at higher lumbar levels in some patients.

  9. Cluster analysis for validated climatology stations using precipitation in Mexico

    NARCIS (Netherlands)

    Bravo Cabrera, J. L.; Azpra-Romero, E.; Zarraluqui-Such, V.; Gay-García, C.; Estrada Porrúa, F.

    2012-01-01

    Annual average of daily precipitation was used to group climatological stations into clusters using the k-means procedure and principal component analysis with varimax rotation. After a careful selection of the stations deployed in Mexico since 1950, we selected 349 characterized by having 35 to 40

  10. McMaster Mesonet soil moisture dataset: description and spatio-temporal variability analysis

    Directory of Open Access Journals (Sweden)

    K. C. Kornelsen

    2013-04-01

    Full Text Available This paper introduces and describes the hourly, high-resolution soil moisture dataset continuously recorded by the McMaster Mesonet located in the Hamilton-Halton Watershed in Southern Ontario, Canada. The McMaster Mesonet consists of a network of time domain reflectometer (TDR probes collecting hourly soil moisture data at six depths between 10 cm and 100 cm at nine locations per site, spread across four sites in the 1250 km2 watershed. The sites for the soil moisture arrays are designed to further improve understanding of soil moisture dynamics in a seasonal climate and to capture soil moisture transitions in areas that have different topography, soil and land cover. The McMaster Mesonet soil moisture constitutes a unique database in Canada because of its high spatio-temporal resolution. In order to provide some insight into the dominant processes at the McMaster Mesonet sites, a spatio-temporal and temporal stability analysis were conducted to identify spatio-temporal patterns in the data and to suggest some physical interpretation of soil moisture variability. It was found that the seasonal climate of the Great Lakes Basin causes a transition in soil moisture patterns at seasonal timescales. During winter and early spring months, and at the meadow sites, soil moisture distribution is governed by topographic redistribution, whereas following efflorescence in the spring and summer, soil moisture spatial distribution at the forested site was also controlled by vegetation canopy. Analysis of short-term temporal stability revealed that the relative difference between sites was maintained unless there was significant rainfall (> 20 mm or wet conditions a priori. Following a disturbance in the spatial soil moisture distribution due to wetting, the relative soil moisture pattern re-emerged in 18 to 24 h. Access to the McMaster Mesonet data can be provided by visiting www.hydrology.mcmaster.ca/mesonet.

  11. Gene expression data clustering and it’s application in differential analysis of leukemia

    Directory of Open Access Journals (Sweden)

    M. Vahedi

    2008-02-01

    Full Text Available Introduction: DNA microarray technique is one of the most important categories in bioinformatics,which allows the possibility of monitoring thousands of expressed genes has been resulted in creatinggiant data bases of gene expression data, recently. Statistical analysis of such databases includednormalization, clustering, classification and etc.Materials and Methods: Golub et al (1999 collected data bases of leukemia based on the method ofoligonucleotide. The data is on the internet. In this paper, we analyzed gene expression data. It wasclustered by several methods including multi-dimensional scaling, hierarchical and non-hierarchicalclustering. Data set included 20 Acute Lymphoblastic Leukemia (ALL patients and 14 Acute MyeloidLeukemia (AML patients. The results of tow methods of clustering were compared with regard to realgrouping (ALL & AML. R software was used for data analysis.Results: Specificity and sensitivity of divisive hierarchical clustering in diagnosing of ALL patientswere 75% and 92%, respectively. Specificity and sensitivity of partitioning around medoids indiagnosing of ALL patients were 90% and 93%, respectively. These results showed a wellaccomplishment of both methods of clustering. It is considerable that, due to clustering methodsresults, one of the samples was placed in ALL groups, which was in AML group in clinical test.Conclusion: With regard to concordance of the results with real grouping of data, therefore we canuse these methods in the cases where we don't have accurate information of real grouping of data.Moreover, Results of clustering might distinct subgroups of data in such a way that would be necessaryfor concordance with clinical outcomes, laboratory results and so on.

  12. Application of clustering analysis in the prediction of photovoltaic power generation based on neural network

    Science.gov (United States)

    Cheng, K.; Guo, L. M.; Wang, Y. K.; Zafar, M. T.

    2017-11-01

    In order to select effective samples in the large number of data of PV power generation years and improve the accuracy of PV power generation forecasting model, this paper studies the application of clustering analysis in this field and establishes forecasting model based on neural network. Based on three different types of weather on sunny, cloudy and rainy days, this research screens samples of historical data by the clustering analysis method. After screening, it establishes BP neural network prediction models using screened data as training data. Then, compare the six types of photovoltaic power generation prediction models before and after the data screening. Results show that the prediction model combining with clustering analysis and BP neural networks is an effective method to improve the precision of photovoltaic power generation.

  13. Clustering analysis of water distribution systems: identifying critical components and community impacts.

    Science.gov (United States)

    Diao, K; Farmani, R; Fu, G; Astaraie-Imani, M; Ward, S; Butler, D

    2014-01-01

    Large water distribution systems (WDSs) are networks with both topological and behavioural complexity. Thereby, it is usually difficult to identify the key features of the properties of the system, and subsequently all the critical components within the system for a given purpose of design or control. One way is, however, to more explicitly visualize the network structure and interactions between components by dividing a WDS into a number of clusters (subsystems). Accordingly, this paper introduces a clustering strategy that decomposes WDSs into clusters with stronger internal connections than external connections. The detected cluster layout is very similar to the community structure of the served urban area. As WDSs may expand along with urban development in a community-by-community manner, the correspondingly formed distribution clusters may reveal some crucial configurations of WDSs. For verification, the method is applied to identify all the critical links during firefighting for the vulnerability analysis of a real-world WDS. Moreover, both the most critical pipes and clusters are addressed, given the consequences of pipe failure. Compared with the enumeration method, the method used in this study identifies the same group of the most critical components, and provides similar criticality prioritizations of them in a more computationally efficient time.

  14. Comparative analysis on the selection of number of clusters in community detection

    Science.gov (United States)

    Kawamoto, Tatsuro; Kabashima, Yoshiyuki

    2018-02-01

    We conduct a comparative analysis on various estimates of the number of clusters in community detection. An exhaustive comparison requires testing of all possible combinations of frameworks, algorithms, and assessment criteria. In this paper we focus on the framework based on a stochastic block model, and investigate the performance of greedy algorithms, statistical inference, and spectral methods. For the assessment criteria, we consider modularity, map equation, Bethe free energy, prediction errors, and isolated eigenvalues. From the analysis, the tendency of overfit and underfit that the assessment criteria and algorithms have becomes apparent. In addition, we propose that the alluvial diagram is a suitable tool to visualize statistical inference results and can be useful to determine the number of clusters.

  15. Statistical Significance for Hierarchical Clustering

    Science.gov (United States)

    Kimes, Patrick K.; Liu, Yufeng; Hayes, D. Neil; Marron, J. S.

    2017-01-01

    Summary Cluster analysis has proved to be an invaluable tool for the exploratory and unsupervised analysis of high dimensional datasets. Among methods for clustering, hierarchical approaches have enjoyed substantial popularity in genomics and other fields for their ability to simultaneously uncover multiple layers of clustering structure. A critical and challenging question in cluster analysis is whether the identified clusters represent important underlying structure or are artifacts of natural sampling variation. Few approaches have been proposed for addressing this problem in the context of hierarchical clustering, for which the problem is further complicated by the natural tree structure of the partition, and the multiplicity of tests required to parse the layers of nested clusters. In this paper, we propose a Monte Carlo based approach for testing statistical significance in hierarchical clustering which addresses these issues. The approach is implemented as a sequential testing procedure guaranteeing control of the family-wise error rate. Theoretical justification is provided for our approach, and its power to detect true clustering structure is illustrated through several simulation studies and applications to two cancer gene expression datasets. PMID:28099990

  16. Vector Nonlinear Time-Series Analysis of Gamma-Ray Burst Datasets on Heterogeneous Clusters

    Directory of Open Access Journals (Sweden)

    Ioana Banicescu

    2005-01-01

    Full Text Available The simultaneous analysis of a number of related datasets using a single statistical model is an important problem in statistical computing. A parameterized statistical model is to be fitted on multiple datasets and tested for goodness of fit within a fixed analytical framework. Definitive conclusions are hopefully achieved by analyzing the datasets together. This paper proposes a strategy for the efficient execution of this type of analysis on heterogeneous clusters. Based on partitioning processors into groups for efficient communications and a dynamic loop scheduling approach for load balancing, the strategy addresses the variability of the computational loads of the datasets, as well as the unpredictable irregularities of the cluster environment. Results from preliminary tests of using this strategy to fit gamma-ray burst time profiles with vector functional coefficient autoregressive models on 64 processors of a general purpose Linux cluster demonstrate the effectiveness of the strategy.

  17. Parallelization and scheduling of data intensive particle physics analysis jobs on clusters of PCs

    CERN Document Server

    Ponce, S

    2004-01-01

    Summary form only given. Scheduling policies are proposed for parallelizing data intensive particle physics analysis applications on computer clusters. Particle physics analysis jobs require the analysis of tens of thousands of particle collision events, each event requiring typically 200ms processing time and 600KB of data. Many jobs are launched concurrently by a large number of physicists. At a first view, particle physics jobs seem to be easy to parallelize, since particle collision events can be processed independently one from another. However, since large amounts of data need to be accessed, the real challenge resides in making an efficient use of the underlying computing resources. We propose several job parallelization and scheduling policies aiming at reducing job processing times and at increasing the sustainable load of a cluster server. Since particle collision events are usually reused by several jobs, cache based job splitting strategies considerably increase cluster utilization and reduce job ...

  18. SPATIO-TEMPORAL VARIATIONS IN MACROINVERTEBRATE ASSEMBLAGES OF NEW CALEDONIAN STREAMS.

    Directory of Open Access Journals (Sweden)

    MARY N. J.

    2002-01-01

    Full Text Available Forty-one sites located on 14 New Caledonian streams were surveyed four times between October 1996 and October 1997 in order to examine the spatial and temporal changes in the structure of the benthic macroinvertebrate communities. About 250 000 invertebrates representing 167 taxa were collected in the streams. Seventy-five percent of identified taxa and 67% of individuals were insects. Major spatial and temporal changes in the composition of the fauna were detected by multivariate analyses (ordination and classification. Overall, the number of individuals was significantly higher in the dry season (October than in the wetter seasons (January and June. However, a low temporal variability was detected in the structure of benthic communities during the sampling period. A cluster analysis based on taxonomic composition separated five groups of sites in relation with rock type, land use, and geographic characteristics. Several metrics (total invertebrate density, taxon richness, relative abundance of major invertebrate groups, diversity indices were used to characterize each group of sites. Forested streams, where the highest specific diversity occurred, represented the most speciose habitat for benthic fauna. A less rich and abundant fauna occurred in streams draining ultramafic rocks probably because of their low content in food resources and organic matter.

  19. Spatial and temporal epidemiological analysis in the Big Data era.

    Science.gov (United States)

    Pfeiffer, Dirk U; Stevens, Kim B

    2015-11-01

    Concurrent with global economic development in the last 50 years, the opportunities for the spread of existing diseases and emergence of new infectious pathogens, have increased substantially. The activities associated with the enormously intensified global connectivity have resulted in large amounts of data being generated, which in turn provides opportunities for generating knowledge that will allow more effective management of animal and human health risks. This so-called Big Data has, more recently, been accompanied by the Internet of Things which highlights the increasing presence of a wide range of sensors, interconnected via the Internet. Analysis of this data needs to exploit its complexity, accommodate variation in data quality and should take advantage of its spatial and temporal dimensions, where available. Apart from the development of hardware technologies and networking/communication infrastructure, it is necessary to develop appropriate data management tools that make this data accessible for analysis. This includes relational databases, geographical information systems and most recently, cloud-based data storage such as Hadoop distributed file systems. While the development in analytical methodologies has not quite caught up with the data deluge, important advances have been made in a number of areas, including spatial and temporal data analysis where the spectrum of analytical methods ranges from visualisation and exploratory analysis, to modelling. While there used to be a primary focus on statistical science in terms of methodological development for data analysis, the newly emerged discipline of data science is a reflection of the challenges presented by the need to integrate diverse data sources and exploit them using novel data- and knowledge-driven modelling methods while simultaneously recognising the value of quantitative as well as qualitative analytical approaches. Machine learning regression methods, which are more robust and can handle

  20. Quantifying temporal bone morphology of great apes and humans: an approach using geometric morphometrics

    Science.gov (United States)

    Lockwood, Charles A; Lynch, John M; Kimbel, William H

    2002-01-01

    The hominid temporal bone offers a complex array of morphology that is linked to several different functional systems. Its frequent preservation in the fossil record gives the temporal bone added significance in the study of human evolution, but its morphology has proven difficult to quantify. In this study we use techniques of 3D geometric morphometrics to quantify differences among humans and great apes and discuss the results in a phylogenetic context. Twenty-three landmarks on the ectocranial surface of the temporal bone provide a high level of anatomical detail. Generalized Procrustes analysis (GPA) is used to register (adjust for position, orientation and scale) landmark data from 405 adults representing Homo, Pan, Gorilla and Pongo. Principal components analysis of residuals from the GPA shows that the major source of variation is between humans and apes. Human characteristics such as a coronally orientated petrous axis, a deep mandibular fossa, a projecting mastoid process, and reduced lateral extension of the tympanic element strongly impact the analysis. In phenetic cluster analyses, gorillas and orangutans group together with respect to chimpanzees, and all apes group together with respect to humans. Thus, the analysis contradicts depictions of African apes as a single morphotype. Gorillas and orangutans lack the extensive preglenoid surface of chimpanzees, and their mastoid processes are less medially inflected. These and other characters shared by gorillas and orangutans are probably primitive for the African hominid clade. PMID:12489757