WorldWideScience

Sample records for hierarchal cluster analysis

  1. Constructing storyboards based on hierarchical clustering analysis

    Science.gov (United States)

    Hasebe, Satoshi; Sami, Mustafa M.; Muramatsu, Shogo; Kikuchi, Hisakazu

    2005-07-01

    There is a growing need for quick previews of video content, both to improve the accessibility of video archives and to reduce network traffic. In this paper, a storyboard that contains a user-specified number of keyframes is produced from a given video sequence. It is based on hierarchical cluster analysis of feature vectors that are derived from wavelet coefficients of video frames. Consistent use of the extracted feature vectors is the key to avoiding repeated, computationally intensive parsing of the same video sequence. Experimental results suggest that a significant reduction in computational time is gained by this strategy.

  2. Hierarchical Aligned Cluster Analysis for Temporal Clustering of Human Motion.

    Science.gov (United States)

    Zhou, Feng; De la Torre, Fernando; Hodgins, Jessica K

    2013-03-01

    Temporal segmentation of human motion into plausible motion primitives is central to understanding and building computational models of human motion. Several issues contribute to the challenge of discovering motion primitives: the exponential nature of all possible movement combinations, the variability in the temporal scale of human actions, and the complexity of representing articulated motion. We pose the problem of learning motion primitives as one of temporal clustering, and derive an unsupervised hierarchical bottom-up framework called hierarchical aligned cluster analysis (HACA). HACA finds a partition of a given multidimensional time series into m disjoint segments such that each segment belongs to one of k clusters. HACA combines kernel k-means with the generalized dynamic time alignment kernel to cluster time series data. Moreover, it provides a natural framework to find a low-dimensional embedding for time series. HACA is efficiently optimized with a coordinate descent strategy and dynamic programming. Experimental results on motion capture and video data demonstrate the effectiveness of HACA for segmenting complex motions and as a visualization tool. We also compare the performance of HACA to state-of-the-art algorithms for temporal clustering on data of a honey bee dance. The HACA code is available online.

  3. Performance Analysis of Hierarchical Clustering Algorithm

    Directory of Open Access Journals (Sweden)

    K.Ranjini

    2011-07-01

    Clustering is the classification of objects into different groups, or more precisely, the partitioning of a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait, often proximity according to some defined distance measure. Data clustering is a common technique for statistical data analysis, which is used in many fields, including machine learning, data mining, pattern recognition, image analysis and bioinformatics. This paper explains the implementation of agglomerative and divisive clustering algorithms applied to various types of data. Details of the victims of the 2004 tsunami in Thailand were taken as the test data. Visual programming is used for the implementation, and the running times of the algorithms using different linkages (agglomerative) on different types of data are taken for analysis.

  4. Hierarchical Cluster Analysis – Various Approaches to Data Preparation

    Directory of Open Access Journals (Sweden)

    Z. Pacáková

    2013-09-01

    The article deals with two approaches to data preparation that avoid multicollinearity. The aim of the article is to find similarities among EU states in their level of e-communication using hierarchical cluster analysis. The original set of fourteen indicators was first reduced on the basis of correlation analysis; where two indicators were highly correlated, the one with higher variability was kept for further analysis. Secondly, the data were transformed using principal component analysis, which yields poorly correlated principal components. Five principal components explaining about 92% of the variance were selected for further analysis. Hierarchical cluster analysis was performed on both the reduced data set and the principal component scores. In both cases three clusters were assumed following the Pseudo t-Squared and Pseudo F statistics, but the final clusters were not identical. The two results were compared by the proportion of variance accounted for by the clusters, which was about ten percentage points higher for the principal component scores (57.8% compared to 47%). It can therefore be stated that when principal component scores are used as input variables for cluster analysis and the explained proportion of variance is high enough (about 92% in our analysis), the loss of information is lower than with data reduction on the basis of correlation analysis.
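The comparison criterion in this abstract, the proportion of variance accounted for by the clusters, can be illustrated with a small sketch. It assumes the statistic is the usual R-squared, that is, one minus the within-cluster sum of squares divided by the total sum of squares; the abstract does not spell out the formula, and the function names and the four data rows below are made up for illustration.

```python
def centroid(rows):
    """Column-wise mean of a list of equal-length numeric rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sum_sq(rows, center):
    """Sum of squared deviations of all rows from a given center."""
    return sum((x - c) ** 2 for row in rows for x, c in zip(row, center))


def variance_explained(rows, labels):
    """R-squared of a clustering: 1 - within-cluster SS / total SS (assumed definition)."""
    total = sum_sq(rows, centroid(rows))
    within = 0.0
    for label in set(labels):
        members = [r for r, l in zip(rows, labels) if l == label]
        within += sum_sq(members, centroid(members))
    return 1.0 - within / total


# Made-up example: two tight groups that are far apart.
rows = [(1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
labels = [0, 0, 1, 1]
print(round(variance_explained(rows, labels), 4))
```

A value near 1 means the clusters capture almost all of the spread in the data; putting every row in a single cluster gives 0.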

  5. A Novel Divisive Hierarchical Clustering Algorithm for Geospatial Analysis

    Directory of Open Access Journals (Sweden)

    Shaoning Li

    2017-01-01

    In the fields of geographic information systems (GIS) and remote sensing (RS), the clustering algorithm has been widely used for image segmentation, pattern recognition, and cartographic generalization. Although clustering analysis plays a key role in geospatial modelling, traditional clustering methods are limited by their computational complexity, noise resistance and robustness. Furthermore, traditional methods are more focused on the adjacent spatial context, which makes it hard for the clustering methods to be applied to multi-density discrete objects. In this paper, a new method, cell-dividing hierarchical clustering (CDHC), is proposed based on convex hull retraction. The main steps are as follows. First, a convex hull structure is constructed to describe the global spatial context of geospatial objects. Then, the retracting structure of each borderline is established in sequence by setting the initial parameter. The objects are split into two clusters (i.e., “sub-clusters”) if the retracting structure intersects with the borderlines. Finally, clusters are repeatedly split and the initial parameter is updated until the termination condition is satisfied. The experimental results show that CDHC separates the multi-density objects from noise sufficiently and also reduces complexity compared to the traditional agglomerative hierarchical clustering algorithm.

  6. Kinematic gait patterns in healthy runners: A hierarchical cluster analysis.

    Science.gov (United States)

    Phinyomark, Angkoon; Osis, Sean; Hettinga, Blayne A; Ferber, Reed

    2015-11-01

    Previous studies have demonstrated distinct clusters of gait patterns in both healthy and pathological groups, suggesting that different movement strategies may be represented. However, these studies have used discrete time point variables and usually focused on only one specific joint and plane of motion. Therefore, the first purpose of this study was to determine if running gait patterns for healthy subjects could be classified into homogeneous subgroups using three-dimensional kinematic data from the ankle, knee, and hip joints. The second purpose was to identify differences in joint kinematics between these groups. The third purpose was to investigate the practical implications of clustering healthy subjects by comparing these kinematics with runners experiencing patellofemoral pain (PFP). A principal component analysis (PCA) was used to reduce the dimensionality of the entire gait waveform data and then a hierarchical cluster analysis (HCA) determined group sets of similar gait patterns and homogeneous clusters. The results show two distinct running gait patterns were found with the main between-group differences occurring in frontal and sagittal plane knee angles (Pgait strategies. These results suggest care must be taken when selecting samples of subjects in order to investigate the pathomechanics of injured runners.

  7. The reflection of hierarchical cluster analysis of co-occurrence matrices in SPSS

    NARCIS (Netherlands)

    Zhou, Q.; Leng, F.; Leydesdorff, L.

    2015-01-01

    Purpose: To discuss the problems arising from hierarchical cluster analysis of co-occurrence matrices in SPSS, and the corresponding solutions. Design/methodology/approach: We design different methods of using the SPSS hierarchical clustering module for co-occurrence matrices in order to compare the

  9. A COMPARISON BETWEEN SINGLE LINKAGE AND COMPLETE LINKAGE IN AGGLOMERATIVE HIERARCHICAL CLUSTER ANALYSIS FOR IDENTIFYING TOURISTS SEGMENTS

    OpenAIRE

    Noor Rashidah Rashid

    2012-01-01

    Cluster Analysis is a multivariate method in statistics. Agglomerative Hierarchical Cluster Analysis is one of approaches in Cluster Analysis. There are two linkage methods in Agglomerative Hierarchical Cluster Analysis which are Single Linkage and Complete Linkage. The purpose of this study is to compare between Single Linkage and Complete Linkage in Agglomerative Hierarchical Cluster Analysis. The comparison of performances between these linkage methods was shown by using Kruskal-Wallis tes...

  10. A combined multidimensional scaling and hierarchical clustering view for the exploratory analysis of multidimensional data

    Science.gov (United States)

    Craig, Paul; Roa-Seïler, Néna

    2013-01-01

    This paper describes a novel information visualization technique that combines multidimensional scaling and hierarchical clustering to support the exploratory analysis of multidimensional data. The technique displays the results of multidimensional scaling using a scatter plot where the proximity of any two items' representations is approximate to their similarity according to a Euclidean distance metric. The results of hierarchical clustering are overlaid onto this view by drawing smoothed outlines around each nested cluster. The difference in similarity between successive cluster combinations is used to colour code clusters and make stronger natural clusters more prominent in the display. When a cluster or group of items is selected, multidimensional scaling and hierarchical clustering are re-applied to a filtered subset of the data, and animation is used to smooth the transition between successive filtered views. As a case study we demonstrate the technique being used to analyse survey data relating to the appropriateness of different phrases to different emotionally charged situations.

  11. Hierarchical clustering for graph visualization

    CERN Document Server

    Clémençon, Stéphan; Rossi, Fabrice; Tran, Viet Chi

    2012-01-01

    This paper describes a graph visualization methodology based on hierarchical maximal modularity clustering, with interactive and significant coarsening and refining possibilities. An application of this method to HIV epidemic analysis in Cuba is outlined.

  12. Neutrosophic Hierarchical Clustering Algoritms

    Directory of Open Access Journals (Sweden)

    Rıdvan Şahin

    2014-03-01

    Interval neutrosophic set (INS) is a generalization of interval valued intuitionistic fuzzy set (IVIFS), in which the membership and non-membership values of elements consist of fuzzy ranges, while single valued neutrosophic set (SVNS) is regarded as an extension of intuitionistic fuzzy set (IFS). In this paper, we extend the hierarchical clustering techniques proposed for IFSs and IVIFSs to SVNSs and INSs respectively. Based on the traditional hierarchical clustering procedure, the single valued neutrosophic aggregation operator, and the basic distance measures between SVNSs, we define a single valued neutrosophic hierarchical clustering algorithm for clustering SVNSs. Then we extend the algorithm to classify interval neutrosophic data. Finally, we present some numerical examples in order to show the effectiveness and availability of the developed clustering algorithms.

  13. Hierarchical Cluster Analysis: Comparison of Three Linkage Measures and Application to Psychological Data

    Directory of Open Access Journals (Sweden)

    Odilia Yim

    2015-02-01

    Cluster analysis refers to a class of data reduction methods used for sorting cases, observations, or variables of a given dataset into homogeneous groups that differ from each other. The present paper focuses on hierarchical agglomerative cluster analysis, a statistical technique where groups are sequentially created by systematically merging similar clusters together, as dictated by the distance and linkage measures chosen by the researcher. Specific distance and linkage measures are reviewed, including a discussion of how these choices can influence the clustering process by comparing three common linkage measures (single linkage, complete linkage, average linkage). The tutorial guides researchers in performing a hierarchical cluster analysis using the SPSS statistical software. Through an example, we demonstrate how cluster analysis can be used to detect meaningful subgroups in a sample of bilinguals by examining various language variables.
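To make the role of the linkage measure concrete, the following is a minimal sketch of agglomerative clustering with single, complete and average linkage, written in plain Python rather than SPSS. It is not the tutorial's procedure: the function names and the six one-dimensional points are invented for illustration, and Euclidean distance is assumed.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def linkage_distance(a, b, points, method):
    """Distance between clusters a and b, given as lists of point indices."""
    pairwise = [dist(points[i], points[j]) for i in a for j in b]
    if method == "single":      # closest pair of members
        return min(pairwise)
    if method == "complete":    # farthest pair of members
        return max(pairwise)
    if method == "average":     # mean over all cross-cluster pairs
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage: {method}")


def agglomerate(points, n_clusters, method="average"):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: linkage_distance(
                clusters[ab[0]], clusters[ab[1]], points, method
            ),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return sorted(sorted(c) for c in clusters)


# Made-up points on a line with steadily growing gaps.
points = [(0.0,), (1.0,), (2.2,), (3.6,), (5.2,), (7.0,)]
for method in ("single", "complete", "average"):
    print(method, agglomerate(points, 2, method))
```

On these points single linkage chains neighbouring points together and leaves only the last point on its own, while complete linkage produces a more balanced split. This is the kind of difference the choice of linkage can make.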

  14. Hierarchical cluster analysis of progression patterns in open-angle glaucoma patients with medical treatment.

    Science.gov (United States)

    Bae, Hyoung Won; Rho, Seungsoo; Lee, Hye Sun; Lee, Naeun; Hong, Samin; Seong, Gong Je; Sung, Kyung Rim; Kim, Chan Yun

    2014-04-29

    To classify medically treated open-angle glaucoma (OAG) by the pattern of progression using hierarchical cluster analysis, and to determine OAG progression characteristics by comparing clusters. The study included ninety-five eyes of 95 OAG patients who received medical treatment and who had undergone visual field (VF) testing at least once per year for 5 or more years. OAG was classified into subgroups using hierarchical cluster analysis based on the following five variables: baseline mean deviation (MD), baseline visual field index (VFI), MD slope, VFI slope, and Glaucoma Progression Analysis (GPA) printout. After that, other parameters were compared between clusters. Two clusters were made after a hierarchical cluster analysis. Cluster 1 showed -4.06 ± 2.43 dB baseline MD, 92.58% ± 6.27% baseline VFI, -0.28 ± 0.38 dB per year MD slope, -0.52% ± 0.81% per year VFI slope, and all "no progression" cases in GPA printout, whereas cluster 2 showed -8.68 ± 3.81 baseline MD, 77.54 ± 12.98 baseline VFI, -0.72 ± 0.55 MD slope, -2.22 ± 1.89 VFI slope, and seven "possible" and four "likely" progression cases in GPA printout. There were no significant differences in age, sex, mean IOP, central corneal thickness, and axial length between clusters. However, cluster 2 included more high-tension glaucoma patients and used a significantly greater number of antiglaucoma eye drops than cluster 1. Hierarchical cluster analysis of progression patterns divided OAG into slow and fast progression groups, evidenced by assessing the parameters of glaucomatous progression in VF testing. In the fast progression group, the prevalence of high-tension glaucoma was greater and the number of antiglaucoma medications administered was higher than in the slow progression group. Copyright 2014 The Association for Research in Vision and Ophthalmology, Inc.

  15. Periorbital melasma: Hierarchical cluster analysis of clinical features in Asian patients.

    Science.gov (United States)

    Jung, Y S; Bae, J M; Kim, B J; Kang, J-S; Cho, S B

    2017-03-19

    Studies have shown melasma lesions to be distributed across the face in centrofacial, malar, and mandibular patterns. Meanwhile, however, melasma lesions of the periorbital area have yet to be thoroughly described. We analyzed normal and ultraviolet light-exposed photographs of patients with melasma. The periorbital melasma lesions were measured according to anatomical reference points and a hierarchical cluster analysis was performed. The periorbital melasma lesions showed clinical features of fine and homogenous melasma pigmentation, involving both the upper and lower eyelids that extended to other anatomical sites with a darker and coarser appearance. The hierarchical cluster analysis indicated that patients with periorbital melasma can be categorized into two clusters according to the surface anatomy of the face. Significant differences between cluster 1 and cluster 2 were found in lateral distance and inferolateral distance, but not in medial distance and superior distance. Comparing the two clusters, patients in cluster 2 were found to be significantly older and more commonly accompanied by melasma lesions of the temple and medial cheek. Our hierarchical cluster analysis of periorbital melasma lesions demonstrated that Asian patients with periorbital melasma can be categorized into two clusters according to the surface anatomy of the face. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  16. Intuitionistic fuzzy hierarchical clustering algorithms

    Institute of Scientific and Technical Information of China (English)

    Xu Zeshui

    2009-01-01

    Intuitionistic fuzzy set (IFS) is a set of 2-tuple arguments, each of which is characterized by a membership degree and a nonmembership degree. The generalized form of IFS is interval-valued intuitionistic fuzzy set (IVIFS), whose components are intervals rather than exact numbers. IFSs and IVIFSs have been found to be very useful to describe vagueness and uncertainty. However, it seems that little attention has been focused on the clustering analysis of IFSs and IVIFSs. An intuitionistic fuzzy hierarchical algorithm is introduced for clustering IFSs, which is based on the traditional hierarchical clustering procedure, the intuitionistic fuzzy aggregation operator, and the basic distance measures between IFSs: the Hamming distance, normalized Hamming, weighted Hamming, the Euclidean distance, the normalized Euclidean distance, and the weighted Euclidean distance. Subsequently, the algorithm is extended for clustering IVIFSs. Finally the algorithm and its extended form are applied to the classifications of building materials and enterprises respectively.

  17. Mapping informative clusters in a hierarchical [corrected] framework of FMRI multivariate analysis.

    Directory of Open Access Journals (Sweden)

    Rui Xu

    Pattern recognition methods have become increasingly popular in fMRI data analysis, as they are powerful in discriminating between multi-voxel patterns of brain activities associated with different mental states. However, when they are used in functional brain mapping, the location of discriminative voxels varies significantly, raising difficulties in interpreting the locus of the effect. Here we proposed a hierarchical framework of multivariate approach that maps informative clusters rather than voxels to achieve reliable functional brain mapping without compromising the discriminative power. In particular, we first searched for local homogeneous clusters that consisted of voxels with similar response profiles. Then, a multi-voxel classifier was built for each cluster to extract discriminative information from the multi-voxel patterns. Finally, through multivariate ranking, outputs from the classifiers served as a multi-cluster pattern to identify informative clusters by examining interactions among clusters. Results from both simulated and real fMRI data demonstrated that this hierarchical approach showed better performance in the robustness of functional brain mapping than traditional voxel-based multivariate methods. In addition, the mapped clusters overlapped substantially for two perceptually equivalent object categories, further confirming the validity of our approach. In short, the hierarchical framework of multivariate approach is suitable for both pattern classification and brain mapping in fMRI studies.

  18. Analysis of genomic signatures in prokaryotes using multinomial regression and hierarchical clustering

    DEFF Research Database (Denmark)

    Ussery, David; Bohlin, Jon; Skjerve, Eystein

    2009-01-01

    Recently there has been an explosion in the availability of bacterial genomic sequences, making possible now an analysis of genomic signatures across more than 800 different bacterial chromosomes, from a wide variety of environments. Using genomic signatures, we pair-wise compared 867...... different genomic DNA sequences, taken from chromosomes and plasmids more than 100,000 base-pairs in length. Hierarchical clustering was performed on the outcome of the comparisons before a multinomial regression model was fitted. The regression model included the cluster groups as the response variable...... AT content. Small improvements to the regression model, although significant, were also obtained by factors such as sequence size, habitat, growth temperature, selective pressure measured as oligonucleotide usage variance, and oxygen requirement. The statistics obtained using hierarchical clustering...

  19. Applying of hierarchical clustering to analysis of protein patterns in the human cancer-associated liver.

    Directory of Open Access Journals (Sweden)

    Natalia A Petushkova

    There are two ways that statistical methods can learn from biomedical data. One way is to learn classifiers to identify diseases and to predict outcomes using a training dataset with an established diagnosis for each sample. When a training dataset is not available, the task can be to mine for the presence of meaningful groups (clusters) of samples and to explore the underlying data structure (unsupervised learning). We investigated the proteomic profiles of the cytosolic fraction of human liver samples using two-dimensional electrophoresis (2DE). Samples were resected upon surgical treatment of hepatic metastases in colorectal cancer. Unsupervised hierarchical clustering of 2DE gel images (n = 18) revealed a pair of clusters, containing 11 and 7 samples. Previously we used the same specimens to measure biochemical profiles based on cytochrome P450-dependent enzymatic activities and also found that samples were clearly divided into two well-separated groups by cluster analysis. It turned out that the groups by enzyme activity almost perfectly match the groups identified from proteomic data. Of the 271 reproducible spots on our 2DE gels, we selected 15 to distinguish the human liver cytosolic clusters. Using MALDI-TOF peptide mass fingerprinting, we identified 12 proteins for the selected spots, including known cancer-associated species. Our results highlight the importance of hierarchical cluster analysis of proteomic data, and showed concordance between the results of biochemical and proteomic approaches. Grouping of the human liver samples and/or patients into differing clusters may provide insights into possible molecular mechanisms of drug metabolism and creates a rationale for personalized treatment.

  20. Diversity of Xiphinema americanum-group Species and Hierarchical Cluster Analysis of Morphometrics.

    Science.gov (United States)

    Lamberti, F; Ciancio, A

    1993-09-01

    Of the 39 species composing the Xiphinema americanum group, 14 were described originally from North America and two others have been reported from this region. Many species are very similar morphologically and can be distinguished only by a difficult comparison of various combinations of some morphometric characters. Study of morphometrics of 49 populations, including the type populations of the 39 species attributed to this group, by principal component analysis and hierarchical cluster analysis placed the populations into five subgroups, proposed here as the X. brevicolle subgroup (seven species), the X. americanum subgroup (17 species), the X. taylori subgroup (two species), the X. pachtaicum subgroup (eight species), and the X. lambertii subgroup (five species).

  1. [Study of the clinical phenotype of symptomatic chronic airways disease by hierarchical cluster analysis and two-step cluster analyses].

    Science.gov (United States)

    Ning, P; Guo, Y F; Sun, T Y; Zhang, H S; Chai, D; Li, X M

    2016-09-01

    To study the distinct clinical phenotype of chronic airway diseases by hierarchical cluster analysis and two-step cluster analysis. A population sample of adult patients in Donghuamen community, Dongcheng district and Qinghe community, Haidian district, Beijing from April 2012 to January 2015, who had wheeze within the last 12 months, underwent detailed investigation, including a clinical questionnaire, pulmonary function tests, total serum IgE levels, blood eosinophil level and a peak flow diary. Nine variables were chosen as evaluating parameters, including pre-salbutamol forced expired volume in one second(FEV1)/forced vital capacity(FVC) ratio, pre-salbutamol FEV1, percentage of post-salbutamol change in FEV1, residual capacity, diffusing capacity of the lung for carbon monoxide/alveolar volume adjusted for haemoglobin level, peak expiratory flow(PEF) variability, serum IgE level, cumulative tobacco cigarette consumption (pack-years) and respiratory symptoms (cough and expectoration). Subjects' different clinical phenotype by hierarchical cluster analysis and two-step cluster analysis was identified. (1) Four clusters were identified by hierarchical cluster analysis. Cluster 1 was chronic bronchitis in smokers with normal pulmonary function. Cluster 2 was chronic bronchitis or mild chronic obstructive pulmonary disease (COPD) patients with mild airflow limitation. Cluster 3 included COPD patients with heavy smoking, poor quality of life and severe airflow limitation. Cluster 4 recognized atopic patients with mild airflow limitation, elevated serum IgE and clinical features of asthma. Significant differences were revealed regarding pre-salbutamol FEV1/FVC%, pre-salbutamol FEV1% pred, post-salbutamol change in FEV1%, maximal mid-expiratory flow curve(MMEF)% pred, carbon monoxide diffusing capacity per liter of alveolar(DLCO)/(VA)% pred, residual volume(RV)% pred, total serum IgE level, smoking history (pack-years), St.George's respiratory questionnaire

  2. Water quality assessment with hierarchical cluster analysis based on Mahalanobis distance.

    Science.gov (United States)

    Du, Xiangjun; Shao, Fengjing; Wu, Shunyao; Zhang, Hanlin; Xu, Si

    2017-07-01

    Water quality assessment is crucial for assessment of marine eutrophication, prediction of harmful algal blooms, and environment protection. Previous studies have developed many numeric modeling methods and data driven approaches for water quality assessment. Cluster analysis, an approach widely used for grouping data, has also been employed. However, there are complex correlations between water quality variables, which play important roles in water quality assessment but have always been overlooked. In this paper, we analyze correlations between water quality variables and propose an alternative method for water quality assessment with hierarchical cluster analysis based on Mahalanobis distance. Further, we cluster water quality data collected from coastal waters of the Bohai Sea and North Yellow Sea of China, and apply the clustering results to evaluate their water quality. To evaluate the validity, we also cluster the water quality data with cluster analysis based on Euclidean distance, which is widely adopted by previous studies. The results show that our method is more suitable for water quality assessment with many correlated water quality variables. To our knowledge, it is the first attempt to apply Mahalanobis distance for coastal water quality assessment.
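The difference between Mahalanobis and Euclidean distance for correlated variables can be sketched in a few lines. This is only an illustration for two variables under stated assumptions, not the authors' method: the helper names are hypothetical, the sample covariance uses the n - 1 denominator, and the five sample rows are made up rather than taken from the Bohai Sea data.

```python
from math import sqrt


def covariance_2d(rows):
    """Sample covariance (sxx, sxy, syy) of two variables, n - 1 denominator."""
    n = len(rows)
    mx = sum(r[0] for r in rows) / n
    my = sum(r[1] for r in rows) / n
    sxx = sum((r[0] - mx) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - my) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mx) * (r[1] - my) for r in rows) / (n - 1)
    return sxx, sxy, syy


def mahalanobis_2d(p, q, cov):
    """sqrt(d^T S^-1 d) for d = p - q, using the closed-form 2x2 inverse of S."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2  # must be non-zero (variables not perfectly collinear)
    dx, dy = p[0] - q[0], p[1] - q[1]
    return sqrt((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)


# Made-up measurements of two strongly correlated variables.
samples = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
cov = covariance_2d(samples)

# Two displacements of equal Euclidean length from the same origin:
along = mahalanobis_2d((0.0, 0.0), (1.0, 2.0), cov)    # follows the correlation
across = mahalanobis_2d((0.0, 0.0), (2.0, -1.0), cov)  # cuts against it
print(along, across)
```

Both displacements have the same Euclidean length, but the one that cuts against the correlation structure gets a much larger Mahalanobis distance. A matrix of such distances could then be passed to any hierarchical clustering routine in place of Euclidean distances.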

  3. Validation of hierarchical cluster analysis for identification of bacterial species using 42 bacterial isolates

    Science.gov (United States)

    Ghebremedhin, Meron; Yesupriya, Shubha; Luka, Janos; Crane, Nicole J.

    2015-03-01

    Recent studies have demonstrated the potential advantages of the use of Raman spectroscopy in the biomedical field due to its rapidity and noninvasive nature. In this study, Raman spectroscopy is applied as a method for differentiating between bacteria isolates for Gram status and Genus species. We created models for identifying 28 bacterial isolates using spectra collected with a 785 nm laser excitation Raman spectroscopic system. In order to investigate the groupings of these samples, partial least squares discriminant analysis (PLSDA) and hierarchical cluster analysis (HCA) were implemented. In addition, cluster analyses of the isolates were performed using various data types consisting of biochemical tests, gene sequence alignment, high resolution melt (HRM) analysis and antimicrobial susceptibility tests of minimum inhibitory concentration (MIC) and degree of antimicrobial resistance (SIR). In order to evaluate the ability of these models to correctly classify bacterial isolates using solely Raman spectroscopic data, a set of 14 validation samples was tested using the PLSDA models and subsequently the HCA models. External cluster evaluation criteria of purity and Rand index were calculated at different taxonomic levels to compare the performance of clustering using Raman spectra as well as the other datasets. Results showed that Raman spectra performed comparably to, and in some cases better than, the other data types, with Rand index and purity values up to 0.933 and 0.947, respectively. This study clearly demonstrates that the discrimination of bacterial species using Raman spectroscopic data and hierarchical cluster analysis is possible and has the potential to be a powerful point-of-care tool in clinical settings.
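The two external evaluation criteria named in this abstract, purity and the Rand index, have standard definitions that can be sketched as follows. The function names and the six labelled items are made up for illustration and have no connection to the study's isolates.

```python
from collections import Counter
from itertools import combinations


def purity(true_labels, cluster_labels):
    """Fraction of items that belong to the majority true class of their cluster."""
    total = 0
    for c in set(cluster_labels):
        members = [t for t, k in zip(true_labels, cluster_labels) if k == c]
        total += Counter(members).most_common(1)[0][1]
    return total / len(true_labels)


def rand_index(true_labels, cluster_labels):
    """Fraction of item pairs on which the two labelings agree
    (both place the pair together, or both place it apart)."""
    pairs = list(combinations(range(len(true_labels)), 2))
    agree = sum(
        (true_labels[i] == true_labels[j]) == (cluster_labels[i] == cluster_labels[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Made-up example: one item of class "A" is placed in the wrong cluster.
true_labels = ["A", "A", "A", "B", "B", "B"]
cluster_labels = [0, 0, 1, 1, 1, 1]
print(purity(true_labels, cluster_labels), rand_index(true_labels, cluster_labels))
```

Both measures equal 1 for a clustering that reproduces the reference classes exactly. Purity alone can be inflated by using many small clusters, which is one reason to report it alongside the Rand index.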

  4. A hierarchical cluster analysis of normal-tension glaucoma using spectral-domain optical coherence tomography parameters.

    Science.gov (United States)

    Bae, Hyoung Won; Ji, Yongwoo; Lee, Hye Sun; Lee, Naeun; Hong, Samin; Seong, Gong Je; Sung, Kyung Rim; Kim, Chan Yun

    2015-01-01

    Normal-tension glaucoma (NTG) is a heterogenous disease, and there is still controversy about subclassifications of this disorder. On the basis of spectral-domain optical coherence tomography (SD-OCT), we subdivided NTG with hierarchical cluster analysis using optic nerve head (ONH) parameters and retinal nerve fiber layer (RNFL) thicknesses. A total of 200 eyes of 200 NTG patients between March 2011 and June 2012 underwent SD-OCT scans to measure ONH parameters and RNFL thicknesses. We classified NTG into homogenous subgroups based on these variables using a hierarchical cluster analysis, and compared clusters to evaluate diverse NTG characteristics. Three clusters were found after hierarchical cluster analysis. Cluster 1 (62 eyes) had the thickest RNFL and widest rim area, and showed early glaucoma features. Cluster 2 (60 eyes) was characterized by the largest cup/disc ratio and cup volume, and showed advanced glaucomatous damage. Cluster 3 (78 eyes) had small disc areas in SD-OCT and were comprised of patients with significantly younger age, longer axial length, and greater myopia than the other 2 groups. A hierarchical cluster analysis of SD-OCT scans divided NTG patients into 3 groups based upon ONH parameters and RNFL thicknesses. It is anticipated that the small disc area group comprised of younger and more myopic patients may show unique features unlike the other 2 groups.

  5. Microglia Morphological Categorization in a Rat Model of Neuroinflammation by Hierarchical Cluster and Principal Components Analysis

    Science.gov (United States)

    Fernández-Arjona, María del Mar; Grondona, Jesús M.; Granados-Durán, Pablo; Fernández-Llebrez, Pedro; López-Ávalos, María D.

    2017-01-01

    It is known that microglia morphology and function are closely related, but only a few studies have objectively described different morphological subtypes. To address this issue, morphological parameters of microglial cells were analyzed in a rat model of aseptic neuroinflammation. After the injection of a single dose of the enzyme neuraminidase (NA) within the lateral ventricle (LV), an acute inflammatory process occurs. Sections from NA-injected animals and sham controls were immunolabeled with the microglial marker IBA1, which highlights ramifications and features of the cell shape. Using images obtained by section scanning, individual microglial cells were sampled from various regions (septofimbrial nucleus, hippocampus and hypothalamus) at different times post-injection (2, 4 and 12 h). Each cell yielded a set of 15 morphological parameters by means of image analysis software. Five initial parameters (including fractal measures) were statistically different in cells from NA-injected rats (most of them IL-1β positive, i.e., M1-state) compared to those from control animals (none of them IL-1β positive, i.e., surveillant state). However, additional multimodal parameters proved more suitable for hierarchical cluster analysis (HCA). This method classified the microglia population into four clusters. Furthermore, a linear discriminant analysis (LDA) suggested three specific parameters to objectively classify any microglia by a decision tree. In addition, a principal components analysis (PCA) revealed two extra valuable variables that allowed microglia to be further classified into a total of eight sub-clusters or types. The spatio-temporal distribution of these different morphotypes in our rat inflammation model allowed specific morphotypes to be related to microglial activation status and brain location. An objective method for microglia classification based on morphological parameters is proposed. Main points Microglia undergo a quantifiable

  6. Evaluation of hierarchical agglomerative cluster analysis methods for discrimination of primary biological aerosol

    Directory of Open Access Journals (Sweden)

    I. Crawford

    2015-07-01

    In this paper we present improved methods for discriminating and quantifying Primary Biological Aerosol Particles (PBAP) by applying hierarchical agglomerative cluster analysis to multi-parameter ultra violet-light induced fluorescence (UV-LIF) spectrometer data. The methods employed in this study can be applied to data sets in excess of 1×10^6 points on a desktop computer, allowing for each fluorescent particle in a dataset to be explicitly clustered. This reduces the potential for misattribution found in subsampling and comparative attribution methods used in previous approaches, improving our capacity to discriminate and quantify PBAP meta-classes. We evaluate the performance of several hierarchical agglomerative cluster analysis linkages and data normalisation methods using laboratory samples of known particle types and an ambient dataset. Fluorescent and non-fluorescent polystyrene latex spheres were sampled with a Wideband Integrated Bioaerosol Spectrometer (WIBS-4), where the optical size, asymmetry factor and fluorescent measurements were used as inputs to the analysis package. It was found that the Ward linkage with z-score or range normalisation performed best, correctly attributing 98 and 98.1 % of the data points respectively. The best performing methods were applied to the BEACHON-RoMBAS ambient dataset where it was found that the z-score and range normalisation methods yield similar results with each method producing clusters representative of fungal spores and bacterial aerosol, consistent with previous results. The z-score result was compared to clusters generated with previous approaches (WIBS AnalysiS Program, WASP), where we observe that the subsampling and comparative attribution method employed by WASP results in the overestimation of the fungal spore concentration by a factor of 1.5 and the underestimation of bacterial aerosol concentration by a factor of 5. We suggest that this is likely due to errors arising from misattribution

  7. Evaluation of hierarchical agglomerative cluster analysis methods for discrimination of primary biological aerosol

    Directory of Open Access Journals (Sweden)

    I. Crawford

    2015-11-01

    In this paper we present improved methods for discriminating and quantifying primary biological aerosol particles (PBAPs) by applying hierarchical agglomerative cluster analysis to multi-parameter ultraviolet-light-induced fluorescence (UV-LIF) spectrometer data. The methods employed in this study can be applied to data sets in excess of 1 × 10^6 points on a desktop computer, allowing for each fluorescent particle in a data set to be explicitly clustered. This reduces the potential for misattribution found in subsampling and comparative attribution methods used in previous approaches, improving our capacity to discriminate and quantify PBAP meta-classes. We evaluate the performance of several hierarchical agglomerative cluster analysis linkages and data normalisation methods using laboratory samples of known particle types and an ambient data set. Fluorescent and non-fluorescent polystyrene latex spheres were sampled with a Wideband Integrated Bioaerosol Spectrometer (WIBS-4), where the optical size, asymmetry factor and fluorescent measurements were used as inputs to the analysis package. It was found that the Ward linkage with z-score or range normalisation performed best, correctly attributing 98 and 98.1 % of the data points respectively. The best-performing methods were applied to the BEACHON-RoMBAS (Bio–hydro–atmosphere interactions of Energy, Aerosols, Carbon, H2O, Organics and Nitrogen–Rocky Mountain Biogenic Aerosol Study) ambient data set, where it was found that the z-score and range normalisation methods yield similar results, with each method producing clusters representative of fungal spores and bacterial aerosol, consistent with previous results. The z-score result was compared to clusters generated with previous approaches (WIBS AnalysiS Program, WASP), where we observe that the subsampling and comparative attribution method employed by WASP results in the overestimation of the fungal spore concentration by a factor of 1.5 and the
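
    The two normalisation schemes and the Ward criterion mentioned in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis package: the function names are hypothetical, the population standard deviation is assumed for the z-score, and constant features are mapped to zero to avoid division by zero.

```python
from statistics import mean, pstdev


def z_score(column):
    """Centre a feature on zero and scale it to unit standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]  # assumption: a constant feature maps to 0
    return [(x - mu) / sigma for x in column]


def range_normalise(column):
    """Rescale a feature linearly onto the interval [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # assumption: a constant feature maps to 0
    return [(x - lo) / (hi - lo) for x in column]


def ward_merge_cost(cluster_a, cluster_b):
    """Increase in within-cluster sum of squares caused by merging two clusters."""
    centroid_a = [mean(col) for col in zip(*cluster_a)]
    centroid_b = [mean(col) for col in zip(*cluster_b)]
    squared_gap = sum((a - b) ** 2 for a, b in zip(centroid_a, centroid_b))
    n_a, n_b = len(cluster_a), len(cluster_b)
    return n_a * n_b / (n_a + n_b) * squared_gap


# Hypothetical feature column, e.g. optical particle size in micrometres.
sizes = [1.0, 2.0, 3.0]
print(z_score(sizes), range_normalise(sizes))
print(ward_merge_cost([(0.0, 0.0)], [(2.0, 0.0)]))
```

    Ward linkage merges, at each step, the pair of clusters with the smallest merge cost. Because that cost is built from squared Euclidean distances, features measured on very different scales would dominate it unless they are normalised first, which is why the choice of normalisation matters in a comparison like the one described.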

  8. Hierarchical Formation of Galactic Clusters

    CERN Document Server

    Elmegreen, B G

    2006-01-01

    Young stellar groupings and clusters have hierarchical patterns ranging from flocculent spiral arms and star complexes on the largest scale to OB associations, OB subgroups, small loose groups, clusters and cluster subclumps on the smallest scales. There is no obvious transition in morphology at the cluster boundary, suggesting that clusters are only the inner parts of the hierarchy where stars have had enough time to mix. The power-law cluster mass function follows from this hierarchical structure: $n(M_{cl}) \propto M_{cl}^{-b}$ for $b \sim 2$. This value of b is independently required by the observation that the summed IMFs from many clusters in a galaxy equals approximately the IMF of each cluster.

  9. Hierarchical cluster-tendency analysis of the group structure in the foreign exchange market

    Science.gov (United States)

    Wu, Xin-Ye; Zheng, Zhi-Gang

    2013-08-01

    A hierarchical cluster-tendency (HCT) method for analyzing the group structure of networks of the global foreign exchange (FX) market is proposed by combining the advantages of both the minimal spanning tree (MST) and the hierarchical tree (HT). The currencies of the 50 economies with the largest GDP in 2010, according to the World Bank's database, are chosen as the underlying system. By using the HCT method, all nodes in the FX market network can be "colored" and distinguished. We reveal that the FX networks can be divided into two groups, i.e., the Asia-Pacific group and the Pan-European group. The results given by the hierarchical cluster-tendency method agree well with the formerly observed geographical aggregation behavior in the FX market. Moreover, an oil-resource aggregation phenomenon is discovered by using our method. We find that gold could be a better numeraire for the weekly-frequency FX data.

  10. [The hierarchical clustering analysis of hyperspectral image based on probabilistic latent semantic analysis].

    Science.gov (United States)

    Yi, Wen-Bin; Shen, Li; Qi, Yin-Feng; Tang, Hong

    2011-09-01

    The paper introduces Probabilistic Latent Semantic Analysis (PLSA) to image clustering and proposes an effective clustering algorithm for hyperspectral images that uses the semantic information from PLSA. Firstly, the ISODATA algorithm is used to obtain an initial clustering result of the hyperspectral image, and the clusters of this initial result are treated as the visual words of PLSA. Secondly, an object-oriented image segmentation algorithm is used to partition the hyperspectral image, and segments with relatively pure pixels are regarded as documents in PLSA. Thirdly, a variety of identification methods that can estimate the best number of cluster centers are combined to obtain the number of latent semantic topics. Then the conditional distributions of visual words in topics and the mixtures of topics in different documents are estimated using PLSA. Finally, the conditional probabilities of the latent semantic topics are distinguished using a statistical pattern recognition method, the topic type for each visual word in each document is assigned, and the clustering result of the hyperspectral image is obtained. Experimental results show that the clusters produced by the proposed algorithm are better than those of K-MEANS and ISODATA in terms of object-oriented property, and that the clustering result is closer to the real spatial distribution of the surface.

  11. Multichannel biomedical time series clustering via hierarchical probabilistic latent semantic analysis.

    Science.gov (United States)

    Wang, Jin; Sun, Xiangping; Nahavandi, Saeid; Kouzani, Abbas; Wu, Yuchuan; She, Mary

    2014-11-01

    Biomedical time series clustering that automatically groups a collection of time series according to their internal similarity is of importance for medical record management and inspection such as bio-signals archiving and retrieval. In this paper, a novel framework that automatically groups a set of unlabelled multichannel biomedical time series according to their internal structural similarity is proposed. Specifically, we treat a multichannel biomedical time series as a document and extract local segments from the time series as words. We extend a topic model, i.e., the Hierarchical probabilistic Latent Semantic Analysis (H-pLSA), which was originally developed for visual motion analysis to cluster a set of unlabelled multichannel time series. The H-pLSA models each channel of the multichannel time series using a local pLSA in the first layer. The topics learned in the local pLSA are then fed to a global pLSA in the second layer to discover the categories of multichannel time series. Experiments on a dataset extracted from multichannel Electrocardiography (ECG) signals demonstrate that the proposed method performs better than previous state-of-the-art approaches and is relatively robust to the variations of parameters including length of local segments and dictionary size. Although the experimental evaluation used the multichannel ECG signals in a biometric scenario, the proposed algorithm is a universal framework for multichannel biomedical time series clustering according to their structural similarity, which has many applications in biomedical time series management.

  12. Convex Clustering: An Attractive Alternative to Hierarchical Clustering

    Science.gov (United States)

    Chen, Gary K.; Chi, Eric C.; Ranola, John Michael O.; Lange, Kenneth

    2015-01-01

    The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominant clustering method in bioinformatics. Biologists find the trees constructed by hierarchical clustering visually appealing and in tune with their evolutionary perspective. Hierarchical clustering operates on multiple scales simultaneously. This is essential, for instance, in transcriptome data, where one may be interested in making qualitative inferences about how lower-order relationships like gene modules lead to higher-order relationships like pathways or biological processes. The recently developed method of convex clustering preserves the visual appeal of hierarchical clustering while ameliorating its propensity to make false inferences in the presence of outliers and noise. The solution paths generated by convex clustering reveal relationships between clusters that are hidden by static methods such as k-means clustering. The current paper derives and tests a novel proximal distance algorithm for minimizing the objective function of convex clustering. The algorithm separates parameters, accommodates missing data, and supports prior information on relationships. Our program CONVEXCLUSTER incorporating the algorithm is implemented on ATI and nVidia graphics processing units (GPUs) for maximal speed. Several biological examples illustrate the strengths of convex clustering and the ability of the proximal distance algorithm to handle high-dimensional problems. CONVEXCLUSTER can be freely downloaded from the UCLA Human Genetics web site at http://www.genetics.ucla.edu/software/ PMID:25965340

  13. Ingredients and Process Standardization of Thepla: An Indian Unleavened Vegetable Flatbread using Hierarchical Cluster Analysis

    Directory of Open Access Journals (Sweden)

    S.S. Arya

    2012-10-01

    Thepla is an Indian unleavened flatbread made from whole-wheat flour with added spices and vegetables. It is particularly consumed in the western zone of India. The preparation of thepla is tedious, time consuming and requires skill. In the present study, standardization of thepla ingredients was carried out by standardizing each ingredient on the basis of Overall Acceptability (OA) score. Sensory analysis was carried out using a nine-point hedonic rating scale with ten trained panellists. Standardized ingredients of thepla were: salt 3%, red chili powder 2.5%, fenugreek leaves 12%, cumin seed powder 0.6%, coriander seed powder 0.6%, ginger garlic paste (1:1) 6%, asafoetida 0.6% and oil 3% w/w of whole wheat flour on the basis of highest sensory OA score. Further, thepla process parameters such as time, temperature, diameter of thepla and weight of dough were standardized on the basis of sensory OA score. The sensory score data obtained were processed by Hierarchical Cluster Analysis (HCA).

  14. Hierarchical Clustering and Active Galaxies

    CERN Document Server

    Hatziminaoglou, E; Manrique, A

    2000-01-01

    The growth of Super Massive Black Holes and the parallel development of activity in galactic nuclei are implemented in an analytic code of hierarchical clustering. The evolution of the luminosity function of quasars and AGN will be computed with special attention paid to the connection between quasars and Seyfert galaxies. One of the major interests of the model is the parallel study of quasar formation and evolution and the History of Star Formation.

  15. Hesitant fuzzy agglomerative hierarchical clustering algorithms

    Science.gov (United States)

    Zhang, Xiaolu; Xu, Zeshui

    2015-02-01

    Recently, hesitant fuzzy sets (HFSs) have been studied by many researchers as a powerful tool to describe and deal with uncertain data, but relatively few studies focus on the clustering analysis of HFSs. In this paper, we propose a novel hesitant fuzzy agglomerative hierarchical clustering algorithm for HFSs. The algorithm considers each of the given HFSs as a unique cluster in the first stage, and then compares each pair of the HFSs by utilising the weighted Hamming distance or the weighted Euclidean distance. The two clusters with the smallest distance are joined. The procedure is then repeated until the desired number of clusters is achieved. Moreover, we extend the algorithm to cluster the interval-valued hesitant fuzzy sets, and finally illustrate the effectiveness of our clustering algorithms by experimental results.
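
    The agglomerative loop described in this abstract (start with singletons, compare every pair, join the closest two, repeat) can be sketched on ordinary numeric vectors. This is a simplification and not the authors' algorithm: real hesitant fuzzy sets hold several membership values per element, whereas the sketch assumes one value per attribute, and it represents a merged cluster by the mean of its members, which is an assumption. All names are hypothetical.

```python
def weighted_hamming(a, b, weights):
    """Weighted sum of absolute coordinate differences between two vectors."""
    return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))


def centroid(points):
    """Coordinate-wise mean, used here to represent a merged cluster."""
    return [sum(column) / len(points) for column in zip(*points)]


def agglomerate(points, weights, n_clusters):
    """Join the two closest clusters until n_clusters remain."""
    clusters = [[p] for p in points]  # stage one: every item is its own cluster
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = weighted_hamming(centroid(clusters[i]), centroid(clusters[j]), weights)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # join the closest pair
        del clusters[j]
    return clusters


# Hypothetical membership degrees for four items over two equally weighted attributes.
items = [[0.10, 0.20], [0.15, 0.25], [0.90, 0.80], [0.85, 0.90]]
print(agglomerate(items, weights=[0.5, 0.5], n_clusters=2))
```

    Replacing `weighted_hamming` with a weighted Euclidean distance changes only the comparison step; the merge loop itself stays the same.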

  16. Taxonomy of Manufacturing Flexibility at Manufacturing Companies Using Imperialist Competitive Algorithms, Support Vector Machines and Hierarchical Cluster Analysis

    Directory of Open Access Journals (Sweden)

    M. Khoobiyan

    2017-04-01

    Manufacturing flexibility is a multidimensional concept and manufacturing companies act differently in using these dimensions. The purpose of this study is to investigate taxonomy and identify dominant groups of manufacturing flexibility. Dimensions of manufacturing flexibility are extracted by content analysis of literature and expert judgements. Manufacturing flexibility was measured by using a questionnaire developed to survey managers of manufacturing companies. The sample size was set at 379. To identify dominant groups of flexibility based on the dimensions of flexibility determined, Hierarchical Cluster Analysis (HCA), Imperialist Competitive Algorithms (ICAs) and Support Vector Machines (SVMs) were used with cluster validity indices. The best algorithm for clustering was SVMs with three clusters, designated as leading delivery-based flexibility, frugal flexibility and sufficient plan-based flexibility.

  17. Galaxy formation through hierarchical clustering

    Science.gov (United States)

    White, Simon D. M.; Frenk, Carlos S.

    1991-01-01

    Analytic methods for studying the formation of galaxies by gas condensation within massive dark halos are presented. The present scheme applies to cosmogonies where structure grows through hierarchical clustering of a mixture of gas and dissipationless dark matter. The simplest models consistent with the current understanding of N-body work on dissipationless clustering, and that of numerical and analytic work on gas evolution and cooling are adopted. Standard models for the evolution of the stellar population are also employed, and new models for the way star formation heats and enriches the surrounding gas are constructed. Detailed results are presented for a cold dark matter universe with Omega = 1 and H(0) = 50 km/s/Mpc, but the present methods are applicable to other models. The present luminosity functions contain significantly more faint galaxies than are observed.

  18. Hierarchical clustering using correlation metric and spatial continuity constraint

    Science.gov (United States)

    Stork, Christopher L.; Brewer, Luke N.

    2012-10-02

    Large data sets are analyzed by hierarchical clustering using correlation as a similarity measure. This provides results that are superior to those obtained using a Euclidean distance similarity measure. A spatial continuity constraint may be applied in hierarchical clustering analysis of images.
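
    A small sketch can show why a correlation-based measure may group data differently from Euclidean distance. The conversion of correlation into a dissimilarity as 1 − r is a common convention assumed here; it is not stated in the record, and the function names and toy vectors are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_distance(a, b):
    """Dissimilarity in [0, 2]: 0 for perfectly correlated profiles."""
    return 1.0 - pearson(a, b)


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical profiles: `scaled` has the same shape as `base` at ten times
# the intensity, while `reversed_profile` has the opposite trend.
base = [1.0, 2.0, 3.0, 4.0]
scaled = [10.0, 20.0, 30.0, 40.0]
reversed_profile = [4.0, 3.0, 2.0, 1.0]

print(correlation_distance(base, scaled), correlation_distance(base, reversed_profile))
print(euclidean_distance(base, scaled), euclidean_distance(base, reversed_profile))
```

    Under Euclidean distance the reversed profile is the nearer neighbour of `base`, whereas the correlation distance treats the scaled copy as identical in shape. Either dissimilarity can be fed to the same agglomerative procedure.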

  19. Analysis of genetic association in Listeria and Diabetes using Hierarchical Clustering and Silhouette Index

    Science.gov (United States)

    Pagnuco, Inti A.; Pastore, Juan I.; Abras, Guillermo; Brun, Marcel; Ballarin, Virginia L.

    2016-04-01

    It is usually assumed that co-expressed genes suggest co-regulation in the underlying regulatory network. Determining sets of co-expressed genes is an important task, where significant groups of genes are defined based on some criteria. This task is usually performed by clustering algorithms, where the whole family of genes, or a subset of them, are clustered into meaningful groups based on their expression values in a set of experiments. In this work we used a methodology based on the Silhouette index as a measure of cluster quality for individual gene groups, and a combination of several variants of hierarchical clustering to generate the candidate groups, to obtain sets of co-expressed genes for two real data examples. We analyzed the quality of the best ranked groups, obtained by the algorithm, using an online bioinformatics tool that provides network information for the selected genes. Moreover, to verify the performance of the algorithm, considering the fact that it doesn’t find all possible subsets, we compared its results against a full search, to determine the amount of good co-regulated sets not detected.
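
    The Silhouette index used here to score individual groups can be sketched as below. The sketch assumes Euclidean distance, at least two groups, and a value of 0 for singleton groups; the function names and the toy expression profiles are hypothetical and do not come from the study.

```python
import math


def silhouette_values(points, labels):
    """Silhouette value (b - a) / max(a, b) for every point."""
    values = []
    for i, p in enumerate(points):
        distances_by_label = {}
        for j, q in enumerate(points):
            if i != j:
                distances_by_label.setdefault(labels[j], []).append(math.dist(p, q))
        own = distances_by_label.get(labels[i])
        if not own:
            values.append(0.0)  # assumption: singleton groups score 0
            continue
        a = sum(own) / len(own)  # mean distance to the point's own group
        b = min(  # mean distance to the nearest other group
            sum(d) / len(d)
            for label, d in distances_by_label.items()
            if label != labels[i]
        )
        values.append((b - a) / max(a, b))
    return values


def group_silhouette(points, labels):
    """Mean silhouette per group, usable to rank candidate groups."""
    per_group = {}
    for value, label in zip(silhouette_values(points, labels), labels):
        per_group.setdefault(label, []).append(value)
    return {label: sum(v) / len(v) for label, v in per_group.items()}


# Hypothetical two-condition expression profiles for four genes in two groups.
profiles = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
groups = [0, 0, 1, 1]
print(group_silhouette(profiles, groups))
```

    Values near 1 indicate a compact, well-separated group, values near 0 indicate overlap, and negative values suggest that members sit closer to another group than to their own.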

  20. Symptom Clusters in People Living with HIV Attending Five Palliative Care Facilities in Two Sub-Saharan African Countries: A Hierarchical Cluster Analysis.

    Science.gov (United States)

    Moens, Katrien; Siegert, Richard J; Taylor, Steve; Namisango, Eve; Harding, Richard

    2015-01-01

    Symptom research across conditions has historically focused on single symptoms, and the burden of multiple symptoms and their interactions has been relatively neglected especially in people living with HIV. Symptom cluster studies are required to set priorities in treatment planning, and to lessen the total symptom burden. This study aimed to identify and compare symptom clusters among people living with HIV attending five palliative care facilities in two sub-Saharan African countries. Data from cross-sectional self-report of seven-day symptom prevalence on the 32-item Memorial Symptom Assessment Scale-Short Form were used. A hierarchical cluster analysis was conducted using Ward's method applying squared Euclidean Distance as the similarity measure to determine the clusters. Contingency tables, χ² tests and ANOVA were used to compare the clusters by patient specific characteristics and distress scores. Among the sample (N=217) the mean age was 36.5 (SD 9.0), 73.2% were female, and 49.1% were on antiretroviral therapy (ART). The cluster analysis produced five symptom clusters identified as: 1) dermatological; 2) generalised anxiety and elimination; 3) social and image; 4) persistently present; and 5) a gastrointestinal-related symptom cluster. The patients in the first three symptom clusters reported the highest physical and psychological distress scores. Patient characteristics varied significantly across the five clusters by functional status (worst functional physical status in cluster one, p<0.001); being on ART (highest proportions for clusters two and three, p=0.012); global distress (F=26.8, p<0.001), physical distress (F=36.3, p<0.001) and psychological distress subscale (F=21.8, p<0.001) (all subscales worst for cluster one, best for cluster four). The greatest burden is associated with cluster one, and should be prioritised in clinical management. Further symptom cluster research in people living with HIV with longitudinally collected symptom data to test cluster stability and identify common symptom trajectories is recommended.

  1. A Resting-State Brain Functional Network Study in MDD Based on Minimum Spanning Tree Analysis and the Hierarchical Clustering

    Directory of Open Access Journals (Sweden)

    Xiaowei Li

    2017-01-01

    A large number of studies have demonstrated that major depressive disorder (MDD) is characterized by alterations in brain functional connections, which are also identifiable during the brain’s “resting-state.” However, in current studies, the approach of constructing functional connectivity is often biased by the choice of the threshold. Besides, more attention has been paid to the number and length of links in brain networks, and the clustering partitioning of nodes has remained unclear. Therefore, minimum spanning tree (MST) analysis and hierarchical clustering were first used for depression in this study. Resting-state electroencephalogram (EEG) sources were assessed from 15 healthy and 23 major depressive subjects. Then the coherence, MST, and the hierarchical clustering were obtained. In the theta band, coherence analysis showed that the EEG coherence of the MDD patients was significantly higher than that of the healthy controls especially in the left temporal region. The MST results indicated the higher leaf fraction in the depressed group. Compared with the normal group, the major depressive patients lost clustering in frontal regions. Our findings suggested that there was a stronger brain interaction in the MDD group and a left-right functional imbalance in the frontal regions for MDD controls.

  2. Principal component analysis vs. self-organizing maps combined with hierarchical clustering for pattern recognition in volcano seismic spectra

    Science.gov (United States)

    Unglert, K.; Radić, V.; Jellinek, A. M.

    2016-06-01

    Variations in the spectral content of volcano seismicity related to changes in volcanic activity are commonly identified manually in spectrograms. However, long time series of monitoring data at volcano observatories require tools to facilitate automated and rapid processing. Techniques such as self-organizing maps (SOM) and principal component analysis (PCA) can help to quickly and automatically identify important patterns related to impending eruptions. For the first time, we evaluate the performance of SOM and PCA on synthetic volcano seismic spectra constructed from observations during two well-studied eruptions at Kīlauea Volcano, Hawai'i, that include features observed in many volcanic settings. In particular, our objective is to test which of the techniques can best retrieve a set of three spectral patterns that we used to compose a synthetic spectrogram. We find that, without a priori knowledge of the given set of patterns, neither SOM nor PCA can directly recover the spectra. We thus test hierarchical clustering, a commonly used method, to investigate whether clustering in the space of the principal components and on the SOM, respectively, can retrieve the known patterns. Our clustering method applied to the SOM fails to detect the correct number and shape of the known input spectra. In contrast, clustering of the data reconstructed by the first three PCA modes reproduces these patterns and their occurrence in time more consistently. This result suggests that PCA in combination with hierarchical clustering is a powerful practical tool for automated identification of characteristic patterns in volcano seismic spectra. Our results indicate that, in contrast to PCA, common clustering algorithms may not be ideal to group patterns on the SOM and that it is crucial to evaluate the performance of these tools on a control dataset prior to their application to real data.

  3. PERFORMANCE OF SELECTED AGGLOMERATIVE HIERARCHICAL CLUSTERING METHODS

    Directory of Open Access Journals (Sweden)

    Nusa Erman

    2015-01-01

    A broad variety of different methods of agglomerative hierarchical clustering raises the problem of how to choose the most appropriate method for the given data. It is well known that some methods outperform others if the analysed data have a specific structure. In the presented study we have observed the behaviour of the centroid, the median (Gower median method), and the average method (unweighted pair-group method with arithmetic mean – UPGMA; average linkage between groups). We have compared them with the most commonly used methods of hierarchical clustering: the minimum (single linkage clustering), the maximum (complete linkage clustering), the Ward, and the McQuitty (groups method average, weighted pair-group method using arithmetic averages - WPGMA) methods. We have applied the comparison of these methods on spherical, ellipsoid, umbrella-like, “core-and-sphere”, ring-like and intertwined three-dimensional data structures. To generate the data and execute the analysis, we have used R statistical software. Results show that all seven methods are successful in finding compact, ball-shaped or ellipsoid structures when they are sufficiently separated. Conversely, all methods except the minimum perform poorly on non-homogeneous, irregular and elongated ones. Especially challenging is a circular double helix structure; it is correctly revealed only by the minimum method. We can also confirm formerly published results of other simulation studies, which usually favour the average method (besides the Ward method) in cases when data are assumed to be fairly compact and well separated.
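
    The behaviour of the minimum (single linkage) method on ring-like data can be illustrated with a small sketch. The study itself used R on three-dimensional structures; the Python code below is an independent two-dimensional toy example, and the ring sizes, radii and function names are assumptions chosen for illustration.

```python
import math


def ring(radius, count):
    """Evenly spaced points on a circle centred on the origin."""
    return [
        (radius * math.cos(2 * math.pi * k / count),
         radius * math.sin(2 * math.pi * k / count))
        for k in range(count)
    ]


def agglomerate(points, n_clusters, linkage):
    """Naive agglomerative clustering.

    `linkage` reduces all point-to-point distances between two clusters to a
    single number: `min` gives single linkage, `max` gives complete linkage.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


# Two concentric rings: indices 0-11 form the inner ring, 12-35 the outer ring.
points = ring(1.0, 12) + ring(4.0, 24)
rings = [list(range(12)), list(range(12, 36))]

single = agglomerate(points, 2, min)
complete = agglomerate(points, 2, max)
print("single linkage recovers the rings:", single == rings)
print("complete linkage recovers the rings:", complete == rings)
```

    Single linkage follows chains of near neighbours, so it can trace each ring as long as the spacing within a ring is smaller than the gap between rings. The same chaining tendency makes it sensitive to noise points that bridge two groups, which is a known trade-off of the method.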

  4. Symptom Clusters in People Living with HIV Attending Five Palliative Care Facilities in Two Sub-Saharan African Countries: A Hierarchical Cluster Analysis.

    Directory of Open Access Journals (Sweden)

    Katrien Moens

    Symptom research across conditions has historically focused on single symptoms, and the burden of multiple symptoms and their interactions has been relatively neglected especially in people living with HIV. Symptom cluster studies are required to set priorities in treatment planning, and to lessen the total symptom burden. This study aimed to identify and compare symptom clusters among people living with HIV attending five palliative care facilities in two sub-Saharan African countries. Data from cross-sectional self-report of seven-day symptom prevalence on the 32-item Memorial Symptom Assessment Scale-Short Form were used. A hierarchical cluster analysis was conducted using Ward's method applying squared Euclidean Distance as the similarity measure to determine the clusters. Contingency tables, χ² tests and ANOVA were used to compare the clusters by patient specific characteristics and distress scores. Among the sample (N=217) the mean age was 36.5 (SD 9.0), 73.2% were female, and 49.1% were on antiretroviral therapy (ART). The cluster analysis produced five symptom clusters identified as: 1) dermatological; 2) generalised anxiety and elimination; 3) social and image; 4) persistently present; and 5) a gastrointestinal-related symptom cluster. The patients in the first three symptom clusters reported the highest physical and psychological distress scores. Patient characteristics varied significantly across the five clusters by functional status (worst functional physical status in cluster one, p<0.001); being on ART (highest proportions for clusters two and three, p=0.012); global distress (F=26.8, p<0.001), physical distress (F=36.3, p<0.001) and psychological distress subscale (F=21.8, p<0.001) (all subscales worst for cluster one, best for cluster four). The greatest burden is associated with cluster one, and should be prioritised in clinical management. Further symptom cluster research in people living with HIV with longitudinally collected symptom data to

  5. A comparison of hierarchical cluster analysis and league table rankings as methods for analysis and presentation of district health system performance data in Uganda.

    Science.gov (United States)

    Tashobya, Christine K; Dubourg, Dominique; Ssengooba, Freddie; Speybroeck, Niko; Macq, Jean; Criel, Bart

    2016-03-01

    In 2003, the Uganda Ministry of Health introduced the district league table for district health system performance assessment. The league table presents district performance against a number of input, process and output indicators and a composite index to rank districts. This study explores the use of hierarchical cluster analysis for analysing and presenting district health systems performance data and compares this approach with the use of the league table in Uganda. Ministry of Health and district plans and reports, and published documents were used to provide information on the development and utilization of the Uganda district league table. Quantitative data were accessed from the Ministry of Health databases. Statistical analysis using SPSS version 20 and hierarchical cluster analysis utilizing Ward's method were used. The hierarchical cluster analysis was conducted on the basis of seven clusters determined for each year from 2003 to 2010, ranging from a cluster of good through moderate-to-poor performers. The characteristics and membership of clusters varied from year to year and were determined by the identity and magnitude of performance of the individual variables. Criticisms of the league table include: perceived unfairness, as it did not take into consideration district peculiarities; and being oversummarized and not adequately informative. Clustering organizes the many data points into clusters of similar entities according to an agreed set of indicators and can provide the beginning point for identifying factors behind the observed performance of districts. Although league table rankings emphasize summation and external control, clustering has the potential to encourage a formative, learning approach. More research is required to shed more light on factors behind observed performance of the different clusters. Other countries, especially low-income countries that share many similarities with Uganda, can learn from these experiences.

  6. Investigating the effects of climate variations on bacillary dysentery incidence in northeast China using ridge regression and hierarchical cluster analysis

    Directory of Open Access Journals (Sweden)

    Guo Junqiao

    2008-09-01

    Full Text Available Abstract Background The effects of climate variations on bacillary dysentery incidence have recently attracted growing concern. However, the multi-collinearity among meteorological factors affects the accuracy of correlation with bacillary dysentery incidence. Methods As a remedy, a modified method to combine ridge regression and hierarchical cluster analysis was proposed for investigating the effects of climate variations on bacillary dysentery incidence in northeast China. Results All weather indicators, temperatures, precipitation, evaporation and relative humidity have shown positive correlation with the monthly incidence of bacillary dysentery, while air pressure had a negative correlation with the incidence. Ridge regression and hierarchical cluster analysis showed that during 1987–1996, relative humidity, temperatures and air pressure affected the transmission of the bacillary dysentery. During this period, all meteorological factors were divided into three categories. Relative humidity and precipitation belonged to one class, temperature indexes and evaporation belonged to another class, and air pressure was the third class. Conclusion Meteorological factors have affected the transmission of bacillary dysentery in northeast China. Bacillary dysentery prevention and control would benefit from giving more consideration to local climate variations.

  7. Cluster analysis for applications

    CERN Document Server

    Anderberg, Michael R

    1973-01-01

    Cluster Analysis for Applications deals with methods and various applications of cluster analysis. Topics covered range from variables and scales to measures of association among variables and among data units. Conceptual problems in cluster analysis are discussed, along with hierarchical and non-hierarchical clustering methods. The necessary elements of data analysis, statistics, cluster analysis, and computer implementation are integrated vertically to cover the complete path from raw data to a finished analysis. Comprised of 10 chapters, this book begins with an introduction to the subject o

  8. Comparison of multianalyte proficiency test results by sum of ranking differences, principal component analysis, and hierarchical cluster analysis.

    Science.gov (United States)

    Škrbić, Biljana; Héberger, Károly; Durišić-Mladenović, Nataša

    2013-10-01

    Sum of ranking differences (SRD) was applied for comparing multianalyte results obtained by several analytical methods used in one or in different laboratories, i.e., for ranking the overall performances of the methods (or laboratories) in simultaneous determination of the same set of analytes. The data sets for testing of the SRD applicability contained the results reported during one of the proficiency tests (PTs) organized by EU Reference Laboratory for Polycyclic Aromatic Hydrocarbons (EU-RL-PAH). In this way, the SRD was also tested as a discriminant method alternative to existing average performance scores used to compare multianalyte PT results. SRD should be used along with the z scores, the most commonly used PT performance statistics. SRD was further developed to handle the same rankings (ties) among laboratories. Two benchmark concentration series were selected as reference: (a) the assigned PAH concentrations (determined precisely beforehand by the EU-RL-PAH) and (b) the averages of all individual PAH concentrations determined by each laboratory. Ranking relative to the assigned values and also to the average (or median) values pointed to the laboratories with the most extreme results, as well as revealed groups of laboratories with similar overall performances. SRD reveals differences between methods or laboratories even if classical test(s) cannot. The ranking was validated using comparison of ranks by random numbers (a randomization test) and using sevenfold cross-validation, which highlighted the similarities among the (methods used in) laboratories. Principal component analysis and hierarchical cluster analysis justified the findings based on SRD ranking/grouping. If the PAH concentrations are row-scaled (i.e., z scores are analyzed as input for ranking), SRD can still be used for checking the normality of errors. Moreover, cross-validation of SRD on z scores groups the laboratories similarly. The SRD technique is general in nature, i.e., it can
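
    Illustrative note: the core SRD computation described in this abstract can be sketched as follows. This is a minimal sketch, not the authors' implementation. The function names, the use of average ranks for ties, and the sample concentrations are assumptions made for illustration only.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def srd(method_values, reference_values):
    """Sum of absolute differences between the method ranks and reference ranks."""
    method_ranks = average_ranks(method_values)
    reference_ranks = average_ranks(reference_values)
    return sum(abs(m - r) for m, r in zip(method_ranks, reference_ranks))


# Hypothetical data: four analytes, assigned values and two laboratories.
reference = [1.0, 2.5, 4.0, 6.0]
lab_a = [1.1, 2.4, 4.2, 5.8]   # same ordering as the reference
lab_b = [2.6, 1.0, 6.1, 4.0]   # two pairs of analytes swapped

srd_a = srd(lab_a, reference)
srd_b = srd(lab_b, reference)
```

    A smaller SRD means the laboratory orders the analytes more like the reference does. The validation steps named in the abstract (randomization test, cross-validation) are not covered by this sketch.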

  9. Assembling hierarchical cluster solids with atomic precision.

    Science.gov (United States)

    Turkiewicz, Ari; Paley, Daniel W; Besara, Tiglet; Elbaz, Giselle; Pinkard, Andrew; Siegrist, Theo; Roy, Xavier

    2014-11-12

    Hierarchical solids created from the binary assembly of cobalt chalcogenide and iron oxide molecular clusters are reported. Six different molecular clusters based on the octahedral Co6E8 (E = Se or Te) and the expanded cubane Fe8O4 units are used as superatomic building blocks to construct these crystals. The formation of the solid is driven by the transfer of charge between complementary electron-donating and electron-accepting clusters in solution that crystallize as binary ionic compounds. The hierarchical structures are investigated by single-crystal X-ray diffraction, providing atomic and superatomic resolution. We report two different superstructures: a superatomic relative of the CsCl lattice type and an unusual packing arrangement based on the double-hexagonal close-packed lattice. Within these superstructures, we demonstrate various compositions and orientations of the clusters.

  10. A New Metrics for Hierarchical Clustering

    Institute of Scientific and Technical Information of China (English)

    YANGGuangwen; SHIShuming; WANGDingxing

    2003-01-01

    Hierarchical clustering is a popular method of performing unsupervised learning. Some metric must be used to determine the similarity between pairs of clusters in hierarchical clustering. Traditional similarity metrics either can deal with simple shapes (i.e. spherical shapes) only or are very sensitive to outliers (the chaining effect). The main contribution of this paper is to propose some potential-based similarity metrics (APES and AMAPES) between clusters in hierarchical clustering, inspired by the concepts of the electric potential and the gravitational potential in electromagnetics and astronomy. The main features of these metrics are: first, they have strong antijamming capability; second, they are capable of finding clusters of different shapes such as spherical, spiral, chain, circle, sigmoid, U shape or other complex irregular shapes; third, existing algorithms and research results for classical metrics can be adopted to deal with these new potential-based metrics with little or no modification. Experiments showed that the new metrics are superior to traditional ones. Different potential functions are compared, and the sensitivity to parameters is also analyzed in this paper.

  11. Managing Clustered Data Using Hierarchical Linear Modeling

    Science.gov (United States)

    Warne, Russell T.; Li, Yan; McKyer, E. Lisako J.; Condie, Rachel; Diep, Cassandra S.; Murano, Peter S.

    2012-01-01

    Researchers in nutrition research often use cluster or multistage sampling to gather participants for their studies. These sampling methods often produce violations of the assumption of data independence that most traditional statistics share. Hierarchical linear modeling is a statistical method that can overcome violations of the independence…

  13. Hierarchical cluster analysis of labour market regulations and population health: a taxonomy of low- and middle-income countries

    Directory of Open Access Journals (Sweden)

    Muntaner Carles

    2012-04-01

    Full Text Available Abstract Background An important contribution of the social determinants of health perspective has been to inquire about non-medical determinants of population health. Among these, labour market regulations are of vital significance. In this study, we investigate the labour market regulations among low- and middle-income countries (LMICs) and propose a labour market taxonomy to further understand population health in a global context. Methods Using Gross National Product per capita, we classify 113 countries into either low-income (n = 71) or middle-income (n = 42) strata. Principal component analysis of three standardized indicators of labour market inequality and poverty is used to construct 2 factor scores. Factor score reliability is evaluated with Cronbach's alpha. Using these scores, we conduct a hierarchical cluster analysis to produce a labour market taxonomy, conduct zero-order correlations, and create box plots to test their associations with adult mortality, healthy life expectancy, infant mortality, maternal mortality, neonatal mortality, under-5 mortality, and years of life lost to communicable and non-communicable diseases. Labour market and health data are retrieved from the International Labour Organization's Key Indicators of Labour Markets and World Health Organization's Statistical Information System. Results Six labour market clusters emerged: Residual (n = 16), Emerging (n = 16), Informal (n = 10), Post-Communist (n = 18), Less Successful Informal (n = 22), and Insecure (n = 31). Primary findings indicate: (i) labour market poverty and population health are correlated in both LMICs; (ii) association between labour market inequality and health indicators is significant only in low-income countries; (iii) Emerging (e.g., East Asian and Eastern European countries) and Insecure (e.g., sub-Saharan African nations) clusters are the most advantaged and disadvantaged, respectively, with the remaining clusters experiencing levels of population

  14. Robust Pseudo-Hierarchical Support Vector Clustering

    DEFF Research Database (Denmark)

    Hansen, Michael Sass; Sjöstrand, Karl; Olafsdóttir, Hildur

    2007-01-01

    Support vector clustering (SVC) has proven an efficient algorithm for clustering of noisy and high-dimensional data sets, with applications within many fields of research. An inherent problem, however, has been setting the parameters of the SVC algorithm. Using the recent emergence of a method...... for calculating the entire regularization path of the support vector domain description, we propose a fast method for robust pseudo-hierarchical support vector clustering (HSVC). The method is demonstrated to work well on generated data, as well as for detecting ischemic segments from multidimensional myocardial...

  15. Typing of unknown microorganisms based on quantitative analysis of fatty acids by mass spectrometry and hierarchical clustering

    Energy Technology Data Exchange (ETDEWEB)

    Li Tingting; Dai Ling; Li Lun; Hu Xuejiao; Dong Linjie; Li Jianjian; Salim, Sule Khalfan; Fu Jieying [Key Laboratory of Pesticides and Chemical Biology, Ministry of Education, College of Chemistry, Central China Normal University, Wuhan, Hubei 430079 (China); Zhong Hongying, E-mail: hyzhong@mail.ccnu.edu.cn [Key Laboratory of Pesticides and Chemical Biology, Ministry of Education, College of Chemistry, Central China Normal University, Wuhan, Hubei 430079 (China)

    2011-01-17

    Rapid identification of unknown microorganisms of clinical and agricultural importance is not only critical for accurate diagnosis of infections but also essential for appropriate and prompt treatment. We describe here a rapid method for microorganism typing based on quantitative analysis of fatty acids by the iFAT approach (Isotope-coded Fatty Acid Transmethylation). In this work, lyophilized cell lysates were directly mixed with 0.5 M NaOH solution in d3-methanol and n-hexane. After 1 min of ultrasonication, the top n-hexane layer was combined with a mixture of standard d0-methanol derived fatty acid methylesters with known concentration. Measurement of intensity ratios of d3/d0 labeled fragment ion and molecular ion pairs at the corresponding target fatty acids provides a quantitative basis for hierarchical clustering. In the resultant dendrogram, the Euclidean distance between unknown species and known species quantitatively reveals their differences or shared similarities in fatty acid related pathways. It is of particular interest to apply this method for typing fungal species because, compared with bacteria and animals, fungi have distinctive lipid biosynthetic pathways that have been targeted by many drugs and fungicides. The proposed method has no dependence on the availability of genome or proteome databases. Therefore, it can be applied to a broad range of unknown microorganisms or mutant species.

  16. Fingerprint analysis of Hibiscus mutabilis L. leaves based on ultra performance liquid chromatography with photodiode array detector combined with similarity analysis and hierarchical clustering analysis methods

    Directory of Open Access Journals (Sweden)

    Xianrui Liang

    2013-01-01

    Full Text Available Background: A method for chemical fingerprint analysis of Hibiscus mutabilis L. leaves was developed based on ultra performance liquid chromatography with photodiode array detector (UPLC-PAD) combined with similarity analysis (SA) and hierarchical clustering analysis (HCA). Materials and Methods: 10 batches of Hibiscus mutabilis L. leaves samples were collected from different regions of China. UPLC-PAD was employed to collect chemical fingerprints of Hibiscus mutabilis L. leaves. Results: The relative standard deviations (RSDs) of the relative retention times (RRT) and relative peak areas (RPA) of 10 characteristic peaks (one of them was identified as rutin) in precision, repeatability and stability test were less than 3%, and the method of fingerprint analysis was validated to be suitable for the Hibiscus mutabilis L. leaves. Conclusions: The chromatographic fingerprints showed abundant diversity of chemical constituents qualitatively in the 10 batches of Hibiscus mutabilis L. leaves samples from different locations by similarity analysis on basis of calculating the correlation coefficients between each two fingerprints. Moreover, the HCA method clustered the samples into four classes, and the HCA dendrogram showed the close or distant relations among the 10 samples, which was consistent with the SA result to some extent.

  17. Rapid recognition of drug-resistance/sensitivity in leukemic cells by Fourier transform infrared microspectroscopy and unsupervised hierarchical cluster analysis.

    Science.gov (United States)

    Bellisola, Giuseppe; Cinque, Gianfelice; Vezzalini, Marzia; Moratti, Elisabetta; Silvestri, Giovannino; Redaelli, Sara; Gambacorti Passerini, Carlo; Wehbe, Katia; Sorio, Claudio

    2013-07-21

    We tested the ability of Fourier Transform (FT) InfraRed (IR) microspectroscopy (microFTIR) in combination with unsupervised Hierarchical Cluster Analysis (HCA) in identifying drug-resistance/sensitivity in leukemic cells exposed to tyrosine kinase inhibitors (TKIs). Experiments were carried out in a well-established mouse model of human Chronic Myelogenous Leukemia (CML). Mouse-derived pro-B Ba/F3 cells transfected with and stably expressing the human p210(BCR-ABL) drug-sensitive wild-type BCR-ABL or the V299L or T315I p210(BCR-ABL) drug-resistant BCR-ABL mutants were exposed to imatinib-mesylate (IMA) or dasatinib (DAS). MicroFTIR was carried out at the Diamond IR beamline MIRIAM where the mid-IR absorbance spectra of individual Ba/F3 cells were acquired using the high brilliance IR synchrotron radiation (SR) via aperture of 15 × 15 μm(2) in sizes. A conventional IR source (globar) was used to compare average spectra over 15 cells or more. IR signatures of drug actions were identified by supervised analyses in the spectra of TKI-sensitive cells. Unsupervised HCA applied to selected intervals of wavenumber allowed us to classify the IR patterns of viable (drug-resistant) and apoptotic (drug-sensitive) cells with an accuracy of >95%. The results from microFTIR + HCA analysis were cross-validated with those obtained via immunochemical methods, i.e. immunoblotting and flow cytometry (FC) that resulted directly and significantly correlated. We conclude that this combined microFTIR + HCA method potentially represents a rapid, convenient and robust screening approach to study the impact of drugs in leukemic cells as well as in peripheral blasts from patients in clinical trials with new anti-leukemic drugs.

  18. Hierarchical Control for Multiple DC Microgrids Clusters

    DEFF Research Database (Denmark)

    Shafiee, Qobad; Dragicevic, Tomislav; Vasquez, Juan Carlos;

    2014-01-01

    This paper presents a distributed hierarchical control framework to ensure reliable operation of dc Microgrid (MG) clusters. In this hierarchy, primary control is used to regulate the common bus voltage inside each MG locally. An adaptive droop method is proposed for this level which determines....... Another distributed policy is employed then to regulate the power flow among the MGs according to their local SOCs. The proposed distributed controllers on each MG communicate with only the neighbor MGs through a communication infrastructure. Finally, the small signal model is expanded for dc MG clusters...

  19. Hierarchical clusters of phytoplankton variables in dammed water bodies

    Science.gov (United States)

    Silva, Eliana Costa e.; Lopes, Isabel Cristina; Correia, Aldina; Gonçalves, A. Manuela

    2017-06-01

    In this paper a dataset containing biological variables of the water column of several Portuguese reservoirs is analyzed. Hierarchical cluster analysis is used to obtain clusters of phytoplankton variables of the phylum Cyanophyta, with the objective of validating the classification of Portuguese reservoirs previously presented in [1], which were divided into three clusters: (1) Interior Tagus and Aguieira; (2) Douro; and (3) Other rivers. Now three new clusters of Cyanophyta variables were found. Kruskal-Wallis and Mann-Whitney tests are used to compare the newly obtained Cyanophyta clusters and the previous Reservoirs clusters, in order to validate the classification of the water quality of reservoirs. The amount of Cyanophyta algae present in the reservoirs from the three clusters is significantly different, which validates the previous classification.

  20. Technique for fast and efficient hierarchical clustering

    Science.gov (United States)

    Stork, Christopher

    2013-10-08

    A fast and efficient technique for hierarchical clustering of samples in a dataset includes compressing the dataset to reduce a number of variables within each of the samples of the dataset. A nearest neighbor matrix is generated to identify nearest neighbor pairs between the samples based on differences between the variables of the samples. The samples are arranged into a hierarchy that groups the samples based on the nearest neighbor matrix. The hierarchy is rendered to a display to graphically illustrate similarities or differences between the samples.

  1. Magnetic susceptibilities of cluster-hierarchical models

    Science.gov (United States)

    McKay, Susan R.; Berker, A. Nihat

    1984-02-01

    The exact magnetic susceptibilities of hierarchical models are calculated near and away from criticality, in both the ordered and disordered phases. The mechanism and phenomenology are discussed for models with susceptibilities that are physically sensible, e.g., nondivergent away from criticality. Such models are found based upon the Niemeijer-van Leeuwen cluster renormalization. A recursion-matrix method is presented for the renormalization-group evaluation of response functions. Diagonalization of this matrix at fixed points provides simple criteria for well-behaved densities and response functions.

  2. Image Segmentation by Hierarchical Spatial and Color Spaces Clustering

    Institute of Scientific and Technical Information of China (English)

    YU Wei

    2005-01-01

    Image segmentation, as a basic building block for many high-level image analysis problems, has attracted much research attention over the years. Existing approaches, however, mainly focus on clustering analysis of single-channel information, i.e., in either color or spatial space, which may lead to unsatisfactory segmentation performance. Considering the spatial and color spaces jointly, this paper proposes a new hierarchical image segmentation algorithm, which alternately clusters the image regions in color and spatial spaces in a fine-to-coarse manner. Without losing perceptual consistency, the proposed algorithm achieves the segmentation result using only a small number of colors according to user specification.

  3. A fast quad-tree based two dimensional hierarchical clustering.

    Science.gov (United States)

    Rajadurai, Priscilla; Sankaranarayanan, Swamynathan

    2012-01-01

    Recently, microarray technologies have become a robust technique in the area of genomics. An important step in the analysis of gene expression data is the identification of groups of genes disclosing analogous expression patterns. Cluster analysis partitions a given dataset into groups based on specified features. Euclidean distance is a widely used similarity measure for gene expression data that considers the amount of changes in gene expression. However, the huge number of genes and the intricacy of biological networks have highly increased the challenges of comprehending and interpreting the resulting group of data, increasing processing time. The proposed technique focuses on a quad-tree (QT) based fast 2-dimensional hierarchical clustering algorithm to perform clustering. The construction of the closest pair data structure at each level is an important time factor, which determines the processing time of clustering. The proposed model reduces the processing time and improves analysis of gene expression data.

  4. Non-hierarchical clustering methods on factorial subspaces

    OpenAIRE

    Tortora, Cristina

    2011-01-01

    Cluster analysis (CA) aims at finding homogeneous groups of individuals, where homogeneous refers to individuals that present similar characteristics. Many CA techniques already exist; among the non-hierarchical ones the best known, thanks to its simplicity and computational properties, is the k-means method. However, the method is unstable when the number of variables is large and when variables are correlated. This problem has led to the development of two-step methods, they perform a linear tra...

  5. 1 Hierarchical Approaches to the Analysis of Genetic Diversity in ...

    African Journals Online (AJOL)

    2015-04-14

    Apr 14, 2015 ... Keywords: Genetic diversity, Hierarchical approach, Plant, Clustering,. Descriptive ... utilization) or by clustering (based on a phonetic analysis of individual ...... Improvement of Food Crop Preservatives for the next Millennium.

  6. A Hierarchical Clustering Methodology for the Estimation of Toxicity

    Science.gov (United States)

    A Quantitative Structure Activity Relationship (QSAR) methodology based on hierarchical clustering was developed to predict toxicological endpoints. This methodology utilizes Ward's method to divide a training set into a series of structurally similar clusters. The structural sim...

  7. Hierarchical Cluster Assembly in Globally Collapsing Clouds

    CERN Document Server

    Vazquez-Semadeni, Enrique; Colin, Pedro

    2016-01-01

    We discuss the mechanism of cluster formation in a numerical simulation of a molecular cloud (MC) undergoing global hierarchical collapse (GHC). The global nature of the collapse implies that the SFR increases over time. The hierarchical nature of the collapse consists of small-scale collapses within larger-scale ones. The large-scale collapses culminate a few Myr later than the small-scale ones and consist of filamentary flows that accrete onto massive central clumps. The small-scale collapses form clumps that are embedded in the filaments and falling onto the large-scale collapse centers. The stars formed in the early, small-scale collapses share the infall motion of their parent clumps. Thus, the filaments feed both gaseous and stellar material to the massive central clump. This leads to the presence of a few older stars in a region where new protostars are forming, and also to a self-similar structure, in which each unit is composed of smaller-scale sub-units that approach each other and may merge. Becaus...

  8. Hierarchically Clustered Star Formation in the Magellanic Clouds

    CERN Document Server

    Gouliermis, Dimitrios A; Ossenkopf, Volker; Klessen, Ralf S; Dolphin, Andrew E

    2012-01-01

    We present a cluster analysis of the bright main-sequence and faint pre-main-sequence stellar populations of a field ~ 90 x 90 pc centered on the HII region NGC 346/N66 in the Small Magellanic Cloud, from imaging with HST/ACS. We extend our earlier analysis on the stellar cluster population in the region to characterize the structuring behavior of young stars in the region as a whole with the use of stellar density maps interpreted through techniques designed for the study of the ISM structuring. In particular, we demonstrate with Cartwright & Whitworth's Q parameter, dendrograms, and the Delta-variance wavelet transform technique that the young stellar populations in the region NGC 346/N66 are hierarchically clustered, in agreement with other regions in the Magellanic Clouds observed with HST. The origin of this hierarchy is currently under investigation.

  9. Quantitative and Chemical Fingerprint Analysis for the Quality Evaluation of Receptaculum Nelumbinis by RP-HPLC Coupled with Hierarchical Clustering Analysis

    Directory of Open Access Journals (Sweden)

    Jin-Zhong Wu

    2013-01-01

    Full Text Available A simple and reliable method of high-performance liquid chromatography with photodiode array detection (HPLC-DAD) was developed to evaluate the quality of Receptaculum Nelumbinis (dried receptacle of Nelumbo nucifera) through establishing chromatographic fingerprint and simultaneous determination of five flavonol glycosides, including hyperoside, isoquercitrin, quercetin-3-O-β-d-glucuronide, isorhamnetin-3-O-β-d-galactoside and syringetin-3-O-β-d-glucoside. In quantitative analysis, the five components showed good regression (R > 0.9998) within linear ranges, and their recoveries were in the range of 98.31%–100.32%. In the chromatographic fingerprint, twelve peaks were selected as the characteristic peaks to assess the similarities of different samples collected from different origins in China according to the State Food and Drug Administration (SFDA) requirements. Furthermore, hierarchical cluster analysis (HCA) was also applied to evaluate the variation of chemical components among different sources of Receptaculum Nelumbinis in China. This study indicated that the combination of quantitative and chromatographic fingerprint analysis can be readily utilized as a quality control method for Receptaculum Nelumbinis and its related traditional Chinese medicinal preparations.

  10. Properties of hierarchically forming star clusters

    CERN Document Server

    Maschberger, Th; Bonnell, I A; Kroupa, P

    2010-01-01

    We undertake a systematic analysis of the early (< 0.5 Myr) evolution of clustering and the stellar initial mass function in turbulent fragmentation simulations. These large scale simulations for the first time offer the opportunity for a statistical analysis of IMF variations and correlations between stellar properties and cluster richness. The typical evolutionary scenario involves star formation in small-n clusters which then progressively merge; the first stars to form are seeds of massive stars and achieve a headstart in mass acquisition. These massive seeds end up in the cores of clusters and a large fraction of new stars of lower mass is formed in the outer parts of the clusters. The resulting clusters are therefore mass segregated at an age of 0.5 Myr, although the signature of mass segregation is weakened during mergers. We find that the resulting IMF has a smaller exponent (alpha=1.8-2.2) than the Salpeter value (alpha=2.35). The IMFs in subclusters are truncated at masses only somewhat larger th...

  11. Fast, Linear Time Hierarchical Clustering using the Baire Metric

    CERN Document Server

    Contreras, Pedro

    2011-01-01

    The Baire metric induces an ultrametric on a dataset and is of linear computational complexity, contrasted with the standard quadratic time agglomerative hierarchical clustering algorithm. In this work we evaluate empirically this new approach to hierarchical clustering. We compare hierarchical clustering based on the Baire metric with (i) agglomerative hierarchical clustering, in terms of algorithm properties; (ii) generalized ultrametrics, in terms of definition; and (iii) fast clustering through k-means partitioning, in terms of quality of results. For the latter, we carry out an in depth astronomical study. We apply the Baire distance to spectrometric and photometric redshifts from the Sloan Digital Sky Survey using, in this work, about half a million astronomical objects. We want to know how well the (more costly to determine) spectrometric redshifts can predict the (more easily obtained) photometric redshifts, i.e. we seek to regress the spectrometric on the photometric redshifts, and we use clusterwi...
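
    Illustrative note: a longest-common-prefix distance of the kind this abstract refers to can be sketched as below, together with a hierarchy built by bucketing values on shared digit prefixes. This is a minimal sketch under stated assumptions, not the authors' code: values are assumed to be pre-formatted as fixed-length digit strings, the base 2 in the distance is a convention, and the function names and sample strings are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def baire_distance(x: str, y: str) -> float:
    """Return 2**-k, where k is the length of the longest common prefix.

    Identical strings are given distance 0 (an assumption for finite strings).
    """
    if x == y:
        return 0.0
    k = 0
    for cx, cy in zip(x, y):
        if cx != cy:
            break
        k += 1
    return 2.0 ** -k


def prefix_hierarchy(values, depth):
    """Group values by their first 1, 2, ..., depth digits.

    Each level is one pass over the data, so the cost is linear in the
    number of values for a fixed digit precision.
    """
    levels = []
    for k in range(1, depth + 1):
        buckets = defaultdict(list)
        for v in values:
            buckets[v[:k]].append(v)
        levels.append(dict(buckets))
    return levels


# Hypothetical digit strings standing in for discretized measurements.
values = ["0312", "0318", "0351", "1204", "1207"]
levels = prefix_hierarchy(values, 3)

# The ultrametric (strong triangle) inequality holds for every triple.
is_ultrametric = all(
    baire_distance(a, c) <= max(baire_distance(a, b), baire_distance(b, c))
    for a, b, c in permutations(values, 3)
)
```

    Deeper levels refine the buckets of shallower ones, which is what makes the result a hierarchy. No pairwise distance matrix is ever built.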

  12. [Cluster analysis in biomedical researches].

    Science.gov (United States)

    Akopov, A S; Moskovtsev, A A; Dolenko, S A; Savina, G D

    2013-01-01

    Cluster analysis is one of the most popular methods for the analysis of multi-parameter data. Cluster analysis reveals the internal structure of the data and groups the separate observations by their degree of similarity. The review defines the basic concepts of cluster analysis and discusses the most popular clustering algorithms: k-means, hierarchical algorithms, and Kohonen network algorithms. Examples of the use of these algorithms in biomedical research are given.

  13. Hierarchical Approach in Clustering to Euclidean Traveling Salesman Problem

    Science.gov (United States)

    Fajar, Abdulah; Herman, Nanna Suryana; Abu, Nur Azman; Shahib, Sahrin

    There has been growing interest in studying combinatorial optimization problems through clustering strategies, with special emphasis on the traveling salesman problem (TSP). The TSP arises naturally as a subproblem in many transportation, manufacturing and logistics applications, and it has attracted much attention from mathematicians and computer scientists. A clustering approach decomposes the TSP into subgraphs that form clusters, reducing the problem to smaller subproblems. The impact of a hierarchical approach is investigated in order to produce a better clustering strategy that fits the Euclidean TSP. The clustering strategy for the Euclidean TSP consists of two main steps: clustering and tour construction. The significance of this research is that the clustering approach yields solutions with an error of less than 10% compared with the best known solutions (TSPLIB), and that a hierarchical clustering algorithm is improved to fit this Euclidean TSP solution method.

  14. Investigating the provenance of iron artifacts of the Royal Iron Factory of Sao Joao de Ipanema by hierarchical cluster analysis of EDS microanalyses of slag inclusions

    Energy Technology Data Exchange (ETDEWEB)

    Mamani-Calcina, Elmer Antonio; Landgraf, Fernando Jose Gomes; Azevedo, Cesar Roberto de Farias, E-mail: c.azevedo@usp.br [Universidade de Sao Paulo (USP), Sao Paulo, SP (Brazil). Escola Politecnica. Departmento de Engenharia Metalurgica e de Materiais

    2017-01-15

    Microstructural characterization techniques, including EDX (Energy Dispersive X-ray Analysis) microanalyses, were used to investigate the slag inclusions in the microstructure of ferrous artifacts of the Royal Iron Factory of Sao Joao de Ipanema (first steel plant of Brazil, XIX century), the D. Pedro II Bridge (located in Bahia, assembled in XIX century and produced in Scotland) and the archaeological sites of Sao Miguel de Missoes (Rio Grande do Sul, Brazil, production site of iron artifacts, the XVIII century) and Afonso Sardinha (Sao Paulo, Brazil production site of iron artifacts, XVI century). The microanalyses results of the main micro constituents of the microstructure of the slag inclusions were investigated by hierarchical cluster analysis and the dendrogram with the microanalyses results of the wüstite phase (using as critical variables the contents of MnO, MgO, Al₂O₃, V₂O₅ and TiO₂) allowed the identification of four clusters, which successfully represented the samples of the four investigated sites (Ipanema, Sardinha, Missoes and Bahia). Finally, the comparatively low volumetric fraction of slag inclusions in the samples of Ipanema (∼1%) suggested the existence of technological expertise at the iron making processing in the Royal Iron Factory of Sao Joao de Ipanema. (author)

  15. Hierarchical Clustering and the Concept of Space Distortion.

    Science.gov (United States)

    Hubert, Lawrence; Schultz, James

    An empirical assessment of the space distortion properties of two prototypic hierarchical clustering procedures is given in terms of an occupancy model developed from combinatorics. Using one simple example, the single-link and complete-link clustering strategies now in common use in the behavioral sciences are empirically shown to be space…

  16. Divisive Analysis (DIANA) of hierarchical clustering and GPS data for level of service criteria of urban streets

    Directory of Open Access Journals (Sweden)

    Ashish Kumar Patnaik

    2016-03-01

    Level of Service (LOS) for heterogeneous traffic flow on urban streets is not well defined in the Indian context. Hence, in this study an attempt is made to classify urban road networks into a number of street classes and average travel speeds on street segments into LOS categories. Divisive Analysis (DIANA) clustering is used for this classification of a large amount of speed data collected using a GPS receiver. The DIANA algorithm and the silhouette validation parameter are used to classify Free Flow Speeds (FFS) into an optimal number of classes, and the same algorithm is applied to the speed data to determine the ranges of the different LOS categories. Speed ranges for LOS categories (A–F), expressed as percentages of FFS, are found to be 90, 70, 50, 40, 25 and 20–25 respectively in the present study. By comparison, HCM (2000) gives these values as 85 and above, 67–85, 50–67, 40–50, 30–40 and 30 and less percent respectively.
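
    To make the percentage-of-FFS ranges concrete, the HCM (2000) values quoted above can be turned into a small lookup. This is a sketch only: the function name, the sample speeds, and the treatment of boundary values (each lower bound counted as inclusive) are assumptions, since the abstract does not say how ties at a boundary are handled.

```python
# Lower bounds, in percent of free-flow speed, for LOS A-E as quoted
# from HCM (2000) in the abstract; anything below the last bound is F.
HCM_2000_LOWER_BOUNDS = [("A", 85), ("B", 67), ("C", 50), ("D", 40), ("E", 30)]


def los_category(avg_speed: float, free_flow_speed: float) -> str:
    """Map an average travel speed to a LOS letter via its percentage of FFS.

    Boundary handling (>= on each lower bound) is an assumption.
    """
    pct = 100.0 * avg_speed / free_flow_speed
    for label, lower in HCM_2000_LOWER_BOUNDS:
        if pct >= lower:
            return label
    return "F"


# Hypothetical segment: 45 km/h average on a street with 60 km/h FFS (75%).
example = los_category(45, 60)
```

    Replacing the bounds with the study's own cluster-derived ranges would only require changing the table, not the lookup.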

  17. The Hierarchical Distribution of Young Stellar Clusters in Nearby Galaxies

    Science.gov (United States)

    Grasha, Kathryn; Calzetti, Daniela

    2017-01-01

    We investigate the spatial distributions of young stellar clusters in six nearby galaxies to trace the large scale hierarchical star-forming structures. The six galaxies are drawn from the Legacy ExtraGalactic UV Survey (LEGUS). We quantify the strength of the clustering among stellar clusters as a function of spatial scale and age to establish the survival timescale of the substructures. We separate the clusters into different classes, compact (bound) clusters and associations (unbound), and compare the clustering among them. We find that younger star clusters are more strongly clustered over small spatial scales and that the clustering disappears rapidly for ages as young as a few tens of Myr, consistent with clusters slowly losing the fractal dimension inherited at birth from their natal molecular clouds.

  18. Hierarchical Clustering Given Confidence Intervals of Metric Distances

    CERN Document Server

    Huang, Weiyu

    2016-01-01

    This paper considers metric spaces where distances between a pair of nodes are represented by distance intervals. The goal is to study methods for the determination of hierarchical clusters, i.e., a family of nested partitions indexed by a resolution parameter, induced from the given distance intervals of the metric spaces. Our construction of hierarchical clustering methods is based on defining admissible methods to be those methods that abide by the axioms of value - nodes in a metric space with two nodes are clustered together at the convex combination of the distance bounds between them - and transformation - when both distance bounds are reduced, the output may become more clustered but not less. Two admissible methods are constructed and are shown to provide universal upper and lower bounds in the space of admissible methods. Practical implications are explored by clustering moving points via snapshots and by clustering networks representing brain structural connectivity using the lower and upper bounds...

  19. Cluster analysis

    CERN Document Server

    Everitt, Brian S; Leese, Morven; Stahl, Daniel

    2011-01-01

    Cluster analysis comprises a range of methods for classifying multivariate data into subgroups. By organizing multivariate data into such subgroups, clustering can help reveal the characteristics of any structure or patterns present. These techniques have proven useful in a wide range of areas such as medicine, psychology, market research and bioinformatics. This fifth edition of the highly successful Cluster Analysis includes coverage of the latest developments in the field and a new chapter dealing with finite mixture models for structured data. Real life examples are used throughout to demons

  20. Hierarchical modeling of cluster size in wildlife surveys

    Science.gov (United States)

    Royle, J. Andrew

    2008-01-01

    Clusters or groups of individuals are the fundamental unit of observation in many wildlife sampling problems, including aerial surveys of waterfowl, marine mammals, and ungulates. Explicit accounting of cluster size in models for estimating abundance is necessary because detection of individuals within clusters is not independent and detectability of clusters is likely to increase with cluster size. This induces a cluster size bias in which the average cluster size in the sample is larger than in the population at large. Thus, failure to account for the relationship between detectability and cluster size will tend to yield a positive bias in estimates of abundance or density. I describe a hierarchical modeling framework for accounting for cluster-size bias in animal sampling. The hierarchical model consists of models for the observation process conditional on the cluster size distribution and the cluster size distribution conditional on the total number of clusters. Optionally, a spatial model can be specified that describes variation in the total number of clusters per sample unit. Parameter estimation, model selection, and criticism may be carried out using conventional likelihood-based methods. An extension of the model is described for the situation where measurable covariates at the level of the sample unit are available. Several candidate models within the proposed class are evaluated for aerial survey data on mallard ducks (Anas platyrhynchos).
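
    The cluster-size bias described above can be shown with a few lines of arithmetic. The sketch assumes, purely for illustration, that the probability of detecting a cluster is proportional to its size; the abstract only says that detectability is likely to increase with size. The cluster sizes are made-up numbers, not survey data.

```python
def population_mean(sizes):
    """Average cluster size over all clusters in the population."""
    return sum(sizes) / len(sizes)


def size_biased_mean(sizes):
    """Expected size of a detected cluster when detection probability is
    proportional to cluster size (an assumed detection model)."""
    return sum(s * s for s in sizes) / sum(sizes)


# Hypothetical population of four clusters.
sizes = [1, 1, 2, 4]
true_mean = population_mean(sizes)       # (1 + 1 + 2 + 4) / 4
sampled_mean = size_biased_mean(sizes)   # (1 + 1 + 4 + 16) / 8
```

    Under this assumed model the sampled mean exceeds the true mean whenever cluster sizes vary, which is the direction of bias the abstract warns about.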

  1. Update Legal Documents Using Hierarchical Ranking Models and Word Clustering

    OpenAIRE

    Pham, Minh Quang Nhat; Nguyen, Minh Le; Shimazu, Akira

    2010-01-01

    Our research addresses the task of updating legal documents when new information emerges. In this paper, we employ a hierarchical ranking model to the task of updating legal documents. Word clustering features are incorporated to the ranking models to exploit semantic relations between words. Experimental results on legal data built from the United States Code show that the hierarchical ranking model with word clustering outperforms baseline methods using Vector Space Model, and word cluster-based ...

  2. Ultra high performance liquid chromatography with electrospray ionization tandem mass spectrometry coupled with hierarchical cluster analysis to evaluate Wikstroemia indica (L.) C. A. Mey. from different geographical regions.

    Science.gov (United States)

    Wei, Lan; Wang, Xiaobo; Mu, Shanxue; Sun, Lixin; Yu, Zhiguo

    2015-06-01

    A sensitive, rapid and simple ultra high performance liquid chromatography with electrospray ionization tandem mass spectrometry method was developed to determine seven constituents (umbelliferone, apigenin, triumbelletin, daphnoretin, arctigenin, genkwanin and emodin) in Wikstroemia indica (L.) C. A. Mey. The chromatographic analysis was performed on an ACQUITY UPLC® BEH C18 column (2.1 × 50 mm, 1.7 μm) by gradient elution with a mobile phase of 0.05% formic acid aqueous solution (A) and acetonitrile (B). Multiple reaction monitoring mode with a positive and negative electrospray ionization interface was used to detect the components. The method was validated in terms of specificity, linearity, accuracy, precision and stability. Excellent linear behavior was observed over the tested concentration ranges, with correlation coefficient values higher than 0.999. The intraday and interday precisions were within 2.0%. The recoveries of the seven analytes were 99.4-101.1% with relative standard deviations of less than 1.2%. The 18 Wikstroemia indica samples from different origins were classified by hierarchical clustering analysis according to the contents of the seven components. The results demonstrated that the developed method could successfully be used to quantify the seven components in Wikstroemia indica simultaneously and could be a helpful tool for the detection and confirmation of the quality of traditional Chinese medicines.

  3. MultiDendrograms: Variable-Group Agglomerative Hierarchical Clustering

    CERN Document Server

    Gomez, Sergio; Montiel, Justo; Torres, David

    2012-01-01

    MultiDendrograms is a Java-written application that computes agglomerative hierarchical clusterings of data. Starting from a distances (or weights) matrix, MultiDendrograms is able to calculate its dendrograms using the most common agglomerative hierarchical clustering methods. The application implements a variable-group algorithm that solves the non-uniqueness problem found in the standard pair-group algorithm. This problem arises when two or more minimum distances between different clusters are equal during the agglomerative process, because then different output clusterings are possible depending on the criterion used to break ties between distances. MultiDendrograms solves this problem implementing a variable-group algorithm that groups more than two clusters at the same time when ties occur.
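
    The tie problem described above can be illustrated with a sketch of a single merge step. This is not MultiDendrograms' implementation: the function name, the dictionary layout, the sample distances, and the use of exact equality to detect ties are all assumptions made for illustration.

```python
def tied_groups(dist):
    """Find the minimum distance and the groups it would merge at once.

    dist maps pairs (a, b) of cluster labels to their distance. Pairs tied
    at the minimum are joined into connected groups, so a group can hold
    more than two clusters (the variable-group idea).
    """
    d_min = min(dist.values())
    tied = [pair for pair, d in dist.items() if d == d_min]
    groups = []
    for pair in tied:
        merged = set(pair)
        rest = []
        for g in groups:
            if g & merged:
                merged |= g      # pair touches an existing group: join them
            else:
                rest.append(g)
        groups = rest + [merged]
    return d_min, groups


# Hypothetical distances: A-B and B-C are tied at the minimum.
dist = {
    ("A", "B"): 1.0, ("B", "C"): 1.0, ("A", "C"): 2.0,
    ("A", "D"): 3.0, ("B", "D"): 3.0, ("C", "D"): 3.0,
}
d_min, groups = tied_groups(dist)
```

    A pair-group algorithm would have to pick either A-B or B-C first, and the resulting dendrogram could depend on that choice; merging the whole tied group in one step removes the dependence.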

  4. Hierarchical Overlapping Clustering of Network Data Using Cut Metrics

    CERN Document Server

    Gama, Fernando; Ribeiro, Alejandro

    2016-01-01

    A novel method to obtain hierarchical and overlapping clusters from network data -i.e., a set of nodes endowed with pairwise dissimilarities- is presented. The introduced method is hierarchical in the sense that it outputs a nested collection of groupings of the node set depending on the resolution or degree of similarity desired, and it is overlapping since it allows nodes to belong to more than one group. Our construction is rooted in the facts that a hierarchical (non-overlapping) clustering of a network can be equivalently represented by a finite ultrametric space and that a convex combination of ultrametrics results in a cut metric. By applying a hierarchical (non-overlapping) clustering method to multiple dithered versions of a given network and then convexly combining the resulting ultrametrics, we obtain a cut metric associated with the network of interest. We then show how to extract a hierarchical overlapping clustering structure from the aforementioned cut metric. Furthermore, the so-called overlappi...

  5. Identifying Reference Objects by Hierarchical Clustering in Java Environment

    Directory of Open Access Journals (Sweden)

    RAHUL SAHA

    2011-09-01

    Java has recently become a very popular programming environment. The Java programming language is designed to be portable enough to run on a wide range of computers, from cell phones to supercomputers. Computer programs written in Java are compiled into Java bytecode instructions that are suitable for execution by a Java Virtual Machine implementation. The Java Virtual Machine is commonly implemented in software by means of an interpreter for the Java Virtual Machine instruction set. As an object-oriented language, Java utilizes the concept of objects. Our idea is to identify candidate object references in a Java environment through hierarchical cluster analysis using the reference stack and the execution stack.

  6. Hierarchical cluster analysis and chemical characterisation of Myrtus communis L. essential oil from Yemen region and its antimicrobial, antioxidant and anti-colorectal adenocarcinoma properties.

    Science.gov (United States)

    Anwar, Sirajudheen; Crouch, Rebecca A; Awadh Ali, Nasser A; Al-Fatimi, Mohamed A; Setzer, William N; Wessjohann, Ludger

    2017-01-09

    The hydrodistilled essential oil obtained from the dried leaves of Myrtus communis, collected in Yemen, was analysed by GC-MS. Forty-one compounds were identified, representing 96.3% of the total oil. The major constituents of essential oil were oxygenated monoterpenoids (87.1%), linalool (29.1%), 1,8-cineole (18.4%), α-terpineol (10.8%), geraniol (7.3%) and linalyl acetate (7.4%). The essential oil was assessed for its antimicrobial activity using a disc diffusion assay and resulted in moderate to potent antibacterial and antifungal activities targeting mainly Bacillus subtilis, Staphylococcus aureus and Candida albicans. The oil moderately reduced the diphenylpicrylhydrazyl radical (IC50 = 4.2 μL/mL or 4.1 mg/mL). In vitro cytotoxicity evaluation against HT29 (human colonic adenocarcinoma cells) showed that the essential oil exhibited a moderate antitumor effect with IC50 of 110 ± 4 μg/mL. Hierarchical cluster analysis of M. communis has been carried out based on the chemical compositions of 99 samples reported in the literature, including Yemeni sample.

  7. HILIC-UPLC-MS/MS combined with hierarchical clustering analysis to rapidly analyze and evaluate nucleobases and nucleosides in Ginkgo biloba leaves.

    Science.gov (United States)

    Yao, Xin; Zhou, Guisheng; Tang, Yuping; Guo, Sheng; Qian, Dawei; Duan, Jin-Ao

    2015-02-01

    Ginkgo biloba leaf extract has been widely used in dietary supplements and more recently in some foods and beverages. In addition to the well-known flavonol glycosides and terpene lactones, G. biloba leaves are also rich in nucleobases and nucleosides. To determine the content of nucleobases and nucleosides in G. biloba leaves at trace levels, a reliable method has been established by using hydrophilic interaction ultra performance liquid chromatography coupled with triple-quadrupole tandem mass spectrometry (HILIC-UPLC-TQ-MS/MS) working in multiple reaction monitoring mode. Eleven nucleobases and nucleosides were simultaneously determined in seven min. The proposed method was fully validated in terms of linearity, sensitivity, and repeatability, as well as recovery. Furthermore, hierarchical clustering analysis (HCA) was performed to evaluate and classify the samples according to the contents of the eleven chemical constituents. The established approach could be helpful for evaluation of the potential values as dietary supplements and the quality control of G. biloba leaves, which might also be utilized for the investigation of other medicinal herbs containing nucleobases and nucleosides.

  8. Hierarchical clustering techniques for image database organization and summarization

    Science.gov (United States)

    Vellaikal, Asha; Kuo, C.-C. Jay

    1998-10-01

    This paper investigates clustering techniques as a method of organizing image databases to support popular visual management functions such as searching, browsing and navigation. Different types of hierarchical agglomerative clustering techniques are studied as a method of organizing feature space as well as summarizing image groups by the selection of a few appropriate representatives. Retrieval performance using both single and multiple level hierarchies is evaluated, and the algorithms show an interesting relationship between the top k correct retrievals and the number of comparisons required. Some arguments are given to support the use of such cluster-based techniques for managing distributed image databases.

  9. Breaking the hierarchy - a new cluster selection mechanism for hierarchical clustering methods

    Directory of Open Access Journals (Sweden)

    Zweig Katharina A

    2009-10-01

    Background: Hierarchical clustering methods like Ward's method have been used for decades to understand biological and chemical data sets. In order to get a partition of the data set, it is necessary to choose an optimal level of the hierarchy by a so-called level selection algorithm. In 2005, a new kind of hierarchical clustering method was introduced by Palla et al. that differs in two ways from Ward's method: it can be used on data on which no full similarity matrix is defined and it can produce overlapping clusters, i.e., allow for multiple membership of items in clusters. These features are optimal for biological and chemical data sets but until now no level selection algorithm has been published for this method. Results: In this article we provide a general selection scheme, the level independent clustering selection method, called LInCS. With it, clusters can be selected from any level in quadratic time with respect to the number of clusters. Since hierarchically clustered data is not necessarily associated with a similarity measure, the selection is based on a graph theoretic notion of cohesive clusters. We present results of our method on two data sets, a set of drug-like molecules and a set of protein-protein interaction (PPI) data. In both cases the method provides a clustering with very good sensitivity and specificity values according to a given reference clustering. Moreover, we can show for the PPI data set that our graph theoretic cohesiveness measure indeed chooses biologically homogeneous clusters and disregards inhomogeneous ones in most cases. We finally discuss how the method can be generalized to other hierarchical clustering methods to allow for a level independent cluster selection. Conclusion: Using our new cluster selection method together with the method by Palla et al. provides an interesting new clustering mechanism that makes it possible to compute overlapping clusters, which is especially valuable for biological and

  10. A Framework for Hierarchical Clustering Based Indexing in Search Engines

    Directory of Open Access Journals (Sweden)

    Parul Gupta

    2011-01-01

    Granting efficient and fast access to the index is a key issue for the performance of Web Search Engines (WSEs). In order to enhance memory utilization and favor fast query resolution, WSEs use Inverted File (IF) indexes that consist of an array of posting lists, where each posting list is associated with a term and contains the term as well as the identifiers of the documents containing the term. Since the document identifiers are stored in sorted order, they can be stored as the differences between successive documents so as to reduce the size of the index. This paper describes a clustering algorithm that aims at partitioning the set of documents into ordered clusters so that the documents within the same cluster are similar and are assigned closer document identifiers. Thus the average value of the differences between successive documents is minimized and storage space is saved. The paper further presents an extension of this clustering algorithm to hierarchical clustering, in which similar clusters are combined to form a mega cluster and similar mega clusters are then combined to form a super cluster. The paper thus describes different levels of clustering that optimize the search process by directing the search along a specific path from higher levels of clustering to lower levels, i.e. from super clusters to mega clusters, then to clusters and finally to the individual documents, so that the user gets the best possible matching results in the minimum possible time.
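
    Storing sorted document identifiers as differences, as described above, can be sketched in a few lines. The variable-byte size estimate (7 payload bits per byte) is an assumed encoding chosen for illustration; the abstract does not name a specific code. The posting lists are made-up numbers.

```python
from itertools import accumulate


def to_gaps(doc_ids):
    """Turn a sorted posting list into its first ID followed by successive differences."""
    return [b - a for a, b in zip([0] + doc_ids[:-1], doc_ids)]


def from_gaps(gaps):
    """Rebuild the original document IDs by running sum."""
    return list(accumulate(gaps))


def vbyte_len(n):
    """Bytes needed for n in a 7-bits-per-byte variable-byte code (assumed encoding)."""
    return max(1, (n.bit_length() + 6) // 7)


def encoded_size(doc_ids):
    """Total bytes for a posting list stored as variable-byte gaps."""
    return sum(vbyte_len(g) for g in to_gaps(doc_ids))


# Hypothetical posting list for one term, before and after similar
# documents are given neighbouring identifiers.
scattered = [3, 907, 1850, 40211]
clustered = [100, 101, 102, 103]
```

    With these sample lists the scattered identifiers need 8 bytes and the clustered ones 4, which is the storage effect that assigning close identifiers to similar documents aims for.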

  11. Hierarchical matrices algorithms and analysis

    CERN Document Server

    Hackbusch, Wolfgang

    2015-01-01

    This self-contained monograph presents matrix algorithms and their analysis. The new technique enables not only the solution of linear systems but also the approximation of matrix functions, e.g., the matrix exponential. Other applications include the solution of matrix equations, e.g., the Lyapunov or Riccati equation. The required mathematical background can be found in the appendix. The numerical treatment of fully populated large-scale matrices is usually rather costly. However, the technique of hierarchical matrices makes it possible to store matrices and to perform matrix operations approximately with almost linear cost and a controllable degree of approximation error. For important classes of matrices, the computational cost increases only logarithmically with the approximation error. The operations provided include the matrix inversion and LU decomposition. Since large-scale linear algebra problems are standard in scientific computing, the subject of hierarchical matrices is of interest to scientists ...

  12. Active Clustering: Robust and Efficient Hierarchical Clustering using Adaptively Selected Similarities

    CERN Document Server

    Eriksson, Brian; Singh, Aarti; Nowak, Robert

    2011-01-01

    Hierarchical clustering based on pairwise similarities is a common tool used in a broad range of scientific applications. However, in many problems it may be expensive to obtain or compute similarities between the items to be clustered. This paper investigates the hierarchical clustering of N items based on a small subset of pairwise similarities, significantly less than the complete set of N(N-1)/2 similarities. First, we show that if the intracluster similarities exceed intercluster similarities, then it is possible to correctly determine the hierarchical clustering from as few as 3N log N similarities. We demonstrate this order of magnitude savings in the number of pairwise similarities necessitates sequentially selecting which similarities to obtain in an adaptive fashion, rather than picking them at random. We then propose an active clustering method that is robust to a limited fraction of anomalous similarities, and show how even in the presence of these noisy similarity values we can resolve the hierar...
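
    The gap between the two similarity counts mentioned above is easy to check numerically. In this sketch the logarithm base is an assumption (the abstract writes only "log N"), so it is left as a parameter; the function names are illustrative.

```python
import math


def all_pairs(n: int) -> int:
    """Size of the complete set of pairwise similarities, N(N-1)/2."""
    return n * (n - 1) // 2


def adaptive_budget(n: int, log_base: float = math.e) -> float:
    """The 3 N log N count from the abstract; the log base is assumed."""
    return 3 * n * math.log(n, log_base)


n = 1000
full = all_pairs(n)
adaptive = adaptive_budget(n)
ratio = full / adaptive
```

    For N = 1000 the full set has 499,500 similarities, while 3N log N is about 2.1 × 10^4 with the natural logarithm and about 3.0 × 10^4 with base 2, so the saving is more than an order of magnitude under either assumption.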

  13. Non-Hierarchical Clustering as a method to analyse an open-ended ...

    African Journals Online (AJOL)

    Apple

    tests, provide instructors with tools to probe students' conceptual knowledge of various fields of science and ... quantitative non-hierarchical clustering analysis method known as k-means (Everitt, Landau, Leese & Stahl, ...... undergraduate engineering students in creating ... mathematics-formal reasoning and the contextual.

  14. Hierarchical clusters in families with type 2 diabetes

    Science.gov (United States)

    García-Solano, Beatriz; Gallegos-Cabriales, Esther C; Gómez-Meza, Marco V; García-Madrid, Guillermina; Flores-Merlo, Marcela; García-Solano, Mauro

    2015-01-01

    Families represent more than a set of individuals; a family is more than the sum of its individual members. With this classification, nurses can identify whether family health-illness beliefs follow the family-as-a-unit concept and plan the inclusion of the family in type 2 diabetes treatment; the family is not considered in public policy, even though families share diet, exercise, and self-monitoring with a member who has type 2 diabetes. The aim of this study was to determine whether the characteristics, functionality, routines, and family and individual health in type 2 diabetes describe the differences and similarities between families so that they can be considered as a unit. We performed an exploratory, descriptive hierarchical cluster analysis of 61 families using three instruments and a questionnaire, in addition to weight, height, body fat percentage, hemoglobin A1c, total cholesterol, triglycerides, low-density lipoprotein and high-density lipoprotein. The analysis produced three groups of families. Wilks' lambda demonstrated statistically significant differences provided by age (Λ = 0.778, F = 2.098, p = 0.010) and family health (Λ = 0.813, F = 2.650, p = 0.023). A post hoc Tukey test coincided with the three subsets. Families with type 2 diabetes have common elements that make them similar, while sharing differences that make them unique. PMID:27347419

  15. Concept Association and Hierarchical Hamming Clustering Model in Text Classification

    Institute of Scientific and Technical Information of China (English)

    Su Gui-yang; Li Jian-hua; Ma Ying-hua; Li Sheng-hong; Yin Zhong-hang

    2004-01-01

    We propose two models in this paper. The concept association model is put forward to obtain the co-occurrence relationships among keywords in the documents, and the hierarchical Hamming clustering model is used to reduce the dimensionality of the category feature vector space, which addresses the extremely high dimensionality of the documents' feature space. The experimental results indicate that the concept association model captures the co-occurrence relations among keywords in the documents, which effectively improves the recall of the classification system. The hierarchical Hamming clustering model reduces the dimensionality of the category feature vector efficiently: the size of the vector space is only about 10% of the original dimensionality.

  16. Global Considerations in Hierarchical Clustering Reveal Meaningful Patterns in Data

    Science.gov (United States)

    Varshavsky, Roy; Horn, David; Linial, Michal

    2008-01-01

    Background: A hierarchy, characterized by tree-like relationships, is a natural method of organizing data in various domains. When considering an unsupervised machine learning routine, such as clustering, a bottom-up hierarchical (BU, agglomerative) algorithm is used as a default and is often the only method applied. Methodology/Principal Findings: We show that hierarchical clustering that involves global considerations, such as top-down (TD, divisive) or glocal (global-local) algorithms, is better suited to reveal meaningful patterns in the data. This is demonstrated by testing the correspondence between the results of several algorithms (TD, glocal and BU) and the correct annotations provided by experts. The correspondence was tested in multiple domains including gene expression experiments, stock trade records and functional protein families. The performance of each of the algorithms is evaluated by statistical criteria that are assigned to clusters (nodes of the hierarchy tree) based on expert-labeled data. Whereas TD algorithms perform better on global patterns, BU algorithms perform well and are advantageous when finer granularity of the data is sought. In addition, a novel TD algorithm that is based on genuine density of the data points is presented and is shown to outperform other divisive and agglomerative methods. Application of the algorithm to more than 500 protein sequences belonging to ion-channels illustrates the potential of the method for inferring overlooked functional annotations. ClustTree, a graphical Matlab toolbox for applying various hierarchical clustering algorithms and testing their quality, is made available. Conclusions: Although currently rarely used, global approaches, in particular TD or glocal algorithms, should be considered in the exploratory process of clustering. In general, applying unsupervised clustering methods can leverage the quality of manually-created mapping of protein families. As demonstrated, it can also provide

  17. Extending stability through hierarchical clusters in Echo State Networks

    Directory of Open Access Journals (Sweden)

    Sarah Jarvis

    2010-07-01

    Echo State Networks (ESN) are reservoir networks that satisfy well-established criteria for stability when constructed as feedforward networks. Recent evidence suggests that stability criteria are altered in the presence of reservoir substructures, such as clusters. Understanding how the reservoir architecture affects stability is thus important for the appropriate design of any ESN. To quantitatively determine the influence of the most relevant network parameters, we analysed the impact of reservoir substructures on stability in hierarchically clustered ESNs (HESN), as they allow a smooth transition from highly structured to increasingly homogeneous reservoirs. Previous studies used the largest eigenvalue of the reservoir connectivity matrix (spectral radius) as a predictor for stable network dynamics. Here, we evaluate the impact of clusters, hierarchy and intercluster connectivity on the predictive power of the spectral radius for stability. Both hierarchy and low relative cluster sizes extend the range of spectral radius values leading to stable networks, while increasing intercluster connectivity decreases the maximal spectral radius.

  18. Multi-mode clustering model for hierarchical wireless sensor networks

    Science.gov (United States)

    Hu, Xiangdong; Li, Yongfu; Xu, Huifen

    2017-03-01

    Topology management, i.e., cluster maintenance, of wireless sensor networks (WSNs) is still a challenge due to their numerous nodes, diverse application scenarios and limited resources as well as complex dynamics. To address this issue, a multi-mode clustering model (M2 CM) is proposed to maintain the clusters for hierarchical WSNs in this study. In particular, unlike the traditional time-trigger model based on the whole-network and periodic style, the M2 CM is based on local and event-trigger operations. In addition, an adaptive local maintenance algorithm is designed for the broken clusters in the WSNs using the spatial-temporal demand changes accordingly. Numerical experiments are performed using the NS2 network simulation platform. Results validate the effectiveness of the proposed model with respect to the network maintenance costs, node energy consumption and transmitted data as well as the network lifetime.

  19. Globular cluster formation with multiple stellar populations from hierarchical star cluster complexes

    Science.gov (United States)

    Bekki, Kenji

    2017-01-01

    Most old globular clusters (GCs) in the Galaxy are observed to have internal chemical abundance spreads in light elements. We discuss a new GC formation scenario based on hierarchical star formation within fractal molecular clouds. In the new scenario, a cluster of bound and unbound star clusters ('star cluster complex', SCC) that have a power-law cluster mass function with a slope (β) of 2 is first formed from a massive gas clump developed in a dwarf galaxy. Such cluster complexes and β = 2 are observed and expected from hierarchical star formation. The most massive star cluster ('main cluster'), which is the progenitor of a GC, can accrete gas ejected from asymptotic giant branch (AGB) stars initially in the cluster and other low-mass clusters before the clusters are tidally stripped or destroyed to become field stars in the dwarf. The SCC is initially embedded in a giant gas hole created by numerous supernovae of the SCC so that cold gas outside the hole can be accreted onto the main cluster later. New stars formed from the accreted gas have chemical abundances that are different from those of the original SCC. Using hydrodynamical simulations of GC formation based on this scenario, we show that the main cluster with the initial mass as large as (2-5) × 10^5 M⊙ can accrete more than 10^5 M⊙ gas from AGB stars of the SCC. We suggest that merging of hierarchical star cluster complexes can play key roles in stellar halo formation around GCs and self-enrichment processes in the early phase of GC formation.

  20. Hierarchical Parallelization of Gene Differential Association Analysis

    Directory of Open Access Journals (Sweden)

    Dwarkadas Sandhya

    2011-09-01

    Full Text Available Abstract Background: Microarray gene differential expression analysis is a widely used technique that deals with high dimensional data and is computationally intensive for permutation-based procedures. Microarray gene differential association analysis is even more computationally demanding and must take advantage of multicore computing technology, which is the driving force behind increasing compute power in recent years. In this paper, we present a two-layer hierarchical parallel implementation of gene differential association analysis. It takes advantage of both fine- and coarse-grain (with granularity defined by the frequency of communication) parallelism in order to effectively leverage the non-uniform nature of parallel processing available in the cutting-edge systems of today. Results: Our results show that this hierarchical strategy matches data sharing behavior to the properties of the underlying hardware, thereby reducing the memory and bandwidth needs of the application. The resulting improved efficiency reduces computation time and allows the gene differential association analysis code to scale its execution with the number of processors. The code and biological data used in this study are downloadable from http://www.urmc.rochester.edu/biostat/people/faculty/hu.cfm. Conclusions: The performance sweet spot occurs when using a number of threads per MPI process that allows the working sets of the corresponding MPI processes running on the multicore to fit within the machine cache. Hence, we suggest that practitioners follow this principle in selecting the appropriate number of MPI processes and threads within each MPI process for their cluster configurations. We believe that the principles of this hierarchical approach to parallelization can be utilized in the parallelization of other computationally demanding kernels.

  1. 3D Nearest Neighbour Search Using a Clustered Hierarchical Tree Structure

    Science.gov (United States)

    Suhaibah, A.; Uznir, U.; Anton, F.; Mioc, D.; Rahman, A. A.

    2016-06-01

    Locating and analysing the location of new stores or outlets is one of the common issues facing retailers and franchisers. The aim is to ensure that newly opened stores are in strategic locations that attract the highest possible number of customers. Spatial information is used to manage, maintain and analyse these store locations. However, since franchises and chain stores in urban areas operate within high-rise, multi-level buildings, a three-dimensional (3D) method is required to locate and identify surrounding information, such as the level on which a franchise unit will be located or whether the unit is on the best level for visibility. One of the analyses commonly used for retrieving surrounding information is Nearest Neighbour (NN) analysis, which takes a point location and identifies its surrounding neighbours. However, with the immense number of urban datasets, retrieving and analysing nearest neighbour information efficiently becomes more complex and more crucial. In this paper, we present a technique to retrieve nearest neighbour information in 3D space using a clustered hierarchical tree structure. Based on our findings, the proposed approach showed a substantial improvement in response time compared with existing spatial access methods in databases. The query performance was tested using a dataset consisting of 500,000 point locations of buildings and franchising units. The results are presented in this paper. Another advantage of this structure is that it offers minimal overlap and coverage among nodes, which can reduce repetitive data entry.

  2. Determination of genetic structure of germplasm collections: are traditional hierarchical clustering methods appropriate for molecular marker data?

    Science.gov (United States)

    Odong, T L; van Heerwaarden, J; Jansen, J; van Hintum, T J L; van Eeuwijk, F A

    2011-07-01

    Despite the availability of newer approaches, traditional hierarchical clustering remains very popular in genetic diversity studies in plants. However, little is known about its suitability for molecular marker data. We studied the performance of traditional hierarchical clustering techniques using real and simulated molecular marker data. Our study also compared the performance of traditional hierarchical clustering with model-based clustering (STRUCTURE). We showed that the cophenetic correlation coefficient is directly related to subgroup differentiation and can thus be used as an indicator of the presence of genetically distinct subgroups in germplasm collections. Whereas UPGMA performed well in preserving distances between accessions, Ward excelled in recovering groups. Our results also showed a close similarity between clusters obtained by Ward and by STRUCTURE. Traditional cluster analysis can provide an easy and effective way of determining structure in germplasm collections using molecular marker data, and the output can be used for sampling core collections or for association studies.
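
    The cophenetic correlation coefficient mentioned above can be illustrated with a small sketch. It assumes a symmetric distance matrix given as nested lists and implements plain UPGMA (average linkage) from scratch; the function names and the toy matrix are illustrative assumptions, not the authors' code or data.

```python
from itertools import combinations
from math import sqrt


def upgma_cophenetic(dist):
    """Run UPGMA (average linkage) on a symmetric distance matrix.

    Returns a dict mapping each pair (i, j) with i < j to its cophenetic
    distance, i.e. the height at which i and j are first merged.
    """
    n = len(dist)
    clusters = {i: [i] for i in range(n)}  # cluster id -> member indices
    # Distances between current clusters, keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i, j in combinations(range(n), 2)}
    coph = {}
    next_id = n
    while len(clusters) > 1:
        (a, b), height = min(d.items(), key=lambda kv: kv[1])
        # Every cross pair of members is joined at this height.
        for i in clusters[a]:
            for j in clusters[b]:
                coph[(min(i, j), max(i, j))] = height
        na, nb = len(clusters[a]), len(clusters[b])
        # Average-linkage update: size-weighted mean of the old distances.
        new_d = {}
        for c in clusters:
            if c in (a, b):
                continue
            dac = d[(min(a, c), max(a, c))]
            dbc = d[(min(b, c), max(b, c))]
            new_d[(c, next_id)] = (na * dac + nb * dbc) / (na + nb)
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        d.update(new_d)
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return coph


def cophenetic_correlation(dist, coph):
    """Pearson correlation between original and cophenetic distances."""
    pairs = sorted(coph)
    x = [dist[i][j] for i, j in pairs]
    y = [coph[p] for p in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / sqrt(sxx * syy)


# Toy distance matrix with two well-separated pairs (made-up values).
toy = [[0, 1, 6, 7], [1, 0, 5, 6], [6, 5, 0, 2], [7, 6, 2, 0]]
print(cophenetic_correlation(toy, upgma_cophenetic(toy)))
```

    A value close to 1 means the tree heights reproduce the original pairwise distances well, which is the sense in which a linkage method "preserves distances".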

  3. Delineation of Stenotrophomonas maltophilia isolates from cystic fibrosis patients by fatty acid methyl ester profiles and matrix-assisted laser desorption/ionization time-of-flight mass spectra using hierarchical cluster analysis and principal component analysis.

    Science.gov (United States)

    Vidigal, Pedrina Gonçalves; Mosel, Frank; Koehling, Hedda Luise; Mueller, Karl Dieter; Buer, Jan; Rath, Peter Michael; Steinmann, Joerg

    2014-12-01

    Stenotrophomonas maltophilia is an opportunist multidrug-resistant pathogen that causes a wide range of nosocomial infections. Various cystic fibrosis (CF) centres have reported an increasing prevalence of S. maltophilia colonization/infection among patients with this disease. The purpose of this study was to assess specific fingerprints of S. maltophilia isolates from CF patients (n = 71) by investigating fatty acid methyl esters (FAMEs) through gas chromatography (GC) and highly abundant proteins by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS), and to compare them with isolates obtained from intensive care unit (ICU) patients (n = 20) and the environment (n = 11). Principal component analysis (PCA) of GC-FAME patterns did not reveal a clustering corresponding to distinct CF, ICU or environmental types. Based on the peak area index, it was observed that S. maltophilia isolates from CF patients produced significantly higher amounts of fatty acids in comparison with ICU patients and the environmental isolates. Hierarchical cluster analysis (HCA) based on the MALDI-TOF MS peak profiles of S. maltophilia revealed the presence of five large clusters, suggesting a high phenotypic diversity. Although HCA of MALDI-TOF mass spectra did not result in distinct clusters predominantly composed of CF isolates, PCA revealed the presence of a distinct cluster composed of S. maltophilia isolates from CF patients. Our data suggest that S. maltophilia colonizing CF patients tend to modify not only their fatty acid patterns but also their protein patterns as a response to adaptation in the unfavourable environment of the CF lung. © 2014 The Authors.

  4. THE EVOLUTION OF BRIGHTEST CLUSTER GALAXIES IN A HIERARCHICAL UNIVERSE

    Energy Technology Data Exchange (ETDEWEB)

    Tonini, Chiara; Bernyk, Maksym; Croton, Darren [Centre for Astrophysics and Supercomputing, Swinburne University of Technology, Melbourne, VIC 3122 (Australia)]; Maraston, Claudia; Thomas, Daniel [Institute of Cosmology and Gravitation, University of Portsmouth, Portsmouth PO1 3FX (United Kingdom)]

    2012-11-01

    We investigate the evolution of brightest cluster galaxies (BCGs) from redshift z ≈ 1.6 to z = 0. We upgrade the hierarchical semi-analytic model of Croton et al. with a new spectro-photometric model that produces realistic galaxy spectra, making use of the Maraston stellar populations and a new recipe for the dust extinction. We compare the model predictions of the K-band luminosity evolution and the J - K, V - I, and I - K color evolution with a series of data sets, including those of Collins et al. who argued that semi-analytic models based on the Millennium simulation cannot reproduce the red colors and high luminosity of BCGs at z > 1. We show instead that the model is well in range of the observed luminosity and correctly reproduces the color evolution of BCGs in the whole redshift range up to z ≈ 1.6. We argue that the success of the semi-analytic model is in large part due to the implementation of a more sophisticated spectro-photometric model. An analysis of the model BCGs shows an increase in mass by a factor of 2-3 since z ≈ 1, and star formation activity down to low redshifts. While the consensus regarding BCGs is that they are passively evolving, we argue that this conclusion is affected by the degeneracy between star formation history and stellar population models used in spectral energy distribution fitting, and by the inefficacy of toy models of passive evolution to capture the complexity of real galaxies, especially those with rich merger histories like BCGs. Following this argument, we also show that in the semi-analytic model the BCGs show a realistic mix of stellar populations, and that these stellar populations are mostly old. In addition, the age-redshift relation of the model BCGs follows that of the universe, meaning that given their merger history and star formation history, the ageing of BCGs is always dominated by the ageing of their stellar populations. In a ΛCDM universe, we define such evolution as '...

  5. Multiscale stochastic hierarchical image segmentation by spectral clustering

    Institute of Scientific and Technical Information of China (English)

    LI XiaoBin; TIAN Zheng

    2007-01-01

    This paper proposes a sampling-based hierarchical approach to reduce the computational demands of spectral clustering methods when applied to image segmentation. The authors first define the distance between a pixel and a cluster, and then derive a new theorem to estimate the number of samples needed for clustering. Finally, by introducing a scale parameter into the similarity function, a novel spectral clustering based image segmentation method is developed. An important characteristic of the approach is that, in the course of image segmentation, one needs not only to tune the scale parameter to merge small clusters or split large clusters but also to take samples from the data set at the different scales. The multiscale and stochastic nature makes it feasible to apply the method to very large grouping problems. In addition, it allows the segmentation to be computed in time that is linear in the size of the image. The experimental results on various synthetic and real-world images show the effectiveness of the approach.

  6. An agglomerative hierarchical approach to visualization in Bayesian clustering problems.

    Science.gov (United States)

    Dawson, K J; Belkhir, K

    2009-07-01

    Clustering problems (including the clustering of individuals into outcrossing populations, hybrid generations, full-sib families and selfing lines) have recently received much attention in population genetics. In these clustering problems, the parameter of interest is a partition of the set of sampled individuals--the sample partition. In a fully Bayesian approach to clustering problems of this type, our knowledge about the sample partition is represented by a probability distribution on the space of possible sample partitions. As the number of possible partitions grows very rapidly with the sample size, we cannot visualize this probability distribution in its entirety, unless the sample is very small. As a solution to this visualization problem, we recommend using an agglomerative hierarchical clustering algorithm, which we call the exact linkage algorithm. This algorithm is a special case of the maximin clustering algorithm that we introduced previously. The exact linkage algorithm is now implemented in our software package PartitionView. The exact linkage algorithm takes the posterior co-assignment probabilities as input and yields as output a rooted binary tree, or more generally, a forest of such trees. Each node of this forest defines a set of individuals, and the node height is the posterior co-assignment probability of this set. This provides a useful visual representation of the uncertainty associated with the assignment of individuals to categories. It is also a useful starting point for a more detailed exploration of the posterior distribution in terms of the co-assignment probabilities.
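
    The exact linkage algorithm itself is not reproduced here. As a hedged illustration of its input only, the sketch below estimates pairwise posterior co-assignment probabilities, assuming that posterior samples of the partition (for example from an MCMC run) are available as lists of cluster labels. The function name and the sample data are hypothetical, and restricting to pairs is a simplification of the set-level probabilities described above.

```python
from itertools import combinations


def co_assignment_probabilities(sampled_partitions):
    """Estimate P(i and j share a cluster) from sampled partitions.

    Each partition is a list of cluster labels, one label per individual.
    Returns a dict keyed by pairs (i, j) with i < j.
    """
    if not sampled_partitions:
        raise ValueError("at least one sampled partition is required")
    n = len(sampled_partitions[0])
    counts = {pair: 0 for pair in combinations(range(n), 2)}
    for labels in sampled_partitions:
        if len(labels) != n:
            raise ValueError("all partitions must cover the same individuals")
        for i, j in counts:
            if labels[i] == labels[j]:
                counts[(i, j)] += 1
    total = len(sampled_partitions)
    return {pair: c / total for pair, c in counts.items()}


# Four made-up posterior samples for three individuals.
samples = [[0, 0, 1], [0, 0, 0], [0, 1, 1], [0, 0, 1]]
for pair, prob in co_assignment_probabilities(samples).items():
    print(pair, prob)
```

    Only label equality within a sample matters, so the estimate is unaffected by label switching between samples.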

  7. Clustering-based classification of road traffic accidents using hierarchical clustering and artificial neural networks.

    Science.gov (United States)

    Taamneh, Madhar; Taamneh, Salah; Alkheder, Sharaf

    2017-09-01

    Artificial neural networks (ANNs) have been widely used in predicting the severity of road traffic crashes. All available information about previous accidents is typically used for building a single prediction model (i.e., classifier). Too little attention has been paid to the differences between these accidents, which in most cases leads to less accurate predictors. Hierarchical clustering is a well-known clustering method that seeks to group data by creating a hierarchy of clusters. Using hierarchical clustering and ANNs, a clustering-based classification approach for predicting the injury severity of road traffic accidents was proposed. About 6000 road accidents that occurred over a six-year period from 2008 to 2013 in Abu Dhabi were used throughout this study. In order to reduce the amount of variation in the data, hierarchical clustering was applied to the data set to organize it into six different forms, each with a different number of clusters (i.e., from 1 to 6 clusters). Two ANN models were subsequently built for each cluster of accidents in each generated form. The first model was built and validated using all accidents (training set), whereas only 66% of the accidents were used to build the second model, and the remaining 34% were used to test it (percentage split). Finally, the weighted average accuracy was computed for each type of model in each form of the data. The results show that when testing the models using the training set, clustering prior to classification achieves 11%-16% more accuracy than classification without clustering, while the percentage split achieves 2%-5% more accuracy. The results also suggest that partitioning the accidents into six clusters achieves the best accuracy if both types of models are taken into account.
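
    The abstract does not state how the weighted average accuracy is weighted. A minimal sketch, assuming each cluster's accuracy is weighted by the number of accidents in that cluster, could look as follows; the function name and the numbers are illustrative only and are not results from the study.

```python
def weighted_average_accuracy(cluster_sizes, cluster_accuracies):
    """Average per-cluster accuracies, weighting each by its cluster size."""
    if len(cluster_sizes) != len(cluster_accuracies):
        raise ValueError("one accuracy is needed per cluster")
    total = sum(cluster_sizes)
    if total == 0:
        raise ValueError("clusters must contain at least one record")
    weighted = sum(n * a for n, a in zip(cluster_sizes, cluster_accuracies))
    return weighted / total


# Hypothetical form with three clusters (sizes and accuracies are made up).
print(weighted_average_accuracy([3000, 2000, 1000], [0.80, 0.70, 0.90]))
```

    Weighting by size keeps a small, easy cluster from dominating the overall figure, which is one reason to prefer it over a plain mean of the per-cluster accuracies.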

  8. Clustering dynamic textures with the hierarchical EM algorithm for modeling video.

    Science.gov (United States)

    Mumtaz, Adeel; Coviello, Emanuele; Lanckriet, Gert R G; Chan, Antoni B

    2013-07-01

    Dynamic texture (DT) is a probabilistic generative model, defined over space and time, that represents a video as the output of a linear dynamical system (LDS). The DT model has been applied to a wide variety of computer vision problems, such as motion segmentation, motion classification, and video registration. In this paper, we derive a new algorithm for clustering DT models that is based on the hierarchical EM algorithm. The proposed clustering algorithm is capable of both clustering DTs and learning novel DT cluster centers that are representative of the cluster members in a manner that is consistent with the underlying generative probabilistic model of the DT. We also derive an efficient recursive algorithm for sensitivity analysis of the discrete-time Kalman smoothing filter, which is used as the basis for computing expectations in the E-step of the HEM algorithm. Finally, we demonstrate the efficacy of the clustering algorithm on several applications in motion analysis, including hierarchical motion clustering, semantic motion annotation, and learning bag-of-systems (BoS) codebooks for dynamic texture recognition.

  9. The Hierarchical Clustering of Tax Burden in the EU27

    Directory of Open Access Journals (Sweden)

    Simkova Nikola

    2015-09-01

    Full Text Available The issue of taxation has become more important because taxes account for a significant share of government revenue. There are several ways of expressing the tax burden of countries. This paper describes the traditional approach, the ratio of tax revenue to GDP, which is applied to total taxation and to capital taxation as the part of tax systems affecting investment decisions. The implicit tax rate on capital created by Eurostat also offers a possible explanation of the tax burden on capital, so its components are analysed in detail. This study uses an econometric method called hierarchical clustering. The data on which the clustering is based cover the EU27 countries for the period 1995-2012. The aim of this paper is to reveal clusters of EU27 countries with a similar tax burden or similar tax changes. The findings suggest that mainly newly acceding countries (2004 and 2007) are in a group of countries with a low tax burden, which tried to encourage investors with favourable tax rates. The opposite group consists mostly of countries from the original EU15. Some clusters may be explained by similar historical development, geographic and demographic characteristics.

  10. Hierarchical star cluster assembly in globally collapsing molecular clouds

    Science.gov (United States)

    Vázquez-Semadeni, Enrique; González-Samaniego, Alejandro; Colín, Pedro

    2017-05-01

    We discuss the mechanism of cluster formation in a numerical simulation of a molecular cloud (MC) undergoing global hierarchical collapse, focusing on how the gas motions in the parent cloud control the assembly of the cluster. The global collapse implies that the star formation rate (SFR) increases over time. The collapse is hierarchical because it consists of small-scale collapses within larger scale ones. The latter culminate a few Myr later than the first small-scale ones and consist of filamentary flows that accrete on to massive central clumps. The small-scale collapses consist of clumps that are embedded in the filaments and falling on to the large-scale collapse centres. The stars formed in the early, small-scale collapses share the infall motion of their parent clumps, so that the filaments feed both gas and stars to the massive central clump. This process leads to the presence of a few older stars in a region where new protostars are forming, and also to a self-similar structure, in which each unit is composed of smaller scale subunits that approach each other and may merge. Because the older stars formed in the filaments share the infall motion of the gas on to the central clump, they tend to have larger velocities and to be distributed over larger areas than the younger stars formed in the central clump. Finally, interpreting the initial mass function (IMF) simply as a probability distribution implies that massive stars only form once the local SFR is large enough to sample the IMF up to high masses. In combination with the increase of the SFR, this implies that massive stars tend to appear late in the evolution of the MC, and only in the central massive clumps. We discuss the correspondence of these features with observed properties of young stellar clusters, finding very good qualitative agreement.

  11. Multilevel hierarchical kernel spectral clustering for real-life large scale complex networks.

    Directory of Open Access Journals (Sweden)

    Raghvendra Mall

    Full Text Available Kernel spectral clustering corresponds to a weighted kernel principal component analysis problem in a constrained optimization framework. The primal formulation leads to an eigen-decomposition of a centered Laplacian matrix at the dual level. The dual formulation allows a model to be built on a representative subgraph of the large scale network in the training phase, and the model parameters are estimated in the validation stage. The KSC model has a powerful out-of-sample extension property which allows cluster affiliation for the unseen nodes of the big data network. In this paper we exploit the structure of the projections in the eigenspace during the validation stage to automatically determine a set of increasing distance thresholds. We use these distance thresholds in the test phase to obtain multiple levels of hierarchy for the large scale network. The hierarchical structure in the network is determined in a bottom-up fashion. We show empirically that real-world networks have multilevel hierarchical organization which cannot be detected efficiently by several state-of-the-art large scale hierarchical community detection techniques like the Louvain, OSLOM and Infomap methods. We show that a major advantage of our proposed approach is the ability to locate good quality clusters at both the finer and coarser levels of hierarchy using internal cluster quality metrics on 7 real-life networks.

  12. Hierarchical analysis of acceptable use policies

    Directory of Open Access Journals (Sweden)

    P. A. Laughton

    2008-01-01

    Full Text Available Acceptable use policies (AUPs) are vital tools for organizations to protect themselves and their employees from misuse of the computer facilities provided. A well structured, thorough AUP is essential for any organization. It is impossible for an effective AUP to deal with every clause and remain readable. For this reason, some sections of an AUP carry more weight than others, denoting importance. The methodology used to develop the hierarchical analysis is a literature review, where various sources were consulted. This hierarchical approach to AUP analysis attempts to highlight important sections and clauses dealt with in an AUP. The emphasis of the hierarchical analysis is to prioritize the objectives of an AUP.

  13. Hierarchical modeling and analysis for spatial data

    CERN Document Server

    Banerjee, Sudipto; Gelfand, Alan E

    2003-01-01

    Among the many uses of hierarchical modeling, their application to the statistical analysis of spatial and spatio-temporal data from areas such as epidemiology and environmental science has proven particularly fruitful. Yet to date, the few books that address the subject have been either too narrowly focused on specific aspects of spatial analysis, or written at a level often inaccessible to those lacking a strong background in mathematical statistics. Hierarchical Modeling and Analysis for Spatial Data is the first accessible, self-contained treatment of hierarchical methods, modeling, and dat

  14. HCsnip: An R Package for Semi-supervised Snipping of the Hierarchical Clustering Tree.

    Science.gov (United States)

    Obulkasim, Askar; van de Wiel, Mark A

    2015-01-01

    Hierarchical clustering (HC) is one of the most frequently used methods in computational biology in the analysis of high-dimensional genomics data. Given a data set, HC outputs a binary tree whose leaves are the data points and whose internal nodes represent clusters of various sizes. Normally, a fixed-height cut on the HC tree is chosen, and each contiguous branch of data points below that height is considered as a separate cluster. However, the fixed-height branch cut may not be ideal in situations where one expects a complicated tree structure with nested clusters. Furthermore, because related background information is not used in selecting the cutoff, the induced clusters are often difficult to interpret. This paper describes a novel procedure that aims to automatically extract meaningful clusters from the HC tree in a semi-supervised way. The procedure is implemented in the R package HCsnip available from Bioconductor. Rather than cutting the HC tree at a fixed height, HCsnip probes the various ways of snipping, possibly at variable heights, to tease out hidden clusters ensconced deep down in the tree. The cluster extraction process utilizes, along with the data set from which the HC tree is derived, commonly available background information. Consequently, the extracted clusters are highly reproducible and robust against the various sources of variation that "haunt" high-dimensional genomics data. Since the clustering process is guided by the background information, clusters are easy to interpret. Unlike existing packages, no constraint is placed on the data type on which clustering is desired. In particular, the package accepts patient follow-up data for guiding the cluster extraction process. To our knowledge, HCsnip is the first package that is able to decompose the HC tree into clusters with piecewise snipping under the guidance of patient time-to-event information.
Our implementation of the semi-supervised HC tree snipping framework is generic, and can
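
    For contrast with the variable-height snipping described above, the conventional fixed-height cut can be sketched in a few lines. This is not HCsnip (an R package) and does not use its API; the tree encoding, with internal nodes as `(height, left, right)` tuples and leaves as plain labels, is an assumption made only for this illustration.

```python
def leaves(node):
    """Collect the data points under a node of the tree."""
    if not isinstance(node, tuple):
        return [node]  # a leaf is any non-tuple label
    _, left, right = node
    return leaves(left) + leaves(right)


def cut_at_height(node, height):
    """Fixed-height cut: every subtree merged at or below `height`
    becomes one cluster; taller nodes are split into their children."""
    if not isinstance(node, tuple) or node[0] <= height:
        return [leaves(node)]
    _, left, right = node
    return cut_at_height(left, height) + cut_at_height(right, height)


# Hypothetical tree: a and b merge at 1.0, c and d at 2.0, everything at 6.0.
tree = (6.0, (1.0, "a", "b"), (2.0, "c", "d"))
for h in (1.5, 3.0, 10.0):
    print(h, cut_at_height(tree, h))
```

    Because a single threshold is applied to every branch, a nested cluster that merges just above the cut is split in the same way as an unrelated one, which is the limitation the abstract points to.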

  15. An Automatic Hierarchical Delay Analysis Tool

    Institute of Scientific and Technical Information of China (English)

    Farid Mheir-El-Saadi; Bozena Kaminska

    1994-01-01

    The performance analysis of VLSI integrated circuits (ICs) with flat tools is slow and sometimes even impossible to complete. Some hierarchical tools have been developed to speed up the analysis of these large ICs. However, these hierarchical tools suffer from poor interaction with the CAD database and poorly automated operations. We introduce a general hierarchical framework for performance analysis to solve these problems. The circuit analysis is automatic under the proposed framework. Information that has been automatically abstracted in the hierarchy is kept in database properties along with the topological information. A limited software implementation of the framework, PREDICT, has also been developed to analyze the delay performance. Experimental results show that hierarchical analysis CPU time and memory requirements are low if heuristics are used during the abstraction process.

  16. Exploiting Homogeneity of Density in Incremental Hierarchical Clustering

    Directory of Open Access Journals (Sweden)

    Dwi H. Widiyantoro

    2006-11-01

    Full Text Available Hierarchical clustering is an important tool in many applications. As it involves a large data set that proliferates over time, reclustering the data set periodically is not an efficient process. Therefore, the ability to incorporate a new data set incrementally into an existing hierarchy becomes increasingly demanding. This article describes Homogen, a system that employs a new algorithm for generating a hierarchy of concepts and clusters incrementally from a stream of observations. The system aims to construct a hierarchy that satisfies the homogeneity and the monotonicity properties. Working in a bottom-up fashion, a new observation is placed in the hierarchy and a sequence of hierarchy restructuring processes is performed only in regions that have been affected by the presence of the new observation. Additionally, it combines multiple restructuring techniques that address different restructuring objectives to get a synergistic effect. The system has been tested on a variety of domains including structured and unstructured data sets. The experimental results reveal that the system is able to construct a concept hierarchy that is consistent regardless of the input data order and whose quality is comparable to the quality of those produced by non incremental clustering algorithms.

  17. Lyman Alpha Emitters in the Hierarchically Clustering Galaxy Formation

    CERN Document Server

    Kobayashi, Masakazu A R; Nagashima, Masahiro

    2007-01-01

    We present a new theoretical model for the luminosity functions (LFs) of Lyman alpha (Lya) emitting galaxies in the framework of hierarchical galaxy formation. We extend a semi-analytic model of galaxy formation that reproduces a number of observations for local galaxies, without changing the original model parameters but introducing a physically-motivated modelling to describe the escape fraction of Lya photons from host galaxies (f_esc). Though a previous study using a hierarchical clustering model simply assumed a constant and universal value of f_esc, we incorporate two new effects on f_esc: extinction by interstellar dust and galaxy-scale outflow induced as a star formation feedback. It is found that the new model nicely reproduces all the observed Lya LFs of the Lya emitters (LAEs) at different redshifts in z ~ 3--6. Our model predicts that galaxies with strong outflows and f_esc ~ 1 are dominant in the observed LFs, which is consistent with available observations while the simple universal f_esc model ...

  18. The structure of dark matter halos in hierarchical clustering theories

    CERN Document Server

    Subramanian, Kandaswamy; Cen, Renyue; Ostriker, Jeremiah P.

    1999-01-01

    During hierarchical clustering, smaller masses generally collapse earlier than larger masses and so are denser on the average. The core of a small mass halo could be dense enough to resist disruption and survive undigested, when it is incorporated into a bigger object. We explore the possibility that a nested sequence of undigested cores in the center of the halo, which have survived the hierarchical, inhomogeneous collapse to form larger and larger objects, determines the halo structure in the inner regions. For a flat universe with $P(k) \propto k^n$, scaling arguments then suggest that the core density profile is $\rho \propto r^{-\alpha}$ with $\alpha = (9+3n)/(5+n)$. But whether such behaviour obtains depends on detailed dynamics. We first examine the dynamics using a fluid approach to the self-similar collapse solutions for the dark matter phase space density, including the effect of velocity dispersions. We highlight the importance of tangential velocity dispersions to obtain density profiles shallowe...
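
    The scaling relation quoted above, alpha = (9 + 3n)/(5 + n), is easy to tabulate. The sketch below only evaluates that formula with exact fractions for a few spectral indices n; the function name and the chosen values of n are illustrative, and, as the abstract notes, whether the profile is actually realized depends on the detailed dynamics.

```python
from fractions import Fraction


def core_slope(n):
    """Return alpha = (9 + 3n) / (5 + n) for a power spectrum P(k) ~ k^n."""
    n = Fraction(n)
    if n == -5:
        raise ValueError("alpha is undefined for n = -5")
    return (9 + 3 * n) / (5 + n)


# Tabulate the predicted inner slope for a few illustrative spectral indices.
for n in (-2, -1, 0, 1):
    print(n, core_slope(n))
```

    For these indices the formula gives slopes between 1 and 2, steepening as n increases.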

  19. Hierarchical Compressed Sensing for Cluster Based Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Vishal Krishna Singh

    2016-02-01

    Full Text Available Data transmission consumes a significant amount of energy in large scale wireless sensor networks (WSNs). In such an environment, reducing the in-network communication and distributing the load evenly over the network can reduce the overall energy consumption and maximize the network lifetime significantly. In this work, the aforementioned problem of network lifetime and uneven energy consumption in large scale wireless sensor networks is addressed. This work proposes a hierarchical compressed sensing (HCS) scheme to reduce the in-network communication during the data gathering process. Correlated sensor readings are collected via a hierarchical clustering scheme. A compressed sensing (CS) based data processing scheme is devised to transmit the data from the source to the sink. The proposed HCS is able to identify the optimal position for the application of CS to achieve a reduced and similar number of transmissions on all the nodes in the network. An activity map is generated to validate the reduced and uniformly distributed communication load of the WSN. Based on the number of transmissions per data gathering round, the bit-hop metric model is used to analyse the overall energy consumption. Simulation results validate the efficiency of the proposed method over the existing CS based approaches.

  20. Hand Tracking based on Hierarchical Clustering of Range Data

    CERN Document Server

    Cespi, Roberto; Lindner, Marvin

    2011-01-01

    Fast and robust hand segmentation and tracking is an essential basis for gesture recognition and thus an important component for contact-less human-computer interaction (HCI). Hand gesture recognition based on 2D video data has been intensively investigated. However, in practical scenarios purely intensity based approaches suffer from uncontrollable environmental conditions like cluttered background colors. In this paper we present a real-time hand segmentation and tracking algorithm using Time-of-Flight (ToF) range cameras and intensity data. The intensity and range information is fused into one pixel value, representing its combined intensity-depth homogeneity. The scene is hierarchically clustered using a GPU based parallel merging algorithm, allowing a robust identification of both hands even for inhomogeneous backgrounds. After the detection, both hands are tracked on the CPU. Our tracking algorithm can cope with the situation that one hand is temporarily covered by the other hand.

  1. Bayesian hierarchical clustering for studying cancer gene expression data with unknown statistics.

    Directory of Open Access Journals (Sweden)

    Korsuk Sirinukunwattana

    Full Text Available Clustering analysis is an important tool in studying gene expression data. The Bayesian hierarchical clustering (BHC) algorithm can automatically infer the number of clusters and uses Bayesian model selection to improve clustering quality. In this paper, we present an extension of the BHC algorithm. Our Gaussian BHC (GBHC) algorithm represents data as a mixture of Gaussian distributions. It uses a normal-gamma distribution as a conjugate prior on the mean and precision of each of the Gaussian components. We tested GBHC over 11 cancer and 3 synthetic datasets. The results on cancer datasets show that in sample clustering, GBHC on average produces a clustering partition that is more concordant with the ground truth than those obtained from other commonly used algorithms. Furthermore, GBHC frequently infers a number of clusters that is close to the ground truth. In gene clustering, GBHC also produces a clustering partition that is more biologically plausible than several other state-of-the-art methods. This suggests GBHC as an alternative tool for studying gene expression data. The implementation of GBHC is available at https://sites.google.com/site/gaussianbhc/

  2. A Framework for Analyzing Software Quality using Hierarchical Clustering

    Directory of Open Access Journals (Sweden)

    Arashdeep Kaur

    2011-02-01

    Full Text Available Fault proneness data available in the early software life cycle from previous releases or similar kinds of projects will aid in improving software quality estimations. Various techniques have been proposed in the literature, including statistical methods, machine learning methods, neural network techniques and clustering techniques, for the prediction of faulty and non-faulty modules in a project. In this study, a hierarchical clustering algorithm is trained and tested as a predictive model with lifecycle data collected from the NASA projects CM1, PC1 and JM1. These predictive models contain requirement metrics and static code metrics. We have combined the requirement metric model with the static code metric model to get a fusion metric model. Further, we have investigated which of the three prediction models is the best on the basis of fault detection. The basic hypothesis of software quality estimation is that automatic quality prediction models enable verification experts to concentrate their attention and resources on problem areas of the system under development. The proposed approach has been implemented in MATLAB 7.4. The results show that when all the prediction techniques are evaluated, the best prediction model is found to be the fusion metric model. This proposed model is also compared with other quality models available in the literature and is found to be efficient for predicting faulty modules.

  3. Using Dynamic Quantum Clustering to Analyze Hierarchically Heterogeneous Samples on the Nanoscale

    Energy Technology Data Exchange (ETDEWEB)

    Hume, Allison; /Princeton U. /SLAC

    2012-09-07

    Dynamic Quantum Clustering (DQC) is an unsupervised, highly visual data mining technique. DQC was tested as an analysis method for X-ray Absorption Near Edge Structure (XANES) data from the Transmission X-ray Microscopy (TXM) group. The TXM group images hierarchically heterogeneous materials with nanoscale resolution and a large field of view. XANES data consist of energy spectra for each pixel of an image. It was determined that DQC successfully identifies structure in data of this type without prior knowledge of the components in the sample. Clusters and sub-clusters clearly reflected features of the spectra that identified chemical component, chemical environment, and density in the image. DQC can also be used in conjunction with the established data analysis technique, which does require knowledge of the components present.

  4. Determination of genetic structure of germplasm collections: are traditional hierarchical clustering methods appropriate for molecular marker data?

    NARCIS (Netherlands)

    Odong, T.L.; Heerwaarden, van J.; Jansen, J.; Hintum, van T.J.L.; Eeuwijk, van F.A.

    2011-01-01

    Despite the availability of newer approaches, traditional hierarchical clustering remains very popular in genetic diversity studies in plants. However, little is known about its suitability for molecular marker data. We studied the performance of traditional hierarchical clustering techniques using

  5. Hierarchical Analysis of the Omega Ontology

    Energy Technology Data Exchange (ETDEWEB)

    Joslyn, Cliff A.; Paulson, Patrick R.

    2009-12-01

    Initial delivery for mathematical analysis of the Omega Ontology. We provide an analysis of the hierarchical structure of a version of the Omega Ontology currently in use within the US Government. After providing an initial statistical analysis of the distribution of all link types in the ontology, we then provide a detailed order theoretical analysis of each of the four main hierarchical links present. This order theoretical analysis includes the distribution of components and their properties, their parent/child and multiple inheritance structure, and the distribution of their vertical ranks.

  6. Evolutionary-Hierarchical Bases of the Formation of Cluster Model of Innovation Economic Development

    Directory of Open Access Journals (Sweden)

    Yuliya Vladimirovna Dubrovskaya

    2016-10-01

    Full Text Available The functioning of a modern economic system is based on the interaction of objects of different hierarchical levels. Thus, the problem of studying innovation processes while taking into account the mutual influence of the activities of these economic actors becomes important. The paper dwells on the evolutionary basis for the formation of models of innovation development on the basis of micro- and macroeconomic analysis. Most of the concepts recognize that, despite a large number of diverse models, the coordination of the relations between economic agents is of crucial importance for successful innovation development. According to the results of the evolutionary-hierarchical analysis, the authors reveal key phases in the development of forms of cooperation between business, science and government in the domestic economy. This has become the starting point for conceptualizing the characteristics of interaction in cluster models of innovation development of the economy. Considerable expectations for the improvement of the national innovation system are connected with the development of cluster and network structures. The main objective of government authorities is the formation of mechanisms and institutions that will foster cooperation between members of the clusters. The article explains that clusters cannot become factors in the growth of the national economy without being an effective tool for interaction between the actors of the regional innovation systems.

  7. Hierarchical Cluster Analysis of Three-Dimensional Reconstructions of Unbiased Sampled Microglia Shows not Continuous Morphological Changes from Stage 1 to 2 after Multiple Dengue Infections in Callithrix penicillata

    Science.gov (United States)

    Diniz, Daniel G.; Silva, Geane O.; Naves, Thaís B.; Fernandes, Taiany N.; Araújo, Sanderson C.; Diniz, José A. P.; de Farias, Luis H. S.; Sosthenes, Marcia C. K.; Diniz, Cristovam G.; Anthony, Daniel C.; da Costa Vasconcelos, Pedro F.; Picanço Diniz, Cristovam W.

    2016-01-01

    It is known that microglial morphology and function are related, but few studies have explored the subtleties of microglial morphological changes in response to specific pathogens. In the present report we quantitated microglia morphological changes in a monkey model of dengue disease with virus CNS invasion. To mimic multiple infections that usually occur in endemic areas, where higher dengue infection incidence and abundant mosquito vectors carrying different serotypes coexist, subjects received once a week subcutaneous injections of DENV3 (genotype III)-infected culture supernatant followed 24 h later by an injection of anti-DENV2 antibody. Control animals received either weekly anti-DENV2 antibodies, or no injections. Brain sections were immunolabeled for DENV3 antigens and IBA-1. Random and systematic microglial samples were taken from the polymorphic layer of dentate gyrus for 3-D reconstructions, where we found intense immunostaining for TNFα and DENV3 virus antigens. We submitted all bi- or multimodal morphological parameters of microglia to hierarchical cluster analysis and found two major morphological phenotypes designated types I and II. Compared to type I (stage 1), type II microglia were more complex; displaying higher number of nodes, processes and trees and larger surface area and volumes (stage 2). Type II microglia were found only in infected monkeys, whereas type I microglia was found in both control and infected subjects. Hierarchical cluster analysis of morphological parameters of 3-D reconstructions of random and systematic selected samples in control and ADE dengue infected monkeys suggests that microglia morphological changes from stage 1 to stage 2 may not be continuous. PMID:27047345

  8. Condition evolution regularity analysis of power transformer in the same family based on improved hierarchical clustering

    Institute of Scientific and Technical Information of China (English)

    李新叶; 李新芳

    2011-01-01

    In integrated condition assessment, the family quality defect history strongly affects the health condition of a power transformer, yet at present its influence is usually determined subjectively from expert experience. A new quantitative method is proposed: hierarchical clustering is used to analyze the condition evolution regularity of transformers in the same family, and the influence of the family quality defect history on transformer health is then computed from the clustering result. To make the clustering more accurate, the slope distance between condition evolution curves is proposed as a similarity criterion for curve shape, and the point-value distance and the slope distance between curves are combined into an intersection constraint criterion for clustering. The experimental results show that the improved method outperforms the traditional hierarchical clustering method, and that computing the influence of family quality defect history on transformer health from the clustering result gives more reasonable values.
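
    The idea of clustering condition curves with both a data distance and a slope distance can be sketched as follows. This is only an illustration under stated assumptions, not the authors' algorithm: the function names, the mean-absolute-difference definitions of both distances, the thresholds, and the use of a single-linkage cut instead of a full dendrogram are all hypothetical choices.

```python
from itertools import combinations


def point_distance(a, b):
    """Mean absolute difference between two equally sampled curves (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def slope_distance(a, b):
    """Mean absolute difference between the successive slopes of two curves (assumed metric)."""
    sa = [a[i + 1] - a[i] for i in range(len(a) - 1)]
    sb = [b[i + 1] - b[i] for i in range(len(b) - 1)]
    return sum(abs(x - y) for x, y in zip(sa, sb)) / len(sa)


def cluster_curves(curves, max_point, max_slope):
    """Link two curves only if BOTH distances are within their thresholds,
    then return the connected groups (a single-linkage cut)."""
    parent = list(range(len(curves)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(curves)), 2):
        close_in_value = point_distance(curves[i], curves[j]) <= max_point
        close_in_shape = slope_distance(curves[i], curves[j]) <= max_slope
        if close_in_value and close_in_shape:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(curves)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical condition scores over four inspections.
curves = [
    [1.0, 2.0, 3.0, 4.0],  # rising
    [1.1, 2.1, 3.1, 4.1],  # rising, close to curve 0
    [2.5, 2.5, 2.5, 2.5],  # flat: similar average level, different shape
    [4.0, 3.0, 2.0, 1.0],  # falling
]
print(cluster_curves(curves, max_point=1.5, max_slope=0.5))  # [[0, 1], [2], [3]]
```

    In this toy data the flat curve is within the value threshold of the rising curves but is kept apart by the slope criterion, which is the kind of separation a shape-aware criterion is meant to provide.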

  9. SHIPS: Spectral Hierarchical clustering for the Inference of Population Structure in genetic studies.

    Science.gov (United States)

    Bouaziz, Matthieu; Paccard, Caroline; Guedj, Mickael; Ambroise, Christophe

    2012-01-01

    Inferring the structure of populations has many applications for genetic research. In addition to providing information for evolutionary studies, it can be used to account for the bias induced by population stratification in association studies. To this end, many algorithms have been proposed to cluster individuals into genetically homogeneous sub-populations. The parametric algorithms, such as Structure, are very popular but their underlying complexity and their high computational cost led to the development of faster parametric alternatives such as Admixture. Alternatives to these methods are the non-parametric approaches. Among this category, AWclust has proven efficient but fails to properly identify population structure for complex datasets. We present in this article a new clustering algorithm called Spectral Hierarchical clustering for the Inference of Population Structure (SHIPS), based on a divisive hierarchical clustering strategy, allowing a progressive investigation of population structure. This method takes genetic data as input to cluster individuals into homogeneous sub-populations and with the use of the gap statistic estimates the optimal number of such sub-populations. SHIPS was applied to a set of simulated discrete and admixed datasets and to real SNP datasets, that are data from the HapMap and Pan-Asian SNP consortium. The programs Structure, Admixture, AWclust and PCAclust were also investigated in a comparison study. SHIPS and the parametric approach Structure were the most accurate when applied to simulated datasets both in terms of individual assignments and estimation of the correct number of clusters. The analysis of the results on the real datasets highlighted that the clusterings of SHIPS were the more consistent with the population labels or those produced by the Admixture program. The performances of SHIPS when applied to SNP data, along with its relatively low computational cost and its ease of use make this method a promising

  10. A supplier selection using a hybrid grey based hierarchical clustering and artificial bee colony

    Directory of Open Access Journals (Sweden)

    Farshad Faezy Razi

    2014-06-01

    Full Text Available Selecting one or a combination of the most suitable potential providers, together with the outsourcing problem, is among the most important strategies in logistics and supply chain management. In this paper, the selection of an optimal combination of suppliers in inventory and supply chain management is studied and analyzed via a multiple attribute decision making approach, data mining and evolutionary optimization algorithms. For supplier selection in the supply chain, suppliers are first clustered by hierarchical clustering according to the studied indexes. Then, within its cluster, each supplier is evaluated through Grey Relational Analysis. Finally, Pareto optimal combinations of supplier rank and cost are obtained using the Artificial Bee Colony meta-heuristic algorithm. A case study is conducted to better describe the new algorithm for selecting multiple supply sources.
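
    The Grey Relational Analysis step can be illustrated with the common textbook form of the grey relational grade. This is a minimal sketch, not the paper's implementation: the function name, the example scores, the distinguishing coefficient rho = 0.5, and the assumptions that every criterion is a benefit criterion (larger is better) and that no criterion is constant across suppliers are all hypothetical.

```python
def grey_relational_grades(scores, rho=0.5):
    """Return one grey relational grade per supplier.

    scores: rows are suppliers, columns are benefit criteria.
    Assumes every column has at least two distinct values.
    """
    n_crit = len(scores[0])
    cols = list(zip(*scores))

    # Min-max normalise each criterion to [0, 1].
    norm = [
        [(row[k] - min(cols[k])) / (max(cols[k]) - min(cols[k])) for k in range(n_crit)]
        for row in scores
    ]

    # Deviation of each supplier from the ideal reference sequence (all ones).
    delta = [[1.0 - v for v in row] for row in norm]
    d_min = min(min(row) for row in delta)
    d_max = max(max(row) for row in delta)

    # Grey relational coefficient per criterion, then its mean as the grade.
    coeff = [[(d_min + rho * d_max) / (d + rho * d_max) for d in row] for row in delta]
    return [sum(row) / n_crit for row in coeff]


# Hypothetical suppliers scored on on-time delivery rate and a quality score.
suppliers = [[0.9, 80], [0.7, 95], [0.5, 60]]
grades = grey_relational_grades(suppliers)
ranking = sorted(range(len(grades)), key=lambda i: grades[i], reverse=True)
print(ranking)  # [0, 1, 2]
```

    A higher grade means the supplier lies closer to the ideal reference on all criteria; in the pipeline described above, such grades would be computed within each cluster before the optimization stage.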

  11. The formation of NGC 3603 young starburst cluster: "prompt" hierarchical assembly or monolithic starburst?

    CERN Document Server

    Banerjee, Sambaran

    2014-01-01

    The formation of very young massive clusters or "starburst" clusters is currently one of the most widely debated topics in astronomy. The classical notion dictates that a star cluster is formed in-situ in a dense molecular gas clump followed by a substantial residual gas expulsion. On the other hand, based on the observed morphologies of many young stellar associations, a hierarchical formation scenario is alternatively suggested. A very young (age $\approx$ 1 Myr), massive ($>10^4 M_\odot$) star cluster like the Galactic NGC 3603 young cluster (HD 97950) is an appropriate testbed for distinguishing between such "monolithic" and "hierarchical" formation scenarios. A recent study by Banerjee and Kroupa (2014) demonstrates that the monolithic scenario remarkably reproduces the HD 97950 cluster. In the present work, we explore the possibility of the formation of the above cluster via hierarchical assembly of subclusters. These subclusters are initially distributed over a wide range of spatial volumes and have vari...

  12. A dynamic hierarchical clustering method for trajectory-based unusual video event detection.

    Science.gov (United States)

    Jiang, Fan; Wu, Ying; Katsaggelos, Aggelos K

    2009-04-01

    The proposed unusual video event detection method is based on unsupervised clustering of object trajectories, which are modeled by hidden Markov models (HMM). The novelty of the method includes a dynamic hierarchical process incorporated in the trajectory clustering algorithm to prevent model overfitting and a 2-depth greedy search strategy for efficient clustering.

  13. Prioritizing the risk of plant pests by clustering methods; self-organising maps, k-means and hierarchical clustering

    Directory of Open Access Journals (Sweden)

    Susan Worner

    2013-09-01

    -means, hierarchical clustering and the incorporation of the SOM analysis into criteria based approaches to assess pest risk.

  14. CLUSTAG & WCLUSTAG: Hierarchical Clustering Algorithms for Efficient Tag-SNP Selection

    Science.gov (United States)

    Ao, Sio-Iong

    More than 6 million single nucleotide polymorphisms (SNPs) in the human genome have been genotyped by the HapMap project. Although only a proportion of these SNPs are functional, all can be considered as candidate markers for indirect association studies to detect disease-related genetic variants. The complete screening of a gene or a chromosomal region is nevertheless an expensive undertaking for association studies. A key strategy for improving the efficiency of association studies is to select a subset of informative SNPs, called tag SNPs, for analysis. In the chapter, hierarchical clustering algorithms have been proposed for efficient tag SNP selection.
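
    Tag SNP selection by hierarchical clustering can be sketched as follows. This is an illustrative sketch only, not the CLUSTAG or WCLUSTAG algorithm: the function names, the use of the squared Pearson correlation of genotype counts as the similarity, complete linkage, the threshold, and the rule for picking the tag are all assumptions, and every SNP is assumed to vary across individuals.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 allele counts)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def select_tag_snps(genotypes, min_r2):
    """Complete-linkage agglomeration: merge while the weakest pairwise r^2
    between two clusters is still at least min_r2, then pick one tag per cluster."""
    ids = list(genotypes)
    r2 = {(a, b): r_squared(genotypes[a], genotypes[b]) for a in ids for b in ids}
    clusters = [[s] for s in ids]

    def linkage(c1, c2):
        return min(r2[a, b] for a in c1 for b in c2)

    while len(clusters) > 1:
        best, i, j = max(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best < min_r2:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]

    # The tag is the member with the highest total r^2 to its own cluster.
    tags = [max(c, key=lambda s: sum(r2[s, t] for t in c)) for c in clusters]
    return clusters, tags


# Hypothetical genotypes for six individuals.
genotypes = {
    "rs1": [0, 1, 2, 0, 1, 2],
    "rs2": [0, 1, 2, 0, 1, 2],
    "rs3": [0, 1, 2, 0, 1, 1],
    "rs4": [2, 0, 1, 1, 0, 2],
}
clusters, tags = select_tag_snps(genotypes, min_r2=0.7)
print(clusters)  # [['rs1', 'rs2', 'rs3'], ['rs4']]
print(tags)      # ['rs1', 'rs4']
```

    Only the tag SNPs would then be genotyped, with the remaining SNPs in each cluster represented indirectly through their correlation with the tag.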

  15. Hierarchical and Non-Hierarchical Linear and Non-Linear Clustering Methods to “Shakespeare Authorship Question”

    Directory of Open Access Journals (Sweden)

    Refat Aljumily

    2015-09-01

    Full Text Available A few literary scholars have long claimed that Shakespeare did not write some of his best plays (history plays and tragedies) and have proposed, at one time or another, various alternative authorship candidates. Most modern-day scholars of Shakespeare have rejected this claim, arguing that there is strong evidence that Shakespeare wrote the plays and poems, since his name appears on them as the author. This has led to a long-running scholarly debate. Stylometry is a fast-growing field often used to attribute authorship to anonymous or disputed texts. Stylometric attempts to resolve this literary puzzle have raised interesting questions over the past few years. The following paper contributes to “the Shakespeare authorship question” by using a mathematically-based methodology to examine the hypothesis that Shakespeare wrote all the disputed plays traditionally attributed to him. More specifically, the mathematically based methodology used here is based on Mean Proximity, as a linear hierarchical clustering method, and on Principal Components Analysis, as a non-hierarchical linear clustering method. It is also based, for the first time in the domain, on Self-Organizing Map U-Matrix and Voronoi Map, as non-linear clustering methods to cover the possibility that our data contains significant non-linearities. The Vector Space Model (VSM) is used to convert texts into vectors in a high dimensional space. The aim is to compare the degrees of similarity within and between limited samples of text (the disputed plays): the various works and plays assumed to have been written by Shakespeare and by possible authors, notably Sir Francis Bacon, Christopher Marlowe, John Fletcher, and Thomas Kyd, where “similarity” is defined in terms of a correlation/distance coefficient measure based on the frequency of usage profiles of function words, word bi-grams, and character triple-grams. The claim that Shakespeare authored all the disputed

  16. Analysis hierarchical model for discrete event systems

    Science.gov (United States)

    Ciortea, E. M.

    2015-11-01

    This paper presents a hierarchical model based on discrete event networks for robotic systems. In the hierarchical approach, the Petri net is analysed as a network spanning the highest conceptual level down to the lowest level of local control. Extended Petri nets are used for modelling and control of complex robotic systems. Such a system is structured, controlled and analysed in this paper using the Visual Object Net ++ package, which is relatively simple and easy to use, and the results are shown as representations that are easy to interpret. The hierarchical structure of the robotic system is implemented on computers and analysed using specialized programs. Implementing the hierarchical discrete event system model as a real-time operating system on a computer network connected via a serial bus is possible, where each computer is dedicated to the local control and Petri model of one subsystem of the global robotic system. Since Petri models are simple enough to run on general-purpose computers, analysis, modelling and control of complex manufacturing systems can be achieved using Petri nets. Discrete event systems are a pragmatic tool for modelling industrial systems, and Petri nets are used for system modelling because the system under study is a discrete event system. To highlight auxiliary times, the Petri model of the transport stream is divided into hierarchical levels, and the sections are analysed successively. Simulation of the proposed robotic system using timed Petri nets offers the opportunity to view the robotic timing. From the transport and transmission times obtained by spot measurement, graphics are obtained showing the average time for the transport activity, using the parameter sets of the finished products individually.

  17. Semi-supervised consensus clustering for gene expression data analysis

    OpenAIRE

    Wang, Yunli; Pan, Youlian

    2014-01-01

    Background Simple clustering methods such as hierarchical clustering and k-means are widely used for gene expression data analysis; but they are unable to deal with noise and high dimensionality associated with the microarray gene expression data. Consensus clustering appears to improve the robustness and quality of clustering results. Incorporating prior knowledge in clustering process (semi-supervised clustering) has been shown to improve the consistency between the data partitioning and do...

  18. Marketing research cluster analysis

    Directory of Open Access Journals (Sweden)

    Marić Nebojša

    2002-01-01

    Full Text Available One area of applications of cluster analysis in marketing is identification of groups of cities and towns with similar demographic profiles. This paper considers main aspects of cluster analysis by an example of clustering 12 cities with the use of Minitab software.

  19. Demographic Data Assessment using Novel 3DCCOM Spatial Hierarchical Clustering: A Case Study of Sonipat Block, Haryana

    Directory of Open Access Journals (Sweden)

    Mamta Malik

    2011-09-01

    Full Text Available Cluster detection is a tool employed by GIS scientists who specialize in the field of spatial analysis. This study employed a combination of GIS, RS and a novel 3DCCOM spatial data clustering algorithm to assess the rural demographic development strategies of Sonepat block, Haryana, India. The study was undertaken in a rural and rural-based district in India to demonstrate the integration of village-level spatial and non-spatial data in a GIS environment using hierarchical clustering. Spatial clusters are considered for living standard parameters, including family members, male and female population, sex ratio, and total male and female education ratio. The paper also envisages the future development and usefulness of this community GIS spatial data clustering tool for grass-root level planning. Any data that shows geographic (spatial) variability can be subject to cluster analysis.

  20. Hierarchical Clustering Algorithm based on Attribute Dependency for Attention Deficit Hyperactive Disorder

    Directory of Open Access Journals (Sweden)

    J Anuradha

    2014-05-01

    Full Text Available Attention Deficit Hyperactive Disorder (ADHD) is a disruptive neurobehavioral disorder characterized by abnormal behavioral patterns in attention, perusing activity, acting impulsively and combined types. It is predominant among school-going children and it is tricky to differentiate between an active child and a child with ADHD. Misdiagnosis and undiagnosed cases are very common. Behavior patterns are identified by the mentors in the academic environment, who lack skills in screening those children. Hence an unsupervised learning algorithm can cluster the behavioral patterns of children at school for diagnosis of ADHD. In this paper, we propose a hierarchical clustering algorithm to partition the dataset based on attribute dependency (HCAD). HCAD forms clusters of data based on the highly dependent attributes and their equivalence relation. It is capable of handling large volumes of data with reasonably faster clustering than most of the existing algorithms. It can work on both labeled and unlabeled data sets. Experimental results reveal that this algorithm has higher accuracy in comparison to other algorithms. HCAD achieves 97% cluster purity in diagnosing ADHD. Empirical analysis of the application of HCAD on different data sets from the UCI repository is provided.

  1. Cluster Correspondence Analysis

    NARCIS (Netherlands)

    M. van de Velden (Michel); A. Iodice D' Enza; F. Palumbo

    2014-01-01

    A new method is proposed that combines dimension reduction and cluster analysis for categorical data. A least-squares objective function is formulated that approximates the cluster by variables cross-tabulation. Individual observations are assigned to clusters

  2. The Hierarchical Distribution of the Young Stellar Clusters in Six Local Star-forming Galaxies

    Science.gov (United States)

    Grasha, K.; Calzetti, D.; Adamo, A.; Kim, H.; Elmegreen, B. G.; Gouliermis, D. A.; Dale, D. A.; Fumagalli, M.; Grebel, E. K.; Johnson, K. E.; Kahre, L.; Kennicutt, R. C.; Messa, M.; Pellerin, A.; Ryon, J. E.; Smith, L. J.; Shabani, F.; Thilker, D.; Ubeda, L.

    2017-05-01

    We present a study of the hierarchical clustering of the young stellar clusters in six local (3-15 Mpc) star-forming galaxies using Hubble Space Telescope broadband WFC3/UVIS UV and optical images from the Treasury Program LEGUS (Legacy ExtraGalactic UV Survey). We identified 3685 likely clusters and associations, each visually classified by their morphology, and we use the angular two-point correlation function to study the clustering of these stellar systems. We find that the spatial distribution of the young clusters and associations are clustered with respect to each other, forming large, unbound hierarchical star-forming complexes that are in general very young. The strength of the clustering decreases with increasing age of the star clusters and stellar associations, becoming more homogeneously distributed after ~40-60 Myr and on scales larger than a few hundred parsecs. In all galaxies, the associations exhibit a global behavior that is distinct and more strongly correlated from compact clusters. Thus, populations of clusters are more evolved than associations in terms of their spatial distribution, traveling significantly from their birth site within a few tens of Myr, whereas associations show evidence of disruption occurring very quickly after their formation. The clustering of the stellar systems resembles that of a turbulent interstellar medium that drives the star formation process, correlating the components in unbound star-forming complexes in a hierarchical manner, dispersing shortly after formation, suggestive of a single, continuous mode of star formation across all galaxies.

  3. Content Based Image Retrieval using Hierarchical and K-Means Clustering Techniques

    Directory of Open Access Journals (Sweden)

    V.S.V.S. Murthy

    2010-03-01

    Full Text Available In this paper we present an image retrieval system that takes an image as the input query and retrieves images based on image content. Content Based Image Retrieval is an approach for retrieving semantically-relevant images from an image database based on automatically-derived image features. The unique aspect of the system is the utilization of hierarchical and k-means clustering techniques. The proposed procedure consists of two stages. First, hierarchical clustering is used to filter out most of the images; then k-means is applied to the remaining clustered images so that better retrieval results are obtained.

  4. 3D Nearest Neighbour Search Using a Clustered Hierarchical Tree Structure

    DEFF Research Database (Denmark)

    Suhaibah, A.; Uznir, U.; Antón Castro, Francesc/François

    2016-01-01

    , with the immense number of urban datasets, the retrieval and analysis of nearest neighbour information and their efficiency will become more complex and crucial. In this paper, we present a technique to retrieve nearest neighbour information in 3D space using a clustered hierarchical tree structure. Based on our...... findings, the proposed approach substantially showed an improvement of response time analysis compared to existing approaches of spatial access methods in databases. The query performance was tested using a dataset consisting of 500,000 point locations building and franchising unit. The results...... of the franchise unit will be located or is the franchise unit located is at the best level for visibility purposes. One of the common used analyses used for retrieving the surrounding information is Nearest Neighbour (NN) analysis. It uses a point location and identifies the surrounding neighbours. However...

  5. Hierarchical Control for Multiple DC-Microgrids Clusters

    DEFF Research Database (Denmark)

    Shafiee, Qobad; Dragicevic, Tomislav; Vasquez, Juan Carlos

    2014-01-01

    DC microgrids (MGs) have gained research interest during the recent years because of many potential advantages as compared to the ac system. To ensure reliable operation of a low-voltage dc MG as well as its intelligent operation with the other DC MGs, a hierarchical control is proposed...

  6. CLEAN: CLustering Enrichment ANalysis

    Directory of Open Access Journals (Sweden)

    Medvedovic Mario

    2009-07-01

    Full Text Available Abstract Background Integration of biological knowledge encoded in various lists of functionally related genes has become one of the most important aspects of analyzing genome-wide functional genomics data. In the context of cluster analysis, functional coherence of clusters established through such analyses have been used to identify biologically meaningful clusters, compare clustering algorithms and identify biological pathways associated with the biological process under investigation. Results We developed a computational framework for analytically and visually integrating knowledge-based functional categories with the cluster analysis of genomics data. The framework is based on the simple, conceptually appealing, and biologically interpretable gene-specific functional coherence score (CLEAN score. The score is derived by correlating the clustering structure as a whole with functional categories of interest. We directly demonstrate that integrating biological knowledge in this way improves the reproducibility of conclusions derived from cluster analysis. The CLEAN score differentiates between the levels of functional coherence for genes within the same cluster based on their membership in enriched functional categories. We show that this aspect results in higher reproducibility across independent datasets and produces more informative genes for distinguishing different sample types than the scores based on the traditional cluster-wide analysis. We also demonstrate the utility of the CLEAN framework in comparing clusterings produced by different algorithms. CLEAN was implemented as an add-on R package and can be downloaded at http://Clusteranalysis.org. The package integrates routines for calculating gene specific functional coherence scores and the open source interactive Java-based viewer Functional TreeView (FTreeView. 
Conclusion Our results indicate that using the gene-specific functional coherence score improves the reproducibility of the

  7. DATA CLASSIFICATION WITH NEURAL CLASSIFIER USING RADIAL BASIS FUNCTION WITH DATA REDUCTION USING HIERARCHICAL CLUSTERING

    Directory of Open Access Journals (Sweden)

    M. Safish Mary

    2012-04-01

    Full Text Available Classification of large amounts of data is a time-consuming process but crucial for analysis and decision making. Radial Basis Function (RBF) networks are widely used for classification and regression analysis. In this paper, we have studied the performance of RBF neural networks in classifying the sales of cars based on demand, using a kernel density estimation algorithm which produces classification accuracy comparable to that provided by support vector machines. We have also proposed a new instance-based data selection method in which redundant instances are removed with the help of a threshold, thus improving the time complexity while also improving classification accuracy. The instance-based selection of the data set helps reduce the number of clusters formed, thereby reducing the number of centers considered for building the RBF network. Further, the efficiency of the training is improved by applying a hierarchical clustering technique to reduce the number of clusters formed at every step. The paper explains the algorithm used for classification and for conditioning the data. It also explains the complexities involved in classification of sales data for analysis and decision-making.

  8. Cluster Correspondence Analysis.

    Science.gov (United States)

    van de Velden, M; D'Enza, A Iodice; Palumbo, F

    2017-03-01

    A method is proposed that combines dimension reduction and cluster analysis for categorical data by simultaneously assigning individuals to clusters and optimal scaling values to categories in such a way that a single between variance maximization objective is achieved. In a unified framework, a brief review of alternative methods is provided and we show that the proposed method is equivalent to GROUPALS applied to categorical data. Performance of the methods is appraised by means of a simulation study. The results of the joint dimension reduction and clustering methods are compared with the so-called tandem approach, a sequential analysis of dimension reduction followed by cluster analysis. The tandem approach is conjectured to perform worse when variables are added that are unrelated to the cluster structure. Our simulation study confirms this conjecture. Moreover, the results of the simulation study indicate that the proposed method also consistently outperforms alternative joint dimension reduction and clustering methods.

  9. Using hierarchical clustering methods to classify motor activities of COPD patients from wearable sensor data

    Directory of Open Access Journals (Sweden)

    Reilly John J

    2005-06-01

    Full Text Available Abstract Background Advances in miniature sensor technology have led to the development of wearable systems that allow one to monitor motor activities in the field. A variety of classifiers have been proposed in the past, but little has been done toward developing systematic approaches to assess the feasibility of discriminating the motor tasks of interest and to guide the choice of the classifier architecture. Methods A technique is introduced to address this problem according to a hierarchical framework and its use is demonstrated for the application of detecting motor activities in patients with chronic obstructive pulmonary disease (COPD) undergoing pulmonary rehabilitation. Accelerometers were used to collect data for 10 different classes of activity. Features were extracted to capture essential properties of the data set and reduce the dimensionality of the problem at hand. Cluster measures were utilized to find natural groupings in the data set and then construct a hierarchy of the relationships between clusters to guide the process of merging clusters that are too similar to distinguish reliably. It provides a means to assess whether the benefits of merging for performance of a classifier outweigh the loss of resolution incurred through merging. Results Analysis of the COPD data set demonstrated that motor tasks related to ambulation can be reliably discriminated from tasks performed in a seated position with the legs in motion or stationary using two features derived from one accelerometer. Classifying motor tasks within the category of activities related to ambulation requires more advanced techniques. While in certain cases all the tasks could be accurately classified, in others merging clusters associated with different motor tasks was necessary. When merging clusters, it was found that the proposed method could lead to more than 12% improvement in classifier accuracy while retaining resolution of 4 tasks. Conclusion Hierarchical

  10. Using hierarchical clustering methods to classify motor activities of COPD patients from wearable sensor data

    Science.gov (United States)

    Sherrill, Delsey M; Moy, Marilyn L; Reilly, John J; Bonato, Paolo

    2005-01-01

    Background Advances in miniature sensor technology have led to the development of wearable systems that allow one to monitor motor activities in the field. A variety of classifiers have been proposed in the past, but little has been done toward developing systematic approaches to assess the feasibility of discriminating the motor tasks of interest and to guide the choice of the classifier architecture. Methods A technique is introduced to address this problem according to a hierarchical framework and its use is demonstrated for the application of detecting motor activities in patients with chronic obstructive pulmonary disease (COPD) undergoing pulmonary rehabilitation. Accelerometers were used to collect data for 10 different classes of activity. Features were extracted to capture essential properties of the data set and reduce the dimensionality of the problem at hand. Cluster measures were utilized to find natural groupings in the data set and then construct a hierarchy of the relationships between clusters to guide the process of merging clusters that are too similar to distinguish reliably. It provides a means to assess whether the benefits of merging for performance of a classifier outweigh the loss of resolution incurred through merging. Results Analysis of the COPD data set demonstrated that motor tasks related to ambulation can be reliably discriminated from tasks performed in a seated position with the legs in motion or stationary using two features derived from one accelerometer. Classifying motor tasks within the category of activities related to ambulation requires more advanced techniques. While in certain cases all the tasks could be accurately classified, in others merging clusters associated with different motor tasks was necessary. When merging clusters, it was found that the proposed method could lead to more than 12% improvement in classifier accuracy while retaining resolution of 4 tasks. 
Conclusion Hierarchical clustering methods are relevant

  11. Evaluation by hierarchical clustering of multiple cytokine expression after phytohemagglutinin stimulation

    Directory of Open Access Journals (Sweden)

    Yang Chunhe

    2016-01-01

    Full Text Available The hierarchical clustering method has been used for exploration of gene expression and proteomic profiles; however, little research into its application in the examination of expression of multiple cytokine/chemokine responses to stimuli has been reported. Thus, little progress has been made on how phytohemagglutinin (PHA) affects cytokine expression profiling on a large scale in the human hematological system. To investigate the characteristic expression pattern under PHA stimulation, Luminex, a multiplex bead-based suspension array, was performed. The data set collected from human peripheral blood mononuclear cells (PBMC) was analyzed using the hierarchical clustering method. It was revealed that two specific chemokines (CCL3 and CCL4) underwent significantly greater quantitative changes during induction of expression than other tested cytokines/chemokines after PHA stimulation. This result indicates that hierarchical clustering is a useful tool for detecting fine patterns during exploration of biological data, and that it can play an important role in comparative studies.

  12. CHIMERA: Top-down model for hierarchical, overlapping and directed cluster structures in directed and weighted complex networks

    Science.gov (United States)

    Franke, R.

    2016-11-01

    In many networks discovered in biology, medicine, neuroscience and other disciplines special properties like a certain degree distribution and hierarchical cluster structure (also called communities) can be observed as general organizing principles. Detecting the cluster structure of an unknown network promises to identify functional subdivisions, hierarchy and interactions on a mesoscale. It is not trivial choosing an appropriate detection algorithm because there are multiple network, cluster and algorithmic properties to be considered. Edges can be weighted and/or directed, clusters overlap or build a hierarchy in several ways. Algorithms differ not only in runtime, memory requirements but also in allowed network and cluster properties. They are based on a specific definition of what a cluster is, too. On the one hand, a comprehensive network creation model is needed to build a large variety of benchmark networks with different reasonable structures to compare algorithms. On the other hand, if a cluster structure is already known, it is desirable to separate effects of this structure from other network properties. This can be done with null model networks that mimic an observed cluster structure to improve statistics on other network features. A third important application is the general study of properties in networks with different cluster structures, possibly evolving over time. Currently there are good benchmark and creation models available. But what is left is a precise sandbox model to build hierarchical, overlapping and directed clusters for undirected or directed, binary or weighted complex random networks on basis of a sophisticated blueprint. This gap shall be closed by the model CHIMERA (Cluster Hierarchy Interconnection Model for Evaluation, Research and Analysis) which will be introduced and described here for the first time.

  13. Hierarchical trie packet classification algorithm based on expectation-maximization clustering

    Science.gov (United States)

    Bi, Xia-an; Zhao, Junxia

    2017-01-01

    With the growth of computer network bandwidth, packet classification algorithms that can handle large-scale rule sets are urgently needed. Among the existing algorithms, research on packet classification algorithms based on the hierarchical trie has become an important branch of packet classification research because of their wide practical use. Although the hierarchical trie saves considerable storage space, it has several shortcomings, such as backtracking and empty nodes. This paper proposes a new packet classification algorithm, Hierarchical Trie Algorithm Based on Expectation-Maximization Clustering (HTEMC). Firstly, this paper formalizes the packet classification problem by mapping the rules and data packets into a two-dimensional space. Secondly, this paper uses the expectation-maximization algorithm to cluster the rules based on their aggregate characteristics, thereby forming diversified clusters. Thirdly, this paper proposes a hierarchical trie based on the results of expectation-maximization clustering. Finally, this paper conducts simulation experiments and real-environment experiments to compare the performance of our algorithm with that of other typical algorithms, and analyzes the results of the experiments. The hierarchical trie structure in our algorithm not only adopts trie path compression to eliminate backtracking, but also solves the problem of inefficient trie updates, which greatly improves the performance of the algorithm. PMID:28704476

  14. The Evolution of Galaxy Clustering in Hierarchical Models

    OpenAIRE

    1999-01-01

    The main ingredients of recent semi-analytic models of galaxy formation are summarised. We present predictions for the galaxy clustering properties of a well specified LCDM model whose parameters are constrained by observed local galaxy properties. We present preliminary predictions for evolution of clustering that can be probed with deep pencil beam surveys.

  15. The evolution of Brightest Cluster Galaxies in a hierarchical universe

    CERN Document Server

    Tonini, Chiara; Croton, Darren; Maraston, Claudia; Thomas, Daniel

    2012-01-01

    We investigate the evolution of Brightest Cluster Galaxies (BCGs) from redshift z~1.6 to z~0. We use the semi-analytic model of Croton et al. (2006) with a new spectro-photometric model based on the Maraston (2005) stellar populations and a new recipe for the dust extinction. We compare the model predictions of the K-band luminosity evolution and the J-K, V-I and I-K colour evolution with a series of datasets, including Collins et al. (Nature, 2009) who argued that semi-analytic models based on the Millennium simulation cannot reproduce the red colours and high luminosity of BCGs at z>1. We show instead that the model is well in range of the observed luminosity and correctly reproduces the colour evolution of BCGs in the whole redshift range up to z~1.6. We argue that the success of the semi-analytic model is in large part due to the implementation of a more sophisticated spectro-photometric model. An analysis of the model BCGs shows an increase in mass by a factor ~2 since z~1, and star formation activity do...

  16. Clinical fracture risk evaluated by hierarchical agglomerative clustering

    DEFF Research Database (Denmark)

    Kruse, Christian; Eiken, P; Vestergaard, P

    2017-01-01

    profiles. INTRODUCTION: The purposes of this study were to establish and quantify patient clusters of high, average and low fracture risk using an unsupervised machine learning algorithm. METHODS: Regional and national Danish patient data on dual-energy X-ray absorptiometry (DXA) scans, medication...... containing less than 250 subjects. Clusters were identified as high, average or low fracture risk based on bone mineral density (BMD) characteristics. Cluster-based descriptive statistics and relative Z-scores for variable means were computed. RESULTS: Ten thousand seven hundred seventy-five women were...... as low fracture risk with high to very high BMD. A mean age of 60 years was the earliest that allowed for separation of high-risk clusters. DXA scan results could identify high-risk subjects with different antiresorptive treatment compliance levels based on similarities and differences in lumbar spine...

  17. APPROACHES TOWARDS CLUSTER ANALYSIS

    National Research Council Canada - National Science Library

    Manuela Tvaronaviciene; Kristina Razminiene; Leonardo Piccinetti

    2015-01-01

    .... The findings indicate that case study is used in many articles referring to cluster research. Other methods, such as analysis, interview, survey, research, equation and others are used to support case study...

  18. Modeling Hierarchically Clustered Longitudinal Survival Processes with Applications to Child Mortality and Maternal Health

    Directory of Open Access Journals (Sweden)

    Kuate-Defo, Bathélémy

    2001-01-01

    Full Text Available English: This paper merges two parallel developments since the 1970s of new statistical tools for data analysis: statistical methods known as hazard models that are used for analyzing event-duration data and statistical methods for analyzing hierarchically clustered data known as multilevel models. These developments have rarely been integrated in research practice and the formalization and estimation of models for hierarchically clustered survival data remain largely uncharted. I attempt to fill some of this gap and demonstrate the merits of formulating and estimating multilevel hazard models with longitudinal data. French (translated): This study integrates two leading statistical approaches to the analysis of quantitative data developed since the 1970s: statistical methods for analyzing event-history data, or survival methods, and statistical methods for analyzing hierarchical data, or multilevel methods. These two approaches have very rarely been combined in research practice and, consequently, the formulation and estimation of models appropriate for longitudinal and hierarchically nested data remain essentially an uncharted field of investigation. I try to fill this gap and use real public health data to demonstrate the merits and contexts of formulating and estimating multilevel and multistate models for event-history and longitudinal data.

  19. Kendall’s tau and agglomerative clustering for structure determination of hierarchical Archimedean copulas

    Directory of Open Access Journals (Sweden)

    Górecki J.

    2017-01-01

    Full Text Available Several successful approaches to structure determination of hierarchical Archimedean copulas (HACs) proposed in the literature rely on agglomerative clustering and Kendall’s correlation coefficient. However, no theoretical proof justifying such approaches has been presented. This work fills this gap and introduces a theorem showing that, given the matrix of the pairwise Kendall correlation coefficients corresponding to a HAC, its structure can be recovered by an agglomerative clustering technique.
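
The idea in the abstract above, recovering a nested structure from a matrix of pairwise Kendall coefficients by agglomerative clustering, can be illustrated with a minimal sketch. This is not the authors' procedure: the average-linkage merge rule, the function names `kendall_tau` and `agglomerate_by_tau`, and the toy columns are all assumptions made for illustration, and only the Python standard library is used.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / total pairs."""
    n = len(x)
    score = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)

def agglomerate_by_tau(columns):
    """Repeatedly merge the two clusters with the highest average pairwise tau.

    Returns the merge tree as nested tuples of column indices.
    Average linkage is an assumption of this sketch.
    """
    d = len(columns)
    tau = {(a, b): kendall_tau(columns[a], columns[b])
           for a, b in combinations(range(d), 2)}
    members = {i: (i,) for i in range(d)}   # cluster id -> leaf indices
    trees = {i: i for i in range(d)}        # cluster id -> nested structure
    next_id = d

    def avg_tau(p, q):
        vals = [tau[min(a, b), max(a, b)]
                for a in members[p] for b in members[q]]
        return sum(vals) / len(vals)

    while len(members) > 1:
        p, q = max(combinations(sorted(members), 2),
                   key=lambda pair: avg_tau(*pair))
        members[next_id] = members.pop(p) + members.pop(q)
        trees[next_id] = (trees.pop(p), trees.pop(q))
        next_id += 1
    return trees[next_id - 1]

# Hypothetical toy data: columns 0 and 1 are the most concordant pair.
toy_columns = [
    [1, 2, 3, 4, 5, 6],
    [1, 2, 3, 4, 6, 5],
    [2, 1, 4, 3, 6, 5],
]
structure = agglomerate_by_tau(toy_columns)
```

On this toy input the two most concordant columns are merged first and the third column joins them at the root, which mirrors the kind of nesting a HAC structure encodes.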

  20. Prediction of in vitro and in vivo oestrogen receptor activity using hierarchical clustering

    Science.gov (United States)

    In this study, hierarchical clustering classification models were developed to predict in vitro and in vivo oestrogen receptor (ER) activity. Classification models were developed for binding, agonist, and antagonist in vitro ER activity and for mouse in vivo uterotrophic ER bindi...

  2. Hierarchical analysis of the quiet Sun magnetism

    CERN Document Server

    Ramos, A Asensio

    2014-01-01

    Standard statistical analyses of the magnetic properties of the quiet Sun rely on simple histograms of quantities inferred from maximum-likelihood estimations. Because of the inherent degeneracies, either intrinsic or induced by the noise, this approach is not optimal and can lead to highly biased results. We carry out a meta-analysis of the magnetism of the quiet Sun from Hinode observations using a hierarchical probabilistic method. This model allows us to infer the statistical properties of the magnetic field vector over the observed field-of-view, consistently taking into account the uncertainties in each pixel due to noise and degeneracies. Our results indicate that the magnetic fields are very weak, below 275 G with 95% credibility, with a slight preference for horizontal fields, although the distribution is not far from a quasi-isotropic distribution.

  3. Evaluation of apple quality based on principal component and hierarchical cluster analysis

    Institute of Scientific and Technical Information of China (English)

    公丽艳; 孟宪军; 刘乃侨; 毕金峰

    2014-01-01

    The purpose of this study was to investigate the variations in physical and chemical characteristics of apple fruit from 30 varieties grown in the same place using pattern recognition tools. Twenty quality parameters of apple samples (e.g. weight, volume, density, color, hardness, sugar-acid ratio, Vitamin C, etc.) were analyzed. Interrelationships between the parameters and the apple variety were investigated by descriptive statistics, principal component analysis (PCA) and hierarchical cluster analysis (HCA). PCA is a mathematical tool which performs a reduction in data dimensionality and allows the visualisation of underlying structure in experimental data and relationships between data and samples. In hierarchical cluster analysis, samples are grouped on the basis of similarities, without taking into account the information about the class membership. The results obtained following HCA are shown as a dendrogram in which five well-defined clusters are visible. Samples are grouped in clusters in terms of their nearness or similarity. Cluster analysis uses less information (distances only) than PCA. It is interesting to observe what kind of classification can be made on the basis of distances only. The results showed that density, fruit shape index and water content of the 30 apple varieties were not significantly different. The remaining seventeen measurements were investigated by principal component analysis. The first six components represented 83.56% of the total variability on the basis of the total variance explained and the scree plot of the principal component analysis. The first principal component was related to titratable acidity, sugar-acid ratio and solid-acid ratio attributes, which were the taste quality factor. The second principal component was related to the L, a, and b attributes, which were the color factor. Following that were the sweetness factor, texture factor, quality factor and size factor.
The sample score plots visually displayed the relationship between

  4. Non-Trivial Feature Derivation for Intensifying Feature Detection Using LIDAR Datasets Through Allometric Aggregation Data Analysis Applying Diffused Hierarchical Clustering for Discriminating Agricultural Land Cover in Portions of Northern Mindanao, Philippines

    Science.gov (United States)

    Villar, Ricardo G.; Pelayo, Jigg L.; Mozo, Ray Mari N.; Salig, James B., Jr.; Bantugan, Jojemar

    2016-06-01

    Building on results produced by the Central Mindanao University Phil-LiDAR 2.B.11 Image Processing Component, the paper attempts to apply Light Detection and Ranging (LiDAR) derived products to arrive at a quality land cover classification, using theoretical data analysis principles to minimize common problems in image classification. These problems are the misclassification of objects and the indistinguishable interpretation of pixelated features, which results in confusion between object classes because of their closely related spectral resemblance; unbalanced saturation of RGB information is a further challenge. Only low-density LiDAR point cloud data, denoted as 2 pts/m2, is exploited in the research; it yields essential derived information such as textures and matrices (number of returns, intensity textures, nDSM, etc.) for establishing the conditions for feature selection. A novel approach is taken that builds on object-based image analysis and on the principle of allometric relations between two or more observables, which are aggregated for each acquisition of datasets to establish a proportionality function for data partitioning. To separate two or more datasets into distinct regions in a feature space of distributions, non-trivial computations for fitting distributions were employed to formulate the ideal hyperplane. Once the distribution computations were complete, allometric relations were evaluated and matched with the necessary rotation, scaling and transformation techniques to find applicable border conditions. Thus, a customized hybrid feature was developed and embedded in every object class feature to be used as a classifier, with a hierarchical clustering strategy employed for cross-examining and filtering features. These features are boosted using machine learning algorithms as trainable sets of information for more competent feature detection. The product classification in this

  5. Signatures of Hierarchical Clustering in Dark Matter Detection Experiments

    CERN Document Server

    Stiff, D; Frieman, Joshua A

    2001-01-01

    In the cold dark matter model of structure formation, galaxies are assembled hierarchically from mergers and the accretion of subclumps. This process is expected to leave residual substructure in the Galactic dark halo, including partially disrupted clumps and their associated tidal debris. We develop a model for such halo substructure and study its implications for dark matter (WIMP and axion) detection experiments. We combine the Press-Schechter model for the distribution of halo subclump masses with N-body simulations of the evolution and disruption of individual clumps as they orbit through the evolving Galaxy to derive the probability that the Earth is passing through a subclump or stream of a given density. Our results suggest that it is likely that the local complement of dark matter particles includes a 1-5% contribution from a single clump. The implications for dark matter detection experiments are significant, since the disrupted clump is composed of a `cold' flow of high-velocity particles. We desc...

  6. The association between content of the elements S, Cl, K, Fe, Cu, Zn and Br in normal and cirrhotic liver tissue from Danes and Greenlandic Inuit examined by dual hierarchical clustering analysis

    DEFF Research Database (Denmark)

    Laursen, Jens; Milman, Nils; Pind, N.;

    2014-01-01

    contents according to calculated similarities, one clustering elements according to correlation coefficients between the element contents, both using Euclidean distance and Ward's procedure. RESULTS: One dendrogram separated subjects in 7 clusters showing no differences in ethnicity, gender or age....... The analysis discriminated between elements in normal and cirrhotic livers. The other dendrogram clustered elements in four clusters: sulphur and chlorine; copper and bromine; potassium and zinc; iron. There were significant correlations between the elements in normal liver samples: S was associated with Cl, K...
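
As a rough illustration of the "dual" clustering described above (one pass over subjects, one over elements, both with Euclidean distance and Ward's criterion), the following sketch may help. It is an assumption-laden toy, not the study's analysis: the data values are invented, `ward_clusters` and `zscore` are hypothetical helper names, and z-scoring the element columns is used here because the squared Euclidean distance between z-scored columns is proportional to one minus their correlation.

```python
def ward_clusters(rows, k):
    """Agglomerate rows until k clusters remain.

    At each step, merge the pair whose union gives the smallest increase in
    within-cluster sum of squares (Ward's criterion):
        cost = n_a * n_b / (n_a + n_b) * ||centroid_a - centroid_b||^2
    """
    clusters = [([i], list(r)) for i, r in enumerate(rows)]  # (members, centroid)
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
                cost = len(ma) * len(mb) / (len(ma) + len(mb)) * d2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        n = len(ma) + len(mb)
        centroid = [(len(ma) * x + len(mb) * y) / n for x, y in zip(ca, cb)]
        clusters[a] = (ma + mb, centroid)
        del clusters[b]
    return sorted(sorted(m) for m, _ in clusters)

def zscore(col):
    """Standardize a column to mean 0 and (population) standard deviation 1."""
    m = sum(col) / len(col)
    s = (sum((v - m) ** 2 for v in col) / len(col)) ** 0.5
    return [(v - m) / s for v in col]

# Hypothetical measurements: 4 subjects (rows) x 3 elements (columns).
subjects = [
    [1.0, 2.0, 10.0],
    [1.2, 2.1, 9.0],
    [5.0, 6.0, 3.0],
    [5.1, 6.3, 2.0],
]
# Second pass: treat each z-scored element column as an observation.
elements = [zscore(list(col)) for col in zip(*subjects)]

subject_groups = ward_clusters(subjects, 2)
element_groups = ward_clusters(elements, 2)
```

Running both passes on the same matrix gives one grouping of subjects and one grouping of elements, which is the structure the two dendrograms in the abstract summarize.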

  7. Hierarchical clustering of breast cancer methylomes revealed differentially methylated and expressed breast cancer genes.

    Directory of Open Access Journals (Sweden)

    I-Hsuan Lin

    Full Text Available Oncogenic transformation of normal cells often involves epigenetic alterations, including histone modification and DNA methylation. We conducted whole-genome bisulfite sequencing to determine the DNA methylomes of normal breast, fibroadenoma, invasive ductal carcinomas and MCF7. The emergence, disappearance, expansion and contraction of kilobase-sized hypomethylated regions (HMRs) and the hypomethylation of the megabase-sized partially methylated domains (PMDs) are the major forms of methylation changes observed in breast tumor samples. Hierarchical clustering of HMRs revealed tumor-specific hypermethylated clusters and differentially methylated enhancers specific to normal or breast cancer cell lines. Joint analysis of gene expression and DNA methylation data of normal breast and breast cancer cells identified differentially methylated and expressed genes associated with breast and/or ovarian cancers in cancer-specific HMR clusters. Furthermore, aberrant patterns of X-chromosome inactivation (XCI) were found in breast cancer cell lines as well as breast tumor samples in the TCGA BRCA (breast invasive carcinoma) dataset. They were characterized by a differentially hypermethylated XIST promoter, reduced expression of XIST, and over-expression of hypomethylated X-linked genes. High expression of these genes was significantly associated with lower survival rates in breast cancer patients. Comprehensive analysis of the normal and breast tumor methylomes suggests selective targeting of DNA methylation changes during breast cancer progression. The weak causal relationship between DNA methylation and gene expression observed in this study is evidence of a more complex role of DNA methylation in the regulation of gene expression in human epigenetics that deserves further investigation.

  8. Improving the Decision Value of Hierarchical Text Clustering Using Term Overlap Detection

    Directory of Open Access Journals (Sweden)

    Nilupulee Nathawitharana

    2015-09-01

    Full Text Available Humans are used to expressing themselves with written language, and language provides a medium with which we can describe our experiences in detail, incorporating individuality. Even though documents provide a rich source of information, it becomes very difficult to identify, extract, summarize and search when vast amounts of documents are collected, especially over time. Document clustering is a technique that has been widely used to group documents based on similarity of content represented by the words used. Once key groups are identified, further drill-down into sub-groupings is facilitated by the use of hierarchical clustering. Clustering and hierarchical clustering are very useful when applied to numerical and categorical data, and cluster accuracy and purity measures exist to evaluate the outcomes of a clustering exercise. Although the same measures have been applied to text clustering, text clusters are based on words or terms which can be repeated across documents associated with different topics. Therefore, text data cannot be considered a direct ‘coding’ of a particular experience or situation, in contrast to numerical and categorical data, and term overlap is a very common characteristic in text clustering. In this paper we propose a new technique and methodology for term overlap capture from text documents, highlight the different situations such overlap could signify, and discuss why such understanding is important for obtaining value from text clustering. Experiments were conducted using a widely used text document collection, where the proposed methodology allowed exploring the term diversity for a given document collection and obtaining clusters with minimum term overlap.

  9. A Bayesian Alternative to Mutual Information for the Hierarchical Clustering of Dependent Random Variables.

    Directory of Open Access Journals (Sweden)

    Guillaume Marrelec

    Full Text Available The use of mutual information as a similarity measure in agglomerative hierarchical clustering (AHC) raises an important issue: some correction needs to be applied for the dimensionality of variables. In this work, we formulate the decision of merging dependent multivariate normal variables in an AHC procedure as a Bayesian model comparison. We found that the Bayesian formulation naturally shrinks the empirical covariance matrix towards a matrix set a priori (e.g., the identity), provides an automated stopping rule, and corrects for dimensionality using a term that scales up the measure as a function of the dimensionality of the variables. Also, the resulting log Bayes factor is asymptotically proportional to the plug-in estimate of mutual information, with an additive correction for dimensionality in agreement with the Bayesian information criterion. We investigated the behavior of these Bayesian alternatives (in exact and asymptotic forms) to mutual information on simulated and real data. An encouraging result was first derived on simulations: the hierarchical clustering based on the log Bayes factor outperformed off-the-shelf clustering techniques as well as raw and normalized mutual information in terms of classification accuracy. On a toy example, we found that the Bayesian approaches led to results that were similar to those of mutual information clustering techniques, with the advantage of an automated thresholding. On real functional magnetic resonance imaging (fMRI) datasets measuring brain activity, it identified clusters consistent with the established outcome of standard procedures. On this application, normalized mutual information had a highly atypical behavior, in the sense that it systematically favored very large clusters. These initial experiments suggest that the proposed Bayesian alternatives to mutual information are a useful new tool for hierarchical clustering.

  10. A Bayesian Alternative to Mutual Information for the Hierarchical Clustering of Dependent Random Variables.

    Science.gov (United States)

    Marrelec, Guillaume; Messé, Arnaud; Bellec, Pierre

    2015-01-01

    The use of mutual information as a similarity measure in agglomerative hierarchical clustering (AHC) raises an important issue: some correction needs to be applied for the dimensionality of variables. In this work, we formulate the decision of merging dependent multivariate normal variables in an AHC procedure as a Bayesian model comparison. We found that the Bayesian formulation naturally shrinks the empirical covariance matrix towards a matrix set a priori (e.g., the identity), provides an automated stopping rule, and corrects for dimensionality using a term that scales up the measure as a function of the dimensionality of the variables. Also, the resulting log Bayes factor is asymptotically proportional to the plug-in estimate of mutual information, with an additive correction for dimensionality in agreement with the Bayesian information criterion. We investigated the behavior of these Bayesian alternatives (in exact and asymptotic forms) to mutual information on simulated and real data. An encouraging result was first derived on simulations: the hierarchical clustering based on the log Bayes factor outperformed off-the-shelf clustering techniques as well as raw and normalized mutual information in terms of classification accuracy. On a toy example, we found that the Bayesian approaches led to results that were similar to those of mutual information clustering techniques, with the advantage of an automated thresholding. On real functional magnetic resonance imaging (fMRI) datasets measuring brain activity, it identified clusters consistent with the established outcome of standard procedures. On this application, normalized mutual information had a highly atypical behavior, in the sense that it systematically favored very large clusters. These initial experiments suggest that the proposed Bayesian alternatives to mutual information are a useful new tool for hierarchical clustering.
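
    The relation described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It computes the plug-in Gaussian mutual information between two groups of variables and a merge score of the assumed asymptotic form n * MI - (d_X * d_Y / 2) * log n, where the penalty counts the d_X * d_Y cross-covariance terms added by the dependent model. The exact Bayes factor and its prior are not reproduced; all function names and the toy data are hypothetical.

```python
import math

def covariance(rows):
    """Maximum-likelihood (divide by n) covariance matrix of the rows."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / n
             for j in range(d)] for i in range(d)]

def logdet(m):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    d = len(m)
    chol = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = m[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            chol[i][j] = math.sqrt(s) if i == j else s / chol[j][j]
    return 2.0 * sum(math.log(chol[i][i]) for i in range(d))

def submatrix(m, idx):
    """Rows and columns of m restricted to the indices in idx."""
    return [[m[i][j] for j in idx] for i in idx]

def plugin_mi(rows, x_idx, y_idx):
    """Plug-in Gaussian mutual information (in nats) between two column groups."""
    cov = covariance(rows)
    return 0.5 * (logdet(submatrix(cov, x_idx))
                  + logdet(submatrix(cov, y_idx))
                  - logdet(submatrix(cov, x_idx + y_idx)))

def bic_merge_score(rows, x_idx, y_idx):
    """Assumed asymptotic score: n * MI minus a BIC-style dimensionality penalty."""
    n = len(rows)
    penalty = 0.5 * len(x_idx) * len(y_idx) * math.log(n)
    return n * plugin_mi(rows, x_idx, y_idx) - penalty

# Hypothetical toy data: column 1 depends on column 0, column 2 is uncorrelated with it.
x = [1.0, -1.0] * 4
z = [1.0, 1.0, -1.0, -1.0] * 2
y = [xi + 0.5 * zi for xi, zi in zip(x, z)]
rows = list(zip(x, y, z))

score_dependent = bic_merge_score(rows, [0], [1])
score_uncorrelated = bic_merge_score(rows, [0], [2])
```

    On this toy data the dependent pair receives a positive score and the uncorrelated pair a negative one, which is how a sign-based stopping rule of this kind would separate "merge" from "stop".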

  11. Nursing home care quality: a cluster analysis.

    Science.gov (United States)

    Grøndahl, Vigdis Abrahamsen; Fagerli, Liv Berit

    2017-02-13

    Purpose The purpose of this paper is to explore potential differences in how nursing home residents rate care quality and to explore cluster characteristics. Design/methodology/approach A cross-sectional design was used, with one questionnaire including questions from quality from patients' perspective and Big Five personality traits, together with questions related to socio-demographic aspects and health condition. Residents (n=103) from four Norwegian nursing homes participated (74.1 per cent response rate). Hierarchical cluster analysis identified clusters with respect to care quality perceptions. χ² tests and one-way between-groups ANOVA were performed to characterise the clusters (p<…). Findings … clusters were identified; Cluster 1 residents (28.2 per cent) had the best care quality perceptions and Cluster 2 (67.0 per cent) had the worst perceptions. The clusters were statistically significant and characterised by personal-related conditions: gender, psychological well-being, preferences, admission, satisfaction with staying in the nursing home, emotional stability and agreeableness, and by external objective care conditions: healthcare personnel and registered nurses. Research limitations/implications Residents assessed as having no cognitive impairments were included, thus excluding the largest group. By choosing questionnaire design and structured interviews, the number able to participate may increase. Practical implications Findings may provide healthcare personnel and managers with increased knowledge on which to develop strategies to improve specific care quality perceptions. Originality/value Cluster analysis can be an effective tool for differentiating between nursing home residents' care quality perceptions.

  12. Comprehensive cluster analysis with Transitivity Clustering.

    Science.gov (United States)

    Wittkop, Tobias; Emig, Dorothea; Truss, Anke; Albrecht, Mario; Böcker, Sebastian; Baumbach, Jan

    2011-03-01

    Transitivity Clustering is a method for the partitioning of biological data into groups of similar objects, such as genes, for instance. It provides integrated access to various functions addressing each step of a typical cluster analysis. To facilitate this, Transitivity Clustering is accessible online and offers three user-friendly interfaces: a powerful stand-alone version, a web interface, and a collection of Cytoscape plug-ins. In this paper, we describe three major workflows: (i) protein (super)family detection with Cytoscape, (ii) protein homology detection with incomplete gold standards and (iii) clustering of gene expression data. This protocol guides the user through the most important features of Transitivity Clustering and takes ∼1 h to complete.

  13. Bayesian latent variable models for hierarchical clustered count outcomes with repeated measures in microbiome studies.

    Science.gov (United States)

    Xu, Lizhen; Paterson, Andrew D; Xu, Wei

    2017-04-01

    Motivated by the multivariate nature of microbiome data with hierarchical taxonomic clusters, counts that are often skewed and zero inflated, and repeated measures, we propose a Bayesian latent variable methodology to jointly model multiple operational taxonomic units within a single taxonomic cluster. This novel method can incorporate both negative binomial and zero-inflated negative binomial responses, and can account for serial and familial correlations. We develop a Markov chain Monte Carlo algorithm that is built on a data augmentation scheme using Pólya-Gamma random variables. Hierarchical centering and parameter expansion techniques are also used to improve the convergence of the Markov chain. We evaluate the performance of our proposed method through extensive simulations. We also apply our method to a human microbiome study.

  14. An Exactly Soluble Hierarchical Clustering Model: Inverse Cascades, Self-Similarity, and Scaling

    CERN Document Server

    Gabrielov, A; Turcotte, D L

    1999-01-01

    We show how clustering as a general hierarchical dynamical process proceeds via a sequence of inverse cascades to produce self-similar scaling, as an intermediate asymptotic, which then truncates at the largest spatial scales. We show how this model can provide a general explanation for the behavior of several models that has been described as "self-organized critical," including forest-fire, sandpile, and slider-block models.

  15. Are clusters of dietary patterns and cluster membership stable over time? Results of a longitudinal cluster analysis study.

    Science.gov (United States)

    Walthouwer, Michel Jean Louis; Oenema, Anke; Soetens, Katja; Lechner, Lilian; de Vries, Hein

    2014-11-01

    Developing nutrition education interventions based on clusters of dietary patterns can only be done adequately when it is clear if distinctive clusters of dietary patterns can be derived and reproduced over time, if cluster membership is stable, and if it is predictable which type of people belong to a certain cluster. Hence, this study aimed to: (1) identify clusters of dietary patterns among Dutch adults, (2) test the reproducibility of these clusters and stability of cluster membership over time, and (3) identify sociodemographic predictors of cluster membership and cluster transition. This study had a longitudinal design with online measurements at baseline (N=483) and 6 months follow-up (N=379). Dietary intake was assessed with a validated food frequency questionnaire. A hierarchical cluster analysis was performed, followed by a K-means cluster analysis. Multinomial logistic regression analyses were conducted to identify the sociodemographic predictors of cluster membership and cluster transition. At baseline and follow-up, a comparable three-cluster solution was derived, distinguishing a healthy, moderately healthy, and unhealthy dietary pattern. Male and lower educated participants were significantly more likely to have a less healthy dietary pattern. Further, 251 (66.2%) participants remained in the same cluster, 45 (11.9%) participants changed to an unhealthier cluster, and 83 (21.9%) participants shifted to a healthier cluster. Men and people living alone were significantly more likely to shift toward a less healthy dietary pattern. Distinctive clusters of dietary patterns can be derived. Yet, cluster membership is unstable and only few sociodemographic factors were associated with cluster membership and cluster transition. These findings imply that clusters based on dietary intake may not be suitable as a basis for nutrition education interventions. Copyright © 2014 Elsevier Ltd. All rights reserved.
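
    The two-step procedure mentioned in this abstract (hierarchical cluster analysis followed by K-means) can be sketched in Python as follows. This is one common reading, stated here as an assumption: the hierarchical step supplies starting centroids and K-means then refines the assignment. The abstract does not give the linkage, the seeding, or the variables, so the centroid linkage, the function names, and the two-dimensional toy scores are all hypothetical.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def agglomerate(points, k):
    """Step 1: merge the two clusters with the closest centroids until k remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: math.dist(centroid(clusters[ij[0]]),
                                                   centroid(clusters[ij[1]])))
        merged = clusters.pop(j)      # j > i, so index i stays valid
        clusters[i].extend(merged)
    return clusters

def kmeans(points, centers, iterations=20):
    """Step 2: Lloyd's algorithm started from the given centers."""
    for _ in range(iterations):
        labels = [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append(centroid(members) if members else centers[c])
        if new_centers == centers:    # assignments have stabilised
            break
        centers = new_centers
    return labels, centers

def two_stage(points, k):
    """Seed K-means with the centroids of the hierarchical solution."""
    seeds = [centroid(c) for c in agglomerate(points, k)]
    return kmeans(points, seeds)

# Hypothetical intake scores on two food groups for nine participants.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (5.0, 5.0), (5.1, 4.9), (4.8, 5.2),
          (9.0, 1.0), (9.2, 0.9), (8.9, 1.2)]
labels, centers = two_stage(points, 3)
```

    Seeding K-means from the hierarchical solution makes the result deterministic for a given dataset, which matters when cluster solutions at baseline and follow-up are to be compared.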

  16. Recursive Hierarchical Image Segmentation by Region Growing and Constrained Spectral Clustering

    Science.gov (United States)

    Tilton, James C.

    2002-01-01

    This paper describes an algorithm for hierarchical image segmentation (referred to as HSEG) and its recursive formulation (referred to as RHSEG). The HSEG algorithm is a hybrid of region growing and constrained spectral clustering that produces a hierarchical set of image segmentations based on detected convergence points. In the main, HSEG employs the hierarchical stepwise optimization (HSWO) approach to region growing, which seeks to produce segmentations that are more optimized than those produced by more classic approaches to region growing. In addition, HSEG optionally interjects between HSWO region growing iterations merges between spatially non-adjacent regions (i.e., spectrally based merging or clustering) constrained by a threshold derived from the previous HSWO region growing iteration. While the addition of constrained spectral clustering improves the segmentation results, especially for larger images, it also significantly increases HSEG's computational requirements. To counteract this, a computationally efficient recursive, divide-and-conquer, implementation of HSEG (RHSEG) has been devised and is described herein. Included in this description is special code that is required to avoid processing artifacts caused by RHSEG's recursive subdivision of the image data. Implementations for single processor and for multiple processor computer systems are described. Results with Landsat TM data are included comparing HSEG with classic region growing. Finally, an application to image information mining and knowledge discovery is discussed.

  17. Hierarchical Adaptive Means (HAM) clustering for hardware-efficient, unsupervised and real-time spike sorting.

    Science.gov (United States)

    Paraskevopoulou, Sivylla E; Wu, Di; Eftekhar, Amir; Constandinou, Timothy G

    2014-09-30

    This work presents a novel unsupervised algorithm for real-time adaptive clustering of neural spike data (spike sorting). The proposed Hierarchical Adaptive Means (HAM) clustering method combines centroid-based clustering with hierarchical cluster connectivity to classify incoming spikes using groups of clusters. It is described how the proposed method can adaptively track the incoming spike data without requiring any past history, iteration or training and autonomously determines the number of spike classes. Its performance (classification accuracy) has been tested using multiple datasets (both simulated and recorded), achieving a near-identical accuracy compared to k-means (using 10 iterations and provided with the number of spike classes). Also, its robustness when applied to different feature extraction methods has been demonstrated by achieving classification accuracies above 80% across multiple datasets. Last but crucially, its low complexity, which has been quantified through both memory and computation requirements, makes this method hugely attractive for future hardware implementation. Copyright © 2014 Elsevier B.V. All rights reserved.

  18. Permutation Tests of Hierarchical Cluster Analyses of Carrion Communities and Their Potential Use in Forensic Entomology.

    Science.gov (United States)

    van der Ham, Joris L

    2016-05-19

    Forensic entomologists can use carrion communities' ecological succession data to estimate the postmortem interval (PMI). Permutation tests of hierarchical cluster analyses of these data provide a conceptual method to estimate part of the PMI, the post-colonization interval (post-CI). This multivariate approach produces a baseline of statistically distinct clusters that reflect changes in the carrion community composition during the decomposition process. Carrion community samples of unknown post-CIs are compared with these baseline clusters to estimate the post-CI. In this short communication, I use data from previously published studies to demonstrate the conceptual feasibility of this multivariate approach. Analyses of these data produce series of significantly distinct clusters, which represent carrion communities during 1- to 20-day periods of the decomposition process. For 33 carrion community samples, collected over an 11-day period, this approach correctly estimated the post-CI within an average range of 3.1 days.
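
    The idea of testing whether clusters are statistically distinct can be illustrated with a generic exact permutation test in Python. This is a sketch under stated assumptions, not the procedure used in the paper: it tests a single two-group split (as at one dendrogram node), uses the distance between group centroids as the statistic, and enumerates every equally sized relabelling. The function names and the taxon counts are hypothetical.

```python
import math
from itertools import combinations

def centroid(samples):
    """Coordinate-wise mean of a list of equal-length tuples."""
    return tuple(sum(c) / len(samples) for c in zip(*samples))

def split_statistic(group_a, group_b):
    """Distance between the centroids of the two groups."""
    return math.dist(centroid(group_a), centroid(group_b))

def exact_permutation_p(group_a, group_b):
    """Share of all equally sized relabellings whose statistic reaches the observed one."""
    pooled = group_a + group_b
    observed = split_statistic(group_a, group_b)
    hits = total = 0
    for chosen in combinations(range(len(pooled)), len(group_a)):
        picked = set(chosen)
        first = [pooled[i] for i in chosen]
        second = [pooled[i] for i in range(len(pooled)) if i not in picked]
        total += 1
        if split_statistic(first, second) >= observed - 1e-12:
            hits += 1
    return hits / total

# Hypothetical counts of two taxa in samples from an early and a late cluster.
early = [(5, 0), (4, 1), (5, 1), (4, 0)]
late = [(0, 5), (1, 4), (0, 4), (1, 5)]
p_value = exact_permutation_p(early, late)
```

    With four samples per group there are 70 relabellings, and only the observed split and its mirror image reach the observed statistic, so the smallest attainable p-value here is 2/70.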

  19. An energy efficient cooperative hierarchical MIMO clustering scheme for wireless sensor networks.

    Science.gov (United States)

    Nasim, Mehwish; Qaisar, Saad; Lee, Sungyoung

    2012-01-01

    In this work, we present an energy efficient hierarchical cooperative clustering scheme for wireless sensor networks. Communication cost is a crucial factor in depleting the energy of sensor nodes. In the proposed scheme, nodes cooperate to form clusters at each level of network hierarchy ensuring maximal coverage and minimal energy expenditure with relatively uniform distribution of load within the network. Performance is enhanced by cooperative multiple-input multiple-output (MIMO) communication ensuring energy efficiency for WSN deployments over large geographical areas. We test our scheme using TOSSIM and compare the proposed scheme with cooperative multiple-input multiple-output (CMIMO) clustering scheme and traditional multihop Single-Input-Single-Output (SISO) routing approach. Performance is evaluated on the basis of number of clusters, number of hops, energy consumption and network lifetime. Experimental results show significant energy conservation and increase in network lifetime as compared to existing schemes.

  20. To Aggregate or Not and Potentially Better Questions for Clustered Data: The Need for Hierarchical Linear Modeling in CTE Research

    Science.gov (United States)

    Nimon, Kim

    2012-01-01

    Using state achievement data that are openly accessible, this paper demonstrates the application of hierarchical linear modeling within the context of career technical education research. Three prominent approaches to analyzing clustered data (i.e., modeling aggregated data, modeling disaggregated data, modeling hierarchical data) are discussed…

  1. Novel density-based and hierarchical density-based clustering algorithms for uncertain data.

    Science.gov (United States)

    Zhang, Xianchao; Liu, Han; Zhang, Xiaotong

    2017-09-01

    Uncertain data has posed a great challenge to traditional clustering algorithms. Recently, several algorithms have been proposed for clustering uncertain data, and among them density-based techniques seem promising for handling data uncertainty. However, some issues like losing uncertain information, high time complexity and nonadaptive threshold have not been addressed well in the previous density-based algorithm FDBSCAN and hierarchical density-based algorithm FOPTICS. In this paper, we firstly propose a novel density-based algorithm PDBSCAN, which improves the previous FDBSCAN from the following aspects: (1) it employs a more accurate method to compute the probability that the distance between two uncertain objects is less than or equal to a boundary value, instead of the sampling-based method in FDBSCAN; (2) it introduces new definitions of probability neighborhood, support degree, core object probability, direct reachability probability, thus reducing the complexity and solving the issue of nonadaptive threshold (for core object judgement) in FDBSCAN. Then, we modify the algorithm PDBSCAN to an improved version (PDBSCANi), by using a better cluster assignment strategy to ensure that every object will be assigned to the most appropriate cluster, thus solving the issue of nonadaptive threshold (for direct density reachability judgement) in FDBSCAN. Furthermore, as PDBSCAN and PDBSCANi have difficulties for clustering uncertain data with non-uniform cluster density, we propose a novel hierarchical density-based algorithm POPTICS by extending the definitions of PDBSCAN, adding new definitions of fuzzy core distance and fuzzy reachability distance, and employing a new clustering framework. POPTICS can reveal the cluster structures of the datasets with different local densities in different regions better than PDBSCAN and PDBSCANi, and it addresses the issues in FOPTICS. Experimental results demonstrate the superiority of our proposed algorithms over the existing

  2. Structural system identification using degree of freedom-based reduction and hierarchical clustering algorithm

    Science.gov (United States)

    Chang, Seongmin; Baek, Sungmin; Kim, Ki-Ook; Cho, Maenghyo

    2015-06-01

    A system identification method has been proposed to validate finite element models of complex structures using measured modal data. Finite element method is used for the system identification as well as the structural analysis. In perturbation methods, the perturbed system is expressed as a combination of the baseline structure and the related perturbations. The changes in dynamic responses are applied to determine the structural modifications so that the equilibrium may be satisfied in the perturbed system. In practical applications, the dynamic measurements are carried out on a limited number of accessible nodes and associated degrees of freedom. The equilibrium equation is, in principle, expressed in terms of the measured (master, primary) and unmeasured (slave, secondary) degrees of freedom. Only the specified degrees of freedom are included in the equation formulation for identification and the unspecified degrees of freedom are eliminated through the iterative improved reduction scheme. A large number of system parameters are included as the unknown variables in the system identification of large-scaled structures. The identification problem with large number of system parameters requires a large amount of computation time and resources. In the present study, a hierarchical clustering algorithm is applied to reduce the number of system parameters effectively. Numerical examples demonstrate that the proposed method greatly improves the accuracy and efficiency in the inverse problem of identification.

  3. Using hierarchical clustering of secreted protein families to classify and rank candidate effectors of rust fungi.

    Directory of Open Access Journals (Sweden)

    Diane G O Saunders

    Full Text Available Rust fungi are obligate biotrophic pathogens that cause considerable damage on crop plants. Puccinia graminis f. sp. tritici, the causal agent of wheat stem rust, and Melampsora larici-populina, the poplar leaf rust pathogen, have strong deleterious impacts on wheat and poplar wood production, respectively. Filamentous pathogens such as rust fungi secrete molecules called disease effectors that act as modulators of host cell physiology and can suppress or trigger host immunity. Current knowledge on effectors from other filamentous plant pathogens can be exploited for the characterisation of effectors in the genome of recently sequenced rust fungi. We designed a comprehensive in silico analysis pipeline to identify the putative effector repertoire from the genome of two plant pathogenic rust fungi. The pipeline is based on the observation that known effector proteins from filamentous pathogens have at least one of the following properties: (i) contain a secretion signal, (ii) are encoded by in planta induced genes, (iii) have similarity to haustorial proteins, (iv) are small and cysteine rich, (v) contain a known effector motif or a nuclear localization signal, (vi) are encoded by genes with long intergenic regions, (vii) contain internal repeats, and (viii) do not contain PFAM domains, except those associated with pathogenicity. We used Markov clustering and hierarchical clustering to classify protein families of rust pathogens and rank them according to their likelihood of being effectors. Using this approach, we identified eight families of candidate effectors that we consider of high value for functional characterization. This study revealed a diverse set of candidate effectors, including families of haustorial expressed secreted proteins and small cysteine-rich proteins. This comprehensive classification of candidate effectors from these devastating rust pathogens is an initial step towards probing plant germplasm for novel resistance components.

  4. Using Hierarchical Clustering of Secreted Protein Families to Classify and Rank Candidate Effectors of Rust Fungi

    Science.gov (United States)

    Saunders, Diane G. O.; Win, Joe; Cano, Liliana M.; Szabo, Les J.; Kamoun, Sophien; Raffaele, Sylvain

    2012-01-01

    Rust fungi are obligate biotrophic pathogens that cause considerable damage on crop plants. Puccinia graminis f. sp. tritici, the causal agent of wheat stem rust, and Melampsora larici-populina, the poplar leaf rust pathogen, have strong deleterious impacts on wheat and poplar wood production, respectively. Filamentous pathogens such as rust fungi secrete molecules called disease effectors that act as modulators of host cell physiology and can suppress or trigger host immunity. Current knowledge on effectors from other filamentous plant pathogens can be exploited for the characterisation of effectors in the genome of recently sequenced rust fungi. We designed a comprehensive in silico analysis pipeline to identify the putative effector repertoire from the genome of two plant pathogenic rust fungi. The pipeline is based on the observation that known effector proteins from filamentous pathogens have at least one of the following properties: (i) contain a secretion signal, (ii) are encoded by in planta induced genes, (iii) have similarity to haustorial proteins, (iv) are small and cysteine rich, (v) contain a known effector motif or a nuclear localization signal, (vi) are encoded by genes with long intergenic regions, (vii) contain internal repeats, and (viii) do not contain PFAM domains, except those associated with pathogenicity. We used Markov clustering and hierarchical clustering to classify protein families of rust pathogens and rank them according to their likelihood of being effectors. Using this approach, we identified eight families of candidate effectors that we consider of high value for functional characterization. This study revealed a diverse set of candidate effectors, including families of haustorial expressed secreted proteins and small cysteine-rich proteins. This comprehensive classification of candidate effectors from these devastating rust pathogens is an initial step towards probing plant germplasm for novel resistance components. 
PMID:22238666

  5. 3D NEAREST NEIGHBOUR SEARCH USING A CLUSTERED HIERARCHICAL TREE STRUCTURE

    Directory of Open Access Journals (Sweden)

    A. Suhaibah

    2016-06-01

    Full Text Available Locating and analysing the location of new stores or outlets is one of the common issues facing retailers and franchisers. The aim is to ensure that newly opened stores are placed at strategic locations that attract the highest possible number of customers. Spatial information is used to manage, maintain and analyse these store locations. However, since the business of franchising and chain stores in urban areas runs within high-rise multi-level buildings, a three-dimensional (3D) method is required to locate and identify surrounding information, such as the level at which a franchise unit will be located or whether the unit is at the best level for visibility. One of the analyses commonly used for retrieving the surrounding information is Nearest Neighbour (NN) analysis. It uses a point location and identifies the surrounding neighbours. However, with the immense number of urban datasets, the retrieval and analysis of nearest neighbour information and their efficiency will become more complex and crucial. In this paper, we present a technique to retrieve nearest neighbour information in 3D space using a clustered hierarchical tree structure. Based on our findings, the proposed approach showed a substantial improvement in response time compared to existing spatial access methods in databases. The query performance was tested using a dataset consisting of 500,000 point locations of buildings and franchise units. The results are presented in this paper. Another advantage of this structure is that it also offers minimal overlap and coverage among nodes, which can reduce repetitive data entry.

  6. Hierarchical clustering

    Directory of Open Access Journals (Sweden)

    L. Infante

    2002-01-01

    Full Text Available In this contribution I present recent results on the clustering properties of low-redshift galaxies, groups, clusters and superclusters (z … 1). I also present what is expected and what is measured regarding the degree of evolution of galaxy clustering. We used the photometric galaxy catalogue extracted from the first images of the Sloan Digital Sky Survey to study the clustering properties of small structures of galaxies: pairs, triplets, quartets, quintets, etc. An analysis of the two-point correlation function over an area of 250 square degrees of sky shows that these objects appear to be much more strongly clustered than individual galaxies.

  7. Hierarchical Clustering of Large Databases and Classification of Antibiotics at High Noise Levels

    Directory of Open Access Journals (Sweden)

    Alexander V. Yarkov

    2008-12-01

    Full Text Available A new algorithm for divisive hierarchical clustering of chemical compounds based on 2D structural fragments is suggested. The algorithm is deterministic, and given a random ordering of the input, will always give the same clustering and can process a database up to 2 million records on a standard PC. The algorithm was used for classification of 1,183 antibiotics mixed with 999,994 random chemical structures. Similarity threshold, at which best separation of active and non active compounds took place, was estimated as 0.6. 85.7% of the antibiotics were successfully classified at this threshold with 0.4% of inaccurate compounds. A .sdf file was created with the probe molecules for clustering of external databases.

  8. A novel approach to the problem of non-uniqueness of the solution in hierarchical clustering.

    Science.gov (United States)

    Cattinelli, Isabella; Valentini, Giorgio; Paulesu, Eraldo; Borghese, Nunzio Alberto

    2013-07-01

    The existence of multiple solutions in clustering, and in hierarchical clustering in particular, is often ignored in practical applications. However, this is a non-trivial problem, as different data orderings can result in different cluster sets that, in turn, may lead to different interpretations of the same data. The method presented here offers a solution to this issue. It is based on the definition of an equivalence relation over dendrograms that allows developing all and only the significantly different dendrograms for the same dataset, thus reducing the computational complexity to polynomial from the exponential obtained when all possible dendrograms are considered. Experimental results in the neuroimaging and bioinformatics domains show the effectiveness of the proposed method.
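
    The ordering dependence described in this abstract can be reproduced with a small Python sketch. It illustrates the problem only, not the authors' equivalence-relation method: a plain single-linkage agglomeration that breaks distance ties by taking the first minimal pair it encounters. The function name and the three-point dataset are hypothetical.

```python
def single_linkage_merges(values):
    """Agglomerate 1-D values with single linkage; return the merged sets in order."""
    clusters = [frozenset([v]) for v in values]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                # Strict '<' means the first tied pair in input order wins.
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] | clusters[j]
        merges.append(merged)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Three equally spaced points: the pairs (0, 1) and (1, 2) are tied at distance 1.
forward = single_linkage_merges([0, 1, 2])
backward = single_linkage_merges([2, 1, 0])
```

    The same three points give a first merge of {0, 1} in one input order and {1, 2} in the other, so the two dendrograms differ although the data are identical.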

  9. Fractal Analysis Based on Hierarchical Scaling in Complex Systems

    CERN Document Server

    Chen, Yanguang

    2016-01-01

    A fractal is in essence a hierarchy with cascade structure, which can be described with a set of exponential functions. From these exponential functions, a set of power laws indicative of scaling can be derived. Hierarchy structure and spatial network proved to be associated with one another. This paper is devoted to exploring the theory of fractal analysis of complex systems by means of hierarchical scaling. Two research methods are utilized to make this study, including logic analysis method and empirical analysis method. The main results are as follows. First, a fractal system such as Cantor set is described from the hierarchical angle of view; based on hierarchical structure, three approaches are proposed to estimate fractal dimension. Second, the hierarchical scaling can be generalized to describe multifractals, fractal complementary sets, and self-similar curve such as logarithmic spiral. Third, complex systems such as urban system are demonstrated to be a self-similar hierarchy. The human settlements i...
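
    The hierarchical description of the Cantor set mentioned in this abstract can be worked through in a short Python sketch. It shows only one standard estimate, given here as an assumption about what hierarchical scaling means in this context and not as one of the paper's three approaches: the dimension is the logarithm of the growth in segment count divided by the logarithm of the shrinkage in segment length between levels. The function names are hypothetical.

```python
import math

def cantor_levels(depth):
    """Return (number of segments, segment length) for levels 0..depth."""
    segments = [(0.0, 1.0)]
    levels = []
    for _ in range(depth + 1):
        levels.append((len(segments), segments[0][1] - segments[0][0]))
        # Keep the left and right thirds of every segment, drop the middle third.
        segments = [piece for a, b in segments
                    for piece in ((a, a + (b - a) / 3), (b - (b - a) / 3, b))]
    return levels

def hierarchical_dimension(levels):
    """log(count ratio) / log(length ratio) between the first and last level."""
    (n_first, s_first), (n_last, s_last) = levels[0], levels[-1]
    return math.log(n_last / n_first) / math.log(s_first / s_last)

levels = cantor_levels(6)
dimension = hierarchical_dimension(levels)
```

    Each level doubles the number of segments and divides their length by three, so the estimate equals ln 2 / ln 3, roughly 0.631, whichever pair of levels is compared.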

  10. Intensity-based hierarchical clustering in CT-scans: application to interactive segmentation in cardiology

    Science.gov (United States)

    Hadida, Jonathan; Desrosiers, Christian; Duong, Luc

    2011-03-01

    The segmentation of anatomical structures in Computed Tomography Angiography (CTA) is a pre-operative task useful in image guided surgery. Even though very robust and precise methods have been developed to help achieving a reliable segmentation (level sets, active contours, etc), it remains very time consuming both in terms of manual interactions and in terms of computation time. The goal of this study is to present a fast method to find coarse anatomical structures in CTA with few parameters, based on hierarchical clustering. The algorithm is organized as follows: first, a fast non-parametric histogram clustering method is proposed to compute a piecewise constant mask. A second step then indexes all the space-connected regions in the piecewise constant mask. Finally, a hierarchical clustering is achieved to build a graph representing the connections between the various regions in the piecewise constant mask. This step builds up a structural knowledge about the image. Several interactive features for segmentation are presented, for instance association or disassociation of anatomical structures. A comparison with the Mean-Shift algorithm is presented.

  11. Hierarchical Regional Disparities and Potential Sector Identification Using Modified Agglomerative Clustering

    Science.gov (United States)

    Munandar, T. A.; Azhari; Mushdholifah, A.; Arsyad, L.

    2017-03-01

    Disparities in regional development are commonly identified using the Klassen Typology and the Location Quotient. Both methods typically use data on the gross regional domestic product (GRDP) sectors of a particular region. The Klassen approach can identify regional disparities by classifying the GRDP sector data into four classes, namely Quadrants I, II, III, and IV. Each quadrant indicates a certain level of regional disparity based on the GRDP sector value of the region. Meanwhile, the Location Quotient (LQ) is usually used to identify potential sectors in a particular region, so as to determine which sectors have potential and which do not. LQ classifies each sector into three classes, namely the basic sector, the non-basic sector with a competitive advantage, and the non-basic sector which can only meet its own necessities. Neither the Klassen Typology nor LQ can clearly visualize the relationship between the development achievements of each region and sector. This research aimed to develop a new approach to the identification of disparities in regional development in the form of hierarchical clustering. The method of Hierarchical Agglomerative Clustering (HAC) was employed as the basis of the hierarchical clustering model for identifying disparities in regional development. Modifications were made to HAC using the Klassen Typology and LQ. HAC modified using the Klassen Typology was called MHACK, while HAC modified using LQ was called MACLoQ. The two algorithms can be used to identify regional disparities (MHACK) and potential sectors (MACLoQ), respectively, in the form of hierarchical clusters. Based on the MHACK in 31 regencies in Central Java Province, it is identified that 3 regencies (Demak, Jepara, and Magelang City) fall into the category of developed and rapidly-growing regions, while the other 28 regencies fall into the category of developed but depressed regions. Results of the MACLo

  12. Iterative Maps with Hierarchical Clustering for the Observed Scales of Astrophysical and Cosmological Structures

    CERN Document Server

    Capozziello, S; De Siena, S; Guerra, F; Illuminati, F

    2000-01-01

    We derive, in order of magnitude, the observed astrophysical and cosmological scales in the Universe, from neutron stars to superclusters of galaxies, up to, asymptotically, the observed radius of the Universe. This result is obtained by introducing a recursive scheme of alternating hierarchical mechanisms of three-dimensional and two-dimensional close packings of gravitationally interacting objects. The iterative scheme yields a rapidly converging geometric sequence, which can be described as a hierarchical clustering of aggregates, having the observed radius of the Universe as its fixed point.

  13. A Cluster Analysis of Personality Style in Adults with ADHD

    Science.gov (United States)

    Robin, Arthur L.; Tzelepis, Angela; Bedway, Marquita

    2008-01-01

    Objective: The purpose of this study was to use hierarchical linear cluster analysis to examine the normative personality styles of adults with ADHD. Method: A total of 311 adults with ADHD completed the Millon Index of Personality Styles, which consists of 24 scales assessing motivating aims, cognitive modes, and interpersonal behaviors. Results:…

  15. Cluster and constraint analysis in tetrahedron packings.

    Science.gov (United States)

    Jin, Weiwei; Lu, Peng; Liu, Lufeng; Li, Shuixiang

    2015-04-01

    The disordered packings of tetrahedra often show no obvious macroscopic orientational or positional order for a wide range of packing densities, and it has been found that the local order in particle clusters is the main order form of tetrahedron packings. Therefore, a cluster analysis is carried out to investigate the local structures and properties of tetrahedron packings in this work. We obtain a cluster distribution of differently sized clusters, and peaks are observed at two special clusters, i.e., dimer and wagon wheel. We then calculate the amounts of dimers and wagon wheels, which are observed to have linear or approximate linear correlations with packing density. Following our previous work, the amount of particles participating in dimers is used as an order metric to evaluate the order degree of the hierarchical packing structure of tetrahedra, and an order map is consequently depicted. Furthermore, a constraint analysis is performed to determine the isostatic or hyperstatic region in the order map. We employ a Monte Carlo algorithm to test jamming and then suggest a new maximally random jammed packing of hard tetrahedra from the order map with a packing density of 0.6337.

  16. Hierarchical Agglomerative Clustering Schemes for Energy-Efficiency in Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Taleb Tariq

    2017-06-01

    Extending the lifetime of wireless sensor networks (WSNs) while delivering the expected level of service remains a hot research topic. Clustering has been identified in the literature as one of the primary means to save communication energy. In this paper, we argue that hierarchical agglomerative clustering (HAC) provides a suitable foundation for designing highly energy-efficient communication protocols for WSNs. To this end, we study a new mechanism for selecting cluster heads (CHs) based both on the physical location of the sensors and their residual energy. Furthermore, we study different patterns of communication between the CHs and the base station depending on the possible transmission ranges and the ability of the sensors to act as traffic relays. Simulation results show that our proposed clustering and communication schemes outperform well-known existing approaches by comfortable margins. In particular, network lifetime is increased by more than 60% compared to LEACH and HEED, and by more than 30% compared to K-means clustering.

  17. Asteroid family identification using the Hierarchical Clustering Method and WISE/NEOWISE physical properties

    CERN Document Server

    Masiero, Joseph R; Bauer, J M; Grav, T; Nugent, C R; Stevenson, R

    2013-01-01

    Using albedos from WISE/NEOWISE to separate distinct albedo groups within the Main Belt asteroids, we apply the Hierarchical Clustering Method to these subpopulations and identify dynamically associated clusters of asteroids. While this survey is limited to the ~35% of known Main Belt asteroids that were detected by NEOWISE, we present the families linked from these objects as higher confidence associations than can be obtained from dynamical linking alone. We find that over one-third of the observed population of the Main Belt is represented in the high-confidence cores of dynamical families. The albedo distribution of family members differs significantly from the albedo distribution of background objects in the same region of the Main Belt; however, interpretation of this effect is complicated by the incomplete identification of lower-confidence family members. In total we link 38,298 asteroids into 76 distinct families. This work represents a critical step necessary to debias the albedo and size distributio...
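
    The Hierarchical Clustering Method is commonly described as single-linkage clustering with a distance cutoff: two objects join the same family if a chain of neighbours, each closer than the cutoff, connects them. The sketch below illustrates that idea only. It assumes plain Euclidean distance on made-up 2-D points as a stand-in for the velocity metric in proper-element space, and the function name and cutoff value are hypothetical rather than taken from the paper.

```python
import math


def single_linkage_groups(points, cutoff):
    """Group points connected by chains of neighbours within `cutoff`."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= cutoff:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Made-up coordinates; indices 0-2 form a chain, 3-4 a pair, 5 is isolated.
points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (5.0, 5.0), (5.4, 5.0), (10.0, 10.0)]
families = single_linkage_groups(points, cutoff=0.6)
print(families)
```

    Points 0 and 2 are farther apart than the cutoff but still end up together because point 1 links them, which is the chaining behaviour that makes the choice of cutoff important.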

  18. On the Formation of Cool, Non-Flowing Cores in Galaxy Clusters via Hierarchical Mergers

    CERN Document Server

    Burns, J O; Norman, M L; Bryan, G L

    2003-01-01

    We present a new model for the creation of cool cores in rich galaxy clusters within a LambdaCDM cosmological framework using the results from high spatial dynamic range, adaptive mesh hydro/N-body simulations. It is proposed that cores of cool gas first form in subclusters and these subclusters merge to create rich clusters with cool, central X-Ray excesses. The rich cool clusters do not possess "cooling flows" due to the presence of bulk velocities in the intracluster medium in excess of 1000 km/sec produced by on-going accretion of gas from supercluster filaments. This new model has several attractive features including the presence of substantial core substructure within the cool cores, and it predicts the appearance of cool bullets, cool fronts, and cool filaments all of which have been recently observed with X-Ray satellites. This hierarchical formation model is also consistent with the observation that cool cores in Abell clusters occur preferentially in dense supercluster environments. On the other ...

  19. Hierarchical manifold learning for regional image analysis.

    Science.gov (United States)

    Bhatia, Kanwal K; Rao, Anil; Price, Anthony N; Wolz, Robin; Hajnal, Joseph V; Rueckert, Daniel

    2014-02-01

    We present a novel method of hierarchical manifold learning which aims to automatically discover regional properties of image datasets. While traditional manifold learning methods have become widely used for dimensionality reduction in medical imaging, they suffer from only being able to consider whole images as single data points. We extend conventional techniques by additionally examining local variations, in order to produce spatially-varying manifold embeddings that characterize a given dataset. This involves constructing manifolds in a hierarchy of image patches of increasing granularity, while ensuring consistency between hierarchy levels. We demonstrate the utility of our method in two very different settings: 1) to learn the regional correlations in motion within a sequence of time-resolved MR images of the thoracic cavity; 2) to find discriminative regions of 3-D brain MR images associated with neurodegenerative disease.

  20. Classification of cancer cell lines using an automated two-dimensional liquid mapping method with hierarchical clustering techniques.

    Science.gov (United States)

    Wang, Yanfei; Wu, Rong; Cho, Kathleen R; Shedden, Kerby A; Barder, Timothy J; Lubman, David M

    2006-01-01

    A two-dimensional liquid mapping method was used to map the protein expression of eight ovarian serous carcinoma cell lines and three immortalized ovarian surface epithelial cell lines. Maps were produced using pI as the separation parameter in the first dimension and hydrophobicity based upon reversed-phase HPLC separation in the second dimension. The method can be reproducibly used to produce protein expression maps over a pH range from 4.0 to 8.5. A dynamic programming method was used to correct for minor shifts in peaks during the HPLC gradient between sample runs. The resulting corrected maps can then be compared using hierarchical clustering to produce dendrograms indicating the relationship between different cell lines. It was found that several of the ovarian surface epithelial cell lines clustered together, whereas specific groups of serous carcinoma cell lines clustered with each other. Although there is limited information on the current biology of these cell lines, it was shown that the protein expression of certain cell lines is closely related to each other. Other cell lines, including one ovarian clear cell carcinoma cell line, two endometrioid carcinoma cell lines, and three breast epithelial cell lines, were also mapped for comparison to show that their protein profiles cluster differently than the serous samples and to study how they cluster relative to each other. In addition, comparisons can be made between proteins differentially expressed between cell lines that may serve as markers of ovarian serous carcinomas. The automation of the method allows reproducible comparison of many samples, and the use of differential analysis limits the number of proteins that might require further analysis by mass spectrometry techniques.

  1. Clustering of galaxies in a hierarchical universe - II. Evolution to high redshift

    Science.gov (United States)

    Kauffmann, Guinevere; Colberg, Jörg M.; Diaferio, Antonaldo; White, Simon D. M.

    1999-08-01

    In hierarchical cosmologies the evolution of galaxy clustering depends both on cosmological quantities such as Omega, Lambda and P(k), which determine how collapsed structures - dark matter haloes - form and evolve, and on the physical processes - cooling, star formation, radiative and hydrodynamic feedback - which drive the formation of galaxies within these merging haloes. In this paper we combine dissipationless cosmological N-body simulations and semi-analytic models of galaxy formation in order to study how these two aspects interact. We focus on the differences in clustering predicted for galaxies of differing luminosity, colour, morphology and star formation rate, and on what these differences can teach us about the galaxy formation process. We show that a `dip' in the amplitude of galaxy correlations between z=0 and z=1 can be an important diagnostic. Such a dip occurs in low-density CDM models, because structure forms early, and dark matter haloes of mass ~10^12M_solar, containing galaxies with luminosities ~L_*, are unbiased tracers of the dark matter over this redshift range; their clustering amplitude then evolves similarly to that of the dark matter. At higher redshifts, bright galaxies become strongly biased and the clustering amplitude increases again. In high density models, structure forms late, and bias evolves much more rapidly. As a result, the clustering amplitude of L_* galaxies remains constant from z=0 to z=1. The strength of these effects is sensitive to sample selection. The dip becomes weaker for galaxies with lower star formation rates, redder colours, higher luminosities and earlier morphological types. We explain why this is the case, and how it is related to the variation with redshift of the abundance and environment of the observed galaxies. We also show that the relative peculiar velocities of galaxies are biased low in our models, but that this effect is never very strong. Studies of clustering evolution as a function of galaxy

  2. The Application of Hierarchical Cluster Analysis to the Prediction of Grain Security of Small Research Areas - A Case Study of Kunshan

    Institute of Scientific and Technical Information of China (English)

    姚鑫; 杨桂山; 万荣荣

    2011-01-01

    Grain security is fundamental to the sustainable development of society and the national economy. Because small research regions are vulnerable to policy changes, indexes related to grain security in these areas often change in stages, which means that the mathematical regularity of long-term datasets is not pronounced. As a result, it is difficult to carry out grain security planning for the future. We put forward a new method that combines hierarchical cluster analysis with traditional mathematical models, and established a quantitative standard for judging the validity of the clustering results. A criterion for the use of hierarchical cluster analysis was also proposed, but we strongly recommend that large amounts of data from other research areas be used to calibrate and refine it. Kunshan (1985-2007) was chosen as the study region to test the new method, because it is small in area but has undergone rapid economic development. The results of the analysis showed that the socio-economic indexes related to grain security in Kunshan did change in distinct stages; that, compared with a model built on the full time series, the model using hierarchical cluster analysis had clear advantages in both fitting and prediction; and that by 2015 Kunshan's grain self-sufficiency rate would fall to 6% and the minimum per capita cultivated land area to 0.022 hm². After further analysis, comparison and discussion, the article concludes that hierarchical cluster analysis is a highly practicable method for grain security prediction in small regions and yields scientifically sound conclusions.

  3. Integrative cluster analysis in bioinformatics

    CERN Document Server

    Abu-Jamous, Basel; Nandi, Asoke K

    2015-01-01

    Clustering techniques are increasingly being put to use in the analysis of high-throughput biological datasets. Novel computational techniques to analyse high throughput data in the form of sequences, gene and protein expressions, pathways, and images are becoming vital for understanding diseases and future drug discovery. This book details the complete pathway of cluster analysis, from the basics of molecular biology to the generation of biological knowledge. The book also presents the latest clustering methods and clustering validation, thereby offering the reader a comprehensive review o

  4. Analysis of the effects of the global financial crisis on the Turkish economy, using hierarchical methods

    Science.gov (United States)

    Kantar, Ersin; Keskin, Mustafa; Deviren, Bayram

    2012-04-01

    We have analyzed the topology of 50 important Turkish companies for the period 2006-2010 using the concept of hierarchical methods (the minimal spanning tree (MST) and hierarchical tree (HT)). We investigated the statistical reliability of links between companies in the MST by using the bootstrap technique. We also used the average linkage cluster analysis (ALCA) technique to observe the cluster structures much better. The MST and HT are known as useful tools to perceive and detect global structure, taxonomy, and hierarchy in financial data. We obtained four clusters of companies according to their proximity. We also observed that the Banks and Holdings cluster always forms in the centre of the MSTs for the periods 2006-2007, 2008, and 2009-2010. The clusters match nicely with their common production activities or their strong interrelationship. The effects of the Automobile sector increased after the global financial crisis due to the temporary incentives provided by the Turkish government. We find that Turkish companies were not very affected by the global financial crisis.
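
    To make the minimal spanning tree step concrete, here is a small sketch using Kruskal's algorithm. It assumes the distance d = sqrt(2(1 - rho)) that is widely used in correlation-based financial networks; the abstract does not state which metric the authors used, and the ticker names and correlation values below are invented for illustration.

```python
import math


def correlation_distance(rho):
    """Map a correlation in [-1, 1] to a distance in [0, 2]."""
    return math.sqrt(2.0 * (1.0 - rho))


def minimum_spanning_tree(names, corr):
    """Kruskal's algorithm on the complete graph of pairwise distances."""
    parent = {name: name for name in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = sorted(
        (correlation_distance(corr[(a, b)]), a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
    )
    tree = []
    for dist, a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # the edge joins two separate components
            parent[root_a] = root_b
            tree.append((a, b, dist))
    return tree


# Hypothetical correlations between four made-up tickers.
names = ["BANK1", "BANK2", "AUTO1", "AUTO2"]
corr = {
    ("BANK1", "BANK2"): 0.9, ("BANK1", "AUTO1"): 0.3, ("BANK1", "AUTO2"): 0.2,
    ("BANK2", "AUTO1"): 0.4, ("BANK2", "AUTO2"): 0.1, ("AUTO1", "AUTO2"): 0.8,
}
mst = minimum_spanning_tree(names, corr)
for a, b, dist in mst:
    print(f"{a} - {b}: d = {dist:.3f}")
```

    The tree keeps only the N - 1 strongest links needed to connect all N companies, so strongly correlated pairs appear as direct neighbours and sector-like groups show up as branches.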

  5. Finding "Problem Types" in Judgments of Problem-Similarity: Comparison of Cluster Analysis with Subject Protocols.

    Science.gov (United States)

    Herring, Richard D.

    Literature in mathematic problem-solving suggests that learners store information in memory which helps them solve stereotyped algebra word problems. Cluster analysis has been used as an exploratory tool to infer the types of problems which have common representations in memory. This study compares the results of a hierarchical cluster analysis of…

  6. Biomolecule-Assisted Hydrothermal Synthesis and Self-Assembly of Bi2Te3 Nanostring-Cluster Hierarchical Structure

    DEFF Research Database (Denmark)

    Mi, Jianli; Lock, Nina; Sun, Ting;

    2010-01-01

    A simple biomolecule-assisted hydrothermal approach has been developed for the fabrication of Bi2Te3 thermoelectric nanomaterials. The product has a nanostring-cluster hierarchical structure which is composed of ordered and aligned platelet-like crystals. The platelets are 100 nm in diameter...

  7. Selections of data preprocessing methods and similarity metrics for gene cluster analysis

    Institute of Scientific and Technical Information of China (English)

    YANG Chunmei; WAN Baikun; GAO Xiaofeng

    2006-01-01

    Clustering is one of the major exploratory techniques for gene expression data analysis. High-quality results can be obtained in cluster analysis only with suitable similarity metrics and properly preprocessed datasets. In this study, gene expression datasets with external evaluation criteria were preprocessed by normalization by line, normalization by column, or base-2 logarithm transformation, and were subsequently clustered by hierarchical clustering, k-means clustering and self-organizing maps (SOMs) with the Pearson correlation coefficient or Euclidean distance as the similarity metric. Finally, the quality of the clusters was evaluated by the adjusted Rand index. The results illustrate that k-means clustering and SOMs have distinct advantages over hierarchical clustering in gene clustering, and that SOMs are slightly better than k-means when randomly initialized. They also show that hierarchical clustering works best with the Pearson correlation coefficient as the similarity metric and datasets normalized by line, whereas k-means clustering and SOMs produce better clusters with Euclidean distance and logarithm-transformed datasets. These results provide a useful reference for the implementation of gene expression cluster analysis.
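
    The adjusted Rand index used for evaluation in this abstract can be computed from the contingency table of two labelings. The sketch below follows the standard pair-counting formula; the function name and the example labels are illustrative and do not come from the study.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Chance-corrected agreement between two partitions of the same items."""
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: nothing to disagree on

    # Pairs placed together in both partitions, in the reference, in the prediction.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_rows * sum_cols / total_pairs
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_cells - expected) / (max_index - expected)


# Illustrative labels: a reference grouping and a clustering result.
reference = [0, 0, 0, 1, 1, 1]
clustering = [0, 0, 1, 1, 2, 2]
ari = adjusted_rand_index(reference, clustering)
print(f"ARI = {ari:.4f}")
```

    A value of 1 means the two partitions agree up to relabeling, while values near 0 indicate agreement no better than chance, which is why the index suits comparisons across clustering methods with different numbers of clusters.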

  8. Hierarchical Dependence in Meta-Analysis

    Science.gov (United States)

    Stevens, John R.; Taylor, Alan M.

    2009-01-01

    Meta-analysis is a frequent tool among education and behavioral researchers to combine results from multiple experiments to arrive at a clear understanding of some effect of interest. One of the traditional assumptions in a meta-analysis is the independence of the effect sizes from the studies under consideration. This article presents a…

  9. Comparing chemistry to outcome: the development of a chemical distance metric, coupled with clustering and hierarchal visualization applied to macromolecular crystallography.

    Directory of Open Access Journals (Sweden)

    Andrew E Bruno

    Many bioscience fields employ high-throughput methods to screen multiple biochemical conditions. The analysis of these becomes tedious without a degree of automation. Crystallization, a rate limiting step in biological X-ray crystallography, is one of these fields. Screening of multiple potential crystallization conditions (cocktails) is the most effective method of probing a protein's phase diagram and guiding crystallization, but the interpretation of results can be time-consuming. To aid this empirical approach a cocktail distance coefficient was developed to quantitatively compare macromolecule crystallization conditions and outcome. These coefficients were evaluated against an existing similarity metric developed for crystallization, the C6 metric, using both virtual crystallization screens and by comparison of two related 1,536-cocktail high-throughput crystallization screens. Hierarchical clustering was employed to visualize one of these screens and the crystallization results from an exopolyphosphatase-related protein from Bacteroides fragilis (BfR192) overlaid on this clustering. This demonstrated a strong correlation between certain chemically related clusters and crystal lead conditions. While this analysis was not used to guide the initial crystallization optimization, it led to the re-evaluation of unexplained peaks in the electron density map of the protein and to the insertion and correct placement of sodium, potassium and phosphate atoms in the structure. With these in place, the resulting structure of the putative active site demonstrated features consistent with active sites of other phosphatases which are involved in binding the phosphoryl moieties of nucleotide triphosphates. The new distance coefficient, CDcoeff, appears to be robust in this application, and coupled with hierarchical clustering and the overlay of crystallization outcome, reveals information of biological relevance. While tested with a single example the

  10. Clustering of Galaxies in a Hierarchical Universe - II. Evolution to High Redshift

    CERN Document Server

    Kauffmann, Guinevere; Colberg, Joerg M.; Diaferio, Antonaldo; White, Simon D.M.

    1998-01-01

    In hierarchical cosmologies the evolution of galaxy clustering depends both on cosmological quantities such as Omega and Lambda, which determine how dark matter halos form and evolve, and on the physical processes - cooling, star formation and feedback - which drive the formation of galaxies within these merging halos. In this paper, we combine dissipationless cosmological N-body simulations and semi-analytic models of galaxy formation in order to study how these two aspects interact. We focus on the differences in clustering predicted for galaxies of differing luminosity, colour, morphology and star formation rate and on what these differences can teach us about the galaxy formation process. We show that a "dip" in the amplitude of galaxy correlations between z=0 and z=1 can be an important diagnostic. Such a dip occurs in low-density CDM models because structure forms early and dark matter halos of 10**12 solar masses, containing galaxies with luminosities around L*, are unbiased tracers of the dark matter ...

  11. Quality Assured Optimal Resource Provisioning and Scheduling Technique Based on Improved Hierarchical Agglomerative Clustering Algorithm (IHAC)

    Directory of Open Access Journals (Sweden)

    A. Meenakshi

    2016-08-01

    Resource allocation is the task of assigning available resources to different uses. In the context of an entire economy, resources can be assigned by different means, such as markets or central planning. Cloud computing has become a new-age technology with huge potential in enterprises and markets. Clouds make it possible to access applications and associated data from anywhere. The fundamental aim of resource allocation is to allot the available resources in the most effective manner. In the initial phase, a representative resource usage distribution for a group of nodes with identical resource usage patterns is evaluated as a resource bundle, which can easily be used to locate a group of nodes fulfilling a standard criterion. In this paper, a clustering-based resource aggregation method, the Improved Hierarchical Agglomerative Clustering Algorithm (IHAC), is introduced to obtain a compact representation of a set of identically behaving nodes for scalability. In the subsequent phase, concerned with the energetic resource allocation procedure, a hybrid optimization technique is introduced. The technique is devised for scheduling functions to cloud resources and considers both financial and evaluation expenses. The efficiency of the resource allocation system is assessed by means of several parameters, such as reliability, reusability and certain other metrics. The optimal path choice is the outcome of the hybrid optimization approach. The technique allocates the available resources based on the optimal path.

  12. Cluster analysis of spontaneous preterm birth phenotypes identifies potential associations among preterm birth mechanisms.

    Science.gov (United States)

    Esplin, M Sean; Manuck, Tracy A; Varner, Michael W; Christensen, Bryce; Biggio, Joseph; Bukowski, Radek; Parry, Samuel; Zhang, Heping; Huang, Hao; Andrews, William; Saade, George; Sadovsky, Yoel; Reddy, Uma M; Ilekis, John

    2015-09-01

    We sought to use an innovative tool that is based on common biologic pathways to identify specific phenotypes among women with spontaneous preterm birth (SPTB) to enhance investigators' ability to identify and to highlight common mechanisms and underlying genetic factors that are responsible for SPTB. We performed a secondary analysis of a prospective case-control multicenter study of SPTB. All cases delivered a preterm singleton at SPTB ≤34.0 weeks' gestation. Each woman was assessed for the presence of underlying SPTB causes. A hierarchic cluster analysis was used to identify groups of women with homogeneous phenotypic profiles. One of the phenotypic clusters was selected for candidate gene association analysis with the use of VEGAS software. One thousand twenty-eight women with SPTB were assigned phenotypes. Hierarchic clustering of the phenotypes revealed 5 major clusters. Cluster 1 (n = 445) was characterized by maternal stress; cluster 2 (n = 294) was characterized by premature membrane rupture; cluster 3 (n = 120) was characterized by familial factors, and cluster 4 (n = 63) was characterized by maternal comorbidities. Cluster 5 (n = 106) was multifactorial and characterized by infection (INF), decidual hemorrhage (DH), and placental dysfunction (PD). These 3 phenotypes were correlated highly by χ² analysis (PD and DH, P cluster 3 of SPTB. We identified 5 major clusters of SPTB based on a phenotype tool and hierarchic clustering. There was significant correlation between several of the phenotypes. The INS gene was associated with familial factors that were underlying SPTB.

  13. Cluster based hierarchical resource searching model in P2P network

    Institute of Scientific and Technical Information of China (English)

    Yang Ruijuan; Liu Jian; Tian Jingwen

    2007-01-01

    To address the large network load generated by the Gnutella resource-searching model in Peer to Peer (P2P) networks, an improved model that decreases the network expense is proposed. It establishes a cluster in the P2P network, auto-organizes logical layers, and applies a hybrid mechanism of directional searching and flooding. The performance analysis and simulation results show that the proposed hierarchical searching model effectively reduces the generated message load and that its searching-response time is about as good as that of the Gnutella model.

  14. Fabrication and analysis of gecko-inspired hierarchical polymer nanosetae.

    Science.gov (United States)

    Ho, Audrey Yoke Yee; Yeo, Lip Pin; Lam, Yee Cheong; Rodríguez, Isabel

    2011-03-22

    A gecko's superb ability to adhere to surfaces is widely credited to the large attachment area of the hierarchical and fibrillar structure on its feet. The combination of these two features provides the necessary compliance for the gecko toe-pad to effectively engage a high percentage of the spatulae at each step to any kind of surface topography. With the use of multi-tiered porous anodic alumina template and capillary force assisted nanoimprinting, we have successfully fabricated a gecko-inspired hierarchical topography of branched nanopillars on a stiff polymer. We also demonstrated that the hierarchical topography improved the shear adhesion force over a topography of linear structures by 150%. A systematic analysis to understand the phenomenon was performed. It was determined that the effective stiffness of the hierarchical branched structure was lower than that of the linear structure. The reduction in effective stiffness favored a more efficient bending of the branched topography and a better compliance to a test surface, hence resulting in a higher area of residual deformation. As the area of residual deformation increased, the shear adhesion force emulated. The branched pillar topography also showed a marked increase in hydrophobicity, which is an essential property in the practical applications of these structures for good self-cleaning in dry adhesion conditions.

  15. Cluster analysis of obesity and asthma phenotypes.

    Directory of Open Access Journals (Sweden)

    E Rand Sutherland

    BACKGROUND: Asthma is a heterogeneous disease with variability among patients in characteristics such as lung function, symptoms and control, body weight, markers of inflammation, and responsiveness to glucocorticoids (GC). Cluster analysis of well-characterized cohorts can advance understanding of disease subgroups in asthma and point to unsuspected disease mechanisms. We utilized a hypothesis-free cluster analytical approach to define the contribution of obesity and related variables to asthma phenotype. METHODOLOGY AND PRINCIPAL FINDINGS: In a cohort of clinical trial participants (n = 250), minimum-variance hierarchical clustering was used to identify clinical and inflammatory biomarkers important in determining disease cluster membership in mild and moderate persistent asthmatics. In a subset of participants, GC sensitivity was assessed via expression of GC receptor alpha (GCRα) and induction of MAP kinase phosphatase-1 (MKP-1) expression by dexamethasone. Four asthma clusters were identified, with body mass index (BMI, kg/m²) and severity of asthma symptoms (AEQ score) the most significant determinants of cluster membership (F = 57.1, p<0.0001 and F = 44.8, p<0.0001, respectively). Two clusters were composed of predominantly obese individuals; these two obese asthma clusters differed from one another with regard to age of asthma onset, measures of asthma symptoms (AEQ) and control (ACQ), exhaled nitric oxide concentration (FENO) and airway hyperresponsiveness (methacholine PC20) but were similar with regard to measures of lung function (FEV1 (%) and FEV1/FVC), airway eosinophilia, IgE, leptin, adiponectin and C-reactive protein (hsCRP). Members of obese clusters demonstrated evidence of reduced expression of GCRα, a finding which was correlated with a reduced induction of MKP-1 expression by dexamethasone. CONCLUSIONS AND SIGNIFICANCE: Obesity is an important determinant of asthma phenotype in adults. There is heterogeneity in

  16. Hierarchical multivariate covariance analysis of metabolic connectivity.

    Science.gov (United States)

    Carbonell, Felix; Charil, Arnaud; Zijdenbos, Alex P; Evans, Alan C; Bedell, Barry J

    2014-12-01

    Conventional brain connectivity analysis is typically based on the assessment of interregional correlations. Given that correlation coefficients are derived from both covariance and variance, group differences in covariance may be obscured by differences in the variance terms. To facilitate a comprehensive assessment of connectivity, we propose a unified statistical framework that interrogates the individual terms of the correlation coefficient. We have evaluated the utility of this method for metabolic connectivity analysis using [18F]2-fluoro-2-deoxyglucose (FDG) positron emission tomography (PET) data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study. As an illustrative example of the utility of this approach, we examined metabolic connectivity in angular gyrus and precuneus seed regions of mild cognitive impairment (MCI) subjects with low and high β-amyloid burdens. This new multivariate method allowed us to identify alterations in the metabolic connectome, which would not have been detected using classic seed-based correlation analysis. Ultimately, this novel approach should be extensible to brain network analysis and broadly applicable to other imaging modalities, such as functional magnetic resonance imaging (MRI).
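
    The point that a correlation coefficient mixes covariance and variance terms can be seen in a toy calculation. In the sketch below, two hypothetical groups have exactly the same covariance between a seed signal and a target signal, yet their correlations differ because the target's variance differs. The data and function names are invented for illustration and have nothing to do with the ADNI measurements or the authors' framework.

```python
from math import sqrt


def covariance(x, y):
    """Sample covariance with the n - 1 denominator."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)


def correlation(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))


# Hypothetical seed-region values shared by both groups.
seed = [1.0, 2.0, 3.0, 4.0, 5.0]
# Group B's target equals group A's plus a component uncorrelated with the seed.
target_a = [2.0, 1.0, 4.0, 3.0, 5.0]
target_b = [3.0, -1.0, 6.0, 1.0, 6.0]

cov_a, cov_b = covariance(seed, target_a), covariance(seed, target_b)
corr_a, corr_b = correlation(seed, target_a), correlation(seed, target_b)
print(f"group A: cov = {cov_a:.2f}, corr = {corr_a:.3f}")
print(f"group B: cov = {cov_b:.2f}, corr = {corr_b:.3f}")
```

    Comparing the two groups on correlation alone would suggest a connectivity difference even though the covariance is unchanged, which is the kind of ambiguity that motivates examining the individual terms separately.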

  17. Analysis and Optimisation of Hierarchically Scheduled Multiprocessor Embedded Systems

    DEFF Research Database (Denmark)

    Pop, Traian; Pop, Paul; Eles, Petru;

    2008-01-01

    We present an approach to the analysis and optimisation of heterogeneous multiprocessor embedded systems. The systems are heterogeneous not only in terms of hardware components, but also in terms of communication protocols and scheduling policies. When several scheduling policies share a resource, they are organised in a hierarchy. In this paper, we first develop a holistic scheduling and schedulability analysis that determines the timing properties of a hierarchically scheduled system. Second, we address design problems that are characteristic to such hierarchically scheduled systems: assignment of scheduling policies to tasks, mapping of tasks to hardware components, and the scheduling of the activities. We also present several algorithms for solving these problems. Our heuristics are able to find schedulable implementations under limited resources, achieving an efficient utilisation of the system...

  18. New Alzheimer amyloid beta responsive genes identified in human neuroblastoma cells by hierarchical clustering.

    Directory of Open Access Journals (Sweden)

    Markus Uhrig

    Full Text Available Alzheimer's disease (AD) is characterized by neuronal degeneration and cell loss. Abeta(42), in contrast to Abeta(40), is thought to be the pathogenic form triggering the pathological cascade in AD. In order to unravel overall gene regulation we monitored the transcriptomic responses to increased or decreased Abeta(40) and Abeta(42) levels, generated and derived from its precursor C99 (C-terminal fragment of APP comprising 99 amino acids) in human neuroblastoma cells. We identified fourteen differentially expressed transcripts by hierarchical clustering and discussed their involvement in AD. These fourteen transcripts were grouped into two main clusters each showing distinct differential expression patterns depending on Abeta(40) and Abeta(42) levels. Among these transcripts we discovered an unexpected inverse and strong differential expression of neurogenin 2 (NEUROG2) and KIAA0125 in all examined cell clones. C99-overexpression had a similar effect on NEUROG2 and KIAA0125 expression as a decreased Abeta(42)/Abeta(40) ratio. Importantly however, an increased Abeta(42)/Abeta(40) ratio, which is typical of AD, had an inverse expression pattern of NEUROG2 and KIAA0125: An increased Abeta(42)/Abeta(40) ratio up-regulated NEUROG2, but down-regulated KIAA0125, whereas the opposite regulation pattern was observed for a decreased Abeta(42)/Abeta(40) ratio. We discuss the possibilities that the so far uncharacterized KIAA0125 might be a counter player of NEUROG2 and that KIAA0125 could be involved in neurogenesis, due to the involvement of NEUROG2 in developmental neural processes.

  19. Hierarchical models and the analysis of bird survey information

    Science.gov (United States)

    Sauer, J.R.; Link, W.A.

    2003-01-01

    Management of birds often requires analysis of collections of estimates. We describe a hierarchical modeling approach to the analysis of these data, in which parameters associated with the individual species estimates are treated as random variables, and probability statements are made about the species parameters conditioned on the data. A Markov-Chain Monte Carlo (MCMC) procedure is used to fit the hierarchical model. This approach is computer intensive, and is based upon simulation. MCMC allows for estimation both of parameters and of derived statistics. To illustrate the application of this method, we use the case in which we are interested in attributes of a collection of estimates of population change. Using data for 28 species of grassland-breeding birds from the North American Breeding Bird Survey, we estimate the number of species with increasing populations, provide precision-adjusted rankings of species trends, and describe a measure of population stability as the probability that the trend for a species is within a certain interval. Hierarchical models can be applied to a variety of bird survey applications, and we are investigating their use in estimation of population change from survey data.
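
    The derived statistics mentioned in this abstract are typically computed draw by draw from MCMC output. The sketch below assumes a tiny hand-written table of posterior draws (hypothetical numbers, three species instead of 28) and hypothetical function names; it shows only the summarising step, not the model fitting.

```python
# Each row is one hypothetical MCMC draw; each column is one species' trend
# (per cent change per year). Real output would hold thousands of draws.
draws = [
    [1.0, -0.5, 0.2],
    [0.8, -0.2, -0.1],
    [1.2, -0.7, 0.3],
    [0.9, 0.1, 0.4],
]

def expected_number_increasing(draws):
    """Posterior mean of the number of species with a positive trend."""
    counts = [sum(1 for trend in draw if trend > 0) for draw in draws]
    return sum(counts) / len(counts)

def prob_within(draws, species, low, high):
    """Posterior probability that one species' trend lies in [low, high]."""
    hits = sum(1 for draw in draws if low <= draw[species] <= high)
    return hits / len(draws)

n_increasing = expected_number_increasing(draws)
stability = prob_within(draws, species=1, low=-0.5, high=0.5)
print(n_increasing, stability)
```

    Because each statistic is evaluated on every draw and then averaged, its uncertainty reflects the uncertainty of all species estimates jointly.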

  20. Hierarchical black hole triples in young star clusters: impact of Kozai-Lidov resonance on mergers

    CERN Document Server

    Kimpson, Thomas O; Mapelli, Michela; Ziosi, Brunetto M

    2016-01-01

    Mergers of compact object binaries are one of the most powerful sources of gravitational waves (GWs) in the frequency range of second-generation ground-based gravitational wave detectors (Advanced LIGO and Virgo). Dynamical simulations of young dense star clusters (SCs) indicate that ~27 per cent of all double compact object binaries are members of hierarchical triple systems (HTs). In this paper, we consider 570 HTs composed of three compact objects (black holes or neutron stars) that formed dynamically in N-body simulations of young dense SCs. We simulate them for a Hubble time with a new code based on Mikkola's algorithmic regularization scheme, including the 2.5 post-Newtonian term. We find that ~88 per cent of the simulated systems develop Kozai-Lidov (KL) oscillations. KL resonance triggers the merger of the inner binary in three systems (corresponding to 0.5 per cent of the simulated HTs), by increasing the eccentricity of the inner binary. Accounting for KL oscillations leads to an increase of the...

  1. Hierarchical black hole triples in young star clusters: impact of Kozai-Lidov resonance on mergers

    Science.gov (United States)

    Kimpson, Thomas O.; Spera, Mario; Mapelli, Michela; Ziosi, Brunetto M.

    2016-12-01

    Mergers of compact-object binaries are one of the most powerful sources of gravitational waves (GWs) in the frequency range of second-generation ground-based GW detectors (advanced LIGO and Virgo). Dynamical simulations of young dense star clusters (SCs) indicate that ~27 per cent of all double compact-object binaries are members of hierarchical triple systems (HTs). In this paper, we consider 570 HTs composed of three compact objects (black holes or neutron stars) that formed dynamically in N-body simulations of young dense SCs. We simulate them for a Hubble time with a new code based on Mikkola's algorithmic regularization scheme, including the 2.5 post-Newtonian term. We find that ~88 per cent of the simulated systems develop Kozai-Lidov (KL) oscillations. KL resonance triggers the merger of the inner binary in three systems (corresponding to 0.5 per cent of the simulated HTs), by increasing the eccentricity of the inner binary. Accounting for KL oscillations leads to an increase of the total expected merger rate by ≈50 per cent. All binaries that merge because of KL oscillations were formed by dynamical exchanges (i.e. none is a primordial binary) and have chirp mass >20 M⊙. This result might be crucial to interpret the formation channel of the first recently detected GW events.

  2. A new Hierarchical Group Key Management based on Clustering Scheme for Mobile Ad Hoc Networks

    Directory of Open Access Journals (Sweden)

    Ayman EL-SAYED

    2014-05-01

    Full Text Available The migration from wired to wireless networks has been a global trend in the past few decades because wireless networks provide anytime-anywhere networking services. As wireless networks are deployed rapidly, a secure wireless environment will be mandatory. The mobility and scalability brought by wireless networks have also made many applications possible. Among all contemporary wireless networks, the Mobile Ad hoc Network (MANET) is one of the most important and unique applications. A MANET is a collection of autonomous nodes or terminals which communicate with each other by forming a multihop radio network and maintaining connectivity in a decentralized manner. Because the wireless medium is unreliable, data transfer is a major problem in MANETs, which lack data security and reliability. The most suitable solution to provide the expected level of security to these services is a key management protocol. Key management is a vital part of security, and the issue is even bigger in wireless networks than in wired networks. Distributing keys in an authenticated manner is a difficult task in a MANET. When a member leaves or joins the group, a new key must be generated to maintain forward and backward secrecy. In this paper, we propose a new group key management scheme for MANETs based on clustering, namely the Hierarchical, Simple, Efficient and Scalable Group Key (HSESGK) scheme, and classify other existing schemes. Group members deduce the group key in a distributed manner.

  3. MAP-Based Underdetermined Blind Source Separation of Convolutive Mixtures by Hierarchical Clustering and ℓ1-Norm Minimization

    Directory of Open Access Journals (Sweden)

    Kellermann Walter

    2007-01-01

    Full Text Available We address the problem of underdetermined BSS. While most previous approaches are designed for instantaneous mixtures, we propose a time-frequency-domain algorithm for convolutive mixtures. We adopt a two-step method based on a general maximum a posteriori (MAP) approach. In the first step, we estimate the mixing matrix based on hierarchical clustering, assuming that the source signals are sufficiently sparse. The algorithm works directly on the complex-valued data in the time-frequency domain and shows better convergence than algorithms based on self-organizing maps. The assumption of Laplacian priors for the source signals in the second step leads to an algorithm for estimating the source signals. It involves the ℓ1-norm minimization of complex numbers because of the use of the time-frequency-domain approach. We compare a combinatorial approach initially designed for real numbers with a second-order cone programming (SOCP) approach designed for complex numbers. We found that although the former approach is not theoretically justified for complex numbers, its results are comparable to, or even better than, the SOCP solution. The advantage is a lower computational cost for problems with low input/output dimensions.

  4. Sun Protection Belief Clusters: Analysis of Amazon Mechanical Turk Data.

    Science.gov (United States)

    Santiago-Rivas, Marimer; Schnur, Julie B; Jandorf, Lina

    2016-12-01

    This study aimed (i) to determine whether people could be differentiated on the basis of their sun protection belief profiles and individual characteristics and (ii) explore the use of a crowdsourcing web service for the assessment of sun protection beliefs. A sample of 500 adults completed an online survey of sun protection belief items using Amazon Mechanical Turk. A two-phased cluster analysis (i.e., hierarchical and non-hierarchical K-means) was utilized to determine clusters of sun protection barriers and facilitators. Results yielded three distinct clusters of sun protection barriers and three distinct clusters of sun protection facilitators. Significant associations between gender, age, sun sensitivity, and cluster membership were identified. Results also showed an association between barrier and facilitator cluster membership. The results of this study provided a potential alternative approach to developing future sun protection promotion initiatives in the population. Findings add to our knowledge regarding individuals who support, oppose, or are ambivalent toward sun protection and inform intervention research by identifying distinct subtypes that may best benefit from (or have a higher need for) skin cancer prevention efforts.
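
    A two-phase cluster analysis of this kind is often run by letting a hierarchical step supply the starting centroids for K-means. The sketch below is one possible arrangement using SciPy on synthetic data; the Ward linkage, the function name `two_phase_cluster`, and the three synthetic groups are assumptions for illustration and are not taken from the study.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.cluster.vq import kmeans2

def two_phase_cluster(X, k):
    """Hierarchical clustering to seed K-means, then K-means refinement."""
    # Phase 1: agglomerative clustering (Ward linkage is an assumption here).
    Z = linkage(X, method="ward")
    hier_labels = fcluster(Z, t=k, criterion="maxclust")  # labels run 1..k
    seeds = np.array([X[hier_labels == c].mean(axis=0) for c in range(1, k + 1)])
    # Phase 2: K-means started from the hierarchical centroids.
    centroids, labels = kmeans2(X, seeds, minit="matrix")
    return centroids, labels

# Synthetic stand-in for survey scores: three well-separated groups of 20.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((20, 2)) for c in centers])

centroids, labels = two_phase_cluster(X, k=3)
print(labels)
```

    Seeding K-means this way removes its dependence on random initial centers, while the K-means pass can still reassign observations that the hierarchical step merged early.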

  5. Study on Cluster Analysis Used with Laser-Induced Breakdown Spectroscopy

    Science.gov (United States)

    He, Li'ao; Wang, Qianqian; Zhao, Yu; Liu, Li; Peng, Zhong

    2016-06-01

    Supervised learning methods (e.g., PLS-DA, SVM) have been widely used with laser-induced breakdown spectroscopy (LIBS) to classify materials; however, they may yield a low correct classification rate if a test sample type is not included in the training dataset. In this paper, unsupervised cluster analysis methods (hierarchical clustering analysis, K-means clustering analysis, and the iterative self-organizing data analysis technique, ISODATA) are investigated for plastics classification based on the line intensities of LIBS emission. The results of hierarchical clustering analysis using four different similarity measuring methods (single linkage, complete linkage, unweighted pair-group average, and weighted pair-group average) are compared. In K-means clustering analysis, four methods of choosing initial centers are applied and their results are compared. The classification results of hierarchical clustering analysis, K-means clustering analysis, and ISODATA are analyzed. The experimental results demonstrate that cluster analysis methods can be applied to plastics discrimination with LIBS. Supported by Beijing Natural Science Foundation of China (No. 4132063).
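
    The four linkage rules compared in this abstract correspond to four `method` options of SciPy's `linkage` function. The sketch below runs all four on synthetic data standing in for LIBS line intensities; the prototype spectra, the noise level, and the purity score are assumptions made for illustration, not the paper's data or evaluation.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# SciPy names for the four linkage rules named in the abstract.
METHODS = {
    "single linkage": "single",
    "complete linkage": "complete",
    "unweighted pair-group average": "average",  # UPGMA
    "weighted pair-group average": "weighted",   # WPGMA
}

def purity(labels, truth):
    """Fraction of samples in the majority true class of their cluster."""
    total = 0
    for c in np.unique(labels):
        members = truth[labels == c]
        total += np.bincount(members).max()
    return total / len(truth)

rng = np.random.default_rng(1)
# Synthetic stand-in: 3 material types, 4 emission-line intensities each.
truth = np.repeat([0, 1, 2], 15)
prototypes = np.array([[1, 0, 0, 2], [0, 3, 1, 0], [2, 2, 0, 1]], dtype=float)
X = prototypes[truth] + 0.05 * rng.standard_normal((45, 4))

results = {}
for name, method in METHODS.items():
    Z = linkage(X, method=method, metric="euclidean")
    labels = fcluster(Z, t=3, criterion="maxclust")
    results[name] = purity(labels, truth)

for name, score in results.items():
    print(f"{name}: purity = {score:.2f}")
```

    On clean, well-separated data all four rules agree; differences between them tend to appear with noisy or elongated clusters, where single linkage is prone to chaining.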

  6. Obstructive Sleep Apnea: A Cluster Analysis at Time of Diagnosis

    Science.gov (United States)

    Grillet, Yves; Richard, Philippe; Stach, Bruno; Vivodtzev, Isabelle; Timsit, Jean-Francois; Lévy, Patrick; Tamisier, Renaud; Pépin, Jean-Louis

    2016-01-01

    Background: The classification of obstructive sleep apnea is on the basis of sleep study criteria that may not adequately capture disease heterogeneity. Improved phenotyping may improve prognosis prediction and help select therapeutic strategies. Objectives: This study used cluster analysis to investigate the clinical clusters of obstructive sleep apnea. Methods: An ascending hierarchical cluster analysis was performed on baseline symptoms, physical examination, risk factor exposure and co-morbidities from 18,263 participants in the OSFP (French national registry of sleep apnea). The probability for criteria to be associated with a given cluster was assessed using odds ratios, determined by univariate logistic regression. Results: Six clusters were identified, in which patients varied considerably in age, sex, symptoms, obesity, co-morbidities and environmental risk factors. The main significant differences between clusters were minimally symptomatic versus sleepy obstructive sleep apnea patients, lean versus obese, and among obese patients different combinations of co-morbidities and environmental risk factors. Conclusions: Our cluster analysis identified six distinct clusters of obstructive sleep apnea. Our findings underscore the high degree of heterogeneity that exists within obstructive sleep apnea patients regarding clinical presentation, risk factors and consequences. This may help in both research and clinical practice for validating new prevention programs, in diagnosis and in decisions regarding therapeutic strategies. PMID:27314230

  7. Hierarchical Clustering Multi-Task Learning for Joint Human Action Grouping and Recognition.

    Science.gov (United States)

    Liu, An-An; Su, Yu-Ting; Nie, Wei-Zhi; Kankanhalli, Mohan

    2017-01-01

    This paper proposes a hierarchical clustering multi-task learning (HC-MTL) method for joint human action grouping and recognition. Specifically, we formulate the objective function into the group-wise least square loss regularized by low rank and sparsity with respect to two latent variables, model parameters and grouping information, for joint optimization. To handle this non-convex optimization, we decompose it into two sub-tasks, multi-task learning and task relatedness discovery. First, we convert this non-convex objective function into the convex formulation by fixing the latent grouping information. This new objective function focuses on multi-task learning by strengthening the shared-action relationship and action-specific feature learning. Second, we leverage the learned model parameters for the task relatedness measure and clustering. In this way, HC-MTL can attain both optimal action models and group discovery by alternating iteratively. The proposed method is validated on three kinds of challenging datasets, including six realistic action datasets (Hollywood2, YouTube, UCF Sports, UCF50, HMDB51 & UCF101), two constrained datasets (KTH & TJU), and two multi-view datasets (MV-TJU & IXMAS). The extensive experimental results show that: 1) HC-MTL can achieve performance competitive with the state of the art for action recognition and grouping; 2) HC-MTL can overcome the difficulty in heuristic action grouping simply based on human knowledge; 3) HC-MTL can avoid the possible inconsistency between the subjective action grouping depending on human knowledge and objective action grouping based on the feature subspace distributions of multiple actions. Comparison with the popular clustered multi-task learning further reveals that the latent relatedness discovered by HC-MTL aids in inducing the group-wise multi-task learning and boosts the performance. To the best of our knowledge, ours is the first work that breaks the assumption that all actions are either

  8. Hierarchical clustering of ryanodine receptors enables emergence of a calcium clock in sinoatrial node cells.

    Science.gov (United States)

    Stern, Michael D; Maltseva, Larissa A; Juhaszova, Magdalena; Sollott, Steven J; Lakatta, Edward G; Maltsev, Victor A

    2014-05-01

    rate in response to β-adrenergic stimulation. The model indicates that the hierarchical clustering of surface RyRs in SANCs may be a crucial adaptive mechanism. Pathological desynchronization of the clocks may explain sinus node dysfunction in heart failure and RyR mutations.

  9. Weighted Clustering

    DEFF Research Database (Denmark)

    Ackerman, Margareta; Ben-David, Shai; Branzei, Simina

    2012-01-01

    We investigate a natural generalization of the classical clustering problem, considering clustering tasks in which different instances may have different weights. We conduct the first extensive theoretical analysis on the influence of weighted data on standard clustering algorithms in both the partitional and hierarchical settings, characterizing the conditions under which algorithms react to weights. Extending a recent framework for clustering algorithm selection, we propose intuitive properties that would allow users to choose between clustering algorithms in the weighted setting and classify...

  10. Modeling place field activity with hierarchical slow feature analysis

    Directory of Open Access Journals (Sweden)

    Fabian Schoenfeld

    2015-05-01

    Full Text Available In this paper we present six experimental studies from the literature on hippocampal place cells and replicate their main results in a computational framework based on the principle of slowness. Each of the chosen studies first allows rodents to develop stable place field activity and then examines a distinct property of the established spatial encoding, namely adaptation to cue relocation and removal; directional firing activity in the linear track and open field; and results of morphing and stretching the overall environment. To replicate these studies we employ a hierarchical Slow Feature Analysis (SFA network. SFA is an unsupervised learning algorithm extracting slowly varying information from a given stream of data, and hierarchical application of SFA allows for high dimensional input such as visual images to be processed efficiently and in a biologically plausible fashion. Training data for the network is produced in ratlab, a free basic graphics engine designed to quickly set up a wide range of 3D environments mimicking real life experimental studies, simulate a foraging rodent while recording its visual input, and training & sampling a hierarchical SFA network.
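
    The slowness principle behind SFA can be sketched with a single linear SFA step: whiten the input, then keep the directions whose time derivative has the smallest variance. The code below is a minimal illustration on a toy two-channel signal, assuming a hypothetical helper `linear_sfa`; it is not the hierarchical network or the ratlab toolchain described in the abstract.

```python
import numpy as np

def linear_sfa(X, n_features=1):
    """Slowest-varying unit-variance linear features of X (time x dims)."""
    X = X - X.mean(axis=0)
    # Step 1: whiten so that every projection has unit variance.
    evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
    Z = X @ (evecs / np.sqrt(evals))
    # Step 2: find directions whose time derivative has the smallest variance.
    dZ = np.diff(Z, axis=0)
    d_evals, d_evecs = np.linalg.eigh(np.cov(dZ, rowvar=False))
    return Z @ d_evecs[:, :n_features]  # eigh sorts eigenvalues ascending

# Toy input: two channels, each a mixture of a slow and a fast sine wave.
t = np.linspace(0, 2 * np.pi, 2000)
slow, fast = np.sin(t), np.sin(11 * t)
X = np.column_stack([slow + 0.5 * fast, 0.3 * slow + fast])

y = linear_sfa(X)[:, 0]
corr = np.corrcoef(y, slow)[0, 1]
print("correlation with the slow source:", corr)
```

    The sign of the extracted feature is arbitrary, so only the magnitude of the correlation is meaningful. A hierarchical network applies steps like this to small patches of the input and feeds the outputs to later layers, which keeps each eigenproblem small.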

  11. Scoring methods used in cluster analysis

    OpenAIRE

    Sirota, Sergej

    2014-01-01

    The aim of the thesis is to compare how correctly methods of cluster analysis classify objects in a dataset into known groups. The theoretical section first describes the steps needed to prepare a data file for cluster analysis. The next theoretical section is dedicated to cluster analysis itself: it describes ways of measuring the similarity of objects and clusters, and the cluster analysis methods used in the practical part of the thesis. In the practical part a...

  12. Gene-Set Local Hierarchical Clustering (GSLHC)--A Gene Set-Based Approach for Characterizing Bioactive Compounds in Terms of Biological Functional Groups.

    Directory of Open Access Journals (Sweden)

    Feng-Hsiang Chung

    Full Text Available Gene-set-based analysis (GSA), which uses the relative importance of functional gene-sets, or molecular signatures, as units for analysis of genome-wide gene expression data, has exhibited major advantages with respect to greater accuracy, robustness, and biological relevance, over individual gene analysis (IGA), which uses log-ratios of individual genes for analysis. Yet IGA remains the dominant mode of analysis of gene expression data. The Connectivity Map (CMap), an extensive database on genomic profiles of effects of drugs and small molecules and widely used for studies related to repurposed drug discovery, has been mostly employed in IGA mode. Here, we constructed a GSA-based version of CMap, Gene-Set Connectivity Map (GSCMap), in which all the genomic profiles in CMap are converted, using gene-sets from the Molecular Signatures Database, to functional profiles. We showed that GSCMap essentially eliminated cell-type dependence, a weakness of CMap in IGA mode, and yielded significantly better performance on sample clustering and drug-target association. As a first application of GSCMap we constructed the platform Gene-Set Local Hierarchical Clustering (GSLHC) for discovering insights on coordinated actions of biological functions and facilitating classification of heterogeneous subtypes on drug-driven responses. GSLHC was shown to tightly cluster drugs of known similar properties. We used GSLHC to identify the therapeutic properties and putative targets of 18 compounds of previously unknown characteristics listed in CMap, eight of which suggest anti-cancer activities. The GSLHC website http://cloudr.ncu.edu.tw/gslhc/ contains 1,857 local hierarchical clusters accessible by querying 555 of the 1,309 drugs and small molecules listed in CMap. We expect GSCMap and GSLHC to be widely useful in providing new insights in the biological effect of bioactive compounds, in drug repurposing, and in function-based classification of complex diseases.

  13. Nonlinear analysis of EAS clusters

    CERN Document Server

    Zotov, M Yu; Fomin, Y A; Fomin, Yu. A.

    2002-01-01

    We apply certain methods of nonlinear time series analysis to the extensive air shower clusters found earlier in the data set obtained with the EAS-1000 Prototype array. In particular, we use the Grassberger-Procaccia algorithm to compute the correlation dimension of samples in the vicinity of the clusters. The validity of the results is checked by surrogate data tests and some additional quantities. We compare our conclusions with the results of similar investigations performed by the EAS-TOP and LAAS groups.
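
    The Grassberger-Procaccia estimate can be sketched as follows: embed the series in delay coordinates, count the fraction of point pairs closer than r (the correlation sum C(r)), and read the dimension off the slope of log C(r) against log r. The toy series, the embedding parameters, and the radii below are assumptions chosen for illustration and have nothing to do with the EAS-1000 data.

```python
import numpy as np

def delay_embed(series, dim, tau):
    """Build delay vectors (x[t], x[t+tau], ..., x[t+(dim-1)*tau])."""
    n = len(series) - (dim - 1) * tau
    return np.column_stack([series[i * tau : i * tau + n] for i in range(dim)])

def correlation_sum(points, r):
    """Fraction of distinct point pairs closer than r, i.e. C(r)."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    i, j = np.triu_indices(len(points), k=1)
    return np.mean(dists[i, j] < r)

def correlation_dimension(points, radii):
    """Slope of log C(r) against log r over the given radii."""
    log_c = np.log([correlation_sum(points, r) for r in radii])
    slope, _ = np.polyfit(np.log(radii), log_c, 1)
    return slope

# Toy series: a sine wave, whose delay embedding is a closed curve.
t = 0.37 * np.arange(1000)
series = np.sin(t)
points = delay_embed(series, dim=2, tau=4)
radii = np.logspace(np.log10(0.05), np.log10(0.3), 8)

d2 = correlation_dimension(points, radii)
print("estimated correlation dimension:", d2)
```

    A periodic signal traces a one-dimensional curve, so the estimate should come out close to 1. For real data the scaling range of r has to be chosen with care, and surrogate data tests such as those mentioned in the abstract help check that a low estimate is not an artefact.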

  14. Hierarchical Visual Analysis and Steering Framework for Astrophysical Simulations

    Institute of Scientific and Technical Information of China (English)

    肖健; 张加万; 原野; 周鑫; 纪丽; 孙济洲

    2015-01-01

    A framework for accelerating modern long-running astrophysical simulations is presented, which is based on a hierarchical architecture where computational steering in the high-resolution run is performed under the guidance of knowledge obtained in the gradually refined ensemble analyses. Several visualization schemes for facilitating ensemble management, error analysis, parameter grouping and tuning are also integrated owing to the pluggable modular design. The proposed approach is prototyped based on the Flash code, and it can be extended by introducing user-defined visualization for specific requirements. Two real-world simulations, i.e., stellar wind and supernova remnant, are carried out to verify the proposed approach.

  15. Supermodel Analysis of Galaxy Clusters

    CERN Document Server

    Fusco-Femiano, R; Lapi, A

    2009-01-01

    [abridged] We present the analysis of the X-ray brightness and temperature profiles for six clusters belonging to both the Cool Core and Non Cool Core classes, in terms of the Supermodel (SM) developed by Cavaliere, Lapi & Fusco-Femiano (2009). Based on the gravitational wells set by the dark matter halos, the SM straightforwardly expresses the equilibrium of the IntraCluster Plasma (ICP) modulated by the entropy deposited at the boundary by standing shocks from gravitational accretion, and injected at the center by outgoing blastwaves from mergers or from outbursts of Active Galactic Nuclei. The cluster set analyzed here highlights not only how simply the SM represents the main dichotomy Cool vs. Non Cool Core clusters in terms of a few ICP parameters governing the radial entropy run, but also how accurately it fits even complex brightness and temperature profiles. For Cool Core clusters like A2199 and A2597, the SM with a low level of central entropy straightforwardly yields the characteristic peaked pr...

  16. Category theoretic analysis of hierarchical protein materials and social networks.

    Directory of Open Access Journals (Sweden)

    David I Spivak

    Full Text Available Materials in biology span all the scales from Angstroms to meters and typically consist of complex hierarchical assemblies of simple building blocks. Here we describe an application of category theory to describe structural and resulting functional properties of biological protein materials by developing so-called ologs. An olog is like a "concept web" or "semantic network" except that it follows a rigorous mathematical formulation based on category theory. This key difference ensures that an olog is unambiguous, highly adaptable to evolution and change, and suitable for sharing concepts with other ologs. We consider simple cases of beta-helical and amyloid-like protein filaments subjected to axial extension and develop an olog representation of their structural and resulting mechanical properties. We also construct a representation of a social network in which people send text-messages to their nearest neighbors and act as a team to perform a task. We show that the olog for the protein and the olog for the social network feature identical category-theoretic representations, and we proceed to precisely explicate the analogy or isomorphism between them. The examples presented here demonstrate that the intrinsic nature of a complex system, which in particular includes a precise relationship between structure and function at different hierarchical levels, can be effectively represented by an olog. This, in turn, allows for comparative studies between disparate materials or fields of application, and results in novel approaches to derive functionality in the design of de novo hierarchical systems. We discuss opportunities and challenges associated with the description of complex biological materials by using ologs as a powerful tool for analysis and design in the context of materiomics, and we present the potential impact of this approach for engineering, life sciences, and medicine.

  17. Cluster analysis of WIBS single particle bioaerosol data

    Directory of Open Access Journals (Sweden)

    N. H. Robinson

    2012-09-01

    Full Text Available Hierarchical agglomerative cluster analysis was performed on single-particle multi-spatial datasets comprising optical diameter, asymmetry and three different fluorescence measurements, gathered using two dual Waveband Integrated Bioaerosol Sensor (WIBS). The technique is demonstrated on measurements of various fluorescent and non-fluorescent polystyrene latex spheres (PSL) before being applied to two separate contemporaneous ambient WIBS datasets recorded in a forest site in Colorado, USA as part of the BEACHON-RoMBAS project. Cluster analysis results between both datasets are consistent. Clusters are tentatively interpreted by comparison of concentration time series and cluster average measurement values to the published literature (of which there is a paucity) to represent: non-fluorescent accumulation mode aerosol; bacterial agglomerates; and fungal spores. To our knowledge, this is the first time cluster analysis has been applied to long term online PBAP measurements. The novel application of this clustering technique provides a means for routinely reducing WIBS data to discrete concentration time series which are more easily interpretable, without the need for any a priori assumptions concerning the expected aerosol types. It can reduce the level of subjectivity compared to the more standard analysis approaches, which are typically performed by simple inspection of various ensemble data products. It also has the advantage of potentially resolving less populous or subtly different particle types. This technique is likely to become more robust in the future as fluorescence-based aerosol instrumentation measurement precision, dynamic range and the number of available metrics is improved.

  18. Cluster analysis of WIBS single particle bioaerosol data

    Science.gov (United States)

    Robinson, N. H.; Allan, J. D.; Huffman, J. A.; Kaye, P. H.; Foot, V. E.; Gallagher, M.

    2012-09-01

    Hierarchical agglomerative cluster analysis was performed on single-particle multi-spatial datasets comprising optical diameter, asymmetry and three different fluorescence measurements, gathered using two dual Waveband Integrated Bioaerosol Sensor (WIBS). The technique is demonstrated on measurements of various fluorescent and non-fluorescent polystyrene latex spheres (PSL) before being applied to two separate contemporaneous ambient WIBS datasets recorded in a forest site in Colorado, USA as part of the BEACHON-RoMBAS project. Cluster analysis results between both datasets are consistent. Clusters are tentatively interpreted by comparison of concentration time series and cluster average measurement values to the published literature (of which there is a paucity) to represent: non-fluorescent accumulation mode aerosol; bacterial agglomerates; and fungal spores. To our knowledge, this is the first time cluster analysis has been applied to long term online PBAP measurements. The novel application of this clustering technique provides a means for routinely reducing WIBS data to discrete concentration time series which are more easily interpretable, without the need for any a priori assumptions concerning the expected aerosol types. It can reduce the level of subjectivity compared to the more standard analysis approaches, which are typically performed by simple inspection of various ensemble data products. It also has the advantage of potentially resolving less populous or subtly different particle types. This technique is likely to become more robust in the future as fluorescence-based aerosol instrumentation measurement precision, dynamic range and the number of available metrics is improved.

  19. SPATIO-TEMPORAL CLUSTER ANALYSIS OF DISEASE

    Directory of Open Access Journals (Sweden)

    M. S. Abramovich

    2014-01-01

    Full Text Available A robust version of the spatial scan statistic for clustering is proposed. Spatio-temporal cluster analysis algorithms were used to detect clusters of thyroid carcinoma incidence. Methods and algorithms for detecting and building disease clusters in the studied territories are considered.

  20. Classifying airborne radiometry data with Agglomerative Hierarchical Clustering: A tool for geological mapping in context of rainforest (French Guiana)

    Science.gov (United States)

    Martelet, G.; Truffert, C.; Tourlière, B.; Ledru, P.; Perrin, J.

    2006-09-01

    In highly weathered environments, it is crucial that geological maps provide information concerning both the regolith and the bedrock, for societal needs such as land-use, mineral or water resources management. Geologists often face the challenge of upgrading existing maps, as relevant information concerning weathering processes and pedogenesis is currently missing. In rugged areas in particular, where access to the field is difficult, ground observations are sparse and therefore need to be complemented using methods based on remotely sensed data. For this purpose, we discuss the use of Agglomerative Hierarchical Clustering (AHC) on eU, K and eTh airborne gamma-ray spectrometry grids. The AHC process primarily allows the geophysical maps to be segmented into zones having coherent U, K and Th contents. These contents are discussed in terms of geochemical signature for lithological attribution of classes, as is the use of a dendrogram, which indicates the hierarchical relations between classes. Unsupervised classification maps resulting from AHC can be considered as spatial models of the distribution of the radioelement content in surface and sub-surface formations. The source of gamma rays emanating from the ground is primarily related to the geochemistry of the bedrock and secondarily to modifications of the radioelement distribution by weathering and other secondary mechanisms, such as mobilisation by wind or water. Interpreting the resulting predictive classified maps, their U, K and Th contents, and the dendrogram in light of available geological knowledge allows signatures related to regolith and solid geology to be separated. Consequently, classification maps can be integrated within a GIS environment and used by the geologist as a support for mapping bedrock lithologies and their alteration. We illustrate the AHC classification method in the region of Cayenne using high-resolution airborne radiometric data
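
    As a rough illustration of the agglomerative procedure described above, the following sketch merges the two closest clusters repeatedly and records the merge history, which is the information a dendrogram displays. It is not the authors' implementation: the average-linkage criterion, the function names and the six (eU, K, eTh) triples are all assumptions made for illustration, and real grids would normally be standardized first.

```python
from itertools import combinations
from math import dist


def average_linkage(a, b, points):
    """Mean Euclidean distance between all cross-cluster pairs of points."""
    return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))


def agglomerate(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns the final clusters (lists of point indices) and the merge
    history (left members, right members, linkage distance).
    """
    clusters = [[i] for i in range(len(points))]
    history = []
    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest average linkage.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], points),
        )
        history.append((sorted(a), sorted(b), average_linkage(a, b, points)))
        clusters = [c for c in clusters if c is not a and c is not b]
        clusters.append(a + b)
    return [sorted(c) for c in clusters], history


# Hypothetical (eU, K, eTh) values for six grid cells, made up for illustration.
cells = [
    (1.0, 0.5, 4.0), (1.1, 0.6, 4.2), (0.9, 0.4, 3.9),
    (3.0, 2.5, 12.0), (3.2, 2.4, 11.5), (2.9, 2.6, 12.3),
]
zones, merges = agglomerate(cells, n_clusters=2)
print(sorted(zones))
```

    Cutting the merge history at a different number of clusters gives coarser or finer zones from the same hierarchy, which is what makes the dendrogram useful for relating classes to one another.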

  1. Identification of Counterfeit Alcoholic Beverages Using Cluster Analysis in Principal-Component Space

    Science.gov (United States)

    Khodasevich, M. A.; Sinitsyn, G. V.; Gres'ko, M. A.; Dolya, V. M.; Rogovaya, M. V.; Kazberuk, A. V.

    2017-07-01

    A study of 153 brands of commercial vodka products showed that counterfeit samples could be identified by introducing a unified additive at the minimum concentration acceptable for instrumental detection and multivariate analysis of UV-Vis transmission spectra. Counterfeit products were detected with 100% probability by using hierarchical cluster analysis or the C-means method in two-dimensional principal-component space.

  2. Cluster Analysis of the Newcastle Electronic Corpus of Tyneside English: A Comparison of Methods

    NARCIS (Netherlands)

    Moisl, Hermann; Jones, Val

    2005-01-01

    This article examines the feasibility of an empirical approach to sociolinguistic analysis of the Newcastle Electronic Corpus of Tyneside English using exploratory multivariate methods. It addresses a known problem with one class of such methods, hierarchical cluster analysis: that different clusteri

  3. Hierarchical resource analysis for land use planning through remote sensing

    Science.gov (United States)

    Byrnes, B. H.; Frazee, C. J.; Cox, T. L.

    1976-01-01

    A hierarchical resource analysis was applied to remote sensing data to provide maps at Planning Levels I and III (Anderson et al., U.S. Geological Survey Circular 671) for Meade County, S. Dak. Level I land use and general soil maps were prepared by visual interpretation of imagery from a false color composite of Landsat MSS bands 4, 5, and 7 and single bands (5 and 7). A modified Level III land use map was prepared for the Black Hills area from RB-57 photography enlarged to a scale of 1:24,000. Level III land use data were used together with computer-generated interpretive soil maps to analyze relationships between developed and developing areas and soil criteria.

  4. Analysis of stability of community structure across multiple hierarchical levels

    CERN Document Server

    Li, Hui-Jia

    2015-01-01

    The analysis of stability of community structure is an important problem for scientists from many fields. Here, we propose a new framework to reveal hidden properties of community structure by quantitatively analyzing the dynamics of the Potts model. Specifically, we model the Potts procedure of community structure detection by a Markov process, which has a clear mathematical explanation. Critical topological information regarding the multivariate spin configuration can also be inferred from the spectral significance of the Markov process. We test our framework on some example networks and find that it does not suffer from the resolution limit problem. The results show that the proposed model is able to uncover hierarchical structure at different scales effectively and efficiently.

  5. The SMART CLUSTER METHOD - adaptive earthquake cluster analysis and declustering

    Science.gov (United States)

    Schaefer, Andreas; Daniell, James; Wenzel, Friedemann

    2016-04-01

    Earthquake declustering is an essential part of almost any statistical analysis of spatial and temporal properties of seismic activity, with usual applications comprising probabilistic seismic hazard assessments (PSHAs) and earthquake prediction methods. The nature of earthquake clusters and subsequent declustering of earthquake catalogues plays a crucial role in determining the magnitude-dependent earthquake return period and its respective spatial variation. Various methods have been developed by other researchers to address this issue. These range in complexity from rather simple statistical window methods to complex epidemic models. This study introduces the smart cluster method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal identification. An adaptive search algorithm for data point clusters is adopted. It uses the earthquake density in the spatio-temporal neighbourhood of each event to adjust the search properties. The identified clusters are subsequently analysed to determine directional anisotropy, focussing on a strong correlation along the rupture plane, and the search space is adjusted with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010/2011 Darfield-Christchurch events, an adaptive classification procedure is applied to disassemble subsequent ruptures which may have been grouped into an individual cluster using near-field searches, support vector machines and temporal splitting. The steering parameters of the search behaviour are linked to local earthquake properties like magnitude of completeness, earthquake density and Gutenberg-Richter parameters. The method is capable of identifying and classifying earthquake clusters in space and time. It is tested and validated using earthquake data from California and New Zealand. As a result of the cluster identification process, each event in

  6. Cluster analysis of spontaneous preterm birth phenotypes identifies potential associations among preterm birth mechanisms

    Science.gov (United States)

    Esplin, M Sean; Manuck, Tracy A.; Varner, Michael W.; Christensen, Bryce; Biggio, Joseph; Bukowski, Radek; Parry, Samuel; Zhang, Heping; Huang, Hao; Andrews, William; Saade, George; Sadovsky, Yoel; Reddy, Uma M.; Ilekis, John

    2015-01-01

    Objective: We sought to employ an innovative tool based on common biological pathways to identify specific phenotypes among women with spontaneous preterm birth (SPTB), in order to enhance investigators' ability to identify and highlight common mechanisms and underlying genetic factors responsible for SPTB. Study Design: A secondary analysis of a prospective case-control multicenter study of SPTB. All cases delivered a preterm singleton at SPTB ≤34.0 weeks gestation. Each woman was assessed for the presence of underlying SPTB etiologies. A hierarchical cluster analysis was used to identify groups of women with homogeneous phenotypic profiles. One of the phenotypic clusters was selected for candidate gene association analysis using VEGAS software. Results: 1028 women with SPTB were assigned phenotypes. Hierarchical clustering of the phenotypes revealed five major clusters. Cluster 1 (N=445) was characterized by maternal stress, cluster 2 (N=294) by premature membrane rupture, cluster 3 (N=120) by familial factors, and cluster 4 (N=63) by maternal comorbidities. Cluster 5 (N=106) was multifactorial, characterized by infection (INF), decidual hemorrhage (DH) and placental dysfunction (PD). These three phenotypes were highly correlated by Chi-square analysis [PD and DH (p<2.2e-6); PD and INF (p=6.2e-10); INF and DH (p=0.0036)]. Gene-based testing identified the INS (insulin) gene as significantly associated with cluster 3 of SPTB. Conclusion: We identified 5 major clusters of SPTB based on a phenotype tool and hierarchical clustering. There was significant correlation between several of the phenotypes. The INS gene was associated with familial factors underlying SPTB. PMID:26070700

  7. Cluster Analysis of Ranunculus Species

    Directory of Open Access Journals (Sweden)

    SURANTO

    2002-01-01

    Full Text Available The aim of the experiment was to examine whether the morphological characters of eleven species of Ranunculus collected from a number of populations were in agreement with the genetic data (isozyme). The method used in this study was polyacrylamide gel electrophoresis using peroxidase, esterase, malate dehydrogenase, and acid phosphatase enzymes. The results showed that cluster analysis based on isozyme data gave good support to the classification of the eleven species based on morphological groups. This study concluded that in certain species each morphological variation proved to be genetically based.

  8. Teaching a machine to see: unsupervised image segmentation and categorisation using growing neural gas and hierarchical clustering

    CERN Document Server

    Hocking, Alex; Davey, Neil; Sun, Yi

    2015-01-01

    We present a novel unsupervised learning approach to automatically segment and label images in astronomical surveys. Automation of this procedure will be essential as next-generation surveys enter the petabyte scale: data volumes will exceed the capability of even large crowd-sourced analyses. We demonstrate how a growing neural gas (GNG) can be used to encode the feature space of imaging data. When coupled with a technique called hierarchical clustering, imaging data can be automatically segmented and labelled by organising nodes in the GNG. The key distinction of unsupervised learning is that these labels need not be known prior to training; rather, they are determined by the algorithm itself. Importantly, after training, a network can be presented with images it has never 'seen' before and provide consistent categorisation of features. As a proof-of-concept we demonstrate application on data from the Hubble Space Telescope Frontier Fields: images of clusters of galaxies containing a mixture of galaxy type...

  9. Predicting the decision to pursue mediation in civil disputes: a hierarchical classes analysis.

    Science.gov (United States)

    Reich, Warren A; Kressel, Kenneth; Scanlon, Kathleen M; Weiner, Gary A

    2007-11-01

    Clients (N = 185) involved in civil court cases completed the CPR Institute's Mediation Screen, which is designed to assist in making a decision about pursuing mediation. The authors modeled data using hierarchical classes analysis (HICLAS), a clustering algorithm that places clients into 1 set of classes and CPRMS items into another set of classes. HICLAS then links the sets of classes so that any class of clients can be identified in terms of the classes of items they endorsed. HICLAS-derived item classes reflected 2 underlying themes: (a) suitability of the dispute for a problem-solving process and (b) potential benefits of mediation. All clients who perceived that mediation would be beneficial also believed that the context of their conflict was favorable to mediation; however, not all clients who saw a favorable context believed they would benefit from mediation. The majority of clients who agreed to pursue mediation endorsed items reflecting both contextual suitability and perceived benefits of mediation.

  10. Explore the combination relationship of Shaofu Zhuyu Decoction and Wenjing Decoction based on hierarchical clustering analysis and data visualization technology

    Institute of Scientific and Technical Information of China (English)

    宿树兰; 叶亮; 尚尔鑫; 范欣生; 段金廒; 华永庆; 唐于平

    2011-01-01

    Objective: To explore the combination relationship of Shaofu Zhuyu Decoction (SFZYD) and Wenjing Decoction (WJD) based on hierarchical clustering analysis and data visualization, in order to provide guidance for modern research of the formulae. Methods: Data mining based on hierarchical clustering analysis and data visualization was used to analyze the complicated correlations within Shaofu Zhuyu Decoction and Wenjing Decoction after quantifying the drug property information. Results: The drugs in SFZYD were classified into four clusters: Danggui-Rougui-Xiaohuixiang-Ganjiang, Moyao-Yuanhu, Puhuang-Wulingzhi, and Chuanxiong-Chishao. The drugs in WJD were classified into four clusters: Shengjiang-Banxia-Wuzhuyu-Danggui, Renshen-Mandong-Gancao, Shaoyao, and Ajiao-Danpi-Guizhi-Chuanxiong. The graphical models of Xing (nature), Wei (flavor) and Guijing (channel tropism) showed that the properties of the drugs in SFZYD are mainly distributed in the warm and hot scope, whereas the properties of the drugs in WJD are distributed in two scopes. Regarding Guijing, the drugs of SFZYD focus on Pi (Wei), Gan (Dan) and Xin (Xiaochang), while the drugs of WJD focus on Pi (Wei), Gan (Dan), Fei (Da chang) and Shen (Pangguang). The Pi (Wei) and Gan (Dan) channels are the important distribution areas. The pungent, sweet and hard are the major tastes. Conclusion: The results agree with the theory of TCM and provide guidance for modern research of TCM formulae. Multi-mathematical analysis methods may be feasible for researching the complex correlations.

  11. Data Preprocessing in Cluster Analysis of Gene Expression

    Institute of Scientific and Technical Information of China (English)

    杨春梅; 万柏坤; 高晓峰

    2003-01-01

    Considering that the DNA microarray technology has generated explosive gene expression data and that it is urgent to analyse and to visualize such massive datasets with efficient methods, we investigate the data preprocessing methods used in cluster analysis, normalization or logarithm of the matrix, by using hierarchical clustering, principal component analysis (PCA) and self-organizing maps (SOMs). The results illustrate that when using the Euclidean distance as measuring metrics, logarithm of relative expression level is the best preprocessing method, while data preprocessed by normalization cannot attain the expected results because the data structure is ruined. If there are only a few principal components, the PCA is an effective method to extract the frame structure, while SOMs are more suitable for a specific structure.
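
    One plausible reason why log-transformed ratios behave well under Euclidean distance can be shown with a small sketch. This is an illustration only, not the authors' analysis: the three expression profiles and the helper name `log_ratios` are made up. On the raw scale an 8-fold repression sits much closer to "unchanged" than an 8-fold induction, whereas on the log2 scale equal fold changes are equally distant.

```python
from math import dist, log2


def log_ratios(profile):
    """Convert relative expression levels (ratios > 0) to the log2 scale."""
    return [log2(x) for x in profile]


# Hypothetical relative expression levels across three arrays.
unchanged = [1.0, 1.0, 1.0]
induced = [2.0, 4.0, 8.0]          # 2-, 4- and 8-fold up
repressed = [0.5, 0.25, 0.125]     # 2-, 4- and 8-fold down

# Raw scale: induction and repression are treated very differently.
raw_up = dist(induced, unchanged)
raw_down = dist(repressed, unchanged)

# Log2 scale: equal fold changes are equally far from "unchanged".
log_up = dist(log_ratios(induced), log_ratios(unchanged))
log_down = dist(log_ratios(repressed), log_ratios(unchanged))

print(raw_up, raw_down)
print(log_up, log_down)
```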

  12. 2 x 2 Achievement Goals and Achievement Emotions: A Cluster Analysis of Students' Motivation

    Science.gov (United States)

    Jang, Leong Yeok; Liu, Woon Chia

    2012-01-01

    This study sought to better understand the adoption of multiple achievement goals at an intra-individual level, and its links to emotional well-being, learning, and academic achievement. Participants were 480 Secondary Two students (aged between 13 and 14 years) from two coeducational government schools. Hierarchical cluster analysis revealed the…

  14. Energy Efficient Backoff Hierarchical Clustering Algorithms for Multi-Hop Wireless Sensor Networks

    Institute of Scientific and Technical Information of China (English)

    Jun Wang; Yong-Tao Cao; Jun-Yuan Xie; Shi-Fu Chen

    2011-01-01

    Compared with flat routing protocols, clustering is a fundamental performance improvement technique in wireless sensor networks, which can increase network scalability and lifetime. In this paper, we integrate the multi-hop technique with a backoff-based clustering algorithm to organize sensors. By using an adaptive backoff strategy, the algorithm not only realizes load balance among sensor nodes, but also achieves fairly uniform cluster head distribution across the network. Simulation results also demonstrate that our algorithm is more energy-efficient than classical ones. Our algorithm is also easily extended to generate a hierarchy of cluster heads to obtain better network management and energy-efficiency.

  15. Phenotypes Determined by Cluster Analysis in Moderate to Severe Bronchial Asthma.

    Science.gov (United States)

    Youroukova, Vania M; Dimitrova, Denitsa G; Valerieva, Anna D; Lesichkova, Spaska S; Velikova, Tsvetelina V; Ivanova-Todorova, Ekaterina I; Tumangelova-Yuzeir, Kalina D

    2017-06-01

    Bronchial asthma is a heterogeneous disease that includes various subtypes. They may share similar clinical characteristics, but probably have different pathological mechanisms. The aim was to identify phenotypes using cluster analysis in moderate to severe bronchial asthma and to compare differences in clinical, physiological, immunological and inflammatory data between the clusters. Forty adult patients with moderate to severe bronchial asthma out of exacerbation were included. All underwent clinical assessment, anthropometric measurements, skin prick testing, standard spirometry and measurement of the fraction of exhaled nitric oxide. Blood eosinophil count, serum total IgE and periostin levels were determined. A two-step cluster approach, a hierarchical clustering method and k-means analysis were used for identification of the clusters. We identified four clusters: Cluster 1 (n=14), late-onset, non-atopic asthma with impaired lung function; Cluster 2 (n=13), late-onset, atopic asthma; Cluster 3 (n=6), late-onset, aspirin sensitivity, eosinophilic asthma; and Cluster 4 (n=7), early-onset, atopic asthma. Our study is the first in Bulgaria in which cluster analysis is applied to asthmatic patients. The variables with the greatest force for differentiation in our study were: age of asthma onset, duration of disease, atopy, smoking, blood eosinophils, nonsteroidal anti-inflammatory drug hypersensitivity, baseline FEV1/FVC and symptom severity. Our results support the concept of heterogeneity of bronchial asthma and demonstrate that cluster analysis can be a useful tool for phenotyping of the disease and for a personalized approach to the treatment of patients.

  16. Manual hierarchical clustering of regional geochemical data using a Bayesian finite mixture model

    Science.gov (United States)

    Ellefsen, Karl J.; Smith, David

    2016-01-01

    Interpretation of regional scale, multivariate geochemical data is aided by a statistical technique called “clustering.” We investigate a particular clustering procedure by applying it to geochemical data collected in the State of Colorado, United States of America. The clustering procedure partitions the field samples for the entire survey area into two clusters. The field samples in each cluster are partitioned again to create two subclusters, and so on. This manual procedure generates a hierarchy of clusters, and the different levels of the hierarchy show geochemical and geological processes occurring at different spatial scales. Although there are many different clustering methods, we use Bayesian finite mixture modeling with two probability distributions, which yields two clusters. The model parameters are estimated with Hamiltonian Monte Carlo sampling of the posterior probability density function, which usually has multiple modes. Each mode has its own set of model parameters; each set is checked to ensure that it is consistent both with the data and with independent geologic knowledge. The set of model parameters that is most consistent with the independent geologic knowledge is selected for detailed interpretation and partitioning of the field samples.

  17. Analysis of Energy Optimized Hierarchical Routing Protocols in WSN

    Directory of Open Access Journals (Sweden)

    Er. Shelly Jain

    2013-05-01

    Full Text Available Modern wireless sensor networks can be expanded over large geographical areas via cheap sensor devices that sustain themselves on limited energy, and developing an energy-efficient protocol is a major challenge. Currently, routing in wireless sensor networks faces multiple challenges, such as scalability, coverage, packet loss, interference, real-time audio and video streaming, weather reports, energy constraints and so forth. Clustering sensor nodes is an effective topology control approach. LEACH is an energy-efficient clustering protocol because of its node distribution capabilities, but it still has limitations because it leads to uneven energy distribution. PEGASIS is an enhancement of LEACH that uses a chain-based technique to optimize energy consumption. This protocol also has certain disadvantages, such as delays in larger networks. HEED is a more advanced protocol that removes the disadvantages of LEACH and PEGASIS by using a distributed algorithm for selecting the cluster heads (CH). It does not make any assumptions about the infrastructure or capabilities of nodes. The LEACH, PEGASIS and HEED routing algorithms are compared using Matlab simulation on a Wi-Max network, and the results and analysis are based on the simulation experiments. Simulation results demonstrate that HEED is effective in prolonging the network lifetime and also overcomes the disadvantages of both LEACH and PEGASIS.
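
    For readers unfamiliar with LEACH, the sketch below shows the randomized cluster-head election commonly given for that protocol, in which a node becomes head when a uniform random draw falls below the threshold T(n) = p / (1 - p (r mod 1/p)). The abstract does not spell this rule out, so treating it as the variant the authors simulated is an assumption; the function and parameter names are hypothetical. Note that the threshold depends only on p and the round number, not on residual energy.

```python
import random


def leach_threshold(p, round_no, eligible):
    """LEACH election threshold T(n) for one node in the given round.

    p        -- desired fraction of cluster heads per round (e.g. 0.05)
    round_no -- current round number, starting at 0
    eligible -- True if the node has not yet been a head in this epoch
    """
    if not eligible:
        return 0.0
    epoch = round(1 / p)                      # number of rounds per epoch
    return p / (1 - p * (round_no % epoch))


def elect_heads(node_ids, was_head, p, round_no, rng):
    """Return the nodes that elect themselves cluster head this round."""
    heads = []
    for n in node_ids:
        # Each node draws independently; no coordination is needed.
        if rng.random() < leach_threshold(p, round_no, n not in was_head):
            heads.append(n)
    return heads


rng = random.Random(1)  # seeded only so the sketch is repeatable
print(elect_heads(range(20), was_head=set(), p=0.25, round_no=0, rng=rng))
```

    In the last round of an epoch the threshold reaches 1, so every node that has not yet served becomes a head; this rotation is what spreads the head role across nodes.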

  18. Cluster analysis in phenotyping a Portuguese population.

    Science.gov (United States)

    Loureiro, C C; Sa-Couto, P; Todo-Bom, A; Bousquet, J

    2015-09-03

    Unbiased cluster analysis using clinical parameters has identified asthma phenotypes. Adding inflammatory biomarkers to this analysis provided a better insight into the disease mechanisms. This approach has not yet been applied to asthmatic Portuguese patients. To identify phenotypes of asthma using cluster analysis in a Portuguese asthmatic population treated in secondary medical care. Consecutive patients with asthma were recruited from the outpatient clinic. Patients were optimally treated according to GINA guidelines and enrolled in the study. Procedures were performed according to a standard evaluation of asthma. Phenotypes were identified by cluster analysis using Ward's clustering method. Of the 72 patients enrolled, 57 had full data and were included for cluster analysis. Distribution was set in 5 clusters described as follows: cluster (C) 1, early onset mild allergic asthma; C2, moderate allergic asthma, with long evolution, female prevalence and mixed inflammation; C3, allergic brittle asthma in young females with early disease onset and no evidence of inflammation; C4, severe asthma in obese females with late disease onset, highly symptomatic despite low Th2 inflammation; C5, severe asthma with chronic airflow obstruction, late disease onset and eosinophilic inflammation. In our study population, the identified clusters were mainly coincident with other larger-scale cluster analysis. Variables such as age at disease onset, obesity, lung function, FeNO (Th2 biomarker) and disease severity were important for cluster distinction. Copyright © 2015. Published by Elsevier España, S.L.U.

  19. Energy Efficient Zone Division Multihop Hierarchical Clustering Algorithm for Load Balancing in Wireless Sensor Network

    Directory of Open Access Journals (Sweden)

    Ashim Kumar Ghosh

    2011-12-01

    Full Text Available Wireless sensor nodes are used in most embedded computing applications. Multihop cluster hierarchies have been proposed for large wireless sensor networks (WSNs) that can provide scalable routing, data aggregation, and querying. The energy consumption rate of sensors in a WSN varies greatly with the protocols the sensors use for communication. In this paper we present a cluster-based routing algorithm. One of our main goals is to design an energy-efficient routing protocol that addresses the usual problems of WSNs. The efficiency of a WSN depends on the distance between each node and the base station and on the amount of data to be transferred, and the performance of clustering is greatly influenced by the selection of cluster heads, which are in charge of creating clusters and controlling member nodes. This algorithm makes the best use of nodes with a low number of cluster heads, known as super nodes. We divide the full region into four equal zones, and the centre area of the region is used to select the super node. Each zone is considered separately and may or may not be divided further, depending on the density of nodes in that zone and the capability of the super node. The algorithm forms multilayer communication. The number of layers depends on the current network load and statistics. Our algorithm is easily extended to generate a hierarchy of cluster heads to obtain better network management and energy efficiency.

  20. Spatial Hierarchical Bayesian Analysis of the Historical Extreme Streamflow

    Science.gov (United States)

    Najafi, M. R.; Moradkhani, H.

    2012-04-01

    Analysis of the climate change impact on extreme hydro-climatic events is crucial for future hydrologic/hydraulic designs and water resources decision making. The purpose of this study is to investigate the changes of the extreme value distribution parameters with respect to time to reflect upon the impact of climate change. We develop a statistical model using the observed streamflow data of the Columbia River Basin in USA to estimate the changes of high flows as a function of time as well as other variables. Generalized Pareto Distribution (GPD) is used to model the upper 95% flows during December through March for 31 gauge stations. In the process layer of the model the covariates including time, latitude, longitude, elevation and basin area are considered to assess the sensitivity of the model to each variable. Markov Chain Monte Carlo (MCMC) method is used to estimate the parameters. The Spatial Hierarchical Bayesian technique models the GPD parameters spatially and borrows strength from other locations by pooling data together, while providing an explicit estimation of the uncertainties in all stages of modeling.
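
    The peaks-over-threshold idea behind a GPD model can be sketched as follows. This only illustrates the standard GPD distribution function for threshold exceedances; the function names, the fixed threshold and the sample flows are assumptions, and the hierarchical Bayesian estimation of the parameters described in the abstract is not reproduced here.

```python
from math import exp


def exceedances(flows, threshold):
    """Amounts by which flows exceed the threshold (peaks over threshold)."""
    return [x - threshold for x in flows if x > threshold]


def gpd_cdf(y, sigma, xi):
    """CDF of the Generalized Pareto Distribution for an exceedance y.

    sigma -- scale parameter (> 0)
    xi    -- shape parameter; xi = 0 gives the exponential distribution
    """
    if y < 0:
        return 0.0
    if xi == 0:
        return 1.0 - exp(-y / sigma)
    z = 1.0 + xi * y / sigma
    if z <= 0:                # beyond the upper end point when xi < 0
        return 1.0
    return 1.0 - z ** (-1.0 / xi)


# Hypothetical daily flows and threshold, made up for illustration.
flows = [120.0, 340.0, 95.0, 410.0, 500.0]
print(exceedances(flows, threshold=300.0))
print(gpd_cdf(100.0, sigma=80.0, xi=0.1))
```

    Letting sigma or xi depend on covariates such as time or elevation, as the abstract describes, would replace the constants passed to `gpd_cdf` with functions of those covariates.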

  1. The applicability and effectiveness of cluster analysis

    Science.gov (United States)

    Ingram, D. S.; Actkinson, A. L.

    1973-01-01

    An insight into the characteristics which determine the performance of a clustering algorithm is presented. In order for the techniques which are examined to accurately cluster data, two conditions must be simultaneously satisfied. First, the data must have a particular structure, and second, the parameters chosen for the clustering algorithm must be correct. By examining the structure of the data from the C1 flight line, it is clear that no single set of parameters can be used to accurately cluster all the different crops. The effectiveness of either a noniterative or iterative clustering algorithm to accurately cluster data representative of the C1 flight line is questionable. Thus extensive a priori knowledge is required in order to use cluster analysis in its present form for applications like assisting in the definition of field boundaries and evaluating the homogeneity of a field. New or modified techniques are necessary for clustering to be a reliable tool.

  2. Inter-Cluster Routing Authentication for Ad Hoc Networks by a Hierarchical Key Scheme

    Institute of Scientific and Technical Information of China (English)

    Yueh-Min Huang; Hua-Yi Lin; Tzone-I Wang

    2006-01-01

    Unlike in traditional networks, mobile wireless devices can actively form a network without any infrastructure, which means that mobile ad hoc networks are frequently partitioned owing to node mobility or link failures. As a result, it is difficult for an ad hoc network to provide on-line access to a trusted authority server. Therefore, applying the traditional Public Key Infrastructure (PKI) security framework to mobile ad hoc networks will cause insecurities. This study proposes a scalable and elastic key management scheme integrated into the Cluster Based Secure Routing Protocol (CBSRP) to enhance security and non-repudiation of routing authentication, and introduces an ID-based internal routing authentication scheme to enhance the routing performance within an internal cluster. Additionally, a method of performing routing authentication between internal and external clusters, as well as inter-cluster routing authentication, is developed. The proposed cluster-based key management scheme distributes trust to an aggregation of cluster heads using a threshold scheme, provides the Certificate Authority (CA) with a fault tolerance mechanism to prevent a single point of compromise or failure, and spares the CA from maintaining large repositories of member certificates, making ad hoc networks robust to malicious behaviors and suitable for numerous mobile devices.
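
    The idea of distributing trust over several cluster heads with a threshold scheme can be illustrated with Shamir's (k, n) secret sharing, in which any k of n shares reconstruct a secret and fewer reveal nothing. The abstract does not say which threshold scheme is used, so Shamir's construction, the field size and all names below are assumptions for illustration only.

```python
import random

PRIME = 2**127 - 1  # a Mersenne prime; the field size is an illustrative choice


def split_secret(secret, k, n, rng):
    """Split secret into n shares such that any k of them recover it.

    rng is passed in so the sketch is repeatable; real key material should
    come from a cryptographically secure source such as the secrets module.
    """
    # Random polynomial of degree k - 1 whose constant term is the secret.
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):        # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


shares = split_secret(123456789, k=3, n=5, rng=random.Random(0))
print(recover_secret(shares[:3]))
```

    With k = 3 and n = 5, the loss or compromise of up to two share holders neither prevents reconstruction nor exposes the secret, which mirrors the fault tolerance the abstract attributes to the scheme.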

  3. Cluster analysis of WIBS single-particle bioaerosol data

    Science.gov (United States)

    Robinson, N. H.; Allan, J. D.; Huffman, J. A.; Kaye, P. H.; Foot, V. E.; Gallagher, M.

    2013-02-01

    Hierarchical agglomerative cluster analysis was performed on single-particle multi-spatial data sets comprising optical diameter, asymmetry and three different fluorescence measurements, gathered using two dual Wideband Integrated Bioaerosol Sensors (WIBSs). The technique is demonstrated on measurements of various fluorescent and non-fluorescent polystyrene latex spheres (PSL) before being applied to two separate contemporaneous ambient WIBS data sets recorded in a forest site in Colorado, USA, as part of the BEACHON-RoMBAS project. Cluster analysis results between both data sets are consistent. Clusters are tentatively interpreted by comparison of concentration time series and cluster average measurement values to the published literature (of which there is a paucity) to represent the following: non-fluorescent accumulation mode aerosol; bacterial agglomerates; and fungal spores. To our knowledge, this is the first time cluster analysis has been applied to long-term online primary biological aerosol particle (PBAP) measurements. The novel application of this clustering technique provides a means for routinely reducing WIBS data to discrete concentration time series which are more easily interpretable, without the need for any a priori assumptions concerning the expected aerosol types. It can reduce the level of subjectivity compared to the more standard analysis approaches, which are typically performed by simple inspection of various ensemble data products. It also has the advantage of potentially resolving less populous or subtly different particle types. This technique is likely to become more robust in the future as fluorescence-based aerosol instrumentation measurement precision, dynamic range and the number of available metrics are improved.

  4. Cluster analysis of WIBS single-particle bioaerosol data

    Directory of Open Access Journals (Sweden)

    N. H. Robinson

    2013-02-01

    Hierarchical agglomerative cluster analysis was performed on single-particle multi-spatial data sets comprising optical diameter, asymmetry and three different fluorescence measurements, gathered using two dual Wideband Integrated Bioaerosol Sensors (WIBSs). The technique is demonstrated on measurements of various fluorescent and non-fluorescent polystyrene latex spheres (PSL) before being applied to two separate contemporaneous ambient WIBS data sets recorded in a forest site in Colorado, USA, as part of the BEACHON-RoMBAS project. Cluster analysis results between both data sets are consistent. Clusters are tentatively interpreted by comparison of concentration time series and cluster average measurement values to the published literature (of which there is a paucity) to represent the following: non-fluorescent accumulation mode aerosol; bacterial agglomerates; and fungal spores. To our knowledge, this is the first time cluster analysis has been applied to long-term online primary biological aerosol particle (PBAP) measurements. The novel application of this clustering technique provides a means for routinely reducing WIBS data to discrete concentration time series which are more easily interpretable, without the need for any a priori assumptions concerning the expected aerosol types. It can reduce the level of subjectivity compared to the more standard analysis approaches, which are typically performed by simple inspection of various ensemble data products. It also has the advantage of potentially resolving less populous or subtly different particle types. This technique is likely to become more robust in the future as fluorescence-based aerosol instrumentation measurement precision, dynamic range and the number of available metrics are improved.

  5. From Snakes to Stars, the Statistics of Collapsed Objects - II. Testing a Generic Scaling Ansatz for Hierarchical Clustering

    CERN Document Server

    Munshi, Dipak; Coles, Peter; Melott, Adrian L.

    1999-01-01

    We develop a diagrammatic technique to represent the multi-point cumulative probability density function (CPDF) of mass fluctuations in terms of the statistical properties of individual collapsed objects and relate this to other statistical descriptors such as cumulants, cumulant correlators and factorial moments. We use this approach to establish key scaling relations describing various measurable statistical quantities if clustering follows a simple general scaling ansatz, as expected in hierarchical models. We test these detailed predictions against high-resolution numerical simulations. We show that, when appropriate variables are used, the count probability distribution function (CPDF) and void probability distribution function (VPF) show clear scaling properties in the non-linear regime. Generalising the results to the two-point count probability distribution function (2CPDF) and the bivariate void probability function (2VPF), we find a good match with numerical simulations. We explore the behaviour of t...

  6. Registration Cost Performance Analysis of a Hierarchical Mobile Internet Protocol Network

    Institute of Scientific and Technical Information of China (English)

    XU Kai; JI Hong; YUE Guang-xin

    2004-01-01

    After introducing the principles of hierarchical mobile Internet protocol networks, this paper analyzes the registration cost performance of this network model in detail. It also establishes the functional relationship among registration cost, the number of hierarchical levels and the maximum number of handovers for gateway foreign agent regional registration. Finally, the registration cost of the hierarchical mobile Internet protocol network is compared with that of traditional mobile IP. Theoretical analysis and computer simulation results show that both the number of hierarchical levels and the maximum number of handovers significantly affect the registration cost; when suitable values are chosen, the hierarchical network can significantly improve registration performance compared with traditional mobile IP.

  7. Application of Multi-SOM clustering approach to macrophage gene expression analysis.

    Science.gov (United States)

    Ghouila, Amel; Yahia, Sadok Ben; Malouche, Dhafer; Jmel, Haifa; Laouini, Dhafer; Guerfali, Fatma Z; Abdelhak, Sonia

    2009-05-01

    The production of increasingly reliable and accessible gene expression data has stimulated the development of computational tools to interpret such data and to organize them efficiently. Clustering techniques are widely recognized as useful exploratory tools for gene expression data analysis. Genes that show similar expression patterns over a wide range of experimental conditions can be clustered together. This relies on the hypothesis that genes that belong to the same cluster are coregulated and involved in related functions. Nevertheless, clustering algorithms still show limits, particularly in estimating the number of clusters and interpreting hierarchical dendrograms, which may significantly influence the outputs of the analysis process. We propose here a multi-level SOM-based clustering algorithm named Multi-SOM. Through the use of clustering validity indices, Multi-SOM overcomes the problem of estimating the number of clusters. To test the validity of the proposed clustering algorithm, we first tested it on supervised training data sets. Results were evaluated by computing the number of misclassified samples. We then used Multi-SOM to analyze macrophage gene expression data generated in vitro from the blood of a single individual infected with 5 different pathogens. This analysis led to the identification of sets of tightly coregulated genes across different pathogens. Gene Ontology tools were then used to estimate the biological significance of the clustering, which showed that the obtained clusters are coherent and biologically significant.

  8. Research on a class-profile-based hierarchical clustering method

    Institute of Scientific and Technical Information of China (English)

    孟海东; 唐旋

    2011-01-01

    Traditional clustering algorithms often fail to consider both the connectivity and the similarity characteristics among classes. This paper first presents the basic definitions of class boundary point and class profile, together with methods for finding them; second, taking both the connectivity and the similarity characteristics among classes into account, it defines standards and methods for measuring inter-class similarity; third, it proposes a class-profile-based hierarchical clustering algorithm, which can effectively process clusters of arbitrary shape and distinguish isolated points from noise data. The feasibility and effectiveness of the algorithm are validated through cluster analysis of image data sets and the standard Iris data set.

  9. Hierarchical Scheduling Framework Based on Compositional Analysis Using Uppaal

    DEFF Research Database (Denmark)

    Boudjadar, Jalil; David, Alexandre; Kim, Jin Hyun

    2014-01-01

    This paper introduces a reconfigurable compositional scheduling framework, in which the hierarchical structure, the scheduling policies, the concrete task behavior and the shared resources can all be reconfigured. The behavior of each periodic preemptive task is given as a list of timed actions, ...

  10. Implementation of hybrid clustering based on partitioning around medoids algorithm and divisive analysis on human Papillomavirus DNA

    Science.gov (United States)

    Arimbi, Mentari Dian; Bustamam, Alhadi; Lestari, Dian

    2017-03-01

    Data clustering can be performed with partitioning or hierarchical methods on many types of data, including DNA sequences. The two kinds of method can be combined by running a partitioning algorithm at the first level and a hierarchical algorithm at the second level, an approach called hybrid clustering. In the partition phase, popular methods such as PAM, K-means, or Fuzzy c-means could be applied. In this study we selected partitioning around medoids (PAM) for the partition stage. Following the partition algorithm, in the hierarchical stage we applied the divisive analysis algorithm (DIANA) in order to obtain more specific cluster and sub-cluster structures. The number of main clusters is determined using the Davies Bouldin Index (DBI): we choose the number of clusters that minimizes the DBI value. In this work, we cluster 1252 HPV DNA sequences from GenBank. Characteristic extraction is performed first, followed by normalization and genetic distance calculation using Euclidean distance. Our implementation of the hybrid PAM and DIANA uses the R open source programming tool. Using PAM in the first stage, we obtained 3 main clusters with an average DBI value of 0.979. After executing DIANA in the second stage, we obtained 4 sub-clusters for Cluster-1, 9 sub-clusters for Cluster-2 and 2 sub-clusters for Cluster-3, with DBI values of 0.972, 0.771, and 0.768 for the respective main clusters. Since the second stage produces lower DBI values than the first stage, we conclude that this hybrid approach can improve the accuracy of our clustering results.
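    The Davies Bouldin Index used above to choose the number of clusters can be illustrated with a minimal sketch. This is not the authors' R implementation: the function names `centroid` and `davies_bouldin` are hypothetical, and the sketch assumes cluster centroids (means) and plain Euclidean distance, whereas PAM works with medoids.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def davies_bouldin(points, labels):
    """Davies-Bouldin index of a hard partition (lower means better separated).

    Assumption for this sketch: clusters are summarised by their centroid,
    and no two centroids coincide (otherwise the ratio is undefined).
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the index needs at least two clusters")

    keys = sorted(clusters)
    cents = {k: centroid(clusters[k]) for k in keys}
    # Scatter of a cluster: mean distance of its members to its centroid.
    scatter = {
        k: sum(math.dist(p, cents[k]) for p in clusters[k]) / len(clusters[k])
        for k in keys
    }

    total = 0.0
    for i in keys:
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in keys
            if j != i
        )
    return total / len(keys)
```

    To pick a number of clusters in the spirit of the abstract, one would compute this value for each candidate partition and keep the partition with the smallest index.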

  11. caBIG™ VISDA: Modeling, visualization, and discovery for cluster analysis of genomic data

    Directory of Open Access Journals (Sweden)

    Xuan Jianhua

    2008-09-01

    Background: The main limitations of most existing clustering methods used in genomic data analysis include heuristic or random algorithm initialization, the potential of finding poor local optima, the lack of cluster number detection, an inability to incorporate prior/expert knowledge, black-box and non-adaptive designs, in addition to the curse of dimensionality and the discernment of uninformative, uninteresting cluster structure associated with confounding variables. Results: In an effort to partially address these limitations, we develop the VIsual Statistical Data Analyzer (VISDA) for cluster modeling, visualization, and discovery in genomic data. VISDA performs progressive, coarse-to-fine (divisive) hierarchical clustering and visualization, supported by hierarchical mixture modeling, supervised/unsupervised informative gene selection, supervised/unsupervised data visualization, and user/prior knowledge guidance, to discover hidden clusters within complex, high-dimensional genomic data. The hierarchical visualization and clustering scheme of VISDA uses multiple local visualization subspaces (one at each node of the hierarchy) and consequent subspace data modeling to reveal both global and local cluster structures in a "divide and conquer" scenario. Multiple projection methods, each sensitive to a distinct type of clustering tendency, are used for data visualization, which increases the likelihood that cluster structures of interest are revealed. Initialization of the full dimensional model is based on first learning models with user/prior knowledge guidance on data projected into the low-dimensional visualization spaces. Model order selection for the high dimensional data is accomplished by Bayesian theoretic criteria and user justification applied via the hierarchy of low-dimensional visualization subspaces. Based on its complementary building blocks and flexible functionality, VISDA is generally applicable for gene clustering, sample

  12. caBIG VISDA: modeling, visualization, and discovery for cluster analysis of genomic data.

    Science.gov (United States)

    Zhu, Yitan; Li, Huai; Miller, David J; Wang, Zuyi; Xuan, Jianhua; Clarke, Robert; Hoffman, Eric P; Wang, Yue

    2008-09-18

    The main limitations of most existing clustering methods used in genomic data analysis include heuristic or random algorithm initialization, the potential of finding poor local optima, the lack of cluster number detection, an inability to incorporate prior/expert knowledge, black-box and non-adaptive designs, in addition to the curse of dimensionality and the discernment of uninformative, uninteresting cluster structure associated with confounding variables. In an effort to partially address these limitations, we develop the VIsual Statistical Data Analyzer (VISDA) for cluster modeling, visualization, and discovery in genomic data. VISDA performs progressive, coarse-to-fine (divisive) hierarchical clustering and visualization, supported by hierarchical mixture modeling, supervised/unsupervised informative gene selection, supervised/unsupervised data visualization, and user/prior knowledge guidance, to discover hidden clusters within complex, high-dimensional genomic data. The hierarchical visualization and clustering scheme of VISDA uses multiple local visualization subspaces (one at each node of the hierarchy) and consequent subspace data modeling to reveal both global and local cluster structures in a "divide and conquer" scenario. Multiple projection methods, each sensitive to a distinct type of clustering tendency, are used for data visualization, which increases the likelihood that cluster structures of interest are revealed. Initialization of the full dimensional model is based on first learning models with user/prior knowledge guidance on data projected into the low-dimensional visualization spaces. Model order selection for the high dimensional data is accomplished by Bayesian theoretic criteria and user justification applied via the hierarchy of low-dimensional visualization subspaces. Based on its complementary building blocks and flexible functionality, VISDA is generally applicable for gene clustering, sample clustering, and phenotype clustering

  13. Hierarchical linear modeling of longitudinal pedigree data for genetic association analysis.

    Science.gov (United States)

    Tan, Qihua; B Hjelmborg, Jacob V; Thomassen, Mads; Jensen, Andreas Kryger; Christiansen, Lene; Christensen, Kaare; Zhao, Jing Hua; Kruse, Torben A

    2014-01-01

    Genetic association analysis on complex phenotypes under a longitudinal design involving pedigrees encounters the problem of correlation within pedigrees, which could affect statistical assessment of the genetic effects. Approaches have been proposed to integrate kinship correlation into the mixed-effect models to explicitly model the genetic relationship. These have proved to be an efficient way of dealing with sample clustering in pedigree data. Although current algorithms implemented in popular statistical packages are useful for adjusting relatedness in the mixed modeling of genetic effects on the mean level of a phenotype, they are not sufficiently straightforward to handle the kinship correlation on the time-dependent trajectories of a phenotype. We introduce a 2-level hierarchical linear model to separately assess the genetic associations with the mean level and the rate of change of a phenotype, integrating kinship correlation in the analysis. We apply our method to the Genetic Analysis Workshop 18 genome-wide association studies data on chromosome 3 to estimate the genetic effects on systolic blood pressure measured over time in large pedigrees. Our method identifies genetic variants associated with blood pressure with estimated inflation factors of 0.99, suggesting that our modeling of random effects efficiently handles the genetic relatedness in pedigrees. Application to simulated data captures important variants specified in the simulation. Our results show that the method is useful for genetic association studies in related samples using longitudinal design.

  14. Robust cluster analysis and variable selection

    CERN Document Server

    Ritter, Gunter

    2014-01-01

    Clustering remains a vibrant area of research in statistics. Although there are many books on this topic, there are relatively few that are well founded in the theoretical aspects. In Robust Cluster Analysis and Variable Selection, Gunter Ritter presents an overview of the theory and applications of probabilistic clustering and variable selection, synthesizing the key research results of the last 50 years. The author focuses on the robust clustering methods he found to be the most useful on simulated data and real-time applications. The book provides clear guidance for the varying needs of bot

  15. ASteCA - Automated Stellar Cluster Analysis

    CERN Document Server

    Perren, Gabriel I; Piatti, Andrés E

    2014-01-01

    We present ASteCA (Automated Stellar Cluster Analysis), a suite of tools designed to fully automate the standard tests applied to stellar clusters to determine their basic parameters. The set of functions included in the code makes use of positional and photometric data to obtain precise and objective values for a given cluster's center coordinates, radius, luminosity function and integrated color magnitude, as well as characterizing through a statistical estimator its probability of being a true physical cluster rather than a random overdensity of field stars. ASteCA incorporates a Bayesian field star decontamination algorithm capable of assigning membership probabilities using photometric data alone. An isochrone fitting process based on the generation of synthetic clusters from theoretical isochrones and selection of the best fit through a genetic algorithm is also present, which allows ASteCA to provide accurate estimates for a cluster's metallicity, age, extinction and distance values along with its unce...

  16. Cluster analysis for computer workload evaluation

    CERN Document Server

    Landau, K

    1976-01-01

    An introduction to computer workload analysis is given, showing its range of application in computer centre management and in system and application programming. Cluster methods that can be used in conjunction with workload data are discussed, and cluster algorithms are adapted to the specific problem. Several samples of CDC 7600 accounting data, collected at CERN, the European Organization for Nuclear Research, underwent a cluster analysis to determine job groups. The conclusions drawn from the resource usage of typical job groups in relation to computer workload analysis are discussed. (17 refs).

  17. Hierarchical Modelling of Flood Risk for Engineering Decision Analysis

    DEFF Research Database (Denmark)

    Custer, Rocco

    Societies around the world are faced with flood risk, prompting authorities and decision makers to manage risk to protect population and assets. With climate change, urbanisation and population growth, flood risk changes constantly, requiring flood risk management strategies that are flexible...... and robust. Traditional risk management solutions, e.g. dike construction, are not particularly flexible, as they are difficult to adapt to changing risk. Conversely, the recent concept of integrated flood risk management, entailing a combination of several structural and non-structural risk management...... measures, allows identifying flexible and robust flood risk management strategies. Based on it, this thesis investigates hierarchical flood protection systems, which encompass two, or more, hierarchically integrated flood protection structures on different spatial scales (e.g. dikes, local flood barriers...

  18. A spatial analysis of hierarchical waste transport structures under growing demand.

    Science.gov (United States)

    Tanguy, Audrey; Glaus, Mathias; Laforest, Valérie; Villot, Jonathan; Hausler, Robert

    2016-10-01

    The design of waste management systems rarely accounts for the spatio-temporal evolution of demand. However, recent studies suggest that this evolution affects the planning of waste management activities such as the choice and location of treatment facilities. As a result, the transport structure could also be affected by these changes. The objective of this paper is to study the influence of the spatio-temporal evolution of demand on the strategic planning of a waste transport structure. More particularly, this study aims at evaluating the effect of varying spatial parameters on the economic performance of hierarchical structures (with one transfer station). To this end, three consecutive generations of three different spatial distributions were tested for hierarchical and non-hierarchical transport structures based on cost minimization. Results showed that a hierarchical structure is economically viable for large and clustered spatial distributions. The distance parameter was decisive, but the loading ratio of trucks and the formation of clusters of sources also affected the attractiveness of the transfer station. Thus the territories' morphology should influence strategies regarding the installation of transfer stations. Spatially explicit tools that take the territory's evolution into account, such as the transport model presented in this work, are needed to help waste managers in the strategic planning of waste transport structures.

  19. Cluster analysis of multiple planetary flow regimes

    Science.gov (United States)

    Mo, Kingtse; Ghil, Michael

    1988-01-01

    A modified cluster analysis method developed for the classification of quasi-stationary events into a few planetary flow regimes and for the examination of transitions between these regimes is described. The method was applied first to a simple deterministic model and then to a 500-mbar data set for the Northern Hemisphere (NH), for which cluster analysis was carried out in the subspace of the first seven empirical orthogonal functions (EOFs). Stationary clusters were found in the low-frequency band of more than 10 days, while transient clusters were found in the band-pass frequency window between 2.5 and 6 days. In the low-frequency band, three pairs of clusters determined EOFs 1, 2, and 3, respectively; they exhibited well-known regional features, such as blocking, the Pacific/North American pattern, and wave trains. Both model and low-pass data exhibited strong bimodality.

  20. [Cluster analysis and its application].

    Science.gov (United States)

    Půlpán, Zdenĕk

    2002-01-01

    The study exploits knowledge-oriented and context-based modifications of well-known (fuzzy) clustering algorithms. Fuzzy sets are also inherently suited to coping with linguistic domain knowledge. We aim to extract new information about the environment being explored from rich, diverse data and knowledge.

  1. Cluster Analysis of Adolescent Blogs

    Science.gov (United States)

    Liu, Eric Zhi-Feng; Lin, Chun-Hung; Chen, Feng-Yi; Peng, Ping-Chuan

    2012-01-01

    Emerging web applications and networking systems such as blogs have become popular, and they offer unique opportunities and environments for learners, especially for adolescent learners. This study attempts to explore the writing styles and genres used by adolescents in their blogs by employing content, factor, and cluster analyses. Factor…

  2. Empirical analysis of forecast premium income of Dalian primary life insurance: based on a least-squares hierarchical clustering and Markov chain model

    Institute of Scientific and Technical Information of China (English)

    王鑫

    2012-01-01

    To address the problem of forecasting insurance premium income, this study uses least-squares fitting, hierarchical cluster analysis and a Markov chain model to run an empirical simulation on the monthly primary insurance premium income of life insurance in Dalian for 2008-2011. Quantitative analysis is used to produce a qualitative forecast of that monthly premium income. The results show that the method is fairly accurate for qualitative prediction.

  3. Hierarchical spatial clustering of buildings

    Institute of Scientific and Technical Information of China (English)

    邓敏; 孙前虎; 文小岳; 徐枫

    2011-01-01

    Spatial clustering of buildings provides an effective approach to the automated cartographic generalization of residential areas. Based on graph theory and Gestalt principles, a hierarchical building clustering approach is proposed in this paper. The approach can discover the graphical structure formed by buildings, taking shape, size and neighboring relations into account. The neighboring relations, a qualitative constraint among buildings, are determined by Delaunay triangulation. A weighted neighboring structural graph is obtained by setting the visual distance as the weight of the edge linking adjacent buildings. Two levels of quantitative constraints are developed from the Gestalt factors, i.e. proximity, orientation and geometry of buildings: one is the rotating calipers minimum distance; the other is a geometric similarity measure. Experiments illustrate that the results of the proposed hierarchical spatial clustering are consistent with human perception.

  4. Cluster Analysis of the Malaysian Hipposideros

    Science.gov (United States)

    Sazali, Siti Nurlydia; Laman, Charlie J.; Abdullah, M. T.

    2008-01-01

    A preliminary study on the morphometric variations among species in the genus Hipposideros was conducted using voucher specimens from the Universiti Malaysia Sarawak (UNIMAS) Zoological Museum and the Department of Wildlife and National Park (DWNP) Kuala Lumpur. A total of 24 individuals from six species of this genus were studied morphologically; all related body, skull and dental measurements were taken and recorded. Cluster analysis of the data shows that the genus Hipposideros is divided into two major clusters, with each species clearly separated. Cluster analysis among Hipposideros species is useful as an aid to species identification.

  5. Using cluster analysis to explore survey data.

    Science.gov (United States)

    Spencer, Llinos; Roberts, Gwerfyl; Irvine, Fiona; Jones, Peter; Baker, Colin

    2007-01-01

    Llinos Haf Spencer reports on the use of the cluster analysis statistical technique in nursing research and uses data from the Welsh Language Awareness in Healthcare Provision in Wales survey as an exemplar. She concludes that cluster analysis is a valuable tool to tease out patterns in data that are not initially evident in bivariate analyses and thus should be considered as a viable option for nursing research.

  6. Analytical relations concerning the collapse time in hierarchically clustered cosmological models

    CERN Document Server

    Gambera, M

    1997-01-01

    By means of numerical methods, we solve the equations of motion for the collapse of a shell of baryonic matter, made of galaxies and substructure falling into the central regions of a cluster of galaxies, taking into account the effect of dynamical friction. The parameters on which the dynamical friction mainly depends are: the peaks' height, the number of peaks inside a protocluster multiplied by the correlation function evaluated at the origin, the filtering radius and the nucleus radius of the protocluster of galaxies. We show how the collapse time (Tau) of the shell depends on these parameters. We give a formula that links the dynamical friction coefficient (Eta) to the parameters mentioned above and an analytic relation between the collapse time and (Eta). Finally, we obtain an analytical relation between (Tau) and the mean overdensity (mean Delta) within the shell. All the analytical relations that we find are in excellent agreement with the numerical integration.

  7. Cluster Analysis and Clinical Asthma Phenotypes

    Science.gov (United States)

    Shaw, Dominic E.; Berry, Michael A.; Thomas, Michael; Brightling, Christopher E.; Wardlaw, Andrew J.

    2014-01-01

    Rationale Heterogeneity in asthma expression is multidimensional, including variability in clinical, physiologic, and pathologic parameters. Classification requires consideration of these disparate domains in a unified model. Objectives To explore the application of a multivariate mathematical technique, k-means cluster analysis, for identifying distinct phenotypic groups. Methods We performed k-means cluster analysis in three independent asthma populations. Clusters of a population managed in primary care (n = 184) with predominantly mild to moderate disease, were compared with a refractory asthma population managed in secondary care (n = 187). We then compared differences in asthma outcomes (exacerbation frequency and change in corticosteroid dose at 12 mo) between clusters in a third population of 68 subjects with predominantly refractory asthma, clustered at entry into a randomized trial comparing a strategy of minimizing eosinophilic inflammation (inflammation-guided strategy) with standard care. Measurements and Main Results Two clusters (early-onset atopic and obese, noneosinophilic) were common to both asthma populations. Two clusters characterized by marked discordance between symptom expression and eosinophilic airway inflammation (early-onset symptom predominant and late-onset inflammation predominant) were specific to refractory asthma. Inflammation-guided management was superior for both discordant subgroups leading to a reduction in exacerbation frequency in the inflammation-predominant cluster (3.53 [SD, 1.18] vs. 0.38 [SD, 0.13] exacerbation/patient/yr, P = 0.002) and a dose reduction of inhaled corticosteroid in the symptom-predominant cluster (mean difference, 1,829 μg beclomethasone equivalent/d [95% confidence interval, 307–3,349 μg]; P = 0.02). Conclusions Cluster analysis offers a novel multidimensional approach for identifying asthma phenotypes that exhibit differences in clinical response to treatment algorithms. PMID:18480428
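    For readers unfamiliar with the k-means technique named in this abstract, the following is a minimal sketch of Lloyd's algorithm in plain Python. It is illustrative only and not the study's analysis: the function name `kmeans`, the optional `init` argument and the use of Euclidean distance on raw coordinates are assumptions of the sketch, and real clinical variables would normally be standardised first.

```python
import math
import random


def kmeans(points, k, init=None, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centre-update steps.

    `init` (hypothetical convenience argument) lets the caller supply the
    k starting centres; otherwise k distinct data points are drawn at random.
    Returns (labels, centres).
    """
    rng = random.Random(seed)
    start = init if init is not None else rng.sample(points, k)
    centers = [list(c) for c in start]

    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centre.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centres are too
        labels = new_labels

        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

    Because the result depends on the starting centres, such an analysis is usually repeated from several initialisations and the solution with the lowest within-cluster sum of squares is kept.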

  8. Choosing appropriate analysis methods for cluster randomised cross-over trials with a binary outcome.

    Science.gov (United States)

    Morgan, Katy E; Forbes, Andrew B; Keogh, Ruth H; Jairath, Vipul; Kahan, Brennan C

    2017-01-30

    In cluster randomised cross-over (CRXO) trials, clusters receive multiple treatments in a randomised sequence over time. In such trials, there is usually correlation between patients in the same cluster. In addition, within a cluster, patients in the same period may be more similar to each other than to patients in other periods. We demonstrate that it is necessary to account for these correlations in the analysis to obtain correct Type I error rates. We then use simulation to compare different methods of analysing a binary outcome from a two-period CRXO design. Our simulations demonstrated that hierarchical models without random effects for period-within-cluster, which do not account for any extra within-period correlation, performed poorly with greatly inflated Type I errors in many scenarios. In scenarios where extra within-period correlation was present, a hierarchical model with random effects for cluster and period-within-cluster only had correct Type I errors when there were large numbers of clusters; with small numbers of clusters, the error rate was inflated. We also found that generalised estimating equations did not give correct error rates in any scenarios considered. An unweighted cluster-level summary regression performed best overall, maintaining an error rate close to 5% for all scenarios, although it lost power when extra within-period correlation was present, especially for small numbers of clusters. Results from our simulation study show that it is important to model both levels of clustering in CRXO trials, and that any extra within-period correlation should be accounted for. Copyright © 2016 John Wiley & Sons, Ltd.

  9. A novel symptom cluster analysis among ambulatory HIV/AIDS patients in Uganda.

    Science.gov (United States)

    Namisango, Eve; Harding, Richard; Katabira, Elly T; Siegert, Richard J; Powell, Richard A; Atuhaire, Leonard; Moens, Katrien; Taylor, Steve

    2015-01-01

    Symptom clusters are gaining importance given HIV/AIDS patients experience multiple, concurrent symptoms. This study aimed to: determine clusters of patients with similar symptom combinations; describe symptom combinations distinguishing the clusters; and evaluate the clusters regarding patient socio-demographic, disease and treatment characteristics, quality of life (QOL) and functional performance. This was a cross-sectional study of 302 adult HIV/AIDS outpatients consecutively recruited at two teaching and referral hospitals in Uganda. Socio-demographic and seven-day period symptom prevalence and distress data were self-reported using the Memorial Symptom Assessment Schedule. QOL was assessed using the Medical Outcome Scale and functional performance using the Karnofsky Performance Scale. Symptom clusters were established using hierarchical cluster analysis with squared Euclidean distances using Ward's clustering methods based on symptom occurrence. Analysis of variance compared clusters on mean QOL and functional performance scores. Patient subgroups were categorised based on symptom occurrence rates. Five symptom occurrence clusters were identified: Cluster 1 (n=107), high-low for sensory discomfort and eating difficulties symptoms; Cluster 2 (n=47), high-low for psycho-gastrointestinal symptoms; Cluster 3 (n=71), high for pain and sensory disturbance symptoms; Cluster 4 (n=35), all high for general HIV/AIDS symptoms; and Cluster 5 (n=48), all low for mood-cognitive symptoms. The all high occurrence cluster was associated with worst functional status, poorest QOL scores and highest symptom-associated distress. Use of antiretroviral therapy was associated with all high symptom occurrence rate (Fisher's exact=4, Pcluster (Fisher's exact=41, Pclusters have a differential, affect HIV/AIDS patients' self-reported outcomes, with the subgroup experiencing high-symptom occurrence rates having a higher risk of poorer outcomes. 
Identification of symptom clusters could

  10. A HIERARCHICAL CLUSTERING-BASED GROUP KEY MANAGEMENT SCHEME

    Institute of Scientific and Technical Information of China (English)

    李珍格; 游林

    2014-01-01

    In this paper, a hierarchical clustering-based group key management scheme is proposed to secure group communication in wireless sensor networks (WSNs). The proposed scheme adopts a hierarchical architecture and divides the nodes in the group into a master-node layer and a terminal layer. The group key of the terminal layer is updated by the BS, which constructs a special group key polynomial, while the master-node layer uses a binary one-way function for group key negotiation. Analysis demonstrates that the scheme satisfies the forward security and backward security requirements of group key management in WSNs, and also reduces the storage, computation, and communication overhead.

  11. Identification and validation of asthma phenotypes in Chinese population using cluster analysis.

    Science.gov (United States)

    Wang, Lei; Liang, Rui; Zhou, Ting; Zheng, Jing; Liang, Bing Miao; Zhang, Hong Ping; Luo, Feng Ming; Gibson, Peter G; Wang, Gang

    2017-08-30

    Asthma is a heterogeneous airway disease, so it is crucial to clearly identify clinical phenotypes to achieve better asthma management. To identify and prospectively validate asthma clusters in a Chinese population. Two hundred eighty-four patients were consecutively recruited and 18 sociodemographic and clinical variables were collected. Hierarchical cluster analysis was performed by the Ward method followed by k-means cluster analysis. Then, a prospective 12-month cohort study was used to validate the identified clusters. Five clusters were successfully identified. Clusters 1 (n = 71) and 3 (n = 81) were mild asthma phenotypes with slight airway obstruction and low exacerbation risk, but with a sex differential. Cluster 2 (n = 65) described an "allergic" phenotype, cluster 4 (n = 33) featured a "fixed airflow limitation" phenotype with smoking, and cluster 5 (n = 34) was a "low socioeconomic status" phenotype. Patients in clusters 2, 4, and 5 had distinctly lower socioeconomic status and more psychological symptoms. Cluster 2 had a significantly increased risk of exacerbations (risk ratio [RR] 1.13, 95% confidence interval [CI] 1.03-1.25), unplanned visits for asthma (RR 1.98, 95% CI 1.07-3.66), and emergency visits for asthma (RR 7.17, 95% CI 1.26-40.80). Cluster 4 had an increased risk of unplanned visits (RR 2.22, 95% CI 1.02-4.81), and cluster 5 had increased emergency visits (RR 12.72, 95% CI 1.95-69.78). Kaplan-Meier analysis confirmed that cluster grouping was predictive of time to the first asthma exacerbation, unplanned visit, emergency visit, and hospital admission (P clusters as "allergic asthma," "fixed airflow limitation," and "low socioeconomic status" phenotypes that are at high risk of severe asthma exacerbations and that have management implications for clinical practice in developing countries. Copyright © 2017 American College of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.
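
    The two-stage procedure mentioned in the abstract (Ward's method followed by k-means) can be sketched as follows. This is a minimal pure-Python illustration and not the authors' implementation: the function names are hypothetical, the data are toy points, and the choice to seed k-means with the centroids of the Ward clusters is an assumption about how the two stages are combined.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ward_clusters(points, k):
    """Naive agglomerative clustering with Ward's criterion, stopped at k clusters."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                # Increase in within-cluster sum of squares if a and b merge.
                cost = (len(a) * len(b) / (len(a) + len(b))
                        * sq_dist(centroid(a), centroid(b)))
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm started from the given centers."""
    centers = list(centers)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assign every point to its nearest center.
        labels = [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append(centroid(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def ward_then_kmeans(points, k):
    """Stage 1: Ward clusters give the seeds. Stage 2: k-means refines them."""
    seeds = [centroid(c) for c in ward_clusters(points, k)]
    return kmeans(points, seeds)


# Toy data with two well-separated groups (illustrative only).
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = ward_then_kmeans(toy_points, 2)
```

    Seeding k-means from a hierarchical solution removes the dependence on random starting centers, which is one common motivation for combining the two methods; in a real analysis the variables would normally be standardized first.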

  12. Formation of an O-Star Cluster by Hierarchical Accretion in G20.08-0.14 N

    CERN Document Server

    Galván-Madrid, Roberto; Zhang, Qizhou; Kurtz, Stan; Rodríguez, Luis F; Ho, Paul T P

    2009-01-01

    Spectral line and continuum observations of the ionized and molecular gas in G20.08-0.14 N explore the dynamics of accretion over a range of spatial scales in this massive star forming region. Very Large Array observations of NH_3 at 4'' angular resolution show a large scale (0.5 pc) molecular accretion flow around and into a star cluster with three small, bright HII regions. Higher resolution (0.4'') observations with the Submillimeter Array in hot core molecules (CH_3CN, OCS, and SO_2) and the VLA in NH_3, show that the two brightest and smallest HII regions are themselves surrounded by smaller scale (0.05 pc) accretion flows. The axes of rotation of the large and small scale flows are aligned, and the time scale for the contraction of the cloud is short enough, 0.1 Myr, for the large scale accretion flow to deliver significant mass to the smaller scales within the star formation time scale. The flow structure appears to be continuous and hierarchical from larger to smaller scales. Millimeter radio recombin...

  13. Cluster Analysis in Patients with GOLD 1 Chronic Obstructive Pulmonary Disease.

    Directory of Open Access Journals (Sweden)

    Philippe Gagnon

    We hypothesized that heterogeneity exists within the Global Initiative for Chronic Obstructive Lung Disease (GOLD) 1 spirometric category and that different subgroups could be identified within this GOLD category. Pre-randomization study participants from two clinical trials were symptomatic/asymptomatic GOLD 1 chronic obstructive pulmonary disease (COPD) patients and healthy controls. A hierarchical cluster analysis used pre-randomization demographics, symptom scores, lung function, peak exercise response and daily physical activity levels to derive population subgroups. Considerable heterogeneity existed for clinical variables among patients with GOLD 1 COPD. All parameters, except forced expiratory volume in 1 second (FEV1)/forced vital capacity (FVC), had considerable overlap between GOLD 1 COPD and controls. Three clusters were identified: cluster I (18 [15%] COPD patients; 105 [85%] controls); cluster II (45 [80%] COPD patients; 11 [20%] controls); and cluster III (22 [92%] COPD patients; 2 [8%] controls). Apart from reduced diffusion capacity and lower baseline dyspnea index versus controls, cluster I COPD patients had otherwise preserved lung volumes, exercise capacity and physical activity levels. Cluster II COPD patients had a higher smoking history and greater hyperinflation versus cluster I COPD patients. Cluster III COPD patients had reduced physical activity versus controls and clusters I and II COPD patients, and lower FEV1/FVC versus clusters I and II COPD patients. The results emphasize heterogeneity within GOLD 1 COPD, supporting an individualized therapeutic approach to patients. www.clinicaltrials.gov. NCT01360788 and NCT01072396.

  14. Clustering analysis of telecommunication customers

    Institute of Scientific and Technical Information of China (English)

    REN Hong; ZHENG Yan; WU Ye-rong

    2009-01-01

    In this article, a clustering method based on genetic algorithm (GA) for telecommunication customer subdivision is presented. First, the features of telecommunication customers (such as the calling behavior and consuming behavior) are extracted. Second, the similarities between the multidimensional feature vectors of telecommunication customers are computed and mapped as the distance between samples on a two-dimensional plane. Finally, the distances are adjusted to approximate the similarities gradually by GA. One advantage of this method is the independent distribution of the sample space. The experiments demonstrate the feasibility of the proposed method.

  15. Hierarchical Network Design

    DEFF Research Database (Denmark)

    Thomadsen, Tommy

    2005-01-01

    Communication networks are immensely important today, since both companies and individuals use numerous services that rely on them. This thesis considers the design of hierarchical (communication) networks. Hierarchical networks consist of layers of networks and are well-suited for coping...... the clusters. The design of hierarchical networks involves clustering of nodes, hub selection, and network design, i.e. selection of links and routing of flows. Hierarchical networks have been in use for decades, but integrated design of these networks has only been considered for very special types of networks....... The thesis investigates models for hierarchical network design and methods used to design such networks. In addition, ring network design is considered, since ring networks commonly appear in the design of hierarchical networks. The thesis introduces hierarchical networks, including a classification scheme...

  16. Principal factor and hierarchical cluster analyses for the performance assessment of an urban wastewater treatment plant in the Southeast of Spain.

    Science.gov (United States)

    Bayo, Javier; López-Castellanos, Joaquín

    2016-07-01

    Process performance and operation of wastewater treatment plants (WWTP) are monitored to ensure their compliance with legislative requirements imposed by the European Union. Because a large number of variables are measured daily, a coherent and structured approach to such a system is required to understand its inherent behavior and performance efficiency. In this sense, both principal factor analysis (PFA) and hierarchical cluster analysis (HCA) are multivariate techniques that have been widely applied to extract and structure information for different purposes. In this paper, both statistical tools are applied to an urban WWTP situated in the Southeast of Spain, a zone with special characteristics related to the geochemical background composition of water and an important use of fertilizers. Four main factors were extracted in association with nutrients, the ionic component, the organic load to the WWTP, and the efficiency of the whole process. HCA made it possible to distinguish between influent and effluent parameters, although a deeper examination resulted in a dendrogram with groupings similar to those previously reported for PFA.

  17. Partial least square and hierarchical clustering in ADMET modeling: prediction of blood-brain barrier permeation of α-adrenergic and imidazoline receptor ligands.

    Science.gov (United States)

    Nikolic, Katarina; Filipic, Slavica; Smoliński, Adam; Kaliszan, Roman; Agbaba, Danica

    2013-01-01

    PURPOSE. Rate of brain penetration (logPS), brain/plasma equilibration rate (logPS-brain), and extent of blood-brain barrier permeation (logBB) of 29 α-adrenergic and imidazoline-receptors ligands were examined in a Quantitative-Structure-Property Relationship (QSPR) study. METHODS. Experimentally determined chromatographic retention data (logKw at pH 4.4, slope (S) at pH 4.4, logKw at pH 7.4, slope (S) at pH 7.4, logKw at pH 9.1, and slope (S) at pH 9.1) and capillary electrophoresis migration parameters (μeff at pH 4.4, μeff at pH 7.4, and μeff at pH 9.1), together with calculated molecular descriptors, were used as independent variables in the QSPR study by use of partial least squares (PLS) methodology. RESULTS. Predictive potential of the formed QSPR models, QSPR(logPS), QSPR(logPS-brain), QSPR(logBB), was confirmed by cross- and external validation. Hydrophilicity (Hy) and H-indices (H7m) were selected as significant parameters negatively correlated with both logPS and logPS-brain, while topological polar surface area (TPSA(NO)) was chosen as the molecular descriptor negatively correlated with both logPS and logBB. Principal component analysis (PCA) and hierarchical clustering analysis (HCA) were applied to cluster the examined drugs based on their chromatographic, electrophoretic and molecular properties. Significant positive correlations were obtained between the slope (S) at pH 7.4 and logBB in the A/B cluster and between the logKw at pH 9.1 and logPS in the C/D cluster. CONCLUSIONS. Results of the QSPR, clustering and correlation studies could be used as a novel tool for evaluation of blood-brain barrier permeation of related α-adrenergic/imidazoline receptor ligands.

  18. Hierarchical linear model: thinking outside the traditional repeated-measures analysis-of-variance box.

    Science.gov (United States)

    Lininger, Monica; Spybrook, Jessaca; Cheatham, Christopher C

    2015-04-01

    Longitudinal designs are common in the field of athletic training. For example, in the Journal of Athletic Training from 2005 through 2010, authors of 52 of the 218 original research articles used longitudinal designs. In 50 of the 52 studies, a repeated-measures analysis of variance was used to analyze the data. A possible alternative to this approach is the hierarchical linear model, which has been readily accepted in other medical fields. In this short report, we demonstrate the use of the hierarchical linear model for analyzing data from a longitudinal study in athletic training. We discuss the relevant hypotheses, model assumptions, analysis procedures, and output from the HLM 7.0 software. We also examine the advantages and disadvantages of using the hierarchical linear model with repeated measures and repeated-measures analysis of variance for longitudinal data.

  19. Hierarchical Linear Model: Thinking Outside the Traditional Repeated-Measures Analysis-of-Variance Box

    Science.gov (United States)

    Lininger, Monica; Spybrook, Jessaca; Cheatham, Christopher C.

    2015-01-01

    Longitudinal designs are common in the field of athletic training. For example, in the Journal of Athletic Training from 2005 through 2010, authors of 52 of the 218 original research articles used longitudinal designs. In 50 of the 52 studies, a repeated-measures analysis of variance was used to analyze the data. A possible alternative to this approach is the hierarchical linear model, which has been readily accepted in other medical fields. In this short report, we demonstrate the use of the hierarchical linear model for analyzing data from a longitudinal study in athletic training. We discuss the relevant hypotheses, model assumptions, analysis procedures, and output from the HLM 7.0 software. We also examine the advantages and disadvantages of using the hierarchical linear model with repeated measures and repeated-measures analysis of variance for longitudinal data. PMID:25875072

  20. Filtering Genes for Cluster and Network Analysis

    Directory of Open Access Journals (Sweden)

    Parkhomenko Elena

    2009-06-01

    Background Prior to cluster analysis or genetic network analysis it is customary to filter, or remove, genes considered to be irrelevant from the set of genes to be analyzed. Often genes whose variation across samples is less than an arbitrary threshold value are deleted. This can improve interpretability and reduce bias. Results This paper introduces modular models for representing network structure in order to study the relative effects of different filtering methods. We show that cluster analysis and principal components are strongly affected by filtering. Filtering methods intended specifically for cluster and network analysis are introduced and compared by simulating modular networks with known statistical properties. To study more realistic situations, we analyze simulated "real" data based on well-characterized E. coli and S. cerevisiae regulatory networks. Conclusion The methods introduced apply very generally, to any similarity matrix describing gene expression. One of the proposed methods, SUMCOV, performed well for all models simulated.
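
    The conventional variance filter that the abstract takes as its starting point can be sketched in a few lines. This illustrates only the customary threshold rule, not SUMCOV or the other methods proposed in the paper; the function name, the dictionary layout, and the threshold value are assumptions made for the example.

```python
from statistics import variance


def filter_genes_by_variance(expression, threshold):
    """Drop genes whose sample variance across samples is below `threshold`.

    `expression` maps a gene name to its list of expression values,
    one value per sample (hypothetical layout).
    """
    return {
        gene: values
        for gene, values in expression.items()
        if variance(values) >= threshold
    }


# Toy expression table: three genes measured in four samples.
toy_expression = {
    "g1": [1.0, 1.0, 1.0, 1.0],   # constant, carries no information
    "g2": [0.0, 2.0, 0.0, 2.0],   # clearly varying
    "g3": [5.0, 5.1, 4.9, 5.0],   # nearly constant
}
kept = filter_genes_by_variance(toy_expression, threshold=0.5)
```

    Since the threshold is arbitrary, rerunning the downstream clustering for a few different threshold values is a cheap way to see how strongly the result depends on the filter, which is the sensitivity the paper reports.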

  1. Unsupervised Transient Light Curve Analysis Via Hierarchical Bayesian Inference

    CERN Document Server

    Sanders, Nathan; Soderberg, Alicia

    2014-01-01

    Historically, light curve studies of supernovae (SNe) and other transient classes have focused on individual objects with copious and high signal-to-noise observations. In the nascent era of wide field transient searches, objects with detailed observations are decreasing as a fraction of the overall known SN population, and this strategy sacrifices the majority of the information contained in the data about the underlying population of transients. A population level modeling approach, simultaneously fitting all available observations of objects in a transient sub-class of interest, fully mines the data to infer the properties of the population and avoids certain systematic biases. We present a novel hierarchical Bayesian statistical model for population level modeling of transient light curves, and discuss its implementation using an efficient Hamiltonian Monte Carlo technique. As a test case, we apply this model to the Type IIP SN sample from the Pan-STARRS1 Medium Deep Survey, consisting of 18,837 photometr...

  2. UNSUPERVISED TRANSIENT LIGHT CURVE ANALYSIS VIA HIERARCHICAL BAYESIAN INFERENCE

    Energy Technology Data Exchange (ETDEWEB)

    Sanders, N. E.; Soderberg, A. M. [Harvard-Smithsonian Center for Astrophysics, 60 Garden Street, Cambridge, MA 02138 (United States); Betancourt, M., E-mail: nsanders@cfa.harvard.edu [Department of Statistics, University of Warwick, Coventry CV4 7AL (United Kingdom)

    2015-02-10

    Historically, light curve studies of supernovae (SNe) and other transient classes have focused on individual objects with copious and high signal-to-noise observations. In the nascent era of wide field transient searches, objects with detailed observations are decreasing as a fraction of the overall known SN population, and this strategy sacrifices the majority of the information contained in the data about the underlying population of transients. A population level modeling approach, simultaneously fitting all available observations of objects in a transient sub-class of interest, fully mines the data to infer the properties of the population and avoids certain systematic biases. We present a novel hierarchical Bayesian statistical model for population level modeling of transient light curves, and discuss its implementation using an efficient Hamiltonian Monte Carlo technique. As a test case, we apply this model to the Type IIP SN sample from the Pan-STARRS1 Medium Deep Survey, consisting of 18,837 photometric observations of 76 SNe, corresponding to a joint posterior distribution with 9176 parameters under our model. Our hierarchical model fits provide improved constraints on light curve parameters relevant to the physical properties of their progenitor stars relative to modeling individual light curves alone. Moreover, we directly evaluate the probability for occurrence rates of unseen light curve characteristics from the model hyperparameters, addressing observational biases in survey methodology. We view this modeling framework as an unsupervised machine learning technique with the ability to maximize scientific returns from data to be collected by future wide field transient searches like LSST.

  3. Genomic analysis of the hierarchical structure of regulatory networks

    Science.gov (United States)

    Yu, Haiyuan; Gerstein, Mark

    2006-01-01

    A fundamental question in biology is how the cell uses transcription factors (TFs) to coordinate the expression of thousands of genes in response to various stimuli. The relationships between TFs and their target genes can be modeled in terms of directed regulatory networks. These relationships, in turn, can be readily compared with commonplace “chain-of-command” structures in social networks, which have characteristic hierarchical layouts. Here, we develop algorithms for identifying generalized hierarchies (allowing for various loop structures) and use these approaches to illuminate extensive pyramid-shaped hierarchical structures existing in the regulatory networks of representative prokaryotes (Escherichia coli) and eukaryotes (Saccharomyces cerevisiae), with most TFs at the bottom levels and only a few master TFs on top. These masters are situated near the center of the protein–protein interaction network, a different type of network from the regulatory one, and they receive most of the input for the whole regulatory hierarchy through protein interactions. Moreover, they have maximal influence over other genes, in terms of affecting expression-level changes. Surprisingly, however, TFs at the bottom of the regulatory hierarchy are more essential to the viability of the cell. Finally, one might think master TFs achieve their wide influence through directly regulating many targets, but TFs with most direct targets are in the middle of the hierarchy. We find, in fact, that these midlevel TFs are “control bottlenecks” in the hierarchy, and this great degree of control for “middle managers” has parallels in efficient social structures in various corporate and governmental settings. PMID:17003135

  4. Genome-scale cluster analysis of replicated microarrays using shrinkage correlation coefficient.

    Science.gov (United States)

    Yao, Jianchao; Chang, Chunqi; Salmi, Mari L; Hung, Yeung Sam; Loraine, Ann; Roux, Stanley J

    2008-06-18

    Currently, clustering with some form of correlation coefficient as the gene similarity metric has become a popular method for profiling genomic data. The Pearson correlation coefficient and the standard deviation (SD)-weighted correlation coefficient are the two most widely-used correlations as the similarity metrics in clustering microarray data. However, these two correlations are not optimal for analyzing replicated microarray data generated by most laboratories. An effective correlation coefficient is needed to provide statistically sufficient analysis of replicated microarray data. In this study, we describe a novel correlation coefficient, shrinkage correlation coefficient (SCC), that fully exploits the similarity between the replicated microarray experimental samples. The methodology considers both the number of replicates and the variance within each experimental group in clustering expression data, and provides a robust statistical estimation of the error of replicated microarray data. The value of SCC is revealed by its comparison with two other correlation coefficients that are currently the most widely-used (Pearson correlation coefficient and SD-weighted correlation coefficient) using statistical measures on both synthetic expression data as well as real gene expression data from Saccharomyces cerevisiae. Two leading clustering methods, hierarchical and k-means clustering were applied for the comparison. The comparison indicated that using SCC achieves better clustering performance. Applying SCC-based hierarchical clustering to the replicated microarray data obtained from germinating spores of the fern Ceratopteris richardii, we discovered two clusters of genes with shared expression patterns during spore germination. Functional analysis suggested that some of the genetic mechanisms that control germination in such diverse plant lineages as mosses and angiosperms are also conserved among ferns. 
This study shows that SCC is an alternative to the Pearson

  5. Genome-scale cluster analysis of replicated microarrays using shrinkage correlation coefficient

    Directory of Open Access Journals (Sweden)

    Loraine Ann

    2008-06-01

    Background Currently, clustering with some form of correlation coefficient as the gene similarity metric has become a popular method for profiling genomic data. The Pearson correlation coefficient and the standard deviation (SD)-weighted correlation coefficient are the two most widely-used correlations as the similarity metrics in clustering microarray data. However, these two correlations are not optimal for analyzing replicated microarray data generated by most laboratories. An effective correlation coefficient is needed to provide statistically sufficient analysis of replicated microarray data. Results In this study, we describe a novel correlation coefficient, shrinkage correlation coefficient (SCC), that fully exploits the similarity between the replicated microarray experimental samples. The methodology considers both the number of replicates and the variance within each experimental group in clustering expression data, and provides a robust statistical estimation of the error of replicated microarray data. The value of SCC is revealed by its comparison with two other correlation coefficients that are currently the most widely-used (Pearson correlation coefficient and SD-weighted correlation coefficient) using statistical measures on both synthetic expression data as well as real gene expression data from Saccharomyces cerevisiae. Two leading clustering methods, hierarchical and k-means clustering, were applied for the comparison. The comparison indicated that using SCC achieves better clustering performance. Applying SCC-based hierarchical clustering to the replicated microarray data obtained from germinating spores of the fern Ceratopteris richardii, we discovered two clusters of genes with shared expression patterns during spore germination. Functional analysis suggested that some of the genetic mechanisms that control germination in such diverse plant lineages as mosses and angiosperms are also conserved among ferns. Conclusion

  6. Meta-Analysis in Higher Education: An Illustrative Example Using Hierarchical Linear Modeling

    Science.gov (United States)

    Denson, Nida; Seltzer, Michael H.

    2011-01-01

    The purpose of this article is to provide higher education researchers with an illustrative example of meta-analysis utilizing hierarchical linear modeling (HLM). This article demonstrates the step-by-step process of meta-analysis using a recently-published study examining the effects of curricular and co-curricular diversity activities on racial…

  7. Meta-Analysis in Higher Education: An Illustrative Example Using Hierarchical Linear Modeling

    Science.gov (United States)

    Denson, Nida; Seltzer, Michael H.

    2011-01-01

    The purpose of this article is to provide higher education researchers with an illustrative example of meta-analysis utilizing hierarchical linear modeling (HLM). This article demonstrates the step-by-step process of meta-analysis using a recently-published study examining the effects of curricular and co-curricular diversity activities on racial…

  8. Augmenting Visual Analysis in Single-Case Research with Hierarchical Linear Modeling

    Science.gov (United States)

    Davis, Dawn H.; Gagne, Phill; Fredrick, Laura D.; Alberto, Paul A.; Waugh, Rebecca E.; Haardorfer, Regine

    2013-01-01

    The purpose of this article is to demonstrate how hierarchical linear modeling (HLM) can be used to enhance visual analysis of single-case research (SCR) designs. First, the authors demonstrated the use of growth modeling via HLM to augment visual analysis of a sophisticated single-case study. Data were used from a delayed multiple baseline…

  9. Using Cluster Analysis to Examine Husband-Wife Decision Making

    Science.gov (United States)

    Bonds-Raacke, Jennifer M.

    2006-01-01

    Cluster analysis has a rich history in many disciplines and although cluster analysis has been used in clinical psychology to identify types of disorders, its use in other areas of psychology has been less popular. The purpose of the current experiments was to use cluster analysis to investigate husband-wife decision making. Cluster analysis was…

  10. Cluster analysis in kinetic modelling of the brain: A noninvasive alternative to arterial sampling

    DEFF Research Database (Denmark)

    Liptrot, Matthew George; Adams, K.H.; Martiny, L.

    2004-01-01

    In emission tomography, quantification of brain tracer uptake, metabolism or binding requires knowledge of the cerebral input function. Traditionally, this is achieved with arterial blood sampling. We propose a noninvasive alternative via the use of a blood vessel time-activity curve (TAC......) extracted directly from dynamic positron emission tomography (PET) scans by cluster analysis. Five healthy subjects were injected with the 5HT2A- receptor ligand [18F]-altanserin and blood samples were subsequently taken from the radial artery and cubital vein. Eight regions-of-interest (ROI) TACs were...... extracted from the PET data set. Hierarchical K-means cluster analysis was performed on the PET time series to extract a cerebral vasculature ROI. The number of clusters was varied from K = 1 to 10 for the second of the two-stage method. Determination of the correct number of clusters was performed...

  11. Clustering analysis of seismicity and aftershock identification.

    Science.gov (United States)

    Zaliapin, Ilya; Gabrielov, Andrei; Keilis-Borok, Vladimir; Wong, Henry

    2008-07-01

    We introduce a statistical methodology for clustering analysis of seismicity in the time-space-energy domain and use it to establish the existence of two statistically distinct populations of earthquakes: clustered and nonclustered. This result can be used, in particular, for nonparametric aftershock identification. The proposed approach expands the analysis of Baiesi and Paczuski [Phys. Rev. E 69, 066106 (2004), doi:10.1103/PhysRevE.69.066106] based on the space-time-magnitude nearest-neighbor distance eta between earthquakes. We show that for a homogeneous Poisson marked point field with exponential marks, the distance eta has the Weibull distribution, which bridges our results with classical correlation analysis for point fields. The joint 2D distribution of spatial and temporal components of eta is used to identify the clustered part of a point field. The proposed technique is applied to several seismicity models and to the observed seismicity of southern California.
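
    The nearest-neighbor distance eta can be illustrated with a short sketch. The abstract does not state the formula, so the code assumes the commonly cited Baiesi-Paczuski form, eta_ij = t_ij * r_ij^d * 10^(-b * m_i), where t_ij is the time from the earlier event i to the later event j, r_ij their spatial separation, and m_i the magnitude of the earlier event. The function name, the planar (x, y) coordinates, and the default values b = 1.0 and d = 1.6 are illustrative assumptions, not values taken from the paper.

```python
import math


def nearest_neighbor_eta(events, b=1.0, d=1.6):
    """For each event, find the earlier event with the smallest distance eta.

    `events` is a list of (time, x, y, magnitude) tuples (hypothetical layout,
    planar coordinates). Returns one (eta, parent_index) pair per event; an
    event with no earlier event gets (inf, None).
    """
    result = []
    for tj, xj, yj, _mj in events:
        best_eta, best_parent = math.inf, None
        for i, (ti, xi, yi, mi) in enumerate(events):
            dt = tj - ti
            if dt <= 0:
                continue  # only strictly earlier events can be parents
            r = math.hypot(xj - xi, yj - yi)
            # Assumed form: time gap * distance^d, discounted by parent magnitude.
            eta = dt * r ** d * 10 ** (-b * mi)
            if eta < best_eta:
                best_eta, best_parent = eta, i
        result.append((best_eta, best_parent))
    return result


# Toy catalogue: a large event followed by a close and a distant small event.
toy_events = [
    (0.0, 0.0, 0.0, 6.0),
    (1.0, 1.0, 0.0, 3.0),
    (100.0, 500.0, 0.0, 3.0),
]
links = nearest_neighbor_eta(toy_events)
```

    Small values of eta link an event to a likely parent, while large values suggest a background event; under the assumptions above, a threshold on eta, or on its separate time and space components, splits a catalogue into the clustered and nonclustered populations described in the abstract.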

  12. Cosmic queuing: galaxy satellites, building blocks and the hierarchical clustering paradigm

    CERN Document Server

    Lagos, Claudia del P; Cora, Sofia A

    2009-01-01

    We study the properties of building blocks (BBs, i.e. accreted satellites) and surviving satellites of present-day galaxies using the SAG semi-analytic model of galaxy formation in the context of a concordance Lambda Cold Dark Matter (LCDM) cosmology. We find higher metallicities for BBs, an effect produced by the same processes behind the build-up of the mass-metallicity relation, namely, the higher peak height in the density fluctuation field occupied by BBs and central galaxies which have collapsed into a single object earlier than surviving satellites. A detailed analysis shows that BBs start to form stars earlier, and build-up half of their final stellar mass (measured at the moment of disruption) up to four times faster than surviving satellites. We show that this effect is a consequence of the epoch in which this occurs; BBs assemble their stellar mass mostly during the peak of the merger activity in the LCDM cosmology, whereas surviving satellites keep increasing their stellar masses down to z=1. The ...

  13. Mining Sequential Update Summarization with Hierarchical Text Analysis

    Directory of Open Access Journals (Sweden)

    Chunyun Zhang

    2016-01-01

    The outbreak of unexpected news events such as large human accidents or natural disasters brings about a new information access problem where traditional approaches fail. News of these events is typically sparse early on and redundant later. Hence, it is very important to get updates and provide individuals with timely and important information about these incidents as they develop, especially when applied in the wireless and mobile Internet of Things (IoT). In this paper, we define the problem of sequential update summarization extraction and present a new hierarchical update mining system which can broadcast useful, new, and timely sentence-length updates about a developing event. The new system proposes a novel method, which incorporates techniques from topic-level and sentence-level summarization. To evaluate the performance of the proposed system, we apply it to the sequential update summarization task of the temporal summarization (TS) track at the Text Retrieval Conference (TREC) 2013 to compute four measurements of the update mining system: the expected gain, expected latency gain, comprehensiveness, and latency comprehensiveness. Experimental results show that our proposed method has good performance.

  14. Category theoretic analysis of hierarchical protein materials and social networks

    CERN Document Server

    Spivak, David I; Buehler, Markus J

    2011-01-01

    Materials in biology span all the scales from Angstroms to meters and typically consist of complex hierarchical assemblies of simple building blocks. Here we review an application of category theory to describe structural and resulting functional properties of biological protein materials by developing so-called ologs. An olog is like a "concept web" or "semantic network" except that it follows a rigorous mathematical formulation based on category theory. This key difference ensures that an olog is unambiguous, highly adaptable to evolution and change, and suitable for sharing concepts with other ologs. We consider a simple example of an alpha-helical and an amyloid-like protein filament subjected to axial extension and develop an olog representation of their structural and resulting mechanical properties. We also construct a representation of a social network in which people send text-messages to their nearest neighbors and act as a team to perform a task. We show that the olog for the protein and the olog f...

  15. Weighted Clustering

    CERN Document Server

    Ackerman, Margareta; Branzei, Simina; Loker, David

    2011-01-01

    In this paper we investigate clustering in the weighted setting, in which every data point is assigned a real valued weight. We conduct a theoretical analysis on the influence of weighted data on standard clustering algorithms in each of the partitional and hierarchical settings, characterising the precise conditions under which such algorithms react to weights, and classifying clustering methods into three broad categories: weight-responsive, weight-considering, and weight-robust. Our analysis raises several interesting questions and can be directly mapped to the classical unweighted setting.

  16. Exploring the individual patterns of spiritual well-being in people newly diagnosed with advanced cancer: a cluster analysis.

    Science.gov (United States)

    Bai, Mei; Dixon, Jane; Williams, Anna-Leila; Jeon, Sangchoon; Lazenby, Mark; McCorkle, Ruth

    2016-11-01

    Research shows that spiritual well-being correlates positively with quality of life (QOL) for people with cancer, whereas contradictory findings are frequently reported with respect to the differentiated associations between dimensions of spiritual well-being, namely peace, meaning and faith, and QOL. This study aimed to examine individual patterns of spiritual well-being among patients newly diagnosed with advanced cancer. Cluster analysis was based on the twelve items of the 12-item Functional Assessment of Chronic Illness Therapy-Spiritual Well-Being Scale at Time 1. A combination of hierarchical and k-means (non-hierarchical) clustering methods was employed to jointly determine the number of clusters. Self-rated health, depressive symptoms, peace, meaning and faith, and overall QOL were compared at Time 1 and Time 2. Hierarchical and k-means clustering methods both suggested four clusters. Comparison of the four clusters supported statistically significant and clinically meaningful differences in QOL outcomes among clusters while revealing contrasting relations of faith with QOL. Cluster 1, Cluster 3, and Cluster 4 represented high, medium, and low levels of overall QOL, respectively, with correspondingly high, medium, and low levels of peace, meaning, and faith. Cluster 2 was distinguished from other clusters by its medium levels of overall QOL, peace, and meaning and low level of faith. This study provides empirical support for individual difference in response to a newly diagnosed cancer and brings into focus conceptual and methodological challenges associated with the measure of spiritual well-being, which may partly contribute to the attenuated relation between faith and QOL.

  17. Hierarchical multiple bit clusters and patterned media enabled by novel nanofabrication techniques -- High resolution electron beam lithography and block polymer self assembly

    Science.gov (United States)

    Xiao, Qijun

    This thesis discusses the full scope of a project exploring the physics of hierarchical clusters of interacting nanomagnets. These clusters may be relevant for novel applications such as multilevel data storage devices. The work can be grouped into three main activities: micromagnetic simulation, fabrication and characterization of proof-of-concept prototype devices, and efforts to scale down the structures by creating the hierarchical structures with the aid of diblock copolymer self assembly. Theoretical micromagnetic studies and simulations based on the Landau-Lifshitz-Gilbert (LLG) equation were conducted on nanoscale single domain magnetic entities. For the simulated nanomagnet clusters with perpendicular uniaxial anisotropy, the simulation showed the switching field distributions, the stability of the magnetostatic states with distinctive total cluster perpendicular moments, and the stepwise magnetic switching curves. For simulated nanomagnet clusters with in-plane shape anisotropy, the simulation showed the stepwise switching behaviors governed by thermal agitation and cluster configurations. Proof-of-concept cluster devices with three interacting Co nanomagnets were fabricated by e-beam lithography (EBL) and pulse-reverse electrochemical deposition (PRECD). EBL patterning on a suspended 100 nm SiN membrane showed improved lateral lithography resolution to 30 nm. The Co nanomagnets deposited using the PRECD method showed perpendicular anisotropy. The switching experiments with external applied fields were able to switch the Co nanomagnets through the four magnetostatic states with distinctive total perpendicular cluster magnetization, and proved the feasibility of multilevel data storage devices based on the cluster concept. Shrinking the structure size was attempted with the aid of diblock copolymers. Thick poly(styrene)-b-poly(methyl methacrylate) (PS-b-PMMA) diblock copolymer templates aligned with an external electric field were used to fabricate long Ni

  18. Hierarchical Linear Modeling for Analysis of Ecological Momentary Assessment Data in Physical Medicine and Rehabilitation Research.

    Science.gov (United States)

    Terhorst, Lauren; Beck, Kelly Battle; McKeon, Ashlee B; Graham, Kristin M; Ye, Feifei; Shiffman, Saul

    2017-08-01

    Ecological momentary assessment (EMA) methods collect real-time data in real-world environments, which allows physical medicine and rehabilitation researchers to examine objective outcome data and reduces bias from retrospective recall. The statistical analysis of EMA data is directly related to the research question and the temporal design of the study. Hierarchical linear modeling, which accounts for multiple observations from the same participant, is a particularly useful approach to analyzing EMA data. The objective of this paper was to introduce the process of conducting hierarchical linear modeling analyses with EMA data. This is accomplished using exemplars from recent physical medicine and rehabilitation literature.

  19. Mean-field analysis of phase transitions in the emergence of hierarchical society

    Science.gov (United States)

    Okubo, Tsuyoshi; Odagaki, Takashi

    2007-09-01

    Emergence of hierarchical society is analyzed by use of a simple agent-based model. We extend the mean-field model of Bonabeau [Physica A 217, 373 (1995)] to societies obeying complex diffusion rules where each individual selects a moving direction following their power rankings. We apply this mean-field analysis to the pacifist society model recently investigated by use of Monte Carlo simulation [Physica A 367, 435 (2006)]. We show analytically that the self-organization of hierarchies occurs in two steps as the individual density is increased and there are three phases: one egalitarian and two hierarchical states. We also highlight that the transition from the egalitarian phase to the first hierarchical phase is a continuous change in the order parameter and the second transition causes a discontinuous jump in the order parameter.

  20. "Sentido de Pertenencia": A Hierarchical Analysis Predicting Sense of Belonging among Latino College Students

    Science.gov (United States)

    Strayhorn, Terrell Lamont

    2008-01-01

    The present study estimated the influence of academic and social collegiate experiences on Latino students' sense of belonging, controlling for background differences, using hierarchical analysis techniques with a nested design. In addition, results were compared between Latino students and their White counterparts. Findings reveal that grades,…

  1. A Hierarchical Linear Model with Factor Analysis Structure at Level 2

    Science.gov (United States)

    Miyazaki, Yasuo; Frank, Kenneth A.

    2006-01-01

    In this article the authors develop a model that employs a factor analysis structure at Level 2 of a two-level hierarchical linear model (HLM). The model (HLM2F) imposes a structure on a deficient rank Level 2 covariance matrix [tau], and facilitates estimation of a relatively large [tau] matrix. Maximum likelihood estimators are derived via the…

  2. Identifying Peer Institutions Using Cluster Analysis

    Science.gov (United States)

    Boronico, Jess; Choksi, Shail S.

    2012-01-01

    The New York Institute of Technology's (NYIT) School of Management (SOM) wishes to develop a list of peer institutions for the purpose of benchmarking and monitoring/improving performance against other business schools. The procedure utilizes relevant criteria for the purpose of establishing this peer group by way of a cluster analysis. The…

  3. Exploring cognitive heterogeneity in first-episode psychosis: What cluster analysis can reveal.

    Science.gov (United States)

    Reser, Maree P; Allott, Kelly A; Killackey, Eóin; Farhall, John; Cotton, Susan M

    2015-10-30

    Variable outcomes in first-episode psychosis (FEP) are partly attributable to heterogeneity in cognitive functioning. To aid identification of those likely to have poorer or better outcomes, we examined whether purported cognitive profiles identified through use of cluster analysis in chronic schizophrenia were evident in FEP. We also aimed to assess whether there was a relationship between cognitive profile and factors independent of the solution, providing external validation that the cognitive profiles represented distinct subgroups. Ward's method hierarchical cluster analysis, verified by a k-means cluster solution, was performed using data obtained from a cognitive test battery administered to 128 participants aged 15-25 years. Four cognitive profiles were identified. A continuity element was evident; participants in cluster four were more cognitively impaired compared to participants in cluster three, who appeared more cognitively intact. Clusters one and two were distinguishable across measures of attention and working memory and visual recognition memory, most likely reflecting sample specific patterns of deficit. Participants in cluster four had significantly lower premorbid and current IQ and higher negative symptoms compared to participants in cluster three. The distinct levels and patterns of cognition found in chronic schizophrenia cohorts are also evident across diagnostic categories in FEP. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
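
    The abstract describes a Ward's-method hierarchical solution that is then checked against a k-means solution. A minimal pure-Python sketch of that two-step idea follows; the toy data, the function names, and the choice to start k-means from the Ward centroids are assumptions for illustration and do not reproduce the authors' procedure or data.

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def sqdist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ward_clusters(points, k):
    """Agglomerate with Ward's criterion until k clusters remain.

    At each step, merge the pair whose union gives the smallest increase in
    within-cluster sum of squares: n_a*n_b/(n_a+n_b) * ||c_a - c_b||^2.
    Returns a list of lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca = centroid([points[i] for i in clusters[a]])
                cb = centroid([points[i] for i in clusters[b]])
                na, nb = len(clusters[a]), len(clusters[b])
                cost = na * nb / (na + nb) * sqdist(ca, cb)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters

def kmeans(points, centers, iterations=20):
    """Lloyd's algorithm from the given starting centers; returns labels."""
    centers = list(centers)
    labels = []
    for _ in range(iterations):
        labels = [min(range(len(centers)), key=lambda c: sqdist(p, centers[c]))
                  for p in points]
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = centroid(members)
    return labels

# Hypothetical two-variable scores forming three well-separated groups.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.0),
          (10.0, 0.0), (10.2, 0.1), (9.9, 0.2)]

ward = ward_clusters(points, 3)
ward_labels = [None] * len(points)
for label, members in enumerate(ward):
    for i in members:
        ward_labels[i] = label

start = [centroid([points[i] for i in members]) for members in ward]
km_labels = kmeans(points, start)
agreement = sum(a == b for a, b in zip(ward_labels, km_labels)) / len(points)
```

    Seeding k-means with the hierarchical centroids keeps the two label sets directly comparable, so `agreement` can be read as the fraction of cases on which the two solutions coincide.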

  4. [Visual field progression in glaucoma: cluster analysis].

    Science.gov (United States)

    Bresson-Dumont, H; Hatton, J; Foucher, J; Fonteneau, M

    2012-11-01

    Visual field progression analysis is one of the key points in glaucoma monitoring, but distinction between true progression and random fluctuation is sometimes difficult. There are several different algorithms but no real consensus for detecting visual field progression. The trend analysis of global indices (MD, sLV) may miss localized deficits or be affected by media opacities. Conversely, point-by-point analysis makes progression difficult to differentiate from physiological variability, particularly when the sensitivity of a point is already low. The goal of our study was to analyse visual field progression with the EyeSuite™ Octopus Perimetry Clusters algorithm in patients with no significant changes in global indices or worsening of the analysis of pointwise linear regression. We analyzed the visual fields of 162 eyes (100 patients - 58 women, 42 men, average age 66.8 ± 10.91) with ocular hypertension or glaucoma. For inclusion, at least six reliable visual fields per eye were required, and the trend analysis (EyeSuite™ Perimetry) of visual field global indices (MD and SLV), could show no significant progression. The analysis of changes in cluster mode was then performed. In a second step, eyes with statistically significant worsening of at least one of their clusters were analyzed point-by-point with the Octopus Field Analysis (OFA). Fifty four eyes (33.33%) had a significant worsening in some clusters, while their global indices remained stable over time. In this group of patients, more advanced glaucoma was present than in stable group (MD 6.41 dB vs. 2.87); 64.82% (35/54) of those eyes in which the clusters progressed, however, had no statistically significant change in the trend analysis by pointwise linear regression. Most software algorithms for analyzing visual field progression are essentially trend analyses of global indices, or point-by-point linear regression. This study shows the potential role of analysis by clusters trend. However, for best

  5. A Performance-Prediction Model for PIC Applications on Clusters of Symmetric MultiProcessors: Validation with Hierarchical HPF+OpenMP Implementation

    Directory of Open Access Journals (Sweden)

    Sergio Briguglio

    2003-01-01

    A performance-prediction model is presented, which describes different hierarchical workload decomposition strategies for particle in cell (PIC) codes on Clusters of Symmetric MultiProcessors. The devised workload decomposition is hierarchically structured: a higher-level decomposition among the computational nodes, and a lower-level one among the processors of each computational node. Several decomposition strategies are evaluated by means of the prediction model, with respect to the memory occupancy, the parallelization efficiency and the required programming effort. Such strategies have been implemented by integrating the high-level languages High Performance Fortran (at the inter-node stage) and OpenMP (at the intra-node one). The details of these implementations are presented, and the experimental values of parallelization efficiency are compared with the predicted results.
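
    The two-level decomposition described above (first among nodes, then among the processors of each node) can be sketched in a few lines. The example below is plain Python rather than HPF or OpenMP, splits a particle index range only, and uses hypothetical function names; it illustrates the hierarchy, not the strategies evaluated in the paper.

```python
def split_evenly(n_items, n_parts):
    """Return n_parts half-open (start, stop) ranges covering range(n_items)."""
    base, extra = divmod(n_items, n_parts)
    ranges, start = [], 0
    for part in range(n_parts):
        size = base + (1 if part < extra else 0)  # first `extra` parts get one more
        ranges.append((start, start + size))
        start += size
    return ranges

def hierarchical_decomposition(n_particles, n_nodes, procs_per_node):
    """Higher level: particles to nodes. Lower level: each node's share to its processors."""
    plan = {}
    for node, (lo, hi) in enumerate(split_evenly(n_particles, n_nodes)):
        inner = split_evenly(hi - lo, procs_per_node)
        plan[node] = [(lo + a, lo + b) for a, b in inner]
    return plan

# Hypothetical configuration: 10 particles, 2 nodes, 3 processors per node.
plan = hierarchical_decomposition(10, 2, 3)
```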

  6. HIERARCHICAL ADAPTIVE ROOD PATTERN SEARCH FOR MOTION ESTIMATION AT VIDEO SEQUENCE ANALYSIS

    Directory of Open Access Journals (Sweden)

    V. T. Nguyen

    2016-05-01

    Subject of Research. The paper deals with motion estimation algorithms for the analysis of video sequences in the compression standards MPEG-4 Visual and H.264. A new algorithm is proposed based on an analysis of the advantages and disadvantages of existing algorithms. Method. The algorithm is called hierarchical adaptive rood pattern search (Hierarchical ARPS, HARPS). It combines the classic adaptive rood pattern search (ARPS) with hierarchical search (MP, also called mean pyramid). All motion estimation algorithms were implemented in MATLAB and tested on several video sequences. Main Results. The criteria for evaluating the algorithms were speed, peak signal-to-noise ratio, mean square error and mean absolute deviation. The proposed method showed much better performance at a comparable error and deviation. The peak signal-to-noise ratio on different video sequences is sometimes better and sometimes worse than that of known algorithms, so it requires further investigation. Practical Relevance. Using this algorithm in MPEG-4 and H.264 codecs instead of the standard one can significantly reduce compression time. This makes it suitable for telecommunication systems for multimedia data storage, transmission and processing.
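
    Three of the evaluation criteria named in this abstract are simple per-pixel statistics. The sketch below shows one conventional way to compute them for 8-bit frames; it is written in Python rather than MATLAB, reads "mean absolute deviation" as the mean absolute pixel difference (an assumption), and uses made-up 2x2 frames.

```python
import math

def _pairs(frame_a, frame_b):
    """Yield corresponding pixel pairs from two equally sized 2-D frames."""
    for row_a, row_b in zip(frame_a, frame_b):
        for a, b in zip(row_a, row_b):
            yield a, b

def mse(frame_a, frame_b):
    """Mean square error between two frames."""
    diffs = [(a - b) ** 2 for a, b in _pairs(frame_a, frame_b)]
    return sum(diffs) / len(diffs)

def mad(frame_a, frame_b):
    """Mean absolute difference between two frames."""
    diffs = [abs(a - b) for a, b in _pairs(frame_a, frame_b)]
    return sum(diffs) / len(diffs)

def psnr(frame_a, frame_b, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for identical frames."""
    err = mse(frame_a, frame_b)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)

# Hypothetical reference frame and motion-compensated prediction.
reference = [[10, 20], [30, 40]]
predicted = [[12, 20], [30, 36]]

frame_mse = mse(reference, predicted)    # (4 + 0 + 0 + 16) / 4
frame_mad = mad(reference, predicted)    # (2 + 0 + 0 + 4) / 4
frame_psnr = psnr(reference, predicted)
```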

  7. Changing cluster composition in cluster randomised controlled trials: design and analysis considerations.

    Science.gov (United States)

    Corrigan, Neil; Bankart, Michael J G; Gray, Laura J; Smith, Karen L

    2014-05-24

    There are many methodological challenges in the conduct and analysis of cluster randomised controlled trials, but one that has received little attention is that of post-randomisation changes to cluster composition. To illustrate this, we focus on the issue of cluster merging, considering the impact on the design, analysis and interpretation of trial outcomes. We explored the effects of merging clusters on study power using standard methods of power calculation. We assessed the potential impacts on study findings of both homogeneous cluster merges (involving clusters randomised to the same arm of a trial) and heterogeneous merges (involving clusters randomised to different arms of a trial) by simulation. To determine the impact on bias and precision of treatment effect estimates, we applied standard methods of analysis to different populations under analysis. Cluster merging produced a systematic reduction in study power. This effect depended on the number of merges and was most pronounced when variability in cluster size was at its greatest. Simulations demonstrate that the impact on analysis was minimal when cluster merges were homogeneous, with impact on study power being balanced by a change in observed intracluster correlation coefficient (ICC). We found a decrease in study power when cluster merges were heterogeneous, and the estimate of treatment effect was attenuated. Examples of cluster merges found in previously published reports of cluster randomised trials were typically homogeneous rather than heterogeneous. Simulations demonstrated that trial findings in such cases would be unbiased. However, simulations also showed that any heterogeneous cluster merges would introduce bias that would be hard to quantify, as well as having negative impacts on the precision of estimates obtained. Further methodological development is warranted to better determine how to analyse such trials appropriately. Interim recommendations include avoidance of cluster merges where
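
    To see why merging clusters can reduce power, one can compare the design effect before and after a merge. The sketch below assumes the widely used approximation DE = 1 + ((cv^2 + 1) * mean_size - 1) * ICC, which accounts for unequal cluster sizes. The abstract refers only to "standard methods of power calculation", so this formula, the ICC of 0.05, and the cluster sizes are illustrative assumptions.

```python
import math

def design_effect(cluster_sizes, icc):
    """Design effect allowing for variable cluster size.

    Assumed formula: 1 + ((cv**2 + 1) * mean_size - 1) * icc, where cv is
    the coefficient of variation of cluster size.
    """
    k = len(cluster_sizes)
    mean_size = sum(cluster_sizes) / k
    variance = sum((m - mean_size) ** 2 for m in cluster_sizes) / k
    cv = math.sqrt(variance) / mean_size
    return 1 + ((cv ** 2 + 1) * mean_size - 1) * icc

def effective_sample_size(cluster_sizes, icc):
    """Number of independent observations the clustered sample is worth."""
    return sum(cluster_sizes) / design_effect(cluster_sizes, icc)

# Hypothetical trial arm: ten clusters of 20, then two of them merge into one of 40.
before = [20] * 10
after = [20] * 8 + [40]

de_before = design_effect(before, icc=0.05)
de_after = design_effect(after, icc=0.05)
ess_before = effective_sample_size(before, icc=0.05)
ess_after = effective_sample_size(after, icc=0.05)
```

    The total number of participants is unchanged by the merge, yet the effective sample size falls because there are fewer, more unequal clusters, which is consistent with the reduction in power reported above.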

  8. Geographic atrophy phenotype identification by cluster analysis.

    Science.gov (United States)

    Monés, Jordi; Biarnés, Marc

    2017-07-20

    To identify ocular phenotypes in patients with geographic atrophy secondary to age-related macular degeneration (GA) using a data-driven cluster analysis. This was a retrospective analysis of data from a prospective, natural history study of patients with GA who were followed for ≥6 months. Cluster analysis was used to identify subgroups within the population based on the presence of several phenotypic features: soft drusen, reticular pseudodrusen (RPD), primary foveal atrophy, increased fundus autofluorescence (FAF), greyish FAF appearance and subfoveal choroidal thickness (SFCT). A comparison of features between the subgroups was conducted, and a qualitative description of the new phenotypes was proposed. The atrophy growth rate between phenotypes was then compared. Data were analysed from 77 eyes of 77 patients with GA. Cluster analysis identified three groups: phenotype 1 was characterised by high soft drusen load, foveal atrophy and slow growth; phenotype 3 showed high RPD load, extrafoveal and greyish FAF appearance and thin SFCT; the characteristics of phenotype 2 were midway between phenotypes 1 and 3. Phenotypes differed in all measured features (p≤0.013), with decreases in the presence of soft drusen, foveal atrophy and SFCT seen from phenotypes 1 to 3 and corresponding increases in high RPD load, high FAF and greyish FAF appearance. Atrophy growth rate differed between phenotypes 1, 2 and 3 (0.63, 1.91 and 1.73 mm(2)/year, respectively, p=0.0005). Cluster analysis identified three distinct phenotypes in GA. One of them showed a particularly slow growth pattern. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  9. MMPI-2: Cluster Analysis of Personality Profiles in Perinatal Depression—Preliminary Evidence

    Directory of Open Access Journals (Sweden)

    Valentina Meuti

    2014-01-01

    Background. To assess personality characteristics of women who develop perinatal depression. Methods. The study started with a screening of a sample of 453 women in their third trimester of pregnancy, to whom a survey data form, the Edinburgh Postnatal Depression Scale (EPDS) and the Minnesota Multiphasic Personality Inventory 2 (MMPI-2) were administered. A clinical group of subjects with perinatal depression (PND, 55 subjects) was selected; clinical and validity scales of the MMPI-2 were used as predictors in a hierarchical cluster analysis. Results. The analysis identified three clusters of personality profile: two “clinical” clusters (1 and 3) and an “apparently common” one (cluster 2). The first cluster (39.5%) collects structures of personality with prevalent obsessive or dependent functioning tending to develop a “psychasthenic” depression; the third cluster (13.95%) includes women with prevalent borderline functioning tending to develop “dysphoric” depression; the second cluster (46.5%) shows a normal profile with a “defensive” attitude, probably due to the presence of defense mechanisms or to the fear of stigma. Conclusion. Characteristics of personality have a key role in clinical manifestations of perinatal depression; it is important to detect them to identify mothers at risk and to plan targeted therapeutic interventions.

  10. MMPI-2: cluster analysis of personality profiles in perinatal depression—preliminary evidence.

    Science.gov (United States)

    Meuti, Valentina; Marini, Isabella; Grillo, Alessandra; Lauriola, Marco; Leone, Carlo; Giacchetti, Nicoletta; Aceti, Franca

    2014-01-01

    To assess personality characteristics of women who develop perinatal depression. The study started with a screening of a sample of 453 women in their third trimester of pregnancy, to which was administered a survey data form, the Edinburgh Postnatal Depression Scale (EPDS) and the Minnesota Multiphasic Personality Inventory 2 (MMPI-2). A clinical group of subjects with perinatal depression (PND, 55 subjects) was selected; clinical and validity scales of MMPI-2 were used as predictors in hierarchical cluster analysis carried out. The analysis identified three clusters of personality profile: two "clinical" clusters (1 and 3) and an "apparently common" one (cluster 2). The first cluster (39.5%) collects structures of personality with prevalent obsessive or dependent functioning tending to develop a "psychasthenic" depression; the third cluster (13.95%) includes women with prevalent borderline functioning tending to develop "dysphoric" depression; the second cluster (46.5%) shows a normal profile with a "defensive" attitude, probably due to the presence of defense mechanisms or to the fear of stigma. Characteristics of personality have a key role in clinical manifestations of perinatal depression; it is important to detect them to identify mothers at risk and to plan targeted therapeutic interventions.

  11. MANNER OF STOCKS SORTING USING CLUSTER ANALYSIS METHODS

    Directory of Open Access Journals (Sweden)

    Jana Halčinová

    2014-06-01

    The aim of the present article is to show the possibility of using the methods of cluster analysis in the classification of stocks of finished products. Cluster analysis creates groups (clusters) of finished products according to similarity in demand, i.e. customer requirements for each product. The manner of sorting stocks of finished products by clusters is described using a practical example. The resulting clusters are incorporated into the draft layout of the distribution warehouse.

  12. AMOEBA clustering revisited. [cluster analysis, classification, and image display program]

    Science.gov (United States)

    Bryant, Jack

    1990-01-01

    A description of the clustering, classification, and image display program AMOEBA is presented. Using a difficult high resolution aircraft-acquired MSS image, the steps the program takes in forming clusters are traced. A number of new features are described here for the first time. Usage of the program is discussed. The theoretical foundation (the underlying mathematical model) is briefly presented. The program can handle images of any size and dimensionality.

  13. Mapping Cigarettes Similarities using Cluster Analysis Methods

    Directory of Open Access Journals (Sweden)

    Lorentz Jäntschi

    2007-09-01

    The aim of the research was to investigate the relationships and/or occurrences in and between chemical composition information (tar, nicotine, carbon monoxide), market information (brand, manufacturer, price), and public health information (class, health warning), as well as the clustering of a sample of cigarette data. Thirty cigarette brands were analyzed. Six categorical (cigarette brand, manufacturer, health warnings, class) and four continuous (tar, nicotine, carbon monoxide concentrations and package price) variables were collected for investigation of chemical composition, market information and public health information. Multiple linear regression and two clusterization techniques were applied. The study produced several notable observations. The carbon monoxide concentration proved to be linked with the tar and nicotine concentrations. The applied clusterization methods identified groups of cigarette brands that showed similar characteristics. The tar and carbon monoxide concentrations were the main criteria used in clusterization. An analysis of a larger sample could reveal more relevant and useful information regarding the similarities between cigarette brands.

  14. Functional annotation of hierarchical modularity.

    Directory of Open Access Journals (Sweden)

    Kanchana Padmanabhan

    In biological networks of molecular interactions in a cell, network motifs that are biologically relevant are also functionally coherent, or form functional modules. These functionally coherent modules combine in a hierarchical manner into larger, less cohesive subsystems, thus revealing one of the essential design principles of system-level cellular organization and function: hierarchical modularity. Arguably, hierarchical modularity has not been explicitly taken into consideration by most, if not all, functional annotation systems. As a result, the existing methods would often fail to assign a statistically significant functional coherence score to biologically relevant molecular machines. We developed a methodology for hierarchical functional annotation. Given the hierarchical taxonomy of functional concepts (e.g., Gene Ontology) and the association of individual genes or proteins with these concepts (e.g., GO terms), our method will assign a Hierarchical Modularity Score (HMS) to each node in the hierarchy of functional modules; the HMS score and its p-value measure functional coherence of each module in the hierarchy. While existing methods annotate each module with a set of "enriched" functional terms in a bag of genes, our complementary method provides the hierarchical functional annotation of the modules and their hierarchically organized components. A hierarchical organization of functional modules often comes as a by-product of cluster analysis of gene expression data or protein interaction data. Otherwise, our method will automatically build such a hierarchy by directly incorporating the functional taxonomy information into the hierarchy search process and by allowing multi-functional genes to be part of more than one component in the hierarchy. In addition, its underlying HMS scoring metric ensures that functional specificity of the terms across different levels of the hierarchical taxonomy is properly treated. We have evaluated our

  15. Fractal analysis of the hierarchic structure of fossil coal surface

    Energy Technology Data Exchange (ETDEWEB)

    Alekseev, A.D.; Vasilenko, T.A.; Kirillov, A.K. [National Academy of Sciences, Donetsk (Ukraine)

    2008-05-15

    Fractal analysis is described as a method of studying images of the surface of fossil coal, one of the natural sorbents, with the aim of determining its structural surface heterogeneity. The deformation effect, manifested as a reduction in the dimensions of heterogeneity boundaries, is considered. It is shown that the theory of nonequilibrium dynamic systems makes it possible to assess the level of formation of the heterogeneities in a sorbent's composition by means of the Hurst factor.

  16. Study of cluster analysis used in explosives classification with laser-induced breakdown spectroscopy

    Science.gov (United States)

    Wang, Q. Q.; He, L. A.; Zhao, Y.; Peng, Z.; Liu, L.

    2016-06-01

    Supervised learning methods (such as partial least squares regression-discriminant analysis, SIMCA, etc) are widely used in explosives recognition. The correct classification rate may be lowered if a sample or substrate is not included in the training dataset. Unsupervised learning methods (such as hierarchical clustering analysis, K-means, etc) have the potential to solve this problem. In this paper we analyzed results of using as input variables the intensities of seven lines and then five intensity ratios of the seven lines. It was demonstrated that unsupervised learning methods had the ability to achieve a better classification result.

  17. Suicide in the oldest old: an observational study and cluster analysis.

    Science.gov (United States)

    Sinyor, Mark; Tan, Lynnette Pei Lin; Schaffer, Ayal; Gallagher, Damien; Shulman, Kenneth

    2016-01-01

    The older population are at a high risk for suicide. This study sought to learn more about the characteristics of suicide in the oldest-old and to use a cluster analysis to determine if oldest-old suicide victims assort into clinically meaningful subgroups. Data were collected from a coroner's chart review of suicide victims in Toronto from 1998 to 2011. We compared two age groups (65-79 year olds, n = 335, and 80+ year olds, n = 191) and then conducted a hierarchical agglomerative cluster analysis using Ward's method to identify distinct clusters in the 80+ group. The younger and older age groups differed according to marital status, living circumstances and pattern of stressors. The cluster analysis identified three distinct clusters in the 80+ group. Cluster 1 was the largest (n = 124) and included people who were either married or widowed who had significantly more depression and somewhat more medical health stressors. In contrast, cluster 2 (n = 50) comprised people who were almost all single and living alone with significantly less identified depression and slightly fewer medical health stressors. All members of cluster 3 (n = 17) lived in a retirement residence or nursing home, and this group had the highest rates of depression, dementia, other mental illness and past suicide attempts. This is the first study to use the cluster analysis technique to identify meaningful subgroups among suicide victims in the oldest-old. The results reveal different patterns of suicide in the older population that may be relevant for clinical care. Copyright © 2015 John Wiley & Sons, Ltd.
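
    The hierarchical agglomerative clustering with Ward's method mentioned above can be sketched in a few lines. This is a minimal illustration, not the study's analysis: the function name `ward_clusters`, the stopping rule (a fixed number of clusters `k`) and the toy coordinates are all assumptions made for the example.

```python
from itertools import combinations


def ward_clusters(points, k):
    """Agglomerate points bottom-up with Ward's criterion until k clusters remain."""
    # Each cluster is stored as (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares caused by merging a and b.
        (ma, ca), (mb, cb) = a, b
        sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(ma) * len(mb) / (len(ma) + len(mb)) * sq_dist

    while len(clusters) > k:
        # Pick the pair of clusters whose merge is cheapest under Ward's criterion.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]))
        (mi, ci), (mj, cj) = clusters[i], clusters[j]
        n = len(mi) + len(mj)
        centroid = [(len(mi) * x + len(mj) * y) / n for x, y in zip(ci, cj)]
        merged = (mi + mj, centroid)
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return [sorted(members) for members, _ in clusters]


# Hypothetical toy data: five observations described by two numeric features.
toy = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0)]
groups = ward_clusters(toy, k=3)
```

    Ward's criterion merges the two clusters whose union adds the least within-cluster variance, which tends to produce compact groups of comparable spread. Categorical variables such as marital status would need to be encoded numerically before a sketch like this could be applied.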

  18. Implementation of Hierarchical Task Analysis for User Interface Design in Drawing Application for Early Childhood Education

    National Research Council Canada - National Science Library

    Mira Kania Sabariah; Veronikha Effendy; Muhamad Fachmi Ichsan

    2016-01-01

    ... of learning and characteristics of early childhood (4-6 years). Based on the results, the Hierarchical Task Analysis method generated a list of tasks that must be done in designing a user interface that represents the user experience in learning to draw. A subsequent Heuristic Evaluation showed that the usability of the model reached a very good level of understanding, and that the model can be enhanced further to produce a better one.

  19. Design And Analysis Of Low Power Hierarchical Decoder

    Directory of Open Access Journals (Sweden)

    Abhinav Singh

    2012-11-01

    Full Text Available Due to the high degree of miniaturization possible today in semiconductor technology, the size and complexity of designs that may be implemented in hardware have increased dramatically. Process scaling has been used in the miniaturization process to reduce the area needed for logic functions in an effort to lower product costs. Precharged Complementary Metal Oxide Semiconductor (CMOS) domino logic techniques may be applied to functional blocks to reduce power. Domino logic forms an attractive design style for high-performance designs since its low switching threshold and reduced transistor count lead to fast and area-efficient circuit implementations. In this paper all the components required to form a 5-to-32 bit decoder using domino logic are designed, and different analyses are performed at 180 nm and 350 nm technologies. The decoder implemented through domino logic is compared to a static decoder.

  20. Equivalent damage validation by variable cluster analysis

    Science.gov (United States)

    Drago, Carlo; Ferlito, Rachele; Zucconi, Maria

    2016-06-01

    The main aim of this work is to perform a clustering analysis on the damage surveyed in the old center of L'Aquila after the earthquake of April 6, 2009, and to validate an Indicator of Equivalent Damage (ED) that summarizes the information reported on the AeDES card regarding the level of damage and its extent over the surface of the buildings. In particular, we used a sample of 13442 masonry buildings located in an area characterized by a Macroseismic Intensity equal to 8 [1]. The aim is to ensure coherence between the clusters, and their hierarchy, identified in the detected damage data and in the computed ED data.

  1. Data Clustering Analysis Based on Wavelet Feature Extraction

    Institute of Scientific and Technical Information of China (English)

    QIAN Yuntao; TANG Yuanyan

    2003-01-01

    A novel wavelet-based data clustering method is presented in this paper, which includes wavelet feature extraction and a cluster growing algorithm. The wavelet transform can provide rich and diversified information for representing the global and local inherent structures of a dataset; therefore, it is a very powerful tool for clustering feature extraction. As an unsupervised classification, the target of clustering analysis is dependent on the specific clustering criteria. Several criteria that should be considered for a general-purpose clustering algorithm are proposed, and the cluster growing algorithm is constructed to connect clustering criteria with wavelet features. Compared with other popular clustering methods, our clustering approach provides multi-resolution clustering results, needs few prior parameters, correctly deals with irregularly shaped clusters, and is insensitive to noise and outliers. As this wavelet-based clustering method is aimed at solving the two-dimensional data clustering problem, for high-dimensional datasets the self-organizing map and U-matrix method are applied to transform them into two-dimensional Euclidean space, so that high-dimensional data clustering analysis can also be carried out. Results on some simulated data and standard test data are reported to illustrate the power of our method.

  2. Statistical mechanical analysis of a hierarchical random code ensemble in signal processing

    Energy Technology Data Exchange (ETDEWEB)

    Obuchi, Tomoyuki [Department of Earth and Space Science, Faculty of Science, Osaka University, Toyonaka 560-0043 (Japan)]; Takahashi, Kazutaka [Department of Physics, Tokyo Institute of Technology, Tokyo 152-8551 (Japan)]; Takeda, Koujin, E-mail: takeda@sp.dis.titech.ac.jp [Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology, Yokohama 226-8502 (Japan)]

    2011-02-25

    We study a random code ensemble with a hierarchical structure, which is closely related to the generalized random energy model with discrete energy values. Based on this correspondence, we analyze the hierarchical random code ensemble by using the replica method in two situations: lossy data compression and channel coding. In both situations, the exponents of the large deviation analysis that characterize the performance of the ensemble (the distortion rate of lossy data compression and the error exponent of channel coding in Gallager's formalism) are accessible through a generating function of the generalized random energy model. We argue that the transitions of those exponents observed in the preceding work can be interpreted as phase transitions with respect to the replica number. We also show that replica symmetry breaking plays an essential role in these transitions.

  3. Monitoring Post Disturbance Forest Regeneration with Hierarchical Object-Based Image Analysis

    Directory of Open Access Journals (Sweden)

    L. Monika Moskal

    2013-10-01

    Full Text Available The main goal of this exploratory project was to quantify seedling density in post-fire regeneration sites, with the following objectives: to evaluate the application of second order image texture (SOIT) in image segmentation, and to apply the object-based image analysis (OBIA) approach to develop a hierarchical classification. Using image texture, we successfully developed a methodology to classify hyperspatial (high-spatial) imagery to the fine detail level of tree crowns, shadows and understory, while still allowing discrimination between density classes and mature forest versus burn classes. At the most detailed hierarchical level, Level I, classification accuracies reached 78.8%; a Level II stand density classification produced accuracies of 89.1%, and the same accuracy was achieved by the coarse general classification at Level III. Our interpretation of these results suggests hyperspatial imagery can be applied to post-fire forest density and regeneration mapping.

  4. Accounting for uncertainty in ecological analysis: the strengths and limitations of hierarchical statistical modeling.

    Science.gov (United States)

    Cressie, Noel; Calder, Catherine A; Clark, James S; Ver Hoef, Jay M; Wikle, Christopher K

    2009-04-01

    Analyses of ecological data should account for the uncertainty in the process(es) that generated the data. However, accounting for these uncertainties is a difficult task, since ecology is known for its complexity. Measurement and/or process errors are often the only sources of uncertainty modeled when addressing complex ecological problems, yet analyses should also account for uncertainty in sampling design, in model specification, in parameters governing the specified model, and in initial and boundary conditions. Only then can we be confident in the scientific inferences and forecasts made from an analysis. Probability and statistics provide a framework that accounts for multiple sources of uncertainty. Given the complexities of ecological studies, the hierarchical statistical model is an invaluable tool. This approach is not new in ecology, and there are many examples (both Bayesian and non-Bayesian) in the literature illustrating the benefits of this approach. In this article, we provide a baseline for concepts, notation, and methods, from which discussion on hierarchical statistical modeling in ecology can proceed. We have also planted some seeds for discussion and tried to show where the practical difficulties lie. Our thesis is that hierarchical statistical modeling is a powerful way of approaching ecological analysis in the presence of inevitable but quantifiable uncertainties, even if practical issues sometimes require pragmatic compromises.

  5. A PAC-Bayesian Analysis of Graph Clustering and Pairwise Clustering

    CERN Document Server

    Seldin, Yevgeny

    2010-01-01

    We formulate weighted graph clustering as a prediction problem: given a subset of edge weights we analyze the ability of graph clustering to predict the remaining edge weights. This formulation enables practical and theoretical comparison of different approaches to graph clustering as well as comparison of graph clustering with other possible ways to model the graph. We adapt the PAC-Bayesian analysis of co-clustering (Seldin and Tishby, 2008; Seldin, 2009) to derive a PAC-Bayesian generalization bound for graph clustering. The bound shows that graph clustering should optimize a trade-off between empirical data fit and the mutual information that clusters preserve on the graph nodes. A similar trade-off derived from information-theoretic considerations was already shown to produce state-of-the-art results in practice (Slonim et al., 2005; Yom-Tov and Slonim, 2009). This paper supports the empirical evidence by providing a better theoretical foundation, suggesting formal generalization guarantees, and offering...

  6. Improving Hierarchical Models Using Historical Data with Applications in High-Throughput Genomics Data Analysis.

    Science.gov (United States)

    Li, Ben; Li, Yunxiao; Qin, Zhaohui S

    2017-06-01

    Modern high-throughput biotechnologies such as microarray and next generation sequencing produce a massive amount of information for each sample assayed. However, in a typical high-throughput experiment, only a limited amount of data is observed for each individual feature, hence the classical 'large p, small n' problem. The Bayesian hierarchical model, capable of borrowing strength across features within the same dataset, has been recognized as an effective tool in analyzing such data. However, the shrinkage effect, the most prominent feature of hierarchical models, can lead to undesirable over-correction for some features. In this work, we discuss possible causes of the over-correction problem and propose several alternative solutions. Our strategy is rooted in the fact that in the Big Data era, a large amount of historical data is available and should be taken advantage of. Our strategy presents a new framework to enhance the Bayesian hierarchical model. Through simulation and real data analysis, we demonstrated superior performance of the proposed strategy. Our new strategy also enables borrowing information across different platforms, which could be extremely useful with the emergence of new technologies and the accumulation of data from different platforms in the Big Data era. Our method has been implemented in the R package "adaptiveHM", which is freely available from https://github.com/benliemory/adaptiveHM.
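
    The shrinkage effect, and the over-correction it can cause, is easy to see in a textbook normal-normal model with known variances. The sketch below is only that textbook model, not the adaptiveHM method; the function name `shrink` and the numbers are assumptions chosen for illustration.

```python
def shrink(feature_means, sampling_var, prior_var):
    """Posterior means under a normal-normal hierarchical model with known variances."""
    grand_mean = sum(feature_means) / len(feature_means)
    # Weight placed on a feature's own data; the rest goes to the grand mean.
    weight = prior_var / (prior_var + sampling_var)
    return [weight * y + (1 - weight) * grand_mean for y in feature_means]


# Hypothetical data: three null features and one feature with a large true effect.
observed = [0.0, 0.0, 0.0, 8.0]
posterior = shrink(observed, sampling_var=1.0, prior_var=1.0)
# With equal variances every estimate moves halfway toward the grand mean of 2.0,
# so the outlying feature drops from 8.0 to 5.0.
```

    Borrowing strength stabilizes the noisy estimates, but the one feature that genuinely differs is pulled toward the others by the same amount, which is the kind of over-correction the abstract describes.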

  7. Countries population determination to test rice crisis indicator at national level using k-means cluster analysis

    Science.gov (United States)

    Hidayat, Y.; Purwandari, T.; Sukono; Ariska, Y. D.

    2017-01-01

    This study aimed to identify the population of countries that are similar to Indonesia on three characteristics: democratic atmosphere, rice consumption and purchasing power for rice. The result is intended as reference material for research testing the strength and predictability of the rice crisis indicator Unprecedented Restlessness (UR). Similarity to Indonesia was assessed with a multivariate method, non-hierarchical k-means cluster analysis, using 38 countries as the data population. The analysis was repeated until it yielded a number of clusters that discriminated on the three characteristics and showed high similarity within clusters. Six clusters proved sufficient to discriminate between the characteristics of the clusters formed. To meet the purpose of the study, however, only one cluster was retained, according to three criteria: the cluster contains Indonesia, it includes countries that did and did not suffer a rice crisis in 2008, and it has the most members among such clusters. These criteria are met by cluster 2, which consists of 22 countries, namely Indonesia, Brazil, Costa Rica, Djibouti, Dominican Republic, Ecuador, Fiji, Guinea-Bissau, Haiti, India, Jamaica, Japan, South Korea, Madagascar, Malaysia, Mali, Nicaragua, Panama, Peru, Senegal, Sierra Leone and Suriname.
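
    For readers unfamiliar with non-hierarchical k-means, the following is a minimal sketch of Lloyd's algorithm on three-feature observations. The function name `kmeans`, the deterministic initialization from the first k points and the toy values are assumptions for illustration; they are not the study's data or settings.

```python
def kmeans(points, k, iterations=100):
    """Lloyd's algorithm; centroids start at the first k points (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
    return labels, centroids


# Hypothetical standardized scores on three characteristics for four countries.
scores = [(0, 0, 0), (10, 10, 10), (0, 1, 0), (10, 11, 10)]
labels, centroids = kmeans(scores, k=2)
```

    Running the function for several values of k and comparing how well the resulting clusters separate on each characteristic mirrors the repeated analysis described in the abstract. In practice the features would be standardized first so that no single characteristic dominates the distance.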

  8. Classification of frailty using the Kihon checklist: A cluster analysis of older adults in urban areas.

    Science.gov (United States)

    Kera, Takeshi; Kawai, Hisashi; Yoshida, Hideyo; Hirano, Hirohiko; Kojima, Motonaga; Fujiwara, Yoshinori; Ihara, Kazushige; Obuchi, Shuichi

    2017-01-01

    Frailty is an important predictor of the need for long-term care and hospitalization. Our aim was to categorize frailty in community-dwelling older adults. The present study was carried out in 2011-2013, and consisted of 1380 individuals over 65 years of age. Participants completed the Kihon checklist, which is widely used to assess frailty in Japan, and their physical, cognitive and social function was evaluated. Non-hierarchical cluster analysis was used to statistically categorize frailty. The optimum number of clusters was determined as the point at which the external reference values (instrumental activity of daily living score, grip power, 10-m walk time, body mass index, portable fall risk index, occlusal force and Mini-Mental State Examination score) differed. According to the Kihon checklist, 369 (26.7%) of the 1380 study participants were considered frail. When the cluster number was increased from two to six, the scores in each subdomain of the Kihon checklist significantly differed. The estimated minimum number of clusters was five, and each of the five cluster groups had distinct characteristics. The numbers of participants in cluster groups 1-5 were 105, 78, 62, 71 and 53, respectively. We identified five types of frailty in community-dwelling older adults in Japan: "experience of falling," "pre-frailty," "oral frailty," "housebound" and "severe frailty." Geriatr Gerontol Int 2017; 17: 69-77. © 2016 Japan Geriatrics Society.

  9. Cluster analysis of infrared spectra of rabbit cortical bone samples during maturation and growth.

    Science.gov (United States)

    Kobrina, Yevgeniya; Turunen, Mikael J; Saarakkala, Simo; Jurvelin, Jukka S; Hauta-Kasari, Markku; Isaksson, Hanna

    2010-12-01

    Bone consists of an organic and an inorganic matrix. During development, bone undergoes changes in its composition and structure. In this study we apply three different cluster analysis algorithms [K-means (KM), fuzzy C-means (FCM) and hierarchical clustering (HCA)], and discriminant analysis (DA) on infrared spectroscopic data from developing cortical bone with the aim of comparing their ability to correctly classify the samples into different age groups. Cortical bone samples from the mid-diaphysis of the humerus of New Zealand white rabbits from three different maturation stages (newborn (NB), immature (11 days-1 month old), mature (3-6 months old)) were used. Three clusters were obtained by KM, FCM and HCA methods on different spectral regions (amide I, phosphate and carbonate). The newborn samples were well separated (71-100% correct classifications) from the other age groups by all bone components. The mature samples (3-6 months old) were well separated (100%) from those of other age groups by the carbonate spectral region, while by the phosphate and amide I regions some samples were assigned to another group (43-71% correct classifications). The greatest variance in the results for all algorithms was observed in the amide I region. In general, FCM clustering performed better than the other methods, and the overall error was lower. The discriminant analysis results showed that by combining the clustering results from all three spectral regions, the ability to predict the correct age group for all samples increased (from 29-86% to 77-91%). This study is the first to compare several clustering methods on infrared spectra of bone. Fuzzy C-means clustering performed best, and its ability to study the degree of memberships of samples to each cluster might be beneficial in future studies of medical diagnostics.
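
    The "degree of membership" that distinguishes fuzzy C-means from K-means follows from the standard FCM membership formula. The sketch below computes it for a single point given fixed centroids; the function name `fcm_memberships`, the fuzzifier value m = 2 and the toy numbers are assumptions, and the full algorithm would alternate this step with a centroid update.

```python
import math


def fcm_memberships(point, centroids, m=2.0):
    """Degree of membership of one point in each cluster (fuzzifier m > 1)."""
    dists = [math.dist(point, c) for c in centroids]
    if 0.0 in dists:
        # The point coincides with a centroid: full membership there.
        hits = dists.count(0.0)
        return [1.0 / hits if d == 0.0 else 0.0 for d in dists]
    power = 2.0 / (m - 1.0)
    # Standard FCM rule: u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)).
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


# Hypothetical one-dimensional example: a sample at 1.0, centroids at 0.0 and 4.0.
u = fcm_memberships((1.0,), [(0.0,), (4.0,)])
# The memberships sum to 1 and favor the nearer centroid.
```

    Unlike a hard K-means label, the membership vector shows how ambiguous a sample is, which is the property the abstract suggests could be useful in diagnostics.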

  10. Narcolepsy with and without cataplexy, idiopathic hypersomnia with and without long sleep time: a cluster analysis.

    Science.gov (United States)

    Šonka, Karel; Šusta, Marek; Billiard, Michel

    2015-02-01

    The successive editions of the International Classification of Sleep Disorders (ICSD) reflect the evolution of the concepts of various sleep disorders. This is particularly the case for central disorders of hypersomnolence, with continuous changes in terminology and divisions of narcolepsy, idiopathic hypersomnia, and recurrent hypersomnia. According to the ICSD 2nd Edition (ICSD-2), narcolepsy with cataplexy (NwithC), narcolepsy without cataplexy (Nw/oC), idiopathic hypersomnia with long sleep time (IHwithLST), and idiopathic hypersomnia without long sleep time (IHw/oLST) are four, well-defined hypersomnias of central origin. However, in the absence of biological markers, doubts have been raised as to the relevance of a division of idiopathic hypersomnia into two forms, and it is not yet clear whether Nw/oC and IHw/oLST are two distinct entities. With this in mind, it was decided to empirically review the ICSD-2 classification by using a hierarchical cluster analysis to see whether this division has some relevance, even though the terms "with long sleep time" and "without long sleep time" are inappropriate. The cluster analysis differentiated three main clusters: Cluster 1, "combined monosymptomatic hypersomnia/narcolepsy type 2" (people initially diagnosed with IHw/oLST and Nw/oC); Cluster 2 "polysymptomatic hypersomnia" (people initially diagnosed with IHwithLST); and Cluster 3, narcolepsy type 1 (people initially diagnosed with NwithC). Cluster analysis confirmed that narcolepsy type 1 and polysymptomatic hypersomnia are independent sleep disorders. People who were initially diagnosed with Nw/oC and IHw/oLST formed a single cluster, referred to as "combined monosymptomatic hypersomnia/narcolepsy type 2." Copyright © 2014 Elsevier B.V. All rights reserved.

  11. Environmental quenching and hierarchical cluster assembly: Evidence from spectroscopic ages of red-sequence galaxies in Coma

    CERN Document Server

    Smith, Russell J; Price, James; Hudson, Michael J; Phillipps, Steven

    2011-01-01

    We explore the variation in stellar population ages for Coma cluster galaxies as a function of projected cluster-centric distance, using a sample of 362 red-sequence galaxies with high signal-to-noise spectroscopy. The sample spans a wide range in luminosity (0.02-4 L*) and extends from the cluster core to near the virial radius. We find a clear distinction in the observed trends of the giant and dwarf galaxies. The ages of red-sequence giants are primarily determined by galaxy mass, with only weak modulation by environment, in the sense that galaxies at larger cluster-centric distance are slightly younger. For red-sequence dwarfs (with mass <10^10 Msun), the roles of mass and environment as predictors of age are reversed: there is little dependence on mass, but strong trends with projected cluster-centric radius are observed. The average age of dwarfs at the 2.5 Mpc limit of our sample is approximately half that of dwarfs near the cluster centre. The gradient in dwarf galaxy ages is a global cluster-centr...

  12. Cluster-based exposure variation analysis.

    Science.gov (United States)

    Samani, Afshin; Mathiassen, Svend Erik; Madeleine, Pascal

    2013-04-04

    Static posture, repetitive movements and lack of physical variation are known risk factors for work-related musculoskeletal disorders, and thus need to be properly assessed in occupational studies. The aims of this study were (i) to investigate the effectiveness of a conventional exposure variation analysis (EVA) in discriminating exposure time lines and (ii) to compare it with a new cluster-based method for analysis of exposure variation. For this purpose, we simulated a repeated cyclic exposure varying within each cycle between "low" and "high" exposure levels in a "near" or "far" range, and with "low" or "high" velocities (exposure change rates). The duration of each cycle was also manipulated by selecting a "small" or "large" standard deviation of the cycle time. These parameters reflected three dimensions of exposure variation, i.e. range, frequency and temporal similarity. Each simulation trace included two realizations of 100 concatenated cycles with either low (ρ = 0.1), medium (ρ = 0.5) or high (ρ = 0.9) correlation between the realizations. These traces were analyzed by conventional EVA, and a novel cluster-based EVA (C-EVA). Principal component analysis (PCA) was applied on the marginal distributions of 1) the EVA of each of the realizations (univariate approach), 2) a combination of the EVA of both realizations (multivariate approach) and 3) C-EVA. The least number of principal components describing more than 90% of variability in each case was selected and the projection of marginal distributions along the selected principal component was calculated. A linear classifier was then applied to these projections to discriminate between the simulated exposure patterns, and the accuracy of classified realizations was determined. C-EVA classified exposures more correctly than univariate and multivariate EVA approaches; classification accuracy was 49%, 47% and 52% for EVA (univariate and multivariate), and C-EVA, respectively (p analysis are the advantages

  13. A Hierarchical Clustering Method Based on the Threshold of Semantic Feature in Big Data

    Institute of Scientific and Technical Information of China (English)

    罗恩韬; 王国军

    2015-01-01

    Emerging services such as cloud computing, health care, street view map services and recommendation systems are driving the variety and scale of data to grow at an unprecedented speed. The surge in data volume leads to many common problems, such as the representability, processability and reliability of data. How to effectively process and analyze the relationships between data, improve the efficiency of data partitioning and establish a clustering analysis model has therefore become an urgent problem for both academia and industry. This paper proposes a hierarchical clustering method based on semantic features. First, training is performed according to the semantic features of the data; then the training results are used to perform hierarchical clustering on each subset; finally, the density center points of the whole data set are produced, which improves the efficiency and accuracy of data clustering. The method has low sampling complexity, analyzes data accurately, is easy to implement and has good discriminability.

  14. Hierarchical rutile TiO2 flower cluster-based high efficiency dye-sensitized solar cells via direct hydrothermal growth on conducting substrates.

    Science.gov (United States)

    Ye, Meidan; Liu, Hsiang-Yu; Lin, Changjian; Lin, Zhiqun

    2013-01-28

    Dye-sensitized solar cells (DSSCs) based on hierarchical rutile TiO2 flower clusters prepared by a facile, one-pot hydrothermal process exhibit a high efficiency. Complex yet appealing rutile TiO2 flower films are, for the first time, directly hydrothermally grown on a transparent conducting fluorine-doped tin oxide (FTO) substrate. The thickness and density of as-grown flower clusters can be readily tuned by tailoring growth parameters, such as growth time, the addition of cations of different valence and size, initial concentrations of precursor and cation, growth temperature, and acidity. Notably, the small lattice mismatch between the FTO substrate and rutile TiO2 renders the epitaxial growth of a compact rutile TiO2 layer on the FTO glass. Intriguingly, these TiO2 flower clusters can then be exploited as photoanodes to produce DSSCs, yielding a power conversion efficiency of 2.94% despite their rutile nature, which is further increased to 4.07% upon the TiCl4 treatment.

  15. Mismatch negativity/P3a complex in young people with psychiatric disorders: a cluster analysis.

    Directory of Open Access Journals (Sweden)

    Manreena Kaur

    Full Text Available BACKGROUND: We have recently shown that the event-related potential biomarkers, mismatch negativity (MMN) and P3a, are similarly impaired in young patients with schizophrenia- and affective-spectrum psychoses as well as those with bipolar disorder. A data-driven approach may help to further elucidate novel patterns of MMN/P3a amplitudes that characterise distinct subgroups in patients with emerging psychiatric disorders. METHODS: Eighty-seven outpatients (16 to 30 years) were assessed: 19 diagnosed with a depressive disorder; 26 with a bipolar disorder; and 42 with a psychotic disorder. The MMN/P3a complex was elicited using a two-tone passive auditory oddball paradigm with duration deviant tones. Hierarchical cluster analysis utilising frontal, central and temporal neurophysiological variables was conducted. RESULTS: Three clusters were determined: the 'globally impaired' cluster (n = 53) displayed reduced frontal and temporal MMN as well as reduced central P3a amplitudes; the 'largest frontal MMN' cluster (n = 17) was distinguished by increased frontal MMN amplitudes; and the 'largest temporal MMN' cluster (n = 17) was characterised by increases in temporal MMN only. Notably, 55% of those in the globally impaired cluster were diagnosed with schizophrenia-spectrum disorder, whereas the three patient subgroups were equally represented in the remaining two clusters. The three cluster-groups did not differ in their current symptomatology; however, the globally impaired cluster was the most neuropsychologically impaired, compared with controls. CONCLUSIONS: These findings suggest that in emerging psychiatric disorders there are distinct MMN/P3a profiles of patient subgroups independent of current symptomatology. Schizophrenia-spectrum patients tended to show the most global impairments in this neurophysiological complex. Two other subgroups of patients were found to have neurophysiological profiles suggestive of quite different neurobiological (and

  16. Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale

    CERN Document Server

    Emmons, Scott; Gallant, Mike; Börner, Katy

    2016-01-01

    Notions of community quality underlie network clustering. While studies surrounding network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is lacking. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms -- Blondel, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 o...
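
    Of the stand-alone quality metrics listed, modularity is the simplest to compute by hand. The sketch below uses the usual per-community form Q = Σ_c [e_c/m − (d_c/2m)²] for an undirected, unweighted graph; the function name `modularity`, the edge-list representation and the toy graph are assumptions for illustration and are not taken from the paper.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = defaultdict(int)   # edges with both ends in the same community
    degree = defaultdict(int)     # total degree per community
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Hypothetical graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q = modularity(edges, split)  # 5/14, roughly 0.357
```

    Placing every node in one community gives Q = 0 for the same graph, so the score rewards the partition only for having more internal edges than a degree-preserving random graph would be expected to have.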

  17. Analysis of acoustic cardiac signals for heart rate variability and murmur detection using nonnegative matrix factorization-based hierarchical decomposition

    DEFF Research Database (Denmark)

    Shah, Ghafoor; Koch, Peter; Papadias, Constantinos B.

    2014-01-01

    . A novel method based on hierarchical decomposition of the single channel mixture using various nonnegative matrix factorization techniques is proposed, which provides unsupervised clustering of the underlying component signals. HRV is determined over the recovered normal cardiac acoustic signals....... This novel decomposition technique is compared against the state-of-the-art techniques; experiments are performed using real-world clinical data, which show the potential significance of the proposed technique....

  18. Posterior AD-Type Pathology: Cognitive Subtypes Emerging from a Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Antonella Cappa

    2014-01-01

    Full Text Available Background. “Posterior shift” of the neuropathological changes of Alzheimer's disease (AD) produces a syndrome (posterior cortical atrophy, PCA) dominated by high-level visual deficits. Objective. To explore in patients with AD-type pathology whether a data-driven analysis (cluster analysis) based on neuropsychological findings resulted in the emergence of different subgroups of patients; in particular to find out whether it was possible to identify patients with visuospatial deficits consistent with the hypothesis that PCA is a “dorsal stream” syndrome or, rather, whether there were subgroups of patients with different types of impairment within the high-level visual domain. Methods. 23 PCA and 16 DAT patients were studied. By a principal component analysis performed on a wide range of neuropsychological tasks, 15 variables were obtained that loaded onto five main factors (memory, language, perceptual, visuospatial, and calculation), which entered a hierarchical cluster analysis. Results. Four clusters of cognitive impairment emerged: visuospatial/perceptual, memory, perceptual/calculation, and language. Only in the first cluster did a visuospatial deficit clearly emerge. Conclusions. AD pathology produces not only variants dominated by memory (DAT) and, to a lesser extent, visuospatial deficit (PCA), but also other distinct syndromic subtypes with disorders in visual perception and language which reflect a different vulnerability of specific functional networks.

  19. Hierarchical Direct Time Integration Method and Adaptive Procedure for Dynamic Analysis

    Institute of Scientific and Technical Information of China (English)

    2000-01-01

    A new hierarchical direct time integration method for structural dynamic analysis is developed using Taylor series expansions in each time step. Very accurate results can be obtained by increasing the order of the Taylor series. Furthermore, the local error can be estimated by simply comparing the solution obtained by the proposed method with the higher-order solution. This local estimate is then used to develop an adaptive order-control technique. Numerical examples are given to illustrate the performance of the present method and its adaptive procedure.
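
    The idea of raising the Taylor order until two consecutive orders agree can be sketched as follows. This is only an illustration under stated assumptions, not the paper's formulation: it uses a linear test system y' = A y (an undamped oscillator with unit frequency), and the names `taylor_step`, `matvec`, the step size and the tolerance are all hypothetical choices.

```python
import math

def matvec(a, v):
    """Multiply a small dense matrix (list of rows) by a vector."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]

def taylor_step(a, y, h, tol, max_order=30):
    """Advance y' = A y by one step of size h.

    The k-th Taylor term is (hA)^k y / k!. It equals the difference between
    the order-k and order-(k-1) solutions, so it serves as the local error
    estimate; terms are added until that estimate drops below tol.
    """
    total = list(y)
    term = list(y)
    for order in range(1, max_order + 1):
        term = [h / order * t for t in matvec(a, term)]
        total = [s + t for s, t in zip(total, term)]
        if max(abs(t) for t in term) < tol:
            return total, order
    return total, max_order

# Assumed test problem: x'' = -x written as a first-order system in (x, v).
A = [[0.0, 1.0], [-1.0, 0.0]]
y_end = [1.0, 0.0]
orders_used = []
for _ in range(100):                      # 100 steps of h = 0.1 reach t = 10
    y_end, used = taylor_step(A, y_end, 0.1, 1e-12)
    orders_used.append(used)

print(y_end, max(orders_used))
```

    For this test problem the exact solution is x = cos t, v = -sin t, so the result can be checked directly; a nonlinear structural model would need the higher derivatives computed differently.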

  20. Automation of control and analysis of execution of official duties and instructions in the hierarchical organization

    Directory of Open Access Journals (Sweden)

    Demchenko A.I.

    2017-01-01

    Full Text Available The article considers the problem of monitoring the execution of employees' official duties. This problem is characteristic of enterprises with a hierarchical management structure. The functions and modes of monitoring are defined, and the types of analysis of staff activities are described. A description is given of the software package that distributes functions and instructions among employees. The developed computer program tracks performance and creates reports. It provides differentiated access rights and can be operated on both local and large-scale networks.

  1. [Prognostic differences of phenotypes in pT1-2N0 invasive breast cancer: a large cohort study with cluster analysis].

    Science.gov (United States)

    Wang, Z; Wang, W H; Wang, S L; Jin, J; Song, Y W; Liu, Y P; Ren, H; Fang, H; Tang, Y; Chen, B; Qi, S N; Lu, N N; Li, N; Tang, Y; Liu, X F; Yu, Z H; Li, Y X

    2016-06-23

    To find phenotypic subgroups of patients with pT1-2N0 invasive breast cancer by means of cluster analysis and estimate the prognosis and clinicopathological features of these subgroups. From 1999 to 2013, 4979 patients with pT1-2N0 invasive breast cancer were recruited for hierarchical clustering analysis. Age (≤40, 41-70, 70+ years), size of primary tumor, pathological type, grade of differentiation, microvascular invasion, estrogen receptor (ER), progesterone receptor (PR) and human epidermal growth factor receptor 2 (HER-2) were used to define the distance metric between patients. Hierarchical cluster analysis was performed using Ward's method. Cophenetic correlation coefficient (CPCC) and Spearman correlation coefficient were used to validate clustering structures. The CPCC was 0.603. The Spearman correlation coefficient was 0.617 (Pcluster model seemed to best illustrate our patient cohort. Patients in clusters 5, 9 and 12 had the best prognosis and were characterized by age >40 years, smaller primary tumor, lower histologic grade, positive ER and PR status, and mainly negative HER-2. Patients in clusters 1 and 11 had the worst prognosis. Cluster 1 was characterized by a larger tumor, higher grade and negative ER and PR status, while cluster 11 was characterized by positive microvascular invasion. Patients in the other 7 clusters had a moderate prognosis, and patients in each cluster had distinctive clinicopathological features and recurrence patterns. This study identified distinctive clinicopathologic phenotypes in a large cohort of patients with pT1-2N0 breast cancer through hierarchical clustering and revealed different prognoses. This integrative model may help physicians to make more personalized decisions regarding adjuvant therapy.
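
    As a rough illustration of the two techniques named in this abstract, the sketch below performs Ward agglomerative clustering and then computes a cophenetic correlation coefficient. It is a toy under stated assumptions: the six 2-D points, the function names and the merge-height convention (two single points merge at their Euclidean distance) are all hypothetical, and the study's clinical variables are not reproduced.

```python
import math
from itertools import combinations

def centroid(points):
    return [sum(coords) / len(points) for coords in zip(*points)]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    squared = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * squared

def ward_linkage(data):
    """Return the merges as (members_a, members_b, height), in merge order."""
    clusters = [[i] for i in range(len(data))]
    merges = []
    while len(clusters) > 1:
        def cost(pair):
            return ward_cost([data[k] for k in clusters[pair[0]]],
                             [data[k] for k in clusters[pair[1]]])
        i, j = min(combinations(range(len(clusters)), 2), key=cost)
        height = math.sqrt(2.0 * cost((i, j)))
        merges.append((clusters[i], clusters[j], height))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def cophenetic_correlation(data, merges):
    """Correlate original distances with the heights at which pairs first join."""
    coph = {}
    for a, b, height in merges:
        for i in a:
            for j in b:
                coph[(min(i, j), max(i, j))] = height
    pairs = sorted(coph)
    original = [math.dist(data[i], data[j]) for i, j in pairs]
    return pearson(original, [coph[p] for p in pairs])

# Assumed toy data: two well-separated groups of three points each.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
merges = ward_linkage(data)
cpcc = cophenetic_correlation(data, merges)
print(merges[-1], cpcc)
```

    A CPCC close to 1 means the dendrogram heights preserve the original pairwise distances well. The brute-force pair search here is cubic in the number of points, so a cohort of thousands of patients would call for a library implementation.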

  2. Hierarchical clustering-based semantic retrieval of RDF graph

    Institute of Scientific and Technical Information of China (English)

    刘宁; 左凤华; 张俊

    2012-01-01

    Current research on RDF (Resource Description Framework) graph retrieval suffers from problems such as excessive memory usage and low retrieval efficiency. To address these problems, this paper proposes a hierarchical clustering semantic retrieval model for RDF graphs and designs and implements a retrieval method based on that model. First, entity data are extracted from the RDF graph and, guided by the ontology library, hierarchical clustering converts the complex graph structure into a tree structure suited to efficient retrieval. The target object found in the tree is then located in the RDF graph, and a semantic expansion query is carried out from it. Constructing the retrieval model narrows the retrieval scope and thereby improves retrieval efficiency, and the semantic expansion query also yields a good recall ratio.

  3. Cluster analysis for the systematic grouping of genuine cocoa butter and cocoa butter equivalent samples based on triglyceride patterns.

    Science.gov (United States)

    Buchgraber, Manuela; Ulberth, Franz; Anklam, Elke

    2004-06-16

    The triglyceride profile of cocoa butters (CBs) from different geographical origins, varieties, growing seasons, and a number of cocoa butter equivalents (CBEs) was determined by capillary gas liquid chromatography. Hierarchical cluster analysis was applied to the five main triglycerides of the samples for the ability to find natural groupings among (a) CBs of various provenance and (b) CBE samples of different types. The samples were clustered using Ward's method, and the similarity values of the linkages were represented by dendrograms. The five triglycerides contained adequate information to obtain a meaningful sample differentiation. This information can be used to assess the purity and the origin of the CB sample examined.

  4. Cluster analysis of word frequency dynamics

    Science.gov (United States)

    Maslennikova, Yu S.; Bochkarev, V. V.; Belashova, I. A.

    2015-01-01

    This paper describes the analysis and modelling of word usage frequency time series. In a previous study, it was assumed that all word usage frequencies have uniform dynamics approaching the shape of a Gaussian function. This assumption can be checked using the frequency dictionaries of the Google Books Ngram database. This database includes 5.2 million books published between 1500 and 2008. The corpus contains over 500 billion words in American English, British English, French, German, Spanish, Russian, Hebrew, and Chinese. We clustered time series of word usage frequencies using a Kohonen neural network. The similarity between input vectors was estimated using several algorithms. As a result of the neural network training procedure, more than ten different forms of time series were found. They describe the dynamics of word usage frequencies from birth to death of individual words. Different groups of word forms were found to have different dynamics of word usage frequency variations.

  5. Clinical relevance of cluster analysis in phenotyping allergic rhinitis in a real-life study.

    Science.gov (United States)

    Bousquet, Philippe Jean; Devillier, Philippe; Tadmouri, Abir; Mesbah, Kamal; Demoly, Pascal; Bousquet, Jean

    2015-01-01

    Disease stratification, using phenotypic characterization performed either by hypothesis- or data-driven methods, was developed to improve clinical decisions. However, cluster analysis has not been used for allergic rhinitis. To define clusters in allergic rhinitis and to compare them with ARIA (Allergic Rhinitis and Its Impact on Asthma), a hypothesis-driven approach. A French observational prospective multicenter study (EVEIL: Echelle visuelle analogique dans la rhinite allergique) was carried out on 990 patients consulting general practitioners for allergic rhinitis and treated as per clinical practice. In this study, changes in symptom scores, visual analogue scales and quality of life were measured at baseline and after 14 days of treatment. A post hoc analysis was performed to identify clusters of patients with allergic rhinitis – using Ward's hierarchical method – and to define their clinical relevance at baseline and after 14 days of treatment. The cluster approach was compared to the ARIA approach. Patients were clustered into 4 phenotypes which partly followed the ARIA classes. These phenotypes differed in their disease severity including symptoms and quality of life. Physicians in real-life practice prescribed medication regardless of the phenotype and severity, with the exception of patients with ocular symptoms. Prescribed treatments were comparable in hypothesis- and data-driven analyses. The prevalence of uncontrolled patients during treatment was similar in the 4 clusters, but was significantly different according to the ARIA classes. Cluster analysis using demographic and clinical parameters only does not appear to add relevant information for disease stratification in allergic rhinitis. © 2015 S. Karger AG, Basel.

  6. A general strategy to determine the congruence between a hierarchical and a non-hierarchical classification

    Directory of Open Access Journals (Sweden)

    Marín Ignacio

    2007-11-01

    Full Text Available Background. Classification procedures are widely used in phylogenetic inference, the analysis of expression profiles, the study of biological networks, etc. Many algorithms have been proposed to establish the similarity between two different classifications of the same elements. However, methods to determine significant coincidences between hierarchical and non-hierarchical partitions are still poorly developed, in spite of the fact that the search for such coincidences is implicit in many analyses of massive data. Results. We describe a novel strategy to compare a hierarchical and a dichotomic non-hierarchical classification of elements, in order to find clusters in a hierarchical tree in which elements of a given "flat" partition are overrepresented. The key improvement of our strategy with respect to previous methods is the use of permutation analyses of ranked clusters to determine whether regions of the dendrograms present a significant enrichment. We show that this method is more sensitive than previously developed strategies and how it can be applied to several real cases, including microarray and interactome data. In particular, we use it to compare a hierarchical representation of the yeast mitochondrial interactome and a catalogue of known mitochondrial protein complexes, demonstrating a high level of congruence between those two classifications. We also discuss extensions of this method to other cases which are conceptually related. Conclusion. Our method is highly sensitive and outperforms previously described strategies. A PERL script that implements it is available at http://www.uv.es/~genomica/treetracker.

  7. Clustering of frequency spectrums from different bearing fault using principle component analysis

    Directory of Open Access Journals (Sweden)

    Yusof M.F.M.

    2017-01-01

    Full Text Available In studies of defects in rolling element bearings, signal clustering is one of the popular approaches used to identify the type of defect. However, noise interference is one of the major issues affecting the effectiveness of the applied clustering method. In this paper, the application of principal component analysis (PCA) as a pre-processing method for hierarchical clustering analysis of the frequency spectrum of the vibration signal is proposed. To this end, vibration signals were acquired from operating bearings under different conditions and speeds. Next, principal component analysis was applied to the frequency spectra of the acquired signals for pattern recognition purposes, and the Mahalanobis distance model was used to cluster the PCA results. The results show that the change in amplitude at the respective fundamental frequencies can be detected through the application of PCA, and that the Mahalanobis distance is suitable for clustering the results of principal component analysis. Notably, the spectra from healthy and inner race defect bearings could be clearly distinguished from each other, even though the change in amplitude pattern of the inner race defect frequency spectrum was very small compared to the healthy one. This work demonstrates that principal component analysis can sensitively detect changes in the pattern of the frequency spectra. Likewise, the Mahalanobis distance model for clustering was found to be significant for bearing defect identification.

  8. Galaxy Number Counts in the Subaru Deep Field Multi-band Analysis in a Hierarchical Galaxy Formation Model

    CERN Document Server

    Nagashima, M; Totani, T; Gouda, N

    2002-01-01

    Number counts of galaxies are re-analyzed using a semi-analytic model (SAM) of galaxy formation based on the hierarchical clustering scenario. Faint galaxies in the Subaru Deep Field (SDF) and the Hubble Deep Field (HDF) are compared with our model galaxies. We have determined the astrophysical parameters in the SAM that reproduce observations of nearby galaxies, and used them to predict the number counts and redshifts of faint galaxies for three cosmological models, the standard cold dark matter (CDM) universe, a flat lambda-CDM, and an open CDM. The novelty of our SAM analysis is the inclusion of selection effects arising from the cosmological dimming of surface brightness of high-z galaxies, and from the absorption of visible light by internal dust and intergalactic HI clouds. As was found in our previous work, in which the UV/optical HDF galaxies were compared with our model galaxies, we find that our SAM reproduces counts of near-IR SDF galaxies in low-density models, and that the standard CDM universe i...

  9. Hierarchical Classifiers for Multi-Way Sentiment Analysis of Arabic Reviews

    Directory of Open Access Journals (Sweden)

    Mahmoud Al-Ayyoub

    2016-02-01

    Full Text Available Sentiment Analysis (SA) is one of the hottest fields in data mining (DM) and natural language processing (NLP). The goal of SA is to extract the sentiment conveyed in a certain text based on its content. While most current works focus on the simple problem of determining whether the sentiment is positive or negative, Multi-Way Sentiment Analysis (MWSA) focuses on sentiments conveyed through a rating or scoring system (e.g., a 5-star scoring system). In such scoring systems, the sentiments conveyed in two reviews of close scores (such as 4 stars and 5 stars) can be very similar, creating an added challenge compared to traditional SA. One intuitive way of handling this challenge is via a divide-and-conquer approach where the MWSA problem is divided into a set of sub-problems allowing the use of customized classifiers to differentiate between reviews of close scores. A hierarchical classification structure can be used with this approach where each node represents a different classification sub-problem and the decision from it may lead to the invocation of another classifier. In this work, we show how the use of this divide-and-conquer hierarchical structure of classifiers can generate better results than the use of existing flat classifiers for the MWSA problem. We focus on the Arabic language for many reasons such as the importance of this language and the scarcity of prior works and available tools for it. To the best of our knowledge, very few papers have been published on MWSA of Arabic reviews. One notable work is that of Ali and Atiya, in which the authors collected a large scale Arabic Book Reviews (LABR) dataset and made it publicly available. Unfortunately, the baseline experiments on this dataset had very low accuracy. We present two different hierarchical structures and compare their accuracies with the flat structure using different core classifiers. The comparison is based on standard accuracy measures such as precision and recall in addition to

  10. The heterogeneity of headache patients who self-medicate: a cluster analysis approach.

    Science.gov (United States)

    Mehuys, Els; Paemeleire, Koen; Crombez, Geert; Adriaens, Els; Van Hees, Thierry; Demarche, Sophie; Christiaens, Thierry; Van Bortel, Luc; Van Tongelen, Inge; Remon, Jean-Paul; Boussery, Koen

    2016-07-01

    Patients with headache often self-treat their condition with over-the-counter analgesics. However, overuse of analgesics can cause medication-overuse headache. The present study aimed to identify subgroups of individuals with headache who self-medicate, as this could be helpful to tailor intervention strategies for prevention of medication-overuse headache. Patients (n = 1021) were recruited from 202 community pharmacies and completed a self-administered questionnaire. A hierarchical cluster analysis was used to group patients as a function of sociodemographics, pain, disability, and medication use for pain. Three patient clusters were identified. Cluster 1 (n = 498, 48.8%) consisted of relatively young individuals, and most of them suffered from migraine. They reported the least number of other pain complaints and the lowest prevalence of medication overuse (MO; 16%). Cluster 2 (n = 301, 29.5%) included older persons with mainly non-migraine headache, a low disability, and on average pain in 2 other locations. Prevalence of MO was 40%. Cluster 3 (n = 222, 21.7%) mostly consisted of patients with migraine who also report pain in many other locations. These patients reported a high disability and a severe limitation of activities. They also showed the highest rates of MO (73%).

  11. Fuzzy C-means clustering for chromatographic fingerprints analysis: A gas chromatography-mass spectrometry case study.

    Science.gov (United States)

    Parastar, Hadi; Bazrafshan, Alisina

    2016-03-18

    Fuzzy C-means clustering (FCM) is proposed as a promising method for the clustering of chromatographic fingerprints of complex samples, such as essential oils. As an example, secondary metabolites of 14 citrus leaf samples are extracted and analyzed by gas chromatography-mass spectrometry (GC-MS). The obtained chromatographic fingerprints are divided into the desired number of chromatographic regions. Because chromatographic problems such as elution time shift and peak overlap can significantly affect the clustering results, each chromatographic region is analyzed using multivariate curve resolution-alternating least squares (MCR-ALS) to address these problems. Then, the resolved elution profiles are used to build a new data matrix based on peak areas of pure components, which is clustered by FCM. The FCM clustering parameters (i.e., fuzziness coefficient and number of clusters) are optimized by two different methods: partial least squares (PLS) as a conventional method, and minimization of the FCM objective function as our new idea. The results showed that minimization of the FCM objective function is an easier and better way to optimize FCM clustering parameters. Then, the optimized FCM clustering algorithm is used to cluster samples and variables to figure out the similarities and dissimilarities among samples and to find discriminant secondary metabolites in each cluster (chemotype). Finally, the FCM clustering results are compared with those of principal component analysis (PCA), hierarchical cluster analysis (HCA) and Kohonen maps. The results confirmed that FCM outperformed the frequently used clustering algorithms. Copyright © 2016 Elsevier B.V. All rights reserved.
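
    The two FCM parameters mentioned in this abstract (the fuzziness coefficient and the number of clusters) appear directly in the standard update rules, which the following minimal sketch alternates on one-dimensional data. The data, the starting centers, the fixed iteration count and the function names are assumptions made for illustration; the MCR-ALS preprocessing and the authors' parameter optimization are not reproduced.

```python
def memberships(points, centers, m):
    """Degree to which each point belongs to each cluster (each row sums to 1)."""
    power = 2.0 / (m - 1.0)
    rows = []
    for x in points:
        # Clamp distances so a point lying exactly on a center does not divide by zero.
        dist = [max(abs(x - c), 1e-12) for c in centers]
        rows.append([1.0 / sum((d_i / d_k) ** power for d_k in dist) for d_i in dist])
    return rows

def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """Alternate the membership and center updates for a fixed number of iterations.

    m is the fuzziness coefficient; len(centers) is the number of clusters.
    """
    for _ in range(iterations):
        u = memberships(points, centers, m)
        centers = [
            sum(u[j][i] ** m * x for j, x in enumerate(points))
            / sum(u[j][i] ** m for j in range(len(points)))
            for i in range(len(centers))
        ]
    return centers, memberships(points, centers, m)

# Assumed toy data: two groups centred near 1 and 11.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
centers, u = fuzzy_c_means(points, centers=[0.0, 12.0])
print(centers)
print(u[0])
```

    Unlike k-means, every point keeps a graded membership in every cluster, which is why the recovered centers are pulled very slightly toward the opposite group.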

  12. Hierarchical modeling for reliability analysis using Markov models. B.S./M.S. Thesis - MIT

    Science.gov (United States)

    Fagundo, Arturo

    1994-01-01

    Markov models represent an extremely attractive tool for the reliability analysis of many systems. However, Markov model state space grows exponentially with the number of components in a given system. Thus, for very large systems Markov modeling techniques alone become intractable in both memory and CPU time. Often a particular subsystem can be found within some larger system where the dependence of the larger system on the subsystem is of a particularly simple form. This simple dependence can be used to decompose such a system into one or more subsystems. A hierarchical technique is presented which can be used to evaluate these subsystems in such a way that their reliabilities can be combined to obtain the reliability for the full system. This hierarchical approach is unique in that it allows the subsystem model to pass multiple aggregate state information to the higher level model, allowing more general systems to be evaluated. Guidelines are developed to assist in the system decomposition. An appropriate method for determining subsystem reliability is also developed. This method gives rise to some interesting numerical issues. Numerical error due to roundoff and integration are discussed at length. Once a decomposition is chosen, the remaining analysis is straightforward but tedious. However, an approach is developed for simplifying the recombination of subsystem reliabilities. Finally, a real world system is used to illustrate the use of this technique in a more practical context.

  13. Extending hierarchical task analysis to identify cognitive demands and information design requirements.

    Science.gov (United States)

    Phipps, Denham L; Meakin, George H; Beatty, Paul C W

    2011-07-01

    While hierarchical task analysis (HTA) is well established as a general task analysis method, there appears a need to make more explicit both the cognitive elements of a task and design requirements that arise from an analysis. One way of achieving this is to make use of extensions to the standard HTA. The aim of the current study is to evaluate the use of two such extensions--the sub-goal template (SGT) and the skills-rules-knowledge (SRK) framework--to analyse the cognitive activity that takes place during the planning and delivery of anaesthesia. In quantitative terms, the two methods were found to have relatively poor inter-rater reliability; however, qualitative evidence suggests that the two methods were nevertheless of value in generating insights about anaesthetists' information handling and cognitive performance. Implications for the use of an extended HTA to analyse work systems are discussed.

  14. An exploratory analysis of treatment completion and client and organizational factors using hierarchical linear modeling.

    Science.gov (United States)

    Woodward, Albert; Das, Abhik; Raskin, Ira E; Morgan-Lopez, Antonio A

    2006-11-01

    Data from the Alcohol and Drug Services Study (ADSS) are used to analyze the structure and operation of the substance abuse treatment industry in the United States. Published literature contains little systematic empirical analysis of the interaction between organizational characteristics and treatment outcomes. This paper addresses that deficit. It develops and tests a hierarchical linear model (HLM) to address questions about the empirical relationship between treatment inputs (industry costs, types and use of counseling and medical personnel, diagnosis mix, patient demographics, and the nature and level of services used in substance abuse treatment), and patient outcomes (retention and treatment completion rates). The paper adds to the literature by demonstrating a direct and statistically significant link between treatment completion and the organizational and staffing structure of the treatment setting. Related reimbursement issues, questions for future analysis, and limitations of the ADSS for this analysis are discussed.

  15. Two-dimensional finite element neutron diffusion analysis using hierarchic shape functions

    Energy Technology Data Exchange (ETDEWEB)

    Carpenter, D.C.

    1997-04-01

    Recent advances have been made in the use of p-type finite element method (FEM) for structural and fluid dynamics problems that hold promise for reactor physics problems. These advances include using hierarchic shape functions, element-by-element iterative solvers and more powerful mapping techniques. Use of the hierarchic shape functions allows greater flexibility and efficiency in implementing energy-dependent flux expansions and incorporating localized refinement of the solution space. The irregular matrices generated by the p-type FEM can be solved efficiently using element-by-element conjugate gradient iterative solvers. These solvers do not require storage of either the global or local stiffness matrices and can be highly vectorized. Mapping techniques based on blending function interpolation allow exact representation of curved boundaries using coarse element grids. These features were implemented in a developmental two-dimensional neutron diffusion program based on the use of hierarchic shape functions (FEM2DH). Several aspects in the effective use of p-type analysis were explored. Two choices of elemental preconditioning were examined--the proper selection of the polynomial shape functions and the proper number of functions to use. Of the five shape function polynomials tested, the integral Legendre functions were the most effective. The serendipity set of functions is preferable over the full tensor product set. Two global preconditioners were also examined--simple diagonal and incomplete Cholesky. The full effectiveness of the finite element methodology was demonstrated on a two-region, two-group cylindrical problem but solved in the x-y coordinate space, using a non-structured element grid. The exact, analytic eigenvalue solution was achieved with FEM2DH using various combinations of element grids and flux expansions.

  16. A hierarchical model for probabilistic independent component analysis of multi-subject fMRI studies.

    Science.gov (United States)

    Guo, Ying; Tang, Li

    2013-12-01

    An important goal in fMRI studies is to decompose the observed series of brain images to identify and characterize underlying brain functional networks. Independent component analysis (ICA) has been shown to be a powerful computational tool for this purpose. Classic ICA has been successfully applied to single-subject fMRI data. The extension of ICA to group inferences in neuroimaging studies, however, is challenging due to the unavailability of a pre-specified group design matrix. Existing group ICA methods generally concatenate observed fMRI data across subjects on the temporal domain and then decompose multi-subject data in a similar manner to single-subject ICA. The major limitation of existing methods is that they ignore between-subject variability in spatial distributions of brain functional networks in group ICA. In this article, we propose a new hierarchical probabilistic group ICA method to formally model subject-specific effects in both temporal and spatial domains when decomposing multi-subject fMRI data. The proposed method provides model-based estimation of brain functional networks at both the population and subject level. An important advantage of the hierarchical model is that it provides a formal statistical framework to investigate similarities and differences in brain functional networks across subjects, for example, subjects with mental disorders or neurodegenerative diseases such as Parkinson's as compared to normal subjects. We develop an EM algorithm for model estimation where both the E-step and M-step have explicit forms. We compare the performance of the proposed hierarchical model with that of two popular group ICA methods via simulation studies. We illustrate our method with application to an fMRI study of Zen meditation.

  17. Somatotyping using 3D anthropometry: a cluster analysis.

    Science.gov (United States)

    Olds, Tim; Daniell, Nathan; Petkov, John; David Stewart, Arthur

    2013-01-01

    Somatotyping is the quantification of human body shape, independent of body size. Hitherto, somatotyping (including the most popular method, the Heath-Carter system) has been based on subjective visual ratings, sometimes supported by surface anthropometry. This study used data derived from three-dimensional (3D) whole-body scans as inputs for cluster analysis to objectively derive clusters of similar body shapes. Twenty-nine dimensions normalised for body size were measured on a purposive sample of 301 adults aged 17-56 years who had been scanned using a Vitus Smart laser scanner. K-means Cluster Analysis with v-fold cross-validation was used to determine shape clusters. Three male and three female clusters emerged, and were visualised using those scans closest to the cluster centroid and a caricature defined by doubling the difference between the average scan and the cluster centroid. The male clusters were decidedly endomorphic (high fatness), ectomorphic (high linearity), and endo-mesomorphic (a mixture of fatness and muscularity). The female clusters were clearly endomorphic, ectomorphic, and the ecto-mesomorphic (a mixture of linearity and muscularity). An objective shape quantification procedure combining 3D scanning and cluster analysis yielded shape clusters strikingly similar to traditional somatotyping.

  18. Identifying patterns in treatment response profiles in acute bipolar mania: a cluster analysis approach

    Directory of Open Access Journals (Sweden)

    Houston John P

    2008-07-01

    Full Text Available Background. Patients with acute mania respond differentially to treatment and, in many cases, fail to obtain or sustain symptom remission. The objective of this exploratory analysis was to characterize response in bipolar disorder by identifying groups of patients with similar manic symptom response profiles. Methods. Patients (n = 222) were selected from a randomized, double-blind study of treatment with olanzapine or divalproex in bipolar I disorder, manic or mixed episode, with or without psychotic features. Hierarchical clustering based on Ward's distance was used to identify groups of patients based on Young-Mania Rating Scale (YMRS) total scores at each of 5 assessments over 7 weeks. Logistic regression was used to identify baseline predictors for clusters of interest. Results. Four distinct clusters of patients were identified: Cluster 1 (n = 64): patients did not maintain a response (YMRS total scores ≤ 12); Cluster 2 (n = 92): patients responded rapidly (within less than a week) and response was maintained; Cluster 3 (n = 36): patients responded rapidly but relapsed soon afterwards (YMRS ≥ 15); Cluster 4 (n = 30): patients responded slowly (≥ 2 weeks) and response was maintained. Predictive models using baseline variables found YMRS Item 10 (Appearance) and psychosis to be significant predictors for Clusters 1 and 4 vs. Clusters 2 and 3, but none of the baseline characteristics allowed discriminating between Clusters 1 vs. 4. Experiencing a mixed episode at baseline predicted membership in Clusters 2 and 3 vs. Clusters 1 and 4. Treatment with divalproex, larger number of previous manic episodes, lack of disruptive-aggressive behavior, and more prominent depressive symptoms at baseline were predictors for Cluster 3 vs. 2. Conclusion. Distinct treatment response profiles can be predicted by clinical features at baseline. The presence of these features as potential risk factors for relapse in patients who have responded to treatment

  19. An Isogeometric Design-through-analysis Methodology based on Adaptive Hierarchical Refinement of NURBS, Immersed Boundary Methods, and T-spline CAD Surfaces

    Science.gov (United States)

    2012-01-22

    ICES Report 12-05, January 2012. ...M.J. Borden, E. Rank, T.J.R. Hughes, An Isogeometric Design-through-analysis Methodology based on Adaptive Hierarchical Refinement of NURBS, Immersed Boundary Methods, and T-spline CAD Surfaces.

  20. A hybrid monkey search algorithm for clustering analysis.

    Science.gov (United States)

    Chen, Xin; Zhou, Yongquan; Luo, Qifang

    2014-01-01

    Clustering is a popular data analysis and data mining technique. The k-means clustering algorithm is one of the most commonly used methods. However, it depends strongly on the initial solution and easily falls into a local optimum. In view of these disadvantages of the k-means method, this paper proposes a hybrid monkey algorithm based on the search operator of the artificial bee colony algorithm for clustering analysis. Experiments on synthetic and real-life datasets show that the algorithm performs better than the basic monkey algorithm for clustering analysis.
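
    The sensitivity of k-means to its initial solution, noted in the abstract above, can be seen in a minimal sketch. The code runs plain Lloyd iterations on one-dimensional data; the function name kmeans_1d, the data, and both sets of starting centers are illustrative assumptions, and this is not the hybrid monkey algorithm of the paper.

```python
def kmeans_1d(points, centers, max_iter=100):
    """Plain Lloyd iterations on 1-D data, starting from the given centers.

    Returns the final centers and the within-cluster sum of squared errors.
    """
    centers = list(centers)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: abs(p - centers[c]))
            groups[nearest].append(p)
        # Update step: each center moves to the mean of its group
        # (an empty group keeps its previous center).
        new_centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    sse = sum(min((p - c) ** 2 for c in centers) for p in points)
    return centers, sse


data = [0, 1, 10, 11, 20, 21]                  # three obvious pairs
good_centers, good_sse = kmeans_1d(data, [0, 10, 20])
bad_centers, bad_sse = kmeans_1d(data, [0, 1, 15.5])
print(good_centers, good_sse)                  # SSE 1.5: one center per pair
print(bad_centers, bad_sse)                    # SSE 101.0: stuck in a local optimum
```

    Both runs converge, but the second start leaves two centers on a single pair and never recovers, which is the kind of failure that global search heuristics aim to avoid.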

  1. A Hybrid Monkey Search Algorithm for Clustering Analysis

    Directory of Open Access Journals (Sweden)

    Xin Chen

    2014-01-01

    Full Text Available Clustering is a popular data analysis and data mining technique. The k-means clustering algorithm is one of the most commonly used methods. However, it depends strongly on the initial solution and easily falls into a local optimum. In view of these disadvantages of the k-means method, this paper proposes a hybrid monkey algorithm based on the search operator of the artificial bee colony algorithm for clustering analysis. Experiments on synthetic and real-life datasets show that the algorithm performs better than the basic monkey algorithm for clustering analysis.

  2. Smartness and Italian Cities. A Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Flavio Boscacci

    2014-05-01

    Full Text Available Smart cities have been recently recognized as the most pleasing and attractive places to live in; due to this, both scholars and policy-makers pay close attention to this topic. Specifically, urban “smartness” has been identified by plenty of characteristics that can be grouped into six dimensions (Giffinger et al. 2007): smart Economy (competitiveness), smart People (social and human capital), smart Governance (participation), smart Mobility (both ICTs and transport), smart Environment (natural resources), and smart Living (quality of life). According to this analytical framework, the present paper investigates the relation between urban attractiveness and the “smart” characteristics in the 103 Italian NUTS3 province capitals in the year 2011. To this aim, descriptive statistics are followed by a regression analysis (OLS), where the dependent variable measuring urban attractiveness is proxied by housing market prices. In addition, a Cluster Analysis (CA) has been developed in order to find differences and commonalities among the province capitals. The OLS results indicate that living, people and economy are the key drivers for achieving better urban attractiveness. Environment, instead, continues to play a minor role. Besides, the CA groups the province capitals a

  3. Ultrathin mesoporous Co3O4 nanosheets-constructed hierarchical clusters as high rate capability and long life anode materials for lithium-ion batteries

    Science.gov (United States)

    Wu, Shengming; Xia, Tian; Wang, Jingping; Lu, Feifei; Xu, Chunbo; Zhang, Xianfa; Huo, Lihua; Zhao, Hui

    2017-06-01

    Herein, ultrathin mesoporous Co3O4 nanosheets-constructed hierarchical clusters (UMCN-HCs) have been successfully synthesized via a facile hydrothermal method followed by a subsequent thermolysis treatment at 600 °C in air. The products consist of cluster-like Co3O4 microarchitectures, which are assembled from numerous ultrathin mesoporous Co3O4 nanosheets. When tested as anode materials for lithium-ion batteries, UMCN-HCs deliver a high reversible capacity of 1067 mAh g⁻¹ at a current density of 100 mA g⁻¹ after 100 cycles. Even at 2 A g⁻¹, a stable capacity as high as 507 mAh g⁻¹ can be achieved after 500 cycles. The high reversible capacity, excellent cycling stability, and good rate capability of UMCN-HCs may be attributed to their mesoporous sheet-like nanostructure. The sheet-layered structure of UMCN-HCs may buffer the volume change during the lithiation-delithiation process, and the mesoporous characteristic makes lithium-ion transfer easier at the interface between the active electrode and the electrolyte.

  4. Instantaneous normal mode analysis of melting of finite dust clusters.

    Science.gov (United States)

    Melzer, André; Schella, André; Schablinski, Jan; Block, Dietmar; Piel, Alexander

    2012-06-01

    The experimental melting transition of finite two-dimensional dust clusters in a dusty plasma is analyzed using the method of instantaneous normal modes. In the experiment, dust clusters are heated in a thermodynamic equilibrium from a solid to a liquid state using a four-axis laser manipulation system. The fluid properties of the dust cluster, such as the diffusion constant, are measured from the instantaneous normal mode analysis. Thereby, the phase transition of these finite clusters is approached from the liquid phase. From the diffusion constants, unique melting temperatures have been assigned to dust clusters of various sizes that very well reflect their dynamical stability properties.

  5. Cluster Analysis to Identify Possible Subgroups in Tinnitus Patients.

    Science.gov (United States)

    van den Berge, Minke J C; Free, Rolien H; Arnold, Rosemarie; de Kleine, Emile; Hofman, Rutger; van Dijk, J Marc C; van Dijk, Pim

    2017-01-01

    In tinnitus treatment, there is a tendency to shift from a "one size fits all" to a more individual, patient-tailored approach. Insight into the heterogeneity of the tinnitus spectrum might improve the management of tinnitus patients in terms of choice of treatment and identification of patients with severe mental distress. The goal of this study was to identify subgroups in a large group of tinnitus patients. Data were collected from patients with severe tinnitus complaints visiting our tertiary referral tinnitus care group at the University Medical Center Groningen. Patient-reported and physician-reported variables were collected during their visit to our clinic. Cluster analyses were used to characterize subgroups. For the selection of the right variables to enter in the cluster analysis, two approaches were used: (1) variable reduction with principal component analysis and (2) variable selection based on expert opinion. Various variables of 1,783 tinnitus patients were included in the analyses. Cluster analysis (1) included 976 patients and resulted in a four-cluster solution. The effect of external influences was the most discriminative between the groups, or clusters, of patients. The "silhouette measure" of the cluster outcome was low (0.2), indicating a "no substantial" cluster structure. Cluster analysis (2) included 761 patients and resulted in a three-cluster solution, comparable to the first analysis. Again, a "no substantial" cluster structure was found (0.2). Two cluster analyses on a large database of tinnitus patients revealed that clusters of patients are mostly formed by differences in how external influences affect their disease. However, both cluster outcomes based on this dataset showed poor stability, suggesting that our tinnitus population comprises a continuum rather than a number of clearly defined subgroups.
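
    For readers unfamiliar with the "silhouette measure" quoted above, the following is a minimal sketch of the usual silhouette coefficient, assuming Euclidean distance; the function name silhouette_score and the toy points are illustrative and do not come from the study. For each point, a is the mean distance to the other members of its own cluster, b is the lowest mean distance to any other cluster, and the score is (b - a) / max(a, b), averaged over all points.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    def mean_dist(i, members):
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = mean_dist(i, own)                        # cohesion
        b = min(mean_dist(i, members)                # separation
                for other, members in clusters.items() if other != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(silhouette_score(pts, [0, 0, 1, 1]))  # compact, well-separated clusters
print(silhouette_score(pts, [0, 1, 0, 1]))  # clusters that cut across the data
```

    Values near 1 indicate compact, well-separated clusters, while values near 0 (such as the 0.2 reported above) indicate heavily overlapping groups.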

  6. PERFORMANCE ANALYSIS OF CLUSTERED RADIO INTERFEROMETRIC CALIBRATION

    NARCIS (Netherlands)

    Kazemi, S.; Yatawatta, S.; Zaroubi, S.

    2012-01-01

    Subtraction of compact, bright sources is essential to produce high quality images in radio astronomy. It is recently proposed that 'clustered' calibration can perform better in subtracting fainter background sources. This is due to the fact that the effective power of a source cluster is greater th

  7. The Psychology of Yoga Practitioners: A Cluster Analysis.

    Science.gov (United States)

    Genovese, Jeremy E C; Fondran, Kristine M

    2017-03-30

    Yoga practitioners (N = 261) completed the revised Expression of Spirituality Inventory (ESI) and the Multidimensional Body-Self Relations Questionnaire. Cluster analysis revealed three clusters: Cluster A scored high on all four spiritual constructs. They had high positive evaluations of their appearance, but a lower orientation towards their appearance. They tended to have a high evaluation of their fitness and health, and higher body satisfaction. Cluster B showed lower scores on the spiritual constructs. Like Cluster A, members of Cluster B tended to show high positive evaluations of appearance and fitness. They also had higher body satisfaction. Members of Cluster B had a higher fitness orientation and a higher appearance orientation than members of Cluster A. Members of Cluster C had low scores for all spiritual constructs. They had a low evaluation of, and unhappiness with, their appearance. They were unhappy with the size and appearance of their bodies. They tended to see themselves as overweight. There was a significant difference in years of practice between the three groups (Kruskal-Wallis, p = .0041). Members of Cluster A have the most years of yoga experience and members of Cluster B have more yoga experience than members of Cluster C. These results suggest the possible existence of a developmental trajectory for yoga practitioners. Such a developmental sequence may have important implications for yoga practice and instruction.

  8. Criteria for forming asset portfolios by means of hierarchical clusters

    Directory of Open Access Journals (Sweden)

    Pierre Lucena

    2010-04-01

    Full Text Available The main objective of this article is to present and test a multivariate statistical tool in financial models. This methodology, known as cluster analysis, separates observations into groups according to their characteristics, in contrast to the traditional methodology, which simply orders them by quantiles. The tool was applied to 213 stocks traded on the São Paulo Stock Exchange (Bovespa), separating the groups by size and book-to-market. The new portfolios were then applied to the Fama and French (1996) model, comparing the results of portfolio formation by quantile and by cluster analysis. Better results were found with the second methodology. The authors conclude that cluster analysis may be more appropriate because it tends to form more homogeneous groups, making its application useful for portfolio formation and for financial theory.

  9. Using Cluster Analysis for Data Mining in Educational Technology Research

    Science.gov (United States)

    Antonenko, Pavlo D.; Toy, Serkan; Niederhauser, Dale S.

    2012-01-01

    Cluster analysis is a group of statistical methods that has great potential for analyzing the vast amounts of web server-log data to understand student learning from hyperlinked information resources. In this methodological paper we provide an introduction to cluster analysis for educational technology researchers and illustrate its use through…

  10. A Survey of Popular R Packages for Cluster Analysis

    Science.gov (United States)

    Flynt, Abby; Dean, Nema

    2016-01-01

    Cluster analysis is a set of statistical methods for discovering new group/class structure when exploring data sets. This article reviews the following popular libraries/commands in the R software language for applying different types of cluster analysis: from the stats library, the kmeans, and hclust functions; the mclust library; the poLCA…

  13. Critical clusters in interdependent economic sectors. A data-driven spectral clustering analysis

    Science.gov (United States)

    Oliva, Gabriele; Setola, Roberto; Panzieri, Stefano

    2016-10-01

    In this paper we develop a data-driven hierarchical clustering methodology to group the economic sectors of a country in order to highlight strongly coupled groups that are weakly coupled with other groups. Specifically, we consider an input-output representation of the coupling among the sectors and we interpret the relation among sectors as a directed graph; then we recursively apply the spectral clustering methodology over the graph, without a priori information on the number of groups that have to be obtained. In order to do this, we resort to the eigengap criterion, where a suitable number of groups is selected automatically based on the intensity and structure of the coupling among the sectors. We validate the proposed methodology with a case study for Italy, inspecting how the coupling among clusters and sectors changes from 1995 to 2011, and showing that over these years the Italian economic structure underwent deep changes, becoming more and more interdependent, i.e., a large part of the economy has become tightly coupled.
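
    The eigengap criterion mentioned above can be sketched in a few lines. The sketch assumes the eigenvalues of a graph Laplacian have already been computed elsewhere; the function name eigengap_k and the example spectrum are hypothetical, and the paper's recursive procedure on a directed graph is not reproduced here. The heuristic picks the number of groups k at the largest jump between consecutive sorted eigenvalues.

```python
def eigengap_k(eigenvalues):
    """Number of clusters suggested by the eigengap heuristic.

    Sorts the Laplacian eigenvalues in ascending order and returns the k
    for which the gap between the k-th and (k+1)-th eigenvalue is largest.
    """
    vals = sorted(eigenvalues)
    if len(vals) < 2:
        raise ValueError("need at least two eigenvalues")
    gaps = [hi - lo for lo, hi in zip(vals, vals[1:])]
    # Index of the largest gap (first one on ties), converted to a count.
    return max(range(len(gaps)), key=gaps.__getitem__) + 1


# Hypothetical spectrum: three near-zero eigenvalues followed by a jump,
# as expected for three tightly knit groups that are weakly coupled.
spectrum = [0.0, 0.02, 0.05, 0.9, 1.1, 1.3]
print(eigengap_k(spectrum))  # → 3
```

    A large gap after the k smallest eigenvalues signals k groups with strong internal coupling and weak coupling between groups, which is why no a priori group count is needed.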

  14. A Hierarchical Allometric Scaling Analysis of Chinese Cities: 1991-2014

    CERN Document Server

    Chen, Yanguang

    2016-01-01

    The law of allometric scaling based on Zipf distributions can be employed to research hierarchies of cities in a geographical region. However, the allometric patterns are easily influenced by random disturbance from the noise in observational data. In theory, both the allometric growth law and Zipf's law are equivalent to hierarchical scaling laws associated with fractal structure. In this paper, the scaling laws of hierarchies with cascade structure are used to study Chinese cities, and the method of R/S analysis is applied to analyze the change of the allometric scaling exponents. The results show that the hierarchical scaling relations of Chinese cities became clearer and clearer from 1991 to 2014, the global allometric scaling exponent values fluctuated around 0.85, and the local scaling exponent approached 0.85. The Hurst exponent of the allometric parameter change is greater than 1/2, indicating persistence and a long-term memory of urban evolution. The main conclusions can be reached as foll...

  15. Hierarchical Fragmentation and Jet-like Outflows in IRDC G28.34+0.06, a Growing Massive Protostar Cluster

    CERN Document Server

    Wang, Ke; Wu, Yuefang; Zhang, Huawei

    2011-01-01

    We present Submillimeter Array (SMA) λ = 0.88 mm observations of an infrared dark cloud (IRDC) G28.34+0.06. Located in the quiescent southern part of the G28.34 cloud, the region of interest is a massive (>10³ M⊙) molecular clump P1 with a luminosity of ~10³ L⊙, where our previous SMA observations at 1.3 mm have revealed a string of five dust cores of 22-64 M⊙ along the 1 pc IR-dark filament. The cores are well aligned at a position angle of 48 degrees and regularly spaced at an average projected separation of 0.16 pc. The new high-resolution, high-sensitivity 0.88 mm image further resolves the five cores into ten compact condensations of 1.4-10.6 M⊙, with sizes of a few thousand AU. The spatial structure at clump (~1 pc) and core (~0.1 pc) scales indicates a hierarchical fragmentation. While the clump fragmentation is consistent with a cylindrical collapse, the observed fragment masses are much larger than the expected thermal Jeans masses. All the cores are driving CO(...

  16. tclust: An R Package for a Trimming Approach to Cluster Analysis

    Directory of Open Access Journals (Sweden)

    2012-04-01

    Full Text Available Outlying data can heavily influence standard clustering methods. At the same time, clustering principles can be useful when robustifying statistical procedures. These two reasons motivate the development of feasible robust model-based clustering approaches. With this in mind, an R package for performing non-hierarchical robust clustering, called tclust, is presented here. Instead of trying to “fit” noisy data, a proportion α of the most outlying observations is trimmed. The tclust package efficiently handles different cluster scatter constraints. Graphical exploratory tools are also provided to help the user make sensible choices for the trimming proportion as well as the number of clusters to search for.

  17. Assessment of Differential Item Functioning in Health-Related Outcomes: A Simulation and Empirical Analysis with Hierarchical Polytomous Data

    Directory of Open Access Journals (Sweden)

    Zahra Sharafi

    2017-01-01

    Full Text Available Background. The purpose of this study was to evaluate the effectiveness of two methods of detecting differential item functioning (DIF) in the presence of multilevel data and polytomously scored items. The assessment of DIF with multilevel data (e.g., patients nested within hospitals, hospitals nested within districts) from large-scale assessment programs has received considerable attention, but very few studies have evaluated the effect of the hierarchical structure of data on DIF detection for polytomously scored items. Methods. The ordinal logistic regression (OLR) and hierarchical ordinal logistic regression (HOLR) were utilized to assess DIF in simulated and real multilevel polytomous data. Six factors (DIF magnitude, grouping variable, intraclass correlation coefficient, number of clusters, number of participants per cluster, and item discrimination parameter) with a fully crossed design were considered in the simulation study. Furthermore, data of Pediatric Quality of Life Inventory™ (PedsQL™) 4.0 collected from 576 healthy school children were analyzed. Results. Overall, results indicate that both methods performed equivalently in terms of controlling Type I error and detection power rates. Conclusions. The current study showed negligible difference between OLR and HOLR in detecting DIF with polytomously scored items in a hierarchical structure. Implications and considerations while analyzing real data were also discussed.

  18. Cluster analysis of the hot subdwarfs in the PG survey

    Science.gov (United States)

    Thejll, Peter; Charache, Darryl; Shipman, Harry L.

    1989-01-01

    Application of cluster analysis to the hot subdwarfs in the Palomar Green (PG) survey of faint blue high-Galactic-latitude objects is assessed, with emphasis on data noise and the number of clusters to subdivide the data into. The data used in the study are presented, and cluster analysis, using the CLUSTAN program, is applied to it. Distances are calculated using the Euclidean formula, and clustering is done by Ward's method. The results are discussed, and five groups representing natural divisions of the subdwarfs in the PG survey are presented.
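
    The combination described above, Euclidean distances with Ward's method, can be sketched without any external library. This is a naive illustration, not the CLUSTAN implementation; the function name ward_clusters and the toy points are assumptions. At each step it merges the two clusters whose union increases the within-cluster sum of squares the least, which for centroids c_a and c_b and sizes n_a and n_b equals n_a n_b / (n_a + n_b) times the squared Euclidean distance between the centroids.

```python
def ward_clusters(points, n_clusters):
    """Naive agglomerative clustering with Ward's minimum-variance criterion.

    Returns a list of clusters, each a sorted list of point indices.
    """
    if not 1 <= n_clusters <= len(points):
        raise ValueError("n_clusters must be between 1 and the number of points")
    # Each cluster is stored as (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]

    def merge_cost(a, b):
        (ia, ca), (ib, cb) = a, b
        na, nb = len(ia), len(ib)
        sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return na * nb / (na + nb) * sq_dist

    while len(clusters) > n_clusters:
        # Find the pair whose merge is cheapest under Ward's criterion.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        (ia, ca), (ib, cb) = clusters[i], clusters[j]
        na, nb = len(ia), len(ib)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append((ia + ib, centroid))
    return [sorted(members) for members, _ in clusters]


pts = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
print(sorted(ward_clusters(pts, 3)))  # → [[0, 1], [2, 3], [4]]
```

    The pairwise search makes this sketch cubic in the number of points, so it is only meant to show the criterion; production code would normally rely on an established library routine.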

  19. Numerical analysis on mechanical behaviors of hierarchical cellular structures with negative Poisson’s ratio

    Science.gov (United States)

    Li, Dong; Yin, Jianhua; Dong, Liang; Lakes, Roderic S.

    2017-02-01

    Two-dimensional hierarchical re-entrant honeycomb structures were designed and the mechanical behaviors of the structures were studied using a finite element method. Hierarchical re-entrant structure of order n (n ≥ 1) was constructed by replacing each vertex of a lower order (n − 1) hierarchical re-entrant structure with a smaller re-entrant hexagon with identical strut aspect ratio. The Poisson’s ratio and energy absorption capacity of re-entrant structures of different hierarchical orders were studied under different compression velocities. The results showed that the Poisson’s ratio of the first and second order hierarchical structures can reach -1.36 and -1.33 with appropriate aspect ratio, 13.8% and 12.1% lower than that of the zeroth order hierarchical structure. The energy absorption capacity of the three models increased with an increasing compression velocity; the second order hierarchical structure exhibited the highest rate of increase in energy absorption capacity with an increasing compression velocity. The plateau stresses of the first and second order hierarchical structures were slightly lower than that of the zeroth order hierarchical structure; however the second order hierarchical structure exhibited the highest energy absorption capacity at high compression velocity (60 m s⁻¹).

  20. Micromechanics of hierarchical materials

    DEFF Research Database (Denmark)

    Mishnaevsky, Leon, Jr.

    2012-01-01

    A short overview of micromechanical models of hierarchical materials (hybrid composites, biomaterials, fractal materials, etc.) is given. Several examples of the modeling of strength and damage in hierarchical materials are summarized, among them, 3D FE model of hybrid composites...... with nanoengineered matrix, fiber bundle model of UD composites with hierarchically clustered fibers and 3D multilevel model of wood considered as a gradient, cellular material with layered composite cell walls. The main areas of research in micromechanics of hierarchical materials are identified, among them......, the investigations of the effects of load redistribution between reinforcing elements at different scale levels, of the possibilities to control different material properties and to ensure synergy of strengthening effects at different scale levels and using the nanoreinforcement effects. The main future directions...

  1. Investigating Subtypes of Child Development: A Comparison of Cluster Analysis and Latent Class Cluster Analysis in Typology Creation

    Science.gov (United States)

    DiStefano, Christine; Kamphaus, R. W.

    2006-01-01

    Two classification methods, latent class cluster analysis and cluster analysis, are used to identify groups of child behavioral adjustment underlying a sample of elementary school children aged 6 to 11 years. Behavioral rating information across 14 subscales was obtained from classroom teachers and used as input for analyses. Both the procedures…

  2. Two-Way Regularized Fuzzy Clustering of Multiple Correspondence Analysis.

    Science.gov (United States)

    Kim, Sunmee; Choi, Ji Yeh; Hwang, Heungsun

    2017-01-01

    Multiple correspondence analysis (MCA) is a useful tool for investigating the interrelationships among dummy-coded categorical variables. MCA has been combined with clustering methods to examine whether there exist heterogeneous subclusters of a population, which exhibit cluster-level heterogeneity. These combined approaches aim to classify either observations only (one-way clustering of MCA) or both observations and variable categories (two-way clustering of MCA). The latter approach is favored because its solutions are easier to interpret by providing explicitly which subgroup of observations is associated with which subset of variable categories. Nonetheless, the two-way approach has been built on hard classification that assumes observations and/or variable categories to belong to only one cluster. To relax this assumption, we propose two-way fuzzy clustering of MCA. Specifically, we combine MCA with fuzzy k-means simultaneously to classify a subgroup of observations and a subset of variable categories into a common cluster, while allowing both observations and variable categories to belong partially to multiple clusters. Importantly, we adopt regularized fuzzy k-means, thereby enabling us to decide the degree of fuzziness in cluster memberships automatically. We evaluate the performance of the proposed approach through the analysis of simulated and real data, in comparison with existing two-way clustering approaches.
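
    The idea of partial membership in several clusters can be illustrated with the membership update of standard fuzzy c-means. This is a sketch under assumptions: it is not the regularized fuzzy k-means of the paper (which selects the degree of fuzziness automatically), and the function name fuzzy_memberships, the fuzzifier m = 2, and the sample coordinates are illustrative. The membership of a point in cluster k is proportional to d_k^(-2/(m-1)), where d_k is its distance to center k.

```python
import math


def fuzzy_memberships(points, centers, m=2.0):
    """Standard fuzzy c-means memberships for fixed cluster centers.

    Returns one row per point; each row sums to 1 across the centers.
    """
    if m <= 1.0:
        raise ValueError("the fuzzifier m must be greater than 1")
    exponent = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # A point sitting exactly on a center belongs fully to it
            # (shared equally if several centers coincide).
            hits = dists.count(0.0)
            rows.append([1.0 / hits if d == 0.0 else 0.0 for d in dists])
            continue
        weights = [d ** -exponent for d in dists]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


centers = [(0.0, 0.0), (4.0, 0.0)]
print(fuzzy_memberships([(1.0, 0.0), (2.0, 0.0)], centers))
```

    With these inputs the first point lies three times closer to the first center and receives a membership of about 0.9 there, while the midpoint is split evenly. A hard classification would instead assign each point to exactly one cluster.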

  3. Applied Bayesian Hierarchical Methods

    CERN Document Server

    Congdon, Peter D

    2010-01-01

    Bayesian methods facilitate the analysis of complex models and data structures. Emphasizing data applications, alternative modeling specifications, and computer implementation, this book provides a practical overview of methods for Bayesian analysis of hierarchical models.

  4. Analysis of Stemming Algorithm for Text Clustering

    Directory of Open Access Journals (Sweden)

    N.Sandhya

    2011-09-01

    Full Text Available Text document clustering plays an important role in providing intuitive navigation and browsing mechanisms by organizing large amounts of information into a small number of meaningful clusters. In the bag-of-words representation of documents, the words that appear in documents often have many morphological variants, and in most cases morphological variants of words have similar semantic interpretations and can be considered equivalent for the purpose of clustering applications. For this reason, a number of stemming algorithms, or stemmers, have been developed, which attempt to reduce a word to its stem or root form. Thus, the key terms of a document are represented by stems rather than by the original words. In this work we have studied the impact of a stemming algorithm along with four popular similarity measures (Euclidean, cosine, Pearson correlation and extended Jaccard) in conjunction with different types of vector representation (boolean, term frequency, and term frequency-inverse document frequency) on cluster quality. For clustering documents we have used the partitional clustering technique k-means. Performance is measured against a human-imposed classification of the Classic data set. We conducted a number of experiments and used an entropy measure to assure statistical significance of results. Cosine, Pearson correlation and extended Jaccard similarities emerge as the best measures to capture human categorization behavior, while the Euclidean measure performs poorly. After applying the stemming algorithm, the Euclidean measure shows little improvement.
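
    The four measures compared above can be written down compactly. The sketch below uses their common textbook definitions on term-frequency vectors; the function names and the two toy vectors are illustrative assumptions, and the vectors are assumed to be non-zero and non-constant so that no denominator vanishes.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def euclidean_distance(x, y):
    """Straight-line distance; sensitive to document length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_similarity(x, y):
    """Cosine of the angle between the vectors; ignores document length."""
    return dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))


def extended_jaccard(x, y):
    """Extended Jaccard (Tanimoto) coefficient for real-valued vectors."""
    xy = dot(x, y)
    return xy / (dot(x, x) + dot(y, y) - xy)


def pearson_correlation(x, y):
    """Cosine similarity of the mean-centered vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine_similarity([a - mx for a in x], [b - my for b in y])


# Two documents with the same term proportions; the second is twice as long.
doc_a = [2, 1, 0]
doc_b = [4, 2, 0]
print(euclidean_distance(doc_a, doc_b))
print(cosine_similarity(doc_a, doc_b))
print(extended_jaccard(doc_a, doc_b))
print(pearson_correlation(doc_a, doc_b))
```

    Cosine and Pearson treat the two documents as essentially identical, whereas the Euclidean distance is dominated by the difference in length, which is one plausible reason for its weaker agreement with human categorization.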

  5. Toward optimal cluster power spectrum analysis

    CERN Document Server

    Smith, Robert E

    2014-01-01

    The power spectrum of galaxy clusters is an important probe of the cosmological model. In this paper we determine the optimal weighting scheme for maximizing the signal-to-noise ratio for such measurements. We find a closed form analytic expression for the optimal weights. Our expression takes into account: cluster mass, finite survey volume effects, survey masking, and a flux limit. The implementation of this weighting scheme requires knowledge of the measured cluster masses, and analytic models for the bias and space-density of clusters as a function of mass and redshift. Recent studies have suggested that the optimal method for reconstruction of the matter density field from a set of clusters is mass-weighting (Seljak et al 2009, Hamaus et al 2010, Cai et al 2011). We compare our optimal weighting scheme with this approach and also with the original power spectrum scheme of Feldman et al (1994). We show that our optimal weighting scheme outperforms these approaches for both volume- and flux-limited cluster...

  6. Hierarchical Cluster-based Partial Least Squares Regression (HC-PLSR) is an efficient tool for metamodelling of nonlinear dynamic models

    Directory of Open Access Journals (Sweden)

    Omholt Stig W

    2011-06-01

    Full Text Available Abstract. Background: Deterministic dynamic models of complex biological systems contain a large number of parameters and state variables, related through nonlinear differential equations with various types of feedback. A metamodel of such a dynamic model is a statistical approximation model that maps variation in parameters and initial conditions (inputs) to variation in features of the trajectories of the state variables (outputs) throughout the entire biologically relevant input space. A sufficiently accurate mapping can be exploited both instrumentally and epistemically. Multivariate regression methodology is a commonly used approach for emulating dynamic models. However, when the input-output relations are highly nonlinear or non-monotone, a standard linear regression approach is prone to give suboptimal results. We therefore hypothesised that a more accurate mapping can be obtained by locally linear or locally polynomial regression. We present here a new method for local regression modelling, Hierarchical Cluster-based PLS regression (HC-PLSR), where fuzzy C-means clustering is used to separate the data set into parts according to the structure of the response surface. We compare the metamodelling performance of HC-PLSR with polynomial partial least squares regression (PLSR) and ordinary least squares (OLS) regression on various systems: six different gene regulatory network models with various types of feedback, a deterministic mathematical model of the mammalian circadian clock and a model of the mouse ventricular myocyte function. Results: Our results indicate that multivariate regression is well suited for emulating dynamic models in systems biology. The hierarchical approach turned out to be superior to both polynomial PLSR and OLS regression in all three test cases. The advantage, in terms of explained variance and prediction accuracy, was largest in systems with highly nonlinear functional relationships and in systems with positive feedback

  7. Segmentation Algorithm for Oil Spill SAR Images Based on Hierarchical Agglomerative Clustering

    Institute of Scientific and Technical Information of China (English)

    苏腾飞; 孟俊敏; 张晰

    2013-01-01

    Image segmentation is a crucial stage in SAR oil spill detection. However, common image segmentation algorithms can hardly achieve satisfactory results because of speckle noise in SAR images, which seriously affects the accuracy of oil spill detection. For this reason, an image segmentation algorithm based on HAC (Hierarchical Agglomerative Clustering) is developed for oil spill SAR images. The method takes advantage of multi-resolution segmentation to effectively preserve the shape of oil spill patches and to reduce the formation of small, fragmented patches. An oil spill SAR image segmentation experiment was conducted using Envisat ASAR images of the Gulf of Mexico acquired in 2010, and the algorithm was compared with Canny edge detection, OTSU threshold segmentation, FCM segmentation, and level set segmentation. The results show that HAC can effectively reduce the production of small patches, which helps to improve the accuracy of SAR oil spill detection.

  8. Implementation of Hierarchical Task Analysis for User Interface Design in Drawing Application for Early Childhood Education

    Directory of Open Access Journals (Sweden)

    Mira Kania Sabariah

    2016-05-01

    Full Text Available Learning to draw in early childhood is an important activity that stimulates children's growth and development and helps train fine motor skills. Many applications can be used for learning, including interactive learning applications. Observations showed that the experiences offered by existing applications are very diverse and do not yet represent the learning model and characteristics of early childhood (4-6 years). Based on these results, the Hierarchical Task Analysis method was used to generate a list of tasks that must be addressed in designing a user interface that represents the user experience in learning to draw. A Heuristic Evaluation then showed that the usability of the model reached a very good level of understanding, and that the model can be further enhanced to produce a better one.

  9. Associations among attachment, sexuality, and marital satisfaction in adult Chilean couples: a linear hierarchical models analysis.

    Science.gov (United States)

    Heresi Milad, Eliana; Rivera Ottenberger, Diana; Huepe Artigas, David

    2014-01-01

    This study aimed to explore the associations among attachment system type, sexual satisfaction, and marital satisfaction in adult couples in stable relationships. Participants were 294 couples between the ages of 20 and 70 years who answered self-administered questionnaires. Hierarchical linear modeling revealed that the anxiety and avoidance, sexual satisfaction, and marital satisfaction dimensions were closely related. Specifically, the avoidance dimension, but not the anxiety dimension, corresponded to lower levels of sexual and marital satisfaction. Moreover, for the sexual satisfaction variable, an interaction effect was observed between the gender of the actor and avoidance of the partner, which was observed only in men. In the marital satisfaction dimension, effects were apparent only at the individual level; a positive relation was found between the number of years spent living together and greater contentment with the relationship. These results confirm the hypothetical association between attachment and sexual and marital satisfaction and demonstrate the relevance of methodologies when the unit of analysis is the couple.

  10. Automatic Contrast Enhancement of Brain MR Images Using Hierarchical Correlation Histogram Analysis.

    Science.gov (United States)

    Chen, Chiao-Min; Chen, Chih-Cheng; Wu, Ming-Chi; Horng, Gwoboa; Wu, Hsien-Chu; Hsueh, Shih-Hua; Ho, His-Yun

    Parkinson's disease is a progressive neurodegenerative disorder that has a higher probability of occurrence in middle-aged and older adults than in the young. With the use of a computer-aided diagnosis (CAD) system, abnormal cell regions can be identified, and this identification can help medical personnel to evaluate the chance of disease. This study proposes a hierarchical correlation histogram analysis based on the grayscale distribution degree of pixel intensity; by constructing a correlation histogram, it can improve adaptive contrast enhancement for specific objects. The proposed method produces significant results during contrast enhancement preprocessing and facilitates subsequent CAD processes, thereby reducing recognition time and improving accuracy. The experimental results show that the proposed method is superior to existing methods according to two quantitative image quality measures, PSNR and average gradient. Furthermore, the edge information pertaining to specific cells can effectively increase the accuracy of the results.

  11. The smart cluster method. Adaptive earthquake cluster identification and analysis in strong seismic regions

    Science.gov (United States)

    Schaefer, Andreas M.; Daniell, James E.; Wenzel, Friedemann

    2017-07-01

    Earthquake clustering is an essential part of almost any statistical analysis of spatial and temporal properties of seismic activity. The nature of earthquake clusters and subsequent declustering of earthquake catalogues play a crucial role in determining the magnitude-dependent earthquake return period and its respective spatial variation for probabilistic seismic hazard assessment. This study introduces the Smart Cluster Method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal cluster identification. It utilises the magnitude-dependent spatio-temporal earthquake density to adjust the search properties, subsequently analyses the identified clusters to determine directional variation and adjusts its search space with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010-2011 Darfield-Christchurch sequence, a reclassification procedure is applied to disassemble subsequent ruptures using near-field searches, nearest neighbour classification and temporal splitting. The method is capable of identifying and classifying earthquake clusters in space and time. It has been tested and validated using earthquake data from California and New Zealand. A total of more than 1500 clusters have been found in both regions since 1980 with M_min = 2.0. Utilising the knowledge of cluster classification, the method has been adjusted to provide an earthquake declustering algorithm, which has been compared to existing methods. Its performance is comparable to established methodologies. The analysis of earthquake clustering statistics led to various new and updated correlation functions, e.g. for ratios between mainshock and strongest aftershock and general aftershock activity metrics.

  12. The smart cluster method - Adaptive earthquake cluster identification and analysis in strong seismic regions

    Science.gov (United States)

    Schaefer, Andreas M.; Daniell, James E.; Wenzel, Friedemann

    2017-03-01

    Earthquake clustering is an essential part of almost any statistical analysis of spatial and temporal properties of seismic activity. The nature of earthquake clusters and subsequent declustering of earthquake catalogues play a crucial role in determining the magnitude-dependent earthquake return period and its respective spatial variation for probabilistic seismic hazard assessment. This study introduces the Smart Cluster Method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal cluster identification. It utilises the magnitude-dependent spatio-temporal earthquake density to adjust the search properties, subsequently analyses the identified clusters to determine directional variation and adjusts its search space with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010-2011 Darfield-Christchurch sequence, a reclassification procedure is applied to disassemble subsequent ruptures using near-field searches, nearest neighbour classification and temporal splitting. The method is capable of identifying and classifying earthquake clusters in space and time. It has been tested and validated using earthquake data from California and New Zealand. A total of more than 1500 clusters have been found in both regions since 1980 with M_min = 2.0. Utilising the knowledge of cluster classification, the method has been adjusted to provide an earthquake declustering algorithm, which has been compared to existing methods. Its performance is comparable to established methodologies. The analysis of earthquake clustering statistics led to various new and updated correlation functions, e.g. for ratios between mainshock and strongest aftershock and general aftershock activity metrics.

  13. Data Clustering

    Science.gov (United States)

    Wagstaff, Kiri L.

    2012-03-01

    particular application involves considerations of the kind of data being analyzed, algorithm runtime efficiency, and how much prior knowledge is available about the problem domain, which can dictate the nature of clusters sought. Fundamentally, the clustering method and its representations of clusters carry with them a definition of what a cluster is, and it is important that this be aligned with the analysis goals for the problem at hand. In this chapter, I emphasize this point by identifying for each algorithm the cluster representation as a model, m_j, even for algorithms that are not typically thought of as creating a “model.” This chapter surveys a basic collection of clustering methods useful to any practitioner who is interested in applying clustering to a new data set. The algorithms include k-means (Section 25.2), EM (Section 25.3), agglomerative (Section 25.4), and spectral (Section 25.5) clustering, with side mentions of variants such as kernel k-means and divisive clustering. The chapter also discusses each algorithm’s strengths and limitations and provides pointers to additional in-depth reading for each subject. Section 25.6 discusses methods for incorporating domain knowledge into the clustering process. This chapter concludes with a brief survey of interesting applications of clustering methods to astronomy data (Section 25.7). The chapter begins with k-means because it is both generally accessible and so widely used that understanding it can be considered a necessary prerequisite for further work in the field. EM can be viewed as a more sophisticated version of k-means that uses a generative model for each cluster and probabilistic item assignments. Agglomerative clustering is the most basic form of hierarchical clustering and provides a basis for further exploration of algorithms in that vein. Spectral clustering permits a departure from feature-vector-based clustering and can operate on data sets instead represented as affinity, or similarity
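
    Since this record singles out k-means as the entry point to the field, a minimal sketch of the standard Lloyd iteration may help. It is a generic textbook version, not code from the chapter; the function names, the fixed iteration count, and the hand-picked starting centers are assumptions for illustration. Each returned center plays the role of the per-cluster model m_j mentioned in the abstract.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centers, n_iter=10):
    """Lloyd's algorithm: alternate nearest-center assignment and center updates."""
    for _ in range(n_iter):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
            groups[nearest].append(p)
        # New center = coordinate-wise mean of its group; keep the old center if the group is empty.
        centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers, groups


# Hypothetical two-dimensional data with two well-separated groups.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers, groups = kmeans(points, centers=[(0, 0), (10, 10)])
```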

  14. Intelligent Hybrid Cluster Based Classification Algorithm for Social Network Analysis

    Directory of Open Access Journals (Sweden)

    S. Muthurajkumar

    2014-05-01

    In this paper, we propose a hybrid clustering-based classification algorithm, built on a mean-based approach, to mine ordered sequences (paths) from weblog data for social network analysis. In the proposed system for social pattern analysis, sequences of human activities are typically analyzed through switching behaviors, which are likely to produce overlapping clusters. A robust Modified Boosting algorithm is proposed for the hybrid clustering-based classification used to cluster the data. This work helps connect the aggregated features of network data with the traditional indices used in social network analysis. Experimental results show that the proposed algorithm improves the decision results from data clustering when combined with the proposed classification algorithm, and that it provides better classification accuracy when tested on a weblog dataset. In addition, the algorithm improves predictive performance, especially for multiclass datasets, which can increase accuracy.

  15. Hierarchical statistical analysis of complex analog and mixed-signal systems

    Science.gov (United States)

    Webb, Matthew; Tang, Hua

    2014-12-01

    With increasing process parameter variations in nanometre regime, circuits and systems encounter significant performance variations and therefore statistical analysis has become increasingly important. For complex analog and mixed-signal circuits and systems, efficient yet accurate statistical analysis has been a challenge mainly due to significant simulation and modelling time. In the past years, there have been various approaches proposed for statistical analysis of analog and mixed-signal circuits. A recent work is reported to address statistical analysis for continuous-time Delta-Sigma modulators. In this article, we generalise that method and present a hierarchical method for efficient statistical analysis of complex analog and mixed-signal circuits while maintaining reasonable accuracy. At circuit level, we use the response surface modelling method to extract quadratic models of circuit-level performance parameters in terms of process parameters. Then at system level, we use behavioural models and apply the Monte-Carlo method for statistical evaluation of system performance parameters. We illustrate and validate the method on a continuous-time Delta-Sigma modulator and an analog filter.
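
    The two-level flow described in this abstract (a quadratic response-surface model evaluated under random process variation) can be sketched roughly as follows. This is a simplified illustration only: it collapses the circuit and system levels into a single function, and the model coefficients, the single Gaussian process parameter, and the function names are hypothetical.

```python
import random
import statistics


def circuit_performance(x, a=1.0, b=2.0, c=0.5):
    """Hypothetical quadratic response-surface model of one performance parameter."""
    return a + b * x + c * x * x


def monte_carlo(model, sigma, n_samples, seed=0):
    """Sample a process parameter x ~ N(0, sigma^2) and summarize the model output."""
    rng = random.Random(seed)
    outputs = [model(rng.gauss(0.0, sigma)) for _ in range(n_samples)]
    return statistics.mean(outputs), statistics.stdev(outputs)


mean, std = monte_carlo(circuit_performance, sigma=1.0, n_samples=20000)
```

    For this particular model the exact mean is a + c·sigma² = 1.5 and the exact variance is b²·sigma² + 2c²·sigma⁴ = 4.5, so the sampled estimates can be checked against closed-form values.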

  16. Bayesian model-based cluster analysis for predicting macrofaunal communities

    NARCIS (Netherlands)

    Braak, ter C.J.F.; Hoijtink, H.; Akkermans, W.; Verdonschot, P.F.M.

    2003-01-01

    To predict macrofaunal community composition from environmental data a two-step approach is often followed: (1) the water samples are clustered into groups on the basis of the macrofauna data and (2) the groups are related to the environmental data, e.g. by discriminant analysis. For the cluster ana

  17. Multimorbidity Patterns in Elderly Primary Health Care Patients in a South Mediterranean European Region: A Cluster Analysis.

    Science.gov (United States)

    Foguet-Boreu, Quintí; Violán, Concepción; Rodriguez-Blanco, Teresa; Roso-Llorach, Albert; Pons-Vigués, Mariona; Pujol-Ribera, Enriqueta; Cossio Gil, Yolima; Valderas, Jose M

    2015-01-01

    The purpose of this study was to identify clusters of diagnoses in elderly patients with multimorbidity, attended in primary care. Cross-sectional study. 251 primary care centres in Catalonia, Spain. Individuals older than 64 years registered with participating practices. Multimorbidity, defined as the coexistence of 2 or more ICD-10 disease categories in the electronic health record. Using hierarchical cluster analysis, multimorbidity clusters were identified by sex and age group (65-79 and ≥80 years). 322,328 patients with multimorbidity were included in the analysis (mean age, 75.4 years [Standard deviation, SD: 7.4], 57.4% women; mean of 7.9 diagnoses [SD: 3.9]). For both men and women, the first cluster in both age groups included the same two diagnoses: Hypertensive diseases and Metabolic disorders. The second cluster contained three diagnoses of the musculoskeletal system in the 65- to 79-year-old group, and five diseases coincided in the ≥80 age group: varicose veins of the lower limbs, senile cataract, dorsalgia, functional intestinal disorders and shoulder lesions. The greatest overlap (54.5%) between the three most common diagnoses was observed in women aged 65-79 years. This cluster analysis of elderly primary care patients with multimorbidity, revealed a single cluster of circulatory-metabolic diseases that were the most prevalent in both age groups and sex, and a cluster of second-most prevalent diagnoses that included musculoskeletal diseases. Clusters unknown to date have been identified. The clusters identified should be considered when developing clinical guidance for this population.

  18. Cluster analysis of the factors influencing innovative development of economy in regions of Russian Federation

    Directory of Open Access Journals (Sweden)

    V. N. Yur’ev

    2017-01-01

    This article provides a statistical description aimed at identifying the factors that influence innovative development in the regions of the Russian Federation. It builds on the results of previous research [1, p. 212–218]. In the first stage, the terminology of innovation and innovative development was defined and their role in the modern economy was stated. In the next stage, the factors that may influence the volume of innovative products, activities and services were chosen. This article presents a cluster analysis of the regions conducted with three chosen methods. In the course of the research, data on the previously chosen factors were collected from the official web page of the Federal State Statistics Service and analyzed, conclusions were drawn, and at the current step a cluster analysis was additionally conducted. To analyze the sample rates and divide the regions into clusters, we used Statistica [2], a fully integrated line of analytic solutions for analysis, visualization and forecasting. As a result of the statistical analysis in Statistica, the regions were divided into clusters according to three methods: hierarchical classification, the K-means method and two-input distribution. For a more detailed analysis, linear, power and exponential equations were built for each region. Two tables were produced: (1) the Euclidean distances; (2) the regression models and the meaningful factors. The regions were thereby grouped, and conclusions and recommendations were given for each group. The results of the current research will be applicable to analysis and planning by different commercial and governmental market participants.

  19. Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale.

    Science.gov (United States)

    Emmons, Scott; Kobourov, Stephen; Gallant, Mike; Börner, Katy

    2016-01-01

    Notions of community quality underlie the clustering of networks. While studies surrounding network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is unknown. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms: Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters.
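
    Two of the stand-alone quality metrics named in this abstract, modularity and conductance, can be computed directly from an edge list. The sketch below uses the usual textbook definitions for an undirected, unweighted graph; it is not the authors' evaluation code, and the toy graph and function names are assumptions.

```python
def modularity(edges, communities):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    label = {node: idx for idx, nodes in enumerate(communities) for node in nodes}
    internal = [0] * len(communities)    # edges with both ends in community c
    degree_sum = [0] * len(communities)  # total degree of nodes in community c
    for u, v in edges:
        degree_sum[label[u]] += 1
        degree_sum[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in range(len(communities))
    )


def conductance(edges, cluster):
    """Cut edges divided by the smaller of the two volumes (sums of degrees)."""
    cluster = set(cluster)
    cut = sum((u in cluster) != (v in cluster) for u, v in edges)
    volume_in = sum((u in cluster) + (v in cluster) for u, v in edges)
    volume_out = 2 * len(edges) - volume_in
    return cut / min(volume_in, volume_out)


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
q = modularity(edges, [{0, 1, 2}, {3, 4, 5}])
phi = conductance(edges, {0, 1, 2})
```

    On this toy graph the natural split gives Q = 5/14 and a conductance of 1/7; higher modularity and lower conductance both indicate a better-separated cluster.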

  20. Accurate and Efficient Analysis of Printed Reflectarrays With Arbitrary Elements Using Higher-Order Hierarchical Legendre Basis Functions

    DEFF Research Database (Denmark)

    Zhou, Min; Jørgensen, Erik; Kim, Oleksiy S.;

    2012-01-01

    , thus providing the flexibility required in the analysis of printed reflectarrays. A comparison to DTU-ESA Facility measurements of a reference offset reflectarray shows that higher-order hierarchical Legendre basis functions produce results of the same accuracy as those obtained using singular basis...

  1. A method of spherical harmonic analysis in the geosciences via hierarchical Bayesian inference

    Science.gov (United States)

    Muir, J. B.; Tkalčić, H.

    2015-11-01

    The problem of decomposing irregular data on the sphere into a set of spherical harmonics is common in many fields of geosciences where it is necessary to build a quantitative understanding of a globally varying field. For example, in global seismology, a compressional or shear wave speed that emerges from tomographic images is used to interpret current state and composition of the mantle, and in geomagnetism, secular variation of magnetic field intensity measured at the surface is studied to better understand the changes in the Earth's core. Optimization methods are widely used for spherical harmonic analysis of irregular data, but they typically do not treat the dependence of the uncertainty estimates on the imposed regularization. This can cause significant difficulties in interpretation, especially when the best-fit model requires more variables as a result of underestimating data noise. Here, with the above limitations in mind, the problem of spherical harmonic expansion of irregular data is treated within the hierarchical Bayesian framework. The hierarchical approach significantly simplifies the problem by removing the need for regularization terms and user-supplied noise estimates. The use of the corrected Akaike Information Criterion for picking the optimal maximum degree of spherical harmonic expansion and the resulting spherical harmonic analyses are first illustrated on a noisy synthetic data set. Subsequently, the method is applied to two global data sets sensitive to the Earth's inner core and lowermost mantle, consisting of PKPab-df and PcP-P differential traveltime residuals relative to a spherically symmetric Earth model. The posterior probability distributions for each spherical harmonic coefficient are calculated via Markov Chain Monte Carlo sampling; the uncertainty obtained for the coefficients thus reflects the noise present in the real data and the imperfections in the spherical harmonic expansion.
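
    The model-selection step mentioned in this abstract, choosing the maximum spherical harmonic degree with the corrected Akaike Information Criterion, can be illustrated with the standard AICc formula. The sketch assumes that the parameter count is just the (L + 1)² expansion coefficients, which may differ from the paper's own bookkeeping, and the log-likelihood values are made up for the example.

```python
def n_coefficients(max_degree):
    """A real spherical harmonic expansion up to degree L has (L + 1)^2 coefficients."""
    return (max_degree + 1) ** 2


def aicc(log_likelihood, k, n):
    """Corrected Akaike Information Criterion for k parameters and n data points."""
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > k + 1")
    return 2 * k - 2 * log_likelihood + 2 * k * (k + 1) / (n - k - 1)


def best_degree(log_likelihoods, n):
    """Pick the maximum degree with the lowest AICc.

    log_likelihoods maps max_degree -> maximized log-likelihood (hypothetical inputs).
    """
    return min(
        log_likelihoods,
        key=lambda deg: aicc(log_likelihoods[deg], n_coefficients(deg), n),
    )


# Hypothetical fits: going from degree 2 to degree 4 barely improves the likelihood.
fits = {1: -250.0, 2: -120.0, 4: -118.0}
chosen = best_degree(fits, n=200)
```

    Here the small likelihood gain at degree 4 does not offset the penalty for 25 coefficients instead of 9, so degree 2 is selected.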

  2. Entropic Approach to Multiscale Clustering Analysis

    Directory of Open Access Journals (Sweden)

    Antonio Insolia

    2012-05-01

    Recently, a novel method has been introduced to estimate the statistical significance of clustering in the direction distribution of objects. The method involves a multiscale procedure, based on the Kullback–Leibler divergence and the Gumbel statistics of extreme values, providing high discrimination power, even in presence of strong background isotropic contamination. It is shown that the method is: (i) semi-analytical, drastically reducing computation time; (ii) very sensitive to small, medium and large scale clustering; (iii) not biased against the null hypothesis. Applications to the physics of ultra-high energy cosmic rays, as a cosmological probe, are presented and discussed.

  3. Comprehensive Evaluation of Entropy-hierarchical Grey Correlation Analysis for Highway Safety Life Protection Engineering

    Directory of Open Access Journals (Sweden)

    Jin Shuxins

    2016-01-01

    Decision-making among different highway safety life protection engineering schemes is important. Achieving the goals and selecting the optimal scheme can not only improve the function and service level of highway facilities but also reduce the traffic accidents caused by imperfect facilities. Such decision-making is a multi-target, multi-layer and multi-scheme system evaluation problem. To address the lack of concrete data in this problem, the analytic hierarchy process is combined with entropy value analysis within the grey relational comprehensive evaluation method, giving the entropy-hierarchical grey correlation analysis method. This is a qualitative and quantitative decision method that combines the comparison principle of the analytic hierarchy process (AHP) with the entropy principle of the entropy value analysis method to determine, layer by layer, the relative weights of the indexes. Grey relational analysis is then applied step by step from the low layer to the high layer between each possible scheme and the reference scheme. Finally, the comprehensive correlation degree between each possible scheme and the reference scheme is calculated, and the scheme with the maximum grey correlation degree is selected as the best plan.
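
    Two building blocks of the method described here, entropy-based index weights and grey relational grades against a reference scheme, can be sketched with their common textbook formulas. The AHP step and the layer-by-layer aggregation are omitted, and the distinguishing coefficient rho = 0.5, the function names, and the small input matrices are assumptions for illustration only.

```python
import math


def entropy_weights(matrix):
    """Entropy weight method: rows are schemes, columns are (positive) indicators."""
    n_rows = len(matrix)
    divergences = []
    for column in zip(*matrix):
        total = sum(column)
        shares = [x / total for x in column]
        entropy = -sum(p * math.log(p) for p in shares if p > 0) / math.log(n_rows)
        divergences.append(1.0 - entropy)  # more spread between schemes -> more weight
    norm = sum(divergences)
    return [d / norm for d in divergences]


def grey_relational_grades(candidates, reference, weights, rho=0.5):
    """Weighted grey relational grade of each candidate scheme against the reference."""
    deltas = [[abs(x - r) for x, r in zip(row, reference)] for row in candidates]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    return [
        sum(w * (d_min + rho * d_max) / (d + rho * d_max) for w, d in zip(weights, row))
        for row in deltas
    ]


# Hypothetical data: the first indicator is identical across schemes, so it carries no weight.
weights = entropy_weights([[1, 1], [1, 3]])
grades = grey_relational_grades([[1.0, 1.0], [0.0, 0.5]], [1.0, 1.0], [0.5, 0.5])
```

    The scheme with the largest grade is the one closest to the reference. The sketch assumes at least one indicator varies between schemes and at least one candidate differs from the reference; otherwise the normalizations would divide by zero.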

  4. Avoiding progenitor bias: The structural and mass evolution of Brightest Group and Cluster Galaxies in Hierarchical models since z~1

    CERN Document Server

    Shankar, Francesco; Rettura, Alessandro; Bouillot, Vincent; Moreno, Jorge; Licitra, Rossella; Bernardi, Mariangela; Huertas-Company, Marc; Mei, Simona; Ascaso, Begoña; Sheth, Ravi; Delaye, Lauriane; Raichoor, Anand

    2015-01-01

    The mass and structural evolution of massive galaxies is one of the hottest topics in galaxy formation. This is because it may reveal invaluable insights into the still debated evolutionary processes governing the growth and assembly of spheroids. However, direct comparison between models and observations is usually prevented by the so-called "progenitor bias", i.e., new galaxies entering the observational selection at later epochs, thus eluding a precise study of how pre-existing galaxies actually evolve in size. To limit this effect, we here gather data on high-redshift brightest group and cluster galaxies, evolve their (mean) host halo masses down to z=0 along their main progenitors, and assign as their "descendants" local SDSS central galaxies matched in host halo mass. At face value, the comparison between high redshift and local data suggests a noticeable increase in stellar mass of a factor of >2 since z~1, and of >2.5 in mean effective radius. We then compare the inferred stellar mass and size growth ...

  5. Detection of Functional Change Using Cluster Trend Analysis in Glaucoma.

    Science.gov (United States)

    Gardiner, Stuart K; Mansberger, Steven L; Demirel, Shaban

    2017-05-01

    Global analyses using mean deviation (MD) assess visual field progression, but can miss localized changes. Pointwise analyses are more sensitive to localized progression, but more variable so require confirmation. This study assessed whether cluster trend analysis, averaging information across subsets of locations, could improve progression detection. A total of 133 test-retest eyes were tested 7 to 10 times. Rates of change and P values were calculated for possible re-orderings of these series to generate global analysis ("MD worsening faster than x dB/y with P cluster analyses ("n locations [or clusters] worsening faster than x dB/y with P cluster analysis criterion, and 4.1 years (95% CI, 4.0-4.5) for the best pointwise criterion. However, for pointwise analysis, only 38% of these changes were confirmed, compared with 61% for clusters and 76% for MD. The time until 25% of eyes showed subsequently confirmed deterioration was 6.3 years (95% CI, 6.0-7.2) for global, 6.3 years (95% CI, 6.0-7.0) for pointwise, and 6.0 years (95% CI, 5.3-6.6) for cluster analyses. Although the specificity is still suboptimal, cluster trend analysis detects subsequently confirmed deterioration sooner than either global or pointwise analyses.

  6. [On National Demonstration Areas: a cluster analysis].

    Science.gov (United States)

    Mao, F; Jiang, Y Y; Dong, W L; Ji, N; Dong, J Q

    2017-04-10

    Objective: To identify the 'backward' provinces and the relatively weak areas of work in the construction of National Demonstration Areas, so as to promote communication and future planning among different regions. Methods: Cluster analysis was used to compare the development of National Demonstration Areas in different provinces, including the coverage of National Demonstration Areas and the scores for non-communicable disease (NCD) prevention and control work based on a standardized indicator system. Results: According to the results on the construction of National Demonstration Areas, all the 29 provinces and the Xinjiang Production and Construction Corps (except Tibet and Qinghai) were classified into 6 categories: Shanghai; Beijing, Zhejiang, Chongqing; Tianjin, Shandong, Guangdong and Xinjiang Production and Construction Corps; Hebei, Fujian, Hubei, Jiangsu, Liaoning, Xinjiang, Hunan and Guangxi; Shanxi, Jilin, Henan, Hainan, Sichuan, Anhui and Jiangxi; Inner Mongolia, Shaanxi, Ningxia, Guizhou, Yunnan, Gansu and Heilongjiang. Based on the scores gathered in this study, 24 items representing the achievements of the NCD prevention and control effort were classified into 4 categories: manpower, special day on NCD, information materials development, policy/strategy support, financial support, mass media, enabled environment, community fitness campaign, health promotion for children and teenagers, institutional structure and patient self-management; healthy diet, surveillance of NCD risk factors, tobacco control and community diagnosis; intervention of high-risk groups, identification of high-risk groups, reporting system on cardiovascular and cerebrovascular events, popularization of basic public health service, workplace intervention programs, construction of demonstration units and mortality surveillance; oral hygiene and tumor registration. Contents including oral hygiene, tumor registration, intervention on high-risk groups, identification of

  7. Design and Analysis Considerations for Cluster Randomized Controlled Trials That Have a Small Number of Clusters.

    Science.gov (United States)

    Deke, John

    2016-10-25

    Cluster randomized controlled trials (CRCTs) often require a large number of clusters in order to detect small effects with high probability. However, there are contexts where it may be possible to design a CRCT with a much smaller number of clusters (10 or fewer) and still detect meaningful effects. The objective is to offer recommendations for best practices in design and analysis for small CRCTs. I use simulations to examine alternative design and analysis approaches. Specifically, I examine (1) which analytic approaches control Type I errors at the desired rate, (2) which design and analytic approaches yield the most power, (3) what is the design effect of spurious correlations, and (4) examples of specific scenarios under which impacts of different sizes can be detected with high probability. I find that (1) mixed effects modeling and using Ordinary Least Squares (OLS) on data aggregated to the cluster level both control the Type I error rate, (2) randomization within blocks is always recommended, but how best to account for blocking through covariate adjustment depends on whether the precision gains offset the degrees of freedom loss, (3) power calculations can be accurate when design effects from small sample, spurious correlations are taken into account, and (4) it is very difficult to detect small effects with just four clusters, but with six or more clusters, there are realistic circumstances under which small effects can be detected with high probability. © The Author(s) 2016.
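
    One of the analytic approaches this abstract reports as controlling the Type I error rate is OLS on data aggregated to the cluster level. With a single treatment indicator and no covariates, that reduces to an equal-variance t-test on cluster means, sketched below. The outcome values are hypothetical, and blocking and covariate adjustment, which the abstract also discusses, are left out.

```python
import math
import statistics


def cluster_level_t(treatment_clusters, control_clusters):
    """Aggregate outcomes to cluster means, then compare arms with a pooled-variance t statistic."""
    t_means = [statistics.mean(c) for c in treatment_clusters]
    c_means = [statistics.mean(c) for c in control_clusters]
    n_t, n_c = len(t_means), len(c_means)
    diff = statistics.mean(t_means) - statistics.mean(c_means)
    dof = n_t + n_c - 2  # degrees of freedom come from clusters, not individuals
    pooled_var = (
        (n_t - 1) * statistics.variance(t_means) + (n_c - 1) * statistics.variance(c_means)
    ) / dof
    se = math.sqrt(pooled_var * (1 / n_t + 1 / n_c))
    return diff, diff / se, dof


# Hypothetical outcomes: four clusters in total, two per arm.
treatment = [[1, 3], [3, 5]]
control = [[0, 2], [2, 4]]
diff, t_stat, dof = cluster_level_t(treatment, control)
```

    With four clusters the test has only two degrees of freedom, which is consistent with the abstract's remark that small effects are very hard to detect at that size.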

  8. Visual verification and analysis of cluster detection for molecular dynamics.

    Science.gov (United States)

    Grottel, Sebastian; Reina, Guido; Vrabec, Jadran; Ertl, Thomas

    2007-01-01

    A current research topic in molecular thermodynamics is the condensation of vapor to liquid and the investigation of this process at the molecular level. Condensation is found in many physical phenomena, e.g. the formation of atmospheric clouds or the processes inside steam turbines, where a detailed knowledge of the dynamics of condensation processes will help to optimize energy efficiency and avoid problems with droplets of macroscopic size. The key properties of these processes are the nucleation rate and the critical cluster size. For the calculation of these properties it is essential to make use of a meaningful definition of molecular clusters, which is currently an issue that has not been completely resolved. In this paper a framework capable of interactively visualizing molecular datasets of such nucleation simulations is presented, with an emphasis on the detected molecular clusters. To check the quality of the results of the cluster detection, our framework introduces the concept of flow groups to highlight potential cluster evolution over time which is not detected by the employed algorithm. To confirm the findings of the visual analysis, we coupled the rendering view with a schematic view of the clusters' evolution. This makes it possible to rapidly assess the quality of the molecular cluster detection algorithm and to identify locations in the simulation data, in space as well as in time, where the cluster detection fails. Thus, thermodynamics researchers can eliminate weaknesses in their cluster detection algorithms. Several examples for the effective and efficient usage of our tool are presented.

  9. A Flocking Based algorithm for Document Clustering Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Cui, Xiaohui [ORNL]; Gao, Jinzhu [ORNL]; Potok, Thomas E [ORNL]

    2006-01-01

    Social animals or insects in nature often exhibit a form of emergent collective behavior known as flocking. In this paper, we present a novel Flocking based approach for document clustering analysis. Our Flocking clustering algorithm uses stochastic and heuristic principles discovered from observing bird flocks or fish schools. Unlike other partition clustering algorithms such as K-means, the Flocking based algorithm does not require initial partitional seeds. The algorithm generates a clustering of a given set of data through the embedding of the high-dimensional data items on a two-dimensional grid for easy clustering result retrieval and visualization. Inspired by the self-organized behavior of bird flocks, we represent each document object with a flock boid. The simple local rules followed by each flock boid result in the entire document flock generating complex global behaviors, which eventually result in a clustering of the documents. We evaluate the efficiency of our algorithm with both a synthetic dataset and a real document collection that includes 100 news articles collected from the Internet. Our results show that the Flocking clustering algorithm achieves better performance than the K-means and the Ant clustering algorithms for real document clustering.

  10. Differences in Pedaling Technique in Cycling: A Cluster Analysis.

    Science.gov (United States)

    Lanferdini, Fábio J; Bini, Rodrigo R; Figueiredo, Pedro; Diefenthaeler, Fernando; Mota, Carlos B; Arndt, Anton; Vaz, Marco A

    2016-10-01

    To employ cluster analysis to assess if cyclists would opt for different strategies in terms of neuromuscular patterns when pedaling at the power output of their second ventilatory threshold (POVT2) compared with cycling at their maximal power output (POMAX). Twenty athletes performed an incremental cycling test to determine their power output (POMAX and POVT2; first session), and pedal forces, muscle activation, muscle-tendon unit length, and vastus lateralis architecture (fascicle length, pennation angle, and muscle thickness) were recorded (second session) in POMAX and POVT2. Athletes were assigned to 2 clusters based on the behavior of outcome variables at POVT2 and POMAX using cluster analysis. Clusters 1 (n = 14) and 2 (n = 6) showed similar power output and oxygen uptake. Cluster 1 presented larger increases in pedal force and knee power than cluster 2, without differences for the index of effectiveness. Cluster 1 presented less variation in knee angle, muscle-tendon unit length, pennation angle, and tendon length than cluster 2. However, clusters 1 and 2 showed similar muscle thickness, fascicle length, and muscle activation. When cycling at POVT2 vs POMAX, cyclists could opt for keeping a constant knee power and pedal-force production, associated with an increase in tendon excursion and a constant fascicle length. Increases in power output lead to greater variations in knee angle, muscle-tendon unit length, tendon length, and pennation angle of vastus lateralis for a similar knee-extensor activation and smaller pedal-force changes in cyclists from cluster 2 than in cluster 1.

  11. Logistics Enterprise Evaluation Model Based On Fuzzy Clustering Analysis

    Science.gov (United States)

    Fu, Pei-hua; Yin, Hong-bo

    In this thesis, we introduce an evaluation model for logistics enterprises based on a fuzzy clustering algorithm. First, we present the evaluation index system, which contains basic information, management level, technical strength, transport capacity, informatization level, market competition and customer service. We determine the index weights according to the grades, and evaluate the integrated ability of the logistics enterprises using the fuzzy cluster analysis method. We describe the system evaluation module and the cluster analysis module in detail, including how the two modules were implemented. Finally, we give the result of the system.

  12. Cancer incidence in men: a cluster analysis of spatial patterns

    Directory of Open Access Journals (Sweden)

    D'Alò Daniela

    2008-11-01

    Background: Spatial clustering of different diseases has received much less attention than single disease mapping. Besides chance or artifact, clustering of different cancers in a given area may depend on exposure to a shared risk factor or to multiple correlated factors (e.g. cigarette smoking and obesity in a deprived area). Models developed so far to investigate co-occurrence of diseases are not well-suited for analyzing many cancers simultaneously. In this paper we propose a simple two-step exploratory method for screening clusters of different cancers in a population. Methods: Cancer incidence data were derived from the regional cancer registry of Umbria, Italy. A cluster analysis was performed on smoothed and non-smoothed standardized incidence ratios (SIRs) of the 13 most frequent cancers in males. The Besag, York and Mollié (BYM) model and Poisson kriging were used to produce smoothed SIRs. Results: Cluster analysis on non-smoothed SIRs was poorly informative, both in terms of clustering of different cancers, as only larynx and oral cavity were grouped, and in terms of characteristic patterns of cancer incidence in specific geographical areas. On the other hand, BYM and Poisson kriging gave similar results, showing that cancers of the oral cavity, larynx, esophagus, stomach and liver formed a main cluster. Lung and urinary bladder cancers clustered together but not with the cancers mentioned above. Both methods, particularly the BYM model, identified distinct geographic clusters of adjacent areas. Conclusion: As in single disease mapping, non-smoothed SIRs do not provide reliable estimates of cancer risks because of small area variability. The BYM model produces smooth risk surfaces which, when entered into a cluster analysis, identify well-defined geographical clusters of adjacent areas. It probably enhances or amplifies the signal arising from exposure of more areas (statistical units) to shared risk factors that are associated with different cancers.
In

  13. Cluster Analysis and Significance of Novel Genes Related to Molecular Classification of Glioma

    Institute of Scientific and Technical Information of China (English)

    Juxiang Chen; Yicheng Lu; Guohan Hu; Kehua Sun; Chun Luo; Meiqing Lou; Kang Ying; Yao Li

    2005-01-01

    OBJECTIVE: To screen differentially expressed genes in the development of human glioma and establish a primary molecular classification of glioma based on gene expression using cDNA microarrays. METHODS: Brain specimens were obtained from 18 patients with glioma, 10 males and 8 females, aged 14 to 62 years with an average age of 44.4. The total RNAs of these glioma specimens and of two specimens of donated brain from normal adults were extracted. BioStarH140S microarrays (including 8,347 old genes and 5,592 novel genes) were adopted and hybridized with probes prepared from the total RNAs. Differentially expressed genes between normal tissues and glioma tissues were assayed after scanning the cDNA microarrays with ScanArray4000. Northern hybridization and in situ hybridization (ISH) were used to identify functions of novel genes. The differentially expressed genes were studied with a hierarchical method, and a preliminary molecular classification of glioma was carried out. RESULTS: Among the 13,939 target genes, there were 1,200 (8.61%) differentially expressed genes, of which 395 (2.83%) were novel genes. A total of 348 genes were up-regulated and 852 genes were down-regulated in the gliomas. The results of bioinformatical analysis, Northern hybridization and ISH revealed that those novel genes were highly associated with gliomas. Multiple genes, such as the MAP gene and cytoskeleton and matrix motility genes, were relevant to classification by the hierarchical method. Molecular classification of glioma using hierarchical clustering was in accordance with pathology and suggested a molecular process of tumorigenesis and development. CONCLUSION: Multiple genes play important roles in the development of glioma. cDNA microarray technology is a powerful technique for screening differentially expressed genes between two different kinds of tissues. Further analysis of gene expression and novel genes would help in understanding the molecular mechanism of glioma.

  14. Assessment of cluster yield components by image analysis.

    Science.gov (United States)

    Diago, Maria P; Tardaguila, Javier; Aleixos, Nuria; Millan, Borja; Prats-Montalban, Jose M; Cubero, Sergio; Blasco, Jose

    2015-04-01

    Berry weight, berry number and cluster weight are key parameters for yield estimation for the wine and table grape industries. Current yield prediction methods are destructive, labour-demanding and time-consuming. In this work, a new methodology based on image analysis was developed to determine cluster yield components in a fast and inexpensive way. Clusters of seven different red varieties of grapevine (Vitis vinifera L.) were photographed under laboratory conditions and their cluster yield components manually determined after image acquisition. Two algorithms based on the Canny and the logarithmic image processing approaches were tested to find the contours of the berries in the images prior to berry detection performed by means of the Hough Transform. Results were obtained in two ways: by analysing either a single image of the cluster or using four images per cluster from different orientations. The best results (R² between 69% and 95% in berry detection and between 65% and 97% in cluster weight estimation) were achieved using four images and the Canny algorithm. The model's capability based on image analysis to predict berry weight was 84%. The new and low-cost methodology presented here enabled the assessment of cluster yield components, saving time and providing inexpensive information in comparison with current manual methods. © 2014 Society of Chemical Industry.

  15. Community detection algorithm based on hierarchical clustering under signal missing in propagating process

    Institute of Scientific and Technical Information of China (English)

    康茜; 李德玉; 王素格; 冀庆斌

    2015-01-01

    Community identification is a basic task of social network analysis, and community structure detection is a key problem in community identification. Each node in the community structure is regarded as a signal source. A hierarchical clustering community detection algorithm (SMHC) is proposed to address the problem of signal missing in the process of signal transmission. The algorithm uses degree centrality to measure the probability that a node receives a signal, which quantifies the signal missing values. After signal transmission, the topology of the network is transformed into geometric relationships among vectors. On this basis, a hierarchical clustering algorithm is used to find the community structure. To validate the SMHC algorithm, this paper compares it with the SHC, CNM, GN and Similar algorithms on three real networks: Zachary Club, American Football and Netscience. The experimental results indicate that the SMHC algorithm improves the accuracy of community detection to a certain extent.

  16. CLUSTERING ANALYSIS OF DEBRIS-FLOW STREAMS

    Institute of Scientific and Technical Information of China (English)

    Yuan-Fan TSAI; Huai-Kuang TSAI; Cheng-Yan KAO

    2004-01-01

    The Chi-Chi earthquake in 1999 caused disastrous landslides, which triggered numerous debris flows and killed hundreds of people. A critical rainfall intensity line for each debris-flow stream is studied to prevent such a disaster. However, setting rainfall lines from incomplete data is difficult, so this study considered eight critical factors to group streams, such that streams within a cluster have similar rainfall lines. A genetic algorithm is applied to group 377 debris-flow streams selected from the center of an area affected by the Chi-Chi earthquake. These streams are grouped into seven clusters with different characteristics. The results reveal that the proposed method effectively groups debris-flow streams.

  17. Combined cluster and discriminant analysis: An efficient chemometric approach in diesel fuel characterization.

    Science.gov (United States)

    Novák, Márton; Palya, Dóra; Bodai, Zsolt; Nyiri, Zoltán; Magyar, Norbert; Kovács, József; Eke, Zsuzsanna

    2017-01-01

    Combined cluster and discriminant analysis (CCDA) was studied as a chemometric tool in compound-specific isotope analysis of diesel fuels. The stable carbon isotope ratios (δ¹³C) of n-alkanes in diesel fuel can be used to characterize or differentiate diesels originating from different sources. We investigated 25 diesel fuel samples representing 20 different brands. The samples were collected from 25 different service stations in 11 European countries over a 2-year period. The n-alkane fraction of diesel fuels was separated using solid-state urea clathrate formation combined with silica gel fractionation. The stable carbon isotope ratios of C10-C24 n-alkanes were measured with gas chromatography-isotope ratio mass spectrometry (GC-IRMS) using perdeuterated n-alkanes as internal standards. In addition to the 25 samples, one additional diesel fuel was prepared and measured three times to obtain completely homogeneous samples for testing the performance of our analytical and statistical routine. Stable isotope ratio data were evaluated with hierarchical cluster analysis (HCA), principal component analysis (PCA) and CCDA. CCDA combines two multivariate data analysis methods: hierarchical cluster analysis and linear discriminant analysis (LDA). The main idea behind CCDA is to compare the goodness of preconceived groupings (based on the sample origins) with that of random groupings. In CCDA all the samples were compared pairwise. The results for the parallel sample preparations showed that the analytical procedure does not have any significant effect on the δ¹³C values of n-alkanes. The three parallels proved to be completely homogeneous with CCDA. HCA and PCA can be useful tools for examining the relationship among several samples. However, these two techniques cannot always be decisive on the origin of similar samples. The initial hypothesis that all diesel fuel samples are considered chemically unique was verified by CCDA. The main advantage of CCDA is that it gives an

  18. Detection of early glaucomatous progression with octopus cluster trend analysis.

    Science.gov (United States)

    Naghizadeh, Farzaneh; Holló, Gábor

    2014-01-01

    To compare the ability of Corrected Cluster Trend Analysis (CCTA) and Cluster Trend Analysis (CTA) with event analysis of Octopus visual field series to detect early glaucomatous progression. One eye of 15 healthy, 19 ocular hypertensive, 20 preperimetric, and 51 perimetric glaucoma (PG) patients were investigated with Octopus normal G2 test at 6-month intervals for 1.5 to 3 years. Progression was defined with significant worsening in any of the 10 Octopus clusters with CCTA, and event analysis criteria, respectively. With event analysis, 9 PG eyes showed localized progression and 1 diffuse mean defect (MD) worsening. With CCTA, progression was indicated in 1 normal, 1 ocular hypertensive, and 1 preperimetric glaucoma eyes due to vitreous floaters, and 28 PG eyes including all 9 eyes with localized progression with event analysis. The locations of CCTA progression matched those found with event analysis in all 9 cases. In 17 of the remaining 19 eyes, progressing clusters matched the locations that were suspicious but not definitive for progression with event analysis. In the eye with diffuse MD worsening, CTA found significant progression for 7 clusters. For global MD progression rate, eyes worsened with CCTA only did not differ from the stable eyes but had significantly smaller progression rates than the eyes progressed with event analysis (P=0.0002). In PG, Octopus CCTA and CTA are clinically useful to identify early progression and areas suspicious for early progression. However, in some eyes with no glaucomatous visual field damage, vitreous floaters may cause progression artifacts.

  19. Cluster Analysis of Gene Expression Data

    CERN Document Server

    Domany, E

    2002-01-01

    The expression levels of many thousands of genes can be measured simultaneously by DNA microarrays (chips). This novel experimental tool has revolutionized research in molecular biology and generated considerable excitement. A typical experiment uses a few tens of such chips, each dedicated to a single sample - such as tissue extracted from a particular tumor. The results of such an experiment contain several hundred thousand numbers, that come in the form of a table, of several thousand rows (one for each gene) and 50 - 100 columns (one for each sample). We developed a clustering methodology to mine such data. In this review I provide a very basic introduction to the subject, aimed at a physics audience with no prior knowledge of either gene expression or clustering methods. I explain what genes are, what is gene expression and how it is measured by DNA chips. Next I explain what is meant by "clustering" and how we analyze the massive amounts of data from such experiments, and present results obtained from a...

  20. Generic Approach for Hierarchical Modulation Performance Analysis: Application to DVB-SH

    CERN Document Server

    Méric, Hugo; Amiot-Bazile, Caroline; Arnal, Fabrice; Boucheret, Marie-Laure

    2011-01-01

    Broadcasting systems have to deal with channel diversity in order to offer the best rate to the users. Hierarchical modulation is a practical solution for providing several rates as a function of the channel quality. Unfortunately, the performance evaluation of such modulations requires time-consuming simulations. In this paper we propose a novel approach based on the channel capacity to avoid these simulations. The method makes it possible to study the performance, in terms of spectrum efficiency, of hierarchical and also classical modulations combined with error correcting codes. Our method is applied to the DVB-SH standard, which considers hierarchical modulation as an optional feature.

  1. Time-domain analysis of neural tracking of hierarchical linguistic structures.

    Science.gov (United States)

    Zhang, Wen; Ding, Nai

    2017-02-01

    When listening to continuous speech, cortical activity measured by MEG concurrently follows the rhythms of multiple linguistic structures, e.g., syllables, phrases, and sentences. This phenomenon was previously characterized in the frequency domain. Here, we investigate the waveform of neural activity tracking linguistic structures in the time domain and quantify the coherence of neural response phases over subjects listening to the same stimulus. These analyses are achieved by decomposing the multi-channel MEG recordings into components that maximize the correlation between neural response waveforms across listeners. Each MEG component can be viewed as the recording from a virtual sensor that is spatially tuned to a cortical network showing coherent neural activity over subjects. This analysis reveals information not available from previous frequency-domain analysis of MEG global field power: First, concurrent neural tracking of hierarchical linguistic structures emerges at the beginning of the stimulus, rather than slowly building up after repetitions of the same sentential structure. Second, neural tracking of the sentential structure is reflected by slow neural fluctuations, rather than, e.g., a series of short-lasting transient responses at sentential boundaries. Lastly and most importantly, it shows that the MEG responses tracking the syllabic rhythm are spatially separable from the MEG responses tracking the sentential and phrasal rhythms.

  2. Improving water quality assessments through a hierarchical Bayesian analysis of variability.

    Science.gov (United States)

    Gronewold, Andrew D; Borsuk, Mark E

    2010-10-15

    Water quality measurement error and variability, while well-documented in laboratory-scale studies, is rarely acknowledged or explicitly resolved in most model-based water body assessments, including those conducted in compliance with the United States Environmental Protection Agency (USEPA) Total Maximum Daily Load (TMDL) program. Consequently, proposed pollutant loading reductions in TMDLs and similar water quality management programs may be biased, resulting in either slower-than-expected rates of water quality restoration and designated use reinstatement or, in some cases, overly conservative management decisions. To address this problem, we present a hierarchical Bayesian approach for relating actual in situ or model-predicted pollutant concentrations to multiple sampling and analysis procedures, each with distinct sources of variability. We apply this method to recently approved TMDLs to investigate whether appropriate accounting for measurement error and variability will lead to different management decisions. We find that required pollutant loading reductions may in fact vary depending not only on how measurement variability is addressed but also on which water quality analysis procedure is used to assess standard compliance. As a general strategy, our Bayesian approach to quantifying variability may represent an alternative to the common practice of addressing all forms of uncertainty through an arbitrary margin of safety (MOS).

  3. Comparative analysis of genomic signal processing for microarray data clustering.

    Science.gov (United States)

    Istepanian, Robert S H; Sungoor, Ala; Nebel, Jean-Christophe

    2011-12-01

    Genomic signal processing is a new area of research that combines advanced digital signal processing methodologies for enhanced genetic data analysis. It has many promising applications in bioinformatics and next generation of healthcare systems, in particular, in the field of microarray data clustering. In this paper we present a comparative performance analysis of enhanced digital spectral analysis methods for robust clustering of gene expression across multiple microarray data samples. Three digital signal processing methods: linear predictive coding, wavelet decomposition, and fractal dimension are studied to provide a comparative evaluation of the clustering performance of these methods on several microarray datasets. The results of this study show that the fractal approach provides the best clustering accuracy compared to other digital signal processing and well known statistical methods.

  4. Using cluster analysis to organize and explore regional GPS velocities

    Science.gov (United States)

    Simpson, Robert W.; Thatcher, Wayne; Savage, James C.

    2012-01-01

    Cluster analysis offers a simple visual exploratory tool for the initial investigation of regional Global Positioning System (GPS) velocity observations, which are providing increasingly precise mappings of actively deforming continental lithosphere. The deformation fields from dense regional GPS networks can often be concisely described in terms of relatively coherent blocks bounded by active faults, although the choice of blocks, their number and size, can be subjective and is often guided by the distribution of known faults. To illustrate our method, we apply cluster analysis to GPS velocities from the San Francisco Bay Region, California, to search for spatially coherent patterns of deformation, including evidence of block-like behavior. The clustering process identifies four robust groupings of velocities that we identify with four crustal blocks. Although the analysis uses no prior geologic information other than the GPS velocities, the cluster/block boundaries track three major faults, both locked and creeping.
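    The idea of grouping stations by velocity alone can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say which clustering method was used, so this example assumes average-linkage agglomerative clustering, and the station velocities below are made-up values (east, north components in mm/yr) chosen only to show the mechanics.

```python
import math


def agglomerate(points, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Mean pairwise Euclidean distance between members of two clusters.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]   # merge the closest pair
        del clusters[y]
    return [sorted(c) for c in clusters]


# Hypothetical station velocities (east, north) in mm/yr; not real data.
velocities = [(-20.1, 25.0), (-19.8, 24.6), (-20.4, 25.3),
              (-12.0, 15.2), (-11.7, 14.9),
              (-5.1, 6.0), (-4.8, 6.3)]
blocks = agglomerate(velocities, n_clusters=3)
```

    Here only the velocities enter the clustering, mirroring the abstract's point that no prior geologic information is used; station coordinates would be needed afterwards to map each group onto a candidate crustal block.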

  5. Discovering hierarchical structure in normal relational data

    DEFF Research Database (Denmark)

    Schmidt, Mikkel Nørgaard; Herlau, Tue; Mørup, Morten

    2014-01-01

    Hierarchical clustering is a widely used tool for structuring and visualizing complex data using similarity. Traditionally, hierarchical clustering is based on local heuristics that do not explicitly provide assessment of the statistical saliency of the extracted hierarchy. We propose a non-param...

  6. Cluster Analysis of vents in monogenetic volcanic fields, Lunar Crater Volcanic Field (Nevada)

    Science.gov (United States)

    Tadini, A.; Cortes, J. A.; Valentine, G. A.; Johnson, P. J.; Tibaldi, A.; Bonali, F. L.

    2012-12-01

    Monogenetic volcanic fields pose a serious risk to human activities and settlements due to their high occurrence around the world and because of the type of eruptive activity that they exhibit. Adequate tools for volcanic hazard assessment in volcanic fields, especially from a spatial point of view, are of key importance for mitigating such hazard. Among these tools, a better understanding of the spatial distribution of cones and vents and of any structural/tectonic relationships is essential to understand the plumbing system of the field and thus help to predict the likely location of future eruptions. In this study we have developed a spatial methodology, which combines various methodologies developed for volcanic textures and other clustering goals [1,2], to study the clustering of volcanic vents and their relation to structural features from satellite images. The methodology first involves the statistical identification and removal of spatial outliers using a predictive elliptical area [2] and the generation of randomly distributed points in the same predictive area. A comparison of the Near Neighbor Distance (NND) between the generated data and the data measured in a volcanic field is used to determine whether the vents are clustered or not. If the vents are clustered, a combination of hierarchical clustering and K-means [3] is then used to identify the clusters and their related vents. Results are then further constrained with the study of lineaments and other structural features that can be affected by and related to the clusters. The methodology was tested in the Lunar Crater Volcanic Field, Nevada (USA), and has successfully helped to distinguish tectonically controlled lineaments from those that result from geomorphological processes such as the drainage control imposed by the cone clusters. 
Theoretical approaches have been developed before to constrain the plumbing of a volcanic field [4], however these
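    The nearest-neighbour comparison described in this abstract can be sketched as follows. This is a simplified illustration rather than the authors' procedure: random points are drawn from the bounding box of the vents instead of the predictive elliptical area, outlier removal is omitted, and the vent coordinates are invented for the example. The function names `mean_nnd` and `nnd_ratio` are hypothetical.

```python
import math
import random


def mean_nnd(points):
    """Mean distance from each point to its nearest neighbour."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)


def nnd_ratio(points, n_random_sets=200, seed=0):
    """Observed mean NND divided by the mean NND of uniformly random point sets.

    Assumption: random points are drawn from the bounding box of the data.
    A ratio well below 1 suggests clustering.
    """
    rng = random.Random(seed)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    random_means = []
    for _ in range(n_random_sets):
        sample = [(rng.uniform(x0, x1), rng.uniform(y0, y1)) for _ in points]
        random_means.append(mean_nnd(sample))
    return mean_nnd(points) / (sum(random_means) / len(random_means))


# Hypothetical vent coordinates (km): two tight groups far apart.
vents = ([(0.1 * i, 0.05 * i) for i in range(10)]
         + [(10 + 0.1 * i, 10 - 0.05 * i) for i in range(10)])
ratio = nnd_ratio(vents)
```

    Only when the ratio indicates clustering would the subsequent cluster-identification step (hierarchical clustering and K-means in the abstract) be applied.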

  7. Assessment of anaesthetic depth by clustering analysis and autoregressive modelling of electroencephalograms

    DEFF Research Database (Denmark)

    Thomsen, C E; Rosenfalck, A; Nørregaard Christensen, K

    1991-01-01

    The brain activity electroencephalogram (EEG) was recorded from 30 healthy women scheduled for hysterectomy. The patients were anaesthetized with isoflurane, halothane or etomidate/fentanyl. A multiparametric method was used for extraction of amplitude and frequency information from the EEG. The method applied autoregressive modelling of the signal, segmented in 2 s fixed intervals. The features from the EEG segments were used for learning and for classification. The learning process was unsupervised and hierarchical clustering analysis was used to construct a learning set of EEG amplitude-frequency patterns for each of the three anaesthetic drugs. These EEG patterns were assigned to a colour code corresponding to similar clinical states. A common learning set could be used for all patients anaesthetized with the same drug. The classification process could be performed on-line and the results were...
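    The feature-extraction step (autoregressive modelling of 2 s segments) can be sketched as below. The abstract does not state the model order or the sampling rate, so this example assumes an order-2 model fitted with the Yule-Walker equations and a hypothetical 128 Hz sampling rate; the test signal is synthetic, not EEG. The resulting per-segment coefficients are the kind of feature vectors that could then be passed to a hierarchical clustering step.

```python
import random


def segment(signal, fs, seconds=2.0):
    """Split a signal into non-overlapping fixed-length segments."""
    n = int(fs * seconds)
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]


def autocov(x, lag):
    """Biased sample autocovariance at the given lag."""
    mean = sum(x) / len(x)
    return sum((x[t] - mean) * (x[t + lag] - mean)
               for t in range(len(x) - lag)) / len(x)


def ar2_coefficients(x):
    """Yule-Walker estimate of (a1, a2) in x[t] = a1*x[t-1] + a2*x[t-2] + e[t]."""
    r0, r1, r2 = (autocov(x, k) for k in range(3))
    det = r0 * r0 - r1 * r1
    a1 = r1 * (r0 - r2) / det
    a2 = (r0 * r2 - r1 * r1) / det
    return a1, a2


# Synthetic stand-in for an EEG trace: a stationary AR(2) process.
FS = 128                       # assumed sampling rate in Hz
rng = random.Random(1)
eeg_like = [0.0, 0.0]
while len(eeg_like) < FS * 2 * 50:
    eeg_like.append(1.2 * eeg_like[-1] - 0.5 * eeg_like[-2] + rng.gauss(0.0, 1.0))

segments = segment(eeg_like, FS)                       # 2 s segments
features = [ar2_coefficients(s) for s in segments]     # one feature vector per segment
```

    Estimates from individual 2 s segments are noisy; fitting the whole synthetic trace with `ar2_coefficients(eeg_like)` should recover values close to the generating coefficients 1.2 and -0.5.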

  8. Discovery of Overlapping and Hierarchical Communities Based on Extended Link Cluster Sequence

    Institute of Scientific and Technical Information of China (English)

    郭红; 黄佳鑫; 郭昆

    2015-01-01

    The mining and discovery of overlapping and hierarchical communities is a hot topic in social network research. First, an algorithm for the discovery of link communities based on the extended link cluster sequence (DLC ECS) is proposed to detect overlapping and hierarchical communities in social networks efficiently. Based on the extended link cluster sequence, which contains the community structures corresponding to all possible density parameters, the globally optimal density is searched for and the optimal link community structure is detected. The link communities are then transformed into node communities, so that the overlapping communities can be found. Second, hierarchical link community extraction based on the extended link cluster sequence (HLCE ECS) is designed. This algorithm quickly finds hierarchical link communities in the extended link cluster sequence. The link communities are transformed into node communities to find communities that are both overlapping and hierarchical. Experimental results on artificial and real-world datasets demonstrate that the DLC ECS algorithm significantly improves community quality and that the HLCE ECS algorithm effectively discovers meaningful hierarchical communities.

  9. Fabrication of micro/nano hierarchical structures with analysis on the surface mechanics

    Science.gov (United States)

    Jheng, Yu-Sheng; Lee, Yeeu-Chang

    2016-10-01

    Biomimicry refers to the imitation of mechanisms and features found in living creatures using artificial methods. This study used optical lithography, colloidal lithography, and dry etching to mimic the micro/nano hierarchical structures covering the soles of gecko feet. We measured the static contact angle and contact angle hysteresis to reveal the behavior of liquid drops on the hierarchical structures. Pulling tests were also performed to measure the resistance of movement between the hierarchical structures and a testing plate. Our results reveal that hierarchical structures at the micro-/nano-scale are considerably hydrophobic, they provide good flow characteristics, and they generate more contact force than do surfaces with micro-scale cylindrical structures.

  10. Hierarchical modeling and inference in ecology: the analysis of data from populations, metapopulations and communities

    National Research Council Canada - National Science Library

    Royle, J. Andrew; Dorazio, Robert M

    2008-01-01

    "This book describes a general and flexible framework for modeling and inference in ecological systems based on hierarchical modeling in which a strict focus on probability models and parametric inference is adopted...

  11. Nonresident Undergraduates' Performance in English Writing Classes-Hierarchical Linear Modeling Analysis

    National Research Council Canada - National Science Library

    Allison A Vaughn; Matthew Bergman; Barry Fass-Holmes

    2015-01-01

    ...) in the fall term of the five most recent academic years. Hierarchical linear modeling analyses showed that the predictors with the largest effect sizes were English writing programs and class level...

  12. A Distributed Flocking Approach for Information Stream Clustering Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Cui, Xiaohui [ORNL; Potok, Thomas E [ORNL

    2006-01-01

    Intelligence analysts are currently overwhelmed by the volume of information streams generated every day. There is a lack of comprehensive tools that can analyze information streams in real time. Document clustering analysis plays an important role in improving the accuracy of information retrieval. However, most clustering technologies can only be applied to static document collections because they normally require large amounts of computational resources and a long time to get accurate results. It is very difficult to cluster dynamically changing text information streams on an individual computer. Our earlier research resulted in a dynamic reactive flock clustering algorithm that can continually refine the clustering result and quickly react to changes in document contents. This characteristic makes the algorithm suitable for cluster analysis of dynamically changing document information, such as text information streams. Because of the decentralized character of this algorithm, a distributed approach is a very natural way to increase its clustering speed. In this paper, we present a distributed multi-agent flocking approach for text information stream clustering and discuss the decentralized architectures and communication schemes for load balancing and status information synchronization in this approach.

  13. Modeling the deformation behavior of nanocrystalline alloy with hierarchical microstructures

    Energy Technology Data Exchange (ETDEWEB)

    Liu, Hongxi; Zhou, Jianqiu, E-mail: zhouj@njtech.edu.cn [Nanjing Tech University, Department of Mechanical Engineering (China); Zhao, Yonghao, E-mail: yhzhao@njust.edu.cn [Nanjing University of Science and Technology, Nanostructural Materials Research Center, School of Materials Science and Engineering (China)

    2016-02-15

    A mechanism-based plasticity model based on dislocation theory is developed to describe the mechanical behavior of hierarchical nanocrystalline alloys. The stress–strain relationship is derived by invoking the impeding effect of the intra-granular solute clusters and the inter-granular nanostructures on the dislocation movements along the sliding path. We found that the interaction between dislocations and the hierarchical microstructures contributes to the strain hardening property and greatly influences the ductility of nanocrystalline metals. The analysis indicates that the proposed model can successfully describe the enhanced strength of the nanocrystalline hierarchical alloy. Moreover, the strain hardening rate is sensitive to the volume fraction of the hierarchical microstructures. The present model provides a new perspective for designing microstructures that optimize the mechanical properties of nanostructured metals.

  14. Extreme Rainfall Analysis using Bayesian Hierarchical Modeling in the Willamette River Basin, Oregon

    Science.gov (United States)

    Love, C. A.; Skahill, B. E.; AghaKouchak, A.; Karlovits, G. S.; England, J. F.; Duren, A. M.

    2016-12-01

    We present preliminary results of ongoing research directed at evaluating the worth of including various covariate data to support extreme rainfall analysis in the Willamette River basin using Bayesian hierarchical modeling (BHM). We also compare the BHM derived extreme rainfall estimates with their respective counterparts obtained from a traditional regional frequency analysis (RFA) using the same set of rain gage extreme rainfall data. The U.S. Army Corps of Engineers (USACE) Portland District operates thirteen dams in the 11,478 square mile Willamette River basin (WRB) located in northwestern Oregon, a major tributary of the Columbia River whose 187 miles long main stem, the Willamette River, flows northward between the Coastal and Cascade Ranges. The WRB contains approximately two-thirds of Oregon's population and 20 of the 25 most populous cities in the state. Extreme rainfall estimates are required to support risk-informed hydrologic analyses for these projects as part of the USACE Dam Safety Program. We analyze daily annual rainfall maxima data for the WRB utilizing the spatial BHM R package "spatial.gev.bma", which has been shown to be efficient in developing coherent maps of extreme rainfall by return level. Our intent is to profile for the USACE an alternate methodology to a RFA which was developed in 2008 due to the lack of an official NOAA Atlas 14 update for the state of Oregon. Unlike RFA, the advantage of a BHM-based analysis of hydrometeorological extremes is its ability to account for non-stationarity while providing robust estimates of uncertainty. BHM also allows for the inclusion of geographical and climatological factors which we show for the WRB influence regional rainfall extremes. Moreover, the Bayesian framework permits one to combine additional data types into the analysis; for example, information derived via elicitation and causal information expansion data, both being additional opportunities for future related research.
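
    The "return level" in this abstract is the quantile of the generalized extreme value (GEV) distribution that is exceeded on average once per return period. A minimal sketch of that relationship follows, assuming annual maxima and the standard GEV parameterization (location mu, scale sigma, shape xi). It is not the "spatial.gev.bma" package, and the parameter values are made up for illustration.

```python
import math


def gev_return_level(mu, sigma, xi, return_period):
    """Level exceeded on average once every `return_period` years."""
    if sigma <= 0 or return_period <= 1:
        raise ValueError("need sigma > 0 and return_period > 1")
    y = -math.log(1.0 - 1.0 / return_period)
    if abs(xi) < 1e-12:
        # Gumbel limit of the GEV as the shape parameter goes to zero.
        return mu - sigma * math.log(y)
    return mu + (sigma / xi) * (y ** (-xi) - 1.0)


def gev_cdf(x, mu, sigma, xi):
    """GEV cumulative distribution function (for x inside the support)."""
    if abs(xi) < 1e-12:
        return math.exp(-math.exp(-(x - mu) / sigma))
    t = 1.0 + xi * (x - mu) / sigma
    return math.exp(-t ** (-1.0 / xi))


# Hypothetical parameters for daily rainfall annual maxima, in millimetres.
level_100 = gev_return_level(mu=50.0, sigma=10.0, xi=0.1, return_period=100)
```

    In a hierarchical model the three parameters would vary by location and carry posterior uncertainty, so a return-level map would apply this function to each posterior draw at each grid cell.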

  15. Hierarchical sliding mode control for under-actuated cranes design, analysis and simulation

    CERN Document Server

    Qian, Dianwei

    2015-01-01

    This book reports on the latest developments in sliding mode overhead crane control, presenting novel research ideas and findings on sliding mode control (SMC), hierarchical SMC and compensator design-based hierarchical sliding mode. The results, which were previously scattered across various journals and conference proceedings, are now presented in a systematic and unified form. The book will be of interest to researchers, engineers and graduate students in control engineering and mechanical engineering who want to learn the methods and applications of SMC.

  16. NOVEL CONTEXT-AWARE CLUSTERING WITH HIERARCHICAL ADDRESSING (CCHA) FOR THE INTERNET OF THINGS (IoT)

    DEFF Research Database (Denmark)

    Mahalle, Parikshit N.; Prasad, Neeli R.; Prasad, Ramjee

    2013-01-01

    As computing technology becomes more tightly coupled into the dynamic and mobile world of the Internet of Things (IoT), security mechanisms become more stringent, less flexible and intrusive. Scalability issues in the IoT make Identity Management (IdM) of ubiquitous things more challenging. Forming ad-hoc networks and interaction between these nomadic devices to provide seamless service extend the need for new identities for the things, addressing and IdM in the IoT. New identities and an identifier format to alleviate the performance issue are introduced in this paper. This paper presents novel Context.... Furthermore, this paper also presents the framework for IdM in the IoT and a mathematical model for queuing analysis.

  17. A comparison of visual search strategies of elite and non-elite tennis players through cluster analysis.

    Science.gov (United States)

    Murray, Nicholas P; Hunfalvay, Melissa

    2017-02-01

    Considerable research has documented that successful performance in interceptive tasks (such as return of serve in tennis) is based on the performers' capability to capture appropriate anticipatory information prior to the flight path of the approaching object. Athletes of higher skill tend to fixate on different locations in the playing environment prior to initiation of a skill than their lesser skilled counterparts. The purpose of this study was to examine visual search behaviour strategies of elite (world ranked) tennis players and non-ranked competitive tennis players (n = 43) utilising cluster analysis. The results of hierarchical (Ward's method) and nonhierarchical (k-means) cluster analyses revealed three different clusters. The clustering method distinguished visual behaviour of high-, middle- and low-ranked players. Specifically, high-ranked players demonstrated longer mean fixation duration and lower variation of visual search than middle- and low-ranked players. In conclusion, the results demonstrated that cluster analysis is a useful tool for detecting and analysing the areas of interest for use in experimental analysis of expertise and to distinguish visual search variables among participants.
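
    Ward's method, named in this abstract, merges at each step the two clusters whose union increases the within-cluster sum of squares the least. The sketch below is a naive pure-Python version for small data sets. The two features (scaled mean fixation duration and variability of visual search) and all values are hypothetical, not the study's data.

```python
def ward_clusters(points, k):
    """Agglomerate `points` (equal-length tuples) into k clusters using Ward's criterion."""
    clusters = [[p] for p in points]

    def centroid(cluster):
        return [sum(col) / len(cluster) for col in zip(*cluster)]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares if a and b are merged.
        ca, cb = centroid(a), centroid(b)
        d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * d2

    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical (fixation duration, search variability) pairs, both scaled to 0..1.
players = [(0.2, 0.9), (0.25, 0.85), (0.5, 0.5), (0.55, 0.45), (0.9, 0.1), (0.85, 0.15)]
groups = ward_clusters(players, 3)
```

    This version recomputes every pairwise cost at each step, which is fine for a few dozen observations but far slower than a library implementation on large data.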

  18. Cluster analysis of clinical data identifies fibromyalgia subgroups.

    Directory of Open Access Journals (Sweden)

    Elisa Docampo

    INTRODUCTION: Fibromyalgia (FM) is mainly characterized by widespread pain and multiple accompanying symptoms, which hinder FM assessment and management. In order to reduce FM heterogeneity we classified clinical data into simplified dimensions that were used to define FM subgroups. MATERIAL AND METHODS: 48 variables were evaluated in 1,446 Spanish FM cases fulfilling 1990 ACR FM criteria. A partitioning analysis was performed to find groups of variables similar to each other. Similarities between variables were identified and the variables were grouped into dimensions. This was performed in a subset of 559 patients, and cross-validated in the remaining 887 patients. For each sample and dimension, a composite index was obtained based on the weights of the variables included in the dimension. Finally, a clustering procedure was applied to the indexes, resulting in FM subgroups. RESULTS: Variables clustered into three independent dimensions: "symptomatology", "comorbidities" and "clinical scales". Only the first two dimensions were considered for the construction of FM subgroups. Resulting scores classified FM samples into three subgroups: low symptomatology and comorbidities (Cluster 1), high symptomatology and comorbidities (Cluster 2), and high symptomatology but low comorbidities (Cluster 3), showing differences in measures of disease severity. CONCLUSIONS: We have identified three subgroups of FM samples in a large cohort of FM by clustering clinical data. Our analysis stresses the importance of family and personal history of FM comorbidities. Also, the resulting patient clusters could indicate different forms of the disease, relevant to future research, and might have an impact on clinical assessment.

  19. Hierarchical clustering of genetic diversity associated to different levels of mutation and recombination in Escherichia coli: a study based on Mexican isolates.

    Science.gov (United States)

    González-González, Andrea; Sánchez-Reyes, Luna L; Delgado Sapien, Gabriela; Eguiarte, Luis E; Souza, Valeria

    2013-01-01

    Escherichia coli occur as either free-living microorganisms, or within the colons of mammals and birds as pathogenic or commensal bacteria. Although the Mexican population of intestinal E. coli maintains high levels of genetic diversity, the exact mechanisms by which this occurs remain unknown. We therefore investigated the role of homologous recombination and point mutation in the genetic diversification and population structure of Mexican strains of E. coli. This was explored using a multilocus sequence typing (MLST) approach in a non-outbreak related, host-wide sample of 128 isolates. Overall, genetic diversification in this sample appears to be driven primarily by homologous recombination, and to a lesser extent, by point mutation. Since genetic diversity is hierarchically organized according to the MLST genealogy, we observed that there is not a homogeneous recombination rate, but that different rates emerge at different clustering levels such as phylogenetic group, lineage and clonal complex (CC). Moreover, we detected a clear signature of substructure among the A+B1 phylogenetic group, where the majority of isolates were differentiated into four discrete lineages. The substructure pattern is revealed by the presence of several CCs associated with a particular life style and host as well as with different genetic diversification mechanisms. We propose these findings as an alternative explanation for the maintenance of the clear phylogenetic signal of this species despite the prevalence of homologous recombination. Finally, we corroborate the use of both phylogenetic and population genetic approaches as an effective means to establish epidemiological surveillance tailored to the ecological specificities of each geographic region.

  20. Interpolation centers' selection using hierarchical curvature-based clustering

    Directory of Open Access Journals (Sweden)

    Juan C. Rodríguez

    2010-07-01

    It is widely known that some fields related to graphics applications require realistic, highly detailed three-dimensional models. Technologies for this kind of application are well developed. However, in some cases laser scanners produce complex models composed of millions of points, making them computationally intractable. In these cases, it is desirable to obtain a reduced set of these samples from which to reconstruct the function's surface. Finding an appropriate reduction approach that balances the loss of accuracy in the reconstructed function against the computational cost is a non-trivial problem. In this article, we present a hierarchical agglomerative clustering method that selects centers using the geometry, distribution and curvature estimation of the samples in 3D space.

  1. Determinants of early cognitive development: hierarchical analysis of a longitudinal study.

    Science.gov (United States)

    Marques dos Santos, Letícia; Neves dos Santos, Darci; Bastos, Ana Cecília Sousa; Assis, Ana Marlúcia Oliveira; Prado, Matildes Silva; Barreto, Mauricio L

    2008-02-01

    The study describes the relationship between anthropometric status, socioeconomic conditions, and quality of home environment and child cognitive development in 320 children from 20 to 42 months of age, randomly selected from 20,000 households that represent the range of socioeconomic and environmental conditions in Salvador, Bahia, Northeast Brazil. The inclusion criterion was to be less than 42 months of age between January and July 1999. Child cognitive development was assessed using the Bayley Scales for Infant Development, and the Home Observation for Measurement of the Environment Inventory (HOME) was applied to assess quality of home environment. Anthropometric status was measured using the indicators weight/age and height/age ratios (z-scores), and socioeconomic data were collected through a standard questionnaire. Statistical analysis was conducted through univariate and hierarchical linear regression. Socioeconomic factors were found to have an indirect impact on early cognitive development mediated by the child's proximal environment factors, such as appropriate play materials and games available and school attendance. No independent association was seen between nutritional status and early cognitive development.

  2. Hierarchical Bayesian analysis of censored microbiological contamination data for use in risk assessment and mitigation.

    Science.gov (United States)

    Busschaert, P; Geeraerd, A H; Uyttendaele, M; Van Impe, J F

    2011-06-01

    Microbiological contamination data are often censored because of the presence of non-detects or because measurement outcomes are known only to be smaller than, greater than, or between certain boundary values imposed by the laboratory procedures. Therefore, it is not straightforward to fit distributions that summarize contamination data for use in quantitative microbiological risk assessment, especially when variability and uncertainty are to be characterized separately. In this paper, distributions are fit using Bayesian analysis, and results are compared to results obtained with a methodology based on maximum likelihood estimation and the non-parametric bootstrap method. The Bayesian model is also extended hierarchically to estimate the effects of the individual elements of a covariate such as, for example, on a national level, the food processing company where the analyzed food samples were processed, or, on an international level, the geographical origin of contamination data. Including this extra information allows a risk assessor to differentiate between several scenarios and increase the specificity of the estimate of risk of illness, or to compare different scenarios to each other. Furthermore, inference is made on the predictive importance of several different covariates while taking into account uncertainty, making it possible to indicate which covariates are influential factors determining contamination.
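
    The three kinds of censoring described in this abstract enter a likelihood in different ways: an exact measurement contributes a density value, while a censored one contributes the probability mass of its interval. The sketch below shows that log-likelihood under an assumed normal distribution for log10 concentrations. The distribution choice, the (lower, upper) encoding and the data are illustrative assumptions, not the paper's model.

```python
import math


def norm_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def norm_logpdf(x, mu, sigma):
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def censored_loglik(observations, mu, sigma):
    """Log-likelihood of (lower, upper) bounds on the log10 scale.

    None means unbounded on that side; lower == upper marks an exact value.
    """
    total = 0.0
    for lower, upper in observations:
        if lower is not None and lower == upper:
            total += norm_logpdf(lower, mu, sigma)       # exact measurement
        else:
            hi = 1.0 if upper is None else norm_cdf(upper, mu, sigma)
            lo = 0.0 if lower is None else norm_cdf(lower, mu, sigma)
            total += math.log(hi - lo)                   # censored: interval mass
    return total


# Hypothetical data: a non-detect below 1.0, an interval, an exact count, and a
# result above the upper counting limit of 3.0 (all in log10 CFU/g).
data = [(None, 1.0), (1.0, 2.0), (2.5, 2.5), (3.0, None)]
ll_near = censored_loglik(data, mu=2.0, sigma=1.0)
ll_far = censored_loglik(data, mu=4.0, sigma=1.0)
```

    Maximum likelihood would maximize this function over mu and sigma, whereas a Bayesian fit would combine the same likelihood with priors. For parameter values far from the data, the interval mass can underflow to zero, so a production version would need a numerically safer tail computation.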

  3. Asian pollution climatically modulates mid-latitude cyclones following hierarchical modelling and observational analysis.

    Science.gov (United States)

    Wang, Yuan; Zhang, Renyi; Saravanan, R

    2014-01-01

    Increasing levels of anthropogenic aerosols in Asia have raised considerable concern regarding their potential impact on the global atmosphere, but the magnitude of the associated climate forcing remains to be quantified. Here, using a novel hierarchical modelling approach and observational analysis, we demonstrate modulated mid-latitude cyclones by Asian pollution over the past three decades. Regional and seasonal simulations using a cloud-resolving model show that Asian pollution invigorates winter cyclones over the northwest Pacific, increasing precipitation by 7% and net cloud radiative forcing by 1.0 W m⁻² at the top of the atmosphere and by 1.7 W m⁻² at the Earth's surface. A global climate model incorporating the diabatic heating anomalies from Asian pollution produces a 9% enhanced transient eddy meridional heat flux and reconciles a decadal variation of mid-latitude cyclones derived from the Reanalysis data. Our results unambiguously reveal a large impact of the Asian pollutant outflows on the global general circulation and climate.

  4. Hierarchical adaptation scheme for multiagent data fusion and resource management in situation analysis

    Science.gov (United States)

    Benaskeur, Abder R.; Roy, Jean

    2001-08-01

    Sensor Management (SM) has to do with how to best manage, coordinate and organize the use of sensing resources in a manner that synergistically improves the process of data fusion. Based on the contextual information, SM develops options for collecting further information, allocates and directs the sensors towards the achievement of the mission goals and/or tunes the parameters for the realtime improvement of the effectiveness of the sensing process. Conscious of the important role that SM has to play in modern data fusion systems, we are currently studying advanced SM Concepts that would help increase the survivability of the current Halifax and Iroquois Class ships, as well as their possible future upgrades. For this purpose, a hierarchical scheme has been proposed for data fusion and resource management adaptation, based on the control theory and within the process refinement paradigm of the JDL data fusion model, and taking into account the multi-agent model put forward by the SASS Group for the situation analysis process. The novelty of this work lies in the unified framework that has been defined for tackling the adaptation of both the fusion process and the sensor/weapon management.

  5. Variable cluster analysis method for building neural network model

    Institute of Scientific and Technical Information of China (English)

    王海东; 刘元东

    2004-01-01

    To address the problem that, when building a neural network model of a complicated system, the input variables should be reduced as much as possible while still explaining the output variables fully, a variable selection method based on cluster analysis was investigated. A similarity coefficient that describes the mutual relation of variables was defined. The methods of highest contribution rate, part replacing whole, and variable replacement are put forward and derived from information theory. Software for the neural network based on cluster analysis, which provides many methods for defining the variable similarity coefficient, clustering system variables and evaluating variable clusters, was developed and applied to build a neural network forecast model of cement clinker quality. The results show that the network scale, training time and prediction accuracy are all perfect. The practical application demonstrates that the method of selecting variables for neural networks is feasible and effective.
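
    The idea of clustering variables and letting one member stand in for its group ("part replacing whole") can be sketched as follows. The abstract does not define its similarity coefficient, so this sketch assumes the absolute Pearson correlation, a simple greedy grouping and a made-up threshold. The variable names and values are hypothetical.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def group_variables(columns, threshold=0.9):
    """Greedy grouping: a variable joins the first group whose representative it resembles."""
    groups = []
    for name, values in columns.items():
        for group in groups:
            representative = group[0]
            if abs(pearson(values, columns[representative])) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])  # no similar group found: start a new one
    return groups


# Hypothetical process variables; "kiln_temp_b" nearly duplicates "kiln_temp_a".
columns = {
    "kiln_temp_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "kiln_temp_b": [2.1, 3.9, 6.2, 8.0, 9.9],
    "feed_rate": [5.0, 1.0, 4.0, 2.0, 3.0],
}
groups = group_variables(columns)
selected_inputs = [group[0] for group in groups]
```

    Keeping only the first member of each group shrinks the network's input layer while discarding little information, since the dropped variables are nearly linear functions of the retained ones.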

  6. Cluster analysis of activity-time series in motor learning

    DEFF Research Database (Denmark)

    Balslev, Daniela; Nielsen, Finn Årup; Frutiger, Sally A.

    2002-01-01

    Neuroimaging studies of learning focus on brain areas where the activity changes as a function of time. To circumvent the difficult problem of model selection, we used a data-driven analytic tool, cluster analysis, which extracts representative temporal and spatial patterns from the voxel-time series. The optimal number of clusters was chosen using a cross-validated likelihood method, which highlights the clustering pattern that generalizes best over the subjects. Data were acquired with PET at different time points during practice of a visuomotor task. The results from cluster analysis show practice-related activity in a fronto-parieto-cerebellar network, in agreement with previous studies of motor learning. These voxels were separated from a group of voxels showing an unspecific time-effect and another group of voxels, whose activation was an artifact from smoothing. Hum. Brain Mapping 15...

  7. Cluster analysis of activity-time series in motor learning

    DEFF Research Database (Denmark)

    Balslev, Daniela; Nielsen, Finn Å; Futiger, Sally A

    2002-01-01

    Neuroimaging studies of learning focus on brain areas where the activity changes as a function of time. To circumvent the difficult problem of model selection, we used a data-driven analytic tool, cluster analysis, which extracts representative temporal and spatial patterns from the voxel-time series. The optimal number of clusters was chosen using a cross-validated likelihood method, which highlights the clustering pattern that generalizes best over the subjects. Data were acquired with PET at different time points during practice of a visuomotor task. The results from cluster analysis show practice-related activity in a fronto-parieto-cerebellar network, in agreement with previous studies of motor learning. These voxels were separated from a group of voxels showing an unspecific time-effect and another group of voxels, whose activation was an artifact from smoothing...

  8. Cluster analysis of activity-time series in motor learning

    DEFF Research Database (Denmark)

    Balslev, Daniela; Nielsen, Finn Å; Futiger, Sally A

    2002-01-01

    Neuroimaging studies of learning focus on brain areas where the activity changes as a function of time. To circumvent the difficult problem of model selection, we used a data-driven analytic tool, cluster analysis, which extracts representative temporal and spatial patterns from the voxel-time series. The optimal number of clusters was chosen using a cross-validated likelihood method, which highlights the clustering pattern that generalizes best over the subjects. Data were acquired with PET at different time points during practice of a visuomotor task. The results from cluster analysis show practice-related activity in a fronto-parieto-cerebellar network, in agreement with previous studies of motor learning. These voxels were separated from a group of voxels showing an unspecific time-effect and another group of voxels, whose activation was an artifact from smoothing...

  9. Cluster analysis of Southeastern U.S. climate stations

    Science.gov (United States)

    Stooksbury, D. E.; Michaels, P. J.

    1991-09-01

    A two-step cluster analysis of 449 Southeastern climate stations is used to objectively determine general climate clusters (groups of climate stations) for eight southeastern states. The purpose is to objectively define regions of climatic homogeneity that should perform more robustly in subsequent climatic impact models. This type of analysis has been successfully used in many related climate research problems including the determination of corn/climate districts in Iowa (Ortiz-Valdez, 1985) and the classification of synoptic climate types (Davis, 1988). These general climate clusters may be more appropriate for climate research than the standard climate division (CD) groupings of climate stations, which are modifications of the agro-economic United States Department of Agriculture crop reporting districts. Unlike the CDs, these objectively determined climate clusters are not restricted by state borders and thus have reduced multicollinearity, which makes them more appropriate for the study of the impact of climate and climatic change.

  10. Initial magnetization analysis of iron cluster assemblies

    Energy Technology Data Exchange (ETDEWEB)

    Michele, Oliver; Hesse, Juergen; Bremers, Heiko [Technische Universitaet Braunschweig, Institut fuer Metallphysik und Nukleare Festkoerperphysik, Mendelssohnstrasse 3, 38106 Braunschweig (Germany); Peng, Dong-Lian; Sumiyama, Kenji; Hihara, Takehiko; Yamamuro, Saeki [Department of Materials Science and Engineering, Nagoya Institute of Technology, Nagoya 466-8555 (Japan)

    2004-12-01

    Nearly monodispersed oxide-coated Fe cluster assemblies were prepared using a plasma-gas-condensation style cluster beam deposition apparatus (D. L. Peng et al. J. Appl. Phys. 92 3075 (2002)). The characterization of such assemblies is presented using SQUID magnetometry. The aim of this contribution is the interpretation of the initial magnetization curves instead of the usual presentation of hysteresis loops and coercivities. The description of the initial magnetization is based on a proposed vector model valid for Stoner-Wohlfarth particles. The model includes the particles' anisotropy and possible interactions regarding these influences as equivalent magnetic fields. The model is an extension of the one described by Michele et al. (J. Phys.: Condens. Matter 16 427 (2004)) regarding the fact that in a completely demagnetized state, in the sample consisting of a very large number of particles always equal anisotropy fields of opposite signs are present. We measured the initial magnetization curves for different temperatures and present the temperature dependence of the model's parameters. (Abstract Copyright [2004], Wiley Periodicals, Inc.)

  11. Spatial Data Mining using Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Ch.N.Santhosh Kumar

    2012-09-01

    Data mining, also referred to as Knowledge Discovery in Databases (KDD), is the process of nontrivial extraction of implicit, previously unknown and potentially useful information such as knowledge rules, descriptions, regularities, and major trends from large databases. Data mining has evolved as a multidisciplinary field, including database technology, machine learning, artificial intelligence, neural networks, information retrieval, and so on. In principle, data mining should be applicable to the different kinds of data and databases used in many different applications, including relational databases, transactional databases, data warehouses, object-oriented databases, and special application-oriented databases such as spatial databases, temporal databases, multimedia databases, and time-series databases. Spatial data mining, also called spatial mining, is data mining as applied to spatial data or spatial databases. Spatial data are data that have a spatial or location component, and they carry information that is more complex than classical data. A spatial database stores spatial data represented by spatial data types and the spatial relationships among the data. Spatial data mining encompasses various tasks. These include spatial classification, spatial association rule mining, spatial clustering, characteristic rules, discriminant rules, and trend detection. This paper presents how spatial data mining is achieved using clustering.

  12. Multivariate analysis of the globular clusters in M87

    CERN Document Server

    Das, Sukanta; Davoust, Emmanuel

    2015-01-01

    An objective classification of 147 globular clusters in the inner region of the giant elliptical galaxy M87 is carried out with the help of two methods of multivariate analysis. First independent component analysis is used to determine a set of independent variables that are linear combinations of various observed parameters (mostly Lick indices) of the globular clusters. Next K-means cluster analysis is applied on the independent components, to find the optimum number of homogeneous groups having an underlying structure. The properties of the four groups of globular clusters thus uncovered are used to explain the formation mechanism of the host galaxy. It is suggested that M87 formed in two successive phases. First a monolithic collapse, which gave rise to an inner group of metal-rich clusters with little systematic rotation and an outer group of metal-poor clusters in eccentric orbits. In a second phase, the galaxy accreted low-mass satellites in a dissipationless fashion, from the gas of which the two othe...

  13. Topological cluster analysis reveals the systemic organization of the Caenorhabditis elegans connectome.

    Directory of Open Access Journals (Sweden)

    Yunkyu Sohn

    2011-05-01

    The modular organization of networks of individual neurons interwoven through synapses has not been fully explored due to the incredible complexity of the connectivity architecture. Here we use the modularity-based community detection method for directed, weighted networks to examine hierarchically organized modules in the complete wiring diagram (connectome) of Caenorhabditis elegans (C. elegans) and to investigate their topological properties. Incorporating bilateral symmetry of the network as an important cue for proper cluster assignment, we identified anatomical clusters in the C. elegans connectome, including a body-spanning cluster, which correspond to experimentally identified functional circuits. Moreover, the hierarchical organization of the five clusters explains the systemic cooperation (e.g., mechanosensation, chemosensation, and navigation) that occurs among the structurally segregated biological circuits to produce higher-order complex behaviors.
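
    Modularity-based methods score a partition by comparing the weight of links inside each module with the weight expected if links were placed at random while preserving each node's out- and in-strength. The sketch below computes one common definition of modularity for directed, weighted networks. It only evaluates a given partition and does not search for the best one; the function name and toy network are hypothetical, and the paper's exact formulation may differ.

```python
def directed_modularity(edges, community):
    """edges: {(source, target): weight}; community: {node: label}."""
    m = sum(edges.values())  # total edge weight
    out_strength, in_strength = {}, {}
    for (u, v), w in edges.items():
        out_strength[u] = out_strength.get(u, 0.0) + w
        in_strength[v] = in_strength.get(v, 0.0) + w

    q = 0.0
    for u in community:
        for v in community:
            if community[u] != community[v]:
                continue
            observed = edges.get((u, v), 0.0)
            # Expected weight of u -> v under the strength-preserving null model.
            expected = out_strength.get(u, 0.0) * in_strength.get(v, 0.0) / m
            q += observed - expected
    return q / m


# Hypothetical toy network: two reciprocal pairs with no links between them.
edges = {("a", "b"): 1.0, ("b", "a"): 1.0, ("c", "d"): 1.0, ("d", "c"): 1.0}
split = {"a": 0, "b": 0, "c": 1, "d": 1}
lumped = {"a": 0, "b": 0, "c": 0, "d": 0}
q_split = directed_modularity(edges, split)
q_lumped = directed_modularity(edges, lumped)
```

    A community detection method would search over partitions for a high value of this score, and applying the search again inside each module is one way to obtain a hierarchy.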

  14. Identifying clinical course patterns in SMS data using cluster analysis

    DEFF Research Database (Denmark)

    Kent, Peter; Kongsted, Alice

    2012-01-01

    ABSTRACT: BACKGROUND: Recently, there has been interest in using the short message service (SMS or text messaging) to gather frequent information on the clinical course of individual patients. One possible role for identifying clinical course patterns is to assist in exploring clinically important... by spline analysis. However, cluster analysis of SMS data in its original untransformed form may be simpler and offer other advantages. Therefore, the aim of this study was to determine whether cluster analysis could be used for identifying clinical course patterns distinct from the pattern of the whole... of cluster analysis. More research is needed, especially head-to-head studies, to identify which technique is best to use under what circumstances.

  15. Cluster analysis of undergraduate drinkers based on alcohol expectancy scores.

    Science.gov (United States)

    Leeman, Robert F; Kulesza, Magdalena; Stewart, Diana W; Copeland, Amy L

    2012-03-01

    Expectancies of alcohol's effects have been associated with problem drinking in undergraduates. If subgroups can be classified based on expectancies, this may facilitate identifying those at highest risk for problem drinking. Undergraduates (N = 612) from two state universities completed a web-based survey. Responses to the Comprehensive Effects of Alcohol scale were analyzed using k-means cluster analysis separately within each university sample. Hartigan's heuristic was used to determine that five was the optimal number of clusters in each sample. Clusters were distinguishable based on their overall magnitude of expectancy endorsement and by a tendency to endorse stronger positive than negative expectancies. Subsequent analyses were conducted to compare clusters on alcohol involvement and trait disinhibition. A cluster characterized by endorsement of positive and negative expectancies ("strong expectancy") was associated with a particularly problematic risk profile, specifically concerning difficulties with self-control (i.e., trait disinhibition and impaired control over alcohol use). A cluster with higher positive and lower negative expectancies reported frequent heavy drinking but appeared to be at lower risk than the strong expectancy cluster in a number of respects. Negative expectancy endorsement appeared to represent added risk above and beyond positive expectancies. Results suggest that both the magnitude and combination of expectancies endorsed by subgroups of undergraduate drinkers may relate to their risk level in terms of alcohol involvement and personality traits. These findings may have implications for interventions with young adult drinkers.
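
    The abstract does not say which form of Hartigan's heuristic was applied. A minimal sketch, assuming the commonly cited rule that compares the within-cluster sum of squares W(k) for successive k and stops adding clusters once H(k) = (W(k)/W(k+1) - 1)(n - k - 1) falls to 10 or below, could look like this. The function names and the sample numbers are hypothetical.

```python
def hartigan_index(wss_k, wss_k_plus_1, n, k):
    """Hartigan's H(k) from the within-cluster sums of squares for k and k+1."""
    return (wss_k / wss_k_plus_1 - 1.0) * (n - k - 1)


def choose_k(wss, n, threshold=10.0):
    """Return the smallest k whose H(k) is at or below the threshold.

    wss[i] is the within-cluster sum of squares of a k-means solution with
    k = i + 1 clusters; n is the number of observations.
    Returns None if no k in the supplied range satisfies the rule.
    """
    for i in range(len(wss) - 1):
        k = i + 1
        if hartigan_index(wss[i], wss[i + 1], n, k) <= threshold:
            return k
    return None


# Invented within-cluster sums of squares for k = 1, 2, 3 on n = 22 points.
example_wss = [100.0, 40.0, 38.0]
```

    With these invented values, going from one to two clusters reduces the sum of squares sharply, while a third cluster adds little, so the rule settles on two clusters.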

  16. Cluster analysis for identifying sub-groups and selecting potential discriminatory variables in human encephalitis

    Directory of Open Access Journals (Sweden)

    Crowcroft Natasha S

    2010-12-01

    Background: Encephalitis is an acute clinical syndrome of the central nervous system (CNS), often associated with fatal outcome or permanent damage, including cognitive and behavioural impairment, affective disorders and epileptic seizures. Infection of the central nervous system is considered to be a major cause of encephalitis and more than 100 different pathogens have been recognized as causative agents. However, a large proportion of cases have unknown disease etiology. Methods: We performed hierarchical cluster analysis on a multicenter England encephalitis data set with the aim of identifying sub-groups in human encephalitis. We used the simple matching similarity measure, which is appropriate for binary data sets, and performed variable selection using cluster heatmaps. We also used heatmaps to visually assess underlying patterns in the data, identify the main clinical and laboratory features and identify potential risk factors associated with encephalitis. Results: Our results identified fever, personality and behavioural change, headache and lethargy as the main characteristics of encephalitis. Diagnostic variables such as brain scan and measurements from cerebrospinal fluids are also identified as main indicators of encephalitis. Our analysis revealed six major clusters in the England encephalitis data set. However, marked within-cluster heterogeneity is observed in some of the big clusters, indicating possible sub-groups. Overall, the results show that patients are clustered according to symptom and diagnostic variables rather than causal agents. Exposure variables such as recent infection, sick person contact and animal contact have been identified as potential risk factors. Conclusions: It is generally assumed, and is common practice, to group encephalitis cases according to disease etiology. However, our results indicate that patients are clustered mainly with respect to symptom and diagnostic variables rather than causal agents.
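
    The simple matching similarity measure mentioned in the methods can be illustrated with a short sketch. The patient records and variable names below are invented for the example and are not taken from the encephalitis data set; the helper names are likewise hypothetical.

```python
def simple_matching(a, b):
    """Simple matching coefficient between two equal-length binary vectors.

    Counts the positions where both vectors agree (1/1 or 0/0) and divides
    by the total number of positions.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if not a:
        raise ValueError("vectors must be non-empty")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def dissimilarity_matrix(records):
    """Pairwise dissimilarities (1 - similarity) keyed by (name, name)."""
    names = sorted(records)
    return {
        (i, j): 1.0 - simple_matching(records[i], records[j])
        for i in names
        for j in names
    }


# Hypothetical patients described by four binary indicators
# (fever, headache, lethargy, seizures).
patients = {
    "p1": [1, 1, 0, 0],
    "p2": [1, 1, 1, 0],
    "p3": [0, 0, 1, 1],
}
distances = dissimilarity_matrix(patients)
```

    A dissimilarity matrix of this kind is the usual input to a hierarchical clustering routine. Unlike the Jaccard coefficient, simple matching counts joint absences (0/0) as agreement, which is one reason it is often chosen for symptom checklists.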

  17. Bayesian Analysis of Two Stellar Populations in Galactic Globular Clusters III: Analysis of 30 Clusters

    CERN Document Server

    Wagner-Kaiser, R; Sarajedini, A; von Hippel, T; van Dyk, D A; Robinson, E; Stein, N; Jefferys, W H

    2016-01-01

    We use Cycle 21 Hubble Space Telescope (HST) observations and HST archival ACS Treasury observations of 30 Galactic Globular Clusters to characterize two distinct stellar populations. A sophisticated Bayesian technique is employed to simultaneously sample the joint posterior distribution of age, distance, and extinction for each cluster, as well as unique helium values for two populations within each cluster and the relative proportion of those populations. We find that the helium differences between the two populations in the clusters fall in the range of ~0.04 to 0.11. Because adequate models varying in CNO are not presently available, we view these spreads as upper limits and present them with statistical rather than observational uncertainties. Evidence supports previous studies suggesting an increase in helium content concurrent with increasing mass of the cluster, and we also find that the proportion of the first population of stars increases with mass. Our results are examined in the context of proposed g...

  18. Principal Component Clustering Approach to Teaching Quality Discriminant Analysis

    Science.gov (United States)

    Xian, Sidong; Xia, Haibo; Yin, Yubo; Zhai, Zhansheng; Shang, Yan

    2016-01-01

    Teaching quality is the lifeline of higher education. Many universities have made effective progress in evaluating teaching quality. In this paper, we establish a students' evaluation of teaching (SET) discriminant analysis model and algorithm based on principal component clustering analysis. Additionally, we classify the SET…

  19. Cluster analysis of radionuclide concentrations in beach sand

    NARCIS (Netherlands)

    de Meijer, R.J.; James, I.; Jennings, P.J.; Keoyers, J.E.

    This paper presents a method in which natural radionuclide concentrations of beach sand minerals are traced along a stretch of coast by cluster analysis. This analysis yields two groups of mineral deposit with different origins. The method deviates from standard methods of following dispersal of

  20. The increase of rural development measures efficiency at the micro-regions level by cluster analysis. A Romanian case study

    Directory of Open Access Journals (Sweden)

    Maria VINCZE

    2011-06-01

    The aim of this paper is to demonstrate the role of cluster analysis of rural localities as the basis for a more efficient way of choosing the rural development measures to be used to stimulate rural socio-economic growth. We present evidence of the typologies of rural localities determined by hierarchical clustering using the Ward method. We used five groups of criteria: 1. characterising labour force supply (10 indicators); 2. those which describe the structure of employment via economic activities (5 indicators); 3. characteristics of living standards (7 indicators); 4. labour force, natural resources and local income characteristics (11 indicators). All of these indicators, used in the first stage in factor analysis and in the second stage in the cluster analyses, permit classification of rural localities into different clusters, which generally need different measures for rural employment growth. We offer a short description of the groups of localities which belong to different clusters. This information can help local, county and regional level decision makers to identify the most efficient approaches to stimulating rural development.
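
    The Ward method merges, at each step, the two clusters whose union gives the smallest increase in the within-cluster sum of squares. A naive sketch of that criterion, assuming Euclidean data and using invented points rather than the Romanian indicators, is given below; the function names are hypothetical and the quadratic search is only suitable for small examples.

```python
def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_clusters(points, n_clusters):
    """Agglomerate singletons until n_clusters remain, cheapest merge first."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost(clusters[i], clusters[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Invented two-dimensional scores for four localities.
localities = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
groups = ward_clusters(localities, 2)
```

    In a two-stage design like the one described, the inputs to such a routine would be factor scores rather than the raw indicators, which keeps correlated indicators from dominating the distances.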

  1. Technology Clusters Exploration for Patent Portfolio through Patent Abstract Analysis

    Directory of Open Access Journals (Sweden)

    Gabjo Kim

    2016-12-01

    This study explores technology clusters through patent analysis. The aim of exploring technology clusters is to grasp competitors’ levels of sustainable research and development (R&D) and establish a sustainable strategy for entering an industry. To achieve this, we first grouped the patent documents with similar technologies by applying affinity propagation (AP) clustering, which is effective when grouping large amounts of data. Next, in order to define the technology clusters, we adopted the term frequency-inverse document frequency (TF-IDF) weight, which lists the terms in order of importance. We collected the patent data of Korean electric car companies from the United States Patent and Trademark Office (USPTO) to verify our proposed methodology. As a result, our proposed methodology presents more detailed information on the Korean electric car industry than previous studies.
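
    How a TF-IDF weight ranks terms by importance can be shown with a small sketch. It assumes one common variant (term frequency normalized by document length, idf = log(N / df)); the abstract does not state which variant the authors used, and the function names and sample texts are invented.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Uses term count divided by document length as TF and log(N / df) as IDF,
    so a term that occurs in every document gets weight 0.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def top_terms(weight_dict, k=3):
    """Terms ordered by decreasing weight, ties broken alphabetically."""
    return sorted(weight_dict, key=lambda t: (-weight_dict[t], t))[:k]


# Invented stand-ins for patent abstracts grouped into one cluster each.
abstracts = [
    "battery cell battery pack",
    "motor control unit",
    "battery motor",
]
abstract_weights = tf_idf(abstracts)
```

    In the first sample text, "battery" is frequent but also appears in another document, so the rarer terms "cell" and "pack" rank above it. This is the effect that makes TF-IDF useful for labelling clusters with distinctive vocabulary.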

  2. An Empirical Analysis of Rough Set Categorical Clustering Techniques

    Science.gov (United States)

    2017-01-01

    Clustering a set of objects into homogeneous groups is a fundamental operation in data mining. Recently, much attention has been given to categorical data clustering, where data objects are made up of non-numerical attributes. For categorical data clustering, rough set based approaches such as Maximum Dependency Attribute (MDA) and Maximum Significance Attribute (MSA) have outperformed their predecessor approaches like Bi-Clustering (BC), Total Roughness (TR) and Min-Min Roughness (MMR). This paper presents the limitations and issues of the MDA and MSA techniques on special types of data sets where both techniques fail to select, or face difficulty in selecting, their best clustering attribute. This analysis motivates the need for a better and more general rough set theory approach that can cope with the issues of MDA and MSA. Hence, an alternative technique named Maximum Indiscernible Attribute (MIA) for clustering categorical data using rough set indiscernibility relations is proposed. The novelty of the proposed approach is that, unlike other rough set theory techniques, it uses the domain knowledge of the data set. It is based on the concept of the indiscernibility relation combined with a number of clusters. To show the significance of the proposed approach, the effects of the number of clusters on rough accuracy, purity and entropy are described in the form of propositions. Moreover, ten different data sets from previously utilized research cases and the UCI repository are used for experiments. The results, produced in tabular and graphical forms, show that the proposed MIA technique provides better performance in selecting the clustering attribute in terms of purity, entropy, iterations, time, accuracy and rough accuracy. PMID:28068344

  3. An Empirical Analysis of Rough Set Categorical Clustering Techniques.

    Science.gov (United States)

    Uddin, Jamal; Ghazali, Rozaida; Deris, Mustafa Mat

    2017-01-01

    Clustering a set of objects into homogeneous groups is a fundamental operation in data mining. Recently, much attention has been given to categorical data clustering, where data objects are made up of non-numerical attributes. For categorical data clustering, rough set based approaches such as Maximum Dependency Attribute (MDA) and Maximum Significance Attribute (MSA) have outperformed their predecessor approaches like Bi-Clustering (BC), Total Roughness (TR) and Min-Min Roughness (MMR). This paper presents the limitations and issues of the MDA and MSA techniques on special types of data sets where both techniques fail to select, or face difficulty in selecting, their best clustering attribute. This analysis motivates the need for a better and more general rough set theory approach that can cope with the issues of MDA and MSA. Hence, an alternative technique named Maximum Indiscernible Attribute (MIA) for clustering categorical data using rough set indiscernibility relations is proposed. The novelty of the proposed approach is that, unlike other rough set theory techniques, it uses the domain knowledge of the data set. It is based on the concept of the indiscernibility relation combined with a number of clusters. To show the significance of the proposed approach, the effects of the number of clusters on rough accuracy, purity and entropy are described in the form of propositions. Moreover, ten different data sets from previously utilized research cases and the UCI repository are used for experiments. The results, produced in tabular and graphical forms, show that the proposed MIA technique provides better performance in selecting the clustering attribute in terms of purity, entropy, iterations, time, accuracy and rough accuracy.

  4. Visualization methods for statistical analysis of microarray clusters

    Directory of Open Access Journals (Sweden)

    Li Kai

    2005-05-01

    Background: The most common method of identifying groups of functionally related genes in microarray data is to apply a clustering algorithm. However, it is impossible to determine which clustering algorithm is most appropriate to apply, and it is difficult to verify the results of any algorithm due to the lack of a gold-standard. Appropriate data visualization tools can aid this analysis process, but existing visualization methods do not specifically address this issue. Results: We present several visualization techniques that incorporate meaningful statistics that are noise-robust for the purpose of analyzing the results of clustering algorithms on microarray data. This includes a rank-based visualization method that is more robust to noise, a difference display method to aid assessments of cluster quality and detection of outliers, and a projection of high dimensional data into a three dimensional space in order to examine relationships between clusters. Our methods are interactive and are dynamically linked together for comprehensive analysis. Further, our approach applies to both protein and gene expression microarrays, and our architecture is scalable for use on both desktop/laptop screens and large-scale display devices. This methodology is implemented in GeneVAnD (Genomic Visual ANalysis of Datasets) and is available at http://function.princeton.edu/GeneVAnD. Conclusion: Incorporating relevant statistical information into data visualizations is key for analysis of large biological datasets, particularly because of high levels of noise and the lack of a gold-standard for comparisons. We developed several new visualization techniques and demonstrated their effectiveness for evaluating cluster quality and relationships between clusters.

  5. Cognitive analysis of multiple sclerosis utilizing fuzzy cluster means

    Directory of Open Access Journals (Sweden)

    Imianvan Anthony Agboizebeta

    2012-01-01

    Multiple sclerosis, often called MS, is a disease that affects the central nervous system (the brain and spinal cord). Myelin provides insulation for nerve cells, improves the conduction of impulses along the nerves and is important for maintaining the health of the nerves. In multiple sclerosis, inflammation causes the myelin to disappear. Genetic factors, environmental issues and viral infection may also play a role in developing the disease. MS is characterized by life-threatening symptoms such as loss of balance, hearing problems and depression. The application of Fuzzy Cluster Means (FCM), or Fuzzy C-Means, analysis to the diagnosis of different forms of multiple sclerosis is the focal point of this paper. Application of cluster analysis involves a sequence of methodological and analytical decision steps that enhances the quality and meaning of the clusters produced. Uncertainties associated with analysis of multiple sclerosis test data are eliminated by the system

  6. Statistical analysis of bound companions in the Coma cluster

    Science.gov (United States)

    Mendelin, Martin; Binggeli, Bruno

    2017-08-01

    Aims: The rich and nearby Coma cluster of galaxies is known to have substructure. We aim to create a more detailed picture of this substructure by searching directly for bound companions around individual giant members. Methods: We have used two catalogs of Coma galaxies, one covering the cluster core for a detailed morphological analysis, another covering the outskirts. The separation limit between possible companions (secondaries) and giants (primaries) is chosen as MB = -19 and MR = -20, respectively for the two catalogs. We have created pseudo-clusters by shuffling positions or velocities of the primaries and search for significant over-densities of possible companions around giants by comparison with the data. This method was developed and applied first to the Virgo cluster. In a second approach we introduced a modified nearest neighbor analysis using several interaction parameters for all galaxies. Results: We find evidence for some excesses due to possible companions for both catalogs. Satellites are typically found among the faintest dwarfs (MB type giants (spirals) in the outskirts, which is expected in an infall scenario of cluster evolution. A rough estimate for an upper limit of bound galaxies within Coma is 2-4%, to be compared with 7% for Virgo. Conclusions: The results agree well with the expected low frequency of bound companions in a regular cluster such as Coma. To exploit the data more fully and reach more detailed insights into the physics of cluster evolution we suggest applying the method also to model clusters created by N-body simulations for comparison.
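
    The pseudo-cluster comparison described in the methods can be sketched as a shuffling test. The tuple layout (x, y, v), the separation and velocity cuts, the function names and the sample galaxies are all assumptions made for the example; the paper's actual selection criteria are not given in the abstract.

```python
import random


def count_companions(primaries, secondaries, max_sep, max_dv):
    """Count (primary, secondary) pairs close in projected position and velocity.

    Each galaxy is a tuple (x, y, v) of projected coordinates and velocity.
    """
    count = 0
    for px, py, pv in primaries:
        for sx, sy, sv in secondaries:
            close_on_sky = (px - sx) ** 2 + (py - sy) ** 2 <= max_sep ** 2
            if close_on_sky and abs(pv - sv) <= max_dv:
                count += 1
    return count


def shuffled_velocity_counts(primaries, secondaries, max_sep, max_dv,
                             n_trials=1000, seed=0):
    """Pair counts for pseudo-clusters made by shuffling primary velocities."""
    rng = random.Random(seed)
    velocities = [v for _, _, v in primaries]
    counts = []
    for _ in range(n_trials):
        rng.shuffle(velocities)
        pseudo = [(x, y, v) for (x, y, _), v in zip(primaries, velocities)]
        counts.append(count_companions(pseudo, secondaries, max_sep, max_dv))
    return counts


def excess_fraction(observed, pseudo_counts):
    """Fraction of pseudo-clusters with at least as many pairs as observed."""
    return sum(1 for c in pseudo_counts if c >= observed) / len(pseudo_counts)


# Invented galaxies: two giants, each with one nearby dwarf, plus one loner.
giants = [(0.0, 0.0, 100.0), (10.0, 10.0, 500.0)]
dwarfs = [(0.1, 0.0, 110.0), (10.0, 10.1, 490.0), (5.0, 5.0, 300.0)]
observed_pairs = count_companions(giants, dwarfs, max_sep=0.5, max_dv=50.0)
pseudo_pairs = shuffled_velocity_counts(giants, dwarfs, 0.5, 50.0, n_trials=200)
```

    A small excess fraction would indicate that the observed number of close pairs is unlikely to arise from chance alignments alone. Shuffling positions instead of velocities follows the same pattern with the roles of the coordinates exchanged.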

  7. Multi-scaling hierarchical structure analysis on the sequence of E. coli complete genome

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    We have applied the newly developed hierarchical structure theory for complex systems to analyze the multi-scaling structures of the nucleotide density distribution along a linear DNA sequence from the complete Escherichia coli genome. The hierarchical symmetry in the nucleotide density distribution was demonstrated. In particular, we have shown that the G, C density distribution, which represents strong H-bonding between the two DNA chains, is more coherent, with a smaller similarity parameter, than the A, T density distribution, indicating a better organized multi-scaling fluctuation field for the G, C density distribution along the genome sequence. The biological significance of these findings is under investigation.

  8. Cluster analysis and its application to healthcare claims data: a study of end-stage renal disease patients who initiated hemodialysis.

    Science.gov (United States)

    Liao, Minlei; Li, Yunfeng; Kianifard, Farid; Obi, Engels; Arcona, Stephen

    2016-03-02

    Cluster analysis (CA) is a frequently used applied statistical technique that helps to reveal hidden structures and "clusters" found in large data sets. However, this method has not been widely used in large healthcare claims databases where the distribution of expenditure data is commonly severely skewed. The purpose of this study was to identify cost change patterns of patients with end-stage renal disease (ESRD) who initiated hemodialysis (HD) by applying different clustering methods. A retrospective, cross-sectional, observational study was conducted using the Truven Health MarketScan® Research Databases. Patients aged ≥18 years with ≥2 ESRD diagnoses who initiated HD between 2008 and 2010 were included. The K-means CA method and hierarchical CA with various linkage method