WorldWideScience

Sample records for factor analysis cluster

  1. Learning From Hidden Traits: Joint Factor Analysis and Latent Clustering

    Science.gov (United States)

    Yang, Bo; Fu, Xiao; Sidiropoulos, Nicholas D.

    2017-01-01

    Dimensionality reduction techniques play an essential role in data analytics, signal processing and machine learning. Dimensionality reduction is usually performed in a preprocessing stage that is separate from subsequent data analysis, such as clustering or classification. Finding reduced-dimension representations that are well-suited for the intended task is more appealing. This paper proposes a joint factor analysis and latent clustering framework, which aims at learning cluster-aware low-dimensional representations of matrix and tensor data. The proposed approach leverages matrix and tensor factorization models that produce essentially unique latent representations of the data to unravel latent cluster structure -- which is otherwise obscured because of the freedom to apply an oblique transformation in latent space. At the same time, latent cluster structure is used as prior information to enhance the performance of factorization. Specific contributions include several custom-built problem formulations, corresponding algorithms, and discussion of associated convergence properties. Besides extensive simulations, real-world datasets such as Reuters document data and MNIST image data are also employed to showcase the effectiveness of the proposed approaches.

  2. Common Factor Analysis Versus Principal Component Analysis: Choice for Symptom Cluster Research

    Directory of Open Access Journals (Sweden)

    Hee-Ju Kim, PhD, RN

    2008-03-01

    Conclusion: If the study purpose is to explain correlations among variables and to examine the structure of the data (the usual case in symptom cluster research), CFA provides a more accurate result. If the purpose of a study is to summarize data with a smaller number of variables, PCA is the choice. PCA can also be used as an initial step in CFA because it provides information regarding the maximum number and nature of factors. In using factor analysis for symptom cluster research, several issues need to be considered, including subjectivity of solution, sample size, symptom selection, and level of measure.

  3. Using Multilevel Factor Analysis with Clustered Data: Investigating the Factor Structure of the Positive Values Scale

    Science.gov (United States)

    Huang, Francis L.; Cornell, Dewey G.

    2016-01-01

    Advances in multilevel modeling techniques now make it possible to investigate the psychometric properties of instruments using clustered data. Factor models that overlook the clustering effect can lead to underestimated standard errors, incorrect parameter estimates, and model fit indices. In addition, factor structures may differ depending on…

  4. Cluster analysis

    CERN Document Server

    Everitt, Brian S; Leese, Morven; Stahl, Daniel

    2011-01-01

    Cluster analysis comprises a range of methods for classifying multivariate data into subgroups. By organizing multivariate data into such subgroups, clustering can help reveal the characteristics of any structure or patterns present. These techniques have proven useful in a wide range of areas such as medicine, psychology, market research and bioinformatics. This fifth edition of the highly successful Cluster Analysis includes coverage of the latest developments in the field and a new chapter dealing with finite mixture models for structured data. Real life examples are used throughout to demons

  5. Analysis of risk factors for cluster behavior of dental implant failures.

    Science.gov (United States)

    Chrcanovic, Bruno Ramos; Kisch, Jenö; Albrektsson, Tomas; Wennerberg, Ann

    2017-08-01

    Some studies indicated that implant failures are commonly concentrated in few patients. To identify and analyze cluster behavior of dental implant failures among subjects of a retrospective study. This retrospective study included patients receiving at least three implants only. Patients presenting at least three implant failures were classified as presenting a cluster behavior. Univariate and multivariate logistic regression models and generalized estimating equations analysis evaluated the effect of explanatory variables on the cluster behavior. There were 1406 patients with three or more implants (8337 implants, 592 failures). Sixty-seven (4.77%) patients presented cluster behavior, with 56.8% of all implant failures. The intake of antidepressants and bruxism were identified as potential negative factors exerting a statistically significant influence on a cluster behavior at the patient-level. The negative factors at the implant-level were turned implants, short implants, poor bone quality, age of the patient, the intake of medicaments to reduce the acid gastric production, smoking, and bruxism. A cluster pattern among patients with implant failure is highly probable. Factors of interest as predictors for implant failures could be a number of systemic and local factors, although a direct causal relationship cannot be ascertained. © 2017 Wiley Periodicals, Inc.
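
    The proportions quoted in this abstract can be re-derived from its own counts. The short sketch below does only that arithmetic; the variable names are illustrative, and the count of failures inside the cluster group is an approximation back-calculated from the reported 56.8% share, since the abstract does not state the raw number.

```python
# Counts and shares as quoted in the abstract.
patients = 1406            # patients with three or more implants
implants = 8337
failures = 592
cluster_patients = 67      # patients with at least three failures
cluster_failure_share = 0.568  # share of all failures found in those patients

# Share of patients showing cluster behavior (abstract reports 4.77%).
cluster_patient_share = cluster_patients / patients

# Overall implant-level failure rate implied by the counts.
overall_failure_rate = failures / implants

# Approximation only: back-calculated from the rounded percentage.
approx_cluster_failures = round(cluster_failure_share * failures)
approx_failures_per_cluster_patient = approx_cluster_failures / cluster_patients

print(f"patients with cluster behavior: {cluster_patient_share:.2%}")
print(f"overall implant failure rate:   {overall_failure_rate:.2%}")
print(f"approx. failures in cluster group: {approx_cluster_failures}")
print(f"approx. failures per cluster patient: {approx_failures_per_cluster_patient:.1f}")
```

    Under these figures, fewer than 5% of patients account for more than half of all failures, which is the concentration the abstract describes.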

  6. Cluster analysis of the factors influencing innovative development of economy in regions of Russian Federation

    Directory of Open Access Journals (Sweden)

    V. N. Yur’ev

    2017-01-01

    This article provides a statistical description aimed at identifying the factors that influence innovative development in the regions of the Russian Federation. It builds on the results of previous research [1, p. 212–218]. In the first stage, the terminology of innovation and innovative development was defined and their role in the modern economy was stated. In the next stage, the factors that may influence the volume of innovative products, activities and services were chosen. This article presents a cluster analysis of the regions conducted with three chosen methods. In the course of the research, data on the previously chosen factors were collected from the official web page of the Federal State Statistics Service and analyzed, conclusions were drawn, and at the current step a cluster analysis was additionally conducted. To analyze the sample rates and to divide the regions into clusters, we used Statistica [2], a fully integrated line of analytic solutions for analyzing, visualizing and forecasting. As a result, the regions were divided into clusters according to three methods: hierarchical classification, the K-means method and two-input distribution. For a more detailed analysis, linear, power and exponential equations were built for each region. Two tables were produced: (1) Euclidean distances and (2) regression models and the significant factors. The regions were thereby grouped, and conclusions and recommendations were given for each group. The results of the current research will be applicable to analysis and planning by various commercial and governmental market participants.

  7. Indentifying the major air pollutants base on factor and cluster analysis, a case study in 74 Chinese cities

    Science.gov (United States)

    Zhang, Jing; Zhang, Lan-yue; Du, Ming; Zhang, Wei; Huang, Xin; Zhang, Ya-qi; Yang, Yue-yi; Zhang, Jian-min; Deng, Shi-huai; Shen, Fei; Li, Yuan-wei; Xiao, Hong

    2016-11-01

    This article investigated the major air pollutants and their spatial and seasonal distribution in 74 Chinese cities. Factor analysis and cluster analysis are employed to identify the major factors of air pollutants. The following results are obtained. (1) Major factors are obtained for spring, summer, autumn, and winter. The first factor in spring includes NO2, PM10, CO, and PM2.5; the first factor in summer and autumn includes PM10, PM2.5, CO and SO2; in winter, the first factor includes NO2, PM10, PM2.5, and SO2. (2) In spring, cities of cluster 5 are the most severely polluted by emission sources of SO2, CO, PM10, and PM2.5; the emission sources of O3 would significantly influence the air quality in cities of cluster 2; the emission sources of NO2 could significantly influence the air quality in cities of cluster 3 and cluster 5. (3) In summer, cities of cluster 5 are the most severely polluted by automotive emissions and coal flue gas. Cities of cluster 1 are the least polluted. Cities of cluster 3 and cluster 2 are polluted by emission sources of SO2 and O3. (4) In autumn, cities of cluster 3 and 4 are the most severely polluted by the emission sources of SO2, CO, PM10, and PM2.5; the emission sources of NO2 would significantly influence the air quality in cities of cluster 5; the emission sources of O3 could significantly influence the air quality in cities of cluster 1 and cluster 4. (5) In winter, cities of cluster 5 are the most severely polluted by the emission sources of SO2, CO, PM10, PM2.5, and CO; the emission sources of O3 could significantly influence the air quality in cities of cluster 1 and cluster 5.

  8. Factor analysis of the clustering of common somatic symptoms: a preliminary study

    Directory of Open Access Journals (Sweden)

    Tsai Chung-Huang

    2010-06-01

    Background: Studies of outpatient department patients indicate that somatic discomforts such as headache, neck pain, chest pain, low back pain, and gastrointestinal discomfort are commonly found in patients with multiple complaints. Clustering of some symptoms has been found in common somatic symptom analyses. Because of the complexity involved in the diagnosis of patients with multiple complaints, the aim of this study is to identify and classify patterns of somatic symptoms in individuals assessed during a health examination. Methods: A total of 683 patients (437 males, 246 females) received a one-day physical examination and completed a structured survey during the period from May 2007 to April 2008. A physical symptoms interview was conducted, and medical and demographic data were collected. Results: Based on the factor analysis, 4 clusters of symptoms were identified: (1) pain symptoms, (2) cold symptoms, (3) cardiopulmonary symptoms, and (4) gastrointestinal symptoms. The distribution of symptoms differed between males and females. After varimax rotation of factor patterns, 4 extracted factors emerged. In males, the factors were (1) pain symptoms, (2) cold symptoms, (3) cardiopulmonary symptoms, and (4) gastrointestinal symptoms. In females, the factors were (1) pain symptoms, (2) cold symptoms, (3) cardiopulmonary symptoms, and (4) head and gastrointestinal symptoms. Conclusions: Four clusters of somatic symptoms emerged for both males and females; however, the predominant symptoms were different in males and females. Females displayed more head-related symptoms than males. Patients should be thoroughly interviewed about additional symptoms within the same cluster after the recognition of a single somatic complaint.

  9. Determination of the principal factors of river water quality through cluster analysis method and its prediction

    Institute of Scientific and Technical Information of China (English)

    Liang GUO; Ying ZHAO; Peng WANG

    2012-01-01

    In this paper, an artificial neural network model was built to predict the Chemical Oxygen Demand (CODMn) measured by permanganate index in Songhua River. To enhance the prediction accuracy, principal factors were determined through the analysis of the weight relation between influencing factors and forecasting object using cluster analysis method, which optimized the topological structure of the prediction model input items of the artificial neural network. It was shown that application of the principal factors in water quality prediction model can improve its forecasting skill significantly through the comparison between results of prediction by artificial neural network and the measurements of the CODMn. This methodology is also applicable to various water quality prediction targets of other water bodies and it is valuable for theoretical study and practical application.

  10. WHY DO SOME NATIONS SUCCEED AND OTHERS FAIL IN INTERNATIONAL COMPETITION? FACTOR ANALYSIS AND CLUSTER ANALYSIS AT EUROPEAN LEVEL

    Directory of Open Access Journals (Sweden)

    Popa Ion

    2015-07-01

    As stated by Michael Porter (1998: 57), 'this is perhaps the most frequently asked economic question of our times.' However, a widely accepted answer is still missing. The aim of this paper is not to provide the BIG answer to such a BIG question, but rather to provide a different perspective on competitiveness at the national level. In this respect, we followed a two-step procedure called “tandem analysis” (OECD, 2008). First we employed a Factor Analysis in order to reveal the underlying factors of the initial dataset, followed by a Cluster Analysis which aims at classifying the 35 countries according to the main characteristics of competitiveness resulting from the Factor Analysis. Clustering the 35 states on the first two factors, Smart Growth and Market Development, which recover almost 76% of the common variability of the twelve original variables, revealed four clusters, along with useful information for analyzing and discussing the characteristics of those clusters.
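
    The two-step "tandem" idea described here (reduce the variables to a few dimensions, then cluster the resulting scores) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: principal components computed by SVD stand in for the factor analysis step, the clustering is a plain k-means, and the 35 x 12 data matrix is random synthetic data that only mirrors the dimensions mentioned in the abstract. All function names are hypothetical.

```python
import numpy as np

def principal_scores(X, n_components):
    """Standardize columns, then return PCA scores and explained-variance shares."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return scores, explained

def assign(X, centers):
    """Index of the nearest center (squared Euclidean distance) for each row."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means with initial centers drawn at random from the data."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centers)
        # Keep the old center if a cluster happens to lose all its points.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(X, centers), centers

# Synthetic stand-in data (assumption): 35 rows by 12 indicators driven by 2 latent factors.
rng = np.random.default_rng(42)
latent = rng.normal(size=(35, 2))
loadings = rng.normal(size=(2, 12))
X = latent @ loadings + 0.3 * rng.normal(size=(35, 12))

scores, explained = principal_scores(X, n_components=2)  # step 1: dimension reduction
labels, centers = kmeans(scores, k=4)                    # step 2: clustering on the scores

print("variance share of the first two components:", round(float(explained.sum()), 3))
print("cluster sizes:", np.bincount(labels, minlength=4))
```

    Because the clustering only sees the reduced scores, any structure discarded in the first step is invisible to the second; that is the usual caveat attached to tandem approaches.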

  11. Application of Factor Analysis on the Financial Ratios of Indian Cement Industry and Validation of the Results by Cluster Analysis

    Science.gov (United States)

    De, Anupam; Bandyopadhyay, Gautam; Chakraborty, B. N.

    2010-10-01

    Financial ratio analysis is an important and commonly used tool for analyzing the financial health of a firm. Quite a large number of financial ratios, which can be categorized in different groups, are used for this analysis. However, to reduce the number of ratios used for financial analysis and to regroup them on the basis of empirical evidence, the Factor Analysis technique has been used successfully by different researchers during the last three decades. In this study Factor Analysis has been applied to audited financial data of Indian cement companies for a period of 10 years. The sample companies are listed on the Stock Exchange India (BSE and NSE). Factor Analysis, conducted over 44 variables (financial ratios) grouped in 7 categories, resulted in 11 underlying categories (factors). Each factor is named in an appropriate manner considering the factor loads and constituent variables (ratios). Representative ratios are identified for each such factor. To validate the results of Factor Analysis and to reach a final conclusion regarding the representative ratios, Cluster Analysis was performed.

  12. Understanding the Support Needs of People with Intellectual and Related Developmental Disabilities through Cluster Analysis and Factor Analysis of Statewide Data

    Science.gov (United States)

    Viriyangkura, Yuwadee

    2014-01-01

    Through a secondary analysis of statewide data from Colorado, people with intellectual and related developmental disabilities (ID/DD) were classified into five clusters based on their support needs characteristics using cluster analysis techniques. Prior latent factor models of support needs in the field of ID/DD were examined to investigate the…

  14. Comparing 3 dietary pattern methods--cluster analysis, factor analysis, and index analysis--With colorectal cancer risk: The NIH-AARP Diet and Health Study.

    Science.gov (United States)

    Reedy, Jill; Wirfält, Elisabet; Flood, Andrew; Mitrou, Panagiota N; Krebs-Smith, Susan M; Kipnis, Victor; Midthune, Douglas; Leitzmann, Michael; Hollenbeck, Albert; Schatzkin, Arthur; Subar, Amy F

    2010-02-15

    The authors compared three dietary pattern methods (cluster analysis, factor analysis, and index analysis) in relation to colorectal cancer risk in the National Institutes of Health (NIH)-AARP Diet and Health Study (n = 492,306). Data from a 124-item food frequency questionnaire (1995-1996) were used to identify 4 clusters for men (3 clusters for women), 3 factors, and 4 indexes. Comparisons were made with adjusted relative risks and 95% confidence intervals, distributions of individuals in clusters by quintile of factor and index scores, and health behavior characteristics. During 5 years of follow-up through 2000, 3,110 colorectal cancer cases were ascertained. In men, the vegetables and fruits cluster, the fruits and vegetables factor, the fat-reduced/diet foods factor, and all indexes were associated with reduced risk; the meat and potatoes factor was associated with increased risk. In women, reduced risk was found with the Healthy Eating Index-2005 and increased risk with the meat and potatoes factor. For men, beneficial health characteristics were seen with all fruit/vegetable patterns, diet foods patterns, and indexes, while poorer health characteristics were found with meat patterns. For women, findings were similar except that poorer health characteristics were seen with diet foods patterns. Similarities were found across methods, suggesting basic qualities of healthy diets. Nonetheless, findings vary because each method answers a different question.

  15. Cluster analysis for applications

    CERN Document Server

    Anderberg, Michael R

    1973-01-01

    Cluster Analysis for Applications deals with methods and various applications of cluster analysis. Topics covered range from variables and scales to measures of association among variables and among data units. Conceptual problems in cluster analysis are discussed, along with hierarchical and non-hierarchical clustering methods. The necessary elements of data analysis, statistics, cluster analysis, and computer implementation are integrated vertically to cover the complete path from raw data to a finished analysis. Comprised of 10 chapters, this book begins with an introduction to the subject o

  16. The contribution of psychological factors to recovery after mild traumatic brain injury: is cluster analysis a useful approach?

    Science.gov (United States)

    Snell, Deborah L; Surgenor, Lois J; Hay-Smith, E Jean C; Williman, Jonathan; Siegert, Richard J

    2015-01-01

    Outcomes after mild traumatic brain injury (MTBI) vary, with slow or incomplete recovery for a significant minority. This study examines whether groups of cases with shared psychological factors but with different injury outcomes could be identified using cluster analysis. This is a prospective observational study following 147 adults presenting to a hospital-based emergency department or concussion services in Christchurch, New Zealand. This study examined associations between baseline demographic, clinical and psychological variables (distress, injury beliefs and symptom burden) and outcome 6 months later. A two-step approach to cluster analysis was applied (Ward's method to identify clusters, K-means to refine results). Three meaningful clusters emerged (high-adapters, medium-adapters, low-adapters). Baseline cluster-group membership was significantly associated with outcomes over time. High-adapters appeared recovered by 6 weeks and medium-adapters revealed improvements by 6 months. The low-adapters continued to endorse many symptoms, negative recovery expectations and distress, being significantly at risk for poor outcome more than 6 months after injury (OR (good outcome) = 0.12; CI = 0.03-0.53; p …). Cluster analysis supported the notion that groups could be identified early post-injury based on psychological factors, with group membership associated with differing outcomes over time. Implications for clinical care providers regarding therapy targets and cases that may benefit from different intensities of intervention are discussed.
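
    The two-step approach named in this abstract (Ward's method to find clusters, K-means to refine them) can be sketched roughly as below. This is a toy illustration under stated assumptions rather than the study's analysis: the Ward step is a naive implementation that merges, at each step, the pair of clusters whose union increases the within-cluster sum of squares the least; the refinement is a plain k-means started from the Ward partition; and the data are three synthetic, well-separated groups. Function names are hypothetical.

```python
import numpy as np

def ward_labels(X, k):
    """Naive Ward agglomeration down to k clusters (fine for small samples only)."""
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                diff = X[clusters[a]].mean(axis=0) - X[clusters[b]].mean(axis=0)
                # Increase in within-cluster sum of squares if a and b are merged.
                cost = na * nb / (na + nb) * (diff @ diff)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(len(X), dtype=int)
    for j, members in enumerate(clusters):
        labels[members] = j
    return labels

def kmeans_refine(X, labels, k, n_iter=50):
    """K-means iterations started from an existing partition."""
    labels = labels.copy()
    for _ in range(n_iter):
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d.argmin(axis=1)
        # Stop when nothing moves, or if a cluster would become empty.
        if np.array_equal(new_labels, labels):
            break
        if np.bincount(new_labels, minlength=k).min() == 0:
            break
        labels = new_labels
    return labels

# Synthetic stand-in data (assumption): three well-separated groups of 15 points each.
rng = np.random.default_rng(0)
group_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.5 * rng.normal(size=(15, 2)) for c in group_centers])

initial = ward_labels(X, k=3)             # step 1: hierarchical solution
refined = kmeans_refine(X, initial, k=3)  # step 2: k-means refinement

print("cluster sizes after refinement:", np.bincount(refined, minlength=3))
```

    Starting k-means from the hierarchical solution removes its dependence on random initial centers, while the k-means pass lets cases that were merged early in the hierarchy move to a closer cluster.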

  17. Marketing research cluster analysis

    Directory of Open Access Journals (Sweden)

    Marić Nebojša

    2002-01-01

    One area of application of cluster analysis in marketing is the identification of groups of cities and towns with similar demographic profiles. This paper considers the main aspects of cluster analysis through an example of clustering 12 cities using Minitab software.

  18. Research on the relationship between the elements and pharmacological activities in velvet antler using factor analysis and cluster analysis

    Science.gov (United States)

    Zhou, Libing

    2017-04-01

    Velvet antler has certain effects on improving the body's immune cells and on the regulation of immune system function, the nervous system, anti-stress, anti-aging and osteoporosis. It has medicinal applications in treating a wide range of conditions, such as tissue wound healing, tumors and cardiovascular disease. Therefore, research on the relationship between pharmacological activities and elements in velvet antler is of great significance. The objective of this study was to comprehensively evaluate 15 elements in different varieties of velvet antler and to study the relationship between the elements and traditional Chinese medicine efficacy in humans. Factor analysis and factor cluster analysis were used to analyze the element data for sika velvet antler, cervus elaphus linnaeus, flower horse hybrid velvet antler, wapiti (elk) velvet antler and male reindeer velvet antler, and to find the relationships among the 15 elements: Ca, P, Mg, Na, K, Fe, Cu, Mn, Al, Ba, Co, Sr, Cr, Zn and Ni. Using MATLAB2010 and SPSS software, chemometric methods were applied to the relationship between the elements in velvet antler and the pharmacological activities. The first common factor F1 had greater loadings on Ca, P, Mg, Co, Sr and Ni; the second common factor F2 on K, Mn, Zn and Cr; the third common factor F3 on Na, Cu and Ba; and the fourth common factor F4 on Fe and Al. The ranking of the velvet antlers by the 15 elements was elk velvet antler > flower horse hybrid velvet antler > cervus elaphus linnaeus > sika velvet antler > male reindeer velvet antler. Based on the factor analysis and the factor cluster analysis, a model for evaluating traditional Chinese medicine quality was constructed. These studies provide the scientific base and theoretical foundation for the future large-scale rational

  19. Comparative analysis of a tourism cluster in the Baikal region: role of cooperation as a factor of development

    Directory of Open Access Journals (Sweden)

    Nina Nikolayevna Danilenko

    2014-06-01

    The article investigates cooperation in the field of tourism as a factor and feature of tourism cluster development. The development of tourism clusters, along with trends in and the most common forms of cooperation between participants, was analyzed in two regions of the Baikal region (the Irkutsk region and the Republic of Buryatia) based on the results of interviews with representatives of tourism business, education and government. The results indicate that, compared with European practice, the areas of cooperation of Russian tourism sector enterprises with other economic actors are less diverse. Some attributes of cluster development based on cooperation are present in the Republic of Buryatia, whereas they are missing in the Irkutsk region, although both regions are the objects of a number of national and regional development programs aimed at tourism cluster development.

  20. Cluster Correspondence Analysis

    NARCIS (Netherlands)

    M. van de Velden (Michel); A. Iodice D' Enza; F. Palumbo

    2014-01-01

    A new method is proposed that combines dimension reduction and cluster analysis for categorical data. A least-squares objective function is formulated that approximates the cluster by variables cross-tabulation. Individual observations are assigned to clusters

  1. Exploring syndrome differentiation using non-negative matrix factorization and cluster analysis in patients with atopic dermatitis.

    Science.gov (United States)

    Yun, Younghee; Jung, Wonmo; Kim, Hyunho; Jang, Bo-Hyoung; Kim, Min-Hee; Noh, Jiseong; Ko, Seong-Gyu; Choi, Inhwa

    2017-08-01

    Syndrome differentiation (SD) results in a diagnostic conclusion based on a cluster of concurrent symptoms and signs, including pulse form and tongue color. In Korea, there is a strong interest in the standardization of Traditional Medicine (TM). In order to standardize TM treatment, standardization of SD should be given priority. The aim of this study was to explore the SD, or symptom clusters, of patients with atopic dermatitis (AD) using non-negative factorization methods and k-means clustering analysis. We screened 80 patients and enrolled 73 eligible patients. One TM dermatologist evaluated the symptoms/signs using an existing clinical dataset from patients with AD. This dataset was designed to collect 15 dermatologic and 18 systemic symptoms/signs associated with AD. Non-negative matrix factorization was used to decompose the original data into a matrix with three features and a weight matrix. The point of intersection of the three coordinates from each patient was placed in three-dimensional space. With five clusters, the silhouette score reached 0.484, and this was the best silhouette score obtained from two to nine clusters. Patients were clustered according to the varying severity of concurrent symptoms/signs. Through the distribution of the null hypothesis generated by 10,000 permutation tests, we found significant cluster-specific symptoms/signs from the confidence intervals in the upper and lower 2.5% of the distribution. Patients in each cluster showed differences in symptoms/signs and severity. In a clinical situation, SD and treatment are based on the practitioners' observations and clinical experience. SD, identified through informatics, can contribute to development of standardized, objective, and consistent SD for each disease. Copyright © 2017. Published by Elsevier Ltd.
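
    The decomposition step described in this abstract, where a non-negative data matrix is factored into a three-feature matrix and a weight matrix, can be illustrated with a small sketch. The assumptions are stated plainly: the update rule is the standard multiplicative-update scheme for the Frobenius-norm objective, which the abstract does not say the authors used; the 73 x 33 matrix is random synthetic data mirroring only the number of patients and of symptoms/signs (15 + 18); and the function name `nmf` is hypothetical.

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor V (non-negative) as W @ H using multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    errors = []
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H stay non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(V - W @ H))
    return W, H, errors

# Synthetic stand-in data (assumption): 73 "patients" by 33 "symptom" scores.
rng = np.random.default_rng(1)
V = rng.random((73, 3)) @ rng.random((3, 33)) + 0.01 * rng.random((73, 33))

W, H, errors = nmf(V, rank=3)

print("W shape (one 3-coordinate point per patient):", W.shape)
print("H shape (weights of each symptom on each feature):", H.shape)
print("reconstruction error, first vs last iteration:",
      round(float(errors[0]), 3), round(float(errors[-1]), 3))
```

    Each row of `W` gives one patient three coordinates, which is the kind of three-dimensional point that a k-means step, scored by silhouette over a range of cluster counts, could then group.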

  2. CLEAN: CLustering Enrichment ANalysis

    Directory of Open Access Journals (Sweden)

    Medvedovic Mario

    2009-07-01

    Background: Integration of biological knowledge encoded in various lists of functionally related genes has become one of the most important aspects of analyzing genome-wide functional genomics data. In the context of cluster analysis, functional coherence of clusters established through such analyses has been used to identify biologically meaningful clusters, compare clustering algorithms and identify biological pathways associated with the biological process under investigation. Results: We developed a computational framework for analytically and visually integrating knowledge-based functional categories with the cluster analysis of genomics data. The framework is based on the simple, conceptually appealing, and biologically interpretable gene-specific functional coherence score (CLEAN score). The score is derived by correlating the clustering structure as a whole with functional categories of interest. We directly demonstrate that integrating biological knowledge in this way improves the reproducibility of conclusions derived from cluster analysis. The CLEAN score differentiates between the levels of functional coherence for genes within the same cluster based on their membership in enriched functional categories. We show that this aspect results in higher reproducibility across independent datasets and produces more informative genes for distinguishing different sample types than the scores based on the traditional cluster-wide analysis. We also demonstrate the utility of the CLEAN framework in comparing clusterings produced by different algorithms. CLEAN was implemented as an add-on R package and can be downloaded at http://Clusteranalysis.org. The package integrates routines for calculating gene-specific functional coherence scores and the open source interactive Java-based viewer Functional TreeView (FTreeView). Conclusion: Our results indicate that using the gene-specific functional coherence score improves the reproducibility of the

  3. Cluster Correspondence Analysis.

    Science.gov (United States)

    van de Velden, M; D'Enza, A Iodice; Palumbo, F

    2017-03-01

    A method is proposed that combines dimension reduction and cluster analysis for categorical data by simultaneously assigning individuals to clusters and optimal scaling values to categories in such a way that a single between variance maximization objective is achieved. In a unified framework, a brief review of alternative methods is provided and we show that the proposed method is equivalent to GROUPALS applied to categorical data. Performance of the methods is appraised by means of a simulation study. The results of the joint dimension reduction and clustering methods are compared with the so-called tandem approach, a sequential analysis of dimension reduction followed by cluster analysis. The tandem approach is conjectured to perform worse when variables are added that are unrelated to the cluster structure. Our simulation study confirms this conjecture. Moreover, the results of the simulation study indicate that the proposed method also consistently outperforms alternative joint dimension reduction and clustering methods.

  4. A study of area clustering using factor analysis in small area estimation (An analysis of per capita expenditures of subdistricts level in regency and municipality of Bogor)

    Science.gov (United States)

    Wahyudi, Notodiputro, Khairil Anwar; Kurnia, Anang; Anisa, Rahma

    2016-02-01

    Empirical Best Linear Unbiased Prediction (EBLUP) is an indirect estimation method used to estimate parameters of small areas. EBLUP works by using auxiliary variables for the area while adding area random effects. For non-sampled areas, the standard EBLUP can no longer be used because no information on the area random effects is available. To obtain a more suitable estimation method for non-sampled areas, the standard EBLUP model has to be modified by adding cluster information. The aim of this research was to study, by means of simulation, clustering methods that use factor analysis to provide better cluster information. The criterion used to evaluate the methods in the simulation study was the mean percentage of clustering accuracy. The results of the simulation study showed that the use of factor analysis in clustering increased the average percentage of accuracy, particularly when using the Ward method. The method was then applied to estimate per capita expenditures from SUSENAS based on Small Area Estimation (SAE) techniques, and the quality of the estimates was measured by RMSE. This research has shown that the modified EBLUP model with factor analysis provided better estimates than both the standard EBLUP model and the modified EBLUP model without factor analysis. Moreover, it was also shown that clustering information is important in estimating non-sampled areas.

  5. APPROACHES TOWARDS CLUSTER ANALYSIS

    National Research Council Canada - National Science Library

    Manuela Tvaronaviciene; Kristina Razminiene; Leonardo Piccinetti

    2015-01-01

    .... The findings indicate that the case study is used in many articles referring to cluster research. Other methods, such as analysis, interview, survey, research, equation and others are used to support the case study...

  6. Using two-level factor analysis to test for cluster bias in ordinal data.

    NARCIS (Netherlands)

    Jak, Suzanne; Oort, F.J .; Dolan, Conor V.

    2014-01-01

    The test for cluster bias is a test of measurement invariance across clusters in 2-level data. This article examines the true positive rates (empirical power) and false positive rates of the test for cluster bias using the likelihood ratio test (LRT) and the Wald test with ordinal data. A simulation

  7. Bladder Carcinoma Data with Clinical Risk Factors and Molecular Markers: A Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Enrique Redondo-Gonzalez

    2015-01-01

    Bladder cancer occurs in the epithelial lining of the urinary bladder and is amongst the most common types of cancer in humans, killing thousands of people a year. This paper is based on the hypothesis that the use of clinical and histopathological data together with information about the concentration of various molecular markers in patients is useful for the prediction of outcomes and the design of treatments of nonmuscle invasive bladder carcinoma (NMIBC). A population of 45 patients with a new diagnosis of NMIBC was selected. Patients with benign prostatic hyperplasia (BPH), muscle invasive bladder carcinoma (MIBC), carcinoma in situ (CIS), and NMIBC recurrent tumors were not included due to their different clinical behavior. Clinical history was obtained by means of anamnesis and physical examination, and preoperative imaging and urine cytology were carried out for all patients. Then, patients underwent conventional transurethral resection (TURBT), and proteomic analyses quantified the biomarkers (p53, neu, and EGFR). A postoperative follow-up was performed to detect relapse and progression. Cluster analyses were performed to find groups based on clinical data, molecular markers, histopathological prognostic factors, and statistics about recurrence, progression, and overall survival of patients with NMIBC. Four groups were found according to tumor sizes, risk of relapse or progression, and biological behavior. Outlier patients were also detected and categorized according to their clinical characteristics and biological behavior.

  8. Using Two-Level Factor Analysis to Test for Cluster Bias in Ordinal Data.

    Science.gov (United States)

    Jak, Suzanne; Oort, Frans J; Dolan, Conor V

    2014-01-01

    The test for cluster bias is a test of measurement invariance across clusters in 2-level data. This article examines the true positive rates (empirical power) and false positive rates of the test for cluster bias using the likelihood ratio test (LRT) and the Wald test with ordinal data. A simulation study indicates that the scaled version of the LRT that accounts for nonnormality of the data gives untrustworthy results, whereas the unscaled LRT and the Wald test have acceptable false positive rates and perform well in terms of empirical power rate if the amount of cluster bias is large. The test for cluster bias is illustrated with data from research on teacher-student relations.

  9. Multiple Factor Analysis and k-Means Clustering-Based Classification of the DOE Groundwater Contaminant Database

    Science.gov (United States)

    Faybishenko, B.; Hazen, T. C.

    2009-12-01

    , between the types of contaminant groups and the contamination severity. The relationships between contaminant groups and the plume depth and velocity, and between contaminant groups and climate, are weak, and there is no significant relationship with the plume volumes. To visualize the contribution of different factors, the results of MFA calculations are presented using two- and three-dimensional maps. Using the first four factors for the basic plume characteristics, a k-means cluster analysis was applied to classify the plumes into respective clusters. These results can be used to plan characterization, monitoring, and modeling of contaminant behavior at contaminated sites, and to design appropriate remediation technologies.

  10. Spatial analysis of suicide mortality in Québec: spatial clustering and area factor correlates.

    Science.gov (United States)

    Ngamini Ngui, André; Apparicio, Philippe; Moltchanova, Elena; Vasiliadis, Helen-Maria

    2014-12-15

    Understanding the spatial distribution of suicide can inform the planning, implementation and evaluation of suicide prevention actions. No previous study has assessed spatial clustering of the different methods of suicide in Quebec. The aim of this study was to assess spatial clustering of suicide in Quebec between 2004 and 2007 and neighborhood level predictors of the clusters. Scan statistics was applied to detect clusters of suicides by method and by sex. Smoothed standardized mortality ratios (SMRs) for suicide for each neighborhood were also estimated and their association with neighborhood characteristics was investigated using the Bayesian hierarchical spatial model. The pattern of suicide rate was different among men and women; men showed higher standardized mortality rates. The most likely clusters of suicide were found in remote rural areas. However, some neighborhoods in urban areas also had noticeable suicide clusters. Firearms suicide was most likely found in remote rural areas while poisoning and hanging suicide methods clustered in urban areas. These findings suggest that it is important to take geographical variations into account in national policy and health services planning.
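
    The standardized mortality ratio mentioned above compares the deaths observed in an area with the number expected if reference rates applied to that area's population. The sketch below shows only the raw ratio under indirect standardization; it does not reproduce the Bayesian smoothing used in the study, and all names, strata and numbers in it are hypothetical.

```python
def expected_deaths(population_by_stratum, reference_rates):
    """Deaths expected if the reference rates applied to this population."""
    return sum(
        population_by_stratum[stratum] * reference_rates[stratum]
        for stratum in population_by_stratum
    )


def standardized_mortality_ratio(observed, population_by_stratum, reference_rates):
    """Raw SMR: observed deaths divided by expected deaths."""
    expected = expected_deaths(population_by_stratum, reference_rates)
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return observed / expected


# Hypothetical neighborhood with two strata and made-up reference rates.
population = {"men": 20_000, "women": 20_000}
rates = {"men": 0.0002, "women": 0.00005}  # deaths per person per period
ratio = standardized_mortality_ratio(8, population, rates)
print(round(ratio, 2))
```

    A ratio above 1 means more deaths were observed than the reference rates would predict. Raw ratios are unstable for small populations, which is the usual motivation for smoothing them.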

  11. Comprehensive cluster analysis with Transitivity Clustering.

    Science.gov (United States)

    Wittkop, Tobias; Emig, Dorothea; Truss, Anke; Albrecht, Mario; Böcker, Sebastian; Baumbach, Jan

    2011-03-01

    Transitivity Clustering is a method for the partitioning of biological data into groups of similar objects, such as genes. It provides integrated access to various functions addressing each step of a typical cluster analysis. To facilitate this, Transitivity Clustering is accessible online and offers three user-friendly interfaces: a powerful stand-alone version, a web interface, and a collection of Cytoscape plug-ins. In this paper, we describe three major workflows: (i) protein (super)family detection with Cytoscape, (ii) protein homology detection with incomplete gold standards and (iii) clustering of gene expression data. This protocol guides the user through the most important features of Transitivity Clustering and takes ∼1 h to complete.

  12. [Cluster analysis in biomedical researches].

    Science.gov (United States)

    Akopov, A S; Moskovtsev, A A; Dolenko, S A; Savina, G D

    2013-01-01

    Cluster analysis is one of the most popular methods for the analysis of multi-parameter data. Cluster analysis reveals the internal structure of the data by grouping individual observations according to their degree of similarity. The review defines the basic concepts of cluster analysis and discusses the most popular clustering algorithms: k-means, hierarchical algorithms, and Kohonen network algorithms. Examples of the use of these algorithms in biomedical research are given.
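
    Of the algorithm families named above, the hierarchical (agglomerative) one is the simplest to write down: start with every observation in its own cluster and repeatedly merge the two closest clusters. The following is a minimal sketch, not taken from the review; the choice of average linkage, the function name and the sample points are assumptions made for illustration.

```python
import math


def agglomerative(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Closeness is average linkage: the mean Euclidean distance over all
    pairs of points drawn from the two clusters. Returns lists of point
    indices.
    """
    if not 1 <= n_clusters <= len(points):
        raise ValueError("n_clusters must be between 1 and len(points)")
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical observations forming two well-separated groups.
observations = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
groups = agglomerative(observations, 2)
print(groups)
```

    This version recomputes every linkage at each step, so it is only suitable for small data sets. Library implementations update distances incrementally.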

  13. IMPACTS OF STRUCTURAL FACTORS ON ENERGY CONSUMPTION IN CLUSTER-BASED WIRELESS SENSOR NETWORKS: A COMPREHENSIVE ANALYSIS

    Directory of Open Access Journals (Sweden)

    Taner Cevik

    2015-02-01

    Limited energy is the major driving factor for research on wireless sensor networks. Clustering alleviates this energy shortage problem by reducing the data traffic conveyed over the network, and several clustering methods have therefore been proposed in the literature. Researchers put forward their methods by making strong assumptions, such as always locating a single sink at one side of the topology or making clusters nearer to the sink smaller. However, to the best of our knowledge, there is no comprehensive research that investigates the effects of various structural alternatives on the energy consumption of wireless sensor networks. In this paper, we thoroughly analyse the impact of various structural approaches such as cluster size, number of tiers in the topology, node density, and the position and number of sinks. Extensive simulation results are provided. The results show that the greatest lifetime prolongation is achieved by locating a sufficient number of sinks around the network area.

  15. Descriptive analysis of factors that influence economical results in the furniture cluster of Bento Gonçalves

    Directory of Open Access Journals (Sweden)

    Miguel Afonso Sellitto

    2014-12-01

    The purpose of this article is to analyze factors that can influence the competitiveness of companies in the furniture cluster of Bento Gonçalves, Rio Grande do Sul. Through a literature review, we identify four factors that can influence competition in clusters: the region's productivity, innovation, relationship with suppliers, and cooperation between companies. The research method is the single case study. The research techniques are a review of the bibliography and documentation specific to the studied cluster, and interviews with experts on the cluster. The main findings are: the cluster has high productivity, mainly owing to the hi-tech machinery employed by the main companies; innovation is permanent and is motivated by the business goals that the main companies impose on small and medium-sized companies; the relationship with suppliers is problematic with regard to large-scale vendors, owing to the lack of collective purchasing in the area; and cooperation between enterprises is limited, owing to a regional culture that does not appreciate depending on resources available outside the companies. These factors can help to produce hypotheses for further research.

  16. A pyrosequencing assay for the quantitative methylation analysis of the PCDHB gene cluster, the major factor in neuroblastoma methylator phenotype.

    Science.gov (United States)

    Banelli, Barbara; Brigati, Claudio; Di Vinci, Angela; Casciano, Ida; Forlani, Alessandra; Borzì, Luana; Allemanni, Giorgio; Romani, Massimo

    2012-03-01

    Epigenetic alterations are hallmarks of cancer and powerful biomarkers, whose clinical utilization is made difficult by the absence of standardization and of common methods of data interpretation. The coordinate methylation of many loci in cancer is defined as 'CpG island methylator phenotype' (CIMP) and identifies clinically distinct groups of patients. In neuroblastoma (NB), CIMP is defined by a methylation signature, which includes different loci, but its predictive power on outcome is entirely recapitulated by the PCDHB cluster only. We have developed a robust and cost-effective pyrosequencing-based assay that could facilitate the clinical application of CIMP in NB. This assay permits the unbiased simultaneous amplification and sequencing of 17 out of 19 genes of the PCDHB cluster for quantitative methylation analysis, taking into account all the sequence variations. As some of these variations were at CpG doublets, we bypassed the data interpretation conducted by the methylation analysis software to assign the corrected methylation value at these sites. The final result of the assay is the mean methylation level of 17 gene fragments in the protocadherin B (PCDHB) cluster. We have utilized this assay to compare the methylation levels of the PCDHB cluster between high-risk and very low-risk NB patients, confirming the predictive value of CIMP. Our results demonstrate that the pyrosequencing-based assay described here is a powerful instrument for the analysis of this gene cluster that may simplify the data comparison between different laboratories and, in the long term, could facilitate its clinical application. Furthermore, our results demonstrate that, in principle, pyrosequencing can be efficiently utilized for the methylation analysis of gene clusters with high internal homologies.

  17. Quantitative analysis of individual hepatocyte growth factor receptor clusters in influenza A virus infected human epithelial cells using localization microscopy.

    Science.gov (United States)

    Wang, Qiaoyun; Dierkes, Rüdiger; Kaufmann, Rainer; Cremer, Christoph

    2014-04-01

    In this report, we applied a special localization microscopy technique (Spectral Precision Distance/Spatial Position Determination Microscopy/SPDM) to quantitatively analyze the effect of influenza A virus (IAV) infection on the spatial distribution of individual HGFR (Hepatocyte Growth Factor Receptor) proteins on the membrane of human epithelial cells at the single molecule resolution level. We applied this SPDM method to Alexa 488 labeled HGFR proteins with two different ligands. The ligands were either HGF (Hepatocyte Growth Factor), or IAV. In addition, the HGFR distribution in a control group of mock-incubated cells without any ligands was investigated. The spatial distribution of 1×10^6 individual HGFR proteins localized in large regions of interest on membranes of 240 cells was quantitatively analyzed and found to be highly non-random. Between 21% and 24% of the HGFR molecules were located in 44,304 small clusters with an average diameter of 54 nm. The mean density of HGFR molecule signals per individual cluster was very similar in control cells, in cells with ligand only, and in IAV infected cells, independent of the incubation time. From the density of HGFR molecule signals in the clusters and the diameter of the clusters, the number of HGFR molecule signals per cluster was estimated to be in the range between 4 and 11 (means 5-6). This suggests that the membrane bound HGFR clusters form small molecular complexes with a maximum diameter of few tens of nm, composed of a relatively low number of HGFR molecules. This article is part of a Special Issue entitled: Viral Membrane Proteins - Channels for Cellular Networking. Copyright © 2013 Elsevier B.V. All rights reserved.

  18. An Exploration of Accounting Social Responsibility%The Analysis of Influencing Factors to Financial Industry Cluster

    Institute of Scientific and Technical Information of China (English)

    陈沉

    2012-01-01

    Corporate social responsibility has become a research focus for many experts and scholars. As the vehicle of economic activity, the enterprise bears social responsibility, and accounting plays an increasingly important role in disclosing corporate social responsibility. The quality of the accounting information provided by accountants directly affects the economic interests of stakeholders and the effective allocation of social resources. For this reason, this paper focuses on some basic issues of accounting social responsibility.%In recent years, financial industry clusters have flourished, promoting economic growth in these areas. This article first studies the background to the emergence of financial industry clusters, then concludes that five factors influence them: geographic location, economic effect, trading needs, government support and basic infrastructure. Using 2000-2009 panel data for 31 provinces (cities), we adopt a fixed effect model to examine the effect of the five factors on financial industry clusters. The paper concludes that geographic location, economic effect, trading needs and government support have a positive influence on financial industry clusters. Finally, we put forward several suggestions for developing financial industry clusters in China.

  19. Stratification and analysis of housing indicators of rural areas of Isfahan province using factor and cluster analyses

    Directory of Open Access Journals (Sweden)

    S. E. Seidaiy

    2013-01-01

    : market economy and planned economy. In the market economy view, housing problems are solved through market mechanisms and housing needs are met by the private sector (Chadwick, 1987:88; Ziyari, et al., 210:4). In a planned economy, the government has the role of planner, designer and manager (Aghasi, 1996:201; Chadwick, 1987:88; Shucksmith, 2003:213). In Islam's ideological system, housing is so important that its provision is considered one of the bases of economic independence and of the eradication of poverty in society.

    3– Discussion

    To evaluate and analyze the housing indicators in the rural areas of Isfahan province, the data and related variables are first collected, and the desired indicators are derived from them (Table 1). Then, in line with the goals of the research, we go through the following steps: analysis of the housing situation in rural areas of Isfahan province using housing indicators; determination of the factors that are effective in improving housing indicators; and stratification of rural areas based on these indicators. The analysis of the indicators and the prioritization of the rural areas of the province are performed by applying statistical techniques (factor analysis and cluster analysis).

    Table 1: Housing Indicators

    Row  Indicator
    1    The population of rural areas
    2    The number of households
    3    The family size
    4    The number of residential units
    5    The household density in residential units
    6    The density of people in residential units
    7    The housing shortages
    8    The average of number of rooms in the household
    9    The average of number
    12   The average of infrastructure lifetime
    13   The share of households that have a minimum electricity
    14   The share of households that have a minimum telephone
    15   The share of households that have a minimum water piping
    16   The share of households that have a minimum gas piping
    17   The share of households that have a minimum central heating and cooling system
    18   The share of households that have a minimum kitchen
    19   The share of households that have a minimum bathroom

  20. Cluster analysis: a new approach for identification of underlying risk factors for coronary artery disease in essential hypertensive patients.

    Science.gov (United States)

    Guo, Qi; Lu, Xiaoni; Gao, Ya; Zhang, Jingjing; Yan, Bin; Su, Dan; Song, Anqi; Zhao, Xi; Wang, Gang

    2017-03-07

    Grading of essential hypertension according to blood pressure (BP) level may not adequately reflect clinical heterogeneity of hypertensive patients. This study was carried out to explore clinical phenotypes in essential hypertensive patients using cluster analysis. This study recruited 513 hypertensive patients and evaluated BP variations with ambulatory blood pressure monitoring. Four distinct hypertension groups were identified using cluster analysis: (1) younger male smokers with relatively high BP had the most severe carotid plaque thickness but no coronary artery disease (CAD); (2) older women with relatively low diastolic BP had more diabetes; (3) non-smokers with a low systolic BP level had neither diabetes nor CAD; (4) hypertensive patients with BP reverse dipping were most likely to have CAD but had the least severe carotid plaque thickness. In binary logistic analysis, reverse dipping was significantly associated with prevalence of CAD. Cluster analysis was shown to be a feasible approach for investigating the heterogeneity of essential hypertension in clinical studies. BP reverse dipping might be valuable for prediction of CAD in hypertensive patients when compared with carotid plaque thickness. However, large-scale prospective trials with more information on plaque morphology are necessary to further compare the predictive power of BP dipping pattern and carotid plaque.

  1. Integrative cluster analysis in bioinformatics

    CERN Document Server

    Abu-Jamous, Basel; Nandi, Asoke K

    2015-01-01

    Clustering techniques are increasingly being put to use in the analysis of high-throughput biological datasets. Novel computational techniques to analyse high throughput data in the form of sequences, gene and protein expressions, pathways, and images are becoming vital for understanding diseases and future drug discovery. This book details the complete pathway of cluster analysis, from the basics of molecular biology to the generation of biological knowledge. The book also presents the latest clustering methods and clustering validation, thereby offering the reader a comprehensive review o

  2. Creative Clusters in Visegrad Countries: Factors Conditioning Cluster Establishment and Development

    Directory of Open Access Journals (Sweden)

    Bialic-Davendra Magdalena

    2016-06-01

    Since the accession of the Visegrad Group of countries (V4) to the European Union, the importance of clusters has increased. With growing global competitiveness and EU 12 trends, a gradual awareness of creative industries is observed in V4 countries. Therefore, this article analyses creative clusters and factors conditioning their establishment and development. On the basis of a literature review and a questionnaire survey, a mapping of creative clusters was conducted. In addition, catalysts, main motives and key factors in the process of their establishment were identified, as were the activities and factors hampering their development. The scheme of cluster development is presented as the outcome of the qualitative analysis, along with a comparison to findings of other studies. Research findings show that trust building and administrative obstacles are among the main barriers, especially for design clusters and cultural clusters.

  3. Usage of K-cluster and factor analysis for grouping and evaluation the quality of olive oil in accordance with physico-chemical parameters

    Science.gov (United States)

    Milev, M.; Nikolova, Kr.; Ivanova, Ir.; Dobreva, M.

    2015-11-01

    Twenty-five olive oils, differing in origin and method of extraction, were studied in terms of 17 physico-chemical parameters: the color parameters a and b, lightness, fluorescence peaks, the pigments chlorophyll and β-carotene, and fatty-acid content. The goals of the study were: to conduct a correlation analysis to find the inner relations between the studied indices; to apply factor analysis using the method of principal components (PCA) to reduce the large number of variables to a few factors that matter most for distinguishing the different types of olive oil; and to use K-means clustering to compare and group the tested olive oils by similarity. The inner relations between the studied indices were found by applying correlation analysis. A factor analysis using PCA was then applied to the resulting correlation matrix. This reduced the studied indices to 4 factors, which explained 79.3% of the total variation. The first factor combined the color parameters, β-carotene, and the fluorescence peak at about 520 nm, which is related to oxidation products. The second was determined mainly by the chlorophyll content and the related fluorescence peak at about 670 nm. The third and fourth factors were determined by the fatty-acid content of the samples. The third combined the fatty acids that make it possible to distinguish olive oil from other plant oils: oleic, linoleic and stearic acids. The fourth factor included fatty acids present at much lower levels in the studied samples. K-means cluster analysis requires the number of clusters to be specified in advance. K = 3 was chosen because there were three types of olive oil. The first cluster contained all salad and pomace olive oils; the second contained the extra virgin oil samples taken as controls from producers, which were bought from the trade network. The third cluster contained samples from
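
    The workflow in this abstract (PCA on the correlation matrix, then K-means on the retained factors with K = 3) can be sketched as follows. This is an illustration on synthetic data, not the authors' analysis: the data, the random initialization of K-means, and the function names `pca_scores` and `kmeans` are all assumptions.

```python
import numpy as np


def pca_scores(data, n_factors):
    """Project standardized data onto the leading principal components.

    Components are eigenvectors of the correlation matrix. Also returns
    the share of total variance explained by the retained components.
    """
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1][:n_factors]  # largest eigenvalues first
    explained = eigvals[order].sum() / eigvals.sum()
    return z @ eigvecs[:, order], explained


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with randomly chosen starting centers."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Synthetic stand-in for 25 samples measured on 17 parameters.
rng = np.random.default_rng(1)
samples = np.vstack([
    rng.normal(loc=level, scale=1.0, size=(n, 17))
    for level, n in ((0.0, 9), (3.0, 8), (6.0, 8))
])
scores, explained = pca_scores(samples, n_factors=4)
labels, centers = kmeans(scores, k=3)
print(f"variance explained by 4 factors: {explained:.1%}")
print(labels)
```

    Because K-means depends on its starting centers, a real analysis would usually repeat it from several starts and keep the solution with the smallest within-cluster sum of squares.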

  4. Are clusters of dietary patterns and cluster membership stable over time? Results of a longitudinal cluster analysis study.

    Science.gov (United States)

    Walthouwer, Michel Jean Louis; Oenema, Anke; Soetens, Katja; Lechner, Lilian; de Vries, Hein

    2014-11-01

    Developing nutrition education interventions based on clusters of dietary patterns can only be done adequately when it is clear if distinctive clusters of dietary patterns can be derived and reproduced over time, if cluster membership is stable, and if it is predictable which type of people belong to a certain cluster. Hence, this study aimed to: (1) identify clusters of dietary patterns among Dutch adults, (2) test the reproducibility of these clusters and stability of cluster membership over time, and (3) identify sociodemographic predictors of cluster membership and cluster transition. This study had a longitudinal design with online measurements at baseline (N=483) and 6 months follow-up (N=379). Dietary intake was assessed with a validated food frequency questionnaire. A hierarchical cluster analysis was performed, followed by a K-means cluster analysis. Multinomial logistic regression analyses were conducted to identify the sociodemographic predictors of cluster membership and cluster transition. At baseline and follow-up, a comparable three-cluster solution was derived, distinguishing a healthy, moderately healthy, and unhealthy dietary pattern. Male and lower educated participants were significantly more likely to have a less healthy dietary pattern. Further, 251 (66.2%) participants remained in the same cluster, 45 (11.9%) participants changed to an unhealthier cluster, and 83 (21.9%) participants shifted to a healthier cluster. Men and people living alone were significantly more likely to shift toward a less healthy dietary pattern. Distinctive clusters of dietary patterns can be derived. Yet, cluster membership is unstable and only few sociodemographic factors were associated with cluster membership and cluster transition. These findings imply that clusters based on dietary intake may not be suitable as a basis for nutrition education interventions. Copyright © 2014 Elsevier Ltd. All rights reserved.
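
    The stability figures reported above follow from comparing each participant's cluster at baseline with the cluster at follow-up. The sketch below shows that bookkeeping. The counts 251, 45 and 83 out of 379 come from the abstract, but the individual labels are placeholders constructed only to reproduce those counts, and the ordering of clusters in `RANK` is an assumed coding.

```python
from collections import Counter

# Assumed coding: clusters ordered from least to most healthy.
RANK = {"unhealthy": 0, "moderately healthy": 1, "healthy": 2}


def transition_shares(baseline, follow_up):
    """Percentage of participants who stayed, worsened or improved."""
    if len(baseline) != len(follow_up) or not baseline:
        raise ValueError("label lists must be non-empty and of equal length")
    counts = Counter()
    for before, after in zip(baseline, follow_up):
        change = RANK[after] - RANK[before]
        if change == 0:
            counts["same"] += 1
        elif change > 0:
            counts["healthier"] += 1
        else:
            counts["unhealthier"] += 1
    total = len(baseline)
    return {
        key: round(100.0 * counts[key] / total, 1)
        for key in ("same", "unhealthier", "healthier")
    }


# Placeholder labels that reproduce only the reported counts (251 / 45 / 83).
baseline = ["moderately healthy"] * 379
follow_up = (
    ["moderately healthy"] * 251 + ["unhealthy"] * 45 + ["healthy"] * 83
)
shares = transition_shares(baseline, follow_up)
print(shares)
```

    The three percentages match those in the abstract (66.2%, 11.9% and 21.9%), which confirms that they are shares of the 379 participants measured at follow-up.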

  5. FACTOR MODEL ASSESSMENT OF THE COMPETITIVE INNOVATION CLUSTERS ELECTRONICS BASED ON ANALYSIS OF THE STAGES OF THEIR LIFE CYCLE

    Directory of Open Access Journals (Sweden)

    A. V. Brykin

    2013-01-01

    The development of the cluster principle in the world of electronics is one of the most effective examples in high-tech industry. The author considers the possibility of using clusters to modernize the Russian economy.

  6. Cluster Analysis of Adolescent Blogs

    Science.gov (United States)

    Liu, Eric Zhi-Feng; Lin, Chun-Hung; Chen, Feng-Yi; Peng, Ping-Chuan

    2012-01-01

    Emerging web applications and networking systems such as blogs have become popular, and they offer unique opportunities and environments for learners, especially for adolescent learners. This study attempts to explore the writing styles and genres used by adolescents in their blogs by employing content, factor, and cluster analyses. Factor…

  7. Scoring methods used in cluster analysis

    OpenAIRE

    Sirota, Sergej

    2014-01-01

    The aim of the thesis is to compare how well methods of cluster analysis classify objects in a dataset into known groups. The theoretical part first describes the steps needed to prepare a data file for cluster analysis. It then covers cluster analysis itself, describing ways of measuring the similarity of objects and clusters as well as the methods of cluster analysis used in the practical part of the thesis. In the practical part a...

  8. Nonlinear analysis of EAS clusters

    CERN Document Server

    Zotov, M Yu; Fomin, Y A; Fomin, Yu. A.

    2002-01-01

    We apply certain methods of nonlinear time series analysis to the extensive air shower clusters found earlier in the data set obtained with the EAS-1000 Prototype array. In particular, we use the Grassberger-Procaccia algorithm to compute the correlation dimension of samples in the vicinity of the clusters. The validity of the results is checked by surrogate data tests and some additional quantities. We compare our conclusions with the results of similar investigations performed by the EAS-TOP and LAAS groups.
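
    The Grassberger-Procaccia algorithm estimates the correlation dimension from the correlation sum C(r), the fraction of point pairs closer than r, by taking the slope of log C(r) against log r at small r. A minimal sketch follows. It works on points that are already embedded in a state space, so the delay embedding of a time series is omitted, and the test data (points on a straight line) and the chosen radii are assumptions made for illustration.

```python
import numpy as np


def pairwise_distances(points):
    """Euclidean distances between all distinct pairs of points."""
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2))
    return dist[np.triu_indices(len(points), k=1)]


def correlation_dimension(points, radii):
    """Slope of log C(r) versus log r, fitted over the supplied radii.

    C(r) is the correlation sum: the fraction of point pairs whose
    distance is below r.
    """
    dist = pairwise_distances(points)
    sums = np.array([np.mean(dist < r) for r in radii])
    if np.any(sums <= 0):
        raise ValueError("every radius must capture at least one pair")
    slope, _intercept = np.polyfit(np.log(radii), np.log(sums), 1)
    return slope


# Points scattered along a straight line in 3-D: the estimate should be near 1.
rng = np.random.default_rng(0)
direction = np.array([1.0, 2.0, 2.0]) / 3.0  # unit vector
line_points = np.outer(rng.uniform(0.0, 1.0, size=500), direction)
radii = np.logspace(-2, -1, 10)
dimension = correlation_dimension(line_points, radii)
print(round(float(dimension), 2))
```

    The estimate depends on the range of radii used for the fit, which is one reason results of this kind are usually checked against surrogate data.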

  9. Supermodel Analysis of Galaxy Clusters

    CERN Document Server

    Fusco-Femiano, R; Lapi, A

    2009-01-01

    [abridged] We present the analysis of the X-ray brightness and temperature profiles for six clusters belonging to both the Cool Core and Non Cool Core classes, in terms of the Supermodel (SM) developed by Cavaliere, Lapi & Fusco-Femiano (2009). Based on the gravitational wells set by the dark matter halos, the SM straightforwardly expresses the equilibrium of the IntraCluster Plasma (ICP) modulated by the entropy deposited at the boundary by standing shocks from gravitational accretion, and injected at the center by outgoing blastwaves from mergers or from outbursts of Active Galactic Nuclei. The cluster set analyzed here highlights not only how simply the SM represents the main dichotomy Cool vs. Non Cool Core clusters in terms of a few ICP parameters governing the radial entropy run, but also how accurately it fits even complex brightness and temperature profiles. For Cool Core clusters like A2199 and A2597, the SM with a low level of central entropy straightforwardly yields the characteristic peaked pr...

  10. Factor analysis

    CERN Document Server

    Gorsuch, Richard L

    2013-01-01

    Comprehensive and comprehensible, this classic covers the basic and advanced topics essential for using factor analysis as a scientific tool in psychology, education, sociology, and related areas. Emphasizing the usefulness of the techniques, it presents sufficient mathematical background for understanding and sufficient discussion of applications for effective use. This includes not only theory but also the empirical evaluations of the importance of mathematical distinctions for applied scientific analysis.

  11. CONSIDERATIONS REGARDING THE FACTORS THAT INFLUENCE THE PERFORMANCE OF CLUSTER

    Directory of Open Access Journals (Sweden)

    DANA-CODRUŢA DUDĂ-DĂIANU

    2012-05-01

    Economic performance is an objective of every cluster, and innovation is an indicator of future performance. This working paper proposes to measure cluster performance on the basis of three success factors: cluster competitiveness, cluster growth, and the degree to which objectives are met. Based on Porter's diamond model, the main factors influencing the development of clusters are broken down and divided into general and cluster-specific factors. At the same time, the paper analyzes the main dimensions that define cluster performance: access to resources, access to specialized knowledge, opportunity-based entrepreneurship, collaboration between organizations, and cluster-specific organizational culture.

  12. SPATIO-TEMPORAL CLUSTER ANALYSIS OF DISEASE

    Directory of Open Access Journals (Sweden)

    M. S. Abramovich

    2014-01-01

    A robust version of the spatial scan statistic for clustering is proposed. Spatio-temporal cluster analysis algorithms were used to detect clusters in the incidence of thyroid carcinoma. Methods and algorithms for detecting and constructing disease clusters in the studied territories are considered.

  13. Suicide Clusters: A Review of Risk Factors and Mechanisms

    Science.gov (United States)

    Haw, Camilla; Hawton, Keith; Niedzwiedz, Claire; Platt, Steve

    2013-01-01

    Suicide clusters, although uncommon, cause great concern in the communities in which they occur. We searched the world literature on suicide clusters and describe the risk factors and proposed psychological mechanisms underlying the spatio-temporal clustering of suicides (point clusters). Potential risk factors include male gender, being an…

  15. Research on Competitiveness of County Economy Based on Factor Analysis and Cluster Analysis: Taking 88 Counties in Guizhou as Samples

    Institute of Scientific and Technical Information of China (English)

    2011-01-01

    Seventeen indices are selected, such as the growth rate of total regional output value, the proportion of tertiary industry in GDP, per capita financial expenditure, and the soil erosion rate, for Guizhou Province in 2009. Using index data from the statistical yearbook and government websites, we apply factor analysis and cluster analysis to assess the competitiveness of the county economy in 88 counties of Guizhou Province. The results show that the competitiveness of the county economy in Guizhou Province is affected by location and economic foundation. In addition, resources and environment, economic structure, the speed of economic development and other factors also affect it. On this basis, and in light of the developmental characteristics and differing advantages of individual counties and regions, development strategies should be adapted to local conditions, which is significant for the rapid, healthy and sustainable development of the county economy in Guizhou Province.

  16. The SMART CLUSTER METHOD - adaptive earthquake cluster analysis and declustering

    Science.gov (United States)

    Schaefer, Andreas; Daniell, James; Wenzel, Friedemann

    2016-04-01

    Earthquake declustering is an essential part of almost any statistical analysis of the spatial and temporal properties of seismic activity, with typical applications including probabilistic seismic hazard assessments (PSHAs) and earthquake prediction methods. The nature of earthquake clusters and the subsequent declustering of earthquake catalogues play a crucial role in determining the magnitude-dependent earthquake return period and its spatial variation. Various methods have been developed by other researchers to address this issue, ranging in complexity from rather simple statistical window methods to complex epidemic models. This study introduces the smart cluster method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal identification. An adaptive search algorithm for data point clusters is adopted; it uses the earthquake density in the spatio-temporal neighbourhood of each event to adjust the search properties. The identified clusters are subsequently analysed to determine directional anisotropy, focusing on a strong correlation along the rupture plane, and the search space is adjusted with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010/2011 Darfield-Christchurch events, an adaptive classification procedure is applied to disassemble subsequent ruptures which may have been grouped into an individual cluster using near-field searches, support vector machines and temporal splitting. The steering parameters of the search behaviour are linked to local earthquake properties like magnitude of completeness, earthquake density and Gutenberg-Richter parameters. The method is capable of identifying and classifying earthquake clusters in space and time. It is tested and validated using earthquake data from California and New Zealand. 
As a result of the cluster identification process, each event in

  17. Cluster Analysis of Ranunculus Species

    Directory of Open Access Journals (Sweden)

    SURANTO

    2002-01-01

    The aim of the experiment was to examine whether the morphological characters of eleven species of Ranunculus collected from a number of populations were in agreement with the genetic (isozyme) data. The method used in this study was polyacrylamide gel electrophoresis of the enzymes peroxidase, esterase, malate dehydrogenase and acid phosphatase. The results showed that cluster analysis based on isozyme data gave good support to the classification of the eleven species into morphological groups. The study concluded that, in certain species, morphological variation proved to be genetically based.

  18. In Silico Analysis of Gene Expression Network Components Underlying Pigmentation Phenotypes in the Python Identified Evolutionarily Conserved Clusters of Transcription Factor Binding Sites

    Directory of Open Access Journals (Sweden)

    Kristopher J. L. Irizarry

    2016-01-01

    Color variation provides the opportunity to investigate the genetic basis of evolution and selection. Reptiles are less studied than mammals. Comparative genomics approaches allow for knowledge gained in one species to be leveraged for use in another species. We describe a comparative vertebrate analysis of conserved regulatory modules in pythons aimed at assessing bioinformatics evidence that transcription factors important in mammalian pigmentation phenotypes may also be important in python pigmentation phenotypes. We identified 23 python orthologs of mammalian genes associated with variation in coat color phenotypes for which we assessed the extent of pairwise protein sequence identity between pythons and mouse, dog, horse, cow, chicken, anole lizard, and garter snake. We next identified a set of melanocyte/pigment associated transcription factors (CREB, FOXD3, LEF-1, MITF, POU3F2, and USF-1) that exhibit relatively conserved sequence similarity within their DNA binding regions across species based on orthologous alignments across multiple species. Finally, we identified 27 evolutionarily conserved clusters of transcription factor binding sites within ~200-nucleotide intervals of the 1500-nucleotide upstream regions of AIM1, DCT, MC1R, MITF, MLANA, OA1, PMEL, RAB27A, and TYR from Python bivittatus. Our results provide insight into pigment phenotypes in pythons.

  19. In Silico Analysis of Gene Expression Network Components Underlying Pigmentation Phenotypes in the Python Identified Evolutionarily Conserved Clusters of Transcription Factor Binding Sites

    Science.gov (United States)

    2016-01-01

    Color variation provides the opportunity to investigate the genetic basis of evolution and selection. Reptiles are less studied than mammals. Comparative genomics approaches allow for knowledge gained in one species to be leveraged for use in another species. We describe a comparative vertebrate analysis of conserved regulatory modules in pythons aimed at assessing bioinformatics evidence that transcription factors important in mammalian pigmentation phenotypes may also be important in python pigmentation phenotypes. We identified 23 python orthologs of mammalian genes associated with variation in coat color phenotypes for which we assessed the extent of pairwise protein sequence identity between pythons and mouse, dog, horse, cow, chicken, anole lizard, and garter snake. We next identified a set of melanocyte/pigment associated transcription factors (CREB, FOXD3, LEF-1, MITF, POU3F2, and USF-1) that exhibit relatively conserved sequence similarity within their DNA binding regions across species based on orthologous alignments across multiple species. Finally, we identified 27 evolutionarily conserved clusters of transcription factor binding sites within ~200-nucleotide intervals of the 1500-nucleotide upstream regions of AIM1, DCT, MC1R, MITF, MLANA, OA1, PMEL, RAB27A, and TYR from Python bivittatus. Our results provide insight into pigment phenotypes in pythons. PMID:27698666

  20. Symmetric nonnegative matrix factorization: algorithms and applications to probabilistic clustering.

    Science.gov (United States)

    He, Zhaoshui; Xie, Shengli; Zdunek, Rafal; Zhou, Guoxu; Cichocki, Andrzej

    2011-12-01

    Nonnegative matrix factorization (NMF) is an unsupervised learning method useful in various applications including image processing and semantic analysis of documents. This paper focuses on symmetric NMF (SNMF), which is a special case of NMF decomposition. Three parallel multiplicative update algorithms that directly use level 3 Basic Linear Algebra Subprograms (BLAS) are developed for this problem. First, by minimizing the Euclidean distance, a multiplicative update algorithm is proposed, and its convergence under mild conditions is proved. Based on it, we further propose two fast parallel methods: the α-SNMF and β-SNMF algorithms. All of them are easy to implement. These algorithms are applied to probabilistic clustering. We demonstrate their effectiveness for facial image clustering, document categorization, and pattern clustering in gene expression.
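
    To make the idea of a multiplicative update for symmetric NMF concrete, here is a minimal sketch in plain Python. It is not the algorithm of the paper above: it uses a commonly cited damped multiplicative rule for minimizing the squared Euclidean error between A and H Hᵀ, and the function names, the damping value `beta=0.5`, the iteration count and the toy similarity matrix are all assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def sq_error(a, h):
    """Squared Frobenius error ||A - H H^T||^2."""
    approx = matmul(h, transpose(h))
    n = len(a)
    return sum((a[i][j] - approx[i][j]) ** 2 for i in range(n) for j in range(n))


def symmetric_nmf(a, k, iters=200, beta=0.5, seed=0, eps=1e-12):
    """Damped multiplicative updates (an assumed, illustrative rule):
    H <- H * (1 - beta + beta * (A H) / (H H^T H)), element-wise."""
    rng = random.Random(seed)
    n = len(a)
    h = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n)]
    for _ in range(iters):
        ah = matmul(a, h)
        hhth = matmul(h, matmul(transpose(h), h))
        h = [[h[i][j] * (1 - beta + beta * ah[i][j] / (hhth[i][j] + eps))
              for j in range(k)] for i in range(n)]
    return h


# Toy similarity matrix with two obvious groups: items {0, 1} and {2, 3}.
A = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 1]]
H0 = symmetric_nmf(A, k=2, iters=0)   # the random starting point
H = symmetric_nmf(A, k=2)             # after the updates
# Soft memberships can be hardened by taking the largest entry in each row.
labels = [max(range(2), key=row.__getitem__) for row in H]
```

    Because every factor in the update is nonnegative, H stays nonnegative without any projection step, which is the main appeal of multiplicative rules; the rows of H can then be read as soft cluster memberships.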

  1. Obstructive Sleep Apnea: A Cluster Analysis at Time of Diagnosis

    Science.gov (United States)

    Grillet, Yves; Richard, Philippe; Stach, Bruno; Vivodtzev, Isabelle; Timsit, Jean-Francois; Lévy, Patrick; Tamisier, Renaud; Pépin, Jean-Louis

    2016-01-01

    Background: The classification of obstructive sleep apnea is on the basis of sleep study criteria that may not adequately capture disease heterogeneity. Improved phenotyping may improve prognosis prediction and help select therapeutic strategies. Objectives: This study used cluster analysis to investigate the clinical clusters of obstructive sleep apnea. Methods: An ascending hierarchical cluster analysis was performed on baseline symptoms, physical examination, risk factor exposure and co-morbidities from 18,263 participants in the OSFP (French national registry of sleep apnea). The probability for criteria to be associated with a given cluster was assessed using odds ratios, determined by univariate logistic regression. Results: Six clusters were identified, in which patients varied considerably in age, sex, symptoms, obesity, co-morbidities and environmental risk factors. The main significant differences between clusters were minimally symptomatic versus sleepy obstructive sleep apnea patients, lean versus obese, and among obese patients different combinations of co-morbidities and environmental risk factors. Conclusions: Our cluster analysis identified six distinct clusters of obstructive sleep apnea. Our findings underscore the high degree of heterogeneity that exists within obstructive sleep apnea patients regarding clinical presentation, risk factors and consequences. This may help in both research and clinical practice for validating new prevention programs, in diagnosis and in decisions regarding therapeutic strategies. PMID:27314230
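
    As an illustration of the odds ratios mentioned in the Methods, the sketch below computes one from a 2x2 table. For a single binary criterion, the odds ratio from a univariate logistic regression (the exponential of the fitted coefficient) equals the cross-product ratio of that table, and the usual Wald interval equals the one computed here. The function name and the counts are hypothetical and are not taken from the registry.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio with an approximate 95% confidence interval.

    a: in the cluster, criterion present     b: in the cluster, criterion absent
    c: outside the cluster, criterion present d: outside the cluster, criterion absent
    """
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se)
    upper = math.exp(math.log(estimate) + z * se)
    return estimate, lower, upper


# Hypothetical counts: 30 of 100 patients in a cluster report a symptom,
# against 10 of 100 patients outside it.
or_value, ci_low, ci_high = odds_ratio(30, 70, 10, 90)
```

    An interval that excludes 1 suggests the criterion is associated with cluster membership; with many criteria tested per cluster, some correction for multiple comparisons would normally be considered.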

  2. Clustering of risk factors and social class in childhood and adulthood in British women's heart and health study: cross sectional analysis

    OpenAIRE

    Ebrahim, S; Montaner, D.; Lawlor, DA

    2004-01-01

    OBJECTIVE: To examine co-occurrence and clustering of risk factors used in the Framingham equation by social class in childhood and adult life. DESIGN: Cross sectional study. SETTING: 23 towns across England, Wales, and Scotland. PARTICIPANTS: 2936 women aged 60-79 years. MAIN OUTCOME MEASURES: Prevalence of risk factors (hypertension, obesity, smoking, left ventricular hypertrophy on electrocardiography, diabetes, and low concentration of high density cholesterol); ratios of observed to expe...

  3. Cluster analysis in phenotyping a Portuguese population.

    Science.gov (United States)

    Loureiro, C C; Sa-Couto, P; Todo-Bom, A; Bousquet, J

    2015-09-03

    Unbiased cluster analysis using clinical parameters has identified asthma phenotypes. Adding inflammatory biomarkers to this analysis has provided better insight into the disease mechanisms. This approach has not yet been applied to Portuguese patients with asthma. The aim was to identify phenotypes of asthma using cluster analysis in a Portuguese asthmatic population treated in secondary medical care. Consecutive patients with asthma were recruited from the outpatient clinic. Patients were optimally treated according to GINA guidelines and enrolled in the study. Procedures were performed according to a standard evaluation of asthma. Phenotypes were identified by cluster analysis using Ward's clustering method. Of the 72 patients enrolled, 57 had full data and were included in the cluster analysis. They were distributed across 5 clusters, described as follows: cluster (C) 1, early onset mild allergic asthma; C2, moderate allergic asthma, with long evolution, female prevalence and mixed inflammation; C3, allergic brittle asthma in young females with early disease onset and no evidence of inflammation; C4, severe asthma in obese females with late disease onset, highly symptomatic despite low Th2 inflammation; C5, severe asthma with chronic airflow obstruction, late disease onset and eosinophilic inflammation. In our study population, the identified clusters largely coincided with those of other, larger-scale cluster analyses. Variables such as age at disease onset, obesity, lung function, FeNO (Th2 biomarker) and disease severity were important for cluster distinction. Copyright © 2015. Published by Elsevier España, S.L.U.
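
    Ward's method, named in the abstract above, merges at each step the two clusters whose union increases the within-cluster sum of squares the least. The following is a minimal, unoptimized sketch of that rule in plain Python; the function names and the two-dimensional toy points are assumptions for illustration, whereas a real analysis would use standardized clinical variables and a library implementation.

```python
def centroid(points):
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    ca, cb = centroid(a), centroid(b)
    dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * dist2


def ward_clustering(points, k):
    """Agglomerate singletons until k clusters remain (naive cubic-time search)."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clusters = ward_clustering(points, k=2)
```

    In practice the full merge history is usually kept as a dendrogram so that the number of clusters can be chosen after inspecting it, rather than fixed in advance as `k` is here.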

  4. The applicability and effectiveness of cluster analysis

    Science.gov (United States)

    Ingram, D. S.; Actkinson, A. L.

    1973-01-01

    An insight into the characteristics which determine the performance of a clustering algorithm is presented. In order for the techniques which are examined to accurately cluster data, two conditions must be simultaneously satisfied. First, the data must have a particular structure, and second, the parameters chosen for the clustering algorithm must be correct. By examining the structure of the data from the Cl flight line, it is clear that no single set of parameters can be used to accurately cluster all the different crops. The effectiveness of either a noniterative or iterative clustering algorithm to accurately cluster data representative of the Cl flight line is questionable. Thus extensive a priori knowledge is required in order to use cluster analysis in its present form for applications like assisting in the definition of field boundaries and evaluating the homogeneity of a field. New or modified techniques are necessary for clustering to be a reliable tool.

  5. Hierarchical Aligned Cluster Analysis for Temporal Clustering of Human Motion.

    Science.gov (United States)

    Zhou, Feng; De la Torre, Fernando; Hodgins, Jessica K

    2013-03-01

    Temporal segmentation of human motion into plausible motion primitives is central to understanding and building computational models of human motion. Several issues contribute to the challenge of discovering motion primitives: the exponential nature of all possible movement combinations, the variability in the temporal scale of human actions, and the complexity of representing articulated motion. We pose the problem of learning motion primitives as one of temporal clustering, and derive an unsupervised hierarchical bottom-up framework called hierarchical aligned cluster analysis (HACA). HACA finds a partition of a given multidimensional time series into m disjoint segments such that each segment belongs to one of k clusters. HACA combines kernel k-means with the generalized dynamic time alignment kernel to cluster time series data. Moreover, it provides a natural framework to find a low-dimensional embedding for time series. HACA is efficiently optimized with a coordinate descent strategy and dynamic programming. Experimental results on motion capture and video data demonstrate the effectiveness of HACA for segmenting complex motions and as a visualization tool. We also compare the performance of HACA to state-of-the-art algorithms for temporal clustering on data of a honey bee dance. The HACA code is available online.

  6. Factored Translation with Unsupervised Word Clusters

    DEFF Research Database (Denmark)

    Rishøj, Christian; Søgaard, Anders

    2011-01-01

    Unsupervised word clustering algorithms, which form word clusters based on a measure of distributional similarity, have proven useful in providing beneficial features for various natural language processing tasks involving supervised learning. This work explores the utility of such word c....... While such an “oracle” method is not identified, evaluations indicate that unsupervised word clusters are most beneficial in sentences without unknown words....

  7. Application of Factor Analysis and Clustering Analysis in the Evaluation of Inmates' Behavior Modification

    Institute of Scientific and Technical Information of China (English)

    姜晓莉; 朱云峰

    2014-01-01

    The evaluation of inmates' behavior modification involves many test indicators and a heavy computational load, which makes the results hard to interpret. To address this, the article proposes an assessment method that uses factor analysis and clustering analysis to select and group test items by category, uses clustering analysis to classify the test results sensibly, and then analyzes the data comprehensively, all of which is carried out by program. The analysis results are illustrated with worked examples in charts and diagrams.

  8. Cancer incidence in men: a cluster analysis of spatial patterns

    Directory of Open Access Journals (Sweden)

    D'Alò Daniela

    2008-11-01

    Background: Spatial clustering of different diseases has received much less attention than single disease mapping. Besides chance or artifact, clustering of different cancers in a given area may depend on exposure to a shared risk factor or to multiple correlated factors (e.g. cigarette smoking and obesity in a deprived area). Models developed so far to investigate co-occurrence of diseases are not well-suited for analyzing many cancers simultaneously. In this paper we propose a simple two-step exploratory method for screening clusters of different cancers in a population. Methods: Cancer incidence data were derived from the regional cancer registry of Umbria, Italy. A cluster analysis was performed on smoothed and non-smoothed standardized incidence ratios (SIRs) of the 13 most frequent cancers in males. The Besag, York and Mollié model (BYM) and Poisson kriging were used to produce smoothed SIRs. Results: Cluster analysis on non-smoothed SIRs was poorly informative, both in terms of clustering of different cancers, as only larynx and oral cavity were grouped, and in terms of characteristic patterns of cancer incidence in specific geographical areas. In contrast, BYM and Poisson kriging gave similar results, showing that cancers of the oral cavity, larynx, esophagus, stomach and liver formed a main cluster. Lung and urinary bladder cancers clustered together but not with the cancers mentioned above. Both methods, particularly the BYM model, identified distinct geographic clusters of adjacent areas. Conclusion: As in single disease mapping, non-smoothed SIRs do not provide reliable estimates of cancer risks because of small area variability. The BYM model produces smooth risk surfaces which, when entered into a cluster analysis, identify well-defined geographical clusters of adjacent areas. It probably enhances or amplifies the signal arising from exposure of more areas (statistical units) to shared risk factors that are associated with different cancers. 
In
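
    For readers unfamiliar with the standardized incidence ratios (SIRs) that this study clusters, the sketch below shows the usual indirect-standardization arithmetic: the observed count in an area divided by the count expected if reference rates applied to the area's own population. The function names, age strata, rates and counts are invented for illustration and are not taken from the Umbria registry.

```python
def expected_cases(reference_rates, person_years):
    """Cases expected if stratum-specific reference rates applied to this area."""
    return sum(rate * py for rate, py in zip(reference_rates, person_years))


def standardized_incidence_ratio(observed, reference_rates, person_years):
    """SIR = observed cases / expected cases (unsmoothed)."""
    return observed / expected_cases(reference_rates, person_years)


# Hypothetical area with two age strata.
rates = [0.001, 0.004]          # reference incidence per person-year
person_years = [2000, 1000]     # person-years at risk in the area
sir = standardized_incidence_ratio(9, rates, person_years)  # expected = 2 + 4 = 6
```

    When the expected count is small, as in sparsely populated areas, a single extra case moves the ratio a great deal, which is the small-area variability that motivates smoothing with models such as BYM before clustering.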

  9. Robust cluster analysis and variable selection

    CERN Document Server

    Ritter, Gunter

    2014-01-01

    Clustering remains a vibrant area of research in statistics. Although there are many books on this topic, there are relatively few that are well founded in the theoretical aspects. In Robust Cluster Analysis and Variable Selection, Gunter Ritter presents an overview of the theory and applications of probabilistic clustering and variable selection, synthesizing the key research results of the last 50 years. The author focuses on the robust clustering methods he found to be the most useful on simulated data and real-time applications. The book provides clear guidance for the varying needs of bot

  10. ASteCA - Automated Stellar Cluster Analysis

    CERN Document Server

    Perren, Gabriel I; Piatti, Andrés E

    2014-01-01

    We present ASteCA (Automated Stellar Cluster Analysis), a suite of tools designed to fully automate the standard tests applied to stellar clusters to determine their basic parameters. The set of functions included in the code makes use of positional and photometric data to obtain precise and objective values for a given cluster's center coordinates, radius, luminosity function and integrated color magnitude, as well as characterizing through a statistical estimator its probability of being a true physical cluster rather than a random overdensity of field stars. ASteCA incorporates a Bayesian field star decontamination algorithm capable of assigning membership probabilities using photometric data alone. An isochrone fitting process based on the generation of synthetic clusters from theoretical isochrones and selection of the best fit through a genetic algorithm is also present, which allows ASteCA to provide accurate estimates for a cluster's metallicity, age, extinction and distance values along with its unce...

  11. Cluster analysis for computer workload evaluation

    CERN Document Server

    Landau, K

    1976-01-01

    An introduction to computer workload analysis is given, showing its range of application in computer centre management, system and application programming. Cluster methods are discussed which can be used in conjunction with workload data and cluster algorithms are adapted to the specific set problem. Several samples of CDC 7600 accounting data, collected at CERN, the European Organization for Nuclear Research, underwent a cluster analysis to determine job groups. The conclusions from resource usage of typical job groups in relation to computer workload analysis are discussed. (17 refs).

  12. Model-free data analysis for source separation based on Non-Negative Matrix Factorization and k-means clustering (NMFk)

    Science.gov (United States)

    Vesselinov, V. V.; Alexandrov, B.

    2014-12-01

    The identification of the physical sources causing spatial and temporal fluctuations of state variables such as river stage levels and aquifer hydraulic heads is challenging. The fluctuations can be caused by variations in natural and anthropogenic sources such as precipitation events, infiltration, groundwater pumping, barometric pressures, etc. The source identification and separation can be crucial for conceptualization of the hydrological conditions and characterization of system properties. If the original signals that cause the observed state-variable transients can be successfully "unmixed", decoupled physics models may then be applied to analyze the propagation of each signal independently. We propose a new model-free inverse analysis of transient data based on the Non-negative Matrix Factorization (NMF) method for Blind Source Separation (BSS) coupled with the k-means clustering algorithm, which we call NMFk. NMFk is capable of identifying a set of unique sources from a set of experimentally measured mixed signals, without any information about the sources, their transients, and the physical mechanisms and properties controlling the signal propagation through the system. A classical BSS conundrum is the so-called "cocktail-party" problem where several microphones are recording the sounds in a ballroom (music, conversations, noise, etc.). Each of the microphones is recording a mixture of the sounds. The goal of BSS is to "unmix" and reconstruct the original sounds from the microphone records. Similar to the "cocktail-party" problem, our model-free analysis only requires information about state-variable transients at a number of observation points, m, where m > r, and r is the number of unknown unique sources causing the observed fluctuations. We apply the analysis on a dataset from the Los Alamos National Laboratory (LANL) site. We identify the sources, which are barometric pressure and water-supply pumping effects, and estimate their impact. We also estimate the
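
    The unmixing problem described above can be written compactly. The formulation below is a generic NMF statement of blind source separation under assumed notation, not a transcription of the authors' equations: only m and r come from the abstract, while T (the number of time samples) and the matrix names X, W and H are introduced here for illustration.

```latex
% Assumed notation: X holds the m observed transients (one per row, T samples each),
% H holds the r unknown source signals, and W holds the nonnegative mixing weights.
\[
  X \approx W H, \qquad
  X \in \mathbb{R}_{\ge 0}^{m \times T},\quad
  W \in \mathbb{R}_{\ge 0}^{m \times r},\quad
  H \in \mathbb{R}_{\ge 0}^{r \times T},\quad m > r,
\]
\[
  (\hat{W}, \hat{H}) \;=\; \arg\min_{W \ge 0,\; H \ge 0} \; \lVert X - W H \rVert_F^2 .
\]
```

    Under this reading, each row of H is one recovered source and each row of W gives the weights with which the sources mix at one observation point; the k-means step would then group source estimates obtained from repeated factorizations, which is one plausible way the two methods are coupled.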

  13. A New Source Apportionment Method of Mixed Dust Source Based on Cluster Analysis and Factor Analysis

    Institute of Scientific and Technical Information of China (English)

    郑雪峰; 邹长武; 印红玲

    2011-01-01

    Cluster analysis and factor analysis were successfully used to solve the multicollinearity problem that the CMB (chemical mass balance) model for atmospheric particulate source apportionment encounters when apportioning mixed dust sources. First, cluster analysis was used to classify the emission sources by the strength of their collinearity; then the main factors of the strongly collinear group (the fugitive dust sources) were extracted by principal component analysis. These factors, together with the other independent dust sources, were entered into the CMB model for calculation. Finally, the contributions of the main factors were mapped back to the individual sources of the mixed dust, giving the contribution of each emission source. A comparison with other methods showed that the analytical results were realistic and the method was feasible.

  14. On the Equivalence of Nonnegative Matrix Factorization and K-means- Spectral Clustering

    Energy Technology Data Exchange (ETDEWEB)

    Ding, Chris; He, Xiaofeng; Simon, Horst D.; Jin, Rong

    2005-12-04

    We provide a systematic analysis of nonnegative matrix factorization (NMF) relating to data clustering. We generalize the usual X = FG^T decomposition to the symmetric W = HH^T and W = HSH^T decompositions. We show that (1) W = HH^T is equivalent to kernel K-means clustering and Laplacian-based spectral clustering; (2) X = FG^T is equivalent to simultaneous clustering of rows and columns of a bipartite graph. We emphasize the importance of orthogonality in NMF and the soft clustering nature of NMF. These results are verified with experiments on face images and newsgroups.
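
    The first equivalence can be traced in a few lines. The derivation below is a standard textbook-style argument written under assumed notation (data points x_i as columns of X, clusters C_k of size n_k with means m_k, and a normalized indicator matrix H); it is a sketch of why the claim is plausible rather than a reproduction of the paper's proof.

```latex
% Normalized cluster indicators: H_{ik} = 1/\sqrt{n_k} if i \in C_k, else 0,
% so that H \ge 0 and H^{T} H = I.
\begin{align*}
  J_{K\text{-means}}
    &= \sum_{k=1}^{K} \sum_{i \in C_k} \lVert x_i - m_k \rVert^2
     = \sum_{i} \lVert x_i \rVert^2
       - \sum_{k=1}^{K} \frac{1}{n_k} \sum_{i,j \in C_k} x_i^{T} x_j \\
    &= \operatorname{Tr}(X^{T} X) - \operatorname{Tr}(H^{T} W H),
       \qquad W = X^{T} X .
\end{align*}
% Minimizing J is therefore maximizing Tr(H^T W H). With H^T H = I,
\begin{align*}
  \lVert W - H H^{T} \rVert_F^2
    &= \lVert W \rVert_F^2 - 2 \operatorname{Tr}(H^{T} W H) + \lVert H^{T} H \rVert_F^2
     = \lVert W \rVert_F^2 - 2 \operatorname{Tr}(H^{T} W H) + K ,
\end{align*}
% so K-means is the same problem as
\[
  \min_{H \ge 0,\; H^{T} H = I} \; \lVert W - H H^{T} \rVert_F^2 .
\]
```

    Replacing the inner products in W with a kernel gives the kernel K-means case, and dropping the orthogonality constraint while keeping H nonnegative gives symmetric NMF, whose near-orthogonal solutions can be read as soft cluster assignments.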

  15. Cluster analysis of multiple planetary flow regimes

    Science.gov (United States)

    Mo, Kingtse; Ghil, Michael

    1988-01-01

    A modified cluster analysis method developed for the classification of quasi-stationary events into a few planetary flow regimes and for the examination of transitions between these regimes is described. The method was applied first to a simple deterministic model and then to a 500-mbar data set for the Northern Hemisphere (NH), for which cluster analysis was carried out in the subspace of the first seven empirical orthogonal functions (EOFs). Stationary clusters were found in the low-frequency band of more than 10 days, while transient clusters were found in the band-pass frequency window between 2.5 and 6 days. In the low-frequency band, three pairs of clusters determined EOFs 1, 2, and 3, respectively; they exhibited well-known regional features, such as blocking, the Pacific/North American pattern, and wave trains. Both model and low-pass data exhibited strong bimodality.

  16. [Cluster analysis and its application].

    Science.gov (United States)

    Půlpán, Zdenĕk

    2002-01-01

    The study exploits knowledge-oriented and context-based modifications of well-known (fuzzy) clustering algorithms. Fuzzy sets are also inherently suited to coping with linguistic domain knowledge. We aim to obtain, from rich and diverse data and knowledge, new information about the environment being explored.

  17. Cluster-based exposure variation analysis.

    Science.gov (United States)

    Samani, Afshin; Mathiassen, Svend Erik; Madeleine, Pascal

    2013-04-04

    Static posture, repetitive movements and lack of physical variation are known risk factors for work-related musculoskeletal disorders, and thus need to be properly assessed in occupational studies. The aims of this study were (i) to investigate the effectiveness of a conventional exposure variation analysis (EVA) in discriminating exposure time lines and (ii) to compare it with a new cluster-based method for analysis of exposure variation. For this purpose, we simulated a repeated cyclic exposure varying within each cycle between "low" and "high" exposure levels in a "near" or "far" range, and with "low" or "high" velocities (exposure change rates). The duration of each cycle was also manipulated by selecting a "small" or "large" standard deviation of the cycle time. These parameters reflected three dimensions of exposure variation, i.e. range, frequency and temporal similarity. Each simulation trace included two realizations of 100 concatenated cycles with either low (ρ = 0.1), medium (ρ = 0.5) or high (ρ = 0.9) correlation between the realizations. These traces were analyzed by conventional EVA, and a novel cluster-based EVA (C-EVA). Principal component analysis (PCA) was applied on the marginal distributions of 1) the EVA of each of the realizations (univariate approach), 2) a combination of the EVA of both realizations (multivariate approach) and 3) C-EVA. The least number of principal components describing more than 90% of variability in each case was selected and the projection of marginal distributions along the selected principal component was calculated. A linear classifier was then applied to these projections to discriminate between the simulated exposure patterns, and the accuracy of classified realizations was determined. C-EVA classified exposures more correctly than univariate and multivariate EVA approaches; classification accuracy was 49%, 47% and 52% for EVA (univariate and multivariate), and C-EVA, respectively (p analysis are the advantages
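
    The component-selection rule in this abstract (the least number of principal components describing more than 90% of the variability) can be sketched as follows. The function name and the example variances are hypothetical; in a real analysis the variances would be the eigenvalues of the covariance matrix returned by a PCA routine.

```python
def components_needed(variances, threshold=0.90):
    """Smallest number of components whose cumulative share of the total
    variance exceeds the threshold."""
    total = sum(variances)
    cumulative = 0.0
    for count, variance in enumerate(sorted(variances, reverse=True), start=1):
        cumulative += variance
        if cumulative / total > threshold:
            return count
    return len(variances)


# Hypothetical component variances: cumulative shares are 0.50, 0.80, 0.95, 1.00.
n_components = components_needed([5.0, 3.0, 1.5, 0.5])
```

    Here the first two components explain only 80% of the variance, so a third is required to pass the 90% mark.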

  18. Cluster Analysis of the Malaysian Hipposideros

    Science.gov (United States)

    Sazali, Siti Nurlydia; Laman, Charlie J.; Abdullah, M. T.

    2008-01-01

    A preliminary study of morphometric variation among species in the genus Hipposideros was conducted using voucher specimens from the Universiti Malaysia Sarawak (UNIMAS) Zoological Museum and the Department of Wildlife and National Park (DWNP) Kuala Lumpur. A total of 24 individuals from six species of this genus were studied morphologically, with all relevant body, skull and dental measurements recorded. Cluster analysis of the data shows that the genus Hipposideros is divided into two major clusters in which each species is clearly separated. Cluster analysis among Hipposideros species is useful as an aid to species identification.

  19. Using cluster analysis to explore survey data.

    Science.gov (United States)

    Spencer, Llinos; Roberts, Gwerfyl; Irvine, Fiona; Jones, Peter; Baker, Colin

    2007-01-01

    Llinos Haf Spencer reports on the use of the cluster analysis statistical technique in nursing research and uses data from the Welsh Language Awareness in Healthcare Provision in Wales survey as an exemplar. She concludes that cluster analysis is a valuable tool to tease out patterns in data that are not initially evident in bivariate analyses and thus should be considered as a viable option for nursing research.

  20. Nursing home care quality: a cluster analysis.

    Science.gov (United States)

    Grøndahl, Vigdis Abrahamsen; Fagerli, Liv Berit

    2017-02-13

    Purpose The purpose of this paper is to explore potential differences in how nursing home residents rate care quality and to explore cluster characteristics. Design/methodology/approach A cross-sectional design was used, with one questionnaire including questions from quality from patients' perspective and Big Five personality traits, together with questions related to socio-demographic aspects and health condition. Residents (n = 103) from four Norwegian nursing homes participated (74.1 per cent response rate). Hierarchical cluster analysis identified clusters with respect to care quality perceptions. χ² tests and one-way between-groups ANOVA were performed to characterise the clusters ( pclusters were identified; Cluster 1 residents (28.2 per cent) had the best care quality perceptions and Cluster 2 (67.0 per cent) had the worst perceptions. The clusters were statistically significant and characterised by personal-related conditions: gender, psychological well-being, preferences, admission, satisfaction with staying in the nursing home, emotional stability and agreeableness, and by external objective care conditions: healthcare personnel and registered nurses. Research limitations/implications Residents assessed as having no cognitive impairments were included, thus excluding the largest group. By choosing questionnaire design and structured interviews, the number able to participate may increase. Practical implications Findings may provide healthcare personnel and managers with increased knowledge on which to develop strategies to improve specific care quality perceptions. Originality/value Cluster analysis can be an effective tool for differentiating between nursing home residents' care quality perceptions.

  1. Evaluation on Competitiveness of Leisure Sports Industry Based on Factor Analysis and Cluster Analysis

    Institute of Scientific and Technical Information of China (English)

    陈毅清

    2014-01-01

    According to Michael Porter's diamond model, an evaluation index system for the competitiveness of the leisure sports industry was designed, and the competitiveness of the leisure sports industry in 16 cities in Anhui Province was evaluated empirically using factor analysis and cluster analysis. Finally, some policy suggestions were proposed in light of the actual situation.

  2. Cluster Analysis and Clinical Asthma Phenotypes

    Science.gov (United States)

    Shaw, Dominic E.; Berry, Michael A.; Thomas, Michael; Brightling, Christopher E.; Wardlaw, Andrew J.

    2014-01-01

    Rationale Heterogeneity in asthma expression is multidimensional, including variability in clinical, physiologic, and pathologic parameters. Classification requires consideration of these disparate domains in a unified model. Objectives To explore the application of a multivariate mathematical technique, k-means cluster analysis, for identifying distinct phenotypic groups. Methods We performed k-means cluster analysis in three independent asthma populations. Clusters of a population managed in primary care (n = 184) with predominantly mild to moderate disease, were compared with a refractory asthma population managed in secondary care (n = 187). We then compared differences in asthma outcomes (exacerbation frequency and change in corticosteroid dose at 12 mo) between clusters in a third population of 68 subjects with predominantly refractory asthma, clustered at entry into a randomized trial comparing a strategy of minimizing eosinophilic inflammation (inflammation-guided strategy) with standard care. Measurements and Main Results Two clusters (early-onset atopic and obese, noneosinophilic) were common to both asthma populations. Two clusters characterized by marked discordance between symptom expression and eosinophilic airway inflammation (early-onset symptom predominant and late-onset inflammation predominant) were specific to refractory asthma. Inflammation-guided management was superior for both discordant subgroups leading to a reduction in exacerbation frequency in the inflammation-predominant cluster (3.53 [SD, 1.18] vs. 0.38 [SD, 0.13] exacerbation/patient/yr, P = 0.002) and a dose reduction of inhaled corticosteroid in the symptom-predominant cluster (mean difference, 1,829 μg beclomethasone equivalent/d [95% confidence interval, 307–3,349 μg]; P = 0.02). Conclusions Cluster analysis offers a novel multidimensional approach for identifying asthma phenotypes that exhibit differences in clinical response to treatment algorithms. PMID:18480428
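
    For readers unfamiliar with the k-means technique named in this abstract, the basic iteration (Lloyd's algorithm) is sketched below. This is a generic illustration under stated assumptions, not the procedure or variables used in the study: the function name `kmeans`, the caller-supplied starting centroids and the toy two-dimensional points are all hypothetical.

```python
import math


def kmeans(points, init_centroids, n_iter=100):
    """Plain Lloyd's algorithm: alternate between assigning each point to its
    nearest centroid and moving each centroid to the mean of its members."""
    centroids = [tuple(c) for c in init_centroids]
    k = len(centroids)

    def assign(p):
        # Index of the centroid closest to p (Euclidean distance).
        return min(range(k), key=lambda j: math.dist(p, centroids[j]))

    for _ in range(n_iter):
        labels = [assign(p) for p in points]
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return [assign(p) for p in points], centroids


# Toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, init_centroids=[(0, 0), (10, 10)])
print(labels)
```

    Passing the starting centroids explicitly keeps the sketch deterministic; in practice the result of k-means depends on initialization, so several starts are usually compared.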

  3. Performance Analysis of Hierarchical Clustering Algorithm

    Directory of Open Access Journals (Sweden)

    K.Ranjini

    2011-07-01

    Clustering is the classification of objects into different groups, or more precisely, the partitioning of a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait, often proximity according to some defined distance measure. Data clustering is a common technique for statistical data analysis, which is used in many fields, including machine learning, data mining, pattern recognition, image analysis and bioinformatics. This paper explains the implementation of agglomerative and divisive clustering algorithms applied to various types of data. Details of the victims of the 2004 tsunami in Thailand were taken as the test data. Visual programming is used for the implementation, and the running times of the algorithms using different linkages (agglomerative) on different types of data are taken for analysis.
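
    The agglomerative approach with different linkages mentioned in this abstract can be sketched as below. This is a deliberately naive illustration (cubic running time, no dendrogram), not the paper's implementation; the function name `agglomerative` and the toy one-dimensional data are hypothetical.

```python
import math


def agglomerative(points, n_clusters, linkage="single"):
    """Bottom-up clustering: start with one cluster per point and repeatedly
    merge the two closest clusters until `n_clusters` remain.
    Returns clusters as sorted lists of point indices."""
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(c1, c2):
        pairwise = [math.dist(points[a], points[b]) for a in c1 for b in c2]
        if linkage == "single":      # closest pair of members
            return min(pairwise)
        if linkage == "complete":    # farthest pair of members
            return max(pairwise)
        if linkage == "average":     # mean over all pairs
            return sum(pairwise) / len(pairwise)
        raise ValueError(f"unknown linkage: {linkage}")

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        _, i, j = min(
            (cluster_distance(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


data = [(0,), (1,), (2,), (10,), (11,)]
for method in ("single", "complete", "average"):
    print(method, agglomerative(data, 2, linkage=method))
```

    On clearly separated data such as this, all three linkages give the same partition; they differ mainly on elongated or noisy clusters, which is where running-time and quality comparisons become informative.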

  4. Factors influencing fluffy layer suspended matter (FLSM) properties in the Odra River - Pomeranian Bay - Arkona Deep System (Baltic Sea) as derived by principal components analysis (PCA) and cluster analysis (CA)

    Directory of Open Access Journals (Sweden)

    J. Pempkowiak

    2005-01-01

    Factors conditioning formation and properties of suspended matter resting on the sea floor (Fluffy Layer Suspended Matter - FLSM) in the Odra river mouth - Arkona Deep system (southern Baltic Sea) were investigated. Thirty FLSM samples were collected from four sampling stations, during nine cruises, in the period 1996-1998. Twenty-six chemical properties of the fluffy material were measured (organic matter: total, humic substances, a variety of fatty acid fractions, P, N, δ13C, δ15N; Li; heavy metals: Co, Cd, Pb, Ni, Zn, Fe, Al, Mn, Cu, Cr). The data set so obtained was subjected to statistical evaluation. Comparison of mean values of the measured properties led to the conclusion that both seasonal and spatial differences in the fluffy material collected at the stations occurred. Application of Principal Component Analysis and Cluster Analysis to the data set, amended with environmental characteristics (depth, salinity, chlorophyll a, distance from the river mouth), led to quantification of the factors conditioning FLSM formation. The five most important factors were: contribution of the lithogenic component (responsible for 25% of the data set variability), time-dependent factors (including primary productivity, mass exchange with fine sediment fraction, atmospheric deposition, contribution of material originating from abrasion; altogether 21%), contribution of fresh autochthonous organic matter (9%), influence of microbial activity (8%), and seasonality (8%).

  5. Clustering analysis of telecommunication customers

    Institute of Scientific and Technical Information of China (English)

    REN Hong; ZHENG Yan; WU Ye-rong

    2009-01-01

    In this article, a clustering method based on genetic algorithm (GA) for telecommunication customer subdivision is presented. First, the features of telecommunication customers (such as the calling behavior and consuming behavior) are extracted. Second, the similarities between the multidimensional feature vectors of telecommunication customers are computed and mapped as the distance between samples on a two-dimensional plane. Finally, the distances are adjusted to approximate the similarities gradually by GA. One advantage of this method is the independent distribution of the sample space. The experiments demonstrate the feasibility of the proposed method.

  6. Half-lives and cluster preformation factors for various cluster emissions in trans-lead nuclei

    Science.gov (United States)

    Ni, Dongdong; Ren, Zhongzhou

    2010-08-01

    The generalized density-dependent cluster model (GDDCM) is extended to study cluster radioactivity in even-even and odd-A nuclei decaying to the doubly magic nucleus ²⁰⁸Pb or its neighboring nuclei. The microscopic cluster-daughter potential is numerically constructed in the double-folding model with M3Y nucleon-nucleon interactions plus proton-proton Coulomb interactions. Instead of the WKB barrier penetration probability, the exact solution of the Schrödinger equation with outgoing Coulomb wave boundary conditions is presented. The cluster preformation factor is well taken into account based on some available experimental cases. The calculated half-lives are found to be in good agreement with the experimental data. This indicates that a unified description of α decay and cluster radioactivity has been achieved by the GDDCM. Predictions of cluster emission half-lives are made for promising emitters, which may guide future experiments.

  7. Filtering Genes for Cluster and Network Analysis

    Directory of Open Access Journals (Sweden)

    Parkhomenko Elena

    2009-06-01

    Background Prior to cluster analysis or genetic network analysis it is customary to filter, or remove genes considered to be irrelevant from the set of genes to be analyzed. Often genes whose variation across samples is less than an arbitrary threshold value are deleted. This can improve interpretability and reduce bias. Results This paper introduces modular models for representing network structure in order to study the relative effects of different filtering methods. We show that cluster analysis and principal components are strongly affected by filtering. Filtering methods intended specifically for cluster and network analysis are introduced and compared by simulating modular networks with known statistical properties. To study more realistic situations, we analyze simulated "real" data based on well-characterized E. coli and S. cerevisiae regulatory networks. Conclusion The methods introduced apply very generally, to any similarity matrix describing gene expression. One of the proposed methods, SUMCOV, performed well for all models simulated.
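
    The conventional filter described in the Background (dropping genes whose variation across samples falls below an arbitrary threshold) can be sketched as follows. This illustrates only that baseline, not the SUMCOV method proposed in the paper; the function name `filter_by_variance`, the choice of population variance as the measure of variation, and the toy expression values are assumptions made for the example.

```python
from statistics import pvariance


def filter_by_variance(expression, threshold):
    """Keep only genes whose variance across samples is at least `threshold`.

    `expression` maps a gene name to its list of per-sample expression values."""
    return {
        gene: values
        for gene, values in expression.items()
        if pvariance(values) >= threshold
    }


# Hypothetical expression profiles over four samples.
expression = {
    "geneA": [1.0, 1.0, 1.0, 1.0],   # constant: variance 0
    "geneB": [0.0, 4.0, 0.0, 4.0],   # strongly varying: variance 4
    "geneC": [2.0, 2.1, 1.9, 2.0],   # nearly constant: variance 0.005
}
print(sorted(filter_by_variance(expression, threshold=0.5)))  # prints ['geneB']
```

    Because the threshold is arbitrary, the set of retained genes (and hence any downstream clustering) can change markedly with small changes in its value, which is the sensitivity the paper sets out to study.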

  8. Using Cluster Analysis to Examine Husband-Wife Decision Making

    Science.gov (United States)

    Bonds-Raacke, Jennifer M.

    2006-01-01

    Cluster analysis has a rich history in many disciplines and although cluster analysis has been used in clinical psychology to identify types of disorders, its use in other areas of psychology has been less popular. The purpose of the current experiments was to use cluster analysis to investigate husband-wife decision making. Cluster analysis was…

  9. Clustering analysis of seismicity and aftershock identification.

    Science.gov (United States)

    Zaliapin, Ilya; Gabrielov, Andrei; Keilis-Borok, Vladimir; Wong, Henry

    2008-07-01

    We introduce a statistical methodology for clustering analysis of seismicity in the time-space-energy domain and use it to establish the existence of two statistically distinct populations of earthquakes: clustered and nonclustered. This result can be used, in particular, for nonparametric aftershock identification. The proposed approach expands the analysis of Baiesi and Paczuski [Phys. Rev. E 69, 066106 (2004), doi:10.1103/PhysRevE.69.066106] based on the space-time-magnitude nearest-neighbor distance eta between earthquakes. We show that for a homogeneous Poisson marked point field with exponential marks, the distance eta has the Weibull distribution, which bridges our results with classical correlation analysis for point fields. The joint 2D distribution of spatial and temporal components of eta is used to identify the clustered part of a point field. The proposed technique is applied to several seismicity models and to the observed seismicity of southern California.
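
    The abstract does not spell out the nearest-neighbor distance eta. A commonly quoted form of the Baiesi-Paczuski metric is eta_ij = t_ij * r_ij^d * 10^(-b * m_i) for an earlier event i and a later event j, with eta taken as infinite when i does not precede j. The sketch below computes it under that assumption; the fractal dimension d = 1.6, the b-value of 1.0, the flat (x, y) geometry, the function names and the toy catalogue are all illustrative assumptions, not values taken from the paper.

```python
import math


def eta(parent, child, d=1.6, b=1.0):
    """Assumed space-time-magnitude distance from an earlier event (parent)
    to a later one (child). Events are tuples (time, x, y, magnitude)."""
    t = child[0] - parent[0]
    if t <= 0:
        return math.inf  # a parent must occur strictly before its child
    r = math.hypot(child[1] - parent[1], child[2] - parent[2])
    return t * r ** d * 10 ** (-b * parent[3])


def nearest_parents(events, d=1.6, b=1.0):
    """For each event, return (index of nearest earlier event or None, eta)."""
    result = []
    for j, child in enumerate(events):
        best_i, best_eta = None, math.inf
        for i, parent in enumerate(events):
            if i == j:
                continue
            value = eta(parent, child, d, b)
            if value < best_eta:
                best_i, best_eta = i, value
        result.append((best_i, best_eta))
    return result


# Hypothetical catalogue: one large event followed by two smaller ones.
catalogue = [(0.0, 0.0, 0.0, 6.0), (1.0, 1.0, 0.0, 3.0), (2.0, 50.0, 50.0, 3.0)]
print([parent for parent, _ in nearest_parents(catalogue)])
```

    Under this form, a large earlier event "attracts" later events even at considerable distance, because its magnitude shrinks eta; thresholding eta is then one way to separate clustered from nonclustered events.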

  10. Foundations of factor analysis

    CERN Document Server

    Mulaik, Stanley A

    2009-01-01

    Introduction Factor Analysis and Structural Theories Brief History of Factor Analysis as a Linear Model Example of Factor Analysis Mathematical Foundations for Factor Analysis Introduction Scalar Algebra Vectors Matrix Algebra Determinants Treatment of Variables as Vectors Maxima and Minima of Functions Composite Variables and Linear Transformations Introduction Composite Variables Unweighted Composite Variables Differentially Weighted Composites Matrix Equations Multi

  11. Clustering of Mycobacterium tuberculosis Cases in Acapulco: Spoligotyping and Risk Factors

    Directory of Open Access Journals (Sweden)

    Elizabeth Nava-Aguilera

    2011-01-01

    Recurrence and reinfection of tuberculosis have quite different implications for prevention. We identified 267 spoligotypes of Mycobacterium tuberculosis from consecutive tuberculosis patients in Acapulco, Mexico, to assess the level of clustering and risk factors for clustered strains. Point cluster analysis examined spatial clustering. Risk analysis relied on the Mantel-Haenszel procedure to examine bivariate associations, then to develop risk profiles of combinations of risk factors. Supplementary analysis of the spoligotyping data used SpolTools. Spoligotyping identified 85 types, 50 of them previously unreported. The five most common spoligotypes accounted for 55% of tuberculosis cases. One cluster of 70 patients (26% of the series) produced a single spoligotype from the Manila Family (Clade EAI2). The high proportion (78%) of patients infected with cluster strains is compatible with recent transmission of TB in Acapulco. Geomatic analysis showed no spatial clustering; clustering was associated with a risk profile of uneducated cases who lived in single-room dwellings. The Manila emerging strain accounted for one in every four cases, confirming that one strain can predominate in a hyperendemic area.
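
    The Mantel-Haenszel procedure mentioned in this abstract pools 2x2 tables across strata. A minimal sketch of the pooled odds ratio, OR_MH = sum(a*d/n) / sum(b*c/n), is given below. The function name `mantel_haenszel_or` and the counts are invented purely for illustration and are not data from the study.

```python
def mantel_haenszel_or(strata):
    """Pooled Mantel-Haenszel odds ratio over a list of 2x2 tables.

    Each stratum is (a, b, c, d):
        a = exposed cases,    b = exposed non-cases,
        c = unexposed cases,  d = unexposed non-cases."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


# Invented counts for two strata (e.g. two age groups).
tables = [(10, 20, 5, 40), (8, 16, 6, 48)]
print(round(mantel_haenszel_or(tables), 2))
```

    With a single stratum the estimate reduces to the ordinary odds ratio a*d / (b*c); with several strata, each table is weighted by its size, which adjusts for the stratifying variable.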

  12. Prevalence and risk factors of seizure clusters in adult patients with epilepsy.

    Science.gov (United States)

    Chen, Baibing; Choi, Hyunmi; Hirsch, Lawrence J; Katz, Austen; Legge, Alexander; Wong, Rebecca A; Jiang, Alfred; Kato, Kenneth; Buchsbaum, Richard; Detyniecki, Kamil

    2017-07-01

    In the current study, we explored the prevalence of physician-confirmed seizure clusters. We also investigated potential clinical factors associated with the occurrence of seizure clusters overall and by epilepsy type. We reviewed medical records of 4116 adult (≥16 years old) outpatients with epilepsy at our centers for documentation of seizure clusters. Variables including patient demographics, epilepsy details, medical and psychiatric history, AED history, and epilepsy risk factors were then tested against history of seizure clusters. Patients were then divided into focal epilepsy, idiopathic generalized epilepsy (IGE), or symptomatic generalized epilepsy (SGE), and the same analysis was run. Overall, seizure clusters were independently associated with earlier age of seizure onset, symptomatic generalized epilepsy (SGE), central nervous system (CNS) infection, cortical dysplasia, status epilepticus, absence of 1-year seizure freedom, and having failed 2 or more AEDs (Pseizure clusters than patients with focal epilepsy (16.3%) and IGE (7.4%; all Pepilepsy type showed that absence of 1-year seizure freedom since starting treatment at one of our centers was associated with seizure clustering in patients across all 3 epilepsy types. In patients with SGE, clusters were associated with perinatal/congenital brain injury. In patients with focal epilepsy, clusters were associated with younger age of seizure onset, complex partial seizures, cortical dysplasia, status epilepticus, CNS infection, and having failed 2 or more AEDs. In patients with IGE, clusters were associated with presence of an aura. Only 43.5% of patients with seizure clusters were prescribed rescue medications. Patients with intractable epilepsy are at a higher risk of developing seizure clusters. Factors such as having SGE, CNS infection, cortical dysplasia, status epilepticus or an early seizure onset can also independently increase one's chance of having seizure clusters.

  13. Cognitive analysis of multiple sclerosis utilizing fuzzy cluster means

    Directory of Open Access Journals (Sweden)

    Imianvan Anthony Agboizebeta

    2012-01-01

    Multiple sclerosis, often called MS, is a disease that affects the central nervous system (the brain and spinal cord). Myelin provides insulation for nerve cells, improves the conduction of impulses along the nerves and is important for maintaining the health of the nerves. In multiple sclerosis, inflammation causes the myelin to disappear. Genetic factors, environmental issues and viral infection may also play a role in developing the disease. MS is characterized by life-threatening symptoms such as loss of balance, hearing problems and depression. The application of Fuzzy Cluster Means (FCM), or fuzzy c-means, analysis to the diagnosis of different forms of multiple sclerosis is the focal point of this paper. Application of cluster analysis involves a sequence of methodological and analytical decision steps that enhances the quality and meaning of the clusters produced. Uncertainties associated with analysis of multiple sclerosis test data are eliminated by the system
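
    The fuzzy c-means technique named in this abstract assigns each observation a graded membership in every cluster instead of a single label. The sketch below shows the standard update rules (membership u_j = 1 / sum_k (d_j/d_k)^(2/(m-1)) and membership-weighted centers). It is a generic illustration, not the diagnostic system described in the paper; the function names, the fuzzifier m = 2 and the toy one-dimensional data are assumptions.

```python
import math


def memberships(point, centers, m=2.0):
    """Fuzzy membership of one point in each cluster (values sum to 1)."""
    dists = [math.dist(point, c) for c in centers]
    zeros = [j for j, dj in enumerate(dists) if dj == 0.0]
    if zeros:
        # The point sits exactly on a center: give it full membership there.
        return [1.0 / len(zeros) if j in zeros else 0.0 for j in range(len(dists))]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** power for dk in dists) for dj in dists]


def fuzzy_c_means(points, init_centers, m=2.0, n_iter=100, tol=1e-9):
    """Alternate membership and center updates until the centers stop moving."""
    centers = [tuple(c) for c in init_centers]
    dim = len(points[0])
    for _ in range(n_iter):
        u = [memberships(p, centers, m) for p in points]
        new_centers = []
        for j in range(len(centers)):
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            if total == 0.0:
                new_centers.append(centers[j])  # no support: keep old center
                continue
            new_centers.append(
                tuple(sum(w * p[k] for w, p in zip(weights, points)) / total
                      for k in range(dim))
            )
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return [memberships(p, centers, m) for p in points], centers


data = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
u, centers = fuzzy_c_means(data, init_centers=[(0.0,), (5.0,)])
print([round(c[0], 2) for c in centers])
```

    The graded memberships are what make the method attractive for ambiguous clinical presentations: a case lying between two clusters receives comparable membership in both rather than being forced into one.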

  14. Cluster and constraint analysis in tetrahedron packings.

    Science.gov (United States)

    Jin, Weiwei; Lu, Peng; Liu, Lufeng; Li, Shuixiang

    2015-04-01

    The disordered packings of tetrahedra often show no obvious macroscopic orientational or positional order for a wide range of packing densities, and it has been found that the local order in particle clusters is the main order form of tetrahedron packings. Therefore, a cluster analysis is carried out to investigate the local structures and properties of tetrahedron packings in this work. We obtain a cluster distribution of differently sized clusters, and peaks are observed at two special clusters, i.e., dimer and wagon wheel. We then calculate the amounts of dimers and wagon wheels, which are observed to have linear or approximate linear correlations with packing density. Following our previous work, the amount of particles participating in dimers is used as an order metric to evaluate the order degree of the hierarchical packing structure of tetrahedra, and an order map is consequently depicted. Furthermore, a constraint analysis is performed to determine the isostatic or hyperstatic region in the order map. We employ a Monte Carlo algorithm to test jamming and then suggest a new maximally random jammed packing of hard tetrahedra from the order map with a packing density of 0.6337.

  15. Identifying Peer Institutions Using Cluster Analysis

    Science.gov (United States)

    Boronico, Jess; Choksi, Shail S.

    2012-01-01

    The New York Institute of Technology's (NYIT) School of Management (SOM) wishes to develop a list of peer institutions for the purpose of benchmarking and monitoring/improving performance against other business schools. The procedure utilizes relevant criteria for the purpose of establishing this peer group by way of a cluster analysis. The…

  16. Cytokines and clustered cardiovascular risk factors in children

    DEFF Research Database (Denmark)

    Andersen, Lars Bo; Müller, Klaus; Eiberg, Stig

    2010-01-01

    The aim was to evaluate the possible role of tumor necrosis factor alpha (TNF-alpha), interleukin-6 (IL-6), C-reactive protein (CRP), low fitness, and fatness in the early development of clustering of cardiovascular disease (CVD) risk factors and insulin resistance. Subjects for this cross...

  17. Clustering of cardiovascular risk factors and carotid intima-media thickness: The USE-IMT study

    Science.gov (United States)

    Wang, Xin; den Ruijter, Hester M.; Anderson, Todd J.; Britton, Annie R.; Dekker, Jacqueline; Engström, Gunnar; Evans, Greg W.; de Graaf, Jacqueline; Grobbee, Diederick E.; Hedblad, Bo; Holewijn, Suzanne; Ikeda, Ai; Kauhanen, Jussi; Kitagawa, Kazuo; Kitamura, Akihiko; Kurl, Sudhir; Lonn, Eva M.; Lorenz, Matthias W.; Mathiesen, Ellisiv B.; Nijpels, Giel; Okazaki, Shuhei; Polak, Joseph F.; Price, Jacqueline F.; Rembold, Christopher M.; Rosvall, Maria; Rundek, Tatjana; Salonen, Jukka T.; Sitzer, Matthias; Stehouwer, Coen D. A.; Tuomainen, Tomi-Pekka; Peters, Sanne A. E.; Bots, Michiel L.

    2017-01-01

    Background The relation of a single risk factor with atherosclerosis is established. Clinically we know of risk factor clustering within individuals. Yet, studies into the magnitude of the relation of risk factor clusters with atherosclerosis are limited. Here, we assessed that relation. Methods Individual participant data from 14 cohorts, involving 59,025 individuals were used in this cross-sectional analysis. We made 15 clusters of four risk factors (current smoking, overweight, elevated blood pressure, elevated total cholesterol). Multilevel age and sex adjusted linear regression models were applied to estimate mean differences in common carotid intima-media thickness (CIMT) between clusters using those without any of the four risk factors as reference group. Results Compared to the reference, those with 1, 2, 3 or 4 risk factors had a significantly higher common CIMT: mean difference of 0.026 mm, 0.052 mm, 0.074 mm and 0.114 mm, respectively. These findings were the same in men and in women, and across ethnic groups. Within each risk factor cluster (1, 2, 3 risk factors), groups with elevated blood pressure had the largest CIMT and those with elevated cholesterol the lowest CIMT, a pattern similar for men and women. Conclusion Clusters of risk factors relate to increased common CIMT in a graded manner, similar in men, women and across race-ethnic groups. Some clusters seemed more atherogenic than others. Our findings support the notion that cardiovascular prevention should focus on sets of risk factors rather than individual levels alone, but may prioritize within clusters. PMID:28323823
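
    The figure of 15 clusters is consistent with taking every non-empty combination of the four binary risk factors (2^4 - 1 = 15), with the group having none of them serving as the reference. The short sketch below enumerates those combinations; the interpretation and the function name `risk_factor_clusters` are assumptions based on the abstract, not the authors' code.

```python
from itertools import combinations

RISK_FACTORS = (
    "current smoking",
    "overweight",
    "elevated blood pressure",
    "elevated total cholesterol",
)


def risk_factor_clusters(factors=RISK_FACTORS):
    """All non-empty combinations of the given risk factors, smallest first."""
    return [
        combo
        for size in range(1, len(factors) + 1)
        for combo in combinations(factors, size)
    ]


clusters = risk_factor_clusters()
print(len(clusters))  # prints 15
for size in range(1, 5):
    print(size, sum(1 for c in clusters if len(c) == size))
```

    The counts by size are 4, 6, 4 and 1, which matches the abstract's grouping of clusters into those with 1, 2, 3 or 4 risk factors.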

  18. [Visual field progression in glaucoma: cluster analysis].

    Science.gov (United States)

    Bresson-Dumont, H; Hatton, J; Foucher, J; Fonteneau, M

    2012-11-01

    Visual field progression analysis is one of the key points in glaucoma monitoring, but distinction between true progression and random fluctuation is sometimes difficult. There are several different algorithms but no real consensus for detecting visual field progression. The trend analysis of global indices (MD, sLV) may miss localized deficits or be affected by media opacities. Conversely, point-by-point analysis makes progression difficult to differentiate from physiological variability, particularly when the sensitivity of a point is already low. The goal of our study was to analyse visual field progression with the EyeSuite™ Octopus Perimetry Clusters algorithm in patients with no significant changes in global indices or worsening of the analysis of pointwise linear regression. We analyzed the visual fields of 162 eyes (100 patients - 58 women, 42 men, average age 66.8 ± 10.91) with ocular hypertension or glaucoma. For inclusion, at least six reliable visual fields per eye were required, and the trend analysis (EyeSuite™ Perimetry) of visual field global indices (MD and sLV) had to show no significant progression. The analysis of changes in cluster mode was then performed. In a second step, eyes with statistically significant worsening of at least one of their clusters were analyzed point-by-point with the Octopus Field Analysis (OFA). Fifty-four eyes (33.33%) had a significant worsening in some clusters, while their global indices remained stable over time. In this group of patients, more advanced glaucoma was present than in the stable group (MD 6.41 dB vs. 2.87); 64.82% (35/54) of those eyes in which the clusters progressed, however, had no statistically significant change in the trend analysis by pointwise linear regression. Most software algorithms for analyzing visual field progression are essentially trend analyses of global indices, or point-by-point linear regression. This study shows the potential role of cluster trend analysis. However, for best

  19. Generic, network schema agnostic sparse tensor factorization for single-pass clustering of heterogeneous information networks

    Science.gov (United States)

    Meng, Qinggang; Deng, Su; Huang, Hongbin; Wu, Yahui; Badii, Atta

    2017-01-01

    Heterogeneous information networks (e.g. bibliographic networks and social media networks) that consist of multiple interconnected objects are ubiquitous. Clustering analysis is an effective method to understand the semantic information and interpretable structure of the heterogeneous information networks, and it has attracted the attention of many researchers in recent years. However, most studies assume that heterogeneous information networks usually follow some simple schemas, such as bi-typed networks or star network schema, and they can only cluster one type of object in the network each time. In this paper, a novel clustering framework is proposed based on sparse tensor factorization for heterogeneous information networks, which can cluster multiple types of objects simultaneously in a single pass without any network schema information. The types of objects and the relations between them in the heterogeneous information networks are modeled as a sparse tensor. The clustering issue is modeled as an optimization problem, which is similar to the well-known Tucker decomposition. Then, an Alternating Least Squares (ALS) algorithm and a feasible initialization method are proposed to solve the optimization problem. Based on the tensor factorization, we simultaneously partition different types of objects into different clusters. The experimental results on both synthetic and real-world datasets have demonstrated that our proposed clustering framework, STFClus, can model heterogeneous information networks efficiently and can outperform state-of-the-art clustering algorithms as a generally applicable single-pass clustering method for heterogeneous network which is network schema agnostic. PMID:28245222

  20. Generic, network schema agnostic sparse tensor factorization for single-pass clustering of heterogeneous information networks.

    Science.gov (United States)

    Wu, Jibing; Meng, Qinggang; Deng, Su; Huang, Hongbin; Wu, Yahui; Badii, Atta

    2017-01-01

    Heterogeneous information networks (e.g. bibliographic networks and social media networks) that consist of multiple interconnected objects are ubiquitous. Clustering analysis is an effective method to understand the semantic information and interpretable structure of the heterogeneous information networks, and it has attracted the attention of many researchers in recent years. However, most studies assume that heterogeneous information networks usually follow some simple schemas, such as bi-typed networks or star network schema, and they can only cluster one type of object in the network each time. In this paper, a novel clustering framework is proposed based on sparse tensor factorization for heterogeneous information networks, which can cluster multiple types of objects simultaneously in a single pass without any network schema information. The types of objects and the relations between them in the heterogeneous information networks are modeled as a sparse tensor. The clustering issue is modeled as an optimization problem, which is similar to the well-known Tucker decomposition. Then, an Alternating Least Squares (ALS) algorithm and a feasible initialization method are proposed to solve the optimization problem. Based on the tensor factorization, we simultaneously partition different types of objects into different clusters. The experimental results on both synthetic and real-world datasets have demonstrated that our proposed clustering framework, STFClus, can model heterogeneous information networks efficiently and can outperform state-of-the-art clustering algorithms as a generally applicable single-pass clustering method for heterogeneous network which is network schema agnostic.

  1. Cluster analysis of spontaneous preterm birth phenotypes identifies potential associations among preterm birth mechanisms.

    Science.gov (United States)

    Esplin, M Sean; Manuck, Tracy A; Varner, Michael W; Christensen, Bryce; Biggio, Joseph; Bukowski, Radek; Parry, Samuel; Zhang, Heping; Huang, Hao; Andrews, William; Saade, George; Sadovsky, Yoel; Reddy, Uma M; Ilekis, John

    2015-09-01

    We sought to use an innovative tool that is based on common biologic pathways to identify specific phenotypes among women with spontaneous preterm birth (SPTB) to enhance investigators' ability to identify and to highlight common mechanisms and underlying genetic factors that are responsible for SPTB. We performed a secondary analysis of a prospective case-control multicenter study of SPTB. All cases delivered a preterm singleton at SPTB ≤34.0 weeks' gestation. Each woman was assessed for the presence of underlying SPTB causes. A hierarchical cluster analysis was used to identify groups of women with homogeneous phenotypic profiles. One of the phenotypic clusters was selected for candidate gene association analysis with the use of VEGAS software. One thousand twenty-eight women with SPTB were assigned phenotypes. Hierarchical clustering of the phenotypes revealed 5 major clusters. Cluster 1 (n = 445) was characterized by maternal stress; cluster 2 (n = 294) was characterized by premature membrane rupture; cluster 3 (n = 120) was characterized by familial factors, and cluster 4 (n = 63) was characterized by maternal comorbidities. Cluster 5 (n = 106) was multifactorial and characterized by infection (INF), decidual hemorrhage (DH), and placental dysfunction (PD). These 3 phenotypes were correlated highly by χ² analysis (PD and DH, P cluster 3 of SPTB. We identified 5 major clusters of SPTB based on a phenotype tool and hierarchical clustering. There was significant correlation between several of the phenotypes. The INS gene was associated with familial factors that were underlying SPTB.

  2. The Financial Evaluation of Pharmaceutical Listed Firms Based on Factor Analysis and Cluster Analysis

    Institute of Scientific and Technical Information of China (English)

    潘欣; 郭继荣

    2016-01-01

    China's medical and health system reform has entered a "deep-water" phase. Because they are characterized by high technology, high investment, high return and high risk, and are strongly affected by policy changes, listed pharmaceutical firms face more serious financial risk than listed firms in general, so the evaluation of their financial risk is of great significance. Factor analysis uses dimensionality reduction to condense many correlated original indicators into a few factors that summarize the information in the data; statistical analysis then yields a composite score for each firm, which overcomes the arbitrariness of manually assigned weights. The factor scores and composite scores of pharmaceutical manufacturing firms obtained by combining cluster analysis and factor analysis support a comprehensive, objective and accurate evaluation of the firms' financial situation.

  3. Cluster analysis of obesity and asthma phenotypes.

    Directory of Open Access Journals (Sweden)

    E Rand Sutherland

    BACKGROUND: Asthma is a heterogeneous disease with variability among patients in characteristics such as lung function, symptoms and control, body weight, markers of inflammation, and responsiveness to glucocorticoids (GC). Cluster analysis of well-characterized cohorts can advance understanding of disease subgroups in asthma and point to unsuspected disease mechanisms. We utilized a hypothesis-free cluster analytical approach to define the contribution of obesity and related variables to asthma phenotype. METHODOLOGY AND PRINCIPAL FINDINGS: In a cohort of clinical trial participants (n = 250), minimum-variance hierarchical clustering was used to identify clinical and inflammatory biomarkers important in determining disease cluster membership in mild and moderate persistent asthmatics. In a subset of participants, GC sensitivity was assessed via expression of GC receptor alpha (GCRα) and induction of MAP kinase phosphatase-1 (MKP-1) expression by dexamethasone. Four asthma clusters were identified, with body mass index (BMI, kg/m²) and severity of asthma symptoms (AEQ score) the most significant determinants of cluster membership (F = 57.1, p<0.0001 and F = 44.8, p<0.0001, respectively). Two clusters were composed of predominantly obese individuals; these two obese asthma clusters differed from one another with regard to age of asthma onset, measures of asthma symptoms (AEQ) and control (ACQ), exhaled nitric oxide concentration (FeNO) and airway hyperresponsiveness (methacholine PC20) but were similar with regard to measures of lung function (FEV1 (%) and FEV1/FVC), airway eosinophilia, IgE, leptin, adiponectin and C-reactive protein (hsCRP). Members of obese clusters demonstrated evidence of reduced expression of GCRα, a finding which was correlated with a reduced induction of MKP-1 expression by dexamethasone. CONCLUSIONS AND SIGNIFICANCE: Obesity is an important determinant of asthma phenotype in adults. There is heterogeneity in

  4. Changing cluster composition in cluster randomised controlled trials: design and analysis considerations.

    Science.gov (United States)

    Corrigan, Neil; Bankart, Michael J G; Gray, Laura J; Smith, Karen L

    2014-05-24

    There are many methodological challenges in the conduct and analysis of cluster randomised controlled trials, but one that has received little attention is that of post-randomisation changes to cluster composition. To illustrate this, we focus on the issue of cluster merging, considering the impact on the design, analysis and interpretation of trial outcomes. We explored the effects of merging clusters on study power using standard methods of power calculation. We assessed the potential impacts on study findings of both homogeneous cluster merges (involving clusters randomised to the same arm of a trial) and heterogeneous merges (involving clusters randomised to different arms of a trial) by simulation. To determine the impact on bias and precision of treatment effect estimates, we applied standard methods of analysis to different populations under analysis. Cluster merging produced a systematic reduction in study power. This effect depended on the number of merges and was most pronounced when variability in cluster size was at its greatest. Simulations demonstrate that the impact on analysis was minimal when cluster merges were homogeneous, with impact on study power being balanced by a change in observed intracluster correlation coefficient (ICC). We found a decrease in study power when cluster merges were heterogeneous, and the estimate of treatment effect was attenuated. Examples of cluster merges found in previously published reports of cluster randomised trials were typically homogeneous rather than heterogeneous. Simulations demonstrated that trial findings in such cases would be unbiased. However, simulations also showed that any heterogeneous cluster merges would introduce bias that would be hard to quantify, as well as having negative impacts on the precision of estimates obtained. Further methodological development is warranted to better determine how to analyse such trials appropriately. 
Interim recommendations include avoidance of cluster merges where
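
    The power loss from cluster merging can be illustrated with a commonly used design-effect approximation for unequal cluster sizes, DE = 1 + ((cv^2 + 1) * m - 1) * ICC, where m is the mean cluster size and cv is the coefficient of variation of cluster size. The sketch below is not the authors' simulation code: the cluster sizes and the ICC of 0.05 are hypothetical, and the ICC is held fixed, which the abstract notes need not hold after a merge.

```python
import math


def design_effect(mean_size, icc, cv=0.0):
    """Approximate design effect; cv=0 reduces to 1 + (m - 1) * ICC."""
    return 1.0 + ((cv ** 2 + 1.0) * mean_size - 1.0) * icc


def effective_sample_size(cluster_sizes, icc):
    """Total sample size divided by the design effect for these clusters."""
    total = sum(cluster_sizes)
    mean = total / len(cluster_sizes)
    # Population variance of cluster size, used for the coefficient of variation.
    variance = sum((s - mean) ** 2 for s in cluster_sizes) / len(cluster_sizes)
    cv = math.sqrt(variance) / mean
    return total / design_effect(mean, icc, cv)


ICC = 0.05                        # hypothetical intracluster correlation
before_merge = [20] * 10          # ten clusters of 20 participants each
after_merge = [40] + [20] * 8     # two of those clusters merged into one

ess_before = effective_sample_size(before_merge, ICC)
ess_after = effective_sample_size(after_merge, ICC)
print(f"effective sample size before merge: {ess_before:.1f}")
print(f"effective sample size after merge:  {ess_after:.1f}")
```

    With the same 200 participants, the merged design has fewer, more unequal clusters, so its effective sample size is smaller under these assumptions.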

  5. Semi-supervised consensus clustering for gene expression data analysis

    OpenAIRE

    Wang, Yunli; Pan, Youlian

    2014-01-01

    Background Simple clustering methods such as hierarchical clustering and k-means are widely used for gene expression data analysis; but they are unable to deal with noise and high dimensionality associated with the microarray gene expression data. Consensus clustering appears to improve the robustness and quality of clustering results. Incorporating prior knowledge in clustering process (semi-supervised clustering) has been shown to improve the consistency between the data partitioning and do...

  6. Geographic atrophy phenotype identification by cluster analysis.

    Science.gov (United States)

    Monés, Jordi; Biarnés, Marc

    2017-07-20

    To identify ocular phenotypes in patients with geographic atrophy secondary to age-related macular degeneration (GA) using a data-driven cluster analysis. This was a retrospective analysis of data from a prospective, natural history study of patients with GA who were followed for ≥6 months. Cluster analysis was used to identify subgroups within the population based on the presence of several phenotypic features: soft drusen, reticular pseudodrusen (RPD), primary foveal atrophy, increased fundus autofluorescence (FAF), greyish FAF appearance and subfoveal choroidal thickness (SFCT). A comparison of features between the subgroups was conducted, and a qualitative description of the new phenotypes was proposed. The atrophy growth rate between phenotypes was then compared. Data were analysed from 77 eyes of 77 patients with GA. Cluster analysis identified three groups: phenotype 1 was characterised by high soft drusen load, foveal atrophy and slow growth; phenotype 3 showed high RPD load, extrafoveal and greyish FAF appearance and thin SFCT; the characteristics of phenotype 2 were midway between phenotypes 1 and 3. Phenotypes differed in all measured features (p≤0.013), with decreases in the presence of soft drusen, foveal atrophy and SFCT seen from phenotypes 1 to 3 and corresponding increases in high RPD load, high FAF and greyish FAF appearance. Atrophy growth rate differed between phenotypes 1, 2 and 3 (0.63, 1.91 and 1.73 mm(2)/year, respectively, p=0.0005). Cluster analysis identified three distinct phenotypes in GA. One of them showed a particularly slow growth pattern. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  7. A comparative risk assessment of burden of disease and injury attributable to 67 risk factors and risk factor clusters in 21 regions, 1990–2010: a systematic analysis for the Global Burden of Disease Study 2010

    Science.gov (United States)

    Lim, Stephen S; Vos, Theo; Flaxman, Abraham D; Danaei, Goodarz; Shibuya, Kenji; Adair-Rohani, Heather; Amann, Markus; Anderson, H Ross; Andrews, Kathryn G; Aryee, Martin; Atkinson, Charles; Bacchus, Loraine J; Bahalim, Adil N; Balakrishnan, Kalpana; Balmes, John; Barker-Collo, Suzanne; Baxter, Amanda; Bell, Michelle L; Blore, Jed D; Blyth, Fiona; Bonner, Carissa; Borges, Guilherme; Bourne, Rupert; Boussinesq, Michel; Brauer, Michael; Brooks, Peter; Bruce, Nigel G; Brunekreef, Bert; Bryan-Hancock, Claire; Bucello, Chiara; Buchbinder, Rachelle; Bull, Fiona; Burnett, Richard T; Byers, Tim E; Calabria, Bianca; Carapetis, Jonathan; Carnahan, Emily; Chafe, Zoe; Charlson, Fiona; Chen, Honglei; Chen, Jian Shen; Cheng, Andrew Tai-Ann; Child, Jennifer Christine; Cohen, Aaron; Colson, K Ellicott; Cowie, Benjamin C; Darby, Sarah; Darling, Susan; Davis, Adrian; Degenhardt, Louisa; Dentener, Frank; Des Jarlais, Don C; Devries, Karen; Dherani, Mukesh; Ding, Eric L; Dorsey, E Ray; Driscoll, Tim; Edmond, Karen; Ali, Suad Eltahir; Engell, Rebecca E; Erwin, Patricia J; Fahimi, Saman; Falder, Gail; Farzadfar, Farshad; Ferrari, Alize; Finucane, Mariel M; Flaxman, Seth; Fowkes, Francis Gerry R; Freedman, Greg; Freeman, Michael K; Gakidou, Emmanuela; Ghosh, Santu; Giovannucci, Edward; Gmel, Gerhard; Graham, Kathryn; Grainger, Rebecca; Grant, Bridget; Gunnell, David; Gutierrez, Hialy R; Hall, Wayne; Hoek, Hans W; Hogan, Anthony; Hosgood, H Dean; Hoy, Damian; Hu, Howard; Hubbell, Bryan J; Hutchings, Sally J; Ibeanusi, Sydney E; Jacklyn, Gemma L; Jasrasaria, Rashmi; Jonas, Jost B; Kan, Haidong; Kanis, John A; Kassebaum, Nicholas; Kawakami, Norito; Khang, Young-Ho; Khatibzadeh, Shahab; Khoo, Jon-Paul; Kok, Cindy; Laden, Francine; Lalloo, Ratilal; Lan, Qing; Lathlean, Tim; Leasher, Janet L; Leigh, James; Li, Yang; Lin, John Kent; Lipshultz, Steven E; London, Stephanie; Lozano, Rafael; Lu, Yuan; Mak, Joelle; Malekzadeh, Reza; Mallinger, Leslie; Marcenes, Wagner; March, Lyn; Marks, Robin; 
Martin, Randall; McGale, Paul; McGrath, John; Mehta, Sumi; Mensah, George A; Merriman, Tony R; Micha, Renata; Michaud, Catherine; Mishra, Vinod; Hanafiah, Khayriyyah Mohd; Mokdad, Ali A; Morawska, Lidia; Mozaffarian, Dariush; Murphy, Tasha; Naghavi, Mohsen; Neal, Bruce; Nelson, Paul K; Nolla, Joan Miquel; Norman, Rosana; Olives, Casey; Omer, Saad B; Orchard, Jessica; Osborne, Richard; Ostro, Bart; Page, Andrew; Pandey, Kiran D; Parry, Charles D H; Passmore, Erin; Patra, Jayadeep; Pearce, Neil; Pelizzari, Pamela M; Petzold, Max; Phillips, Michael R; Pope, Dan; Pope III, C Arden; Powles, John; Rao, Mayuree; Razavi, Homie; Rehfuess, Eva A; Rehm, Jürgen T; Ritz, Beate; Rivara, Frederick P; Roberts, Thomas; Robinson, Carolyn; Rodriguez-Portales, Jose A; Romieu, Isabelle; Room, Robin; Rosenfeld, Lisa C; Roy, Ananya; Rushton, Lesley; Salomon, Joshua A; Sampson, Uchechukwu; Sanchez-Riera, Lidia; Sanman, Ella; Sapkota, Amir; Seedat, Soraya; Shi, Peilin; Shield, Kevin; Shivakoti, Rupak; Singh, Gitanjali M; Sleet, David A; Smith, Emma; Smith, Kirk R; Stapelberg, Nicolas J C; Steenland, Kyle; Stöckl, Heidi; Stovner, Lars Jacob; Straif, Kurt; Straney, Lahn; Thurston, George D; Tran, Jimmy H; Van Dingenen, Rita; van Donkelaar, Aaron; Veerman, J Lennert; Vijayakumar, Lakshmi; Weintraub, Robert; Weissman, Myrna M; White, Richard A; Whiteford, Harvey; Wiersma, Steven T; Wilkinson, James D; Williams, Hywel C; Williams, Warwick; Wilson, Nicholas; Woolf, Anthony D; Yip, Paul; Zielinski, Jan M; Lopez, Alan D; Murray, Christopher J L; Ezzati, Majid

    2014-01-01

    Summary Background Quantification of the disease burden caused by different risks informs prevention by providing an account of health loss different to that provided by a disease-by-disease analysis. No complete revision of global disease burden caused by risk factors has been done since a comparative risk assessment in 2000, and no previous analysis has assessed changes in burden attributable to risk factors over time. Methods We estimated deaths and disability-adjusted life years (DALYs; sum of years lived with disability [YLD] and years of life lost [YLL]) attributable to the independent effects of 67 risk factors and clusters of risk factors for 21 regions in 1990 and 2010. We estimated exposure distributions for each year, region, sex, and age group, and relative risks per unit of exposure by systematically reviewing and synthesising published and unpublished data. We used these estimates, together with estimates of cause-specific deaths and DALYs from the Global Burden of Disease Study 2010, to calculate the burden attributable to each risk factor exposure compared with the theoretical-minimum-risk exposure. We incorporated uncertainty in disease burden, relative risks, and exposures into our estimates of attributable burden. Findings In 2010, the three leading risk factors for global disease burden were high blood pressure (7·0% [95% uncertainty interval 6·2–7·7] of global DALYs), tobacco smoking including second-hand smoke (6·3% [5·5–7·0]), and alcohol use (5·5% [5·0–5·9]). In 1990, the leading risks were childhood underweight (7·9% [6·8–9·4]), household air pollution from solid fuels (HAP; 7·0% [5·6–8·3]), and tobacco smoking including second-hand smoke (6·1% [5·4–6·8]). Dietary risk factors and physical inactivity collectively accounted for 10·0% (95% UI 9·2–10·8) of global DALYs in 2010, with the most prominent dietary risks being diets low in fruits and those high in sodium. Several risks that primarily affect
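
    Comparing an observed exposure distribution with a theoretical-minimum-risk exposure, as the Methods describe, is conventionally summarised as a population attributable fraction (PAF). The sketch below shows the standard categorical form of that calculation; the exposure shares, relative risks, and DALY total are hypothetical illustration values, not figures from the study, and the study's uncertainty propagation is not reproduced.

```python
def population_attributable_fraction(observed, counterfactual, relative_risk):
    """PAF = (sum(P_i * RR_i) - sum(P'_i * RR_i)) / sum(P_i * RR_i).

    observed and counterfactual are exposure shares per category (each sums
    to 1); relative_risk gives the relative risk for each category.
    """
    observed_risk = sum(p * rr for p, rr in zip(observed, relative_risk))
    counterfactual_risk = sum(p * rr for p, rr in zip(counterfactual, relative_risk))
    return (observed_risk - counterfactual_risk) / observed_risk


# Hypothetical three-level exposure: none, moderate, high.
observed_shares = [0.5, 0.3, 0.2]
minimum_risk_shares = [1.0, 0.0, 0.0]   # everyone at the lowest-risk level
relative_risks = [1.0, 1.5, 2.5]

paf = population_attributable_fraction(
    observed_shares, minimum_risk_shares, relative_risks
)
total_dalys = 1000.0                     # hypothetical cause-specific burden
attributable_dalys = paf * total_dalys
print(f"PAF = {paf:.3f}, attributable DALYs = {attributable_dalys:.1f}")
```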

  8. MANNER OF STOCKS SORTING USING CLUSTER ANALYSIS METHODS

    Directory of Open Access Journals (Sweden)

    Jana Halčinová

    2014-06-01

    The aim of the present article is to show how cluster analysis methods can be used to classify stocks of finished products. Cluster analysis creates groups (clusters) of finished products according to similarity in demand, i.e. customer requirements for each product. The sorting of finished-product stocks by cluster is described using a practical example. The resulting clusters are incorporated into the draft layout of the distribution warehouse.

  9. AMOEBA clustering revisited. [cluster analysis, classification, and image display program]

    Science.gov (United States)

    Bryant, Jack

    1990-01-01

    A description of the clustering, classification, and image display program AMOEBA is presented. Using a difficult high resolution aircraft-acquired MSS image, the steps the program takes in forming clusters are traced. A number of new features are described here for the first time. Usage of the program is discussed. The theoretical foundation (the underlying mathematical model) is briefly presented. The program can handle images of any size and dimensionality.

  10. Mapping Cigarettes Similarities using Cluster Analysis Methods

    Directory of Open Access Journals (Sweden)

    Lorentz Jäntschi

    2007-09-01

    The aim of the research was to investigate the relationships and/or co-occurrences within and between chemical composition information (tar, nicotine, carbon monoxide), market information (brand, manufacturer, price), and public health information (class, health warning), as well as to cluster a sample of cigarette data. Thirty cigarette brands were analyzed. Six categorical (cigarette brand, manufacturer, health warnings, class) and four continuous (tar, nicotine, carbon monoxide concentrations and package price) variables were collected for investigation of chemical composition, market information and public health information. Multiple linear regression and two clustering techniques were applied. The study revealed several interesting findings. The carbon monoxide concentration proved to be linked with tar and nicotine concentration. The applied clustering methods identified groups of cigarette brands that showed similar characteristics. The tar and carbon monoxide concentrations were the main criteria used in clustering. An analysis of a larger sample could reveal more relevant and useful information regarding the similarities between cigarette brands.

  11. MEME-LaB: motif analysis in clusters.

    Science.gov (United States)

    Brown, Paul; Baxter, Laura; Hickman, Richard; Beynon, Jim; Moore, Jonathan D; Ott, Sascha

    2013-07-01

    Genome-wide expression analysis can result in large numbers of clusters of co-expressed genes. Although there are tools for ab initio discovery of transcription factor-binding sites, most do not provide a quick and easy way to study large numbers of clusters. To address this, we introduce a web tool called MEME-LaB. The tool wraps MEME (an ab initio motif finder), providing an interface for users to input multiple gene clusters, retrieve promoter sequences, run motif finding and then easily browse and condense the results, facilitating better interpretation of the results from large-scale datasets. MEME-LaB is freely accessible at: http://wsbc.warwick.ac.uk/wsbcToolsWebpage/. Supplementary data are available at Bioinformatics online.

  12. Cognitive analysis of multiple sclerosis utilizing fuzzy cluster means

    Directory of Open Access Journals (Sweden)

    Imianvan Anthony Agboizebeta

    2012-02-01

    Multiple sclerosis, often called MS, is a disease that affects the central nervous system (the brain and spinal cord). Myelin provides insulation for nerve cells, improves the conduction of impulses along the nerves and is important for maintaining the health of the nerves. In multiple sclerosis, inflammation causes the myelin to disappear. Genetic factors, environmental issues and viral infection may also play a role in developing the disease. MS is characterized by life-threatening symptoms such as loss of balance, hearing problems and depression. The application of Fuzzy Cluster Means (FCM), or Fuzzy C-Means, analysis to the diagnosis of different forms of multiple sclerosis is the focal point of this paper. Application of cluster analysis involves a sequence of methodological and analytical decision steps that enhances the quality and meaning of the clusters produced. Uncertainties associated with analysis of multiple sclerosis test data are eliminated by the system
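
    Fuzzy C-Means assigns each observation a degree of membership in every cluster rather than a single label. The sketch below is a minimal one-dimensional version of the standard algorithm with fuzzifier m = 2; the scores and starting centers are hypothetical, and it is not the diagnostic system described in the paper.

```python
def memberships_for(points, centers, m=2.0):
    """Membership of each point in each cluster; every row sums to 1."""
    power = 2.0 / (m - 1.0)
    rows = []
    for x in points:
        dists = [abs(x - c) for c in centers]
        if 0.0 in dists:
            # The point coincides with a center: give it full membership there.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            rows.append([h / sum(hits) for h in hits])
        else:
            rows.append(
                [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]
            )
    return rows


def fuzzy_c_means(points, centers, m=2.0, iterations=100):
    """Alternate membership and center updates for a fixed number of rounds."""
    centers = list(centers)
    for _ in range(iterations):
        u = memberships_for(points, centers, m)
        centers = [
            sum(row[i] ** m * x for row, x in zip(u, points))
            / sum(row[i] ** m for row in u)
            for i in range(len(centers))
        ]
    return centers, memberships_for(points, centers, m)


# Hypothetical one-dimensional test scores forming two loose groups.
scores = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2]
centers, memberships = fuzzy_c_means(scores, centers=[0.0, 6.0])
print([round(c, 2) for c in centers])
print([round(row[0], 3) for row in memberships])
```

    Each row of memberships gives the degree to which one score belongs to each cluster, which is how the method expresses uncertainty about borderline cases.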

  13. CLUSTERING ANALYSIS OF DEBRIS-FLOW STREAMS

    Institute of Scientific and Technical Information of China (English)

    Yuan-Fan TSAI; Huai-Kuang TSAI; Cheng-Yan KAO

    2004-01-01

    The Chi-Chi earthquake in 1999 caused disastrous landslides, which triggered numerous debris flows and killed hundreds of people. A critical rainfall intensity line for each debris-flow stream is studied to prevent such a disaster. However, setting rainfall lines from incomplete data is difficult, so this study considered eight critical factors to group streams, such that streams within a cluster have similar rainfall lines. A genetic algorithm is applied to group 377 debris-flow streams selected from the center of an area affected by the Chi-Chi earthquake. These streams are grouped into seven clusters with different characteristics. The results reveal that the proposed method effectively groups debris-flow streams.

  14. Risk Factors for Cardiovascular Disease and Their Clustering among Adults in Jilin (China)

    Directory of Open Access Journals (Sweden)

    Jianxing Yu

    2015-12-01

    Background: Clustering of cardiovascular disease (CVD) risk factors constitutes a major public health challenge. Although a number of researchers have investigated CVD risk factor clusters in China, little is known about their prevalence and how clustering is associated with demographics in Jilin Province, China; this study aims to reveal that relationship. Methods: A cross-sectional survey based on a sample of 16,834 adults aged 18 to 79 years was conducted in Jilin in 2012. The prevalence and clustering of CVD risk factors were analysed through complex weighted computation. Quantitative variables were compared by the t test, and categorical variables were compared by the Rao-Scott χ² test. Finally, multivariable logistic regression analysis was used to evaluate the CVD risk factor clusters associated with demographics. Results: The prevalences of hypertension, diabetes, dyslipidemia, overweight and smoking were 37.3%, 8.2%, 36.8%, 47.3%, and 31.0%, respectively, and these risk factors were associated with gender, education level, age, occupation and family income (p < 0.05). Overall, compared with females, the adjusted ORs of ≥1, ≥2 and ≥3 risk factor clusters in males were 3.70 (95% CI 3.26 to 4.20), 4.66 (95% CI 4.09 to 5.31), and 5.76 (95% CI 5.01 to 6.63), respectively. In particular, the adjusted ORs of ≥1, ≥2 and ≥3 risk factors increased with age. Conclusions: CVD risk factor clusters are common among adults in northeast China, and they constitute a major public health challenge. More effective attention and interventions should be directed toward the elderly and toward persons with lower incomes and low levels of education.

  15. Equivalent damage validation by variable cluster analysis

    Science.gov (United States)

    Drago, Carlo; Ferlito, Rachele; Zucconi, Maria

    2016-06-01

    The main aim of this work is to perform a cluster analysis of the damage surveyed in the old center of L'Aquila after the earthquake of April 6, 2009, and to validate an Indicator of Equivalent Damage (ED) that summarizes the information reported on the AeDES card regarding the level of damage and its extent over the surface of the buildings. In particular, we used a sample of 13442 masonry buildings located in an area characterized by a Macroseismic Intensity equal to 8 [1]. The aim is to ensure coherence between the clusters, and their hierarchy, identified in the observed damage data and those identified in the computed ED data.

  16. Prevalence and risk factors for intimate partner violence among Grade 8 learners in urban South Africa: baseline analysis from the Skhokho Supporting Success cluster randomised controlled trial.

    Science.gov (United States)

    Shamu, Simukai; Gevers, Anik; Mahlangu, B Pinky; Jama Shai, P Nwabisa; Chirwa, Esnat D; Jewkes, Rachel K

    2016-01-01

    Intimate partner violence (IPV) is a serious public health problem among adolescents. This study investigated the prevalence of and factors associated with Grade 8 girls' experience and boys' perpetration of IPV in South Africa. Participants were interviewed using interviewer-administered questionnaires about IPV, childhood violence, bullying, gender attitudes, alcohol use and risky sexual behaviours. Multiple logistic regression analysis was conducted to assess factors associated with girls' experience and boys' perpetration of IPV. Structural equation modelling (SEM) was conducted to assess the pathways to IPV experience and perpetration. Results show dating relationships are common among girls (52.5%) and boys (70.7%) and high prevalence of sexual or physical IPV experience by girls (30.9%; 95% CI: 28.2-33.7) and perpetration by boys (39.5%; 95% CI: 36.6-42.3). The logistic regression model showed factors associated with girls' experience of IPV include childhood experience of violence, individual gender inequitable attitudes, corporal punishment at home and in school, alcohol use, wider communication with one's partner and being more negative about school. We found three pathways from childhood trauma to IPV experience and perpetration in both models and these are through inequitable gender attitudes and risky sex, bullying and alcohol use. Prevention of IPV in children needs to encompass prevention of exposure to trauma in childhood and addressing gender attitudes and social norms to encourage positive disciplining approaches. The trial is registered on ClinicalTrials.gov as NCT02349321. © The Author 2015. Published by Oxford University Press on behalf of Royal Society of Tropical Medicine and Hygiene. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  17. Cluster and factor analyses using water quality data in the Sapkyo reservoir watershed

    Energy Technology Data Exchange (ETDEWEB)

    Rim, Chang-Soo [Chungwoon University, Hongsung (Korea)]; Shin, Jae-Ki [Inje University, Kimhae (Korea)]

    2002-04-30

    The monthly water quality data measured at 19 stations located in the Sapkyo reservoir watershed were clustered into 2 to 7 clusters, and factor analysis was conducted to characterize the water quality using the information obtained from cluster analysis. The result of cluster analysis shows that Sapkyo reservoir and each stream (Sapkyo stream, Muhan stream and Kokkyo stream) in the Sapkyo reservoir watershed have their own water quality characteristics. The result of water quality analysis indicates that the concentration of suspended solids in Sapkyo reservoir is much higher than in the streams, probably because phytoplankton biomass increases with the nutrient-rich water flowing into Sapkyo reservoir from the upper part of the watershed. Furthermore, the concentrations of biochemical oxygen demand and chemical oxygen demand were 3.5 to 4.8 times and 1.7 to 2.5 times those of the streams, respectively. The overall water quality of the Sapkyo reservoir watershed was considered to exceed eutrophic conditions. Based on factor analysis, the water quality characteristics of Sapkyo stream and Muhan stream were closely related to farm land and residential areas. The water quality of Kokkyo stream was influenced by excess organic matter flowing from Chonan city and the district wastewater treatment plant located upstream on Kokkyo stream. The water quality factor influencing Sapkyo reservoir was closely related to the water quality factors of the three streams. (author). 20 refs., 6 tabs., 3 figs.

  18. A cluster analysis on road traffic accidents using genetic algorithms

    Science.gov (United States)

    Saharan, Sabariah; Baragona, Roberto

    2017-04-01

    The analysis of road traffic accidents is increasingly important because of the cost of accidents and public road safety. The availability of large data sets makes it viable to study the factors that affect the frequency and severity of accidents. However, the data are often highly unbalanced and overlapping. We deal with the data set of road traffic accidents recorded in Christchurch, New Zealand, from 2000 to 2009, with a total of 26440 accidents. The data are binary, and there are 50 road traffic accident factors with four levels of severity. We used a genetic algorithm for the analysis because the data set is large and unbalanced, and standard clustering such as the k-means algorithm may not be suitable for the task. The genetic algorithm based on clustering for unknown K (GCUK) was used to identify the factors associated with accidents of different levels of severity. The results provide insight into the relationship between factors and accident severity level and suggest that the two main factors that contribute to fatal accidents are "Speed greater than 60 km h" and "Did not see other people until it was too late". A comparison with the k-means algorithm and independent component analysis is performed to validate the results.

  19. Data Clustering Analysis Based on Wavelet Feature Extraction

    Institute of Scientific and Technical Information of China (English)

    QIAN Yuntao; TANG Yuanyan

    2003-01-01

    A novel wavelet-based data clustering method is presented in this paper, which includes wavelet feature extraction and a cluster growing algorithm. The wavelet transform can provide rich and diversified information for representing the global and local inherent structures of a dataset; therefore, it is a very powerful tool for clustering feature extraction. As an unsupervised classification, the target of clustering analysis is dependent on the specific clustering criteria. Several criteria that should be considered for a general-purpose clustering algorithm are proposed, and the cluster growing algorithm is constructed to connect the clustering criteria with wavelet features. Compared with other popular clustering methods, our clustering approach provides multi-resolution clustering results, needs few prior parameters, correctly deals with irregularly shaped clusters, and is insensitive to noise and outliers. As this wavelet-based clustering method is aimed at solving the two-dimensional data clustering problem, for high-dimensional datasets the self-organizing map and U-matrix method are applied to transform them into two-dimensional Euclidean space, so that high-dimensional data clustering analysis can also be carried out. Results on some simulated data and standard test data are reported to illustrate the power of our method.

  20. Constructing storyboards based on hierarchical clustering analysis

    Science.gov (United States)

    Hasebe, Satoshi; Sami, Mustafa M.; Muramatsu, Shogo; Kikuchi, Hisakazu

    2005-07-01

    There is a growing need for quick preview of video content, both to improve the accessibility of video archives and to reduce network traffic. In this paper, a storyboard that contains a user-specified number of keyframes is produced from a given video sequence. It is based on hierarchical cluster analysis of feature vectors that are derived from wavelet coefficients of video frames. Consistent use of extracted feature vectors is the key to avoiding repetition of computationally intensive parsing of the same video sequence. Experimental results suggest that a significant reduction in computational time is gained by this strategy.
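
    The idea of cutting a hierarchy of frame features at a user-specified number of keyframes can be sketched as follows. This is an illustration only: the two-dimensional feature vectors are hypothetical stand-ins for wavelet-derived features, and both average linkage and the nearest-to-centroid keyframe rule are assumptions, since the abstract does not state which linkage or selection rule the authors used.

```python
import math


def average_linkage(c1, c2, feats):
    """Mean pairwise Euclidean distance between two clusters of frame indices."""
    total = sum(math.dist(feats[i], feats[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster_frames(feats, k):
    """Agglomerative clustering: merge the closest pair until k clusters remain."""
    clusters = [[i] for i in range(len(feats))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], feats)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]   # b > a, so index a is still valid
    return clusters


def keyframes(feats, k):
    """Pick, for each cluster, the frame closest to the cluster centroid."""
    picks = []
    dim = len(feats[0])
    for members in cluster_frames(feats, k):
        centroid = [sum(feats[i][d] for i in members) / len(members) for d in range(dim)]
        picks.append(min(members, key=lambda i: math.dist(feats[i], centroid)))
    return sorted(picks)


# Hypothetical per-frame features: three shots of 3, 3, and 1 frames.
features = [
    (0.0, 0.0), (0.1, 0.0), (0.2, 0.1),
    (5.0, 5.0), (5.1, 5.1), (5.3, 5.3),
    (9.0, 0.0),
]
storyboard = keyframes(features, k=3)
print(storyboard)
```

    Because the features are computed once and reused for any k, a different storyboard length only repeats the clustering step, not the video parsing.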

  1. Latent cluster analysis of ALS phenotypes identifies prognostically differing groups.

    Directory of Open Access Journals (Sweden)

    Jeban Ganesalingam

    BACKGROUND: Amyotrophic lateral sclerosis (ALS) is a degenerative disease predominantly affecting motor neurons and manifesting as several different phenotypes. Whether these phenotypes correspond to different underlying disease processes is unknown. We used latent cluster analysis to identify groupings of clinical variables in an objective and unbiased way to improve phenotyping for clinical and research purposes. METHODS: Latent class cluster analysis was applied to a large database consisting of 1467 records of people with ALS, using discrete variables which can be readily determined at the first clinic appointment. The model was tested for clinical relevance by survival analysis of the phenotypic groupings using the Kaplan-Meier method. RESULTS: The best model generated five distinct phenotypic classes that strongly predicted survival (p<0.0001). Eight variables were used for the latent class analysis, but a good estimate of the classification could be obtained using just two variables: site of first symptoms (bulbar or limb) and time from symptom onset to diagnosis (p<0.00001). CONCLUSION: The five phenotypic classes identified using latent cluster analysis can predict prognosis. They could be used to stratify patients recruited into clinical trials and to generate more homogeneous disease groups for genetic, proteomic and risk factor research.
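
    The Kaplan-Meier method named in the Methods estimates a survival curve from follow-up times that may be censored. The sketch below is a minimal product-limit estimator for a single group; the follow-up times and event flags are hypothetical and are not taken from the ALS database.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times: follow-up time for each person; events: 1 if the event (death)
    was observed at that time, 0 if the person was censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up in months; 0 marks a censored observation.
months = [6, 7, 10, 15, 19, 25]
observed = [1, 0, 1, 1, 0, 1]
curve = kaplan_meier(months, observed)
for t, s in curve:
    print(f"S({t}) = {s:.3f}")
```

    Computing one such curve per phenotypic class and comparing them is the usual way survival differences between groups are displayed.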

  2. Bayesian Exploratory Factor Analysis

    DEFF Research Database (Denmark)

    Conti, Gabriella; Frühwirth-Schnatter, Sylvia; Heckman, James J.;

    2014-01-01

    This paper develops and applies a Bayesian approach to Exploratory Factor Analysis that improves on ad hoc classical approaches. Our framework relies on dedicated factor models and simultaneously determines the number of factors, the allocation of each measurement to a unique factor, and the corresponding factor loadings. Classical identification criteria are applied and integrated into our Bayesian procedure to generate models that are stable and clearly interpretable. A Monte Carlo study confirms the validity of the approach. The method is used to produce interpretable low dimensional aggregates...

  3. [On National Demonstration Areas: a cluster analysis].

    Science.gov (United States)

    Mao, F; Jiang, Y Y; Dong, W L; Ji, N; Dong, J Q

    2017-04-10

    Objective: To identify the provinces that lag behind and the relatively weak areas of work in the construction of National Demonstration Areas, so as to promote communication and future planning among different regions. Methods: Cluster analysis was used to compare the development of National Demonstration Areas in different provinces, including the coverage of National Demonstration Areas and the scores for non-communicable disease (NCD) prevention and control work based on a standardized indicator system. Results: According to the results on the construction of National Demonstration Areas, all the 29 provinces and the Xinjiang Production and Construction Corps (except Tibet and Qinghai) were classified into 6 categories: Shanghai; Beijing, Zhejiang, Chongqing; Tianjin, Shandong, Guangdong and Xinjiang Production and Construction Corps; Hebei, Fujian, Hubei, Jiangsu, Liaoning, Xinjiang, Hunan and Guangxi; Shanxi, Jilin, Henan, Hainan, Sichuan, Anhui and Jiangxi; Inner Mongolia, Shaanxi, Ningxia, Guizhou, Yunnan, Gansu and Heilongjiang. Based on the scores gathered in this study, 24 items representing the achievements of NCD prevention and control work were classified into 4 categories: manpower, special day on NCD, information materials development, policy/strategy support, financial support, mass media, enabled environment, community fitness campaign, health promotion for children and teenagers, institutional structure and patient self-management; healthy diet, surveillance of NCD risk factors, tobacco control and community diagnosis; intervention of high-risk groups, identification of high-risk groups, reporting system on cardiovascular and cerebrovascular events, popularization of basic public health service, workplace intervention programs, construction of demonstration units and mortality surveillance; oral hygiene and tumor registration. 
Contents including oral hygiene, tumor registration, intervention on high-risk groups, identification of

  4. Diagnostics of subtropical plants functional state by cluster analysis

    Directory of Open Access Journals (Sweden)

    Oksana Belous

    2016-05-01

    The article presents an example of applying statistical methods to data on the adaptive capacity of subtropical plant varieties. We describe the selection indicators and basic physiological parameters that were defined as diagnostic. The evaluation used a set of water-regime parameters: the water deficit of the leaves, the fractional composition of water, and the concentration of cell sap (CCS) (for tea flushes). These parameters are highly labile and highly responsive to many abiotic factors, which called for particular care in selecting plant material for analysis and in considering the impact on sustainability. From the experimental data we calculated pairwise correlation coefficients between climatic factors and the physiological indicators used. The result was a selection of physiological and biochemical indicators proposed for assessing adaptability and included in the methodological recommendations on diagnostics of the functional state of the studied crops. Complex studies involving a large number of indicators are difficult to analyze; in particular, they do not allow one to quickly identify similarities among new varieties in their adaptive responses to adverse factors and, therefore, to set general requirements for cultivation conditions. Cluster analysis requires that only quantitative data be analyzed, that the set of variables used to assess varieties be defined (the larger the sample, the more accurate the clustering), and that a measure of similarity (or difference) between objects be established. It is shown that the choice of diagnostic features subjected to statistical processing affects the accuracy of variety classification. Selection in result of the mono-clusters analysis (variety tea Kolhida; hazelnut Lombardsky red; variety kiwi Monty

  5. A PAC-Bayesian Analysis of Graph Clustering and Pairwise Clustering

    CERN Document Server

    Seldin, Yevgeny

    2010-01-01

    We formulate weighted graph clustering as a prediction problem: given a subset of edge weights we analyze the ability of graph clustering to predict the remaining edge weights. This formulation enables practical and theoretical comparison of different approaches to graph clustering as well as comparison of graph clustering with other possible ways to model the graph. We adapt the PAC-Bayesian analysis of co-clustering (Seldin and Tishby, 2008; Seldin, 2009) to derive a PAC-Bayesian generalization bound for graph clustering. The bound shows that graph clustering should optimize a trade-off between empirical data fit and the mutual information that clusters preserve on the graph nodes. A similar trade-off derived from information-theoretic considerations was already shown to produce state-of-the-art results in practice (Slonim et al., 2005; Yom-Tov and Slonim, 2009). This paper supports the empirical evidence by providing a better theoretical foundation, suggesting formal generalization guarantees, and offering...

  6. Classification of persons attempting suicide. A review of cluster analysis research

    Directory of Open Access Journals (Sweden)

    Wołodźko, Tymoteusz

    2014-08-01

    Aim: To review conclusions from cluster analysis research on suicide risk factors published after 1993. Methods: Search and analysis of cluster analysis research papers on suicidal behaviour. Results: The following groups were distinguished: (1) persons with comorbid mental disorders or with severe symptoms, (2) persons without mental disorders or with mild symptoms, (3) persons with personality disorders and externalizing psychopathology, (4) socially withdrawn persons with a tendency to avoid social contacts, (5) depressive persons. Conclusions: Analysis of studies that applied cluster analysis to the characteristics of suicide attempters indicated that several groups of persons with a significantly increased risk of suicide attempt can be differentiated. The reviewed cluster analysis research had multiple methodological limitations. Studies employing cluster analysis on large, representative and homogeneous populations are needed.

  7. Customer Segmentation Modeling on Factor Analysis and K-MEANS Clustering

    Institute of Scientific and Technical Information of China (English)

    彭凯; 秦永彬; 许道云

    2011-01-01

    To uncover existing customers' potential demand for data services, customer segmentation has become a problem that telecommunications operators must solve in order to run differentiated marketing. Using a clustering algorithm, this paper presents an application model for segmenting customers of telecommunications short messaging services (SMS). First, factor analysis is used to reduce redundant fields in data mining with complex parameter variables, which improves the quality and efficiency of model construction; the segmentation is then completed with the unsupervised K-MEANS clustering algorithm. Validation showed that the resulting SMS customer segments have clearly distinct characteristics. In 2009, a communications enterprise in western China achieved significant benefits by applying the model to differentiated data service marketing.

  8. CHOOSING A HEALTH INSTITUTION WITH MULTIPLE CORRESPONDENCE ANALYSIS AND CLUSTER ANALYSIS IN A POPULATION BASED STUDY

    Directory of Open Access Journals (Sweden)

    ASLI SUNER

    2013-06-01

    Multiple correspondence analysis is a method that makes categorical variables given in contingency tables easier to interpret, showing the similarities, associations and divergences among these variables graphically in a lower-dimensional space. Clustering methods help to classify grouped data according to their similarities and to obtain useful summaries from them. In this study, interpretations of multiple correspondence analysis are supported by cluster analysis; factors affecting the choice of health institution, such as age, disease group and health insurance, are examined, and the results of the two methods are compared.

  9. Cardiovascular risk factor clustering and its association with fitness in nine-year-old rural Norwegian children

    DEFF Research Database (Denmark)

    Resaland, G K; Mamen, A; Boreham, C

    2010-01-01

    This paper describes cardiovascular disease (CVD) risk factor levels in a population-representative sample of healthy, rural Norwegian children and examines the association between fitness and clustering of CVD risk factors. Final analyses included 111 boys and 116 girls (mean age 9.3 +/- 0.3). To determine the degree of clustering, six CVD risk factors were selected: homeostasis model assessment score, waist circumference, triglycerides, systolic blood pressure, total cholesterol to high-density lipoprotein ratio and fitness (VO(2peak)). Clustering was observed in 9.9% of the boys and 13.8% of the girls. In a different analysis, fitness was omitted as a CVD risk factor and analyzed against the five remaining CVD risk factors. Low fitness was a strong predictor for clustering of CVD risk factors, and children in the least-fit quartile had significantly poorer CVD risk factor values than all...

  10. A comparative risk assessment of burden of disease and injury attributable to 67 risk factors and risk factor clusters in 21 regions, 1990-2010: a systematic analysis for the Global Burden of Disease Study 2010

    NARCIS (Netherlands)

    Lim, S.S.; Vos, T.; Flaxman, A.D.; Danaei, G.; Shibuya, K.; Adair-Rohani, H.; Amann, M.; Anderson, H.R.; Andrews, K.G.; Aryee, M.; Atkinson, C.; Bacchus, L.J.; Bahalim, A.N.; Balakrishnan, K.; Balmes, J.; Barker-Collo, S.; Baxter, A.; Bell, M.L.; Blore, J.D.; Blyth, F.; Bonner, C.; Borges, G.; Bourne, R.; Boussinesq, M.; Brauer, M.; Brooks, P.; Bruce, N.G.; Brunekreef, B.; Bryan-Hancock, C.; Bucello, C.; Buchbinder, R.; Bull, F.; Burnett, R.T.; Byers, T.E.; Calabria, B.; Carapetis, J.; Carnahan, E.; Chafe, Z.; Charlson, F.; Chen, H.; Chen, J.S.; Cheng, A.T.; Child, J.C.; Cohen, A.; Colson, K.E.; Cowie, B.C.; Darby, S.; Darling, S.; Davis, A.; Degenhardt, L.; Dentener, F.; Des Jarlais, D.C.; Devries, K.; Dherani, M.; Ding, E.L.; Dorsey, E.R.; Driscoll, T.; Edmond, K.; Ali, S.E.; Engell, R.E.; Erwin, P.J.; Fahimi, S.; Falder, G.; Farzadfar, F.; Ferrari, A.; Finucane, M.M.; Flaxman, S.; Fowkes, F.G.R.; Freedman, G.; Freeman, M.K.; Gakidou, E.; Ghosh, S.; Giovannucci, E.; Gmel, G.; Graham, K.; Grainger, R.; Grant, B.; Gunnell, D.; Gutierrez, H.R.; Hall, W.; Hoek, H.W.; Hogan, A.; Hosgood, H.D.; Hoy, D.; Hu, H.; Hubbell, B.J.; Hutchings, S.J.; Ibeanusi, S.E.; Jacklyn, G.L.; Jasrasaria, R.; Jonas, J.B.; Kan, H.; Kanis, J.A.; Kassebaum, N.; Kawakami, N.; Khang, Y-H.; Khatibzadeh, S.; Khoo, J-P.; de Kok, C.; Laden, F.; Lalloo, R.; Lan, Q.; Lathlean, T.; Leasher, J.L.; Leigh, J.; Li, Y.; Lin, J.K.; Lipshultz, S.E.; London, S.; Lozano, R.; Lu, Y.; Mak, J.; Malekzadeh, R.; Mallinger, L.; Marcenes, W.; March, L.; Marks, R.; Martin, R.; McGale, P.; McGrath, J.; Mehta, S.; Mensah, G.A.; Merriman, T.R.; Micha, R.; Michaud, C.; Mishra, V.; Hanafiah, K.M.; Mokdad, A.A.; Morawska, L.; Mozaffarian, D.; Murphy, T.; Naghavi, M.; Neal, B.; Nelson, P.K.; Nolla, J.M.; Norman, R.; Olives, C.; Omer, S.B.; Orchard, J.; Osborne, R.; Ostro, B.; Page, A.; Pandey, K.D.; Parry, C.D.H.; Passmore, E.; Patra, J.; Pearce, N.; Pelizzari, P.M.; Petzold, M.; Phillips, M.R.; Pope, D.; Pope, C.A.; Powles, J.; Rao, M.; Razavi, H.; Rehfuess, E.A.; Rehm, J.T.; Ritz, B.; Rivara, F.P.; Roberts, T.; Robinson, C.; Rodriguez-Portales, J.A.; Romieu, I.; Room, R.; Rosenfeld, L.C.; Roy, A.; Rushton, L.; Salomon, J.A.; Sampson, U.; Sanchez-Riera, L.; Sanman, E.; Sapkota, A.; Seedat, S.; Shi, P.; Shield, K.; Shivakoti, R.; Singh, G.M.; Sleet, D.A.; Smith, E.; Smith, K.R.; Stapelberg, N.J.C.; Steenland, K.; Stöckl, H.; Stovner, L.J.; Straif, K.; Straney, L.; Thurston, G.D.; Tran, J.H.; van Dingenen, R.; van Donkelaar, A.; Veerman, J.L.; Vijayakumar, L.; Weintraub, R.; Weissman, M.M.; White, R.A.; Whiteford, H.; Wiersma, S.T.; Wilkinson, J.D.; Williams, H.C.; Williams, W.; Wilson, N.; Woolf, A.D.; Yip, P.; Zielinski, J.M.; Lopez, A.D.; Murray, C.J.L.; Ezzati, M.

    2012-01-01

    BACKGROUND Quantification of the disease burden caused by different risks informs prevention by providing an account of health loss different to that provided by a disease-by-disease analysis. No complete revision of global disease burden caused by risk factors has been done since a comparative risk

  12. Evaluation Method of Maintain Capacity Based on Factor-clustering Analysis Complex Model

    Institute of Scientific and Technical Information of China (English)

    张锐丽; 史凤隆; 高万春

    2013-01-01

    Maintenance capability assessment involves many measurable indicators, and how to streamline a large number of index values is an active research problem. We used factor analysis to integrate the various indicators, taking their correlations into account, and then extracted the common factors. Using the common factors that represent the maintenance indicators, we re-integrated the original data and divided it into groups with systematic (hierarchical) clustering.

  13. Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale

    CERN Document Server

    Emmons, Scott; Gallant, Mike; Börner, Katy

    2016-01-01

    Notions of community quality underlie network clustering. While studies surrounding network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is lacking. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms -- Blondel, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 o...
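
    As an illustration of one of the information recovery metrics named above, the adjusted Rand score can be computed from the contingency counts of two labelings of the same nodes. The sketch below is a minimal pure-Python version and not the implementation used in the study; the function name and the toy label lists are assumptions made for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_score(labels_true, labels_pred):
    """Adjusted Rand index between two flat clusterings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # fewer than two items: no pairs to disagree on

    # Number of item pairs that are grouped together in both clusterings.
    pair_counts = Counter(zip(labels_true, labels_pred))
    together_in_both = sum(comb(c, 2) for c in pair_counts.values())

    # Number of item pairs grouped together within each clustering separately.
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    # Agreement expected by chance, and the maximum attainable agreement.
    expected = together_true * together_pred / comb(n, 2)
    maximum = (together_true + together_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both clusterings put everything together
    return (together_in_both - expected) / (maximum - expected)


# Hypothetical toy labelings: the score ignores how clusters are named.
same_partition = adjusted_rand_score([0, 0, 1, 1], [1, 1, 0, 0])
crossed_partition = adjusted_rand_score([0, 0, 1, 1], [0, 1, 0, 1])
```

    Because the score is corrected for chance, it can be negative when two clusterings agree less often than random labelings would, which is one reason it may rank algorithms differently from stand-alone metrics such as modularity.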

  14. Cluster analysis in severe emphysema subjects using phenotype and genotype data: an exploratory investigation

    Directory of Open Access Journals (Sweden)

    Martinez Fernando J

    2010-03-01

    Background: Numerous studies have demonstrated associations between genetic markers and COPD, but results have been inconsistent. One reason may be heterogeneity in disease definition. Unsupervised learning approaches may assist in understanding disease heterogeneity. Methods: We selected 31 phenotypic variables and 12 SNPs from five candidate genes in 308 subjects in the National Emphysema Treatment Trial (NETT) Genetics Ancillary Study cohort. We used factor analysis to select a subset of phenotypic variables, and then used cluster analysis to identify subtypes of severe emphysema. We examined the phenotypic and genotypic characteristics of each cluster. Results: We identified six factors accounting for 75% of the shared variability among our initial phenotypic variables. We selected four phenotypic variables from these factors for cluster analysis: (1) post-bronchodilator FEV1 percent predicted, (2) percent bronchodilator responsiveness, and quantitative CT measurements of (3) apical emphysema and (4) airway wall thickness. K-means cluster analysis revealed four clusters, though separation between clusters was modest: (1) emphysema predominant; (2) bronchodilator responsive, with higher FEV1; (3) discordant, with a lower FEV1 despite less severe emphysema and lower airway wall thickness; and (4) airway predominant. Of the genotypes examined, membership in cluster 1 (emphysema-predominant) was associated with TGFB1 SNP rs1800470. Conclusions: Cluster analysis may identify meaningful disease subtypes and/or groups of related phenotypic variables even in a highly selected group of severe emphysema subjects, and may be useful for genetic association studies.

  15. Bayesian Exploratory Factor Analysis

    DEFF Research Database (Denmark)

    Conti, Gabriella; Frühwirth-Schnatter, Sylvia; Heckman, James J.

    2014-01-01

    This paper develops and applies a Bayesian approach to Exploratory Factor Analysis that improves on ad hoc classical approaches. Our framework relies on dedicated factor models and simultaneously determines the number of factors, the allocation of each measurement to a unique factor, and the corresponding factor loadings. Classical identification criteria are applied and integrated into our Bayesian procedure to generate models that are stable and clearly interpretable. A Monte Carlo study confirms the validity of the approach. The method is used to produce interpretable low dimensional aggregates...

  16. Analysis and Prediction of Crimes by Clustering and Classification

    Directory of Open Access Journals (Sweden)

    Rasoul Kiani

    2015-08-01

    Crimes affect organizations and institutions when they occur frequently in a society. It is therefore necessary to study the reasons for, factors in, and relations between occurrences of different crimes, and to find the most appropriate ways to control and prevent further crimes. The main objective of this paper is to classify clustered crimes based on their frequency of occurrence in different years. Data mining is used extensively for the analysis, investigation and discovery of patterns in the occurrence of different crimes. We applied a theoretical model based on data mining techniques such as clustering and classification to a real crime dataset recorded by police in England and Wales from 1990 to 2011. We assigned weights to the features in order to improve the quality of the model and remove low-value features. A Genetic Algorithm (GA) is used to optimize the parameters of the Outlier Detection operator using the RapidMiner tool.

  17. Cluster analysis of word frequency dynamics

    Science.gov (United States)

    Maslennikova, Yu S.; Bochkarev, V. V.; Belashova, I. A.

    2015-01-01

    This paper describes the analysis and modelling of word usage frequency time series. In a previous study, an assumption was put forward that all word usage frequencies have uniform dynamics approaching the shape of a Gaussian function. This assumption can be checked using the frequency dictionaries of the Google Books Ngram database. This database includes 5.2 million books published between 1500 and 2008. The corpus contains over 500 billion words in American English, British English, French, German, Spanish, Russian, Hebrew, and Chinese. We clustered time series of word usage frequencies using a Kohonen neural network. The similarity between input vectors was estimated using several algorithms. As a result of the neural network training procedure, more than ten different forms of time series were found. They describe the dynamics of word usage frequencies from birth to death of individual words. Different groups of word forms were found to have different dynamics of word usage frequency variations.

  18. Cluster analysis of spontaneous preterm birth phenotypes identifies potential associations among preterm birth mechanisms

    Science.gov (United States)

    Esplin, M Sean; Manuck, Tracy A.; Varner, Michael W.; Christensen, Bryce; Biggio, Joseph; Bukowski, Radek; Parry, Samuel; Zhang, Heping; Huang, Hao; Andrews, William; Saade, George; Sadovsky, Yoel; Reddy, Uma M.; Ilekis, John

    2015-01-01

    Objective: We sought to employ an innovative tool based on common biological pathways to identify specific phenotypes among women with spontaneous preterm birth (SPTB), in order to enhance investigators' ability to identify and highlight common mechanisms and underlying genetic factors responsible for SPTB. Study Design: A secondary analysis of a prospective case-control multicenter study of SPTB. All cases delivered a preterm singleton at ≤34.0 weeks gestation. Each woman was assessed for the presence of underlying SPTB etiologies. A hierarchical cluster analysis was used to identify groups of women with homogeneous phenotypic profiles. One of the phenotypic clusters was selected for candidate gene association analysis using VEGAS software. Results: 1028 women with SPTB were assigned phenotypes. Hierarchical clustering of the phenotypes revealed five major clusters. Cluster 1 (N=445) was characterized by maternal stress, cluster 2 (N=294) by premature membrane rupture, cluster 3 (N=120) by familial factors, and cluster 4 (N=63) by maternal comorbidities. Cluster 5 (N=106) was multifactorial, characterized by infection (INF), decidual hemorrhage (DH) and placental dysfunction (PD). These three phenotypes were highly correlated by Chi-square analysis [PD and DH (p<2.2e-6); PD and INF (p=6.2e-10); INF and DH (p=0.0036)]. Gene-based testing identified the INS (insulin) gene as significantly associated with cluster 3 of SPTB. Conclusion: We identified 5 major clusters of SPTB based on a phenotype tool and hierarchical clustering. There was significant correlation between several of the phenotypes. The INS gene was associated with familial factors underlying SPTB. PMID:26070700
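
    The abstract does not state which distance measure or linkage was used, so the following is only a minimal sketch of agglomerative hierarchical clustering on binary phenotype profiles, assuming Hamming distance and average linkage. The function names, the phenotype columns and the four example rows are hypothetical and are not data from the study.

```python
def hamming(a, b):
    """Number of positions at which two equal-length binary profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def average_linkage(profiles, cluster_a, cluster_b):
    """Mean pairwise Hamming distance between the members of two clusters."""
    total = sum(hamming(profiles[i], profiles[j]) for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def agglomerative(profiles, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain.

    Returns a list of clusters, each a list of row indices into profiles.
    """
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index of first cluster, index of second cluster)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(profiles, clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical columns: (infection, decidual hemorrhage, maternal stress, familial factors)
profiles = [
    (1, 1, 0, 0),
    (1, 1, 0, 1),
    (0, 0, 1, 1),
    (0, 0, 1, 0),
]
groups = agglomerative(profiles, n_clusters=2)
```

    This brute-force version recomputes every pairwise linkage at each step, which is acceptable for a toy example but would be slow for a cohort of about a thousand subjects; a library implementation would normally be used in practice.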

  19. Clustering of metabolic syndrome factors in Malaysian population: Asian Criteria revisited

    Directory of Open Access Journals (Sweden)

    YN Azwany

    2011-08-01

    Introduction: Metabolic syndrome (MetS) has been known as a clustering of risk factors for cardiovascular disease and diabetes. Over the years, clinical criteria have been revised to highlight the importance of various risk factors in defining MetS. Studies have reported different clustering of factors based on different population characteristics. Objective: Our study aimed to identify the clustering of factors in the Malaysian population by sex and by 4 major ethnic groups, namely Malay, Chinese, Indian and other minor ethnic groups. Methods: A national cross-sectional study was done covering both Peninsular and East Malaysia. Subjects' sociodemographic data, body mass index (BMI), waist, hip and neck circumference, blood pressure, fasting triglycerides (TG), HDL-cholesterol and glucose, urine microalbumin and serum insulin were taken. Principal component factor analysis with Varimax rotation was done to identify the clustering by sex and ethnic group. Results: One thousand two hundred and sixty-eight male and 2355 female subjects were recruited. The majority of subjects were Malay (63.0%), followed by Chinese (13.3%), Indian (7.4%) and other ethnic groups (13.8%), which follows the population composition of Malaysia. Four factors were identified for both men and women: anthropometry, glycemia, blood pressure and dyslipidemia, with cumulative percentages of variance of 69.4 and 65.9 respectively. Four factors were identified for the Malay, Chinese and Aborigine groups but 5 factors for the Indian ethnic group, with the cumulative percentage of variance explained ranging from 65.1 to 77.7. Discussion and Conclusion: BMI, neck circumference, blood pressure, fasting TG and HDL had high factor loadings in both sexes, suggesting that, for field screening, these diagnostic criteria would be adequate. These factors also showed a similar pattern of loading across ethnic groups. In conclusion, in Malaysian population, at least one measurement from each components

  20. In Silico Analysis for Transcription Factors With Zn(II)2C6 Binuclear Cluster DNA-Binding Domains in Candida albicans

    Directory of Open Access Journals (Sweden)

    Sergi Maicas

    2005-01-01

    presence of the CysX2CysX6CysX5-16CysX2CysX6-8Cys motif and a putative nuclear localization signal. Using this approach, 70 putative Zn(II)2C6 transcription factors have been found in the genome of C. albicans.

  1. Functional analysis of a biosynthetic cluster essential for production of 4-formylaminooxyvinylglycine, a germination-arrest factor from Pseudomonas fluorescens WH6

    Science.gov (United States)

    Rhizosphere-associated Pseudomonas fluorescens WH6 produces the germination-arrest factor, 4-formylaminooxyvinylglycine (FVG). FVG has previously been shown to both arrest the germination of weedy grasses and to inhibit the growth of the bacterial plant pathogen Erwinia amylovora. Very little is kno...

  2. Identification of five chronic obstructive pulmonary disease subgroups with different prognoses in the ECLIPSE cohort using cluster analysis.

    Science.gov (United States)

    Rennard, Stephen I; Locantore, Nicholas; Delafont, Bruno; Tal-Singer, Ruth; Silverman, Edwin K; Vestbo, Jørgen; Miller, Bruce E; Bakke, Per; Celli, Bartolomé; Calverley, Peter M A; Coxson, Harvey; Crim, Courtney; Edwards, Lisa D; Lomas, David A; MacNee, William; Wouters, Emiel F M; Yates, Julie C; Coca, Ignacio; Agustí, Alvar

    2015-03-01

    Chronic obstructive pulmonary disease (COPD) is a heterogeneous disease that likely includes clinically relevant subgroups. To identify subgroups of COPD in ECLIPSE (Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints) subjects using cluster analysis and to assess clinically meaningful outcomes of the clusters during 3 years of longitudinal follow-up. Factor analysis was used to reduce 41 variables determined at recruitment in 2,164 patients with COPD to 13 main factors, and the variables with the highest loading were used for cluster analysis. Clusters were evaluated for their relationship with clinically meaningful outcomes during 3 years of follow-up. The relationships among clinical parameters were evaluated within clusters. Five subgroups were distinguished using cross-sectional clinical features. These groups differed regarding outcomes. Cluster A included patients with milder disease and had fewer deaths and hospitalizations. Cluster B had less systemic inflammation at baseline but had notable changes in health status and emphysema extent. Cluster C had many comorbidities, evidence of systemic inflammation, and the highest mortality. Cluster D had low FEV1, severe emphysema, and the highest exacerbation and COPD hospitalization rate. Cluster E was intermediate for most variables and may represent a mixed group that includes further clusters. The relationships among clinical variables within clusters differed from that in the entire COPD population. Cluster analysis using baseline data in ECLIPSE identified five COPD subgroups that differ in outcomes and inflammatory biomarkers and show different relationships between clinical parameters, suggesting the clusters represent clinically and biologically different subtypes of COPD.
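
    The variable-selection step described above (keeping, for each factor, the variable with the highest loading) can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the ECLIPSE analysis code: the function name, the variable names and the loading values are hypothetical, and "highest loading" is taken here to mean the largest absolute loading.

```python
def top_loading_variables(loadings, variable_names):
    """Pick one representative variable per factor.

    loadings[i][j] is the loading of variable i on factor j.
    Returns, for each factor, the name of the variable whose loading
    has the largest absolute value (an assumption about the selection rule).
    """
    n_factors = len(loadings[0])
    selected = []
    for j in range(n_factors):
        best_row = max(range(len(loadings)), key=lambda i: abs(loadings[i][j]))
        selected.append(variable_names[best_row])
    return selected


# Hypothetical loading matrix: 4 variables (rows) on 2 factors (columns).
variable_names = ["fev1_pct_predicted", "bmi", "crp", "dyspnea_score"]
loadings = [
    [0.82, 0.10],
    [0.15, -0.71],
    [0.05, 0.40],
    [-0.60, 0.20],
]
representatives = top_loading_variables(loadings, variable_names)
```

    Clustering on one representative per factor keeps the cluster inputs roughly uncorrelated, so that no single underlying dimension is counted several times in the distance calculation.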

  3. Multilevel exploratory factor analysis of discrete data

    NARCIS (Netherlands)

    Barendse, M.T.; Oort, F.J.; Jak, S.; Timmerman, M.E.

    2013-01-01

    Exploratory factor analysis (EFA) can be used to determine the dimensionality of a set of items. When data come from clustered subjects, such as pupils within schools or children within families, the hierarchical structure of the data should be taken into account. Standard multilevel EFA is only sui

  4. Somatotyping using 3D anthropometry: a cluster analysis.

    Science.gov (United States)

    Olds, Tim; Daniell, Nathan; Petkov, John; David Stewart, Arthur

    2013-01-01

    Somatotyping is the quantification of human body shape, independent of body size. Hitherto, somatotyping (including the most popular method, the Heath-Carter system) has been based on subjective visual ratings, sometimes supported by surface anthropometry. This study used data derived from three-dimensional (3D) whole-body scans as inputs for cluster analysis to objectively derive clusters of similar body shapes. Twenty-nine dimensions normalised for body size were measured on a purposive sample of 301 adults aged 17-56 years who had been scanned using a Vitus Smart laser scanner. K-means Cluster Analysis with v-fold cross-validation was used to determine shape clusters. Three male and three female clusters emerged, and were visualised using those scans closest to the cluster centroid and a caricature defined by doubling the difference between the average scan and the cluster centroid. The male clusters were decidedly endomorphic (high fatness), ectomorphic (high linearity), and endo-mesomorphic (a mixture of fatness and muscularity). The female clusters were clearly endomorphic, ectomorphic, and the ecto-mesomorphic (a mixture of linearity and muscularity). An objective shape quantification procedure combining 3D scanning and cluster analysis yielded shape clusters strikingly similar to traditional somatotyping.
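
    The caricature construction described above, doubling the difference between the average scan and the cluster centroid, amounts to moving away from the average along the direction of the centroid. A minimal sketch, assuming each scan is summarised as a list of size-normalised dimensions; the function name and the two-dimensional example values are hypothetical.

```python
def caricature(average, centroid, exaggeration=2.0):
    """Exaggerate a cluster centroid relative to the overall average.

    With exaggeration=2.0 the difference (centroid - average) is doubled,
    which matches the description in the abstract; other values are an
    assumption added for illustration.
    """
    return [a + exaggeration * (c - a) for a, c in zip(average, centroid)]


# Hypothetical size-normalised dimensions (e.g. waist and shoulder ratios).
average_scan = [1.0, 2.0]
cluster_centroid = [1.5, 1.0]
exaggerated = caricature(average_scan, cluster_centroid)
```

    Setting the exaggeration to 1.0 returns the centroid itself, so the caricature can be read as the same shape contrast made twice as pronounced.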

  5. A hybrid monkey search algorithm for clustering analysis.

    Science.gov (United States)

    Chen, Xin; Zhou, Yongquan; Luo, Qifang

    2014-01-01

    Clustering is a popular data analysis and data mining technique. The k-means clustering algorithm is one of the most commonly used methods. However, it depends heavily on the initial solution and easily falls into a local optimum. In view of these disadvantages of the k-means method, this paper proposes a hybrid monkey algorithm based on the search operator of the artificial bee colony algorithm for clustering analysis; experiments on synthetic and real-life datasets show that the algorithm performs better than the basic monkey algorithm for clustering analysis.
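
    The dependence of k-means on its initial solution can be seen on a very small example. The sketch below is a plain Lloyd's-algorithm k-means written for illustration, not the hybrid monkey algorithm proposed in the paper; the function names, the four data points and the two sets of starting centroids are assumptions.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, max_iter=100):
    """Lloyd's k-means from explicit starting centroids.

    Returns (centroids, labels, sse), where sse is the within-cluster
    sum of squared distances.
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    sse = sum(sq_dist(p, centroids[label]) for p, label in zip(points, labels))
    return centroids, labels, sse


# Four corners of a wide rectangle: the natural split is left versus right.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
_, good_labels, good_sse = kmeans(points, [(0.0, 0.0), (4.0, 0.0)])
_, bad_labels, bad_sse = kmeans(points, [(2.0, 0.0), (2.0, 1.0)])
```

    Both runs converge, but the second start settles on a bottom-versus-top split with a much larger sum of squared errors. Restarting from several initialisations, or using a search heuristic such as the one the paper proposes, is a common way to reduce this risk.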

  6. A Hybrid Monkey Search Algorithm for Clustering Analysis

    Directory of Open Access Journals (Sweden)

    Xin Chen

    2014-01-01

    Clustering is a popular data analysis and data mining technique. The k-means clustering algorithm is one of the most commonly used methods. However, it depends heavily on the initial solution and easily falls into a local optimum. In view of these disadvantages of the k-means method, this paper proposes a hybrid monkey algorithm based on the search operator of the artificial bee colony algorithm for clustering analysis. Experiments on synthetic and real-life datasets show that the algorithm performs better than the basic monkey algorithm for clustering analysis.

  7. Metabolic syndrome across Europe: different clusters of risk factors.

    Science.gov (United States)

    Scuteri, Angelo; Laurent, Stephane; Cucca, Francesco; Cockcroft, John; Cunha, Pedro Guimaraes; Mañas, Leocadio Rodriguez; Mattace Raso, Francesco U; Muiesan, Maria Lorenza; Ryliškytė, Ligita; Rietzschel, Ernst; Strait, James; Vlachopoulos, Charalambos; Völzke, Henry; Lakatta, Edward G; Nilsson, Peter M

    2015-04-01

    Metabolic syndrome (MetS) remains a controversial entity. Specific clusters of MetS components, rather than MetS per se, are associated with accelerated arterial ageing and with cardiovascular (CV) events. To investigate whether the distribution of clusters of MetS components differed cross-culturally, we studied 34,821 subjects from 12 cohorts from 10 European countries and one cohort from the USA in the MARE (Metabolic syndrome and Arteries REsearch) Consortium. In accordance with the ATP III criteria, MetS was defined as an alteration in three or more of the following five components: elevated glucose (G), fasting glucose ≥110 mg/dl; low HDL cholesterol, [...] 102 cm for men or >88 cm for women. MetS had a 24.3% prevalence (8468 subjects: 23.9% in men vs. 24.6% in women, p < 0.001) with an age-associated increase in its prevalence in all the cohorts. The age-adjusted prevalence of the clusters of MetS components previously associated with greater arterial and CV burden differed across countries (p < 0.0001) and in men and women (p < 0.0001). In detail, the cluster TBW was observed in 12% of the subjects with MetS, but was far more common in the cohorts from the UK (32.3%), Sardinia in Italy (19.6%), and Germany (18.5%) and less prevalent in the cohorts from Sweden (1.2%), Spain (2.6%), and the USA (2.5%). The cluster GBW accounted for 12.7% of subjects with MetS, with higher occurrence in Southern Europe (Italy, Spain, and Portugal: 31.4, 18.4, and 17.1% respectively) and in Belgium (20.4%) than in Northern Europe (Germany, Sweden, and Lithuania: 7.6, 9.4, and 9.6% respectively). The analysis of the distribution of MetS suggested that what falls under the common definition of MetS is not a unique entity but rather a constellation of clusters of MetS components, likely selectively risky for CV disease, whose occurrence differs across countries. © The European Society of Cardiology 2014.

  8. Smartness and Italian Cities. A Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Flavio Boscacci

    2014-05-01

    Smart cities have recently been recognized as the most pleasing and attractive places to live in; because of this, both scholars and policy-makers pay close attention to the topic. Specifically, urban "smartness" has been identified by many characteristics that can be grouped into six dimensions (Giffinger et al. 2007): smart Economy (competitiveness), smart People (social and human capital), smart Governance (participation), smart Mobility (both ICTs and transport), smart Environment (natural resources), and smart Living (quality of life). According to this analytical framework, the present paper investigates the relation between urban attractiveness and the "smart" characteristics in the 103 Italian NUTS3 province capitals in the year 2011. To this aim, descriptive statistics were followed by a regression analysis (OLS), in which the dependent variable measuring urban attractiveness was proxied by housing market prices. In addition, a Cluster Analysis (CA) was developed in order to find differences and commonalities among the province capitals. The OLS results indicate that living, people and economy are the key drivers for achieving better urban attractiveness. Environment, instead, keeps on playing a minor role. Besides, the CA groups the province capitals a

  9. Instantaneous normal mode analysis of melting of finite dust clusters.

    Science.gov (United States)

    Melzer, André; Schella, André; Schablinski, Jan; Block, Dietmar; Piel, Alexander

    2012-06-01

    The experimental melting transition of finite two-dimensional dust clusters in a dusty plasma is analyzed using the method of instantaneous normal modes. In the experiment, dust clusters are heated in a thermodynamic equilibrium from a solid to a liquid state using a four-axis laser manipulation system. The fluid properties of the dust cluster, such as the diffusion constant, are measured from the instantaneous normal mode analysis. Thereby, the phase transition of these finite clusters is approached from the liquid phase. From the diffusion constants, unique melting temperatures have been assigned to dust clusters of various sizes that very well reflect their dynamical stability properties.

  10. Cluster Analysis to Identify Possible Subgroups in Tinnitus Patients.

    Science.gov (United States)

    van den Berge, Minke J C; Free, Rolien H; Arnold, Rosemarie; de Kleine, Emile; Hofman, Rutger; van Dijk, J Marc C; van Dijk, Pim

    2017-01-01

    In tinnitus treatment, there is a tendency to shift from a "one size fits all" to a more individual, patient-tailored approach. Insight into the heterogeneity of the tinnitus spectrum might improve the management of tinnitus patients in terms of choice of treatment and identification of patients with severe mental distress. The goal of this study was to identify subgroups in a large group of tinnitus patients. Data were collected from patients with severe tinnitus complaints visiting our tertiary referral tinnitus care group at the University Medical Center Groningen. Patient-reported and physician-reported variables were collected during their visit to our clinic. Cluster analyses were used to characterize subgroups. For the selection of the right variables to enter in the cluster analysis, two approaches were used: (1) variable reduction with principal component analysis and (2) variable selection based on expert opinion. Various variables of 1,783 tinnitus patients were included in the analyses. Cluster analysis (1) included 976 patients and resulted in a four-cluster solution. The effect of external influences was the most discriminative between the groups, or clusters, of patients. The "silhouette measure" of the cluster outcome was low (0.2), indicating "no substantial" cluster structure. Cluster analysis (2) included 761 patients and resulted in a three-cluster solution, comparable to the first analysis. Again, "no substantial" cluster structure was found (0.2). Two cluster analyses on a large database of tinnitus patients revealed that clusters of patients are mostly formed by a different response to external influences on their disease. However, both cluster outcomes based on this dataset showed poor stability, suggesting that our tinnitus population comprises a continuum rather than a number of clearly defined subgroups.
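
    For readers unfamiliar with the silhouette measure reported above, the following minimal sketch computes the mean silhouette width for one-dimensional data. The function name `silhouette` and the toy data are assumptions for illustration; the study's own software and variables are not shown in the abstract.

```python
def silhouette(points, labels):
    """Mean silhouette width for 1-D points with the given cluster labels."""
    def mean_dist(p, members):
        return sum(abs(p - q) for q in members) / len(members)

    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # a: mean distance to the other members of the point's own cluster.
        own = [q for j, (q, l) in enumerate(zip(points, labels)) if l == lab and j != i]
        if not own:
            scores.append(0.0)  # convention: singleton clusters score 0
            continue
        a = mean_dist(p, own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            mean_dist(p, [q for q, l in zip(points, labels) if l == other])
            for other in set(labels) if other != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


separated = silhouette([0, 1, 10, 11], [0, 0, 1, 1])   # two tight, distant groups
interleaved = silhouette([0, 1, 2, 3], [0, 1, 0, 1])   # labels cut across a continuum
print(round(separated, 3), interleaved)
```

    Values near 1 indicate compact, well-separated clusters, while values near or below 0 indicate that the labels do not reflect real structure, which is how a low value such as the reported 0.2 is read.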

  11. PERFORMANCE ANALYSIS OF CLUSTERED RADIO INTERFEROMETRIC CALIBRATION

    NARCIS (Netherlands)

    Kazemi, S.; Yatawatta, S.; Zaroubi, S.

    2012-01-01

    Subtraction of compact, bright sources is essential to produce high quality images in radio astronomy. It has recently been proposed that 'clustered' calibration can perform better in subtracting fainter background sources. This is due to the fact that the effective power of a source cluster is greater th

  12. The Psychology of Yoga Practitioners: A Cluster Analysis.

    Science.gov (United States)

    Genovese, Jeremy E C; Fondran, Kristine M

    2017-03-30

    Yoga practitioners (N = 261) completed the revised Expression of Spirituality Inventory (ESI) and the Multidimensional Body-Self Relations Questionnaire. Cluster analysis revealed three clusters: Cluster A scored high on all four spiritual constructs. They had high positive evaluations of their appearance, but a lower orientation towards their appearance. They tended to have a high evaluation of their fitness and health, and higher body satisfaction. Cluster B showed lower scores on the spiritual constructs. Like Cluster A, members of Cluster B tended to show high positive evaluations of appearance and fitness. They also had higher body satisfaction. Members of Cluster B had a higher fitness orientation and a higher appearance orientation than members of Cluster A. Members of Cluster C had low scores for all spiritual constructs. They had a low evaluation of, and unhappiness with, their appearance. They were unhappy with the size and appearance of their bodies. They tended to see themselves as overweight. There was a significant difference in years of practice between the three groups (Kruskal-Wallis, p = .0041). Members of Cluster A had the most years of yoga experience, and members of Cluster B had more yoga experience than members of Cluster C. These results suggest the possible existence of a developmental trajectory for yoga practitioners. Such a developmental sequence may have important implications for yoga practice and instruction.

  13. Using Cluster Analysis for Data Mining in Educational Technology Research

    Science.gov (United States)

    Antonenko, Pavlo D.; Toy, Serkan; Niederhauser, Dale S.

    2012-01-01

    Cluster analysis is a group of statistical methods that has great potential for analyzing the vast amounts of web server-log data to understand student learning from hyperlinked information resources. In this methodological paper we provide an introduction to cluster analysis for educational technology researchers and illustrate its use through…

  14. A Survey of Popular R Packages for Cluster Analysis

    Science.gov (United States)

    Flynt, Abby; Dean, Nema

    2016-01-01

    Cluster analysis is a set of statistical methods for discovering new group/class structure when exploring data sets. This article reviews the following popular libraries/commands in the R software language for applying different types of cluster analysis: from the stats library, the kmeans, and hclust functions; the mclust library; the poLCA…

  17. Comparison of Skin Moisturizer: Consumer-Based Brand Equity (CBBE) Factors in Clusters Based on Consumer Ethnocentrism

    Directory of Open Access Journals (Sweden)

    Yossy Hanna Garlina

    2014-09-01

    This research aims to analyze relevant factors contributing to the four dimensions of consumer-based brand equity in the skin moisturizer industry. It then clusters female consumers of skin moisturizer based on ethnocentrism and differentiates each cluster's consumer-based brand equity dimensions towards a domestic skin moisturizer brand, Mustika Ratu. The research used a descriptive survey method. Primary data were obtained through questionnaires distributed to 70 female respondents for factor analysis and 120 female respondents for cluster analysis and one-way analysis of variance (ANOVA). This research employed factor analysis to obtain relevant factors contributing to the five dimensions of consumer-based brand equity in the skin moisturizer industry. Cluster analysis and one-way ANOVA were used to examine the difference in consumer-based brand equity between highly ethnocentric and low ethnocentric consumers towards the same domestic brand, Mustika Ratu skin moisturizer. In every individual dimension analysis, all variable means and individual means show a distinct difference between the high ethnocentric and the low ethnocentric consumers. The low ethnocentric consumer cluster tends to have lower mean scores for Brand Loyalty, Perceived Quality, Brand Awareness, Brand Association, and Overall Brand Equity than the high ethnocentric consumer cluster. The research concludes that consumer ethnocentrism is positively correlated with preference for domestic products and negatively correlated with preference for foreign-made products. Highly ethnocentric consumers thus have a positive perception of domestic products.

  18. Cyber Profiling Using Log Analysis And K-Means Clustering

    Directory of Open Access Journals (Sweden)

    Muhammad Zulfadhilah

    2016-07-01

    The activities of Internet users are increasing from year to year and have had an impact on the behavior of the users themselves. Assessment of user behavior is often based only on interaction across the Internet, without knowledge of any other activities. Activity logs can be used as another way to study user behavior. Internet activity logs are a type of big data, so data mining with the K-Means technique can be used as a solution for analyzing user behavior. In this study, clustering with the K-Means algorithm divided the data into three clusters, namely high, medium, and low. The results from the higher education institution show that each of these clusters produces websites that are frequented in the following order: search engines, social media, news, and information. This study also showed that the cyber profiling was strongly influenced by environmental factors and daily activities.

  19. Relative Expression of Vitamin D Hydroxylases, CYP27B1 and CYP24A1, and of Cyclooxygenase-2 and Heterogeneity of Human Colorectal Cancer in Relation to Age, Gender, Tumor Location, and Malignancy: Results from Factor and Cluster Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Brozek, Wolfgang, E-mail: wolfgang.brozek@gmx.at; Manhardt, Teresa; Kállay, Enikö; Peterlik, Meinrad; Cross, Heide S. [Department of Pathophysiology, Medical University of Vienna, Waehringer Guertel 18-20, A-1090 Vienna (Austria)

    2012-07-26

    Previous studies on the significance of vitamin D insufficiency and chronic inflammation in colorectal cancer development clearly indicated that maintenance of cellular homeostasis in the large intestinal epithelium requires balanced interaction of 1,25-(OH)₂D₃ and prostaglandin cellular signaling networks. The present study addresses the question how colorectal cancer pathogenesis depends on alterations of activities of vitamin D hydroxylases, i.e., CYP27B1-encoded 25-hydroxyvitamin D-1α-hydroxylase and CYP24A1-encoded 25-hydroxyvitamin D-24-hydroxylase, and inflammation-induced cyclooxygenase-2 (COX-2). Data from 105 cancer patients on CYP27B1, VDR, CYP24A1, and COX-2 mRNA expression in relation to tumor grade, anatomical location, gender and age were fit into a multivariate model of exploratory factor analysis. Nearly identical results were obtained by the principal factor and the maximum likelihood method, and these were confirmed by hierarchical cluster analysis: Within the eight mutually dependent variables studied four independent constellations were found that identify different features of colorectal cancer pathogenesis: (i) Escape of COX-2 activity from restraints by the CYP27B1/VDR system can initiate cancer growth anywhere in the colorectum regardless of age and gender; (ii) variations in COX-2 expression are mainly responsible for differences in cancer incidence in relation to tumor location; (iii) advancing age has a strong gender-specific influence on cancer incidence; (iv) progression from well differentiated to undifferentiated cancer is solely associated with a rise in CYP24A1 expression.

  20. Clustering Algorithm for Unsupervised Monaural Musical Sound Separation Based on Non-negative Matrix Factorization

    Science.gov (United States)

    Park, Sang Ha; Lee, Seokjin; Sung, Koeng-Mo

    Non-negative matrix factorization (NMF) is widely used for monaural musical sound source separation because of its efficiency and good performance. However, an additional clustering process is required because the musical sound mixture is separated into more signals than the number of musical tracks during NMF separation. In the conventional method, manual clustering or training-based clustering is performed with an additional learning process. Recently, a clustering algorithm based on the mel-frequency cepstrum coefficient (MFCC) was proposed for unsupervised clustering. However, MFCC clustering supplies limited information for clustering. In this paper, we propose various timbre features for unsupervised clustering and a clustering algorithm with these features. Simulation experiments are carried out using various musical sound mixtures. The results indicate that the proposed method improves clustering performance, as compared to conventional MFCC-based clustering.

  1. Cluster analysis of the hot subdwarfs in the PG survey

    Science.gov (United States)

    Thejll, Peter; Charache, Darryl; Shipman, Harry L.

    1989-01-01

    Application of cluster analysis to the hot subdwarfs in the Palomar Green (PG) survey of faint blue high-Galactic-latitude objects is assessed, with emphasis on data noise and the number of clusters into which the data should be subdivided. The data used in the study are presented, and cluster analysis, using the CLUSTAN program, is applied to them. Distances are calculated using the Euclidean formula, and clustering is done by Ward's method. The results are discussed, and five groups representing natural divisions of the subdwarfs in the PG survey are presented.
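
    As a rough illustration of Ward's method with Euclidean distances, the sketch below repeatedly merges the pair of clusters whose union gives the smallest increase in within-cluster sum of squares. It is a naive implementation written for clarity, not the CLUSTAN program; the names `ward_merge_cost` and `ward_clusters` and the sample points are assumptions.

```python
def ward_merge_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    ca = [sum(col) / len(a) for col in zip(*a)]   # centroid of a
    cb = [sum(col) / len(b) for col in zip(*b)]   # centroid of b
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_clusters(points, k):
    """Agglomerate singleton clusters until only k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Pick the pair with the cheapest merge (first pair wins on ties).
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: ward_merge_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical two-dimensional measurements (illustrative values only).
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
groups = ward_clusters(points, 3)
print(groups)  # [[(0, 0), (0, 1)], [(5, 5), (5, 6)], [(10, 0)]]
```

    Choosing where to stop merging (here `k = 3`) is the "number of clusters" question the abstract emphasises; the sketch takes it as given.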

  2. Factor Analysis and AIC.

    Science.gov (United States)

    Akaike, Hirotugu

    1987-01-01

    The Akaike Information Criterion (AIC) was introduced to extend the method of maximum likelihood to the multimodel situation. Use of the AIC in factor analysis is interesting when it is viewed as the choice of a Bayesian model; thus, wider applications of AIC are possible. (Author/GDC)
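
    A short sketch of how AIC might be used to choose the number of factors follows. The definition AIC = -2 ln L + 2k is standard; the parameter count for an m-factor model on p variables, the log-likelihood values and the function names are assumptions for illustration and are not taken from the paper.

```python
def fa_free_parameters(p, m):
    """Free parameters of an m-factor model for p variables.

    Assumed count: p*m loadings + p unique variances
    - m*(m-1)/2 rotational constraints.
    """
    return p * m + p - m * (m - 1) // 2


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: -2 ln L + 2k (lower is better)."""
    return -2.0 * log_likelihood + 2.0 * n_params


# Hypothetical maximised log-likelihoods for 1, 2 and 3 factors on p = 6 variables.
log_likelihoods = {1: -520.0, 2: -500.0, 3: -498.0}
scores = {m: aic(ll, fa_free_parameters(6, m)) for m, ll in log_likelihoods.items()}
best_m = min(scores, key=scores.get)
print(scores)   # {1: 1064.0, 2: 1034.0, 3: 1038.0}
print(best_m)   # 2
```

    In this made-up case the third factor improves the fit slightly but not enough to offset its four extra parameters, so the two-factor model has the lowest AIC.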

  3. How Teachers Use and Manage Their Blogs? A Cluster Analysis of Teachers' Blogs in Taiwan

    Science.gov (United States)

    Liu, Eric Zhi-Feng; Hou, Huei-Tse

    2013-01-01

    The development of Web 2.0 has ushered in a new set of web-based tools, including blogs. This study focused on how teachers use and manage their blogs. A sample of 165 teachers' blogs in Taiwan was analyzed by factor analysis, cluster analysis and qualitative content analysis. First, the teachers' blogs were analyzed according to six criteria…

  4. Investigating Subtypes of Child Development: A Comparison of Cluster Analysis and Latent Class Cluster Analysis in Typology Creation

    Science.gov (United States)

    DiStefano, Christine; Kamphaus, R. W.

    2006-01-01

    Two classification methods, latent class cluster analysis and cluster analysis, are used to identify groups of child behavioral adjustment underlying a sample of elementary school children aged 6 to 11 years. Behavioral rating information across 14 subscales was obtained from classroom teachers and used as input for analyses. Both the procedures…

  5. Two-Way Regularized Fuzzy Clustering of Multiple Correspondence Analysis.

    Science.gov (United States)

    Kim, Sunmee; Choi, Ji Yeh; Hwang, Heungsun

    2017-01-01

    Multiple correspondence analysis (MCA) is a useful tool for investigating the interrelationships among dummy-coded categorical variables. MCA has been combined with clustering methods to examine whether there exist heterogeneous subclusters of a population, which exhibit cluster-level heterogeneity. These combined approaches aim to classify either observations only (one-way clustering of MCA) or both observations and variable categories (two-way clustering of MCA). The latter approach is favored because its solutions are easier to interpret by providing explicitly which subgroup of observations is associated with which subset of variable categories. Nonetheless, the two-way approach has been built on hard classification that assumes observations and/or variable categories to belong to only one cluster. To relax this assumption, we propose two-way fuzzy clustering of MCA. Specifically, we combine MCA with fuzzy k-means simultaneously to classify a subgroup of observations and a subset of variable categories into a common cluster, while allowing both observations and variable categories to belong partially to multiple clusters. Importantly, we adopt regularized fuzzy k-means, thereby enabling us to decide the degree of fuzziness in cluster memberships automatically. We evaluate the performance of the proposed approach through the analysis of simulated and real data, in comparison with existing two-way clustering approaches.

  6. Analysis of Stemming Algorithm for Text Clustering

    Directory of Open Access Journals (Sweden)

    N.Sandhya

    2011-09-01

    Text document clustering plays an important role in providing intuitive navigation and browsing mechanisms by organizing large amounts of information into a small number of meaningful clusters. In the bag-of-words representation of documents, the words that appear in documents often have many morphological variants, and in most cases morphological variants of a word have similar semantic interpretations and can be considered equivalent for the purpose of clustering applications. For this reason, a number of stemming algorithms, or stemmers, have been developed, which attempt to reduce a word to its stem or root form. Thus, the key terms of a document are represented by stems rather than by the original words. In this work we studied the impact of a stemming algorithm along with four popular similarity measures (Euclidean, cosine, Pearson correlation and extended Jaccard) in conjunction with different types of vector representation (boolean, term frequency, and term frequency-inverse document frequency) on cluster quality. For clustering documents we used the partitional clustering technique K-Means. Performance is measured against a human-imposed classification of the Classic data set. We conducted a number of experiments and used an entropy measure to assure statistical significance of results. Cosine, Pearson correlation and extended Jaccard similarities emerge as the best measures to capture human categorization behavior, while the Euclidean measure performs poorly. After applying the stemming algorithm, the Euclidean measure shows little improvement.
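
    The four measures named in the abstract can be written down compactly. The sketch below uses their textbook definitions on two hypothetical term-frequency vectors; the function names and the example vectors are assumptions, and the paper's own preprocessing and weighting are not reproduced.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def euclidean(a, b):
    """Euclidean distance (smaller means more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a, b):
    """Cosine of the angle between the vectors (undefined for zero vectors)."""
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


def pearson(a, b):
    """Pearson correlation: cosine similarity of the mean-centred vectors."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return cosine([x - ma for x in a], [y - mb for y in b])


def extended_jaccard(a, b):
    """Extended Jaccard (Tanimoto) coefficient for real-valued vectors."""
    d = dot(a, b)
    return d / (dot(a, a) + dot(b, b) - d)


# Hypothetical term-frequency vectors: same topic, second document twice as long.
doc_a = [1, 1, 0]
doc_b = [2, 2, 0]
print(euclidean(doc_a, doc_b), cosine(doc_a, doc_b),
      pearson(doc_a, doc_b), extended_jaccard(doc_a, doc_b))
```

    Cosine and Pearson treat the two documents as identical in direction, whereas the Euclidean distance is non-zero purely because of document length, which may help explain why length-insensitive measures tend to fare better on text.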

  7. Toward optimal cluster power spectrum analysis

    CERN Document Server

    Smith, Robert E

    2014-01-01

    The power spectrum of galaxy clusters is an important probe of the cosmological model. In this paper we determine the optimal weighting scheme for maximizing the signal-to-noise ratio for such measurements. We find a closed form analytic expression for the optimal weights. Our expression takes into account: cluster mass, finite survey volume effects, survey masking, and a flux limit. The implementation of this weighting scheme requires knowledge of the measured cluster masses, and analytic models for the bias and space-density of clusters as a function of mass and redshift. Recent studies have suggested that the optimal method for reconstruction of the matter density field from a set of clusters is mass-weighting (Seljak et al 2009, Hamaus et al 2010, Cai et al 2011). We compare our optimal weighting scheme with this approach and also with the original power spectrum scheme of Feldman et al (1994). We show that our optimal weighting scheme outperforms these approaches for both volume- and flux-limited cluster...

  8. Spatial clustering and risk factors of malaria infections in Bata district, Equatorial Guinea.

    Science.gov (United States)

    Gómez-Barroso, Diana; García-Carrasco, Emely; Herrador, Zaida; Ncogo, Policarpo; Romay-Barja, María; Ondo Mangue, Martín Eka; Nseng, Gloria; Riloha, Matilde; Santana, Maria Angeles; Valladares, Basilio; Aparicio, Pilar; Benito, Agustín

    2017-04-12

    The transmission of malaria is intense in the majority of the countries of sub-Saharan Africa, particularly in those that are located along the Equatorial strip. The present study aimed to describe the current distribution of malaria prevalence among children and its environment-related factors as well as to detect malaria spatial clusters in the district of Bata, in Equatorial Guinea. From June to August 2013 a representative cross-sectional survey using a multistage, stratified, cluster-selected sample was carried out of children in urban and rural areas of Bata District. All children were tested for malaria using rapid diagnostic tests (RDTs). Results were linked to each household by global positioning system data. Two cluster analysis methods were used: hot spot analysis using the Getis-Ord Gi statistic, and the SaTScan™ spatial statistic estimates, based on the assumption of a Poisson distribution to detect spatial clusters. In addition, univariate associations and a Poisson regression model were used to explore the association between malaria prevalence at household level and different environmental factors. A total of 1416 children aged 2 months to 15 years living in 417 households were included in this study. Malaria prevalence by RDTs was 47.53%, being highest in the age group 6-15 years (63.24%, p [...] malaria risk is greater (65.81%) (p [...] Malaria prevalence was higher in those houses located [...] malaria prevalence with altitude (IRR: 0.73; 95% CI 0.62-0.86). A significant cluster was found inland in the district, in rural areas. This study reveals a high prevalence of RDT-based malaria among children in Bata district. Being situated in inland rural areas, near a river, near a green area and/or at low altitude was a risk factor for malaria. Spatial tools can help policy makers to promote new recommendations for malaria control.

  9. The smart cluster method. Adaptive earthquake cluster identification and analysis in strong seismic regions

    Science.gov (United States)

    Schaefer, Andreas M.; Daniell, James E.; Wenzel, Friedemann

    2017-07-01

    Earthquake clustering is an essential part of almost any statistical analysis of spatial and temporal properties of seismic activity. The nature of earthquake clusters and subsequent declustering of earthquake catalogues plays a crucial role in determining the magnitude-dependent earthquake return period and its respective spatial variation for probabilistic seismic hazard assessment. This study introduces the Smart Cluster Method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal cluster identification. It utilises the magnitude-dependent spatio-temporal earthquake density to adjust the search properties, subsequently analyses the identified clusters to determine directional variation and adjusts its search space with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010-2011 Darfield-Christchurch sequence, a reclassification procedure is applied to disassemble subsequent ruptures using near-field searches, nearest neighbour classification and temporal splitting. The method is capable of identifying and classifying earthquake clusters in space and time. It has been tested and validated using earthquake data from California and New Zealand. A total of more than 1500 clusters have been found in both regions since 1980 with M_min = 2.0. Utilising the knowledge of cluster classification, the method has been adjusted to provide an earthquake declustering algorithm, which has been compared to existing methods. Its performance is comparable to established methodologies. The analysis of earthquake clustering statistics led to various new and updated correlation functions, e.g. for ratios between mainshock and strongest aftershock and general aftershock activity metrics.

  10. The smart cluster method - Adaptive earthquake cluster identification and analysis in strong seismic regions

    Science.gov (United States)

    Schaefer, Andreas M.; Daniell, James E.; Wenzel, Friedemann

    2017-03-01

    Earthquake clustering is an essential part of almost any statistical analysis of spatial and temporal properties of seismic activity. The nature of earthquake clusters and subsequent declustering of earthquake catalogues plays a crucial role in determining the magnitude-dependent earthquake return period and its respective spatial variation for probabilistic seismic hazard assessment. This study introduces the Smart Cluster Method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal cluster identification. It utilises the magnitude-dependent spatio-temporal earthquake density to adjust the search properties, subsequently analyses the identified clusters to determine directional variation and adjusts its search space with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010-2011 Darfield-Christchurch sequence, a reclassification procedure is applied to disassemble subsequent ruptures using near-field searches, nearest neighbour classification and temporal splitting. The method is capable of identifying and classifying earthquake clusters in space and time. It has been tested and validated using earthquake data from California and New Zealand. A total of more than 1500 clusters have been found in both regions since 1980 with M_min = 2.0. Utilising the knowledge of cluster classification, the method has been adjusted to provide an earthquake declustering algorithm, which has been compared to existing methods. Its performance is comparable to established methodologies. The analysis of earthquake clustering statistics led to various new and updated correlation functions, e.g. for ratios between mainshock and strongest aftershock and general aftershock activity metrics.

  11. Factor analysis identifies subgroups of constipation

    Institute of Scientific and Technical Information of China (English)

    Philip G Dinning; Mike Jones; Linda Hunt; Sergio E Fuentealba; Jamshid Kalanter; Denis W King; David Z Lubowski; Nicholas J Talley; Ian J Cook

    2011-01-01

    AIM: To determine whether distinct symptom groupings exist in a constipated population and whether such grouping might correlate with quantifiable pathophysiological measures of colonic dysfunction. METHODS: One hundred and ninety-one patients presenting to a Gastroenterology clinic with constipation and 32 constipated patients responding to a newspaper advertisement completed a 53-item, wide-ranging self-report questionnaire. One hundred of these patients had colonic transit measured scintigraphically. Factor analysis determined whether constipation-related symptoms grouped into distinct aspects of symptomatology. Cluster analysis was used to determine whether individual patients naturally group into distinct subtypes. RESULTS: Cluster analysis yielded a four-cluster solution with the presence or absence of pain and laxative unresponsiveness providing the main descriptors. Amongst all clusters there was a considerable proportion of patients with demonstrable delayed colon transit, irritable bowel syndrome positive criteria and regular stool frequency. The majority of patients with these characteristics also reported regular laxative use. CONCLUSION: Factor analysis identified four constipation subgroups, based on severity and laxative unresponsiveness, in a constipated population. However, clear stratification into clinically identifiable groups remains imprecise.

  12. Cluster Analysis of Acute Care Use Yields Insights for Tailored Pediatric Asthma Interventions.

    Science.gov (United States)

    Abir, Mahshid; Truchil, Aaron; Wiest, Dawn; Nelson, Daniel B; Goldstick, Jason E; Koegel, Paul; Lozon, Marie M; Choi, Hwajung; Brenner, Jeffrey

    2017-09-01

    We undertake this study to understand patterns of pediatric asthma-related acute care use to inform interventions aimed at reducing potentially avoidable hospitalizations. Hospital claims data from 3 Camden city facilities for 2010 to 2014 were used to perform cluster analysis classifying patients aged 0 to 17 years according to their asthma-related hospital use. Clusters were based on 2 variables: asthma-related ED visits and hospitalizations. Demographics and a number of sociobehavioral and use characteristics were compared across clusters. Children who met the criteria (3,170) were included in the analysis. An examination of a scree plot showing the decline in within-cluster heterogeneity as the number of clusters increased confirmed that clusters of pediatric asthma patients according to hospital use exist in the data. Five clusters of patients with distinct asthma-related acute care use patterns were observed. Cluster 1 (62% of patients) showed the lowest rates of acute care use. These patients were least likely to have a mental health-related diagnosis, were less likely to have visited multiple facilities, and had no hospitalizations for asthma. Cluster 2 (19% of patients) had a low number of asthma ED visits and onetime hospitalization. Cluster 3 (11% of patients) had a high number of ED visits and low hospitalization rates, and the highest rates of multiple facility use. Cluster 4 (7% of patients) had moderate ED use for both asthma and other illnesses, and high rates of asthma hospitalizations; nearly one quarter received care at all facilities, and 1 in 10 had a mental health diagnosis. Cluster 5 (1% of patients) had extreme rates of acute care use. Differences observed between groups across multiple sociobehavioral factors suggest these clusters may represent children who differ along multiple dimensions, in addition to patterns of service use, with implications for tailored interventions. Copyright © 2017 American College of Emergency Physicians

  13. Intelligent Hybrid Cluster Based Classification Algorithm for Social Network Analysis

    Directory of Open Access Journals (Sweden)

    S. Muthurajkumar

    2014-05-01

    In this paper, we propose a hybrid clustering-based classification algorithm, built on a mean-based approach, to mine ordered sequences (paths) from weblog data in order to perform social network analysis. In the proposed system for social pattern analysis, sequences of human activities are typically analyzed by switching behaviors, which are likely to produce overlapping clusters. A robust Modified Boosting algorithm is combined with the hybrid clustering-based classification to cluster the data. This work helps to connect the aggregated features from the network data with traditional indices used in social network analysis. Experimental results show that the proposed algorithm improves the decision results from data clustering when combined with the proposed classification algorithm, and that it provides better classification accuracy when tested with a weblog dataset. In addition, the algorithm improves predictive performance, especially for multiclass datasets, which can increase accuracy.

  14. A Bayesian semiparametric factor analysis model for subtype identification.

    Science.gov (United States)

    Sun, Jiehuan; Warren, Joshua L; Zhao, Hongyu

    2017-04-25

    Disease subtype identification (clustering) is an important problem in biomedical research. Gene expression profiles are commonly utilized to infer disease subtypes, which often lead to biologically meaningful insights into disease. Despite many successes, existing clustering methods may not perform well when genes are highly correlated and many uninformative genes are included for clustering due to the high dimensionality. In this article, we introduce a novel subtype identification method in the Bayesian setting based on gene expression profiles. This method, called BCSub, adopts an innovative semiparametric Bayesian factor analysis model to reduce the dimension of the data to a few factor scores for clustering. Specifically, the factor scores are assumed to follow the Dirichlet process mixture model in order to induce clustering. Through extensive simulation studies, we show that BCSub has improved performance over commonly used clustering methods. When applied to two gene expression datasets, our model is able to identify subtypes that are clinically more relevant than those identified from the existing methods.

  15. Bayesian model-based cluster analysis for predicting macrofaunal communities

    NARCIS (Netherlands)

    Braak, ter C.J.F.; Hoijtink, H.; Akkermans, W.; Verdonschot, P.F.M.

    2003-01-01

    To predict macrofaunal community composition from environmental data a two-step approach is often followed: (1) the water samples are clustered into groups on the basis of the macrofauna data and (2) the groups are related to the environmental data, e.g. by discriminant analysis. For the cluster ana

  16. Hierarchical Cluster Analysis – Various Approaches to Data Preparation

    Directory of Open Access Journals (Sweden)

    Z. Pacáková

    2013-09-01

    The article deals with two approaches to data preparation to avoid multicollinearity. The aim of the article is to find similarities among the e-communication levels of EU states using hierarchical cluster analysis. The original set of fourteen indicators was first reduced on the basis of correlation analysis; in cases of high correlation, the indicator with higher variability was included in further analysis. Secondly, the data were transformed using principal component analysis, since the principal components are poorly correlated. For further analysis, five principal components explaining about 92% of variance were selected. Hierarchical cluster analysis was performed both on the reduced data set and on the principal component scores. Both times three clusters were assumed following the Pseudo t-Squared and Pseudo F statistics, but the final clusters were not identical. An important characteristic for comparing the two results was the proportion of variance accounted for by the clusters, which was about ten percent higher for the principal component scores (57.8% compared to 47%). Therefore it can be stated that when principal component scores are used as input variables for cluster analysis and the explained proportion is high enough (about 92% in our analysis), the loss of information is lower compared to data reduction on the basis of correlation analysis.

  17. Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale.

    Science.gov (United States)

    Emmons, Scott; Kobourov, Stephen; Gallant, Mike; Börner, Katy

    2016-01-01

    Notions of community quality underlie the clustering of networks. While studies surrounding network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is unknown. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms: Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters.
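
    As an illustration of one of the information recovery metrics named above, the following is a minimal sketch of the adjusted Rand index between a detected clustering and a ground-truth clustering. The function name and the handling of degenerate inputs are assumptions made for this example; it is not the authors' evaluation code.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two flat partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    n = len(labels_a)
    if n < 2:
        # Assumption: with fewer than two items there are no pairs to disagree on.
        return 1.0
    # Contingency counts: items falling in each (cluster_a, cluster_b) cell.
    cell_counts = Counter(zip(labels_a, labels_b))
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    # Expected number of co-clustered pairs under random labelling.
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case (e.g. both partitions put everything in one cluster).
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Identical partitions score 1 even when the label names differ.
score = adjusted_rand_index([0, 0, 1, 1], ["b", "b", "a", "a"])
```

    The index is corrected for chance, so it can be negative when two partitions agree less often than random labelling would.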

  18. Entropic Approach to Multiscale Clustering Analysis

    Directory of Open Access Journals (Sweden)

    Antonio Insolia

    2012-05-01

    Recently, a novel method has been introduced to estimate the statistical significance of clustering in the direction distribution of objects. The method involves a multiscale procedure, based on the Kullback–Leibler divergence and the Gumbel statistics of extreme values, providing high discrimination power, even in presence of strong background isotropic contamination. It is shown that the method is: (i) semi-analytical, drastically reducing computation time; (ii) very sensitive to small, medium and large scale clustering; (iii) not biased against the null hypothesis. Applications to the physics of ultra-high energy cosmic rays, as a cosmological probe, are presented and discussed.
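
    For readers unfamiliar with the quantity the method builds on, here is a minimal sketch of the discrete Kullback–Leibler divergence between an observed distribution and a reference distribution. The binning of arrival directions into equal-probability cells and the uniform (isotropic) reference are assumptions for this illustration; the abstract does not describe how the divergence is evaluated across angular scales.

```python
from math import log


def kl_divergence(p, q):
    """Discrete Kullback-Leibler divergence D(p || q) in nats."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # a bin with no observed mass contributes nothing
        if q_i == 0:
            return float("inf")  # observed mass where the reference has none
        total += p_i * log(p_i / q_i)
    return total


# Hypothetical example: four sky bins, isotropic reference vs. a clustered sample.
isotropic = [0.25, 0.25, 0.25, 0.25]
clustered = [0.70, 0.10, 0.10, 0.10]
excess = kl_divergence(clustered, isotropic)
```

    A divergence of zero means the observed distribution matches the reference exactly; larger values indicate stronger departure from isotropy in this toy setting.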

  19. Risk factors for cluster seizures in canine idiopathic epilepsy.

    Science.gov (United States)

    Packer, Rowena M A; Shihab, Nadia K; Torres, Bruno B J; Volk, Holger A

    2016-04-01

    Cluster seizures (CS), two or more seizures within a 24-hour period, are reported in 38-77% of dogs with idiopathic epilepsy (IE). Negative outcomes associated with CS include a reduced likelihood of achieving seizure freedom, decreased survival time and increased likelihood of euthanasia. Previous studies have found factors including breed, sex and neuter status are associated with CS in dogs with IE; however, only one UK study in a multi-breed study of CS in IE patients exists to the author's knowledge, and thus further data is required to confirm these results. Data from 384 dogs treated at a multi-breed canine specific epilepsy clinic were retrospectively collected from electronic patient records. 384 dogs were included in the study, of which nearly half had a history of CS (49.1%). Dogs with a history of CS had a younger age at onset than those without (p = 0.033). In a multivariate model, three variables predicted risk of CS: a history of status epilepticus (p = 0.047), age at seizure onset (p = 0.066) and breed (German Shepherd Dog) (p < …). Dogs with a history of status epilepticus and dogs with an older age at seizure onset were less likely to be affected by cluster seizures. German Shepherd Dogs (71% experiencing CS) were significantly more likely to suffer from CS compared to Labrador Retrievers (25%) (p < 0.001). There was no association between sex, neuter status, body size and CS. Further studies into the pathophysiology and genetics of CS are required to further understand this phenomenon.
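
    The case definition used above (two or more seizures within a 24-hour period) can be sketched as a simple check over a dog's recorded seizure times. The function name, the timestamp input format and the inclusive treatment of the 24-hour boundary are assumptions for this example, not details taken from the study.

```python
from datetime import datetime, timedelta


def has_cluster_seizure(seizure_times, window=timedelta(hours=24)):
    """Return True if any two recorded seizures fall within `window` of each other."""
    ordered = sorted(seizure_times)
    # After sorting, the closest pair in time is always an adjacent pair.
    return any(later - earlier <= window
               for earlier, later in zip(ordered, ordered[1:]))


# Hypothetical record: two seizures twelve hours apart, then one a month later.
record = [
    datetime(2016, 1, 1, 8, 0),
    datetime(2016, 1, 1, 20, 0),
    datetime(2016, 2, 1, 8, 0),
]
flagged = has_cluster_seizure(record)
```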

  20. Detection of Functional Change Using Cluster Trend Analysis in Glaucoma.

    Science.gov (United States)

    Gardiner, Stuart K; Mansberger, Steven L; Demirel, Shaban

    2017-05-01

    Global analyses using mean deviation (MD) assess visual field progression, but can miss localized changes. Pointwise analyses are more sensitive to localized progression, but more variable so require confirmation. This study assessed whether cluster trend analysis, averaging information across subsets of locations, could improve progression detection. A total of 133 test-retest eyes were tested 7 to 10 times. Rates of change and P values were calculated for possible re-orderings of these series to generate global analysis ("MD worsening faster than x dB/y with P […] cluster analyses ("n locations [or clusters] worsening faster than x dB/y with P […] cluster analysis criterion, and 4.1 years (95% CI, 4.0-4.5) for the best pointwise criterion. However, for pointwise analysis, only 38% of these changes were confirmed, compared with 61% for clusters and 76% for MD. The time until 25% of eyes showed subsequently confirmed deterioration was 6.3 years (95% CI, 6.0-7.2) for global, 6.3 years (95% CI, 6.0-7.0) for pointwise, and 6.0 years (95% CI, 5.3-6.6) for cluster analyses. Although the specificity is still suboptimal, cluster trend analysis detects subsequently confirmed deterioration sooner than either global or pointwise analyses.

  1. Stability-based validation of dietary patterns obtained by cluster analysis.

    Science.gov (United States)

    Sauvageot, Nicolas; Schritz, Anna; Leite, Sonia; Alkerwi, Ala'a; Stranges, Saverio; Zannad, Faiez; Streel, Sylvie; Hoge, Axelle; Donneau, Anne-Françoise; Albert, Adelin; Guillaume, Michèle

    2017-01-14

    Cluster analysis is a data-driven method used to create clusters of individuals sharing similar dietary habits. However, this method requires specific choices from the user which have an influence on the results. Therefore, there is a need for an objective methodology helping researchers in their decisions during cluster analysis. The objective of this study was to use such a methodology based on stability of clustering solutions to select the most appropriate clustering method and number of clusters for describing dietary patterns in the NESCAV study (Nutrition, Environment and Cardiovascular Health), a large population-based cross-sectional study in the Greater Region (N = 2298). Clustering solutions were obtained with K-means, K-medians and Ward's method and a number of clusters varying from 2 to 6. Their stability was assessed with three indices: adjusted Rand index, Cramer's V and misclassification rate. The most stable solution was obtained with the K-means method and a number of clusters equal to 3. The "Convenient" cluster characterized by the consumption of convenient foods was the most prevalent with 46% of the population having this dietary behaviour. In addition, "Prudent" and "Non-Prudent" patterns, associated respectively with healthy and non-healthy dietary habits, were adopted by 25% and 29% of the population. The "Convenient" and "Non-Prudent" clusters were associated with higher cardiovascular risk whereas the "Prudent" pattern was associated with a decreased cardiovascular risk. Associations with other factors showed that the choice of a specific dietary pattern is part of a wider lifestyle profile. This study is of interest for both researchers and public health professionals. From a methodological standpoint, we showed that using stability of clustering solutions could help researchers in their choices. From a public health perspective, this study showed the need for targeted health promotion campaigns describing the benefits of healthy

  2. Design and Analysis Considerations for Cluster Randomized Controlled Trials That Have a Small Number of Clusters.

    Science.gov (United States)

    Deke, John

    2016-10-25

    Cluster randomized controlled trials (CRCTs) often require a large number of clusters in order to detect small effects with high probability. However, there are contexts where it may be possible to design a CRCT with a much smaller number of clusters (10 or fewer) and still detect meaningful effects. The objective is to offer recommendations for best practices in design and analysis for small CRCTs. I use simulations to examine alternative design and analysis approaches. Specifically, I examine (1) which analytic approaches control Type I errors at the desired rate, (2) which design and analytic approaches yield the most power, (3) what is the design effect of spurious correlations, and (4) examples of specific scenarios under which impacts of different sizes can be detected with high probability. I find that (1) mixed effects modeling and using Ordinary Least Squares (OLS) on data aggregated to the cluster level both control the Type I error rate, (2) randomization within blocks is always recommended, but how best to account for blocking through covariate adjustment depends on whether the precision gains offset the degrees of freedom loss, (3) power calculations can be accurate when design effects from small sample, spurious correlations are taken into account, and (4) it is very difficult to detect small effects with just four clusters, but with six or more clusters, there are realistic circumstances under which small effects can be detected with high probability. © The Author(s) 2016.
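
    One of the analytic approaches named above, OLS on data aggregated to the cluster level, reduces to a comparison of cluster means when the only regressor is the treatment indicator. The sketch below shows that special case under stated assumptions: no covariates or blocks, unweighted cluster means, and a pooled-variance standard error. The function name and input layout are hypothetical and this is not the author's simulation code.

```python
from math import sqrt
from statistics import mean, variance


def cluster_level_effect(clusters):
    """Estimate a treatment effect from cluster means.

    `clusters` is a list of (is_treated, outcomes) pairs, one per cluster.
    Returns (effect, t_statistic, degrees_of_freedom).
    """
    treated = [mean(y) for is_treated, y in clusters if is_treated]
    control = [mean(y) for is_treated, y in clusters if not is_treated]
    if len(treated) < 2 or len(control) < 2:
        raise ValueError("need at least two clusters per arm")
    effect = mean(treated) - mean(control)
    # Degrees of freedom come from the number of clusters, not individuals.
    dof = len(treated) + len(control) - 2
    pooled_var = ((len(treated) - 1) * variance(treated)
                  + (len(control) - 1) * variance(control)) / dof
    std_err = sqrt(pooled_var * (1 / len(treated) + 1 / len(control)))
    return effect, effect / std_err, dof


# Hypothetical four-cluster trial: two treated and two control clusters.
trial = [
    (True, [2.0, 4.0]),
    (True, [5.0, 5.0]),
    (False, [1.0, 1.0]),
    (False, [2.0, 4.0]),
]
effect, t_stat, dof = cluster_level_effect(trial)
```

    With only two degrees of freedom in this toy trial, the critical t value is large, which is consistent with the abstract's point that small effects are very hard to detect with four clusters.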

  3. Visual verification and analysis of cluster detection for molecular dynamics.

    Science.gov (United States)

    Grottel, Sebastian; Reina, Guido; Vrabec, Jadran; Ertl, Thomas

    2007-01-01

    A current research topic in molecular thermodynamics is the condensation of vapor to liquid and the investigation of this process at the molecular level. Condensation is found in many physical phenomena, e.g. the formation of atmospheric clouds or the processes inside steam turbines, where a detailed knowledge of the dynamics of condensation processes will help to optimize energy efficiency and avoid problems with droplets of macroscopic size. The key properties of these processes are the nucleation rate and the critical cluster size. For the calculation of these properties it is essential to make use of a meaningful definition of molecular clusters, which is currently not a fully resolved issue. In this paper a framework capable of interactively visualizing molecular datasets of such nucleation simulations is presented, with an emphasis on the detected molecular clusters. To check the quality of the results of the cluster detection, our framework introduces the concept of flow groups to highlight potential cluster evolution over time which is not detected by the employed algorithm. To confirm the findings of the visual analysis, we coupled the rendering view with a schematic view of the clusters' evolution. This allows rapid assessment of the quality of the molecular cluster detection algorithm and identification of locations in the simulation data, in space as well as in time, where the cluster detection fails. Thus, thermodynamics researchers can eliminate weaknesses in their cluster detection algorithms. Several examples for the effective and efficient usage of our tool are presented.

  4. A Flocking Based algorithm for Document Clustering Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Cui, Xiaohui [ORNL; Gao, Jinzhu [ORNL; Potok, Thomas E [ORNL

    2006-01-01

    Social animals or insects in nature often exhibit a form of emergent collective behavior known as flocking. In this paper, we present a novel Flocking based approach for document clustering analysis. Our Flocking clustering algorithm uses stochastic and heuristic principles discovered from observing bird flocks or fish schools. Unlike other partition clustering algorithms such as K-means, the Flocking based algorithm does not require initial partitional seeds. The algorithm generates a clustering of a given set of data through the embedding of the high-dimensional data items on a two-dimensional grid for easy clustering result retrieval and visualization. Inspired by the self-organized behavior of bird flocks, we represent each document object with a flock boid. The simple local rules followed by each flock boid result in the entire document flock generating complex global behaviors, which eventually result in a clustering of the documents. We evaluate the efficiency of our algorithm with both a synthetic dataset and a real document collection that includes 100 news articles collected from the Internet. Our results show that the Flocking clustering algorithm achieves better performance compared to the K-means and the Ant clustering algorithms for real document clustering.

  5. RELIABILITY ANALYSIS OF RING, AGENT AND CLUSTER BASED DISTRIBUTED SYSTEMS

    Directory of Open Access Journals (Sweden)

    R.SEETHALAKSHMI

    2011-08-01

    The introduction of pervasive devices and mobile devices has led to immense growth of real time distributed processing. In such a context, reliability of the computing environment is very important. Reliability is the probability that the devices, links, processes, programs and files work efficiently for the specified period of time and in the specified condition. Distributed systems are available as conventional ring networks, clusters and agent based systems. This work focuses on the reliability of such systems. These networks are heterogeneous and scalable in nature. There are several factors which are to be considered for reliability estimation. These include the application related factors like algorithms, data-set sizes, memory usage pattern, input-output, communication patterns, task granularity and load-balancing. They also include the hardware related factors like processor architecture, memory hierarchy, input-output configuration and network. The software related factors concerning reliability are operating systems, compiler, communication protocols, libraries and preprocessor performance. In estimating the reliability of a system, the performance estimation is an important aspect. Reliability analysis is approached using probability.

  6. Differences in Pedaling Technique in Cycling: A Cluster Analysis.

    Science.gov (United States)

    Lanferdini, Fábio J; Bini, Rodrigo R; Figueiredo, Pedro; Diefenthaeler, Fernando; Mota, Carlos B; Arndt, Anton; Vaz, Marco A

    2016-10-01

    To employ cluster analysis to assess if cyclists would opt for different strategies in terms of neuromuscular patterns when pedaling at the power output of their second ventilatory threshold (POVT2) compared with cycling at their maximal power output (POMAX). Twenty athletes performed an incremental cycling test to determine their power output (POMAX and POVT2; first session), and pedal forces, muscle activation, muscle-tendon unit length, and vastus lateralis architecture (fascicle length, pennation angle, and muscle thickness) were recorded (second session) in POMAX and POVT2. Athletes were assigned to 2 clusters based on the behavior of outcome variables at POVT2 and POMAX using cluster analysis. Clusters 1 (n = 14) and 2 (n = 6) showed similar power output and oxygen uptake. Cluster 1 presented larger increases in pedal force and knee power than cluster 2, without differences for the index of effectiveness. Cluster 1 presented less variation in knee angle, muscle-tendon unit length, pennation angle, and tendon length than cluster 2. However, clusters 1 and 2 showed similar muscle thickness, fascicle length, and muscle activation. When cycling at POVT2 vs POMAX, cyclists could opt for keeping a constant knee power and pedal-force production, associated with an increase in tendon excursion and a constant fascicle length. Increases in power output lead to greater variations in knee angle, muscle-tendon unit length, tendon length, and pennation angle of vastus lateralis for a similar knee-extensor activation and smaller pedal-force changes in cyclists from cluster 2 than in cluster 1.

  7. Logistics Enterprise Evaluation Model Based On Fuzzy Clustering Analysis

    Science.gov (United States)

    Fu, Pei-hua; Yin, Hong-bo

    In this thesis, we introduced an evaluation model of logistics enterprises based on a fuzzy cluster algorithm. First of all, we present the evaluation index system, which contains basic information, management level, technical strength, transport capacity, informatization level, market competition and customer service. We decided the index weight according to the grades, and evaluated the integrated ability of the logistics enterprises using the fuzzy cluster analysis method. In this thesis, we introduced the system evaluation module and cluster analysis module in detail and described how we achieved these two modules. At last, we gave the result of the system.

  8. Assessment of cluster yield components by image analysis.

    Science.gov (United States)

    Diago, Maria P; Tardaguila, Javier; Aleixos, Nuria; Millan, Borja; Prats-Montalban, Jose M; Cubero, Sergio; Blasco, Jose

    2015-04-01

    Berry weight, berry number and cluster weight are key parameters for yield estimation for the wine and table grape industries. Current yield prediction methods are destructive, labour-demanding and time-consuming. In this work, a new methodology based on image analysis was developed to determine cluster yield components in a fast and inexpensive way. Clusters of seven different red varieties of grapevine (Vitis vinifera L.) were photographed under laboratory conditions and their cluster yield components manually determined after image acquisition. Two algorithms based on the Canny and the logarithmic image processing approaches were tested to find the contours of the berries in the images prior to berry detection performed by means of the Hough Transform. Results were obtained in two ways: by analysing either a single image of the cluster or using four images per cluster from different orientations. The best results (R² between 69% and 95% in berry detection and between 65% and 97% in cluster weight estimation) were achieved using four images and the Canny algorithm. The model's capability based on image analysis to predict berry weight was 84%. The new and low-cost methodology presented here enabled the assessment of cluster yield components, saving time and providing inexpensive information in comparison with current manual methods. © 2014 Society of Chemical Industry.

  9. Fascioliasis risk factors and space-time clusters in domestic ruminants in Bangladesh.

    Science.gov (United States)

    Rahman, A K M Anisur; Islam, S K Shaheenur; Talukder, Md Hasanuzzaman; Hassan, Md Kumrul; Dhand, Navneet K; Ward, Michael P

    2017-05-08

    A retrospective observational study was conducted to identify fascioliasis hotspots, clusters, potential risk factors and to map fascioliasis risk in domestic ruminants in Bangladesh. Cases of fascioliasis in cattle, buffalo, sheep and goats from all districts in Bangladesh between 2011 and 2013 were identified via secondary surveillance data from the Department of Livestock Services' Epidemiology Unit. From each case report, date of report, species affected and district data were extracted. The total number of domestic ruminants in each district was used to calculate fascioliasis cases per ten thousand animals at risk per district, and this was used for cluster and hotspot analysis. Clustering was assessed with Moran's spatial autocorrelation statistic, hotspots with the local indicator of spatial association (LISA) statistic and space-time clusters with the scan statistic (Poisson model). The association between district fascioliasis prevalence and climate (temperature, precipitation), elevation, land cover and water bodies was investigated using a spatial regression model. A total of 1,723,971 cases of fascioliasis were reported in the three-year study period in cattle (1,164,560), goats (424,314), buffalo (88,924) and sheep (46,173). A total of nine hotspots were identified; one of these persisted in each of the three years. Only two local clusters were found. Five space-time clusters located within 22 districts were also identified. Annual risk maps of fascioliasis cases correlated with the hotspots and clusters detected. Cultivated and managed (P < 0.001) and artificial surface (P = 0.04) land cover areas, and elevation (P = 0.003) were positively and negatively associated with fascioliasis in Bangladesh, respectively. Results indicate that due to land use characteristics some areas of Bangladesh are at greater risk of fascioliasis. The potential risk factors, hot spots and clusters identified in this study can be used to guide science
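
    Two of the calculations described above, the district rate per ten thousand animals at risk and Moran's spatial autocorrelation statistic, can be sketched as follows. The function names, the binary neighbour weights and the four-district example are assumptions for illustration only; the study's actual weighting scheme and software are not stated in the abstract.

```python
def rate_per_10k(cases, animals_at_risk):
    """Cases per ten thousand animals at risk in one district."""
    return 10_000 * cases / animals_at_risk


def morans_i(values, weights):
    """Global Moran's I for `values` given a square spatial weights matrix.

    weights[i][j] is the spatial weight between districts i and j
    (here 1 for neighbours and 0 otherwise, with a zero diagonal).
    """
    n = len(values)
    mean_value = sum(values) / n
    deviations = [v - mean_value for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Cross-products of deviations for every weighted pair of districts.
    cross = sum(weights[i][j] * deviations[i] * deviations[j]
                for i in range(n) for j in range(n))
    spread = sum(d * d for d in deviations)
    return (n / total_weight) * (cross / spread)


# Hypothetical example: four districts arranged in a line, neighbours adjacent.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
rates = [rate_per_10k(c, 100_000) for c in (10, 20, 30, 40)]
autocorrelation = morans_i(rates, chain)
```

    Positive values indicate that neighbouring districts tend to have similar rates, while negative values indicate that high-rate and low-rate districts alternate.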

  10. CLUSTER ANALYSIS OF NATURAL DISASTER LOSSES IN POLISH AGRICULTURE

    Directory of Open Access Journals (Sweden)

    Grzegorz STRUPCZEWSKI

    2015-04-01

    Agricultural production risk is of a special nature due to a great number of hazards, the relative weakness of production entities on the market and high ambiguity, which is greater than in industrial production. Natural disasters occurring very frequently, combined with a low percentage of insured farmers, cause damage of such a size that the state is forced to organise current financial aid (for instance in the form of preferential natural disaster loans). This aid is usually not sufficient. On the other hand, regional diversity of the risk level does not positively affect the development of insurance. From the perspective of insurance companies and policymakers it becomes highly important to investigate the spatial structure of losses in agriculture caused by natural disasters. The purpose of the research is to classify the 16 Polish voivodeships into clusters in order to show differences between them according to the criterion of level of damage in agricultural farms caused by natural disasters. On the basis of the cluster analysis it was demonstrated that 11 voivodeships form quite a homogeneous group in terms of size of damage in agriculture (the value of damage in cultivations and the acreage of destroyed cultivations are the two most important factors determining affiliation to the cluster); however, the profile of loss occurring in the other five voivodeships has a very individual course and requires separate handling in the actuarial sense. It was also proved that a high value of losses in agriculture in the absolute sense in given voivodeships does not have to mean high vulnerability of agricultural farms from these voivodeships to natural risks.

  11. Clusters of Factors Identify A High Prevalence of Pregnancy Involvement Among US Adolescent Males.

    Science.gov (United States)

    Lau, May; Lin, Hua; Flores, Glenn

    2015-08-01

    The study purpose was to use recursive partitioning analysis (RPA) to identify factors that, when clustered, are associated with a high prevalence of pregnancy involvement among US adolescent males. The National Survey of Family Growth is a nationally representative survey of individuals 15-44 years old. RPA was done for the 2002 and 2006-2010 cycles to identify factors which, when combined, identify adolescent males with the highest prevalence of pregnancy involvement. Pregnancy-involvement prevalence among adolescent males was 6%. Two clusters of adolescent males have the highest pregnancy-involvement prevalence, at 84-87%. In RPA, the highest pregnancy-involvement prevalence (87%) was seen in adolescent males who ever HIV tested, had >4 lifetime sexual partners, reported less than an almost certain chance of feeling less physical pleasure with condom use, had an educational attainment of […] 4 lifetime sexual partners, reported less than an almost certain chance of feeling less physical pleasure with condom use, had an educational attainment ≥11th grade, were >17 years old, and had their first contraceptive education ≥10th grade, had a pregnancy-involvement prevalence of 84%. Pregnancy-prevention efforts among adolescent males who have been involved in a pregnancy may need to target risk factors identified in clusters with the highest pregnancy prevalence to prevent subsequent pregnancies in these adolescent males and improve their future outcomes.

  12. Geographical clustering of prostate cancer grade and stage at diagnosis, before and after adjustment for risk factors

    Directory of Open Access Journals (Sweden)

    Curriero Frank

    2005-01-01

    Background: Spatial variation in patterns of disease outcomes is often explored with techniques such as cluster detection analysis. In other types of investigations, geographically varying individual or community level characteristics are often used as independent predictors in statistical models which also attempt to explain variation in disease outcomes. However, there is a lack of research which combines geographically referenced exploratory analysis with multilevel models. We used a spatial scan statistic approach, in combination with predicted block group-level disease patterns from multilevel models, to examine geographic variation in prostate cancer grade and stage at diagnosis. Results: We examined data from 20928 Maryland men with incident prostate cancer reported to the Maryland Cancer Registry during 1992–1997. Initial cluster detection analyses, prior to adjustment, indicated that there were four statistically significant clusters of high and low rates of each outcome (later stage at diagnosis and higher histologic grade of tumor) for prostate cancer cases in Maryland during 1992–1997. After adjustment for individual case attributes, including age, race, and year of diagnosis, patterns of clusters changed for both outcomes. Additional adjustment for Census block group and county-level socioeconomic measures changed the cluster patterns further. Conclusions: These findings provide evidence that, in locations where adjustment changed patterns of clusters, the adjustment factors may be contributing causes of the original clusters. In addition, clusters identified after adjusting for individual and area-level predictors indicate areas of unexplained variation, and merit further small-area investigations.

  13. Patterns of Brucellosis Infection Symptoms in Azerbaijan: A Latent Class Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Rita Ismayilova

    2014-01-01

    Brucellosis infection is a multisystem disease, with a broad spectrum of symptoms. We investigated the existence of clusters of infected patients according to their clinical presentation. Using national surveillance data from the Electronic-Integrated Disease Surveillance System, we applied a latent class cluster (LCC) analysis on symptoms to determine clusters of brucellosis cases. A total of 454 cases reported between July 2011 and July 2013 were analyzed. LCC identified a two-cluster model and the Vuong-Lo-Mendell-Rubin likelihood ratio supported the cluster model. Brucellosis cases in the second cluster (19%) reported higher percentages of poly-lymphadenopathy, hepatomegaly, arthritis, myositis, and neuritis and changes in liver function tests compared to cases of the first cluster. Patients in the second cluster had a severe brucellosis disease course and were associated with longer delay in seeking medical attention. Moreover, most of them were from Beylagan, a region focused on sheep and goat livestock production in south-central Azerbaijan. Patients in cluster 2 accounted for one-quarter of brucellosis cases and had a more severe clinical presentation. Delay in seeking medical care may explain severe illness. Future work needs to determine the factors that influence brucellosis case seeking and identify brucellosis species, particularly among cases from Beylagan.

  14. Detection of early glaucomatous progression with octopus cluster trend analysis.

    Science.gov (United States)

    Naghizadeh, Farzaneh; Holló, Gábor

    2014-01-01

    To compare the ability of Corrected Cluster Trend Analysis (CCTA) and Cluster Trend Analysis (CTA) with event analysis of Octopus visual field series to detect early glaucomatous progression. One eye of 15 healthy, 19 ocular hypertensive, 20 preperimetric, and 51 perimetric glaucoma (PG) patients were investigated with Octopus normal G2 test at 6-month intervals for 1.5 to 3 years. Progression was defined with significant worsening in any of the 10 Octopus clusters with CCTA, and event analysis criteria, respectively. With event analysis, 9 PG eyes showed localized progression and 1 diffuse mean defect (MD) worsening. With CCTA, progression was indicated in 1 normal, 1 ocular hypertensive, and 1 preperimetric glaucoma eyes due to vitreous floaters, and 28 PG eyes including all 9 eyes with localized progression with event analysis. The locations of CCTA progression matched those found with event analysis in all 9 cases. In 17 of the remaining 19 eyes, progressing clusters matched the locations that were suspicious but not definitive for progression with event analysis. In the eye with diffuse MD worsening, CTA found significant progression for 7 clusters. For global MD progression rate, eyes worsened with CCTA only did not differ from the stable eyes but had significantly smaller progression rates than the eyes progressed with event analysis (P=0.0002). In PG, Octopus CCTA and CTA are clinically useful to identify early progression and areas suspicious for early progression. However, in some eyes with no glaucomatous visual field damage, vitreous floaters may cause progression artifacts.

  15. Cluster Analysis of Gene Expression Data

    CERN Document Server

    Domany, E

    2002-01-01

    The expression levels of many thousands of genes can be measured simultaneously by DNA microarrays (chips). This novel experimental tool has revolutionized research in molecular biology and generated considerable excitement. A typical experiment uses a few tens of such chips, each dedicated to a single sample - such as tissue extracted from a particular tumor. The results of such an experiment contain several hundred thousand numbers, that come in the form of a table, of several thousand rows (one for each gene) and 50 - 100 columns (one for each sample). We developed a clustering methodology to mine such data. In this review I provide a very basic introduction to the subject, aimed at a physics audience with no prior knowledge of either gene expression or clustering methods. I explain what genes are, what is gene expression and how it is measured by DNA chips. Next I explain what is meant by "clustering" and how we analyze the massive amounts of data from such experiments, and present results obtained from a...

  16. Comparative analysis of genomic signal processing for microarray data clustering.

    Science.gov (United States)

    Istepanian, Robert S H; Sungoor, Ala; Nebel, Jean-Christophe

    2011-12-01

    Genomic signal processing is a new area of research that combines advanced digital signal processing methodologies for enhanced genetic data analysis. It has many promising applications in bioinformatics and next generation of healthcare systems, in particular, in the field of microarray data clustering. In this paper we present a comparative performance analysis of enhanced digital spectral analysis methods for robust clustering of gene expression across multiple microarray data samples. Three digital signal processing methods: linear predictive coding, wavelet decomposition, and fractal dimension are studied to provide a comparative evaluation of the clustering performance of these methods on several microarray datasets. The results of this study show that the fractal approach provides the best clustering accuracy compared to other digital signal processing and well known statistical methods.

  17. Using cluster analysis to organize and explore regional GPS velocities

    Science.gov (United States)

    Simpson, Robert W.; Thatcher, Wayne; Savage, James C.

    2012-01-01

    Cluster analysis offers a simple visual exploratory tool for the initial investigation of regional Global Positioning System (GPS) velocity observations, which are providing increasingly precise mappings of actively deforming continental lithosphere. The deformation fields from dense regional GPS networks can often be concisely described in terms of relatively coherent blocks bounded by active faults, although the choice of blocks, their number and size, can be subjective and is often guided by the distribution of known faults. To illustrate our method, we apply cluster analysis to GPS velocities from the San Francisco Bay Region, California, to search for spatially coherent patterns of deformation, including evidence of block-like behavior. The clustering process identifies four robust groupings of velocities that we identify with four crustal blocks. Although the analysis uses no prior geologic information other than the GPS velocities, the cluster/block boundaries track three major faults, both locked and creeping.
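
    The grouping step described above can be illustrated with a minimal sketch. It assumes plain k-means (Lloyd's algorithm) as a stand-in, since the abstract does not name the clustering algorithm used; the velocity values, the starting centroids and the function name `kmeans` are hypothetical. As in the abstract, only the velocity components are clustered and station locations are not used.

```python
import math

def kmeans(points, centers, n_iter=20):
    """Plain Lloyd's k-means; `centers` are the starting centroids."""
    centers = [tuple(c) for c in centers]
    for _ in range(n_iter):
        # Assign every velocity vector to its nearest centroid.
        labels = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
                  for p in points]
        # Recompute each centroid as the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(centers[k])  # keep an empty cluster's centroid
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return labels, centers

# Hypothetical (east, north) velocities in mm/yr for six stations.
velocities = [(-10.0, 20.0), (-11.0, 21.0), (-9.0, 19.0),
              (-2.0, 5.0), (-3.0, 6.0), (-1.0, 4.0)]
labels, centers = kmeans(velocities, centers=[velocities[0], velocities[3]])
print(labels, centers)
```

    Plotting the labels at the station coordinates afterwards is what would reveal whether the velocity groups are also spatially coherent blocks.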

  18. Clusters, Halos, And S-Factors In Fermionic Molecular Dynamics

    Directory of Open Access Journals (Sweden)

    Feldmeier Hans

    2013-12-01

    In Fermionic Molecular Dynamics, antisymmetrized products of Gaussian wave packets are projected on angular momentum, linear momentum, and parity. An appropriately chosen set of these states spans the many-body Hilbert space in which the Hamiltonian is diagonalized. The wave packet parameters – position, momentum, width and spin – are obtained by variation under constraints. The great flexibility of this basis makes it possible to describe not only shell-model-like states but also exotic states like halos, e.g. the two-proton halo in 17Ne, or cluster states as they appear for example in 12C close to the α breakup threshold where the Hoyle state is located. Even a fully microscopic calculation of the 3He(α,γ)7Be capture reaction is possible and yields an astrophysical S-factor that compares very well with newer data. As representatives of numerous results, these cases will be discussed in this contribution, some of them not published so far. The Hamiltonian is based on the realistic Argonne V18 nucleon-nucleon interaction.

  19. Exploring cognitive heterogeneity in first-episode psychosis: What cluster analysis can reveal.

    Science.gov (United States)

    Reser, Maree P; Allott, Kelly A; Killackey, Eóin; Farhall, John; Cotton, Susan M

    2015-10-30

    Variable outcomes in first-episode psychosis (FEP) are partly attributable to heterogeneity in cognitive functioning. To aid identification of those likely to have poorer or better outcomes, we examined whether purported cognitive profiles identified through use of cluster analysis in chronic schizophrenia were evident in FEP. We also aimed to assess whether there was a relationship between cognitive profile and factors independent of the solution, providing external validation that the cognitive profiles represented distinct subgroups. Ward's method hierarchical cluster analysis, verified by a k-means cluster solution, was performed using data obtained from a cognitive test battery administered to 128 participants aged 15-25 years. Four cognitive profiles were identified. A continuity element was evident; participants in cluster four were more cognitively impaired compared to participants in cluster three, who appeared more cognitively intact. Clusters one and two were distinguishable across measures of attention and working memory and visual recognition memory, most likely reflecting sample specific patterns of deficit. Participants in cluster four had significantly lower premorbid and current IQ and higher negative symptoms compared to participants in cluster three. The distinct levels and patterns of cognition found in chronic schizophrenia cohorts are also evident across diagnostic categories in FEP. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  20. Cluster headache and the hypocretin receptor 2 reconsidered: a genetic association study and meta-analysis.

    Science.gov (United States)

    Weller, Claudia M; Wilbrink, Leopoldine A; Houwing-Duistermaat, Jeanine J; Koelewijn, Stephany C; Vijfhuizen, Lisanne S; Haan, Joost; Ferrari, Michel D; Terwindt, Gisela M; van den Maagdenberg, Arn M J M; de Vries, Boukje

    2015-08-01

    Cluster headache is a severe neurological disorder with a complex genetic background. A missense single nucleotide polymorphism (rs2653349; p.Ile308Val) in the HCRTR2 gene that encodes the hypocretin receptor 2 is the only genetic factor that is reported to be associated with cluster headache in different studies. However, as there are conflicting results between studies, we re-evaluated its role in cluster headache. We performed a genetic association analysis for rs2653349 in our large Leiden University Cluster headache Analysis (LUCA) program study population. Systematic selection of the literature yielded three additional studies comprising five study populations, which were included in our meta-analysis. Data were extracted according to predefined criteria. A total of 575 cluster headache patients from our LUCA study and 874 controls were genotyped for HCRTR2 SNP rs2653349 but no significant association with cluster headache was found (odds ratio 0.91 (95% confidence intervals 0.75-1.10), p = 0.319). In contrast, the meta-analysis that included in total 1167 cluster headache cases and 1618 controls from the six study populations, which were part of four different studies, showed association of the single nucleotide polymorphism with cluster headache (random effect odds ratio 0.69 (95% confidence intervals 0.53-0.90), p = 0.006). The association became weaker, as the odds ratio increased to 0.80, when the meta-analysis was repeated without the initial single South European study with the largest effect size. Although we did not find evidence for association of rs2653349 in our LUCA study, which is the largest investigated study population thus far, our meta-analysis provides genetic evidence for a role of HCRTR2 in cluster headache. Regardless, we feel that the association should be interpreted with caution as meta-analyses with individual populations that have limited power have diminished validity. © International Headache Society 2014.
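
    The odds ratios and confidence intervals quoted above follow a standard calculation, sketched here under stated assumptions: a Wald (Woolf) interval on the log odds ratio, and inverse-variance fixed-effect pooling as a simpler stand-in for the random-effects model the abstract reports. The 2x2 counts are hypothetical and are not data from the study; the function names are illustrative.

```python
import math

def log_odds_ratio(a, b, c, d):
    """Log odds ratio and its standard error for a 2x2 table.

    a, b: cases with / without the variant; c, d: controls with / without.
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se

def to_interval(log_or, se, z=1.96):
    """Back-transform to an odds ratio with a 95% Wald confidence interval."""
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)

def pool_fixed_effect(studies):
    """Inverse-variance pooling of (log_or, se) pairs (fixed-effect assumption)."""
    weights = [1 / se ** 2 for _, se in studies]
    pooled = sum(w * lor for w, (lor, _) in zip(weights, studies)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))

# Hypothetical study tables: (a, b, c, d).
tables = [(40, 160, 60, 140), (35, 165, 50, 150)]
studies = [log_odds_ratio(*t) for t in tables]
single = [to_interval(*s) for s in studies]
pooled = to_interval(*pool_fixed_effect(studies))
for odds, lo, hi in single + [pooled]:
    print(f"OR = {odds:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

    Pooling narrows the interval relative to any single study, which is how a meta-analysis can reach significance when an individual population does not.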

  1. Image Retrieval Based on Multiview Constrained Nonnegative Matrix Factorization and Gaussian Mixture Model Spectral Clustering Method

    Directory of Open Access Journals (Sweden)

    Qunyi Xie

    2016-01-01

    Content-based image retrieval has recently become an important research topic and has been widely used for managing images from repertories. In this article, we present an efficient technique, called MNGS, which integrates multiview constrained nonnegative matrix factorization (NMF) and Gaussian mixture model (GMM)-based spectral clustering for image retrieval. In the proposed methodology, the multiview NMF scheme provides competitive sparse representations of underlying images through decomposition of a similarity-preserving matrix that is formed by fusing multiple features from different visual aspects. In particular, the proposed method merges manifold constraints into the standard NMF objective function to impose an orthogonality constraint on the basis matrix and satisfy the structure preservation requirement of the coefficient matrix. To apply clustering to the sparse representations, this paper develops a GMM-based spectral clustering method in which the Gaussian components are regrouped in spectral space, which significantly improves the retrieval effectiveness. In this way, image retrieval over the whole database translates to a nearest-neighbour search in the cluster containing the query image. In addition, this study provides a proof of convergence of the objective function and an analysis of the computational complexity. Experimental results on three standard image datasets reveal the advantages that can be achieved with the proposed retrieval scheme.

  2. A Novel Divisive Hierarchical Clustering Algorithm for Geospatial Analysis

    Directory of Open Access Journals (Sweden)

    Shaoning Li

    2017-01-01

    In the fields of geographic information systems (GIS) and remote sensing (RS), the clustering algorithm has been widely used for image segmentation, pattern recognition, and cartographic generalization. Although clustering analysis plays a key role in geospatial modelling, traditional clustering methods are limited by their computational complexity, noise resistance and robustness. Furthermore, traditional methods focus mainly on the adjacent spatial context, which makes them hard to apply to multi-density discrete objects. In this paper, a new method, cell-dividing hierarchical clustering (CDHC), is proposed based on convex hull retraction. The main steps are as follows. First, a convex hull structure is constructed to describe the global spatial context of geospatial objects. Then, the retracting structure of each borderline is established in sequence by setting the initial parameter. The objects are split into two clusters (i.e., “sub-clusters”) if the retracting structure intersects with the borderlines. Finally, clusters are repeatedly split and the initial parameter is updated until the termination condition is satisfied. The experimental results show that CDHC separates the multi-density objects from noise sufficiently and also reduces complexity compared to the traditional agglomerative hierarchical clustering algorithm.

  3. A Distributed Flocking Approach for Information Stream Clustering Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Cui, Xiaohui [ORNL; Potok, Thomas E [ORNL

    2006-01-01

    Intelligence analysts are currently overwhelmed by the volume of information streams generated every day. There is a lack of comprehensive tools that can analyze these streams in real time. Document clustering analysis plays an important role in improving the accuracy of information retrieval. However, most clustering technologies can only be applied to static document collections because they normally require a large amount of computational resources and a long time to produce accurate results. It is very difficult to cluster a dynamically changing text information stream on an individual computer. Our earlier research resulted in a dynamic reactive flock clustering algorithm that can continually refine the clustering result and react quickly to changes in document content. This characteristic makes the algorithm suitable for cluster analysis of dynamically changing document information, such as text information streams. Because of the decentralized character of this algorithm, a distributed approach is a very natural way to increase its clustering speed. In this paper, we present a distributed multi-agent flocking approach for text information stream clustering and discuss the decentralized architectures and communication schemes for load balancing and status information synchronization in this approach.

  4. Comparison of population-averaged and cluster-specific models for the analysis of cluster randomized trials with missing binary outcomes: a simulation study.

    Science.gov (United States)

    Ma, Jinhui; Raina, Parminder; Beyene, Joseph; Thabane, Lehana

    2013-01-23

    The objective of this simulation study is to compare the accuracy and efficiency of population-averaged (i.e. generalized estimating equations (GEE)) and cluster-specific (i.e. random-effects logistic regression (RELR)) models for analyzing data from cluster randomized trials (CRTs) with missing binary responses. In this simulation study, clustered responses were generated from a beta-binomial distribution. The number of clusters per trial arm, the number of subjects per cluster, intra-cluster correlation coefficient, and the percentage of missing data were allowed to vary. Under the assumption of covariate dependent missingness, missing outcomes were handled by complete case analysis, standard multiple imputation (MI) and within-cluster MI strategies. Data were analyzed using GEE and RELR. Performance of the methods was assessed using standardized bias, empirical standard error, root mean squared error (RMSE), and coverage probability. GEE performs well on all four measures--provided the downward bias of the standard error (when the number of clusters per arm is small) is adjusted appropriately--under the following scenarios: complete case analysis for CRTs with a small amount of missing data; standard MI for CRTs with variance inflation factor (VIF) cluster MI for CRTs with VIF≥3 and cluster size>50. RELR performs well only when a small amount of data was missing, and complete case analysis was applied. GEE performs well as long as appropriate missing data strategies are adopted based on the design of CRTs and the percentage of missing data. In contrast, RELR does not perform well when either standard or within-cluster MI strategy is applied prior to the analysis.
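
    The variance inflation factor used above to separate scenarios is the usual design effect of a cluster randomized trial. A minimal sketch, assuming equal cluster sizes and the standard formula VIF = 1 + (m - 1) * ICC; the function names and the numbers in the usage lines are illustrative and are not values from the simulation study.

```python
def variance_inflation_factor(cluster_size, icc):
    """Design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n_total, cluster_size, icc):
    """Number of independent observations the clustered sample is worth."""
    return n_total / variance_inflation_factor(cluster_size, icc)

# Illustrative values: 20 clusters of 51 subjects with ICC = 0.05.
vif = variance_inflation_factor(cluster_size=51, icc=0.05)
n_eff = effective_sample_size(n_total=20 * 51, cluster_size=51, icc=0.05)
print(f"VIF = {vif:.2f}, effective sample size = {n_eff:.0f}")
```

    With these illustrative inputs the design falls in the "VIF≥3 and cluster size>50" regime mentioned in the abstract.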

  5. Cluster analysis of WIBS single particle bioaerosol data

    Directory of Open Access Journals (Sweden)

    N. H. Robinson

    2012-09-01

    Hierarchical agglomerative cluster analysis was performed on single-particle multi-spatial datasets comprising optical diameter, asymmetry and three different fluorescence measurements, gathered using two dual Waveband Integrated Bioaerosol Sensor (WIBS). The technique is demonstrated on measurements of various fluorescent and non-fluorescent polystyrene latex spheres (PSL) before being applied to two separate contemporaneous ambient WIBS datasets recorded in a forest site in Colorado, USA as part of the BEACHON-RoMBAS project. Cluster analysis results between both datasets are consistent. Clusters are tentatively interpreted by comparison of concentration time series and cluster average measurement values to the published literature (of which there is a paucity) to represent: non-fluorescent accumulation mode aerosol; bacterial agglomerates; and fungal spores. To our knowledge, this is the first time cluster analysis has been applied to long term online PBAP measurements. The novel application of this clustering technique provides a means for routinely reducing WIBS data to discrete concentration time series which are more easily interpretable, without the need for any a priori assumptions concerning the expected aerosol types. It can reduce the level of subjectivity compared to the more standard analysis approaches, which are typically performed by simple inspection of various ensemble data products. It also has the advantage of potentially resolving less populous or subtly different particle types. This technique is likely to become more robust in the future as fluorescence-based aerosol instrumentation measurement precision, dynamic range and the number of available metrics is improved.

  6. Cluster analysis of WIBS single particle bioaerosol data

    Science.gov (United States)

    Robinson, N. H.; Allan, J. D.; Huffman, J. A.; Kaye, P. H.; Foot, V. E.; Gallagher, M.

    2012-09-01

    Hierarchical agglomerative cluster analysis was performed on single-particle multi-spatial datasets comprising optical diameter, asymmetry and three different fluorescence measurements, gathered using two dual Waveband Integrated Bioaerosol Sensor (WIBS). The technique is demonstrated on measurements of various fluorescent and non-fluorescent polystyrene latex spheres (PSL) before being applied to two separate contemporaneous ambient WIBS datasets recorded in a forest site in Colorado, USA as part of the BEACHON-RoMBAS project. Cluster analysis results between both datasets are consistent. Clusters are tentatively interpreted by comparison of concentration time series and cluster average measurement values to the published literature (of which there is a paucity) to represent: non-fluorescent accumulation mode aerosol; bacterial agglomerates; and fungal spores. To our knowledge, this is the first time cluster analysis has been applied to long term online PBAP measurements. The novel application of this clustering technique provides a means for routinely reducing WIBS data to discrete concentration time series which are more easily interpretable, without the need for any a priori assumptions concerning the expected aerosol types. It can reduce the level of subjectivity compared to the more standard analysis approaches, which are typically performed by simple inspection of various ensemble data products. It also has the advantage of potentially resolving less populous or subtly different particle types. This technique is likely to become more robust in the future as fluorescence-based aerosol instrumentation measurement precision, dynamic range and the number of available metrics is improved.
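
    As a rough illustration of hierarchical agglomerative clustering on multi-parameter particle data, the sketch below standardizes each measurement and then repeatedly merges the two closest clusters. The average-linkage rule, the z-score normalization, the three-column particle records and the function names are all assumptions made for brevity; the abstract does not state the linkage or normalization used.

```python
import math

def zscore(rows):
    """Standardize each column so no single measurement dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]

def agglomerate(rows, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of row indices."""
    clusters = [[i] for i in range(len(rows))]

    def linkage(a, b):
        # Mean pairwise Euclidean distance between two clusters.
        return sum(math.dist(rows[i], rows[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        pairs = ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters)))
        p, q = min(pairs, key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        clusters[p] = clusters[p] + clusters[q]  # merge the closest pair
        del clusters[q]
    return clusters

# Hypothetical particles: (optical diameter, asymmetry, fluorescence).
particles = [(1.0, 10, 5), (1.2, 12, 6), (0.9, 11, 4),
             (4.0, 30, 200), (4.2, 28, 210), (3.8, 31, 190)]
groups = agglomerate(zscore(particles), n_clusters=2)
print(groups)
```

    This naive version recomputes every linkage at each merge, so it is only suitable for small examples; real single-particle datasets would need an optimized library implementation.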

  7. Cluster analysis of clinical data identifies fibromyalgia subgroups.

    Directory of Open Access Journals (Sweden)

    Elisa Docampo

    INTRODUCTION: Fibromyalgia (FM) is mainly characterized by widespread pain and multiple accompanying symptoms, which hinder FM assessment and management. In order to reduce FM heterogeneity we classified clinical data into simplified dimensions that were used to define FM subgroups. MATERIAL AND METHODS: 48 variables were evaluated in 1,446 Spanish FM cases fulfilling 1990 ACR FM criteria. A partitioning analysis was performed to find groups of variables similar to each other. Similarities between variables were identified and the variables were grouped into dimensions. This was performed in a subset of 559 patients, and cross-validated in the remaining 887 patients. For each sample and dimension, a composite index was obtained based on the weights of the variables included in the dimension. Finally, a clustering procedure was applied to the indexes, resulting in FM subgroups. RESULTS: Variables clustered into three independent dimensions: "symptomatology", "comorbidities" and "clinical scales". Only the first two dimensions were considered for the construction of FM subgroups. Resulting scores classified FM samples into three subgroups: low symptomatology and comorbidities (Cluster 1), high symptomatology and comorbidities (Cluster 2), and high symptomatology but low comorbidities (Cluster 3), showing differences in measures of disease severity. CONCLUSIONS: We have identified three subgroups of FM samples in a large cohort of FM by clustering clinical data. Our analysis stresses the importance of family and personal history of FM comorbidities. Also, the resulting patient clusters could indicate different forms of the disease, relevant to future research, and might have an impact on clinical assessment.

  8. Integrating Data Clustering and Visualization for the Analysis of 3D Gene Expression Data

    Energy Technology Data Exchange (ETDEWEB)

    Rubel, Oliver; Weber, Gunther H.; Huang, Min-Yu; Bethel, E. Wes; Biggin, Mark D.; Fowlkes, Charless C.; Hendriks, Cris L. Luengo; Keranen, Soile V. E.; Eisen, Michael B.; Knowles, David W.; Malik, Jitendra; Hagen, Hans; Hamann, Bernd [Data Analysis and Visualization (IDAV) and the Department of Computer Science, University of California, Davis, One Shields Avenue, Davis CA 95616, USA; International Research Training Group "Visualization of Large and Unstructured Data Sets," University of Kaiserslautern, Germany; Computational Research Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA; Genomics Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley CA 94720, USA; Life Sciences Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley CA 94720, USA; Computer Science Division, University of California, Berkeley, CA, USA; Computer Science Department, University of California, Irvine, CA, USA; all authors are with the Berkeley Drosophila Transcription Network Project, Lawrence Berkeley National Laboratory]

    2008-05-12

    The recent development of methods for extracting precise measurements of spatial gene expression patterns from three-dimensional (3D) image data opens the way for new analyses of the complex gene regulatory networks controlling animal development. We present an integrated visualization and analysis framework that supports user-guided data clustering to aid exploration of these new complex datasets. The interplay of data visualization and clustering-based data classification leads to improved visualization and enables a more detailed analysis than previously possible. We discuss (i) integration of data clustering and visualization into one framework; (ii) application of data clustering to 3D gene expression data; (iii) evaluation of the number of clusters k in the context of 3D gene expression clustering; and (iv) improvement of overall analysis quality via dedicated post-processing of clustering results based on visualization. We discuss the use of this framework to objectively define spatial pattern boundaries and temporal profiles of genes and to analyze how mRNA patterns are controlled by their regulatory transcription factors.

  9. Clustering of obesity and dental caries with lifestyle factors among Danish adolescents

    DEFF Research Database (Denmark)

    Cinar, Ayse Basak; Christensen, Lisa Boge; Hede, Borge

    2011-01-01

    To assess any clustering between obesity, dental health, and lifestyle factors (dietary patterns, physical activity, smoking, and alcohol consumption) among adolescents.

  10. Variable cluster analysis method for building neural network model

    Institute of Scientific and Technical Information of China (English)

    王海东; 刘元东

    2004-01-01

    To address the problem that, in building a neural network model of a complicated system, the input variables should be reduced as much as possible while still fully explaining the output variables, a variable selection method based on cluster analysis was investigated. A similarity coefficient describing the mutual relation of variables was defined. The methods of the highest contribution rate, part replacing whole, and variable replacement were put forward and derived from information theory. Neural network software based on cluster analysis, which provides several methods for defining the variable similarity coefficient, clustering system variables and evaluating variable clusters, was developed and applied to build a neural network forecast model of cement clinker quality. The results show that the network scale, training time and prediction accuracy are all perfect. The practical application demonstrates that this method of selecting variables for a neural network is feasible and effective.

  11. Cluster analysis of activity-time series in motor learning

    DEFF Research Database (Denmark)

    Balslev, Daniela; Nielsen, Finn Årup; Frutiger, Sally A.

    2002-01-01

    Neuroimaging studies of learning focus on brain areas where the activity changes as a function of time. To circumvent the difficult problem of model selection, we used a data-driven analytic tool, cluster analysis, which extracts representative temporal and spatial patterns from the voxel-time series. The optimal number of clusters was chosen using a cross-validated likelihood method, which highlights the clustering pattern that generalizes best over the subjects. Data were acquired with PET at different time points during practice of a visuomotor task. The results from cluster analysis show practice-related activity in a fronto-parieto-cerebellar network, in agreement with previous studies of motor learning. These voxels were separated from a group of voxels showing an unspecific time-effect and another group of voxels, whose activation was an artifact from smoothing. Hum. Brain Mapping 15...

  12. Cluster analysis of activity-time series in motor learning

    DEFF Research Database (Denmark)

    Balslev, Daniela; Nielsen, Finn Å; Frutiger, Sally A

    2002-01-01

    Neuroimaging studies of learning focus on brain areas where the activity changes as a function of time. To circumvent the difficult problem of model selection, we used a data-driven analytic tool, cluster analysis, which extracts representative temporal and spatial patterns from the voxel......-time series. The optimal number of clusters was chosen using a cross-validated likelihood method, which highlights the clustering pattern that generalizes best over the subjects. Data were acquired with PET at different time points during practice of a visuomotor task. The results from cluster analysis show...... practice-related activity in a fronto-parieto-cerebellar network, in agreement with previous studies of motor learning. These voxels were separated from a group of voxels showing an unspecific time-effect and another group of voxels, whose activation was an artifact from smoothing...

  14. Cluster analysis of Southeastern U.S. climate stations

    Science.gov (United States)

    Stooksbury, D. E.; Michaels, P. J.

    1991-09-01

    A two-step cluster analysis of 449 Southeastern climate stations is used to objectively determine general climate clusters (groups of climate stations) for eight southeastern states. The purpose is objectively to define regions of climatic homogeneity that should perform more robustly in subsequent climatic impact models. This type of analysis has been successfully used in many related climate research problems including the determination of corn/climate districts in Iowa (Ortiz-Valdez, 1985) and the classification of synoptic climate types (Davis, 1988). These general climate clusters may be more appropriate for climate research than the standard climate divisions (CD) groupings of climate stations, which are modifications of the agro-economic United States Department of Agriculture crop reporting districts. Unlike the CD's, these objectively determined climate clusters are not restricted by state borders and thus have reduced multicollinearity which makes them more appropriate for the study of the impact of climate and climatic change.

  15. Initial magnetization analysis of iron cluster assemblies

    Energy Technology Data Exchange (ETDEWEB)

    Michele, Oliver; Hesse, Juergen; Bremers, Heiko [Technische Universitaet Braunschweig, Institut fuer Metallphysik und Nukleare Festkoerperphysik, Mendelssohnstrasse 3, 38106 Braunschweig (Germany); Peng, Dong-Lian; Sumiyama, Kenji; Hihara, Takehiko; Yamamuro, Saeki [Department of Materials Science and Engineering, Nagoya Institute of Technology, Nagoya 466-8555 (Japan)

    2004-12-01

    Nearly monodispersed oxide-coated Fe cluster assemblies were prepared using a plasma-gas-condensation style cluster beam deposition apparatus (D. L. Peng et al. J. Appl. Phys. 92 3075 (2002)). The characterization of such assemblies is presented using SQUID magnetometry. The aim of this contribution is the interpretation of the initial magnetization curves instead of the usual presentation of hysteresis loops and coercivities. The description of the initial magnetization is based on a proposed vector model valid for Stoner-Wohlfarth particles. The model includes the particles' anisotropy and possible interactions regarding these influences as equivalent magnetic fields. The model is an extension of the one described by Michele et al. (J. Phys.: Condens. Matter 16 427 (2004)) regarding the fact that in a completely demagnetized state, in the sample consisting of a very large number of particles always equal anisotropy fields of opposite signs are present. We measured the initial magnetization curves for different temperatures and present the temperature dependence of the model's parameters. (Abstract Copyright [2004], Wiley Periodicals, Inc.)

  16. Spatial Data Mining using Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Ch.N.Santhosh Kumar

    2012-09-01

    Data mining, also referred to as Knowledge Discovery in Databases (KDD), is the process of nontrivial extraction of implicit, previously unknown and potentially useful information, such as knowledge rules, descriptions, regularities, and major trends, from large databases. Data mining has evolved into a multidisciplinary field drawing on database technology, machine learning, artificial intelligence, neural networks, information retrieval, and so on. In principle, data mining should be applicable to the different kinds of data and databases used in many different applications, including relational databases, transactional databases, data warehouses, object-oriented databases, and special application-oriented databases such as spatial databases, temporal databases, multimedia databases, and time-series databases. Spatial data mining, also called spatial mining, is data mining applied to spatial data or spatial databases. Spatial data are data that have a spatial or location component, and they carry information that is more complex than classical data. A spatial database stores spatial data represented by spatial data types and the spatial relationships among the data. Spatial data mining encompasses various tasks, including spatial classification, spatial association rule mining, spatial clustering, characteristic rules, discriminant rules, and trend detection. This paper presents how spatial data mining is achieved using clustering.

  17. Multivariate analysis of the globular clusters in M87

    CERN Document Server

    Das, Sukanta; Davoust, Emmanuel

    2015-01-01

    An objective classification of 147 globular clusters in the inner region of the giant elliptical galaxy M87 is carried out with the help of two methods of multivariate analysis. First independent component analysis is used to determine a set of independent variables that are linear combinations of various observed parameters (mostly Lick indices) of the globular clusters. Next K-means cluster analysis is applied on the independent components, to find the optimum number of homogeneous groups having an underlying structure. The properties of the four groups of globular clusters thus uncovered are used to explain the formation mechanism of the host galaxy. It is suggested that M87 formed in two successive phases. First a monolithic collapse, which gave rise to an inner group of metal-rich clusters with little systematic rotation and an outer group of metal-poor clusters in eccentric orbits. In a second phase, the galaxy accreted low-mass satellites in a dissipationless fashion, from the gas of which the two othe...

  18. Gas density fluctuations in the Perseus Cluster: clumping factor and velocity power spectrum

    Energy Technology Data Exchange (ETDEWEB)

    Zhuravleva, I.; Churazov, E.; Arevalo, P.; Schekochihin, A. A.; Allen, S. W.; Fabian, A. C.; Forman, W. R.; Sanders, J. S.; Simionescu, A.; Sunyaev, R.; Vikhlinin, A.; Werner, N.

    2015-05-20

    X-ray surface brightness fluctuations in the core of the Perseus Cluster are analysed, using deep observations with the Chandra observatory. The amplitude of gas density fluctuations on different scales is measured in a set of radial annuli. It varies from 7 to 12 per cent on scales of ~10–30 kpc within radii of 30–220 kpc from the cluster centre. Using a statistical linear relation between the observed amplitude of density fluctuations and predicted velocity, the characteristic velocity of gas motions on each scale is calculated. The typical amplitudes of the velocity outside the central 30 kpc region are 90–140 km s⁻¹ on ~20–30 kpc scales and 70–100 km s⁻¹ on smaller scales of ~7–10 kpc. The velocity power spectrum (PS) is consistent with a cascade of turbulence and its slope is in broad agreement with the slope for canonical Kolmogorov turbulence. The gas clumping factor estimated from the PS of the density fluctuations is lower than 7–8 per cent for radii ~30–220 kpc from the centre, leading to a density bias of less than 3–4 per cent in the cluster core. Uncertainties of the analysis are examined and discussed. Future measurements of the gas velocities with the Astro-H, Athena and Smart-X observatories will directly measure the gas density–velocity perturbation relation and further reduce systematic uncertainties in this analysis.
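
    The step from a clumping factor of 7–8 per cent to a density bias of 3–4 per cent can be checked with a few lines of arithmetic. The sketch below is not taken from the paper: it assumes the common definition C = <n²>/<n>², and that a density inferred from X-ray emission (which scales as n²) is overestimated by a factor of sqrt(C). The function name is hypothetical.

```python
import math

def density_bias(clumping_excess):
    """Fractional density overestimate for a clumping factor C = 1 + clumping_excess.

    Assumption (not stated in the abstract): the inferred density scales as
    sqrt(C) times the true mean density, so the bias is sqrt(C) - 1.
    """
    return math.sqrt(1.0 + clumping_excess) - 1.0

# The abstract quotes an upper limit of 7-8 per cent on C - 1.
for excess in (0.07, 0.08):
    print(f"C - 1 = {excess:.2f} -> density bias = {density_bias(excess):.3f}")
```

    Under these assumptions both values fall between 3 and 4 per cent, which matches the range quoted in the abstract.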

  19. Identifying clinical course patterns in SMS data using cluster analysis

    DEFF Research Database (Denmark)

    Kent, Peter; Kongsted, Alice

    2012-01-01

    ABSTRACT: BACKGROUND: Recently, there has been interest in using the short message service (SMS or text messaging), to gather frequent information on the clinical course of individual patients. One possible role for identifying clinical course patterns is to assist in exploring clinically important... by spline analysis. However, cluster analysis of SMS data in its original untransformed form may be simpler and offer other advantages. Therefore, the aim of this study was to determine whether cluster analysis could be used for identifying clinical course patterns distinct from the pattern of the whole... of cluster analysis. More research is needed, especially head-to-head studies, to identify which technique is best to use under what circumstances.

  20. Latent class factor and cluster models, bi-plots and tri-plots and related graphical displays

    NARCIS (Netherlands)

    Magidson, J.; Vermunt, J.K.

    2001-01-01

    We propose an alternative method of conducting exploratory latent class analysis that utilizes latent class factor models, and compare it to the more traditional approach based on latent class cluster models. We show that when formulated in terms of R mutually independent, dichotomous latent

  1. The diversity of young adult wheeze: a cluster analysis in a longitudinal birth cohort.

    Science.gov (United States)

    Kurukulaaratchy, R J; Zhang, H; Raza, A; Patil, V; Karmaus, W; Ewart, S; Arshad, S H

    2014-01-01

    Cluster analyses have enhanced understanding of the heterogeneity of both paediatric and adult wheezing. However, while adolescence represents an important transitional phase, the nature of young adult wheeze has yet to be clearly characterised. To use cluster analysis to define, for the first time, clinically relevant young adult wheeze clusters in a longitudinal birth cohort. K-means cluster analysis was undertaken among 309 currently wheezing subjects at 18 years in the Isle of Wight birth cohort (N = 1456). Thirteen disease-characterising clustering variables at 18 years were used. Resulting clusters were then further characterised by severity indices plus potential risk factors for wheeze development throughout the 1st 18 years of life. Six wheeze clusters were identified. Cluster 1 (12.3%) male-early-childhood-onset-atopic-wheeze-with-normal-lung-function had male predominance, normal spirometry, low bronchodilator reversibility (BDR), intermediate bronchial hyper-responsiveness (BHR), high atopy prevalence and more admissions. Cluster 2 (24.2%) early-childhood-onset-wheeze-with-intermediate-lung-function had no specific sex association, intermediate spirometry, BDR, BHR, more significant BTS step therapy and admissions. Cluster 3 (9.7%) female-early-childhood-onset-atopic-wheeze-with-impaired-lung-function showed female predominance, high allergic disease comorbidity, more severe BDR and BHR, greatest airflow obstruction, high smoking prevalence, higher symptom severity and admissions. Cluster 4 (19.4%) female-undiagnosed-wheezers had adolescent-onset non-atopic wheeze, low BDR and BHR, impaired but non-obstructed spirometry, high symptom frequency and highest smoking prevalence. Cluster 5 (24.6%) female-late-childhood-onset-wheeze-with-normal-lung-function showed no specific atopy association, normal spirometry, low BDR, BHR and symptom severity. Cluster 6 (9.7%) male-late-childhood-onset-atopic-wheeze-with-impaired-lung-function had high atopy and

  2. Sputum neutrophil counts are associated with more severe asthma phenotypes using cluster analysis.

    Science.gov (United States)

    Moore, Wendy C; Hastie, Annette T; Li, Xingnan; Li, Huashi; Busse, William W; Jarjour, Nizar N; Wenzel, Sally E; Peters, Stephen P; Meyers, Deborah A; Bleecker, Eugene R

    2014-06-01

    Clinical cluster analysis from the Severe Asthma Research Program (SARP) identified 5 asthma subphenotypes that represent the severity spectrum of early-onset allergic asthma, late-onset severe asthma, and severe asthma with chronic obstructive pulmonary disease characteristics. Analysis of induced sputum from a subset of SARP subjects showed 4 sputum inflammatory cellular patterns. Subjects with concurrent increases in eosinophil (≥2%) and neutrophil (≥40%) percentages had characteristics of very severe asthma. To better understand interactions between inflammation and clinical subphenotypes, we integrated inflammatory cellular measures and clinical variables in a new cluster analysis. Participants in SARP who underwent sputum induction at 3 clinical sites were included in this analysis (n = 423). Fifteen variables, including clinical characteristics and blood and sputum inflammatory cell assessments, were selected using factor analysis for unsupervised cluster analysis. Four phenotypic clusters were identified. Cluster A (n = 132) and B (n = 127) subjects had mild-to-moderate early-onset allergic asthma with paucigranulocytic or eosinophilic sputum inflammatory cell patterns. In contrast, these inflammatory patterns were present in only 7% of cluster C (n = 117) and D (n = 47) subjects who had moderate-to-severe asthma with frequent health care use despite treatment with high doses of inhaled or oral corticosteroids and, in cluster D, reduced lung function. The majority of these subjects (>83%) had sputum neutrophilia either alone or with concurrent sputum eosinophilia. Baseline lung function and sputum neutrophil percentages were the most important variables determining cluster assignment. This multivariate approach identified 4 asthma subphenotypes representing the severity spectrum from mild-to-moderate allergic asthma with minimal or eosinophil-predominant sputum inflammation to moderate-to-severe asthma with neutrophil-predominant or mixed granulocytic

  3. Cluster analysis of undergraduate drinkers based on alcohol expectancy scores.

    Science.gov (United States)

    Leeman, Robert F; Kulesza, Magdalena; Stewart, Diana W; Copeland, Amy L

    2012-03-01

    Expectancies of alcohol's effects have been associated with problem drinking in undergraduates. If subgroups can be classified based on expectancies, this may facilitate identifying those at highest risk for problem drinking. Undergraduates (N = 612) from two state universities completed a web-based survey. Responses to the Comprehensive Effects of Alcohol scale were analyzed using k-means cluster analysis separately within each university sample. Hartigan's heuristic was used to determine that five was the optimal number of clusters in each sample. Clusters were distinguishable based on their overall magnitude of expectancy endorsement and by a tendency to endorse stronger positive than negative expectancies. Subsequent analyses were conducted to compare clusters on alcohol involvement and trait disinhibition. A cluster characterized by endorsement of positive and negative expectancies ("strong expectancy") was associated with a particularly problematic risk profile, specifically concerning difficulties with self-control (i.e., trait disinhibition and impaired control over alcohol use). A cluster with higher positive and lower negative expectancies reported frequent heavy drinking but appeared to be at lower risk than the strong expectancy cluster in a number of respects. Negative expectancy endorsement appeared to represent added risk above and beyond positive expectancies. Results suggest that both the magnitude and combination of expectancies endorsed by subgroups of undergraduate drinkers may relate to their risk level in terms of alcohol involvement and personality traits. These findings may have implications for interventions with young adult drinkers.
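
    The abstract names Hartigan's heuristic without spelling it out. The sketch below assumes the usual form of the rule: keep adding a cluster while H(k) = (W_k / W_{k+1} - 1)(n - k - 1) exceeds roughly 10, where W_k is the within-cluster sum of squares for k clusters and n is the number of observations. The function names, the deterministic farthest-point seeding and the toy data are illustrative assumptions, not details from the study.

```python
import math

def assign(points, centres):
    """Index of the nearest centre for every point."""
    return [min(range(len(centres)), key=lambda j: math.dist(p, centres[j]))
            for p in points]

def kmeans_wss(points, k, n_iter=100):
    """Lloyd's k-means; returns the within-cluster sum of squares."""
    # Deterministic seeding (an assumption for reproducibility):
    # first point, then repeatedly the point farthest from all chosen centres.
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centres)))
    for _ in range(n_iter):
        labels = assign(points, centres)
        new_centres = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centres.append(tuple(sum(x) / len(members)
                                         for x in zip(*members)))
            else:
                new_centres.append(centres[j])  # keep an empty cluster's centre
        if new_centres == centres:
            break
        centres = new_centres
    labels = assign(points, centres)
    return sum(math.dist(p, centres[lab]) ** 2 for p, lab in zip(points, labels))

def hartigan_k(points, k_max=10, threshold=10.0):
    """Smallest k for which adding one more cluster gives H(k) <= threshold."""
    n = len(points)
    k_max = min(k_max, n - 1)
    wss_k = kmeans_wss(points, 1)
    for k in range(1, k_max):
        wss_next = kmeans_wss(points, k + 1)
        if wss_next == 0.0:
            return k + 1  # k + 1 clusters fit the data exactly
        h = (wss_k / wss_next - 1.0) * (n - k - 1)
        if h <= threshold:
            return k
        wss_k = wss_next
    return k_max

# Toy data: three well-separated groups of five points each.
offsets = [(-0.5, 0.0), (0.5, 0.0), (0.0, -0.5), (0.0, 0.5), (0.2, 0.2)]
demo = [(cx + dx, cy + dy)
        for cx, cy in [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
        for dx, dy in offsets]
print(hartigan_k(demo))
```

    On real survey data one would normally run k-means from several random starts for each k and keep the lowest W_k; the fixed seeding here only keeps the example reproducible.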

  4. Bayesian Analysis of Two Stellar Populations in Galactic Globular Clusters III: Analysis of 30 Clusters

    CERN Document Server

    Wagner-Kaiser, R; Sarajedini, A; von Hippel, T; van Dyk, D A; Robinson, E; Stein, N; Jefferys, W H

    2016-01-01

    We use Cycle 21 Hubble Space Telescope (HST) observations and HST archival ACS Treasury observations of 30 Galactic Globular Clusters to characterize two distinct stellar populations. A sophisticated Bayesian technique is employed to simultaneously sample the joint posterior distribution of age, distance, and extinction for each cluster, as well as unique helium values for two populations within each cluster and the relative proportion of those populations. We find the helium differences among the two populations in the clusters fall in the range of ~0.04 to 0.11. Because adequate models varying in CNO are not presently available, we view these spreads as upper limits and present them with statistical rather than observational uncertainties. Evidence supports previous studies suggesting an increase in helium content concurrent with increasing mass of the cluster, and we also find that the proportion of the first population of stars increases with mass as well. Our results are examined in the context of proposed g...

  5. Principal Component Clustering Approach to Teaching Quality Discriminant Analysis

    Science.gov (United States)

    Xian, Sidong; Xia, Haibo; Yin, Yubo; Zhai, Zhansheng; Shang, Yan

    2016-01-01

    Teaching quality is the lifeline of higher education. Many universities have made effective progress in evaluating teaching quality. In this paper, we establish a students' evaluation of teaching (SET) discriminant analysis model and algorithm based on principal component clustering analysis. Additionally, we classify the SET…

  6. Cluster analysis of radionuclide concentrations in beach sand

    NARCIS (Netherlands)

    de Meijer, R.J.; James, I.; Jennings, P.J.; Keoyers, J.E.

    This paper presents a method in which natural radionuclide concentrations of beach sand minerals are traced along a stretch of coast by cluster analysis. This analysis yields two groups of mineral deposit with different origins. The method deviates from standard methods of following dispersal of

  7. Technology Clusters Exploration for Patent Portfolio through Patent Abstract Analysis

    Directory of Open Access Journals (Sweden)

    Gabjo Kim

    2016-12-01

    This study explores technology clusters through patent analysis. The aim of exploring technology clusters is to grasp competitors' levels of sustainable research and development (R&D) and establish a sustainable strategy for entering an industry. To achieve this, we first grouped the patent documents with similar technologies by applying affinity propagation (AP) clustering, which is effective when grouping large amounts of data. Next, in order to define the technology clusters, we adopted the term frequency-inverse document frequency (TF-IDF) weight, which lists the terms in order of importance. We collected the patent data of Korean electric car companies from the United States Patent and Trademark Office (USPTO) to verify our proposed methodology. As a result, our proposed methodology presents more detailed information on the Korean electric car industry than previous studies.
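
    To make the TF-IDF step concrete, here is a minimal sketch of ranking the terms that describe a cluster of documents. It assumes the plain weighting tf × ln(N / df) and whitespace tokenisation; the study's exact variant is not given in the abstract, and the toy "abstracts", cluster memberships and function names are invented for illustration. The clustering itself (affinity propagation in the study) is taken as already done.

```python
import math
from collections import Counter

def tfidf(docs):
    """One {term: weight} dict per document, using tf * ln(N / df)."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    weights = []
    for tokens in tokenised:
        tf = Counter(tokens)
        weights.append({term: (count / len(tokens)) * math.log(n_docs / df[term])
                        for term, count in tf.items()})
    return weights

def top_terms(weights, members, n=3):
    """Terms with the highest summed TF-IDF weight over one cluster's documents."""
    totals = Counter()
    for i in members:
        totals.update(weights[i])
    return [term for term, _ in totals.most_common(n)]

# Hypothetical patent abstracts and a hypothetical two-cluster grouping.
docs = [
    "battery cooling battery pack",
    "battery cell cooling",
    "motor winding motor rotor",
    "motor stator winding",
    "vehicle system control",
]
weights = tfidf(docs)
print(top_terms(weights, members=[0, 1]))
print(top_terms(weights, members=[2, 3]))
```

    Terms that occur in every document get a weight of zero under this formula, which is why generic vocabulary drops out of the cluster labels.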

  9. An Empirical Analysis of Rough Set Categorical Clustering Techniques.

    Science.gov (United States)

    Uddin, Jamal; Ghazali, Rozaida; Deris, Mustafa Mat

    2017-01-01

    Clustering a set of objects into homogeneous groups is a fundamental operation in data mining. Recently, much attention has been paid to categorical data clustering, where data objects are made up of non-numerical attributes. For categorical data clustering, rough set based approaches such as Maximum Dependency Attribute (MDA) and Maximum Significance Attribute (MSA) have outperformed predecessor approaches such as Bi-Clustering (BC), Total Roughness (TR) and Min-Min Roughness (MMR). This paper presents the limitations and issues of the MDA and MSA techniques on special types of data sets where both techniques fail to select, or face difficulty in selecting, their best clustering attribute. This analysis motivates the need for a better and more general rough set theory approach that can cope with the issues of MDA and MSA. Hence, an alternative technique named Maximum Indiscernible Attribute (MIA) for clustering categorical data using rough set indiscernibility relations is proposed. The novelty of the proposed approach is that, unlike other rough set theory techniques, it uses the domain knowledge of the data set. It is based on the concept of the indiscernibility relation combined with a number of clusters. To show the significance of the proposed approach, the effect of the number of clusters on rough accuracy, purity and entropy is described in the form of propositions. Moreover, ten different data sets from previously utilized research cases and the UCI repository are used for experiments. The results, produced in tabular and graphical forms, show that the proposed MIA technique provides better performance in selecting the clustering attribute in terms of purity, entropy, iterations, time, accuracy and rough accuracy. PMID:28068344

  10. Visualization methods for statistical analysis of microarray clusters

    Directory of Open Access Journals (Sweden)

    Li Kai

    2005-05-01

    Background: The most common method of identifying groups of functionally related genes in microarray data is to apply a clustering algorithm. However, it is impossible to determine which clustering algorithm is most appropriate to apply, and it is difficult to verify the results of any algorithm due to the lack of a gold standard. Appropriate data visualization tools can aid this analysis process, but existing visualization methods do not specifically address this issue. Results: We present several visualization techniques that incorporate meaningful statistics that are noise-robust for the purpose of analyzing the results of clustering algorithms on microarray data. These include a rank-based visualization method that is more robust to noise, a difference display method to aid assessments of cluster quality and detection of outliers, and a projection of high dimensional data into a three dimensional space in order to examine relationships between clusters. Our methods are interactive and are dynamically linked together for comprehensive analysis. Further, our approach applies to both protein and gene expression microarrays, and our architecture is scalable for use on both desktop/laptop screens and large-scale display devices. This methodology is implemented in GeneVAnD (Genomic Visual ANalysis of Datasets) and is available at http://function.princeton.edu/GeneVAnD. Conclusion: Incorporating relevant statistical information into data visualizations is key for analysis of large biological datasets, particularly because of high levels of noise and the lack of a gold standard for comparisons. We developed several new visualization techniques and demonstrated their effectiveness for evaluating cluster quality and relationships between clusters.

  11. Statistical analysis of bound companions in the Coma cluster

    Science.gov (United States)

    Mendelin, Martin; Binggeli, Bruno

    2017-08-01

    Aims: The rich and nearby Coma cluster of galaxies is known to have substructure. We aim to create a more detailed picture of this substructure by searching directly for bound companions around individual giant members. Methods: We have used two catalogs of Coma galaxies, one covering the cluster core for a detailed morphological analysis, another covering the outskirts. The separation limit between possible companions (secondaries) and giants (primaries) is chosen as M_B = -19 and M_R = -20, respectively for the two catalogs. We have created pseudo-clusters by shuffling positions or velocities of the primaries and search for significant over-densities of possible companions around giants by comparison with the data. This method was developed and applied first to the Virgo cluster. In a second approach we introduced a modified nearest neighbor analysis using several interaction parameters for all galaxies. Results: We find evidence for some excesses due to possible companions for both catalogs. Satellites are typically found among the faintest dwarfs (M_B [...] type giants (spirals) in the outskirts, which is expected in an infall scenario of cluster evolution. A rough estimate for an upper limit of bound galaxies within Coma is 2-4%, to be compared with 7% for Virgo. Conclusions: The results agree well with the expected low frequency of bound companions in a regular cluster such as Coma. To exploit the data more fully and reach more detailed insights into the physics of cluster evolution we suggest applying the method also to model clusters created by N-body simulations for comparison.
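
    The pseudo-cluster idea in the Methods can be illustrated with a small shuffling test. This is only a sketch of the general technique, not the authors' procedure: it shuffles the velocities of the primaries (the abstract mentions positions or velocities), and the companion criterion (a maximum projected separation `max_sep` and velocity difference `max_dv`), the function names and the toy catalogue are all hypothetical.

```python
import math
import random

def count_companions(primaries, secondaries, max_sep, max_dv):
    """Number of secondaries close to any primary in projected position and velocity."""
    n = 0
    for sx, sy, sv in secondaries:
        if any(math.hypot(sx - px, sy - py) <= max_sep and abs(sv - pv) <= max_dv
               for px, py, pv in primaries):
            n += 1
    return n

def shuffle_test(primaries, secondaries, max_sep, max_dv, n_shuffles=1000, seed=0):
    """Compare the observed companion count with velocity-shuffled pseudo-clusters."""
    rng = random.Random(seed)
    observed = count_companions(primaries, secondaries, max_sep, max_dv)
    velocities = [pv for _, _, pv in primaries]
    pseudo_counts = []
    for _ in range(n_shuffles):
        rng.shuffle(velocities)  # break any real position-velocity association
        pseudo = [(px, py, v) for (px, py, _), v in zip(primaries, velocities)]
        pseudo_counts.append(count_companions(pseudo, secondaries, max_sep, max_dv))
    mean_pseudo = sum(pseudo_counts) / n_shuffles
    # Fraction of pseudo-clusters with at least as many companions as observed.
    p_value = sum(c >= observed for c in pseudo_counts) / n_shuffles
    return observed, mean_pseudo, p_value

# Toy catalogue: 5 primaries, each with 2 secondaries nearby in position and velocity.
primaries = [(10.0 * i, 0.0, 5000.0 + 1000.0 * i) for i in range(5)]
secondaries = [(px + dx, py, pv + dv)
               for px, py, pv in primaries
               for dx, dv in ((0.5, 50.0), (-0.5, -50.0))]
observed, mean_pseudo, p_value = shuffle_test(primaries, secondaries,
                                              max_sep=1.0, max_dv=200.0)
print(observed, mean_pseudo, p_value)
```

    An excess of the observed count over the pseudo-cluster counts is what would be read as evidence for bound companions; in a real analysis the thresholds would come from the catalogues and the physics rather than being fixed by hand.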

  12. Application of Cluster Analysis in Assessment of Dietary Habits of Secondary School Students

    Directory of Open Access Journals (Sweden)

    Zalewska Magdalena

    2014-12-01

    Maintenance of proper health and prevention of diseases of civilization are now significant public health problems. Nutrition is an important factor in the development of youth, as well as the current and future state of health. The aim of the study was to show the benefits of the application of cluster analysis to assess the dietary habits of high school students. The survey was carried out on 1,631 eighteen-year-old students in seven randomly selected secondary schools in Bialystok using a self-prepared anonymous questionnaire. An evaluation of the time of day meals were eaten and the number of meals consumed was made for the surveyed students. The cluster analysis allowed distinguishing characteristic structures of dietary habits in the observed population. Four clusters were identified, which were characterized by relative internal homogeneity and substantial variation in terms of the number of meals during the day and the time of their consumption. The most important characteristics of cluster 1 were cumulated food ration in 2 or 3 meals and long intervals between meals. Cluster 2 was characterized by eating the recommended number of 4 or 5 meals a day. In the 3rd cluster, students ate 3 meals a day with large intervals between them, and in the 4th they had four meals a day while maintaining proper intervals between them. In all clusters dietary mistakes occurred, but most of them were related to clusters 1 and 3. Cluster analysis allowed for the identification of major flaws in nutrition, which may include irregular eating and skipping meals, and indicated possible connections between eating patterns and disturbances of body weight in the examined population.

  13. AVES: A Computer Cluster System approach for INTEGRAL Scientific Analysis

    Science.gov (United States)

    Federici, M.; Martino, B. L.; Natalucci, L.; Umbertini, P.

    The AVES computing system, based on a "cluster" architecture, is a fully integrated, low-cost computing facility dedicated to the archiving and analysis of INTEGRAL data. AVES is a modular system that uses the SLURM software resource manager and allows almost unlimited expandability (65,536 nodes and hundreds of thousands of processors); it is currently composed of 30 personal computers with quad-core CPUs, able to reach a computing power of 300 gigaflops (300 × 10^9 floating-point operations per second), with 120 GB of RAM and 7.5 terabytes (TB) of storage in UFS configuration plus 6 TB for the users' area. AVES was designed and built to solve the growing problems raised by the analysis of the large amount of data accumulated by the INTEGRAL mission (currently about 9 TB), which is set to increase every year. The analysis software used is the OSA package, distributed by the ISDC in Geneva. This is a very complex package consisting of dozens of programs that cannot be converted to parallel computing. To overcome this limitation we developed a series of programs that distribute the analysis workload across the nodes, so that AVES automatically divides the analysis into N jobs sent to N cores. This solution produces a result similar to that obtained with a parallel computing configuration. In support of this we have developed tools that allow flexible use of the scientific software and quality control of on-line data storage. The AVES software package consists of about 50 specific programs. As a result, the overall computing time, compared with that of a single-processor personal computer, has been improved by up to a factor of 70.
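
    The "N jobs sent to N cores" idea amounts to splitting a list of independent work units into one job per core. The sketch below shows one simple way to do that split, together with the arithmetic behind the quoted figures. It is an illustration only: the abstract does not say how AVES partitions the work, the unit names and function name are hypothetical, and the efficiency estimate assumes the factor of 70 was measured with all 120 cores in use.

```python
def split_jobs(work_units, n_cores):
    """Round-robin split of independent work units into n_cores job lists."""
    return [work_units[i::n_cores] for i in range(n_cores)]

# Figures from the abstract: 30 personal computers with quad-core CPUs.
n_nodes, cores_per_node = 30, 4
n_cores = n_nodes * cores_per_node

# Hypothetical work units standing in for independent pieces of the analysis.
units = [f"unit_{i:04d}" for i in range(1000)]
jobs = split_jobs(units, n_cores)

# Quoted speed-up relative to a single-processor machine.
speedup = 70
efficiency = speedup / n_cores

print(n_cores, len(jobs), min(map(len, jobs)), max(map(len, jobs)))
print(f"parallel efficiency = {efficiency:.2f}")
```

    Because each job runs the unmodified serial software on its own subset, no change to the analysis package is needed; the cost is that load balance depends on how evenly the work units divide.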

  14. Research on Collective Learning Mechanism and Influencing Factors of Industrial Cluster Innovation Network

    Directory of Open Access Journals (Sweden)

    Lan Wang

    2013-02-01

    Full Text Available This study attempts to contribute to the cluster innovation literature by adding the collective learning perspective and proposing an analytical framework for the collective learning of clusters. The industrial cluster is viewed as a prevalent mode of technology innovation in the knowledge-based economy. Collective learning outlines how the local innovation network and spatial proximity between actors influence the sharing and creation of skills and knowledge in a cluster. Firstly, this study discusses the structure and character of the innovation network within an industrial cluster. Secondly, it analyzes the collective learning mechanism of the industrial cluster, which involves three dimensions: horizontal learning, vertical learning and multi-angle learning. Then, it focuses on some factors influencing collective learning within the innovation network. Finally, this study analyzes the role of global-local linkages in the dynamic capability of the cluster innovation network.

  15. Traffic Accident, System Model and Cluster Analysis in GIS

    Directory of Open Access Journals (Sweden)

    Veronika Vlčková

    2015-07-01

    Full Text Available Traffic accidents are a frequent topic both in general journalism and among the professional public. This article directs attention to a less familiar context of accidents, with the help of constructive systems theory and its methods, cluster analysis and geoinformation engineering. A traffic accident is framed in space and time, and it can therefore be studied with the tools of geographic information systems technology. The system approach enables the formulation of a system model, which is handled with the tools of geoinformation engineering and of multicriteria and cluster analysis.

  16. Key Factors for the Successful Operation of Clusters: The Case for Slovenia

    Directory of Open Access Journals (Sweden)

    Gajšek Brigita

    2016-05-01

    Full Text Available Background and Purpose: Companies are increasingly specializing and developing those key areas with which they can compete on the global market, and are linking up in clusters, which are an ingredient of territorial competitiveness. Clusters can play a competitive role in global value chains, but once successful, they may decline. For this reason, researching key factors for the successful operation of clusters in Slovenia is beneficial.

  17. Key Factors for the Successful Operation of Clusters: The Case for Slovenia

    OpenAIRE

    Gajšek Brigita; Kovač Jure

    2016-01-01

    Background and Purpose: Companies are increasingly specializing and developing those key areas with which they can compete on the global market, and are linking up in clusters, which are an ingredient of territorial competitiveness. Clusters can play a competitive role in global value chains, but once successful, they may decline. For this reason, researching key factors for the successful operation of clusters in Slovenia is beneficial.

  18. Cluster Analysis of Metal Concentrations in River Kubanni Zaria, Nigeria

    Directory of Open Access Journals (Sweden)

    A.W. Butu

    2013-08-01

    Full Text Available Cluster analysis was used to assess the degree of association of the metal concentrations in River Kubanni, Zaria, Nigeria. The main sources of data for the analysis were sediments from four distinct locations along the long profile of the Kubanni River, which were analyzed using Instrumental Neutron Activation Analysis (INAA) techniques. The Nigerian Research Reactor-1 (NIRR-1), which is a Miniature Neutron Source Reactor (MNSR), was used to analyze the data. The result of the laboratory analysis was subjected to cluster analysis. The analysis shows a stable clustering system in which the metal concentrations in the four different locations were grouped into two main groups with one outlier. The concentrations of elements sampled in the dry months were clustered in group I and those collected in the rainy months were in group II. This strongly supports a temporal variation in the concentration levels of metal contaminants between wet and dry seasons in River Kubanni, and also confirms that the elements collected in the wet season are from the same source and those in the dry season are also from a common source.

  19. Cluster-cluster clustering

    Science.gov (United States)

    Barnes, J.; Dekel, A.; Efstathiou, G.; Frenk, C. S.

    1985-01-01

    The cluster correlation function xi sub c(r) is compared with the particle correlation function, xi(r) in cosmological N-body simulations with a wide range of initial conditions. The experiments include scale-free initial conditions, pancake models with a coherence length in the initial density field, and hybrid models. Three N-body techniques and two cluster-finding algorithms are used. In scale-free models with white noise initial conditions, xi sub c and xi are essentially identical. In scale-free models with more power on large scales, it is found that the amplitude of xi sub c increases with cluster richness; in this case the clusters give a biased estimate of the particle correlations. In the pancake and hybrid models (with n = 0 or 1), xi sub c is steeper than xi, but the cluster correlation length exceeds that of the points by less than a factor of 2, independent of cluster richness. Thus the high amplitude of xi sub c found in studies of rich clusters of galaxies is inconsistent with white noise and pancake models and may indicate a primordial fluctuation spectrum with substantial power on large scales.

  20. Cluster-cluster clustering

    Energy Technology Data Exchange (ETDEWEB)

    Barnes, J.; Dekel, A.; Efstathiou, G.; Frenk, C.S.

    1985-08-01

    The cluster correlation function xi sub c(r) is compared with the particle correlation function, xi(r) in cosmological N-body simulations with a wide range of initial conditions. The experiments include scale-free initial conditions, pancake models with a coherence length in the initial density field, and hybrid models. Three N-body techniques and two cluster-finding algorithms are used. In scale-free models with white noise initial conditions, xi sub c and xi are essentially identical. In scale-free models with more power on large scales, it is found that the amplitude of xi sub c increases with cluster richness; in this case the clusters give a biased estimate of the particle correlations. In the pancake and hybrid models (with n = 0 or 1), xi sub c is steeper than xi, but the cluster correlation length exceeds that of the points by less than a factor of 2, independent of cluster richness. Thus the high amplitude of xi sub c found in studies of rich clusters of galaxies is inconsistent with white noise and pancake models and may indicate a primordial fluctuation spectrum with substantial power on large scales. 30 references.

  1. Alpha-cluster preformation factor within cluster-formation model for odd-A and odd-odd heavy nuclei

    Science.gov (United States)

    Saleh Ahmed, Saad M.

    2017-06-01

    The alpha-cluster probability that represents the preformation of alpha particle in alpha-decay nuclei was determined for high-intensity alpha-decay mode odd-A and odd-odd heavy nuclei, 82 work. Our previous successful determination of phenomenological values of alpha-cluster preformation factors for even-even nuclei motivated us to expand the work to cover other types of nuclei. The formation energy of interior alpha cluster needed to be derived for the different nuclear systems with considering the unpaired-nucleon effect. The results showed the phenomenological value of alpha preformation probability and reflected the unpaired nucleon effect and the magic and sub-magic effects in nuclei. These results and their analyses presented are very useful for future work concerning the calculation of the alpha decay constants and the progress of its theory.

  2. Analysis of traffic accidents on rural highways using Latent Class Clustering and Bayesian Networks.

    Science.gov (United States)

    de Oña, Juan; López, Griselda; Mujalli, Randa; Calvo, Francisco J

    2013-03-01

    One of the principal objectives of traffic accident analyses is to identify key factors that affect the severity of an accident. However, with the presence of heterogeneity in the raw data used, the analysis of traffic accidents becomes difficult. In this paper, Latent Class Cluster (LCC) is used as a preliminary tool for segmentation of 3229 accidents on rural highways in Granada (Spain) between 2005 and 2008. Next, Bayesian Networks (BNs) are used to identify the main factors involved in accident severity for both the entire database (EDB) and the clusters previously obtained by LCC. The results of these cluster-based analyses are compared with the results of a full-data analysis. The results show that the combined use of both techniques is valuable, as it reveals further information that would not have been obtained without prior segmentation of the data. BN inference is used to obtain the variables that best identify accidents with killed or seriously injured victims. Accident type and sight distance have been identified in all the cases analysed; other variables such as time, occupant involved or age are identified in the EDB and in only one cluster; whereas the variables vehicles involved, number of injuries, atmospheric factors, pavement markings and pavement width are identified only in one cluster. Copyright © 2012 Elsevier Ltd. All rights reserved.

  3. Application of microarray analysis on computer cluster and cloud platforms.

    Science.gov (United States)

    Bernau, C; Boulesteix, A-L; Knaus, J

    2013-01-01

    Analysis of recent high-dimensional biological data tends to be computationally intensive as many common approaches such as resampling or permutation tests require the basic statistical analysis to be repeated many times. A crucial advantage of these methods is that they can be easily parallelized due to the computational independence of the resampling or permutation iterations, which has induced many statistics departments to establish their own computer clusters. An alternative is to rent computing resources in the cloud, e.g. at Amazon Web Services. In this article we analyze whether a selection of statistical projects, recently implemented at our department, can be efficiently realized on these cloud resources. Moreover, we illustrate an opportunity to combine computer cluster and cloud resources. In order to compare the efficiency of computer cluster and cloud implementations and their respective parallelizations we use microarray analysis procedures and compare their runtimes on the different platforms. Amazon Web Services provide various instance types which meet the particular needs of the different statistical projects we analyzed in this paper. Moreover, the network capacity is sufficient and the parallelization is comparable in efficiency to standard computer cluster implementations. Our results suggest that many statistical projects can be efficiently realized on cloud resources. It is important to mention, however, that workflows can change substantially as a result of a shift from computer cluster to cloud computing.

  4. Bayesian analysis of two stellar populations in Galactic globular clusters- III. Analysis of 30 clusters

    Science.gov (United States)

    Wagner-Kaiser, R.; Stenning, D. C.; Sarajedini, A.; von Hippel, T.; van Dyk, D. A.; Robinson, E.; Stein, N.; Jefferys, W. H.

    2016-12-01

    We use Cycle 21 Hubble Space Telescope (HST) observations and HST archival ACS Treasury observations of 30 Galactic globular clusters to characterize two distinct stellar populations. A sophisticated Bayesian technique is employed to simultaneously sample the joint posterior distribution of age, distance, and extinction for each cluster, as well as unique helium values for two populations within each cluster and the relative proportion of those populations. We find the helium differences among the two populations in the clusters fall in the range of ~0.04 to 0.11. Because adequate models varying in carbon, nitrogen, and oxygen are not presently available, we view these spreads as upper limits and present them with statistical rather than observational uncertainties. Evidence supports previous studies suggesting an increase in helium content concurrent with increasing mass of the cluster and we also find that the proportion of the first population of stars increases with mass as well. Our results are examined in the context of proposed globular cluster formation scenarios. Additionally, we leverage our Bayesian technique to shed light on the inconsistencies between the theoretical models and the observed data.

  5. Structural factors of solar system cluster ground coupled storage rationalization

    Directory of Open Access Journals (Sweden)

    Viktor V. Wysochin

    2015-12-01

    Full Text Available Computational investigations of unsteady heat transfer in a seasonal solar heat storage system were conducted. The storage system consists of nine ground heat exchangers. The investigations were made for periodic diurnal-cycle charging during the summer season. Each heat exchanger is represented as a vertical probe with a concentric tube arrangement. Aim: The aim of the work is to optimize the cluster ground-coupled storage (the number of probes in the cluster, their lengths and the interval between them) using a high-precision mathematical model. Materials and Methods: The mathematical model of the joint operation of the solar system and the ground-coupled storage involves differential equations describing the arrival and conversion of solar energy in the solar collector. It also includes the heat exchange in the ground heat exchangers and the three-dimensional soil mass. Results: The need to account for the mutual influence of the solar collector and the size ranges of the ground heat exchangers is shown. The results also show that the effectiveness of the collector can be improved by selecting a reasonable step size for the cluster and the number of active heat exchangers in the required construction. Conclusions: Recommendations for organizing the operation of the heat exchangers with the collector are offered. The five-probe structure is the most effective one for the cluster arrangement of seasonal heat storage. The recommended interval between probes is 4 meters.

  6. Examination of European Union economic cohesion: A cluster analysis approach

    Directory of Open Access Journals (Sweden)

    Jiri Mazurek

    2014-01-01

    Full Text Available In recent years the majority of EU members experienced the deepest economic decline in their modern history, but the impacts of the global financial crisis were not distributed homogeneously across the continent. The aim of the paper is to examine the cohesion of the European Union (plus Norway and Iceland) in terms of the economic development of its members from 1 January 2008 to 31 December 2012. Five economic indicators were selected for the study: GDP growth, unemployment, inflation, labour productivity and government debt. Annual data from Eurostat databases were averaged over the whole period and then used as input for a cluster analysis. It was found that the EU countries were divided into six different clusters. The most populated cluster, with 14 countries, covered Central and Western Europe and reflected the relative homogeneity of this part of Europe. Countries of Southern Europe (Greece, Portugal and Spain) shared their own cluster of the countries most affected by the recent crisis, as did the Baltic and Balkan states in another cluster. On the other hand, Slovakia and Poland, the only two countries that escaped a recession, were classified in their own cluster of the most successful countries.

  7. Sun Protection Belief Clusters: Analysis of Amazon Mechanical Turk Data.

    Science.gov (United States)

    Santiago-Rivas, Marimer; Schnur, Julie B; Jandorf, Lina

    2016-12-01

    This study aimed (i) to determine whether people could be differentiated on the basis of their sun protection belief profiles and individual characteristics and (ii) explore the use of a crowdsourcing web service for the assessment of sun protection beliefs. A sample of 500 adults completed an online survey of sun protection belief items using Amazon Mechanical Turk. A two-phased cluster analysis (i.e., hierarchical and non-hierarchical K-means) was utilized to determine clusters of sun protection barriers and facilitators. Results yielded three distinct clusters of sun protection barriers and three distinct clusters of sun protection facilitators. Significant associations between gender, age, sun sensitivity, and cluster membership were identified. Results also showed an association between barrier and facilitator cluster membership. The results of this study provided a potential alternative approach to developing future sun protection promotion initiatives in the population. Findings add to our knowledge regarding individuals who support, oppose, or are ambivalent toward sun protection and inform intervention research by identifying distinct subtypes that may best benefit from (or have a higher need for) skin cancer prevention efforts.
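    A two-phased procedure of this kind (a hierarchical pass followed by K-means) can be sketched as below. This is a minimal illustration under assumptions the abstract does not state: centroid linkage with squared Euclidean distance is used for the hierarchical phase, the resulting centroids seed Lloyd's K-means, and all function names are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def hierarchical_seeds(points, k):
    """Phase 1: merge the two clusters with the closest centroids until k remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sq_dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [centroid(c) for c in clusters]


def k_means(points, seeds, max_iter=100):
    """Phase 2: refine the seeds with Lloyd's K-means iterations."""
    centres = list(seeds)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centres)), key=lambda c: sq_dist(p, centres[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties
                centres[c] = centroid(members)
    return labels, centres


def two_phase_cluster(points, k):
    """Hierarchical seeding followed by K-means refinement."""
    return k_means(points, hierarchical_seeds(points, k))
```

    Seeding K-means from a hierarchical solution removes the dependence on random starting centres, which is one common motivation for combining the two phases.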

  8. Application of cluster analysis to preventive maintenance scheme design of pavement

    Institute of Scientific and Technical Information of China (English)

    ZENG Feng; ZHANG Xiao-ning

    2009-01-01

    To quantitatively identify the maintenance demand for each highway segment in pavement maintenance scheme design, a mathematical model of uniform segment division was established, and an approach applying cluster analysis theory to uniform segment division and the evaluation of pavement maintenance demand was proposed. An actual maintenance project for a highway in Guangdong province was cited as an example to demonstrate the validity of the proposed method. It is shown that cluster analysis can eliminate human factors in classification without being constrained by the number of samples, while considering multiple pavement distress indexes and the continuity of samples. Thus cluster analysis is an efficient analytical tool for uniform segment division and the evaluation of maintenance demand.

  9. Analysis of External Environmental Factors of the Evolution of Resource-based Industrial Clusters

    Institute of Scientific and Technical Information of China (English)

    陈振; 严良; 谢雄标

    2011-01-01

    Resource-based industrial clusters are network groups formed by many mutually cooperating and competing enterprises or organizations that gather in resource-producing areas, with resource industries at their core and the development and processing of natural resources as their main activity. They face a relatively complex external environment. To better understand that environment, the paper proposes a model of the external environmental factors in the evolution of resource-based industrial clusters, on the basis of a synthesis of academic research and extensive field investigation. It points out that the influencing factors include government behavior, financing environment, technological progress in the industry, development of related industries, market demand, opening policy, infrastructure, ecological environment, and public awareness. In terms of methodology, the paper applies the DEMATEL method to analyze the model and distributes questionnaires to relevant experts to determine the direct influence relationships among the external environmental factors. After computing each factor's comprehensive influence within the model, it concludes that government can play a key role in the evolution of resource-based industrial clusters. Finally, the paper offers corresponding policy recommendations to strengthen the government's role in promoting that evolution.
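    The standard DEMATEL computation can be sketched as follows; this is an illustration of the textbook method, not the paper's own calculation. The direct-influence matrix A (averaged expert scores) is normalized to N by dividing by the largest row or column sum, the total-relation matrix is T = N(I - N)^(-1), and each factor's prominence and net relation are D + R and D - R, where D and R are the row and column sums of T. The function names and any scores passed in are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def dematel(direct):
    """Return (total-relation matrix, prominence D+R, relation D-R)."""
    n = len(direct)
    row_sums = [sum(row) for row in direct]
    col_sums = [sum(direct[i][j] for i in range(n)) for j in range(n)]
    scale = max(max(row_sums), max(col_sums))
    norm = [[v / scale for v in row] for row in direct]
    i_minus_n = [[(1.0 if i == j else 0.0) - norm[i][j] for j in range(n)]
                 for i in range(n)]
    total = mat_mul(norm, invert(i_minus_n))
    d = [sum(row) for row in total]                              # influence given
    r = [sum(total[i][j] for i in range(n)) for j in range(n)]   # influence received
    prominence = [di + ri for di, ri in zip(d, r)]
    relation = [di - ri for di, ri in zip(d, r)]
    return total, prominence, relation
```

    A factor with a positive relation value acts mainly as a cause and one with a negative value mainly as an effect, which is the kind of reading that would single out a factor such as government behavior as a key driver.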

  10. Regulatory coordination of clustered microRNAs based on microRNA-transcription factor regulatory network

    Directory of Open Access Journals (Sweden)

    Wang Jin

    2011-12-01

    Full Text Available Abstract Background MicroRNA (miRNA) is a class of small RNAs of ~22nt which play essential roles in many crucial biological processes and numerous human diseases at post-transcriptional level of gene expression. It has been revealed that miRNA genes tend to be clustered, and the miRNAs organized into one cluster are usually transcribed coordinately. This implies a coordinated regulation mode exerted by clustered miRNAs. However, how the clustered miRNAs coordinate their regulations on large scale gene expression is still unclear. Results We constructed the miRNA-transcription factor regulatory network that contains the interactions between transcription factors (TFs), miRNAs and non-TF protein-coding genes, and made a genome-wide study on the regulatory coordination of clustered miRNAs. We found that there are two types of miRNA clusters, i.e. homo-clusters that contain miRNAs of the same family and hetero-clusters that contain miRNAs of various families. In general, the homo-clustered as well as the hetero-clustered miRNAs both exhibit coordinated regulation since the miRNAs belonging to one cluster tend to be involved in the same network module, which performs a relatively isolated biological function. However, the homo-clustered miRNAs show a direct regulatory coordination that is realized by one-step regulation (i.e. the direct regulation of the coordinated targets), whereas the hetero-clustered miRNAs show an indirect regulatory coordination that is realized by a regulation comprising at least three steps (e.g. the regulation on the coordinated targets by a miRNA through a sequential action of two TFs). The direct and indirect regulation target different categories of genes, the former predominantly regulating genes involved in emergent responses, the latter targeting genes that imply long-term effects. Conclusion The genomic clustering of miRNAs is closely related to the coordinated regulation in the gene regulatory network. The pattern of

  11. A Geometric Analysis of Subspace Clustering with Outliers

    CERN Document Server

    Soltanolkotabi, Mahdi

    2011-01-01

    This paper considers the problem of clustering a collection of unlabeled data points assumed to lie near a union of lower dimensional planes. As is common in computer vision or unsupervised learning applications, we do not know in advance how many subspaces there are nor do we have any information about their dimensions. We develop a novel geometric analysis of an algorithm named sparse subspace clustering (SSC) [Elhamifar09], which significantly broadens the range of problems where it is provably effective. For instance, we show that SSC can recover multiple subspaces, each of dimension comparable to the ambient dimension. We also prove that SSC can correctly cluster data points even when the subspaces of interest intersect. Further, we develop an extension of SSC that succeeds when the data set is corrupted with possibly overwhelmingly many outliers. Underlying our analysis are clear geometric insights, which may bear on other sparse recovery problems. A numerical study complements our theoretica...

  12. Cluster analysis of knowledge sources in standardized electrical engineering subfields

    Directory of Open Access Journals (Sweden)

    Blagojević Marija

    2016-01-01

    Full Text Available The paper presents a cluster analysis of the innovation of knowledge sources based on standards in the field of Electrical Engineering. Both local (SRPS) and global (ISO) knowledge sources have been analysed with the aim of innovating a Knowledge Base (KB). The results presented indicate a means of grouping the subfields within a cluster. They also point to a trend or intensity of knowledge source innovation for the purpose of innovating the KB that accompanies innovations. The study provides the possibility of predicting the financial resources needed in the forthcoming period by means of original mathematical relations. Furthermore, the cluster analysis facilitates the comparison of innovation intensity in this and other (sub)fields. Future work relates to monitoring knowledge source innovation by means of KB engineering and to improving the prediction methodology using neural networks.

  13. Cluster analysis of passive air sampling data based on the relative composition of persistent organic pollutants.

    Science.gov (United States)

    Liu, Xiande; Wania, Frank

    2014-03-01

    The development of passive air samplers has allowed the measurement of time-integrated concentrations of persistent organic pollutants (POPs) within spatial networks on a variety of scales. Cluster analysis of POP composition may enhance the interpretation of such spatial data. Several methodological aspects of the application of cluster analysis are discussed, including the influence of a dominant pollutant, the role of PAS duplication, and comparison of regional studies. Relying on data from six regional studies in North and South America, Africa, and Asia, we illustrate here how cluster analysis can be used to extract information and gain insights into POP sources and atmospheric transport contributions. Cluster analysis allows classification of PAS samples into those with significant local source contributions and those that represent regional fingerprints. Local emissions, atmospheric transport, and seasonal cycles are identified as being among the major factors determining the variation in POP composition at many sites. By complementing cluster analysis with meteorological data such as air mass back-trajectories, terrain, as well as geographical and socio-economic aspects, a comprehensive picture of the atmospheric contamination of a region by POPs emerges.

  14. Cluster analysis of WIBS single-particle bioaerosol data

    Science.gov (United States)

    Robinson, N. H.; Allan, J. D.; Huffman, J. A.; Kaye, P. H.; Foot, V. E.; Gallagher, M.

    2013-02-01

    Hierarchical agglomerative cluster analysis was performed on single-particle multi-spatial data sets comprising optical diameter, asymmetry and three different fluorescence measurements, gathered using two dual Wideband Integrated Bioaerosol Sensors (WIBSs). The technique is demonstrated on measurements of various fluorescent and non-fluorescent polystyrene latex spheres (PSL) before being applied to two separate contemporaneous ambient WIBS data sets recorded in a forest site in Colorado, USA, as part of the BEACHON-RoMBAS project. Cluster analysis results between both data sets are consistent. Clusters are tentatively interpreted by comparison of concentration time series and cluster average measurement values to the published literature (of which there is a paucity) to represent the following: non-fluorescent accumulation mode aerosol; bacterial agglomerates; and fungal spores. To our knowledge, this is the first time cluster analysis has been applied to long-term online primary biological aerosol particle (PBAP) measurements. The novel application of this clustering technique provides a means for routinely reducing WIBS data to discrete concentration time series which are more easily interpretable, without the need for any a priori assumptions concerning the expected aerosol types. It can reduce the level of subjectivity compared to the more standard analysis approaches, which are typically performed by simple inspection of various ensemble data products. It also has the advantage of potentially resolving less populous or subtly different particle types. This technique is likely to become more robust in the future as fluorescence-based aerosol instrumentation measurement precision, dynamic range and the number of available metrics are improved.

  15. Cluster analysis of WIBS single-particle bioaerosol data

    Directory of Open Access Journals (Sweden)

    N. H. Robinson

    2013-02-01

    Full Text Available Hierarchical agglomerative cluster analysis was performed on single-particle multi-spatial data sets comprising optical diameter, asymmetry and three different fluorescence measurements, gathered using two dual Wideband Integrated Bioaerosol Sensors (WIBSs). The technique is demonstrated on measurements of various fluorescent and non-fluorescent polystyrene latex spheres (PSL) before being applied to two separate contemporaneous ambient WIBS data sets recorded in a forest site in Colorado, USA, as part of the BEACHON-RoMBAS project. Cluster analysis results between both data sets are consistent. Clusters are tentatively interpreted by comparison of concentration time series and cluster average measurement values to the published literature (of which there is a paucity) to represent the following: non-fluorescent accumulation mode aerosol; bacterial agglomerates; and fungal spores. To our knowledge, this is the first time cluster analysis has been applied to long-term online primary biological aerosol particle (PBAP) measurements. The novel application of this clustering technique provides a means for routinely reducing WIBS data to discrete concentration time series which are more easily interpretable, without the need for any a priori assumptions concerning the expected aerosol types. It can reduce the level of subjectivity compared to the more standard analysis approaches, which are typically performed by simple inspection of various ensemble data products. It also has the advantage of potentially resolving less populous or subtly different particle types. This technique is likely to become more robust in the future as fluorescence-based aerosol instrumentation measurement precision, dynamic range and the number of available metrics are improved.

  16. Joint Sequence Analysis: Association and Clustering

    Science.gov (United States)

    Piccarreta, Raffaella

    2017-01-01

    In its standard formulation, sequence analysis aims at finding typical patterns in a set of life courses represented as sequences. Recently, some proposals have been introduced to jointly analyze sequences defined on different domains (e.g., work career, partnership, and parental histories). We introduce measures to evaluate whether a set of…

  17. Cluster Analysis of Clinical Data Identifies Fibromyalgia Subgroups

    Science.gov (United States)

    Docampo, Elisa; Collado, Antonio; Escaramís, Geòrgia; Carbonell, Jordi; Rivera, Javier; Vidal, Javier; Alegre, José

    2013-01-01

    Introduction Fibromyalgia (FM) is mainly characterized by widespread pain and multiple accompanying symptoms, which hinder FM assessment and management. In order to reduce FM heterogeneity we classified clinical data into simplified dimensions that were used to define FM subgroups. Material and Methods 48 variables were evaluated in 1,446 Spanish FM cases fulfilling 1990 ACR FM criteria. A partitioning analysis was performed to find groups of variables similar to each other. Similarities between variables were identified and the variables were grouped into dimensions. This was performed in a subset of 559 patients, and cross-validated in the remaining 887 patients. For each sample and dimension, a composite index was obtained based on the weights of the variables included in the dimension. Finally, a clustering procedure was applied to the indexes, resulting in FM subgroups. Results Variables clustered into three independent dimensions: “symptomatology”, “comorbidities” and “clinical scales”. Only the first two dimensions were considered for the construction of FM subgroups. Resulting scores classified FM samples into three subgroups: low symptomatology and comorbidities (Cluster 1), high symptomatology and comorbidities (Cluster 2), and high symptomatology but low comorbidities (Cluster 3), showing differences in measures of disease severity. Conclusions We have identified three subgroups of FM samples in a large cohort of FM by clustering clinical data. Our analysis stresses the importance of family and personal history of FM comorbidities. Also, the resulting patient clusters could indicate different forms of the disease, relevant to future research, and might have an impact on clinical assessment. PMID:24098674

  18. Transcriptional analysis of ESAT-6 cluster 3 in Mycobacterium smegmatis

    Directory of Open Access Journals (Sweden)

    Riccardi Giovanna

    2009-03-01

    Full Text Available Abstract Background The ESAT-6 (early secreted antigenic target, 6 kDa) family collects small mycobacterial proteins secreted by Mycobacterium tuberculosis, particularly in the early phase of growth. There are 23 ESAT-6 family members in M. tuberculosis H37Rv. In a previous work, we identified the Zur-dependent regulation of five proteins of the ESAT-6/CFP-10 family (esxG, esxH, esxQ, esxR, and esxS). esxG and esxH are part of ESAT-6 cluster 3, whose expression was already known to be induced by iron starvation. Results In this research, we performed EMSA experiments and transcriptional analysis of ESAT-6 cluster 3 in Mycobacterium smegmatis (msmeg0615-msmeg0625) and M. tuberculosis. In contrast to what we had observed in M. tuberculosis, we found that in M. smegmatis ESAT-6 cluster 3 responds only to iron and not to zinc. In both organisms we identified an internal promoter, a finding which suggests the presence of two transcriptional units and, by consequence, a differential expression of cluster 3 genes. We compared the expression of msmeg0615 and msmeg0620 in different growth and stress conditions by means of relative quantitative PCR. The expression of msmeg0615 and msmeg0620 genes was essentially similar; they appeared to be repressed in most of the tested conditions, with the exception of acid stress (pH 4.2), where msmeg0615 was about 4-fold induced, while msmeg0620 was repressed. Analysis revealed that in acid stress conditions M. tuberculosis rv0282 gene was 3-fold induced too, while rv0287 induction was almost insignificant. Conclusion In contrast with what has been reported for M. tuberculosis, our results suggest that in M. smegmatis only IdeR-dependent regulation is retained, while zinc has no effect on gene expression. The role of cluster 3 in M. tuberculosis virulence is still to be defined; however, iron- and zinc-dependent expression strongly suggests that cluster 3 is highly expressed in the infective process, and that the cluster

  19. Comparison of population-averaged and cluster-specific models for the analysis of cluster randomized trials with missing binary outcomes: a simulation study

    Directory of Open Access Journals (Sweden)

    Ma Jinhui

    2013-01-01

    Full Text Available Abstract Background The objective of this simulation study is to compare the accuracy and efficiency of population-averaged (i.e. generalized estimating equations (GEE)) and cluster-specific (i.e. random-effects logistic regression (RELR)) models for analyzing data from cluster randomized trials (CRTs) with missing binary responses. Methods In this simulation study, clustered responses were generated from a beta-binomial distribution. The number of clusters per trial arm, the number of subjects per cluster, intra-cluster correlation coefficient, and the percentage of missing data were allowed to vary. Under the assumption of covariate dependent missingness, missing outcomes were handled by complete case analysis, standard multiple imputation (MI) and within-cluster MI strategies. Data were analyzed using GEE and RELR. Performance of the methods was assessed using standardized bias, empirical standard error, root mean squared error (RMSE), and coverage probability. Results GEE performs well on all four measures, provided the downward bias of the standard error (when the number of clusters per arm is small) is adjusted appropriately, under the following scenarios: complete case analysis for CRTs with a small amount of missing data; standard MI for CRTs with variance inflation factor (VIF 50. RELR performs well only when a small amount of data was missing, and complete case analysis was applied. Conclusion GEE performs well as long as appropriate missing data strategies are adopted based on the design of CRTs and the percentage of missing data. In contrast, RELR does not perform well when either standard or within-cluster MI strategy is applied prior to the analysis.
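
    The data-generating step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' simulation code: the function names, the parameter values and the seed are assumptions. It relies on two standard facts: for a beta-binomial model with Beta(a, b) cluster probabilities the mean is a / (a + b) and the intra-cluster correlation is 1 / (a + b + 1), and for equal cluster sizes m the variance inflation factor is 1 + (m - 1) * ICC.

```python
import random


def beta_params(prevalence, icc):
    """Beta(a, b) parameters giving the requested mean and intra-cluster correlation."""
    # mean = a / (a + b) and ICC = 1 / (a + b + 1), so a + b = (1 - ICC) / ICC.
    total = (1.0 - icc) / icc
    return prevalence * total, (1.0 - prevalence) * total


def simulate_arm(n_clusters, cluster_size, prevalence, icc, rng):
    """Return one list of 0/1 outcomes per cluster (hypothetical helper)."""
    a, b = beta_params(prevalence, icc)
    arm = []
    for _ in range(n_clusters):
        p = rng.betavariate(a, b)  # cluster-specific success probability
        arm.append([1 if rng.random() < p else 0 for _ in range(cluster_size)])
    return arm


def variance_inflation_factor(cluster_size, icc):
    """Design effect for equal cluster sizes: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1) * icc


# Illustrative values only; the abstract does not report the settings used.
rng = random.Random(2013)
arm = simulate_arm(n_clusters=20, cluster_size=30, prevalence=0.3, icc=0.05, rng=rng)
vif = variance_inflation_factor(30, 0.05)
print(len(arm), len(arm[0]), vif)
```

    Missingness, imputation and the GEE or RELR fits would be layered on top of data generated this way; they are left out because the abstract gives no detail on how they were implemented.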

  20. Using the Cluster Analysis and the Principal Component Analysis in Evaluating the Quality of a Destination

    Directory of Open Access Journals (Sweden)

    Ida Vajčnerová

    2016-01-01

    Full Text Available The objective of the paper is to explore the possibilities of evaluating the quality of a tourist destination by means of principal component analysis (PCA) and cluster analysis. In the paper both types of analysis are compared on the basis of the results they provide. The aim is to identify the advantages and limits of both methods and to provide methodological suggestions for their further use in tourism research. The analysis is based on primary data from a customer satisfaction survey covering the key quality factors of a destination. The output of the two statistical methods is the creation of groups or clusters of quality factors that are similar in terms of respondents’ evaluations, in order to facilitate the evaluation of the quality of tourist destinations. The results show that both tested methods can be used. The paper was prepared within a wider research project aimed at developing a methodology for the quality evaluation of tourist destinations, especially in the context of customer satisfaction and loyalty.

  1. Salient concerns in using analgesia for cancer pain among outpatients: A cluster analysis study.

    Science.gov (United States)

    Meghani, Salimah H; Knafl, George J

    2017-02-10

    To identify unique clusters of patients based on their concerns in using analgesia for cancer pain and predictors of the cluster membership. This was a 3-mo prospective observational study (n = 207). Patients were included if they were adults (≥ 18 years), diagnosed with solid tumors or multiple myelomas, and had at least one prescription of around-the-clock pain medication for cancer or cancer-treatment-related pain. Patients were recruited from two outpatient medical oncology clinics within a large health system in Philadelphia. A choice-based conjoint (CBC) analysis experiment was used to elicit analgesic treatment preferences (utilities). Patients employed trade-offs based on five analgesic attributes (percent relief from analgesics, type of analgesic, type of side-effects, severity of side-effects, out of pocket cost). Patients were clustered based on CBC utilities using novel adaptive statistical methods. Multiple logistic regression was used to identify predictors of cluster membership. The analyses found 4 unique clusters: Most patients made trade-offs based on the expectation of pain relief (cluster 1, 41%). For a subset, the main underlying concern was type of analgesic prescribed, i.e., opioid vs non-opioid (cluster 2, 11%) and type of analgesic side effects (cluster 4, 21%), respectively. About one in four made trade-offs based on multiple concerns simultaneously including pain relief, type of side effects, and severity of side effects (cluster 3, 28%). In multivariable analysis, to identify predictors of cluster membership, clinical and socioeconomic factors (education, health literacy, income, social support) rather than analgesic attitudes and beliefs were found important; only the belief, i.e., pain medications can mask changes in health or keep you from knowing what is going on in your body was found significant in predicting two of the four clusters [cluster 1 (-); cluster 4 (+)]. Most patients appear to be driven by a single salient concern in

  2. Characterization of population exposure to organochlorines: A cluster analysis application

    NARCIS (Netherlands)

    R.M. Guimarães (Raphael Mendonça); S. Asmus (Sven); A. Burdorf (Alex)

    2013-01-01

    This study aimed to show the results from a cluster analysis application in the characterization of population exposure to organochlorines through variables related to time and exposure dose. Characteristics of 354 subjects in a population exposed to organochlorine pesticides residues

  3. Cluster analysis as a prediction tool for pregnancy outcomes.

    Science.gov (United States)

    Banjari, Ines; Kenjerić, Daniela; Šolić, Krešimir; Mandić, Milena L

    2015-03-01

    Considering the specific physiological changes during gestation and viewing pregnancy as a "critical window", classification of pregnant women in early pregnancy can be considered crucial. The paper demonstrates the use of cluster analysis, a method drawn from intelligent data mining. Cluster analysis is a statistical method that makes it possible to group individuals based on sets of identifying variables. The method was chosen to determine whether pregnant women can be classified in early pregnancy and to analyze unknown correlations between different variables so that certain outcomes could be predicted. A total of 222 pregnant women from two general obstetric offices were recruited. The main focus was on the characteristics of these women: their age, pre-pregnancy body mass index (BMI) and haemoglobin value. Cluster analysis achieved a 94.1% classification accuracy rate, with three branches or groups of pregnant women showing statistically significant correlations with pregnancy outcomes. The results show that pregnant women of both older age and higher pre-pregnancy BMI have a significantly higher incidence of delivering a baby of higher birth weight, but they gain significantly less weight during pregnancy. Their babies are also longer, and these women have a significantly higher probability of complications during pregnancy (gestosis) and of induced or caesarean delivery. We can conclude that the cluster analysis method can appropriately classify pregnant women in early pregnancy to predict certain outcomes.

  4. A Cluster Analysis of Personality Style in Adults with ADHD

    Science.gov (United States)

    Robin, Arthur L.; Tzelepis, Angela; Bedway, Marquita

    2008-01-01

    Objective: The purpose of this study was to use hierarchical linear cluster analysis to examine the normative personality styles of adults with ADHD. Method: A total of 311 adults with ADHD completed the Millon Index of Personality Styles, which consists of 24 scales assessing motivating aims, cognitive modes, and interpersonal behaviors. Results:…

  6. Language Learner Motivational Types: A Cluster Analysis Study

    Science.gov (United States)

    Papi, Mostafa; Teimouri, Yasser

    2014-01-01

    The study aimed to identify different second language (L2) learner motivational types drawing on the framework of the L2 motivational self system. A total of 1,278 secondary school students learning English in Iran completed a questionnaire survey. Cluster analysis yielded five different groups based on the strength of different variables within…

  7. Making Sense of Cluster Analysis: Revelations from Pakistani Science Classes

    Science.gov (United States)

    Pell, Tony; Hargreaves, Linda

    2011-01-01

    Cluster analysis has been applied to quantitative data in educational research over several decades and has been a feature of Maurice Galton's research in primary and secondary classrooms. It has offered potentially useful insights for teaching, yet its implications for practice are rarely implemented. It has been subject also to negative…

  8. Frailty phenotypes in the elderly based on cluster analysis

    DEFF Research Database (Denmark)

    Dato, Serena; Montesanto, Alberto; Lagani, Vincenzo

    2012-01-01

    genetic background on the frailty status is still questioned. We investigated the applicability of a cluster analysis approach based on specific geriatric parameters, previously set up and validated in a southern Italian population, to two large longitudinal Danish samples. In both cohorts, we identified...

  9. Lower physical activity is a risk factor for a clustering of metabolic risk factors in non-obese and obese Japanese subjects: the Takahata study.

    Science.gov (United States)

    Kaino, Wataru; Daimon, Makoto; Sasaki, Satoshi; Karasawa, Shigeru; Takase, Kaoru; Tada, Kyouko; Wada, Kiriko; Kameda, Wataru; Susa, Shinji; Oizumi, Toshihide; Fukao, Akira; Kubota, Isao; Kayama, Takamasa; Kato, Takeo

    2013-01-01

    In several countries including Japan, people without obesity but with a clustering of metabolic risk factors (MetRFs) were not considered to have the metabolic syndrome (MetS). Here, we examined whether lifestyle characteristics differed between non-obese and obese subjects with or without a clustering of MetRFs. From a population-based cross-sectional study of Japanese subjects aged ≥ 40 years, 1,601 subjects (age: 61.9 ± 10.3 years; 710/891 men/women) were recruited. Physical activity status and daily nutritional intake were estimated using questionnaires. A clustering of MetRFs was defined based on the presence of at least two non-essential risk factors for the diagnosis of the MetS in Japan. Energy intake was not higher in subjects with a clustering of MetRFs compared with those without. Among men, energy expenditure at work was significantly lower in non-obese (9.0 ± 8.2 vs. 11.3 ± 9.3 metabolic equivalents (METs), P = 0.025) and obese (9.0 ± 7.9 vs. 11.6 ± 9.4 METs, P = 0.017) subjects with a clustering of MetRFs than in those without. Multiple logistic regression analysis showed that energy expenditure at work was significantly associated with a clustering of MetRFs after adjusting for possible confounding factors including total energy intake. The ORs (per 1 MET) were 0.970 (95% CI, 0.944-0.997; P = 0.032) in non-obese men and 0.962 (95% CI, 0.926-0.999; P = 0.043) in obese men. Similar associations were not observed in women. In Japanese males, lower physical activity, but not excessive energy intake, is a risk factor for a clustering of MetRFs independent of their obesity status.
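
    The per-MET odds ratios reported here can be rescaled to larger differences in energy expenditure. The sketch below assumes the usual logistic-regression reading, in which the log-odds are linear in the predictor so that the odds ratio for a k-unit difference is the per-unit odds ratio raised to the power k. The helper names and the choice of a 10-MET difference are illustrative assumptions; only the 0.970 estimate and its confidence limits come from the abstract.

```python
import math


def odds_ratio_for_difference(or_per_unit, units):
    """Odds ratio for a difference of `units`, assuming log-odds linear in the predictor."""
    return or_per_unit ** units


def standard_error_from_ci(lower, upper, z=1.96):
    """Approximate SE of the log odds ratio recovered from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


# Values reported for non-obese men: OR 0.970 per 1 MET, 95% CI 0.944-0.997.
or_10_mets = odds_ratio_for_difference(0.970, 10)
se_log_or = standard_error_from_ci(0.944, 0.997)
print(round(or_10_mets, 3), round(se_log_or, 4))
```

    Under that assumption, a 10-MET higher energy expenditure at work corresponds to roughly a quarter lower odds of MetRF clustering in non-obese men. The extrapolation is only as good as the linearity assumption and the confounder adjustment in the original model.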

  10. Clustering of Multiple Lifestyle Behaviours and Its Association to Cardiovascular Risk Factors in Children

    DEFF Research Database (Denmark)

    Bel-Serrat, Silvia; Mouratidou, Theodora; Santaliestra-Pasías, Alba María

    2013-01-01

    BACKGROUND/OBJECTIVES: Individual lifestyle behaviours have independently been associated with cardiovascular diseases (CVD) risk factors in children. This study aimed to identify clustered lifestyle behaviours (dietary, physical activity (PA) and sedentary indicators) and to examine their associ...

  11. Performance Analysis of Unsupervised Clustering Methods for Brain Tumor Segmentation

    Directory of Open Access Journals (Sweden)

    Tushar H Jaware

    2013-10-01

    Full Text Available Medical image processing is one of the most challenging and rapidly emerging fields of neuroscience. The ultimate goal of medical image analysis in brain MRI is to extract important clinical features that would improve methods of diagnosis and treatment of disease. This paper focuses on methods to detect and extract brain tumours from brain MR images. MATLAB is used to design a software tool for locating brain tumours, based on unsupervised clustering methods. The K-means clustering algorithm is implemented and tested on a database of 30 images. A performance evaluation of the unsupervised clustering methods is presented.
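
    The clustering step named in this abstract can be illustrated with a one-dimensional K-means over pixel intensities. This is a sketch in Python rather than the MATLAB tool the paper describes; the intensity values, the three starting centres and the function name are made up for illustration, and a real segmentation would work on full images with preprocessing that the abstract does not describe.

```python
def kmeans_1d(values, centers, iterations=20):
    """Plain K-means on scalar values, starting from the given centres."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centre.
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda k: abs(v - centers[k]))
            groups[nearest].append(v)
        # Update step: each centre moves to the mean of its group (unchanged if empty).
        new_centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    labels = [min(range(len(centers)), key=lambda k: abs(v - centers[k])) for v in values]
    return centers, labels


# Hypothetical grey-level intensities: dark background, tissue, bright region.
pixels = [12, 15, 10, 14, 90, 95, 88, 200, 210, 205]
centers, labels = kmeans_1d(pixels, centers=[0, 128, 255])
print(centers, labels)
```

    Pixels sharing the label of the brightest centre would form the candidate region in this toy setting; whether brightness alone separates a tumour depends on the imaging sequence.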

  12. Canine parvovirus in Australia: the role of socio-economic factors in disease clusters.

    Science.gov (United States)

    Brady, S; Norris, J M; Kelman, M; Ward, M P

    2012-08-01

    To identify clusters of canine parvoviral related disease occurring in Australia during 2010 and investigate the role of socio-economic factors contributing to these clusters, reported cases of canine parvovirus were extracted from an on-line disease surveillance system. Reported residential postcode was used to locate cases, and clusters were identified using a scan statistic. Cases included in clusters were compared to those not included in such clusters with respect to human socioeconomic factors (postcode area relative socioeconomic disadvantage, economic resources, education and occupation) and dog factors (neuter status, breed, age, gender, vaccination status). During 2010, there were 1187 cases of canine parvovirus reported. Nineteen significant (P0.05) was found between cases reported from cluster postcodes and those not within clusters for dog age, gender, breed or vaccination status (although the latter needs to be interpreted with caution, since vaccination was absent in most of the cases). Further research is required to investigate the apparent association between indicators of poor socioeconomic status and clusters of reported canine parvovirus diseases; however these initial findings may be useful for developing geographically- and temporally-targeted prevention and disease control programs.

  13. Outcome-Driven Cluster Analysis with Application to Microarray Data.

    Directory of Open Access Journals (Sweden)

    Jessie J Hsu

    Full Text Available One goal of cluster analysis is to sort characteristics into groups (clusters) so that those in the same group are more highly correlated to each other than they are to those in other groups. An example is the search for groups of genes whose expression of RNA is correlated in a population of patients. These genes would be of greater interest if their common level of RNA expression were additionally predictive of the clinical outcome. This issue arose in the context of a study of trauma patients on whom RNA samples were available. The question of interest was whether there were groups of genes that were behaving similarly, and whether each gene in the cluster would have a similar effect on who would recover. For this, we develop an algorithm to simultaneously assign characteristics (genes) into groups of highly correlated genes that have the same effect on the outcome (recovery). We propose a random effects model where the genes within each group (cluster) equal the sum of a random effect, specific to the observation and cluster, and an independent error term. The outcome variable is a linear combination of the random effects of each cluster. To fit the model, we implement a Markov chain Monte Carlo algorithm based on the likelihood of the observed data. We evaluate the effect of including outcome in the model through simulation studies and describe a strategy for prediction. These methods are applied to trauma data from the Inflammation and Host Response to Injury research program, revealing a clustering of the genes that are informed by the recovery outcome.
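
    One way to write down the random effects model described in this abstract is given below. The notation is an assumption made for illustration: i indexes observations (patients), j indexes genes, c(j) is the cluster to which gene j is assigned, and C is the number of clusters. The abstract does not say whether the outcome equation carries its own error term or link function, nor which distributions are placed on the random effects, so none are shown.

```latex
\begin{align}
  x_{ij} &= u_{i,c(j)} + \varepsilon_{ij}, \\
  y_i    &= \sum_{c=1}^{C} \beta_c \, u_{ic}.
\end{align}
```

    Here x_{ij} is the expression of gene j in observation i, u_{ic} is the random effect specific to observation i and cluster c, \varepsilon_{ij} is the independent error term, and the coefficients \beta_c weight each cluster's random effect in the outcome y_i.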

  14. Improving Cluster Analysis with Automatic Variable Selection Based on Trees

    Science.gov (United States)

    2014-12-01

    Master’s thesis by Anton D. Orr, December 2014. Thesis Advisor: Samuel E. Buttrey. ...2006 based on classification and regression trees to address problems with determining dissimilarity. Current algorithms do not simultaneously address

  15. Factor analysis of multivariate data

    Digital Repository Service at National Institute of Oceanography (India)

    Fernandes, A.A.; Mahadevan, R.

    A brief introduction to factor analysis is presented. A FORTRAN program that can perform Q-mode and R-mode factor analysis and the singular value decomposition of a given data matrix is presented in Appendix B. This computer program uses...

  16. Factor Analysis of Intern Effectiveness

    Science.gov (United States)

    Womack, Sid T.; Hannah, Shellie Louise; Bell, Columbus David

    2012-01-01

    Four factors in teaching intern effectiveness, as measured by a Praxis III-similar instrument, were found among observational data of teaching interns during the 2010 spring semester. Those factors were lesson planning, teacher/student reflection, fairness & safe environment, and professionalism/efficacy. This factor analysis was as much of a…

  17. Factor analysis and missing data

    NARCIS (Netherlands)

    Kamakura, WA; Wedel, M

    2000-01-01

    The authors study the estimation of factor models and the imputation of missing data and propose an approach that provides direct estimates of factor weights without the replacement of missing data with imputed values. First, the approach is useful in applications of factor analysis in the presence

  18. Exploring the profiles of nurses' job satisfaction in Macau: results of a cluster analysis.

    Science.gov (United States)

    Chan, Moon Fai; Leong, Sok Man; Luk, Andrew Leung; Yeung, Siu Ming; Van, Iat Kio

    2010-02-01

    To determine whether definable subtypes exist within a cohort of nurses with regard to factors associated with nurses' job satisfaction patterns and to compare whether these factors vary between nurses in groups with different profiles. Globally, the health care system is experiencing major changes that influence nurses' job satisfaction and may ultimately affect the quality of nursing care for patients. A descriptive survey. Data were collected using a self-reported structured questionnaire. Nurses were recruited in two hospitals in Macao. Two main outcome variables were collected: predisposing characteristics and five components of job satisfaction outcomes. A cluster analysis yielded two clusters (n = 649). Cluster 1 consisted of 60.6% (n = 393) and Cluster 2 of 39.4% (n = 256) of the nurses. Cluster 1 nurses were younger, more educated and had less work experience and more intention to change their career than nurses in Cluster 2. Cluster 2 nurses had more work experience, were of more senior grade and were more satisfied with their current job in terms of peer support, autonomy and professional opportunities, scheduling and relationships with team members than nurses in Cluster 1. Findings might help by providing important information for health care managers to identify strategies/methods to target a specific group of nurses in hopes of increasing their job satisfaction levels. As a long-term investment, hospital management has to promote work environments that support job satisfaction to attract nurses and thereby improve the quality of nursing care. The results of this study might provide hospital managers with a model to design specified interventions to improve nurses' job satisfaction.

  19. Analysis of local bond-orientational order for liquid gallium at ambient pressure: Two types of cluster structures.

    Science.gov (United States)

    Chen, Lin-Yuan; Tang, Ping-Han; Wu, Ten-Ming

    2016-07-14

    In terms of the local bond-orientational order (LBOO) parameters, a cluster approach to analyze local structures of simple liquids was developed. In this approach, a cluster is defined as a combination of neighboring seeds having at least n_b local-orientational bonds and their nearest neighbors, and a cluster ensemble is a collection of clusters with a specified n_b and number of seeds n_s. This cluster analysis was applied to investigate the microscopic structures of liquid Ga at ambient pressure (AP). The liquid structures studied were generated through ab initio molecular dynamics simulations. By scrutinizing the static structure factors (SSFs) of cluster ensembles with different combinations of n_b and n_s, we found that liquid Ga at AP contained two types of cluster structures, one characterized by sixfold orientational symmetry and the other showing fourfold orientational symmetry. The SSFs of cluster structures with sixfold orientational symmetry were akin to the SSF of a hard-sphere fluid. In contrast, the SSFs of cluster structures showing fourfold orientational symmetry behaved similarly to the anomalous SSF of liquid Ga at AP, which is well known for exhibiting a high-q shoulder. The local structures of a highly LBOO cluster whose SSF displayed a high-q shoulder were found to be more similar to the structure of β-Ga than to those of other solid phases of Ga. More generally, the cluster structures showing fourfold orientational symmetry tend to resemble β-Ga more closely.

  20. Identifying patterns in treatment response profiles in acute bipolar mania: a cluster analysis approach

    Directory of Open Access Journals (Sweden)

    Houston John P

    2008-07-01

    Full Text Available Abstract Background Patients with acute mania respond differentially to treatment and, in many cases, fail to obtain or sustain symptom remission. The objective of this exploratory analysis was to characterize response in bipolar disorder by identifying groups of patients with similar manic symptom response profiles. Methods Patients (n = 222) were selected from a randomized, double-blind study of treatment with olanzapine or divalproex in bipolar I disorder, manic or mixed episode, with or without psychotic features. Hierarchical clustering based on Ward's distance was used to identify groups of patients based on Young-Mania Rating Scale (YMRS) total scores at each of 5 assessments over 7 weeks. Logistic regression was used to identify baseline predictors for clusters of interest. Results Four distinct clusters of patients were identified: Cluster 1 (n = 64): patients did not maintain a response (YMRS total scores ≤ 12); Cluster 2 (n = 92): patients responded rapidly (within less than a week) and response was maintained; Cluster 3 (n = 36): patients responded rapidly but relapsed soon afterwards (YMRS ≥ 15); Cluster 4 (n = 30): patients responded slowly (≥ 2 weeks) and response was maintained. Predictive models using baseline variables found YMRS Item 10 (Appearance), and psychosis to be significant predictors for Clusters 1 and 4 vs. Clusters 2 and 3, but none of the baseline characteristics allowed discriminating between Clusters 1 vs. 4. Experiencing a mixed episode at baseline predicted membership in Clusters 2 and 3 vs. Clusters 1 and 4. Treatment with divalproex, larger number of previous manic episodes, lack of disruptive-aggressive behavior, and more prominent depressive symptoms at baseline were predictors for Cluster 3 vs. 2. Conclusion Distinct treatment response profiles can be predicted by clinical features at baseline. The presence of these features as potential risk factors for relapse in patients who have responded to treatment

  1. Study on the Evaluation of Quality of Urban Economy in Wan Jiang City Belt Based on Factor Analysis and Cluster Analysis

    Institute of Scientific and Technical Information of China (English)

    汪恩辉; 赵国庆

    2015-01-01

    This paper objectively evaluates the economic quality of 8 cities in the Wan Jiang City Belt, based on 15 evaluation indexes and related data on the quality of the urban economy, using factor analysis and cluster analysis. The results show that: first, the three main factors affecting economic quality are economic development, resources and environment, and living standards; second, the scores of the three main factors in the Wan Jiang City Belt are generally low, so the state of economic quality is not optimistic; third, according to the ranking of factor scores and the cluster analysis, economic quality falls into three levels, which reveals obvious gaps in economic quality within the belt. On this basis, the paper proposes a two-pronged approach to improve the economic quality of the Wan Jiang City Belt: differentiated guidance with strategies tailored to local conditions, together with transformation of the economic development pattern and adjustment of the industrial structure.

  2. DGA Clustering and Analysis: Mastering Modern, Evolving Threats, DGALab

    Directory of Open Access Journals (Sweden)

    Alexander Chailytko

    2016-05-01

    Full Text Available Domain Generation Algorithms (DGAs) are a basic building block used in almost all modern malware. Malware researchers have attempted to tackle the DGA problem with various tools and techniques, with varying degrees of success. We present a complex solution to populate a DGA feed using reversed DGAs, third-party feeds, and smart DGA extraction and clustering based on emulation of a large number of samples. Smart DGA extraction requires no reverse engineering and works regardless of the DGA type or initialization vector, while enabling a cluster-based analysis. Our method also automatically allows analysis of a whole malware family, a specific campaign, etc. We present our system and demonstrate its abilities on more than 20 malware families. This includes showing connections between different campaigns, as well as comparing results. Most importantly, we discuss how to utilize the outcome of the analysis to create smarter protections against similar malware.

  3. Cohort study on clustering of lifestyle risk factors and understanding its association with stress on health and wellbeing among school teachers in Malaysia (CLUSTer)--a study protocol.

    OpenAIRE

    Moy, FM; Hoe, VC; Hairi, NN; Buckley, B.; WARK, PA; Koh, D; Bueno-de-Mesquita, HB; Bulgiba, AM

    2014-01-01

    Background The study on Clustering of Lifestyle risk factors and Understanding its association with Stress on health and wellbeing among school Teachers in Malaysia (CLUSTer) is a prospective cohort study which aims to extensively study teachers in Malaysia with respect to clustering of lifestyle risk factors and stress, and subsequently, to follow-up the population for important health outcomes. Method/design This study is being conducted in six states within Peninsular Malaysia. From each s...

  4. Cohort study on clustering of lifestyle risk factors and understanding its association with stress on health and wellbeing among school teachers in Malaysia (CLUSTer) – a study protocol

    OpenAIRE

    Moy, Foong Ming; Hoe, Victor Chee Wai; Hairi, Noran Naqiah; Buckley, Brian; Wark, Petra A; Koh, David; Bueno-de-Mesquita, HB; Bulgiba, Awang M.

    2014-01-01

    Background The study on Clustering of Lifestyle risk factors and Understanding its association with Stress on health and wellbeing among school Teachers in Malaysia (CLUSTer) is a prospective cohort study which aims to extensively study teachers in Malaysia with respect to clustering of lifestyle risk factors and stress, and subsequently, to follow-up the population for important health outcomes. Method/design This study is being conducted in six states within Peninsular Malaysia. From each s...

  5. Evaluation of hierarchical agglomerative cluster analysis methods for discrimination of primary biological aerosol

    Directory of Open Access Journals (Sweden)

    I. Crawford

    2015-07-01

    Full Text Available In this paper we present improved methods for discriminating and quantifying Primary Biological Aerosol Particles (PBAP) by applying hierarchical agglomerative cluster analysis to multi-parameter ultra violet-light induced fluorescence (UV-LIF) spectrometer data. The methods employed in this study can be applied to data sets in excess of 1×10^6 points on a desktop computer, allowing for each fluorescent particle in a dataset to be explicitly clustered. This reduces the potential for misattribution found in subsampling and comparative attribution methods used in previous approaches, improving our capacity to discriminate and quantify PBAP meta-classes. We evaluate the performance of several hierarchical agglomerative cluster analysis linkages and data normalisation methods using laboratory samples of known particle types and an ambient dataset. Fluorescent and non-fluorescent polystyrene latex spheres were sampled with a Wideband Integrated Bioaerosol Spectrometer (WIBS-4) where the optical size, asymmetry factor and fluorescent measurements were used as inputs to the analysis package. It was found that the Ward linkage with z-score or range normalisation performed best, correctly attributing 98 and 98.1 % of the data points respectively. The best performing methods were applied to the BEACHON-RoMBAS ambient dataset where it was found that the z-score and range normalisation methods yield similar results with each method producing clusters representative of fungal spores and bacterial aerosol, consistent with previous results. The z-score result was compared to clusters generated with previous approaches (WIBS AnalysiS Program, WASP) where we observe that the subsampling and comparative attribution method employed by WASP results in the overestimation of the fungal spore concentration by a factor of 1.5 and the underestimation of bacterial aerosol concentration by a factor of 5. We suggest that this is likely due to errors arising from misattribution

  6. Evaluation of hierarchical agglomerative cluster analysis methods for discrimination of primary biological aerosol

    Directory of Open Access Journals (Sweden)

    I. Crawford

    2015-11-01

    Full Text Available In this paper we present improved methods for discriminating and quantifying primary biological aerosol particles (PBAPs) by applying hierarchical agglomerative cluster analysis to multi-parameter ultraviolet-light-induced fluorescence (UV-LIF) spectrometer data. The methods employed in this study can be applied to data sets in excess of 1 × 10^6 points on a desktop computer, allowing for each fluorescent particle in a data set to be explicitly clustered. This reduces the potential for misattribution found in subsampling and comparative attribution methods used in previous approaches, improving our capacity to discriminate and quantify PBAP meta-classes. We evaluate the performance of several hierarchical agglomerative cluster analysis linkages and data normalisation methods using laboratory samples of known particle types and an ambient data set. Fluorescent and non-fluorescent polystyrene latex spheres were sampled with a Wideband Integrated Bioaerosol Spectrometer (WIBS-4), where the optical size, asymmetry factor and fluorescent measurements were used as inputs to the analysis package. It was found that the Ward linkage with z-score or range normalisation performed best, correctly attributing 98 and 98.1 % of the data points respectively. The best-performing methods were applied to the BEACHON-RoMBAS (Bio–hydro–atmosphere interactions of Energy, Aerosols, Carbon, H2O, Organics and Nitrogen–Rocky Mountain Biogenic Aerosol Study) ambient data set, where it was found that the z-score and range normalisation methods yield similar results, with each method producing clusters representative of fungal spores and bacterial aerosol, consistent with previous results. The z-score result was compared to clusters generated with previous approaches (WIBS AnalysiS Program, WASP), where we observe that the subsampling and comparative attribution method employed by WASP results in the overestimation of the fungal spore concentration by a factor of 1.5 and the
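
    The two normalisation schemes named in this abstract can be sketched in a few lines. This is only an illustration, not the authors' analysis package: the function names, the sample values, and the choice of the population standard deviation are assumptions made here.

```python
from statistics import mean, pstdev


def z_score(values):
    """Centre on the mean and scale by the population standard deviation (assumed)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def range_scale(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical optical sizes for five particles (illustrative values only).
sizes = [1.0, 2.0, 3.0, 4.0, 5.0]
z = z_score(sizes)      # mean 0, unit spread
r = range_scale(sizes)  # values between 0 and 1
```

    Either transform puts size, asymmetry factor and fluorescence on a comparable scale before distances are computed, so that no single input dominates the linkage.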

  7. Analyzing the Role of Community and Individual Factors in Food Insecurity: Identifying Diverse Barriers Across Clustered Community Members.

    Science.gov (United States)

    Jablonski, Becca B R; McFadden, Dawn Thilmany; Colpaart, Ashley

    2016-10-01

    This paper uses the results from a community food security assessment survey of 684 residents and three focus groups in Pueblo County, Colorado to examine the question: what community and individual factors contribute to or alleviate food insecurity, and are these factors consistent throughout a sub-county population? Importantly, we use a technique called cluster analysis to endogenously determine the key factors pertinent to food access and fruit and vegetable consumption. Our results show significant heterogeneity among sub-population clusters in terms of the community and individual factors that would make it easier to get access to fruits and vegetables. We find two distinct clusters of food insecure populations: the first was significantly less likely to identify increased access to fruits and vegetables proximate to where they live or work as a way to improve their household's healthy food consumption despite being significantly less likely to utilize a personal vehicle to get to the store; the second group did not report significant challenges with access, rather with affordability. We conclude that though interventions focused on improving the local food retail environment may be important for some subsamples of the food insecure population, it is unclear that proximity to a store with healthy food will support enhanced food security for all. We recommend that future research recognize that determinants of food insecurity may vary within county or zip code level regions, and that multiple interventions that target sub-population clusters may elicit better improvements in access to and consumption of fruits and vegetables.

  8. Clues on the Evolution of Cluster Galaxies From The Analysis of Their Orbital Anisotropies

    OpenAIRE

    Biviano, A.; Katgert, P.; Thomas, T; Mazure, A.

    2003-01-01

    We study the evolution of galaxies in clusters by the analysis of a sample of about 3000 galaxies, members of 59 clusters from the ESO Nearby Abell Cluster Survey (ENACS). We distinguish four cluster galaxy populations, based on their radial and velocity distributions within the clusters. Using the class of ellipticals and S0's (excluding the very bright ellipticals), we determine the average cluster mass profile, that we compare with mass models available from numerical simulations. We then ...

  9. The Productivity Analysis of Chennai Automotive Industry Cluster

    Science.gov (United States)

    Bhaskaran, E.

    2014-07-01

    Chennai, also called the Detroit of India, is India's second fastest growing auto market and exports auto components and vehicles to US, Germany, Japan and Brazil. For inclusive growth and sustainable development, 250 auto component industries in Ambattur, Thirumalisai and Thirumudivakkam Industrial Estates located in Chennai have adopted the Cluster Development Approach called Automotive Component Cluster. The objective is to study the Value Chain, Correlation and Data Envelopment Analysis by determining technical efficiency, peer weights, input and output slacks of 100 auto component industries in three estates. The methodology adopted is using Data Envelopment Analysis of Output Oriented Banker Charnes Cooper model by taking net worth, fixed assets, employment as inputs and gross output as outputs. The non-zero represents the weights for efficient clusters. The higher slack obtained reveals the excess net worth, fixed assets, employment and shortage in gross output. To conclude, the variables are highly correlated and the inefficient industries should increase their gross output or decrease the fixed assets or employment. Moreover for sustainable development, the cluster should strengthen infrastructure, technology, procurement, production and marketing interrelationships to decrease costs and to increase productivity and efficiency to compete in the indigenous and export market.

  10. Full text clustering and relationship network analysis of biomedical publications.

    Directory of Open Access Journals (Sweden)

    Renchu Guan

    Full Text Available Rapid developments in the biomedical sciences have increased the demand for automatic clustering of biomedical publications. In contrast to current approaches to text clustering, which focus exclusively on the contents of abstracts, a novel method is proposed for clustering and analysis of complete biomedical article texts. To reduce dimensionality, Cosine Coefficient is used on a sub-space of only two vectors, instead of computing the Euclidean distance within the space of all vectors. Then a strategy and algorithm is introduced for Semi-supervised Affinity Propagation (SSAP to improve analysis efficiency, using biomedical journal names as an evaluation background. Experimental results show that by avoiding high-dimensional sparse matrix computations, SSAP outperforms conventional k-means methods and improves upon the standard Affinity Propagation algorithm. In constructing a directed relationship network and distribution matrix for the clustering results, it can be noted that overlaps in scope and interests among BioMed publications can be easily identified, providing a valuable analytical tool for editors, authors and readers.
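
    As a rough illustration of the Cosine Coefficient mentioned in this abstract, the sketch below compares two term-count vectors. The function name and the toy vectors are assumptions for illustration and do not reproduce the SSAP method itself.

```python
import math


def cosine_coefficient(a, b):
    """Cosine of the angle between two equal-length term vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical term counts for three documents over a four-word vocabulary.
doc_a = [2, 0, 1, 0]
doc_b = [4, 0, 2, 0]  # same term proportions as doc_a, twice the length
doc_c = [0, 3, 0, 1]  # shares no terms with doc_a

same_direction = cosine_coefficient(doc_a, doc_b)
no_overlap = cosine_coefficient(doc_a, doc_c)
```

    Because the measure depends only on direction, a long and a short article with the same term proportions score as highly similar, which the Euclidean distance would not capture.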

  11. Kinematic gait patterns in healthy runners: A hierarchical cluster analysis.

    Science.gov (United States)

    Phinyomark, Angkoon; Osis, Sean; Hettinga, Blayne A; Ferber, Reed

    2015-11-01

    Previous studies have demonstrated distinct clusters of gait patterns in both healthy and pathological groups, suggesting that different movement strategies may be represented. However, these studies have used discrete time point variables and usually focused on only one specific joint and plane of motion. Therefore, the first purpose of this study was to determine if running gait patterns for healthy subjects could be classified into homogeneous subgroups using three-dimensional kinematic data from the ankle, knee, and hip joints. The second purpose was to identify differences in joint kinematics between these groups. The third purpose was to investigate the practical implications of clustering healthy subjects by comparing these kinematics with runners experiencing patellofemoral pain (PFP). A principal component analysis (PCA) was used to reduce the dimensionality of the entire gait waveform data and then a hierarchical cluster analysis (HCA) determined group sets of similar gait patterns and homogeneous clusters. The results show two distinct running gait patterns were found with the main between-group differences occurring in frontal and sagittal plane knee angles (Pgait strategies. These results suggest care must be taken when selecting samples of subjects in order to investigate the pathomechanics of injured runners.

  12. Transcriptional analysis of exopolysaccharides biosynthesis gene clusters in Lactobacillus plantarum.

    Science.gov (United States)

    Vastano, Valeria; Perrone, Filomena; Marasco, Rosangela; Sacco, Margherita; Muscariello, Lidia

    2016-04-01

    Exopolysaccharides (EPS) from lactic acid bacteria contribute to specific rheology and texture of fermented milk products and find applications also in non-dairy foods and in therapeutics. Recently, four clusters of genes (cps) associated with surface polysaccharide production have been identified in Lactobacillus plantarum WCFS1, a probiotic and food-associated lactobacillus. These clusters are involved in cell surface architecture and probably in release and/or exposure of immunomodulating bacterial molecules. Here we show a transcriptional analysis of these clusters. Indeed, RT-PCR experiments revealed that the cps loci are organized in five operons. Moreover, by reverse transcription-qPCR analysis performed on L. plantarum WCFS1 (wild type) and WCFS1-2 (ΔccpA), we demonstrated that expression of three cps clusters is under the control of the global regulator CcpA. These results, together with the identification of putative CcpA target sequences (catabolite responsive element CRE) in the regulatory region of four out of five transcriptional units, strongly suggest for the first time a role of the master regulator CcpA in EPS gene transcription among lactobacilli.

  13. The Quantitative Analysis of Chennai Automotive Industry Cluster

    Science.gov (United States)

    Bhaskaran, Ethirajan

    2016-07-01

    Chennai is also called the Detroit of India because of its automotive industry, which produces over 40 % of India's vehicles and components. During 2001-2002, the Automotive Component Industries (ACI) in the Ambattur, Thirumalizai and Thirumudivakkam Industrial Estates, Chennai, faced problems with infrastructure, technology, procurement, production and marketing. The objective is to study the quantitative performance of the Chennai Automotive Industry Cluster before (2001-2002) and after (2008-2009) the Cluster Development Approach (CDA). The methodology adopted is the collection of primary data from 100 ACI using a quantitative questionnaire and analysis using Correlation Analysis (CA), Regression Analysis (RA), the Friedman Test (FMT), and the Kruskal-Wallis Test (KWT). The CA computed for the different sets of variables reveals a high degree of relationship between the variables studied. The RA models constructed establish a strong relationship between the dependent variable and a host of independent variables. The models proposed here reveal the approximate relationship in a closer form. The KWT shows no significant difference between the three location clusters with respect to net profit, production cost, marketing costs, procurement costs and gross output. This supports the view that each location has contributed uniformly to the development of the automobile component cluster. The FMT shows no significant difference between industrial units with respect to costs such as production, infrastructure, technology and marketing, and net profit. To conclude, the automotive industries have fully utilized the physical infrastructure and centralised facilities by adopting the CDA and now export their products to North America, South America, Europe, Australia, Africa and Asia. The value chain analysis models have been implemented in all the cluster units. This Cluster Development Approach (CDA) model can be implemented in industries of under developed and developing countries for cost reduction and productivity

  14. Cohort study on clustering of lifestyle risk factors and understanding its association with stress on health and wellbeing among school teachers in Malaysia (CLUSTer)--a study protocol.

    Science.gov (United States)

    Moy, Foong Ming; Hoe, Victor Chee Wai; Hairi, Noran Naqiah; Buckley, Brian; Wark, Petra A; Koh, David; Bueno-de-Mesquita, H Bas; Bulgiba, Awang M

    2014-06-17

    The study on Clustering of Lifestyle risk factors and Understanding its association with Stress on health and wellbeing among school Teachers in Malaysia (CLUSTer) is a prospective cohort study which aims to extensively study teachers in Malaysia with respect to clustering of lifestyle risk factors and stress, and subsequently, to follow-up the population for important health outcomes. This study is being conducted in six states within Peninsular Malaysia. From each state, schools from each district are randomly selected and invited to participate in the study. Once the schools agree to participate, all teachers who fulfilled the inclusion criteria are invited to participate. Data collection includes a questionnaire survey and health assessment. Information collected in the questionnaire includes socio-demographic characteristics, participants' medical history and family history of chronic diseases, teaching characteristics and burden, questions on smoking, alcohol consumption and physical activities (IPAQ); a food frequency questionnaire, the job content questionnaire (JCQ); depression, anxiety and stress scale (DASS21); health related quality of life (SF12-V2); Voice Handicap Index 10 on voice disorder, questions on chronic pain, sleep duration and obstetric history for female participants. Following blood drawn for predefined clinical tests, additional blood and urine specimens are collected and stored for future analysis. Active follow up of exposure and health outcomes will be carried out every two years via telephone or face to face contact. Data collection started in March 2013 and as of the end of March 2014 has been completed for four states: Kuala Lumpur, Selangor, Melaka and Penang. Approximately 6580 participants have been recruited. The first round of data collection and blood sampling is expected to be completed by the end of 2014 with an expected 10,000 participants recruited. Our study will provide a good basis for exploring the clustering of

  15. Subtyping demoralization in the medically ill by cluster analysis

    Directory of Open Access Journals (Sweden)

    Chiara Rafanelli

    2013-03-01

    Full Text Available Background and Objectives: There is increasing interest in the issue of demoralization, particularly in the setting of medical disease. The aim of this investigation was to use both DSM-IV comorbidity and the Diagnostic Criteria for Psychosomatic Research (DCPR) in order to characterize demoralization in the medically ill. Methods: 1700 patients were recruited from 8 medical centers in the Italian Health System and 1560 agreed to participate. They all underwent a cross-sectional assessment with DSM-IV and DCPR structured interviews. 373 patients (23.9%) received a diagnosis of demoralization. Data were submitted to cluster analysis. Results: Four clusters were identified: demoralization and comorbid depression; demoralization and comorbid somatoform/adjustment disorders; demoralization and comorbid anxiety; demoralization without any comorbid DSM disorder. The first cluster included 27.6% of the total sample and was characterized by the presence of DSM-IV mood disorders (mainly major depressive disorder). The second cluster had 18.2% of the cases and contained both DSM-IV somatoform (particularly undifferentiated somatoform disorder and hypochondriasis) and adjustment disorders. In the third cluster (24.7%), DSM-IV anxiety disorders in comorbidity with demoralization were predominant (particularly generalized anxiety disorder, agoraphobia, panic disorder and obsessive-compulsive disorder). The fourth cluster had 29.5% of the patients and was characterized by the absence of any DSM-IV comorbid disorder. Conclusions: The findings indicate the need to expand clinical assessment in the medically ill to include the various manifestations of demoralization as encompassed by the DCPR. Subtyping demoralization may yield improved targets for psychosomatic research and treatment trials.

  16. Bayesian Analysis of Multiple Populations in Galactic Globular Clusters

    Science.gov (United States)

    Wagner-Kaiser, Rachel A.; Sarajedini, Ata; von Hippel, Ted; Stenning, David; Piotto, Giampaolo; Milone, Antonino; van Dyk, David A.; Robinson, Elliot; Stein, Nathan

    2016-01-01

    We use GO 13297 Cycle 21 Hubble Space Telescope (HST) observations and archival GO 10775 Cycle 14 HST ACS Treasury observations of Galactic Globular Clusters to find and characterize multiple stellar populations. Determining how globular clusters are able to create and retain enriched material to produce several generations of stars is key to understanding how these objects formed and how they have affected the structural, kinematic, and chemical evolution of the Milky Way. We employ a sophisticated Bayesian technique with an adaptive MCMC algorithm to simultaneously fit the age, distance, absorption, and metallicity for each cluster. At the same time, we also fit unique helium values to two distinct populations of the cluster and determine the relative proportions of those populations. Our unique numerical approach allows objective and precise analysis of these complicated clusters, providing posterior distribution functions for each parameter of interest. We use these results to gain a better understanding of multiple populations in these clusters and their role in the history of the Milky Way. Support for this work was provided by NASA through grant numbers HST-GO-10775 and HST-GO-13297 from the Space Telescope Science Institute, which is operated by AURA, Inc., under NASA contract NAS5-26555. This material is based upon work supported by the National Aeronautics and Space Administration under Grant NNX11AF34G issued through the Office of Space Science. This project was supported by the National Aeronautics & Space Administration through the University of Central Florida's NASA Florida Space Grant Consortium.

  17. Applications of cluster analysis to the creation of perfectionism profiles: a comparison of two clustering approaches.

    Science.gov (United States)

    Bolin, Jocelyn H; Edwards, Julianne M; Finch, W Holmes; Cassady, Jerrell C

    2014-01-01

    Although traditional clustering methods (e.g., K-means) have been shown to be useful in the social sciences it is often difficult for such methods to handle situations where clusters in the population overlap or are ambiguous. Fuzzy clustering, a method already recognized in many disciplines, provides a more flexible alternative to these traditional clustering methods. Fuzzy clustering differs from other traditional clustering methods in that it allows for a case to belong to multiple clusters simultaneously. Unfortunately, fuzzy clustering techniques remain relatively unused in the social and behavioral sciences. The purpose of this paper is to introduce fuzzy clustering to these audiences who are currently relatively unfamiliar with the technique. In order to demonstrate the advantages associated with this method, cluster solutions of a common perfectionism measure were created using both fuzzy clustering and K-means clustering, and the results compared. Results of these analyses reveal that different cluster solutions are found by the two methods, and the similarity between the different clustering solutions depends on the amount of cluster overlap allowed for in fuzzy clustering.
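
    The contrast between hard and fuzzy assignment can be sketched as follows. The abstract does not name a specific fuzzy algorithm, so the standard fuzzy c-means membership formula is assumed here, and the centres, the data point and the fuzzifier values `m` are hypothetical.

```python
def fuzzy_memberships(point, centres, m=2.0):
    """Fuzzy c-means style memberships of a 1-D point in each cluster (they sum to 1)."""
    dists = [abs(point - c) for c in centres]
    if 0.0 in dists:
        # The point sits exactly on a centre: full membership there.
        return [1.0 if d == 0.0 else 0.0 for d in dists]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]


def hard_assignment(point, centres):
    """K-means style assignment: index of the single nearest centre."""
    dists = [abs(point - c) for c in centres]
    return dists.index(min(dists))


centres = [0.0, 10.0]                             # hypothetical cluster centres
hard = hard_assignment(4.0, centres)              # one cluster only
soft = fuzzy_memberships(4.0, centres, m=2.0)     # graded membership in both
softer = fuzzy_memberships(4.0, centres, m=3.0)   # larger m allows more overlap
```

    Raising `m` pulls the memberships toward equality, which is one way to read the finding that agreement with K-means depends on how much overlap the fuzzy solution allows.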

  18. Applications of Cluster Analysis to the Creation of Perfectionism Profiles: A Comparison of two Clustering Approaches

    Directory of Open Access Journals (Sweden)

    Jocelyn H Bolin

    2014-04-01

    Full Text Available Although traditional clustering methods (e.g., K-means) have been shown to be useful in the social sciences it is often difficult for such methods to handle situations where clusters in the population overlap or are ambiguous. Fuzzy clustering, a method already recognized in many disciplines, provides a more flexible alternative to these traditional clustering methods. Fuzzy clustering differs from other traditional clustering methods in that it allows for a case to belong to multiple clusters simultaneously. Unfortunately, fuzzy clustering techniques remain relatively unused in the social and behavioral sciences. The purpose of this paper is to introduce fuzzy clustering to these audiences who are currently relatively unfamiliar with the technique. In order to demonstrate the advantages associated with this method, cluster solutions of a common perfectionism measure were created using both fuzzy clustering and K-means clustering, and the results compared. Results of these analyses reveal that different cluster solutions are found by the two methods, and the similarity between the different clustering solutions depends on the amount of cluster overlap allowed for in fuzzy clustering.

  19. Detection of Functional Change Using Cluster Trend Analysis in Glaucoma

    Science.gov (United States)

    Gardiner, Stuart K.; Mansberger, Steven L.; Demirel, Shaban

    2017-01-01

    Purpose: Global analyses using mean deviation (MD) assess visual field progression, but can miss localized changes. Pointwise analyses are more sensitive to localized progression, but more variable so require confirmation. This study assessed whether cluster trend analysis, averaging information across subsets of locations, could improve progression detection. Methods: A total of 133 test–retest eyes were tested 7 to 10 times. Rates of change and P values were calculated for possible re-orderings of these series to generate global analysis (“MD worsening faster than x dB/y with P trend analysis detects subsequently confirmed deterioration sooner than either global or pointwise analyses. PMID:28715580

  20. Segment clustering methodology for unsupervised Holter recordings analysis

    Science.gov (United States)

    Rodríguez-Sotelo, Jose Luis; Peluffo-Ordoñez, Diego; Castellanos Dominguez, German

    2015-01-01

    Cardiac arrhythmia analysis on Holter recordings is an important issue in clinical settings; however, it implicitly involves other problems related to the large amount of unlabelled data, which implies a high computational cost. In this work an unsupervised methodology based on a segment framework is presented, which consists of dividing the raw data into a balanced number of segments in order to identify fiducial points and to characterize and cluster the heartbeats in each segment separately. The resulting clusters are merged or split according to an assumed criterion of homogeneity. This framework offsets the high computational cost of Holter analysis, making its implementation possible for future real-time applications. The performance of the method is measured over the records from the MIT/BIH arrhythmia database, taking advantage of the database labels, and achieves high values of sensitivity and specificity for a broad range of heartbeat types recommended by the AAMI.

  1. Data Preprocessing in Cluster Analysis of Gene Expression

    Institute of Scientific and Technical Information of China (English)

    杨春梅; 万柏坤; 高晓峰

    2003-01-01

    Considering that the DNA microarray technology has generated explosive gene expression data and that it is urgent to analyse and to visualize such massive datasets with efficient methods, we investigate the data preprocessing methods used in cluster analysis, normalization or logarithm of the matrix, by using hierarchical clustering, principal component analysis (PCA) and self-organizing maps (SOMs). The results illustrate that when using the Euclidean distance as measuring metrics, logarithm of relative expression level is the best preprocessing method, while data preprocessed by normalization cannot attain the expected results because the data structure is ruined. If there are only a few principal components, the PCA is an effective method to extract the frame structure, while SOMs are more suitable for a specific structure.
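
    One reason a logarithm of relative expression can suit the Euclidean distance is that it treats up- and down-regulation symmetrically. The sketch below uses made-up expression ratios and helper names chosen for illustration; it is not the preprocessing pipeline of the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def log2_profile(ratios):
    """Base-2 logarithm of each relative expression level."""
    return [math.log2(r) for r in ratios]


# Hypothetical expression ratios over two conditions.
baseline = [1.0, 1.0]  # unchanged gene
up = [4.0, 4.0]        # four-fold induced
down = [0.25, 0.25]    # four-fold repressed

raw_up = euclidean(up, baseline)      # large
raw_down = euclidean(down, baseline)  # much smaller, despite the same fold change
log_up = euclidean(log2_profile(up), log2_profile(baseline))
log_down = euclidean(log2_profile(down), log2_profile(baseline))
```

    On the raw ratios the induced gene appears four times farther from baseline than the repressed one; after the log transform the two four-fold changes are equidistant.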

  2. Clustering of four major lifestyle risk factors among Korean adults with metabolic syndrome

    Science.gov (United States)

    Ha, Shin; Choi, Hui Ran

    2017-01-01

    The purpose of this study was to investigate the clustering pattern of four major lifestyle risk factors (smoking, heavy drinking, poor diet, and physical inactivity) among people with metabolic syndrome in South Korea. There were 2,469 adults with metabolic syndrome aged 30 years or older available in the 5th Korean National Health and Nutrition Examination Survey dataset. We calculated the ratio of the observed to expected (O/E) prevalence for the 16 different combinations and the prevalence odds ratios (POR) of four lifestyle risk factors. The four lifestyle risk factors tended to cluster in specific multiple combinations. Smoking and heavy drinking were clustered (POR: 1.86 for male, 4.46 for female), heavy drinking and poor diet were clustered (POR: 1.38 for male, 1.74 for female), and smoking and physical inactivity were also clustered (POR: 1.48 for male). Those who were male, younger, low-educated and living alone were much more likely to have a higher number of lifestyle risk factors. Some helpful implications can be drawn from knowledge of the clustering pattern of lifestyle risk factors for more effective intervention programs targeting metabolic syndrome. PMID:28350828
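
    The observed-to-expected (O/E) ratio used in this abstract can be illustrated with a small calculation. The sketch assumes the usual convention that the expected prevalence of a combination is the product of the marginal prevalences under independence; all numbers and names below are hypothetical and are not taken from the survey.

```python
def expected_prevalence(marginals, pattern):
    """Prevalence of a risk-factor combination if the factors were independent."""
    expected = 1.0
    for p, present in zip(marginals, pattern):
        expected *= p if present else (1.0 - p)
    return expected


def oe_ratio(observed, marginals, pattern):
    """Observed prevalence divided by the independence-based expectation."""
    return observed / expected_prevalence(marginals, pattern)


# Hypothetical marginal prevalences: smoking, heavy drinking, poor diet, inactivity.
marginals = [0.30, 0.20, 0.50, 0.40]
# Combination: smoking and heavy drinking present, the other two absent.
pattern = (True, True, False, False)

expected = expected_prevalence(marginals, pattern)           # 0.3 * 0.2 * 0.5 * 0.6
ratio = oe_ratio(0.036, marginals, pattern)                  # hypothetical observed 3.6 %
```

    A ratio above 1 means the combination occurs more often than independence would predict, which is the sense in which the factors "cluster"; with four binary factors there are 2^4 = 16 such patterns.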

  3. An Interpretation of the Boshier-Collins Cluster Analysis Testing Houle's Typology.

    Science.gov (United States)

    Furst, Edward J.

    1986-01-01

    This article speculates on an underlying order obscured by the details of the Boshier-Collins cluster analysis and the mapping of Houle's types onto it. A table illustrates an interpretation of cluster analysis on Boshier's Education Participation Scale. (CT)

  4. Sensory over responsivity and obsessive compulsive symptoms: A cluster analysis.

    Science.gov (United States)

    Ben-Sasson, Ayelet; Podoly, Tamar Yonit

    2017-02-01

    Several studies have examined the sensory component in Obsessive Compulsive Disorder (OCD) and described an OCD subtype with a unique profile in which Sensory Phenomena (SP) are a significant component. SP have some commonalities with Sensory Over Responsivity (SOR) and might be in part a characteristic of this subtype. Although some studies have examined SOR and its relation to Obsessive Compulsive Symptoms (OCS), the literature lacks sufficient data on this interplay. The first goal was to further examine the correlations between OCS and SOR, and to explore the correlations between SOR modalities (i.e. smell, touch, etc.) and OCS subscales (i.e. washing, ordering, etc.). The second was to investigate the cluster analysis of SOR and OCS dimensions in adults, that is, to classify the sample using the sensory scores to find whether a sensory OCD subtype can be specified. The third goal was to explore the psychometric features of a new sensory questionnaire: the Sensory Perception Quotient (SPQ). A sample of non-clinical adults (n=350) was recruited via e-mail, social media and social networks. Participants completed questionnaires for measuring SOR, OCS, and anxiety. SOR and OCI-F scores were moderately and significantly correlated (n=274); significant correlations between all SOR modalities and OCS subscales were found, with no specific higher correlation between one modality and one OCS subscale. Cluster analysis revealed four distinct clusters: (1) No OC and SOR symptoms (NONE; n=100), (2) High OC and SOR symptoms (BOTH; n=28), (3) Moderate OC symptoms (OCS; n=63), (4) Moderate SOR symptoms (SOR; n=83). The BOTH cluster had significantly higher anxiety levels than the other clusters, and shared OC subscale scores with the OCS cluster. The BOTH cluster also reported higher SOR scores across tactile, vision, taste and olfactory modalities. The SPQ was found reliable and suitable to detect SOR, and the sample SPQ scores were normally distributed (n=350). SOR is a

  5. Profiles of exercise motivation, physical activity, exercise habit, and academic performance in Malaysian adolescents: A cluster analysis

    Directory of Open Access Journals (Sweden)

    Hairul Anuar Hashim

    2011-06-01

    Full Text Available Objectives: This study examined Malaysian adolescents’ profiles of exercise motivation, exercise habit strength, academic performance, and levels of physical activity (PA) using cluster analysis. Methods: The sample (n = 300) consisted of 65.6% males and 34.4% females with a mean age of 13.40 ± 0.49. Statistical analysis was performed using cluster analysis. Results: Cluster analysis revealed three distinct cluster groups. Cluster 1 is characterized by a moderate level of PA, relatively high in motivational indices and relative autonomy index (RAI), low in exercise habit, and moderate level of academic achievement. Cluster 2 has superior academic performance but is low in PA and all other measured variables. Cluster 3 is characterized by high levels of PA and all other variables but is lowest in academic performance. One way ANOVA revealed significant differences between cluster groups in total weekly MET, total minutes of weekly PA, academic performance, introjected regulation, and identified regulation. Conclusion: PA promotion with emphasis on external factors may be effective in instilling exercise habituation among adolescents in the present sample.

  6. Coupled Two-Way Clustering Analysis of Gene Microarray Data

    CERN Document Server

    Getz, G; Domany, E

    2000-01-01

    We present a novel coupled two-way clustering approach to gene microarray data analysis. The main idea is to identify subsets of the genes and samples, such that when one of these is used to cluster the other, stable and significant partitions emerge. The search for such subsets is a computationally complex task: we present an algorithm, based on iterative clustering, which performs such a search. This analysis is especially suitable for gene microarray data, where the contributions of a variety of biological mechanisms to the gene expression levels are entangled in a large body of experimental data. The method was applied to two gene microarray data sets, on colon cancer and leukemia. By identifying relevant subsets of the data and focusing on them we were able to discover partitions and correlations that were masked and hidden when the full dataset was used in the analysis. Some of these partitions have clear biological interpretation; others can serve to identify possible directions for future research.

  7. Coupled two-way clustering analysis of gene microarray data

    Science.gov (United States)

    Getz, Gad; Levine, Erel; Domany, Eytan

    2000-10-01

    We present a coupled two-way clustering approach to gene microarray data analysis. The main idea is to identify subsets of the genes and samples, such that when one of these is used to cluster the other, stable and significant partitions emerge. The search for such subsets is a computationally complex task. We present an algorithm, based on iterative clustering, that performs such a search. This analysis is especially suitable for gene microarray data, where the contributions of a variety of biological mechanisms to the gene expression levels are entangled in a large body of experimental data. The method was applied to two gene microarray data sets, on colon cancer and leukemia. By identifying relevant subsets of the data and focusing on them we were able to discover partitions and correlations that were masked and hidden when the full dataset was used in the analysis. Some of these partitions have clear biological interpretation; others can serve to identify possible directions for future research.

  8. Functional Principal Component Analysis and Randomized Sparse Clustering Algorithm for Medical Image Analysis

    Science.gov (United States)

    Lin, Nan; Jiang, Junhai; Guo, Shicheng; Xiong, Momiao

    2015-01-01

    Due to advances in sensor technology, the growing volume of medical image data makes it possible to visualize anatomical changes in biological tissues. As a consequence, medical images have the potential to enhance the diagnosis of disease, the prediction of clinical outcomes and the characterization of disease progression. At the same time, the growing data dimensions pose great methodological and computational challenges for the representation and selection of features in image cluster analysis. To address these challenges, we first extend functional principal component analysis (FPCA) from one dimension to two dimensions to fully capture the spatial variation of the image signals. The image signals contain a large number of redundant features which provide no additional information for clustering analysis. The widely used methods for removing irrelevant features are sparse clustering algorithms that use a lasso-type penalty to select the features. However, the accuracy of clustering with a lasso-type penalty depends on the selection of the penalty parameters and the threshold value, which are difficult to determine in practice. Recently, randomized algorithms have received a great deal of attention in big data analysis. This paper presents a randomized algorithm for accurate feature selection in image clustering analysis. The proposed method is applied to both the liver and kidney cancer histology image data from the TCGA database. The results demonstrate that the randomized feature selection method coupled with functional principal component analysis substantially outperforms the current sparse clustering algorithms in image cluster analysis. PMID:26196383

  9. Functional Principal Component Analysis and Randomized Sparse Clustering Algorithm for Medical Image Analysis.

    Directory of Open Access Journals (Sweden)

    Nan Lin

    Due to advances in sensor technology, the growing volume of medical image data makes it possible to visualize anatomical changes in biological tissues. As a consequence, medical images have the potential to enhance the diagnosis of disease, the prediction of clinical outcomes and the characterization of disease progression. At the same time, the growing data dimensions pose great methodological and computational challenges for the representation and selection of features in image cluster analysis. To address these challenges, we first extend functional principal component analysis (FPCA) from one dimension to two dimensions to fully capture the spatial variation of the image signals. The image signals contain a large number of redundant features which provide no additional information for clustering analysis. The widely used methods for removing irrelevant features are sparse clustering algorithms that use a lasso-type penalty to select the features. However, the accuracy of clustering with a lasso-type penalty depends on the selection of the penalty parameters and the threshold value, which are difficult to determine in practice. Recently, randomized algorithms have received a great deal of attention in big data analysis. This paper presents a randomized algorithm for accurate feature selection in image clustering analysis. The proposed method is applied to both the liver and kidney cancer histology image data from the TCGA database. The results demonstrate that the randomized feature selection method coupled with functional principal component analysis substantially outperforms the current sparse clustering algorithms in image cluster analysis.

  10. Analysis of genomic signatures in prokaryotes using multinomial regression and hierarchical clustering

    DEFF Research Database (Denmark)

    Ussery, David; Bohlin, Jon; Skjerve, Eystein

    2009-01-01

    Recently there has been an explosion in the availability of bacterial genomic sequences, making possible now an analysis of genomic signatures across more than 800 different bacterial chromosomes, from a wide variety of environments. Using genomic signatures, we pair-wise compared 867...... different genomic DNA sequences, taken from chromosomes and plasmids more than 100,000 base-pairs in length. Hierarchical clustering was performed on the outcome of the comparisons before a multinomial regression model was fitted. The regression model included the cluster groups as the response variable...... AT content. Small improvements to the regression model, although significant, were also obtained by factors such as sequence size, habitat, growth temperature, selective pressure measured as oligonucleotide usage variance, and oxygen requirement. The statistics obtained using hierarchical clustering...

  11. Competitiveness Analysis of Processing Industry Cluster of Livestock Products in Inner Mongolia Based on "Diamond Model"

    Institute of Scientific and Technical Information of China (English)

    YANG Xing-long; REN Ya-tong

    2012-01-01

    Using Michael Porter’s "diamond model", based on regional development characteristics, we conduct analysis of the competitiveness of processing industry cluster of livestock products in Inner Mongolia from six aspects (the factor conditions, demand conditions, corporate strategy, structure and competition, related and supporting industries, government and opportunities). And we put forward the following rational recommendations for improving the competitiveness of processing industry cluster of livestock products in Inner Mongolia: (i) The government should increase capital input, focus on supporting processing industry of livestock products, and give play to the guidance and aggregation effect of financial funds; (ii) In terms of enterprises, it is necessary to vigorously develop leading enterprises, to give full play to the cluster effect of the leading enterprises.

  12. Functional Analysis of the Fusarielin Biosynthetic Gene Cluster

    Directory of Open Access Journals (Sweden)

    Aida Droce

    2016-12-01

    Fusarielins are polyketides with a decalin core produced by various species of Aspergillus and Fusarium. Although the responsible gene cluster has been identified, the biosynthetic pathway remains to be elucidated. In the present study, members of the gene cluster were deleted individually in a Fusarium graminearum strain overexpressing the local transcription factor. The results suggest that a trans-acting enoyl reductase (FSL5) assists the polyketide synthase FSL1 in biosynthesis of a polyketide product, which is released by hydrolysis by a trans-acting thioesterase (FSL2). Deletion of the epimerase (FSL3) resulted in accumulation of an unstable compound, which could be the released product. A novel compound, named prefusarielin, accumulated in the deletion mutant of the cytochrome P450 monooxygenase FSL4. Unlike the known fusarielins from Fusarium, this compound does not contain oxygenized decalin rings, suggesting that FSL4 is responsible for the oxygenation.

  13. Life history factors, personality and the social clustering of sexual experience in adolescents

    Science.gov (United States)

    2016-01-01

    Adolescent sexual behaviour may show clustering in neighbourhoods, schools and friendship networks. This study aims to assess how experience with sexual intercourse clusters across the social world of adolescents and whether predictors implicated by life history theory or personality traits can account for its between-individual variation and social patterning. Using data on 2877 adolescents from the Avon Longitudinal Study of Parents and Children, we ran logistic multiple classification models to assess the clustering of sexual experience by approximately 17.5 years in schools, neighbourhoods and friendship networks. We examined how much clustering at particular levels could be accounted for by life history predictors and Big Five personality factors. Sexual experience exhibited substantial clustering in friendship networks, while clustering at the level of schools and neighbourhoods was minimal, suggesting a limited role for socio-ecological influences at those levels. While life history predictors did account for some variation in sexual experience, they did not explain clustering in friendship networks. Personality, especially extraversion, explained about a quarter of friends' similarity. After accounting for life history factors and personality, substantial unexplained similarity among friends remained, which may reflect a tendency to associate with similar individuals or the social transmission of behavioural norms. PMID:27853543

  14. Fuzzy Clustering

    DEFF Research Database (Denmark)

    Berks, G.; Keyserlingk, Diedrich Graf von; Jantzen, Jan

    2000-01-01

    A symptom is a condition indicating the presence of a disease, especially when regarded as an aid in diagnosis. Symptoms are the smallest units indicating the existence of a disease. A syndrome on the other hand is an aggregate, set or cluster of concurrent symptoms which together indicate...... and clustering are the basic concerns in medicine. Classification depends on definitions of the classes and their required degree of participant of the elements in the cases' symptoms. In medicine imprecise conditions are the rule and therefore fuzzy methods are much more suitable than crisp ones. Fuzzy c......-mean clustering is an easy and well improved tool, which has been applied in many medical fields. We used c-mean fuzzy clustering after feature extraction from an aphasia database. Factor analysis was applied on a correlation matrix of 26 symptoms of language disorders and led to five factors. The factors...
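    The abstract above names fuzzy c-means clustering applied to factor scores but is truncated before any details. The following is a minimal sketch of the standard fuzzy c-means updates in plain Python, not the authors' implementation: the function name, the fuzzifier m = 2, the fixed iteration count and the toy factor scores are all assumptions made for illustration.

```python
from math import dist

def fuzzy_c_means(points, centers, m=2.0, iters=50, eps=1e-9):
    """Standard fuzzy c-means updates (illustrative sketch).

    points  -- list of equal-length tuples (e.g. hypothetical factor scores)
    centers -- initial cluster centers, one tuple per cluster
    m       -- fuzzifier (> 1); m = 2 is a common default, assumed here
    Returns (memberships, centers); memberships[i][j] is the degree to
    which point i belongs to cluster j, and each row sums to 1.
    """
    centers = [tuple(c) for c in centers]
    dims = range(len(points[0]))
    for _ in range(iters):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)).
        u = []
        for p in points:
            d = [max(dist(p, c), eps) for c in centers]  # avoid division by zero
            u.append([1.0 / sum((dj / dk) ** (2.0 / (m - 1.0)) for dk in d)
                      for dj in d])
        # Center update: mean of the points weighted by u_ij^m.
        new_centers = []
        for j in range(len(centers)):
            w = [row[j] ** m for row in u]
            total = sum(w)
            new_centers.append(tuple(
                sum(wi * p[a] for wi, p in zip(w, points)) / total for a in dims))
        centers = new_centers
    return u, centers

# Hypothetical two-factor scores for six cases (made-up numbers).
scores = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15), (0.9, 0.8), (0.8, 0.9), (0.5, 0.5)]
memberships, centers = fuzzy_c_means(scores, centers=[scores[0], scores[3]])
for case, row in zip(scores, memberships):
    print(case, [round(x, 3) for x in row])
```

    Unlike a crisp partition, each case receives a graded membership in every cluster, which is the property the abstract points to when it calls fuzzy methods more suitable for imprecise medical conditions.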

  15. Posterior AD-Type Pathology: Cognitive Subtypes Emerging from a Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Antonella Cappa

    2014-01-01

    Background. “Posterior shift” of the neuropathological changes of Alzheimer's disease (AD) produces a syndrome, posterior cortical atrophy (PCA), dominated by high-level visual deficits. Objective. To explore in patients with AD-type pathology whether a data-driven analysis (cluster analysis) based on neuropsychological findings resulted in the emergence of different subgroups of patients; in particular, to find out whether it was possible to identify patients with visuospatial deficits consistent with the hypothesis that PCA is a “dorsal stream” syndrome or, rather, whether there were subgroups of patients with different types of impairment within the high-level visual domain. Methods. 23 PCA and 16 DAT patients were studied. By a principal component analysis performed on a wide range of neuropsychological tasks, 15 variables were obtained that loaded onto five main factors (memory, language, perceptual, visuospatial, and calculation), which entered a hierarchical cluster analysis. Results. Four clusters of cognitive impairment emerged: visuospatial/perceptual, memory, perceptual/calculation, and language. A visuospatial deficit clearly emerged only in the first cluster. Conclusions. AD pathology produces not only variants dominated by memory (DAT) and, to a lesser extent, visuospatial deficit (PCA), but also other distinct syndromic subtypes with disorders in visual perception and language, which reflect a different vulnerability of specific functional networks.

  16. Toward an Empirical Taxonomy of Suicide Ideation: A Cluster Analysis of the Youth Risk Behavior Survey

    Science.gov (United States)

    Flannery, William Peter; Sneed, Carl D.; Marsh, Penny

    2003-01-01

    In this study we examined adolescent risk behaviors, giving special attention to suicide ideation. Cluster analysis was used to classify adolescents (N = 2,730) on the Youth Risk Behavior Survey. Six clusters of adolescent risk behavior were identified. Although each risk cluster was distinct, some clusters shared overlapping risk behaviors.…

  17. A Novel Double Cluster and Principal Component Analysis-Based Optimization Method for the Orbit Design of Earth Observation Satellites

    Directory of Open Access Journals (Sweden)

    Yunfeng Dong

    2017-01-01

    The weighted sum and genetic algorithm-based hybrid method (WSGA-based HM), which has been applied to multiobjective orbit optimizations, is negatively influenced by human factors through the artificial choice of the weight coefficients in the weighted sum method and by the slow convergence of GA. To address these two problems, a cluster and principal component analysis-based optimization method (CPC-based OM) is proposed, in which many candidate orbits are gradually randomly generated until the optimal orbit is obtained using a data mining method, that is, cluster analysis based on principal components. Then, a second cluster analysis of the orbital elements is introduced into CPC-based OM to improve the convergence, developing a novel double cluster and principal component analysis-based optimization method (DCPC-based OM). In DCPC-based OM, the cluster analysis based on principal components has the advantage of reducing the human influences, and the cluster analysis based on six orbital elements can reduce the search space to effectively accelerate convergence. The test results from a multiobjective numerical benchmark function and the orbit design results of an Earth observation satellite show that DCPC-based OM converges more efficiently than WSGA-based HM. DCPC-based OM also, to some degree, reduces the influence of human factors present in WSGA-based HM.

  18. A spatial cluster analysis of tractor overturns in Kentucky from 1960 to 2002

    Science.gov (United States)

    Saman, D.M.; Cole, H.P.; Odoi, A.; Myers, M.L.; Carey, D.I.; Westneat, S.C.

    2012-01-01

    Background: Agricultural tractor overturns without rollover protective structures are the leading cause of farm fatalities in the United States. To our knowledge, no studies have incorporated the spatial scan statistic in identifying high-risk areas for tractor overturns. The aim of this study was to determine whether tractor overturns cluster in certain parts of Kentucky and identify factors associated with tractor overturns. Methods: A spatial statistical analysis using Kulldorff's spatial scan statistic was performed to identify county clusters at greatest risk for tractor overturns. A regression analysis was then performed to identify factors associated with tractor overturns. Results: The spatial analysis revealed a cluster of higher than expected tractor overturns in four counties in northern Kentucky (RR = 2.55) and 10 counties in eastern Kentucky (RR = 1.97). Higher rates of tractor overturns were associated with steeper average percent slope of pasture land by county (p = 0.0002) and a greater percent of total tractors with less than 40 horsepower by county (ptractor overturns exist in Kentucky and identifies factors associated with overturns. This study provides policymakers a guide to targeted county-level interventions (e.g., roll-over protective structures promotion interventions) with the intention of reducing tractor overturns in the highest risk counties in Kentucky. © 2012 Saman et al.

  19. Monitoring Customer Satisfaction in Service Industry: A Cluster Analysis Approach

    Directory of Open Access Journals (Sweden)

    Matúš Horváth

    2012-10-01

    One of the key performance indicators of an organization's quality management system is customer satisfaction. The process of monitoring customer satisfaction is therefore an important part of the measuring processes of the quality management system. This paper deals with new ways to analyse and monitor customer satisfaction using data on how customers use the organisation's services and on customer leaving rates. The article uses cluster analysis to segment customers, with the aim of increasing the accuracy of the results and of the decisions based on them. The application example was created as part of a bachelor thesis.

  20. Monitoring Customer Satisfaction in Service Industry: A Cluster Analysis Approach

    Directory of Open Access Journals (Sweden)

    Matúš Horváth

    2012-11-01

    One of the key performance indicators of an organization's quality management system is customer satisfaction. The process of monitoring customer satisfaction is therefore an important part of the measuring processes of the quality management system. This paper deals with new ways to analyse and monitor customer satisfaction using data on how customers use the organisation's services and on customer leaving rates. The article uses cluster analysis to segment customers, with the aim of increasing the accuracy of the results and of the decisions based on them. The application example was created as part of a bachelor thesis.

  1. Using cluster analysis in measuring social domain of territorial brand

    Directory of Open Access Journals (Sweden)

    Zlata Stepanova

    2009-10-01

    Territorial brand has a social dimension reflected in the social equilibrium and measurable with social effectiveness indicators. The paper offers a social effectiveness analysis of territory using the investigation object “territorial and social systems” (TSS), with their further classification into social types based on cluster analysis. This method allows the authors to distinguish four social types of TSS in the Sverdlovsk region according to such characteristics as financial activity, quality of life, social stability and ill-being levels. The results of the investigation could be useful for the brand policy of territorial authorities.

  2. First course in factor analysis

    CERN Document Server

    Comrey, Andrew L

    2013-01-01

    The goal of this book is to foster a basic understanding of factor analytic techniques so that readers can use them in their own research and critically evaluate their use by other researchers. Both the underlying theory and correct application are emphasized. The theory is presented through the mathematical basis of the most common factor analytic models and several methods used in factor analysis. On the application side, considerable attention is given to the extraction problem, the rotation problem, and the interpretation of factor analytic results. Hence, readers are given a background of

  3. Cluster analysis for DNA methylation profiles having a detection threshold

    Directory of Open Access Journals (Sweden)

    Siegmund Kimberly D

    2006-07-01

    Background: DNA methylation, a molecular feature used to investigate tumor heterogeneity, can be measured on many genomic regions using the MethyLight technology. Due to the combination of the underlying biology of DNA methylation and the MethyLight technology, the measurements, while being generated on a continuous scale, have a large number of 0 values. This suggests that conventional clustering methodology may not perform well on this data. Results: We compare the performance of existing methodology (such as k-means) with two novel methods that explicitly allow for the preponderance of values at 0. We also consider how the ability to successfully cluster such data depends upon the number of informative genes for which methylation is measured and the correlation structure of the methylation values for those genes. We show that when data is collected for a sufficient number of genes, our models do improve clustering performance compared to methods, such as k-means, that do not explicitly respect the supposed biological realities of the situation. Conclusion: The performance of analysis methods depends upon how well the assumptions of those methods reflect the properties of the data being analyzed. Differing technologies will lead to data with differing properties, and should therefore be analyzed differently. Consequently, it is prudent to give thought to what the properties of the data are likely to be, and which analysis method might therefore be likely to best capture those properties.

  4. Micro-scale Spatial Clustering of Cholera Risk Factors in Urban Bangladesh.

    Science.gov (United States)

    Bi, Qifang; Azman, Andrew S; Satter, Syed Moinuddin; Khan, Azharul Islam; Ahmed, Dilruba; Riaj, Altaf Ahmed; Gurley, Emily S; Lessler, Justin

    2016-02-01

    Close interpersonal contact likely drives spatial clustering of cases of cholera and diarrhea, but spatial clustering of risk factors may also drive this pattern. Few studies have focused specifically on how exposures for disease cluster at small spatial scales. Improving our understanding of the micro-scale clustering of risk factors for cholera may help to target interventions and power studies with cluster designs. We selected sets of spatially matched households (matched-sets) near cholera case households between April and October 2013 in a cholera endemic urban neighborhood of Tongi Township in Bangladesh. We collected data on exposures to suspected cholera risk factors at the household and individual level. We used intra-class correlation coefficients (ICCs) to characterize clustering of exposures within matched-sets and households, and assessed if clustering depended on the geographical extent of the matched-sets. Clustering over larger spatial scales was explored by assessing the relationship between matched-sets. We also explored whether different exposures tended to appear together in individuals, households, and matched-sets. Household level exposures, including: drinking municipal supplied water (ICC = 0.97, 95%CI = 0.96, 0.98), type of latrine (ICC = 0.88, 95%CI = 0.71, 1.00), and intermittent access to drinking water (ICC = 0.96, 95%CI = 0.87, 1.00) exhibited strong clustering within matched-sets. As the geographic extent of matched-sets increased, the concordance of exposures within matched-sets decreased. Concordance between matched-sets of exposures related to water supply was elevated at distances of up to approximately 400 meters. Household level hygiene practices were correlated with infrastructure shown to increase cholera risk. Co-occurrence of different individual level exposures appeared to mostly reflect the differing domestic roles of study participants. 
Strong spatial clustering of exposures at a small spatial scale in a cholera endemic
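    The abstract above summarizes clustering of exposures with intra-class correlation coefficients (ICCs). As a rough illustration of what such a coefficient measures, here is a one-way random-effects ICC computed from ANOVA mean squares in plain Python. This is only a sketch: the study's actual estimator (which may have been one suited to binary exposures) is not stated in the abstract, and the function name, the equal-group-size restriction and the toy data are assumptions.

```python
from statistics import mean

def icc_oneway(groups):
    """One-way random-effects ICC from ANOVA mean squares (illustrative).

    groups -- list of lists; each inner list holds the values observed for
              the units in one group (e.g. households in one matched-set).
    This sketch assumes every group has the same size k >= 2 and that the
    data are not all identical (otherwise the denominator is zero).
    """
    k = len(groups[0])
    if k < 2 or any(len(g) != k for g in groups):
        raise ValueError("this sketch assumes equal group sizes of at least 2")
    n = len(groups)
    grand = mean(x for g in groups for x in g)
    # Mean square between groups and mean square within groups.
    msb = k * sum((mean(g) - grand) ** 2 for g in groups) / (n - 1)
    msw = sum((x - mean(g)) ** 2 for g in groups for x in g) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Hypothetical exposure indicator (1 = exposed) for three households
# in each of four matched-sets; the numbers are made up.
matched_sets = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 0]]
print(round(icc_oneway(matched_sets), 3))
```

    Values near 1 mean households in the same matched-set almost always share the exposure, which is how ICCs of 0.88 to 0.97 in the abstract should be read; values near 0 or below mean sharing a matched-set tells little about exposure.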

  5. Micro-scale Spatial Clustering of Cholera Risk Factors in Urban Bangladesh.

    Directory of Open Access Journals (Sweden)

    Qifang Bi

    2016-02-01

    Close interpersonal contact likely drives spatial clustering of cases of cholera and diarrhea, but spatial clustering of risk factors may also drive this pattern. Few studies have focused specifically on how exposures for disease cluster at small spatial scales. Improving our understanding of the micro-scale clustering of risk factors for cholera may help to target interventions and power studies with cluster designs. We selected sets of spatially matched households (matched-sets) near cholera case households between April and October 2013 in a cholera endemic urban neighborhood of Tongi Township in Bangladesh. We collected data on exposures to suspected cholera risk factors at the household and individual level. We used intra-class correlation coefficients (ICCs) to characterize clustering of exposures within matched-sets and households, and assessed if clustering depended on the geographical extent of the matched-sets. Clustering over larger spatial scales was explored by assessing the relationship between matched-sets. We also explored whether different exposures tended to appear together in individuals, households, and matched-sets. Household level exposures, including: drinking municipal supplied water (ICC = 0.97, 95%CI = 0.96, 0.98), type of latrine (ICC = 0.88, 95%CI = 0.71, 1.00), and intermittent access to drinking water (ICC = 0.96, 95%CI = 0.87, 1.00) exhibited strong clustering within matched-sets. As the geographic extent of matched-sets increased, the concordance of exposures within matched-sets decreased. Concordance between matched-sets of exposures related to water supply was elevated at distances of up to approximately 400 meters. Household level hygiene practices were correlated with infrastructure shown to increase cholera risk. Co-occurrence of different individual level exposures appeared to mostly reflect the differing domestic roles of study participants. Strong spatial clustering of exposures at a small spatial scale in a

  6. Micro-scale Spatial Clustering of Cholera Risk Factors in Urban Bangladesh

    Science.gov (United States)

    Bi, Qifang; Azman, Andrew S.; Satter, Syed Moinuddin; Khan, Azharul Islam; Ahmed, Dilruba; Riaj, Altaf Ahmed; Gurley, Emily S.; Lessler, Justin

    2016-01-01

    Close interpersonal contact likely drives spatial clustering of cases of cholera and diarrhea, but spatial clustering of risk factors may also drive this pattern. Few studies have focused specifically on how exposures for disease cluster at small spatial scales. Improving our understanding of the micro-scale clustering of risk factors for cholera may help to target interventions and power studies with cluster designs. We selected sets of spatially matched households (matched-sets) near cholera case households between April and October 2013 in a cholera endemic urban neighborhood of Tongi Township in Bangladesh. We collected data on exposures to suspected cholera risk factors at the household and individual level. We used intra-class correlation coefficients (ICCs) to characterize clustering of exposures within matched-sets and households, and assessed if clustering depended on the geographical extent of the matched-sets. Clustering over larger spatial scales was explored by assessing the relationship between matched-sets. We also explored whether different exposures tended to appear together in individuals, households, and matched-sets. Household level exposures, including: drinking municipal supplied water (ICC = 0.97, 95%CI = 0.96, 0.98), type of latrine (ICC = 0.88, 95%CI = 0.71, 1.00), and intermittent access to drinking water (ICC = 0.96, 95%CI = 0.87, 1.00) exhibited strong clustering within matched-sets. As the geographic extent of matched-sets increased, the concordance of exposures within matched-sets decreased. Concordance between matched-sets of exposures related to water supply was elevated at distances of up to approximately 400 meters. Household level hygiene practices were correlated with infrastructure shown to increase cholera risk. Co-occurrence of different individual level exposures appeared to mostly reflect the differing domestic roles of study participants. 
Strong spatial clustering of exposures at a small spatial scale in a cholera endemic

  7. Weighing the Giants I: Weak Lensing Masses for 51 Massive Galaxy Clusters - Project Overview, Data Analysis Methods, and Cluster Images

    CERN Document Server

    von der Linden, Anja; Applegate, Douglas E; Kelly, Patrick L; Allen, Steven W; Ebeling, Harald; Burchat, Patricia R; Burke, David L; Donovan, David; Morris, R Glenn; Blandford, Roger; Erben, Thomas; Mantz, Adam

    2012-01-01

    This is the first in a series of papers in which we measure accurate weak-lensing masses for 51 of the most X-ray luminous galaxy clusters known at redshifts 0.15cluster experiments. The primary aim is to improve the absolute mass calibration of cluster observables, currently the dominant systematic uncertainty for cluster count experiments. Key elements of this work are the rigorous quantification of systematic uncertainties, high-quality data reduction and photometric calibration, and the "blind" nature of the analysis to avoid confirmation bias. Our target clusters are drawn from RASS X-ray catalogs, and provide a versatile calibration sample for many aspects of cluster cosmology. We have acquired wide-field, high-quality imaging using the Subaru and CFHT telescopes for all 51 clusters, in at least three bands per cluster. For a subset of 27 clusters, we have data in at least five bands, allowing accurate photo-z estimates of...

  8. Genetics of Cd36 and the clustering of multiple cardiovascular risk factors in spontaneous hypertension.

    Science.gov (United States)

    Pravenec, M; Zidek, V; Simakova, M; Kren, V; Krenova, D; Horky, K; Jachymova, M; Mikova, B; Kazdova, L; Aitman, T J; Churchill, P C; Webb, R C; Hingarh, N H; Yang, Y; Wang, J M; Lezin, E M; Kurtz, T W

    1999-06-01

    Disorders of carbohydrate and lipid metabolism have been reported to cluster in patients with essential hypertension and in spontaneously hypertensive rats (SHRs). A deletion in the Cd36 gene on chromosome 4 has recently been implicated in defective carbohydrate and lipid metabolism in isolated adipocytes from SHRs. However, the role of Cd36 and chromosome 4 in the control of blood pressure and systemic cardiovascular risk factors in SHRs is unknown. In the SHR.BN-Il6/Npy congenic strain, we have found that transfer of a segment of chromosome 4 (including Cd36) from the Brown Norway (BN) rat onto the SHR background induces reductions in blood pressure and ameliorates dietary-induced glucose intolerance, hyperinsulinemia, and hypertriglyceridemia. These results demonstrate that a single chromosome region can influence a broad spectrum of cardiovascular risk factors involved in the hypertension metabolic syndrome. However, analysis of Cd36 genotypes in the SHR and stroke-prone SHR strains indicates that the deletion variant of Cd36 was not critical to the initial selection for hypertension in the SHR model. Thus, the ability of chromosome 4 to influence multiple cardiovascular risk factors, including hypertension, may depend on linkage of Cd36 to other genes trapped within the differential segment of the SHR.BN-Il6/Npy strain.

  9. Cluster Analysis and Fuzzy Query in Ship Maintenance and Design

    Science.gov (United States)

    Che, Jianhua; He, Qinming; Zhao, Yinggang; Qian, Feng; Chen, Qi

    Cluster analysis and fuzzy query are widely applied in modern intelligent information processing. In view of the features of ship maintenance data, a variant of the hypergraph-based clustering algorithm, i.e., Correlation Coefficient-based Minimal Spanning Tree (CC-MST), is proposed to analyze the bulky data arising from the ship maintenance process, discover unknown rules and help ship maintainers decide on the causes of various device faults. At the same time, revising or renewing an existing design of a ship or device may be necessary to eliminate those device faults. To offer ship designers some valuable hints, a fuzzy query mechanism is designed to retrieve useful information from large-scale, complicated and redundant ship technical and testing data. Finally, two experiments based on a real ship device fault statistical dataset validate the flexibility and efficiency of the CC-MST algorithm. A fuzzy query prototype demonstrates the usability of our fuzzy query mechanism.

  10. Classification of aquifer vulnerability using K-means cluster analysis

    Science.gov (United States)

    Javadi, S.; Hashemy, S. M.; Mohammadi, K.; Howard, K. W. F.; Neshat, A.

    2017-06-01

    Groundwater is one of the main sources of drinking and agricultural water in arid and semi-arid regions but is becoming increasingly threatened by contamination. Vulnerability mapping has been used for many years as an effective tool for assessing the potential for aquifer pollution and the most common method of intrinsic vulnerability assessment is DRASTIC (Depth to water table, net Recharge, Aquifer media, Soil media, Topography, Impact of vadose zone and hydraulic Conductivity). An underlying problem with the DRASTIC approach relates to the subjectivity involved in selecting relative weightings for each of the DRASTIC factors and assigning rating values to ranges or media types within each factor. In this study, a clustering technique is introduced that removes some of the subjectivity associated with the indexing method. It creates a vulnerability map that does not rely on fixed weights and ratings and, thereby provides a more objective representation of the system's physical characteristics. This methodology was applied to an aquifer in Iran and compared with the standard DRASTIC approach using the water quality parameters nitrate, chloride and total dissolved solids (TDS) as surrogate indicators of aquifer vulnerability. The proposed method required only four of DRASTIC's seven factors - depth to groundwater, hydraulic conductivity, recharge value and the nature of the vadose zone, to produce a superior result. For nitrate, chloride, and TDS, respectively, the clustering approach delivered Pearson correlation coefficients that were 15, 22 and 5 percentage points higher than those obtained for the DRASTIC method.
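    The workflow in the abstract above (cluster cells on a few hydrogeological factors, then check the clusters against a water quality indicator with a Pearson correlation) can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' procedure: the four normalized factor values, the nitrate figures, the choice of two clusters, the naive "first k points" initialization and the way a vulnerability score is derived from a centroid are all hypothetical.

```python
from math import dist, sqrt

def kmeans(points, k, iters=100):
    """Lloyd's k-means. Centroids start at the first k points, a naive
    initialization chosen only to keep this sketch deterministic."""
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Hypothetical cells: (depth to groundwater, hydraulic conductivity,
# recharge, vadose zone), each already rescaled to 0..1 so that larger
# means more vulnerable. All numbers are made up.
cells = [(0.9, 0.8, 0.7, 0.9), (0.1, 0.2, 0.1, 0.2),
         (0.8, 0.9, 0.8, 0.7), (0.2, 0.1, 0.2, 0.1)]
nitrate = [40.0, 5.0, 35.0, 8.0]  # hypothetical concentrations

labels, centroids = kmeans(cells, k=2)
# One simple (assumed) score: the mean of the cell's cluster centroid.
score = [sum(centroids[lab]) / len(centroids[lab]) for lab in labels]
print(labels, round(pearson(score, nitrate), 3))
```

    The point of the sketch is that no factor weights or rating tables are supplied: the grouping comes from the data, which is the reduction in subjectivity the abstract claims over the standard DRASTIC index.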

  11. Analysis of breast cancer progression using principal component analysis and clustering

    Indian Academy of Sciences (India)

    G Alexe; G S Dalgin; S Ganesan; C DeLisi; G Bhanot

    2007-08-01

    We develop a new technique to analyse microarray data which uses a combination of principal components analysis and consensus ensemble clustering to find robust clusters and gene markers in the data. We apply our method to a public microarray breast cancer dataset which has expression levels of genes in normal samples as well as in three pathological stages of disease, namely atypical ductal hyperplasia (ADH), ductal carcinoma in situ (DCIS) and invasive ductal carcinoma (IDC). Our method averages over clustering techniques and data perturbation to find stable, robust clusters and gene markers. We identify the clusters and their pathways with distinct subtypes of breast cancer (Luminal, Basal and Her2+). We confirm that the cancer phenotype develops early (in the early hyperplasia or ADH stage) and find from our analysis that each subtype progresses from ADH to DCIS to IDC along its own specific pathway, as if each was a distinct disease.

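  12. Evaluation of the Relative Effectiveness of Science and Technology Activities in China's Provincial High-Tech Industries, Based on the DEA Model%Factor Analysis and Cluster Analysis on Technological Innovation Capabilities of SMEs in Yangtze River Delta Region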

    Institute of Scientific and Technical Information of China (English)

    戴万亮

    2012-01-01

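    Using the DEA model, this paper analyzes the relative effectiveness of science and technology activities in the high-tech industries of China's 31 provinces (autonomous regions and municipalities) in 2009. The results show that, overall, these activities suffer from low overall efficiency, low technical efficiency, and blind expansion of production. Countermeasures and suggestions are proposed in terms of improving technical efficiency, intensifying production scale, and establishing a reasonable evaluation system.%This paper briefly introduces the evaluation index system of technological innovation capabilities of SMEs. Then, using this evaluation index system together with factor analysis and cluster analysis, it quantitatively evaluates and horizontally compares the technological innovation capabilities of SMEs in 16 cities in the Yangtze River Delta region, and reveals the spatial distribution and geographical features of technological innovation capabilities of SMEs in the region.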

  13. Comparison of cluster and principal component analysis techniques to derive dietary patterns in Irish adults.

    Science.gov (United States)

    Hearty, Aine P; Gibney, Michael J

    2009-02-01

    The aims of the present study were to examine and compare dietary patterns in adults using cluster and factor analyses and to examine the format of the dietary variables on the pattern solutions (i.e. expressed as grams/day (g/d) of each food group or as the percentage contribution to total energy intake). Food intake data were derived from the North/South Ireland Food Consumption Survey 1997-9, which was a randomised cross-sectional study of 7 d recorded food and nutrient intakes of a representative sample of 1379 Irish adults aged 18-64 years. Cluster analysis was performed using the k-means algorithm and principal component analysis (PCA) was used to extract dietary factors. Food data were reduced to thirty-three food groups. For cluster analysis, the most suitable format of the food-group variable was found to be the percentage contribution to energy intake, which produced six clusters: 'Traditional Irish'; 'Continental'; 'Unhealthy foods'; 'Light-meal foods & low-fat milk'; 'Healthy foods'; 'Wholemeal bread & desserts'. For PCA, food groups in the format of g/d were found to be the most suitable format, and this revealed four dietary patterns: 'Unhealthy foods & high alcohol'; 'Traditional Irish'; 'Healthy foods'; 'Sweet convenience foods & low alcohol'. In summary, cluster and PCA identified similar dietary patterns when presented with the same dataset. However, the two dietary pattern methods required a different format of the food-group variable, and the most appropriate format of the input variable should be considered in future studies.

  14. Dynamical analysis of NGC 110: cluster of fainter stars or data fluctuation?

    CERN Document Server

    Joshi, Gireesh C

    2016-01-01

    The stellar enhancement of the cluster NGC 110 is investigated in various optical and infrared (IR) bands. The radial density profile of the IR region does not show a stellar enhancement in the central region of the cluster. This stellar deficiency may arise from undetected fainter stars, owing to contamination by massive stars. Since our analysis does not indicate a stellar enhancement below 16.5 mag in the I band, the cluster is assumed to be a group of fainter stars. The proposed magnitude scatter factor would be an excellent tool to understand the characteristics of colour-scattering of stars. The most probable members do not coincide with the model isochrone fitting in the optical bands due to the poor data quality of the PPMXL catalogue. Different values of the mean proper motions are found for the fainter stars of the cluster and field regions, whereas similar values are obtained for radial zones of the cluster. The symmetrical distribution of fainter stars of the core are found aro...

  15. Analysis of forest fires spatial clustering using local fractal measure

    Science.gov (United States)

    Kanevski, Mikhail; Rochat, Mikael; Timonin, Vadim

    2013-04-01

    The research deals with an application of a local fractal measure, local sandbox counting (or mass counting), to the characterization of patterns of spatial clustering. The main application concerns simulated (random patterns within the validity domain in forest regions) and real data (forest fires in Ticino, Switzerland) case studies. The global patterns of spatial clustering of forest fires have been extensively studied using different topological (nearest-neighbours, Voronoi polygons), statistical (Ripley's k-function, Morisita diagram) and fractal/multifractal measures (box-counting, sandbox counting, lacunarity) (Kanevski, 2008). Generalizations of these measures to functional ones can reveal the structure of the phenomena, e.g. burned areas. All these measures are valuable and complementary tools to study spatial clustering. Moreover, application of the validity domain concept (the complex domain where the phenomenon is studied) helps in understanding and interpreting the results. In the present paper a sandbox counting method was applied locally, i.e. each point of ignition was considered as a centre of event counting with an increasing search radius. Then, the local relationships between the radius and the number of ignition points within the given radius were examined. Finally, the results are mapped using an interpolation algorithm for visualization and analytical purposes. Both 2d (X,Y) and 3d (X,Y,Z) cases were studied and compared. The local "fractal" study gives an interesting spatially distributed picture of clustering. The real data case study was compared with a reference homogeneous pattern, complete spatial randomness. The difference between the two patterns clearly indicates the regions with important spatial clustering. An extension to the local functional measure was applied taking into account the surface of burned area, i.e. by analysing only data with fires above some threshold of burned area. Such analysis is similar to marked point processes and
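
    The local counting step described above can be sketched as follows: for one centre point, count the events within each search radius and estimate the slope of log N(r) against log r by least squares. This is an illustrative reading of the abstract rather than the authors' code; the function name `local_sandbox_slope` and the choice of radii are assumptions, and the centre is assumed to be one of the events so that every count is at least 1.

```python
import math


def local_sandbox_slope(points, center, radii):
    """Least-squares slope of log N(r) versus log r around one centre.

    N(r) is the number of points within distance r of the centre.
    The centre is assumed to be one of the points, so N(r) >= 1.
    """
    xs, ys = [], []
    for r in radii:
        count = sum(1 for p in points if math.dist(p, center) <= r)
        xs.append(math.log(r))
        ys.append(math.log(count))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Synthetic check: events along a line scale roughly like r**1,
# events filling a plane scale roughly like r**2.
line = [(i, 0) for i in range(-50, 51)]
grid = [(i, j) for i in range(-30, 31) for j in range(-30, 31)]
print(local_sandbox_slope(line, (0, 0), [5, 10, 20, 40]))
print(local_sandbox_slope(grid, (0, 0), [5, 10, 20]))
```

    Because `math.dist` accepts points of any dimension, the same function covers the 2d (X,Y) and 3d (X,Y,Z) cases; repeating it for every ignition point gives the values that would then be interpolated into a map.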

  16. Analysis on Factors Affecting Credit Financing by SMEs within Industrial Clusters: Based on Three Industrial Clusters in Changzhou

    Institute of Scientific and Technical Information of China (English)

    马鸿杰; 胡汉辉

    2009-01-01

    The paper first makes an empirical analysis of the factors affecting credit financing by small and medium enterprises (SMEs) within an industrial cluster. The analysis is based on data collected from three industrial clusters in Changzhou, i.e. the textile industry in Hutang, floor production in Henglin and lamp manufacturing in Zouqu. It is found that profitability, mortgage abundance and cluster degree are important factors affecting the financing capability of SMEs. Secondly, after a cointegration test between the cluster degree and the financing capability of SMEs, the paper focuses on analyzing the financing advantages of SMEs from the perspectives of reputation importance, information sharing, enterprise cooperation and credit risks. Finally, it concludes that the financing problems facing SMEs may be solved by developing industrial clusters of SMEs.

  17. [The hierarchical clustering analysis of hyperspectral image based on probabilistic latent semantic analysis].

    Science.gov (United States)

    Yi, Wen-Bin; Shen, Li; Qi, Yin-Feng; Tang, Hong

    2011-09-01

    The paper introduces Probabilistic Latent Semantic Analysis (PLSA) to image clustering and proposes an effective clustering algorithm for hyperspectral images that uses the semantic information from PLSA. Firstly, the ISODATA algorithm is used to obtain an initial clustering of the hyperspectral image, and the clusters of this initial result are treated as the visual words of PLSA. Secondly, an object-oriented image segmentation algorithm is used to partition the hyperspectral image, and segments with relatively pure pixels are regarded as documents in PLSA. Thirdly, a variety of identification methods that can estimate the best number of cluster centers are combined to obtain the number of latent semantic topics. Then the conditional distributions of visual words in topics and the mixtures of topics in different documents are estimated using PLSA. Finally, the conditional probabilities of the latent semantic topics are distinguished using a statistical pattern recognition method, the topic type for each visual word in each document is assigned, and the clustering result of the hyperspectral image is thereby obtained. Experimental results show that the clusters of the proposed algorithm are better than those of K-MEANS and ISODATA in terms of object-oriented properties, and the clustering result is closer to the real spatial distribution of the surface.

  18. Investigating nurses' knowledge, attitudes, and skills patterns towards clinical management system: results of a cluster analysis.

    Science.gov (United States)

    Chan, M F

    2006-09-01

    To determine whether definable subtypes exist within a cohort of Hong Kong nurses as related to the clinical management system use in their clinical practices based on their knowledge, attitudes, skills, and background factors. Data were collected using a structured questionnaire. The sample of 242 registered nurses was recruited from three hospitals in Hong Kong. The study employs personal and demographic variables and knowledge, attitudes, and skills scales. A cluster analysis yielded two clusters. Each cluster represents a different profile of Hong Kong nurses on the clinical management system use in their clinical practices. The first group (Cluster 1) was labeled the 'lower attitudes, less skilful and average knowledge' group, and represented 55.4% of the total respondents. The second group (Cluster 2) was labeled 'positive attitudes, good knowledge but less skilful' and comprised 44.6% of this nursing sample. Cluster 2 had more older nurses; the majority were educated to the baccalaureate level or above, had more than 10 years of working experience, and held a more senior rank than Cluster 1. A clear profile of Hong Kong nurses may help healthcare professionals provide appropriate education or assistance to promote use of the clinical management system by nurses, an officially recognized profession. The findings were useful in determining nurse-users' specific needs and their preferences for modification of the clinical management system. Such findings should be used to formulate strategies to encourage nurses to resolve actual problems following computer training and to increase the depth and breadth of nurses' knowledge, attitudes, and skills toward such systems.

  19. Covariance analysis of differential drag-based satellite cluster flight

    Science.gov (United States)

    Ben-Yaacov, Ohad; Ivantsov, Anatoly; Gurfil, Pini

    2016-06-01

    One possibility for satellite cluster flight is to control relative distances using differential drag. The idea is to increase or decrease the drag acceleration on each satellite by changing its attitude, and use the resulting small differential acceleration as a controller. The most significant advantage of the differential drag concept is that it enables cluster flight without consuming fuel. However, any drag-based control algorithm must cope with significant aerodynamical and mechanical uncertainties. The goal of the current paper is to develop a method for examination of the differential drag-based cluster flight performance in the presence of noise and uncertainties. In particular, the differential drag control law is examined under measurement noise, drag uncertainties, and initial condition-related uncertainties. The method used for uncertainty quantification is the Linear Covariance Analysis, which enables us to propagate the augmented state and filter covariance without propagating the state itself. Validation using a Monte-Carlo simulation is provided. The results show that all uncertainties have relatively small effect on the inter-satellite distance, even in the long term, which validates the robustness of the used differential drag controller.
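
    The core idea of covariance propagation without state propagation can be illustrated with the standard discrete-time recursion P <- F P F^T + Q. The sketch below is not the paper's model: the two-state (relative position, relative velocity) transition matrix, the noise matrix and the initial covariance are all hypothetical values chosen only to show the mechanics.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(row) for row in zip(*m)]


def propagate_covariance(P, F, Q, steps):
    """Apply P <- F P F^T + Q repeatedly; the state itself is never needed."""
    for _ in range(steps):
        FPFt = matmul(matmul(F, P), transpose(F))
        P = [[FPFt[i][j] + Q[i][j] for j in range(len(Q))] for i in range(len(Q))]
    return P


# Hypothetical model: unit time step, position integrates velocity.
F = [[1.0, 1.0], [0.0, 1.0]]
Q = [[0.0, 0.0], [0.0, 0.0]]        # no process noise in this toy case
P0 = [[1.0, 0.0], [0.0, 0.01]]      # initial position and velocity variances
print(propagate_covariance(P0, F, Q, 2))
```

    In this toy case the position variance grows as 1 + 0.01 t^2, showing how an initial velocity uncertainty alone inflates the distance uncertainty over time; a Monte-Carlo run over sampled initial states would be the corresponding validation.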

  20. Clustered Numerical Data Analysis Using Markov Lie Monoid Based Networks

    Science.gov (United States)

    Johnson, Joseph

    2016-03-01

    We have designed and built an optimal numerical standardization algorithm that links numerical values with their associated units, error level, and defining metadata, thus supporting automated data exchange and new levels of artificial intelligence (AI). The software manages all dimensional and error analysis and computational tracing. Tables of entities versus properties of these generalized numbers (called "metanumbers") support a transformation of each table into a network among the entities and another network among their properties, where the network connection matrix is based upon a proximity metric between the two items. We previously proved that every network is isomorphic to the Lie algebra that generates continuous Markov transformations. We have also shown that the eigenvectors of these Markov matrices provide an agnostic clustering of the underlying patterns. We will present this methodology and show how our new work on conversion of scientific numerical data through this process can reveal underlying information clusters ordered by the eigenvalues. We will also show how the linking of clusters from different tables can be used to form a "supernet" of all numerical information supporting new initiatives in AI.

  1. Dynamical analysis of galaxy cluster merger Abell 2146

    CERN Document Server

    White, J A; King, L J; Lee, B E; Russell, H R; Baum, S A; Clowe, D I; Coleman, J E; Donahue, M; Edge, A C; Fabian, A C; Johnstone, R M; McNamara, B R; ODea, C P; Sanders, J S

    2015-01-01

    We present a dynamical analysis of the merging galaxy cluster system Abell 2146 using spectroscopy obtained with the Gemini Multi-Object Spectrograph on the Gemini North telescope. As revealed by the Chandra X-ray Observatory, the system is undergoing a major merger and has a gas structure indicative of a recent first core passage. The system presents two large shock fronts, making it unique amongst these rare systems. The hot gas structure indicates that the merger axis must be close to the plane of the sky and that the two merging clusters are relatively close in mass, from the observation of two shock fronts. Using 63 spectroscopically determined cluster members, we apply various statistical tests to establish the presence of two distinct massive structures. With the caveat that the system has recently undergone a major merger, the virial mass estimate is $M_{\rm vir} = 8.5^{+4.3}_{-4.7} \times 10^{14}\,M_\odot$ for the whole system, consistent with the mass determination in a previous study using the Sunyaev-Zeldovich signal....

  2. Case-control geographic clustering for residential histories accounting for risk factors and covariates

    Directory of Open Access Journals (Sweden)

    Goovaerts Pierre

    2006-08-01

    Background: Methods for analyzing space-time variation in risk in case-control studies typically ignore residential mobility. We develop an approach for analyzing case-control data for mobile individuals and apply it to study bladder cancer in 11 counties in southeastern Michigan. At this time data collection is incomplete and no inferences should be drawn; we analyze these data to demonstrate the novel methods. Global, local and focused clustering of residential histories for 219 cases and 437 controls is quantified using time-dependent nearest neighbor relationships. Business address histories for 268 industries that release known or suspected bladder cancer carcinogens are analyzed. A logistic model accounting for smoking, gender, age, race and education specifies the probability of being a case, and is incorporated into the cluster randomization procedures. Sensitivity of clustering to definition of the proximity metric is assessed for 1 to 75 k nearest neighbors. Results: Global clustering is partly explained by the covariates but remains statistically significant at 12 of the 14 levels of k considered. After accounting for the covariates, 26 local clusters are found in Lapeer, Ingham, Oakland and Jackson counties, with the clusters in Ingham and Oakland counties appearing in 1950 and persisting to the present. Statistically significant focused clusters are found about the business address histories of 22 industries located in Oakland (19 clusters), Ingham (2) and Jackson (1) counties. Clusters in central and southeastern Oakland County appear in the 1930s and persist to the present day. Conclusion: These methods provide a systematic approach for evaluating a series of increasingly realistic alternative hypotheses regarding the sources of excess risk. So long as selection of cases and controls is population-based and not geographically biased, these tools can provide insights into geographic risk factors that were not specifically

  3. Coupled Two-Way Clustering Analysis of Breast Cancer and Colon Cancer Gene Expression Data

    CERN Document Server

    Getz, Gad; Gal, Hilah; Kela, Itai; Domany, Eytan; Notterman, Dan A.

    2003-01-01

    We present and review Coupled Two Way Clustering, a method designed to mine gene expression data. The method identifies submatrices of the total expression matrix, whose clustering analysis reveals partitions of samples (and genes) into biologically relevant classes. We demonstrate, on data from colon and breast cancer, that we are able to identify partitions that elude standard clustering analysis.

  4. Cluster analysis of activity-time series in motor learning

    DEFF Research Database (Denmark)

    Balslev, Daniela; Nielsen, Finn Å; Futiger, Sally A

    2002-01-01

    Neuroimaging studies of learning focus on brain areas where the activity changes as a function of time. To circumvent the difficult problem of model selection, we used a data-driven analytic tool, cluster analysis, which extracts representative temporal and spatial patterns from the voxel-time se...... practice-related activity in a fronto-parieto-cerebellar network, in agreement with previous studies of motor learning. These voxels were separated from a group of voxels showing an unspecific time-effect and another group of voxels, whose activation was an artifact from smoothing...

  5. Clustering Analysis on E-commerce Transaction Based on K-means Clustering

    Directory of Open Access Journals (Sweden)

    Xuan HUANG

    2014-02-01

    Clustering algorithms based on density, increments, grids and the like usually show shortcomings when facing large volumes of high-dimensional transaction data: poor scalability, weak handling of high-dimensional data, sensitivity to the order of the data, poor parameter independence, and weak handling of noise. Experiments on sampled data for 300 mobile phones on Taobao lead to the following conclusions: compared with the Single-pass clustering algorithm, the K-means clustering algorithm achieves high intra-class similarity and high inter-class dissimilarity when analyzing e-commerce transactions. In addition, the K-means clustering algorithm is highly efficient and scales well when dealing with a large number of data items. However, the clustering results of this algorithm are affected by the number of clusters and the initial positions of the cluster centers, so the results easily settle into a local optimum. Therefore, how to determine the number of clusters and the initial positions of the cluster centers remains an important topic for future research.
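
    The sensitivity of K-means to its initial centers can be reproduced with a few lines of plain Lloyd's algorithm. The sketch below is a generic textbook implementation, not the paper's code; the four-point dataset and both sets of starting centers are invented to make the effect visible.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [
        min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
        for p in points
    ]


def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm from explicit initial centers.

    Returns (labels, centers, inertia), where inertia is the total
    within-cluster sum of squared distances.
    """
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(v) / len(members) for v in zip(*members))
                )
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(points, centers)
    inertia = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, inertia


# Corners of a wide rectangle: the natural split is left versus right.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
print(kmeans(points, [(0, 0.5), (4, 0.5)]))  # starts near the left/right split
print(kmeans(points, [(2, 0), (2, 1)]))      # starts on the bottom/top split
```

    Both runs converge immediately, but the second one is stuck in a bottom/top partition with sixteen times the inertia of the first, which is the local-optimum behaviour the abstract warns about. Common mitigations, such as several random restarts, address this at the cost of extra runs.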

  6. Cluster analysis for identifying sub-groups and selecting potential discriminatory variables in human encephalitis

    Directory of Open Access Journals (Sweden)

    Crowcroft Natasha S

    2010-12-01

    Background: Encephalitis is an acute clinical syndrome of the central nervous system (CNS), often associated with fatal outcome or permanent damage, including cognitive and behavioural impairment, affective disorders and epileptic seizures. Infection of the central nervous system is considered to be a major cause of encephalitis and more than 100 different pathogens have been recognized as causative agents. However, a large proportion of cases have unknown disease etiology. Methods: We perform hierarchical cluster analysis on a multicenter England encephalitis data set with the aim of identifying sub-groups in human encephalitis. We use the simple matching similarity measure, which is appropriate for binary data sets, and perform variable selection using cluster heatmaps. We also use heatmaps to visually assess underlying patterns in the data, identify the main clinical and laboratory features and identify potential risk factors associated with encephalitis. Results: Our results identified fever, personality and behavioural change, headache and lethargy as the main characteristics of encephalitis. Diagnostic variables such as brain scan and measurements from cerebrospinal fluids are also identified as main indicators of encephalitis. Our analysis revealed six major clusters in the England encephalitis data set. However, marked within-cluster heterogeneity is observed in some of the big clusters, indicating possible sub-groups. Overall, the results show that patients are clustered according to symptom and diagnostic variables rather than causal agents. Exposure variables such as recent infection, sick person contact and animal contact have been identified as potential risk factors. Conclusions: It is in general assumed and is a common practice to group encephalitis cases according to disease etiology. However, our results indicate that patients are clustered with respect to mainly symptom and diagnostic variables rather than causal agents
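
    The simple matching similarity mentioned in the Methods counts agreements on both presence and absence of a binary feature. A minimal sketch follows; the function names and the three patient records are hypothetical, and the feature order (fever, headache, lethargy, seizures) is chosen only for illustration.

```python
def simple_matching(a, b):
    """Share of positions where two binary vectors agree (1/1 or 0/0)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def dissimilarity_matrix(records):
    """Pairwise 1 - simple matching, as input for hierarchical clustering."""
    return [[1.0 - simple_matching(r, s) for s in records] for r in records]


# Hypothetical patients; columns: fever, headache, lethargy, seizures.
patients = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
]
for row in dissimilarity_matrix(patients):
    print(row)
```

    Unlike the Jaccard coefficient, simple matching treats a shared absence as evidence of similarity, which is a modelling choice worth checking against how missing or unrecorded symptoms are coded.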

  7. Fitness, fatness and clustering of cardiovascular risk factors in children from Denmark, Estonia and Portugal

    DEFF Research Database (Denmark)

    Andersen, Lars B; Sardinha, Luis B; Froberg, Karsten

    2008-01-01

    BACKGROUND: Levels of overweight have increased and fitness has decreased in children. Potentially, these changes may be a threat to future health. Numerous studies have measured changes in body mass index (BMI), but few have assessed the independent effects of low fitness, overweight and physical...... inactivity on cardiovascular (CVD) risk factors. METHODS: A cross-sectional multi-center study including 1 769 children from Denmark, Estonia and Portugal. The main outcome was clustering of CVD risk factors. Independent variables were waist circumference, skinfolds, physical activity and cardio...... significant. Fitness showed the same strength of association with the clustered risk score including systolic blood pressure, triglyceride, HOMA score, and cholesterol:HDL with odds ratio for the upper quartile of 4.97 (95% CI: 3.20-7.73). Physical activity was associated with clustered risk even after...

  8. A comparison of hierarchical cluster analysis and league table rankings as methods for analysis and presentation of district health system performance data in Uganda.

    Science.gov (United States)

    Tashobya, Christine K; Dubourg, Dominique; Ssengooba, Freddie; Speybroeck, Niko; Macq, Jean; Criel, Bart

    2016-03-01

    In 2003, the Uganda Ministry of Health introduced the district league table for district health system performance assessment. The league table presents district performance against a number of input, process and output indicators and a composite index to rank districts. This study explores the use of hierarchical cluster analysis for analysing and presenting district health systems performance data and compares this approach with the use of the league table in Uganda. Ministry of Health and district plans and reports, and published documents were used to provide information on the development and utilization of the Uganda district league table. Quantitative data were accessed from the Ministry of Health databases. Statistical analysis was performed using SPSS version 20, with hierarchical cluster analysis based on Ward's method. The hierarchical cluster analysis was conducted on the basis of seven clusters determined for each year from 2003 to 2010, ranging from a cluster of good through moderate-to-poor performers. The characteristics and membership of clusters varied from year to year and were determined by the identity and magnitude of performance of the individual variables. Criticisms of the league table include perceived unfairness, as it did not take into consideration district peculiarities, and being oversummarized and not adequately informative. Clustering organizes the many data points into clusters of similar entities according to an agreed set of indicators and can provide the beginning point for identifying factors behind the observed performance of districts. Although league table rankings emphasize summation and external control, clustering has the potential to encourage a formative, learning approach. More research is required to shed more light on factors behind observed performance of the different clusters. Other countries, especially low-income countries that share many similarities with Uganda, can learn from these experiences. © The Author 2015

  9. Transforming Rubrics Using Factor Analysis

    Science.gov (United States)

    Baryla, Ed; Shelley, Gary; Trainor, William

    2012-01-01

    Student learning and program effectiveness are often assessed using rubrics. While much time and effort may go into their creation, it is equally important to assess how effective and efficient the rubrics actually are in terms of measuring competencies over a number of criteria. This study demonstrates the use of common factor analysis to identify…

  10. The cosmological analysis of X-ray cluster surveys; III. Bypassing cluster mass measurements

    CERN Document Server

    Pierre, M; Faccioli, L; Clerc, N; Gastaud, R; Koulouridis, E; Pacaud, F

    2016-01-01

    Despite strong theoretical arguments, the use of clusters as cosmological probes is, in practice, frequently questioned because of the many uncertainties impinging on cluster mass estimates. Our aim is to develop a fully self-consistent cosmological approach of X-ray cluster surveys, exclusively based on observable quantities, rather than masses. This procedure is justified given the possibility to directly derive the cluster properties via ab initio modelling, either analytically or by using hydrodynamical simulations. In this third paper, we evaluate the method on cluster toy-catalogues. We model the population of detected clusters in the count-rate -- hardness-ratio -- angular size -- redshift space and compare the corresponding 4-dimensional diagram with theoretical predictions. The best cosmology+physics parameter configuration is determined using a simple minimisation procedure; errors on the parameters are derived by scanning the likelihood hyper-surfaces with a wide range of starting values. The metho...

  11. Cluster analysis of autoantibodies in 852 patients with systemic lupus erythematosus from a single center.

    Science.gov (United States)

    Artim-Esen, Bahar; Çene, Erhan; Şahinkaya, Yasemin; Ertan, Semra; Pehlivan, Özlem; Kamali, Sevil; Gül, Ahmet; Öcal, Lale; Aral, Orhan; Inanç, Murat

    2014-07-01

    Associations between autoantibodies and clinical features have been described in systemic lupus erythematosus (SLE). Herein, we aimed to define autoantibody clusters and their clinical correlations in a large cohort of patients with SLE. We analyzed 852 patients with SLE who attended our clinic. Seven autoantibodies were selected for cluster analysis: anti-DNA, anti-Sm, anti-RNP, anticardiolipin (aCL) immunoglobulin (Ig)G or IgM, lupus anticoagulant (LAC), anti-Ro, and anti-La. Two-step clustering and Kaplan-Meier survival analyses were used. Five clusters were identified. A cluster consisted of patients with only anti-dsDNA antibodies, a cluster of anti-Sm and anti-RNP, a cluster of aCL IgG/M and LAC, and a cluster of anti-Ro and anti-La antibodies. Analysis revealed 1 more cluster that consisted of patients who did not belong to any of the clusters formed by antibodies chosen for cluster analysis. Sm/RNP cluster had significantly higher incidence of pulmonary hypertension and Raynaud phenomenon. DsDNA cluster had the highest incidence of renal involvement. In the aCL/LAC cluster, there were significantly more patients with neuropsychiatric involvement, antiphospholipid syndrome, autoimmune hemolytic anemia, and thrombocytopenia. According to the Systemic Lupus International Collaborating Clinics damage index, the highest frequency of damage was in the aCL/LAC cluster. Comparison of 10 and 20 years survival showed reduced survival in the aCL/LAC cluster. This study supports the existence of autoantibody clusters with distinct clinical features in SLE and shows that forming clinical subsets according to autoantibody clusters may be useful in predicting the outcome of the disease. Autoantibody clusters in SLE may exhibit differences according to the clinical setting or population.

  12. A comparison of the dietary patterns derived by principal component analysis and cluster analysis in older Australians.

    Science.gov (United States)

    Thorpe, Maree G; Milte, Catherine M; Crawford, David; McNaughton, Sarah A

    2016-02-29

    Despite increased use of dietary pattern methods in nutritional epidemiology, there have been few direct comparisons of methods. Older adults are a particularly understudied population in the dietary pattern literature. This study aimed to compare dietary patterns derived by principal component analysis (PCA) and cluster analysis (CA) in older adults and to examine their associations with socio-demographic and health behaviours. Men (n = 1888) and women (n = 2071) aged 55-65 years completed a 111-item food frequency questionnaire in 2010. Food items were collapsed into 52 food groups and dietary patterns were determined by PCA and CA. Associations between dietary patterns and participant characteristics were examined using Chi-square analysis. The standardised PCA-derived dietary patterns were compared across the clusters using one-way ANOVA. PCA identified four dietary patterns in men and two dietary patterns in women. CA identified three dietary patterns in both men and women. Men in cluster 1 (fruit, vegetables, wholegrains, fish and poultry) scored higher on PCA factor 1 (vegetable dishes, fruit, fish and poultry) and factor 4 (vegetables) compared to factor 2 (spreads, biscuits, cakes and confectionery) and factor 3 (red meat, processed meat, white bread and hot chips) (mean, 95% CI; 0.92, 0.82-1.02 vs. 0.74, 0.63-0.84 vs. -0.43, -0.50- -0.35 vs. 0.60, 0.46-0.74, respectively). Women in cluster 1 (fruit, vegetables and fish) scored highest on PCA factor 1 (fruit, vegetables and fish) compared to factor 2 (processed meat, hot chips, cakes and confectionery) (1.05, 0.97-1.14 vs. -0.14, -0.21- -0.07, respectively). Cluster 3 (small eaters) in both men and women had negative factor scores for all the identified PCA dietary patterns. Those with dietary patterns characterised by higher consumption of red and processed meat and refined grains were more likely to be Australian-born, have a lower level of education, a higher BMI, smoke and did not meet physical

  13. Tracking of clustered cardiovascular disease risk factors from childhood to adolescence

    DEFF Research Database (Denmark)

    Bugge, Anna; El-Naaman, Bianca; McMurray, Robert G

    2013-01-01

    samples were analyzed for CVD risk factors. A clustered risk-score (z-score) was constructed by adding sex-specific z-scores for blood pressure, homeostatic model assessment (HOMA-IR), triglyceride, skinfolds and negative values of high-density lipoprotein cholesterol (HDLc) and VO(2peak...

  14. Inflammatory Markers and Clustered Cardiovascular Disease Risk Factors in Danish Adolescents

    DEFF Research Database (Denmark)

    Bugge, Anna; El-Naaman, Bianca; McMurray, Robert G

    2012-01-01

    Aims: To evaluate the associations between inflammatory markers and clustering of cardiovascular disease (CVD) risk factors, and to examine how inflammatory markers and CVD risk are related to fatness and cardiorespiratory fitness (VO(2peak)) in adolescents. Methods: Body mass and height, skinfolds...

  15. Clustering of obesity and dental health with lifestyle factors among Turkish and Finnish pre-adolescents

    DEFF Research Database (Denmark)

    Cinar, Basak; Murtomaa, Heikki

    2008-01-01

    This study aims to assess any clustering between obesity, number of decayed, missing, and filled teeth (DMFT), television (TV) viewing, and lifestyle factors among pre-adolescents living in 2 countries with different developmental status and oral health care systems - Turkey and Finland....

  16. Genetic factors influence the clustering of depression among individuals with lower socioeconomic status

    NARCIS (Netherlands)

    S. López León (Sandra); W.C. Choy (Wing Chi); Y.S. Aulchenko (Yurii); S. Claes (Stephan); B.A. Oostra (Ben); J.P. Mackenbach (Johan); C.M. van Duijn (Cock); A.C.J.W. Janssens (Cécile)

    2009-01-01

    Objective: To investigate the extent to which shared genetic factors can explain the clustering of depression among individuals with lower socioeconomic status, and to examine if neuroticism or intelligence are involved in these pathways. Methods: In total 2,383 participants (1,028 men a

  17. Clustering of cardiovascular risk factors and hypertension control status among hypertensive patients in the outpatient setting

    Institute of Scientific and Technical Information of China (English)

    刘军

    2014-01-01

    Objective: To investigate the status of the clustering of cardiovascular risk factors and hypertension control among hypertensive patients in the outpatient setting in China. Methods: This multi-center cross-sectional study was carried out from June to December 2009. Study patients were consecutively recruited from 46

  18. Reliability analysis of cluster-based ad-hoc networks

    Energy Technology Data Exchange (ETDEWEB)

    Cook, Jason L. [Quality Engineering and System Assurance, Armament Research Development Engineering Center, Picatinny Arsenal, NJ (United States); Ramirez-Marquez, Jose Emmanuel [School of Systems and Enterprises, Stevens Institute of Technology, Castle Point on Hudson, Hoboken, NJ 07030 (United States)], E-mail: Jose.Ramirez-Marquez@stevens.edu

    2008-10-15

    The mobile ad-hoc wireless network (MAWN) is a new and emerging network scheme that is being employed in a variety of applications. The MAWN varies from traditional networks because it is a self-forming and dynamic network. The MAWN is free of infrastructure and, as such, only the mobile nodes comprise the network. Pairs of nodes communicate either directly or through other nodes. To do so, each node acts, in turn, as a source, destination, and relay of messages. The virtue of a MAWN is the flexibility this provides; however, the challenge for reliability analyses is also brought about by this unique feature. The variability and volatility of the MAWN configuration makes typical reliability methods (e.g. reliability block diagram) inappropriate because no single structure or configuration represents all manifestations of a MAWN. For this reason, new methods are being developed to analyze the reliability of this new networking technology. New published methods adapt to this feature by treating the configuration probabilistically or by inclusion of embedded mobility models. This paper joins both methods together and expands upon these works by modifying the problem formulation to address the reliability analysis of a cluster-based MAWN. The cluster-based MAWN is deployed in applications with constraints on networking resources such as bandwidth and energy. This paper presents the problem's formulation, a discussion of applicable reliability metrics for the MAWN, and illustration of a Monte Carlo simulation method through the analysis of several example networks.
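    The Monte Carlo idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' formulation: it assumes each link is up independently with a given probability and estimates the chance that a source node can still reach a destination (two-terminal reliability). The function names, the network layout, and the link probabilities are all hypothetical.

```python
import random


def connected(n_nodes, edges, source, target):
    """Return True if target is reachable from source over undirected edges."""
    adjacency = {i: [] for i in range(n_nodes)}
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    seen = {source}
    stack = [source]
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def two_terminal_reliability(n_nodes, link_prob, source, target,
                             trials=20000, seed=0):
    """Estimate P(source reaches target) by sampling random configurations.

    link_prob maps a (u, v) pair to the probability that the link is up
    (assumed independent across links, which is a simplification).
    """
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        # Draw one random network configuration.
        up = [edge for edge, p in link_prob.items() if rng.random() < p]
        if connected(n_nodes, up, source, target):
            successes += 1
    return successes / trials


# Hypothetical example: node 1 acts as a cluster head relaying 0 -> 2.
links = {(0, 1): 0.9, (1, 2): 0.9}
print(two_terminal_reliability(3, links, source=0, target=2))
```

    For two links in series the exact answer is the product of the link probabilities, which gives a quick sanity check on the estimate. A mobility model would replace the fixed link probabilities with ones derived from sampled node positions.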

  19. Time series clustering analysis of health-promoting behavior

    Science.gov (United States)

    Yang, Chi-Ta; Hung, Yu-Shiang; Deng, Guang-Feng

    2013-10-01

    Health promotion must be emphasized to achieve the World Health Organization goal of health for all. Since the global population is aging rapidly, the ComCare elder health-promoting service was developed by the Taiwan Institute for Information Industry in 2011. Based on the Pender health promotion model, the ComCare service offers five categories of health-promoting functions to address the everyday needs of seniors: nutrition management, social support, exercise management, health responsibility, and stress management. To assess the overall ComCare service and to improve understanding of the health-promoting behavior of elders, this study analyzed health-promoting behavioral data automatically collected by the ComCare monitoring system. In the 30638 session records collected for 249 elders from January 2012 to March 2013, behavior patterns were identified by a fuzzy c-means time series clustering algorithm combined with autocorrelation-based representation schemes. The analysis showed that time series data for elder health-promoting behavior can be classified into four different clusters. Each type reveals different health-promoting needs, frequencies, function numbers and behaviors. The results can assist policymakers, health-care providers, and experts in medicine, public health, nursing and psychology, and have been provided to the Taiwan National Health Insurance Administration to assess elder health-promoting behavior.
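    The combination of an autocorrelation-based representation with fuzzy c-means can be sketched as follows. This is an illustrative sketch only, not the ComCare pipeline: the choice of lags, the fuzzifier m = 2, the random initialisation, and the toy series are assumptions, and the helper names are hypothetical.

```python
import random


def autocorr_features(series, max_lag):
    """Represent a series by its sample autocorrelation at lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        return [0.0] * max_lag
    return [
        sum((series[t] - mean) * (series[t + lag] - mean)
            for t in range(n - lag)) / var
        for lag in range(1, max_lag + 1)
    ]


def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Plain fuzzy c-means; returns (centers, memberships)."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([w / total for w in row])
    centers = []
    for _ in range(iters):
        # Centers are means weighted by membership**m.
        centers = []
        for j in range(c):
            weights = [u[i][j] ** m for i in range(len(points))]
            total = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])
        # Memberships follow the standard inverse-distance update.
        for i, p in enumerate(points):
            dists = [
                max(sum((a - b) ** 2 for a, b in zip(p, ctr)) ** 0.5, 1e-12)
                for ctr in centers
            ]
            for j in range(c):
                u[i][j] = 1.0 / sum(
                    (dists[j] / dk) ** (2.0 / (m - 1.0)) for dk in dists
                )
    return centers, u


# Toy usage records: two oscillating series and two trending series.
usage = [
    [1, -1, 1, -1, 1, -1, 1, -1],
    [2, -2, 2, -2, 2, -2, 2, -2],
    [0, 1, 2, 3, 4, 5, 6, 7],
    [0, 2, 4, 6, 8, 10, 12, 14],
]
features = [autocorr_features(s, max_lag=2) for s in usage]
centers, memberships = fuzzy_c_means(features, c=2)
```

    Because the autocorrelation is scale-invariant, the two oscillating series map to the same feature vector, as do the two trending ones, so the memberships separate series by temporal pattern rather than by magnitude.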

  20. Phenotypes Determined by Cluster Analysis in Moderate to Severe Bronchial Asthma.

    Science.gov (United States)

    Youroukova, Vania M; Dimitrova, Denitsa G; Valerieva, Anna D; Lesichkova, Spaska S; Velikova, Tsvetelina V; Ivanova-Todorova, Ekaterina I; Tumangelova-Yuzeir, Kalina D

    2017-06-01

    Bronchial asthma is a heterogeneous disease that includes various subtypes. They may share similar clinical characteristics, but probably have different pathological mechanisms. To identify phenotypes using cluster analysis in moderate to severe bronchial asthma and to compare differences in clinical, physiological, immunological and inflammatory data between the clusters. Forty adult patients with moderate to severe bronchial asthma out of exacerbation were included. All underwent clinical assessment, anthropometric measurements, skin prick testing, standard spirometry and measurement of the fraction of exhaled nitric oxide. Blood eosinophil count, serum total IgE and periostin levels were determined. A two-step cluster approach, a hierarchical clustering method and k-means analysis were used for identification of the clusters. We identified four clusters: Cluster 1 (n=14) - late-onset, non-atopic asthma with impaired lung function; Cluster 2 (n=13) - late-onset, atopic asthma; Cluster 3 (n=6) - late-onset, aspirin-sensitive, eosinophilic asthma; and Cluster 4 (n=7) - early-onset, atopic asthma. Our study is the first in Bulgaria in which cluster analysis is applied to asthmatic patients. The variables with the greatest power for differentiation in our study were: age of asthma onset, duration of disease, atopy, smoking, blood eosinophils, nonsteroidal anti-inflammatory drug hypersensitivity, baseline FEV1/FVC and symptom severity. Our results support the concept of heterogeneity of bronchial asthma and demonstrate that cluster analysis can be a useful tool for phenotyping of the disease and for a personalized approach to the treatment of patients.

  1. Cohort study on clustering of lifestyle risk factors and understanding its association with stress on health and wellbeing among school teachers in Malaysia (CLUSTer) – a study protocol

    Science.gov (United States)

    2014-01-01

    Background The study on Clustering of Lifestyle risk factors and Understanding its association with Stress on health and wellbeing among school Teachers in Malaysia (CLUSTer) is a prospective cohort study which aims to extensively study teachers in Malaysia with respect to clustering of lifestyle risk factors and stress, and subsequently, to follow-up the population for important health outcomes. Method/design This study is being conducted in six states within Peninsular Malaysia. From each state, schools from each district are randomly selected and invited to participate in the study. Once the schools agree to participate, all teachers who fulfilled the inclusion criteria are invited to participate. Data collection includes a questionnaire survey and health assessment. Information collected in the questionnaire includes socio-demographic characteristics, participants’ medical history and family history of chronic diseases, teaching characteristics and burden, questions on smoking, alcohol consumption and physical activities (IPAQ); a food frequency questionnaire, the job content questionnaire (JCQ); depression, anxiety and stress scale (DASS21); health related quality of life (SF12-V2); Voice Handicap Index 10 on voice disorder, questions on chronic pain, sleep duration and obstetric history for female participants. Following blood drawn for predefined clinical tests, additional blood and urine specimens are collected and stored for future analysis. Active follow up of exposure and health outcomes will be carried out every two years via telephone or face to face contact. Data collection started in March 2013 and as of the end of March 2014 has been completed for four states: Kuala Lumpur, Selangor, Melaka and Penang. Approximately 6580 participants have been recruited. The first round of data collection and blood sampling is expected to be completed by the end of 2014 with an expected 10,000 participants recruited. Discussion Our study will provide a good basis

  2. IPC two-color analysis of x ray galaxy clusters

    Science.gov (United States)

    White, Raymond E., III

    1990-01-01

    The mass distributions of several clusters of galaxies were determined by using X ray surface brightness data from the Einstein Observatory Imaging Proportional Counter (IPC). Determining cluster mass distributions is important for constraining the nature of the dark matter which dominates the mass of galaxies, galaxy clusters, and the Universe. Galaxy clusters are permeated with hot gas in hydrostatic equilibrium with the gravitational potentials of the clusters. Cluster mass distributions can be determined from x ray observations of cluster gas by using the equation of hydrostatic equilibrium and knowledge of the density and temperature structure of the gas. The x ray surface brightness at some distance from the cluster is the result of the volume x ray emissivity being integrated along the line of sight in the cluster.
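    The two relations described in words above can be written out explicitly. The following is the standard textbook form, assuming spherical symmetry and an ideal gas; the symbols (gas density $\rho_{\mathrm{gas}}$, temperature $T$, mean molecular weight $\mu$, proton mass $m_p$, emissivity $\varepsilon$, projected radius $b$) are notation chosen here, not taken from the abstract.

```latex
% Hydrostatic equilibrium with an ideal-gas equation of state
\frac{dP}{dr} = -\frac{G\,M(<r)\,\rho_{\mathrm{gas}}(r)}{r^{2}},
\qquad
P = \frac{\rho_{\mathrm{gas}}\,k\,T}{\mu m_{p}}
\;\Longrightarrow\;
M(<r) = -\frac{k\,T(r)\,r}{G\,\mu m_{p}}
\left(\frac{d\ln\rho_{\mathrm{gas}}}{d\ln r} + \frac{d\ln T}{d\ln r}\right)

% Surface brightness as the line-of-sight integral of the emissivity
S(b) = 2\int_{b}^{\infty}\frac{\varepsilon(r)\,r\,dr}{\sqrt{r^{2}-b^{2}}}
```

    The enclosed mass therefore depends on the local temperature and on the logarithmic slopes of the density and temperature profiles, which is why both must be known or modelled.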

  3. Microglia Morphological Categorization in a Rat Model of Neuroinflammation by Hierarchical Cluster and Principal Components Analysis

    Science.gov (United States)

    Fernández-Arjona, María del Mar; Grondona, Jesús M.; Granados-Durán, Pablo; Fernández-Llebrez, Pedro; López-Ávalos, María D.

    2017-01-01

    morphological change upon neuraminidase induced inflammation. Hierarchical cluster and principal components analysis allow morphological classification of microglia. Brain location of microglia is a relevant factor. PMID:28848398

  4. [Study of the clinical phenotype of symptomatic chronic airways disease by hierarchical cluster analysis and two-step cluster analyses].

    Science.gov (United States)

    Ning, P; Guo, Y F; Sun, T Y; Zhang, H S; Chai, D; Li, X M

    2016-09-01

    To study the distinct clinical phenotypes of chronic airway diseases by hierarchical cluster analysis and two-step cluster analysis. A population sample of adult patients in Donghuamen community, Dongcheng district and Qinghe community, Haidian district, Beijing from April 2012 to January 2015, who had wheeze within the last 12 months, underwent detailed investigation, including a clinical questionnaire, pulmonary function tests, total serum IgE levels, blood eosinophil level and a peak flow diary. Nine variables were chosen as evaluating parameters, including pre-salbutamol forced expiratory volume in one second (FEV1)/forced vital capacity (FVC) ratio, pre-salbutamol FEV1, percentage of post-salbutamol change in FEV1, residual capacity, diffusing capacity of the lung for carbon monoxide/alveolar volume adjusted for haemoglobin level, peak expiratory flow (PEF) variability, serum IgE level, cumulative tobacco cigarette consumption (pack-years) and respiratory symptoms (cough and expectoration). The subjects' clinical phenotypes were identified by hierarchical cluster analysis and two-step cluster analysis. (1) Four clusters were identified by hierarchical cluster analysis. Cluster 1 was chronic bronchitis in smokers with normal pulmonary function. Cluster 2 was chronic bronchitis or mild chronic obstructive pulmonary disease (COPD) patients with mild airflow limitation. Cluster 3 included COPD patients with heavy smoking, poor quality of life and severe airflow limitation. Cluster 4 recognized atopic patients with mild airflow limitation, elevated serum IgE and clinical features of asthma. Significant differences were revealed regarding pre-salbutamol FEV1/FVC%, pre-salbutamol FEV1% pred, post-salbutamol change in FEV1%, maximal mid-expiratory flow (MMEF)% pred, carbon monoxide diffusing capacity per liter of alveolar volume (DLCO/VA)% pred, residual volume (RV)% pred, total serum IgE level, smoking history (pack-years), St. George's respiratory questionnaire

  5. Selections of data preprocessing methods and similarity metrics for gene cluster analysis

    Institute of Scientific and Technical Information of China (English)

    YANG Chunmei; WAN Baikun; GAO Xiaofeng

    2006-01-01

    Clustering is one of the major exploratory techniques for gene expression data analysis. Only with suitable similarity metrics and when datasets are properly preprocessed, can results of high quality be obtained in cluster analysis. In this study, gene expression datasets with external evaluation criteria were preprocessed as normalization by line, normalization by column or logarithm transformation by base-2, and were subsequently clustered by hierarchical clustering, k-means clustering and self-organizing maps (SOMs) with Pearson correlation coefficient or Euclidean distance as similarity metric. Finally, the quality of clusters was evaluated by adjusted Rand index. The results illustrate that k-means clustering and SOMs have distinct advantages over hierarchical clustering in gene clustering, and SOMs are a bit better than k-means when randomly initialized. It also shows that hierarchical clustering prefers Pearson correlation coefficient as similarity metric and dataset normalized by line. Meanwhile, k-means clustering and SOMs can produce better clusters with Euclidean distance and logarithm transformed datasets. These results will afford valuable reference to the implementation of gene expression cluster analysis.
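    The preprocessing options, the Pearson similarity, and the adjusted Rand index named in this abstract can be sketched in a few lines. This is a minimal sketch under assumptions: "normalization by line" is read here as per-row (per-gene) standardisation, the helper names are hypothetical, and the clustering step itself is left out.

```python
from collections import Counter
from math import comb, log2, sqrt


def log2_transform(row):
    """Base-2 logarithm transformation (expression values must be positive)."""
    return [log2(x) for x in row]


def normalize_row(row):
    """Standardise one gene's profile to zero mean and unit variance."""
    mean = sum(row) / len(row)
    sd = sqrt(sum((x - mean) ** 2 for x in row) / len(row))
    return [(x - mean) / sd for x in row] if sd > 0 else [0.0] * len(row)


def pearson(a, b):
    """Pearson correlation coefficient between two expression profiles."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def adjusted_rand_index(labels_true, labels_pred):
    """Agreement between two partitions, corrected for chance."""
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0
    sum_pairs = sum(comb(c, 2)
                    for c in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / total_pairs
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are trivial
    return (sum_pairs - expected) / (max_index - expected)


# Relabelling the clusters does not change the score.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

    The index is 1 for identical partitions regardless of label names and is close to 0 for random assignments, which is what makes it usable as an external evaluation criterion across different clustering methods.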

  6. Principal Component Analysis and Cluster Analysis in Profile of Electrical System

    Science.gov (United States)

    Iswan; Garniwa, I.

    2017-03-01

    This paper presents an approach for profiling an electrical system that combines two algorithms, principal component analysis (PCA) and cluster analysis, based on relevant data on gross regional domestic product and electric power and energy use. The profile is set up to show the condition of the region's electrical system and will be used to inform policy for the spatial development of the electrical system in the future. The paper considers 24 regions in South Sulawesi province as profile center points and uses PCA to assess the regional profile for development. Cluster analysis is used to group these regions into a few clusters according to the new variables produced by PCA. The general planning of the electrical system of South Sulawesi province can provide support for policy making on electrical system development. Future research could add several variables to the existing ones.

  7. Patterns in longitudinal growth of refraction in Southern Chinese children: cluster and principal component analysis.

    Science.gov (United States)

    Chen, Yanxian; Chang, Billy Heung Wing; Ding, Xiaohu; He, Mingguang

    2016-11-22

    In the present study we attempt to use hypothesis-independent analysis to investigate the patterns in refraction growth in Chinese children, and to explore the possible risk factors affecting the different components of progression, as defined by Principal Component Analysis (PCA). A total of 637 first-born twins in the Guangzhou Twin Eye Study with 6-year annual visits (baseline age 7-15 years) were available for the analysis. Clusters 1 to 3 were identified by partitioning clustering, representing stable, slow and fast progressing groups of refraction respectively. Baseline age and refraction, paternal refraction, maternal refraction and proportion of two myopic parents showed significant differences across the three groups. Three major components of progression were extracted using PCA: "Average refraction", "Acceleration" and the combination of "Myopia stabilization" and "Late onset of refraction progress". In regression models, younger children with more severe myopia were associated with larger "Acceleration". The risk factors of "Acceleration" included change of height and weight, near work, and parental myopia, while female gender, change of height and weight were associated with "Stabilization", and increased outdoor time was related to "Late onset of refraction progress". We therefore concluded that genetic and environmental risk factors have different impacts on patterns of refraction progression.

  8. Clustering and Integrating of Heterogeneous Microbiome Data by Joint Symmetric Nonnegative Matrix Factorization with Laplacian Regularization.

    Science.gov (United States)

    Ma, Yuanyuan; Hu, Xiaohua; He, Tingting; Jiang, Xingpeng

    2017-09-26

    Many real-world datasets comprise different representations or views that provide complementary information to each other. To integrate information from multiple views, data integration approaches such as nonnegative matrix factorization (NMF) have been developed to combine multiple heterogeneous data simultaneously to obtain a comprehensive representation. In this paper, we propose a novel variant of symmetric nonnegative matrix factorization (SNMF), called Laplacian regularization based joint symmetric nonnegative matrix factorization (LJ-SNMF), for clustering multi-view data. We conduct extensive experiments on several realistic datasets including Human Microbiome Project data. The experimental results show that the proposed method outperforms other variants of NMF, which suggests the potential application of LJ-SNMF in clustering multi-view datasets. Additionally, we also demonstrate the capability of LJ-SNMF in community finding.

  9. Maximum-entropy clustering algorithm and its global convergence analysis

    Institute of Scientific and Technical Information of China (English)

    ZHANG; Zhihua

    2001-01-01

    [1] Bezdek, J. C., Pattern Recognition with Fuzzy Objective Function Algorithm. New York: Plenum, 1981. [2] Krishnapuram, R., Keller, J., A possibilistic approach to clustering, IEEE Trans. on Fuzzy Systems, 1993, 1(2): 98. [3] Yair, E., Zeger, K., Gersho, A., Competitive learning and soft competition for vector quantizer design, IEEE Trans. on Signal Processing, 1992, 40(2): 294. [4] Pal, N. R., Bezdek, J. C., Tsao, E. C. K., Generalized clustering networks and Kohonen's self-organizing scheme, IEEE Trans. on Neural Networks, 1993, 4(4): 549. [5] Karayiannis, N. B., Bezdek, J. C., Pal, N. R. et al., Repair to GLVQ: a new family of competitive learning schemes, IEEE Trans. on Neural Networks, 1996, 7(5): 1062. [6] Karayiannis, N. B., Pai, P. I., Fuzzy algorithms for learning vector quantization, IEEE Trans. on Neural Networks, 1996, 7(5): 1196. [7] Karayiannis, N. B., A methodology for constructing fuzzy algorithms for learning vector quantization, IEEE Trans. on Neural Networks, 1997, 8(3): 505. [8] Karayiannis, N. B., Bezdek, J. C., An integrated approach to fuzzy learning vector quantization and fuzzy C-Means clustering, IEEE Trans. on Fuzzy Systems, 1997, 5(4): 622. [9] Li Xing-si, An efficient approach to nonlinear minimax problems, Chinese Science Bulletin, 1992, 37(10): 802. [10] Li Xing-si, An efficient approach to a class of non-smooth optimization problems, Science in China, Series A, 1994, 37(3): 323. [11] Zangwill, W., Non-linear Programming: A Unified Approach, Englewood Cliffs: Prentice-Hall, 1969. [12] Fletcher, R., Practical Methods of Optimization, 2nd ed., New York: John Wiley & Sons, 1987. [13] Zhang Zhihua, Zheng Nanning, Wang Tianshu, Behavioral analysis and improving of generalized LVQ neural network, Acta Automatica Sinica, 1999, 25(5): 582. [14] Kirkpatrick, S., Gelatt, C. D., Vecchi, M. P., Optimization by simulated annealing, Science, 1983, 220(3): 671. [15] Ross, K., Deterministic annealing for

  10. The Quintuplet Cluster II. Analysis of the WN stars

    CERN Document Server

    Liermann, A; Oskinova, L M; Todt, H; Butler, K; 10.1051/0004-6361/200912612

    2010-01-01

    Based on $K$-band integral-field spectroscopy, we analyze four Wolf-Rayet stars of the nitrogen sequence (WN) found in the inner part of the Quintuplet cluster. All WN stars (WR102d, WR102i, WR102hb, and WR102ea) are of spectral subtype WN9h. One further star, LHO110, is included in the analysis which has been classified as Of/WN? previously but turns out to be most likely a WN9h star as well. The Potsdam Wolf-Rayet (PoWR) models for expanding atmospheres are used to derive the fundamental stellar and wind parameters. The stars turn out to be very luminous, $\log{(L/L_\odot)} > 6.0$, with relatively low stellar temperatures, $T_* \approx$ 25--35\,kK. Their stellar winds contain a significant fraction of hydrogen, up to $X_\mathrm{H} \sim 0.45$ (by mass). We discuss the position of the Galactic center WN stars in the Hertzsprung-Russell diagram and find that they form a distinct group. In this respect, the Quintuplet WN stars are similar to late-type WN stars found in the Arches cluster and elsewhere in the Ga...

  11. Factor Structure of the PTSD Checklist for DSM-5: Relationships Among Symptom Clusters, Anger, and Impulsivity.

    Science.gov (United States)

    Armour, Cherie; Contractor, Ateka; Shea, Tracie; Elhai, Jon D; Pietrzak, Robert H

    2016-02-01

    Scarce data are available regarding the dimensional structure of Diagnostic and Statistical Manual of Mental Disorders, 5th Edition (DSM-5) posttraumatic stress disorder (PTSD) symptoms and how factors relate to external constructs. We evaluated six competing models of DSM-5 PTSD symptoms, including Anhedonia, Externalizing Behaviors, and Hybrid models, using confirmatory factor analyses in a sample of 412 trauma-exposed college students. We then examined whether PTSD symptom clusters were differentially related to measures of anger and impulsivity using Wald chi-square tests. The seven-factor Hybrid model was deemed optimal compared with the alternatives. All symptom clusters were associated with anger; the strongest association was between externalizing behaviors and anger (r = 0.54). All symptom clusters, except re-experiencing and avoidance, were associated with impulsivity, with the strongest association between externalizing behaviors and impulsivity (r = 0.49). A seven-factor Hybrid model provides superior fit to DSM-5 PTSD symptom data, with the externalizing behaviors factor being most strongly related to anger and impulsivity.

  12. Critical Factors in Transnational Oil Companies Localisation Decisions - Clusters and Portfolio Optimisation

    Energy Technology Data Exchange (ETDEWEB)

    Kind, Hans Jarle; Osmundsen, Petter; Tverteraas, Ragnar

    2001-10-01

    Enhanced understanding of the factors determining transnational companies' localisation decisions is important for regulators and other stakeholders concerned about maintaining current activity levels in a petroleum producing country. This article discusses localisation decisions in the context of theories of industrial clusters and real portfolio optimisation theory (materiality), which we argue are two fruitful lines of explanation for transnational companies' behaviour. The industrial cluster literature is concerned with the level of positive externalities associated with geographic clustering of related production activities. The concept of materiality, implying that investment projects in an oil province must be of a certain minimum size in order to be interesting for oil companies, is evaluated empirically and compared to predictions of mainstream economic theory. (author)

  13. Time-clustering analysis of the 1978–2008 sub-crustal seismicity of Vrancea region

    Directory of Open Access Journals (Sweden)

    L. Telesca

    2011-08-01

    The time-clustering behaviour of the sub-crustal seismicity (depth larger than 60 km) of the Vrancea region has been analysed. The time span of the analyzed catalogue is from 1978 to 2008, and only the events with a magnitude of Mw ≥ 3 have been considered. The analysis, carried out on the full and aftershock-depleted catalogues, was performed using the Allan Factor (AF), which allows the identification and quantification of correlated temporal structures in temporal point processes. Our results, whose significance was analysed by means of two methods of generation of surrogate series, reveal the presence of time-clustering behaviour in the temporal distribution of seismicity data of the full catalogue. The analysis performed on the aftershock-depleted catalogue indicates that the time-clustering is associated mainly with the aftershocks generated by the two largest events, which occurred on 30 August 1986 (Mw = 7.1) and 30 May 1990 (Mw = 6.9).
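    The Allan Factor used above has a compact definition: for counting windows of length T, it is the mean squared difference between event counts in adjacent windows divided by twice the mean count. A minimal sketch follows, assuming event times are given as plain numbers in a common unit; the function name and the synthetic event times are hypothetical and are not taken from the Vrancea catalogue.

```python
def allan_factor(event_times, window, t_start=None, t_end=None):
    """Allan Factor AF(T) = <(N[k+1] - N[k])**2> / (2 <N[k]>).

    N[k] is the number of events in the k-th contiguous window of
    length `window` between t_start and t_end.
    """
    if t_start is None:
        t_start = min(event_times)
    if t_end is None:
        t_end = max(event_times)
    n_windows = int((t_end - t_start) // window)
    if n_windows < 2:
        raise ValueError("need at least two counting windows")
    counts = [0] * n_windows
    for t in event_times:
        k = int((t - t_start) // window)
        if 0 <= k < n_windows:
            counts[k] += 1
    mean_count = sum(counts) / n_windows
    if mean_count == 0:
        raise ValueError("no events inside the counting windows")
    sq_diffs = [(counts[k + 1] - counts[k]) ** 2 for k in range(n_windows - 1)]
    return (sum(sq_diffs) / len(sq_diffs)) / (2 * mean_count)


# Synthetic examples: evenly spaced events versus bursts of events.
regular = [0.5 + i for i in range(100)]
bursts = [20 * j + 0.1 * i for j in range(5) for i in range(20)]
print(allan_factor(regular, window=10, t_start=0, t_end=100))
print(allan_factor(bursts, window=10, t_start=0, t_end=100))
```

    For a homogeneous Poisson process the Allan Factor stays near 1 at all window lengths, so values that grow with the window length are the signature of time-clustering; in practice the statistic is evaluated over a range of window lengths rather than a single one.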

  14. Classification of microvascular patterns via cluster analysis reveals their prognostic significance in glioblastoma.

    Science.gov (United States)

    Chen, Long; Lin, Zhi-Xiong; Lin, Guo-Shi; Zhou, Chang-Fu; Chen, Yu-Peng; Wang, Xing-Fu; Zheng, Zong-Qing

    2015-01-01

    Few studies have focused on microvascular patterns (MVPs) in human glioblastoma and their prognostic impact. We evaluated MVPs of 78 glioblastomas by CD34/periodic acid-Schiff dual staining and by cluster analysis of the percentage of microvascular area for distinct microvascular formations. The distribution of 5 types of basic microvascular formations, that is, microvascular sprouting (MS), vascular cluster (VC), vascular garland (VG), glomeruloid vascular proliferation (GVP), and vasculogenic mimicry (VM), was variable. Accordingly, cluster analysis classified MVPs into 2 types: type I MVP displayed prominent MSs and VCs, whereas type II MVP had numerous VGs, GVPs, and VMs. By analyzing the proportion of microvascular area for each type of formation, we determined that glioblastomas with few MSs and VCs had many GVPs and VMs, and vice versa. VG seemed to be a transitional type of formation. In case of type I MVP, expression of Ki-67 and p53 but not MGMT was significantly higher as compared with those of type II MVP (P analysis showed that the type of MVPs presented as an independent prognostic factor of progression-free survival (PFS) and overall survival (OS) (both P < .001). Type II MVP had a more negative influence on PFS and OS than did type I MVP. We conclude that the heterogeneous MVPs in glioblastoma can be categorized properly by certain histopathologic and statistical analyses and may influence clinical outcome. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.

  15. Analysis of Decision Trees in Context Clustering of Hidden Markov Model Based Thai Speech Synthesis

    Directory of Open Access Journals (Sweden)

    Suphattharachai Chomphan

    2011-01-01

    Problem statement: In Thai speech synthesis using a Hidden Markov model (HMM) based synthesis system, the tonal speech quality is degraded due to tone distortion. This major problem must be treated appropriately to preserve the tone characteristics of each syllable unit. Since tone contributes to the intelligibility of the synthesized speech, the tone questions and other phonetic questions in the tree-based context clustering process need to be established accordingly. Approach: This study describes the analysis of questions in the tree-based context clustering process of an HMM-based speech synthesis system for the Thai language. In the system, spectrum, pitch or F0 and state duration are modeled simultaneously in a unified HMM framework, and their parameter distributions are clustered independently by using a decision-tree based context clustering technique. The contextual factors which affect spectrum, pitch and duration, i.e., part of speech, position and number of phones in a syllable, position and number of syllables in a word, position and number of words in a sentence, phone type and tone type, are taken into account for constructing the questions of the decision tree. In total, thirteen sets of questions are analyzed in comparison. Results: In the experiment, we analyzed the decision trees by counting the number of questions in each node coming from those thirteen sets and by calculating the dominance score given to each question as the reciprocal of the distance from the root node to the question node. The highest number and dominance score belong to the set of phonetic type, while the second and third highest belong to the sets of part of speech and tone type. Conclusion: By counting the number of questions in each node and calculating the dominance score, we can set the priority of each question set. The analysis results support further development of Thai speech synthesis with efficient context clustering process in
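    The counting and dominance-score analysis described in this abstract can be sketched on a toy decision tree. This is an illustrative sketch, not the authors' tooling: the nested-dict tree format and the question-set names are hypothetical, and the root is assigned a distance of 1 (an assumption made here so that the reciprocal is defined at the root).

```python
from collections import defaultdict


def score_question_sets(root):
    """Count questions per set and sum reciprocal depths (dominance score).

    Each internal node is a dict with a "set" key naming its question set
    and optional "yes"/"no" children; leaves are represented by None.
    """
    counts = defaultdict(int)
    dominance = defaultdict(float)
    stack = [(root, 1)]  # assumption: the root sits at distance 1
    while stack:
        node, depth = stack.pop()
        if node is None:
            continue
        counts[node["set"]] += 1
        dominance[node["set"]] += 1.0 / depth
        stack.append((node.get("yes"), depth + 1))
        stack.append((node.get("no"), depth + 1))
    return dict(counts), dict(dominance)


# Hypothetical tree with three question sets.
tree = {
    "set": "phone_type",
    "yes": {
        "set": "tone_type",
        "yes": {"set": "phone_type", "yes": None, "no": None},
        "no": None,
    },
    "no": {"set": "part_of_speech", "yes": None, "no": None},
}
counts, dominance = score_question_sets(tree)
for name in sorted(dominance, key=dominance.get, reverse=True):
    print(name, counts[name], round(dominance[name], 3))
```

    Summing reciprocal depths rewards question sets that appear near the root, where a split affects the most data, so a set can rank highly on dominance even when its raw count is modest.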

  16. MMPI profiles of males accused of severe crimes: a cluster analysis

    NARCIS (Netherlands)

    Spaans, M.; Barendregt, M.; Muller, E.; Beurs, E. de; Nijman, H.L.I.; Rinne, T.

    2009-01-01

    In studies attempting to classify criminal offenders by cluster analysis of Minnesota Multiphasic Personality Inventory-2 (MMPI-2) data, the number of clusters found varied between 10 (the Megargee System) and two (one cluster indicating no psychopathology and one exhibiting serious psychopathology)

  17. Online Cluster Analysis Supporting Real Time Anomaly Detection in Hyperspectral Imagery

    Science.gov (United States)

    2013-06-01

    algorithm is accomplished for this exercise by performing the principal component analysis on the entire image after the removal of the noise and...cluster completely without fully capturing the intended cluster is easily explained by referencing Figure 29. The tree cluster in green is an eccentric

  18. Clinical Characteristics of Exacerbation-Prone Adult Asthmatics Identified by Cluster Analysis.

    Science.gov (United States)

    Kim, Mi Ae; Shin, Seung Woo; Park, Jong Sook; Uh, Soo Taek; Chang, Hun Soo; Bae, Da Jeong; Cho, You Sook; Park, Hae Sim; Yoon, Ho Joo; Choi, Byoung Whui; Kim, Yong Hoon; Park, Choon Sik

    2017-11-01

    Asthma is a heterogeneous disease characterized by various types of airway inflammation and obstruction. Therefore, it is classified into several subphenotypes, such as early-onset atopic, obese non-eosinophilic, benign, and eosinophilic asthma, using cluster analysis. A number of asthmatics frequently experience exacerbation over a long-term follow-up period, but the exacerbation-prone subphenotype has rarely been evaluated by cluster analysis. This prompted us to identify clusters reflecting asthma exacerbation. A uniform cluster analysis method was applied to 259 adult asthmatics who were regularly followed-up for over 1 year using 12 variables, selected on the basis of their contribution to asthma phenotypes. After clustering, clinical profiles and exacerbation rates during follow-up were compared among the clusters. Four subphenotypes were identified: cluster 1 was comprised of patients with early-onset atopic asthma with preserved lung function, cluster 2 late-onset non-atopic asthma with impaired lung function, cluster 3 early-onset atopic asthma with severely impaired lung function, and cluster 4 late-onset non-atopic asthma with well-preserved lung function. The patients in clusters 2 and 3 were identified as exacerbation-prone asthmatics, showing a higher risk of asthma exacerbation. Two different phenotypes of exacerbation-prone asthma were identified among Korean asthmatics using cluster analysis; both were characterized by impaired lung function, but the age at asthma onset and atopic status were different between the two.

  20. Investigating Faculty Familiarity with Assessment Terminology by Applying Cluster Analysis to Interpret Survey Data

    Science.gov (United States)

    Raker, Jeffrey R.; Holme, Thomas A.

    2014-01-01

    A cluster analysis was conducted with a set of survey data on chemistry faculty familiarity with 13 assessment terms. Cluster groupings suggest a high, middle, and low overall familiarity with the terminology and an independent high and low familiarity with terms related to fundamental statistics. The six resultant clusters were found to be…

  2. The Use of Cluster Analysis in Typological Research on Community College Students

    Science.gov (United States)

    Bahr, Peter Riley; Bielby, Rob; House, Emily

    2011-01-01

    One useful and increasingly popular method of classifying students is known commonly as cluster analysis. The variety of techniques that comprise the cluster analytic family are intended to sort observations (for example, students) within a data set into subsets (clusters) that share similar characteristics and differ in meaningful ways from other…

  3. Cluster stability in the analysis of mass cytometry data.

    Science.gov (United States)

    Melchiotti, Rossella; Gracio, Filipe; Kordasti, Shahram; Todd, Alan K; de Rinaldis, Emanuele

    2017-01-01

    Manual gating has been traditionally applied to cytometry data sets to identify cells based on protein expression. The advent of mass cytometry allows for a higher number of proteins to be simultaneously measured on cells, therefore providing a means to define cell clusters in a high dimensional expression space. This enhancement, whilst opening unprecedented opportunities for single cell-level analyses, makes the incremental replacement of manual gating with automated clustering a compelling need. To this aim many methods have been implemented and their successful applications demonstrated in different settings. However, the reproducibility of automatically generated clusters is proving challenging and an analytical framework to distinguish spurious clusters from more stable entities, and presumably more biologically relevant ones, is still missing. One way to estimate cell clusters' stability is the evaluation of their consistent re-occurrence within- and between-algorithms, a metric that is commonly used to evaluate results from gene expression. Herein we report the usage and importance of cluster stability evaluations, when applied to results generated from three popular clustering algorithms - SPADE, FLOCK and PhenoGraph - run on four different data sets. These algorithms were shown to generate clusters with various degrees of statistical stability, many of them being unstable. By comparing the results of automated clustering with manually gated populations, we illustrate how information on cluster stability can assist towards a more rigorous and informed interpretation of clustering results. We also explore the relationships between statistical stability and other properties such as clusters' compactness and isolation, demonstrating that whilst cluster stability is linked to other properties it cannot be reliably predicted by any of them. Our study proposes the introduction of cluster stability as a necessary checkpoint for cluster interpretation and

  4. Maximum-entropy clustering algorithm and its global convergence analysis

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    By constructing a family of differentiable entropy functions that uniformly approximate an objective function by means of the maximum-entropy principle, a new clustering algorithm, called the maximum-entropy clustering algorithm, is proposed based on optimization theory. This algorithm is a soft generalization of the hard C-means algorithm and possesses global convergence. Its relations with other clustering algorithms are discussed.
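
    To illustrate what a "soft generalization of hard C-means" can look like, here is a minimal sketch of one common entropy-regularized variant. It is not taken from the paper, whose exact formulation is not given in the abstract. Memberships are assumed to follow a softmax over negative squared distances, and the function name and the `beta` parameter are assumptions made for illustration.

```python
import math


def soft_cmeans(points, centers, beta=1.0, iters=50):
    """Entropy-regularized (soft) C-means on tuples of floats.

    beta is an assumed inverse-temperature parameter: large beta pushes
    memberships towards the hard C-means assignment.
    """
    dim = len(points[0])
    memberships = []
    for _ in range(iters):
        memberships = []
        for p in points:
            d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            nearest = min(d2)
            # Subtract the smallest distance before exponentiating for stability.
            w = [math.exp(-beta * (d - nearest)) for d in d2]
            total = sum(w)
            memberships.append([x / total for x in w])
        new_centers = []
        for j, old in enumerate(centers):
            mass = sum(u[j] for u in memberships)
            if mass == 0.0:
                new_centers.append(old)  # keep a center that attracts no mass
                continue
            new_centers.append(tuple(
                sum(u[j] * p[k] for u, p in zip(memberships, points)) / mass
                for k in range(dim)
            ))
        centers = new_centers
    return centers, memberships


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, memberships = soft_cmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

    Each row of `memberships` sums to 1, which is the sense in which the assignment is soft rather than hard.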

  5. Population analysis of open clusters: radii and mass segregation

    CERN Document Server

    Schilbach, E; Piskunov, A E; Röser, S; Scholz, R D

    2006-01-01

    Aims: Based on our well-determined sample of open clusters in the all-sky catalogue ASCC-2.5 we derive new linear sizes of some 600 clusters, and investigate the effect of mass segregation of stars in open clusters. Methods: Using statistical methods, we study the distribution of linear sizes as a function of spatial position and cluster age. We also examine statistically the distribution of stars of different masses within clusters as a function of the cluster age. Results: No significant dependence of the cluster size on location in the Galaxy is detected for younger clusters (< 200 Myr), whereas older clusters inside the solar orbit turned out to be, on average, smaller than outside. Also, small old clusters are preferentially found close to the Galactic plane, whereas larger ones more frequently live farther away from the plane and at larger Galactocentric distances. For clusters with (V - M_V) < 10.5, a clear dependence of the apparent radius on age has been detected: the cluster radii decrease by ...

  6. Unsupervised analysis of classical biomedical markers: robustness and medical relevance of patient clustering using bioinformatics tools.

    Directory of Open Access Journals (Sweden)

    Michal Markovich Gordon

    Full Text Available MOTIVATION: It has been proposed that clustering clinical markers, such as blood test results, can be used to stratify patients. However, the robustness of clusters formed with this approach to data pre-processing and clustering algorithm choices has not been evaluated, nor has clustering reproducibility. Here, we made use of the NHANES survey to compare clusters generated with various combinations of pre-processing and clustering algorithms, and tested their reproducibility in two separate samples. METHOD: Values of 44 biomarkers and 19 health/life style traits were extracted from the National Health and Nutrition Examination Survey (NHANES). The 1999-2002 survey was used for training, while data from the 2003-2006 survey was tested as a validation set. Twelve combinations of pre-processing and clustering algorithms were applied to the training set. The quality of the resulting clusters was evaluated both by considering their properties and by comparative enrichment analysis. Cluster assignments were projected to the validation set (using an artificial neural network) and enrichment in health/life style traits in the resulting clusters was compared to the clusters generated from the original training set. RESULTS: The clusters obtained with different pre-processing and clustering combinations differed both in terms of cluster quality measures and in terms of reproducibility of enrichment with health/life style properties. Z-score normalization, for example, dramatically improved cluster quality and enrichments, as compared to unprocessed data, regardless of the clustering algorithm used. Clustering diabetes patients revealed a group of patients enriched with retinopathies. This could indicate that routine laboratory tests can be used to detect patients suffering from complications of diabetes, although other explanations for this observation should also be considered. CONCLUSIONS: Clustering according to classical clinical biomarkers is a robust
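
    The Z-score normalization mentioned in the results can be sketched as below. This is a generic illustration, not the authors' pipeline: the abstract does not say whether the population or sample standard deviation was used, so the population form is assumed here, and the function names are hypothetical.

```python
import statistics


def z_scores(values):
    """Rescale one biomarker column to zero mean and unit variance."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # assumption: population standard deviation
    if sd == 0:
        raise ValueError("constant column cannot be z-scored")
    return [(v - mean) / sd for v in values]


def normalize_columns(rows):
    """Apply z_scores to every column of a row-major table."""
    scaled_columns = [z_scores(col) for col in zip(*rows)]
    return [list(row) for row in zip(*scaled_columns)]


# Two made-up markers on very different scales.
table = [[2.0, 200.0], [4.0, 400.0], [6.0, 600.0]]
normalized = normalize_columns(table)
```

    After rescaling, both columns contribute comparably to a distance-based clustering, which is one plausible reason normalization matters for markers measured in different units.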

  7. Cluster Analysis in Nursing Research: An Introduction, Historical Perspective, and Future Directions.

    Science.gov (United States)

    Dunn, Heather; Quinn, Laurie; Corbridge, Susan J; Eldeirawi, Kamal; Kapella, Mary; Collins, Eileen G

    2017-05-01

    The use of cluster analysis in the nursing literature is limited to the creation of classifications of homogeneous groups and the discovery of new relationships. As such, it is important to provide clarity regarding its use and potential. The purpose of this article is to provide an introduction to distance-based, partitioning-based, and model-based cluster analysis methods commonly utilized in the nursing literature, provide a brief historical overview on the use of cluster analysis in nursing literature, and provide suggestions for future research. An electronic search included three bibliographic databases, PubMed, CINAHL and Web of Science. Key terms were cluster analysis and nursing. The use of cluster analysis in the nursing literature is increasing and expanding. The increased use of cluster analysis in the nursing literature is positioning this statistical method to result in insights that have the potential to change clinical practice.

  8. Interpretability of anatomical variability analysis of abdominal organs via clusterization of decomposition modes.

    Science.gov (United States)

    Reyes, Mauricio; Gonzalez Ballester, Miguel A; Li, Zhixi; Kozic, Nina; Summers, Ronald M; Linguraru, Marius George

    2008-01-01

    Extensive recent work has taken place on the construction of probabilistic atlases of anatomical organs, especially the brain, and their application in medical image analysis. These techniques are leading the way into similar studies of other organs and more comprehensively of groups of organs. In this paper we report results on the analysis of anatomical variability obtained from probabilistic atlases of abdominal organs. Two factor analysis techniques, namely principal component analysis (PCA) and principal factor analysis (PFA), were used to decompose and study shape variability within the abdomen. To assess and ease the interpretability of the resulting deformation modes, a clustering technique of the deformation vectors is proposed. The analysis of deformation fields obtained using these two factor analysis techniques showed strong correlation with anatomical landmarks and known mechanical deformations in the abdomen, allowing us to conclude that PFA is a complementary decomposition technique that offers easy-to-interpret additional information to PCA in a clinical setting. The analysis of organ anatomical variability will represent a potentially important research tool for abdominal diagnosis and modeling.

  9. Patterns of Brucellosis Infection Symptoms in Azerbaijan: A Latent Class Cluster Analysis

    OpenAIRE

    Rita Ismayilova; Emilya Nasirova; Colleen Hanou; Rivard, Robert G.; Bautista, Christian T.

    2014-01-01

    Brucellosis infection is a multisystem disease, with a broad spectrum of symptoms. We investigated the existence of clusters of infected patients according to their clinical presentation. Using national surveillance data from the Electronic-Integrated Disease Surveillance System, we applied a latent class cluster (LCC) analysis on symptoms to determine clusters of brucellosis cases. A total of 454 cases reported between July 2011 and July 2013 were analyzed. LCC identified a two-cluster mo...

  10. Optimum Metallic-Bond Scheme: A Quantitative Analysis of Mass Spectra of Sodium Clusters

    Institute of Scientific and Technical Information of China (English)

    苏长荣; 李家明

    2001-01-01

    Based on the results of the optimum metallic-bond scheme for sodium clusters, we present a quantitative analysis of the detailed features of the mass spectra of sodium clusters. We find that, in the generation of sodium clusters with various abundances, the quasi-steady processes through adding or losing a sodium atom dominate. The quasi-steady processes through adding or losing a sodium dimer are also important to understand the detailed features of mass spectra for small clusters.

  11. Cluster randomized clinical trials in orthodontics: design, analysis and reporting issues.

    Science.gov (United States)

    Pandis, Nikolaos; Walsh, Tanya; Polychronopoulou, Argy; Eliades, Theodore

    2013-10-01

    Cluster randomized trials (CRTs) use as the unit of randomization clusters, which are usually defined as a collection of individuals sharing some common characteristics. Common examples of clusters include entire dental practices, hospitals, schools, school classes, villages, and towns. Additionally, several measurements (repeated measurements) taken on the same individual at different time points are also considered to be clusters. In dentistry, CRTs are applicable as patients may be treated as clusters containing several individual teeth. CRTs require certain methodological procedures during sample calculation, randomization, data analysis, and reporting, which are often ignored in dental research publications. In general, due to similarity of the observations within clusters, each individual within a cluster provides less information compared with an individual in a non-clustered trial. Therefore, clustered designs require larger sample sizes compared with non-clustered randomized designs, and special statistical analyses that account for the fact that observations within clusters are correlated. It is the purpose of this article to highlight with relevant examples the important methodological characteristics of cluster randomized designs as they may be applied in orthodontics and to explain the problems that may arise if clustered observations are erroneously treated and analysed as independent (non-clustered).
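
    The sample-size penalty for clustering is often summarized by the standard design effect, 1 + (m - 1) x ICC, where m is the cluster size and ICC the intracluster correlation coefficient. The sketch below assumes equal cluster sizes; the function names and the example numbers are illustrative and are not taken from the article.

```python
import math


def design_effect(cluster_size, icc):
    """Variance inflation for equal-sized clusters: 1 + (m - 1) * ICC."""
    if cluster_size < 1 or not 0.0 <= icc <= 1.0:
        raise ValueError("need cluster_size >= 1 and 0 <= icc <= 1")
    return 1.0 + (cluster_size - 1) * icc


def inflated_sample_size(n_individual, cluster_size, icc):
    """Observations needed once the individually randomized n is inflated."""
    return math.ceil(n_individual * design_effect(cluster_size, icc))


# Illustrative values only: 5 teeth per patient, ICC of 0.25.
deff = design_effect(5, 0.25)
n_needed = inflated_sample_size(100, 5, 0.25)
```

    With an ICC of zero the design effect is 1 and no inflation is needed, which matches the abstract's point that the extra sample size comes from within-cluster similarity.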

  12. Challenges for Cluster Analysis in a Virtual Observatory

    CERN Document Server

    Djorgovski, S G; Mahabal, A A; Williams, R; Granat, R; Stolorz, P

    2002-01-01

    There has been an unprecedented and continuing growth in the volume, quality, and complexity of astronomical data sets over the past few years, mainly through large digital sky surveys. The Virtual Observatory (VO) concept represents a scientific and technological framework needed to cope with this data flood. We review some of the applied statistics and computing challenges posed by the analysis of large and complex data sets expected in VO-based research. The challenges are driven by the size and the complexity of the data sets (billions of data vectors in parameter spaces of tens or hundreds of dimensions), by the heterogeneity of the data and measurement errors, the selection effects and censored data, and by the intrinsic clustering properties (functional form, topology) of the data distribution in the parameter space of observed attributes. Examples of scientific questions one may wish to address include: objective determination of the numbers of object classes present in the data, and the membersh...

  13. Higgs Pair Production: Choosing Benchmarks With Cluster Analysis

    CERN Document Server

    Dall'Osso, Martino; Gottardo, Carlo A; Oliveira, Alexandra; Tosi, Mia; Goertz, Florian

    2015-01-01

    New physics theories often depend on a large number of free parameters. The precise values of those parameters in some cases drastically affect the resulting phenomenology of fundamental physics processes, while in others finite variations can leave it basically invariant at the level of detail experimentally accessible. When designing a strategy for the analysis of experimental data in the search for a signal predicted by a new physics model, it appears advantageous to categorize the parameter space describing the model according to the corresponding kinematical features of the final state. A multi-dimensional test statistic can be used to gauge the degree of similarity in the kinematics of different models; a clustering algorithm using that metric may then allow the division of the space into homogeneous regions, each of which can be successfully represented by a benchmark point. Searches targeting those benchmark points are then guaranteed to be sensitive to a large area of the parameter space. In this doc...

  14. CLUSTERIZATION – A FACTOR OF EFFICIENCY IN SMALL AND MEDIUM HOSPITALITY ENTERPRISES

    Directory of Open Access Journals (Sweden)

    Zorica Krželj-Čolović

    2016-12-01

    Full Text Available In the modern global economy that is constantly changing and causing constant threats and challenges, various forms of association and networking enterprises are of growing importance. Considering that small and medium enterprises are drivers of economic growth and employment, they should be the most dynamic and most efficient segment of the economy. The same is true for the hospitality industry, where small and medium hospitality enterprises are the main providers of the tourism offer. The lack of networks in clusters of small and medium hospitality enterprises in Croatia is the cause of the unsatisfactory level of competitiveness and quality of hotel facilities with negative implications for economic and social development. The beginning of clustering in Croatia could be a good way to increase the economic efficiency of Croatian small and medium hospitality enterprises. The aim of this paper is to present clustering as a factor that affects the quality of small and medium hospitality enterprises by increasing their competitiveness in the tourism market which is becoming an important element for their business efficiency. For the purposes of the research, a survey was carried out on a sample of 72 small and medium hospitality enterprises in the period from June to September 2012. The survey results have shown that clusterization is a factor of efficiency in small and medium hospitality enterprises.

  15. Archetypal TRMM Radar Profiles Identified Through Cluster Analysis

    Science.gov (United States)

    Boccippio, Dennis J.

    2003-01-01

    It is widely held that identifiable 'convective regimes' exist in nature, although precise definitions of these are elusive. Examples include land/ocean distinctions, break/monsoon behavior, seasonal differences in the Amazon (SON vs DJF), etc. These regimes are often described by differences in the realized local convective spectra, and measured by various metrics of convective intensity, depth, areal coverage and rainfall amount. Objective regime identification may be valuable in several ways: regimes may serve as natural 'branch points' in satellite retrieval algorithms or data assimilation efforts; one example might be objective identification of regions that 'should' share a similar Z-R relationship. Similarly, objectively defined regimes may provide guidance on optimal siting of ground validation efforts. Objectively defined regimes could also serve as natural (rather than arbitrary geographic) domain 'controls' in studies of convective response to environmental forcing. Quantification of convective vertical structure has traditionally involved parametric study of prescribed quantities thought to be important to convective dynamics: maximum radar reflectivity, cloud top height, 30-35 dBZ echo top height, rain rate, etc. Individually, these parameters are somewhat deficient as their interpretation is often nonunique (the same metric value may signify different physics in different storm realizations). Individual metrics also fail to capture the coherence and interrelationships between vertical levels available in full 3-D radar datasets. An alternative approach is discovery of natural partitions of vertical structure in a globally representative dataset, or 'archetypal' reflectivity profiles. In this study, this is accomplished through cluster analysis of a very large sample (O(10^7)) of TRMM-PR reflectivity columns. Once achieved, the rain-conditional and unconditional 'mix' of archetypal profile types in a given location and/or season provides a description

  17. A COMPARISON BETWEEN SINGLE LINKAGE AND COMPLETE LINKAGE IN AGGLOMERATIVE HIERARCHICAL CLUSTER ANALYSIS FOR IDENTIFYING TOURISTS SEGMENTS

    OpenAIRE

    Noor Rashidah Rashid

    2012-01-01

    Cluster Analysis is a multivariate method in statistics. Agglomerative Hierarchical Cluster Analysis is one of the approaches in Cluster Analysis. Two of the linkage methods in Agglomerative Hierarchical Cluster Analysis are Single Linkage and Complete Linkage. The purpose of this study is to compare Single Linkage and Complete Linkage in Agglomerative Hierarchical Cluster Analysis. The comparison of performances between these linkage methods was shown by using Kruskal-Wallis tes...
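
    The two linkage rules differ only in how the distance between two clusters is defined: single linkage takes the smallest pairwise distance, complete linkage the largest. A minimal sketch, assuming Euclidean distance between points (the abstract does not name the metric) and using made-up sample data:

```python
import math


def single_linkage(cluster_a, cluster_b):
    """Distance between the closest pair of points across the two clusters."""
    return min(math.dist(p, q) for p in cluster_a for q in cluster_b)


def complete_linkage(cluster_a, cluster_b):
    """Distance between the farthest pair of points across the two clusters."""
    return max(math.dist(p, q) for p in cluster_a for q in cluster_b)


# Made-up clusters on a line, for illustration only.
a = [(0.0, 0.0), (1.0, 0.0)]
b = [(3.0, 0.0), (5.0, 0.0)]
nearest_gap = single_linkage(a, b)
farthest_gap = complete_linkage(a, b)
```

    In an agglomerative procedure the pair of clusters with the smallest such distance is merged at each step, so single linkage tends to chain elongated groups while complete linkage favors compact ones.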

  18. Outlier Identification in Model-Based Cluster Analysis.

    Science.gov (United States)

    Evans, Katie; Love, Tanzy; Thurston, Sally W

    2015-04-01

    In model-based clustering based on normal-mixture models, a few outlying observations can influence the cluster structure and number. This paper develops a method to identify these; however, it does not attempt to identify clusters amidst a large field of noisy observations. We identify outliers as those observations in a cluster with minimal membership proportion or for which the cluster-specific variance with and without the observation is very different. Results from a simulation study demonstrate the ability of our method to detect true outliers without falsely identifying many non-outliers and improved performance over other approaches, under most scenarios. We use the contributed R package MCLUST for model-based clustering, but propose a modified prior for the cluster-specific variance which avoids degeneracies in estimation procedures. We also compare results from our outlier method to published results on National Hockey League data.

  19. Outlier Identification in Model-Based Cluster Analysis

    Science.gov (United States)

    Evans, Katie; Love, Tanzy; Thurston, Sally W.

    2015-01-01

    In model-based clustering based on normal-mixture models, a few outlying observations can influence the cluster structure and number. This paper develops a method to identify these; however, it does not attempt to identify clusters amidst a large field of noisy observations. We identify outliers as those observations in a cluster with minimal membership proportion or for which the cluster-specific variance with and without the observation is very different. Results from a simulation study demonstrate the ability of our method to detect true outliers without falsely identifying many non-outliers and improved performance over other approaches, under most scenarios. We use the contributed R package MCLUST for model-based clustering, but propose a modified prior for the cluster-specific variance which avoids degeneracies in estimation procedures. We also compare results from our outlier method to published results on National Hockey League data. PMID:26806993

  20. A framework for graph-based synthesis, analysis, and visualization of HPC cluster job data.

    Energy Technology Data Exchange (ETDEWEB)

    Mayo, Jackson R.; Kegelmeyer, W. Philip, Jr.; Wong, Matthew H.; Pebay, Philippe Pierre; Gentile, Ann C.; Thompson, David C.; Roe, Diana C.; De Sapio, Vincent; Brandt, James M.

    2010-08-01

    The monitoring and system analysis of high performance computing (HPC) clusters is of increasing importance to the HPC community. Analysis of HPC job data can be used to characterize system usage and diagnose and examine failure modes and their effects. This analysis is not straightforward, however, due to the complex relationships that exist between jobs. These relationships are based on a number of factors, including shared compute nodes between jobs, proximity of jobs in time, etc. Graph-based techniques represent an approach that is particularly well suited to this problem, and provide an effective technique for discovering important relationships in job queuing and execution data. The efficacy of these techniques is rooted in the use of a semantic graph as a knowledge representation tool. In a semantic graph job data, represented in a combination of numerical and textual forms, can be flexibly processed into edges, with corresponding weights, expressing relationships between jobs, nodes, users, and other relevant entities. This graph-based representation permits formal manipulation by a number of analysis algorithms. This report presents a methodology and software implementation that leverages semantic graph-based techniques for the system-level monitoring and analysis of HPC clusters based on job queuing and execution data. Ontology development and graph synthesis is discussed with respect to the domain of HPC job data. The framework developed automates the synthesis of graphs from a database of job information. It also provides a front end, enabling visualization of the synthesized graphs. Additionally, an analysis engine is incorporated that provides performance analysis, graph-based clustering, and failure prediction capabilities for HPC systems.

  1. Advancing health-related cluster analysis methodology: quantification of pairwise activity cluster similarities.

    Science.gov (United States)

    Ferrar, Katia; Maher, Carol; Petkov, John; Olds, Tim

    2015-03-01

    To date, most health-related time-use research has investigated behaviors in isolation; more recently, however, researchers have begun to conceptualize behaviors in the form of multidimensional patterns or clusters. The study employed 2 techniques: radar graphs and centroid vector length, angles and distance to quantify pairwise time-use cluster similarities among adolescents living in Australia (N = 1853) and in New Zealand (N = 679). Based on radar graph shape, 2 pairs of clusters were similar for both boys and girls. Using vector angles (VA), vector length (VL) and centroid distances (CD), 1 pair for each sex was considered most similar (boys: VA = 63°, VL = 44 and 50 units, and CD = 48 units; girls: VA = 23°, VL = 65 and 85 units, and CD = 36 units). Both methods employed to determine similarity had strengths and weaknesses. The description and quantification of cluster similarity is an important step in the research process. An ability to track and compare clusters may provide greater understanding of complex multidimensional relationships, and in relation to health behavior clusters, present opportunities to monitor and to intervene.
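
    The three centroid-based quantities named in this abstract (vector length, vector angle, centroid distance) can be sketched with ordinary Euclidean geometry. The paper's exact definitions are not given in the abstract, so the formulas below are the standard ones and the function names and sample centroids are hypothetical.

```python
import math


def vector_length(v):
    """Euclidean norm of a centroid vector."""
    return math.sqrt(sum(x * x for x in v))


def vector_angle_deg(u, v):
    """Angle between two centroid vectors, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    cos_theta = dot / (vector_length(u) * vector_length(v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def centroid_distance(u, v):
    """Euclidean distance between two cluster centroids."""
    return math.dist(u, v)


# Made-up centroids over two time-use variables.
boys = (3.0, 4.0)
girls = (4.0, 3.0)
summary = (
    vector_length(boys),
    vector_length(girls),
    vector_angle_deg(boys, girls),
    centroid_distance(boys, girls),
)
```

    The angle compares the shape of two time-use profiles independently of their magnitude, while the lengths and the centroid distance capture how far apart the profiles are in absolute terms.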

  2. Application of Multi-SOM clustering approach to macrophage gene expression analysis.

    Science.gov (United States)

    Ghouila, Amel; Yahia, Sadok Ben; Malouche, Dhafer; Jmel, Haifa; Laouini, Dhafer; Guerfali, Fatma Z; Abdelhak, Sonia

    2009-05-01

    The production of increasingly reliable and accessible gene expression data has stimulated the development of computational tools to interpret such data and to organize them efficiently. The clustering techniques are largely recognized as useful exploratory tools for gene expression data analysis. Genes that show similar expression patterns over a wide range of experimental conditions can be clustered together. This relies on the hypothesis that genes that belong to the same cluster are coregulated and involved in related functions. Nevertheless, clustering algorithms still show limits, particularly for the estimation of the number of clusters and the interpretation of hierarchical dendrogram, which may significantly influence the outputs of the analysis process. We propose here a multi level SOM based clustering algorithm named Multi-SOM. Through the use of clustering validity indices, Multi-SOM overcomes the problem of the estimation of clusters number. To test the validity of the proposed clustering algorithm, we first tested it on supervised training data sets. Results were evaluated by computing the number of misclassified samples. We have then used Multi-SOM for the analysis of macrophage gene expression data generated in vitro from the same individual blood infected with 5 different pathogens. This analysis led to the identification of sets of tightly coregulated genes across different pathogens. Gene Ontology tools were then used to estimate the biological significance of the clustering, which showed that the obtained clusters are coherent and biologically significant.

  3. Cluster analysis application in research on pork quality determinants

    Science.gov (United States)

    Przybylski, W.; Wasiewicz, P.; Zieliński, P.; Gromadzka-Ostrowska, J.; Olczak, E.; Jaworska, D.; Niemyjski, S.; Santé-Lhoutellier, V.

    2010-09-01

    In this paper, data mining methods were applied to investigate the features that determine high-quality pork. The aim of the study was to analyse how pork quality is conditioned by HDL and LDL cholesterol concentration, plasma leptin, triglycerides, plasma glucose and serum. The research was carried out on 54 pigs originating from the crossbreeding of Naima sows with boars of the P76-PenArLan hybrid line. Meat quality parameters were evaluated in samples of the Longissimus (LD) muscle taken behind the last rib on the basis of pH value, meat colour, drip loss, RTN, intramuscular fat and glycolytic potential. The results of this study were elaborated using the R environment and show that cluster and regression analysis can be useful tools for in-depth analysis of the determinants of pork quality in homogeneous populations of pigs. However, the question of the determinants of the level of glycogen and fat in meat requires further research.

  4. Characterizing Heterogeneity within Head and Neck Lesions Using Cluster Analysis of Multi-Parametric MRI Data.

    Directory of Open Access Journals (Sweden)

    Marco Borri

    To describe a methodology, based on cluster analysis, to partition multi-parametric functional imaging data into groups (or clusters) of similar functional characteristics, with the aim of characterizing functional heterogeneity within head and neck tumour volumes. To evaluate the performance of the proposed approach on a set of longitudinal MRI data, analysing the evolution of the obtained sub-sets with treatment. The cluster analysis workflow was applied to a combination of dynamic contrast-enhanced and diffusion-weighted imaging MRI data from a cohort of squamous cell carcinoma of the head and neck patients. Cumulative distributions of voxels, containing pre and post-treatment data and including both primary tumours and lymph nodes, were partitioned into k clusters (k = 2, 3 or 4). Principal component analysis and cluster validation were employed to investigate data composition and to independently determine the optimal number of clusters. The evolution of the resulting sub-regions with induction chemotherapy treatment was assessed relative to the number of clusters. The clustering algorithm was able to separate clusters which significantly reduced in voxel number following induction chemotherapy from clusters with a non-significant reduction. Partitioning with the optimal number of clusters (k = 4), determined with cluster validation, produced the best separation between reducing and non-reducing clusters. The proposed methodology was able to identify tumour sub-regions with distinct functional properties, independently separating clusters which were affected differently by treatment. This work demonstrates that unsupervised cluster analysis, with no prior knowledge of the data, can be employed to provide a multi-parametric characterization of functional heterogeneity within tumour volumes.

  5. Characterizing Heterogeneity within Head and Neck Lesions Using Cluster Analysis of Multi-Parametric MRI Data.

    Science.gov (United States)

    Borri, Marco; Schmidt, Maria A; Powell, Ceri; Koh, Dow-Mu; Riddell, Angela M; Partridge, Mike; Bhide, Shreerang A; Nutting, Christopher M; Harrington, Kevin J; Newbold, Katie L; Leach, Martin O

    2015-01-01

    To describe a methodology, based on cluster analysis, to partition multi-parametric functional imaging data into groups (or clusters) of similar functional characteristics, with the aim of characterizing functional heterogeneity within head and neck tumour volumes. To evaluate the performance of the proposed approach on a set of longitudinal MRI data, analysing the evolution of the obtained sub-sets with treatment. The cluster analysis workflow was applied to a combination of dynamic contrast-enhanced and diffusion-weighted imaging MRI data from a cohort of squamous cell carcinoma of the head and neck patients. Cumulative distributions of voxels, containing pre and post-treatment data and including both primary tumours and lymph nodes, were partitioned into k clusters (k = 2, 3 or 4). Principal component analysis and cluster validation were employed to investigate data composition and to independently determine the optimal number of clusters. The evolution of the resulting sub-regions with induction chemotherapy treatment was assessed relative to the number of clusters. The clustering algorithm was able to separate clusters which significantly reduced in voxel number following induction chemotherapy from clusters with a non-significant reduction. Partitioning with the optimal number of clusters (k = 4), determined with cluster validation, produced the best separation between reducing and non-reducing clusters. The proposed methodology was able to identify tumour sub-regions with distinct functional properties, independently separating clusters which were affected differently by treatment. This work demonstrates that unsupervised cluster analysis, with no prior knowledge of the data, can be employed to provide a multi-parametric characterization of functional heterogeneity within tumour volumes.
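
    The idea of partitioning data into k = 2, 3 or 4 clusters and letting a validation index pick k can be sketched as follows. This is only an illustration: the abstract names neither the clustering algorithm nor the validation index, so k-means and the mean silhouette width are assumptions here, and the 2-D points are hypothetical rather than voxel data.

```python
import math

def dist(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def kmeans(points, k, iters=100):
    """Plain k-means with a deterministic start (evenly spaced input points)."""
    n = len(points)
    centroids = [points[i * n // k] for i in range(k)]
    labels = [0] * n
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                      for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, new_labels) if l == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
        if new_labels == labels:
            break
        labels = new_labels
    return labels

def mean_silhouette(points, labels):
    """Mean silhouette width; singleton clusters contribute 0."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        total += (b - a) / max(a, b)
    return total / len(points)

# Hypothetical data: four tight, well-separated groups of three points
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 0.0), (10.0, 1.0), (11.0, 0.0),
          (0.0, 10.0), (0.0, 11.0), (1.0, 10.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

scores = {k: mean_silhouette(points, kmeans(points, k)) for k in (2, 3, 4)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

    On this toy data the silhouette is highest for k = 4, which mirrors the role cluster validation plays in the abstract without reproducing the authors' workflow.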

  6. Metabolic risk-factor clustering estimation in children: to draw a line across pediatric metabolic syndrome

    DEFF Research Database (Denmark)

    Brambilla, P; Lissau, I; Flodmark, C-E

    2007-01-01

    BACKGROUND: The diagnostic criteria of the metabolic syndrome (MS) have been applied in studies of obese adults to estimate the metabolic risk-associated with obesity, even though no general consensus exists concerning its definition and clinical value. We reviewed the current literature on the MS...... and adolescents, analyzing the scientific evidence needed to detect a clustering of cardiovascular risk-factors. Finally, we propose a new methodological approach for estimating metabolic risk-factor clustering in children and adolescents. RESULTS: Major concerns were the lack of information on the background...... derived from a child's family and personal history; the lack of consensus on insulin levels, lipid parameters, markers of inflammation or steato-hepatitis; the lack of an additive relevant effect of the MS definition to obesity per se. We propose the adoption of 10 evidence-based items from which...

  7. A combined multidimensional scaling and hierarchical clustering view for the exploratory analysis of multidimensional data

    Science.gov (United States)

    Craig, Paul; Roa-Seïler, Néna

    2013-01-01

    This paper describes a novel information visualization technique that combines multidimensional scaling and hierarchical clustering to support the exploratory analysis of multidimensional data. The technique displays the results of multidimensional scaling using a scatter plot where the proximity of any two items' representations is approximate to their similarity according to a Euclidean distance metric. The results of hierarchical clustering are overlaid onto this view by drawing smoothed outlines around each nested cluster. The difference in similarity between successive cluster combinations is used to colour code clusters and make stronger natural clusters more prominent in the display. When a cluster or group of items is selected, multidimensional scaling and hierarchical clustering are re-applied to a filtered subset of the data, and animation is used to smooth the transition between successive filtered views. As a case study we demonstrate the technique being used to analyse survey data relating to the appropriateness of different phrases to different emotionally charged situations.

  8. Genomic and Metabolomic Profile Associated to Clustering of Cardio-Metabolic Risk Factors

    Science.gov (United States)

    Marrachelli, Vannina G.; Rentero, Pilar; Mansego, María L.; Morales, Jose Manuel; Galan, Inma; Pardo-Tendero, Mercedes; Martinez, Fernando; Martin-Escudero, Juan Carlos; Briongos, Laisa; Chaves, Felipe Javier; Redon, Josep; Monleon, Daniel

    2016-01-01

    Background: To identify metabolomic and genomic markers associated with the presence of clustering of cardiometabolic risk factors (CMRFs) in a general population. Methods and Findings: One thousand five hundred and two Caucasian subjects, > 18 years of age and representative of the general population, were included. Blood pressure, anthropometric parameters and metabolic markers were measured. Subjects were grouped according to the number of CMRFs (Group 1: <2; Group 2: 2; Group 3: 3 or more CMRFs). Using SNPlex, 1251 SNPs potentially associated with clustering of three or more CMRFs were analyzed. Serum metabolomic profile was assessed by 1H NMR spectra using a Bruker Avance DRX 600 spectrometer. From the total population, 1217 subjects (mean age 54±19, 50.6% men) with a high genotyping call rate were analysed. A differential metabolomic profile, which included products from mitochondrial metabolism, extra-mitochondrial metabolism, branched amino acids and fatty acid signals, was observed among the three groups. The comparison of metabolomic patterns between subjects of Groups 1 to 3 for each of the genotypes associated with three or more CMRFs revealed two SNPs, the rs174577_AA of the FADS2 gene and the rs3803_TT of the GATA2 transcription factor gene, with minimal or no statistically significant differences. Subjects with and without three or more CMRFs who shared the same genotype and metabolomic profile differed in the pattern of CMRF clustering. Subjects of Group 3 with the AA genotype of rs174577 had a lower prevalence of hypertension compared to the CC and CT genotypes. In contrast, subjects of Group 3 with the TT genotype of the rs3803 polymorphism had a lower prevalence of T2DM, although they were predominantly males and had higher values of plasma creatinine. Conclusions: The results of the present study add information to the metabolomics profile and to the potential impact of genetic factors on the variants of clustering of cardiometabolic risk factors.

  9. Significant association of insulin and proinsulin with clustering of cardiovascular risk factors

    Institute of Scientific and Technical Information of China (English)

    En-Zhi Jia; Xin-Li Li; Hai-Yan Wang; Wen-Zhu Ma; Zhi-Jian Yang; Shi-Wei Chen; Guang-Yao Qi; Chun-Fa You; Jian-Feng Ma; Jing-Xin Zhang; Zhen-Zhen Wang; Wei-Chong Qian

    2005-01-01

    AIM: To investigate the association between true insulin and proinsulin and clustering of cardiovascular risk factors. METHODS: Based on the random stratified sampling principles, 1196 Chinese people (533 males and 663 females, aged 35-59 years with an average age of 46.69 years) were recruited. Biotin-avidin based double monoclonal antibody ELISA method was used to detect the true insulin and proinsulin, and a risk factor score was set to evaluate individuals according to the number of risk factors. RESULTS: The median (quartile range) of true insulin and proinsulin was 4.91 mIU/L (3.01-7.09 mIU/L) and 3.49 pmol/L (2.14-5.68 pmol/L) respectively, and the true insulin level of female subjects was significantly higher than that of male subjects (P = 0.000), but the level of proinsulin displayed no significant difference between males and females (P = 0.566). The results of covariate ANOVA after age and sex were controlled showed that subjects with any of the risk factors had a significantly higher true insulin level (P = 0.002 for hypercholesterolemia, P = 0.021 for high low-density lipoprotein cholesterol, P = 0.003 for low high-density lipoprotein cholesterol, and P = 0.000 for other risk factors) and proinsulin level (P = 0.001 for low high-density lipoprotein cholesterol, and P = 0.000 for other risk factors) than those with no risk factors. Furthermore, subjects with higher risk factor scores had a higher true insulin and proinsulin level than those with lower risk factor scores (P = 0.000). The multiple linear regression models showed that true insulin and proinsulin were significantly related to cardiovascular risk factor scores respectively (P = 0.000). CONCLUSION: True insulin and proinsulin are significantly associated with the clustering of cardiovascular risk factors.

  10. ANALYSIS OF TUITION GROWTH RATES BASED ON CLUSTERING AND REGRESSION MODELS

    Directory of Open Access Journals (Sweden)

    Long Cheng

    2016-07-01

    Tuition plays a significant role in determining whether a student can afford higher education, which is one of the major driving forces for country development and social prosperity. So it is necessary to fully understand what factors might affect the tuition and how they affect it. However, many existing studies on the tuition growth rate either lack sufficient real data and proper quantitative models to support their conclusions, or focus on only a few factors that might affect the tuition growth rate, failing to make a comprehensive analysis. In this paper, we explore a wide variety of factors that might affect the tuition growth rate by use of large amounts of authentic data and different quantitative methods such as clustering and regression models.

  11. A novel symptom cluster analysis among ambulatory HIV/AIDS patients in Uganda.

    Science.gov (United States)

    Namisango, Eve; Harding, Richard; Katabira, Elly T; Siegert, Richard J; Powell, Richard A; Atuhaire, Leonard; Moens, Katrien; Taylor, Steve

    2015-01-01

    Symptom clusters are gaining importance given HIV/AIDS patients experience multiple, concurrent symptoms. This study aimed to: determine clusters of patients with similar symptom combinations; describe symptom combinations distinguishing the clusters; and evaluate the clusters regarding patient socio-demographic, disease and treatment characteristics, quality of life (QOL) and functional performance. This was a cross-sectional study of 302 adult HIV/AIDS outpatients consecutively recruited at two teaching and referral hospitals in Uganda. Socio-demographic and seven-day period symptom prevalence and distress data were self-reported using the Memorial Symptom Assessment Schedule. QOL was assessed using the Medical Outcome Scale and functional performance using the Karnofsky Performance Scale. Symptom clusters were established using hierarchical cluster analysis with squared Euclidean distances using Ward's clustering methods based on symptom occurrence. Analysis of variance compared clusters on mean QOL and functional performance scores. Patient subgroups were categorised based on symptom occurrence rates. Five symptom occurrence clusters were identified: Cluster 1 (n=107), high-low for sensory discomfort and eating difficulties symptoms; Cluster 2 (n=47), high-low for psycho-gastrointestinal symptoms; Cluster 3 (n=71), high for pain and sensory disturbance symptoms; Cluster 4 (n=35), all high for general HIV/AIDS symptoms; and Cluster 5 (n=48), all low for mood-cognitive symptoms. The all high occurrence cluster was associated with worst functional status, poorest QOL scores and highest symptom-associated distress. Use of antiretroviral therapy was associated with all high symptom occurrence rate (Fisher's exact=4, Pcluster (Fisher's exact=41, Pclusters have a differential, affect HIV/AIDS patients' self-reported outcomes, with the subgroup experiencing high-symptom occurrence rates having a higher risk of poorer outcomes. 
Identification of symptom clusters could

  12. Attitude Exploration Using Factor Analysis Technique

    Directory of Open Access Journals (Sweden)

    Monika Raghuvanshi

    2016-12-01

    Attitude is a psychological variable that contains positive or negative evaluation about people or an environment. The growing generation possesses learning skills, so if positive attitude is inculcated at the right age, it might therefore become habitual. Students in the age group 14-20 years from the city of Bikaner, India, are the target population for this study. An inventory of 30 Likert-type scale statements was prepared in order to measure attitude towards the environment and matters related to conservation. The primary data is collected through a structured questionnaire, using cluster sampling technique and analyzed using the IBM SPSS 23 statistical tool. Factor analysis is used to reduce 30 variables to a smaller number of more identifiable groups of variables. Results show that students “need more regulation and voluntary participation to protect the environment”, “need conservation of water and electricity”, “are concerned for undue wastage of water”, “need visible actions to protect the environment”, “need strengthening of the public transport system”, “are a little bit ignorant about the consequences of global warming”, “want prevention of water pollution by industries”, “need changing of personal habits to protect the environment”, and “don’t have firsthand experience of global warming”. Analysis revealed that nine factors obtained could explain about 58.5% variance in the attitude of secondary school students towards the environment in the city of Bikaner, India. The remaining 39.6% variance is attributed to other elements not explained by this analysis. A global campaign for improvement in attitude about environmental issues and its utility in daily lives may boost positive youth attitudes, with potential worldwide impact. A cross-disciplinary approach may be developed by teaching along with other related disciplines such as science, economics, and social studies.

  13. Hierarchical Cluster Analysis: Comparison of Three Linkage Measures and Application to Psychological Data

    Directory of Open Access Journals (Sweden)

    Odilia Yim

    2015-02-01

    Cluster analysis refers to a class of data reduction methods used for sorting cases, observations, or variables of a given dataset into homogeneous groups that differ from each other. The present paper focuses on hierarchical agglomerative cluster analysis, a statistical technique where groups are sequentially created by systematically merging similar clusters together, as dictated by the distance and linkage measures chosen by the researcher. Specific distance and linkage measures are reviewed, including a discussion of how these choices can influence the clustering process by comparing three common linkage measures (single linkage, complete linkage, average linkage). The tutorial guides researchers in performing a hierarchical cluster analysis using the SPSS statistical software. Through an example, we demonstrate how cluster analysis can be used to detect meaningful subgroups in a sample of bilinguals by examining various language variables.
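
    How the choice of linkage measure can change the outcome of agglomerative clustering can be seen in a small sketch. The tutorial itself works in SPSS; the naive Python implementation below is a stand-in written for illustration, and the one-dimensional values are hypothetical, not the bilingual sample from the paper.

```python
import math

def dist(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def linkage_distance(cluster_a, cluster_b, method):
    """Distance between two clusters under the chosen linkage measure."""
    d = [dist(p, q) for p in cluster_a for q in cluster_b]
    if method == "single":      # closest pair of members
        return min(d)
    if method == "complete":    # farthest pair of members
        return max(d)
    if method == "average":     # mean over all cross-cluster pairs
        return sum(d) / len(d)
    raise ValueError("unknown linkage: " + method)

def agglomerate(points, n_clusters, method):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: linkage_distance(
            clusters[ij[0]], clusters[ij[1]], method))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical one-dimensional values with slowly widening gaps
values = [(0.0,), (1.0,), (2.1,), (3.3,), (4.6,), (6.6,)]

for method in ("single", "complete", "average"):
    print(method, agglomerate(values, 2, method))
```

    With these values single linkage chains the first five points together and isolates the last one, whereas complete linkage produces a four-point and a two-point cluster, which is the kind of divergence the comparison of linkage measures is meant to expose.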

  14. Profiles of exercise motivation, physical activity, exercise habit, and academic performance in Malaysian adolescents: A cluster analysis

    OpenAIRE

    Hairul Anuar Hashim; Freddy Golok; Rosmatunisah Ali

    2011-01-01

    Objectives: This study examined Malaysian adolescents’ profiles of exercise motivation, exercise habit strength, academic performance, and levels of physical activity (PA) using cluster analysis. Methods: The sample (n = 300) consisted of 65.6% males and 34.4% females with a mean age of 13.40 ± 0.49. Statistical analysis was performed using cluster analysis. Results: Cluster analysis revealed three distinct cluster groups. Cluster 1 is characterized by a moderate level of PA, relatively high in...

  15. Genome-scale analysis of positional clustering of mouse testis-specific genes

    Directory of Open Access Journals (Sweden)

    Lee Bernett TK

    2005-01-01

    Background: Genes are not randomly distributed on a chromosome, as they were once thought to be, even after removal of tandem repeats. The positional clustering of co-expressed genes is known in prokaryotes and has recently been reported in several eukaryotic organisms such as Caenorhabditis elegans, Drosophila melanogaster, and Homo sapiens. In order to further investigate the mode of tissue-specific gene clustering in higher eukaryotes, we have performed a genome-scale analysis of positional clustering of the mouse testis-specific genes. Results: Our computational analysis shows that a large proportion of testis-specific genes are clustered in groups of 2 to 5 genes in the mouse genome. The number of clusters is much higher than expected by chance even after removal of tandem repeats. Conclusion: Our result suggests that testis-specific genes tend to cluster on the mouse chromosomes. This provides another piece of evidence for the hypothesis that clusters of tissue-specific genes do exist.
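
    One way to ask whether positional clustering is "higher than expected by chance" is a label-permutation test along the gene order. The sketch below is an assumption about how such a comparison could be set up, not the authors' procedure: it counts adjacent pairs of tissue-specific genes and compares that count with counts obtained after shuffling the labels. The gene order and flags are hypothetical.

```python
import random

def adjacent_pairs(flags):
    """Number of neighbouring gene pairs in which both genes are flagged."""
    return sum(1 for a, b in zip(flags, flags[1:]) if a and b)

def permutation_p_value(flags, n_perm=2000, seed=0):
    """Share of label shufflings with at least as many adjacent pairs."""
    rng = random.Random(seed)      # fixed seed so the sketch is repeatable
    observed = adjacent_pairs(flags)
    shuffled = list(flags)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if adjacent_pairs(shuffled) >= observed:
            hits += 1
    # add-one correction keeps the estimate strictly above zero
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical chromosome of 40 genes with three blocks of flagged genes
flags = [False] * 40
for start, size in [(3, 3), (15, 4), (30, 3)]:
    for i in range(start, start + size):
        flags[i] = True

observed, p_value = permutation_p_value(flags)
print(observed, p_value)
```

    A small p-value on real data would support non-random positioning, although a genome-scale analysis would also need to handle chromosome boundaries and tandem duplicates, which this toy example ignores.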

  16. Types of Obesity and Its Association with the Clustering of Cardiovascular Disease Risk Factors in Jilin Province of China.

    Science.gov (United States)

    Zhang, Peng; Wang, Rui; Gao, Chunshi; Song, Yuanyuan; Lv, Xin; Jiang, Lingling; Yu, Yaqin; Wang, Yuhan; Li, Bo

    2016-07-07

    Cardiovascular disease (CVD) has become a serious public health problem in recent years in China. Aggregation of CVD risk factors in one individual increases the risk of CVD and the risk increases substantially with each additional risk factor. This study aims to explore the relationship between the number of clustered CVD risk factors and different types of obesity. A multistage stratified random cluster sampling design was used in this population-based cross-sectional study in 2012. Information was collected by face to face interviews. One-way analysis of variance (ANOVA), chi-square test, Kruskal-Wallis test and multiple logistic regression were used in this study. The prevalence of general obesity, central obesity and compound obesity were 0.3%, 36.1% and 14.7%, respectively. The prevalence of hypertension, hyperlipidemia and diabetes in the compound obesity group were higher than those in other groups (compound obesity > central obesity > general obesity > non-obesity), while smoking rate in the non-obesity group was higher than those in other groups (non-obesity > general obesity > central obesity > compound obesity). People with obesity were more likely to have one or more CVD risk factors compared with non-obesity subjects (general obesity (OR: 2.27, 95% CI: 1.13-4.56), central obesity (OR: 2.64, 95% CI: 2.41-2.89), compound obesity (OR: 5.09, 95% CI: 4.38-5.90)). The results were similar when the number of clustered CVD risk factors was ≥ 2 and ≥ 3. As a conclusion, more than half of the residents in Jilin Province have a problem of obesity, especially central obesity. Government and health department should take measures to improve people's awareness of central obesity in Jilin Province of China. The prevalence of hypertension, hyperlipidemia and diabetes are associated with obesity types. Compound obesity has a greater risk to cluster multiple CVD risk factors than central obesity and general obesity. Taking measures to control obesity will reduce the
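
    For readers unfamiliar with the "OR, 95% CI" notation used in this abstract, the sketch below computes an unadjusted odds ratio and its Woolf (log-based) confidence interval from a 2x2 table. This is a simpler technique than the multiple logistic regression the study reports, so it would not reproduce the adjusted estimates, and the counts are hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Woolf 95% CI for a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts (not taken from the study)
or_value, ci_low, ci_high = odds_ratio_ci(a=60, b=40, c=30, d=70)
print(round(or_value, 2), round(ci_low, 2), round(ci_high, 2))
```

    An interval that excludes 1, as all three intervals quoted in the abstract do, indicates an association that is statistically significant at the 5% level.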

  17. Types of Obesity and Its Association with the Clustering of Cardiovascular Disease Risk Factors in Jilin Province of China

    Directory of Open Access Journals (Sweden)

    Peng Zhang

    2016-07-01

    Cardiovascular disease (CVD) has become a serious public health problem in recent years in China. Aggregation of CVD risk factors in one individual increases the risk of CVD and the risk increases substantially with each additional risk factor. This study aims to explore the relationship between the number of clustered CVD risk factors and different types of obesity. A multistage stratified random cluster sampling design was used in this population-based cross-sectional study in 2012. Information was collected by face to face interviews. One-way analysis of variance (ANOVA), chi-square test, Kruskal-Wallis test and multiple logistic regression were used in this study. The prevalence of general obesity, central obesity and compound obesity were 0.3%, 36.1% and 14.7%, respectively. The prevalence of hypertension, hyperlipidemia and diabetes in the compound obesity group were higher than those in other groups (compound obesity > central obesity > general obesity > non-obesity), while smoking rate in the non-obesity group was higher than those in other groups (non-obesity > general obesity > central obesity > compound obesity). People with obesity were more likely to have one or more CVD risk factors compared with non-obesity subjects (general obesity (OR: 2.27, 95% CI: 1.13–4.56), central obesity (OR: 2.64, 95% CI: 2.41–2.89), compound obesity (OR: 5.09, 95% CI: 4.38–5.90)). The results were similar when the number of clustered CVD risk factors was ≥ 2 and ≥ 3. As a conclusion, more than half of the residents in Jilin Province have a problem of obesity, especially central obesity. Government and health department should take measures to improve people’s awareness of central obesity in Jilin Province of China. The prevalence of hypertension, hyperlipidemia and diabetes are associated with obesity types. Compound obesity has a greater risk to cluster multiple CVD risk factors than central obesity and general obesity. Taking measures to control

  18. Identification and validation of asthma phenotypes in Chinese population using cluster analysis.

    Science.gov (United States)

    Wang, Lei; Liang, Rui; Zhou, Ting; Zheng, Jing; Liang, Bing Miao; Zhang, Hong Ping; Luo, Feng Ming; Gibson, Peter G; Wang, Gang

    2017-08-30

    Asthma is a heterogeneous airway disease, so it is crucial to clearly identify clinical phenotypes to achieve better asthma management. To identify and prospectively validate asthma clusters in a Chinese population. Two hundred eighty-four patients were consecutively recruited and 18 sociodemographic and clinical variables were collected. Hierarchical cluster analysis was performed by the Ward method followed by k-means cluster analysis. Then, a prospective 12-month cohort study was used to validate the identified clusters. Five clusters were successfully identified. Clusters 1 (n = 71) and 3 (n = 81) were mild asthma phenotypes with slight airway obstruction and low exacerbation risk, but with a sex differential. Cluster 2 (n = 65) described an "allergic" phenotype, cluster 4 (n = 33) featured a "fixed airflow limitation" phenotype with smoking, and cluster 5 (n = 34) was a "low socioeconomic status" phenotype. Patients in clusters 2, 4, and 5 had distinctly lower socioeconomic status and more psychological symptoms. Cluster 2 had a significantly increased risk of exacerbations (risk ratio [RR] 1.13, 95% confidence interval [CI] 1.03-1.25), unplanned visits for asthma (RR 1.98, 95% CI 1.07-3.66), and emergency visits for asthma (RR 7.17, 95% CI 1.26-40.80). Cluster 4 had an increased risk of unplanned visits (RR 2.22, 95% CI 1.02-4.81), and cluster 5 had increased emergency visits (RR 12.72, 95% CI 1.95-69.78). Kaplan-Meier analysis confirmed that cluster grouping was predictive of time to the first asthma exacerbation, unplanned visit, emergency visit, and hospital admission (P clusters as "allergic asthma," "fixed airflow limitation," and "low socioeconomic status" phenotypes that are at high risk of severe asthma exacerbations and that have management implications for clinical practice in developing countries. Copyright © 2017 American College of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.

  19. A functional clustering algorithm for the analysis of neural relationships

    CERN Document Server

    Feldt, S; Hetrick, V L; Berke, J D; Zochowski, M

    2008-01-01

    We formulate a novel technique for the detection of functional clusters in neural data. In contrast to prior network clustering algorithms, our procedure progressively combines spike trains and derives the optimal clustering cutoff in a simple and intuitive manner. To demonstrate the power of this algorithm to detect changes in network dynamics and connectivity, we apply it to both simulated data and real neural data obtained from the mouse hippocampus during exploration and slow-wave sleep. We observe state-dependent clustering patterns consistent with known neurophysiological processes involved in memory consolidation.

  20. X-Ray Morphological Analysis of the Planck ESZ Clusters

    Science.gov (United States)

    Lovisari, Lorenzo; Forman, William R.; Jones, Christine; Ettori, Stefano; Andrade-Santos, Felipe; Arnaud, Monique; Démoclès, Jessica; Pratt, Gabriel W.; Randall, Scott; Kraft, Ralph

    2017-09-01

    X-ray observations show that galaxy clusters have a very large range of morphologies. The most disturbed systems, which are good to study how clusters form and grow and to test physical models, may potentially complicate cosmological studies because the cluster mass determination becomes more challenging. Thus, we need to understand the cluster properties of our samples to reduce possible biases. This is complicated by the fact that different experiments may detect different cluster populations. For example, Sunyaev–Zeldovich (SZ) selected cluster samples have been found to include a greater fraction of disturbed systems than X-ray selected samples. In this paper we determine eight morphological parameters for the Planck Early Sunyaev–Zeldovich (ESZ) objects observed with XMM-Newton. We found that two parameters, concentration and centroid shift, are the best to distinguish between relaxed and disturbed systems. For each parameter we provide the values that allow selecting the most relaxed or most disturbed objects from a sample. We found that there is no mass dependence on the cluster dynamical state. By comparing our results with what was obtained with REXCESS clusters, we also confirm that the ESZ clusters indeed tend to be more disturbed, as found by previous studies.

  1. The Galaxy Cluster RBS380: Xray and Optical Analysis

    OpenAIRE

    Gil-Merino, R.; Schindler, S.

    2002-01-01

    We present X-ray and optical observations of the z=0.52 galaxy cluster RBS380. This is the most distant cluster in the ROSAT Bright Source catalog. The cluster was observed with the CHANDRA satellite in September 2000. The optical observations were carried out with the NTT-SUSI2 camera in filters V and R in August and September 2001. The preliminary conclusions are that we see a very rich optical galaxy cluster but with a relatively low X-ray luminosity. We also compare our results to other clu...

  2. Significance analysis and statistical mechanics: an application to clustering.

    Science.gov (United States)

    Łuksza, Marta; Lässig, Michael; Berg, Johannes

    2010-11-26

    This Letter addresses the statistical significance of structures in random data: given a set of vectors and a measure of mutual similarity, how likely is it that a subset of these vectors forms a cluster with enhanced similarity among its elements? The computation of this cluster p value for randomly distributed vectors is mapped onto a well-defined problem of statistical mechanics. We solve this problem analytically, establishing a connection between the physics of quenched disorder and multiple-testing statistics in clustering and related problems. In an application to gene expression data, we find a remarkable link between the statistical significance of a cluster and the functional relationships between its genes.

  3. Study on Cluster Analysis Used with Laser-Induced Breakdown Spectroscopy

    Science.gov (United States)

    He, Li'ao; Wang, Qianqian; Zhao, Yu; Liu, Li; Peng, Zhong

    2016-06-01

    Supervised learning methods (e.g., PLS-DA and SVM) have been widely used with laser-induced breakdown spectroscopy (LIBS) to classify materials; however, they may yield a low correct classification rate if a test sample type is not included in the training dataset. In this paper, unsupervised cluster analysis methods (hierarchical clustering analysis, K-means clustering analysis, and the iterative self-organizing data analysis technique, ISODATA) are investigated for plastics classification based on the line intensities of LIBS emission. The results of hierarchical clustering analysis using four different similarity measuring methods (single linkage, complete linkage, unweighted pair-group average, and weighted pair-group average) are compared. In K-means clustering analysis, four methods of choosing initial centers are applied and their results are compared. The classification results of hierarchical clustering analysis, K-means clustering analysis, and ISODATA are analyzed. The experimental results demonstrate that cluster analysis methods can be applied to plastics discrimination with LIBS. Supported by Beijing Natural Science Foundation of China (No. 4132063).

  4. An easy guide to factor analysis

    CERN Document Server

    Kline, Paul

    2014-01-01

    Factor analysis is a statistical technique widely used in psychology and the social sciences. With the advent of powerful computers, factor analysis and other multivariate methods are now available to many more people. An Easy Guide to Factor Analysis presents and explains factor analysis as clearly and simply as possible. The author, Paul Kline, carefully defines all statistical terms and demonstrates step-by-step how to work out a simple example of principal components analysis and rotation. He further explains other methods of factor analysis, including confirmatory and path analysis, a

  5. Using cluster analysis to identify phenotypes and validation of mortality in men with COPD.

    Science.gov (United States)

    Chen, Chiung-Zuei; Wang, Liang-Yi; Ou, Chih-Ying; Lee, Cheng-Hung; Lin, Chien-Chung; Hsiue, Tzuen-Ren

    2014-12-01

    Cluster analysis has been proposed to examine phenotypic heterogeneity in chronic obstructive pulmonary disease (COPD). The aim of this study was to use cluster analysis to define COPD phenotypes and validate them by assessing their relationship with mortality. Male subjects with COPD were recruited to identify and validate COPD phenotypes. Seven variables were assessed for their relevance to COPD: age, FEV(1) % predicted, BMI, history of severe exacerbations, mMRC, SpO(2), and Charlson index. COPD groups were identified by cluster analysis and validated prospectively against mortality during a 4-year follow-up. Analysis of 332 COPD subjects identified five clusters, labelled cluster A to cluster E. Assessment of the predictive validity of these clusters of COPD showed that cluster E patients had higher all cause mortality (HR 18.3, p [...] Cluster E patients also had higher all cause mortality (HR 14.3, p = 0.0002) and respiratory cause mortality (HR 10.1, p = 0.0013) than patients in cluster D alone. COPD with severe airflow limitation, many symptoms, and a history of frequent severe exacerbations was a novel and distinct clinical phenotype predicting mortality in men with COPD.

  6. Clustered frequency analysis of shear Alfven modes in stellarators

    Energy Technology Data Exchange (ETDEWEB)

    Spong, Donald A [ORNL]; D'Azevedo, Ed F [ORNL]; Todo, Yasushi [National Institute for Fusion Science, Toki, Japan]

    2010-01-01

    The shear Alfven spectrum in three-dimensional configurations, such as stellarators and rippled tokamaks, is more densely populated due to the larger number of mode couplings caused by the variation in the magnetic field in the toroidal dimension. This implies more significant computational requirements that can rapidly become prohibitive as more resolution is requested. Alfven eigenfrequencies and mode structures are a primary point of contact between theory and experiment. A new algorithm based on the Jacobi-Davidson method is developed here and applied for a reduced magnetohydrodynamics model to several stellarator configurations. This technique focuses on finding a subset of eigenmodes clustered about a specified input frequency. This approach can be especially useful in modeling experimental observations, where the mode frequency can generally be measured with good accuracy and several different simultaneous frequency lines may be of interest. For cases considered in this paper, it can be a factor of 10^2 to 10^3 times faster than more conventional methods.

  7. The Norma Cluster (ACO 3627): I. A Dynamical Analysis of the Most Massive Cluster in the Great Attractor

    CERN Document Server

    Woudt, P A; Lucey, J; Fairall, A P; Moore, S A W

    2007-01-01

    A detailed dynamical analysis of the nearby rich Norma cluster (ACO 3627) is presented. From radial velocities of 296 cluster members, we find a mean velocity of 4871 +/- 54 km/s and a velocity dispersion of 925 km/s. The mean velocity of the E/S0 population (4979 +/- 85 km/s) is offset with respect to that of the S/Irr population (4812 +/- 70 km/s) by Δv = 164 km/s in the cluster rest frame. This offset increases towards the core of the cluster. The E/S0 population is free of any detectable substructure and appears relaxed. Its shape is clearly elongated with a position angle that is aligned along the dominant large-scale structures in this region, the so-called Norma wall. The central cD galaxy has a very large peculiar velocity of 561 km/s which is most probably related to an ongoing merger at the core of the cluster. The spiral/irregular galaxies reveal a large amount of substructure; two dynamically distinct subgroups within the overall spiral-population have been identified, located along the Nor...
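The quoted rest-frame offset of 164 km/s is slightly smaller than the raw difference of the two mean velocities (4979 - 4812 = 167 km/s). A short sketch of the arithmetic follows, assuming the usual correction of dividing an observed velocity difference by (1 + z) with z approximated as v/c for the cluster mean velocity; the abstract does not state its method, so the function name and that choice of correction are assumptions.

```python
C_KM_S = 299_792.458  # speed of light in km/s


def rest_frame_offset(v_a, v_b, v_systemic):
    """Velocity difference of two populations in the cluster rest frame.

    Assumes the standard (1 + z) correction with z ~ v_systemic / c.
    """
    return (v_a - v_b) / (1.0 + v_systemic / C_KM_S)


# Mean velocities quoted in the abstract, in km/s.
v_cluster = 4871.0   # all 296 members
v_e_s0 = 4979.0      # E/S0 population
v_s_irr = 4812.0     # S/Irr population

offset = rest_frame_offset(v_e_s0, v_s_irr, v_cluster)
print(f"rest-frame offset: {offset:.1f} km/s")
```

Under that assumption the result rounds to the 164 km/s given in the abstract.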

  8. Empirical Study on the Evaluation and Comparison of Government Affairs Micro-blog Influence: Based on Factor Analysis and Cluster Analysis

    Institute of Scientific and Technical Information of China (English)

    赵阿敏; 曹桂全

    2014-01-01

    Taking 16 provincial government affairs micro-blogs as the research sample, this paper selects 4 first-level indicators and 10 second-level indicators to build an evaluation index system for the influence of government affairs micro-blogs, and applies factor analysis and cluster analysis to evaluate and compare that influence empirically. Most of the information on influence is captured by two common factors, the "publicity-interaction factor" and the "acquisition-feedback factor". The influence levels of the 16 micro-blogs can be divided into five types: "strength lead type", "comprehensive medium type", "comprehensive behind type", "balanced development type" and "comprehensive lead type". The results show that the influence of government affairs micro-blogs is unevenly developed: low-influence micro-blogs are relatively numerous and high-influence ones relatively few, so the influence of the 16 micro-blogs follows a pyramid structure with a very large base.

  9. General and Specific Effects on Cattell-Horn-Carroll Broad Ability Composites: Analysis of the Woodcock-Johnson III Normative Update Cattell-Horn-Carroll Factor Clusters across Development

    Science.gov (United States)

    Floyd, Randy G.; McGrew, Kevin S.; Barry, Amberly; Rafael, Fawziya; Rogers, Joshua

    2009-01-01

    Many school psychologists focus their interpretation on composite scores from intelligence test batteries designed to measure the broad abilities from the Cattell-Horn-Carroll theory. The purpose of this study was to investigate the general factor loadings and specificity of the broad ability composite scores from one such intelligence test…

  10. Theoretical proposal of factors that drive inter-company collaboration in the cluster formation stage

    Directory of Open Access Journals (Sweden)

    Rolando Porchini

    2010-01-01

    This research aims to clarify the relationship between inter-company collaboration and the successful formation of clusters. It presents theoretical concepts about clusters, their origins, and how the evolution of clusters parallels the globalization process. It also clarifies the difference between the concepts of inter-company cooperation and collaboration, which have so far been used without distinction. Current scientific literature on the early phase of cluster formation is analyzed. Inter-company collaboration (C.I.) is considered key to successful cluster formation, and the research highlights which factors are most relevant to cluster formation, consolidation and competitiveness. The research is therefore important for the theoretical framework behind the 7 factors recognized as making up the inter-company collaboration (C.I.) construct. These factors are: i) interchange of strategic information (I.E.); ii) formalized and consensual rules (R.C.); iii) preexistence of particular strategies (P.E.); iv) process of firm selection (P.S.); v) government role as facilitator (R.G.); vi) expected leadership in the first cluster president (L.P.); and vii) expected leadership in the first cluster manager (L.G.). This theoretical framework is the first part of an investigation, presented here in qualitative terms; the quantitative results will be presented shortly. Finally, some recommendations are offered that may be useful to new clusters being founded in the state as well as in Mexico.

  11. Joint cluster and non-negative least squares analysis for aerosol mass spectrum data

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, T; Zhu, W [Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY 11794-3600 (United States); McGraw, R [Environmental Sciences Department, Brookhaven National Laboratory, Upton, NY 11973-5000 (United States)], E-mail: zhu@ams.sunysb.edu

    2008-07-15

    Aerosol mass spectrum (AMS) data contain hundreds of mass to charge ratios and their corresponding intensities from air collected through the mass spectrometer. The observations are usually taken sequentially in time to monitor the air composition, quality and temporal change in an area of interest. An important goal of AMS data analysis is to reduce the dimensionality of the original data yielding a small set of representing tracers for various atmospheric and climatic models. In this work, we present an approach to jointly apply the cluster analysis and the non-negative least squares method towards this goal. Application to a relevant study demonstrates the effectiveness of this new approach. Comparisons are made to other relevant multivariate statistical techniques including the principal component analysis and the positive matrix factorization method, and guidelines are provided.

  12. Multilevel Analysis Methods for Partially Nested Cluster Randomized Trials

    Science.gov (United States)

    Sanders, Elizabeth A.

    2011-01-01

    This paper explores multilevel modeling approaches for 2-group randomized experiments in which a treatment condition involving clusters of individuals is compared to a control condition involving only ungrouped individuals, otherwise known as partially nested cluster randomized designs (PNCRTs). Strategies for comparing groups from a PNCRT in the…

  13. Sequential Combination Methods for Data Clustering Analysis

    Institute of Scientific and Technical Information of China (English)

    钱 涛; Ching Y.Suen; 唐远炎

    2002-01-01

    This paper proposes the use of more than one clustering method to improve clustering performance. Clustering is an optimization procedure based on a specific clustering criterion. Clustering combination can be regarded as a technique that constructs and processes multiple clustering criteria. Since the global and local clustering criteria are complementary rather than competitive, combining these two types of clustering criteria may enhance the clustering performance. In our past work, a multi-objective programming based simultaneous clustering combination algorithm has been proposed, which incorporates multiple criteria into an objective function by a weighting method, and solves this problem with constrained nonlinear optimization programming. But this algorithm has high computational complexity. Here a sequential combination approach is investigated, which first uses the global criterion based clustering to produce an initial result, then uses the local criterion based information to improve the initial result with a probabilistic relaxation algorithm or linear additive model. Compared with the simultaneous combination method, sequential combination has low computational complexity. Results on some simulated data and standard test data are reported. It appears that clustering performance improvement can be achieved at low cost through sequential combination.

  14. Multidimensional cluster stability analysis from a Brazilian Bradyrhizobium sp. RFLP/PCR data set

    Science.gov (United States)

    Milagre, S. T.; Maciel, C. D.; Shinoda, A. A.; Hungria, M.; Almeida, J. R. B.

    2009-05-01

    The taxonomy of the N2-fixing bacteria belonging to the genus Bradyrhizobium is still poorly refined, mainly due to conflicting results obtained by the analysis of the phenotypic and genotypic properties. This paper presents an application of a method aiming at the identification of possible new clusters within a Brazilian collection of 119 Bradyrhizobium strains showing phenotypic characteristics of B. japonicum and B. elkanii. The stability was studied as a function of the number of restriction enzymes used in the RFLP-PCR analysis of three ribosomal regions with three restriction enzymes per region. The method proposed here uses clustering algorithms with distances calculated by average-linkage clustering. The stability analysis is carried out by introducing perturbations through sub-sampling techniques. The method proved effective in grouping the species B. japonicum and B. elkanii. Furthermore, two new clusters were clearly defined, indicating possible new species, along with sub-clusters within each detected cluster.

  15. Towards Effective Clustering Techniques for the Analysis of Electric Power Grids

    Energy Technology Data Exchange (ETDEWEB)

    Hogan, Emilie A.; Cotilla Sanchez, Jose E.; Halappanavar, Mahantesh; Wang, Shaobu; Mackey, Patrick S.; Hines, Paul; Huang, Zhenyu

    2013-11-30

    Clustering is an important data analysis technique with numerous applications in the analysis of electric power grids. Standard clustering techniques are oblivious to the rich structural and dynamic information available for power grids. Therefore, by exploiting the inherent topological and electrical structure in the power grid data, we propose new methods for clustering with applications to model reduction, locational marginal pricing, phasor measurement unit (PMU or synchrophasor) placement, and power system protection. We focus our attention on model reduction for analysis based on time-series information from synchrophasor measurement devices, and spectral techniques for clustering. By comparing different clustering techniques on two instances of realistic power grids we show that the solutions are related and therefore one could leverage that relationship for a computational advantage. Thus, by contrasting different clustering techniques we make a case for exploiting structure inherent in the data with implications for several domains including power systems.

  16. Mesoscopic analysis of networks: applications to exploratory analysis and data clustering

    CERN Document Server

    Granell, Clara; Arenas, Alex

    2011-01-01

    We investigate the adaptation and performance of modularity-based algorithms, designed in the scope of complex networks, to analyze the mesoscopic structure of correlation matrices. Using a multi-resolution analysis we are able to describe the structure of the data in terms of clusters at different topological levels. We demonstrate the applicability of our findings in two different scenarios: to analyze the neural connectivity of the nematode Caenorhabditis elegans, and to automatically classify a typical benchmark of unsupervised clustering, the Iris data set, with considerable success.

  17. Finding "Problem Types" in Judgments of Problem-Similarity: Comparison of Cluster Analysis with Subject Protocols.

    Science.gov (United States)

    Herring, Richard D.

    Literature in mathematic problem-solving suggests that learners store information in memory which helps them solve stereotyped algebra word problems. Cluster analysis has been used as an exploratory tool to infer the types of problems which have common representations in memory. This study compares the results of a hierarchical cluster analysis of…

  18. Tracking Undergraduate Student Achievement in a First-Year Physiology Course Using a Cluster Analysis Approach

    Science.gov (United States)

    Brown, S. J.; White, S.; Power, N.

    2015-01-01

    A cluster analysis data classification technique was used on assessment scores from 157 undergraduate nursing students who passed 2 successive compulsory courses in human anatomy and physiology. Student scores in five summative assessment tasks, taken in each of the courses, were used as inputs for a cluster analysis procedure. We aimed to group…

  19. Schedulability Analysis and Optimization for the Synthesis of Multi-Cluster Distributed Embedded Systems

    DEFF Research Database (Denmark)

    Pop, Paul; Eles, Petru; Peng, Zebo

    2003-01-01

    An approach to schedulability analysis for the synthesis of multi-cluster distributed embedded systems consisting of time-triggered and event-triggered clusters, interconnected via gateways, is presented. A buffer size and worst case queuing delay analysis for the gateways, responsible for routing...

  20. Cluster Analysis of the Luria-Nebraska Neuropsychological Battery with Learning Disabled Adults.

    Science.gov (United States)

    McCue, Michael; And Others

    The study reports a cluster analysis of Luria-Nebraska Neuropsychological Battery scores of 25 learning disabled adults. The cluster analysis suggested the presence of three subgroups within this sample, one having high elevations on the Rhythm, Writing, Reading, and Arithmetic Rhythm scales, the second having an extremely high elevation on the…

  1. Cluster Analysis as a Method of Recovering Types of Intraindividual Growth Trajectories: A Monte Carlo Study.

    Science.gov (United States)

    Dumenci, Levent; Windle, Michael

    2001-01-01

    Used Monte Carlo methods to evaluate the adequacy of cluster analysis to recover group membership based on simulated latent growth curve (LGC) models. Cluster analysis failed to recover growth subtypes adequately when the difference between growth curves was shape only. Discusses circumstances under which it was more successful. (SLD)

  2. Segmenting Business Students Using Cluster Analysis Applied to Student Satisfaction Survey Results

    Science.gov (United States)

    Gibson, Allen

    2009-01-01

    This paper demonstrates a new application of cluster analysis to segment business school students according to their degree of satisfaction with various aspects of the academic program. The resulting clusters provide additional insight into drivers of student satisfaction that are not evident from analysis of the responses of the student body as a…

  3. Tracking Undergraduate Student Achievement in a First-Year Physiology Course Using a Cluster Analysis Approach

    Science.gov (United States)

    Brown, S. J.; White, S.; Power, N.

    2015-01-01

    A cluster analysis data classification technique was used on assessment scores from 157 undergraduate nursing students who passed 2 successive compulsory courses in human anatomy and physiology. Student scores in five summative assessment tasks, taken in each of the courses, were used as inputs for a cluster analysis procedure. We aimed to group…

  4. Detecting Hotspots from Taxi Trajectory Data Using Spatial Cluster Analysis

    Science.gov (United States)

    Zhao, P. X.; Qin, K.; Zhou, Q.; Liu, C. K.; Chen, Y. X.

    2015-07-01

    This paper proposes a trajectory clustering method based on a decision graph and a data field. The method uses the data field to describe the spatial distribution of trajectory points and the decision graph to discover cluster centres. It can determine cluster parameters automatically and is suitable for trajectory clustering. The method is applied to taxi trajectory data from Wuhan City, China, recorded on a holiday (May 1st, 2014), a weekday (Wednesday, May 7th, 2014) and a weekend day (Saturday, May 10th, 2014). The hotspots in four one-hour periods (8:00-9:00, 12:00-13:00, 18:00-19:00 and 23:00-24:00) on each of the three days are discovered and visualized in heat maps. In future work, we will study the spatiotemporal distribution and regularities of these hotspots further, and use more data to carry out the experiments.
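One way to read the combination described in this abstract is sketched below. It is an illustration under explicit assumptions, not the authors' algorithm: the data field is taken to be a Gaussian potential summed over all points, the decision graph is taken to pair each point's potential with its distance to the nearest point of higher potential, and centres are picked by ranking the product of the two. The function names, the parameter `sigma` and the toy coordinates are all hypothetical, and the automatic parameter selection mentioned in the abstract is not reproduced.

```python
from math import dist, exp


def decision_graph(points, sigma=1.0):
    """Return (potential, delta) for every point.

    potential: Gaussian data-field potential (assumed form).
    delta: distance to the nearest point with a strictly higher potential;
           points with no higher neighbour get their largest distance instead.
    """
    n = len(points)
    potential = [sum(exp(-(dist(p, q) / sigma) ** 2) for q in points)
                 for p in points]
    delta = []
    for i, p in enumerate(points):
        higher = [dist(p, points[j]) for j in range(n)
                  if potential[j] > potential[i]]
        delta.append(min(higher) if higher else max(dist(p, q) for q in points))
    return potential, delta


def pick_centres(points, k, sigma=1.0):
    """Indices of the k points with the largest potential * delta score."""
    potential, delta = decision_graph(points, sigma)
    score = [r * d for r, d in zip(potential, delta)]
    ranked = sorted(range(len(points)), key=score.__getitem__, reverse=True)
    return sorted(ranked[:k])


# Toy pick-up locations: two compact groups, each with a central point.
toy_points = [(0, 0), (0.1, 0), (0, 0.1), (-0.1, 0), (0, -0.1),
              (5, 5), (5.1, 5), (5, 5.1), (4.9, 5), (5, 4.9)]
centres = pick_centres(toy_points, k=2)
print(centres)
```

Points that are both dense and far from any denser point stand out in the upper-right corner of such a graph, which is what makes them candidate hotspot centres.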

  5. MASSCLEAN - MASSive CLuster Evolution and ANalysis Package - Description and Tests

    CERN Document Server

    Popescu, Bogdan

    2008-01-01

    We present MASSCLEAN, a new, sophisticated and robust stellar cluster image and photometry simulation package. This package is able to create color-magnitude diagrams and standard FITS images in any of the traditional optical and near-infrared bands based on cluster characteristics input by the user, including but not limited to distance, age, mass, radius and extinction. At the limit of very distant, unresolved clusters, we have checked the integrated colors created in MASSCLEAN against those from other single stellar population models with consistent results. We have also tested models which provide a reasonable estimate of the field star contamination in images and color-magnitude diagrams. We demonstrate the package by simulating images and color-magnitude diagrams of well known massive Milky Way clusters and compare their appearance to real data. Because the algorithm populates the cluster with a discrete number of tenable stars, it can be used as part of a Monte Carlo Method to derive the probabilistic ...

  6. Boundaries, links and clusters: a new paradigm in spatial analysis?

    Science.gov (United States)

    Jacquez, Geoff M; Kaufmann, Andy; Goovaerts, Pierre

    2008-12-01

    This paper develops and applies new techniques for the simultaneous detection of boundaries and clusters within a probabilistic framework. The new statistic "little b" (written b(ij)) evaluates boundaries between adjacent areas with different values, as well as links between adjacent areas with similar values. Clusters of high values (hotspots) and low values (coldspots) are then constructed by joining areas abutting locations that are significantly high (e.g., an unusually high disease rate) and that are connected through a "link" such that the values in the adjoining areas are not significantly different. Two techniques are proposed and evaluated for accomplishing cluster construction: "big B" and the "ladder" approach. We compare the statistical power and empirical Type I and Type II error of these approaches to those of wombling and the local Moran test. Significance may be evaluated using distribution theory based on the product of two continuous (i.e., non-discrete) variables. We also provide a "distribution free" algorithm based on resampling of the observed values. The methods are applied to simulated data for which the locations of boundaries and clusters are known, and compared and contrasted with clusters found using the local Moran statistic and with polygon Womble boundaries. The little b approach to boundary detection is comparable to polygon wombling in terms of Type I error, Type II error and empirical statistical power. For cluster detection, both the big B and ladder approaches have lower Type I and Type II error and are more powerful than the local Moran statistic. The new methods are not constrained to find clusters of a pre-specified shape, such as circles, ellipses and donuts, and yield a more accurate description of geographic variation than alternative cluster tests that presuppose a specific cluster shape. We recommend these techniques over existing cluster and boundary detection methods that do not provide such a comprehensive description

  7. Two worlds collide: image analysis methods for quantifying structural variation in cluster molecular dynamics.

    Science.gov (United States)

    Steenbergen, K G; Gaston, N

    2014-02-14

    Inspired by methods of remote sensing image analysis, we analyze structural variation in cluster molecular dynamics (MD) simulations through a unique application of the principal component analysis (PCA) and Pearson Correlation Coefficient (PCC). The PCA analysis characterizes the geometric shape of the cluster structure at each time step, yielding a detailed and quantitative measure of structural stability and variation at finite temperature. Our PCC analysis captures bond structure variation in MD, which can be used to both supplement the PCA analysis as well as compare bond patterns between different cluster sizes. Relying only on atomic position data, without requirement for a priori structural input, PCA and PCC can be used to analyze both classical and ab initio MD simulations for any cluster composition or electronic configuration. Taken together, these statistical tools represent powerful new techniques for quantitative structural characterization and isomer identification in cluster MD.

  8. Fully Automated Operational Modal Analysis using multi-stage clustering

    Science.gov (United States)

    Neu, Eugen; Janser, Frank; Khatibi, Akbar A.; Orifici, Adrian C.

    2017-02-01

    Interest in robust automatic modal parameter extraction techniques has increased significantly in recent years, together with the rising demand for continuous health monitoring of critical infrastructure like bridges, buildings and wind turbine blades. In this study a novel, multi-stage clustering approach for Automated Operational Modal Analysis (AOMA) is introduced. In contrast to existing approaches, the procedure works without any user-provided thresholds, is applicable within large system order ranges, can be used with very small sensor numbers and does not place any limitations on the damping ratio or the complexity of the system under investigation. The approach works with any parametric system identification algorithm that uses the system order n as sole parameter. Here a data-driven Stochastic Subspace Identification (SSI) method is used. Measurements from a wind tunnel investigation with a composite cantilever equipped with Fiber Bragg Grating Sensors (FBGSs) and piezoelectric sensors are used to assess the performance of the algorithm with a highly damped structure and low signal to noise ratio conditions. The proposed method was able to identify all physical system modes in the investigated frequency range from over 1000 individual datasets using FBGSs under challenging signal to noise ratio conditions and under better signal conditions but from only two sensors.

  9. A Grouping Method of Distribution Substations Using Cluster Analysis

    Science.gov (United States)

    Ohtaka, Toshiya; Iwamoto, Shinichi

    Grouping distribution substations together has recently been considered as a way of evaluating the reinforcement planning of distribution systems. At present, however, the grouping relies on the knowledge and experience of an expert in charge of distribution systems, and this subjective judgment makes the grouping ambiguous. A method that imitates the expert's grouping is therefore desired, so that the grouping can be carried out systematically and with numerical corroboration. In this paper, we propose a grouping method for distribution substations using cluster analysis based on the interconnected power between the distribution substations. We also take into account geographical constraints such as rivers, roads, business office boundaries and branch boundaries, and examine a method for adjusting the interconnected power. Simulations on an example system are carried out to verify the validity of the proposed method. The simulation results show that the expert's grouping can be imitated by considering the geographical constraints and adjusting the interconnected power, and that the calculation time and number of iterations can be greatly reduced by introducing local and tabu search methods.

  10. An Analysis of Particle Swarm Optimization with Data Clustering-Technique for Optimization in Data Mining

    Directory of Open Access Journals (Sweden)

    Amreen Khan

    2010-07-01

    Data clustering is a popular approach for automatically finding classes, concepts, or groups of patterns. Clustering aims at representing large datasets by a smaller number of prototypes or clusters. It brings simplicity in modeling data and thus plays a central role in the process of knowledge discovery and data mining. Data mining tasks require fast and accurate partitioning of huge datasets, which may come with a variety of attributes or features. This imposes severe computational requirements on the relevant clustering techniques. A family of bio-inspired algorithms known as Swarm Intelligence (SI) has recently emerged that meets these requirements and has successfully been applied to a number of real world clustering problems. This paper looks into the use of Particle Swarm Optimization for cluster analysis. The effectiveness of Fuzzy C-means clustering provides enhanced performance and maintains more diversity in the swarm and also allows the particles to be robust to trace the changing environment.

  11. RNA-seq analysis identifies an intricate regulatory network controlling cluster root development in white lupin

    Science.gov (United States)

    2014-01-01

    Background Highly adapted plant species are able to alter their root architecture to improve nutrient uptake and thrive in environments with limited nutrient supply. Cluster roots (CRs) are specialised structures of dense lateral roots formed by several plant species for the effective mining of nutrient rich soil patches through a combination of increased surface area and exudation of carboxylates. White lupin is becoming a model-species allowing for the discovery of gene networks involved in CR development. A greater understanding of the underlying molecular mechanisms driving these developmental processes is important for the generation of smarter plants for a world with diminishing resources to improve food security. Results RNA-seq analyses for three developmental stages of the CR formed under phosphorus-limited conditions and two of non-cluster roots have been performed for white lupin. In total 133,045,174 high-quality paired-end reads were used for a de novo assembly of the root transcriptome and merged with LAGI01 (Lupinus albus gene index) to generate an improved LAGI02 with 65,097 functionally annotated contigs. This was followed by comparative gene expression analysis. We show marked differences in the transcriptional response across the various cluster root stages to adjust to phosphate limitation by increasing uptake capacity and adjusting metabolic pathways. Several transcription factors such as PLT, SCR, PHB, PHV or AUX/IAA with a known role in the control of meristem activity and developmental processes show an increased expression in the tip of the CR. Genes involved in hormonal responses (PIN, LAX, YUC) and cell cycle control (CYCA/B, CDK) are also differentially expressed. In addition, we identify primary transcripts of miRNAs with established function in the root meristem. Conclusions Our gene expression analysis shows an intricate network of transcription factors and plant hormones controlling CR initiation and formation. In addition

  12. Clustering of Risk Factors for Non-Communicable Diseases among Adolescents from Southern Brazil.

    Directory of Open Access Journals (Sweden)

    Heloyse Elaine Gimenes Nunes

    The aim of this study was to investigate the simultaneous presence of risk factors for non-communicable diseases and the association of these risk factors with demographic and economic factors among adolescents from southern Brazil. The study included 916 students (14-19 years old) enrolled in the 2014 school year at state schools in São José, Santa Catarina, Brazil. Risk factors related to lifestyle (i.e., physical inactivity, excessive alcohol consumption, smoking, sedentary behaviour and unhealthy diet), demographic variables (sex, age and skin colour) and economic variables (school shift and economic level) were assessed through a questionnaire. Simultaneous behaviours were assessed by the ratio between observed and expected prevalences of risk factors for non-communicable diseases. The clustering of risk factors was analysed by multinomial logistic regression. The clusters of risk factors that showed a higher prevalence were analysed by binary logistic regression. The clustering of two, three, four, and five risk factors was found in 22.2%, 49.3%, 21.7% and 3.1% of adolescents, respectively. Subgroups that were more likely to have both behaviours of physical inactivity and unhealthy diet simultaneously were mostly composed of girls (OR = 3.03, 95% CI = 1.57-5.85) and those with lower socioeconomic status (OR = 1.83, 95% CI = 1.05-3.21); simultaneous physical inactivity, excessive alcohol consumption, sedentary behaviour and unhealthy diet were mainly observed among older adolescents (OR = 1.49, 95% CI = 1.05-2.12). Subgroups less likely to have both behaviours of sedentary behaviour and unhealthy diet were mostly composed of girls (OR = 0.58, 95% CI = 0.38-0.89); simultaneous physical inactivity, sedentary behaviour and unhealthy diet were mainly observed among older individuals (OR = 0.66, 95% CI = 0.49-0.87) and those of the night shift (OR = 0.59, 95% CI = 0.43-0.82). Adolescents had a high prevalence of simultaneous risk factors for NCDs
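The observed-to-expected prevalence ratio mentioned in this abstract can be sketched as follows. The sketch assumes the common convention that the expected prevalence of a combination is the product of the individual factor prevalences (or their complements) under independence; the abstract does not spell out its formula, and the function name and the toy records are hypothetical, not study data.

```python
from itertools import product
from math import prod


def observed_expected_ratios(records):
    """Map each combination of 0/1 risk-factor flags to its O/E ratio.

    records: list of equal-length tuples, one 0/1 flag per risk factor.
    Expected prevalence assumes the factors occur independently.
    """
    n = len(records)
    k = len(records[0])
    # Marginal prevalence of each risk factor.
    p = [sum(r[f] for r in records) / n for f in range(k)]
    ratios = {}
    for combo in product((0, 1), repeat=k):
        observed = sum(1 for r in records if tuple(r) == combo) / n
        expected = prod(p[f] if combo[f] else 1 - p[f] for f in range(k))
        ratios[combo] = observed / expected if expected > 0 else float("nan")
    return ratios


# Toy sample with two factors, e.g. (physical inactivity, unhealthy diet).
toy_records = [(1, 1)] * 4 + [(0, 0)] * 4 + [(1, 0)] + [(0, 1)]
ratios = observed_expected_ratios(toy_records)
for combo, ratio in ratios.items():
    print(combo, round(ratio, 2))
```

A ratio above 1 means the combination occurs more often than independence would predict, which is the sense in which risk factors are said to cluster.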

  13. DISCLOSE : DISsection of CLusters Obtained by SEries of transcriptome data using functional annotations and putative transcription factor binding sites

    Directory of Open Access Journals (Sweden)

    Silvis Remko

    2008-12-01

    Background: A typical step in the analysis of gene expression data is the determination of clusters of genes that exhibit similar expression patterns. Researchers are confronted with a seemingly arbitrary choice between numerous algorithms for performing cluster analysis. Results: We developed an exploratory application that benchmarks the results of clustering methods using functional annotations. In addition, our program integrates a de novo DNA motif discovery algorithm that identifies overrepresented DNA binding sites in the upstream DNA sequences of genes from the clusters; these are indicative of sites of transcriptional control. The performance of our program was evaluated by comparing the original results of a time course experiment with the findings of our application. Conclusion: DISCLOSE assists researchers in the prokaryotic research community in systematically evaluating the results of applying a range of clustering algorithms to transcriptome data. Different performance measures allow the best-suited clustering approach for a given dataset to be determined quickly and comprehensively.

  14. Cluster analysis in retail segmentation for credit scoring

    Directory of Open Access Journals (Sweden)

    Sanja Scitovski

    2014-12-01

    The aim of this paper is to segment retail clients by using adaptive Mahalanobis clustering so that each segment is suitable for separate credit scoring development and a better risk assessment of retail clients can be achieved. A real data set on retail clients from a Croatian bank was used in the paper. Grouping of the data point set is carried out by using the adaptive Mahalanobis partitioning algorithm (see, e.g., [20]). It is an incremental algorithm, which recognizes ellipsoidal clusters with the main axes in the directions of eigenvectors of the corresponding covariance matrix of the data set. On the basis of the given data set, by using the well-known DIRECT algorithm for global optimization it is possible to search successively for an optimal partition with k = 2, 3, ... clusters. After that, a partition with the most appropriate number of clusters is determined by using various validity indexes. Based on the description of each cluster, banks could decide to develop a separate credit scoring model for each cluster as well as to create a business strategy customized to each cluster.
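
    The role of the covariance matrix in recognizing ellipsoidal clusters can be illustrated with a minimal sketch. It is not the adaptive Mahalanobis partitioning algorithm from the paper: it only computes a Mahalanobis distance in two dimensions and assigns a point to the nearest cluster under that distance. The function names and the sample points are hypothetical.

```python
import math

def mean_and_cov(points):
    """Return the mean and the 2x2 sample covariance of a list of (x, y) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))

def mahalanobis(point, mean, cov):
    """Distance of `point` from `mean`, scaled by the cluster covariance."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    (a, b), (_, d) = cov
    det = a * d - b * b  # must be non-zero (non-degenerate cluster)
    # Quadratic form with the inverse of [[a, b], [b, d]].
    return math.sqrt((d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det)

def assign(point, clusters):
    """Index of the cluster with the smallest Mahalanobis distance."""
    return min(range(len(clusters)),
               key=lambda i: mahalanobis(point, *clusters[i]))

# Hypothetical data: one elongated cluster and one compact cluster.
elongated = [(-8, 0.5), (-4, -0.5), (0, 0.5), (4, -0.5), (8, 0.0)]
compact = [(5.5, 0.5), (6.5, -0.5), (6, 0.5), (6, -0.5), (5.5, -0.5), (6.5, 0.5)]
clusters = [mean_and_cov(elongated), mean_and_cov(compact)]

query = (4, 0)
nearest = assign(query, clusters)
```

    In this made-up example the query point is closer to the compact cluster's centre in Euclidean terms, yet it is assigned to the elongated cluster, because the distance is measured relative to each cluster's spread along its main axis.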

  15. Structural parameters of young star clusters: fractal analysis

    Science.gov (United States)

    Hetem, A.

    2017-07-01

    A unified view of star formation in the Universe demands detailed and in-depth studies of young star clusters. This work is related to our previous study of fractal statistics estimated for a sample of young stellar clusters (Gregorio-Hetem et al. 2015, MNRAS 448, 2504). The structural properties can lead to significant conclusions about the early stages of cluster formation: 1) virial conditions can be used to distinguish warm collapsed; 2) bound or unbound behaviour can lead to conclusions about expansion; and 3) fractal statistics are correlated to the dynamical evolution and age. The technique most used in the literature to estimate error bars is to adopt inferential methods (like bootstrap) to estimate deviation and variance, which are valid only for an artificially generated cluster. In this paper, we expanded the number of studied clusters in order to enhance the investigation of the cluster properties and dynamic evolution. The structural parameters were compared with fractal statistics and reveal that the clusters' radial density profiles show a tendency for the mean separation of the stars to increase with the average surface density. The sample can be divided into two groups showing different dynamic behaviour, but they have the same dynamic evolution, since the entire sample was revealed to consist of expanding objects, for which the substructures do not seem to have been completely erased. These results are in agreement with simulations adopting low surface densities and supervirial conditions.

  16. Cluster Computing For Real Time Seismic Array Analysis.

    Science.gov (United States)

    Martini, M.; Giudicepietro, F.

    A seismic array is an instrument composed of a dense distribution of seismic sensors that makes it possible to measure the directional properties of the wavefield (slowness or wavenumber vector) radiated by a seismic source. In recent years arrays have been widely used in different fields of seismological research. In particular, they are applied in the investigation of seismic sources on volcanoes, where they can be successfully used for studying volcanic microtremor and long-period events, which are critical for obtaining information on the evolution of volcanic systems. For this reason arrays could be usefully employed for volcano monitoring; however, the huge amount of data produced by this type of instrument and the quite time-consuming processing techniques have limited their potential for this application. In order to favor a direct application of array techniques to continuous volcano monitoring, we designed and built a small PC cluster able to compute in near real time the kinematic properties of the wavefield (slowness or wavenumber vector) produced by local seismic sources. The cluster is composed of 8 Intel Pentium-III dual-processor PCs working at 550 MHz, and has 4 Gigabytes of RAM. It runs under the Linux operating system. The developed analysis software package is based on the Multiple SIgnal Classification (MUSIC) algorithm and is written in Fortran. The message-passing part is based upon the LAM programming environment package, an open-source implementation of the Message Passing Interface (MPI). The developed software system includes modules devoted to receiving data via the Internet and graphical applications for the continuous display of the processing results. The system has been tested with a data set collected during a seismic experiment conducted on Etna in 1999, when two dense seismic arrays were deployed on the northeast and the southeast flanks of this volcano. A real time continuous acquisition system has been simulated by

  17. Hierarchical cluster analysis of progression patterns in open-angle glaucoma patients with medical treatment.

    Science.gov (United States)

    Bae, Hyoung Won; Rho, Seungsoo; Lee, Hye Sun; Lee, Naeun; Hong, Samin; Seong, Gong Je; Sung, Kyung Rim; Kim, Chan Yun

    2014-04-29

    To classify medically treated open-angle glaucoma (OAG) by the pattern of progression using hierarchical cluster analysis, and to determine OAG progression characteristics by comparing clusters. The study included ninety-five eyes of 95 OAG patients who received medical treatment and who had undergone visual field (VF) testing at least once per year for 5 or more years. OAG was classified into subgroups using hierarchical cluster analysis based on the following five variables: baseline mean deviation (MD), baseline visual field index (VFI), MD slope, VFI slope, and Glaucoma Progression Analysis (GPA) printout. After that, other parameters were compared between clusters. Two clusters were made after a hierarchical cluster analysis. Cluster 1 showed -4.06 ± 2.43 dB baseline MD, 92.58% ± 6.27% baseline VFI, -0.28 ± 0.38 dB per year MD slope, -0.52% ± 0.81% per year VFI slope, and all "no progression" cases in GPA printout, whereas cluster 2 showed -8.68 ± 3.81 baseline MD, 77.54 ± 12.98 baseline VFI, -0.72 ± 0.55 MD slope, -2.22 ± 1.89 VFI slope, and seven "possible" and four "likely" progression cases in GPA printout. There were no significant differences in age, sex, mean IOP, central corneal thickness, and axial length between clusters. However, cluster 2 included more high-tension glaucoma patients and used a significantly greater number of antiglaucoma eye drops than cluster 1. Hierarchical cluster analysis of progression patterns divided OAG into slow and fast progression groups, evidenced by assessing the parameters of glaucomatous progression in VF testing. In the fast progression group, the prevalence of high-tension glaucoma was greater and the number of antiglaucoma medications administered was increased versus the slow progression group. Copyright 2014 The Association for Research in Vision and Ophthalmology, Inc.

  18. Nonlinear analysis of nano-cluster doped fiber

    Institute of Scientific and Technical Information of China (English)

    LIU Gang; ZHANG Ru

    2007-01-01

    Semiconductor nano-cluster doped fiber is expected to exhibit prominent nonlinear characteristics. The refractive index of the fiber core can be effectively changed by doping. This technology can provide a new method for developing photonic components. Because the semiconductor nano-cluster has quantum characteristics, we used first-order perturbation theory and the classical theory of fiber to deduce refractive index expressions for the core of a semiconductor nano-cluster doped fiber. Finally, an equation for the third-order nonlinear coefficient was obtained. Using this equation, we calculated the nonlinear coefficient of SMF-28 fiber. The equation shows that the new third-order coefficient was greater.

  19. DNA splice site sequences clustering method for conservativeness analysis

    Institute of Scientific and Technical Information of China (English)

    Quanwei Zhang; Qinke Peng; Tao Xu

    2009-01-01

    DNA sequences that are near to splice sites have remarkable conservativeness, and many researchers have contributed to the prediction of splice sites. In order to mine the underlying biological knowledge, we analyze the conservativeness of DNA splice site adjacent sequences by clustering. Firstly, we propose a DNA splice site sequence clustering method which is based on DBSCAN, and use four kinds of dissimilarity calculating methods. Then, we analyze the conservative features of the clustering results and the experimental data set.
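
    A density-based clustering of short sequences can be sketched as follows. This is a minimal illustration of DBSCAN, not the method of the paper: the abstract does not name its four dissimilarity measures, so Hamming distance is used here as an assumption, and the sequences, `eps` and `min_pts` values are made up.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def dbscan(items, eps, min_pts, dist):
    """Return one cluster label per item; -1 marks noise."""
    labels = [None] * len(items)

    def neighbours(i):
        # Every item within eps of item i (item i itself included).
        return [j for j in range(len(items)) if dist(items[i], items[j]) <= eps]

    cluster = -1
    for i in range(len(items)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # core point: keep expanding
                queue.extend(reachable)
    return labels

# Hypothetical 9-base windows around splice sites.
sequences = [
    "CAGGTAAGT", "CAGGTAAGA", "AAGGTAAGT", "CAGGTGAGT",
    "TTTCAGGTC", "TTTCAGGCC", "CTTCAGGTC",
    "GACATGCAA",
]
labels = dbscan(sequences, eps=2, min_pts=3, dist=hamming)
```

    Passing the dissimilarity as the `dist` argument keeps the clustering step independent of the measure, so other dissimilarities can be swapped in without touching `dbscan`.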

  20. Spatial-temporal clusters and risk factors of hand, foot, and mouth disease at the district level in Guangdong Province, China.

    Directory of Open Access Journals (Sweden)

    Te Deng

    OBJECTIVE: Hand, foot, and mouth disease (HFMD) has posed a great threat to the health of children and become a public health priority in China. This study aims to investigate the epidemiological characteristics, spatial-temporal patterns, and risk factors of HFMD in Guangdong Province, China, and to provide scientific information for public health responses and interventions. METHODS: HFMD surveillance data from May 2008 to December 2011 were provided by the Chinese Center for Disease Control and Prevention. We first conducted a descriptive analysis to evaluate the epidemic characteristics of HFMD. Then, the Kulldorff scan statistic based on a discrete Poisson model was used to detect spatial-temporal clusters. Finally, a spatial panel model was applied to identify the risk factors. RESULTS: A total of 641,318 HFMD cases were reported in Guangdong Province during the study period (total population incidence: 17.51 per 10,000). Male incidence was higher than female incidence for all age groups, and approximately 90% of the cases were children [Formula: see text] years old. Spatial-temporal cluster analysis detected four most likely clusters and several secondary clusters (P<0.001) with the maximum cluster size 50% and 20% respectively during 2008-2011. Monthly average temperature, relative humidity, the proportion of population [Formula: see text] years, male-to-female ratio, and total sunshine were demonstrated to be risk factors for HFMD. CONCLUSION: Children [Formula: see text] years old, especially boys, were more susceptible to HFMD, and their vulnerability deserves particular attention. The provincial capital, Guangzhou, and the Pearl River Delta regions were consistently identified as spatial-temporal clusters, and future public health planning and resource allocation should focus on these areas. Furthermore, our findings showed a strong association between HFMD and meteorological factors, which may assist in predicting HFMD incidence.

  1. [Relationship between central obesity and clustering of cardiovascular risk factors in adults of Jiangsu province].

    Science.gov (United States)

    Su, Jian; Xiang, Quanyong; Lyu, Shurong; Pan, Xiaoqun; Qin, Yu; Yang, Jie; Zhou, Jinyi; Zhang, Yongqing; Wu, Ming; Tao, Ran

    2015-06-01

    To explore the relationship between central obesity and cardiovascular risk factors and their clustering in adults of Jiangsu province. Multi-stratified clustering sampling method was used to sample 8 400 residents aged 18 years and over from 14 disease surveillance units in Jiangsu province from October to December 2010. Information was obtained with face-to-face interview, physical examination and laboratory testing. A total of 8 380 residents finished the study protocol and their data were analyzed. Central obesity was defined as waist circumference ≥ 85 cm in males or ≥ 80 cm in females. Following complex weighting of the samples, level and proportion of cardiovascular risk factors in groups with different waist circumference were analyzed. The prevalence of central obesity among adults in Jiangsu province was 46.2%; the prevalence in males and females was 46.4% and 46.1%, respectively (P > 0.05). The prevalence of central obesity varied significantly in residents with different age, area, education and occupation (all P risk factors increased in proportion to increasing waist circumference (all P risk of hypertension, diabetes, dyslipidemia and clustering of cardiovascular risk factors was 2.2 (OR = 2.2, 95% CI: 2.0-2.4) and 4.7 (OR = 4.7, 95% CI: 3.9-5.7); 2.1 (OR = 2.1, 95% CI: 1.7-2.5) and 3.8 (OR = 3.8, 95% CI: 3.2-4.5); 2.3 (OR = 2.3, 95% CI: 1.8-2.9) and 4.1 (OR = 4.1, 95% CI: 3.2-5.3); 3.4 (OR = 3.4, 95% CI: 2.9-3.9) and 8.0 (OR = 8.0, 95% CI: 6.2-10.2) fold higher in residents with mild and severe central obesity than residents without central obesity. The extent of central obesity positively correlates with the prevalence of cardiovascular risk factors and their clustering in adults of Jiangsu province. Comprehensive interventions on obesity serve as an important tool to reduce the cardiovascular risk in adult Jiangsu residents.

  2. Social and Behavioral Risk Marker Clustering Associated with Biological Risk Factors for Coronary Heart Disease: NHANES 2001–2004

    Directory of Open Access Journals (Sweden)

    Nicholas J. Everage

    2014-01-01

    Background. Social and behavioral risk markers (e.g., physical activity, diet, smoking, and socioeconomic position) cluster; however, little is known about whether clustering is associated with coronary heart disease (CHD) risk. Objectives were to determine if sociobehavioral clustering is associated with biological CHD risk factors (total cholesterol, HDL cholesterol, systolic blood pressure, body mass index, waist circumference, and diabetes) and whether associations are independent of individual clustering components. Methods. Participants included 4,305 males and 4,673 females aged ≥20 years from NHANES 2001–2004. The Sociobehavioral Risk Marker Index (SRI) included a summary score of physical activity, fruit/vegetable consumption, smoking, and educational attainment. Regression analyses evaluated associations of SRI with the aforementioned biological CHD risk factors. Receiver operator curve analyses assessed the independent predictive ability of SRI. Results. Healthful clustering (SRI = 0) was associated with improved biological CHD risk factor levels in 5 of 6 risk factors in females and 2 of 6 risk factors in males. Adding SRI to models containing age, race, and individual SRI components did not improve C-statistics. Conclusions. Findings suggest that healthful sociobehavioral risk marker clustering is associated with favorable CHD risk factor levels, particularly in females. These findings should inform social ecological interventions that consider health impacts of addressing social and behavioral risk factors.

  3. Bivariate Mixed Effects Analysis of Clustered Data with Large Cluster Sizes.

    Science.gov (United States)

    Zhang, Daowen; Sun, Jie Lena; Pieper, Karen

    2016-10-01

    Linear mixed effects models are widely used to analyze a clustered response variable. Motivated by a recent study to examine and compare the hospital length of stay (LOS) between patients undergoing percutaneous coronary intervention (PCI) and coronary artery bypass graft (CABG) from several international clinical trials, we proposed a bivariate linear mixed effects model for the joint modeling of clustered PCI and CABG LOS, where each clinical trial is considered a cluster. Due to the large number of patients in some trials, commonly used commercial statistical software for fitting (bivariate) linear mixed models failed to run, since it could not allocate enough memory to invert large dimensional matrices during the optimization process. We consider ways to circumvent the computational problem in maximum likelihood (ML) inference and restricted maximum likelihood (REML) inference. In particular, we developed an expectation-maximization (EM) algorithm for REML inference and presented an ML implementation using existing software. The new REML EM algorithm is easy to implement and computationally stable and efficient. With this REML EM algorithm, we could analyze the LOS data and obtain meaningful results.

  4. Critical clusters in interdependent economic sectors. A data-driven spectral clustering analysis

    Science.gov (United States)

    Oliva, Gabriele; Setola, Roberto; Panzieri, Stefano

    2016-10-01

    In this paper we develop a data-driven hierarchical clustering methodology to group the economic sectors of a country in order to highlight strongly coupled groups that are weakly coupled with other groups. Specifically, we consider an input-output representation of the coupling among the sectors and we interpret the relation among sectors as a directed graph; then we recursively apply the spectral clustering methodology over the graph, without a priori information on the number of groups that have to be obtained. In order to do this, we resort to the eigengap criterion, where a suitable number of groups is selected automatically based on the intensity and structure of the coupling among the sectors. We validate the proposed methodology on a case study for Italy, inspecting how the coupling among clusters and sectors changed from 1995 to 2011, and showing that over these years the Italian structure underwent deep changes, becoming more and more interdependent, i.e., a large part of the economy has become tightly coupled.
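
    The eigengap criterion can be illustrated with a small sketch. It assumes the graph Laplacian eigenvalues have already been computed and shows only the selection step, in the common form that picks the number of groups k at the largest gap between consecutive sorted eigenvalues. The recursive application to a directed graph described in the abstract is not reproduced; the function name and the eigenvalues are hypothetical.

```python
def eigengap_k(eigenvalues, max_k=None):
    """Number of groups k at the largest gap between the k-th and (k+1)-th
    smallest eigenvalues of a graph Laplacian."""
    vals = sorted(eigenvalues)
    if len(vals) < 2:
        raise ValueError("need at least two eigenvalues")
    limit = len(vals) - 1 if max_k is None else min(max_k, len(vals) - 1)
    # gaps[k - 1] is the gap that follows the k-th smallest eigenvalue.
    gaps = [vals[k] - vals[k - 1] for k in range(1, limit + 1)]
    best = max(range(len(gaps)), key=gaps.__getitem__)
    return best + 1

# Hypothetical spectrum: three near-zero eigenvalues, then a clear jump.
spectrum = [0.0, 0.02, 0.05, 0.71, 0.83, 0.95]
k = eigengap_k(spectrum)
```

    A large gap after the k-th eigenvalue suggests that the graph splits into k groups that are strongly coupled internally and weakly coupled with each other, which is why no a priori number of groups is needed.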

  5. Modest validity and fair reproducibility of dietary patterns derived by cluster analysis.

    Science.gov (United States)

    Funtikova, Anna N; Benítez-Arciniega, Alejandra A; Fitó, Montserrat; Schröder, Helmut

    2015-03-01

    Cluster analysis is widely used to analyze dietary patterns. We aimed to analyze the validity and reproducibility of the dietary patterns defined by cluster analysis derived from a food frequency questionnaire (FFQ). We hypothesized that the dietary patterns derived by cluster analysis have fair to modest reproducibility and validity. Dietary data were collected from 107 individuals from a population-based survey, by an FFQ at baseline (FFQ1) and after 1 year (FFQ2), and by twelve 24-hour dietary recalls (24-HDR). Repeatability and validity were measured by comparing clusters obtained by the FFQ1 and FFQ2 and by the FFQ2 and 24-HDR (reference method), respectively. Cluster analysis identified a "fruits & vegetables" and a "meat" pattern in each dietary data source. Cluster membership was concordant for 66.7% of participants in FFQ1 and FFQ2 (reproducibility), and for 67.0% in FFQ2 and 24-HDR (validity). Spearman correlation analysis showed reasonable reproducibility, especially in the "fruits & vegetables" pattern, and lower validity, also especially in the "fruits & vegetables" pattern. The κ statistic revealed fair validity and reproducibility of clusters. Our findings indicate a reasonable reproducibility and fair to modest validity of dietary patterns derived by cluster analysis. Copyright © 2015 Elsevier Inc. All rights reserved.
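
    Concordance of cluster membership and the κ statistic can be computed as in the following sketch. It uses Cohen's κ for two labelings of the same participants and assumes the cluster names have already been matched across the two data sources. The twelve labels are made up for illustration and are not the study data.

```python
from collections import Counter

def agreement_and_kappa(labels_a, labels_b):
    """Return (observed agreement, Cohen's kappa) for two labelings."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("labelings must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance from the marginal cluster frequencies.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    # Note: kappa is undefined when expected == 1 (a single shared label).
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Hypothetical memberships from two assessments of the same 12 people.
first = ["fruit_veg"] * 6 + ["meat"] * 6
second = ["fruit_veg"] * 4 + ["meat"] * 2 + ["meat"] * 4 + ["fruit_veg"] * 2

observed, kappa = agreement_and_kappa(first, second)
```

    In this made-up case two thirds of the memberships agree, but half of that agreement is expected by chance, so κ is only about 0.33. This is why a concordance near 67% can correspond to no more than fair agreement.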

  6. Network analysis of 3D complex plasma clusters in a rotating electric field

    CERN Document Server

    Laut, Ingo; Wörner, Lisa; Nosenko, Vladimir; Zhdanov, Sergey K; Schablinski, Jan; Block, Dietmar; Thomas, Hubertus M; Morfill, Gregor E

    2014-01-01

    Network analysis was used to study the structure and time evolution of driven three-dimensional complex plasma clusters. The clusters were created by suspending micron-size particles in a glass box placed on top of the rf electrode in a capacitively coupled discharge. The particles were highly charged and manipulated by an external electric field that had a constant magnitude and uniformly rotated in the horizontal plane. Depending on the frequency of the applied electric field, the clusters rotated in the direction of the electric field or remained stationary. The positions of all particles were measured using stereoscopic digital in-line holography. The network analysis revealed the interplay between two competing symmetries in the cluster. The rotating cluster was shown to be more cylindrical than the nonrotating cluster. The emergence of vertical strings of particles was also confirmed.

  7. Quality assessment of cortex cinnamomi by HPLC chemical fingerprint, principle component analysis and cluster analysis.

    Science.gov (United States)

    Yang, Jie; Chen, Li-Hong; Zhang, Qin; Lai, Mao-Xiang; Wang, Qiang

    2007-06-01

    HPLC fingerprint analysis, principal component analysis (PCA), and cluster analysis were introduced for quality assessment of Cortex cinnamomi (CC). The fingerprint of CC was developed and validated by analyzing 30 samples of CC from different species and geographic locations. Seventeen chromatographic peaks were selected as characteristic peaks and their relative peak areas (RPA) were calculated for quantitative expression of the HPLC fingerprints. The correlation coefficients of similarity in chromatograms were higher than 0.95 for the same species and much lower than 0.6 for different species. In addition, two principal components (PCs) were extracted by PCA. PC1 separated Cinnamomum cassia from other species, capturing 56.75% of the variance, while PC2 contributed to their further separation, capturing 19.08% of the variance. The scores of the samples showed that the samples could be clustered reasonably into different groups corresponding to different species and different regions. The scores and loading plots together clearly revealed different chemical properties of each group. The cluster analysis confirmed the results of the PCA. Therefore, HPLC fingerprinting in combination with chemometric techniques provides a very flexible and reliable method for quality assessment of traditional Chinese medicines.
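
    The similarity comparison between fingerprints can be sketched as below. Two points are assumptions, since the abstract does not give the formulas: relative peak areas are taken as peak areas divided by the area of a chosen reference peak, and similarity is taken as the Pearson correlation coefficient between two RPA vectors. The peak areas are invented.

```python
import math

def relative_peak_areas(areas, reference_index):
    """Divide every peak area by the area of the chosen reference peak."""
    reference = areas[reference_index]
    return [area / reference for area in areas]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical peak areas for five characteristic peaks in three samples.
sample_a = [120.0, 40.0, 15.0, 60.0, 8.0]
sample_b = [240.0, 80.0, 30.0, 120.0, 16.0]   # same profile, higher loading
sample_c = [10.0, 90.0, 55.0, 5.0, 70.0]      # different profile

rpa_a = relative_peak_areas(sample_a, reference_index=0)
rpa_b = relative_peak_areas(sample_b, reference_index=0)
rpa_c = relative_peak_areas(sample_c, reference_index=0)

similarity_ab = pearson(rpa_a, rpa_b)
similarity_ac = pearson(rpa_a, rpa_c)
```

    Dividing by a reference peak removes differences in overall sample loading, so two samples with the same chemical profile give the same RPA vector even when their absolute peak areas differ.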

  8. Cluster Analysis in Patients with GOLD 1 Chronic Obstructive Pulmonary Disease.

    Directory of Open Access Journals (Sweden)

    Philippe Gagnon

    We hypothesized that heterogeneity exists within the Global Initiative for Chronic Obstructive Lung Disease (GOLD) 1 spirometric category and that different subgroups could be identified within this GOLD category. Pre-randomization study participants from two clinical trials were symptomatic/asymptomatic GOLD 1 chronic obstructive pulmonary disease (COPD) patients and healthy controls. A hierarchical cluster analysis used pre-randomization demographics, symptom scores, lung function, peak exercise response and daily physical activity levels to derive population subgroups. Considerable heterogeneity existed for clinical variables among patients with GOLD 1 COPD. All parameters, except forced expiratory volume in 1 second (FEV1)/forced vital capacity (FVC), had considerable overlap between GOLD 1 COPD and controls. Three clusters were identified: cluster I (18 [15%] COPD patients; 105 [85%] controls); cluster II (45 [80%] COPD patients; 11 [20%] controls); and cluster III (22 [92%] COPD patients; 2 [8%] controls). Apart from reduced diffusion capacity and lower baseline dyspnea index versus controls, cluster I COPD patients had otherwise preserved lung volumes, exercise capacity and physical activity levels. Cluster II COPD patients had a higher smoking history and greater hyperinflation versus cluster I COPD patients. Cluster III COPD patients had reduced physical activity versus controls and clusters I and II COPD patients, and lower FEV1/FVC versus clusters I and II COPD patients. The results emphasize heterogeneity within GOLD 1 COPD, supporting an individualized therapeutic approach to patients. www.clinicaltrials.gov: NCT01360788 and NCT01072396.

  9. Molecular analysis of SCARECROW genes expressed in white lupin cluster roots.

    Science.gov (United States)

    Sbabou, Laila; Bucciarelli, Bruna; Miller, Susan; Liu, Junqi; Berhada, Fatiha; Filali-Maltouf, Abdelkarim; Allan, Deborah; Vance, Carroll

    2010-03-01

    The Scarecrow (SCR) transcription factor plays a crucial role in root cell radial patterning and is required for maintenance of the quiescent centre and differentiation of the endodermis. In response to phosphorus (P) deficiency, white lupin (Lupinus albus L.) root surface area increases some 50-fold to 70-fold due to the development of cluster (proteoid) roots. Previously it was reported that SCR-like expressed sequence tags (ESTs) were expressed during early cluster root development. Here the cloning of two white lupin SCR genes, LaSCR1 and LaSCR2, is reported. The predicted amino acid sequences of both LaSCR gene products are highly similar to AtSCR and contain C-terminal conserved GRAS family domains. LaSCR1 and LaSCR2 transcript accumulation localized to the endodermis of both normal and cluster roots as shown by in situ hybridization and gene promoter::reporter staining. Transcript analysis as evaluated by quantitative real-time-PCR (qRT-PCR) and RNA gel hybridization indicated that the two LaSCR genes are expressed predominantly in roots. Expression of LaSCR genes was not directly responsive to the P status of the plant but was a function of cluster root development. Suppression of LaSCR1 in transformed roots of lupin and Medicago via RNAi (RNA interference) delivered through Agrobacterium rhizogenes resulted in decreased root numbers, reflecting the potential role of LaSCR1 in maintaining root growth in these species. The results suggest that the functional orthologues of AtSCR have been characterized.

  10. The impact and importance of clinical learning experience in supporting nursing students in end-of-life care: cluster analysis.

    Science.gov (United States)

    Chow, Susan Ka Yee; Wong, Lina T W; Chan, Yik Kam; Chung, Tin Yu

    2014-09-01

    Nursing students are often expected to provide end-of-life care to patients during clinical practice. Little research has been conducted to examine the heterogeneity of the students and how learning outcomes are affected by their education experience and other demographic factors. The aim of this study was to identify and compare groups of nursing students based on their demographics, clinical experience, knowledge, perceived competency, and attitude towards end-of-life care. A group of 253 nursing students was asked to complete a cross-sectional survey to explore their clinical experience, knowledge, attitude, and perceived competency towards end-of-life care. Cluster analysis was used to determine whether specific groups of students could be identified within the study cohort. Three distinct clusters were identified. Students from the three clusters showed no significant differences in end-of-life knowledge. Significant differences were identified in clinical experience amongst the three clusters and in attitude and perceived competency within the clusters. The cluster of students that had greater clinical experience demonstrated higher perceived competency and a more positive attitude towards end-of-life care. Clinical experience was found to be crucial in enhancing the perceived competency and attitude of nursing students in end-of-life care. Copyright © 2014 Elsevier Ltd. All rights reserved.

  11. Geographic clustering of firms and urban form: a multivariate analysis

    Science.gov (United States)

    Maoh, Hanna; Kanaroglou, Pavlos

    2007-04-01

    This paper provides an empirical framework that applies spatial statistics methods to assess the relation between the change in the geographical clustering of firms and the emergence of urban form. We contend that where firms locate and eventually cluster gives rise to the way commercial and industrial land uses are organized over space, which in turn defines the shape of urban form. Accordingly, the objectives of our work are twofold: (1) to identify the extent and shape of firm clustering and co-location at the intra-metropolitan level, and (2) to examine how the change in the geographic clustering of different industries contributes to decentralization and the evolution of urban form. Spatial statistics methods and tools were vital in fulfilling these objectives.

  12. First PPMXL photometric analysis of open cluster Ruprecht 15

    Institute of Scientific and Technical Information of China (English)

    Ashraf Latif Tadross

    2012-01-01

    We present the first in a series studying the astrophysical parameters of open clusters using the PPMXL* database, whose data are applied to study Ruprecht 15. The astrophysical parameters of Ruprecht 15 have been estimated for the first time.

  13. Arrangement of the Clostridium baratii F7 toxin gene cluster with identification of a σ factor that recognizes the botulinum toxin gene cluster promoters.

    Science.gov (United States)

    Dover, Nir; Barash, Jason R; Burke, Julianne N; Hill, Karen K; Detter, John C; Arnon, Stephen S

    2014-01-01

    Botulinum neurotoxin (BoNT) is the most poisonous substance known, and its eight toxin types (A to H) are distinguished by the inability of polyclonal antibodies that neutralize one toxin type to neutralize any of the other seven toxin types. Infant botulism, an intestinal toxemia orphan disease, is the most common form of human botulism in the United States. It results from swallowed spores of Clostridium botulinum (or rarely, neurotoxigenic Clostridium butyricum or Clostridium baratii) that germinate and temporarily colonize the lumen of the large intestine, where, as vegetative cells, they produce botulinum toxin. Botulinum neurotoxin is encoded by the bont gene, which is part of a toxin gene cluster that includes several accessory genes. We sequenced for the first time the complete botulinum neurotoxin gene cluster of nonproteolytic C. baratii type F7. Like the type E and the nonproteolytic type F6 botulinum toxin gene clusters, the C. baratii type F7 had an orfX toxin gene cluster that lacked the regulatory botR gene, which is found in proteolytic C. botulinum strains and codes for an alternative σ factor. In the absence of botR, we identified a putative alternative regulatory gene located upstream of the C. baratii type F7 toxin gene cluster. This putative regulatory gene codes for a predicted σ factor that contains DNA-binding-domain homologues to the DNA-binding domains both of BotR and of other members of the TcdR-related group 5 of the σ70 family that are involved in the regulation of toxin gene expression in clostridia. We showed that this TcdR-related protein, in association with RNA polymerase core enzyme, specifically binds to the C. baratii type F7 botulinum toxin gene cluster promoters. This TcdR-related protein may therefore be involved in regulating the expression of the genes of the botulinum toxin gene cluster in neurotoxigenic C. baratii.

  14. Cardiorespiratory fitness, cardiovascular workload and risk factors among cleaners; a cluster randomized worksite intervention

    DEFF Research Database (Denmark)

    Korshøj, Mette; Krustrup, Peter; Jørgensen, Marie Birk;

    2012-01-01

    Prevalence of cardiovascular risk factors is unevenly distributed among occupational groups. The working environment, as well as lifestyle and socioeconomic status contribute to the disparity and variation in prevalence of these risk factors. High physical work demands have been shown to increase...... the risk for cardiovascular disease and mortality, contrary to leisure time physical activity. High physical work demands in combination with a low cardiorespiratory fitness infer a high relative workload and an excessive risk for cardiovascular mortality. Therefore, the aim of this study is to examine...... and cardiovascular risk factors among cleaners. Cleaners are eligible if they are employed ≥ 20 hours/week, at one of the enrolled companies. In the randomization, strata are formed according to the manager the participant reports to. The clusters will be balanced on the following criteria: Geographical work...

  15. Cluster Forests

    CERN Document Server

    Yan, Donghui; Jordan, Michael I

    2011-01-01

    Inspired by Random Forests (RF) in the context of classification, we propose a new clustering ensemble method---Cluster Forests (CF). Geometrically, CF randomly probes a high-dimensional data cloud to obtain "good local clusterings" and then aggregates via spectral clustering to obtain cluster assignments for the whole dataset. The search for good local clusterings is guided by a cluster quality measure $\kappa$. CF progressively improves each local clustering in a fashion that resembles the tree growth in RF. Empirical studies on several real-world datasets under two different performance metrics show that CF compares favorably to its competitors. Theoretical analysis shows that the $\kappa$ criterion grows each local clustering in a desirable way---it is "noise-resistant." A closed-form expression is obtained for the mis-clustering rate of spectral clustering under a perturbation model, which yields new insights into some aspects of spectral clustering.
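
    The ensemble idea in this abstract (many local clusterings on random views of the data, then aggregation) can be illustrated with a much simplified sketch. It is not the authors' algorithm: the $\kappa$-guided growth of each local clustering is omitted, and the final spectral clustering step is left out. The sketch only builds a co-association (affinity) matrix of the kind an aggregation step could consume. The names `two_means` and `co_association`, the toy data, and the choice of 2-means on random feature subsets are all assumptions made for illustration.

```python
import random
from math import dist


def two_means(rows, n_iter=10):
    """Tiny 2-means; seeds are the first row and the row farthest from it."""
    centers = [rows[0], max(rows, key=lambda r: dist(r, rows[0]))]
    labels = [0] * len(rows)
    for _ in range(n_iter):
        # Assignment step: nearer of the two centers.
        labels = [0 if dist(r, centers[0]) <= dist(r, centers[1]) else 1 for r in rows]
        # Update step: coordinate-wise mean of each non-empty cluster.
        for k in (0, 1):
            members = [r for r, lab in zip(rows, labels) if lab == k]
            if members:
                centers[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def co_association(data, n_rounds=20, n_features=2, seed=0):
    """Fraction of rounds in which each pair of points shares a cluster."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_rounds):
        feats = rng.sample(range(d), n_features)          # random feature subset
        rows = [tuple(p[f] for f in feats) for p in data]  # project onto it
        labels = two_means(rows)
        for i in range(n):
            for j in range(n):
                counts[i][j] += int(labels[i] == labels[j])
    return [[c / n_rounds for c in row] for row in counts]


# Hypothetical toy data: two well-separated groups in four dimensions.
data = [
    (0.0, 0.0, 0.0, 0.0), (0.1, 0.2, 0.1, 0.0), (0.2, 0.1, 0.0, 0.1),
    (5.0, 5.0, 5.0, 5.0), (5.1, 4.9, 5.0, 5.2), (4.8, 5.0, 5.1, 5.0),
]
affinity = co_association(data)
```

    In this toy case every feature subset separates the two groups, so the affinity is 1.0 within a group and 0.0 across groups. On real data the entries would be fractional, and a spectral method would be applied to the matrix to obtain the final assignment.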

  16. Mortality in Danish Swine herds: Spatio-temporal clusters and risk factors

    DEFF Research Database (Denmark)

    Lopes Antunes, Ana Carolina; Ersbøll, Annette Kjær; Bihrmann, Kristine

    2017-01-01

    The aim of this study was to explore spatio-temporal mortality patterns in Danish swine herds from December 2013 to October 2015, and to discuss the use of mortality data for syndromic surveillance in Denmark. Although it has previously been assessed within the context of syndromic surveillance......, the value of mortality data generated on a regular and mandatory basis from all swine herds remains unexplored in terms of swine surveillance in Denmark. A total of 5010 farms were included in the analysis, corresponding to 1896 weaner herds, 1490 sow herds and 3839 finisher herds. The spatio...... indicate welfare and disease issues, while multiple-herd clusters could suggest the presence of infectious diseases within the cluster area. The impact of farm type is linked to the fact that larger farms specialize in only one age group, with high biosecurity and more specialized personnel...

  17. Method for exploratory cluster analysis and visualisation of single-trial ERP ensembles.

    Science.gov (United States)

    Williams, N J; Nasuto, S J; Saddy, J D

    2015-07-30

    The validity of ensemble averaging on event-related potential (ERP) data has been questioned, due to its assumption that the ERP is identical across trials. Thus, there is a need for preliminary testing for cluster structure in the data. We propose a complete pipeline for the cluster analysis of ERP data. To increase the signal-to-noise ratio (SNR) of the raw single-trials, we used a denoising method based on Empirical Mode Decomposition (EMD). Next, we used a bootstrap-based method to determine the number of clusters, through a measure called the Stability Index (SI). We then used a clustering algorithm based on a Genetic Algorithm (GA) to define initial cluster centroids for subsequent k-means clustering. Finally, we visualised the clustering results through a scheme based on Principal Component Analysis (PCA). After validating the pipeline on simulated data, we tested it on data from two experiments - a P300 speller paradigm on a single subject and a language processing study on 25 subjects. Results revealed evidence for the existence of 6 clusters in one experimental condition from the language processing study. Further, a two-way chi-square test revealed an influence of subject on cluster membership. Our analysis operates on denoised single-trials, the number of clusters is determined in a principled manner, and the results are presented through an intuitive visualisation. Given the cluster structure in some experimental conditions, we suggest application of cluster analysis as a preliminary step before ensemble averaging. Copyright © 2015 Elsevier B.V. All rights reserved.
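
    One step of the pipeline above, k-means refinement starting from externally supplied centroids, can be sketched as follows. This is a minimal illustration, not the authors' code: the Genetic Algorithm that would produce the initial centroids is replaced here by hand-picked values, the EMD denoising, Stability Index and PCA steps are not shown, and the function name `kmeans` and the two-dimensional toy "trials" are hypothetical.

```python
from math import dist


def kmeans(points, centroids, n_iter=20):
    """Lloyd's k-means starting from caller-supplied initial centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid (Euclidean distance).
        labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        # Update step: mean of the members; an empty cluster keeps its centroid.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(old)
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical feature vectors standing in for denoised single trials.
trials = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
# Stand-in for centroids that a GA-based search would supply.
initial = [(0.0, 0.0), (1.0, 1.0)]

labels, centroids = kmeans(trials, initial)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

    Passing the centroids in as an argument keeps the initialisation strategy separate from the refinement loop, so any search procedure can be swapped in without touching the k-means code.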

  18. Active trachoma among children in Mali: Clustering and environmental risk factors.

    Science.gov (United States)

    Hägi, Mathieu; Schémann, Jean-François; Mauny, Frédéric; Momo, Germain; Sacko, Doulaye; Traoré, Lamine; Malvy, Denis; Viel, Jean-François

    2010-01-19

    Active trachoma is not uniformly distributed in endemic areas, and local environmental factors influencing its prevalence are not yet adequately understood. Determining whether clustering is a consistent phenomenon may help predict likely modes of transmission and help to determine the appropriate level at which to target control interventions. The aims of this study were, therefore, to disentangle the relative importance of clustering at different levels and to assess the respective role of individual, socio-demographic, and environmental factors on active trachoma prevalence among children in Mali. We used anonymous data collected during the Mali national trachoma survey (1996-1997) at different levels of the traditional social structure (14,627 children under 10 years of age, 6,251 caretakers, 2,269 households, 203 villages). Besides field-collected data, environmental variables were retrieved later from various databases at the village level. Bayesian hierarchical logistic models were fit to these prevalence and exposure data. Clustering revealed significant results at four hierarchical levels. The highest proportion of the variation in the occurrence of active trachoma was attributable to the village level (36.7%), followed by household (25.3%), and child (24.7%) levels. Beyond some well-established individual risk factors (age between 3 and 5, dirty face, and flies on the face), we showed that caretaker-level (wiping after body washing), household-level (common ownership of radio, and motorbike), and village-level (presence of a women's association, average monthly maximal temperature and sunshine fraction, average annual mean temperature, presence of rainy days) features were associated with reduced active trachoma prevalence. This study clearly indicates the importance of directing control efforts both at children with active trachoma as well as those with close contact, and at communities. The results support facial cleanliness and environmental
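
    The level-wise percentages in this abstract can be read as shares of total variation. A minimal sketch of that reading, assuming (the abstract does not say so) that the hierarchical model assigns a variance component $\sigma_\ell^2$ to each of the four levels $\ell$ (child, caretaker, household, village):

    $$p_\ell = \frac{\sigma_\ell^2}{\sum_{k=1}^{4} \sigma_k^2}, \qquad \sum_{\ell=1}^{4} p_\ell = 1.$$

    Under that assumption, the three reported shares sum to $36.7\% + 25.3\% + 24.7\% = 86.7\%$, which would leave $13.3\%$ for the level whose share is not reported in the abstract.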

  19. Active trachoma among children in Mali: Clustering and environmental risk factors.

    Directory of Open Access Journals (Sweden)

    Mathieu Hägi

    BACKGROUND: Active trachoma is not uniformly distributed in endemic areas, and local environmental factors influencing its prevalence are not yet adequately understood. Determining whether clustering is a consistent phenomenon may help predict likely modes of transmission and help to determine the appropriate level at which to target control interventions. The aims of this study were, therefore, to disentangle the relative importance of clustering at different levels and to assess the respective role of individual, socio-demographic, and environmental factors on active trachoma prevalence among children in Mali. METHODOLOGY/PRINCIPAL FINDINGS: We used anonymous data collected during the Mali national trachoma survey (1996-1997) at different levels of the traditional social structure (14,627 children under 10 years of age, 6,251 caretakers, 2,269 households, 203 villages). Besides field-collected data, environmental variables were retrieved later from various databases at the village level. Bayesian hierarchical logistic models were fit to these prevalence and exposure data. Clustering revealed significant results at four hierarchical levels. The highest proportion of the variation in the occurrence of active trachoma was attributable to the village level (36.7%), followed by household (25.3%), and child (24.7%) levels. Beyond some well-established individual risk factors (age between 3 and 5, dirty face, and flies on the face), we showed that caretaker-level (wiping after body washing), household-level (common ownership of radio, and motorbike), and village-level (presence of a women's association, average monthly maximal temperature and sunshine fraction, average annual mean temperature, presence of rainy days) features were associated with reduced active trachoma prevalence. CONCLUSIONS/SIGNIFICANCE: This study clearly indicates the importance of directing control efforts both at children with active trachoma as well as those with close contact, and at
