WorldWideScience

Sample records for class cluster analysis

  1. Investigating Subtypes of Child Development: A Comparison of Cluster Analysis and Latent Class Cluster Analysis in Typology Creation

    Science.gov (United States)

    DiStefano, Christine; Kamphaus, R. W.

    2006-01-01

    Two classification methods, latent class cluster analysis and cluster analysis, are used to identify groups of child behavioral adjustment underlying a sample of elementary school children aged 6 to 11 years. Behavioral rating information across 14 subscales was obtained from classroom teachers and used as input for analyses. Both the procedures…

  2. Patterns of Brucellosis Infection Symptoms in Azerbaijan: A Latent Class Cluster Analysis

    OpenAIRE

    Rita Ismayilova; Emilya Nasirova; Colleen Hanou; Rivard, Robert G.; Bautista, Christian T.

    2014-01-01

    Brucellosis infection is a multisystem disease, with a broad spectrum of symptoms. We investigated the existence of clusters of infected patients according to their clinical presentation. Using national surveillance data from the Electronic-Integrated Disease Surveillance System, we applied a latent class cluster (LCC) analysis on symptoms to determine clusters of brucellosis cases. A total of 454 cases reported between July 2011 and July 2013 were analyzed. LCC identified a two-cluster mo...

  3. Patterns of Brucellosis Infection Symptoms in Azerbaijan: A Latent Class Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Rita Ismayilova

    2014-01-01

    Brucellosis infection is a multisystem disease, with a broad spectrum of symptoms. We investigated the existence of clusters of infected patients according to their clinical presentation. Using national surveillance data from the Electronic-Integrated Disease Surveillance System, we applied a latent class cluster (LCC) analysis on symptoms to determine clusters of brucellosis cases. A total of 454 cases reported between July 2011 and July 2013 were analyzed. LCC identified a two-cluster model and the Vuong-Lo-Mendell-Rubin likelihood ratio supported the cluster model. Brucellosis cases in the second cluster (19%) reported higher percentages of poly-lymphadenopathy, hepatomegaly, arthritis, myositis, and neuritis and changes in liver function tests compared to cases of the first cluster. Patients in the second cluster had a severe brucellosis disease course and were associated with longer delay in seeking medical attention. Moreover, most of them were from Beylagan, a region focused on sheep and goat livestock production in south-central Azerbaijan. Patients in cluster 2 accounted for one-quarter of brucellosis cases and had a more severe clinical presentation. Delay in seeking medical care may explain severe illness. Future work needs to determine the factors that influence brucellosis case seeking and identify brucellosis species, particularly among cases from Beylagan.
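As a rough illustration of what a latent class cluster analysis on binary symptom indicators involves, the sketch below fits a latent class model with the EM algorithm. It is a minimal sketch under stated assumptions, not the procedure used in the study above: the function name `fit_latent_classes`, the random initialisation, and the toy data in the usage note are all hypothetical.

```python
import math
import random


def fit_latent_classes(rows, n_classes, n_iter=200, seed=0, eps=1e-9):
    """Fit a latent class model to rows of 0/1 indicators using EM.

    Returns (weights, item_probs, responsibilities, loglik_trace).
    All names here are illustrative assumptions, not a published API.
    """
    rng = random.Random(seed)
    n_items = len(rows[0])
    weights = [1.0 / n_classes] * n_classes
    # Random start away from 0 and 1 so the classes differ from step one.
    item_probs = [[rng.uniform(0.25, 0.75) for _ in range(n_items)]
                  for _ in range(n_classes)]
    trace = []
    resp = []
    for _ in range(n_iter):
        # E-step: posterior class membership probabilities for each row.
        resp = []
        loglik = 0.0
        for row in rows:
            joint = []
            for k in range(n_classes):
                p = weights[k]
                for x, theta in zip(row, item_probs[k]):
                    p *= theta if x else (1.0 - theta)
                joint.append(p)
            total = sum(joint)
            loglik += math.log(total)
            resp.append([p / total for p in joint])
        trace.append(loglik)
        # M-step: re-estimate class weights and item probabilities.
        for k in range(n_classes):
            mass = sum(r[k] for r in resp)
            weights[k] = mass / len(rows)
            for j in range(n_items):
                hits = sum(r[k] * row[j] for r, row in zip(resp, rows))
                theta = hits / mass if mass > 0 else 0.5
                # Clamp so that no probability becomes exactly 0 or 1.
                item_probs[k][j] = min(max(theta, eps), 1.0 - eps)
    return weights, item_probs, resp, trace
```

A hypothetical call such as `fit_latent_classes([[1, 1, 0, 0]] * 30 + [[0, 0, 1, 1]] * 10, 2)` returns one row of class membership probabilities per case; each case can then be assigned to its most probable class. Real analyses usually compare several class counts and use many random starts.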

  4. Analysis of traffic accidents on rural highways using Latent Class Clustering and Bayesian Networks.

    Science.gov (United States)

    de Oña, Juan; López, Griselda; Mujalli, Randa; Calvo, Francisco J

    2013-03-01

    One of the principal objectives of traffic accident analyses is to identify key factors that affect the severity of an accident. However, with the presence of heterogeneity in the raw data used, the analysis of traffic accidents becomes difficult. In this paper, Latent Class Cluster (LCC) is used as a preliminary tool for segmentation of 3229 accidents on rural highways in Granada (Spain) between 2005 and 2008. Next, Bayesian Networks (BNs) are used to identify the main factors involved in accident severity for both, the entire database (EDB) and the clusters previously obtained by LCC. The results of these cluster-based analyses are compared with the results of a full-data analysis. The results show that the combined use of both techniques is very interesting as it reveals further information that would not have been obtained without prior segmentation of the data. BN inference is used to obtain the variables that best identify accidents with killed or seriously injured. Accident type and sight distance have been identify in all the cases analysed; other variables such as time, occupant involved or age are identified in EDB and only in one cluster; whereas variables vehicles involved, number of injuries, atmospheric factors, pavement markings and pavement width are identified only in one cluster.

  5. Clustering Educational Digital Library Usage Data: A Comparison of Latent Class Analysis and K-Means Algorithms

    Science.gov (United States)

    Xu, Beijie; Recker, Mimi; Qi, Xiaojun; Flann, Nicholas; Ye, Lei

    2013-01-01

    This article examines clustering as an educational data mining method. In particular, two clustering algorithms, the widely used K-means and the model-based Latent Class Analysis, are compared, using usage data from an educational digital library service, the Instructional Architect (IA.usu.edu). Using a multi-faceted approach and multiple data…

  6. Context-sensitive intra-class clustering

    KAUST Repository

    Yu, Yingwei

    2014-02-01

    This paper describes a new semi-supervised learning algorithm for intra-class clustering (ICC). ICC partitions each class into sub-classes in order to minimize overlap across clusters from different classes. This is achieved by allowing partitioning of a certain class to be assisted by data points from other classes in a context-dependent fashion. The result is that overlap across sub-classes (both within- and across class) is greatly reduced. ICC is particularly useful when combined with algorithms that assume that each class has a unimodal Gaussian distribution (e.g., Linear Discriminant Analysis (LDA), quadratic classifiers), an assumption that is not always true in many real-world situations. ICC can help partition non-Gaussian, multimodal distributions to overcome such a problem. In this sense, ICC works as a preprocessor. Experiments with our ICC algorithm on synthetic data sets and real-world data sets indicated that it can significantly improve the performance of LDA and quadratic classifiers. We expect our approach to be applicable to a broader class of pattern recognition problems where class-conditional densities are significantly non-Gaussian or multi-modal. © 2013 Elsevier Ltd. All rights reserved.

  7. Latent class cluster analysis to understand heterogeneity in prostate cancer treatment utilities

    Directory of Open Access Journals (Sweden)

    Meghani Salimah

    2009-01-01

    Background: Men with prostate cancer are often challenged to choose between conservative management and a range of available treatment options each carrying varying risks and benefits. The trade-offs are between an improved life-expectancy with treatment accompanied by important risks such as urinary incontinence and erectile dysfunction. Previous studies of preference elicitation for prostate cancer treatment have found considerable heterogeneity in individuals' preferences for health states given similar treatments and clinical risks. Methods: Using latent class mixture model (LCA), we first sought to understand if there are unique patterns of heterogeneity or subgroups of individuals based on their prostate cancer treatment utilities (calculated time trade-off utilities for various health states) and if such unique subgroups exist, what demographic and urological variables may predict membership in these subgroups. Results: The sample (N = 244) included men with prostate cancer (n = 188) and men at-risk for disease (n = 56). The sample was predominantly white (77%), with mean age of 60 years (SD ± 9.5). Most (85.9%) were married or living with a significant other. Using LCA, a three class solution yielded the best model evidenced by the smallest Bayesian Information Criterion (BIC), substantial reduction in BIC from a 2-class solution, and Lo-Mendell-Rubin significance of < .001. The three identified clusters were named high-traders (n = 31), low-traders (n = 116), and no-traders (n = 97). High-traders were more likely to trade survival time associated with treatment to avoid potential risks of treatment. Low-traders were less likely to trade survival time and accepted risks of treatment. The no-traders were likely to make no trade-offs in any direction favouring the status quo. There was significant difference among the clusters in the importance of sexual activity (Pearson's χ2 = 16.55, P = 0.002; Goodman and Kruskal tau = 0.039, P < 0.001). In
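The model-selection rule mentioned in this record (prefer the class count with the smallest BIC) can be sketched as follows. The formula BIC = k ln(n) − 2 ln(L) is standard; the helper names and the log-likelihood and parameter counts in the usage note are hypothetical and are not values from the study.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def select_by_bic(candidates, n_obs):
    """Pick the candidate with the smallest BIC.

    candidates maps a class count to (log_likelihood, n_params).
    Returns (best_class_count, {class_count: bic_value}).
    """
    scores = {k: bic(ll, p, n_obs) for k, (ll, p) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores
```

For example, with made-up fits `{2: (-1500.0, 11), 3: (-1450.0, 17), 4: (-1445.0, 23)}` and `n_obs=244`, `select_by_bic` returns the 3-class model: its gain in log-likelihood over two classes outweighs the penalty for six extra parameters, while the 4-class model gains too little.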

  8. Cluster analysis

    CERN Document Server

    Everitt, Brian S; Leese, Morven; Stahl, Daniel

    2011-01-01

    Cluster analysis comprises a range of methods for classifying multivariate data into subgroups. By organizing multivariate data into such subgroups, clustering can help reveal the characteristics of any structure or patterns present. These techniques have proven useful in a wide range of areas such as medicine, psychology, market research and bioinformatics. This fifth edition of the highly successful Cluster Analysis includes coverage of the latest developments in the field and a new chapter dealing with finite mixture models for structured data. Real life examples are used throughout to demons

  9. Clustering and combining pattern of metabolic syndrome components among Iranian population with latent class analysis

    Science.gov (United States)

    Abbasi-Ghahramanloo, Abbas; Soltani, Sepideh; Gholami, Ali; Erfani, Mohammadreza; Yosaee, Somayeh

    2016-01-01

    Background: Metabolic syndrome (MetS), a combination of coronary heart disease and diabetes mellitus risk factors, is one of the most challenging public health issues worldwide. The aim of this study was to identify the subgroups of participants in a study on the basis of MetS components. Methods: The cross-sectional study took place in the districts related to Tehran University of Medical Sciences. The randomly selected sample consisted of 415 subjects. All participants provided written informed consent. Latent class analysis was performed to achieve the study’s objectives. Analyses were conducted using PROC LCA in SAS 9.2 software. Results: Except for systolic and diastolic blood pressure, all MetS components were more prevalent in females than in males. Four latent classes were identified: (a) non MetS, (b) low risk, (c) high risk, and (d) MetS. Notably, 24.2% and 1.3% of the subjects were in the high risk and MetS classes respectively. Conclusion: Most of the study participants were identified as high risk and MetS. Design and implementation of preventive interventions for this segment of the population are necessary. PMID:28210610

  10. Cluster analysis for applications

    CERN Document Server

    Anderberg, Michael R

    1973-01-01

    Cluster Analysis for Applications deals with methods and various applications of cluster analysis. Topics covered range from variables and scales to measures of association among variables and among data units. Conceptual problems in cluster analysis are discussed, along with hierarchical and non-hierarchical clustering methods. The necessary elements of data analysis, statistics, cluster analysis, and computer implementation are integrated vertically to cover the complete path from raw data to a finished analysis. Comprised of 10 chapters, this book begins with an introduction to the subject o

  11. CLEAN: CLustering Enrichment ANalysis

    Directory of Open Access Journals (Sweden)

    Medvedovic Mario

    2009-07-01

    Background: Integration of biological knowledge encoded in various lists of functionally related genes has become one of the most important aspects of analyzing genome-wide functional genomics data. In the context of cluster analysis, functional coherence of clusters established through such analyses has been used to identify biologically meaningful clusters, compare clustering algorithms and identify biological pathways associated with the biological process under investigation. Results: We developed a computational framework for analytically and visually integrating knowledge-based functional categories with the cluster analysis of genomics data. The framework is based on the simple, conceptually appealing, and biologically interpretable gene-specific functional coherence score (CLEAN score). The score is derived by correlating the clustering structure as a whole with functional categories of interest. We directly demonstrate that integrating biological knowledge in this way improves the reproducibility of conclusions derived from cluster analysis. The CLEAN score differentiates between the levels of functional coherence for genes within the same cluster based on their membership in enriched functional categories. We show that this aspect results in higher reproducibility across independent datasets and produces more informative genes for distinguishing different sample types than the scores based on the traditional cluster-wide analysis. We also demonstrate the utility of the CLEAN framework in comparing clusterings produced by different algorithms. CLEAN was implemented as an add-on R package and can be downloaded at http://Clusteranalysis.org. The package integrates routines for calculating gene specific functional coherence scores and the open source interactive Java-based viewer Functional TreeView (FTreeView). Conclusion: Our results indicate that using the gene-specific functional coherence score improves the reproducibility of the

  12. Genomic sequence analysis of the 238-kb swine segment with a cluster of TRIM and olfactory receptor genes located, but with no class I genes, at the distal end of the SLA class I region.

    Science.gov (United States)

    Ando, Asako; Shigenari, Atsuko; Kulski, Jerzy K; Renard, Christine; Chardon, Patrick; Shiina, Takashi; Inoko, Hidetoshi

    2005-12-01

    Continuous genomic sequence has been previously determined for the swine leukocyte antigen (SLA) class I region from the TNF gene cluster at the border between the major histocompatibility complex (MHC) class III and class I regions to the UBD gene at the telomeric end of the classical class I gene cluster (SLA-1 to SLA-5, SLA-9, SLA-11). To complete the genomic sequence of the entire SLA class I genomic region, we have analyzed the genomic sequences of two BAC clones carrying a continuous 237,633-bp-long segment spanning from the TRIM15 gene to the UBD gene located on the telomeric side of the classical SLA class I gene cluster. Fifteen non-class I genes, including the zinc finger and the tripartite motif (TRIM) ring-finger-related family genes and olfactory receptor genes, were identified in the 238-kilobase (kb) segment, and their location in the segment was similar to their apparent human homologs. In contrast, a human segment (alpha block) spanning about 375 kb from the gene ETF1P1 and from the HLA-J to HLA-F genes was absent from the 238-kb swine segment. We conclude that the gene organization of the MHC non-class I genes located in the telomeric side of the classical SLA class I gene cluster is remarkably similar between the swine and the human segments, although the swine lacks a 375-kb segment corresponding to the human alpha block.

  13. Mutation classes of finite type cluster algebras with principal coefficients

    CERN Document Server

    Seven, Ahmet

    2011-01-01

    In this paper, we prove Conjecture 4.8 of "Cluster algebras IV" by S. Fomin and A. Zelevinsky, stating that the mutation classes of rectangular matrices associated with cluster algebras of finite type are precisely those classes which are finite.

  14. Supermodel Analysis of Galaxy Clusters

    CERN Document Server

    Fusco-Femiano, R; Lapi, A

    2009-01-01

    [abridged] We present the analysis of the X-ray brightness and temperature profiles for six clusters belonging to both the Cool Core and Non Cool Core classes, in terms of the Supermodel (SM) developed by Cavaliere, Lapi & Fusco-Femiano (2009). Based on the gravitational wells set by the dark matter halos, the SM straightforwardly expresses the equilibrium of the IntraCluster Plasma (ICP) modulated by the entropy deposited at the boundary by standing shocks from gravitational accretion, and injected at the center by outgoing blastwaves from mergers or from outbursts of Active Galactic Nuclei. The cluster set analyzed here highlights not only how simply the SM represents the main dichotomy Cool vs. Non Cool Core clusters in terms of a few ICP parameters governing the radial entropy run, but also how accurately it fits even complex brightness and temperature profiles. For Cool Core clusters like A2199 and A2597, the SM with a low level of central entropy straightforwardly yields the characteristic peaked pr...

  15. Challenges in microarray class discovery: a comprehensive examination of normalization, gene selection and clustering

    Directory of Open Access Journals (Sweden)

    Landfors Mattias

    2010-10-01

    Background: Cluster analysis, and in particular hierarchical clustering, is widely used to extract information from gene expression data. The aim is to discover new classes, or sub-classes, of either individuals or genes. Performing a cluster analysis commonly involves decisions on how to handle missing values, standardize the data and select genes. In addition, pre-processing, involving various types of filtration and normalization procedures, can have an effect on the ability to discover biologically relevant classes. Here we consider cluster analysis in a broad sense and perform a comprehensive evaluation that covers several aspects of cluster analyses, including normalization. Results: We evaluated 2780 cluster analysis methods on seven publicly available 2-channel microarray data sets with common reference designs. Each cluster analysis method differed in data normalization (5 normalizations were considered), missing value imputation (2), standardization of data (2), gene selection (19) or clustering method (11). The cluster analyses are evaluated using known classes, such as cancer types, and the adjusted Rand index. The performances of the different analyses vary between the data sets and it is difficult to give general recommendations. However, normalization, gene selection and clustering method are all variables that have a significant impact on the performance. In particular, gene selection is important and it is generally necessary to include a relatively large number of genes in order to get good performance. Selecting genes with high standard deviation or using principal component analysis are shown to be the preferred gene selection methods. Hierarchical clustering using Ward's method, k-means clustering and Mclust are the clustering methods considered in this paper that achieve the highest adjusted Rand index. Normalization can have a significant positive impact on the ability to cluster individuals, and there are indications that
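The adjusted Rand index used in this record to score a clustering against known classes can be computed from pair counts alone. The following is a minimal pure-Python sketch of the standard Hubert and Arabie formula; the function name is an assumption, and returning 1.0 when both partitions are trivial is a convention chosen here for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Chance-corrected agreement between two partitions of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0
    # Pairs that fall in the same cell of the contingency table.
    cells = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    # Pairs that fall in the same class, and in the same cluster.
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / total_pairs
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        # Both partitions are trivial (one cluster, or all singletons).
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

The index equals 1 for identical partitions regardless of how the clusters are labelled, is close to 0 for random agreement, and can be negative when agreement is worse than chance.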

  16. Arabic web pages clustering and annotation using semantic class features

    Directory of Open Access Journals (Sweden)

    Hanan M. Alghamdi

    2014-12-01

    Effectively managing the great amount of data on Arabic web pages and enabling the classification of relevant information are very important research problems. Studies on sentiment text mining have been very limited in the Arabic language because they need to involve deep semantic processing. Therefore, in this paper, we aim to retrieve machine-understandable data with the help of a Web content mining technique to detect covert knowledge within these data. We propose an approach to achieve clustering with semantic similarities. This approach comprises integrating k-means document clustering with semantic feature extraction and document vectorization to group Arabic web pages according to semantic similarities and then show the semantic annotation. The document vectorization helps to transform text documents into a semantic class probability distribution or semantic class density. To reach semantic similarities, the approach extracts the semantic class features and integrates them into the similarity weighting schema. The quality of the clustering result was evaluated using the purity and the mean intra-cluster distance (MICD) evaluation measures. We have evaluated the proposed approach on a set of common Arabic news web pages. We have acquired favorable clustering results that are effective in minimizing the MICD, increasing the purity and lowering the runtime.
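The two evaluation measures named in this record can be sketched as below. Purity follows its usual definition (the share of items that belong to the majority class of their cluster). The abstract does not define MICD, so the version here, the mean Euclidean distance from each vector to the centroid of its cluster, is an assumption made for illustration; both function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def purity(cluster_ids, class_labels):
    """Fraction of items that carry the majority class label of their cluster."""
    by_cluster = defaultdict(list)
    for cid, label in zip(cluster_ids, class_labels):
        by_cluster[cid].append(label)
    majority = sum(Counter(members).most_common(1)[0][1]
                   for members in by_cluster.values())
    return majority / len(cluster_ids)


def mean_intra_cluster_distance(cluster_ids, vectors):
    """Mean distance from each vector to its cluster centroid (assumed MICD)."""
    by_cluster = defaultdict(list)
    for cid, vec in zip(cluster_ids, vectors):
        by_cluster[cid].append(vec)
    total = 0.0
    for members in by_cluster.values():
        # Component-wise mean of the member vectors.
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(math.dist(vec, centroid) for vec in members)
    return total / len(vectors)
```

Higher purity and lower intra-cluster distance both indicate tighter, more homogeneous clusters, which matches the direction of improvement reported in the abstract.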

  17. Cyclist–motorist crash patterns in Denmark: A latent class clustering approach

    DEFF Research Database (Denmark)

    Kaplan, Sigal; Prato, Carlo Giacomo

    2013-01-01

    Objective: The current study aimed at uncovering patterns of cyclist–motorist crashes in Denmark and investigating their prevalence and severity. The importance of implementing clustering techniques for providing a holistic overview of vulnerable road users’ crash patterns derives from the need...... to prioritize safety issues and to devise efficient preventive measures. Method: The current study focused on cyclist–motorist crashes that occurred in Denmark during the period between 2007 and 2011. To uncover crash patterns, the current analysis applied latent class clustering, an unsupervised probabilistic...... clustering approach that relies on the statistical concept of likelihood and allows partial overlap across clusters. Results: The analysis yielded 13 distinguishable cyclist–motorist latent classes. Specific crash patterns for urban and rural areas were revealed. Prevalent features that allowed...

  18. Semantic Analysis of Virtual Classes and Nested Classes

    DEFF Research Database (Denmark)

    Madsen, Ole Lehrmann

    1999-01-01

    Virtual classes and nested classes are distinguishing features of BETA. Nested classes originated from Simula, but until recently they have not been part of mainstream object-oriented languages. C++ has a restricted form of nested classes and they were included in Java 1.1. Virtual classes...... the central elements of the semantic analysis used in the Mjølner BETA compiler....

  19. Integrative cluster analysis in bioinformatics

    CERN Document Server

    Abu-Jamous, Basel; Nandi, Asoke K

    2015-01-01

    Clustering techniques are increasingly being put to use in the analysis of high-throughput biological datasets. Novel computational techniques to analyse high throughput data in the form of sequences, gene and protein expressions, pathways, and images are becoming vital for understanding diseases and future drug discovery. This book details the complete pathway of cluster analysis, from the basics of molecular biology to the generation of biological knowledge. The book also presents the latest clustering methods and clustering validation, thereby offering the reader a comprehensive review o

  20. A Survey of Popular R Packages for Cluster Analysis

    Science.gov (United States)

    Flynt, Abby; Dean, Nema

    2016-01-01

    Cluster analysis is a set of statistical methods for discovering new group/class structure when exploring data sets. This article reviews the following popular libraries/commands in the R software language for applying different types of cluster analysis: from the stats library, the kmeans, and hclust functions; the mclust library; the poLCA…

  1. Fatal and serious road crashes involving young New Zealand drivers: a latent class clustering approach

    DEFF Research Database (Denmark)

    Weiss, Harold B.; Kaplan, Sigal; Prato, Carlo Giacomo

    2016-01-01

    classification that revealed how the identified clusters contain mostly crashes of a particular class and all the crashes of that class. The results raised three major safety concerns for young drivers that should be addressed: (1) reckless driving and traffic law violations; (2) inattention, error, and hazard......The over-representation of young drivers in road crashes remains an important concern worldwide. Cluster analysis has been applied to young driver sub-groups, but its application by analysing crash occurrence is just emerging. We present a classification analysis that advances the field through...... a holistic overview of crash patterns useful for designing youth-targeted road safety programmes. We compiled a database of 8644 New Zealand crashes from 2002 to 2011 involving at least one 15–24-year-old driver and a fatal or serious injury for at least one road user. We considered crash location...

  2. Mapping Cigarettes Similarities using Cluster Analysis Methods

    Directory of Open Access Journals (Sweden)

    Lorentz Jäntschi

    2007-09-01

    The aim of the research was to investigate the relationship and/or occurrences in and between chemical composition information (tar, nicotine, carbon monoxide), market information (brand, manufacturer, price), and public health information (class, health warning) as well as clustering of a sample of cigarette data. Thirty cigarette brands were analyzed. Six categorical (cigarette brand, manufacturer, health warnings, class) and four continuous (tar, nicotine, carbon monoxide concentrations and package price) variables were collected for investigation of chemical composition, market information and public health information. Multiple linear regression and two clustering techniques were applied. The carbon monoxide concentration proved to be linked with tar and nicotine concentration. The applied clustering methods identified groups of cigarette brands that showed similar characteristics. The tar and carbon monoxide concentrations were the main criteria used in clustering. An analysis of a larger sample could reveal more relevant and useful information regarding the similarities between cigarette brands.

  3. Global Clustering Quality Coefficient Assessing the Efficiency of PCA Class Assignment

    Directory of Open Access Journals (Sweden)

    Mirela Praisler

    2014-01-01

    An essential factor influencing the efficiency of the predictive models built with principal component analysis (PCA) is the quality of the data clustering revealed by the score plots. The sensitivity and selectivity of the class assignment are strongly influenced by the relative position of the clusters and by their dispersion. We are proposing a set of indicators inspired from analytical geometry that may be used for an objective quantitative assessment of the data clustering quality as well as a global clustering quality coefficient (GCQC) that is a measure of the overall predictive power of the PCA models. The use of these indicators for evaluating the efficiency of the PCA class assignment is illustrated by a comparative study performed for the identification of the preprocessing function that is generating the most efficient PCA system screening for amphetamines based on their GC-FTIR spectra. The GCQC ranking of the tested feature weights is explained based on estimated density distributions and validated by using quadratic discriminant analysis (QDA).

  4. Nonlinear analysis of EAS clusters

    CERN Document Server

    Zotov, M Yu; Fomin, Y A; Fomin, Yu. A.

    2002-01-01

    We apply certain methods of nonlinear time series analysis to the extensive air shower clusters found earlier in the data set obtained with the EAS-1000 Prototype array. In particular, we use the Grassberger-Procaccia algorithm to compute the correlation dimension of samples in the vicinity of the clusters. The validity of the results is checked by surrogate data tests and some additional quantities. We compare our conclusions with the results of similar investigations performed by the EAS-TOP and LAAS groups.
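The Grassberger-Procaccia approach mentioned in this record estimates a correlation dimension from how the fraction of close point pairs grows with the radius. The sketch below computes the correlation sum C(r) and a crude two-radius slope estimate of d log C / d log r. It is an illustrative sketch only: the function names are assumptions, a practical analysis would first build delay-embedded vectors from the time series, and the slope would be fitted over a scaling range rather than taken from two radii.

```python
import math


def correlation_sum(points, radius):
    """Fraction of distinct point pairs closer together than radius."""
    n = len(points)
    close = 0
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < radius:
                close += 1
    return 2.0 * close / (n * (n - 1))


def correlation_dimension(points, r_small, r_large):
    """Two-point slope of log C(r) against log r (a crude estimate)."""
    c_small = correlation_sum(points, r_small)
    c_large = correlation_sum(points, r_large)
    return ((math.log(c_large) - math.log(c_small))
            / (math.log(r_large) - math.log(r_small)))
```

For points spread evenly along a line, for instance `[(float(i),) for i in range(200)]` with radii 10 and 20, the estimate is close to 1, as expected for a one-dimensional set.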

  5. Coupled Two-Way Clustering Analysis of Breast Cancer and Colon Cancer Gene Expression Data

    CERN Document Server

    Getz, G; Kela, I; Domany, E; Notterman, D A; Getz, Gad; Gal, Hilah; Kela, Itai; Domany, Eytan; Notterman, Dan A.

    2003-01-01

    We present and review Coupled Two Way Clustering, a method designed to mine gene expression data. The method identifies submatrices of the total expression matrix, whose clustering analysis reveals partitions of samples (and genes) into biologically relevant classes. We demonstrate, on data from colon and breast cancer, that we are able to identify partitions that elude standard clustering analysis.

  6. The SMART CLUSTER METHOD - adaptive earthquake cluster analysis and declustering

    Science.gov (United States)

    Schaefer, Andreas; Daniell, James; Wenzel, Friedemann

    2016-04-01

    Earthquake declustering is an essential part of almost any statistical analysis of spatial and temporal properties of seismic activity, whose usual applications include probabilistic seismic hazard assessments (PSHAs) and earthquake prediction methods. The nature of earthquake clusters and subsequent declustering of earthquake catalogues plays a crucial role in determining the magnitude-dependent earthquake return period and its respective spatial variation. Various methods have been developed by other researchers to address this issue. These range in complexity from rather simple statistical window methods to complex epidemic models. This study introduces the smart cluster method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal identification. An adaptive search algorithm for data point clusters is adopted. It uses the earthquake density in the spatio-temporal neighbourhood of each event to adjust the search properties. The identified clusters are subsequently analysed to determine directional anisotropy, focusing on a strong correlation along the rupture plane, and the search space is adjusted with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010/2011 Darfield-Christchurch events, an adaptive classification procedure is applied to disassemble subsequent ruptures which may have been grouped into an individual cluster using near-field searches, support vector machines and temporal splitting. The steering parameters of the search behaviour are linked to local earthquake properties like magnitude of completeness, earthquake density and Gutenberg-Richter parameters. The method is capable of identifying and classifying earthquake clusters in space and time. It is tested and validated using earthquake data from California and New Zealand. As a result of the cluster identification process, each event in

  7. Cluster Analysis of Ranunculus Species

    Directory of Open Access Journals (Sweden)

    SURANTO

    2002-01-01

    The aim of the experiment was to examine whether the morphological characters of eleven species of Ranunculus collected from a number of populations were in agreement with the genetic data (isozyme). The method used in this study was polyacrylamide gel electrophoresis using peroxidase, esterase, malate dehydrogenase, and acid phosphatase enzymes. The results showed that cluster analysis based on isozyme data gave good support to the classification of the eleven species based on morphological groups. This study concluded that in certain species each morphological variation proved to be genetically based.

  8. Parallel unstructured AMR and gigabit networking for Beowulf-class clusters

    Science.gov (United States)

    Norton, C. D.; Cwik, T. A.

    2001-01-01

    The impact of gigabit networking with Myrinet 2000 hardware and MPICH-GM software on a 2-way SMP Beowulf-class cluster for parallel unstructured adaptive mesh refinement using the PYRAMID library is described.

  9. Measuring Class Cohesion Based on Dependence Analysis

    Institute of Scientific and Technical Information of China (English)

    Zhen-Qiang Chen; Bao-Wen Xu; Yu-Ming Zhou

    2004-01-01

    Classes are the basic modules in object-oriented (OO) software, which consist of attributes and methods. Thus, in an OO environment, cohesion is mainly about the tightness of the attributes and methods of classes. This paper discusses the relationships between attributes and attributes, attributes and methods, and methods and methods of a class based on dependence analysis. Then the paper presents methods to compute these dependencies. Based on these, the paper proposes a method to measure class cohesion, which satisfies the properties that a good measurement should have. The approach overcomes the limitations of previous class cohesion measures, which consider only one or two of the three relationships in a class.

  10. A Beowulf-class computing cluster for the Monte Carlo production of the LHCb experiment

    CERN Document Server

    Avoni, G; Bertin, A; Bruschi, M; Capponi, M; Carbone, A; Collamati, A; De Castro, S; Fabbri, Franco Luigi; Faccioli, P; Galli, D; Giacobbe, B; Lax, I; Marconi, U; Massa, I; Piccinini, M; Poli, M; Semprini-Cesari, N; Spighi, R; Vagnoni, V M; Vecchi, S; Villa, M; Vitale, A; Zoccoli, A

    2003-01-01

    The computing cluster built at Bologna to provide the LHCb Collaboration with a powerful Monte Carlo production tool is presented. It is a performance oriented Beowulf-class cluster, made of rack mounted commodity components, designed to minimize operational support requirements and to provide full and continuous availability of the computing resources. In this paper we describe the architecture of the cluster, and discuss the technical solutions adopted for each specialized sub-system.

  11. Latent cluster analysis of ALS phenotypes identifies prognostically differing groups.

    Directory of Open Access Journals (Sweden)

    Jeban Ganesalingam

    Full Text Available BACKGROUND: Amyotrophic lateral sclerosis (ALS) is a degenerative disease predominantly affecting motor neurons and manifesting as several different phenotypes. Whether these phenotypes correspond to different underlying disease processes is unknown. We used latent cluster analysis to identify groupings of clinical variables in an objective and unbiased way to improve phenotyping for clinical and research purposes. METHODS: Latent class cluster analysis was applied to a large database consisting of 1467 records of people with ALS, using discrete variables which can be readily determined at the first clinic appointment. The model was tested for clinical relevance by survival analysis of the phenotypic groupings using the Kaplan-Meier method. RESULTS: The best model generated five distinct phenotypic classes that strongly predicted survival (p<0.0001). Eight variables were used for the latent class analysis, but a good estimate of the classification could be obtained using just two variables: site of first symptoms (bulbar or limb) and time from symptom onset to diagnosis (p<0.00001). CONCLUSION: The five phenotypic classes identified using latent cluster analysis can predict prognosis. They could be used to stratify patients recruited into clinical trials and to generate more homogeneous disease groups for genetic, proteomic and risk factor research.

  12. Introduction to Latent Class Analysis with Applications

    Science.gov (United States)

    Porcu, Mariano; Giambona, Francesca

    2017-01-01

    Latent class analysis (LCA) is a statistical method used to group individuals (cases, units) into classes (categories) of an unobserved (latent) variable on the basis of the responses made on a set of nominal, ordinal, or continuous observed variables. In this article, we introduce LCA in order to demonstrate its usefulness to early adolescence…

  13. Mini-Cluster on Teaching about the 1%, the Rich, the Upper Class, the Ruling Class

    Directory of Open Access Journals (Sweden)

    Marcial González

    2015-02-01

    Full Text Available Marcial González, Greg Meyerson, and Richard Ohmann worked together on these three articles. We spoke on a panel organized by the Radical Caucus of the Modern Language Association for MLA's 2014 convention. Our topic was "Teaching About the 1%, the Rich, the Upper Class, the Ruling Class . . . ." As that list suggests, we meant to explore common ways of conceptualizing the wealthiest people in the U.S., and in capitalist society generally. We argued that the best way is to see them structurally, as integral to a class system. And we sketched out ways for teachers to do that.

  14. A dense micro-cluster of Class 0 protostars in NGC 2264 D-MM1

    CERN Document Server

    Teixeira, Paula S; Lada, Charles J

    2007-01-01

    We present sensitive and high angular resolution (~1") 1.3 mm continuum observations of the dusty core D-MM1 in the Spokes cluster in NGC 2264 using the Submillimeter Array. A dense micro-cluster of seven Class 0 sources was detected in a 20" x 20" region with masses between 0.4 and 1.2 solar masses and deconvolved sizes of about 600 AU. We interpret the 1.3 mm emission as arising from the envelopes of the Class 0 protostellar sources. The mean separation of the 11 known sources (SMA Class 0 and previously known infrared sources) within D-MM1 is considerably smaller than the characteristic spacing between sources in the larger Spokes cluster and is consistent with hierarchical thermal fragmentation of the dense molecular gas in this region.

  15. Three classes of plasmid (47-63 kb) carry the type B neurotoxin gene cluster of group II Clostridium botulinum.

    Science.gov (United States)

    Carter, Andrew T; Austin, John W; Weedmark, Kelly A; Corbett, Cindi; Peck, Michael W

    2014-08-01

    Pulsed-field gel electrophoresis and DNA sequence analysis of 26 strains of Group II (nonproteolytic) Clostridium botulinum type B4 showed that 23 strains carried their neurotoxin gene cluster on a 47-63 kb plasmid (three strains lacked any hybridization signal for the neurotoxin gene, presumably having lost their plasmid). Unexpectedly, no neurotoxin genes were found on the chromosome. This apparent constraint on neurotoxin gene transfer to the chromosome stands in marked contrast to Group I C. botulinum, in which neurotoxin gene clusters are routinely found in both locations. The three main classes of type B4 plasmid identified in this study shared different regions of homology, but were unrelated to any Group I or Group III plasmid. An important evolutionary aspect firmly links plasmid class to geographical origin, with one class apparently dominant in marine environments, whereas a second class is dominant in European terrestrial environments. A third class of plasmid is a hybrid between the other two classes, providing evidence for contact between these seemingly geographically separated populations. Mobility via conjugation has been previously demonstrated for the type B4 plasmid of strain Eklund 17B, and similar genes associated with conjugation are present in all type B4 plasmids now described. A plasmid toxin-antitoxin system pemI gene located close to the neurotoxin gene cluster and conserved in each type B4 plasmid class may be important in understanding the mechanism which regulates this unique and unexpected bias toward plasmid-borne neurotoxin genes in Group II C. botulinum type B4.

  16. Robust cluster analysis and variable selection

    CERN Document Server

    Ritter, Gunter

    2014-01-01

    Clustering remains a vibrant area of research in statistics. Although there are many books on this topic, there are relatively few that are well founded in the theoretical aspects. In Robust Cluster Analysis and Variable Selection, Gunter Ritter presents an overview of the theory and applications of probabilistic clustering and variable selection, synthesizing the key research results of the last 50 years. The author focuses on the robust clustering methods he found to be the most useful on simulated data and real-time applications. The book provides clear guidance for the varying needs of bot

  17. Cluster analysis for computer workload evaluation

    CERN Document Server

    Landau, K

    1976-01-01

    An introduction to computer workload analysis is given, showing its range of application in computer centre management, system and application programming. Cluster methods are discussed which can be used in conjunction with workload data and cluster algorithms are adapted to the specific set problem. Several samples of CDC 7600 accounting data, collected at CERN, the European Organization for Nuclear Research, underwent a cluster analysis to determine job groups. The conclusions from resource usage of typical job groups in relation to computer workload analysis are discussed. (17 refs).

  18. ASteCA - Automated Stellar Cluster Analysis

    CERN Document Server

    Perren, Gabriel I; Piatti, Andrés E

    2014-01-01

    We present ASteCA (Automated Stellar Cluster Analysis), a suite of tools designed to fully automate the standard tests applied on stellar clusters to determine their basic parameters. The set of functions included in the code makes use of positional and photometric data to obtain precise and objective values for a given cluster's center coordinates, radius, luminosity function and integrated color magnitude, as well as characterizing through a statistical estimator its probability of being a true physical cluster rather than a random overdensity of field stars. ASteCA incorporates a Bayesian field star decontamination algorithm capable of assigning membership probabilities using photometric data alone. An isochrone fitting process based on the generation of synthetic clusters from theoretical isochrones and selection of the best fit through a genetic algorithm is also present, which allows ASteCA to provide accurate estimates for a cluster's metallicity, age, extinction and distance values along with its unce...

  19. DEPENDENCE ANALYSIS FOR UML CLASS DIAGRAMS

    Institute of Scientific and Technical Information of China (English)

    Wu Fangjun; Yi Tong

    2004-01-01

    Though the Unified Modeling Language (UML) has been widely used in software development, the major problems confronted lie in comprehension and testing. Dependence analysis is an important approach to analyze, understand, test and maintain programs. A new kind of dependence analysis method for UML class diagrams is developed. A set of dependence relations is defined corresponding to the relations among classes. Thus, the dependence graph of a UML class diagram can be constructed from these dependence relations. Based on this model, both slicing and coupling measurement are further given as its two applications.

  20. Cluster Analysis of Adolescent Blogs

    Science.gov (United States)

    Liu, Eric Zhi-Feng; Lin, Chun-Hung; Chen, Feng-Yi; Peng, Ping-Chuan

    2012-01-01

    Emerging web applications and networking systems such as blogs have become popular, and they offer unique opportunities and environments for learners, especially for adolescent learners. This study attempts to explore the writing styles and genres used by adolescents in their blogs by employing content, factor, and cluster analyses. Factor…

  1. [Cluster analysis and its application].

    Science.gov (United States)

    Půlpán, Zdenĕk

    2002-01-01

    The study exploits knowledge-oriented and context-based modifications of well-known (fuzzy) clustering algorithms. Fuzzy sets are inherently suited to coping with linguistic domain knowledge as well. We aim to obtain new information about the environment being explored from rich, diverse data and knowledge.

  2. Merged consensus clustering to assess and improve class discovery with microarray data

    Directory of Open Access Journals (Sweden)

    Jarman Andrew P

    2010-12-01

    Full Text Available Background: One of the most commonly performed tasks when analysing high throughput gene expression data is to use clustering methods to classify the data into groups. There are a large number of methods available to perform clustering, but it is often unclear which method is best suited to the data and how to quantify the quality of the classifications produced. Results: Here we describe an R package containing methods to analyse the consistency of clustering results from any number of different clustering methods using resampling statistics. These methods allow the identification of the best supported clusters and additionally rank cluster members by their fidelity within the cluster. These metrics allow us to compare the performance of different clustering algorithms under different experimental conditions and to select those that produce the most reliable clustering structures. We show the application of this method to simulated data, canonical gene expression experiments and our own novel analysis of genes involved in the specification of the peripheral nervous system in the fruitfly, Drosophila melanogaster. Conclusions: Our package enables users to apply the merged consensus clustering methodology conveniently within the R programming environment, providing both analysis and graphical display functions for exploring clustering approaches. It extends the basic principle of consensus clustering by allowing the merging of results between different methods to provide an averaged clustering robustness. We show that this extension is useful in correcting for the tendency of clustering algorithms to treat outliers differently within datasets. The R package, clusterCons, is freely available at CRAN and sourceforge under the GNU public licence.

  3. Identification of rural landscape classes through a GIS clustering method

    Directory of Open Access Journals (Sweden)

    Irene Diti

    2013-09-01

    Full Text Available The paper presents a methodology aimed at supporting the rural planning process. The analysis of the state of the art of local and regional policies focused on rural and suburban areas, and the study of the scientific literature in the field of spatial analysis methodologies, have allowed the definition of the basic concept of the research. The proposed method, developed in a GIS, is based on spatial metrics selected and defined to cover various agricultural, environmental, and socio-economic components. The specific goal of the proposed methodology is to identify homogeneous extra-urban areas through their objective characterization at different scales. Once areas with intermediate urban-rural characters have been identified, the analysis is then focused on the more detailed definition of periurban agricultural areas. The synthesis of the results of the analysis of the various landscape components is achieved through an original interpretative key which aims to quantify the potential impacts of rural areas on the urban system. This paper presents the general framework of the methodology and some of the main results of its first implementation through an Italian case study.

  4. Using cluster analysis to explore survey data.

    Science.gov (United States)

    Spencer, Llinos; Roberts, Gwerfyl; Irvine, Fiona; Jones, Peter; Baker, Colin

    2007-01-01

    Llinos Haf Spencer reports on the use of the cluster analysis statistical technique in nursing research and uses data from the Welsh Language Awareness in Healthcare Provision in Wales survey as an exemplar. She concludes that cluster analysis is a valuable tool to tease out patterns in data that are not initially evident in bivariate analyses and thus should be considered as a viable option for nursing research.

  5. Cluster Analysis of the Malaysian Hipposideros

    Science.gov (United States)

    Sazali, Siti Nurlydia; Laman, Charlie J.; Abdullah, M. T.

    2008-01-01

    A preliminary study on the morphometric variations among species in the genus Hipposideros was conducted using voucher specimens from the Universiti Malaysia Sarawak (UNIMAS) Zoological Museum and the Department of Wildlife and National Park (DWNP) Kuala Lumpur. A total of 24 individuals from six species of this genus were studied morphologically, with all related body, skull and dental measurements taken and recorded. The cluster analysis of these data shows that the genus Hipposideros is divided into two major clusters in which each species is clearly separated. Cluster analysis among Hipposideros species is useful for aiding species identification.

  6. Cluster Analysis and Clinical Asthma Phenotypes

    Science.gov (United States)

    Shaw, Dominic E.; Berry, Michael A.; Thomas, Michael; Brightling, Christopher E.; Wardlaw, Andrew J.

    2014-01-01

    Rationale: Heterogeneity in asthma expression is multidimensional, including variability in clinical, physiologic, and pathologic parameters. Classification requires consideration of these disparate domains in a unified model. Objectives: To explore the application of a multivariate mathematical technique, k-means cluster analysis, for identifying distinct phenotypic groups. Methods: We performed k-means cluster analysis in three independent asthma populations. Clusters of a population managed in primary care (n = 184) with predominantly mild to moderate disease, were compared with a refractory asthma population managed in secondary care (n = 187). We then compared differences in asthma outcomes (exacerbation frequency and change in corticosteroid dose at 12 mo) between clusters in a third population of 68 subjects with predominantly refractory asthma, clustered at entry into a randomized trial comparing a strategy of minimizing eosinophilic inflammation (inflammation-guided strategy) with standard care. Measurements and Main Results: Two clusters (early-onset atopic and obese, noneosinophilic) were common to both asthma populations. Two clusters characterized by marked discordance between symptom expression and eosinophilic airway inflammation (early-onset symptom predominant and late-onset inflammation predominant) were specific to refractory asthma. Inflammation-guided management was superior for both discordant subgroups leading to a reduction in exacerbation frequency in the inflammation-predominant cluster (3.53 [SD, 1.18] vs. 0.38 [SD, 0.13] exacerbation/patient/yr, P = 0.002) and a dose reduction of inhaled corticosteroid in the symptom-predominant cluster (mean difference, 1,829 μg beclomethasone equivalent/d [95% confidence interval, 307–3,349 μg]; P = 0.02). Conclusions: Cluster analysis offers a novel multidimensional approach for identifying asthma phenotypes that exhibit differences in clinical response to treatment algorithms. PMID:18480428
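
    The k-means technique named in this record can be sketched in a few lines. This is a minimal illustration of Lloyd's algorithm on made-up two-variable data, not the authors' analysis: the function name `kmeans`, the caller-supplied starting centroids, and the sample points are all assumptions introduced here.

```python
def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps.

    points    -- list of equal-length numeric tuples
    centroids -- initial cluster centres (caller-supplied, so runs are reproducible)
    """
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (squared Euclidean).
        new_labels = [
            min(range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids will not move either
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical two-variable data with two obvious groups.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels, centres = kmeans(data, centroids=[data[0], data[3]])
print(labels)
```

    In practice the variables would usually be standardized first and the algorithm restarted from several initializations, since k-means only finds a local optimum.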

  7. Performance Analysis of Hierarchical Clustering Algorithm

    Directory of Open Access Journals (Sweden)

    K.Ranjini

    2011-07-01

    Full Text Available Clustering is the classification of objects into different groups, or more precisely, the partitioning of a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait - often proximity according to some defined distance measure. Data clustering is a common technique for statistical data analysis, which is used in many fields, including machine learning, data mining, pattern recognition, image analysis and bioinformatics. This paper explains the implementation of agglomerative and divisive clustering algorithms applied on various types of data. Details of the victims of the 2004 tsunami in Thailand were taken as the test data. Visual programming is used for the implementation, and the running times of the algorithms using different linkages (agglomerative) on different types of data are taken for analysis.
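
    To make the role of the linkage concrete, here is a naive agglomerative sketch in pure Python. It is an illustration only, not the paper's implementation: the function name `agglomerative`, the three linkage options, and the sample points are assumptions, and the divisive variant is not covered.

```python
import math
from itertools import combinations


def agglomerative(points, k, linkage="single"):
    """Merge the two closest clusters repeatedly until only k remain.

    Returns a list of clusters, each a sorted list of point indices.
    """
    def cluster_distance(a, b):
        # All pairwise Euclidean distances between members of clusters a and b.
        d = [math.dist(points[i], points[j]) for i in a for j in b]
        if linkage == "single":    # closest pair
            return min(d)
        if linkage == "complete":  # farthest pair
            return max(d)
        if linkage == "average":   # mean over all pairs
            return sum(d) / len(d)
        raise ValueError(f"unknown linkage: {linkage}")

    clusters = [[i] for i in range(len(points))]  # start with singletons
    while len(clusters) > k:
        # combinations() yields index pairs with a < b, so deleting b is safe.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: cluster_distance(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical one-dimensional data.
pts = [(0.0,), (1.0,), (5.0,), (6.0,), (20.0,)]
print(agglomerative(pts, k=3, linkage="single"))
print(agglomerative(pts, k=2, linkage="complete"))
```

    This version recomputes every inter-cluster distance on each pass, so it is only suitable for small inputs; running-time comparisons like those in the record depend heavily on how the distance updates are implemented.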

  8. Clustering analysis of telecommunication customers

    Institute of Scientific and Technical Information of China (English)

    REN Hong; ZHENG Yan; WU Ye-rong

    2009-01-01

    In this article, a clustering method based on genetic algorithm (GA) for telecommunication customer subdivision is presented. First, the features of telecommunication customers (such as the calling behavior and consuming behavior) are extracted. Second, the similarities between the multidimensional feature vectors of telecommunication customers are computed and mapped as the distance between samples on a two-dimensional plane. Finally, the distances are adjusted to approximate the similarities gradually by GA. One advantage of this method is the independent distribution of the sample space. The experiments demonstrate the feasibility of the proposed method.

  9. Filtering Genes for Cluster and Network Analysis

    Directory of Open Access Journals (Sweden)

    Parkhomenko Elena

    2009-06-01

    Full Text Available Background: Prior to cluster analysis or genetic network analysis it is customary to filter, or remove, genes considered to be irrelevant from the set of genes to be analyzed. Often genes whose variation across samples is less than an arbitrary threshold value are deleted. This can improve interpretability and reduce bias. Results: This paper introduces modular models for representing network structure in order to study the relative effects of different filtering methods. We show that cluster analysis and principal components are strongly affected by filtering. Filtering methods intended specifically for cluster and network analysis are introduced and compared by simulating modular networks with known statistical properties. To study more realistic situations, we analyze simulated "real" data based on well-characterized E. coli and S. cerevisiae regulatory networks. Conclusion: The methods introduced apply very generally, to any similarity matrix describing gene expression. One of the proposed methods, SUMCOV, performed well for all models simulated.
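
    The customary threshold filter described in this record's background can be sketched as follows. This shows only the simple variance cut-off, not the SUMCOV method the paper proposes; the function name `filter_by_variance`, the gene names, the expression values, and the threshold are all hypothetical.

```python
from statistics import variance


def filter_by_variance(expression, threshold):
    """Keep genes whose sample variance across samples is at least `threshold`.

    expression -- dict mapping gene name to a list of expression values
    """
    return {gene: values
            for gene, values in expression.items()
            if variance(values) >= threshold}


# Hypothetical expression values for three genes across four samples.
expression = {
    "geneA": [5.0, 5.1, 4.9, 5.0],   # nearly constant
    "geneB": [1.0, 4.0, 7.0, 10.0],  # strongly varying
    "geneC": [2.0, 2.0, 2.0, 2.0],   # constant
}
kept = filter_by_variance(expression, threshold=0.5)
print(list(kept))
```

    Because the threshold is arbitrary, the set of genes that reaches the clustering step changes with it, which is one way such filtering can influence downstream results.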

  10. Structural variation of the ribosomal gene cluster within the class Insecta

    Energy Technology Data Exchange (ETDEWEB)

    Mukha, D.V.; Sidorenko, A.P.; Lazebnaya, I.V. [Vavilov Institute of General Genetics, Moscow (Russian Federation)] [and others

    1995-09-01

    General estimation of ribosomal DNA variation within the class Insecta is presented. It is shown that, using blot-hybridization, one can detect differences in the structure of the ribosomal gene cluster not only between genera within an order, but also between species within a genus, including sibling species. Structure of the ribosomal gene cluster of the Coccinellidae family (ladybirds) is analyzed. It is shown that cloned highly conservative regions of ribosomal DNA of Tetrahymena pyriformis can be used as probes for analyzing ribosomal genes in insects. 24 refs., 4 figs.

  11. Clustering analysis of seismicity and aftershock identification.

    Science.gov (United States)

    Zaliapin, Ilya; Gabrielov, Andrei; Keilis-Borok, Vladimir; Wong, Henry

    2008-07-01

    We introduce a statistical methodology for clustering analysis of seismicity in the time-space-energy domain and use it to establish the existence of two statistically distinct populations of earthquakes: clustered and nonclustered. This result can be used, in particular, for nonparametric aftershock identification. The proposed approach expands the analysis of Baiesi and Paczuski [Phys. Rev. E 69, 066106 (2004), doi:10.1103/PhysRevE.69.066106] based on the space-time-magnitude nearest-neighbor distance eta between earthquakes. We show that for a homogeneous Poisson marked point field with exponential marks, the distance eta has the Weibull distribution, which bridges our results with classical correlation analysis for point fields. The joint 2D distribution of spatial and temporal components of eta is used to identify the clustered part of a point field. The proposed technique is applied to several seismicity models and to the observed seismicity of southern California.
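
    The record does not spell out the distance eta. A commonly cited form of the Baiesi-Paczuski distance from an earlier event i to a later event j is eta = t * r^d * 10^(-b * m_i), with t the time gap, r the epicentral distance, d a fractal dimension of epicentres and b the Gutenberg-Richter b-value. The sketch below assumes that form (without any normalizing constant); the function names, the values d = 1.6 and b = 1.0, and the toy catalogue are hypothetical.

```python
import math


def eta(parent, child, d=1.6, b=1.0):
    """Space-time-magnitude distance from an earlier event to a later one.

    Events are (time, x, y, magnitude) tuples. The distance is infinite when
    the candidate parent does not precede the child.
    """
    t_i, x_i, y_i, m_i = parent
    t_j, x_j, y_j, _ = child
    dt = t_j - t_i
    if dt <= 0:
        return math.inf
    r = math.hypot(x_j - x_i, y_j - y_i)
    return dt * r ** d * 10 ** (-b * m_i)


def nearest_parents(events, d=1.6, b=1.0):
    """For each event, return (eta, index) of its nearest earlier neighbour."""
    result = []
    for j, child in enumerate(events):
        best = min(((eta(p, child, d, b), i)
                    for i, p in enumerate(events) if i != j),
                   default=(math.inf, None))
        # An event with no earlier neighbour has no parent.
        result.append(best if best[0] < math.inf else (math.inf, None))
    return result


# Hypothetical catalogue: a large event followed by two smaller ones.
catalogue = [(0.0, 0.0, 0.0, 6.0), (1.0, 1.0, 0.0, 3.0), (2.0, 50.0, 0.0, 3.0)]
links = nearest_parents(catalogue)
print([parent for _, parent in links])
```

    Thresholding the resulting nearest-neighbor distances is one way to separate a clustered population from a background one, which is the kind of split the record describes.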

  12. A class of spherical, truncated, anisotropic models for application to globular clusters

    Science.gov (United States)

    de Vita, Ruggero; Bertin, Giuseppe; Zocchi, Alice

    2016-05-01

    Recently, a class of non-truncated, radially anisotropic models (the so-called f(ν)-models), originally constructed in the context of violent relaxation and modelling of elliptical galaxies, has been found to possess interesting qualities in relation to observed and simulated globular clusters. In view of new applications to globular clusters, we improve this class of models along two directions. To make them more suitable for the description of small stellar systems hosted by galaxies, we introduce a "tidal" truncation by means of a procedure that guarantees full continuity of the distribution function. The new fT(ν)-models are shown to provide a better fit to the observed photometric and spectroscopic profiles for a sample of 13 globular clusters studied earlier by means of non-truncated models; interestingly, the best-fit models also perform better with respect to the radial-orbit instability. Then, we design a flexible but simple two-component family of truncated models to study the separate issues of mass segregation and multiple populations. We do not aim at a fully realistic description of globular clusters to compete with the description currently obtained by means of dedicated simulations. The goal here is to try to identify the simplest models, that is, those with the smallest number of free parameters, but still have the capacity to provide a reasonable description for clusters that are evidently beyond the reach of one-component models. With this tool, we aim at identifying the key factors that characterize mass segregation or the presence of multiple populations. To reduce the relevant parameter space, we formulate a few physical arguments based on recent observations and simulations. A first application to two well-studied globular clusters is briefly described and discussed.

  13. Cluster and constraint analysis in tetrahedron packings.

    Science.gov (United States)

    Jin, Weiwei; Lu, Peng; Liu, Lufeng; Li, Shuixiang

    2015-04-01

    The disordered packings of tetrahedra often show no obvious macroscopic orientational or positional order for a wide range of packing densities, and it has been found that the local order in particle clusters is the main order form of tetrahedron packings. Therefore, a cluster analysis is carried out to investigate the local structures and properties of tetrahedron packings in this work. We obtain a cluster distribution of differently sized clusters, and peaks are observed at two special clusters, i.e., dimer and wagon wheel. We then calculate the amounts of dimers and wagon wheels, which are observed to have linear or approximate linear correlations with packing density. Following our previous work, the amount of particles participating in dimers is used as an order metric to evaluate the order degree of the hierarchical packing structure of tetrahedra, and an order map is consequently depicted. Furthermore, a constraint analysis is performed to determine the isostatic or hyperstatic region in the order map. We employ a Monte Carlo algorithm to test jamming and then suggest a new maximally random jammed packing of hard tetrahedra from the order map with a packing density of 0.6337.

  14. Cluster randomized clinical trials in orthodontics: design, analysis and reporting issues.

    Science.gov (United States)

    Pandis, Nikolaos; Walsh, Tanya; Polychronopoulou, Argy; Eliades, Theodore

    2013-10-01

    Cluster randomized trials (CRTs) use as the unit of randomization clusters, which are usually defined as a collection of individuals sharing some common characteristics. Common examples of clusters include entire dental practices, hospitals, schools, school classes, villages, and towns. Additionally, several measurements (repeated measurements) taken on the same individual at different time points are also considered to be clusters. In dentistry, CRTs are applicable as patients may be treated as clusters containing several individual teeth. CRTs require certain methodological procedures during sample calculation, randomization, data analysis, and reporting, which are often ignored in dental research publications. In general, due to similarity of the observations within clusters, each individual within a cluster provides less information compared with an individual in a non-clustered trial. Therefore, clustered designs require larger sample sizes compared with non-clustered randomized designs, and special statistical analyses that account for the fact that observations within clusters are correlated. It is the purpose of this article to highlight with relevant examples the important methodological characteristics of cluster randomized designs as they may be applied in orthodontics and to explain the problems that may arise if clustered observations are erroneously treated and analysed as independent (non-clustered).
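
    The sample-size penalty described in this record is usually quantified with the standard design effect, 1 + (m - 1) * ICC, where m is the number of observations per cluster and ICC is the intracluster correlation coefficient. The formula is not stated in the abstract itself, so the sketch below is an illustration under that assumption; the function names and all numbers are hypothetical.

```python
import math


def design_effect(cluster_size, icc):
    """Variance inflation factor for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1) * icc


def clustered_sample_size(n_individual, cluster_size, icc):
    """Inflate a sample size computed for independent observations."""
    return math.ceil(n_individual * design_effect(cluster_size, icc))


# Hypothetical example: 5 teeth per patient, ICC of 0.25 between teeth of the
# same patient, and 100 teeth required if observations were independent.
deff = design_effect(cluster_size=5, icc=0.25)
n_teeth = clustered_sample_size(100, cluster_size=5, icc=0.25)
print(deff, n_teeth)
```

    With an ICC of zero or one observation per cluster the design effect is 1, and the clustered design needs no more observations than an individually randomized one.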

  15. Psychiatric comorbidity among adults with schizophrenia: a latent class analysis.

    Science.gov (United States)

    Tsai, Jack; Rosenheck, Robert A

    2013-11-30

    Schizophrenia is a severe mental illness that often co-occurs with and can be exacerbated by other psychiatric conditions. There have not been adequate efforts to examine schizophrenia and psychiatric comorbidity beyond pairwise examination using clusters of diagnoses. This study used latent class analysis to characterize patterns of 5-year psychiatric comorbidity among a national sample of adults with schizophrenia. Baseline data from 1446 adults with schizophrenia across 57 sites in the United States were analyzed. Three latent classes were identified labeled Solely Schizophrenia, Comorbid Anxiety and Depressive Disorders with Schizophrenia, and Comorbid Addiction and Schizophrenia. Adults in the Solely Schizophrenia class had significantly better mental health than those in the two comorbid classes, but poorer illness and treatment insight than those with comorbid anxiety and depressive disorders. These results suggest that addiction and schizophrenia may represent a separate latent profile from depression, anxiety, and schizophrenia. More research is needed on how treatment can take advantage of the greater insight possessed by those with schizophrenia and comorbid anxiety and depression.

  16. Cluster analysis of obesity and asthma phenotypes.

    Directory of Open Access Journals (Sweden)

    E Rand Sutherland

    Full Text Available BACKGROUND: Asthma is a heterogeneous disease with variability among patients in characteristics such as lung function, symptoms and control, body weight, markers of inflammation, and responsiveness to glucocorticoids (GC). Cluster analysis of well-characterized cohorts can advance understanding of disease subgroups in asthma and point to unsuspected disease mechanisms. We utilized a hypothesis-free cluster analytical approach to define the contribution of obesity and related variables to asthma phenotype. METHODOLOGY AND PRINCIPAL FINDINGS: In a cohort of clinical trial participants (n = 250), minimum-variance hierarchical clustering was used to identify clinical and inflammatory biomarkers important in determining disease cluster membership in mild and moderate persistent asthmatics. In a subset of participants, GC sensitivity was assessed via expression of GC receptor alpha (GCRα) and induction of MAP kinase phosphatase-1 (MKP-1) expression by dexamethasone. Four asthma clusters were identified, with body mass index (BMI, kg/m²) and severity of asthma symptoms (AEQ score) the most significant determinants of cluster membership (F = 57.1, p<0.0001 and F = 44.8, p<0.0001, respectively). Two clusters were composed of predominantly obese individuals; these two obese asthma clusters differed from one another with regard to age of asthma onset, measures of asthma symptoms (AEQ) and control (ACQ), exhaled nitric oxide concentration (FENO) and airway hyperresponsiveness (methacholine PC20) but were similar with regard to measures of lung function (FEV1 (%) and FEV1/FVC), airway eosinophilia, IgE, leptin, adiponectin and C-reactive protein (hsCRP). Members of obese clusters demonstrated evidence of reduced expression of GCRα, a finding which was correlated with a reduced induction of MKP-1 expression by dexamethasone. CONCLUSIONS AND SIGNIFICANCE: Obesity is an important determinant of asthma phenotype in adults. There is heterogeneity in

  17. Incremental multi-class semi-supervised clustering regularized by Kalman filtering.

    Science.gov (United States)

    Mehrkanoon, Siamak; Agudelo, Oscar Mauricio; Suykens, Johan A K

    2015-11-01

    This paper introduces an on-line semi-supervised learning algorithm formulated as a regularized kernel spectral clustering (KSC) approach. We consider the case where new data arrive sequentially but only a small fraction of it is labeled. The available labeled data act as prototypes and help to improve the performance of the algorithm to estimate the labels of the unlabeled data points. We adopt a recently proposed multi-class semi-supervised KSC based algorithm (MSS-KSC) and make it applicable for on-line data clustering. Given a few user-labeled data points the initial model is learned and then the class membership of the remaining data points in the current and subsequent time instants are estimated and propagated in an on-line fashion. The update of the memberships is carried out mainly using the out-of-sample extension property of the model. Initially the algorithm is tested on computer-generated data sets, then we show that video segmentation can be cast as a semi-supervised learning problem. Furthermore we show how the tracking capabilities of the Kalman filter can be used to provide the labels of objects in motion and thus regularizing the solution obtained by the MSS-KSC algorithm. In the experiments, we demonstrate the performance of the proposed method on synthetic data sets and real-life videos where the clusters evolve in a smooth fashion over time.

  18. An Analysis of Particle Swarm Optimization with Data Clustering-Technique for Optimization in Data Mining

    Directory of Open Access Journals (Sweden)

    Amreen Khan,

    2010-07-01

    Full Text Available Data clustering is a popular approach for automatically finding classes, concepts, or groups of patterns. Clustering aims at representing large datasets by a smaller number of prototypes or clusters. It brings simplicity to modeling data and thus plays a central role in the process of knowledge discovery and data mining. Data mining tasks require fast and accurate partitioning of huge datasets, which may come with a variety of attributes or features. This imposes severe computational requirements on the relevant clustering techniques. A family of bio-inspired algorithms, well known as Swarm Intelligence (SI), has recently emerged that meets these requirements and has successfully been applied to a number of real-world clustering problems. This paper looks into the use of Particle Swarm Optimization for cluster analysis. The effectiveness of Fuzzy C-means clustering provides enhanced performance and maintains more diversity in the swarm and also allows the particles to be robust to trace the changing environment.

  19. Clustering Analysis on E-commerce Transaction Based on K-means Clustering

    Directory of Open Access Journals (Sweden)

    Xuan HUANG

    2014-02-01

    Full Text Available When facing a large volume of high-dimensional transaction data, clustering algorithms based on density, increments, grids and so on usually show shortcomings such as poor elasticity, weak handling of high-dimensional data, sensitivity to the time sequence of the data, poor independence of parameters and weak handling of noise. From experiments on sampled data for 300 mobile phones on Taobao, the following conclusions can be drawn: compared with the Single-pass clustering algorithm, the K-means clustering algorithm has a high intra-class dissimilarity and inter-class similarity when analyzing e-commerce transactions. In addition, the K-means clustering algorithm has very high efficiency and strong elasticity when dealing with a large number of data items. However, the clustering results of this algorithm are affected by the number of clusters and the initial positions of the cluster centers, so the results easily end in a local optimum. How to determine the number of clusters and the initial positions of the cluster centers therefore remains an important topic for future research.
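
    The sensitivity of K-means to the initial cluster centers, noted in the abstract above, can be illustrated with a minimal sketch. This is not the authors' method: it is plain Lloyd-style K-means with random restarts, and the function names (`kmeans`, `best_of_restarts`) and the choice of 20 restarts are assumptions made for illustration only.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, max_iter=100):
    """One K-means run from a random initialization.

    Returns (centers, inertia), where inertia is the sum of squared
    distances from each point to its nearest center.
    """
    centers = rng.sample(points, k)  # initial centers drawn from the data
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        new_centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    inertia = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, inertia


def best_of_restarts(points, k, restarts=20, seed=0):
    """Hypothetical helper: rerun K-means and keep the lowest-inertia result."""
    rng = random.Random(seed)
    return min((kmeans(points, k, rng) for _ in range(restarts)), key=lambda r: r[1])
```

    A single run can stop in a local optimum (for example, two centers inside one group of points), whereas keeping the best of several restarts reduces, but does not remove, that risk. Choosing the number of clusters is a separate problem that this sketch does not address.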

  20. Predicting the decision to pursue mediation in civil disputes: a hierarchical classes analysis.

    Science.gov (United States)

    Reich, Warren A; Kressel, Kenneth; Scanlon, Kathleen M; Weiner, Gary A

    2007-11-01

    Clients (N = 185) involved in civil court cases completed the CPR Institute's Mediation Screen, which is designed to assist in making a decision about pursuing mediation. The authors modeled data using hierarchical classes analysis (HICLAS), a clustering algorithm that places clients into 1 set of classes and CPRMS items into another set of classes. HICLAS then links the sets of classes so that any class of clients can be identified in terms of the classes of items they endorsed. HICLAS-derived item classes reflected 2 underlying themes: (a) suitability of the dispute for a problem-solving process and (b) potential benefits of mediation. All clients who perceived that mediation would be beneficial also believed that the context of their conflict was favorable to mediation; however, not all clients who saw a favorable context believed they would benefit from mediation. The majority of clients who agreed to pursue mediation endorsed items reflecting both contextual suitability and perceived benefits of mediation.

  1. MANNER OF STOCKS SORTING USING CLUSTER ANALYSIS METHODS

    Directory of Open Access Journals (Sweden)

    Jana Halčinová

    2014-06-01

    Full Text Available The aim of the present article is to show the possibility of using cluster analysis methods to classify stocks of finished products. Cluster analysis creates groups (clusters) of finished products according to similarity in demand, i.e. customer requirements for each product. The sorting of stocks of finished products by clusters is described using a practical example. The resulting clusters are incorporated into the draft layout of the distribution warehouse.

  2. AMOEBA clustering revisited. [cluster analysis, classification, and image display program]

    Science.gov (United States)

    Bryant, Jack

    1990-01-01

    A description of the clustering, classification, and image display program AMOEBA is presented. Using a difficult high resolution aircraft-acquired MSS image, the steps the program takes in forming clusters are traced. A number of new features are described here for the first time. Usage of the program is discussed. The theoretical foundation (the underlying mathematical model) is briefly presented. The program can handle images of any size and dimensionality.

  3. Using existing questionnaires in latent class analysis

    DEFF Research Database (Denmark)

    Nielsen, Anne Molgaard; Vach, Werner; Kent, Peter;

    2016-01-01

    BACKGROUND: Latent class analysis (LCA) is increasingly being used in health research, but optimal approaches to handling complex clinical data are unclear. One issue is that commonly used questionnaires are multidimensional, but expressed as summary scores. Using the example of low back pain (LBP...... classified into four health domains (psychology, pain, activity, and participation) using the World Health Organization's International Classification of Functioning, Disability, and Health framework. LCA was performed within each health domain using the strategies of summary-score and single-item analyses...

  4. Cluster Analysis of the Newcastle Electronic Corpus of Tyneside English: A Comparison of Methods

    NARCIS (Netherlands)

    Moisl, Hermann; Jones, Val

    2005-01-01

    This article examines the feasibility of an empirical approach to sociolinguistic analysis of the Newcastle Electronic Corpus of Tyneside English using exploratory multivariate methods. It addresses a known problem with one class of such methods, hierarchical cluster analysis: that different clusteri

  5. Equivalent damage validation by variable cluster analysis

    Science.gov (United States)

    Drago, Carlo; Ferlito, Rachele; Zucconi, Maria

    2016-06-01

    The main aim of this work is to perform a clustering analysis on the damage surveyed in the old center of L'Aquila after the earthquake of April 6, 2009, and to validate an Indicator of Equivalent Damage (ED) that summarizes the information reported on the AeDES card regarding the level of damage and its extent over the surface of the buildings. In particular, we used a sample of 13442 masonry buildings located in an area characterized by a Macroseismic Intensity equal to 8 [1]. The aim is to ensure coherence between the clusters, and their hierarchy, identified in the surveyed damage data and in the computed ED data.

  6. Data Clustering Analysis Based on Wavelet Feature Extraction

    Institute of Scientific and Technical Information of China (English)

    QIAN Yuntao; TANG Yuanyan

    2003-01-01

    A novel wavelet-based data clustering method is presented in this paper, which includes wavelet feature extraction and a cluster growing algorithm. The wavelet transform can provide rich and diversified information for representing the global and local inherent structures of a dataset. Therefore, it is a very powerful tool for clustering feature extraction. As an unsupervised classification, the target of clustering analysis is dependent on the specific clustering criteria. Several criteria that should be considered for a general-purpose clustering algorithm are proposed, and the cluster growing algorithm is constructed to connect the clustering criteria with the wavelet features. Compared with other popular clustering methods, our clustering approach provides multi-resolution clustering results, needs few prior parameters, correctly deals with irregularly shaped clusters, and is insensitive to noise and outliers. As this wavelet-based clustering method is aimed at solving the two-dimensional data clustering problem, for high-dimensional datasets a self-organizing map and the U-matrix method are applied to transform them into two-dimensional Euclidean space, so that high-dimensional data clustering analysis is also possible. Results on some simulated data and standard test data are reported to illustrate the power of our method.

  7. Constructing storyboards based on hierarchical clustering analysis

    Science.gov (United States)

    Hasebe, Satoshi; Sami, Mustafa M.; Muramatsu, Shogo; Kikuchi, Hisakazu

    2005-07-01

    There is a growing need for quick previews of video content, both to improve the accessibility of video archives and to reduce network traffic. In this paper, a storyboard that contains a user-specified number of keyframes is produced from a given video sequence. It is based on hierarchical cluster analysis of feature vectors that are derived from wavelet coefficients of video frames. Consistent use of extracted feature vectors is the key to avoiding repeated, computationally intensive parsing of the same video sequence. Experimental results suggest that a significant reduction in computational time is gained by this strategy.

  8. A PAC-Bayesian Analysis of Graph Clustering and Pairwise Clustering

    CERN Document Server

    Seldin, Yevgeny

    2010-01-01

    We formulate weighted graph clustering as a prediction problem: given a subset of edge weights we analyze the ability of graph clustering to predict the remaining edge weights. This formulation enables practical and theoretical comparison of different approaches to graph clustering as well as comparison of graph clustering with other possible ways to model the graph. We adapt the PAC-Bayesian analysis of co-clustering (Seldin and Tishby, 2008; Seldin, 2009) to derive a PAC-Bayesian generalization bound for graph clustering. The bound shows that graph clustering should optimize a trade-off between empirical data fit and the mutual information that clusters preserve on the graph nodes. A similar trade-off derived from information-theoretic considerations was already shown to produce state-of-the-art results in practice (Slonim et al., 2005; Yom-Tov and Slonim, 2009). This paper supports the empirical evidence by providing a better theoretical foundation, suggesting formal generalization guarantees, and offering...

  9. Latent Class Analysis of Incomplete Data via an Entropy-Based Criterion.

    Science.gov (United States)

    Larose, Chantal; Harel, Ofer; Kordas, Katarzyna; Dey, Dipak K

    2016-09-01

    Latent class analysis is used to group categorical data into classes via a probability model. Model selection criteria then judge how well the model fits the data. When addressing incomplete data, the current methodology restricts the imputation to a single, pre-specified number of classes. We seek to develop an entropy-based model selection criterion that does not restrict the imputation to one number of clusters. Simulations show the new criterion performing well against the current standards of AIC and BIC, while a family studies application demonstrates how the criterion provides more detailed and useful results than AIC and BIC.
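
    For reference, the two baseline criteria named in the abstract above, AIC and BIC, can be computed from a fitted model's log-likelihood as in the following minimal sketch. It does not reproduce the authors' entropy-based criterion, which the abstract does not specify. The parameter count assumes a standard unrestricted latent class model, and the function names are hypothetical.

```python
import math


def lca_free_parameters(n_classes, categories_per_item):
    """Free parameters of an unrestricted latent class model (assumed form).

    (n_classes - 1) class weights, plus, for every class and item,
    (number of categories - 1) response probabilities.
    """
    class_weights = n_classes - 1
    item_probs = n_classes * sum(k - 1 for k in categories_per_item)
    return class_weights + item_probs


def aic(log_lik, n_params):
    """Akaike information criterion: 2p - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_lik


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: p ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_lik
```

    In use, one would fit models with different numbers of classes, compute each criterion from the maximized log-likelihood, and compare the values across models; the fitting step itself is outside this sketch.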

  10. Research on Class-Profile-Based Hierarchical Clustering Method

    Institute of Scientific and Technical Information of China (English)

    孟海东; 唐旋

    2011-01-01

    Traditional clustering algorithms often fail to take both the connectivity and the similarity characteristics among classes into account at the same time. This paper first presents the basic definitions of class boundary points and class profiles, together with methods for finding them. Second, considering both the connectivity and the similarity characteristics among classes, it defines several standards and methods for measuring inter-class similarity. Finally, it proposes a class-profile-based hierarchical clustering algorithm, which can effectively process clusters of arbitrary shape and can distinguish isolated points and noise data. The feasibility and effectiveness of the algorithm are validated through cluster analysis of image data sets and the Iris standard data set.

  11. External Defect classification of Citrus Fruit Images using Linear Discriminant Analysis Clustering and ANN classifiers

    Directory of Open Access Journals (Sweden)

    K.Vijayarekha

    2012-12-01

    Full Text Available Linear Discriminant Analysis (LDA) is one technique for transforming raw data into a new feature space in which classification can be carried out more robustly. It is useful where the within-class frequencies are unequal. This method maximizes the ratio of between-class variance to within-class variance in any particular data set, and the maximal separability is guaranteed. LDA clustering models are used to classify objects into different categories. This study makes use of LDA for clustering the features obtained for the citrus fruit images taken in five different domains. Sub-windows of size 40x40 are cropped from the citrus fruit images having defects such as pitting, splitting and stem end rot. Features are extracted in four domains: statistical features, Fourier transform based features, discrete wavelet transform based features and stationary wavelet transform based features. The results of clustering and classification using LDA and ANN classifiers are reported.

  12. Clustered Stomates in "Begonia": An Exercise in Data Collection & Statistical Analysis of Biological Space

    Science.gov (United States)

    Lau, Joann M.; Korn, Robert W.

    2007-01-01

    In this article, the authors present a laboratory exercise in data collection and statistical analysis in biological space using clustered stomates on leaves of "Begonia" plants. The exercise can be done in middle school classes by students making their own slides and seeing imprints of cells, or at the high school level through collecting data of…

  13. A Spitzer Survey of Young Stellar Clusters within One Kiloparsec of the Sun: Cluster Core Extraction and Basic Structural Analysis

    CERN Document Server

    Gutermuth, R A; Myers, P C; Allen, L E; Pipher, J L; Fazio, G G

    2009-01-01

    We present a uniform mid-infrared imaging and photometric survey of 36 young, nearby, star-forming clusters and groups using Spitzer IRAC and MIPS. We have confidently identified and classified 2548 young stellar objects using recently established mid-infrared color-based methods. We have devised and applied a new algorithm for the isolation of local surface density enhancements from point source distributions, enabling us to extract the overdense cores of the observed star forming regions for further analysis. We have compiled several basic structural measurements of these cluster cores from the data, such as mean surface densities of sources, cluster core radii, and aspect ratios, in order to characterize the ranges for these quantities. We find that a typical cluster core is 0.39 pc in radius, has 26 members with infrared excess in a ratio of Class II to Class I sources of 3.7, is embedded in a $A_K$=0.8 mag cloud clump, and has a surface density of 60 pc$^{-2}$. We examine the nearest neighbor dista...

  14. Organo-Zintl Clusters [P7R4]: A New Class of Superalkalis.

    Science.gov (United States)

    Giri, Santanab; Reddy, G N; Jena, Puru

    2016-03-03

    Zintl ions composed of Group 13, 14, and 15 elements are multiply charged cluster anions that form the building blocks of the Zintl phase. Superalkalis, on the other hand, are cationic clusters that mimic the chemistry of the alkali atoms. It is, therefore, counterintuitive to expect that Zintl anions can be used as a core to construct superalkalis. In this paper, using density functional theory, we show that this is indeed possible. The results are compared with calculations at the MP2 level of theory. A systematic study of a P7(3-) Zintl core decorated with organic ligands [R = Me, CH2Me, CH(Me)2 and C(Me)3] shows that the ionization energies of some of the P7R4 species are smaller than those of the alkali atoms and hence can be classified as superalkalis. This opens the door to the design and synthesis of a new class of superalkali moieties apart from the traditional ones composed of only inorganic elements.

  15. A conserved cluster of three PRD-class homeobox genes (homeobrain, rx and orthopedia) in the Cnidaria and Protostomia

    Directory of Open Access Journals (Sweden)

    Mazza Maureen E

    2010-07-01

    Full Text Available Abstract. Background: Homeobox genes are a superclass of transcription factors with diverse developmental regulatory functions, which are found in plants, fungi and animals. In animals, several Antennapedia (ANTP)-class homeobox genes reside in extremely ancient gene clusters (for example, the Hox, ParaHox, and NKL clusters) and the evolution of these clusters has been implicated in the morphological diversification of animal bodyplans. By contrast, similarly ancient gene clusters have not been reported among the other classes of homeobox genes (that is, the LIM, POU, PRD and SIX classes). Results: Using a combination of in silico queries and phylogenetic analyses, we found that a cluster of three PRD-class homeobox genes (Homeobrain (hbn), Rax (rx) and Orthopedia (otp)) is present in cnidarians, insects and mollusks (a partial cluster comprising hbn and rx is present in the placozoan Trichoplax adhaerens). We failed to identify this 'HRO' cluster in deuterostomes; in fact, the Homeobrain gene appears to be missing from the chordate genomes we examined, although it is present in hemichordates and echinoderms. To illuminate the ancestral organization and function of this ancient cluster, we mapped the constituent genes against the assembled genome of a model cnidarian, the sea anemone Nematostella vectensis, and characterized their spatiotemporal expression using in situ hybridization. In N. vectensis, these genes reside in a span of 33 kb with the same gene order as previously reported in insects. Comparisons of genomic sequences and expressed sequence tags revealed the presence of alternative transcripts of Nv-otp and two highly unusual protein-coding polymorphisms in the terminal helix of the Nv-rx homeodomain. A population genetic survey revealed the Rx polymorphisms to be widespread in natural populations. During larval development, all three genes are expressed in the ectoderm, in non-overlapping territories along the oral-aboral axis, with distinct

  16. Clustering Analysis for Credit Default Probabilities in a Retail Bank Portfolio

    Directory of Open Access Journals (Sweden)

    Elena ANDREI (DRAGOMIR)

    2012-08-01

    Full Text Available Methods underlying cluster analysis are very useful in data analysis, especially when the volume of data to be processed is so large that essential information cannot be extracted unless specific instruments are used to summarize and structure the raw information. In this context, cluster analysis techniques are used particularly for systematic information analysis. The aim of this article is to build a useful model for the banking field, based on data mining techniques, by dividing the borrowers into clusters in order to obtain a profile of the customers (debtors and good payers). We assume that a class is appropriate if it contains members that have a high degree of similarity, and the standard method for measuring the similarity within a group shows the lowest variance. After clustering, data mining techniques are implemented on the cluster with bad debtors, reaching a very high accuracy. The paper is structured as follows: Section 2 describes the model for data analysis based on a specific scoring model that we proposed. In Section 3, we present a cluster analysis using the K-means algorithm, and the DM models are applied on a specific cluster. Section 4 shows the conclusions.

  17. Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale

    CERN Document Server

    Emmons, Scott; Gallant, Mike; Börner, Katy

    2016-01-01

    Notions of community quality underlie network clustering. While studies surrounding network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is lacking. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms -- Blondel, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 o...
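
    Of the stand-alone quality metrics listed in the abstract above, modularity is the simplest to write down. The following is a minimal sketch of the standard Newman modularity for an undirected, unweighted graph without self-loops; it is not taken from the paper, and the edge-list and dictionary input format are assumptions chosen for brevity.

```python
def modularity(edges, community):
    """Newman modularity Q for an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once, no self-loops.
    community: dict mapping every node to a community label.
    Q = sum over communities of (internal edges / m) - (community degree / 2m)^2.
    """
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Number of edges whose two endpoints share a community.
    internal = sum(1 for u, v in edges if community[u] == community[v])
    # Total degree accumulated per community.
    degree_sum = {}
    for node, d in degree.items():
        label = community[node]
        degree_sum[label] = degree_sum.get(label, 0) + d
    expected = sum((d / (2 * m)) ** 2 for d in degree_sum.values())
    return internal / m - expected
```

    For two triangles joined by a single edge, placing each triangle in its own community gives Q = 6/7 - 1/2, while putting all six nodes in one community gives Q = 0.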

  18. Challenges for Cluster Analysis in a Virtual Observatory

    CERN Document Server

    Djorgovski, S G; Mahabal, A A; Williams, R; Granat, R; Stolorz, P

    2002-01-01

    There has been an unprecedented and continuing growth in the volume, quality, and complexity of astronomical data sets over the past few years, mainly through large digital sky surveys. The Virtual Observatory (VO) concept represents a scientific and technological framework needed to cope with this data flood. We review some of the applied statistics and computing challenges posed by the analysis of large and complex data sets expected in VO-based research. The challenges are driven by the size and the complexity of the data sets (billions of data vectors in parameter spaces of tens or hundreds of dimensions), by the heterogeneity of the data and measurement errors, the selection effects and censored data, and by the intrinsic clustering properties (functional form, topology) of the data distribution in the parameter space of observed attributes. Examples of scientific questions one may wish to address include: objective determination of the numbers of object classes present in the data, and the membersh...

  19. Interaction of Fanaroff-Riley class II radio jets with a randomly magnetised intra-cluster medium

    CERN Document Server

    Huarte-Espinosa, Martín; Alexander, Paul

    2011-01-01

    A combination of three-dimensional (3D) magnetohydrodynamics (MHD) and synthetic numerical simulations are presented to follow the evolution of a randomly magnetised plasma that models the intra-cluster medium (ICM), under the isolated effects of powerful, light, hypersonic and bipolar Fanaroff-Riley class II (FR II) jets. We prescribe the cluster magnetic field (CMF) as a Gaussian random field with a Kolmogorov-like energy spectrum. Both the power of the jets and the viewing angle that is used for the synthetic Rotation Measure (RM) observations are investigated. We find the model radio sources introduce and amplify fluctuations on the RM statistical properties which we analyse as a function of time as well as the viewing angle. The average RM and the RM standard deviation are increased by the action of the jets. Energetics, RM statistics and magnetic power spectral analysis consistently show that the effects also correlate with the jets' power, and that the lightest, fastest jets produce the strongest chang...

  20. Cluster analysis of word frequency dynamics

    Science.gov (United States)

    Maslennikova, Yu S.; Bochkarev, V. V.; Belashova, I. A.

    2015-01-01

    This paper describes the analysis and modelling of word usage frequency time series. In one of the previous studies, an assumption was put forward that all word usage frequencies have uniform dynamics approaching the shape of a Gaussian function. This assumption can be checked using the frequency dictionaries of the Google Books Ngram database. This database includes 5.2 million books published between 1500 and 2008. The corpus contains over 500 billion words in American English, British English, French, German, Spanish, Russian, Hebrew, and Chinese. We clustered time series of word usage frequencies using a Kohonen neural network. The similarity between input vectors was estimated using several algorithms. As a result of the neural network training procedure, more than ten different forms of time series were found. They describe the dynamics of word usage frequencies from birth to death of individual words. Different groups of word forms were found to have different dynamics of word usage frequency variations.

  1. Semi Supervised Weighted K-Means Clustering for Multi Class Data Classification

    Directory of Open Access Journals (Sweden)

    Vijaya Geeta Dharmavaram

    2013-01-01

    Full Text Available Supervised learning techniques require a large number of labeled examples to train a classifier model. Research on semi-supervised learning is motivated by the abundance of unlabeled examples, even in domains with a limited number of labeled examples. In such domains, a semi-supervised classifier uses the results of clustering for classifier development, since clustering does not rely only on labeled examples but groups the objects based on their similarities. In this paper, the authors propose a new algorithm for semi-supervised classification, namely Semi Supervised Weighted K-Means (SSWKM). In this algorithm, the authors suggest using a weighted Euclidean distance metric, designed as per the purpose of clustering, to estimate the proximity between a pair of points, and use it to build a semi-supervised classifier. The authors propose a new approach for estimating the weights of features by appropriately adopting the results of multiple discriminant analysis. The proposed method was then tested on benchmark datasets from the UCI repository with varied percentages of labeled examples and was found to be consistent and promising.
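
    The weighted Euclidean distance mentioned in the abstract above can be sketched as follows. This shows only the distance and a nearest-center assignment step. The feature weights are supplied by hand here, whereas the paper derives them from multiple discriminant analysis, and the function names are hypothetical.

```python
import math


def weighted_euclidean(x, y, weights):
    """Weighted Euclidean distance: sqrt(sum_i w_i * (x_i - y_i)^2)."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def nearest_center(point, centers, weights):
    """Index of the center closest to `point` under the weighted distance."""
    return min(
        range(len(centers)),
        key=lambda j: weighted_euclidean(point, centers[j], weights),
    )
```

    Setting a feature's weight to zero removes it from the comparison, so the same point can be assigned to different centers depending on which features the weights emphasize.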

  2. Impacts of fast food and food retail environment on overweight and obesity in China: a multilevel latent class cluster approach

    NARCIS (Netherlands)

    Zhang, Xiaoyong; Lans, I.A. van der; Dagevos, H.

    2012-01-01

    Objective: To simultaneously identify consumer segments based on individual-level consumption and community-level food retail environment data and to investigate whether the segments are associated with BMI and dietary knowledge in China. Design: A multilevel latent class cluster model was applied to

  3. Transition-Metal Planar Boron Clusters: a New Class of Aromatic Compounds with High Coordination

    Science.gov (United States)

    Wang, Lai-Sheng

    2012-06-01

    Photoelectron spectroscopy in combination with computational studies over the past decade has shown that boron clusters possess planar or quasi-planar structures, in contrast to that of bulk boron, which is dominated by three-dimensional cage-like building blocks. All planar or quasi-planar boron clusters are observed to consist of a monocyclic circumference with one or more interior atoms. The propensity for planarity has been found to be due to both σ and π electron delocalization throughout the molecular plane, giving rise to concepts of σ and π double aromaticity. We have found further that the central boron atoms can be substituted by transition metal atoms to form a new class of aromatic compounds, which consist of a central metal atom and a monocyclic boron ring (M B_n). Eight-, nine-, and ten-membered rings of boron have been observed, giving rise to octa-, ennea-, and deca-coordinated aromatic transition metal compounds [1-3]. References: [1] "Aromatic Metal-Centered Monocyclic Boron Rings: Co B_9^- and Ru B_9^-" (Constantin Romanescu, Timur R. Galeev, Wei-Li Li, A. I. Boldyrev, and L. S. Wang), Angew. Chem. Int. Ed. 50, 9334-9337 (2011). [2] "Transition-Metal-Centered Nine-Membered Boron Rings: M B_9 and M B_9^- (M = Rh, Ir)" (Wei-Li Li, Constantin Romanescu, Timur R. Galeev, Zachary Piazza, A. I. Boldyrev, and L. S. Wang), J. Am. Chem. Soc. 134, 165-168 (2012). [3] "Observation of the Highest Coordination Number in Planar Species: Decacoordinated Ta B_10^- and Nb B_9^- Anions" (Timur R. Galeev, Constantin Romanescu, Wei-Li Li, L. S. Wang, and A. I. Boldyrev), Angew. Chem. Int. Ed. 51, 2101-2105 (2012).

  4. Somatotyping using 3D anthropometry: a cluster analysis.

    Science.gov (United States)

    Olds, Tim; Daniell, Nathan; Petkov, John; David Stewart, Arthur

    2013-01-01

    Somatotyping is the quantification of human body shape, independent of body size. Hitherto, somatotyping (including the most popular method, the Heath-Carter system) has been based on subjective visual ratings, sometimes supported by surface anthropometry. This study used data derived from three-dimensional (3D) whole-body scans as inputs for cluster analysis to objectively derive clusters of similar body shapes. Twenty-nine dimensions normalised for body size were measured on a purposive sample of 301 adults aged 17-56 years who had been scanned using a Vitus Smart laser scanner. K-means cluster analysis with v-fold cross-validation was used to determine shape clusters. Three male and three female clusters emerged, and were visualised using those scans closest to the cluster centroid and a caricature defined by doubling the difference between the average scan and the cluster centroid. The male clusters were decidedly endomorphic (high fatness), ectomorphic (high linearity), and endo-mesomorphic (a mixture of fatness and muscularity). The female clusters were clearly endomorphic, ectomorphic, and ecto-mesomorphic (a mixture of linearity and muscularity). An objective shape quantification procedure combining 3D scanning and cluster analysis yielded shape clusters strikingly similar to traditional somatotyping.
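
    The caricature described in the abstract above (doubling the difference between the average scan and the cluster centroid) can be read as a simple extrapolation along each measured dimension. The sketch below shows one plausible interpretation, caricature = average + 2 × (centroid − average). The function name and the list-of-dimensions representation are assumptions, not details from the study.

```python
def caricature(average_scan, cluster_centroid, factor=2.0):
    """Exaggerate a cluster centroid relative to the overall average.

    Each dimension is moved away from the average by `factor` times the
    centroid's offset; factor=2.0 doubles the difference, factor=1.0
    returns the centroid itself.
    """
    if len(average_scan) != len(cluster_centroid):
        raise ValueError("both inputs must have the same number of dimensions")
    return [a + factor * (c - a) for a, c in zip(average_scan, cluster_centroid)]
```

    With an average of [1.0, 2.0] and a centroid of [1.5, 1.0], the doubled caricature is [2.0, 0.0], which makes the direction in which the cluster departs from the average easier to see.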

  5. A hybrid monkey search algorithm for clustering analysis.

    Science.gov (United States)

    Chen, Xin; Zhou, Yongquan; Luo, Qifang

    2014-01-01

    Clustering is a popular data analysis and data mining technique. The k-means clustering algorithm is one of the most commonly used methods. However, it depends heavily on the initial solution and easily falls into a local optimum. In view of these disadvantages of the k-means method, this paper proposes a hybrid monkey algorithm for clustering analysis, based on the search operator of the artificial bee colony algorithm. Experiments on synthetic and real-life datasets show that the algorithm performs better than the basic monkey algorithm for clustering analysis.
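
    The dependence of k-means on its initial solution can be seen in a very small example. The sketch below is plain Lloyd-style k-means written for illustration only (it is not the hybrid monkey algorithm from the paper); the four points and both sets of starting centroids are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))

def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd iterations from the given initial centroids.
    Returns the final centroids and the sum of squared errors (SSE)."""
    centroids = list(centroids)
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Keep an empty cluster's centroid where it was.
        updated = [mean(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:  # converged
            break
        centroids = updated
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, sse

# Two tight pairs of points, far apart along the x axis.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
_, sse_good = kmeans(points, [(0.0, 0.5), (4.0, 0.5)])  # one centroid per pair
_, sse_bad = kmeans(points, [(2.0, 0.0), (2.0, 1.0)])   # centroids split by y
```

    Both runs converge immediately, but the first ends with an SSE of 1.0 and the second stays stuck at 16.0. This kind of local optimum is what restarts or metaheuristic searches try to escape.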

  6. A Hybrid Monkey Search Algorithm for Clustering Analysis

    Directory of Open Access Journals (Sweden)

    Xin Chen

    2014-01-01

    Clustering is a popular data analysis and data mining technique. The k-means clustering algorithm is one of the most commonly used methods. However, it depends heavily on the initial solution and easily falls into a local optimum. In view of these disadvantages of the k-means method, this paper proposes a hybrid monkey algorithm for clustering analysis, based on the search operator of the artificial bee colony algorithm. Experiments on synthetic and real-life datasets show that the algorithm performs better than the basic monkey algorithm for clustering analysis.

  7. Instantaneous normal mode analysis of melting of finite dust clusters.

    Science.gov (United States)

    Melzer, André; Schella, André; Schablinski, Jan; Block, Dietmar; Piel, Alexander

    2012-06-01

    The experimental melting transition of finite two-dimensional dust clusters in a dusty plasma is analyzed using the method of instantaneous normal modes. In the experiment, dust clusters are heated in a thermodynamic equilibrium from a solid to a liquid state using a four-axis laser manipulation system. The fluid properties of the dust cluster, such as the diffusion constant, are measured from the instantaneous normal mode analysis. Thereby, the phase transition of these finite clusters is approached from the liquid phase. From the diffusion constants, unique melting temperatures have been assigned to dust clusters of various sizes that very well reflect their dynamical stability properties.

  8. Smartness and Italian Cities. A Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Flavio Boscacci

    2014-05-01

    Smart cities have been recently recognized as the most pleasing and attractive places to live in; due to this, both scholars and policy-makers pay close attention to this topic. Specifically, urban “smartness” has been identified by plenty of characteristics that can be grouped into six dimensions (Giffinger et al. 2007): smart Economy (competitiveness), smart People (social and human capital), smart Governance (participation), smart Mobility (both ICTs and transport), smart Environment (natural resources), and smart Living (quality of life). According to this analytical framework, the present paper investigates the relation between urban attractiveness and the “smart” characteristics in the 103 Italian NUTS3 province capitals in the year 2011. To this aim, descriptive statistics are followed by a regression analysis (OLS), where the dependent variable measuring urban attractiveness is proxied by housing market prices. In addition, a Cluster Analysis (CA) has been developed in order to find differences and commonalities among the province capitals. The OLS results indicate that living, people and economy are the key drivers for achieving better urban attractiveness. Environment, instead, continues to play a minor role. Besides, the CA groups the province capitals a

  9. Using Cluster Analysis for Data Mining in Educational Technology Research

    Science.gov (United States)

    Antonenko, Pavlo D.; Toy, Serkan; Niederhauser, Dale S.

    2012-01-01

    Cluster analysis is a group of statistical methods that has great potential for analyzing the vast amounts of web server-log data to understand student learning from hyperlinked information resources. In this methodological paper we provide an introduction to cluster analysis for educational technology researchers and illustrate its use through…

  10. An Analysis on an ESL Class

    Institute of Scientific and Technical Information of China (English)

    杜岳青

    2015-01-01

    Input is the precondition of interaction and output, while output promotes the input and the interaction of language, which could increase the effectiveness of input and enhance the chance of the absorption of interaction. By analyzing an ESL class, this paper gives readers a picture of how second language (SL) learners get information, absorb and digest the information, produce language and form their language system in their SL learning. Finally, it gives suggestions on how to ensure language learners have a better chance to get comprehensible input, assimilate language during interaction and output more accurate and comprehensible second language in an ESL class.

  11. PERFORMANCE ANALYSIS OF CLUSTERED RADIO INTERFEROMETRIC CALIBRATION

    NARCIS (Netherlands)

    Kazemi, S.; Yatawatta, S.; Zaroubi, S.

    2012-01-01

    Subtraction of compact, bright sources is essential to produce high quality images in radio astronomy. It has recently been proposed that 'clustered' calibration can perform better in subtracting fainter background sources. This is due to the fact that the effective power of a source cluster is greater th

  12. Fuzzy Clustering

    DEFF Research Database (Denmark)

    Berks, G.; Keyserlingk, Diedrich Graf von; Jantzen, Jan;

    2000-01-01

    ... and clustering are the basic concerns in medicine. Classification depends on definitions of the classes and their required degree of participant of the elements in the cases' symptoms. In medicine imprecise conditions are the rule and therefore fuzzy methods are much more suitable than crisp ones. Fuzzy c-mean clustering is an easy and well improved tool, which has been applied in many medical fields. We used c-mean fuzzy clustering after feature extraction from an aphasia database. Factor analysis was applied on a correlation matrix of 26 symptoms of language disorders and led to five factors. The factors... A symptom may belong to more than one class. For instance to the class of very severe disease and to the class of failure of awareness of the own disturbance. The description of language failures by c-mean classification of analyzed factors correspond in many but not in all cases to the traditional...
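
    The statement that a symptom may belong to more than one class is the core idea of fuzzy c-means: every item receives a degree of membership in every cluster instead of one crisp label. The sketch below uses the standard fuzzy c-means membership formula on one-dimensional values; the cluster centers, the scores and the fuzzifier `m = 2` are hypothetical and are not taken from the aphasia study.

```python
def fuzzy_memberships(point, centers, m=2.0):
    """Standard fuzzy c-means membership of one value in every cluster.
    m > 1 is the fuzzifier; memberships are non-negative and sum to 1."""
    dists = [abs(point - c) for c in centers]
    # A point lying exactly on a center belongs fully to that center.
    if 0.0 in dists:
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    # u_i = d_i^(-power) / sum_k d_k^(-power)
    weights = [d ** -power for d in dists]
    total = sum(weights)
    return [w / total for w in weights]

centers = [0.0, 10.0]                       # hypothetical cluster centers
halfway = fuzzy_memberships(5.0, centers)   # equally close to both centers
near_first = fuzzy_memberships(2.0, centers)
```

    A value halfway between the two centers gets membership 0.5 in each cluster, while a value at 2.0 belongs mostly (16/17) to the first cluster yet keeps a small membership (1/17) in the second. A crisp method would have to assign each of them to exactly one class.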

  13. Internet Gamblers Differ on Social Variables: A Latent Class Analysis.

    Science.gov (United States)

    Khazaal, Yasser; Chatton, Anne; Achab, Sophia; Monney, Gregoire; Thorens, Gabriel; Dufour, Magali; Zullino, Daniele; Rothen, Stephane

    2016-12-27

    Online gambling has gained popularity in the last decade, leading to an important shift in how consumers engage in gambling and in the factors related to problem gambling and prevention. Indebtedness and loneliness have previously been associated with problem gambling. The current study aimed to characterize online gamblers in relation to indebtedness, loneliness, and several in-game social behaviors. The data set was obtained from 584 Internet gamblers recruited online through gambling websites and forums. Of these gamblers, 372 participants completed all study assessments and were included in the analyses. Questionnaires included those on sociodemographics and social variables (indebtedness, loneliness, in-game social behaviors), as well as the Gambling Motives Questionnaire, Gambling Related Cognitions Scale, Internet Addiction Test, Problem Gambling Severity Index, Short Depression-Happiness Scale, and UPPS-P Impulsive Behavior Scale. Social variables were explored with a latent class model. The clusters obtained were compared for psychological measures and three clusters were found: lonely indebted gamblers (cluster 1: 6.5%), not lonely not indebted gamblers (cluster 2: 75.4%), and not lonely indebted gamblers (cluster 3: 18%). Participants in clusters 1 and 3 (particularly in cluster 1) were at higher risk of problem gambling than were those in cluster 2. The three groups differed on most assessed variables, including the Problem Gambling Severity Index, the Short Depression-Happiness Scale, and the UPPS-P subscales (except the sensation seeking subscore). Results highlight significant between-group differences, suggesting that Internet gamblers are not a homogeneous group. Specific intervention strategies could be implemented for groups at risk.

  15. Maximum-entropy clustering algorithm and its global convergence analysis

    Institute of Scientific and Technical Information of China (English)

    ZHANG; Zhihua

    2001-01-01


  16. The smart cluster method - Adaptive earthquake cluster identification and analysis in strong seismic regions

    Science.gov (United States)

    Schaefer, Andreas M.; Daniell, James E.; Wenzel, Friedemann

    2017-03-01

    Earthquake clustering is an essential part of almost any statistical analysis of spatial and temporal properties of seismic activity. The nature of earthquake clusters and subsequent declustering of earthquake catalogues plays a crucial role in determining the magnitude-dependent earthquake return period and its respective spatial variation for probabilistic seismic hazard assessment. This study introduces the Smart Cluster Method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal cluster identification. It utilises the magnitude-dependent spatio-temporal earthquake density to adjust the search properties, subsequently analyses the identified clusters to determine directional variation and adjusts its search space with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010-2011 Darfield-Christchurch sequence, a reclassification procedure is applied to disassemble subsequent ruptures using near-field searches, nearest neighbour classification and temporal splitting. The method is capable of identifying and classifying earthquake clusters in space and time. It has been tested and validated using earthquake data from California and New Zealand. A total of more than 1500 clusters have been found in both regions since 1980 with M_min = 2.0. Utilising the knowledge of cluster classification, the method has been adjusted to provide an earthquake declustering algorithm, which has been compared to existing methods. Its performance is comparable to established methodologies. The analysis of earthquake clustering statistics led to various new and updated correlation functions, e.g. for ratios between mainshock and strongest aftershock and general aftershock activity metrics.

  17. Toward optimal cluster power spectrum analysis

    CERN Document Server

    Smith, Robert E

    2014-01-01

    The power spectrum of galaxy clusters is an important probe of the cosmological model. In this paper we determine the optimal weighting scheme for maximizing the signal-to-noise ratio for such measurements. We find a closed form analytic expression for the optimal weights. Our expression takes into account: cluster mass, finite survey volume effects, survey masking, and a flux limit. The implementation of this weighting scheme requires knowledge of the measured cluster masses, and analytic models for the bias and space-density of clusters as a function of mass and redshift. Recent studies have suggested that the optimal method for reconstruction of the matter density field from a set of clusters is mass-weighting (Seljak et al 2009, Hamaus et al 2010, Cai et al 2011). We compare our optimal weighting scheme with this approach and also with the original power spectrum scheme of Feldman et al (1994). We show that our optimal weighting scheme outperforms these approaches for both volume- and flux-limited cluster...

  18. Genetic Algorithms Applied to Multi-Class Clustering for Gene Expression Data

    Institute of Scientific and Technical Information of China (English)

    Haiyan Pan; Jun Zhu; Danfu Han

    2003-01-01

    A hybrid GA (genetic algorithm)-based clustering (HGACLUS) schema, combining the merits of simulated annealing, was described for finding an optimal or near-optimal set of medoids. This schema maximized the clustering success by achieving internal cluster cohesion and external cluster isolation. The performance of HGACLUS and other methods was compared by using simulated data and open microarray gene-expression datasets. Given the exact validation strategy and the explicit cluster number, HGACLUS was generally found to be more accurate and robust than the other methods discussed in this paper.

  19. PROC LCA: A SAS Procedure for Latent Class Analysis

    Science.gov (United States)

    Lanza, Stephanie T.; Collins, Linda M.; Lemmon, David R.; Schafer, Joseph L.

    2007-01-01

    Latent class analysis (LCA) is a statistical method used to identify a set of discrete, mutually exclusive latent classes of individuals based on their responses to a set of observed categorical variables. In multiple-group LCA, both the measurement part and structural part of the model can vary across groups, and measurement invariance across…

  20. Detection and Analysis of Clones in UML Class Models

    Directory of Open Access Journals (Sweden)

    Dhavleesh Rattan

    2015-07-01

    It is quite frequent to copy and paste code fragments in software development. The copied source code is called a software clone and the activity is referred to as code cloning. The presence of code clones hampers maintenance and may lead to bug propagation. Nowadays, model driven development has become a standard industry practice. Duplicate parts in models, i.e. model clones, pose similar challenges as in source code. This paper presents an approach to detect clones in Unified Modeling Language class models. The core of our technique is the construction of a labeled, ranked tree corresponding to the UML class model where attributes with their data types and methods with their signatures are represented as subtrees. By grouping and clustering of repeating subtrees, the tool is able to detect duplications in a UML class model at different levels of granularity, i.e. complete class diagram, attributes with their data types and methods with their signatures across the model, and clusters of such attributes/methods. We propose a new classification of model clones with the objective of detecting exact and meaningful clones. Empirical evaluation of the tool using open source reverse engineered and forward designed models shows some interesting and relevant clones which provide useful insights into software modeling practice.

  1. Detection and Analysis of Clones in UML Class Models

    Directory of Open Access Journals (Sweden)

    Dhavleesh Rattan

    2016-01-01

    It is quite frequent to copy and paste code fragments in software development. The copied source code is called a software clone and the activity is referred to as code cloning. The presence of code clones hampers maintenance and may lead to bug propagation. Nowadays, model driven development has become a standard industry practice. Duplicate parts in models, i.e. model clones, pose similar challenges as in source code. This paper presents an approach to detect clones in Unified Modeling Language class models. The core of our technique is the construction of a labeled, ranked tree corresponding to the UML class model where attributes with their data types and methods with their signatures are represented as subtrees. By grouping and clustering of repeating subtrees, the tool is able to detect duplications in a UML class model at different levels of granularity, i.e. complete class diagram, attributes with their data types and methods with their signatures across the model, and clusters of such attributes/methods. We propose a new classification of model clones with the objective of detecting exact and meaningful clones. Empirical evaluation of the tool using open source reverse engineered and forward designed models shows some interesting and relevant clones which provide useful insights into software modeling practice.

  2. Intelligent Hybrid Cluster Based Classification Algorithm for Social Network Analysis

    Directory of Open Access Journals (Sweden)

    S. Muthurajkumar

    2014-05-01

    In this paper, we propose a hybrid clustering-based classification algorithm, based on a mean approach, to effectively classify and mine the ordered sequences (paths) from weblog data in order to perform social network analysis. In the proposed system for social pattern analysis, sequences of human activities are typically analyzed by switching behaviors, which are likely to produce overlapping clusters. A robust Modified Boosting algorithm is proposed for the hybrid clustering-based classification used to cluster the data. This work helps connect the aggregated features from the network data with the traditional indices used in social network analysis. Experimental results show that the proposed algorithm improves the decision results from data clustering when combined with the proposed classification algorithm, and that it provides better classification accuracy when tested with a weblog dataset. In addition, the algorithm improves predictive performance, especially for multiclass datasets, which can increase accuracy.

  3. Bayesian model-based cluster analysis for predicting macrofaunal communities

    NARCIS (Netherlands)

    Braak, ter C.J.F.; Hoijtink, H.; Akkermans, W.; Verdonschot, P.F.M.

    2003-01-01

    To predict macrofaunal community composition from environmental data a two-step approach is often followed: (1) the water samples are clustered into groups on the basis of the macrofauna data and (2) the groups are related to the environmental data, e.g. by discriminant analysis. For the cluster ana

  4. Hierarchical Cluster Analysis – Various Approaches to Data Preparation

    Directory of Open Access Journals (Sweden)

    Z. Pacáková

    2013-09-01

    The article deals with two different approaches to data preparation to avoid multicollinearity. The aim of the article is to find similarities in the e-communication level of EU states using hierarchical cluster analysis. The original set of fourteen indicators was first reduced on the basis of correlation analysis; where two indicators were highly correlated, the one with higher variability was kept for further analysis. Second, the data were transformed using principal component analysis, because principal components are uncorrelated. For further analysis, five principal components explaining about 92% of variance were selected. Hierarchical cluster analysis was performed both on the reduced data set and on the principal component scores. Both times three clusters were assumed following the Pseudo t-Squared and Pseudo F statistics, but the final clusters were not identical. An important characteristic for comparing the two results is the proportion of variance accounted for by the clusters, which was about ten percent higher for the principal component scores (57.8% compared to 47%). Therefore it can be stated that, when principal component scores with a sufficiently high explained proportion of variance (about 92% in our analysis) are used as input variables for cluster analysis, the loss of information is lower compared to data reduction on the basis of correlation analysis.
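
    The comparison criterion used in this abstract, the proportion of variance accounted for by the clusters, is commonly computed as one minus the ratio of within-cluster to total sum of squares. The sketch below assumes that definition (the article may define it differently); the function name, points and labels are hypothetical.

```python
def variance_explained(points, labels):
    """Proportion of total variance accounted for by a partition:
    1 - (within-cluster sum of squares / total sum of squares)."""
    n = len(points)
    dims = range(len(points[0]))
    grand = [sum(p[d] for p in points) / n for d in dims]
    total_ss = sum((p[d] - grand[d]) ** 2 for p in points for d in dims)
    within_ss = 0.0
    for label in set(labels):
        members = [p for p, l in zip(points, labels) if l == label]
        center = [sum(p[d] for p in members) / len(members) for d in dims]
        within_ss += sum((p[d] - center[d]) ** 2
                         for p in members for d in dims)
    return 1.0 - within_ss / total_ss

# Hypothetical data: two groups separated along the first coordinate.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
r2_matching = variance_explained(points, [0, 0, 1, 1])  # follows the gap
r2_crossing = variance_explained(points, [0, 1, 0, 1])  # ignores the gap
```

    Here the partition that follows the gap explains about 96% of the variance and the one that cuts across it explains about 4%. Two cluster solutions obtained from different input data can be ranked with the same quantity.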

  5. Entropic Approach to Multiscale Clustering Analysis

    Directory of Open Access Journals (Sweden)

    Antonio Insolia

    2012-05-01

    Recently, a novel method has been introduced to estimate the statistical significance of clustering in the direction distribution of objects. The method involves a multiscale procedure, based on the Kullback–Leibler divergence and the Gumbel statistics of extreme values, providing high discrimination power, even in the presence of strong background isotropic contamination. It is shown that the method is: (i) semi-analytical, drastically reducing computation time; (ii) very sensitive to small, medium and large scale clustering; (iii) not biased against the null hypothesis. Applications to the physics of ultra-high energy cosmic rays, as a cosmological probe, are presented and discussed.
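
    The Kullback–Leibler divergence mentioned above measures how far an observed distribution departs from a reference one. The sketch below shows only that single ingredient at one angular scale, comparing binned arrival directions with an isotropic expectation; it does not reproduce the paper's multiscale procedure or its Gumbel statistics, and the bin counts are hypothetical.

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i p_i * ln(p_i / q_i), in nats.
    Terms with p_i == 0 contribute zero; q_i must be > 0 where p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def normalise(counts):
    """Turn raw bin counts into probabilities that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

observed = normalise([30, 10, 10, 10])  # hypothetical events per equal-area bin
isotropic = normalise([1, 1, 1, 1])     # reference: no preferred direction
excess = kl_divergence(observed, isotropic)
no_excess = kl_divergence(isotropic, isotropic)
```

    The divergence is zero when the observed distribution matches the isotropic reference and grows as events pile up in one bin. Repeating the comparison at several bin sizes is one way to read the word "multiscale", though that reading is an assumption here.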

  6. Visual verification and analysis of cluster detection for molecular dynamics.

    Science.gov (United States)

    Grottel, Sebastian; Reina, Guido; Vrabec, Jadran; Ertl, Thomas

    2007-01-01

    A current research topic in molecular thermodynamics is the condensation of vapor to liquid and the investigation of this process at the molecular level. Condensation is found in many physical phenomena, e.g. the formation of atmospheric clouds or the processes inside steam turbines, where a detailed knowledge of the dynamics of condensation processes will help to optimize energy efficiency and avoid problems with droplets of macroscopic size. The key properties of these processes are the nucleation rate and the critical cluster size. For the calculation of these properties it is essential to make use of a meaningful definition of molecular clusters, which is currently not a completely resolved issue. In this paper a framework capable of interactively visualizing molecular datasets of such nucleation simulations is presented, with an emphasis on the detected molecular clusters. To check the quality of the results of the cluster detection, our framework introduces the concept of flow groups to highlight potential cluster evolution over time which is not detected by the employed algorithm. To confirm the findings of the visual analysis, we coupled the rendering view with a schematic view of the clusters' evolution. This allows rapid assessment of the quality of the molecular cluster detection algorithm and identification of locations in the simulation data, in space as well as in time, where the cluster detection fails. Thus, thermodynamics researchers can eliminate weaknesses in their cluster detection algorithms. Several examples for the effective and efficient usage of our tool are presented.

  7. Merged consensus clustering to assess and improve class discovery with microarray data

    NARCIS (Netherlands)

    Simpson, T. Ian; Armstrong, J. Douglas; Jarman, Andrew P.

    2010-01-01

    Background: One of the most commonly performed tasks when analysing high throughput gene expression data is to use clustering methods to classify the data into groups. There are a large number of methods available to perform clustering, but it is often unclear which method is best suited to the data

  8. Logistics Enterprise Evaluation Model Based On Fuzzy Clustering Analysis

    Science.gov (United States)

    Fu, Pei-hua; Yin, Hong-bo

    In this thesis, we introduce an evaluation model for logistics enterprises based on a fuzzy cluster algorithm. First, we present the evaluation index system, which contains basic information, management level, technical strength, transport capacity, informatization level, market competition and customer service. We determine the index weights according to the grades and evaluate the integrated ability of the logistics enterprises using the fuzzy cluster analysis method. We then describe the system evaluation module and the cluster analysis module in detail and explain how these two modules were implemented. Finally, we give the results of the system.

  9. Cancer incidence in men: a cluster analysis of spatial patterns

    Directory of Open Access Journals (Sweden)

    D'Alò Daniela

    2008-11-01

    Background: Spatial clustering of different diseases has received much less attention than single disease mapping. Besides chance or artifact, clustering of different cancers in a given area may depend on exposure to a shared risk factor or to multiple correlated factors (e.g. cigarette smoking and obesity in a deprived area). Models developed so far to investigate co-occurrence of diseases are not well-suited for analyzing many cancers simultaneously. In this paper we propose a simple two-step exploratory method for screening clusters of different cancers in a population. Methods: Cancer incidence data were derived from the regional cancer registry of Umbria, Italy. A cluster analysis was performed on smoothed and non-smoothed standardized incidence ratios (SIRs) of the 13 most frequent cancers in males. The Besag, York and Mollie model (BYM) and Poisson kriging were used to produce smoothed SIRs. Results: Cluster analysis on non-smoothed SIRs was poorly informative in terms of clustering of different cancers, as only larynx and oral cavity were grouped, and of characteristic patterns of cancer incidence in specific geographical areas. On the other hand, BYM and Poisson kriging gave similar results, showing that cancers of the oral cavity, larynx, esophagus, stomach and liver formed a main cluster. Lung and urinary bladder cancers clustered together but not with the cancers mentioned above. Both methods, particularly the BYM model, identified distinct geographic clusters of adjacent areas. Conclusion: As in single disease mapping, non-smoothed SIRs do not provide reliable estimates of cancer risks because of small area variability. The BYM model produces smooth risk surfaces which, when entered into a cluster analysis, identify well-defined geographical clusters of adjacent areas. It probably enhances or amplifies the signal arising from exposure of more areas (statistical units) to shared risk factors that are associated with different cancers. In...

  10. CLUSTERING ANALYSIS OF DEBRIS-FLOW STREAMS

    Institute of Scientific and Technical Information of China (English)

    Yuan-Fan TSAI; Huai-Kuang TSAI; Cheng-Yan KAO

    2004-01-01

    The Chi-Chi earthquake in 1999 caused disastrous landslides, which triggered numerous debris flows and killed hundreds of people. A critical rainfall intensity line for each debris-flow stream is studied to prevent such a disaster. However, setting rainfall lines from incomplete data is difficult, so this study considered eight critical factors to group streams, such that streams within a cluster have similar rainfall lines. A genetic algorithm is applied to group 377 debris-flow streams selected from the center of an area affected by the Chi-Chi earthquake. These streams are grouped into seven clusters with different characteristics. The results reveal that the proposed method effectively groups debris-flow streams.

  11. Comparative analysis of genomic signal processing for microarray data clustering.

    Science.gov (United States)

    Istepanian, Robert S H; Sungoor, Ala; Nebel, Jean-Christophe

    2011-12-01

    Genomic signal processing is a new area of research that combines advanced digital signal processing methodologies for enhanced genetic data analysis. It has many promising applications in bioinformatics and next generation of healthcare systems, in particular, in the field of microarray data clustering. In this paper we present a comparative performance analysis of enhanced digital spectral analysis methods for robust clustering of gene expression across multiple microarray data samples. Three digital signal processing methods: linear predictive coding, wavelet decomposition, and fractal dimension are studied to provide a comparative evaluation of the clustering performance of these methods on several microarray datasets. The results of this study show that the fractal approach provides the best clustering accuracy compared to other digital signal processing and well known statistical methods.

  12. Cluster Analysis of Gene Expression Data

    CERN Document Server

    Domany, E

    2002-01-01

    The expression levels of many thousands of genes can be measured simultaneously by DNA microarrays (chips). This novel experimental tool has revolutionized research in molecular biology and generated considerable excitement. A typical experiment uses a few tens of such chips, each dedicated to a single sample - such as tissue extracted from a particular tumor. The results of such an experiment contain several hundred thousand numbers, that come in the form of a table, of several thousand rows (one for each gene) and 50 - 100 columns (one for each sample). We developed a clustering methodology to mine such data. In this review I provide a very basic introduction to the subject, aimed at a physics audience with no prior knowledge of either gene expression or clustering methods. I explain what genes are, what is gene expression and how it is measured by DNA chips. Next I explain what is meant by "clustering" and how we analyze the massive amounts of data from such experiments, and present results obtained from a...

  13. A Novel Divisive Hierarchical Clustering Algorithm for Geospatial Analysis

    Directory of Open Access Journals (Sweden)

    Shaoning Li

    2017-01-01

    In the fields of geographic information systems (GIS) and remote sensing (RS), the clustering algorithm has been widely used for image segmentation, pattern recognition, and cartographic generalization. Although clustering analysis plays a key role in geospatial modelling, traditional clustering methods are limited by computational complexity, noise resistance, and robustness. Furthermore, traditional methods are more focused on the adjacent spatial context, which makes it hard for the clustering methods to be applied to multi-density discrete objects. In this paper, a new method, cell-dividing hierarchical clustering (CDHC), is proposed based on convex hull retraction. The main steps are as follows. First, a convex hull structure is constructed to describe the global spatial context of geospatial objects. Then, the retracting structure of each borderline is established in sequence by setting the initial parameter. The objects are split into two clusters (i.e., “sub-clusters”) if the retracting structure intersects with the borderlines. Finally, clusters are repeatedly split and the initial parameter is updated until the termination condition is satisfied. The experimental results show that CDHC separates the multi-density objects from noise sufficiently and also reduces complexity compared to the traditional agglomerative hierarchical clustering algorithm.

  14. Sensitivity Analysis of Gas Production from Class 2 and Class 3 Hydrate Deposits

    Energy Technology Data Exchange (ETDEWEB)

    Reagan, Matthew; Moridis, George; Zhang, Keni

    2008-05-01

    Gas hydrates are solid crystalline compounds in which gas molecules are lodged within the lattices of an ice-like crystalline solid. The vast quantities of hydrocarbon gases trapped in hydrate formations in the permafrost and in deep ocean sediments may constitute a new and promising energy source. Class 2 hydrate deposits are characterized by a Hydrate-Bearing Layer (HBL) that is underlain by a saturated zone of mobile water. Class 3 hydrate deposits are characterized by an isolated Hydrate-Bearing Layer (HBL) that is not in contact with any hydrate-free zone of mobile fluids. Both classes of deposits have been shown to be good candidates for exploitation in earlier studies of gas production via vertical well designs; in this study we extend the analysis to include systems with varying porosity, anisotropy, well spacing, and the presence of permeable boundaries. For Class 2 deposits, the results show that production rate and efficiency depend strongly on formation porosity, have a mild dependence on formation anisotropy, and that tighter well spacing produces gas at higher rates over shorter time periods. For Class 3 deposits, production rates and efficiency also depend significantly on formation porosity, are impacted negatively by anisotropy, and production rates may be larger, over longer times, for well configurations that use a greater well spacing. Finally, we performed preliminary calculations to assess a worst-case scenario for permeable system boundaries, and found that the efficiency of depressurization-based production strategies is compromised by migration of fluids from outside the system.

  15. HLA class II genes: typing by DNA analysis.

    Science.gov (United States)

    Bidwell, J L; Bidwell, E A; Bradley, B A

    1990-04-01

    A detailed understanding of the structure and function of the human major histocompatibility complex (MHC) has ensued from studies by molecular biologists during the last decade. Virtually all of the HLA genes have now been cloned, and the nucleotide sequences of their different allelic forms have been determined. Typing for these HLA alleles is a fundamental prerequisite for tissue matching in allogeneic organ transplantation. Until very recently, typing procedures have been dominated by serological and cellular methods. The availability of cloned DNA from HLA genes has now permitted the technique of restriction fragment length polymorphism (RFLP) analysis to be applied, with remarkable success and advantage, to phenotyping of both HLA Class I and Class II determinants. For the HLA Class II genes DR and DQ, a simple two-stage RFLP analysis permits the accurate identification of all specificities defined by serology, and of many which are defined by cellular typing. At the present time, however, RFLP typing of HLA Class I genes is not as practicable or as informative as that for HLA Class II genes. The present clinical applications of HLA-DR and DQ RFLP typing are predominantly in phenotyping of living donors, including selection of HLA-matched volunteer bone marrow donors, in allograft survival studies, and in studies of HLA Class II-associated diseases. However, the time taken to perform RFLP analysis precludes its use for the typing of cadaveric kidney donors. Nucleotide sequence data for the alleles of HLA Class II genes have now permitted the development of allele-specific oligonucleotide (ASO) typing, a second category of DNA analysis. This has been greatly facilitated by the ability to amplify specific HLA Class II DNA 'target' sequences using the polymerase chain reaction (PCR) technique. The accuracy of DNA typing techniques should ensure that this methodology will eventually replace conventional HLA phenotyping.

  16. Credibility analysis of risk classes by generalized linear model

    Science.gov (United States)

    Erdemir, Ovgucan Karadag; Sucu, Meral

    2016-06-01

    In this paper, the generalized linear model (GLM) and credibility theory, both frequently used in nonlife insurance pricing, are combined for reliability analysis. Using the full credibility standard, the GLM is associated with the limited fluctuation credibility approach. Comparison criteria such as asymptotic variance and credibility probability are used to analyze the credibility of risk classes. An application is performed using one-year claim frequency data from a Turkish insurance company, and the results for credible risk classes are interpreted.
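
    The limited fluctuation idea mentioned in the abstract above can be illustrated with the textbook full credibility standard for a claim frequency. This is a minimal sketch that assumes Poisson-distributed claim counts and the usual square-root rule for partial credibility; it does not reproduce the GLM-based procedure of the paper, and the function names and the 90 % / 5 % settings are illustrative choices only.

```python
from statistics import NormalDist


def full_credibility_claims(prob: float = 0.90, tolerance: float = 0.05) -> float:
    """Expected number of claims needed for full credibility.

    Assumes Poisson claim counts: the observed frequency must lie within
    +/- tolerance of its mean with probability prob.
    """
    z = NormalDist().inv_cdf((1.0 + prob) / 2.0)  # two-sided normal quantile
    return (z / tolerance) ** 2


def credibility_factor(expected_claims: float, standard: float) -> float:
    """Square-root rule for partial credibility, capped at 1."""
    return min(1.0, (expected_claims / standard) ** 0.5)


n_full = full_credibility_claims()             # roughly 1082 claims
z_partial = credibility_factor(400.0, n_full)  # a risk class with 400 expected claims
print(round(n_full, 1), round(z_partial, 3))
```

    A risk class whose expected claim count reaches the standard receives a credibility factor of 1; smaller classes are weighted down by the square-root rule.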

  17. PARTIAL TRAINING METHOD FOR HEURISTIC ALGORITHM OF POSSIBLE CLUSTERIZATION UNDER UNKNOWN NUMBER OF CLASSES

    Directory of Open Access Journals (Sweden)

    D. A. Viattchenin

    2009-01-01

    A method for constructing a subset of labeled objects, which is used in a heuristic algorithm of possible clusterization with partial training, is proposed in the paper. The method is based on data preprocessing by the heuristic algorithm of possible clusterization using a transitive closure of a fuzzy tolerance. The efficiency of the method is demonstrated by an illustrative example.
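
    The preprocessing step named in the abstract above, a transitive closure of a fuzzy tolerance, can be sketched as follows. The sketch assumes the common max-min definition of transitivity and a small made-up tolerance matrix; the heuristic clustering algorithm and the partial-training method of the paper are not reproduced, and all names here are hypothetical.

```python
def maxmin_compose(a, b):
    """Max-min composition of two square fuzzy relations."""
    n = len(a)
    return [[max(min(a[i][k], b[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]


def transitive_closure(tolerance):
    """Max-min transitive closure of a reflexive, symmetric fuzzy relation."""
    current = [row[:] for row in tolerance]
    while True:
        composed = maxmin_compose(current, current)
        merged = [[max(x, y) for x, y in zip(r1, r2)]
                  for r1, r2 in zip(current, composed)]
        if merged == current:  # fixed point reached: relation is transitive
            return current
        current = merged


def alpha_classes(closure, alpha):
    """Equivalence classes of the alpha-cut of a fuzzy equivalence relation."""
    classes, seen = [], set()
    for i in range(len(closure)):
        if i in seen:
            continue
        members = [j for j in range(len(closure)) if closure[i][j] >= alpha]
        classes.append(members)
        seen.update(members)
    return classes


# Illustrative tolerance matrix for three objects (invented values).
tolerance = [[1.0, 0.8, 0.2],
             [0.8, 1.0, 0.5],
             [0.2, 0.5, 1.0]]
closure = transitive_closure(tolerance)
print(closure)
print(alpha_classes(closure, 0.8))
```

    Cutting the closure at a threshold yields crisp groups, which is one way such a closure can feed a later labeling step.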

  18. Cluster analysis of WIBS single particle bioaerosol data

    Directory of Open Access Journals (Sweden)

    N. H. Robinson

    2012-09-01

    Hierarchical agglomerative cluster analysis was performed on single-particle multi-spatial datasets comprising optical diameter, asymmetry and three different fluorescence measurements, gathered using two dual Waveband Integrated Bioaerosol Sensor (WIBS). The technique is demonstrated on measurements of various fluorescent and non-fluorescent polystyrene latex spheres (PSL) before being applied to two separate contemporaneous ambient WIBS datasets recorded in a forest site in Colorado, USA as part of the BEACHON-RoMBAS project. Cluster analysis results between both datasets are consistent. Clusters are tentatively interpreted by comparison of concentration time series and cluster average measurement values to the published literature (of which there is a paucity) to represent: non-fluorescent accumulation mode aerosol; bacterial agglomerates; and fungal spores. To our knowledge, this is the first time cluster analysis has been applied to long term online PBAP measurements. The novel application of this clustering technique provides a means for routinely reducing WIBS data to discrete concentration time series which are more easily interpretable, without the need for any a priori assumptions concerning the expected aerosol types. It can reduce the level of subjectivity compared to the more standard analysis approaches, which are typically performed by simple inspection of various ensemble data products. It also has the advantage of potentially resolving less populous or subtly different particle types. This technique is likely to become more robust in the future as fluorescence-based aerosol instrumentation measurement precision, dynamic range and the number of available metrics is improved.

  19. Cluster analysis of WIBS single particle bioaerosol data

    Science.gov (United States)

    Robinson, N. H.; Allan, J. D.; Huffman, J. A.; Kaye, P. H.; Foot, V. E.; Gallagher, M.

    2012-09-01

    Hierarchical agglomerative cluster analysis was performed on single-particle multi-spatial datasets comprising optical diameter, asymmetry and three different fluorescence measurements, gathered using two dual Waveband Integrated Bioaerosol Sensor (WIBS). The technique is demonstrated on measurements of various fluorescent and non-fluorescent polystyrene latex spheres (PSL) before being applied to two separate contemporaneous ambient WIBS datasets recorded in a forest site in Colorado, USA as part of the BEACHON-RoMBAS project. Cluster analysis results between both datasets are consistent. Clusters are tentatively interpreted by comparison of concentration time series and cluster average measurement values to the published literature (of which there is a paucity) to represent: non-fluorescent accumulation mode aerosol; bacterial agglomerates; and fungal spores. To our knowledge, this is the first time cluster analysis has been applied to long term online PBAP measurements. The novel application of this clustering technique provides a means for routinely reducing WIBS data to discrete concentration time series which are more easily interpretable, without the need for any a priori assumptions concerning the expected aerosol types. It can reduce the level of subjectivity compared to the more standard analysis approaches, which are typically performed by simple inspection of various ensemble data products. It also has the advantage of potentially resolving less populous or subtly different particle types. This technique is likely to become more robust in the future as fluorescence-based aerosol instrumentation measurement precision, dynamic range and the number of available metrics is improved.

  20. Contour Cluster Shape Analysis for Building Damage Detection from Post-earthquake Airborne LiDAR

    Directory of Open Access Journals (Sweden)

    HE Meizhang

    2015-04-01

    Detecting damaged buildings is an obligatory step before evaluating earthquake casualties and economic losses. It is very difficult to detect damaged buildings accurately based on the assumption that intact roofs appear in laser data as large planar segments whereas collapsed roofs are characterized by many small segments. This paper presents a contour cluster shape similarity analysis algorithm for reliable building damage detection from post-earthquake airborne LiDAR point clouds. First we evaluate the entropies of shape similarities between all the combinations of two contour lines within a building cluster, which quantitatively describe the shape diversity. Then the maximum entropy model is employed to divide all the clusters into intact and damaged classes. Tests on LiDAR data from the El Mayor-Cucapah earthquake rupture demonstrate the accuracy and reliability of the proposed method.

  1. Cluster analysis of clinical data identifies fibromyalgia subgroups.

    Directory of Open Access Journals (Sweden)

    Elisa Docampo

    INTRODUCTION: Fibromyalgia (FM) is mainly characterized by widespread pain and multiple accompanying symptoms, which hinder FM assessment and management. In order to reduce FM heterogeneity we classified clinical data into simplified dimensions that were used to define FM subgroups. MATERIAL AND METHODS: 48 variables were evaluated in 1,446 Spanish FM cases fulfilling 1990 ACR FM criteria. A partitioning analysis was performed to find groups of variables similar to each other. Similarities between variables were identified and the variables were grouped into dimensions. This was performed in a subset of 559 patients, and cross-validated in the remaining 887 patients. For each sample and dimension, a composite index was obtained based on the weights of the variables included in the dimension. Finally, a clustering procedure was applied to the indexes, resulting in FM subgroups. RESULTS: Variables clustered into three independent dimensions: "symptomatology", "comorbidities" and "clinical scales". Only the two first dimensions were considered for the construction of FM subgroups. Resulting scores classified FM samples into three subgroups: low symptomatology and comorbidities (Cluster 1), high symptomatology and comorbidities (Cluster 2), and high symptomatology but low comorbidities (Cluster 3), showing differences in measures of disease severity. CONCLUSIONS: We have identified three subgroups of FM samples in a large cohort of FM by clustering clinical data. Our analysis stresses the importance of family and personal history of FM comorbidities. Also, the resulting patient clusters could indicate different forms of the disease, relevant to future research, and might have an impact on clinical assessment.

  2. Cluster analysis of Southeastern U.S. climate stations

    Science.gov (United States)

    Stooksbury, D. E.; Michaels, P. J.

    1991-09-01

    A two-step cluster analysis of 449 Southeastern climate stations is used to objectively determine general climate clusters (groups of climate stations) for eight southeastern states. The purpose is objectively to define regions of climatic homogeneity that should perform more robustly in subsequent climatic impact models. This type of analysis has been successfully used in many related climate research problems including the determination of corn/climate districts in Iowa (Ortiz-Valdez, 1985) and the classification of synoptic climate types (Davis, 1988). These general climate clusters may be more appropriate for climate research than the standard climate divisions (CD) groupings of climate stations, which are modifications of the agro-economic United States Department of Agriculture crop reporting districts. Unlike the CD's, these objectively determined climate clusters are not restricted by state borders and thus have reduced multicollinearity which makes them more appropriate for the study of the impact of climate and climatic change.

  3. Variable cluster analysis method for building neural network model

    Institute of Scientific and Technical Information of China (English)

    王海东; 刘元东

    2004-01-01

    To address the problem that, when building a neural network model of a complicated system, the input variables should be as few as possible while still fully explaining the output variables, a variable selection method based on cluster analysis was investigated. A similarity coefficient describing the mutual relation of variables was defined. The methods of the highest contribution rate, part replacing whole, and variable replacement are put forward and derived using information theory. Software for neural networks based on cluster analysis, which provides many kinds of methods for defining the variable similarity coefficient, clustering system variables, and evaluating variable clusters, was developed and applied to build a neural network forecast model of cement clinker quality. The results show that the network scale, training time and prediction accuracy are all perfect. The practical application demonstrates that the method of selecting variables for neural networks is feasible and effective.

  4. Spatial Data Mining using Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Ch.N.Santhosh Kumar

    2012-09-01

    Data mining, also referred to as Knowledge Discovery in Databases (KDD), is the process of nontrivial extraction of implicit, previously unknown, and useful information such as knowledge rules, descriptions, regularities, and major trends from large databases. Data mining has evolved into a multidisciplinary field, including database technology, machine learning, artificial intelligence, neural networks, information retrieval, and so on. In principle, data mining should be applicable to the different kinds of data and databases used in many different applications, including relational databases, transactional databases, data warehouses, object-oriented databases, and special application-oriented databases such as spatial databases, temporal databases, multimedia databases, and time-series databases. Spatial data mining, also called spatial mining, is data mining applied to spatial data or spatial databases. Spatial data are data that have a spatial or location component, and they carry information that is more complex than classical data. A spatial database stores spatial data represented by spatial data types and the spatial relationships among the data. Spatial data mining encompasses various tasks. These include spatial classification, spatial association rule mining, spatial clustering, characteristic rules, discriminant rules, and trend detection. This paper presents how spatial data mining is achieved using clustering.

  5. Initial magnetization analysis of iron cluster assemblies

    Energy Technology Data Exchange (ETDEWEB)

    Michele, Oliver; Hesse, Juergen; Bremers, Heiko [Technische Universitaet Braunschweig, Institut fuer Metallphysik und Nukleare Festkoerperphysik, Mendelssohnstrasse 3, 38106 Braunschweig (Germany); Peng, Dong-Lian; Sumiyama, Kenji; Hihara, Takehiko; Yamamuro, Saeki [Department of Materials Science and Engineering, Nagoya Institute of Technology, Nagoya 466-8555 (Japan)

    2004-12-01

    Nearly monodispersed oxide-coated Fe cluster assemblies were prepared using a plasma-gas-condensation style cluster beam deposition apparatus (D. L. Peng et al. J. Appl. Phys. 92 3075 (2002)). The characterization of such assemblies is presented using SQUID magnetometry. The aim of this contribution is the interpretation of the initial magnetization curves instead of the usual presentation of hysteresis loops and coercivities. The description of the initial magnetization is based on a proposed vector model valid for Stoner-Wohlfarth particles. The model includes the particles' anisotropy and possible interactions regarding these influences as equivalent magnetic fields. The model is an extension of the one described by Michele et al. (J. Phys.: Condens. Matter 16 427 (2004)) regarding the fact that in a completely demagnetized state, in the sample consisting of a very large number of particles always equal anisotropy fields of opposite signs are present. We measured the initial magnetization curves for different temperatures and present the temperature dependence of the model's parameters. (Abstract Copyright [2004], Wiley Periodicals, Inc.)

  6. Evaluation of hierarchical agglomerative cluster analysis methods for discrimination of primary biological aerosol

    Directory of Open Access Journals (Sweden)

    I. Crawford

    2015-07-01

    In this paper we present improved methods for discriminating and quantifying Primary Biological Aerosol Particles (PBAP) by applying hierarchical agglomerative cluster analysis to multi-parameter ultra violet-light induced fluorescence (UV-LIF) spectrometer data. The methods employed in this study can be applied to data sets in excess of $1 \times 10^{6}$ points on a desktop computer, allowing for each fluorescent particle in a dataset to be explicitly clustered. This reduces the potential for misattribution found in subsampling and comparative attribution methods used in previous approaches, improving our capacity to discriminate and quantify PBAP meta-classes. We evaluate the performance of several hierarchical agglomerative cluster analysis linkages and data normalisation methods using laboratory samples of known particle types and an ambient dataset. Fluorescent and non-fluorescent polystyrene latex spheres were sampled with a Wideband Integrated Bioaerosol Spectrometer (WIBS-4), where the optical size, asymmetry factor and fluorescent measurements were used as inputs to the analysis package. It was found that the Ward linkage with z-score or range normalisation performed best, correctly attributing 98 and 98.1 % of the data points respectively. The best performing methods were applied to the BEACHON-RoMBAS ambient dataset where it was found that the z-score and range normalisation methods yield similar results with each method producing clusters representative of fungal spores and bacterial aerosol, consistent with previous results. The z-score result was compared to clusters generated with previous approaches (WIBS AnalysiS Program, WASP), where we observe that the subsampling and comparative attribution method employed by WASP results in the overestimation of the fungal spore concentration by a factor of 1.5 and the underestimation of bacterial aerosol concentration by a factor of 5. We suggest that this is likely due to errors arising from misattribution

  7. Evaluation of hierarchical agglomerative cluster analysis methods for discrimination of primary biological aerosol

    Directory of Open Access Journals (Sweden)

    I. Crawford

    2015-11-01

    In this paper we present improved methods for discriminating and quantifying primary biological aerosol particles (PBAPs) by applying hierarchical agglomerative cluster analysis to multi-parameter ultraviolet-light-induced fluorescence (UV-LIF) spectrometer data. The methods employed in this study can be applied to data sets in excess of $1 \times 10^{6}$ points on a desktop computer, allowing for each fluorescent particle in a data set to be explicitly clustered. This reduces the potential for misattribution found in subsampling and comparative attribution methods used in previous approaches, improving our capacity to discriminate and quantify PBAP meta-classes. We evaluate the performance of several hierarchical agglomerative cluster analysis linkages and data normalisation methods using laboratory samples of known particle types and an ambient data set. Fluorescent and non-fluorescent polystyrene latex spheres were sampled with a Wideband Integrated Bioaerosol Spectrometer (WIBS-4), where the optical size, asymmetry factor and fluorescent measurements were used as inputs to the analysis package. It was found that the Ward linkage with z-score or range normalisation performed best, correctly attributing 98 and 98.1 % of the data points respectively. The best-performing methods were applied to the BEACHON-RoMBAS (Bio–hydro–atmosphere interactions of Energy, Aerosols, Carbon, H2O, Organics and Nitrogen–Rocky Mountain Biogenic Aerosol Study) ambient data set, where it was found that the z-score and range normalisation methods yield similar results, with each method producing clusters representative of fungal spores and bacterial aerosol, consistent with previous results. The z-score result was compared to clusters generated with previous approaches (WIBS AnalysiS Program, WASP), where we observe that the subsampling and comparative attribution method employed by WASP results in the overestimation of the fungal spore concentration by a factor of 1.5 and the
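
    The two normalisations and the Ward criterion named in the abstract above can be sketched in a few lines. This is an illustrative sketch, not the authors' analysis package: it assumes the population standard deviation for the z-score, uses made-up sample values, and shows only the Ward merge cost (the increase in within-cluster sum of squares) rather than a full agglomerative loop.

```python
from statistics import mean, pstdev


def z_score(values):
    """Centre on the mean and scale by the (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def range_normalise(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def ward_cost(cluster_a, cluster_b):
    """Increase in within-cluster sum of squares if the two clusters merge."""
    na, nb = len(cluster_a), len(cluster_b)
    centre_a = [mean(col) for col in zip(*cluster_a)]
    centre_b = [mean(col) for col in zip(*cluster_b)]
    gap = sum((x - y) ** 2 for x, y in zip(centre_a, centre_b))
    return na * nb / (na + nb) * gap


sizes = [1.0, 2.0, 3.0, 4.0, 5.0]  # invented optical sizes
z = z_score(sizes)
scaled = range_normalise(sizes)
cost = ward_cost([(0.0, 0.0), (2.0, 0.0)], [(1.0, 3.0)])
print(z, scaled, cost)
```

    Normalising each input first keeps a wide-ranging measurement from dominating the squared distances that the Ward criterion sums.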

  8. Multivariate analysis of the globular clusters in M87

    CERN Document Server

    Das, Sukanta; Davoust, Emmanuel

    2015-01-01

    An objective classification of 147 globular clusters in the inner region of the giant elliptical galaxy M87 is carried out with the help of two methods of multivariate analysis. First independent component analysis is used to determine a set of independent variables that are linear combinations of various observed parameters (mostly Lick indices) of the globular clusters. Next K-means cluster analysis is applied on the independent components, to find the optimum number of homogeneous groups having an underlying structure. The properties of the four groups of globular clusters thus uncovered are used to explain the formation mechanism of the host galaxy. It is suggested that M87 formed in two successive phases. First a monolithic collapse, which gave rise to an inner group of metal-rich clusters with little systematic rotation and an outer group of metal-poor clusters in eccentric orbits. In a second phase, the galaxy accreted low-mass satellites in a dissipationless fashion, from the gas of which the two othe...
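
    The K-means step described in the abstract above can be sketched with plain Lloyd iterations. The sketch assumes Euclidean distance, caller-supplied starting centroids, and invented two-dimensional points; the independent component analysis stage and the choice of the optimum number of groups are not reproduced.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, iterations=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    groups = [[] for _ in centroids]
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: squared_distance(p, centroids[c]))
            groups[nearest].append(p)
        # An empty group keeps its previous centroid.
        updated = [tuple(sum(col) / len(g) for col in zip(*g)) if g else c
                   for g, c in zip(groups, centroids)]
        if updated == centroids:  # assignments can no longer change
            break
        centroids = updated
    return centroids, groups


# Invented points forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centres, groups = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centres)
```

    In practice the result depends on the starting centroids, so the run is usually repeated from several initialisations and for several candidate numbers of groups.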

  9. Future expectations among adolescents: a latent class analysis.

    Science.gov (United States)

    Sipsma, Heather L; Ickovics, Jeannette R; Lin, Haiqun; Kershaw, Trace S

    2012-09-01

    Future expectations have been important predictors of adolescent development and behavior. Its measurement, however, has largely focused on single dimensions and misses potentially important components. This analysis investigates whether an empirically-driven, multidimensional approach to conceptualizing future expectations can substantively contribute to our understanding of adolescent risk behavior. We use data from the National Longitudinal Survey of Youth 1997 to derive subpopulations of adolescents based on their future expectations with latent class analysis. Multinomial regression then determines which covariates from Bronfenbrenner's ecological systems theory are associated with class membership. After modeling these covariates, we examine whether future expectations is associated with delinquency, substance use, and sexual experience. Our analysis suggests the emergence of four distinct classes labeled the Student Expectations, Student/Drinking Expectations, Victim Expectations, and Drinking/Arrest Expectations classes according to their indicator profiles. These classes differ with respect to covariates associated with membership; furthermore, they are all statistically and differentially associated with at least one adolescent risk behavior. This analysis demonstrates the additional benefit derived from using this multidimensional approach for studying future expectations. Further research is needed to investigate its stability and role in predicting adolescent risk behavior over time.

  10. Spectral energy distribution analysis of class I and class II FU Orionis stars

    Energy Technology Data Exchange (ETDEWEB)

    Gramajo, Luciana V.; Gómez, Mercedes [Observatorio Astronómico, Universidad Nacional de Córdoba, Argentina, Laprida 854, 5000 Córdoba (Argentina); Rodón, Javier A., E-mail: luciana@oac.uncor.edu, E-mail: mercedes@oac.uncor.edu, E-mail: jrodon@eso.org [European Southern Observatory, Alonso de Córdova 3107, Vitacura, Casilla 19001, Santiago 19 (Chile)

    2014-06-01

    FU Orionis stars (FUors) are eruptive pre-main sequence objects thought to represent quasi-periodic or recurring stages of enhanced accretion during the low-mass star-forming process. We characterize the sample of known and candidate FUors in a homogeneous and consistent way, deriving stellar and circumstellar parameters for each object. We focus the analysis on those parameters that are supposed to vary during the FUor stage. We modeled the spectral energy distributions of 24 of the 26 currently known FUors, using the radiative transfer code of Whitney et al. We compare our models with those obtained by Robitaille et al. for Taurus class II and I sources in quiescence periods by calculating the cumulative distribution of the different parameters. FUors have more massive disks: we find that ∼80% of the disks in FUors are more massive than any Taurus class II and I sources in the sample. Median values for the disk mass accretion rates are $\sim 10^{-7}\,M_{\odot}\,\mathrm{yr}^{-1}$ versus $\sim 10^{-5}\,M_{\odot}\,\mathrm{yr}^{-1}$ for standard young stellar objects (YSOs) and FUors, respectively. While the distributions of envelope mass accretion rates for class I FUors and standard class I objects are similar, FUors, on average, have higher envelope mass accretion rates than standard class II and class I sources. Most FUors (∼70%) have envelope mass accretion rates above $10^{-7}\,M_{\odot}\,\mathrm{yr}^{-1}$. In contrast, 60% of the classical YSO sample has an accretion rate below this value. Our results support the current scenario in which changes experienced by the circumstellar disk explain the observed properties of these stars. However, the increase in the disk mass accretion rate is smaller than theoretically predicted, although in good agreement with previous determinations.

  11. Identifying clinical course patterns in SMS data using cluster analysis

    DEFF Research Database (Denmark)

    Kent, Peter; Kongsted, Alice

    2012-01-01

    ABSTRACT: BACKGROUND: Recently, there has been interest in using the short message service (SMS or text messaging), to gather frequent information on the clinical course of individual patients. One possible role for identifying clinical course patterns is to assist in exploring clinically important... by spline analysis. However, cluster analysis of SMS data in its original untransformed form may be simpler and offer other advantages. Therefore, the aim of this study was to determine whether cluster analysis could be used for identifying clinical course patterns distinct from the pattern of the whole... of cluster analysis. More research is needed, especially head-to-head studies, to identify which technique is best to use under what circumstances.

  12. Interactive exploration of uncertainty in fuzzy classifications by isosurface visualization of class clusters

    NARCIS (Netherlands)

    Lucieer, A.; Veen, L.E.

    2009-01-01

    Uncertainty and vagueness are important concepts when dealing with transition zones between vegetation communities or land-cover classes. In this study, classification uncertainty is quantified by applying a supervised fuzzy classification algorithm. New visualization techniques are proposed and pre

  13. Bayesian Analysis of Two Stellar Populations in Galactic Globular Clusters III: Analysis of 30 Clusters

    CERN Document Server

    Wagner-Kaiser, R; Sarajedini, A; von Hippel, T; van Dyk, D A; Robinson, E; Stein, N; Jefferys, W H

    2016-01-01

    We use Cycle 21 Hubble Space Telescope (HST) observations and HST archival ACS Treasury observations of 30 Galactic Globular Clusters to characterize two distinct stellar populations. A sophisticated Bayesian technique is employed to simultaneously sample the joint posterior distribution of age, distance, and extinction for each cluster, as well as unique helium values for two populations within each cluster and the relative proportion of those populations. We find the helium differences between the two populations in the clusters fall in the range of ~0.04 to 0.11. Because adequate models varying in CNO are not presently available, we view these spreads as upper limits and present them with statistical rather than observational uncertainties. Evidence supports previous studies suggesting an increase in helium content concurrent with increasing mass of the cluster; we also find that the proportion of the first population of stars increases with mass as well. Our results are examined in the context of proposed g...

  14. Technology Clusters Exploration for Patent Portfolio through Patent Abstract Analysis

    Directory of Open Access Journals (Sweden)

    Gabjo Kim

    2016-12-01

    This study explores technology clusters through patent analysis. The aim of exploring technology clusters is to grasp competitors’ levels of sustainable research and development (R&D) and establish a sustainable strategy for entering an industry. To achieve this, we first grouped the patent documents with similar technologies by applying affinity propagation (AP) clustering, which is effective when grouping large amounts of data. Next, in order to define the technology clusters, we adopted the term frequency-inverse document frequency (TF-IDF) weight, which lists the terms in order of importance. We collected the patent data of Korean electric car companies from the United States Patent and Trademark Office (USPTO) to verify our proposed methodology. As a result, our proposed methodology presents more detailed information on the Korean electric car industry than previous studies.
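
    The TF-IDF step described in this abstract can be illustrated with a minimal sketch. It assumes the documents have already been grouped (for example by affinity propagation) and that each group is represented by a pooled list of tokens; the function name, the toy tokens and the plain tf × log(N/df) weighting are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter

def tfidf_top_terms(cluster_docs, top_n=3):
    """Rank the terms of each document group by TF-IDF weight.

    cluster_docs: dict mapping a (hypothetical) cluster label to the list
    of tokens pooled from the documents assigned to that cluster.
    """
    n_groups = len(cluster_docs)
    # Document frequency: in how many groups does each term occur?
    df = Counter()
    for tokens in cluster_docs.values():
        df.update(set(tokens))

    ranked = {}
    for label, tokens in cluster_docs.items():
        tf = Counter(tokens)
        total = len(tokens)
        scores = {
            term: (count / total) * math.log(n_groups / df[term])
            for term, count in tf.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked[label] = sorted(scores, key=lambda t: (-scores[t], t))[:top_n]
    return ranked

# Toy token lists standing in for patent abstracts (made up for illustration).
toy = {
    "A": "battery cell battery charging vehicle".split(),
    "B": "motor torque motor control vehicle".split(),
    "C": "charging station charging connector vehicle".split(),
}
ranked = tfidf_top_terms(toy, top_n=2)
print(ranked)
```

    A term such as "vehicle" that occurs in every group receives weight zero, which is why TF-IDF is useful for labelling clusters with their distinguishing vocabulary.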

  15. Confidence in Government and Attitudes toward Bribery: A Country-Cluster Analysis of Demographic and Religiosity Perspectives

    Directory of Open Access Journals (Sweden)

    Serkan Benk

    2017-01-01

    In this study, we try to classify countries by their levels of confidence in government and attitudes toward accepting bribery, using data from the sixth wave (2010–2014) of the World Values Survey (WVS). We are also interested in which demographic, attitudinal, and religiosity variables affect each class of countries. For these purposes, cluster analysis, linear regression analysis, and ordered logistic regression analysis were used. The study found that countries could be grouped into two clusters which had varying levels of opposition to bribe taking and confidence in government. Another finding was that certain demographic, attitudinal, and religiosity variables that were significant in one cluster might not be significant in another cluster.

  16. An Empirical Analysis of Rough Set Categorical Clustering Techniques

    Science.gov (United States)

    2017-01-01

    Clustering a set of objects into homogeneous groups is a fundamental operation in data mining. Recently, much attention has been paid to categorical data clustering, where data objects are made up of non-numerical attributes. For categorical data clustering, rough set based approaches such as Maximum Dependency Attribute (MDA) and Maximum Significance Attribute (MSA) have outperformed their predecessor approaches like Bi-Clustering (BC), Total Roughness (TR) and Min-Min Roughness (MMR). This paper presents the limitations and issues of the MDA and MSA techniques on special types of data sets where both techniques fail to select, or face difficulty in selecting, their best clustering attribute. This analysis therefore motivates the need for a better and more general rough set theory approach that can cope with the issues of MDA and MSA. Hence, an alternative technique named Maximum Indiscernible Attribute (MIA) for clustering categorical data using rough set indiscernibility relations is proposed. The novelty of the proposed approach is that, unlike other rough set theory techniques, it uses the domain knowledge of the data set. It is based on the concept of the indiscernibility relation combined with a number of clusters. To show the significance of the proposed approach, the effects of the number of clusters on rough accuracy, purity and entropy are described in the form of propositions. Moreover, ten different data sets from previously utilized research cases and the UCI repository are used for experiments. The results, produced in tabular and graphical forms, show that the proposed MIA technique provides better performance in selecting the clustering attribute in terms of purity, entropy, iterations, time, accuracy and rough accuracy. PMID:28068344
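
    The rough-set notions this abstract relies on (indiscernibility classes and rough accuracy) can be sketched in a few lines. The code below is not MDA, MSA or MIA; it only shows the standard building blocks under the usual definitions, and the toy table, attribute names and function names are hypothetical.

```python
from collections import defaultdict

def indiscernibility_classes(table, attributes):
    """Partition object ids into groups with equal values on `attributes`."""
    groups = defaultdict(set)
    for obj_id, row in table.items():
        key = tuple(row[a] for a in attributes)
        groups[key].add(obj_id)
    return list(groups.values())

def rough_accuracy(table, attributes, target):
    """Return |lower approximation| / |upper approximation| of `target`."""
    lower, upper = set(), set()
    for block in indiscernibility_classes(table, attributes):
        if block <= target:
            lower |= block  # block lies entirely inside the target set
        if block & target:
            upper |= block  # block overlaps the target set
    return len(lower) / len(upper) if upper else 1.0

# Toy categorical data set (made up for illustration).
toy = {
    1: {"colour": "red", "size": "small"},
    2: {"colour": "red", "size": "large"},
    3: {"colour": "blue", "size": "small"},
    4: {"colour": "blue", "size": "small"},
}
target = {1, 3, 4}
acc_colour = rough_accuracy(toy, ["colour"], target)
acc_both = rough_accuracy(toy, ["colour", "size"], target)
print(acc_colour, acc_both)
```

    With only `colour`, objects 1 and 2 cannot be told apart, so the target set is only roughly definable; adding `size` separates them and the approximation becomes exact.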

  17. Cognitive analysis of multiple sclerosis utilizing fuzzy cluster means

    Directory of Open Access Journals (Sweden)

    Imianvan Anthony Agboizebeta

    2012-01-01

    Multiple sclerosis, often called MS, is a disease that affects the central nervous system (the brain and spinal cord). Myelin provides insulation for nerve cells, improves the conduction of impulses along the nerves and is important for maintaining the health of the nerves. In multiple sclerosis, inflammation causes the myelin to disappear. Genetic factors, environmental issues and viral infection may also play a role in developing the disease. MS is characterized by life-threatening symptoms such as loss of balance, hearing problems and depression. The application of Fuzzy Cluster Means (FCM, or Fuzzy C-Means) analysis to the diagnosis of different forms of multiple sclerosis is the focal point of this paper. Application of cluster analysis involves a sequence of methodological and analytical decision steps that enhances the quality and meaning of the clusters produced. Uncertainties associated with analysis of multiple sclerosis test data are eliminated by the system
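
    For readers unfamiliar with fuzzy c-means, the following is a minimal sketch of the textbook algorithm on scalar data: memberships are computed from relative distances to the centres, and centres are updated as membership-weighted means. The fuzzifier m = 2, the toy values and the initial centres are assumptions for illustration; nothing here reproduces the diagnostic system described in the abstract.

```python
def fuzzy_c_means_1d(values, centres, m=2.0, iterations=50):
    """Textbook fuzzy c-means on scalar data; returns (centres, memberships)."""
    centres = list(centres)
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iterations):
        memberships = []
        for x in values:
            dists = [abs(x - c) for c in centres]
            if 0.0 in dists:
                # The point coincides with a centre: full membership there.
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(row)
                row = [r / total for r in row]
            else:
                # u_k = 1 / sum_j (d_k / d_j)^(2 / (m - 1))
                row = [
                    1.0 / sum((d_k / d_j) ** exponent for d_j in dists)
                    for d_k in dists
                ]
            memberships.append(row)
        # Each centre becomes the mean of the data weighted by u^m.
        centres = [
            sum((u[k] ** m) * x for u, x in zip(memberships, values))
            / sum(u[k] ** m for u in memberships)
            for k in range(len(centres))
        ]
    return centres, memberships

# Two well-separated groups of made-up measurements.
values = [1.0, 1.2, 0.9, 5.0, 5.1, 4.8]
centres, memberships = fuzzy_c_means_1d(values, centres=[0.0, 6.0])
print(centres)
```

    Unlike hard clustering, every observation keeps a degree of membership in each cluster, which is what makes the method attractive when class boundaries are uncertain.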

  18. Analysis of Asset Classes Through the Business Cycle

    Directory of Open Access Journals (Sweden)

    Audrius Dzikevičius

    2012-06-01

    This study was driven by the dissimilar performance characteristics displayed by asset classes over the business cycle. The authors aim to explore asset classes on the grounds of a scientific literature review and a statistical analysis. Business cycles are divided into four stages to explore broad movements in returns of asset classes and a possible existence of asymmetrical effects of determinants within stages. Six main asset classes were analysed: US stocks, EAFE stocks, Bonds, Gold, Real Estate and Commodities. Monthly data from February 1976 to August 2011 were used for the study. The article combines business cycle and asset allocation theories by adding valuable information about performance of asset classes during different phases of the business cycle. Using the OECD Composite Leading Indicator as a business cycle measure, the authors demonstrate that different asset classes have different return/risk characteristics over the business cycle. The article demonstrates how to use the business cycle approach for investment decision-making. The OECD Composite Leading Indicator can provide significant information on market expectations and the future outlook; hence, results of this study can help every investor improve his/her performance and risk management.

  19. Cluster Analysis of Metal Concentrations in River Kubanni Zaria, Nigeria

    Directory of Open Access Journals (Sweden)

    A.W. Butu

    2013-08-01

    Cluster analysis was used to assess the degree of association of the metal concentrations in the Kubanni River, Zaria, Nigeria. The main sources of data for the analysis were sediments from four distinct locations along the long profile of the Kubanni River, which were analyzed using Instrumental Neutron Activation Analysis (INAA) techniques. The Nigerian Research Reactor-1 (NIRR-1), which is a Miniature Neutron Source Reactor (MNSR), was used to analyze the data. The result of the laboratory analysis was subjected to cluster analysis. The analysis shows a stable clustering system where the metal concentrations in the four different locations were grouped into two main groups with one outlier. The concentrations of elements that were sampled in the dry months were clustered in group I and those collected in the rainy months were in group II. This strongly supports the existence of temporal variation in the levels of concentration of metal contaminants between wet and dry seasons in the Kubanni River, and also confirms that the elements collected in the wet season are from the same source and those in the dry season are also from a common source.

  20. Identifying subgroups of patients using latent class analysis

    DEFF Research Database (Denmark)

    Nielsen, Anne Mølgaard; Kent, Peter; Hestbæk, Lise

    2017-01-01

    BACKGROUND: Heterogeneity in patients with low back pain (LBP) is well recognised and different approaches to subgrouping have been proposed. Latent Class Analysis (LCA) is a statistical technique that is increasingly being used to identify subgroups based on patient characteristics. However, as ...

  1. Stability Analysis for Class of Switched Nonlinear Systems

    DEFF Research Database (Denmark)

    Shaker, Hamid Reza; How, Jonathan P.

    2010-01-01

    Stability analysis for a class of switched nonlinear systems is addressed in this paper. Two linear matrix inequality (LMI) based sufficient conditions for asymptotic stability are proposed for switched nonlinear systems. These conditions are analogous counterparts for switched linear systems which...

  2. Frontiers of Performance Analysis on Leadership-Class Systems

    Energy Technology Data Exchange (ETDEWEB)

    Fowler, R J; Adhianto, L; de Supinski, B R; Fagan, M; Gamblin, T; Krentel, M; Mellor-Crummey, J; Schulz, M; Tallent, N

    2009-06-15

    The number of cores employed in high-end systems for scientific computing is increasing rapidly. As a result, there is a pressing need for tools that can measure, model, and diagnose performance problems in highly-parallel runs. We describe two tools that employ complementary approaches for analysis at scale and we illustrate their use on DOE leadership-class systems.

  3. Multilevel Latent Class Analysis: Parametric and Nonparametric Models

    Science.gov (United States)

    Finch, W. Holmes; French, Brian F.

    2014-01-01

    Latent class analysis is an analytic technique often used in educational and psychological research to identify meaningful groups of individuals within a larger heterogeneous population based on a set of variables. This technique is flexible, encompassing not only a static set of variables but also longitudinal data in the form of growth mixture…
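
    As a reminder of what the basic (single-level, cross-sectional) latent class model looks like, the standard formulation is given below. The notation is chosen here for illustration and is not taken from the paper: $C$ classes with prevalences $\pi_c$, and $J$ observed variables $y_{ij}$ for individual $i$ that are assumed independent given class membership.

```latex
\begin{align}
  P(\mathbf{y}_i) &= \sum_{c=1}^{C} \pi_c \prod_{j=1}^{J} P(y_{ij} \mid c),
  \qquad \sum_{c=1}^{C} \pi_c = 1, \\
  P(c \mid \mathbf{y}_i) &= \frac{\pi_c \prod_{j=1}^{J} P(y_{ij} \mid c)}{P(\mathbf{y}_i)} .
\end{align}
```

    The second line is the posterior class probability used to assign individuals to groups; the multilevel and longitudinal variants mentioned in the abstract extend this baseline.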

  4. A novel PPGA-based clustering analysis method for business cycle indicator selection

    Institute of Scientific and Technical Information of China (English)

    Dabin ZHANG; Lean YU; Shouyang WANG; Yingwen SONG

    2009-01-01

    A new clustering analysis method based on the pseudo parallel genetic algorithm (PPGA) is proposed for business cycle indicator selection. In the proposed method, the category of each indicator is coded by real numbers, and some illegal chromosomes are repaired by the identification and restoration of empty class. Two mutation operators, namely the discrete random mutation operator and the optimal direction mutation operator, are designed to balance the local convergence speed and the global convergence performance, which are then combined with migration strategy and insertion strategy. For the purpose of verification and illustration, the proposed method is compared with the K-means clustering algorithm and the standard genetic algorithms via a numerical simulation experiment. The experimental result shows the feasibility and effectiveness of the new PPGA-based clustering analysis algorithm. Meanwhile, the proposed clustering analysis algorithm is also applied to select the business cycle indicators to examine the status of the macro economy. Empirical results demonstrate that the proposed method can effectively and correctly select some leading indicators, coincident indicators, and lagging indicators to reflect the business cycle, which is extremely operational for some macro economy administrative managers and business decision-makers.

  5. Examination of European Union economic cohesion: A cluster analysis approach

    Directory of Open Access Journals (Sweden)

    Jiri Mazurek

    2014-01-01

    In the past years the majority of EU members experienced the highest economic decline in their modern history, but the impacts of the global financial crisis were not distributed homogeneously across the continent. The aim of the paper is to examine the cohesion of the European Union (plus Norway and Iceland) in terms of the economic development of its members from the 1st of January 2008 to the 31st of December 2012. For the study five economic indicators were selected: GDP growth, unemployment, inflation, labour productivity and government debt. Annual data from Eurostat databases were averaged over the whole period and then used as an input for a cluster analysis. It was found that EU countries were divided into six different clusters. The most populated cluster, with 14 countries, covered Central and West Europe and reflected the relative homogeneity of this part of Europe. Countries of Southern Europe (Greece, Portugal and Spain) shared their own cluster of the countries most affected by the recent crisis, as did the Baltic and Balkan states in another cluster. On the other hand Slovakia and Poland, the only two countries that escaped a recession, were classified in their own cluster of the most successful countries

  6. Sun Protection Belief Clusters: Analysis of Amazon Mechanical Turk Data.

    Science.gov (United States)

    Santiago-Rivas, Marimer; Schnur, Julie B; Jandorf, Lina

    2016-12-01

    This study aimed (i) to determine whether people could be differentiated on the basis of their sun protection belief profiles and individual characteristics and (ii) explore the use of a crowdsourcing web service for the assessment of sun protection beliefs. A sample of 500 adults completed an online survey of sun protection belief items using Amazon Mechanical Turk. A two-phased cluster analysis (i.e., hierarchical and non-hierarchical K-means) was utilized to determine clusters of sun protection barriers and facilitators. Results yielded three distinct clusters of sun protection barriers and three distinct clusters of sun protection facilitators. Significant associations between gender, age, sun sensitivity, and cluster membership were identified. Results also showed an association between barrier and facilitator cluster membership. The results of this study provided a potential alternative approach to developing future sun protection promotion initiatives in the population. Findings add to our knowledge regarding individuals who support, oppose, or are ambivalent toward sun protection and inform intervention research by identifying distinct subtypes that may best benefit from (or have a higher need for) skin cancer prevention efforts.
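
    The second phase of a two-phase cluster analysis of this kind can be sketched as follows. The sketch assumes that the hierarchical phase has already suggested the number of clusters and a set of starting centroids, which are passed in as `seeds`; the function name, seeds and toy points are hypothetical, and the code is plain Lloyd's K-means rather than the authors' exact procedure.

```python
import math

def kmeans(points, seeds, max_iter=100):
    """Lloyd's K-means started from the given seed centroids."""
    centroids = [tuple(s) for s in seeds]
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids will not move
        labels = new_labels
        # Update step: mean of each cluster (an empty cluster keeps its centroid).
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels

# Made-up two-item belief scores; the seeds stand in for phase-one output.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, labels = kmeans(points, seeds=[(0, 0), (10, 10)])
print(labels)
```

    Seeding K-means from a hierarchical solution is a common way to avoid the sensitivity of K-means to random initialisation.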

  7. Classification of Two Class Motor Imagery Tasks Using Hybrid GA-PSO Based K-Means Clustering

    Directory of Open Access Journals (Sweden)

    Suraj

    2015-01-01

    Transferring the brain computer interface (BCI) from laboratory conditions to real-world applications requires the BCI to be applied asynchronously, without any time constraint. The high level of dynamism in the electroencephalogram (EEG) signal leads us to look toward evolutionary algorithms (EA). Motivated by these two facts, in this work a hybrid GA-PSO based K-means clustering technique has been used to distinguish two-class motor imagery (MI) tasks. The proposed hybrid GA-PSO based K-means clustering is found to outperform genetic algorithm (GA) and particle swarm optimization (PSO) based K-means clustering techniques in terms of both accuracy and execution time. The shorter execution time of the hybrid GA-PSO technique makes it suitable for real-time BCI applications. Time frequency representation (TFR) techniques have been used to extract the features of the signal under investigation. TFR-based features are extracted and, relying on the concept of event-related synchronization (ERS) and desynchronization (ERD), a feature vector is formed.

  8. Bayesian analysis of two stellar populations in Galactic globular clusters - III. Analysis of 30 clusters

    Science.gov (United States)

    Wagner-Kaiser, R.; Stenning, D. C.; Sarajedini, A.; von Hippel, T.; van Dyk, D. A.; Robinson, E.; Stein, N.; Jefferys, W. H.

    2016-12-01

    We use Cycle 21 Hubble Space Telescope (HST) observations and HST archival ACS Treasury observations of 30 Galactic globular clusters to characterize two distinct stellar populations. A sophisticated Bayesian technique is employed to simultaneously sample the joint posterior distribution of age, distance, and extinction for each cluster, as well as unique helium values for two populations within each cluster and the relative proportion of those populations. We find the helium differences among the two populations in the clusters fall in the range of ~0.04 to 0.11. Because adequate models varying in carbon, nitrogen, and oxygen are not presently available, we view these spreads as upper limits and present them with statistical rather than observational uncertainties. Evidence supports previous studies suggesting an increase in helium content concurrent with increasing mass of the cluster and we also find that the proportion of the first population of stars increases with mass as well. Our results are examined in the context of proposed globular cluster formation scenarios. Additionally, we leverage our Bayesian technique to shed light on the inconsistencies between the theoretical models and the observed data.

  9. Integrative Molecular Analysis of Intrahepatic Cholangiocarcinoma Reveals 2 Classes That Have Different Outcomes

    Science.gov (United States)

    SIA, DANIELA; HOSHIDA, YUJIN; VILLANUEVA, AUGUSTO; ROAYAIE, SASAN; FERRER, JOANA; TABAK, BARBARA; PEIX, JUDIT; SOLE, MANEL; TOVAR, VICTORIA; ALSINET, CLARA; CORNELLA, HELENA; KLOTZLE, BRANDY; FAN, JIAN–BING; COTSOGLOU, CHRISTIAN; THUNG, SWAN N.; FUSTER, JOSEP; WAXMAN, SAMUEL; GARCIA–VALDECASAS, JUAN CARLOS; BRUIX, JORDI; SCHWARTZ, MYRON E.; BEROUKHIM, RAMEEN; MAZZAFERRO, VINCENZO; LLOVET, JOSEP M.

    2013-01-01

    BACKGROUND & AIMS Cholangiocarcinoma, the second most common liver cancer, can be classified as intrahepatic cholangiocarcinoma (ICC) or extrahepatic cholangiocarcinoma. We performed an integrative genomic analysis of ICC samples from a large series of patients. METHODS We performed a gene expression profile, high-density single-nucleotide polymorphism array, and mutation analyses using formalin-fixed ICC samples from 149 patients. Associations with clinicopathologic traits and patient outcomes were examined for 119 cases. Class discovery was based on a non-negative matrix factorization algorithm and significant copy number variations were identified by GISTIC analysis. Gene set enrichment analysis was used to identify signaling pathways activated in specific molecular classes of tumors, and to analyze their genomic overlap with hepatocellular carcinoma (HCC). RESULTS We identified 2 main biological classes of ICC. The inflammation class (38% of ICCs) is characterized by activation of inflammatory signaling pathways, overexpression of cytokines, and STAT3 activation. The proliferation class (62%) is characterized by activation of oncogenic signaling pathways (including RAS, mitogen-activated protein kinase, and MET), DNA amplifications at 11q13.2, deletions at 14q22.1, mutations in KRAS and BRAF, and gene expression signatures previously associated with poor outcomes for patients with HCC. Copy number variation-based clustering was able to refine these molecular groups further. We identified high-level amplifications in 5 regions, including 1p13 (9%) and 11q13.2 (4%), and several focal deletions, such as 9p21.3 (18%) and 14q22.1 (12% in coding regions for the SAV1 tumor suppressor). In a complementary approach, we identified a gene expression signature that was associated with reduced survival times of patients with ICC; this signature was enriched in the proliferation class (P < .001). CONCLUSIONS We used an integrative genomic analysis to identify 2 classes

  10. Cluster analysis of knowledge sources in standardized electrical engineering subfields

    Directory of Open Access Journals (Sweden)

    Blagojević Marija

    2016-01-01

    The paper presents a cluster analysis of innovation of knowledge sources based on the standards in the field of Electrical Engineering. Both local (SRPS) and global (ISO) knowledge sources have been analysed with the aim of innovating a Knowledge Base (KB). The results presented indicate a means/possibility of grouping the subfields within a cluster. They also point to a trend or intensity of knowledge source innovation for the purpose of innovating the KB that accompanies innovations. The study provides the possibility of predicting necessary financial resources in the forthcoming period by means of original mathematical relations. Furthermore, the cluster analysis facilitates the comparison of the innovation intensity in this and other (sub)fields. Future work relates to the monitoring of the knowledge source innovation by means of KB engineering and improvement of the methodology of prediction using neural networks.

  11. A Geometric Analysis of Subspace Clustering with Outliers

    CERN Document Server

    Soltanolkotabi, Mahdi

    2011-01-01

    This paper considers the problem of clustering a collection of unlabeled data points assumed to lie near a union of lower dimensional planes. As is common in computer vision or unsupervised learning applications, we do not know in advance how many subspaces there are nor do we have any information about their dimensions. We develop a novel geometric analysis of an algorithm named sparse subspace clustering (SSC) [Elhamifar09], which significantly broadens the range of problems where it is provably effective. For instance, we show that SSC can recover multiple subspaces, each of dimension comparable to the ambient dimension. We also prove that SSC can correctly cluster data points even when the subspaces of interest intersect. Further, we develop an extension of SSC that succeeds when the data set is corrupted with possibly overwhelmingly many outliers. Underlying our analysis are clear geometric insights, which may bear on other sparse recovery problems. A numerical study complements our theoretica...

  12. Cluster analysis of WIBS single-particle bioaerosol data

    Science.gov (United States)

    Robinson, N. H.; Allan, J. D.; Huffman, J. A.; Kaye, P. H.; Foot, V. E.; Gallagher, M.

    2013-02-01

    Hierarchical agglomerative cluster analysis was performed on single-particle multi-spatial data sets comprising optical diameter, asymmetry and three different fluorescence measurements, gathered using two dual Wideband Integrated Bioaerosol Sensors (WIBSs). The technique is demonstrated on measurements of various fluorescent and non-fluorescent polystyrene latex spheres (PSL) before being applied to two separate contemporaneous ambient WIBS data sets recorded in a forest site in Colorado, USA, as part of the BEACHON-RoMBAS project. Cluster analysis results between both data sets are consistent. Clusters are tentatively interpreted by comparison of concentration time series and cluster average measurement values to the published literature (of which there is a paucity) to represent the following: non-fluorescent accumulation mode aerosol; bacterial agglomerates; and fungal spores. To our knowledge, this is the first time cluster analysis has been applied to long-term online primary biological aerosol particle (PBAP) measurements. The novel application of this clustering technique provides a means for routinely reducing WIBS data to discrete concentration time series which are more easily interpretable, without the need for any a priori assumptions concerning the expected aerosol types. It can reduce the level of subjectivity compared to the more standard analysis approaches, which are typically performed by simple inspection of various ensemble data products. It also has the advantage of potentially resolving less populous or subtly different particle types. This technique is likely to become more robust in the future as fluorescence-based aerosol instrumentation measurement precision, dynamic range and the number of available metrics are improved.
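
    Hierarchical agglomerative clustering, the technique named in this abstract, can be sketched generically as follows. The sketch uses average linkage on raw Euclidean distances purely as an assumption for illustration; the abstract does not state the linkage or normalisation, and the toy points are made up.

```python
import math

def agglomerate(points, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Mean pairwise Euclidean distance between two clusters.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge it.
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Made-up two-feature "particles": three near the origin, two far away.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 4.9)]
result = agglomerate(points, n_clusters=2)
print(result)
```

    Each particle starts as its own cluster and the two closest clusters are merged repeatedly, so no prior assumption about the particle types is needed, only a choice of where to stop merging.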

  13. Cluster analysis of WIBS single-particle bioaerosol data

    Directory of Open Access Journals (Sweden)

    N. H. Robinson

    2013-02-01

    Hierarchical agglomerative cluster analysis was performed on single-particle multi-spatial data sets comprising optical diameter, asymmetry and three different fluorescence measurements, gathered using two dual Wideband Integrated Bioaerosol Sensors (WIBSs). The technique is demonstrated on measurements of various fluorescent and non-fluorescent polystyrene latex spheres (PSL) before being applied to two separate contemporaneous ambient WIBS data sets recorded in a forest site in Colorado, USA, as part of the BEACHON-RoMBAS project. Cluster analysis results between both data sets are consistent. Clusters are tentatively interpreted by comparison of concentration time series and cluster average measurement values to the published literature (of which there is a paucity) to represent the following: non-fluorescent accumulation mode aerosol; bacterial agglomerates; and fungal spores. To our knowledge, this is the first time cluster analysis has been applied to long-term online primary biological aerosol particle (PBAP) measurements. The novel application of this clustering technique provides a means for routinely reducing WIBS data to discrete concentration time series which are more easily interpretable, without the need for any a priori assumptions concerning the expected aerosol types. It can reduce the level of subjectivity compared to the more standard analysis approaches, which are typically performed by simple inspection of various ensemble data products. It also has the advantage of potentially resolving less populous or subtly different particle types. This technique is likely to become more robust in the future as fluorescence-based aerosol instrumentation measurement precision, dynamic range and the number of available metrics are improved.

  14. Cluster Analysis of Clinical Data Identifies Fibromyalgia Subgroups

    Science.gov (United States)

    Docampo, Elisa; Collado, Antonio; Escaramís, Geòrgia; Carbonell, Jordi; Rivera, Javier; Vidal, Javier; Alegre, José

    2013-01-01

    Introduction Fibromyalgia (FM) is mainly characterized by widespread pain and multiple accompanying symptoms, which hinder FM assessment and management. In order to reduce FM heterogeneity we classified clinical data into simplified dimensions that were used to define FM subgroups. Material and Methods 48 variables were evaluated in 1,446 Spanish FM cases fulfilling 1990 ACR FM criteria. A partitioning analysis was performed to find groups of variables similar to each other. Similarities between variables were identified and the variables were grouped into dimensions. This was performed in a subset of 559 patients, and cross-validated in the remaining 887 patients. For each sample and dimension, a composite index was obtained based on the weights of the variables included in the dimension. Finally, a clustering procedure was applied to the indexes, resulting in FM subgroups. Results Variables clustered into three independent dimensions: “symptomatology”, “comorbidities” and “clinical scales”. Only the two first dimensions were considered for the construction of FM subgroups. Resulting scores classified FM samples into three subgroups: low symptomatology and comorbidities (Cluster 1), high symptomatology and comorbidities (Cluster 2), and high symptomatology but low comorbidities (Cluster 3), showing differences in measures of disease severity. Conclusions We have identified three subgroups of FM samples in a large cohort of FM by clustering clinical data. Our analysis stresses the importance of family and personal history of FM comorbidities. Also, the resulting patient clusters could indicate different forms of the disease, relevant to future research, and might have an impact on clinical assessment. PMID:24098674

  15. Transcriptional analysis of ESAT-6 cluster 3 in Mycobacterium smegmatis

    Directory of Open Access Journals (Sweden)

    Riccardi Giovanna

    2009-03-01

    Background: The ESAT-6 (early secreted antigenic target, 6 kDa) family collects small mycobacterial proteins secreted by Mycobacterium tuberculosis, particularly in the early phase of growth. There are 23 ESAT-6 family members in M. tuberculosis H37Rv. In a previous work, we identified the Zur-dependent regulation of five proteins of the ESAT-6/CFP-10 family (esxG, esxH, esxQ, esxR, and esxS). esxG and esxH are part of ESAT-6 cluster 3, whose expression was already known to be induced by iron starvation. Results: In this research, we performed EMSA experiments and transcriptional analysis of ESAT-6 cluster 3 in Mycobacterium smegmatis (msmeg0615-msmeg0625) and M. tuberculosis. In contrast to what we had observed in M. tuberculosis, we found that in M. smegmatis ESAT-6 cluster 3 responds only to iron and not to zinc. In both organisms we identified an internal promoter, a finding which suggests the presence of two transcriptional units and, by consequence, a differential expression of cluster 3 genes. We compared the expression of msmeg0615 and msmeg0620 in different growth and stress conditions by means of relative quantitative PCR. The expression of msmeg0615 and msmeg0620 genes was essentially similar; they appeared to be repressed in most of the tested conditions, with the exception of acid stress (pH 4.2), where msmeg0615 was about 4-fold induced, while msmeg0620 was repressed. Analysis revealed that in acid stress conditions M. tuberculosis rv0282 gene was 3-fold induced too, while rv0287 induction was almost insignificant. Conclusion: In contrast with what has been reported for M. tuberculosis, our results suggest that in M. smegmatis only IdeR-dependent regulation is retained, while zinc has no effect on gene expression. The role of cluster 3 in M. tuberculosis virulence is still to be defined; however, iron- and zinc-dependent expression strongly suggests that cluster 3 is highly expressed in the infective process, and that the cluster

  16. Learning From Hidden Traits: Joint Factor Analysis and Latent Clustering

    Science.gov (United States)

    Yang, Bo; Fu, Xiao; Sidiropoulos, Nicholas D.

    2017-01-01

    Dimensionality reduction techniques play an essential role in data analytics, signal processing and machine learning. Dimensionality reduction is usually performed in a preprocessing stage that is separate from subsequent data analysis, such as clustering or classification. Finding reduced-dimension representations that are well-suited for the intended task is more appealing. This paper proposes a joint factor analysis and latent clustering framework, which aims at learning cluster-aware low-dimensional representations of matrix and tensor data. The proposed approach leverages matrix and tensor factorization models that produce essentially unique latent representations of the data to unravel latent cluster structure -- which is otherwise obscured because of the freedom to apply an oblique transformation in latent space. At the same time, latent cluster structure is used as prior information to enhance the performance of factorization. Specific contributions include several custom-built problem formulations, corresponding algorithms, and discussion of associated convergence properties. Besides extensive simulations, real-world datasets such as Reuters document data and MNIST image data are also employed to showcase the effectiveness of the proposed approaches.

  17. A Cluster Analysis of Personality Style in Adults with ADHD

    Science.gov (United States)

    Robin, Arthur L.; Tzelepis, Angela; Bedway, Marquita

    2008-01-01

    Objective: The purpose of this study was to use hierarchical linear cluster analysis to examine the normative personality styles of adults with ADHD. Method: A total of 311 adults with ADHD completed the Millon Index of Personality Styles, which consists of 24 scales assessing motivating aims, cognitive modes, and interpersonal behaviors. Results:…

  18. Interaction of Fanaroff-Riley class II jets with a magnetised intra-cluster medium

    CERN Document Server

    Huarte-Espinosa, Martín; Alexander, Paul

    2011-01-01

    We present 3-D MHD and synthetic numerical simulations to follow the evolution of randomly magnetized intra-cluster medium plasma under the effects of powerful, light, hypersonic and bipolar jets. We prescribe the cluster magnetic field (CMF) as a Gaussian random field with power law energy spectrum tuned to the expectation for Kolmogorov turbulence. We investigate the power of jets and the viewing angle used for the synthetic Rotation Measure (RM) observations. We find the model radio sources introduce and amplify fluctuations on the RM statistical properties; the average RM and the RM standard deviation are increased by the action of the jets. This may lead to overestimations of the CMFs' strength up to 70%. The effect correlates with the jet power. Jets distort and amplify CMFs especially near the edges of the lobes and the jets' heads. Thus the RM structure functions are flattened at scales comparable to the source size. Jet-produced RM enhancements depend on the orientation of the jet axis to the line of...

  19. Outcome-Driven Cluster Analysis with Application to Microarray Data.

    Directory of Open Access Journals (Sweden)

    Jessie J Hsu

    One goal of cluster analysis is to sort characteristics into groups (clusters) so that those in the same group are more highly correlated to each other than they are to those in other groups. An example is the search for groups of genes whose expression of RNA is correlated in a population of patients. These genes would be of greater interest if their common level of RNA expression were additionally predictive of the clinical outcome. This issue arose in the context of a study of trauma patients on whom RNA samples were available. The question of interest was whether there were groups of genes that were behaving similarly, and whether each gene in the cluster would have a similar effect on who would recover. For this, we develop an algorithm to simultaneously assign characteristics (genes) into groups of highly correlated genes that have the same effect on the outcome (recovery). We propose a random effects model where the genes within each group (cluster) equal the sum of a random effect, specific to the observation and cluster, and an independent error term. The outcome variable is a linear combination of the random effects of each cluster. To fit the model, we implement a Markov chain Monte Carlo algorithm based on the likelihood of the observed data. We evaluate the effect of including outcome in the model through simulation studies and describe a strategy for prediction. These methods are applied to trauma data from the Inflammation and Host Response to Injury research program, revealing a clustering of the genes that is informed by the recovery outcome.
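
    The random effects model described in this abstract can be written down compactly. The notation below is an assumption made for illustration, not the authors' own: $x_{ij}$ is the expression of gene $j$ for observation $i$, $c(j)$ is the cluster that gene $j$ belongs to, $u_{ik}$ is the random effect for observation $i$ and cluster $k$, $\varepsilon_{ij}$ is the independent error term, and $y_i$ is the outcome with hypothetical coefficients $\beta_k$.

```latex
\begin{align}
x_{ij} &= u_{i,c(j)} + \varepsilon_{ij}, \\
y_i    &= \sum_{k=1}^{K} \beta_k \, u_{ik}.
\end{align}
```

    The abstract does not state the distributions of $u_{ik}$ and $\varepsilon_{ij}$, nor whether the outcome carries an intercept, a link function, or its own error term, so none of these is shown here.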

  20. Improving Cluster Analysis with Automatic Variable Selection Based on Trees

    Science.gov (United States)

    2014-12-01

    Master's thesis by Anton D. Orr, December 2014. Thesis Advisor: Samuel E. Buttrey. ...2006 based on classification and regression trees to address problems with determining dissimilarity. Current algorithms do not simultaneously address

  1. Cognitive analysis of multiple sclerosis utilizing fuzzy cluster means

    Directory of Open Access Journals (Sweden)

    Imianvan Anthony Agboizebeta

    2012-02-01

    Multiple sclerosis, often called MS, is a disease that affects the central nervous system (the brain and spinal cord). Myelin provides insulation for nerve cells, improves the conduction of impulses along the nerves and is important for maintaining the health of the nerves. In multiple sclerosis, inflammation causes the myelin to disappear. Genetic factors, environmental issues and viral infection may also play a role in developing the disease. MS is characterized by life-threatening symptoms such as loss of balance, hearing problems and depression. The application of Fuzzy Cluster Means (FCM), or Fuzzy C-Means, analysis to the diagnosis of different forms of multiple sclerosis is the focal point of this paper. Application of cluster analysis involves a sequence of methodological and analytical decision steps that enhances the quality and meaning of the clusters produced. Uncertainties associated with analysis of multiple sclerosis test data are eliminated by the system

  2. Nonlinear dimension reduction and clustering by Minimum Curvilinearity unfold neuropathic pain and tissue embryological classes

    KAUST Repository

    Cannistraci, Carlo

    2010-09-01

    Motivation: Nonlinear small datasets, which are characterized by low numbers of samples and very high numbers of measures, occur frequently in computational biology, and pose problems in their investigation. Unsupervised hybrid-two-phase (H2P) procedures, specifically dimension reduction (DR) coupled with clustering, provide valuable assistance, not only for unsupervised data classification, but also for visualization of the patterns hidden in high-dimensional feature space. Methods: 'Minimum Curvilinearity' (MC) is a principle that, for small datasets, suggests the approximation of curvilinear sample distances in the feature space by pair-wise distances over their minimum spanning tree (MST), and thus avoids the introduction of any tuning parameter. MC is used to design two novel forms of nonlinear machine learning (NML): Minimum Curvilinear embedding (MCE) for DR, and Minimum Curvilinear affinity propagation (MCAP) for clustering. Results: Compared with several other unsupervised and supervised algorithms, MCE and MCAP, whether individually or combined in H2P, overcome the limits of classical approaches. High performance was attained in the visualization and classification of: (i) pain patients (proteomic measurements) in peripheral neuropathy; (ii) human organ tissues (genomic transcription factor measurements) on the basis of their embryological origin. Conclusion: MC provides a valuable framework to estimate nonlinear distances in small datasets. Its extension to large datasets is prefigured for novel NMLs. Classification of neuropathic pain by proteomic profiles offers new insights for future molecular and systems biology characterization of pain. Improvements in tissue embryological classification refine results obtained in an earlier study, and suggest a possible reinterpretation of skin attribution as mesodermal. © The Author(s) 2010. Published by Oxford University Press.
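
    The core idea in the Methods sentence, replacing direct sample distances with distances measured along the minimum spanning tree, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' MCE or MCAP implementation: the function names are hypothetical, Euclidean distance is assumed as the base metric, and the MST is built with Prim's algorithm.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns an adjacency dict: node -> list of (neighbour, edge_length).
    """
    n = len(points)
    adj = {i: [] for i in range(n)}
    # best[i] = (shortest known distance from i to the tree, tree node giving it)
    best = {i: (euclidean(points[0], points[i]), 0) for i in range(1, n)}
    while best:
        j = min(best, key=lambda i: best[i][0])
        d, parent = best.pop(j)
        adj[parent].append((j, d))
        adj[j].append((parent, d))
        for i in best:  # only existing keys are reassigned, so iteration is safe
            d_new = euclidean(points[j], points[i])
            if d_new < best[i][0]:
                best[i] = (d_new, j)
    return adj


def mst_path_distances(points):
    """Pairwise distances measured along the unique path in the MST."""
    adj = minimum_spanning_tree(points)
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for src in range(n):
        stack = [(src, -1, 0.0)]  # (node, node we came from, distance so far)
        while stack:
            node, came_from, d = stack.pop()
            dist[src][node] = d
            for nxt, w in adj[node]:
                if nxt != came_from:
                    stack.append((nxt, node, d + w))
    return dist


# Five samples lying along an L-shaped curve.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (2.0, 2.0)]
D = mst_path_distances(points)
straight = euclidean(points[0], points[4])
```

    For the two end points of the L-shaped curve, the tree distance `D[0][4]` follows the curve and is therefore longer than the straight-line distance `straight`, which is the behaviour the principle relies on. No tuning parameter appears anywhere in the sketch.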

  3. Robustness analysis for a class of nonlinear descriptor systems

    Institute of Scientific and Technical Information of China (English)

    吴敏; 张凌波; 何勇

    2004-01-01

    The robustness analysis problem of a class of nonlinear descriptor systems is studied. A nonlinear matrix inequality, which has the good computational property of convex feasibility, is employed to derive some sufficient conditions to guarantee that the nonlinear descriptor systems have robust disturbance attenuation performance, which avoids the computational difficulties of converting nonlinear matrix and Hamilton-Jacobi inequalities. The convex feasibility property of the nonlinear matrix inequality makes it possible to apply the results of nonlinear robust control in practice.

  4. DGA Clustering and Analysis: Mastering Modern, Evolving Threats, DGALab

    Directory of Open Access Journals (Sweden)

    Alexander Chailytko

    2016-05-01

    Domain Generation Algorithms (DGAs) are a basic building block used in almost all modern malware. Malware researchers have attempted to tackle the DGA problem with various tools and techniques, with varying degrees of success. We present a complex solution to populate a DGA feed using reversed DGAs, third-party feeds, and a smart DGA extraction and clustering based on emulation of a large number of samples. Smart DGA extraction requires no reverse engineering and works regardless of the DGA type or initialization vector, while enabling a cluster-based analysis. Our method also automatically allows analysis of the whole malware family, specific campaign, etc. We present our system and demonstrate its abilities on more than 20 malware families. This includes showing connections between different campaigns, as well as comparing results. Most importantly, we discuss how to utilize the outcome of the analysis to create smarter protections against similar malware.

  5. Frailty phenotypes in the elderly based on cluster analysis

    DEFF Research Database (Denmark)

    Dato, Serena; Montesanto, Alberto; Lagani, Vincenzo

    2012-01-01

    genetic background on the frailty status is still questioned. We investigated the applicability of a cluster analysis approach based on specific geriatric parameters, previously set up and validated in a southern Italian population, to two large longitudinal Danish samples. In both cohorts, we identified...... groups of subjects homogeneous for their frailty status and characterized by different survival patterns. A subsequent survival analysis availing of Accelerated Failure Time models allowed us to formulate an operative index able to correlate classification variables with survival probability. From...... these models, we quantified the differential effect of various parameters on survival, and we estimated the heritability of the frailty phenotype by exploiting the twin pairs in our sample. These data suggest the presence of a genetic influence on the frailty variability and indicate that cluster analysis can...

  6. Full text clustering and relationship network analysis of biomedical publications.

    Directory of Open Access Journals (Sweden)

    Renchu Guan

    Rapid developments in the biomedical sciences have increased the demand for automatic clustering of biomedical publications. In contrast to current approaches to text clustering, which focus exclusively on the contents of abstracts, a novel method is proposed for clustering and analysis of complete biomedical article texts. To reduce dimensionality, the Cosine Coefficient is used on a sub-space of only two vectors, instead of computing the Euclidean distance within the space of all vectors. Then a strategy and algorithm are introduced for Semi-supervised Affinity Propagation (SSAP) to improve analysis efficiency, using biomedical journal names as an evaluation background. Experimental results show that by avoiding high-dimensional sparse matrix computations, SSAP outperforms conventional k-means methods and improves upon the standard Affinity Propagation algorithm. In constructing a directed relationship network and distribution matrix for the clustering results, it can be noted that overlaps in scope and interests among BioMed publications can be easily identified, providing a valuable analytical tool for editors, authors and readers.
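
    The point about computing the Cosine Coefficient on a sub-space of only two vectors can be illustrated with a small sketch. It is an assumption-laden example rather than the authors' method: the function name is hypothetical, raw term counts stand in for whatever term weighting the paper uses, and tokenization is a plain whitespace split. Only the terms that occur in the two documents being compared are ever touched, so no corpus-wide vector space has to be built.

```python
import math
from collections import Counter


def cosine_coefficient(text_a, text_b):
    """Cosine similarity of two texts using only their own term counts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Only terms shared by both documents contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


same = cosine_coefficient("gene expression clustering", "clustering gene expression")
none = cosine_coefficient("gene expression", "malware domains")
part = cosine_coefficient("gene expression clustering", "gene clustering methods")
```

    Word order does not matter, so `same` is 1 up to floating-point rounding, `none` is 0 because the texts share no terms, and `part` falls strictly between the two.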

  7. The Productivity Analysis of Chennai Automotive Industry Cluster

    Science.gov (United States)

    Bhaskaran, E.

    2014-07-01

    Chennai, also called the Detroit of India, is India's second fastest growing auto market and exports auto components and vehicles to the US, Germany, Japan and Brazil. For inclusive growth and sustainable development, 250 auto component industries in Ambattur, Thirumalisai and Thirumudivakkam Industrial Estates located in Chennai have adopted the Cluster Development Approach called Automotive Component Cluster. The objective is to study the Value Chain, Correlation and Data Envelopment Analysis by determining technical efficiency, peer weights, input and output slacks of 100 auto component industries in three estates. The methodology adopted is Data Envelopment Analysis with the Output Oriented Banker Charnes Cooper model, taking net worth, fixed assets and employment as inputs and gross output as the output. The non-zero values represent the weights for efficient clusters. The higher slack obtained reveals the excess net worth, fixed assets, employment and shortage in gross output. To conclude, the variables are highly correlated and the inefficient industries should increase their gross output or decrease the fixed assets or employment. Moreover for sustainable development, the cluster should strengthen infrastructure, technology, procurement, production and marketing interrelationships to decrease costs and to increase productivity and efficiency to compete in the indigenous and export market.

  8. Kinematic gait patterns in healthy runners: A hierarchical cluster analysis.

    Science.gov (United States)

    Phinyomark, Angkoon; Osis, Sean; Hettinga, Blayne A; Ferber, Reed

    2015-11-01

    Previous studies have demonstrated distinct clusters of gait patterns in both healthy and pathological groups, suggesting that different movement strategies may be represented. However, these studies have used discrete time point variables and usually focused on only one specific joint and plane of motion. Therefore, the first purpose of this study was to determine if running gait patterns for healthy subjects could be classified into homogeneous subgroups using three-dimensional kinematic data from the ankle, knee, and hip joints. The second purpose was to identify differences in joint kinematics between these groups. The third purpose was to investigate the practical implications of clustering healthy subjects by comparing these kinematics with runners experiencing patellofemoral pain (PFP). A principal component analysis (PCA) was used to reduce the dimensionality of the entire gait waveform data and then a hierarchical cluster analysis (HCA) determined group sets of similar gait patterns and homogeneous clusters. The results show two distinct running gait patterns were found with the main between-group differences occurring in frontal and sagittal plane knee angles (Pgait strategies. These results suggest care must be taken when selecting samples of subjects in order to investigate the pathomechanics of injured runners.

  9. Transcriptional analysis of exopolysaccharides biosynthesis gene clusters in Lactobacillus plantarum.

    Science.gov (United States)

    Vastano, Valeria; Perrone, Filomena; Marasco, Rosangela; Sacco, Margherita; Muscariello, Lidia

    2016-04-01

    Exopolysaccharides (EPS) from lactic acid bacteria contribute to specific rheology and texture of fermented milk products and find applications also in non-dairy foods and in therapeutics. Recently, four clusters of genes (cps) associated with surface polysaccharide production have been identified in Lactobacillus plantarum WCFS1, a probiotic and food-associated lactobacillus. These clusters are involved in cell surface architecture and probably in release and/or exposure of immunomodulating bacterial molecules. Here we show a transcriptional analysis of these clusters. Indeed, RT-PCR experiments revealed that the cps loci are organized in five operons. Moreover, by reverse transcription-qPCR analysis performed on L. plantarum WCFS1 (wild type) and WCFS1-2 (ΔccpA), we demonstrated that expression of three cps clusters is under the control of the global regulator CcpA. These results, together with the identification of putative CcpA target sequences (catabolite responsive element CRE) in the regulatory region of four out of five transcriptional units, strongly suggest for the first time a role of the master regulator CcpA in EPS gene transcription among lactobacilli.

  10. The Quantitative Analysis of Chennai Automotive Industry Cluster

    Science.gov (United States)

    Bhaskaran, Ethirajan

    2016-07-01

    Chennai is also called the Detroit of India owing to the presence of an automotive industry producing over 40% of India's vehicles and components. During 2001-2002, the Automotive Component Industries (ACI) in the Ambattur, Thirumalizai and Thirumudivakkam Industrial Estates, Chennai, faced problems with infrastructure, technology, procurement, production and marketing. The objective is to study the Quantitative Performance of Chennai Automotive Industry Cluster before (2001-2002) and after the CDA (2008-2009). The methodology adopted is collection of primary data from 100 ACI using a quantitative questionnaire and analysis using Correlation Analysis (CA), Regression Analysis (RA), Friedman Test (FMT), and Kruskal-Wallis Test (KWT). The CA computed for the different sets of variables reveals that there is a high degree of relationship between the variables studied. The RA models constructed establish the strong relationship between the dependent variable and a host of independent variables. The models proposed here reveal the approximate relationship in a closer form. The KWT shows that there is no significant difference between the three location clusters with respect to Net Profit, Production Cost, Marketing Costs, Procurement Costs and Gross Output. This supports that each location has contributed uniformly to the development of the automobile component cluster. The FMT shows that there is no significant difference between industrial units in respect of costs such as Production, Infrastructure, Technology, Marketing and Net Profit. To conclude, the Automotive Industries have fully utilized the Physical Infrastructure and Centralised Facilities by adopting CDA and are now exporting their products to North America, South America, Europe, Australia, Africa and Asia. The value chain analysis models have been implemented in all the cluster units. This Cluster Development Approach (CDA) model can be implemented in industries of under developed and developing countries for cost reduction and productivity

  11. SED analysis of class I and class II FU Orionis stars

    CERN Document Server

    Gramajo, Luciana V; Gómez, Mercedes

    2014-01-01

    FU Orionis stars (FUORS) are eruptive pre-main sequence objects thought to represent quasi-periodic or recurring stages of enhanced accretion during the low-mass star-forming process. We characterize the sample of known and candidate FUORS in a homogeneous and consistent way, deriving stellar and circumstellar parameters for each object. We emphasize the analysis in those parameters that are supposed to vary during the FUORS stage. We modeled the SEDs of 24 of the 26 currently known FUORS, using the radiative transfer code of Whitney et al. (2003b). We compare our models with those obtained by Robitaille et al. (2007) for Taurus class II and I sources in quiescence periods, by calculating the cumulative distribution of the different parameters. FUORS have more massive disks: we find that $\sim80\%$ of the disks in FUORS are more massive than any Taurus class II and I sources in the sample. Median values for the disk mass accretion rates are ~ 1.e-7 Msun/yr vs ~ 1.e-5 Msun/yr for standard YSOs (young stellar o...

  12. THE ANALYSIS OF ILLUSTRATIONS IN THE FOURTH CLASS GEOGRAPHY TEXTBOOKS

    Directory of Open Access Journals (Sweden)

    IOANA CHIRCEV

    2014-01-01

    This study focuses on the analysis of the illustrations found in five different Geography textbooks in Romania. The analysis is based on several criteria: number, size, clarity, pedagogical usefulness. The following conclusions have been drawn: the illustrations are numerous; most of the illustrations are too small and unclear to be efficiently used in the teaching activity; the purpose of some materials is purely illustrative; some illustrations are overloaded with details, which prevents children from understanding them. Authors and publishing houses are advised to choose the illustrations in the fourth class Geography textbooks more carefully.

  13. Subtyping demoralization in the medically ill by cluster analysis

    Directory of Open Access Journals (Sweden)

    Chiara Rafanelli

    2013-03-01

    Background and Objectives: There is increasing interest in the issue of demoralization, particularly in the setting of medical disease. The aim of this investigation was to use both DSM-IV comorbidity and the Diagnostic Criteria for Psychosomatic Research (DCPR) in order to characterize demoralization in the medically ill. Methods: 1700 patients were recruited from 8 medical centers in the Italian Health System and 1560 agreed to participate. They all underwent a cross-sectional assessment with DSM-IV and DCPR structured interviews. 373 patients (23.9%) received a diagnosis of demoralization. Data were submitted to cluster analysis. Results: Four clusters were identified: demoralization and comorbid depression; demoralization and comorbid somatoform/adjustment disorders; demoralization and comorbid anxiety; demoralization without any comorbid DSM disorder. The first cluster included 27.6% of the total sample and was characterized by the presence of DSM-IV mood disorders (mainly major depressive disorder). The second cluster had 18.2% of the cases and contained both DSM-IV somatoform (particularly, undifferentiated somatoform disorder and hypochondriasis) and adjustment disorders. In the third cluster (24.7%), DSM-IV anxiety disorders in comorbidity with demoralization were predominant (particularly, generalized anxiety disorder, agoraphobia, panic disorder and obsessive-compulsive disorder). The fourth cluster had 29.5% of the patients and was characterized by the absence of any DSM-IV comorbid disorder. Conclusions: The findings indicate the need to expand clinical assessment in the medically ill to include the various manifestations of demoralization as encompassed by the DCPR. Subtyping demoralization may yield improved targets for psychosomatic research and treatment trials.

  14. Bayesian Analysis of Multiple Populations in Galactic Globular Clusters

    Science.gov (United States)

    Wagner-Kaiser, Rachel A.; Sarajedini, Ata; von Hippel, Ted; Stenning, David; Piotto, Giampaolo; Milone, Antonino; van Dyk, David A.; Robinson, Elliot; Stein, Nathan

    2016-01-01

    We use GO 13297 Cycle 21 Hubble Space Telescope (HST) observations and archival GO 10775 Cycle 14 HST ACS Treasury observations of Galactic Globular Clusters to find and characterize multiple stellar populations. Determining how globular clusters are able to create and retain enriched material to produce several generations of stars is key to understanding how these objects formed and how they have affected the structural, kinematic, and chemical evolution of the Milky Way. We employ a sophisticated Bayesian technique with an adaptive MCMC algorithm to simultaneously fit the age, distance, absorption, and metallicity for each cluster. At the same time, we also fit unique helium values to two distinct populations of the cluster and determine the relative proportions of those populations. Our unique numerical approach allows objective and precise analysis of these complicated clusters, providing posterior distribution functions for each parameter of interest. We use these results to gain a better understanding of multiple populations in these clusters and their role in the history of the Milky Way. Support for this work was provided by NASA through grant numbers HST-GO-10775 and HST-GO-13297 from the Space Telescope Science Institute, which is operated by AURA, Inc., under NASA contract NAS5-26555. This material is based upon work supported by the National Aeronautics and Space Administration under Grant NNX11AF34G issued through the Office of Space Science. This project was supported by the National Aeronautics & Space Administration through the University of Central Florida's NASA Florida Space Grant Consortium.

  15. Stellar variability in open clusters. I. A new class of variable stars in NGC 3766

    CERN Document Server

    Mowlavi, N; Saesen, S; Eyer, L

    2013-01-01

    Aims. We analyze the population of periodic variable stars in the open cluster NGC 3766 based on a 7-year multi-band monitoring campaign conducted on the 1.2 m Swiss Euler telescope at La Silla, Chile. Methods. The data reduction, light curve cleaning and period search procedures, combined with the long observation time line, allow us to detect variability amplitudes down to the milli-magnitude level. The variability properties are complemented with the positions in the color-magnitude and color-color diagrams to classify periodic variable stars into distinct variability types. Results. We find a large population (36 stars) of new variable stars between the red edge of slowly pulsating B (SPB) stars and the blue edge of delta Sct stars, a region in the Hertzsprung-Russell (HR) diagram where no pulsation is predicted to occur based on standard stellar models. The bulk of their periods ranges from 0.1 to 0.7 d, with amplitudes between 1 and 4 mmag for the majority of them. About 20% of stars in that region of t...

  16. Applications of Cluster Analysis to the Creation of Perfectionism Profiles: A Comparison of two Clustering Approaches

    Directory of Open Access Journals (Sweden)

    Jocelyn H Bolin

    2014-04-01

    Although traditional clustering methods (e.g., K-means) have been shown to be useful in the social sciences, it is often difficult for such methods to handle situations where clusters in the population overlap or are ambiguous. Fuzzy clustering, a method already recognized in many disciplines, provides a more flexible alternative to these traditional clustering methods. Fuzzy clustering differs from other traditional clustering methods in that it allows for a case to belong to multiple clusters simultaneously. Unfortunately, fuzzy clustering techniques remain relatively unused in the social and behavioral sciences. The purpose of this paper is to introduce fuzzy clustering to these audiences, who are currently relatively unfamiliar with the technique. In order to demonstrate the advantages associated with this method, cluster solutions of a common perfectionism measure were created using both fuzzy clustering and K-means clustering, and the results compared. Results of these analyses reveal that different cluster solutions are found by the two methods, and the similarity between the different clustering solutions depends on the amount of cluster overlap allowed for in fuzzy clustering.
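
    The contrast between hard and fuzzy assignment described in this abstract can be made concrete with the standard fuzzy c-means membership formula, $u_k = 1 / \sum_j (d_k / d_j)^{2/(m-1)}$. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, scores are one-dimensional, cluster centers are assumed to be fixed and distinct, and the fuzzifier `m` stands in for "the amount of cluster overlap allowed".

```python
def hard_assignment(x, centers):
    """K-means style assignment: index of the single nearest center."""
    return min(range(len(centers)), key=lambda k: abs(x - centers[k]))


def fuzzy_memberships(x, centers, m=2.0):
    """Fuzzy c-means memberships of score x in each cluster (m > 1).

    Memberships are non-negative and sum to 1; larger m gives softer,
    more overlapping memberships.
    """
    dists = [abs(x - c) for c in centers]
    if 0.0 in dists:
        # x sits exactly on a center: full membership there (centers assumed distinct).
        return [1.0 if d == 0.0 else 0.0 for d in dists]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_k / d_j) ** power for d_j in dists) for d_k in dists]


centers = [0.0, 10.0]                      # two hypothetical profile centers
label = hard_assignment(2.0, centers)      # nearest center only
crisp = fuzzy_memberships(2.0, centers, m=2.0)
soft = fuzzy_memberships(2.0, centers, m=3.0)
middle = fuzzy_memberships(5.0, centers)
```

    A score of 2.0 is simply labelled with the first cluster by the hard rule, whereas the fuzzy memberships keep a small share in the second cluster, and that share grows as `m` increases. A score midway between the centers belongs equally to both, a case the hard rule cannot express.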

  17. An Interpretation of the Boshier-Collins Cluster Analysis Testing Houle's Typology.

    Science.gov (United States)

    Furst, Edward J.

    1986-01-01

    This article speculates on an underlying order obscured by the details of the Boshier-Collins cluster analysis and the mapping of Houle's types onto it. A table illustrates an interpretation of cluster analysis on Boshier's Education Participation Scale. (CT)

  18. Segment clustering methodology for unsupervised Holter recordings analysis

    Science.gov (United States)

    Rodríguez-Sotelo, Jose Luis; Peluffo-Ordoñez, Diego; Castellanos Dominguez, German

    2015-01-01

    Cardiac arrhythmia analysis on Holter recordings is an important issue in clinical settings; however, it implicitly involves other problems related to the large amount of unlabelled data, which means a high computational cost. In this work an unsupervised methodology based on a segment framework is presented, which consists of dividing the raw data into a balanced number of segments in order to identify fiducial points and to characterize and cluster the heartbeats in each segment separately. The resulting clusters are merged or split according to an assumed criterion of homogeneity. This framework compensates for the high computational cost of Holter analysis, making its implementation possible for further real-time applications. The performance of the method is measured over the records from the MIT/BIH arrhythmia database and achieves high values of sensitivity and specificity, taking advantage of database labels, for a broad range of heartbeat types recommended by the AAMI.

  19. Data Preprocessing in Cluster Analysis of Gene Expression

    Institute of Scientific and Technical Information of China (English)

    杨春梅; 万柏坤; 高晓峰

    2003-01-01

    Considering that the DNA microarray technology has generated explosive gene expression data and that it is urgent to analyse and to visualize such massive datasets with efficient methods, we investigate the data preprocessing methods used in cluster analysis, normalization or logarithm of the matrix, by using hierarchical clustering, principal component analysis (PCA) and self-organizing maps (SOMs). The results illustrate that when using the Euclidean distance as measuring metrics, logarithm of relative expression level is the best preprocessing method, while data preprocessed by normalization cannot attain the expected results because the data structure is ruined. If there are only a few principal components, the PCA is an effective method to extract the frame structure, while SOMs are more suitable for a specific structure.
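
    One commonly cited reason why the logarithm of relative expression works well with Euclidean distance is that it treats up- and down-regulation symmetrically. The sketch below illustrates only that property with made-up ratios; it does not reproduce the authors' experiments, the function names are hypothetical, and the normalization scheme they compared against is not specified in the abstract and so is not modelled.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def log2_transform(profile):
    """Base-2 logarithm of each relative expression level (all values > 0)."""
    return [math.log2(v) for v in profile]


# Made-up relative expression ratios across three conditions.
baseline = [1.0, 1.0, 1.0]   # unchanged gene
up = [2.0, 4.0, 2.0]         # 2x, 4x, 2x induced
down = [0.5, 0.25, 0.5]      # 2x, 4x, 2x repressed (mirror image of `up`)

raw_up = euclidean(baseline, up)
raw_down = euclidean(baseline, down)
log_up = euclidean(log2_transform(baseline), log2_transform(up))
log_down = euclidean(log2_transform(baseline), log2_transform(down))
```

    On the raw ratios the induced gene looks much further from the baseline than the equally strongly repressed gene (`raw_up` is larger than `raw_down`), while after the log transform the two distances coincide.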

  20. Functional Principal Component Analysis and Randomized Sparse Clustering Algorithm for Medical Image Analysis.

    Directory of Open Access Journals (Sweden)

    Nan Lin

    Due to the advancement in sensor technology, the growing large medical image data have the ability to visualize the anatomical changes in biological tissues. As a consequence, the medical images have the potential to enhance the diagnosis of disease, the prediction of clinical outcomes and the characterization of disease progression. But in the meantime, the growing data dimensions pose great methodological and computational challenges for the representation and selection of features in image cluster analysis. To address these challenges, we first extend the functional principal component analysis (FPCA) from one dimension to two dimensions to fully capture the space variation of the image signals. The image signals contain a large number of redundant features which provide no additional information for clustering analysis. The widely used methods for removing the irrelevant features are sparse clustering algorithms using a lasso-type penalty to select the features. However, the accuracy of clustering using a lasso-type penalty depends on the selection of the penalty parameters and the threshold value. In practice, they are difficult to determine. Recently, randomized algorithms have received a great deal of attention in big data analysis. This paper presents a randomized algorithm for accurate feature selection in image clustering analysis. The proposed method is applied to both the liver and kidney cancer histology image data from the TCGA database. The results demonstrate that the randomized feature selection method coupled with functional principal component analysis substantially outperforms the current sparse clustering algorithms in image cluster analysis.

  1. Coupled Two-Way Clustering Analysis of Gene Microarray Data

    CERN Document Server

    Getz, G; Domany, E

    2000-01-01

    We present a novel coupled two-way clustering approach to gene microarray data analysis. The main idea is to identify subsets of the genes and samples, such that when one of these is used to cluster the other, stable and significant partitions emerge. The search for such subsets is a computationally complex task: we present an algorithm, based on iterative clustering, which performs such a search. This analysis is especially suitable for gene microarray data, where the contributions of a variety of biological mechanisms to the gene expression levels are entangled in a large body of experimental data. The method was applied to two gene microarray data sets, on colon cancer and leukemia. By identifying relevant subsets of the data and focusing on them we were able to discover partitions and correlations that were masked and hidden when the full dataset was used in the analysis. Some of these partitions have clear biological interpretation; others can serve to identify possible directions for future research.

  2. Coupled two-way clustering analysis of gene microarray data

    Science.gov (United States)

    Getz, Gad; Levine, Erel; Domany, Eytan

    2000-10-01

    We present a coupled two-way clustering approach to gene microarray data analysis. The main idea is to identify subsets of the genes and samples, such that when one of these is used to cluster the other, stable and significant partitions emerge. The search for such subsets is a computationally complex task. We present an algorithm, based on iterative clustering, that performs such a search. This analysis is especially suitable for gene microarray data, where the contributions of a variety of biological mechanisms to the gene expression levels are entangled in a large body of experimental data. The method was applied to two gene microarray data sets, on colon cancer and leukemia. By identifying relevant subsets of the data and focusing on them we were able to discover partitions and correlations that were masked and hidden when the full dataset was used in the analysis. Some of these partitions have clear biological interpretation; others can serve to identify possible directions for future research.

  3. Evolution and comparative analysis of the MHC Class III inflammatory region

    OpenAIRE

    Speed Terence P; Sims Sarah; Palmer Sophie; Coggill Penny; Cross Joseph GR; Belov Katherine; Papenfuss Anthony T; Deakin Janine E; Beck Stephan; Graves Jennifer

    2006-01-01

    Background: The Major Histocompatibility Complex (MHC) is essential for immune function. Historically, it has been subdivided into three regions (Class I, II, and III), but a cluster of functionally related genes within the Class III region has also been referred to as the Class IV region or "inflammatory region". This group of genes is involved in the inflammatory response, and includes members of the tumour necrosis family. Here we report the sequencing, annotation and comparative a...

  4. Performance assessment of air quality monitoring networks using principal component analysis and cluster analysis

    Energy Technology Data Exchange (ETDEWEB)

    Lu, Wei-Zhen [Department of Building and Construction, City University of Hong Kong (China)]; He, Hong-Di [Department of Building and Construction, City University of Hong Kong (China); Logistics Research Center, Shanghai Maritime University, Shanghai (China)]; Dong, Li-yun [Shanghai Institute of Applied Mathematics and Mechanics, Shanghai University, Shanghai (China)]

    2011-03-15

    This study evaluates two statistical methods, principal component analysis and cluster analysis, for managing the air quality monitoring network of Hong Kong and reducing the associated expenses. The specific objectives are (i) to identify city areas with similar air pollution behavior and (ii) to locate emission sources. The methods were applied to the mass concentrations of sulphur dioxide (SO2), respirable suspended particulates (RSP) and nitrogen dioxide (NO2) collected in the Hong Kong monitoring network from January 2001 to December 2007. The results show that, for each pollutant, the monitoring stations fall into different classes according to their air pollution behavior. Stations located close to one another share the same air pollution characteristics, which suggests that the monitoring system could be managed more effectively: redundant equipment should be transferred to other monitoring stations to enlarge the monitored area. The different air pollution behaviors within the network are explained by the variability of wind directions across the region. The results imply that the air quality problem in Hong Kong is not only a local problem, arising mainly from street-level pollution, but also a regional problem originating in the Pearl River Delta region.

  5. Assessment of repeatability of composition of perfumed waters by high-performance liquid chromatography combined with numerical data analysis based on cluster analysis (HPLC UV/VIS - CA).

    Science.gov (United States)

    Ruzik, L; Obarski, N; Papierz, A; Mojski, M

    2015-06-01

    High-performance liquid chromatography (HPLC) with UV/VIS spectrophotometric detection combined with the chemometric method of cluster analysis (CA) was used for the assessment of repeatability of composition of nine types of perfumed waters. In addition, the chromatographic method of separating components of the perfume waters under analysis was subjected to an optimization procedure. The chromatograms thus obtained were used as sources of data for the chemometric method of cluster analysis (CA). The result was a classification of a set comprising 39 perfumed water samples with a similar composition at a specified level of probability (level of agglomeration). A comparison of the classification with the manufacturer's declarations reveals a good degree of consistency and demonstrates similarity between samples in different classes. A combination of the chromatographic method with cluster analysis (HPLC UV/VIS - CA) makes it possible to quickly assess the repeatability of composition of perfumed waters at selected levels of probability.

  6. Diagnostics of subtropical plants functional state by cluster analysis

    Directory of Open Access Journals (Sweden)

    Oksana Belous

    2016-05-01

    Full Text Available The article presents an example of applying statistical data analysis to diagnose the adaptive capacity of subtropical plant varieties. We describe the selection indicators and basic physiological parameters that were defined as diagnostic. The evaluation used a set of water-regime parameters: the water deficit of the leaves, the fractional composition of water, and the concentration of cell sap (CCS) (for tea flushes). These parameters are highly labile and highly responsive to many abiotic factors, which called for particular care in selecting plant material for analysis and in considering the impact on sustainability. From the experimental data we calculated the coefficients of pair correlation between climatic factors and the physiological indicators used. The result was a selection of physiological and biochemical indicators proposed for assessing adaptability and included in methodical recommendations on diagnosing the functional state of the studied crops. Analysing complex studies that involve a large number of indicators is difficult; in particular, it does not allow one to quickly identify the similarity of new varieties in their adaptive responses to adverse factors and, therefore, to set general requirements for cultivation conditions. Cluster analysis assumes that only quantitative data are analysed, that a set of variables for assessing the varieties is defined (the larger the sample, the more accurate the clustering), and that a measure of similarity (or difference) between objects is established. It is shown that the choice of diagnostic features subjected to statistical processing affects the accuracy of the classification of varieties. Selection as a result of the mono-cluster analysis (tea variety Kolhida; hazelnut variety Lombardsky red; kiwi variety Monty

  7. Using cluster analysis in measuring social domain of territorial brand

    Directory of Open Access Journals (Sweden)

    Zlata Stepanova

    2009-10-01

    Full Text Available A territorial brand has a social dimension that is reflected in social equilibrium and can be measured with social effectiveness indicators. The paper analyses the social effectiveness of a territory, taking “territorial and social systems (TSS)” as the object of investigation and classifying them into social types by cluster analysis. This method allows the authors to distinguish four social types of TSS in the Sverdlovsk region according to financial activity, quality of life, social stability and ill-being levels. The results could be useful for the brand policy of territorial authorities.

  8. Monitoring Customer Satisfaction in Service Industry: A Cluster Analysis Approach

    Directory of Open Access Journals (Sweden)

    Matúš Horváth

    2012-10-01

    Full Text Available One of the key performance indicators of an organization's quality management system is customer satisfaction. Monitoring customer satisfaction is therefore an important part of the measuring processes of the quality management system. This paper deals with new ways to analyse and monitor customer satisfaction using data on how customers use the organisation's services and on customer leaving rates. Cluster analysis is used to segment the customers, with the aim of increasing the accuracy of the results and of the decisions based on them. The application example was created as part of a bachelor thesis.

  9. Monitoring Customer Satisfaction in Service Industry: A Cluster Analysis Approach

    Directory of Open Access Journals (Sweden)

    Matúš Horváth

    2012-11-01

    Full Text Available One of the key performance indicators of an organization's quality management system is customer satisfaction. Monitoring customer satisfaction is therefore an important part of the measuring processes of the quality management system. This paper deals with new ways to analyse and monitor customer satisfaction using data on how customers use the organisation's services and on customer leaving rates. Cluster analysis is used to segment the customers, with the aim of increasing the accuracy of the results and of the decisions based on them. The application example was created as part of a bachelor thesis.

  10. Weighing the Giants I: Weak Lensing Masses for 51 Massive Galaxy Clusters - Project Overview, Data Analysis Methods, and Cluster Images

    CERN Document Server

    von der Linden, Anja; Applegate, Douglas E; Kelly, Patrick L; Allen, Steven W; Ebeling, Harald; Burchat, Patricia R; Burke, David L; Donovan, David; Morris, R Glenn; Blandford, Roger; Erben, Thomas; Mantz, Adam

    2012-01-01

    This is the first in a series of papers in which we measure accurate weak-lensing masses for 51 of the most X-ray luminous galaxy clusters known at redshifts 0.15 < z < ... cluster experiments. The primary aim is to improve the absolute mass calibration of cluster observables, currently the dominant systematic uncertainty for cluster count experiments. Key elements of this work are the rigorous quantification of systematic uncertainties, high-quality data reduction and photometric calibration, and the "blind" nature of the analysis to avoid confirmation bias. Our target clusters are drawn from RASS X-ray catalogs, and provide a versatile calibration sample for many aspects of cluster cosmology. We have acquired wide-field, high-quality imaging using the Subaru and CFHT telescopes for all 51 clusters, in at least three bands per cluster. For a subset of 27 clusters, we have data in at least five bands, allowing accurate photo-z estimates of...

  11. Analysis and Prediction of Crimes by Clustering and Classification

    Directory of Open Access Journals (Sweden)

    Rasoul Kiani

    2015-08-01

    Full Text Available Crimes that occur frequently in a society affect its organizations and institutions. It is therefore necessary to study the reasons, factors and relations behind the occurrence of different crimes and to find the most appropriate ways to control and prevent them. The main objective of this paper is to classify clustered crimes based on their occurrence frequency in different years. Data mining is used extensively for the analysis, investigation and discovery of patterns in the occurrence of different crimes. We applied a theoretical model based on data mining techniques such as clustering and classification to a real crime dataset recorded by police in England and Wales from 1990 to 2011. We assigned weights to the features in order to improve the quality of the model and remove low-value features. A Genetic Algorithm (GA) is used to optimize the parameters of the Outlier Detection operator in the RapidMiner tool.

  12. Cluster Analysis and Fuzzy Query in Ship Maintenance and Design

    Science.gov (United States)

    Che, Jianhua; He, Qinming; Zhao, Yinggang; Qian, Feng; Chen, Qi

    Cluster analysis and fuzzy query are widely applied in modern intelligent information processing. In view of the features of ship maintenance data, a variant of the hypergraph-based clustering algorithm, the Correlation Coefficient-based Minimal Spanning Tree (CC-MST), is proposed to analyze the bulky data originating in the ship maintenance process, discover unknown rules and help ship maintainers decide on the causes of various device faults. At the same time, revising or renewing an existing design of a ship or device may be necessary to eliminate those device faults. To offer ship designers some valuable hints, a fuzzy query mechanism is designed to retrieve useful information from large-scale, complicated and redundant ship technical and testing data. Finally, two experiments based on a real ship device fault statistical dataset validate the flexibility and efficiency of the CC-MST algorithm. A fuzzy query prototype demonstrates the usability of our fuzzy query mechanism.
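
    The abstract above does not give the details of CC-MST, so the following is only a generic illustration of clustering with a correlation-based minimal spanning tree, not the authors' algorithm. It assumes a distance of 1 - r (Pearson correlation) between records and forms k clusters by leaving out the k-1 longest tree edges. The names `pearson`, `mst_clusters` and the fault profiles are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mst_clusters(records, k):
    """Label records with k clusters by dropping the k-1 longest MST edges."""
    n = len(records)
    # Assumed distance: 1 - r, so strongly correlated records are close.
    edges = sorted(
        (1.0 - pearson(records[i], records[j]), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Kruskal's algorithm adds tree edges shortest first, so stopping after
    # n - k merges equals building the full tree and cutting its k - 1
    # longest edges.
    merges = 0
    for _, i, j in edges:
        if merges == n - k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            merges += 1
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Hypothetical fault-count profiles for four devices over four periods.
faults = [
    [1, 2, 3, 4],
    [2, 4, 6, 9],
    [4, 3, 2, 1],
    [9, 6, 4, 2],
]
labels = mst_clusters(faults, k=2)
```

    With these made-up profiles the two rising series share one label and the two falling series share the other.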

  13. Analysis of breast cancer progression using principal component analysis and clustering

    Indian Academy of Sciences (India)

    G Alexe; G S Dalgin; S Ganesan; C DeLisi; G Bhanot

    2007-08-01

    We develop a new technique to analyse microarray data which uses a combination of principal components analysis and consensus ensemble k-clustering to find robust clusters and gene markers in the data. We apply our method to a public microarray breast cancer dataset which has expression levels of genes in normal samples as well as in three pathological stages of disease; namely, atypical ductal hyperplasia or ADH, ductal carcinoma in situ or DCIS and invasive ductal carcinoma or IDC. Our method averages over clustering techniques and data perturbation to find stable, robust clusters and gene markers. We identify the clusters and their pathways with distinct subtypes of breast cancer (Luminal, Basal and Her2+). We confirm that the cancer phenotype develops early (in early hyperplasia or ADH stage) and find from our analysis that each subtype progresses from ADH to DCIS to IDC along its own specific pathway, as if each was a distinct disease.

  14. Evaluation and Comparison of Intermaxillary Tooth Size Discrepancy among Class I, Class II Division 1, and Class III Subjects Using Bolton’s Analysis: An in vitro Study

    OpenAIRE

    Prasanna, A Lakshmi; Venkatramana, V; Aryasri, A Srikanth; Katta, Anil Kumar; K. Santhanakrishnan; Maheshwari, Uma

    2015-01-01

    Aim: The aim of the present study was to evaluate and compare intermaxillary tooth size discrepancy among Class I, Class II division 1, and Class III subjects using Bolton’s analysis. Materials and Methods: The pre-treatment casts were selected from the records of patients attending the Department of Orthodontics of Meenakshi Ammal Dental College, Chennai. The sample consists of 180 pre-treatment casts with both sexes evenly distributed with 60 casts in each type of malocclusion, i.e....
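
    For orientation, Bolton's analysis compares the summed mesiodistal tooth widths of the two arches: the overall ratio uses 12 teeth per arch (first molar to first molar) and the anterior ratio uses the six canine-to-canine teeth, with Bolton's published means at roughly 91.3% and 77.2%. The sketch below only illustrates that arithmetic; the function name `bolton_ratios` and the tooth widths are hypothetical and are not taken from the study.

```python
def bolton_ratios(maxillary, mandibular):
    """Return the (overall, anterior) Bolton ratios in percent.

    Each argument lists 12 mesiodistal widths in mm, ordered along the
    arch from the right first molar to the left first molar.
    """
    if len(maxillary) != 12 or len(mandibular) != 12:
        raise ValueError("expected 12 tooth widths per arch")
    overall = 100.0 * sum(mandibular) / sum(maxillary)
    # Canine-to-canine segment: the six middle teeth of each arch.
    anterior = 100.0 * sum(mandibular[3:9]) / sum(maxillary[3:9])
    return overall, anterior


# Made-up cast measurements (mm), for illustration only.
upper = [10.0, 7.0, 7.0, 8.0, 7.0, 9.0, 9.0, 7.0, 8.0, 7.0, 7.0, 10.0]
lower = [11.0, 7.0, 7.0, 7.0, 6.0, 5.5, 5.5, 6.0, 7.0, 7.0, 7.0, 11.0]
overall, anterior = bolton_ratios(upper, lower)
```

    A ratio above the reference mean points to relative mandibular tooth-size excess and a ratio below it to relative maxillary excess.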

  15. [The hierarchical clustering analysis of hyperspectral image based on probabilistic latent semantic analysis].

    Science.gov (United States)

    Yi, Wen-Bin; Shen, Li; Qi, Yin-Feng; Tang, Hong

    2011-09-01

    The paper introduces Probabilistic Latent Semantic Analysis (PLSA) to image clustering and proposes an effective clustering algorithm for hyperspectral images that uses the semantic information from PLSA. Firstly, the ISODATA algorithm is used to obtain an initial clustering of the hyperspectral image, and the clusters of this initial result are treated as the visual words of PLSA. Secondly, an object-oriented image segmentation algorithm partitions the hyperspectral image, and segments with relatively pure pixels are regarded as documents in PLSA. Thirdly, a variety of identification methods that can estimate the best number of cluster centers are combined to obtain the number of latent semantic topics. Then the conditional distributions of visual words in topics and the mixtures of topics in different documents are estimated with PLSA. Finally, the conditional probabilities of the latent semantic topics are distinguished using a statistical pattern recognition method, the topic type of each visual word in each document is assigned, and the clustering result of the hyperspectral image is obtained. Experimental results show that the clusters produced by the proposed algorithm are better than those of K-MEANS and ISODATA in terms of object-oriented property, and that the clustering result is closer to the real spatial distribution of the surface.

  16. Integrated Data Collection Analysis (IDCA) Program - PETN Class 4 Standard

    Energy Technology Data Exchange (ETDEWEB)

    Sandstrom, Mary M. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Brown, Geoffrey W. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Preston, Daniel N. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Pollard, Colin J. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Warner, Kirstin F. [Naval Surface Warfare Center (NSWC), Indian Head, MD (United States). Indian Head Division; Sorensen, Daniel N. [Naval Surface Warfare Center (NSWC), Indian Head, MD (United States). Indian Head Division; Remmers, Daniel L. [Naval Surface Warfare Center (NSWC), Indian Head, MD (United States). Indian Head Division; Shelley, Timothy J. [Air Force Research Lab. (AFRL), Tyndall AFB, FL (United States); Reyes, Jose A. [Applied Research Associates, Tyndall AFB, FL (United States); Phillips, Jason J. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Hsu, Peter C. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Reynolds, John G. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2012-08-01

    The Integrated Data Collection Analysis (IDCA) program is conducting a proficiency study for Small-Scale Safety and Thermal (SSST) testing of homemade explosives (HMEs). Described here are the results for impact, friction, electrostatic discharge, and differential scanning calorimetry analysis of PETN Class 4. The PETN was found to have: 1) an impact sensitivity (DH50) range of 6 to 12 cm, 2) a BAM friction sensitivity (F50) range of 7 to 11 kg and a TIL (0/10) of 3.7 to 7.2 kg, 3) an ABL friction sensitivity threshold of 5 psig or less at 8 fps, 4) an ABL ESD sensitivity threshold of 0.031 to 0.326 J/g, and 5) a thermal sensitivity characterized by an endothermic feature with Tmin = ~141 °C and an exothermic feature with Tmax = ~205 °C.

  17. Emotional Psychological and Related Problems among Truant Youths: An Exploratory Latent Class Analysis

    Science.gov (United States)

    Dembo, Richard; Briones-Robinson, Rhissa; Ungaro, Rocio Aracelis; Gulledge, Laura M.; Karas, Lora M.; Winters, Ken C.; Belenko, Steven; Greenbaum, Paul E.

    2012-01-01

    Intervention Project. Results identified two classes of youths: Class 1(n=9) - youths with low levels of delinquency, mental health and substance abuse issues; and Class 2(n=37) - youths with high levels of these problems. Comparison of these two classes on their urine analysis test results and parent/guardian reports of traumatic events found…

  18. Cluster analysis application identifies muscle characteristics of importance for beef tenderness

    Directory of Open Access Journals (Sweden)

    Chriki Sghaier

    2012-12-01

    Full Text Available Background: Meat researchers disagree about the relationship between beef tenderness and muscle characteristics, including biochemical traits. The aim of this study is to explain variability in meat tenderness using muscle characteristics and biochemical traits available in the Integrated and Functional Biology of Beef (BIF-Beef) database. The BIF-Beef data warehouse contains characteristic measurements from animal, muscle, carcass, and meat quality derived from numerous experiments. We created three classes for tenderness (high, medium, and low) based on trained taste panel tenderness scores of all meat samples consumed (4,366 observations from 40 different experiments). For each tenderness class, the corresponding means for the mechanical characteristics, muscle fibre type, collagen content, and biochemical traits which may influence tenderness of the muscles were calculated. Results: Our results indicated that lower shear force values were associated with more tender meat. In addition, muscles in the highest tenderness cluster had the lowest total and insoluble collagen contents, the highest mitochondrial enzyme activity (isocitrate dehydrogenase), the highest proportion of slow oxidative muscle fibres, the lowest proportion of fast-glycolytic muscle fibres, and the lowest average muscle fibre cross-sectional area. Results were confirmed by correlation analyses, and differences between muscle types in terms of biochemical characteristics and tenderness score were evidenced by Principal Component Analysis (PCA). When the cluster analysis was repeated using only muscle samples from m. Longissimus thoracis (LT), the results were similar, differing from the previous results only in that fibre-type composition remained relatively constant across all three tenderness classes. Conclusion: Our results show that increased meat tenderness is related to lower shear forces, lower insoluble collagen and total collagen content, lower

  19. Cyber Profiling Using Log Analysis And K-Means Clustering

    Directory of Open Access Journals (Sweden)

    Muhammad Zulfadhilah

    2016-07-01

    Full Text Available The activities of Internet users are increasing from year to year and have had an impact on the behavior of the users themselves. Assessment of user behavior is often based only on interaction across the Internet, without knowledge of any other activities. Activity logs offer another way to study user behavior. Internet activity logs are a type of big data, so data mining with the K-Means technique can be used to analyze user behavior. In this study, clustering with the K-Means algorithm divided the data into three clusters: high, medium, and low. The results from a higher education institution show that each of these clusters contains frequently visited websites in the following order: search engines, social media, news, and information. The study also shows that the cyber profiling was strongly influenced by environmental factors and daily activities.
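
    As a rough illustration of the three-cluster K-Means step described above, the sketch below runs Lloyd's algorithm on one-dimensional visit counts and names the resulting groups low, medium and high. The feature (visits per website), the counts, the deterministic initialisation and the name `kmeans_1d` are all assumptions made for the example; the abstract does not state how the log data were encoded.

```python
def kmeans_1d(values, k=3, iterations=100):
    """Lloyd's algorithm on scalar data; returns (centroids, labels)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids over the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


# Hypothetical visit counts per website category taken from an access log.
visits = {"search": 950, "social": 870, "news": 410, "mail": 380, "forum": 40, "blog": 25}
centroids, labels = kmeans_1d(list(visits.values()), k=3)
# Rank the centroid indices so that the tiers read low < medium < high.
rank = {c: r for r, c in enumerate(sorted(range(3), key=lambda c: centroids[c]))}
tiers = {site: ("low", "medium", "high")[rank[lab]] for site, lab in zip(visits, labels)}
```

    Real log data would normally need several features per user or site and a check that three clusters is a reasonable choice.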

  20. Clustered Numerical Data Analysis Using Markov Lie Monoid Based Networks

    Science.gov (United States)

    Johnson, Joseph

    2016-03-01

    We have designed and built an optimal numerical standardization algorithm that links numerical values with their associated units, error level, and defining metadata, thus supporting automated data exchange and new levels of artificial intelligence (AI). The software manages all dimensional and error analysis and computational tracing. Tables of entities versus properties of these generalized numbers (called "metanumbers") support a transformation of each table into a network among the entities and another network among their properties, where the network connection matrix is based upon a proximity metric between the two items. We previously proved that every network is isomorphic to the Lie algebra that generates continuous Markov transformations. We have also shown that the eigenvectors of these Markov matrices provide an agnostic clustering of the underlying patterns. We will present this methodology and show how our new work on conversion of scientific numerical data through this process can reveal underlying information clusters ordered by the eigenvalues. We will also show how the linking of clusters from different tables can be used to form a "supernet" of all numerical information supporting new initiatives in AI.

  1. Dynamical analysis of galaxy cluster merger Abell 2146

    CERN Document Server

    White, J A; King, L J; Lee, B E; Russell, H R; Baum, S A; Clowe, D I; Coleman, J E; Donahue, M; Edge, A C; Fabian, A C; Johnstone, R M; McNamara, B R; ODea, C P; Sanders, J S

    2015-01-01

    We present a dynamical analysis of the merging galaxy cluster system Abell 2146 using spectroscopy obtained with the Gemini Multi-Object Spectrograph on the Gemini North telescope. As revealed by the Chandra X-ray Observatory, the system is undergoing a major merger and has a gas structure indicative of a recent first core passage. The system presents two large shock fronts, making it unique amongst these rare systems. The hot gas structure indicates that the merger axis must be close to the plane of the sky and that the two merging clusters are relatively close in mass, from the observation of two shock fronts. Using 63 spectroscopically determined cluster members, we apply various statistical tests to establish the presence of two distinct massive structures. With the caveat that the system has recently undergone a major merger, the virial mass estimate is M_vir = 8.5 (+4.3/-4.7) x 10^14 M_sol for the whole system, consistent with the mass determination in a previous study using the Sunyaev-Zeldovich signal....

  2. Covariance analysis of differential drag-based satellite cluster flight

    Science.gov (United States)

    Ben-Yaacov, Ohad; Ivantsov, Anatoly; Gurfil, Pini

    2016-06-01

    One possibility for satellite cluster flight is to control relative distances using differential drag. The idea is to increase or decrease the drag acceleration on each satellite by changing its attitude, and use the resulting small differential acceleration as a controller. The most significant advantage of the differential drag concept is that it enables cluster flight without consuming fuel. However, any drag-based control algorithm must cope with significant aerodynamical and mechanical uncertainties. The goal of the current paper is to develop a method for examination of the differential drag-based cluster flight performance in the presence of noise and uncertainties. In particular, the differential drag control law is examined under measurement noise, drag uncertainties, and initial condition-related uncertainties. The method used for uncertainty quantification is the Linear Covariance Analysis, which enables us to propagate the augmented state and filter covariance without propagating the state itself. Validation using a Monte-Carlo simulation is provided. The results show that all uncertainties have relatively small effect on the inter-satellite distance, even in the long term, which validates the robustness of the used differential drag controller.
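
    The core idea of linear covariance analysis, propagating the covariance without propagating the state, can be sketched with the standard discrete-time recursion P <- F P F^T + Q. The example below is a toy under stated assumptions: a two-element state (separation and separation rate), a constant-velocity transition matrix, and a made-up process-noise matrix standing in for drag uncertainty. It is not the augmented state-and-filter formulation of the paper, and all names are hypothetical.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def propagate_covariance(p, f, q, steps):
    """Apply P <- F P F^T + Q for the given number of steps."""
    for _ in range(steps):
        fpft = matmul(matmul(f, p), transpose(f))
        p = [[fpft[i][j] + q[i][j] for j in range(len(q))] for i in range(len(q))]
    return p


dt = 1.0
# Toy relative-motion model: state = [separation, separation rate].
F = [[1.0, dt], [0.0, 1.0]]
# Hypothetical process noise standing in for differential-drag uncertainty.
Q = [[0.0, 0.0], [0.0, 0.01]]
# Initial uncertainty: unit variance in separation, none in rate.
P0 = [[1.0, 0.0], [0.0, 0.0]]
P = propagate_covariance(P0, F, Q, steps=2)
```

    A Monte-Carlo run of the same toy model should give sample covariances that approach `P`, which mirrors the kind of validation the abstract mentions.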

  3. Cluster analysis of activity-time series in motor learning

    DEFF Research Database (Denmark)

    Balslev, Daniela; Nielsen, Finn Å; Frutiger, Sally A;

    2002-01-01

    Neuroimaging studies of learning focus on brain areas where the activity changes as a function of time. To circumvent the difficult problem of model selection, we used a data-driven analytic tool, cluster analysis, which extracts representative temporal and spatial patterns from the voxel...... practice-related activity in a fronto-parieto-cerebellar network, in agreement with previous studies of motor learning. These voxels were separated from a group of voxels showing an unspecific time-effect and another group of voxels, whose activation was an artifact from smoothing...

  4. Cluster analysis of activity-time series in motor learning

    DEFF Research Database (Denmark)

    Balslev, Daniela; Nielsen, Finn Årup; Frutiger, Sally A.;

    2002-01-01

    Neuroimaging studies of learning focus on brain areas where the activity changes as a function of time. To circumvent the difficult problem of model selection, we used a data-driven analytic tool, cluster analysis, which extracts representative temporal and spatial patterns from the voxel...... practice-related activity in a fronto-parieto-cerebellar network, in agreement with previous studies of motor learning. These voxels were separated from a group of voxels showing an unspecific time-effect and another group of voxels, whose activation was an artifact from smoothing. Hum. Brain Mapping 15...

  5. Multi-class texture analysis in colorectal cancer histology

    Science.gov (United States)

    Kather, Jakob Nikolas; Weis, Cleo-Aron; Bianconi, Francesco; Melchers, Susanne M.; Schad, Lothar R.; Gaiser, Timo; Marx, Alexander; Zöllner, Frank Gerrit

    2016-06-01

    Automatic recognition of different tissue types in histological images is an essential part in the digital pathology toolbox. Texture analysis is commonly used to address this problem; mainly in the context of estimating the tumour/stroma ratio on histological samples. However, although histological images typically contain more than two tissue types, only few studies have addressed the multi-class problem. For colorectal cancer, one of the most prevalent tumour types, there are in fact no published results on multiclass texture separation. In this paper we present a new dataset of 5,000 histological images of human colorectal cancer including eight different types of tissue. We used this set to assess the classification performance of a wide range of texture descriptors and classifiers. As a result, we found an optimal classification strategy that markedly outperformed traditional methods, improving the state of the art for tumour-stroma separation from 96.9% to 98.6% accuracy and setting a new standard for multiclass tissue separation (87.4% accuracy for eight classes). We make our dataset of histological images publicly available under a Creative Commons license and encourage other researchers to use it as a benchmark for their studies.

  6. A climatology of surface ozone in the extra tropics: cluster analysis of observations and model results

    Directory of Open Access Journals (Sweden)

    O. A. Tarasova

    2007-08-01

    Full Text Available Important aspects of the seasonal variations of surface ozone are discussed. The underlying analysis is based on the long-term (1990–2004) ozone records of the Co-operative Programme for Monitoring and Evaluation of the Long-range Transmission of Air Pollutants in Europe (EMEP) and the World Data Center of Greenhouse Gases, which do have a strong Northern Hemisphere bias. Seasonal variations are pronounced at most of the 114 locations for any time of the day. Seasonal-diurnal variability classification using hierarchical agglomeration clustering reveals 5 distinct clusters: clean/rural, semi-polluted non-elevated, semi-polluted semi-elevated, elevated and polar/remote marine types. For the cluster "clean/rural" the seasonal maximum is observed in April, both for night and day. For those sites with a double maximum or a wide spring-summer maximum, the one in spring appears both for day and night, while the one in summer is more pronounced for daytime and hence can be attributed to photochemical processes. For the spring maximum photochemistry is a less plausible explanation as no dependence of the maximum timing is observed. More probably the spring maximum is caused by dynamical/transport processes. Using data from the 3-D atmospheric chemistry general circulation model ECHAM5/MESSy1 covering the period of 1998–2005 a comparison has been performed for the identified clusters. For the model data four distinct classes of variability are detected. The majority of cases are covered by the regimes with a spring seasonal maximum or with a broad spring-summer maximum (with prevailing summer). The regime with winter–early spring maximum is reproduced by the model for southern hemispheric locations. Background and semi-polluted sites appear in the model in the same cluster. The seasonality in this model cluster is characterized by a pronounced spring (May) maximum. For the model cluster that covers partly semi-elevated semi-polluted sites the role of the
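
    Hierarchical agglomerative clustering of seasonal profiles, as named in the abstract above, can be sketched as follows. The abstract does not state the linkage rule or distance measure, so average linkage with Euclidean distance is an assumption here, and the four-value seasonal ozone profiles and the name `agglomerate` are made up for the example.

```python
def agglomerate(profiles, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of member indices."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    # Start with every profile in its own cluster.
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                d = sum(dist(profiles[i], profiles[j])
                        for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        # Merge the closest pair of clusters.
        _, a, b = best
        clusters[a] += clusters.pop(b)
    return clusters


# Hypothetical seasonal mean ozone (winter, spring, summer, autumn) per site.
sites = {
    "A": [30, 45, 40, 28],  # spring maximum
    "B": [31, 46, 41, 29],  # spring maximum
    "C": [20, 35, 50, 25],  # summer maximum
    "D": [21, 36, 52, 26],  # summer maximum
}
groups = agglomerate(list(sites.values()), n_clusters=2)
```

    Here the two spring-maximum sites end up in one group and the two summer-maximum sites in the other; a full seasonal-diurnal analysis would use longer profiles per site.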

  7. Latent Class Analysis of Differential Item Functioning on the Peabody Picture Vocabulary Test-III

    Science.gov (United States)

    Webb, Mi-young Lee; Cohen, Allan S.; Schwanenflugel, Paula J.

    2008-01-01

    This study investigated the use of latent class analysis for the detection of differences in item functioning on the Peabody Picture Vocabulary Test-Third Edition (PPVT-III). A two-class solution for a latent class model appeared to be defined in part by ability because Class 1 was lower in ability than Class 2 on both the PPVT-III and the…

  8. The cosmological analysis of X-ray cluster surveys; III. Bypassing cluster mass measurements

    CERN Document Server

    Pierre, M; Faccioli, L; Clerc, N; Gastaud, R; Koulouridis, E; Pacaud, F

    2016-01-01

    Despite strong theoretical arguments, the use of clusters as cosmological probes is, in practice, frequently questioned because of the many uncertainties impinging on cluster mass estimates. Our aim is to develop a fully self-consistent cosmological approach of X-ray cluster surveys, exclusively based on observable quantities, rather than masses. This procedure is justified given the possibility to directly derive the cluster properties via ab initio modelling, either analytically or by using hydrodynamical simulations. In this third paper, we evaluate the method on cluster toy-catalogues. We model the population of detected clusters in the count-rate -- hardness-ratio -- angular size -- redshift space and compare the corresponding 4-dimensional diagram with theoretical predictions. The best cosmology+physics parameter configuration is determined using a simple minimisation procedure; errors on the parameters are derived by scanning the likelihood hyper-surfaces with a wide range of starting values. The metho...

  9. Convergence Analysis of a Class of Computational Intelligence Approaches

    Directory of Open Access Journals (Sweden)

    Junfeng Chen

    2013-01-01

    Full Text Available Computational intelligence is a relatively new interdisciplinary field of research with many promising application areas. Although computational intelligence approaches have gained huge popularity, their convergence is difficult to analyze. In this paper, a computational model is built for a class of computational intelligence approaches represented by the canonical forms of genetic algorithms, ant colony optimization, and particle swarm optimization in order to describe the common features of these algorithms. Then, two quantification indices, the variation rate and the progress rate, are defined to indicate, respectively, the variety and the optimality of the solution sets generated in the search process of the model. Moreover, we give four types of probabilistic convergence for the solution set updating sequences, and their relations are discussed. Finally, sufficient conditions are derived for the almost sure weak convergence and the almost sure strong convergence of the model by introducing martingale theory into the Markov chain analysis.

  10. Schooling, Masculinity and Class Analysis: Towards an Aesthetic of Subjectivities

    Science.gov (United States)

    Mac an Ghaill, Mairtin; Haywood, Chris

    2011-01-01

    The retreat from social class within the sociology of education has been accompanied by the intensification of socio-economic and cultural inequalities. This paper seeks to draw upon cultural analyses of social class by addressing a classificatory shift of white English working-class males, who have moved from an ascribed primary "socio-economic"…

  11. Time series clustering analysis of health-promoting behavior

    Science.gov (United States)

    Yang, Chi-Ta; Hung, Yu-Shiang; Deng, Guang-Feng

    2013-10-01

    Health promotion must be emphasized to achieve the World Health Organization goal of health for all. Since the global population is aging rapidly, the ComCare elder health-promoting service was developed by the Taiwan Institute for Information Industry in 2011. Based on the Pender health promotion model, the ComCare service offers five categories of health-promoting functions to address the everyday needs of seniors: nutrition management, social support, exercise management, health responsibility, and stress management. To assess the overall ComCare service and to improve understanding of the health-promoting behavior of elders, this study analyzed health-promoting behavioral data automatically collected by the ComCare monitoring system. In the 30638 session records collected for 249 elders from January, 2012 to March, 2013, behavior patterns were identified by a fuzzy c-means time series clustering algorithm combined with autocorrelation-based representation schemes. The analysis showed that time series data for elder health-promoting behavior can be classified into four different clusters. Each type reveals different health-promoting needs, frequencies, function numbers and behaviors. The data analysis result can assist policymakers, health-care providers, and experts in medicine, public health, nursing and psychology and has been provided to the Taiwan National Health Insurance Administration to assess elder health-promoting behavior.
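
    The combination named in this abstract, fuzzy c-means applied to an autocorrelation-based representation of each time series, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the number of lags, the fuzzifier m = 2 and the toy series are all hypothetical.

```python
def autocorrelation_features(series, max_lag):
    """Represent a series by its sample autocorrelations at lags 1..max_lag."""
    n = len(series)
    mu = sum(series) / n
    var = sum((x - mu) ** 2 for x in series)
    if var == 0:
        return [0.0] * max_lag  # constant series: no correlation structure
    return [
        sum((series[t] - mu) * (series[t + lag] - mu) for t in range(n - lag)) / var
        for lag in range(1, max_lag + 1)
    ]


def _memberships(point, centers, exponent):
    """Fuzzy membership of one point in every cluster (rows sum to 1)."""
    d2 = [sum((x - c) ** 2 for x, c in zip(point, center)) for center in centers]
    zeros = [j for j, d in enumerate(d2) if d == 0.0]
    if zeros:  # the point coincides with one or more centers
        return [1.0 / len(zeros) if j in zeros else 0.0 for j in range(len(centers))]
    inv = [(1.0 / d) ** exponent for d in d2]
    total = sum(inv)
    return [v / total for v in inv]


def fuzzy_c_means(points, centers, m=2.0, iters=100):
    """Standard fuzzy c-means; m > 1 is the fuzzifier (assumed value here)."""
    centers = [list(c) for c in centers]
    dim = len(points[0])
    exponent = 1.0 / (m - 1.0)
    for _ in range(iters):
        u = [_memberships(p, centers, exponent) for p in points]
        for j in range(len(centers)):
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            if total > 0:
                centers[j] = [
                    sum(w * p[d] for w, p in zip(weights, points)) / total
                    for d in range(dim)
                ]
    return centers, [_memberships(p, centers, exponent) for p in points]


# Toy data (hypothetical): two alternating series and two slowly rising series.
toy_series = [
    [1, 0, 1, 0, 1, 0, 1, 0],
    [0, 2, 0, 2, 0, 2, 0, 2],
    [0, 1, 2, 3, 4, 5, 6, 7],
    [0, 1, 2, 3, 4, 5, 6, 8],
]
features = [autocorrelation_features(s, 2) for s in toy_series]
centers, memberships = fuzzy_c_means(features, [features[0], features[2]])
labels = [row.index(max(row)) for row in memberships]
```

    Working on autocorrelations rather than raw values groups series by their temporal pattern, so the two alternating series land together even though their amplitudes differ.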

  12. Reliability analysis of cluster-based ad-hoc networks

    Energy Technology Data Exchange (ETDEWEB)

    Cook, Jason L. [Quality Engineering and System Assurance, Armament Research Development Engineering Center, Picatinny Arsenal, NJ (United States); Ramirez-Marquez, Jose Emmanuel [School of Systems and Enterprises, Stevens Institute of Technology, Castle Point on Hudson, Hoboken, NJ 07030 (United States)], E-mail: Jose.Ramirez-Marquez@stevens.edu

    2008-10-15

    The mobile ad-hoc wireless network (MAWN) is a new and emerging network scheme that is being employed in a variety of applications. The MAWN varies from traditional networks because it is a self-forming and dynamic network. The MAWN is free of infrastructure and, as such, only the mobile nodes comprise the network. Pairs of nodes communicate either directly or through other nodes. To do so, each node acts, in turn, as a source, destination, and relay of messages. The virtue of a MAWN is the flexibility this provides; however, the challenge for reliability analyses is also brought about by this unique feature. The variability and volatility of the MAWN configuration makes typical reliability methods (e.g. reliability block diagram) inappropriate because no single structure or configuration represents all manifestations of a MAWN. For this reason, new methods are being developed to analyze the reliability of this new networking technology. New published methods adapt to this feature by treating the configuration probabilistically or by inclusion of embedded mobility models. This paper joins both methods together and expands upon these works by modifying the problem formulation to address the reliability analysis of a cluster-based MAWN. The cluster-based MAWN is deployed in applications with constraints on networking resources such as bandwidth and energy. This paper presents the problem's formulation, a discussion of applicable reliability metrics for the MAWN, and illustration of a Monte Carlo simulation method through the analysis of several example networks.
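
    As a rough illustration of the Monte Carlo idea mentioned in this abstract, the sketch below estimates two-terminal reliability for a network whose links exist independently with given probabilities. It is a generic simplification, not the paper's cluster-based formulation: the function names, the independent-link assumption and the example topology are hypothetical.

```python
import random


def is_connected(n_nodes, edges, source, target):
    """Depth-first search: can target be reached from source over edges?"""
    adjacency = {i: [] for i in range(n_nodes)}
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    seen = {source}
    stack = [source]
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return False


def estimate_two_terminal_reliability(n_nodes, link_probs, source, target,
                                      trials=10000, seed=0):
    """Fraction of sampled configurations in which source reaches target.

    link_probs maps an undirected link (u, v) to the assumed probability
    that the link exists in a random network configuration.
    """
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        # Sample one network configuration.
        up_links = [edge for edge, p in link_probs.items() if rng.random() < p]
        if is_connected(n_nodes, up_links, source, target):
            successes += 1
    return successes / trials


# Hypothetical example: node 0 reaches node 2 only through relay node 1.
series_links = {(0, 1): 0.9, (1, 2): 0.9}
estimate = estimate_two_terminal_reliability(3, series_links, 0, 2, trials=20000)
```

    For this two-link chain the exact value is 0.9 × 0.9 = 0.81, which gives a quick sanity check on the estimate. A mobility model would replace the fixed link probabilities with ones derived from sampled node positions.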

  13. Variations in students' perceived reasons for, sources of, and forms of in-school discrimination: A latent class analysis.

    Science.gov (United States)

    Byrd, Christy M; Carter Andrews, Dorinda J

    2016-08-01

    Although there exists a healthy body of literature related to discrimination in schools, this research has primarily focused on racial or ethnic discrimination as perceived and experienced by students of color. Few studies examine students' perceptions of discrimination from a variety of sources, such as adults and peers, their descriptions of the discrimination, or the frequency of discrimination in the learning environment. Middle and high school students in a Midwestern school district (N=1468) completed surveys identifying whether they experienced discrimination from seven sources (e.g., peers, teachers, administrators), for seven reasons (e.g., gender, race/ethnicity, religion), and in eight forms (e.g., punished more frequently, called names, excluded from social groups). The sample was 52% White, 15% Black/African American, 14% Multiracial, and 17% Other. Latent class analysis was used to cluster individuals based on reported sources of, reasons for, and forms of discrimination. Four clusters were found, and ANOVAs were used to test for differences between clusters on perceptions of school climate, relationships with teachers, perceptions that the school was a "good school," and engagement. The Low Discrimination cluster experienced the best outcomes, whereas an intersectional cluster experienced the most discrimination and the worst outcomes. The results confirm existing research on the negative effects of discrimination. Additionally, the paper adds to the literature by highlighting the importance of an intersectional approach to examining students' perceptions of in-school discrimination.

  14. IPC two-color analysis of x ray galaxy clusters

    Science.gov (United States)

    White, Raymond E., III

    1990-01-01

    The mass distributions of several clusters of galaxies were determined by using X-ray surface brightness data from the Einstein Observatory Imaging Proportional Counter (IPC). Determining cluster mass distributions is important for constraining the nature of the dark matter which dominates the mass of galaxies, galaxy clusters, and the Universe. Galaxy clusters are permeated with hot gas in hydrostatic equilibrium with the gravitational potentials of the clusters. Cluster mass distributions can be determined from X-ray observations of cluster gas by using the equation of hydrostatic equilibrium and knowledge of the density and temperature structure of the gas. The X-ray surface brightness at some projected distance from the cluster center is the result of the volume X-ray emissivity being integrated along the line of sight through the cluster.
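
    The relations described in this abstract can be written out in their standard textbook form, assuming spherical symmetry and an ideal gas. The symbols are not taken from the record itself: $\rho_{\rm gas}$ is the gas density, $T$ the gas temperature, $\mu m_p$ the mean particle mass, $\epsilon$ the volume emissivity and $b$ the projected radius, and constant factors in the surface brightness are omitted.

```latex
% Hydrostatic equilibrium of the gas in the cluster potential
\frac{dP}{dr} = -\frac{G\,M(<r)\,\rho_{\rm gas}(r)}{r^{2}},
\qquad
P = \frac{\rho_{\rm gas}\,k\,T}{\mu m_p}

% Enclosed mass from the density and temperature profiles
M(<r) = -\frac{k\,T(r)\,r}{G\,\mu m_p}
        \left( \frac{d\ln\rho_{\rm gas}}{d\ln r} + \frac{d\ln T}{d\ln r} \right)

% Surface brightness as the emissivity integrated along the line of sight
S(b) \propto 2 \int_{b}^{\infty} \frac{\epsilon(r)\, r\, dr}{\sqrt{r^{2} - b^{2}}}
```

    The second relation follows from substituting the ideal gas law into the first, which is why both the density and the temperature gradients are needed to recover the mass.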

  15. A new approach for computing a flood vulnerability index using cluster analysis

    Science.gov (United States)

    Fernandez, Paulo; Mourato, Sandra; Moreira, Madalena; Pereira, Luísa

    2016-08-01

    A Flood Vulnerability Index (FloodVI) was developed using Principal Component Analysis (PCA) and a new aggregation method based on Cluster Analysis (CA). PCA simplifies a large number of variables into a few uncorrelated factors representing the social, economic, physical and environmental dimensions of vulnerability. CA groups areas that have the same characteristics in terms of vulnerability into vulnerability classes. The grouping of the areas determines their classification, contrary to other aggregation methods in which the areas' classification determines their grouping. While other aggregation methods distribute the areas into classes in an artificial manner, by imposing a certain probability for an area to belong to a certain class, as determined by the assumption that the aggregation measure used is normally distributed, CA does not constrain the distribution of the areas by the classes. FloodVI was designed at the neighbourhood level and was applied to the Portuguese municipality of Vila Nova de Gaia, where several flood events have taken place in the recent past. The FloodVI sensitivity was assessed using three different aggregation methods: the sum of component scores, the first component score and the weighted sum of component scores. The results highlight the sensitivity of the FloodVI to different aggregation methods. Both the sum of component scores and the weighted sum of component scores have shown similar results. The first component score aggregation method classifies almost all areas as having medium vulnerability, and finally the results obtained using CA show a distinct differentiation of the vulnerability where hot spots can be clearly identified. The information provided by records of previous flood events corroborates the results obtained with CA, because the inundated areas with greater damages are those that are identified as high and very high vulnerability areas by CA. This supports the conclusion that CA provides a reliable FloodVI.

  16. Natural product proteomining, a quantitative proteomics platform, allows rapid discovery of biosynthetic gene clusters for different classes of natural products.

    Science.gov (United States)

    Gubbens, Jacob; Zhu, Hua; Girard, Geneviève; Song, Lijiang; Florea, Bogdan I; Aston, Philip; Ichinose, Koji; Filippov, Dmitri V; Choi, Young H; Overkleeft, Herman S; Challis, Gregory L; van Wezel, Gilles P

    2014-06-19

    Information on gene clusters for natural product biosynthesis is accumulating rapidly because of the current boom of available genome sequencing data. However, linking a natural product to a specific gene cluster remains challenging. Here, we present a widely applicable strategy for the identification of gene clusters for specific natural products, which we name natural product proteomining. The method is based on using fluctuating growth conditions that ensure differential biosynthesis of the bioactivity of interest. Subsequent combination of metabolomics and quantitative proteomics establishes correlations between abundance of natural products and concomitant changes in the protein pool, which allows identification of the relevant biosynthetic gene cluster. We used this approach to elucidate gene clusters for different natural products in Bacillus and Streptomyces, including a novel juglomycin-type antibiotic. Natural product proteomining does not require prior knowledge of the gene cluster or secondary metabolite and therefore represents a general strategy for identification of all types of gene clusters.

  17. Selections of data preprocessing methods and similarity metrics for gene cluster analysis

    Institute of Scientific and Technical Information of China (English)

    YANG Chunmei; WAN Baikun; GAO Xiaofeng

    2006-01-01

    Clustering is one of the major exploratory techniques for gene expression data analysis. Only with suitable similarity metrics and when datasets are properly preprocessed, can results of high quality be obtained in cluster analysis. In this study, gene expression datasets with external evaluation criteria were preprocessed as normalization by line, normalization by column or logarithm transformation by base-2, and were subsequently clustered by hierarchical clustering, k-means clustering and self-organizing maps (SOMs) with Pearson correlation coefficient or Euclidean distance as similarity metric. Finally, the quality of clusters was evaluated by adjusted Rand index. The results illustrate that k-means clustering and SOMs have distinct advantages over hierarchical clustering in gene clustering, and SOMs are a bit better than k-means when randomly initialized. It also shows that hierarchical clustering prefers Pearson correlation coefficient as similarity metric and dataset normalized by line. Meanwhile, k-means clustering and SOMs can produce better clusters with Euclidean distance and logarithm transformed datasets. These results will afford valuable reference to the implementation of gene expression cluster analysis.
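
    The adjusted Rand index used here as the external quality measure can be computed directly from the contingency table of two labelings. The sketch below uses the standard Hubert and Arabie definition; the function name and the toy labels are illustrative only.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between a reference labeling and a clustering."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same items")
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: labelings trivially agree

    cells = Counter(zip(labels_true, labels_pred))  # contingency table
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / total_pairs  # agreement expected by chance
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings are a single cluster
    return (sum_cells - expected) / (maximum - expected)


# Cluster names do not matter, only the grouping does.
same_grouping = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed_grouping = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

    A value of 1 means the clustering reproduces the reference classes exactly, values near 0 mean agreement no better than chance, and negative values mean worse than chance.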

  18. Usage of a Responsible Gambling Tool: A Descriptive Analysis and Latent Class Analysis of User Behavior.

    Science.gov (United States)

    Forsström, David; Hesser, Hugo; Carlbring, Per

    2016-09-01

    Gambling is a common pastime around the world. Most gamblers can engage in gambling activities without negative consequences, but some run the risk of developing an excessive gambling pattern. Excessive gambling has severe negative economic and psychological consequences, which makes the development of responsible gambling strategies vital to protecting individuals from these risks. One such strategy is responsible gambling (RG) tools. These tools track an individual's gambling history and supply personalized feedback and might be one way to decrease excessive gambling behavior. However, research is lacking in this area and little is known about the usage of these tools. The aim of this article is to describe user behavior and to investigate if there are different subclasses of users by conducting a latent class analysis. The user behavior of 9528 online gamblers who voluntarily used an RG tool was analysed. Number of visits to the site, self-tests made, and advice used were the observed variables included in the latent class analysis. Descriptive statistics show that overall the functions of the tool had a high initial usage and a low repeated usage. Latent class analysis yielded five distinct classes of users: self-testers, multi-function users, advice users, site visitors, and non-users. Multinomial regression revealed that classes were associated with different risk levels of excessive gambling. The self-testers and multi-function users used the tool to a higher extent and were found to have a greater risk of excessive gambling than the other classes.

  19. CLUSTER ANALYSIS OF NATURAL DISASTER LOSSES IN POLISH AGRICULTURE

    Directory of Open Access Journals (Sweden)

    Grzegorz STRUPCZEWSKI

    2015-04-01

    Full Text Available Agricultural production risk is of a special nature due to the great number of hazards, the relative weakness of production entities on the market, and high ambiguity, which is greater than in industrial production. Natural disasters occur very frequently and, given the low percentage of insured farmers, cause damage on a scale that forces the state to organise current financial aid (for instance in the form of preferential natural disaster loans). This aid is usually not sufficient. On the other hand, regional diversity of the risk level does not positively affect the development of insurance. From the perspective of insurance companies and policymakers, it becomes highly important to investigate the spatial structure of losses in agriculture caused by natural disasters. The purpose of the research is to classify the 16 Polish voivodeships into clusters in order to show differences between them according to the level of damage to agricultural farms caused by natural disasters. On the basis of the cluster analysis, it was demonstrated that 11 voivodeships form quite a homogeneous group in terms of the size of damage in agriculture (the value of damage to cultivations and the acreage of destroyed cultivations are the two most important factors determining affiliation to the cluster); however, the profile of losses occurring in the other five voivodeships has a very individual course and requires separate handling in the actuarial sense. It was also shown that a high absolute value of losses in agriculture in given voivodeships does not have to mean high vulnerability of agricultural farms in these voivodeships to natural risks.

  20. Principal Component Analysis and Cluster Analysis in Profile of Electrical System

    Science.gov (United States)

    Iswan; Garniwa, I.

    2017-03-01

    This paper presents an approach for profiling an electrical system that combines principal component analysis (PCA) and cluster analysis, based on relevant data on gross regional domestic product and electric power and energy use. The profile is set up to show the condition of the region's electrical system and will be used to inform policy on the spatial development of the electrical system in the future. This paper considers 24 regions in South Sulawesi province as profile center points and uses PCA to assess the regional profile for development. Cluster analysis is used to group these regions into a few clusters according to the new variables produced by PCA. The general planning of the electrical system of South Sulawesi province can provide support for policy making on electrical system development. Future research could add several variables to the existing ones.

  1. MMPI profiles of males accused of severe crimes: a cluster analysis

    NARCIS (Netherlands)

    Spaans, M.; Barendregt, M.; Muller, E.; Beurs, E. de; Nijman, H.L.I.; Rinne, T.

    2009-01-01

    In studies attempting to classify criminal offenders by cluster analysis of Minnesota Multiphasic Personality Inventory-2 (MMPI-2) data, the number of clusters found varied between 10 (the Megargee System) and two (one cluster indicating no psychopathology and one exhibiting serious psychopathology)

  2. Online Cluster Analysis Supporting Real Time Anomaly Detection in Hyperspectral Imagery

    Science.gov (United States)

    2013-06-01

    algorithm is accomplished for this exercise by performing the principal component analysis on the entire image after the removal of the noise and...cluster completely without fully capturing the intended cluster is easily explained by referencing Figure 29. The tree cluster in green is an eccentric

  3. Investigating Faculty Familiarity with Assessment Terminology by Applying Cluster Analysis to Interpret Survey Data

    Science.gov (United States)

    Raker, Jeffrey R.; Holme, Thomas A.

    2014-01-01

    A cluster analysis was conducted with a set of survey data on chemistry faculty familiarity with 13 assessment terms. Cluster groupings suggest a high, middle, and low overall familiarity with the terminology and an independent high and low familiarity with terms related to fundamental statistics. The six resultant clusters were found to be…

  4. Abundance analysis of 5 early-type stars in the young open cluster IC2391

    CERN Document Server

    Stuetz, C; Jehin, E; Ledoux, C; Cabanac, R A; Melo, C; Smoker, J V; Stuetz, Ch.

    2006-01-01

    It is unclear whether chemically peculiar stars of the upper main sequence represent a class completely distinct from normal A-type stars, or whether there exists a continuous transition from the normal to the most peculiar late F- to early B-type stars. A systematic abundance analysis of open cluster early-type stars would help to relate the observed differences of the chemical abundances of the photospheres to other stellar characteristics, without being concerned by possible different original chemical composition. Furthermore, if a continuous transition region from the very peculiar to the so called normal A-F stars exists, it should be possible to detect objects with mild peculiarities. As a first step of a larger project, an abundance analysis of 5 F-A type stars in the young cluster IC2391 was performed using high resolution spectra obtained with the UVES instrument of the ESO VLT. Our targets seem to follow a general abundance pattern: close to solar abundance of the light elements and iron peak eleme...

  5. The Quintuplet Cluster II. Analysis of the WN stars

    CERN Document Server

    Liermann, A; Oskinova, L M; Todt, H; Butler, K; 10.1051/0004-6361/200912612

    2010-01-01

    Based on $K$-band integral-field spectroscopy, we analyze four Wolf-Rayet stars of the nitrogen sequence (WN) found in the inner part of the Quintuplet cluster. All WN stars (WR102d, WR102i, WR102hb, and WR102ea) are of spectral subtype WN9h. One further star, LHO110, is included in the analysis which has been classified as Of/WN? previously but turns out to be most likely a WN9h star as well. The Potsdam Wolf-Rayet (PoWR) models for expanding atmospheres are used to derive the fundamental stellar and wind parameters. The stars turn out to be very luminous, $\log{(L/L_\odot)} > 6.0$, with relatively low stellar temperatures, $T_* \approx$ 25--35\,kK. Their stellar winds contain a significant fraction of hydrogen, up to $X_\mathrm{H} \sim 0.45$ (by mass). We discuss the position of the Galactic center WN stars in the Hertzsprung-Russell diagram and find that they form a distinct group. In this respect, the Quintuplet WN stars are similar to late-type WN stars found in the Arches cluster and elsewhere in the Ga...

  6. Unsupervised analysis of classical biomedical markers: robustness and medical relevance of patient clustering using bioinformatics tools.

    Directory of Open Access Journals (Sweden)

    Michal Markovich Gordon

    Full Text Available MOTIVATION: It has been proposed that clustering clinical markers, such as blood test results, can be used to stratify patients. However, the robustness of clusters formed with this approach to data pre-processing and clustering algorithm choices has not been evaluated, nor has clustering reproducibility. Here, we made use of the NHANES survey to compare clusters generated with various combinations of pre-processing and clustering algorithms, and tested their reproducibility in two separate samples. METHOD: Values of 44 biomarkers and 19 health/life style traits were extracted from the National Health and Nutrition Examination Survey (NHANES). The 1999-2002 survey was used for training, while data from the 2003-2006 survey was tested as a validation set. Twelve combinations of pre-processing and clustering algorithms were applied to the training set. The quality of the resulting clusters was evaluated both by considering their properties and by comparative enrichment analysis. Cluster assignments were projected to the validation set (using an artificial neural network) and enrichment in health/life style traits in the resulting clusters was compared to the clusters generated from the original training set. RESULTS: The clusters obtained with different pre-processing and clustering combinations differed both in terms of cluster quality measures and in terms of reproducibility of enrichment with health/life style properties. Z-score normalization, for example, dramatically improved cluster quality and enrichments, as compared to unprocessed data, regardless of the clustering algorithm used. Clustering diabetes patients revealed a group of patients enriched with retinopathies. This could indicate that routine laboratory tests can be used to detect patients suffering from complications of diabetes, although other explanations for this observation should also be considered. CONCLUSIONS: Clustering according to classical clinical biomarkers is a robust
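
    The Z-score normalization singled out in the results can be sketched as below. This is a minimal per-column version under assumptions: the function name and the toy values are hypothetical, and the use of the population standard deviation is a choice made here, not something stated in the abstract.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Rescale every column (biomarker) to zero mean and unit variance.

    rows is a list of equal-length records, one per patient. Columns with
    zero spread are mapped to 0.0 so they cannot dominate a distance.
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    spreads = [pstdev(col) for col in columns]
    return [
        [(value - m) / s if s > 0 else 0.0
         for value, m, s in zip(row, means, spreads)]
        for row in rows
    ]


# Two hypothetical markers measured on very different scales.
raw = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
scaled = zscore_columns(raw)
```

    After scaling, both columns contribute equally to a Euclidean distance, which is one plausible reason normalization matters for distance-based clustering of markers recorded in different units.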

  7. Maximum-entropy clustering algorithm and its global convergence analysis

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    Constructing a batch of differentiable entropy functions to uniformly approximate an objective function by means of the maximum-entropy principle, a new clustering algorithm, called maximum-entropy clustering algorithm, is proposed based on optimization theory. This algorithm is a soft generalization of the hard C-means algorithm and possesses global convergence. Its relations with other clustering algorithms are discussed.
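
    To make the "soft generalization of hard C-means" concrete, the sketch below uses a commonly cited maximum-entropy formulation in which memberships are a softmax of negative squared distances. It is an assumption that this matches the paper's algorithm: the parameter beta, the function names and the toy data are hypothetical.

```python
import math


def soft_memberships(point, centers, beta):
    """Gibbs (softmax) memberships: u_j proportional to exp(-beta * d_j^2)."""
    d2 = [sum((x - c) ** 2 for x, c in zip(point, center)) for center in centers]
    nearest = min(d2)  # shift for numerical stability; cancels in the ratio
    weights = [math.exp(-beta * (d - nearest)) for d in d2]
    total = sum(weights)
    return [w / total for w in weights]


def max_entropy_cluster(points, centers, beta=1.0, iters=50):
    """Alternate soft assignments and membership-weighted center updates."""
    centers = [tuple(c) for c in centers]
    dim = len(points[0])
    for _ in range(iters):
        u = [soft_memberships(p, centers, beta) for p in points]
        updated = []
        for j, old_center in enumerate(centers):
            weight = sum(row[j] for row in u)
            if weight == 0.0:
                updated.append(old_center)  # no support: keep the old center
                continue
            updated.append(tuple(
                sum(row[j] * p[d] for row, p in zip(u, points)) / weight
                for d in range(dim)
            ))
        centers = updated
    return centers, [soft_memberships(p, centers, beta) for p in points]


# Two well-separated groups on a line (hypothetical data).
data = [(0.0,), (0.2,), (10.0,), (10.2,)]
final_centers, final_u = max_entropy_cluster(data, [(1.0,), (9.0,)], beta=5.0)
```

    As beta grows, each membership vector approaches a one-hot assignment to the nearest center and the update reduces to hard C-means. Small beta spreads membership across clusters.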

  8. Population analysis of open clusters: radii and mass segregation

    CERN Document Server

    Schilbach, E; Piskunov, A E; Röser, S; Scholz, R D

    2006-01-01

    Aims: Based on our well-determined sample of open clusters in the all-sky catalogue ASCC-2.5 we derive new linear sizes of some 600 clusters, and investigate the effect of mass segregation of stars in open clusters. Methods: Using statistical methods, we study the distribution of linear sizes as a function of spatial position and cluster age. We also examine statistically the distribution of stars of different masses within clusters as a function of the cluster age. Results: No significant dependence of the cluster size on location in the Galaxy is detected for younger clusters (< 200 Myr), whereas older clusters inside the solar orbit turned out to be, on average, smaller than outside. Also, small old clusters are preferentially found close to the Galactic plane, whereas larger ones more frequently live farther away from the plane and at larger Galactocentric distances. For clusters with (V - M_V) < 10.5, a clear dependence of the apparent radius on age has been detected: the cluster radii decrease by ...

  9. Optimum Metallic-Bond Scheme: A Quantitative Analysis of Mass Spectra of Sodium Clusters

    Institute of Scientific and Technical Information of China (English)

    苏长荣; 李家明

    2001-01-01

    Based on the results of the optimum metallic-bond scheme for sodium clusters, we present a quantitative analysis of the detailed features of the mass spectra of sodium clusters. We find that, in the generation of sodium clusters with various abundances, the quasi-steady processes through adding or losing a sodium atom dominate. The quasi-steady processes through adding or losing a sodium dimer are also important to understand the detailed features of mass spectra for small clusters.

  10. An Analysis of Social Class Classification Based on Linguistic Variables

    Institute of Scientific and Technical Information of China (English)

    QU Xia-sha

    2016-01-01

    Since language is an influential tool in social interaction, the relationship between speech and social factors such as social class, gender, and even age is worth studying. People employ different linguistic variables to signal their social class, status and identity in social interaction. Linguistic variation thus involves vocabulary, sounds, grammatical constructions, dialects and so on. As a result, the classification of social class draws people’s attention. Linguistic variables in speech interactions indicate the social relationship between people. This paper attempts to illustrate three main linguistic variables which influence the social class, and suggests that further sociolinguistic studies should pay more attention to them.

  11. Maltreatment and Mental Health Outcomes among Ultra-Poor Children in Burkina Faso: A Latent Class Analysis

    Science.gov (United States)

    Ismayilova, Leyla; Gaveras, Eleni; Blum, Austin; Tô-Camier, Alexice; Nanema, Rachel

    2016-01-01

    Objectives: Research about the mental health of children in Francophone West Africa is scarce. This paper examines the relationships between adverse childhood experiences, including exposure to violence and exploitation, and mental health outcomes among children living in ultra-poverty in rural Burkina Faso. Methods: This paper utilizes baseline data collected from 360 children ages 10–15 and 360 of their mothers recruited from twelve impoverished villages in the Nord Region of Burkina, located near the Sahel Desert and affected by extreme food insecurity. We used a Latent Class Analysis to identify underlying patterns of maltreatment. Further, the relationships between latent classes and mental health outcomes were tested using mixed-effects regression models adjusted for clustering within villages. Results: About 15% of the children in the study scored above the clinical cut-off for depression, 17.8% for posttraumatic stress disorder (PTSD), and 6.4% for low self-esteem. The study identified five distinct sub-groups (or classes) of children based on their exposure to adverse childhood experiences. Children with the highest exposure to violence at home, at work and in the community (Abused and Exploited class) and children not attending school and working for other households, often away from their families (External Laborer class), demonstrated highest symptoms of depression and trauma. Despite living in adverse conditions and working to assist families, the study also identified a class of children who were not exposed to any violence at home or at work (Healthy and Non-abused class). Children in this class demonstrated significantly higher self-esteem (b = 0.92, SE = 0.45, pfamily-level poverty and violence in the family. PMID:27764155

  12. Differences in the expressed HLA class I alleles effect the differential clustering of HIV type 1-specific T cell responses in infected Chinese and Caucasians

    Institute of Scientific and Technical Information of China (English)

    Yu,XG; Addo,MM; Perkins,BA; Wej,FL; Rathod,A; Geer,SC; Parta,M; Cohen,D; Stone,DR; Russell,CJ; Tanzi,G; Mei,S; Wureel,AG; Frahm,N; Lichterfeld,M; Heath,L; Mullins,JI; Marincola,F; Goulder,PJR; Brander,C; Allen,T; Cao,YZ; Walker,BD; Altfeld,M

    2005-01-01

    China is a region of the world with a rapidly spreading HIV-1 epidemic. Studies providing insights into HIV-1 pathogenesis in infected Chinese are urgently needed to support the design and testing of an effective HIV-1 vaccine for this population. HIV-1-specific T cell responses were characterized in 32 HIV-1-infected individuals of Chinese origin and compared to 34 infected caucasians using 410 overlapping peptides spanning the entire HIV-1 clade B consensus sequence in an IFN-gamma ELISpot assay. All HIV-1 proteins were targeted with similar frequency in both populations and all study subjects recognized at least one overlapping peptide. HIV-1-specific T cell responses clustered in seven different regions of the HIV-1 genome in the Chinese cohort and in nine different regions in the caucasian cohort. The dominant HLA class I alleles expressed in the two populations differed significantly, and differences in epitope clustering pattern were shown to be influenced by differences in class I alleles that restrict immunodominant epitopes. These studies demonstrate that the clustering of HIV-1-specific T cell responses is influenced by the genetic HLA class I background in the study populations. The design and testing of candidate vaccines to fight the rapidly growing HIV-1 epidemic must therefore take the HLA genetics of the population into account as specific regions of the virus can be expected to be differentially targeted in ethnically diverse populations.

  13. Higgs Pair Production: Choosing Benchmarks With Cluster Analysis

    CERN Document Server

    Dall'Osso, Martino; Gottardo, Carlo A; Oliveira, Alexandra; Tosi, Mia; Goertz, Florian

    2015-01-01

    New physics theories often depend on a large number of free parameters. The precise values of those parameters in some cases drastically affect the resulting phenomenology of fundamental physics processes, while in others finite variations can leave it basically invariant at the level of detail experimentally accessible. When designing a strategy for the analysis of experimental data in the search for a signal predicted by a new physics model, it appears advantageous to categorize the parameter space describing the model according to the corresponding kinematical features of the final state. A multi-dimensional test statistic can be used to gauge the degree of similarity in the kinematics of different models; a clustering algorithm using that metric may then allow the division of the space into homogeneous regions, each of which can be successfully represented by a benchmark point. Searches targeting those benchmark points are then guaranteed to be sensitive to a large area of the parameter space. In this doc...

  14. Archetypal TRMM Radar Profiles Identified Through Cluster Analysis

    Science.gov (United States)

    Boccippio, Dennis J.

    2003-01-01

    It is widely held that identifiable 'convective regimes' exist in nature, although precise definitions of these are elusive. Examples include land/ocean distinctions, break/monsoon behavior, seasonal differences in the Amazon (SON vs DJF), etc. These regimes are often described by differences in the realized local convective spectra, and measured by various metrics of convective intensity, depth, areal coverage and rainfall amount. Objective regime identification may be valuable in several ways: regimes may serve as natural 'branch points' in satellite retrieval algorithms or data assimilation efforts; one example might be objective identification of regions that 'should' share a similar Z-R relationship. Similarly, objectively defined regimes may provide guidance on optimal siting of ground validation efforts. Objectively defined regimes could also serve as natural (rather than arbitrary geographic) domain 'controls' in studies of convective response to environmental forcing. Quantification of convective vertical structure has traditionally involved parametric study of prescribed quantities thought to be important to convective dynamics: maximum radar reflectivity, cloud top height, 30-35 dBZ echo top height, rain rate, etc. Individually, these parameters are somewhat deficient as their interpretation is often nonunique (the same metric value may signify different physics in different storm realizations). Individual metrics also fail to capture the coherence and interrelationships between vertical levels available in full 3-D radar datasets. An alternative approach is discovery of natural partitions of vertical structure in a globally representative dataset, or 'archetypal' reflectivity profiles. In this study, this is accomplished through cluster analysis of a very large sample (O[10^7]) of TRMM-PR reflectivity columns. Once achieved, the rain-conditional and unconditional 'mix' of archetypal profile types in a given location and/or season provides a description

  15. AUTOMATED TEXT CLUSTERING OF NEWSPAPER AND SCIENTIFIC TEXTS IN BRAZILIAN PORTUGUESE: ANALYSIS AND COMPARISON OF METHODS

    Directory of Open Access Journals (Sweden)

    Alexandre Ribeiro Afonso

    2014-10-01

    Full Text Available This article reports the findings of an empirical study of Automated Text Clustering applied to scientific articles and newspaper texts in Brazilian Portuguese; the objective was to find the most effective computational method for clustering the input texts into their original groups. The study covered four experiments, each with four procedures: 1. Corpus Selection (a set of texts is selected for clustering); 2. Word Class Selection (nouns, verbs and adjectives are chosen from each text by using specific algorithms); 3. Filtering Algorithms (a set of terms is selected from the results of the previous stage, a semantic weight is inserted for each term, and an index is generated for each text); 4. Clustering Algorithms (the clustering algorithms Simple K-Means, sIB and EM are applied to the indexes). After those procedures, statistics on clustering correctness and clustering time were collected. The sIB clustering algorithm is the best choice for both the scientific and newspaper corpora, with the caveat that it requires the number of clusters as input before running (for the newspaper corpus, 68.9% correctness in 1 minute; for the scientific corpus, 77.8% correctness in 1 minute). The EM clustering algorithm additionally guesses the number of clusters without user intervention, but its best case is less than 53% correctness. Considering the experiments carried out, the results of human text classification and automated clustering are distant; it was also observed that the clustering correctness results vary according to the number of input texts and their topics.

  16. Multi-class clustering of cancer subtypes through SVM based ensemble of pareto-optimal solutions for gene marker identification.

    Directory of Open Access Journals (Sweden)

    Anirban Mukhopadhyay

    Full Text Available With the advancement of microarray technology, it is now possible to study the expression profiles of thousands of genes across different experimental conditions or tissue samples simultaneously. Microarray cancer datasets, organized in a samples-versus-genes fashion, are being used for classification of tissue samples into benign and malignant or their subtypes. They are also useful for identifying potential gene markers for each cancer subtype, which helps in successful diagnosis of particular cancer types. In this article, we present an unsupervised cancer classification technique based on multiobjective genetic clustering of the tissue samples. In this regard, a real-coded encoding of the cluster centers is used, and cluster compactness and separation are simultaneously optimized. The resultant set of near-Pareto-optimal solutions contains a number of non-dominated solutions. A novel approach is proposed to combine the clustering information possessed by the non-dominated solutions through a Support Vector Machine (SVM) classifier. Final clustering is obtained by consensus among the clusterings yielded by different kernel functions. The performance of the proposed multiobjective clustering method has been compared with that of several other microarray clustering algorithms for three publicly available benchmark cancer datasets. Moreover, statistical significance tests have been conducted to establish the statistical superiority of the proposed clustering method. Furthermore, relevant gene markers have been identified using the clustering result produced by the proposed clustering method and demonstrated visually. Biological relationships among the gene markers are also studied based on gene ontology. The results obtained are found to be promising and can possibly have important impact in the area of unsupervised cancer classification as well as gene marker identification for multiple cancer subtypes.

  17. Application of Multi-SOM clustering approach to macrophage gene expression analysis.

    Science.gov (United States)

    Ghouila, Amel; Yahia, Sadok Ben; Malouche, Dhafer; Jmel, Haifa; Laouini, Dhafer; Guerfali, Fatma Z; Abdelhak, Sonia

    2009-05-01

    The production of increasingly reliable and accessible gene expression data has stimulated the development of computational tools to interpret such data and to organize them efficiently. The clustering techniques are largely recognized as useful exploratory tools for gene expression data analysis. Genes that show similar expression patterns over a wide range of experimental conditions can be clustered together. This relies on the hypothesis that genes that belong to the same cluster are coregulated and involved in related functions. Nevertheless, clustering algorithms still show limits, particularly for the estimation of the number of clusters and the interpretation of hierarchical dendrogram, which may significantly influence the outputs of the analysis process. We propose here a multi level SOM based clustering algorithm named Multi-SOM. Through the use of clustering validity indices, Multi-SOM overcomes the problem of the estimation of clusters number. To test the validity of the proposed clustering algorithm, we first tested it on supervised training data sets. Results were evaluated by computing the number of misclassified samples. We have then used Multi-SOM for the analysis of macrophage gene expression data generated in vitro from the same individual blood infected with 5 different pathogens. This analysis led to the identification of sets of tightly coregulated genes across different pathogens. Gene Ontology tools were then used to estimate the biological significance of the clustering, which showed that the obtained clusters are coherent and biologically significant.

  18. A New Class of Macrocyclic Chiral Selectors for Stereochemical Analysis

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1999-03-11

    This report summarizes the work accomplished in the authors' laboratories over the previous three years. During the funding period they have had 23 monographs published or in press, 1 book chapter, 1 patent issued and have delivered 28 invited seminars or plenary lectures on DOE sponsored research. This report covers the work that has been published (or accepted). The most notable aspect of this work involves the successful development and understanding of a new class of fused macrocyclic compounds as pseudophases and selectors in high performance separations (including high performance liquid chromatography, HPLC; capillary electrophoresis, CE; and thin layer chromatography, TLC). They have considerably extended their chiral biomarker work from amber to crude oil and coal. In the process of doing this they have developed several novel separation approaches. They finished their work on the new GSC-PLOT column, which is now being used by researchers world-wide for the analysis of gases, light hydrocarbons and halocarbons. Finally, they completed basic studies on immobilizing a cyclodextrin/oligosiloxane hybrid on the wall of fused silica, as well as a basic study on the separation behavior of buckminsterfullerene and higher fullerenes.

  19. Characterizing Heterogeneity within Head and Neck Lesions Using Cluster Analysis of Multi-Parametric MRI Data.

    Directory of Open Access Journals (Sweden)

    Marco Borri

    Full Text Available To describe a methodology, based on cluster analysis, to partition multi-parametric functional imaging data into groups (or clusters) of similar functional characteristics, with the aim of characterizing functional heterogeneity within head and neck tumour volumes. To evaluate the performance of the proposed approach on a set of longitudinal MRI data, analysing the evolution of the obtained sub-sets with treatment. The cluster analysis workflow was applied to a combination of dynamic contrast-enhanced and diffusion-weighted imaging MRI data from a cohort of squamous cell carcinoma of the head and neck patients. Cumulative distributions of voxels, containing pre and post-treatment data and including both primary tumours and lymph nodes, were partitioned into k clusters (k = 2, 3 or 4). Principal component analysis and cluster validation were employed to investigate data composition and to independently determine the optimal number of clusters. The evolution of the resulting sub-regions with induction chemotherapy treatment was assessed relative to the number of clusters. The clustering algorithm was able to separate clusters which significantly reduced in voxel number following induction chemotherapy from clusters with a non-significant reduction. Partitioning with the optimal number of clusters (k = 4), determined with cluster validation, produced the best separation between reducing and non-reducing clusters. The proposed methodology was able to identify tumour sub-regions with distinct functional properties, independently separating clusters which were affected differently by treatment. This work demonstrates that unsupervised cluster analysis, with no prior knowledge of the data, can be employed to provide a multi-parametric characterization of functional heterogeneity within tumour volumes.

  20. Classifying life course trajectories : a comparison of latent class and sequence analysis

    NARCIS (Netherlands)

    Barban, Nicola; Billari, Francesco C.

    2012-01-01

    We compare two techniques that are widely used in the analysis of life course trajectories: latent class analysis and sequence analysis. In particular, we focus on the use of these techniques as devices to obtain classes of individual life course trajectories. We first compare the consistency of t

  1. A Latent Class Analysis of Heterosexual Young Men's Masculinities.

    Science.gov (United States)

    Casey, Erin A; Masters, N Tatiana; Beadnell, Blair; Wells, Elizabeth A; Morrison, Diane M; Hoppe, Marilyn J

    2016-07-01

    Parallel bodies of research have described the diverse and complex ways that men understand and construct their masculine identities (often termed "masculinities") and, separately, how adherence to traditional notions of masculinity places men at risk for negative sexual and health outcomes. The goal of this analysis was to bring together these two streams of inquiry. Using data from a national, online sample of 555 heterosexually active young men, we employed latent class analysis (LCA) to detect patterns of masculine identities based on men's endorsement of behavioral and attitudinal indicators of "dominant" masculinity, including sexual attitudes and behaviors. LCA identified four conceptually distinct masculine identity profiles. Two groups, termed the Normative and Normative/Male Activities groups, respectively, constituted 88 % of the sample and were characterized by low levels of adherence to attitudes, sexual scripts, and behaviors consistent with "dominant" masculinity, but differed in their levels of engagement in male-oriented activities (e.g., sports teams). Only eight percent of the sample comprised a masculinity profile consistent with "traditional" ideas about masculinity; this group was labeled Misogynistic because of high levels of sexual assault and violence toward female partners. The remaining four percent constituted a Sex-Focused group, characterized by high numbers of sexual partners, but relatively low endorsement of other indicators of traditional masculinity. Follow-up analyses showed a small number of differences across groups on sexual and substance use health indicators. Findings have implications for sexual and behavioral health interventions and suggest that very few young men embody or endorse rigidly traditional forms of masculinity.

  2. Hair decontamination procedure prior to multi-class pesticide analysis.

    Science.gov (United States)

    Duca, Radu-Corneliu; Hardy, Emilie; Salquèbre, Guillaume; Appenzeller, Brice M R

    2014-06-01

    Although increasing interest is being observed in hair analysis for the biomonitoring of human exposure to pesticides, some limitations still have to be addressed for optimum use of this matrix in that specific context. One main possible issue concerns the need to differentiate chemicals biologically incorporated into hair from those externally deposited on hair surface from contaminated air or dust. The present study focuses on the development of a washing procedure for the decontamination of hair before analysis of pesticides from different chemical classes. For this purpose, three different procedures of artificial contamination (with silica, cellulose, and aqueous solution) were used to simulate pesticides deposition on hair surface. Several washing solvents (four organic: acetone, dichloromethane, methanol, acetonitrile; and four aqueous: water, phosphate buffer, shampoo, sodium dodecylsulfate) were evaluated for their capacity to remove artificially deposited pesticides from hair surface. The most effective washing solvents were sodium dodecylsulfate and methanol for aqueous and organic solvents, respectively. Moreover, after a first washing with sodium dodecylsulfate or methanol, the majority of externally deposited pesticides was removed and a steady-state was reached since significantly lower amounts were removed by additional second and third washings. Finally, the effectiveness of a decontamination procedure comprising washing with sodium dodecylsulfate and methanol was successively demonstrated. In parallel, it was determined that the final procedure did not affect the chemicals biologically incorporated, as hair strands naturally containing pesticides were used. Such a procedure appears to remove in one-shot the fraction of chemicals located on hair surface and does not require repeated washing steps.

  3. Cluster analysis application in research on pork quality determinants

    Science.gov (United States)

    Przybylski, W.; Wasiewicz, P.; Zieliński, P.; Gromadzka-Ostrowska, J.; Olczak, E.; Jaworska, D.; Niemyjski, S.; Santé-Lhoutellier, V.

    2010-09-01

    In this paper data mining methods were applied to investigate features determining high quality pork meat. The aim of the study was to analyse how pork meat quality depends on HDL and LDL cholesterol concentration, plasma leptin, triglycerides, plasma glucose and serum. The research was carried out on 54 pigs originating from the crossbreeding of Naima sows with boars of the P76-PenArLan hybrid line. Meat quality parameters were evaluated in samples derived from the Longissimus (LD) muscle taken behind the last rib on the basis of: the pH value, meat colour, drip loss, the RTN, intramuscular fat and glycolytic potential. The results of this study were elaborated using the R environment and show that cluster and regression analysis can be a useful tool for in-depth analysis of the determinants of the quality of pig meat in homogeneous populations of pigs. However, the question of determinants of the level of glycogen and fat in meat requires further research.

  4. Properties of $\gamma$-Ray Burst Classes

    CERN Document Server

    Hakkila, Jon; Haglin, David J.; Roiger, Richard J.; Mallozzi, Robert S.; Pendleton, Geoffrey N.; Meegan, Charles A.

    2000-01-01

    The three gamma-ray burst (GRB) classes identified by statistical clustering analysis (Mukherjee et al. 1998) are examined using the pattern recognition algorithm C4.5 (Quinlan 1986). Although the statistical existence of Class 3 (intermediate duration, intermediate fluence, soft) is supported, the properties of this class do not need to arise from a distinct source population. Class 3 properties can easily be produced from Class 1 (long, high fluence, intermediate hardness) by a combination of measurement error, hardness/intensity correlation, and a newly-identified BATSE bias (the fluence duration bias). Class 2 (short, low fluence, hard) does not appear to be related to Class 1.

  5. A combined multidimensional scaling and hierarchical clustering view for the exploratory analysis of multidimensional data

    Science.gov (United States)

    Craig, Paul; Roa-Seïler, Néna

    2013-01-01

    This paper describes a novel information visualization technique that combines multidimensional scaling and hierarchical clustering to support the exploratory analysis of multidimensional data. The technique displays the results of multidimensional scaling using a scatter plot where the proximity of any two items' representations is approximate to their similarity according to a Euclidean distance metric. The results of hierarchical clustering are overlaid onto this view by drawing smoothed outlines around each nested cluster. The difference in similarity between successive cluster combinations is used to colour code clusters and make stronger natural clusters more prominent in the display. When a cluster or group of items is selected, multidimensional scaling and hierarchical clustering are re-applied to a filtered subset of the data, and animation is used to smooth the transition between successive filtered views. As a case study we demonstrate the technique being used to analyse survey data relating to the appropriateness of different phrases to different emotionally charged situations.

  6. Classification of persons attempting suicide. A review of cluster analysis research

    Directory of Open Access Journals (Sweden)

    Wołodźko, Tymoteusz

    2014-08-01

    Full Text Available Aim: Review of conclusions from cluster analysis research on suicide risk factors published after the year 1993. Methods: Search and analysis of cluster analysis research papers on suicidal behaviour. Results: The following groups were distinguished: (1) persons with comorbid mental disorders or with severe symptoms, (2) persons without mental disorders or with mild symptoms, (3) persons with personality disorders and externalizing psychopathology, (4) socially withdrawn persons with a tendency to avoid social contacts, (5) depressive persons. Conclusions: Analysis of studies on characteristics of suicide attempters, with the application of cluster analysis, has indicated the possibility of differentiating several groups of persons with significantly increased risk of suicide attempt. The reviewed cluster analysis research had multiple methodological limitations. Studies employing cluster analysis on large, representative and homogeneous populations are needed.

  7. Profiles of exercise motivation, physical activity, exercise habit, and academic performance in Malaysian adolescents: A cluster analysis

    OpenAIRE

    Hairul Anuar Hashim; Freddy Golok; Rosmatunisah Ali

    2011-01-01

    Objectives: This study examined Malaysian adolescents’ profiles of exercise motivation, exercise habit strength, academic performance, and levels of physical activity (PA) using cluster analysis. Methods: The sample (n = 300) consisted of 65.6% males and 34.4% females with a mean age of 13.40 ± 0.49. Statistical analysis was performed using cluster analysis. Results: Cluster analysis revealed three distinct cluster groups. Cluster 1 is characterized by a moderate level of PA, relatively high in...

  8. AVES: A Computer Cluster System approach for INTEGRAL Scientific Analysis

    Science.gov (United States)

    Federici, M.; Martino, B. L.; Natalucci, L.; Umbertini, P.

    The AVES computing system, based on a "Cluster" architecture, is a fully integrated, low cost computing facility dedicated to the archiving and analysis of the INTEGRAL data. AVES is a modular system that uses the software resource manager (SLURM) and allows almost unlimited expandability (65,536 nodes and hundreds of thousands of processors); it currently comprises 30 Personal Computers with Quad-Core CPUs able to reach a computing power of 300 Giga Flops (300×10^9 Floating point Operations Per Second), with 120 GB of RAM and 7.5 terabytes (TB) of storage memory in UFS configuration plus 6 TB for the users' area. AVES was designed and built to solve the growing problems raised by the analysis of the large amount of data accumulated by the INTEGRAL mission (currently about 9 TB), which is set to increase every year. The analysis software used is the OSA package, distributed by the ISDC in Geneva. This is a very complex package consisting of dozens of programs that cannot be converted to parallel computing. To overcome this limitation we developed a series of programs to distribute the analysis workload over the various nodes, making AVES automatically divide the analysis into N jobs sent to N cores. This solution thus produces a result similar to that obtained by a parallel computing configuration. In support of this we have developed tools that allow a flexible use of the scientific software and quality control of on-line data storing. The AVES software package consists of about 50 specific programs. The overall computing time, compared to that of a Personal Computer with a single processor, has thus been improved by up to a factor of 70.
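
    The workload-distribution idea in this abstract (dividing an analysis into N jobs sent to N cores) can be illustrated with a minimal sketch. The abstract does not say how AVES splits the work, so the round-robin rule, the function name `split_into_jobs`, and the segment identifiers below are all assumptions made for illustration; the sketch also assumes the work units are independent of one another and does not involve SLURM or the OSA package.

```python
def split_into_jobs(units, n_jobs):
    """Deal independent work units round-robin into n_jobs lists."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    jobs = [[] for _ in range(n_jobs)]
    for index, unit in enumerate(units):
        # Unit k goes to job k mod n_jobs, so job sizes differ by at most one.
        jobs[index % n_jobs].append(unit)
    return jobs


# Hypothetical identifiers for ten independent pieces of an analysis.
segments = [f"segment_{i:02d}" for i in range(10)]
jobs = split_into_jobs(segments, 4)
for core, job in enumerate(jobs):
    print(core, job)
```

    Each resulting list could then be handed to a separate core as an ordinary serial run, which is why the serial software itself would not need to be rewritten for parallel execution.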

  9. Investigating the effects of climate variations on bacillary dysentery incidence in northeast China using ridge regression and hierarchical cluster analysis

    Directory of Open Access Journals (Sweden)

    Guo Junqiao

    2008-09-01

    Full Text Available Abstract Background The effects of climate variations on bacillary dysentery incidence have attracted growing concern. However, the multi-collinearity among meteorological factors affects the accuracy of their correlation with bacillary dysentery incidence. Methods As a remedy, a modified method combining ridge regression and hierarchical cluster analysis was proposed for investigating the effects of climate variations on bacillary dysentery incidence in northeast China. Results All weather indicators, temperatures, precipitation, evaporation and relative humidity have shown positive correlation with the monthly incidence of bacillary dysentery, while air pressure had a negative correlation with the incidence. Ridge regression and hierarchical cluster analysis showed that during 1987–1996, relative humidity, temperatures and air pressure affected the transmission of bacillary dysentery. During this period, all meteorological factors were divided into three categories. Relative humidity and precipitation belonged to one class, temperature indexes and evaporation belonged to another class, and air pressure was the third class. Conclusion Meteorological factors have affected the transmission of bacillary dysentery in northeast China. Bacillary dysentery prevention and control would benefit from giving more consideration to local climate variations.

  10. Analysis of gene expression data from non-small cell lung carcinoma cell lines reveals distinct sub-classes from those identified at the phenotype level.

    Directory of Open Access Journals (Sweden)

    Andrew R Dalby

    Full Text Available Microarray data from cell lines of Non-Small Cell Lung Carcinoma (NSCLC) can be used to look for differences in gene expression between the cell lines derived from different tumour samples, and to investigate if these differences can be used to cluster the cell lines into distinct groups. Dividing the cell lines into classes can help to improve diagnosis and the development of screens for new drug candidates. The micro-array data is first subjected to quality control analysis and then normalised using three alternate methods to reduce the chances of differences being artefacts resulting from the normalisation process. The final clustering into sub-classes was carried out in a conservative manner such that sub-classes were consistent across all three normalisation methods. If there is structure in the cell line population it was expected that this would agree with histological classifications, but this was not found to be the case. To check the biological consistency of the sub-classes, the set of most strongly differentially expressed genes was identified for each pair of clusters to check if the genes that most strongly define sub-classes have biological functions consistent with NSCLC.

  11. Significance analysis and statistical mechanics: an application to clustering.

    Science.gov (United States)

    Łuksza, Marta; Lässig, Michael; Berg, Johannes

    2010-11-26

    This Letter addresses the statistical significance of structures in random data: given a set of vectors and a measure of mutual similarity, how likely is it that a subset of these vectors forms a cluster with enhanced similarity among its elements? The computation of this cluster p value for randomly distributed vectors is mapped onto a well-defined problem of statistical mechanics. We solve this problem analytically, establishing a connection between the physics of quenched disorder and multiple-testing statistics in clustering and related problems. In an application to gene expression data, we find a remarkable link between the statistical significance of a cluster and the functional relationships between its genes.

  12. Punk and Middle-Class Values: A Content Analysis.

    Science.gov (United States)

    Lamy, Philip; Levin, Jack

    1985-01-01

    Compares periodical articles representing the "Punk" movement with articles from the "Reader's Digest" and the 1960s hippie movement. Concludes that the punk movement is more expressive and less instrumental than its middle-class counterpart. (KH)

  13. Study on Cluster Analysis Used with Laser-Induced Breakdown Spectroscopy

    Science.gov (United States)

    He, Li'ao; Wang, Qianqian; Zhao, Yu; Liu, Li; Peng, Zhong

    2016-06-01

    Supervised learning methods (e.g., PLS-DA, SVM) have been widely used with laser-induced breakdown spectroscopy (LIBS) to classify materials; however, they may yield a low correct classification rate if a test sample type is not included in the training dataset. In this paper, unsupervised cluster analysis methods (hierarchical clustering analysis, K-means clustering analysis, and the iterative self-organizing data analysis technique, ISODATA) are investigated for plastics classification based on the line intensities of LIBS emission. The results of hierarchical clustering analysis using four different similarity measuring methods (single linkage, complete linkage, unweighted pair-group average, and weighted pair-group average) are compared. In K-means clustering analysis, four methods of choosing the initial centers are applied and their results are compared. The classification results of hierarchical clustering analysis, K-means clustering analysis, and ISODATA are analyzed. The experimental results demonstrate that cluster analysis methods can be applied to plastics discrimination with LIBS. Supported by the Beijing Natural Science Foundation of China (No. 4132063).

  14. ASTM clustering for improving coal analysis by near-infrared spectroscopy.

    Science.gov (United States)

    Andrés, J M; Bona, M T

    2006-11-15

    Multivariate analysis techniques have been applied to near-infrared (NIR) spectra of coals to investigate the relationship between nine coal properties (moisture (%), ash (%), volatile matter (%), fixed carbon (%), heating value (kcal/kg), carbon (%), hydrogen (%), nitrogen (%) and sulphur (%)) and the corresponding predictor variables. In this work, a whole set of coal samples was grouped into six more homogeneous clusters following the ASTM reference method for classification prior to the application of calibration methods to each coal set. The results obtained showed a considerable improvement in the determination error compared with the calibration for the whole sample set. For some groups, the established calibrations approached the quality required by the ASTM/ISO norms for laboratory analysis. To predict property values for a new coal sample, that sample must first be assigned to its respective group. Thus, the ability to discriminate and classify coal samples by Diffuse Reflectance Infrared Fourier Transform Spectroscopy (DRIFTS) in the NIR range was also studied by applying Soft Independent Modelling of Class Analogy (SIMCA) and Linear Discriminant Analysis (LDA) techniques. Modelling of the groups by SIMCA led to overlapping models that cannot discriminate for unique classification. On the other hand, the application of Linear Discriminant Analysis improved the classification of the samples, but not enough to be satisfactory for every group considered.

  15. The Norma Cluster (ACO 3627): I. A Dynamical Analysis of the Most Massive Cluster in the Great Attractor

    CERN Document Server

    Woudt, P A; Lucey, J; Fairall, A P; Moore, S A W

    2007-01-01

    A detailed dynamical analysis of the nearby rich Norma cluster (ACO 3627) is presented. From radial velocities of 296 cluster members, we find a mean velocity of 4871 +/- 54 km/s and a velocity dispersion of 925 km/s. The mean velocity of the E/S0 population (4979 +/- 85 km/s) is offset with respect to that of the S/Irr population (4812 +/- 70 km/s) by Δv = 164 km/s in the cluster rest frame. This offset increases towards the core of the cluster. The E/S0 population is free of any detectable substructure and appears relaxed. Its shape is clearly elongated with a position angle that is aligned along the dominant large-scale structures in this region, the so-called Norma wall. The central cD galaxy has a very large peculiar velocity of 561 km/s which is most probably related to an ongoing merger at the core of the cluster. The spiral/irregular galaxies reveal a large amount of substructure; two dynamically distinct subgroups within the overall spiral-population have been identified, located along the Nor...

  16. Multidimensional cluster stability analysis from a Brazilian Bradyrhizobium sp. RFLP/PCR data set

    Science.gov (United States)

    Milagre, S. T.; Maciel, C. D.; Shinoda, A. A.; Hungria, M.; Almeida, J. R. B.

    2009-05-01

    The taxonomy of the N2-fixing bacteria belonging to the genus Bradyrhizobium is still poorly refined, mainly due to conflicting results obtained by the analysis of phenotypic and genotypic properties. This paper presents an application of a method aimed at identifying possible new clusters within a Brazilian collection of 119 Bradyrhizobium strains showing phenotypic characteristics of B. japonicum and B. elkanii. The stability was studied as a function of the number of restriction enzymes used in the RFLP-PCR analysis of three ribosomal regions with three restriction enzymes per region. The method proposed here uses clustering algorithms with distances calculated by average linkage. The stability analysis is carried out by introducing perturbations through sub-sampling techniques. The method proved effective in grouping the species B. japonicum and B. elkanii. Furthermore, two new clusters were clearly defined, indicating possible new species, as well as sub-clusters within each detected cluster.

  17. Cluster Analysis as a Method of Recovering Types of Intraindividual Growth Trajectories: A Monte Carlo Study.

    Science.gov (United States)

    Dumenci, Levent; Windle, Michael

    2001-01-01

    Used Monte Carlo methods to evaluate the adequacy of cluster analysis to recover group membership based on simulated latent growth curve (LGC) models. Cluster analysis failed to recover growth subtypes adequately when the growth curves differed in shape only. Discusses circumstances under which it was more successful. (SLD)

  18. Cluster Analysis of the Luria-Nebraska Neuropsychological Battery with Learning Disabled Adults.

    Science.gov (United States)

    McCue, Michael; And Others

    The study reports a cluster analysis of Luria-Nebraska Neuropsychological Battery scores of 25 learning disabled adults. The cluster analysis suggested the presence of three subgroups within this sample, one having high elevations on the Rhythm, Writing, Reading, and Arithmetic Rhythm scales, the second having an extremely high elevation on the…

  19. Tracking Undergraduate Student Achievement in a First-Year Physiology Course Using a Cluster Analysis Approach

    Science.gov (United States)

    Brown, S. J.; White, S.; Power, N.

    2015-01-01

    A cluster analysis data classification technique was used on assessment scores from 157 undergraduate nursing students who passed 2 successive compulsory courses in human anatomy and physiology. Student scores in five summative assessment tasks, taken in each of the courses, were used as inputs for a cluster analysis procedure. We aimed to group…

  20. Sequential Combination Methods for Data Clustering Analysis

    Institute of Scientific and Technical Information of China (English)

    钱 涛; Ching Y.Suen; 唐远炎

    2002-01-01

    This paper proposes the use of more than one clustering method to improve clustering performance. Clustering is an optimization procedure based on a specific clustering criterion. Clustering combination can be regarded as a technique that constructs and processes multiple clustering criteria. Since the global and local clustering criteria are complementary rather than competitive, combining these two types of clustering criteria may enhance the clustering performance. In our past work, a multi-objective programming based simultaneous clustering combination algorithm has been proposed, which incorporates multiple criteria into an objective function by a weighting method, and solves this problem with constrained nonlinear optimization programming. But this algorithm has high computational complexity. Here a sequential combination approach is investigated, which first uses the global criterion based clustering to produce an initial result, then uses the local criterion based information to improve the initial result with a probabilistic relaxation algorithm or linear additive model. Compared with the simultaneous combination method, sequential combination has low computational complexity. Results on some simulated data and standard test data are reported. It appears that clustering performance improvement can be achieved at low cost through sequential combination.

  1. Multilevel Analysis Methods for Partially Nested Cluster Randomized Trials

    Science.gov (United States)

    Sanders, Elizabeth A.

    2011-01-01

    This paper explores multilevel modeling approaches for 2-group randomized experiments in which a treatment condition involving clusters of individuals is compared to a control condition involving only ungrouped individuals, otherwise known as partially nested cluster randomized designs (PNCRTs). Strategies for comparing groups from a PNCRT in the…

  2. Mesoscopic analysis of networks: applications to exploratory analysis and data clustering

    CERN Document Server

    Granell, Clara; Arenas, Alex

    2011-01-01

    We investigate the adaptation and performance of modularity-based algorithms, designed in the scope of complex networks, to analyze the mesoscopic structure of correlation matrices. Using a multi-resolution analysis we are able to describe the structure of the data in terms of clusters at different topological levels. We demonstrate the applicability of our findings in two different scenarios: to analyze the neural connectivity of the nematode Caenorhabditis elegans, and to automatically classify a typical benchmark of unsupervised clustering, the Iris data set, with considerable success.
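
    For readers unfamiliar with the quantity that modularity-based algorithms optimize, the sketch below computes the standard Newman modularity Q = (1/2m) Σ_ij [A_ij − k_i k_j / 2m] δ(c_i, c_j) of a given partition. It is only an illustration under simplifying assumptions: the graph is undirected with non-negative weights, whereas the paper adapts the idea to correlation matrices; the function name and the toy graph (two triangles joined by one edge) are hypothetical.

```python
def modularity(adj, labels):
    """Newman modularity Q of a partition of an undirected weighted graph.

    adj    -- symmetric matrix (list of lists) of non-negative edge weights
    labels -- labels[i] is the community assigned to node i
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)  # twice the total edge weight
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                # observed weight minus the weight expected at random
                q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m


# Toy graph (assumed for illustration): triangles 0-1-2 and 3-4-5, bridge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for u, v in edges:
    adj[u][v] = adj[v][u] = 1

q_split = modularity(adj, [0, 0, 0, 1, 1, 1])   # one community per triangle
q_single = modularity(adj, [0, 0, 0, 0, 0, 0])  # everything in one community
print(q_split, q_single)
```

    Splitting the toy graph into its two triangles gives Q = 5/14, while lumping all nodes together gives Q = 0, which is why a modularity-maximizing algorithm would prefer the split.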

  3. Putting Bourdieu to work for class analysis: reflections on some recent contributions.

    Science.gov (United States)

    Flemmen, Magne

    2013-06-01

    Recent developments in class analysis, particularly associated with so-called 'cultural class analysis', have seen the works of Pierre Bourdieu take centre stage. Apart from the general influence of 'habitus' and 'cultural capital', some scholars have tried to reconstruct class analysis with concepts drawn from Bourdieu. This involves a theoretical reorientation, away from the conventional concerns of class analysis with property and market relations, towards an emphasis on the multiple forms of capital. Despite the significant potential of these developments, such a reorientation dismisses or neglects the relations of power and domination founded in the economic institutions of capitalism as a crucial element of what class is. Through a critique of some recent attempts by British authors to develop a 'Bourdieusian' class theory, the paper reasserts the centrality of the relations of power and domination that used to be the domain of class analysis. The paper suggests some elements central to a reworked class analysis that benefits from the power of Bourdieu's ideas while retaining a perspective on the fundamentals of class relations in capitalism.

  4. The association between school exclusion, delinquency and subtypes of cyber- and F2F-victimizations: identifying and predicting risk profiles and subtypes using latent class analysis.

    Science.gov (United States)

    Barboza, Gia Elise

    2015-01-01

    The purpose of this paper is to identify risk profiles of youth who are victimized by on- and offline harassment and to explore the consequences of victimization on school outcomes. Latent class analysis is used to explore the overlap and co-occurrence of different clusters of victims and to examine the relationship between class membership and school exclusion and delinquency. Participants were a random sample of youth between the ages of 12 and 18 selected to participate in the 2011 National Crime Victimization Survey: School Supplement. The latent class analysis resulted in four categories of victims: approximately 3.1% of students were highly victimized by both bullying and cyberbullying behaviors; 11.6% of youth were classified as being victims of relational bullying, verbal bullying and cyberbullying; a third class of students were victims of relational bullying, verbal bullying and physical bullying but were not cyberbullied (8%); the fourth and final class, characteristic of the majority of students (77.3%), was comprised of non-victims. The inclusion of covariates in the latent class model indicated that gender, grade and race were significant predictors of at least one of the four victim classes. School delinquency measures were included as distal outcomes to test for both overall and pairwise associations between classes. With one exception, the results were indicative of a significant relationship between school delinquency and the victim subtypes. Implications of these findings are discussed.

  5. MASSCLEAN - MASSive CLuster Evolution and ANalysis Package - Description and Tests

    CERN Document Server

    Popescu, Bogdan

    2008-01-01

    We present MASSCLEAN, a new, sophisticated and robust stellar cluster image and photometry simulation package. This package is able to create color-magnitude diagrams and standard FITS images in any of the traditional optical and near-infrared bands based on cluster characteristics input by the user, including but not limited to distance, age, mass, radius and extinction. At the limit of very distant, unresolved clusters, we have checked the integrated colors created in MASSCLEAN against those from other single stellar population models with consistent results. We have also tested models which provide a reasonable estimate of the field star contamination in images and color-magnitude diagrams. We demonstrate the package by simulating images and color-magnitude diagrams of well known massive Milky Way clusters and compare their appearance to real data. Because the algorithm populates the cluster with a discrete number of tenable stars, it can be used as part of a Monte Carlo Method to derive the probabilistic ...

  6. Boundaries, links and clusters: a new paradigm in spatial analysis?

    Science.gov (United States)

    Jacquez, Geoff M; Kaufmann, Andy; Goovaerts, Pierre

    2008-12-01

    This paper develops and applies new techniques for the simultaneous detection of boundaries and clusters within a probabilistic framework. The new statistic "little b" (written b(ij)) evaluates boundaries between adjacent areas with different values, as well as links between adjacent areas with similar values. Clusters of high values (hotspots) and low values (coldspots) are then constructed by joining areas abutting locations that are significantly high (e.g., an unusually high disease rate) and that are connected through a "link" such that the values in the adjoining areas are not significantly different. Two techniques are proposed and evaluated for accomplishing cluster construction: "big B" and the "ladder" approach. We compare the statistical power and empirical Type I and Type II error of these approaches to those of wombling and the local Moran test. Significance may be evaluated using distribution theory based on the product of two continuous (i.e., non-discrete) variables. We also provide a "distribution free" algorithm based on resampling of the observed values. The methods are applied to simulated data for which the locations of boundaries and clusters are known, and compared and contrasted with clusters found using the local Moran statistic and with polygon Womble boundaries. The little b approach to boundary detection is comparable to polygon wombling in terms of Type I error, Type II error and empirical statistical power. For cluster detection, both the big B and ladder approaches have lower Type I and Type II error and are more powerful than the local Moran statistic. The new methods are not constrained to find clusters of a pre-specified shape, such as circles, ellipses and donuts, and yield a more accurate description of geographic variation than alternative cluster tests that presuppose a specific cluster shape. We recommend these techniques over existing cluster and boundary detection methods that do not provide such a comprehensive description...

  7. Fingerprint analysis of Hibiscus mutabilis L. leaves based on ultra performance liquid chromatography with photodiode array detector combined with similarity analysis and hierarchical clustering analysis methods

    Directory of Open Access Journals (Sweden)

    Xianrui Liang

    2013-01-01

    Background: A method for chemical fingerprint analysis of Hibiscus mutabilis L. leaves was developed based on ultra performance liquid chromatography with photodiode array detector (UPLC-PAD) combined with similarity analysis (SA) and hierarchical clustering analysis (HCA). Materials and Methods: 10 batches of Hibiscus mutabilis L. leaf samples were collected from different regions of China. UPLC-PAD was employed to collect chemical fingerprints of Hibiscus mutabilis L. leaves. Results: The relative standard deviations (RSDs) of the relative retention times (RRT) and relative peak areas (RPA) of 10 characteristic peaks (one of them was identified as rutin) in precision, repeatability and stability tests were less than 3%, and the method of fingerprint analysis was validated to be suitable for Hibiscus mutabilis L. leaves. Conclusions: Similarity analysis, based on calculating the correlation coefficient between each pair of fingerprints, showed an abundant qualitative diversity of chemical constituents in the 10 batches of Hibiscus mutabilis L. leaf samples from different locations. Moreover, the HCA method clustered the samples into four classes, and the HCA dendrogram showed the close or distant relations among the 10 samples, which was consistent with the SA result to some extent.
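
    The similarity analysis described above rests on correlation coefficients between pairs of fingerprints. A minimal sketch of that step is given below, assuming each fingerprint is represented as a vector of relative peak areas and that the Pearson coefficient is the correlation in question (the abstract does not say which coefficient was used). The batch names and numbers are invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def similarity_matrix(fingerprints):
    """Pairwise correlation between every two fingerprints."""
    return [[pearson(a, b) for b in fingerprints] for a in fingerprints]


# Hypothetical relative peak areas of five characteristic peaks per batch.
batches = {
    "batch_1": [1.00, 0.42, 0.15, 0.80, 0.33],
    "batch_2": [1.00, 0.40, 0.18, 0.77, 0.35],
    "batch_3": [1.00, 0.10, 0.65, 0.20, 0.90],
}
sim = similarity_matrix(list(batches.values()))
for name, row in zip(batches, sim):
    print(name, [round(r, 3) for r in row])
```

    A matrix of this kind (or one minus it, as a dissimilarity) is a common input to hierarchical clustering, which is one plausible way the SA and HCA steps could be connected.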

  8. RELIABILITY ANALYSIS OF RING, AGENT AND CLUSTER BASED DISTRIBUTED SYSTEMS

    Directory of Open Access Journals (Sweden)

    R.SEETHALAKSHMI

    2011-08-01

    The introduction of pervasive and mobile devices has led to immense growth in real-time distributed processing. In this context, the reliability of the computing environment is very important. Reliability is the probability that the devices, links, processes, programs and files work efficiently for a specified period of time under specified conditions. Distributed systems are available as conventional ring networks, clusters and agent-based systems, and this paper focuses on the reliability of such systems. These networks are heterogeneous and scalable in nature. Several factors must be considered for reliability estimation. These include application-related factors such as algorithms, data-set sizes, memory usage patterns, input-output, communication patterns, task granularity and load balancing. They also include hardware-related factors such as processor architecture, memory hierarchy, input-output configuration and network. The software-related factors concerning reliability are operating systems, compilers, communication protocols, libraries and preprocessor performance. Performance estimation is an important aspect of estimating the reliability of a system. Reliability analysis is approached using probability.

  9. Fully Automated Operational Modal Analysis using multi-stage clustering

    Science.gov (United States)

    Neu, Eugen; Janser, Frank; Khatibi, Akbar A.; Orifici, Adrian C.

    2017-02-01

    Interest in robust automatic modal parameter extraction techniques has increased significantly in recent years, together with the rising demand for continuous health monitoring of critical infrastructure such as bridges, buildings and wind turbine blades. In this study a novel, multi-stage clustering approach for Automated Operational Modal Analysis (AOMA) is introduced. In contrast to existing approaches, the procedure works without any user-provided thresholds, is applicable within large system order ranges, can be used with very small sensor numbers and does not place any limitations on the damping ratio or the complexity of the system under investigation. The approach works with any parametric system identification algorithm that uses the system order n as its sole parameter. Here a data-driven Stochastic Subspace Identification (SSI) method is used. Measurements from a wind tunnel investigation with a composite cantilever equipped with Fiber Bragg Grating Sensors (FBGSs) and piezoelectric sensors are used to assess the performance of the algorithm with a highly damped structure and under low signal-to-noise ratio conditions. The proposed method was able to identify all physical system modes in the investigated frequency range from over 1000 individual datasets using FBGSs under challenging signal-to-noise ratio conditions, and under better signal conditions but from only two sensors.

  10. A NEEDS ANALYSIS STUDY FOR PREPARATORY CLASS ELT STUDENTS

    OpenAIRE

    Ömer Gökhan Ulum

    2016-01-01

    With this study, the needs of preparatory class university students at an English Language Teaching Department to have a general understanding of their academic needs for the development of their speaking skill were assessed. Based upon a descriptive research design, an adapted questionnaire with open-ended questions was administered to the 2nd, 3rd and 4th class ELT students as well as ELT graduates to define their academic needs in speaking courses. The data were analysed by using SPSS, a S...

  11. Cluster analysis in retail segmentation for credit scoring

    Directory of Open Access Journals (Sweden)

    Sanja Scitovski

    2014-12-01

    The aim of this paper is to segment retail clients by using adaptive Mahalanobis clustering so that each segment is suitable for separate credit scoring development and a better risk assessment of retail clients can be achieved. A real data set on retail clients from a Croatian bank was used in the paper. The data points are grouped by using the adaptive Mahalanobis partitioning algorithm (see, e.g., [20]). It is an incremental algorithm, which recognizes ellipsoidal clusters with the main axes in the directions of eigenvectors of the corresponding covariance matrix of the data set. On the basis of the given data set, by using the well-known DIRECT algorithm for global optimization it is possible to search successively for an optimal partition with k=2, 3,... clusters. After that, a partition with the most appropriate number of clusters is determined by using various validity indexes. Based on the description of each cluster, banks could decide to develop a separate credit scoring model for each cluster as well as to create a business strategy customized to each cluster.
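
    To show why a Mahalanobis-type distance recognizes ellipsoidal clusters, the sketch below evaluates the squared Mahalanobis distance (x − c)^T S^{-1} (x − c) in two dimensions and assigns a point to the nearest cluster. It is not the adaptive Mahalanobis partitioning algorithm of the paper, which also updates the centres and covariance matrices incrementally; the function names, cluster names and parameters are assumptions chosen for illustration.

```python
def mahalanobis_sq(x, center, cov):
    """Squared Mahalanobis distance in 2-D: (x - c)^T cov^{-1} (x - c)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("expected a positive-definite covariance matrix")
    dx, dy = x[0] - center[0], x[1] - center[1]
    # cov^{-1} = 1/det * [[d, -b], [-c, a]]
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


def assign(x, clusters):
    """Name of the cluster whose Mahalanobis distance to x is smallest."""
    return min(clusters, key=lambda name: mahalanobis_sq(x, *clusters[name]))


# Hypothetical clusters as (centre, covariance): A is stretched along the x-axis.
clusters = {
    "A": ((0.0, 0.0), [[4.0, 0.0], [0.0, 1.0]]),
    "B": ((3.0, 0.0), [[1.0, 0.0], [0.0, 1.0]]),
}
point = (1.8, 0.0)
print(assign(point, clusters))
```

    The point lies closer to the centre of B in the Euclidean sense (1.2 versus 1.8), yet it is assigned to A, because A's larger variance along the x-axis shrinks distances in that direction.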

  12. Cluster Computing For Real Time Seismic Array Analysis.

    Science.gov (United States)

    Martini, M.; Giudicepietro, F.

    A seismic array is an instrument composed of a dense distribution of seismic sensors that makes it possible to measure the directional properties of the wavefield (slowness or wavenumber vector) radiated by a seismic source. In recent years arrays have been widely used in different fields of seismological research. In particular, they are applied in the investigation of seismic sources on volcanoes, where they can be successfully used for studying the volcanic microtremor and long-period events, which are critical for obtaining information on the evolution of volcanic systems. For this reason arrays could be usefully employed for volcano monitoring; however, the huge amount of data produced by this type of instrument and the rather time-consuming processing techniques have limited their potential for this application. In order to favor a direct application of array techniques to continuous volcano monitoring, we designed and built a small PC cluster able to compute in near real time the kinematic properties of the wavefield (slowness or wavenumber vector) produced by local seismic sources. The cluster is composed of 8 dual-processor Intel Pentium-III PCs working at 550 MHz, and has 4 gigabytes of RAM. It runs under the Linux operating system. The developed analysis software package is based on the MUltiple SIgnal Classification (MUSIC) algorithm and is written in Fortran. The message-passing part is based upon the LAM programming environment package, an open-source implementation of the Message Passing Interface (MPI). The developed software system includes modules devoted to receiving data over the internet and graphical applications for the continuous display of the processing results. The system has been tested with a data set collected during a seismic experiment conducted on Etna in 1999, when two dense seismic arrays were deployed on the northeast and the southeast flanks of this volcano. A real time continuous acquisition system has been simulated by...

  13. Nonlinear analysis of nano-cluster doped fiber

    Institute of Scientific and Technical Information of China (English)

    LIU Gang; ZHANG Ru

    2007-01-01

    Semiconductor nano-cluster doped fiber is expected to exhibit prominent nonlinear characteristics. The refractive index of the fiber core can be effectively changed by doping, and this technology can provide a new method for developing photonic components. Because the semiconductor nano-cluster has quantum characteristics, we used first-order perturbation theory and the classical theory of fiber to derive expressions for the refractive index of the core of a semiconductor nano-cluster doped fiber. Finally, an equation for the third-order nonlinear coefficient was obtained. Using this equation, we calculated the nonlinear coefficient of SMF-28 fiber. The equation shows that the new third-order coefficient is greater.

  14. DNA splice site sequences clustering method for conservativeness analysis

    Institute of Scientific and Technical Information of China (English)

    Quanwei Zhang; Qinke Peng; Tao Xu

    2009-01-01

    DNA sequences that are near splice sites are remarkably conserved, and many researchers have contributed to the prediction of splice sites. In order to mine the underlying biological knowledge, we analyze the conservativeness of sequences adjacent to DNA splice sites by clustering. Firstly, we propose a DNA splice site sequence clustering method which is based on DBSCAN, and use four kinds of dissimilarity calculating methods. Then, we analyze the conservative features of the clustering results and the experimental data set.
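
    The sketch below shows the general shape of DBSCAN applied to short sequences with a pluggable dissimilarity. It is an illustration only: the abstract does not name its four dissimilarity measures, so the normalized Hamming distance used here is an assumption, as are the parameter values (eps, min_pts) and the made-up 8-mers.

```python
def hamming(a, b):
    """Fraction of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def dbscan(items, eps, min_pts, dissimilarity):
    """Return one label per item: 0, 1, ... for clusters and -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(items)

    def neighbours(i):
        # every item within eps of item i, including i itself
        return [j for j in range(len(items))
                if dissimilarity(items[i], items[j]) <= eps]

    cluster = -1
    for i in range(len(items)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1                   # item i is a core point: new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # noise reachable from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:   # j is also a core point: keep expanding
                queue.extend(nbrs)
    return labels


# Made-up 8-mers: four resemble each other, three form a second group, one is an outlier.
seqs = ["AAGGTAAG", "AAGGTGAG", "CAGGTAAG", "AAGGTAAT",
        "TTTCCCGA", "TTTCCCGG", "TTTCCAGA", "GCATGCAT"]
labels = dbscan(seqs, eps=0.25, min_pts=3, dissimilarity=hamming)
print(labels)
```

    Because the dissimilarity is passed in as a function, other measures can be swapped in without touching the clustering loop, which is one way a comparison of several dissimilarities could be organized.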

  15. Cluster Forests

    CERN Document Server

    Yan, Donghui; Jordan, Michael I

    2011-01-01

    Inspired by Random Forests (RF) in the context of classification, we propose a new clustering ensemble method, Cluster Forests (CF). Geometrically, CF randomly probes a high-dimensional data cloud to obtain "good local clusterings" and then aggregates via spectral clustering to obtain cluster assignments for the whole dataset. The search for good local clusterings is guided by a cluster quality measure κ. CF progressively improves each local clustering in a fashion that resembles the tree growth in RF. Empirical studies on several real-world datasets under two different performance metrics show that CF compares favorably to its competitors. Theoretical analysis shows that the κ criterion grows each local clustering in a desirable way: it is "noise-resistant." A closed-form expression is obtained for the mis-clustering rate of spectral clustering under a perturbation model, which yields new insights into some aspects of spectral clustering.

  16. Cluster Analysis in Patients with GOLD 1 Chronic Obstructive Pulmonary Disease.

    Directory of Open Access Journals (Sweden)

    Philippe Gagnon

    We hypothesized that heterogeneity exists within the Global Initiative for Chronic Obstructive Lung Disease (GOLD) 1 spirometric category and that different subgroups could be identified within this GOLD category. Pre-randomization study participants from two clinical trials were symptomatic/asymptomatic GOLD 1 chronic obstructive pulmonary disease (COPD) patients and healthy controls. A hierarchical cluster analysis used pre-randomization demographics, symptom scores, lung function, peak exercise response and daily physical activity levels to derive population subgroups. Considerable heterogeneity existed for clinical variables among patients with GOLD 1 COPD. All parameters, except forced expiratory volume in 1 second (FEV1)/forced vital capacity (FVC), had considerable overlap between GOLD 1 COPD and controls. Three clusters were identified: cluster I (18 [15%] COPD patients; 105 [85%] controls); cluster II (45 [80%] COPD patients; 11 [20%] controls); and cluster III (22 [92%] COPD patients; 2 [8%] controls). Apart from reduced diffusion capacity and lower baseline dyspnea index versus controls, cluster I COPD patients had otherwise preserved lung volumes, exercise capacity and physical activity levels. Cluster II COPD patients had a higher smoking history and greater hyperinflation versus cluster I COPD patients. Cluster III COPD patients had reduced physical activity versus controls and clusters I and II COPD patients, and lower FEV1/FVC versus clusters I and II COPD patients. The results emphasize heterogeneity within GOLD 1 COPD, supporting an individualized therapeutic approach to patients. www.clinicaltrials.gov: NCT01360788 and NCT01072396.

  17. Quality assessment of cortex cinnamomi by HPLC chemical fingerprint, principle component analysis and cluster analysis.

    Science.gov (United States)

    Yang, Jie; Chen, Li-Hong; Zhang, Qin; Lai, Mao-Xiang; Wang, Qiang

    2007-06-01

    HPLC fingerprint analysis, principal component analysis (PCA), and cluster analysis were introduced for quality assessment of Cortex cinnamomi (CC). The fingerprint of CC was developed and validated by analyzing 30 samples of CC from different species and geographic locations. Seventeen chromatographic peaks were selected as characteristic peaks and their relative peak areas (RPA) were calculated for quantitative expression of the HPLC fingerprints. The correlation coefficients of similarity in chromatograms were higher than 0.95 for the same species but much lower than 0.6 for different species. In addition, two principal components (PCs) were extracted by PCA. PC1 separated Cinnamomum cassia from other species, capturing 56.75% of the variance, while PC2 contributed to their further separation, capturing 19.08% of the variance. The scores of the samples showed that the samples could be clustered reasonably into different groups corresponding to different species and different regions. The score and loading plots together clearly revealed the different chemical properties of each group. The cluster analysis confirmed the results of PCA. Therefore, HPLC fingerprinting in combination with chemometric techniques provides a very flexible and reliable method for quality assessment of traditional Chinese medicines.

  18. Critical clusters in interdependent economic sectors. A data-driven spectral clustering analysis

    Science.gov (United States)

    Oliva, Gabriele; Setola, Roberto; Panzieri, Stefano

    2016-10-01

    In this paper we develop a data-driven hierarchical clustering methodology to group the economic sectors of a country in order to highlight strongly coupled groups that are weakly coupled with other groups. Specifically, we consider an input-output representation of the coupling among the sectors and we interpret the relation among sectors as a directed graph; then we recursively apply the spectral clustering methodology over the graph, without a priori information on the number of groups that have to be obtained. To do this, we resort to the eigengap criterion, where a suitable number of groups is selected automatically based on the intensity and structure of the coupling among the sectors. We validate the proposed methodology with a case study for Italy, inspecting how the coupling among clusters and sectors changed from 1995 to 2011 and showing that over these years the Italian structure underwent deep changes, becoming more and more interdependent, i.e., a large part of the economy has become tightly coupled.
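
    The eigengap criterion mentioned above can be sketched in a few lines. The version below assumes that the eigenvalues of a graph Laplacian have already been computed by some other means and simply picks the number of groups k at the largest gap between consecutive eigenvalues in ascending order. The function name and the example spectrum are hypothetical, and the paper's treatment of directed input-output graphs is not reproduced here.

```python
def eigengap_k(eigenvalues):
    """Number of groups k chosen at the largest gap in the sorted spectrum.

    With eigenvalues l_1 <= l_2 <= ... <= l_n, returns the k that
    maximizes l_(k+1) - l_k.
    """
    vals = sorted(eigenvalues)
    if len(vals) < 2:
        raise ValueError("need at least two eigenvalues")
    gaps = [vals[i + 1] - vals[i] for i in range(len(vals) - 1)]
    best = max(range(len(gaps)), key=gaps.__getitem__)
    return best + 1  # gaps[0] is l_2 - l_1, which corresponds to k = 1


# Hypothetical Laplacian spectrum: three near-zero eigenvalues, then a jump.
spectrum = [0.0, 0.02, 0.05, 0.9, 1.1, 1.3]
print(eigengap_k(spectrum))
```

    Three eigenvalues close to zero followed by a large jump suggest three weakly coupled groups, so the sketch selects k = 3 for this spectrum.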

  19. Global stability analysis on a class of cellular neural networks

    Institute of Scientific and Technical Information of China (English)

    ZHANG; Yi

    2001-01-01


  20. Cluster analysis of spontaneous preterm birth phenotypes identifies potential associations among preterm birth mechanisms

    Science.gov (United States)

    Esplin, M Sean; Manuck, Tracy A.; Varner, Michael W.; Christensen, Bryce; Biggio, Joseph; Bukowski, Radek; Parry, Samuel; Zhang, Heping; Huang, Hao; Andrews, William; Saade, George; Sadovsky, Yoel; Reddy, Uma M.; Ilekis, John

    2015-01-01

    Objective: We sought to employ an innovative tool based on common biological pathways to identify specific phenotypes among women with spontaneous preterm birth (SPTB), in order to enhance investigators' ability to highlight common mechanisms and underlying genetic factors responsible for SPTB. Study Design: A secondary analysis of a prospective case-control multicenter study of SPTB. All cases delivered a preterm singleton at SPTB ≤34.0 weeks gestation. Each woman was assessed for the presence of underlying SPTB etiologies. A hierarchical cluster analysis was used to identify groups of women with homogeneous phenotypic profiles. One of the phenotypic clusters was selected for candidate gene association analysis using VEGAS software. Results: 1028 women with SPTB were assigned phenotypes. Hierarchical clustering of the phenotypes revealed five major clusters. Cluster 1 (N=445) was characterized by maternal stress, cluster 2 (N=294) by premature membrane rupture, cluster 3 (N=120) by familial factors, and cluster 4 (N=63) by maternal comorbidities. Cluster 5 (N=106) was multifactorial, characterized by infection (INF), decidual hemorrhage (DH) and placental dysfunction (PD). These three phenotypes were highly correlated by Chi-square analysis [PD and DH (p<2.2e-6); PD and INF (p=6.2e-10); INF and DH (p=0.0036)]. Gene-based testing identified the INS (insulin) gene as significantly associated with cluster 3 of SPTB. Conclusion: We identified 5 major clusters of SPTB based on a phenotype tool and hierarchical clustering. There was significant correlation between several of the phenotypes. The INS gene was associated with familial factors underlying SPTB. PMID:26070700

  1. Geographic clustering of firms and urban form: a multivariate analysis

    Science.gov (United States)

    Maoh, Hanna; Kanaroglou, Pavlos

    2007-04-01

    This paper provides an empirical framework that applies spatial statistics methods to assess the relation between the change in the geographical clustering of firms and the emergence of urban form. We contend that where firms locate and eventually cluster gives rise to the way commercial and industrial land uses are organized over space, which in turn defines the shape of urban form. Accordingly, the objectives of our work are twofold: (1) to identify the extent and shape of firm clustering and co-location at the intra-metropolitan level, and (2) to examine how the change in the geographic clustering of different industries contributes to decentralization and the evolution of urban form. Spatial statistics methods and tools were vital and helped to fulfill these objectives.

  2. Clustering analysis of malware behavior using Self Organizing Map

    DEFF Research Database (Denmark)

    Pirscoveanu, Radu-Stefan; Stevanovic, Matija; Pedersen, Jens Myrup

    2016-01-01

    Map, an unsupervised machine learning algorithm, for generating clusters that capture the similarities between malware behavior. A data set of approximately 270,000 samples was used to generate the behavioral profile of malicious types in order to compare the outcome of the proposed clustering approach with the labels collected from 57 Antivirus vendors using VirusTotal. Upon evaluating the results, the paper concludes on shortcomings of relying on AV vendors for labeling malware samples. In order to solve the problem, a cluster-based classification is proposed, which should provide more accurate results based on the clusters created by competitive and cooperative algorithms like Self Organizing Map that better describe the behavioral profile of malware.

  3. First PPMXL photometric analysis of open cluster Ruprecht 15

    Institute of Scientific and Technical Information of China (English)

    Ashraf Latif Tadross

    2012-01-01

    We present the first in a series studying the astrophysical parameters of open clusters using the PPMXL database, whose data are applied to study Ruprecht 15. The astrophysical parameters of Ruprecht 15 have been estimated for the first time.

  4. Stellar variability in open clusters. II. Discovery of a new period-luminosity relation in a class of fast-rotating pulsating stars in NGC 3766

    CERN Document Server

    Mowlavi, N; Semaan, T; Eggenberger, P; Barblan, F; Eyer, L; Ekström, S; Georgy, C

    2016-01-01

    $Context.$ Pulsating stars are windows to the physics of stars enabling us to see glimpses of their interior. Not all stars pulsate, however. On the main sequence, pulsating stars form an almost continuous sequence in brightness, except for a magnitude range between $\delta$ Scuti and slowly pulsating B stars. Against all expectations, 36 periodic variables were discovered in 2013 in this luminosity range in the open cluster NGC 3766, the origin of which was a mystery. $Aims.$ We investigate the properties of those new variability class candidates in relation to their stellar rotation rates and stellar multiplicity. $Methods.$ We took multi-epoch spectra over three consecutive nights using ESO's Very Large Telescope. $Results.$ We find that the majority of the new variability class candidates are fast-rotating pulsators that obey a new period-luminosity relation. We argue that the new relation discovered here has a different physical origin to the period-luminosity relations observed for Cepheids. $Conclusio...

  5. Development and optimization of SPECT gated blood pool cluster analysis for the prediction of CRT outcome

    Energy Technology Data Exchange (ETDEWEB)

    Lalonde, Michel, E-mail: mlalonde15@rogers.com; Wassenaar, Richard [Department of Physics, Carleton University, Ottawa, Ontario K1S 5B6 (Canada); Wells, R. Glenn; Birnie, David; Ruddy, Terrence D. [Division of Cardiology, University of Ottawa Heart Institute, Ottawa, Ontario K1Y 4W7 (Canada)

    2014-07-15

    Purpose: Phase analysis of single photon emission computed tomography (SPECT) radionuclide angiography (RNA) has been investigated for its potential to predict the outcome of cardiac resynchronization therapy (CRT). However, phase analysis may be limited in its potential at predicting CRT outcome as valuable information may be lost by assuming that time-activity curves (TAC) follow a simple sinusoidal shape. A new method, cluster analysis, is proposed which directly evaluates the TACs and may lead to a better understanding of dyssynchrony patterns and CRT outcome. Cluster analysis algorithms were developed and optimized to maximize their ability to predict CRT response. Methods: About 49 patients (N = 27 ischemic etiology) received a SPECT RNA scan as well as positron emission tomography (PET) perfusion and viability scans prior to undergoing CRT. A semiautomated algorithm sampled the left ventricle wall to produce 568 TACs from SPECT RNA data. The TACs were then subjected to two different cluster analysis techniques, K-means, and normal average, where several input metrics were also varied to determine the optimal settings for the prediction of CRT outcome. Each TAC was assigned to a cluster group based on the comparison criteria and global and segmental cluster size and scores were used as measures of dyssynchrony and used to predict response to CRT. A repeated random twofold cross-validation technique was used to train and validate the cluster algorithm. Receiver operating characteristic (ROC) analysis was used to calculate the area under the curve (AUC) and compare results to those obtained for SPECT RNA phase analysis and PET scar size analysis methods. Results: Using the normal average cluster analysis approach, the septal wall produced statistically significant results for predicting CRT results in the ischemic population (ROC AUC = 0.73; p < 0.05 vs. equal chance ROC AUC = 0.50) with an optimal operating point of 71% sensitivity and 60% specificity.
Cluster
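
    The ROC quantities named in this abstract (AUC, sensitivity, specificity) can be illustrated with a minimal sketch. It assumes a single dyssynchrony-like score per patient and a binary responder label; the scores, labels, and threshold below are made up for illustration and are not taken from the study, and the function names are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random responder outscores a random non-responder."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Ties count as half a win (the Mann-Whitney formulation of the AUC).
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when score >= threshold predicts response."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical scores and responder labels (1 = responded to therapy).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

auc = roc_auc(scores, labels)
sens, spec = sensitivity_specificity(scores, labels, threshold=0.5)
```

    An AUC of 0.50 corresponds to the "equal chance" baseline quoted in the abstract; an operating point is one particular threshold on the score.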

  6. Cluster analysis in severe emphysema subjects using phenotype and genotype data: an exploratory investigation

    Directory of Open Access Journals (Sweden)

    Martinez Fernando J

    2010-03-01

    Full Text Available Abstract Background: Numerous studies have demonstrated associations between genetic markers and COPD, but results have been inconsistent. One reason may be heterogeneity in disease definition. Unsupervised learning approaches may assist in understanding disease heterogeneity. Methods: We selected 31 phenotypic variables and 12 SNPs from five candidate genes in 308 subjects in the National Emphysema Treatment Trial (NETT) Genetics Ancillary Study cohort. We used factor analysis to select a subset of phenotypic variables, and then used cluster analysis to identify subtypes of severe emphysema. We examined the phenotypic and genotypic characteristics of each cluster. Results: We identified six factors accounting for 75% of the shared variability among our initial phenotypic variables. We selected four phenotypic variables from these factors for cluster analysis: (1) post-bronchodilator FEV1 percent predicted, (2) percent bronchodilator responsiveness, and quantitative CT measurements of (3) apical emphysema and (4) airway wall thickness. K-means cluster analysis revealed four clusters, though separation between clusters was modest: (1) emphysema predominant; (2) bronchodilator responsive, with higher FEV1; (3) discordant, with a lower FEV1 despite less severe emphysema and lower airway wall thickness; and (4) airway predominant. Of the genotypes examined, membership in cluster 1 (emphysema-predominant) was associated with TGFB1 SNP rs1800470. Conclusions: Cluster analysis may identify meaningful disease subtypes and/or groups of related phenotypic variables even in a highly selected group of severe emphysema subjects, and may be useful for genetic association studies.

  7. Marketing Mix Formulation for Higher Education: An Integrated Analysis Employing Analytic Hierarchy Process, Cluster Analysis and Correspondence Analysis

    Science.gov (United States)

    Ho, Hsuan-Fu; Hung, Chia-Chi

    2008-01-01

    Purpose: The purpose of this paper is to examine how a graduate institute at National Chiayi University (NCYU), by using a model that integrates analytic hierarchy process, cluster analysis and correspondence analysis, can develop effective marketing strategies. Design/methodology/approach: This is primarily a quantitative study aimed at…

  8. Identification and comparative analysis of the protocadherin cluster in a reptile, the green anole lizard.

    Directory of Open Access Journals (Sweden)

    Xiao-Juan Jiang

    Full Text Available BACKGROUND: The vertebrate protocadherins are a subfamily of cell adhesion molecules that are predominantly expressed in the nervous system and are believed to play an important role in establishing the complex neural network during animal development. Genes encoding these molecules are organized into a cluster in the genome. Comparative analysis of the protocadherin subcluster organization and gene arrangements in different vertebrates has provided interesting insights into the history of vertebrate genome evolution. Among tetrapods, protocadherin clusters have been fully characterized only in mammals. In this study, we report the identification and comparative analysis of the protocadherin cluster in a reptile, the green anole lizard (Anolis carolinensis). METHODOLOGY/PRINCIPAL FINDINGS: We show that the anole protocadherin cluster spans over a megabase and encodes a total of 71 genes. The number of genes in the anole protocadherin cluster is significantly higher than that in the coelacanth (49 genes) and mammalian (54-59 genes) clusters. The anole protocadherin genes are organized into four subclusters: the delta, alpha, beta and gamma. This subcluster organization is identical to that of the coelacanth protocadherin cluster, but differs from the mammalian clusters which lack the delta subcluster. The gene number expansion in the anole protocadherin cluster is largely due to the extensive gene duplication in the gammab subgroup. Similar to coelacanth and elephant shark protocadherin genes, the anole protocadherin genes have experienced a low frequency of gene conversion. CONCLUSIONS/SIGNIFICANCE: Our results suggest that similar to the protocadherin clusters in other vertebrates, the evolution of the anole protocadherin cluster is driven mainly by lineage-specific gene duplications and degeneration.
Our analysis also shows that loss of the protocadherin delta subcluster in the mammalian lineage occurred after the divergence of mammals and reptiles

  9. A New Class of Analysis-Based Fast Transforms

    Science.gov (United States)

    2007-08-06

    Standards), Washington, DC, 1964. [2] Arfken, George B. and Hans J. Weber, Mathematical Methods for Physicists, 6th ed., Elsevier Academic Press, New York...procedure as the “butterfly” algorithm. Another class of transforms common in mathematical physics and classical mathematics are integral transforms...to as the kernel of the integral transform. One may desire to evaluate the function Kf at n values y1, y2, . . . , yn. Standard methods for the

  10. Detection of land use/land cover changes through the comparative analysis of NDVI-MODIS phenological clusters

    Science.gov (United States)

    Simoniello, Tiziana; Imbrenda, Vito; Lanfredi, Maria

    2013-04-01

    The use of satellite time series provides precious information to understand vegetation dynamics. In particular, they can be profitably used for studying the magnitude and spatial extent of the Earth's land cover alterations, which directly affect biodiversity, can contribute to land degradation, and are linked to climate change by feedback mechanisms. In the framework of the PRO-LAND project (PO-FESR Basilicata 2007-2013), we used NDVI-MODIS satellite time series (250 m), available as 16-day composites from the NASA LPDAAC dataset, to analyze land cover changes that occurred in the Basilicata region (Southern Italy) during the period 2000-2010. We performed a phenological clustering for the years 2000 and 2010 by means of the unsupervised fuzzy k-means classification, which is able to identify gradual differences among phenological patterns. The time domain considered is from April to October in order to reduce disturbances due to the presence of clouds, which can distort actual vegetation phenological profiles. The optimal number of clusters to capture the heterogeneity of the examined area was fixed at ten, because it seemed to be a good trade-off between the need for an efficient representation of ecosystems and the ability to detect local fragmentation effects. Results show that the temporal patterns of the ten clusters can be organised in a continuum of phenological curves. They can be sorted unambiguously according to increasing percentage of man-made areas (decreasing percentage of natural areas) and allow us to discriminate well between different land cover compositions by looking not only at differences in mean NDVI values but also at differences in the seasonal timing. The cluster sequence for both the examined years mostly follows the spatial arrangement of the land cover classes and the complex orography of the investigated region.
    In general, results show that a slight variability characterizes the arrangement of cluster cores, particularly for the clusters with a dominance of
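
    The "gradual differences" mentioned in this abstract come from the soft memberships that fuzzy k-means (also called fuzzy c-means) assigns to each profile. The following is a minimal sketch of the standard algorithm, not the authors' implementation; the three-value NDVI profiles, the fuzzifier m = 2, and the choice of initial centers are all assumptions made for illustration.

```python
import math


def _distance(p, q):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def memberships_for(points, centers, m):
    """Fuzzy membership of every point in every cluster (each row sums to 1)."""
    rows = []
    for p in points:
        # Guard against a zero distance when a point coincides with a center.
        dists = [max(_distance(p, c), 1e-12) for c in centers]
        weights = [d ** (-2.0 / (m - 1.0)) for d in dists]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


def fuzzy_c_means(points, init_centers, m=2.0, n_iter=50):
    """Alternate membership and center updates for a fixed number of iterations."""
    centers = [list(c) for c in init_centers]
    dim = len(points[0])
    for _ in range(n_iter):
        u = memberships_for(points, centers, m)
        for k in range(len(centers)):
            # Each center is the mean of all points weighted by membership**m.
            w = [row[k] ** m for row in u]
            total = sum(w)
            centers[k] = [
                sum(wi * p[j] for wi, p in zip(w, points)) / total
                for j in range(dim)
            ]
    return centers, memberships_for(points, centers, m)


# Hypothetical seasonal NDVI profiles (three composites per pixel).
profiles = [
    (0.80, 0.85, 0.78), (0.78, 0.88, 0.80), (0.82, 0.84, 0.76),  # dense vegetation
    (0.30, 0.25, 0.20), (0.28, 0.22, 0.24), (0.33, 0.27, 0.18),  # sparse cover
]
centers, memberships = fuzzy_c_means(profiles, [profiles[0], profiles[-1]])
```

    Unlike hard k-means, every profile keeps a membership in every cluster, so a pixel lying between two phenological patterns shows up with two comparable memberships rather than an arbitrary label.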

  11. Bohai crude oil identification by gas chromatogram fingerprinting quantitative analysis coupled with cluster analysis

    Institute of Scientific and Technical Information of China (English)

    SUN Peiyan; BAO Mutai; GAO Zhenhui; LI Mei; ZHAO Yuhui; WANG Xinping; ZHOU Qing; WANG Xiulin

    2006-01-01

    Six crude oils distributed across four oilfields and four oil platforms were fingerprinted by gas chromatography, and the corresponding normal paraffin hydrocarbon (including pristane and phytane) concentrations were obtained by the internal standard method. The normal paraffin hydrocarbon distribution patterns of the six crude oils were built and compared. Cluster analysis of the normal paraffin hydrocarbon concentrations was conducted for classification, and some ratios were used to compare the oils. The results indicated a clear difference among crude oils from different oil fields and a small difference between crude oils from the same oil platform. The normal paraffin hydrocarbon distribution patterns and ratios, as well as the cluster analysis of the normal paraffin hydrocarbon concentrations, differentiate crude oils with small differences better than the original gas chromatogram.

  12. Identifying At-Risk Students in General Chemistry via Cluster Analysis of Affective Characteristics

    Science.gov (United States)

    Chan, Julia Y. K.; Bauer, Christopher F.

    2014-01-01

    The purpose of this study is to identify academically at-risk students in first-semester general chemistry using affective characteristics via cluster analysis. Through the clustering of six preselected affective variables, three distinct affective groups were identified: low (at-risk), medium, and high. Students in the low affective group…

  13. Cluster analysis of European Y-chromosomal STR haplotypes using the discrete Laplace method

    DEFF Research Database (Denmark)

    Andersen, Mikkel Meyer; Eriksen, Poul Svante; Morling, Niels

    2014-01-01

    method can be used for cluster analysis to further validate the discrete Laplace method. A very important practical fact is that the calculations can be performed on a normal computer. We identified two sub-clusters of the Eastern and Western European Y-STR haplotypes similar to results of previous...

  14. Applying Clustering to Statistical Analysis of Student Reasoning about Two-Dimensional Kinematics

    Science.gov (United States)

    Springuel, R. Padraic; Wittman, Michael C.; Thompson, John R.

    2007-01-01

    We use clustering, an analysis method not presently common to the physics education research community, to group and characterize student responses to written questions about two-dimensional kinematics. Previously, clustering has been used to analyze multiple-choice data; we analyze free-response data that includes both sketches of vectors and…

  15. The reflection of hierarchical cluster analysis of co-occurrence matrices in SPSS

    NARCIS (Netherlands)

    Zhou, Q.; Leng, F.; Leydesdorff, L.

    2015-01-01

    Purpose: To discuss the problems arising from hierarchical cluster analysis of co-occurrence matrices in SPSS, and the corresponding solutions. Design/methodology/approach: We design different methods of using the SPSS hierarchical clustering module for co-occurrence matrices in order to compare the

  16. Co-clustering : A versatile Tool for Data Analysis in Biomedical Informatics

    OpenAIRE

    Yoon, Sungroh; Benini, Luca; De Micheli, Giovanni

    2007-01-01

    Co-clustering has not been much exploited in biomedical informatics, despite its success in other domains. Most of the previous applications were limited to analyzing gene expression data. We performed co-clustering analysis on other types of data and obtained promising results, as summarized in this paper.

  17. Identification and structural analysis of a novel snoRNA gene cluster from Arabidopsis thaliana

    Institute of Scientific and Technical Information of China (English)

    2000-01-01

    A Z2 snoRNA gene cluster, consisting of four antisense snoRNA genes, was identified from Arabidopsis thaliana. Sequence and structural analysis showed that the Z2 snoRNA gene cluster might be transcribed as a polycistronic precursor from an upstream promoter, and that the intergenic spacers of the gene cluster encode 'hairpin' structures similar to the processing recognition signals of the polycistronic snoRNA precursor in the yeast Saccharomyces cerevisiae. The results also revealed that having multiple copies is a common characteristic of plant snoRNA genes, which provides a good system for further revealing the transcription and expression mechanism of plant snoRNA gene clusters.

  18. Analysis of cost data in a cluster-randomized, controlled trial: comparison of methods

    DEFF Research Database (Denmark)

    Sokolowski, Ineta; Ørnbøl, Eva; Rosendal, Marianne;

    in clusters of general practices. There have been suggestions to apply different methods, e.g., the non-parametric bootstrap, to highly skewed data from pragmatic randomized trials without clusters, but there is very little information about how to analyse skewed data from cluster-randomized trials. Many studies have used non-valid analysis of skewed data. We propose two different methods to compare mean cost in two groups. Firstly, we use a non-parametric bootstrap method where the re-sampling takes place on two levels in order to take into account the cluster effect. Secondly, we proceed with a log...
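
    The two-level re-sampling idea can be sketched as follows. This is one common reading of a two-stage cluster bootstrap (draw practices with replacement, then draw patients with replacement inside each drawn practice), not necessarily the authors' exact procedure; the cost figures, the percentile interval, and all function names are assumptions made for illustration.

```python
import random
from statistics import mean


def two_level_resample(clusters, rng):
    """Resample clusters with replacement, then individuals within each drawn cluster."""
    drawn = [rng.choice(clusters) for _ in clusters]
    return [[rng.choice(c) for _ in c] for c in drawn]


def pooled_mean(clusters):
    """Mean cost over all individuals in all clusters."""
    return mean(x for c in clusters for x in c)


def bootstrap_mean_difference(arm_a, arm_b, n_boot=2000, seed=0):
    """Point estimate and percentile 95% interval for mean(arm_a) - mean(arm_b)."""
    rng = random.Random(seed)
    diffs = sorted(
        pooled_mean(two_level_resample(arm_a, rng))
        - pooled_mean(two_level_resample(arm_b, rng))
        for _ in range(n_boot)
    )
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[int(0.975 * n_boot) - 1]
    return pooled_mean(arm_a) - pooled_mean(arm_b), lower, upper


# Hypothetical, skewed costs: each inner list is one general practice (cluster).
intervention = [[120, 80, 950, 60], [40, 300, 75], [55, 65, 2000, 90, 110]]
control = [[30, 45, 500], [20, 25, 35, 700], [60, 15, 40]]

estimate, lower, upper = bootstrap_mean_difference(intervention, control)
```

    Resampling whole practices first keeps the between-cluster variation in the interval; resampling patients alone would treat all observations as independent and tend to give an interval that is too narrow.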

  19. Cluster Analysis of Customer Reviews Extracted from Web Pages

    Directory of Open Access Journals (Sweden)

    S. Shivashankar

    2010-01-01

    Full Text Available As e-commerce gains popularity day by day, the web has become an excellent source for market researchers to gather customer reviews and opinions. The number of customer reviews that a product receives is growing at a very fast rate (it could be in the hundreds or thousands). Customer reviews posted on websites vary greatly in quality. The potential customer has to read all the reviews, irrespective of their quality, to decide whether to purchase the product. In this paper, we attempt to assess a review based on its quality, to help the customer make a proper buying decision. The quality of a customer review is assessed as most significant, more significant, significant, or insignificant. A novel and effective web mining technique is proposed for assessing a customer review of a particular product based on feature clustering techniques, namely the k-means method and the fuzzy c-means method. This is performed in three steps: (1) identify review regions and extract reviews from them, (2) extract and cluster the features of reviews by a clustering technique and then assign weights to the features belonging to each of the clusters (groups), and (3) assess the review by considering the feature weights and group belongingness. The k-means and fuzzy c-means clustering techniques are implemented and tested on customer reviews extracted from web pages. The performance of these techniques is analyzed.

  20. A Bayesian Analysis of the Ages of Four Open Clusters

    CERN Document Server

    Jeffery, Elizabeth J; van Dyk, David A; Stenning, David C; Robinson, Elliot; Stein, Nathan; Jefferys, W H

    2016-01-01

    In this paper we apply a Bayesian technique to determine the best fit of stellar evolution models to find the main sequence turn off age and other cluster parameters of four intermediate-age open clusters: NGC 2360, NGC 2477, NGC 2660, and NGC 3960. Our algorithm utilizes a Markov chain Monte Carlo technique to fit these various parameters, objectively finding the best-fit isochrone for each cluster. The result is a high-precision isochrone fit. We compare these results with those of traditional "by-eye" isochrone fitting methods. By applying this Bayesian technique to NGC 2360, NGC 2477, NGC 2660, and NGC 3960, we determine the ages of these clusters to be 1.35 +/- 0.05, 1.02 +/- 0.02, 1.64 +/- 0.04, and 0.860 +/- 0.04 Gyr, respectively. The results of this paper continue our effort to determine cluster ages to higher precision than that offered by these traditional methods of isochrone fitting.

  1. Latent Class and Latent Transition Analysis With Applications in the Social, Behavioral, and Health Sciences

    CERN Document Server

    Collins, Linda M

    2010-01-01

    One of the few books on latent class analysis (LCA) and latent transition analysis (LTA) with a comprehensive treatment of longitudinal latent class models, Latent Class and Latent Transition Analysis reflects improvements in statistical computing as the most up-to-date reference for theoretical, technical, and practical issues in cross-sectional and longitudinal data. Plentiful examples enable the reader to acquire a thorough conceptual and technical understanding and to apply techniques to address empirical research questions. Researchers seeking an advanced introduction to LCA and LTA and g

  2. Cluster-cluster clustering

    Science.gov (United States)

    Barnes, J.; Dekel, A.; Efstathiou, G.; Frenk, C. S.

    1985-01-01

    The cluster correlation function $\xi_c(r)$ is compared with the particle correlation function, $\xi(r)$, in cosmological N-body simulations with a wide range of initial conditions. The experiments include scale-free initial conditions, pancake models with a coherence length in the initial density field, and hybrid models. Three N-body techniques and two cluster-finding algorithms are used. In scale-free models with white noise initial conditions, $\xi_c$ and $\xi$ are essentially identical. In scale-free models with more power on large scales, it is found that the amplitude of $\xi_c$ increases with cluster richness; in this case the clusters give a biased estimate of the particle correlations. In the pancake and hybrid models (with n = 0 or 1), $\xi_c$ is steeper than $\xi$, but the cluster correlation length exceeds that of the points by less than a factor of 2, independent of cluster richness. Thus the high amplitude of $\xi_c$ found in studies of rich clusters of galaxies is inconsistent with white noise and pancake models and may indicate a primordial fluctuation spectrum with substantial power on large scales.

  3. Cluster-cluster clustering

    Energy Technology Data Exchange (ETDEWEB)

    Barnes, J.; Dekel, A.; Efstathiou, G.; Frenk, C.S.

    1985-08-01

    The cluster correlation function $\xi_c(r)$ is compared with the particle correlation function, $\xi(r)$, in cosmological N-body simulations with a wide range of initial conditions. The experiments include scale-free initial conditions, pancake models with a coherence length in the initial density field, and hybrid models. Three N-body techniques and two cluster-finding algorithms are used. In scale-free models with white noise initial conditions, $\xi_c$ and $\xi$ are essentially identical. In scale-free models with more power on large scales, it is found that the amplitude of $\xi_c$ increases with cluster richness; in this case the clusters give a biased estimate of the particle correlations. In the pancake and hybrid models (with n = 0 or 1), $\xi_c$ is steeper than $\xi$, but the cluster correlation length exceeds that of the points by less than a factor of 2, independent of cluster richness. Thus the high amplitude of $\xi_c$ found in studies of rich clusters of galaxies is inconsistent with white noise and pancake models and may indicate a primordial fluctuation spectrum with substantial power on large scales. 30 references.

  4. Latent Class Analysis of Peer Conformity: Who Is Yielding to Pressure and Why?

    Science.gov (United States)

    Kosten, Paul A.; Scheier, Lawrence M.; Grenard, Jerry L.

    2013-01-01

    This study used latent class analysis to examine typologies of peer conformity in a community sample of middle school students. Students responded to 31 items assessing diverse facets of conformity dispositions. The most parsimonious model produced three qualitatively distinct classes that differed on the basis of conformity to recreational…

  5. Analysis of protein profiles using fuzzy clustering methods

    DEFF Research Database (Denmark)

    Karemore, Gopal Raghunath; Ukendt, Sujatha; Rai, Lavanya

    clustering methods for their classification followed by various validation measures. The clustering algorithms used for the study were K-means, K-medoid, Fuzzy C-means, Gustafson-Kessel, and Gath-Geva. The results presented in this study conclude that the protein profiles of tissue samples recorded by using the HPLC-LIF system and analyzed by clustering algorithms can be classified quite successfully as belonging to normal or malignant conditions....

  6. Theoretical Analysis of Structures of Ga4N4 Clusters

    Institute of Scientific and Technical Information of China (English)

    宋斌; 曹培林

    2003-01-01

    The structures and energies of a Ga4N4 cluster have been calculated using a full-potential linear-muffin-tin-orbital molecular-dynamics (FP-LMTO MD) method. We obtained twenty-four structures for a Ga4N4 cluster. The most stable structure we obtained is a Cs three-dimensional structure, the energy of which is lower than that of the C2v symmetry structure proposed by Kandalam et al. [J. Phys. Chem. B 106 (2002) 1945]. The calculated results show that the isomer with an N3 subunit is preferred, supporting the previous result made by Kandalam et al. We found that the most stable structure of Ga4N4 clusters presented semiconductor-like properties through the calculation of the density of states.

  7. Genetic Diversity among Parents of Hybrid Rice Based on Cluster Analysis of Morphological Traits and Simple Sequence Repeat Markers

    Institute of Scientific and Technical Information of China (English)

    WANG Sheng-jun; LU Zuo-mei; WAN Jian-min

    2006-01-01

    The genetic diversity of 41 parental lines popularized in commercial hybrid rice production in China was studied by using cluster analysis of morphological traits and simple sequence repeat (SSR) markers. The forty-one entries were assigned to two clusters (i.e., an early or medium-maturing cluster and a medium or late-maturing cluster) and further assigned to six sub-clusters based on morphological trait cluster analysis. The early or medium-maturing cluster was composed of 15 maintainer lines, four early-maturing restorer lines and two thermo-sensitive genic male sterile lines, and the medium or late-maturing cluster included 16 restorer lines and 4 medium or late-maturing maintainer lines. Moreover, the SSR cluster analysis classified the 41 entries into two clusters (i.e., a maintainer line cluster and a restorer line cluster) and seven sub-clusters. The maintainer line cluster consisted of all 19 maintainer lines and two thermo-sensitive genic male sterile lines, while the restorer line cluster was composed of all 20 restorer lines. The SSR analysis fitted better with the pedigree information. From the viewpoint of hybrid rice breeding, the results suggested that SSR analysis might be a better method to study the diversity of parental lines in indica hybrid rice.

  8. Boys and girls smoking within the Danish elementary school classes: a group-level analysis

    DEFF Research Database (Denmark)

    Rasmussen, Mette; Damsgaard, Mogens T; Due, Pernille;

    2002-01-01

    AIMS: To quantify the correlation between male and female smoking prevalence in elementary school classes by group-level analysis. METHODS: This study was the Danish contribution to the cross-national study Health Behaviour in School-Aged Children (HBSC) 1998. Ninety school classes at grade nine (1,515 students) from a random sample of schools in Denmark took part. The proportion of male and female "at all" smokers and daily smokers in the school class was calculated. RESULTS: The mean "at all" smoking proportion in the school classes is 39% for girls and 32% for boys. The proportion of male and female...

  9. Dialogue Analysis and Its Application in English Language Class

    Institute of Scientific and Technical Information of China (English)

    李婧雅

    2014-01-01

    Dialogue is frequently employed as role-play in classroom activities. Students are encouraged to practice speaking skills in the context of a certain conversation. Conversation tasks in listening exercises also attract considerable interest in English lessons. This essay analyzes the functions of dialogues, then discusses how to apply suitable dialogues in the English classroom, and ends with suggestions for possible activities in English language classes. A dialogue cited from Dellar and Walkley (2003, p. 125) is used as a sample and interpreted in detail.

  10. Construction and Dimension Analysis for a Class of Fractal Functions

    Institute of Scientific and Technical Information of China (English)

    Hong-yong Wang; Zong-ben Xu

    2002-01-01

    In this paper, we construct a class of nowhere differentiable continuous functions by means of the Cantor series expression of real numbers. The constructed functions include some known nondifferentiable functions, such as Bush type functions. These functions are fractal functions since their graphs are in general fractal sets. Under certain conditions, we investigate the fractal dimensions of the graphs of these functions, compute the precise values of Box and Packing dimensions, and evaluate the Hausdorff dimension. Meanwhile, the Hölder continuity of such functions is also discussed.

  11. Learning regularized LDA by clustering.

    Science.gov (United States)

    Pang, Yanwei; Wang, Shuang; Yuan, Yuan

    2014-12-01

    As a supervised dimensionality reduction technique, linear discriminant analysis has a serious overfitting problem when the number of training samples per class is small. The main reason is that the between- and within-class scatter matrices computed from the limited number of training samples deviate greatly from the underlying ones. To overcome the problem without increasing the number of training samples, we propose making use of the structure of the given training data to regularize the between- and within-class scatter matrices by between- and within-cluster scatter matrices, respectively, and simultaneously. The within- and between-cluster matrices are computed from unsupervised clustered data. The within-cluster scatter matrix contributes to encoding the possible variations in intraclasses and the between-cluster scatter matrix is useful for separating extra classes. The contributions are inversely proportional to the number of training samples per class. The advantages of the proposed method become more remarkable as the number of training samples per class decreases. Experimental results on the AR and Feret face databases demonstrate the effectiveness of the proposed method.

  12. Analysis of the Advantages of Creating Border Clusters

    Directory of Open Access Journals (Sweden)

    Liudmila Rosca-Sadurschi

    2015-08-01

    Full Text Available In a changing environment of rapid globalization, the competitiveness of a country or region depends increasingly on effective innovation. The main challenge for research and innovation is to facilitate the networking of companies and research laboratories. These networks can take the form of a highly integrated cross-border economic group, of actions that facilitate business and inter-laboratory linkages, or of cross-border clusters. The creation of these clusters requires meeting several conditions but brings significant benefits to all stakeholders.

  13. Optimization of supervised cluster analysis for extracting reference tissue input curves in (R)-[11C]PK11195 brain PET studies

    OpenAIRE

    Boellaard, Ronald; Hinz, Rainer; Adriaan A. Lammertsma; Schuitemaker, Alie; Tomasi, Giampaolo; Turkheimer, Federico E.; van Berckel, Bart NM; Yaqub, Maqsood

    2012-01-01

    Performance of two supervised cluster analysis (SVCA) algorithms for extracting reference tissue curves was evaluated to improve quantification of dynamic (R)-[(11)C]PK11195 brain positron emission tomography (PET) studies. Reference tissues were extracted from images using both a manually defined cerebellum and SVCA algorithms based on either four (SVCA4) or six (SVCA6) kinetic classes. Data from controls, mild cognitive impairment patients, and patients with Alzheimer's disease were ana...

  14. Application of K-Means Algorithm for Cluster Analysis on Poverty of Provinces in Indonesia

    Directory of Open Access Journals (Sweden)

    Albert Verasius Dian Sano

    2016-06-01

    Full Text Available The objective of this study is to apply cluster analysis, also known as clustering, to poverty data of provinces all over Indonesia. The problem is that decision makers involved in poverty problems, such as central government, local government and non-government organizations, need a tool to support the decision-making process related to social welfare problems. The method used in the cluster analysis is the k-means algorithm. The data used in this study were drawn from Badan Pusat Statistik (BPS, or Central Bureau of Statistics) for 2014. Cluster analysis in this study used characteristics of the data such as the absolute poverty of each province, the relative number or percentage of poverty of each province, and the poverty depth index of each province in Indonesia. Results of the cluster analysis were presented visually as groupings of cluster members. Cluster analysis in the study could be used to identify the poverty picture of all provinces in Indonesia more quickly and efficiently. The results of such identification can be used by policy makers with an interest in eradicating the problems associated with poverty and welfare distribution in Indonesia, ranging from government organizations and non-governmental organizations to private organizations.
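
The k-means procedure named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the six rows are made-up values (not BPS data), the z-score standardization is an assumption added because the three indicators are on different scales, and seeding the centroids with the first k rows is an arbitrary choice.

```python
import math
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each column so that no single indicator dominates the distance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

def kmeans(points, k, iterations=100):
    """Lloyd's algorithm; initial centroids are simply the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable, so centroids are too
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical provinces: (poor population in thousands, poverty rate in %, depth index)
rows = [
    (300, 5.0, 0.8), (350, 6.0, 0.9), (280, 4.5, 0.7),
    (900, 25.0, 5.5), (850, 27.0, 6.0), (950, 24.0, 5.0),
]
labels, centroids = kmeans(standardize(rows), k=2)
print(labels)
```

With real data the number of clusters and the initial centroids would need to be chosen with more care, for example by comparing several random starts.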

  15. The intersectionality of discrimination attributes and bullying among youth: an applied latent class analysis.

    Science.gov (United States)

    Garnett, Bernice Raveche; Masyn, Katherine E; Austin, S Bryn; Miller, Matthew; Williams, David R; Viswanath, Kasisomayajula

    2014-08-01

    Discrimination is commonly experienced among adolescents. However, little is known about the intersection of multiple attributes of discrimination and bullying. We used a latent class analysis (LCA) to illustrate the intersections of discrimination attributes and bullying, and to assess the associations of LCA membership to depressive symptoms, deliberate self-harm and suicidal ideation among a sample of ethnically diverse adolescents. The data come from the 2006 Boston Youth Survey where students were asked whether they had experienced discrimination based on four attributes: race/ethnicity, immigration status, perceived sexual orientation and weight. They were also asked whether they had been bullied or assaulted for these attributes. A total of 965 (78%) students contributed to the LCA analytic sample (45% Non-Hispanic Black, 29% Hispanic, 58% Female). The LCA revealed that a 4-class solution had adequate relative and absolute fit. The four classes were characterized as: low discrimination (51%); racial discrimination (33%); sexual orientation discrimination (7%); racial and weight discrimination with high bullying (intersectional class) (7%). In multivariate models, compared to the low discrimination class, individuals in the sexual orientation discrimination class and the intersectional class had higher odds of engaging in deliberate self-harm. Students in the intersectional class also had higher odds of suicidal ideation. All three discrimination latent classes had significantly higher depressive symptoms compared to the low discrimination class. Multiple attributes of discrimination and bullying co-occur among adolescents. Research should consider the co-occurrence of bullying and discrimination.
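
Once a latent class model with binary indicators has been fitted, each respondent is assigned to classes through Bayes' rule under the usual local-independence assumption. The sketch below shows only that assignment step. The class weights and item probabilities are invented for illustration and are not the estimates from this study, and the two-class setup is likewise hypothetical.

```python
def class_posteriors(responses, class_weights, item_probs):
    """Posterior P(class | responses) for 0/1 indicators.

    class_weights[c] is the prior share of class c; item_probs[c][j] is the
    probability that a member of class c endorses indicator j. Indicators
    are assumed independent within a class (local independence).
    """
    joint = []
    for weight, probs in zip(class_weights, item_probs):
        likelihood = weight
        for y, p in zip(responses, probs):
            likelihood *= p if y == 1 else (1.0 - p)
        joint.append(likelihood)
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical parameters for four indicators
# (race/ethnicity, immigration status, sexual orientation, weight).
weights = [0.7, 0.3]
probs = [
    [0.05, 0.05, 0.05, 0.05],  # illustrative "low discrimination" class
    [0.80, 0.30, 0.10, 0.60],  # illustrative higher-discrimination class
]
posterior = class_posteriors([1, 0, 0, 1], weights, probs)
print(posterior)
```

Estimating the weights and item probabilities themselves is typically done with an EM algorithm, which is not shown here.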

  16. A Latent Class Analysis of Smokeless Tobacco Use in the United States.

    Science.gov (United States)

    Fu, Qiang; Vaughn, Michael G

    2016-08-01

    While there has been an escalating trend in the number of smokeless tobacco users, mainly of snuff, in the United States, it is unclear whether smokeless tobacco users are a homogeneous class. The present investigation examines this question and identifies subtypes of smokeless tobacco users in order to better understand the characteristics of these individuals and guide appropriate intervention. Data on smokeless tobacco users (N = 2504) derived from the National Epidemiologic Survey on Alcohol and Related Conditions were employed. A range of antisocial behaviors, from non-violent deviant acts, irresponsibility, and a disengaged lifestyle to aggression and violence, was used to estimate the number of subtypes of smokeless tobacco users using latent class analysis. Four latent classes emerged: Normative Class (50.2%), Deviant Class (21.9%), Disengaging Class (17.2%), and Antisocial Class (10.5%). Logistic regression shows that major depression, alcohol use disorder, and marijuana use disorder were associated with the Deviant Class (ORs from 2.0 to 10.5). The same array of psychiatric disorders and general anxiety disorder were associated with greater odds of membership in the Disengaging Class (ORs from 2.6 to 7.4). The aforementioned psychiatric disorders and illicit drug use disorder were associated with the Antisocial Class (ORs from 3.8 to 38.1). Findings indicate that smokeless tobacco users are a heterogeneous population that may benefit from differential intervention strategies.

  17. ERRORS ANALYSIS AND TEACHERS' STRATEGIES IN SPEAKING CLASSES

    Institute of Scientific and Technical Information of China (English)

    LiMinquan

    2004-01-01

    In oral classes, teachers are often faced with all sorts of errors made by students. Because of insufficient study of them, some correct all of the errors and some neglect them. The author in this paper, through investigation of real class situation and all the possible collections of errors in his past teaching work, studies the errors and finds out four causes of the errors, and then puts forward his suggestions for dealing with the different errors at different stages. In teaching students to speak English, teachers often find a lot of errors in their speech. How should these errors be dealt with properly? This is something many teachers are working at. Through investigation of real class situation and all the possible collections of errors in teaching work, it is believed that a teacher's knowledge of the learning law, careful observation of the errors being made by the students and proper attitudes toward the errors are very important. It has been found that when a child starts to learn his native language, he makes errors constantly, such as “This mammy chair” or “Mammy, apple eat”. But he can say them correctly without much correction when he grows up. This is because “a human infant is born with an innate predisposition to acquire language” (Richards, 21). When an adult learns a foreign language, it is even more difficult, for physiologically he has to train the muscles of his tongue and lips to get used to the new ways of pronouncing a word, and psychologically he has to receive new concepts of the language which are quite different from his native tongue. Therefore, he unavoidably makes errors in his speech. Even when he has mastered the language to a certain degree, he still makes errors because “he knows very well what he should have done, but owing to the nervousness, tiredness, pressure and the effects of inner translation (a kind of interference from home language), he just lapses and forgets for a moment what to do” (McArthur, 107-108). This doesn't mean that an

  18. A SPECTROSCOPIC ANALYSIS OF THE GALACTIC GLOBULAR CLUSTER NGC 6273 (M19)

    Energy Technology Data Exchange (ETDEWEB)

    Johnson, Christian I.; Caldwell, Nelson [Harvard–Smithsonian Center for Astrophysics, 60 Garden Street, MS-15, Cambridge, MA 02138 (United States); Rich, R. Michael [Department of Physics and Astronomy, UCLA, 430 Portola Plaza, Box 951547, Los Angeles, CA 90095-1547 (United States); Pilachowski, Catherine A. [Astronomy Department, Indiana University Bloomington, Swain West 319, 727 East 3rd Street, Bloomington, IN 47405-7105 (United States); Mateo, Mario; Bailey, John I. III [Department of Astronomy, University of Michigan, Ann Arbor, MI 48109 (United States); Crane, Jeffrey D., E-mail: cjohnson@cfa.harvard.edu, E-mail: ncaldwell@cfa.harvard.edu, E-mail: rmr@astro.ucla.edu, E-mail: catyp@astro.indiana.edu, E-mail: mmateo@umich.edu, E-mail: baileyji@umich.edu, E-mail: crane@obs.carnegiescience.edu [The Observatories of the Carnegie Institution for Science, Pasadena, CA 91101 (United States)

    2015-08-15

    A combined effort utilizing spectroscopy and photometry has revealed the existence of a new globular cluster class. These “anomalous” clusters, which we refer to as “iron-complex” clusters, are differentiated from normal clusters by exhibiting large (≳0.10 dex) intrinsic metallicity dispersions, complex sub-giant branches, and correlated [Fe/H] and s-process enhancements. In order to further investigate this phenomenon, we have measured radial velocities and chemical abundances for red giant branch stars in the massive, but scarcely studied, globular cluster NGC 6273. The velocities and abundances were determined using high resolution (R ∼ 27,000) spectra obtained with the Michigan/Magellan Fiber System (M2FS) and MSpec spectrograph on the Magellan–Clay 6.5 m telescope at Las Campanas Observatory. We find that NGC 6273 has an average heliocentric radial velocity of +144.49 km s⁻¹ (σ = 9.64 km s⁻¹) and an extended metallicity distribution ([Fe/H] = −1.80 to −1.30) composed of at least two distinct stellar populations. Although the two dominant populations have similar [Na/Fe], [Al/Fe], and [α/Fe] abundance patterns, the more metal-rich stars exhibit significant [La/Fe] enhancements. The [La/Eu] data indicate that the increase in [La/Fe] is due to almost pure s-process enrichment. A third more metal-rich population with low [X/Fe] ratios may also be present. Therefore, NGC 6273 joins clusters such as ω Centauri, M2, M22, and NGC 5286 as a new class of iron-complex clusters exhibiting complicated star formation histories.

  19. The XMM Cluster Survey: X-ray analysis methodology

    CERN Document Server

    Lloyd-Davies, E J; Hosmer, Mark; Mehrtens, Nicola; Davidson, Michael; Sabirli, Kivanc; Mann, Robert G; Hilton, Matt; Liddle, Andrew R; Viana, Pedro T P; Campbell, Heather C; Collins, Chris A; Dubois, E Naomi; Freeman, Peter; Hoyle, Ben; Kay, Scott T; Kuwertz, Emma; Miller, Christopher J; Nichol, Robert C; Sahlen, Martin; Stanford, S Adam; Stott, John P

    2010-01-01

    The XMM Cluster Survey (XCS) is a serendipitous search for galaxy clusters using all publicly available data in the XMM-Newton Science Archive. Its main aims are to measure cosmological parameters and trace the evolution of X-ray scaling relations. In this paper we describe the data processing methodology applied to the 5776 XMM observations used to construct the current XCS source catalogue. A total of 3669 >4-σ cluster candidates with >50 background-subtracted X-ray counts are extracted from a total non-overlapping area suitable for cluster searching of 410 deg^2. Of these, 1022 candidates are detected with >300 X-ray counts, and we demonstrate that robust temperature measurements can be obtained down to this count limit. We describe in detail the automated pipelines used to perform the spectral and surface brightness fitting for these sources, as well as to estimate redshifts from the X-ray data alone. A total of 517 (126) X-ray temperatures to a typical accuracy of <40 (<10) per cent have ...

  20. Dynamical analysis of the cluster pair: A3407 + A3408

    CERN Document Server

    Nascimento, R S; Trevisan, M; Carrasco, E R; Plana, H; Dupke, R

    2016-01-01

    We carried out a dynamical study of the galaxy cluster pair A3407 & A3408 based on a spectroscopic survey obtained with the 4 meter Blanco telescope at the CTIO, plus 6dF data, and the ROSAT All-Sky Survey. The sample consists of 122 member galaxies brighter than m_R = 20. Our main goal is to probe the galaxy dynamics in this field and verify whether the sample constitutes a single galaxy system or corresponds to an ongoing merging process. Statistical tests were applied to cluster members, showing that both the composite system A3407 + A3408 and each individual cluster have Gaussian velocity distributions. A velocity gradient of ∼847 ± 114 km s⁻¹ was identified around the principal axis of the projected distribution of galaxies, indicating that the global field may be rotating. Applying the KMM algorithm to the distribution of galaxies, we found that the solution with two clusters is better than the single unit solution at the 99% c.l. This is consistent with the X-ray distribution around ...

  1. Cluster analysis of fasciolosis in dairy cow herds in Munster province of Ireland and detection of major climatic and environmental predictors of the exposure risk

    Directory of Open Access Journals (Sweden)

    Nikolaos Selemetas

    2015-03-01

    Full Text Available Fasciolosis caused by Fasciola hepatica is a widespread parasitic disease in cattle farms. The aim of this study was to detect clusters of fasciolosis in dairy cow herds in Munster Province, Ireland and to identify significant climatic and environmental predictors of the exposure risk. In total, 1,292 dairy herds across Munster were sampled in September 2012, each providing a single bulk tank milk (BTM) sample. The analysis of samples by an in-house antibody-detection enzyme-linked immunosorbent assay (ELISA) showed that 65% of the dairy herds (n = 842) had been exposed to F. hepatica. Using the Getis-Ord Gi* statistic, 16 high-risk and 24 low-risk (P < 0.01) clusters of fasciolosis were identified. The spatial distribution of high-risk clusters was more dispersed and mainly located in the northern and western regions of Munster compared to the low-risk clusters that were mostly concentrated in the southern and eastern regions. The most significant classes of variables that could reflect the difference between high-risk and low-risk clusters were the total number of wet-days and rain-days, rainfall, the normalized difference vegetation index (NDVI), temperature and soil type. There was a bigger proportion of well-drained soils among the low-risk clusters, whereas poorly drained soils were more common among the high-risk clusters. These results stress the role of precipitation, grazing, temperature and drainage on the life cycle of F. hepatica in the temperate Irish climate. The findings of this study highlight the importance of cluster analysis for identifying significant differences in climatic and environmental variables between high-risk and low-risk clusters of fasciolosis in Irish dairy herds.
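
The Getis-Ord Gi* statistic used here is a z-score that compares the weighted sum of values around a location with what the global mean would predict. The sketch below uses the standard textbook form of the statistic. The six readings and the "neighbours within one step on a line" weights are hypothetical stand-ins, not the herd data or the spatial weights of this study.

```python
import math

def getis_ord_gi_star(values, weights, i):
    """Gi* z-score for location i; weights[i][j] must include j == i."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    w = weights[i]
    sum_w = sum(w)
    sum_w2 = sum(wij * wij for wij in w)
    numerator = sum(wij * x for wij, x in zip(w, values)) - mean * sum_w
    # Undefined (division by zero) if every location is a neighbour of i
    # or if all values are identical; this sketch does not guard those cases.
    denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
    return numerator / denominator

def line_weights(n):
    """Binary weights for sites on a line: self and immediate neighbours."""
    return [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(n)] for i in range(n)]

# Hypothetical exposure readings for six herds along a transect.
readings = [0.9, 0.8, 0.85, 0.2, 0.1, 0.15]
w = line_weights(len(readings))
z_high = getis_ord_gi_star(readings, w, 1)  # surrounded by high values
z_low = getis_ord_gi_star(readings, w, 4)   # surrounded by low values
print(z_high, z_low)
```

Under a normal approximation, a two-sided P < 0.01 threshold like the one quoted in the abstract corresponds to |z| greater than about 2.58. Positive scores mark high-value clusters and negative scores mark low-value clusters.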

  2. Limit Cycle Analysis in a Class of Hybrid Systems

    Directory of Open Access Journals (Sweden)

    Antonio Favela-Contreras

    2016-01-01

    Full Text Available Hybrid systems are those that inherently combine discrete and continuous dynamics. This paper considers the hybrid system model to be an extension of the discrete automata associating a continuous evolution with each discrete state. This model is called the hybrid automaton. In this work, we achieve a mathematical formulation of the steady state and we show a way to obtain the initial conditions region to reach a specific limit cycle for a class of uncoupled and coupled continuous-linear hybrid systems. The continuous-linear term is used in the sense of the system theory and, in this sense, continuous-linear hybrid automata will be defined. Thus, some properties and theorems that govern the hybrid automata dynamic behavior to evaluate a limit cycle existence have been established; this content is explained under a theoretical framework.

  3. [Medicen Paris Région: A world-class "competitiveness cluster" in the Paris region incorporating a neuroscience "subcluster"].

    Science.gov (United States)

    Canet, Emmanuel

    2007-04-01

    The French public-private partnerships known as "competitive clusters" [pôles de compétitivité (PdC)] are intended to be novel and ambitious engines of regional growth, employment and biomedical innovation. Partly funded by government and local councils, they aim to capitalize on regional expertise by bringing together basic scientists, clinicians, innovative entrepreneurs and local decision-makers around specific themes that have become too costly and complex for any of these actors to tackle alone. Clusters provide the critical mass required both to underpin innovation potential and to authenticate regional claims to international competitiveness. Medicen is a biomedicine and therapeutics cluster comprising 120 partners from four broad "colleges" in the greater Paris region: major industry, small and medium-sized businesses, teaching hospitals/State research bodies, and local councils. Chief among its cooperative R&D projects is the neuroscience subcluster, in which "TransAl", the neurodegenerative disease project, counts Sanofi-Aventis, Servier and the French Atomic Energy Commission [Commissariat à l'Energie Atomique (CEA)] as key partners. One main aim is to develop an experimental model in rhesus monkeys in which a putative cause of Alzheimer's disease, intracerebral accumulation of β-amyloid peptide, is generated by impairing the peptide's clearance. The other aim, in which the nuclear medicine expertise of the CEA will be crucial, is to identify, characterize and validate markers for magnetic resonance and positron emission tomography imaging, and to source biomarkers from cerebrospinal fluid proteomics. A human biological resource centre (DNA and tissue banks) project dedicated to neurological and psychiatric disease should be up and running in 2007. Only through fundamental restructuring of resources on such a large cooperative scale are solutions likely to be found to the major problems of modern medicine, bringing healthcare and regional

  4. Does published orthodontic research account for clustering effects during statistical data analysis?

    Science.gov (United States)

    Koletsi, Despina; Pandis, Nikolaos; Polychronopoulou, Argy; Eliades, Theodore

    2012-06-01

    In orthodontics, multiple site observations within patients or multiple observations collected at consecutive time points are often encountered. Clustered designs require larger sample sizes compared to individual randomized trials and special statistical analyses that account for the fact that observations within clusters are correlated. It is the purpose of this study to assess to what degree clustering effects are considered during design and data analysis in the three major orthodontic journals. The contents of the most recent 24 issues of the American Journal of Orthodontics and Dentofacial Orthopedics (AJODO), Angle Orthodontist (AO), and European Journal of Orthodontics (EJO) from December 2010 backwards were hand searched. Articles with clustering effects and whether the authors accounted for clustering effects were identified. Additionally, information was collected on: involvement of a statistician, single or multicenter study, number of authors in the publication, geographical area, and statistical significance. From the 1584 articles, after exclusions, 1062 were assessed for clustering effects from which 250 (23.5 per cent) were considered to have clustering effects in the design (kappa = 0.92, 95 per cent CI: 0.67-0.99 for inter rater agreement). From the studies with clustering effects only, 63 (25.20 per cent) had indicated accounting for clustering effects. There was evidence that the studies published in the AO have higher odds of accounting for clustering effects [AO versus AJODO: odds ratio (OR) = 2.17, 95 per cent confidence interval (CI): 1.06-4.43, P = 0.03; EJO versus AJODO: OR = 1.90, 95 per cent CI: 0.84-4.24, non-significant; and EJO versus AO: OR = 1.15, 95 per cent CI: 0.57-2.33, non-significant). The results of this study indicate that only about a quarter of the studies with clustering effects account for this in statistical data analysis.
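
The statement that clustered designs need larger samples can be made concrete with the standard design effect, DEFF = 1 + (m − 1) × ICC, where m is the number of observations per cluster and ICC is the intraclass correlation. The sketch below assumes equal cluster sizes. The function names and the example numbers are illustrative and are not taken from the reviewed articles.

```python
import math

def design_effect(cluster_size, icc):
    """Variance inflation from clustering, assuming equal cluster sizes."""
    return 1.0 + (cluster_size - 1) * icc

def inflated_sample_size(n_independent, cluster_size, icc):
    """Observations needed under clustering to match n independent ones."""
    return math.ceil(n_independent * design_effect(cluster_size, icc))

# Hypothetical example: 5 teeth measured per patient, ICC of 0.25.
print(design_effect(5, 0.25))
print(inflated_sample_size(100, 5, 0.25))
```

When each cluster holds a single observation, or when the ICC is zero, the design effect is 1 and no adjustment is needed. Ignoring a larger design effect makes standard errors too small.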

  5. Structure analysis of a class of fuzzy controllers using pseudo trapezoid shaped membership functions

    Institute of Scientific and Technical Information of China (English)

    2000-01-01

    An output expression of a class of dual-input single-output fuzzy controllers using pseudo trapezoid shaped membership function is given. By structure analysis it is proved that this class of fuzzy controllers is the sum of a global two-dimensional multi-level relay and a local linear or nonlinear proportional-integral or proportional-differential controller. And the output of this class of fuzzy controllers is a continuous, non-decreasing function of its input variables. These and other meaningful results derived from structure analysis based on the output expressions can guide the design of fuzzy controllers.

  6. Structure analysis of a class of fuzzy controllers using pseudo trapezoid shaped membership functions

    Institute of Scientific and Technical Information of China (English)

    曾珂; 张乃尧; 徐文立

    2000-01-01

    An output expression of a class of dual-input single-output fuzzy controllers using pseudo trapezoid shaped membership function is given. By structure analysis it is proved that this class of fuzzy controllers is the sum of a global two-dimensional multi-level relay and a local linear or nonlinear proportional-integral or proportional-differential controller. And the output of this class of fuzzy controllers is a continuous, non-decreasing function of its input variables. These and other meaningful results derived from structure analysis based on the output expressions can guide the design of fuzzy controllers.

  7. A functional foot type classification with cluster analysis based on plantar pressure distribution during jogging.

    Science.gov (United States)

    De Cock, A; Willems, T; Witvrouw, E; Vanrenterghem, J; De Clercq, D

    2006-04-01

    The purpose of this study was to establish a reference dataset for peak pressures and pressure-time integrals during jogging, to compare this reference dataset with existing walking data and to develop a foot type classification, all based on plantar pressure data obtained from 215 healthy young adults. The subjects ran at 3.3 m s⁻¹ over a 16.5 m long running track, with a built-in pressure platform mounted on top of a force platform. Peak pressures, regional impulses and relative regional impulses were measured. These variables were found to be reliable (all intraclass correlation coefficients above 0.75) and, except for the heel areas, gender and asymmetry effects could be neglected. Highest peak pressures were found under the heel due to large impact forces during initial contact phase (ICP). In the forefoot, the highest peak pressure was found under the second metatarsal (64.2 ± 21.1 N cm⁻²). Compared to walking data, overall higher peak pressures and impulses and difference in hallux loading were found during barefoot jogging. Four pressure loading patterns were identified using a K-means cluster analysis, based on the relative regional impulses underneath the forefoot: medial M1 pattern, medial M2 pattern, central pattern and central-lateral pattern. These four pressure loading patterns could help in the functional interpretation of the foot behaviour during the stance phase in slow running.

  8. Using cluster analysis to examine the combinations of motivation regulations of physical education students.

    Science.gov (United States)

    Ullrich-French, Sarah; Cox, Anne

    2009-06-01

    According to self-determination theory, motivation is multidimensional, with motivation regulations lying along a continuum of self-determination (Ryan & Deci, 2007). Accounting for the different types of motivation in physical activity research presents a challenge. This study used cluster analysis to identify motivation regulation profiles and examined their utility by testing profile differences in relative levels of self-determination (i.e., self-determination index), and theoretical antecedents (i.e., competence, autonomy, relatedness) and consequences (i.e., enjoyment, worry, effort, value, physical activity) of physical education motivation. Students (N = 386) in 6th- through 8th-grade physical education classes completed questionnaires of the variables listed above. Five profiles emerged, including average (n = 81), motivated (n = 82), self-determined (n = 91), low motivation (n = 73), and external (n = 59). Group difference analyses showed that students with greater levels of self-determined forms of motivation, regardless of non-self-determined motivation levels, reported the most adaptive physical education experiences.

  9. High-Speed Video Analysis in a Conceptual Physics Class

    Science.gov (United States)

    Desbien, Dwain M.

    2011-01-01

    The use of probe ware and computers has become quite common in introductory physics classrooms. Video analysis is also becoming more popular and is available to a wide range of students through commercially available and/or free software. Video analysis allows for the study of motions that cannot be easily measured in the traditional lab setting…

  10. Ranking and clustering countries and their products; a network analysis

    CERN Document Server

    Caldarelli, Guido; Gabrielli, Andrea; Pietronero, Luciano; Scala, Antonio; Tacchella, Andrea

    2011-01-01

    In this paper we analyze the network of countries and products from UN data on country production. We define the country-country and product-product networks and we introduce a novel method of community detection based on element similarity. As a result we find that country clustering reveals unexpected socio-geographic links among the most competing countries. In the same way, the product clustering can be used efficiently for a bottom-up classification of produced goods. Furthermore we define a procedure to rank different countries and their products over the global market. These analyses are a good proxy of country GDP and therefore could possibly be used to determine the robustness of a country's economy.

  11. Functional Analysis of the Fusarielin Biosynthetic Gene Cluster

    Directory of Open Access Journals (Sweden)

    Aida Droce

    2016-12-01

    Full Text Available Fusarielins are polyketides with a decalin core produced by various species of Aspergillus and Fusarium. Although the responsible gene cluster has been identified, the biosynthetic pathway remains to be elucidated. In the present study, members of the gene cluster were deleted individually in a Fusarium graminearum strain overexpressing the local transcription factor. The results suggest that a trans-acting enoyl reductase (FSL5) assists the polyketide synthase FSL1 in biosynthesis of a polyketide product, which is released by hydrolysis by a trans-acting thioesterase (FSL2). Deletion of the epimerase (FSL3) resulted in accumulation of an unstable compound, which could be the released product. A novel compound, named prefusarielin, accumulated in the deletion mutant of the cytochrome P450 monooxygenase FSL4. Unlike the known fusarielins from Fusarium, this compound does not contain oxygenized decalin rings, suggesting that FSL4 is responsible for the oxygenation.

  12. ENVIRONMENTAL OBJECTIVE ANALYSIS, RANKING AND CLUSTERING OF HUNGARIAN CITIES

    Directory of Open Access Journals (Sweden)

    LÁSZLÓ MAKRA

    2008-12-01

    Full Text Available The aim of the study was to rank and classify Hungarian cities and counties according to their environmental quality and level of environmental awareness. The rankings of the Hungarian cities and counties are based on their “Green Cities Index” and “Green Counties Index” values. Following the methodology shown in Part 1, cities and counties were grouped using different classification techniques, and the efficacy of the classification was analysed. However, these techniques did not give acceptable results for either the cities or the counties. With the parameters of the three algorithms mentioned here, no reasonable structure was found in any clustering. The clusters obtained with the fanny algorithm, though weakly structured, indicate large and definite regions in Hungary that can be circumscribed by clear geographical objects.

  13. Class Size. NAESP School Leadership Digest Series, Number Three. ERIC/CEM Research Analysis Series, Number Five.

    Science.gov (United States)

    Schofield, Dee

    This analysis outlines the generally inconclusive nature of class size research. In the area of achievement as it relates to class size, the research is demonstrated to be especially inconclusive. In addition to analyzing the problems and weaknesses of class size research, this paper summarizes the effects of class size on the educational process…

  14. Multi-class ERP-based BCI data analysis using a discriminant space self-organizing map.

    Science.gov (United States)

    Onishi, Akinari; Natsume, Kiyohisa

    2014-01-01

    Emotional or non-emotional image stimuli have recently been applied to event-related potential (ERP) based brain computer interfaces (BCI). Though the classification performance is over 80% in a single trial, discrimination between those ERPs has not been considered. In this research we tried to clarify the discriminability of four-class ERP-based BCI target data elicited by desk, seal, spider images and letter intensifications. A conventional self organizing map (SOM) and a newly proposed discriminant space SOM (ds-SOM) were applied, then the discriminabilities were visualized. We also classified all pairs of those ERPs by stepwise linear discriminant analysis (SWLDA) and verified the visualization of discriminabilities. As a result, the ds-SOM showed an understandable visualization of the data with a shorter computational time than the traditional SOM. We also confirmed the clear boundary between the letter cluster and the other clusters. The result was coherent with the classification performances by SWLDA. The method might be helpful not only for developing a new BCI paradigm, but also for big data analysis.

  15. The DANCE Project: Dynamical Analysis of Nearby Clusters

    Science.gov (United States)

    Bouy, H.; Bertin, E.; Cuillandre, J. C.; Moraux, E.; Bouvier, J.; Arevalo Sánchez, M.; Barrado Y Navascués, D.

    We present the results of the DANCE project, a ground-based survey meant to prepare and complement Gaia i) down to the planetary mass regime; ii) in regions of high extinction. The DANCE project takes advantage of archival wide-field surveys to derive precise astrometry, and in particular proper motions, for millions of stars in young nearby associations. We present the first preliminary results obtained for the Pleiades cluster, as well as our immediate objectives for other associations.

  16. Design and performance of an analysis-by-synthesis class of predictive speech coders

    Science.gov (United States)

    Rose, Richard C.; Barnwell, Thomas P., III

    1990-01-01

    The performance of a broad class of analysis-by-synthesis linear predictive speech coders is quantified experimentally. The class of coders includes a number of well-known techniques as well as a very large number of speech coders which have not been named or studied. A general formulation for deriving the parametric representation used in all of the coders in the class is presented. A new coder, named the self-excited vocoder, is discussed because of its good performance with low complexity, and because of the insight this coder gives to analysis-by-synthesis coders in general. The results of a study comparing the performances of different members of this class are presented. The study takes the form of a series of formal subjective and objective speech quality tests performed on selected coders. The results of this study lead to some interesting and important observations concerning the controlling parameters for analysis-by-synthesis speech coders.

  17. Integrating Data Clustering and Visualization for the Analysis of 3D Gene Expression Data

    Energy Technology Data Exchange (ETDEWEB)

    Data Analysis and Visualization (IDAV) and the Department of Computer Science, University of California, Davis, One Shields Avenue, Davis CA 95616, USA; International Research Training Group "Visualization of Large and Unstructured Data Sets," University of Kaiserslautern, Germany; Computational Research Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA; Genomics Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley CA 94720, USA; Life Sciences Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley CA 94720, USA; Computer Science Division, University of California, Berkeley, CA, USA; Computer Science Department, University of California, Irvine, CA, USA; All authors are with the Berkeley Drosophila Transcription Network Project, Lawrence Berkeley National Laboratory; Rubel, Oliver; Weber, Gunther H.; Huang, Min-Yu; Bethel, E. Wes; Biggin, Mark D.; Fowlkes, Charless C.; Hendriks, Cris L. Luengo; Keranen, Soile V. E.; Eisen, Michael B.; Knowles, David W.; Malik, Jitendra; Hagen, Hans; Hamann, Bernd

    2008-05-12

    The recent development of methods for extracting precise measurements of spatial gene expression patterns from three-dimensional (3D) image data opens the way for new analyses of the complex gene regulatory networks controlling animal development. We present an integrated visualization and analysis framework that supports user-guided data clustering to aid exploration of these new complex datasets. The interplay of data visualization and clustering-based data classification leads to improved visualization and enables a more detailed analysis than previously possible. We discuss (i) integration of data clustering and visualization into one framework; (ii) application of data clustering to 3D gene expression data; (iii) evaluation of the number of clusters k in the context of 3D gene expression clustering; and (iv) improvement of overall analysis quality via dedicated post-processing of clustering results based on visualization. We discuss the use of this framework to objectively define spatial pattern boundaries and temporal profiles of genes and to analyze how mRNA patterns are controlled by their regulatory transcription factors.

  18. SSR Cluster and Fertility Loci Analysis of GC13

    Institute of Scientific and Technical Information of China (English)

    NONG Bao-xuan; XIA Xiu-zhong; LIANG Yao-mao; LU Gang; ZHANG Zong-qiong; LI Dan-ting

    2011-01-01

    [Objective] The research aimed to clarify the genetic mechanism of the special wide compatibility of GC13. [Method] Clustering analyses of GC13, five indica, five japonica and five wide compatibility varieties were carried out using 70 SSR primers. [Result] GC13 was clustered into the japonica group and was genetically distant from the indica and wide compatibility varieties. Two fertility loci were detected in GC13, one of which was closely linked to RM225 on chromosome 6. According to its position on the chromosome, this locus was speculated to be allelic to S5. GC13 carried the allelic gene S5-n at this locus. The other locus was closely linked to RM408 on chromosome 8 and was provisionally designated as Sg(t). At this locus, GC13 carried the Sg(t)-i allelic gene, which was consistent with IR36. The effect of the S5 locus was stronger than that of Sg(t). [Conclusion] The research laid a good foundation for using the wide compatibility line GC13 to breed hybrids between subspecies.

  19. Analysis of X-ray Structures of Matrix Metalloproteinases via Chaotic Map Clustering

    Directory of Open Access Journals (Sweden)

    Gargano Gianfranco

    2010-10-01

    Full Text Available Abstract Background Matrix metalloproteinases (MMPs) are well-known biological targets implicated in tumour progression, homeostatic regulation, innate immunity, impaired delivery of pro-apoptotic ligands, and the release and cleavage of cell-surface receptors. With this in mind, the perception of the intimate relationships among diverse MMPs could be a solid basis for accelerated learning in designing new selective MMP inhibitors. In this regard, decrypting the latent molecular reasons in order to elucidate similarity among MMPs is a key challenge. Results We describe a pairwise variant of the non-parametric chaotic map clustering (CMC) algorithm and its application to 104 X-ray MMP structures. In this analysis electrostatic potentials are computed and used as input for the CMC algorithm. It was shown that differences between proteins reflect genuine variation of their electrostatic potentials. In addition, the analysis has also been extended to analyze the protein primary structures and the molecular shapes of the MMP co-crystallised ligands. Conclusions The CMC algorithm was shown to be a valuable tool in knowledge acquisition and transfer from MMP structures. Based on the variation of electrostatic potentials, CMC was successful in analysing the MMP target family landscape and different subsites. The first investigation resulted in rational figure interpretation of both domain organization as well as of substrate specificity classifications. The second made it possible to distinguish the MMP classes, demonstrating the high specificity of the S1' pocket, to detect both the occurrence of punctual mutations of ionisable residues and different side-chain conformations that likely account for induced-fit phenomena. In addition, CMC demonstrated a potential comparable to the most popular UPGMA (Unweighted Pair Group Method with Arithmetic mean) method that, at present, represents a standard clustering bioinformatics approach. 
Interestingly, CMC and

  20. k-Means clustering as tool for multivariate geophysical data analysis. An application to shallow fault zone imaging

    Science.gov (United States)

    Di Giuseppe, Maria Giulia; Troiano, Antonio; Troise, Claudia; De Natale, Giuseppe

    2014-02-01

    We present the results of an integrated imaging approach for two-dimensional high-resolution magnetotelluric and seismic profiles. These were carried out in the seismically active intermontane basin of Pantano di San Gregorio Magno (southern Italy), along a line across the surface rupture of the 1980, M 6.9, earthquake. We focus on the application of the post-inversion k-means clustering technique to the univariate resistivity and P-wave velocity models, which were obtained previously through independent inversions. Five cluster classes are recognized, allowing a joint two-dimensional section to be imaged in terms of homogeneous zones from a geo-structural point of view. Two distinct local relationships between electrical resistivity and seismic velocities are inferred. In this way, the hanging and footwall zones have been retrieved, and are characterized according to the different fracturing degrees. The case dealt with here can be viewed as a successful example of how cluster analysis can be a promising auxiliary tool that provides bridging towards the integration of distinct geophysical methods.
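
    As a rough illustration of post-inversion k-means on two co-located models, the sketch below clusters grid cells described by a (log-resistivity, P-wave velocity) pair. The cell values, the z-score scaling, the choice of k=2 (the study reports five classes) and the function names are all assumptions made for this example; the abstract does not specify the implementation.

```python
import math
import random


def standardize(rows):
    """Z-score each column so resistivity and velocity carry equal weight."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c))
            for c, m in zip(cols, means)]
    return [tuple((x - m) / s for x, m, s in zip(row, means, stds))
            for row in rows]


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means on a list of equal-length tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers: k distinct cells
    labels = None
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels, centers


# Hypothetical cells: (log10 resistivity in ohm-m, Vp in km/s).
cells = [(1.0, 2.0), (1.1, 2.1), (0.9, 1.9),
         (3.0, 5.0), (3.1, 5.2), (2.9, 4.9)]
labels, centers = kmeans(standardize(cells), k=2)
print(labels)
```

    Mapping the resulting labels back onto the profile grid would give the kind of joint section in homogeneous zones that the abstract describes.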

  1. How Teachers Use and Manage Their Blogs? A Cluster Analysis of Teachers' Blogs in Taiwan

    Science.gov (United States)

    Liu, Eric Zhi-Feng; Hou, Huei-Tse

    2013-01-01

    The development of Web 2.0 has ushered in a new set of web-based tools, including blogs. This study focused on how teachers use and manage their blogs. A sample of 165 teachers' blogs in Taiwan was analyzed by factor analysis, cluster analysis and qualitative content analysis. First, the teachers' blogs were analyzed according to six criteria…

  2. Can galaxy clusters, type Ia supernovae and cosmic microwave background ruled out a class of modified gravity theories?

    CERN Document Server

    Holanda, R F L

    2016-01-01

    In this paper we study cosmological signatures of modified gravity theories that can be written as a coupling between an extra scalar field and the electromagnetic part of the usual Lagrangian for the matter fields. In these frameworks the entire electromagnetic sector of the theory is affected, and variations of fundamental constants, of the cosmic distance duality relation and of the evolution law of the cosmic microwave background radiation (CMB) are expected and are related to each other. In order to search for these variations we perform joint analyses with angular diameter distances of galaxy clusters, luminosity distances of type Ia supernovae and $T_{CMB}(z)$ measurements. We obtain tight constraints with no indication of violation of the standard framework.
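
    For orientation, the standard-framework relations that such tests probe can be written as follows. The symbols $D_L$, $D_A$, $T_0$ and the deviation function $\eta(z)$ are notation assumed here for illustration; the abstract does not state the parametrization actually used.

```latex
% Cosmic distance duality relation and CMB temperature law (standard framework)
D_L(z) = (1+z)^{2}\, D_A(z), \qquad T_{CMB}(z) = T_0\,(1+z)

% A deviation can be tracked by a function \eta(z), with \eta(z) = 1 in the standard case
\eta(z) \equiv \frac{D_L(z)}{(1+z)^{2}\, D_A(z)}
```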

  3. Analysis of basic clustering algorithms for numerical estimation of statistical averages in biomolecules.

    Science.gov (United States)

    Anandakrishnan, Ramu; Onufriev, Alexey

    2008-03-01

    In statistical mechanics, the equilibrium properties of a physical system of particles can be calculated as the statistical average over accessible microstates of the system. In general, these calculations are computationally intractable since they involve summations over an exponentially large number of microstates. Clustering algorithms are one of the methods used to numerically approximate these sums. The most basic clustering algorithms first sub-divide the system into a set of smaller subsets (clusters). Then, interactions between particles within each cluster are treated exactly, while all interactions between different clusters are ignored. These smaller clusters have far fewer microstates, making the summation over these microstates tractable. These algorithms have been previously used for biomolecular computations, but remain relatively unexplored in this context. Presented here is a theoretical analysis of the error and computational complexity for the two most basic clustering algorithms that were previously applied in the context of biomolecular electrostatics. We derive a tight, computationally inexpensive error bound for the equilibrium state of a particle computed via these clustering algorithms. For some practical applications, it is the root mean square error, which can be significantly lower than the error bound, that may be more important. We show that there is a strong empirical relationship between error bound and root mean square error, suggesting that the error bound could be used as a computationally inexpensive metric for predicting the accuracy of clustering algorithms for practical applications. An example of error analysis for such an application (computation of the average charge of ionizable amino acids in proteins) is given, demonstrating that the clustering algorithm can be accurate enough for practical purposes.
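
    The basic idea (exact treatment inside each cluster, inter-cluster interactions ignored) can be sketched on a toy model of two-state sites. The energy function, the parameter values and the function names below are assumptions chosen for illustration, not the formulation analysed in the paper.

```python
import itertools
import math


def average_occupancy(h, J, sites):
    """Exact Boltzmann average <s_i> over the 2**len(sites) microstates of
    `sites`, using only interactions among those sites (energies in kT).
    Toy energy: E(s) = sum_i h[i]*s_i + sum_{i<j} J[i][j]*s_i*s_j."""
    Z = 0.0
    totals = {i: 0.0 for i in sites}
    for state in itertools.product((0, 1), repeat=len(sites)):
        s = dict(zip(sites, state))
        energy = sum(h[i] * s[i] for i in sites)
        energy += sum(J[i][j] * s[i] * s[j]
                      for i, j in itertools.combinations(sites, 2))
        weight = math.exp(-energy)
        Z += weight
        for i in sites:
            totals[i] += weight * s[i]
    return {i: totals[i] / Z for i in sites}


def clustered_occupancy(h, J, clusters):
    """Approximate averages: each cluster is enumerated exactly on its own,
    and every interaction between different clusters is dropped."""
    result = {}
    for cluster in clusters:
        result.update(average_occupancy(h, J, cluster))
    return result


# Hypothetical 4-site system: strong couplings inside (0, 1) and (2, 3),
# weak couplings (0.05 kT) across the two groups.
h = [0.5, -0.2, 0.1, -0.4]
J = [[0.0, 2.0, 0.05, 0.05],
     [2.0, 0.0, 0.05, 0.05],
     [0.05, 0.05, 0.0, 1.5],
     [0.05, 0.05, 1.5, 0.0]]
exact = average_occupancy(h, J, [0, 1, 2, 3])         # 2**4 microstates
approx = clustered_occupancy(h, J, [[0, 1], [2, 3]])  # 2 * 2**2 microstates
print(max(abs(exact[i] - approx[i]) for i in range(4)))
```

    The cost drops from one sum over 2^N microstates to one small sum per cluster, and the printed value is the price paid for ignoring the cross-cluster couplings.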

  4. Symptom Clusters in People Living with HIV Attending Five Palliative Care Facilities in Two Sub-Saharan African Countries: A Hierarchical Cluster Analysis.

    Directory of Open Access Journals (Sweden)

    Katrien Moens

    Full Text Available Symptom research across conditions has historically focused on single symptoms, and the burden of multiple symptoms and their interactions has been relatively neglected, especially in people living with HIV. Symptom cluster studies are required to set priorities in treatment planning, and to lessen the total symptom burden. This study aimed to identify and compare symptom clusters among people living with HIV attending five palliative care facilities in two sub-Saharan African countries. Data from cross-sectional self-report of seven-day symptom prevalence on the 32-item Memorial Symptom Assessment Scale-Short Form were used. A hierarchical cluster analysis was conducted using Ward's method, applying squared Euclidean distance as the similarity measure to determine the clusters. Contingency tables, χ² tests and ANOVA were used to compare the clusters by patient-specific characteristics and distress scores. Among the sample (N=217), the mean age was 36.5 (SD 9.0), 73.2% were female, and 49.1% were on antiretroviral therapy (ART). The cluster analysis produced five symptom clusters identified as: 1) dermatological; 2) generalised anxiety and elimination; 3) social and image; 4) persistently present; and 5) a gastrointestinal-related symptom cluster. The patients in the first three symptom clusters reported the highest physical and psychological distress scores. Patient characteristics varied significantly across the five clusters by functional status (worst functional physical status in cluster one, p<0.001); being on ART (highest proportions for clusters two and three, p=0.012); global distress (F=26.8, p<0.001), physical distress (F=36.3, p<0.001) and psychological distress subscale (F=21.8, p<0.001) (all subscales worst for cluster one, best for cluster four). The greatest burden is associated with cluster one, and should be prioritised in clinical management. 
Further symptom cluster research in people living with HIV with longitudinally collected symptom data to
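
    A minimal sketch of hierarchical clustering with Ward's criterion on squared Euclidean distance is given below. The binary prevalence matrix is invented, the rows could stand for patients or symptoms, and the function name is hypothetical; the study's actual software and settings are not stated in the abstract.

```python
import itertools


def ward_clusters(points, n_clusters):
    """Agglomerative clustering with Ward's criterion: at each step, merge the
    pair of clusters whose union least increases the within-cluster sum of
    squares, n_a*n_b/(n_a+n_b) * ||centroid_a - centroid_b||^2."""
    # Each cluster is stored as (member indices, centroid).
    clusters = [([i], tuple(p)) for i, p in enumerate(points)]

    def merge_cost(a, b):
        (ma, ca), (mb, cb) = a, b
        na, nb = len(ma), len(mb)
        d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return na * nb / (na + nb) * d2

    while len(clusters) > n_clusters:
        i, j = min(itertools.combinations(range(len(clusters)), 2),
                   key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]))
        (ma, ca), (mb, cb) = clusters[i], clusters[j]
        na, nb = len(ma), len(mb)
        centroid = tuple((na * x + nb * y) / (na + nb) for x, y in zip(ca, cb))
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append((ma + mb, centroid))
    return sorted(sorted(members) for members, _ in clusters)


# Hypothetical presence/absence rows over four items.
rows = [
    (1, 1, 0, 0), (1, 1, 0, 0), (1, 0, 0, 0),
    (0, 0, 1, 1), (0, 0, 1, 1), (0, 1, 1, 1),
]
groups = ward_clusters(rows, n_clusters=2)
print(groups)
```

    This naive version rescans all pairs at every merge, which is fine for a few hundred rows; larger problems would normally use a library routine.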

  5. Point Cluster Analysis Using a 3D Voronoi Diagram with Applications in Point Cloud Segmentation

    Directory of Open Access Journals (Sweden)

    Shen Ying

    2015-08-01

    Full Text Available Three-dimensional (3D) point analysis and visualization is one of the most effective methods of point cluster detection and segmentation in geospatial datasets. However, serious scattering and clotting characteristics interfere with the visual detection of 3D point clusters. To overcome this problem, this study proposes the use of 3D Voronoi diagrams to analyze and visualize 3D points instead of the original data item. The proposed algorithm computes the cluster of 3D points by applying a set of 3D Voronoi cells to describe and quantify 3D points. The decompositions of point cloud of 3D models are guided by the 3D Voronoi cell parameters. The parameter values are mapped from the Voronoi cells to 3D points to show the spatial pattern and relationships; thus, a 3D point cluster pattern can be highlighted and easily recognized. To capture different cluster patterns, continuous progressive clusters and segmentations are tested. The 3D spatial relationship is shown to facilitate cluster detection. Furthermore, the generated segmentations of real 3D data cases are exploited to demonstrate the feasibility of our approach in detecting different spatial clusters for continuous point cloud segmentation.

  6. Qualitative analysis of certain generalized classes of quadratic oscillator systems

    Energy Technology Data Exchange (ETDEWEB)

    Bagchi, Bijan, E-mail: bbagchi123@gmail.com; Ghosh, Samiran, E-mail: sran-g@yahoo.com; Pal, Barnali, E-mail: barrna.roo@gmail.com; Poria, Swarup, E-mail: swarupporia@gmail.com [Department of Applied Mathematics, University of Calcutta, 92 Acharya Prafulla Chandra Road, Kolkata 700009 (India)

    2016-02-15

    We carry out a systematic qualitative analysis of the two quadratic schemes of generalized oscillators recently proposed by Quesne [J. Math. Phys. 56, 012903 (2015)]. By performing a local analysis of the governing potentials, we demonstrate that the first potential admits a pair of equilibrium points, one of which is typically a center for both signs of the coupling strength λ, while the other is a center for λ < 0 but a saddle for λ > 0. On the other hand, a linear stability analysis of the second potential reveals only a center for both signs of λ. We extend Quesne's scheme to include the effects of a linear dissipative term. An important outcome is that we run into a remarkable transition to chaos in the presence of a periodic force term f cos ωt.

  7. Neighbor Class Linear Discriminant Analysis

    Institute of Scientific and Technical Information of China (English)

    王言伟; 丁晓青; 刘长松

    2012-01-01

    A method of neighbor class linear discriminant analysis (NCLDA) is proposed. Linear discriminant analysis (LDA) is a special case of this method. LDA finds the optimal projections by maximizing the between-class scatter while minimizing the within-class scatter, where the between-class scatter is an overall average of the divergences among all classes. In NCLDA, the between-class scatter is defined as the average divergence between each class and its k nearest neighbor classes. By selecting a proper number of neighbor classes, NCLDA alleviates the overlap among some classes caused by LDA dimensionality reduction. The experimental results show that the proposed NCLDA is robust and outperforms traditional LDA.
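
    One plausible way to write the scatter described above is sketched here. The notation ($\mu_i$ for the mean of class $i$, $C$ for the number of classes, $\mathcal{N}_k(i)$ for the $k$ classes whose means are nearest to $\mu_i$) is assumed, class priors are omitted, and the paper's exact definition may differ.

```latex
% Between-class scatter restricted to the k nearest neighbor classes (assumed form)
S_b^{(k)} = \sum_{i=1}^{C} \frac{1}{k} \sum_{j \in \mathcal{N}_k(i)}
            (\mu_i - \mu_j)(\mu_i - \mu_j)^{\top}

% Projection criterion, with S_w the usual within-class scatter
W^{*} = \arg\max_{W} \;
        \frac{\left| W^{\top} S_b^{(k)} W \right|}{\left| W^{\top} S_w W \right|}
```

    With $k = C - 1$ every class pair contributes, which is consistent with LDA being a special case.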

  8. Clinical heterogeneity in patients with early-stage Parkinson's disease: a cluster analysis

    Institute of Scientific and Technical Information of China (English)

    Ping LIU; Tao FENG; Yong-jun WANG; Xuan ZHANG; Biao CHEN

    2011-01-01

    The aim of this study was to investigate the clinical heterogeneity of Parkinson's disease (PD) among a cohort of Chinese patients in early stages. Clinical data on demographics, motor variables, motor phenotypes, disease progression, global cognitive function, depression, apathy, sleep quality, constipation, fatigue, and L-dopa complications were collected from 138 Chinese PD subjects in early stages (Hoehn and Yahr stages 1-3). The PD subject subtypes were classified using k-means cluster analysis according to the clinical data from five- to three-cluster consecutively. Kappa statistical analysis was performed to evaluate the consistency among different subtype solutions. The cluster analysis indicated four main subtypes: the non-tremor dominant subtype (NTD, n=28, 20.3%), rapid disease progression subtype (RDP, n=7, 5.1%), young-onset subtype (YO, n=50, 36.2%), and tremor dominant subtype (TD, n=53, 38.4%). Overall, 78.3% (108/138) of subjects were always classified between the same three groups (52 always in TD, 7 in RDP, and 49 in NTD), and 98.6% (136/138) between five- and four-cluster solutions. However, subjects classified as NTD in the four-cluster analysis were dispersed into different subtypes in the three-cluster analysis, with low concordance between four- and three-cluster solutions (kappa value=-0.139, P=0.001). This study defines clinical heterogeneity of PD patients in early stages using a data-driven approach. The subtypes generated by the four-cluster solution appear to exhibit ideal internal cohesion and external isolation.
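
    Agreement between two cluster solutions can be summarized with a kappa statistic. The sketch below assumes Cohen's kappa and assumes that cluster labels have already been matched across the two solutions; the abstract states neither, and the subtype assignments shown are invented.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance)."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label lists must be non-empty and of equal length")
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from the marginal label frequencies of each solution.
    expected = sum(freq_a[lab] * freq_b[lab] for lab in freq_a) / (n * n)
    if expected == 1.0:
        # Both solutions put everything in one shared label; treated here
        # as perfect agreement by convention (an assumption of this sketch).
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical subtype assignments for eight subjects under two solutions.
four_cluster = ["TD", "TD", "YO", "YO", "NTD", "NTD", "RDP", "TD"]
three_cluster = ["TD", "TD", "YO", "TD", "YO", "TD", "RDP", "TD"]
print(round(cohen_kappa(four_cluster, three_cluster), 3))
```

    Values near 1 indicate stable assignments, values near 0 indicate agreement no better than chance, and negative values (such as the reported -0.139) indicate less agreement than chance would give.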

  9. Heavy minerals clustering analysis in application of provenance analysis of Kong 2 Member in Kongnan area

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    The main task of provenance analysis is to determine the source of sediments and the position of parent rocks. Provenance analysis can reveal the relationship between erosion districts and sediment zones, and between uplift and depression in the process of basin development. The authors use heavy mineral clustering analysis to estimate the provenance direction of the Paleogene Kong 2 Member in the Huanghua Depression. The research shows that there were five provenance areas of the Kong 2 Member in the Kongnan area: western (Shenusi), northwestern (Cangzhou), eastern (Ganhuatun), northeastern and southeastern. The main provenance areas were northwestern and western, while a southern provenance could not be ruled out. These areas are consistent with the known provenance areas.

  10. Cluster analysis of particulate matter (PM10) and black carbon (BC) concentrations

    Science.gov (United States)

    Žibert, Janez; Pražnikar, Jure

    2012-09-01

    The monitoring of air-pollution constituents like particulate matter (PM10) and black carbon (BC) can provide information about air quality and the dynamics of emissions. Air quality depends on natural and anthropogenic sources of emissions as well as the weather conditions. For a one-year period the diurnal concentrations of PM10 and BC in the Port of Koper were analysed by clustering days into similar groups according to the similarity of the BC and PM10 hourly derived day-profiles without any prior assumptions about working and non-working days, weather conditions or hot and cold seasons. The analysis was performed by using k-means clustering with the squared Euclidean distance as the similarity measure. The analysis showed that 10 clusters in the BC case produced 3 clusters with just one member day and 7 clusters that encompass more than one day with similar BC profiles. Similar results were found in the PM10 case, where one cluster has a single-member day, while 7 clusters contain several member days. The clustering analysis revealed that the clusters with less pronounced bimodal patterns and low hourly and average daily concentrations for both types of measurements include the most days in the one-year analysis. A typical day profile of the BC measurements includes a bimodal pattern with morning and evening peaks, while the PM10 measurements reveal a less pronounced bimodality. There are also clusters with single-peak day-profiles. The BC data in such cases exhibit morning peaks, while the PM10 data consist of noon or afternoon single peaks. Single pronounced peaks can be explained by appropriate cluster wind speed profiles. The analysis also revealed some special day-profiles. The BC cluster with a high midnight peak on 30/04/2010 and the PM10 cluster with the highest observed concentration of PM10 on 01/05/2010 (208.0 μg m⁻³) coincide with 1 May, which is a national holiday in Slovenia and has a very strong tradition of bonfire parties. The clustering of

  11. Application of Cluster Analysis in Assessment of Dietary Habits of Secondary School Students

    Directory of Open Access Journals (Sweden)

    Zalewska Magdalena

    2014-12-01

    Full Text Available Maintenance of proper health and prevention of diseases of civilization are now significant public health problems. Nutrition is an important factor in the development of youth, as well as the current and future state of health. The aim of the study was to show the benefits of the application of cluster analysis to assess the dietary habits of high school students. The survey was carried out on 1,631 eighteen-year-old students in seven randomly selected secondary schools in Bialystok using a self-prepared anonymous questionnaire. An evaluation of the time of day meals were eaten and the number of meals consumed was made for the surveyed students. The cluster analysis allowed distinguishing characteristic structures of dietary habits in the observed population. Four clusters were identified, which were characterized by relative internal homogeneity and substantial variation in terms of the number of meals during the day and the time of their consumption. The most important characteristics of cluster 1 were cumulated food ration in 2 or 3 meals and long intervals between meals. Cluster 2 was characterized by eating the recommended number of 4 or 5 meals a day. In the 3rd cluster, students ate 3 meals a day with large intervals between them, and in the 4th they had four meals a day while maintaining proper intervals between them. In all clusters dietary mistakes occurred, but most of them were related to clusters 1 and 3. Cluster analysis allowed for the identification of major flaws in nutrition, which may include irregular eating and skipping meals, and indicated possible connections between eating patterns and disturbances of body weight in the examined population.

  12. Genome-scale cluster analysis of replicated microarrays using shrinkage correlation coefficient

    Directory of Open Access Journals (Sweden)

    Loraine Ann

    2008-06-01

    Full Text Available Abstract Background Currently, clustering with some form of correlation coefficient as the gene similarity metric has become a popular method for profiling genomic data. The Pearson correlation coefficient and the standard deviation (SD)-weighted correlation coefficient are the two most widely-used correlations as the similarity metrics in clustering microarray data. However, these two correlations are not optimal for analyzing replicated microarray data generated by most laboratories. An effective correlation coefficient is needed to provide statistically sufficient analysis of replicated microarray data. Results In this study, we describe a novel correlation coefficient, shrinkage correlation coefficient (SCC), that fully exploits the similarity between the replicated microarray experimental samples. The methodology considers both the number of replicates and the variance within each experimental group in clustering expression data, and provides a robust statistical estimation of the error of replicated microarray data. The value of SCC is revealed by its comparison with two other correlation coefficients that are currently the most widely-used (Pearson correlation coefficient and SD-weighted correlation coefficient) using statistical measures on both synthetic expression data as well as real gene expression data from Saccharomyces cerevisiae. Two leading clustering methods, hierarchical and k-means clustering, were applied for the comparison. The comparison indicated that using SCC achieves better clustering performance. Applying SCC-based hierarchical clustering to the replicated microarray data obtained from germinating spores of the fern Ceratopteris richardii, we discovered two clusters of genes with shared expression patterns during spore germination. Functional analysis suggested that some of the genetic mechanisms that control germination in such diverse plant lineages as mosses and angiosperms are also conserved among ferns. Conclusion

  13. Bounded Delay Timing Analysis of a Class of CSP Programs

    DEFF Research Database (Denmark)

    Hulgaard, Henrik; Burns, Steven M.

    1997-01-01

    . Such a description is transformed into a safe Petri net with interval time delays specified on the places of the net. The timing analysis we perform determines the extreme separation in time between two communication actions of the CSP program for all possible timed executions of the system. We formally define...

  14. MMPI-2: Cluster Analysis of Personality Profiles in Perinatal Depression—Preliminary Evidence

    Directory of Open Access Journals (Sweden)

    Valentina Meuti

    2014-01-01

    Full Text Available Background. To assess personality characteristics of women who develop perinatal depression. Methods. The study started with a screening of a sample of 453 women in their third trimester of pregnancy, to whom were administered a survey data form, the Edinburgh Postnatal Depression Scale (EPDS) and the Minnesota Multiphasic Personality Inventory 2 (MMPI-2). A clinical group of subjects with perinatal depression (PND, 55 subjects) was selected; clinical and validity scales of the MMPI-2 were used as predictors in a hierarchical cluster analysis. Results. The analysis identified three clusters of personality profile: two “clinical” clusters (1 and 3) and an “apparently common” one (cluster 2). The first cluster (39.5%) collects structures of personality with prevalent obsessive or dependent functioning tending to develop a “psychasthenic” depression; the third cluster (13.95%) includes women with prevalent borderline functioning tending to develop “dysphoric” depression; the second cluster (46.5%) shows a normal profile with a “defensive” attitude, probably due to the presence of defense mechanisms or to the fear of stigma. Conclusion. Characteristics of personality have a key role in clinical manifestations of perinatal depression; it is important to detect them to identify mothers at risk and to plan targeted therapeutic interventions.

  15. Comprehensive analysis of cooperative gene mutations between class I and class II in de novo acute myeloid leukemia.

    Science.gov (United States)

    Ishikawa, Yuichi; Kiyoi, Hitoshi; Tsujimura, Akane; Miyawaki, Shuichi; Miyazaki, Yasushi; Kuriyama, Kazutaka; Tomonaga, Masao; Naoe, Tomoki

    2009-08-01

    Acute myeloid leukemia (AML) has been thought to be the consequence of two broad complementation classes of mutations: class I and class II. However, overlap-mutations between them or within the same class and the position of TP53 mutation have not been fully analyzed. We comprehensively analyzed the FLT3, cKIT, N-RAS, C/EBPA, AML1, MLL, NPM1, and TP53 mutations in 144 newly diagnosed de novo AML cases. We found that 103 of 165 identified mutations overlapped with other mutations, and most overlap-mutations consisted of class I and class II mutations. Although overlap-mutations within the same class were found in seven patients, five of them additionally had a mutation of the other class. These results suggest that most overlap-mutations within the same class might be the consequence of acquiring an additional mutation after the completion of both class I and class II mutations. However, mutated genes that overlapped within the same class were limited to N-RAS, TP53, MLL-PTD, and NPM1, suggesting the possibility that these irregular overlap-mutations might cooperatively participate in the development of AML. Notably, TP53 mutation overlapped with both class I and class II mutations, and was associated with morphologic multilineage dysplasia and complex karyotype. The genotype consisting of complex karyotype and TP53 mutation was an unfavorable prognostic factor among all AML patients, indicating that this genotype generates a disease entity in de novo AML. These results collectively suggest that TP53 mutation might be a functionally distinguishable class of mutation.

  16. Distinguishing PTSD, Complex PTSD, and Borderline Personality Disorder: A latent class analysis

    Directory of Open Access Journals (Sweden)

    Marylène Cloitre

    2014-09-01

    Full Text Available Background: There has been debate regarding whether Complex Posttraumatic Stress Disorder (Complex PTSD) is distinct from Borderline Personality Disorder (BPD) when the latter is comorbid with PTSD. Objective: To determine whether the patterns of symptoms endorsed by women seeking treatment for childhood abuse form classes that are consistent with diagnostic criteria for PTSD, Complex PTSD, and BPD. Method: A latent class analysis (LCA) was conducted on an archival dataset of 280 women with histories of childhood abuse assessed for enrollment in a clinical trial for PTSD. Results: The LCA revealed four distinct classes of individuals: a Low Symptom class characterized by low endorsements on all symptoms; a PTSD class characterized by elevated symptoms of PTSD but low endorsement of symptoms that define the Complex PTSD and BPD diagnoses; a Complex PTSD class characterized by elevated symptoms of PTSD and self-organization symptoms that defined the Complex PTSD diagnosis but low on the symptoms of BPD; and a BPD class characterized by symptoms of BPD. Four BPD symptoms were found to greatly increase the odds of being in the BPD compared to the Complex PTSD class: frantic efforts to avoid abandonment, unstable sense of self, unstable and intense interpersonal relationships, and impulsiveness. Conclusions: Findings supported the construct validity of Complex PTSD as distinguishable from BPD. Key symptoms that distinguished between the disorders were identified, which may aid in differential diagnosis and treatment planning.

  17. Older adults who are at risk of driving under the influence: A latent class analysis.

    Science.gov (United States)

    Choi, Namkee G; DiNitto, Diana M; Marti, C Nathan

    2015-09-01

    Despite increasing rates of substance use among older adults, their risk of driving under the influence of alcohol and/or drugs (DUI) has received scant research attention. This study identified DUI risk profiles among individuals aged 50+ years based on their substance use patterns, previous DUI incidents, and previous arrests. This study's analytic sample of 11,188 individuals came from the public use data sets of the 2008 to 2012 National Survey on Drug Use and Health. Latent class analysis identified a 4-class model as the most parsimonious. Class 1 (63% of the analytic sample; lowest risk group) exhibited the lowest probabilities of substance use and trouble with the law, while Class 4 (9% of the sample; highest risk group) included binge/heavy drinkers who are also likely to use illicit drugs and had the highest probabilities of self-reported DUI and previous arrests. Class 2 (18.5%) and Class 3 (9.5%) exhibited low-to-medium DUI risks. Class 4 had the highest proportions of Blacks and divorced or never married persons and had the lowest education and income, poorest self-rated health, and highest rates of mental health problems of all classes. Screening for substance abuse and comorbid mental health conditions should be included in protocols for assessing older adults' driving safety. More effort is also needed to improve access to substance abuse treatment and address mental health problems among older adults at high risk for DUI.

  18. Student academic performance analysis using fuzzy C-means clustering

    Science.gov (United States)

    Rosadi, R.; Akamal; Sudrajat, R.; Kharismawan, B.; Hambali, Y. A.

    2017-01-01

    Grade Point Average (GPA) is commonly used as an indicator of academic performance. Evaluating academic performance is a basic way to track student progress. When doing so, it is often useful to group the student data, especially when the amount of data is large, so that the patterns of relationship within and among groups can be revealed. Data can be grouped with a clustering method, one of which is the Fuzzy C-Means algorithm. This algorithm is then applied to a set of student data from the Faculty of Mathematics and Natural Sciences, Padjadjaran University.
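
    As an illustration of the Fuzzy C-Means algorithm named in this record, the following is a minimal sketch using only the Python standard library. It is not the authors' implementation: the function name `fuzzy_c_means`, the fuzzifier `m = 2.0`, the stopping tolerance, and the GPA values in the usage note are all assumptions made for the example.

```python
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Cluster `points` (equal-length tuples) into `c` fuzzy clusters.

    Returns (centers, u), where u[i][k] is the membership of point i
    in cluster k and every row of u sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(iters):
        # Centres are membership-weighted means, with weights u^m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            wsum = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / wsum
                for d in range(dim)))

        # Memberships follow from relative distances to the centres.
        shift = 0.0
        for i in range(n):
            dist = []
            for k in range(c):
                sq = sum((points[i][d] - centers[k][d]) ** 2
                         for d in range(dim))
                dist.append(max(sq ** 0.5, 1e-12))  # avoid division by zero
            new_row = [
                1.0 / sum((dist[k] / dist[j]) ** (2.0 / (m - 1.0))
                          for j in range(c))
                for k in range(c)]
            shift = max(shift,
                        max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:  # memberships have stopped changing
            break
    return centers, u
```

    With hypothetical one-dimensional GPA data such as `[(2.1,), (2.3,), (2.2,), (3.7,), (3.8,), (3.9,)]` and `c=2`, each student receives a degree of membership in both groups rather than a single hard label, which is the property that distinguishes the fuzzy method from K-means.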

  19. Cluster analysis based on dimensional information with applications to feature selection and classification

    Science.gov (United States)

    Eigen, D. J.; Fromm, F. R.; Northouse, R. A.

    1974-01-01

    A new clustering algorithm is presented that is based on dimensional information. The algorithm includes an inherent feature selection criterion, which is discussed. Further, a heuristic method for choosing the proper number of intervals for a frequency distribution histogram, a feature necessary for the algorithm, is presented. The algorithm, although usable as a stand-alone clustering technique, is then utilized as a global approximator. Local clustering techniques and configuration of a global-local scheme are discussed, and finally the complete global-local and feature selector configuration is shown in application to a real-time adaptive classification scheme for the analysis of remote sensed multispectral scanner data.

  20. A critical cluster analysis of 44 indicators of author-level performance

    DEFF Research Database (Denmark)

    Wildgaard, Lorna Elizabeth

    2015-01-01

    This paper explores the relationship between author-level bibliometric indicators and the researchers they "measure", exemplified across five academic seniorities and four disciplines. Using cluster methodology, the disciplinary and seniority appropriateness of author-level indicators is examined. Publication and citation data for 741 researchers across Astronomy, Environmental Science, Philosophy and Public Health were collected in Web of Science (WoS). Forty-four indicators of individual performance were computed using the data. A two-step cluster analysis using IBM SPSS version 22 was performed, followed by a risk analysis and ordinal logistic regression to explore cluster membership. Indicator scores were contextualized using the individual researcher's curriculum vitae. Four different clusters based on indicator scores ranked researchers as low, middle, high and extremely high performers...

  1. Cluster analysis in kinetic modelling of the brain: A noninvasive alternative to arterial sampling

    DEFF Research Database (Denmark)

    Liptrot, Matthew George; Adams, K.H.; Martiny, L.

    2004-01-01

    In emission tomography, quantification of brain tracer uptake, metabolism or binding requires knowledge of the cerebral input function. Traditionally, this is achieved with arterial blood sampling. We propose a noninvasive alternative via the use of a blood vessel time-activity curve (TAC) extracted directly from dynamic positron emission tomography (PET) scans by cluster analysis. Five healthy subjects were injected with the 5HT2A-receptor ligand [18F]-altanserin and blood samples were subsequently taken from the radial artery and cubital vein. Eight regions-of-interest (ROI) TACs were extracted from the PET data set. Hierarchical K-means cluster analysis was performed on the PET time series to extract a cerebral vasculature ROI. The number of clusters was varied from K = 1 to 10 for the second of the two-stage method. Determination of the correct number of clusters was performed...

  2. A Spectroscopic Analysis of the Galactic Globular Cluster NGC 6273 (M19)

    CERN Document Server

    Johnson, Christian I; Pilachowski, Catherine A; Caldwell, Nelson; Mateo, Mario; Bailey, John I; Crane, Jeffrey D

    2015-01-01

    A combined effort utilizing spectroscopy and photometry has revealed the existence of a new globular cluster class. These "anomalous" clusters, which we refer to as "iron-complex" clusters, are differentiated from normal clusters by exhibiting large (>0.10 dex) intrinsic metallicity dispersions, complex sub-giant branches, and correlated [Fe/H] and s-process enhancements. In order to further investigate this phenomenon, we have measured radial velocities and chemical abundances for red giant branch stars in the massive, but scarcely studied, globular cluster NGC 6273. The velocities and abundances were determined using high resolution (R~27,000) spectra obtained with the Michigan/Magellan Fiber System (M2FS) and MSpec spectrograph on the Magellan-Clay 6.5m telescope at Las Campanas Observatory. We find that NGC 6273 has an average heliocentric radial velocity of +144.49 km s^-1 (sigma=9.64 km s^-1) and an extended metallicity distribution ([Fe/H]=-1.80 to -1.30) composed of at least two distinct stellar popul...

  3. Insights into Quasar UV Spectra Using Unsupervised Clustering Analysis

    CERN Document Server

    Tammour, Aycha; Daley, Mark; Richards, Gordon T

    2016-01-01

    Machine learning can provide powerful tools to detect patterns in multi-dimensional parameter space. We use K-means (a simple yet powerful unsupervised clustering algorithm which picks out structure in unlabeled data) to study a sample of quasar UV spectra from the Quasar Catalog of the 10th Data Release of the Sloan Digital Sky Survey of Paris et al. (2014). Detecting patterns in large datasets helps us gain insights into the physical conditions and processes giving rise to the observed properties of quasars. We use K-means to find clusters in the parameter space of the equivalent width (EW), the blue- and red-half-width at half-maximum (HWHM) of the Mg II 2800 A line, the C IV 1549 A line, and the C III] 1908 A blend in samples of Broad Absorption-Line (BAL) and non-BAL quasars at redshift 1.6-2.1. Using this method, we successfully recover correlations well-known in the UV regime such as the anti-correlation between the EW and blueshift of the C IV emission line and the shape of the ionizing Spectra Energy...

  4. Information search behaviour among new car buyers: A two-step cluster analysis

    Directory of Open Access Journals (Sweden)

    S.M. Satish

    2010-03-01

    Full Text Available A two-step cluster analysis of new car buyers in India was performed to identify taxonomies of search behaviour using personality and situational variables, apart from sources of information. Four distinct groups were found—broad moderate searchers, intense heavy searchers, low broad searchers, and low searchers. Dealers can identify the members of each segment by measuring the variables used for clustering, and can then design appropriate communication strategies.

  5. Applying clustering to statistical analysis of student reasoning about two-dimensional kinematics

    Directory of Open Access Journals (Sweden)

    John R. Thompson

    2007-12-01

    Full Text Available We use clustering, an analysis method not presently common to the physics education research community, to group and characterize student responses to written questions about two-dimensional kinematics. Previously, clustering has been used to analyze multiple-choice data; we analyze free-response data that includes both sketches of vectors and written elements. The primary goal of this paper is to describe the methodology itself; we include a brief overview of relevant results.

  6. A Latent Class Analysis of Risk Factors for Acquiring HIV Among Men Who Have Sex with Men: Implications for Implementing Pre-Exposure Prophylaxis Programs.

    Science.gov (United States)

    Chan, Philip A; Rose, Jennifer; Maher, Justine; Benben, Stacey; Pfeiffer, Kristen; Almonte, Alexi; Poceta, Joanna; Oldenburg, Catherine E; Parker, Sharon; Marshall, Brandon Dl; Lally, Mickey; Mayer, Kenneth; Mena, Leandro; Patel, Rupa; Nunn, Amy S

    2015-11-01

    Current Centers for Disease Control and Prevention (CDC) guidelines for prescribing pre-exposure prophylaxis (PrEP) to prevent HIV transmission are broad. In order to better characterize groups who may benefit most from PrEP, we reviewed demographics, behaviors, and clinical outcomes for individuals presenting to a publicly-funded sexually transmitted diseases (STD) clinic in Providence, Rhode Island, from 2012 to 2014. Latent class analysis (LCA) was used to identify subgroups of men who have sex with men (MSM) at highest risk for contracting HIV. A total of 1723 individuals presented for testing (75% male; 31% MSM). MSM were more likely to test HIV positive than heterosexual men or women. Among 538 MSM, we identified four latent classes. Class 1 had the highest rates of incarceration (33%), forced sex (24%), but had no HIV infections. Class 2 had 10 anal sex partners in the previous 12 months (69%), anonymous partners (100%), drug/alcohol use during sex (76%), and prior STDs (40%). Class 4 had similar characteristics and HIV prevalence as Class 2. In this population, MSM who may benefit most from PrEP include those who have >10 sexual partners per year, anonymous partners, drug/alcohol use during sex and prior STDs. LCA is a useful tool for identifying clusters of characteristics that may place individuals at higher risk for HIV infection and who may benefit most from PrEP in clinical practice.

  7. Stellar variability in open clusters . II. Discovery of a new period-luminosity relation in a class of fast-rotating pulsating stars in NGC 3766

    Science.gov (United States)

    Mowlavi, N.; Saesen, S.; Semaan, T.; Eggenberger, P.; Barblan, F.; Eyer, L.; Ekström, S.; Georgy, C.

    2016-10-01

    Context. Pulsating stars are windows to the physics of stars enabling us to see glimpses of their interior. Not all stars pulsate, however. On the main sequence, pulsating stars form an almost continuous sequence in brightness, except for a magnitude range between δ Scuti and slowly pulsating B stars. Against all expectations, 36 periodic variables were discovered in 2013 in this luminosity range in the open cluster NGC 3766, the origin of which was a mystery. Aims: We investigate the properties of those new variability class candidates in relation to their stellar rotation rates and stellar multiplicity. Methods: We took multi-epoch spectra over three consecutive nights using ESO's Very Large Telescope. Results: We find that the majority of the new variability class candidates are fast-rotating pulsators that obey a new period-luminosity relation. We argue that the new relation discovered here has a different physical origin to the period-luminosity relations observed for Cepheids. Conclusions: We anticipate that our discovery will boost the relatively new field of stellar pulsation in fast-rotating stars, will open new doors for asteroseismology, and will potentially offer a new tool to estimate stellar ages or cosmic distances. Based on observations made with the FLAMES instruments on the VLT/UT2 telescope at the Paranal Observatory, Chile, under the program ID 69.A-0123(A).

  8. Advantages and Limitations of Cluster Analysis in Interpreting Regional GPS Velocity Fields in California and Elsewhere

    Science.gov (United States)

    Thatcher, W. R.; Savage, J. C.; Simpson, R.

    2012-12-01

    Regional Global Positioning System (GPS) velocity observations are providing increasingly precise mappings of actively deforming continental lithosphere. Cluster analysis, a venerable data analysis method, offers a simple, visual exploratory tool for the initial organization and investigation of GPS velocities (Simpson et al., 2012 GRL). Here we describe the application of cluster analysis to GPS velocities from three regions, the Mojave Desert and the San Francisco Bay regions in California, and the Aegean in the eastern Mediterranean. Our goal is to illustrate the strengths and shortcomings of the method in searching for spatially coherent patterns of deformation, including evidence for and against block-like behavior in these 3 regions. The deformation fields from dense regional GPS networks can often be concisely described in terms of relatively coherent blocks bounded by active faults, although the choice of blocks, their number and size, is subjective and usually guided by the distribution of known faults. Cluster analysis applied to GPS velocities provides a completely objective method for identifying groups of observations ranging in size from 10s to 100s of km in characteristic dimension based solely on the similarities of their velocity vectors. In the three regions we have studied, statistically significant clusters are almost invariably spatially coherent, fault bounded, and coincide with elastic, geologically identified structural blocks. Often, higher order clusters that are not statistically significant are also spatially coherent, suggesting the existence of additional blocks, or defining regions of other tectonic importance (e.g. zones of localized elastic strain accumulation near locked faults). These results can be used to both formulate tentative tectonic models with testable consequences and to suggest focused new measurements in under-sampled regions. 
    Cluster analysis applied to GPS velocities has several potential limitations, aside from the

  9. Analysis and comparison of very large metagenomes with fast clustering and functional annotation

    Directory of Open Access Journals (Sweden)

    Li Weizhong

    2009-10-01

    Full Text Available Background: The remarkable advance of metagenomics presents significant new challenges in data analysis. Metagenomic datasets (metagenomes) are large collections of sequencing reads from anonymous species within particular environments. Computational analyses for very large metagenomes are extremely time-consuming, and there are often many novel sequences in these metagenomes that are not fully utilized. The number of available metagenomes is rapidly increasing, so fast and efficient metagenome comparison methods are in great demand. Results: The new metagenomic data analysis method Rapid Analysis of Multiple Metagenomes with a Clustering and Annotation Pipeline (RAMMCAP) was developed using an ultra-fast sequence clustering algorithm, fast protein family annotation tools, and a novel statistical metagenome comparison method that employs a unique graphic interface. RAMMCAP processes extremely large datasets with only moderate computational effort. It identifies raw read clusters and protein clusters that may include novel gene families, and compares metagenomes using clusters or functional annotations calculated by RAMMCAP. In this study, RAMMCAP was applied to the two largest available metagenomic collections, the "Global Ocean Sampling" and the "Metagenomic Profiling of Nine Biomes". Conclusion: RAMMCAP is a very fast method that can cluster and annotate one million metagenomic reads in only hundreds of CPU hours. It is available from http://tools.camera.calit2.net/camera/rammcap/.

  10. The application of cluster analysis in the intercomparison of loop structures in RNA.

    Science.gov (United States)

    Huang, Hung-Chung; Nagaswamy, Uma; Fox, George E

    2005-04-01

    We have developed a computational approach for the comparison and classification of RNA loop structures. Hairpin or interior loops identified in atomic resolution RNA structures were intercompared by conformational matching. The root-mean-square deviation (RMSD) values between all pairs of RNA fragments of interest, even if from different molecules, are calculated. Subsequently, cluster analysis is performed on the resulting matrix of RMSD distances using the unweighted pair group method with arithmetic mean (UPGMA). The cluster analysis objectively reveals groups of folds that resemble one another. To demonstrate the utility of the approach, a comprehensive analysis of all the terminal hairpin tetraloops that have been observed in 15 RNA structures that have been determined by X-ray crystallography was undertaken. The method found major clusters corresponding to the well-known GNRA and UNCG types. In addition, two tetraloops with the unusual primary sequence UMAC (M is A or C) were successfully assigned to the GNRA cluster. Larger loop structures were also examined and the clustering results confirmed the occurrence of variations of the GNRA and UNCG tetraloops in these loops and provided a systematic means for locating them. Nineteen examples of larger loops that closely resemble either the GNRA or UNCG tetraloop were found in the large ribosomal RNAs. When the clustering approach was extended to include all structures in the SCOR database, novel relationships were detected including one between the ANYA motif and a less common folding of the GAAA tetraloop sequence.
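
    The clustering step described in this record (UPGMA applied to a matrix of pairwise RMSD values) can be sketched as follows. This is a minimal standard-library illustration, not the authors' software: the function name `upgma`, the loop labels, and the RMSD values in the usage note are hypothetical, and each merge is reported at the average inter-cluster distance rather than at half that value, as some dendrogram conventions do.

```python
def upgma(dist, labels):
    """Agglomerative clustering with average linkage (UPGMA).

    dist:   symmetric matrix (list of lists) of pairwise distances,
            for example RMSD values between RNA loop fragments.
    labels: one name per row of `dist`.
    Returns a list of merges (members_a, members_b, distance) in the
    order in which the clusters were joined.
    """
    # Each active cluster is a tuple of original row indices.
    clusters = [(i,) for i in range(len(labels))]
    merges = []

    def average_distance(a, b):
        # UPGMA: unweighted mean over all cross-cluster pairs.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of active clusters.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        a, b = clusters[x], clusters[y]
        merges.append((tuple(labels[i] for i in a),
                       tuple(labels[i] for i in b), d))
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters)
                    if k not in (x, y)] + [a + b]
    return merges
```

    For four hypothetical loops with RMSD values of 0.4 between `loopA` and `loopB`, 0.5 between `loopC` and `loopD`, and roughly 2 between the two pairs, the first two merges join each tight pair and the last merge joins the two pairs, which mirrors how structurally similar folds would end up in the same group.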

  11. Evaluation of socio-economic patterns of SHG members in Kerala using clustering analysis

    Directory of Open Access Journals (Sweden)

    Sajeev B. U

    2012-03-01

    Full Text Available Although Kerala stands ahead of all other states in India in social development, the distribution of social and economic opportunities within the state is highly inequitable among different social groups. Self help groups (SHG) are vehicles for social, political and financial intermediation of the state. Clustering analysis is one of the main analytical methods in data mining, and the choice of clustering algorithm directly influences the clustering results. K-means and Fuzzy C-Means are popular algorithms in cluster analysis. In this paper we have evaluated the socioeconomic development of SHGs in various districts of Kerala state using cluster analysis. The data were collected by field survey and interviews. The parameters considered for the study include the regularity of the members in attending meetings and training; the social and economic benefits gained by the members at the personal, cluster and society levels; the rate of employment and the number of earning members in the family; and the literacy and educational level of SHG members.

  12. Doing Class Analysis in Singapore's Elite Education: Unravelling the Smokescreen of "Meritocratic Talk"

    Science.gov (United States)

    Koh, Aaron

    2014-01-01

    This paper examines the specificity of the education-class nexus in an elite independent school in Singapore. It seeks to unravel the puzzle that meritocracy is dogmatically believed in Singapore in spite of evidence that points to the contrary. The paper draws on discursive (analysis of media materials) and institutional (analysis of interview…

  13. Improvement in Student Data Analysis Skills after Out-of-Class Assignments

    Directory of Open Access Journals (Sweden)

    Kristen Lee Williams Walton

    2016-12-01

    Full Text Available The ability to understand and interpret data is a critical aspect of scientific thinking.  However, although data analysis is often a focus in biology majors classes, many textbooks for allied health majors classes are primarily content-driven and do not include substantial amounts of experimental data in the form of graphs and figures.  In a lower-division allied health majors microbiology class, students were exposed to data from primary journal articles as take-home assignments and their data analysis skills were assessed in a pre-/posttest format.  Students were given 3 assignments that included data analysis questions.  Assignments ranged from case studies that included a figure from a journal article to reading a short journal article and answering questions about multiple figures or tables.  Data were represented as line or bar graphs, gel photographs, and flow charts.  The pre- and posttest was designed incorporating the same types of figures to assess whether the assignments resulted in any improvement in data analysis skills.  The mean class score showed a small but significant improvement from the pretest to the posttest across three semesters of testing.  Scores on individual questions testing accurate conclusions and predictions improved the most.  This supports the conclusion that a relatively small number of out-of-class assignments through the semester resulted in a significant improvement in data analysis abilities in this population of students.

  14. Validity Index and number of clusters

    Directory of Open Access Journals (Sweden)

    Mohamed Fadhel Saad

    2012-01-01

    Full Text Available Clustering (or cluster analysis) has been used widely in pattern recognition, image processing, and data analysis. It aims to organize a collection of data items into c clusters, such that items within a cluster are more similar to each other than they are to items in the other clusters. The number of clusters c is the most important parameter, in the sense that the remaining parameters have less influence on the resulting partition. Several methods, called validity indexes, have been proposed to determine the best number of classes. This paper presents a new validity index for fuzzy clustering called the Modified Partition Coefficient And Exponential Separation (MPCAES) index. The efficiency of the proposed MPCAES index is compared with several popular validity indexes. More information about these indexes is obtained from a series of numerical comparisons and from the real Iris data set.
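
    The abstract does not give the formula of the MPCAES index, so it is not reproduced here. As a hedged illustration of what a fuzzy validity index computes, the sketch below implements two classical indices instead: Bezdek's partition coefficient, PC = (1/n) Σ_i Σ_k u_ik², and the modified partition coefficient, MPC = 1 − (c/(c−1))(1 − PC). The function names are chosen for this example, and `u` is assumed to be an n-by-c membership matrix with c ≥ 2 whose rows sum to 1.

```python
def partition_coefficient(u):
    """Bezdek's partition coefficient: mean sum of squared memberships.

    Ranges from 1/c (completely fuzzy) to 1 (crisp partition).
    """
    n = len(u)
    return sum(v * v for row in u for v in row) / n


def modified_partition_coefficient(u):
    """Rescale PC from [1/c, 1] to [0, 1] so that partitions with
    different numbers of clusters c can be compared directly."""
    c = len(u[0])
    pc = partition_coefficient(u)
    return 1.0 - (c / (c - 1.0)) * (1.0 - pc)


crisp = [[1.0, 0.0], [0.0, 1.0]]   # every item fully in one cluster
vague = [[0.5, 0.5], [0.5, 0.5]]   # no cluster structure at all
print(modified_partition_coefficient(crisp))  # 1.0
print(modified_partition_coefficient(vague))  # 0.0
```

    In practice, an index of this kind is evaluated for several candidate values of c, and the value of c that optimizes the index is taken as the estimated number of clusters.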

  15. A cluster randomized-controlled trial of a classroom-based drama workshop program to improve mental health outcomes among immigrant and refugee youth in special classes.

    Directory of Open Access Journals (Sweden)

    Cécile Rousseau

    Full Text Available The aim of this cluster randomized trial was to evaluate the effectiveness of a school-based theatre intervention program for immigrant and refugee youth in special classes for improving mental health and academic outcomes. The primary hypothesis was that students in the theatre intervention group would report a greater reduction in impairment from symptoms compared to students in the control and tutoring groups. Special classrooms in five multiethnic high schools were randomly assigned to theatre intervention (n = 10), tutoring (n = 10) or control status (n = 9), for a total of 477 participants. Students and teachers were non-blinded to group assignment. The primary outcome was impairment from emotional and behavioural symptoms assessed by the Impact Supplement of the Strengths and Difficulties Questionnaire (SDQ) completed by the adolescents. The secondary outcomes were the SDQ global scores (teacher and youth reports), impairment assessed by teachers and school performance. The effect of the interventions was assessed through linear mixed effect models which incorporate the correlation between students in the same class, due to the nature of the randomization of the interventions by classroom. The theatre intervention was not associated with a greater reduction in self-reported impairment and symptoms in youth placed in special class because of learning, emotional and behavioural difficulties than a tutoring intervention or a non-active control group. The estimates of the different models show a non-significant decrease in both self-reported and impairment scores in the theatre intervention group for the overall group, but the impairment score decreased significantly for first generation adolescents while it increased for second generation adolescents. The difference between the population of immigrant and refugee youth newcomers studied previously and the sample of this trial may explain some of the differences in the observed impact of

  16. Work Disability among Employees with Diabetes: Latent Class Analysis of Risk Factors in Three Prospective Cohort Studies.

    Directory of Open Access Journals (Sweden)

    Marianna Virtanen

    Full Text Available Studies of work disability in diabetes have examined diabetes as a homogeneous disease. We sought to identify subgroups among persons with diabetes based on potential risk factors for work disability. Participants were 2,445 employees with diabetes from three prospective cohorts (the Finnish Public Sector study, the GAZEL study, and the Whitehall II study). Work disability was ascertained via linkage to registers of sickness absence and disability pensions during a follow-up of 4 years. Study-specific latent class analysis was used to identify subgroups according to prevalent comorbid disease and health-risk behaviours. Study-specific associations with work disability at follow-up were pooled using fixed-effects meta-analysis. Separate latent class analyses for men and women in each cohort supported a two-class solution with one subgroup (total n = 1,086; 44.4%) having high prevalence of chronic somatic diseases, psychological symptoms, obesity, physical inactivity and abstinence from alcohol and the other subgroup (total n = 1,359; 55.6%) low prevalence of these factors. In the adjusted meta-analyses, participants in the 'high-risk' group had more work disability days (pooled rate ratio = 1.66, 95% CI 1.38-1.99) and more work disability episodes (pooled rate ratio = 1.33, 95% CI 1.21-1.46). These associations were similar in men and women, younger and older participants, and across occupational groups. Diabetes is not a homogeneous disease in terms of work disability risk. Approximately half of people with diabetes are assigned to a subgroup characterised by clustering of comorbid health conditions, obesity, physical inactivity, abstinence from alcohol, and associated high risk of work disability; the other half to a subgroup characterised by a more favourable risk profile.
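
    The pooling step mentioned in this record (fixed-effects meta-analysis of study-specific rate ratios) is commonly carried out by inverse-variance weighting on the log scale. The sketch below shows that standard calculation under stated assumptions: the function name `pool_fixed_effects` is hypothetical, standard errors are recovered from 95% confidence intervals, and the abstract does not report the cohort-level estimates, so any inputs are placeholders rather than data from the study.

```python
import math

Z_95 = 1.959963984540054  # 97.5th percentile of the standard normal


def pool_fixed_effects(estimates):
    """Inverse-variance fixed-effects pooling of rate ratios.

    estimates: list of (rate_ratio, ci_lower, ci_upper) with 95% CIs.
    Returns (pooled_rate_ratio, ci_lower, ci_upper).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for rr, lower, upper in estimates:
        log_rr = math.log(rr)
        # Standard error of log(RR), recovered from the CI width.
        se = (math.log(upper) - math.log(lower)) / (2.0 * Z_95)
        weight = 1.0 / se ** 2          # more precise studies count more
        weighted_sum += weight * log_rr
        weight_total += weight
    pooled_log = weighted_sum / weight_total
    pooled_se = math.sqrt(1.0 / weight_total)
    return (math.exp(pooled_log),
            math.exp(pooled_log - Z_95 * pooled_se),
            math.exp(pooled_log + Z_95 * pooled_se))
```

    Calling `pool_fixed_effects` with three placeholder tuples, one per cohort, returns a single rate ratio that lies between the smallest and largest study estimates, with a confidence interval narrower than that of any single study.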

  17. Clustering the lexicon in the brain: a meta-analysis of the neurofunctional evidence on noun and verb processing

    Science.gov (United States)

    Crepaldi, Davide; Berlingeri, Manuela; Cattinelli, Isabella; Borghese, Nunzio A.; Luzzatti, Claudio; Paulesu, Eraldo

    2013-01-01

    Although it is widely accepted that nouns and verbs are functionally independent linguistic entities, it is less clear whether their processing recruits different brain areas. This issue is particularly relevant for those theories of lexical semantics (and, more in general, of cognition) that suggest the embodiment of abstract concepts, i.e., based strongly on perceptual and motoric representations. This paper presents a formal meta-analysis of the neuroimaging evidence on noun and verb processing in order to address this dichotomy more effectively at the anatomical level. We used a hierarchical clustering algorithm that grouped fMRI/PET activation peaks solely on the basis of spatial proximity. Cluster specificity for grammatical class was then tested on the basis of the noun-verb distribution of the activation peaks included in each cluster. Thirty-two clusters were identified: three were associated with nouns across different tasks (in the right inferior temporal gyrus, the left angular gyrus, and the left inferior parietal gyrus); one with verbs across different tasks (in the posterior part of the right middle temporal gyrus); and three showed verb specificity in some tasks and noun specificity in others (in the left and right inferior frontal gyrus and the left insula). These results do not support the popular tenets that verb processing is predominantly based in the left frontal cortex and noun processing relies specifically on temporal regions; nor do they support the idea that verb lexical-semantic representations are heavily based on embodied motoric information. Our findings suggest instead that the cerebral circuits deputed to noun and verb processing lie in close spatial proximity in a wide network including frontal, parietal, and temporal regions. The data also indicate a predominant—but not exclusive—left lateralization of the network. PMID:23825451

  18. Crowd Analysis by Using Optical Flow and Density Based Clustering

    DEFF Research Database (Denmark)

    Santoro, Francesco; Pedro, Sergio; Tan, Zheng-Hua;

    2010-01-01

    In this paper, we present a system to detect and track crowds in a video sequence captured by a camera. In a first step, we compute optical flows by means of pyramidal Lucas-Kanade feature tracking. Afterwards, density-based clustering is used to group similar vectors. In the last step, a crowd tracker is applied in every frame, allowing us to detect and track the crowds. Our system gives the output as a graphic overlay, i.e., it adds arrows and colors to the original frame sequence in order to identify crowds and their movements. For the evaluation, we check when our system detects certain events on the crowds, such as merging, splitting and collision.

  19. Displacement of Building Cluster Using Field Analysis Method

    Institute of Scientific and Technical Information of China (English)

    Al Tinghua

    2003-01-01

    This paper presents a field-based method for the displacement of building clusters driven by street widening. The compression of the street boundary produces a force that pushes the buildings inward, and the propagation of this force is a decay process. To describe this phenomenon, field theory is introduced with an isoline representation model. On the basis of the skeleton of the Delaunay triangulation, a displacement field is built in which the propagation force is related to the degree of adjacency to the street boundary. The study offers the computation of displacement direction and offset distance for the building displacement. The vector operation is performed on the basis of grade and other field concepts.

  20. Mapping informative clusters in a hierarchical [corrected] framework of FMRI multivariate analysis.

    Directory of Open Access Journals (Sweden)

    Rui Xu

    Full Text Available Pattern recognition methods have become increasingly popular in fMRI data analysis, which are powerful in discriminating between multi-voxel patterns of brain activities associated with different mental states. However, when they are used in functional brain mapping, the location of discriminative voxels varies significantly, raising difficulties in interpreting the locus of the effect. Here we proposed a hierarchical framework of multivariate approach that maps informative clusters rather than voxels to achieve reliable functional brain mapping without compromising the discriminative power. In particular, we first searched for local homogeneous clusters that consisted of voxels with similar response profiles. Then, a multi-voxel classifier was built for each cluster to extract discriminative information from the multi-voxel patterns. Finally, through multivariate ranking, outputs from the classifiers were served as a multi-cluster pattern to identify informative clusters by examining interactions among clusters. Results from both simulated and real fMRI data demonstrated that this hierarchical approach showed better performance in the robustness of functional brain mapping than traditional voxel-based multivariate methods. In addition, the mapped clusters were highly overlapped for two perceptually equivalent object categories, further confirming the validity of our approach. In short, the hierarchical framework of multivariate approach is suitable for both pattern classification and brain mapping in fMRI studies.

  1. Cluster analysis for the probability of DSB site induced by electron tracks

    Science.gov (United States)

    Yoshii, Y.; Sasaki, K.; Matsuya, Y.; Date, H.

    2015-05-01

    To clarify the influence of bio-cells exposed to ionizing radiations, the densely populated pattern of the ionization in the cell nucleus is of importance because it governs the extent of DNA damage which may lead to cell lethality. In this study, we have conducted a cluster analysis of ionization and excitation events to estimate the number of double-strand breaks (DSBs) induced by electron tracks. A Monte Carlo simulation for electrons in liquid water was performed to determine the spatial location of the ionization and excitation events. The events were divided into clusters by using the density-based spatial clustering of applications with noise (DBSCAN) algorithm. The algorithm enables us to sort out the events into the groups (clusters) in which a minimum number of neighboring events are contained within a given radius. For evaluating the number of DSBs in the extracted clusters, we have introduced an aggregation index (AI). The computational results show that a sub-keV electron produces DSBs in a dense formation more effectively than higher energy electrons. The root-mean square radius (RMSR) of the cluster size is below 5 nm, which is smaller than the chromatin fiber thickness. It was found that this size of clustering events has a high possibility to cause lesions in DNA within the chromatin fiber site.
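
    The DBSCAN grouping described above can be sketched in a few lines. The following is a minimal, illustrative pure-Python version, not the authors' code: the event coordinates (in nm), the `eps` radius and the `min_pts` threshold are hypothetical values chosen only to show how events are sorted into clusters and noise, and `rms_radius` is an assumed helper for the root-mean-square radius of one cluster.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (i included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        labels[i] = cluster_id         # i is a core point: start a new cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core: keep expanding
        cluster_id += 1
    return labels

def rms_radius(cluster_points):
    """Root-mean-square distance of the points from their centroid."""
    n = len(cluster_points)
    centroid = [sum(c) / n for c in zip(*cluster_points)]
    return math.sqrt(sum(math.dist(p, centroid) ** 2 for p in cluster_points) / n)

# Hypothetical ionization/excitation event positions in nm.
events = [
    (0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0),   # dense group
    (20, 20, 20), (21, 20, 20), (20, 21, 20),     # second dense group
    (50, 50, 50),                                 # isolated event
]
labels = dbscan(events, eps=1.5, min_pts=3)
print(labels)
print(rms_radius(events[:4]))
```

    With these toy inputs the first four events form one cluster, the next three a second, and the isolated event is labelled noise. The aggregation index used in the study is not reproduced here.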

  2. The distinction of 'psychosomatogenic family types' based on parents' self reported questionnaire information: a cluster analysis.

    Science.gov (United States)

    Rousseau, Sofie; Grietens, Hans; Vanderfaeillie, Johan; Ceulemans, Eva; Hoppenbrouwers, Karel; Desoete, Annemie; Van Leeuwen, Karla

    2014-06-01

    The theory of 'psychosomatogenic family types' is often used in treatment of somatizing adolescents. This study investigated the validity of distinguishing 'psychosomatogenic family types' based on parents' self-reported family features. The study included a Flemish general population sample of 12-year olds (n = 1428). We performed cluster analysis on 3 variables concerning parents' self-reported problems in family functioning. The distinguished clusters were examined for differences in marital problems, parental emotional problems, professional help for family members, demographics, and adolescents' somatization. Results showed the existence of 5 family types: 'chaotic family functioning,' 'average amount of family functioning problems,' 'few family functioning problems,' 'high amount of support and communication problems,' and 'high amount of sense of security problems' clusters. Membership of the 'chaotic family functioning' and 'average amount of family functioning problems' cluster was significantly associated with higher levels of somatization, compared with 'few family functioning problems' cluster membership. Among additional variables, only marital and parental emotional problems distinguished somatization relevant from non relevant clusters: parents in 'average amount of family functioning problems' and 'chaotic family functioning' clusters reported higher problems. The data showed that 'apparently perfect' or 'enmeshed' patterns of family functioning may not be assessed by means of parent report as adopted in this study. In addition, not only adolescents from 'extreme' types of family functioning may suffer from somatization. Further, professionals should be careful assuming that families in which parents report average to high amounts of family functioning problems also show different demographic characteristics.

  3. Profiling nurses' job satisfaction, acculturation, work environment, stress, cultural values and coping abilities: A cluster analysis.

    Science.gov (United States)

    Goh, Yong-Shian; Lee, Alice; Chan, Sally Wai-Chi; Chan, Moon Fai

    2015-08-01

    This study aimed to determine whether definable profiles existed in a cohort of nursing staff with regard to demographic characteristics, job satisfaction, acculturation, work environment, stress, cultural values and coping abilities. A survey was conducted in one hospital in Singapore from June to July 2012, and 814 full-time staff nurses completed a self-report questionnaire (89% response rate). Demographic characteristics, job satisfaction, acculturation, work environment, perceived stress, cultural values, ways of coping and intention to leave current workplace were assessed as outcomes. The two-step cluster analysis revealed three clusters. Nurses in cluster 1 (n = 222) had lower acculturation scores than nurses in cluster 3. Cluster 2 (n = 362) was a group of younger nurses who reported higher intention to leave (22.4%), stress level and job dissatisfaction than the other two clusters. Nurses in cluster 3 (n = 230) were mostly Singaporean and reported the lowest intention to leave (13.0%). Resources should be allocated to specifically address the needs of younger nurses and hopefully retain them in the profession. Management should focus their retention strategies on junior nurses and provide a work environment that helps to strengthen their intention to remain in nursing by increasing their job satisfaction.

  4. Fatigue Feature Extraction Analysis based on a K-Means Clustering Approach

    Directory of Open Access Journals (Sweden)

    M.F.M. Yunoh

    2015-06-01

    Full Text Available This paper focuses on clustering analysis using a K-means approach for fatigue feature dataset extraction. The aim of this study is to group the dataset as closely as possible (homogeneity) for the scattered dataset. Kurtosis, the wavelet-based energy coefficient and fatigue damage are calculated for all segments after the extraction process using wavelet transform, and these three quantities are used as input data for the K-means clustering approach. K-means clustering calculates the average distance of each group from the centroid and gives the objective function values. Based on the results, the maximum value of the objective function, 11.58, is seen for two centroid clusters. The minimum objective function value, 8.06, is found for five centroid clusters. The objective function is thus lowest when the number of clusters is five, which is therefore taken as the best clustering for the dataset.
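
    The comparison of objective function values across cluster counts can be illustrated with a small sketch. This is a plain Lloyd's-algorithm K-means in pure Python under stated assumptions: the objective is taken to be the within-cluster sum of squared distances (the abstract does not give the exact formula), and the three-feature `segments` data are invented placeholders for (kurtosis, energy coefficient, damage) triples, not the study's data.

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels, objective)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # initial centroids drawn from the data
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:        # assignments stable: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    # Objective (assumed form): within-cluster sum of squared distances.
    objective = sum(math.dist(p, centroids[lab]) ** 2
                    for p, lab in zip(points, labels))
    return centroids, labels, objective

# Hypothetical feature vectors, one per signal segment.
segments = [
    (0, 0, 0), (0, 1, 0), (1, 0, 0),
    (10, 10, 10), (10, 11, 10), (11, 10, 10),
]
for k in (1, 2, 3):
    _, _, objective = kmeans(segments, k)
    print(k, round(objective, 3))
```

    Because this kind of objective generally shrinks as the number of clusters grows, it is commonly read together with an elbow plot or a similar criterion rather than on its own.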

  5. The Structure and Dynamics of Co-Citation Clusters: A Multiple-Perspective Co-Citation Analysis

    CERN Document Server

    Chen, Chaomei; Hou, Jianhua

    2010-01-01

    A multiple-perspective co-citation analysis method is introduced for characterizing and interpreting the structure and dynamics of co-citation clusters. The method facilitates analytic and sense making tasks by integrating network visualization, spectral clustering, automatic cluster labeling, and text summarization. Co-citation networks are decomposed into co-citation clusters. The interpretation of these clusters is augmented by automatic cluster labeling and summarization. The method focuses on the interrelations between a co-citation cluster's members and their citers. The generic method is applied to a three-part analysis of the field of Information Science as defined by 12 journals published between 1996 and 2008: 1) a comparative author co-citation analysis (ACA), 2) a progressive ACA of a time series of co-citation networks, and 3) a progressive document co-citation analysis (DCA). Results show that the multiple-perspective method increases the interpretability and accountability of both ACA and DCA n...

  6. Standardized Effect Size Measures for Mediation Analysis in Cluster-Randomized Trials

    Science.gov (United States)

    Stapleton, Laura M.; Pituch, Keenan A.; Dion, Eric

    2015-01-01

    This article presents 3 standardized effect size measures to use when sharing results of an analysis of mediation of treatment effects for cluster-randomized trials. The authors discuss 3 examples of mediation analysis (upper-level mediation, cross-level mediation, and cross-level mediation with a contextual effect) with demonstration of the…

  7. Screening for personality disorders: A new questionnaire and its validation using Latent Class Analysis

    Directory of Open Access Journals (Sweden)

    Julia Lange

    2012-12-01

    Full Text Available Background: We evaluated a new screening instrument for personality disorders. The Personality Disorder Screening (PDS) is a self-administered screening questionnaire that includes 12 items from the Personality Self Portrait (Oldham & Morris, 1990). Sampling and methods: The data of n = 966 participants recruited from the non-clinical population and from different clinical settings were analyzed using latent class analysis. Results: A 4-class model fitted the data best. It confirmed a classification model for personality disorders proposed by Gunderson (1984) and showed high reliability and validity. One class corresponded to “healthy” individuals (40.6 %), and one class to individuals with personality disorders (17.2 %). Two additional classes represented individuals with specific personality styles. Evidence for convergent validity was found in terms of strong associations of the classification with the Structured Clinical Interview (SCID-II) for diagnosing personality disorders. The latent classes also showed theoretically expected associations with membership in different subsamples. Conclusions: The PDS shows promise as a new instrument for identifying different classes of personality disorder severity already at the screening stage of the diagnostic process.

  8. Identifying patterns in treatment response profiles in acute bipolar mania: a cluster analysis approach

    Directory of Open Access Journals (Sweden)

    Houston John P

    2008-07-01

    Full Text Available Background: Patients with acute mania respond differentially to treatment and, in many cases, fail to obtain or sustain symptom remission. The objective of this exploratory analysis was to characterize response in bipolar disorder by identifying groups of patients with similar manic symptom response profiles. Methods: Patients (n = 222) were selected from a randomized, double-blind study of treatment with olanzapine or divalproex in bipolar I disorder, manic or mixed episode, with or without psychotic features. Hierarchical clustering based on Ward's distance was used to identify groups of patients based on Young-Mania Rating Scale (YMRS) total scores at each of 5 assessments over 7 weeks. Logistic regression was used to identify baseline predictors for clusters of interest. Results: Four distinct clusters of patients were identified: Cluster 1 (n = 64): patients did not maintain a response (YMRS total scores ≤ 12); Cluster 2 (n = 92): patients responded rapidly (within less than a week) and response was maintained; Cluster 3 (n = 36): patients responded rapidly but relapsed soon afterwards (YMRS ≥ 15); Cluster 4 (n = 30): patients responded slowly (≥ 2 weeks) and response was maintained. Predictive models using baseline variables found YMRS Item 10 (Appearance) and psychosis to be significant predictors for Clusters 1 and 4 vs. Clusters 2 and 3, but none of the baseline characteristics allowed discriminating between Clusters 1 vs. 4. Experiencing a mixed episode at baseline predicted membership in Clusters 2 and 3 vs. Clusters 1 and 4. Treatment with divalproex, larger number of previous manic episodes, lack of disruptive-aggressive behavior, and more prominent depressive symptoms at baseline were predictors for Cluster 3 vs. 2. Conclusion: Distinct treatment response profiles can be predicted by clinical features at baseline. The presence of these features as potential risk factors for relapse in patients who have responded to treatment

  9. Comparison of population-averaged and cluster-specific models for the analysis of cluster randomized trials with missing binary outcomes: a simulation study

    Directory of Open Access Journals (Sweden)

    Ma Jinhui

    2013-01-01

    Full Text Available Background: The objective of this simulation study is to compare the accuracy and efficiency of population-averaged (i.e. generalized estimating equations (GEE)) and cluster-specific (i.e. random-effects logistic regression (RELR)) models for analyzing data from cluster randomized trials (CRTs) with missing binary responses. Methods: In this simulation study, clustered responses were generated from a beta-binomial distribution. The number of clusters per trial arm, the number of subjects per cluster, intra-cluster correlation coefficient, and the percentage of missing data were allowed to vary. Under the assumption of covariate dependent missingness, missing outcomes were handled by complete case analysis, standard multiple imputation (MI) and within-cluster MI strategies. Data were analyzed using GEE and RELR. Performance of the methods was assessed using standardized bias, empirical standard error, root mean squared error (RMSE), and coverage probability. Results: GEE performs well on all four measures, provided the downward bias of the standard error (when the number of clusters per arm is small) is adjusted appropriately, under the following scenarios: complete case analysis for CRTs with a small amount of missing data; standard MI for CRTs with variance inflation factor (VIF 50. RELR performs well only when a small amount of data was missing, and complete case analysis was applied. Conclusion: GEE performs well as long as appropriate missing data strategies are adopted based on the design of CRTs and the percentage of missing data. In contrast, RELR does not perform well when either standard or within-cluster MI strategy is applied prior to the analysis.
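
    The data-generating step mentioned above, clustered binary responses drawn from a beta-binomial distribution, can be sketched as follows. This is an illustrative reconstruction under stated assumptions rather than the authors' simulation code: it uses the standard parameterization in which a Beta(a, b) cluster-level probability gives intra-cluster correlation 1/(a + b + 1), the usual variance inflation factor 1 + (m - 1) × ICC for cluster size m, and hypothetical values for prevalence, ICC, cluster count and cluster size. Missingness and imputation are not modelled.

```python
import random

def beta_params(prevalence, icc):
    """Beta(a, b) parameters with mean `prevalence` and ICC = 1 / (a + b + 1)."""
    scale = (1.0 - icc) / icc          # equals a + b
    return prevalence * scale, (1.0 - prevalence) * scale

def simulate_arm(n_clusters, cluster_size, prevalence, icc, rng):
    """Binary outcomes for one trial arm, as a list of per-cluster lists."""
    a, b = beta_params(prevalence, icc)
    arm = []
    for _ in range(n_clusters):
        p = rng.betavariate(a, b)      # cluster-specific success probability
        arm.append([1 if rng.random() < p else 0 for _ in range(cluster_size)])
    return arm

def variance_inflation_factor(cluster_size, icc):
    """Design effect for equal-sized clusters."""
    return 1.0 + (cluster_size - 1) * icc

rng = random.Random(42)                # fixed seed for reproducibility
arm = simulate_arm(n_clusters=10, cluster_size=50,
                   prevalence=0.3, icc=0.05, rng=rng)
print([sum(cluster) for cluster in arm])       # events per cluster
print(variance_inflation_factor(50, 0.05))
```

    Varying `n_clusters`, `cluster_size` and `icc` in such a generator mirrors the design factors that the study allowed to vary.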

  10. Identifying differences in the experience of (in)authenticity: a latent class analysis approach.

    Science.gov (United States)

    Lenton, Alison P; Slabu, Letitia; Bruder, Martin; Sedikides, Constantine

    2014-01-01

    Generally, psychologists consider state authenticity - that is, the subjective sense of being one's true self - to be a unitary and unidimensional construct, such that (a) the phenomenological experience of authenticity is thought to be similar no matter its trigger, and (b) inauthenticity is thought to be simply the opposing pole (on the same underlying construct) of authenticity. Using latent class analysis, we put this conceptualization to a test. In order to avoid over-reliance on a Western conceptualization of authenticity, we used a cross-cultural sample (N = 543), comprising participants from Western, South-Asian, East-Asian, and South-East Asian cultures. Participants provided either a narrative in which they described when they felt most like being themselves or one in which they described when they felt least like being themselves. The analysis identified six distinct classes of experiences: two authenticity classes ("everyday" and "extraordinary"), three inauthenticity classes ("self-conscious," "deflated," and "extraordinary"), and a class representing convergence between authenticity and inauthenticity. The classes were phenomenologically distinct, especially with respect to negative affect, private and public self-consciousness, and self-esteem. Furthermore, relatively more interdependent cultures were less likely to report experiences of extraordinary (in)authenticity than relatively more independent cultures. Understanding the many facets of (in)authenticity may enable researchers to connect different findings and explain why the attainment of authenticity can be difficult.

  11. Globular Cluster Abundances from High-Resolution, Integrated-Light Spectroscopy. II. Expanding the Metallicity Range for Old Clusters and Updated Analysis Techniques

    CERN Document Server

    Colucci, J E; McWilliam, A

    2016-01-01

    We present abundances of globular clusters in the Milky Way and Fornax from integrated light spectra. Our goal is to evaluate the consistency of the integrated light analysis relative to standard abundance analysis for individual stars in those same clusters. This sample includes an updated analysis of 7 clusters from our previous publications and results for 5 new clusters that expand the metallicity range over which our technique has been tested. We find that the [Fe/H] measured from integrated light spectra agrees to $\sim$0.1 dex for globular clusters with metallicities as high as [Fe/H]=$-0.3$, but the abundances measured for more metal rich clusters may be underestimated. In addition we systematically evaluate the accuracy of abundance ratios, [X/Fe], for Na I, Mg I, Al I, Si I, Ca I, Ti I, Ti II, Sc II, V I, Cr I, Mn I, Co I, Ni I, Cu I, Y II, Zr I, Ba II, La II, Nd II, and Eu II. The elements for which the integrated light analysis gives results that are most similar to analysis of individual stellar ...

  12. Fault detection of flywheel system based on clustering and principal component analysis

    Directory of Open Access Journals (Sweden)

    Wang Rixin

    2015-12-01

    Full Text Available Considering the nonlinear, multifunctional properties of double-flywheel with closed-loop control, a two-step method including clustering and principal component analysis is proposed to detect the two faults in the multifunctional flywheels. At the first step of the proposed algorithm, clustering is taken as feature recognition to check the instructions of “integrated power and attitude control” system, such as attitude control, energy storage or energy discharge. These commands will ask the flywheel system to work in different operation modes. Therefore, the relationship of parameters in different operations can define the cluster structure of training data. Ordering points to identify the clustering structure (OPTICS can automatically identify these clusters by the reachability-plot. K-means algorithm can divide the training data into the corresponding operations according to the reachability-plot. Finally, the last step of proposed model is used to define the relationship of parameters in each operation through the principal component analysis (PCA method. Compared with the PCA model, the proposed approach is capable of identifying the new clusters and learning the new behavior of incoming data. The simulation results show that it can effectively detect the faults in the multifunctional flywheels system.

  13. Fault detection of flywheel system based on clustering and principal component analysis

    Institute of Scientific and Technical Information of China (English)

    Wang Rixin; Gong Xuebing; Xu Minqiang; Li Yuqing

    2015-01-01

    Considering the nonlinear, multifunctional properties of double-flywheel with closed-loop control, a two-step method including clustering and principal component analysis is proposed to detect the two faults in the multifunctional flywheels. At the first step of the proposed algorithm, clustering is taken as feature recognition to check the instructions of the “integrated power and attitude control” system, such as attitude control, energy storage or energy discharge. These commands will ask the flywheel system to work in different operation modes. Therefore, the relationship of parameters in different operations can define the cluster structure of training data. Ordering points to identify the clustering structure (OPTICS) can automatically identify these clusters by the reachability-plot. K-means algorithm can divide the training data into the corresponding operations according to the reachability-plot. Finally, the last step of proposed model is used to define the relationship of parameters in each operation through the principal component analysis (PCA) method. Compared with the PCA model, the proposed approach is capable of identifying the new clusters and learning the new behavior of incoming data. The simulation results show that it can effectively detect the faults in the multifunctional flywheels system.

  14. Feature-space clustering for fMRI meta-analysis

    DEFF Research Database (Denmark)

    Goutte, C.; Hansen, L.K.; Liptrot, Matthew George

    2001-01-01

    Clustering functional magnetic resonance imaging (fMRI) time series has emerged in recent years as a possible alternative to parametric modeling approaches. Most of the work so far has been concerned with clustering raw time series. In this contribution we investigate the applicability of a clustering method applied to features extracted from the data. This approach is extremely versatile and encompasses previously published results [Goutte et al., 1999] as special cases. A typical application is in data reduction: as the increase in temporal resolution of fMRI experiments routinely yields f......-voxel analyses. In particular this allows the checking of the differences and agreements between different methods of analysis. Both approaches are illustrated on a fMRI data set involving visual stimulation, and we show that the feature space clustering approach yields nontrivial results and, in particular...

  15. Design and Analysis of SD_DWCA - A Mobility Based Clustering of Homogeneous MANETs

    Directory of Open Access Journals (Sweden)

    T.N. Janakiraman

    2011-05-01

    Full Text Available This paper deals with the design and analysis of the distributed weighted clustering algorithm SD_DWCA proposed for homogeneous mobile ad hoc networks. It is a connectivity, mobility and energy based clustering algorithm which is suitable for scalable ad hoc networks. The algorithm uses a new graph parameter called strong degree defined based on the quality of neighbours of a node. The parameters are so chosen to ensure high connectivity, cluster stability and energy efficient communication among nodes of high dynamic nature. This paper also includes the experimental results of the algorithm implemented using the network simulator NS2. The experimental results show that the algorithm is suitable for high speed networks and generates stable clusters with less maintenance overhead.

  16. Identification and structural analysis of a novel snoRNA gene cluster from Arabidopsis thaliana

    Institute of Scientific and Technical Information of China (English)

    周惠; 孟清; 屈良鹄

    2000-01-01

    A 22 snoRNA gene cluster, consisting of four antisense snoRNA genes, was identified from Arabidopsis thaliana. The sequence and structural analysis showed that the 22 snoRNA gene cluster might be transcribed as a polycistronic precursor from an upstream promoter, and the intergenic spacers of the gene cluster encode 'hairpin' structures similar to the processing recognition signals of the yeast Saccharomyces cerevisiae polycistronic snoRNA precursor. The results also revealed that multiple gene copies are a common characteristic of plant snoRNA genes, and that the cluster provides a good system for further revealing the transcription and expression mechanism of plant snoRNA gene clusters.

  17. Design and Analysis of SD_DWCA - A Mobility based clustering of Homogeneous MANETs

    CERN Document Server

    Janakiraman, T N

    2011-01-01

    This paper deals with the design and analysis of the distributed weighted clustering algorithm SD_DWCA proposed for homogeneous mobile ad hoc networks. It is a connectivity, mobility and energy based clustering algorithm which is suitable for scalable ad hoc networks. The algorithm uses a new graph parameter called strong degree defined based on the quality of neighbours of a node. The parameters are so chosen to ensure high connectivity, cluster stability and energy efficient communication among nodes of high dynamic nature. This paper also includes the experimental results of the algorithm implemented using the network simulator NS2. The experimental results show that the algorithm is suitable for high speed networks and generates stable clusters with less maintenance overhead.

  18. PERFORMANCE EVALUATION OF CLUSTERING IN WEB-LOG ANALYSIS BASED ON AGENT

    Directory of Open Access Journals (Sweden)

    Himani

    2012-06-01

    Full Text Available Web mining is the use of data mining techniques to automatically discover and extract information from web documents. When a user searches for goods, the management agent receives the order from the graphical user interface. The management agent receives information, updates the agent information storehouse and feeds the mining result back to the user. Intelligent agents can help make computer systems easier to use and enable finding and filtering information. The mining agent is the analytical center of the whole agent system. It mainly adopts two kinds of analytical method: related rule mining and cluster analysis. Clusters of objects are formed so that objects within a cluster have high similarity. The aim of this paper is to analyze web log data. To achieve this, a clustering tool is used. It performs in two phases: first it captures the web-log data, then it analyzes the data and discovers the hidden patterns. An agent requires an agent communication language to describe and process agent requests. The future internet will use PERL to encode information with meaningful structure and semantics.

  19. Mismatch negativity/P3a complex in young people with psychiatric disorders: a cluster analysis.

    Directory of Open Access Journals (Sweden)

    Manreena Kaur

    Full Text Available BACKGROUND: We have recently shown that the event-related potential biomarkers, mismatch negativity (MMN) and P3a, are similarly impaired in young patients with schizophrenia- and affective-spectrum psychoses as well as those with bipolar disorder. A data driven approach may help to further elucidate novel patterns of MMN/P3a amplitudes that characterise distinct subgroups in patients with emerging psychiatric disorders. METHODS: Eighty-seven outpatients (16 to 30 years) were assessed: 19 diagnosed with a depressive disorder; 26 with a bipolar disorder; and 42 with a psychotic disorder. The MMN/P3a complex was elicited using a two-tone passive auditory oddball paradigm with duration deviant tones. Hierarchical cluster analysis utilising frontal, central and temporal neurophysiological variables was conducted. RESULTS: Three clusters were determined: the 'globally impaired' cluster (n = 53) displayed reduced frontal and temporal MMN as well as reduced central P3a amplitudes; the 'largest frontal MMN' cluster (n = 17) was distinguished by increased frontal MMN amplitudes and the 'largest temporal MMN' cluster (n = 17) was characterised by increases in temporal MMN only. Notably, 55% of those in the globally impaired cluster were diagnosed with schizophrenia-spectrum disorder, whereas the three patient subgroups were equally represented in the remaining two clusters. The three cluster-groups did not differ in their current symptomatology; however, the globally impaired cluster was the most neuropsychologically impaired, compared with controls. CONCLUSIONS: These findings suggest that in emerging psychiatric disorders there are distinct MMN/P3a profiles of patient subgroups independent of current symptomatology. Schizophrenia-spectrum patients tended to show the most global impairments in this neurophysiological complex. Two other subgroups of patients were found to have neurophysiological profiles suggestive of quite different neurobiological (and

  20. Preliminary Cluster Analysis For Several Representatives Of Genus Kerivoula (Chiroptera: Vespertilionidae) in Borneo

    Science.gov (United States)

    Hasan, Noor Haliza; Abdullah, M. T.

    2008-01-01

    The aim of the study is to use cluster analysis on morphometric parameters within the genus Kerivoula to produce a dendrogram and to determine the suitability of this method to describe the relationship among species within this genus. A total of 15 adult male individuals from genus Kerivoula taken from sampling trips around Borneo and specimens kept at the zoological museum of Universiti Malaysia Sarawak were examined. A total of 27 characters using dental, skull and external body measurements were recorded. Clustering analysis illustrated the grouping and morphometric relationships between the species of this genus. It has clearly separated each species from each other despite the overlapping of measurements of some species within the genus. Cluster analysis provides an alternative approach to make a preliminary identification of a species.

  1. Application of cluster analysis to preventive maintenance scheme design of pavement

    Institute of Scientific and Technical Information of China (English)

    ZENG Feng; ZHANG Xiao-ning

    2009-01-01

    To quantitatively identify the maintenance demand for each highway segment in pavement maintenance scheme design, a mathematical model of uniform segment division was established, and an approach applying cluster analysis theory to uniform segment division and the evaluation of pavement maintenance demand was proposed. An actual maintenance project on a highway in Guangdong province was cited as an example to demonstrate the validity of the proposed method. It is shown that cluster analysis can eliminate human factors in classification without being constrained by the number of samples, while considering multiple pavement distress indexes and the continuity of samples. Thus cluster analysis is an efficient analytical tool for uniform segment division and the evaluation of maintenance demand.

  2. Parallelization and scheduling of data intensive particle physics analysis jobs on clusters of PCs

    CERN Document Server

    Ponce, S

    2004-01-01

    Summary form only given. Scheduling policies are proposed for parallelizing data intensive particle physics analysis applications on computer clusters. Particle physics analysis jobs require the analysis of tens of thousands of particle collision events, each event typically requiring 200 ms of processing time and 600 KB of data. Many jobs are launched concurrently by a large number of physicists. At first glance, particle physics jobs seem easy to parallelize, since particle collision events can be processed independently of one another. However, since large amounts of data need to be accessed, the real challenge lies in making efficient use of the underlying computing resources. We propose several job parallelization and scheduling policies aiming at reducing job processing times and at increasing the sustainable load of a cluster server. Since particle collision events are usually reused by several jobs, cache based job splitting strategies considerably increase cluster utilization and reduce job ...
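
    The per-event figures quoted in this summary allow a rough back-of-the-envelope estimate of job cost. The sketch below is illustrative only: the event count of 20,000 is an assumption (the summary says only "tens of thousands"), the split across workers is assumed to be ideal with no scheduling or data-transfer overhead, and the function and parameter names are hypothetical rather than taken from the paper.

```python
def job_cost(events, ms_per_event=200, kb_per_event=600, workers=1):
    """Rough cost of one analysis job from the per-event figures above.

    Returns (cpu_seconds, data_gb, ideal_wall_seconds). The wall-clock value
    assumes a perfect split across workers, which real clusters will not reach.
    """
    cpu_seconds = events * ms_per_event / 1000      # total processing time
    data_gb = events * kb_per_event / 1_000_000     # decimal gigabytes read
    ideal_wall_seconds = cpu_seconds / workers      # no overhead assumed
    return cpu_seconds, data_gb, ideal_wall_seconds


# Hypothetical job: 20,000 events spread over 10 workers.
cpu_seconds, data_gb, wall_seconds = job_cost(20_000, workers=10)
print(cpu_seconds, data_gb, wall_seconds)
```

    Under these assumptions a single job needs about 4,000 s of processing and about 12 GB of input, which is consistent with the summary's point that data access, not computation alone, is the hard part.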

  3. Use of multiple cluster analysis methods to explore the validity of a community outcomes concept map.

    Science.gov (United States)

    Orsi, Rebecca

    2017-02-01

    Concept mapping is now a commonly-used technique for articulating and evaluating programmatic outcomes. However, research regarding validity of knowledge and outcomes produced with concept mapping is sparse. The current study describes quantitative validity analyses using a concept mapping dataset. We sought to increase the validity of concept mapping evaluation results by running multiple cluster analysis methods and then using several metrics to choose from among solutions. We present four different clustering methods based on analyses using the R statistical software package: partitioning around medoids (PAM), fuzzy analysis (FANNY), agglomerative nesting (AGNES) and divisive analysis (DIANA). We then used the Dunn and Davies-Bouldin indices to assist in choosing a valid cluster solution for a concept mapping outcomes evaluation. We conclude that the validity of the outcomes map is high, based on the analyses described. Finally, we discuss areas for further concept mapping methods research.
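
    The Dunn index mentioned above can be illustrated with a short sketch. The study itself used R; Python is used here purely for illustration, the toy points are invented, and the definition assumed is the common one (smallest distance between points of different clusters divided by the largest within-cluster diameter). Higher values indicate more compact, better separated clusters.

```python
from itertools import combinations
from math import dist


def dunn_index(clusters):
    """Smallest between-cluster distance divided by largest cluster diameter.

    `clusters` is a list of clusters, each a list of coordinate tuples.
    At least one cluster must hold two distinct points, otherwise the
    diameter is zero and the ratio is undefined.
    """
    # Largest distance between two points that share a cluster.
    max_diameter = max(
        (dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )
    # Smallest distance between two points in different clusters.
    min_separation = min(
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    return min_separation / max_diameter


# Two candidate partitions of the same four invented points.
tight = [[(0.0, 0.0), (0.0, 1.0)], [(5.0, 0.0), (5.0, 1.0)]]
loose = [[(0.0, 0.0), (5.0, 0.0)], [(0.0, 1.0), (5.0, 1.0)]]

print(dunn_index(tight))  # 5.0
print(dunn_index(loose))  # 0.2
```

    Comparing the index across candidate solutions, as in the toy case above, is one way such a metric can help choose among clusterings.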

  4. Vertical Migrating and Cluster Analysis of Soil Mesofauna at Dongying Halophytes Garden in Yellow River Delta

    Institute of Scientific and Technical Information of China (English)

    He Fu-xia; Xie Tong-yin; Xie Gui-lin; Fu Rong-shu

    2014-01-01

    For the first time, the Tullgren method was used to study the vertical migration and cluster analysis of the soil mesofauna in the Dongying Halophytes Garden in the Yellow River Delta (YRD), Shandong Province. The results showed that the soil mesofauna tended to gather at the soil surface in most samples at most times, but vertical migration varied greatly across seasons and environmental conditions. Acari was the dominant group. The diversity index of the soil fauna was correlated with the evenness index. The number of Acari individuals affected the number of other species and their abundance. As the dominant group, Acari made a greater contribution to the result of the cluster analysis, and cluster analysis with both the Bray-Curtis and Jaccard similarity coefficients showed significant differences between communities in different habitats.
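
    The two similarity coefficients named above can be sketched as follows. This is a minimal illustration under stated assumptions: Bray-Curtis is computed on abundance counts and Jaccard on presence/absence, and the two site vectors are invented, not data from the study.

```python
def bray_curtis_similarity(x, y):
    """1 - sum|x_i - y_i| / sum(x_i + y_i) for two abundance vectors."""
    total = sum(x) + sum(y)
    if total == 0:
        raise ValueError("both samples are empty")
    return 1.0 - sum(abs(a - b) for a, b in zip(x, y)) / total


def jaccard_similarity(x, y):
    """Shared taxa divided by taxa present in either sample."""
    present_x = {i for i, a in enumerate(x) if a > 0}
    present_y = {i for i, b in enumerate(y) if b > 0}
    union = present_x | present_y
    if not union:
        raise ValueError("both samples are empty")
    return len(present_x & present_y) / len(union)


# Invented counts of four taxa at two sites.
site_a = [12, 3, 0, 5]
site_b = [10, 0, 2, 5]

print(bray_curtis_similarity(site_a, site_b))
print(jaccard_similarity(site_a, site_b))  # 0.5
```

    Bray-Curtis is sensitive to how abundant each group is, so a dominant group such as Acari weighs heavily, whereas Jaccard only records which groups are present.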

  5. Fuzzy C-means clustering for chromatographic fingerprints analysis: A gas chromatography-mass spectrometry case study.

    Science.gov (United States)

    Parastar, Hadi; Bazrafshan, Alisina

    2016-03-18

    Fuzzy C-means clustering (FCM) is proposed as a promising method for the clustering of chromatographic fingerprints of complex samples, such as essential oils. As an example, secondary metabolites of 14 citrus leaf samples are extracted and analyzed by gas chromatography-mass spectrometry (GC-MS). The obtained chromatographic fingerprints are divided into the desired number of chromatographic regions. Because chromatographic problems such as elution time shift and peak overlap can significantly affect the clustering results, each chromatographic region is analyzed using multivariate curve resolution-alternating least squares (MCR-ALS) to address these problems. Then, the resolved elution profiles are used to build a new data matrix, based on peak areas of pure components, to be clustered by FCM. The FCM clustering parameters (i.e., fuzziness coefficient and number of clusters) are optimized by two different methods: partial least squares (PLS) as a conventional method, and minimization of the FCM objective function as our new idea. The results showed that minimization of the FCM objective function is an easier and better way to optimize FCM clustering parameters. Then, the optimized FCM clustering algorithm is used to cluster samples and variables to figure out the similarities and dissimilarities among samples and to find discriminant secondary metabolites in each cluster (chemotype). Finally, the FCM clustering results are compared with those of principal component analysis (PCA), hierarchical cluster analysis (HCA) and Kohonen maps. The results confirmed that FCM outperformed these frequently used clustering algorithms.
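
    For readers unfamiliar with FCM, the sketch below shows the standard alternating updates and the objective function on one-dimensional data. It is not the authors' pipeline: the values stand in for peak areas and are invented, the starting centers and the fuzziness coefficient m = 2 are assumptions, and the function names are hypothetical.

```python
def fcm_memberships(values, centers, m=2.0):
    """Membership of each value in each cluster; every row sums to 1 (m > 1)."""
    exponent = 2.0 / (m - 1.0)
    rows = []
    for x in values:
        distances = [abs(x - c) for c in centers]
        if 0.0 in distances:
            # A value sitting exactly on a center belongs fully to that center.
            hits = [1.0 if d == 0.0 else 0.0 for d in distances]
            rows.append([h / sum(hits) for h in hits])
        else:
            rows.append([
                1.0 / sum((d_j / d_k) ** exponent for d_k in distances)
                for d_j in distances
            ])
    return rows


def fcm_objective(values, centers, memberships, m=2.0):
    """Membership-weighted within-cluster sum of squares that FCM minimises."""
    return sum(
        u ** m * (x - c) ** 2
        for x, row in zip(values, memberships)
        for u, c in zip(row, centers)
    )


def fcm_1d(values, centers, m=2.0, iterations=50):
    """Alternate membership and center updates for a fixed number of passes."""
    centers = list(centers)
    for _ in range(iterations):
        memberships = fcm_memberships(values, centers, m)
        centers = [
            sum(row[j] ** m * x for row, x in zip(memberships, values))
            / sum(row[j] ** m for row in memberships)
            for j in range(len(centers))
        ]
    return centers, fcm_memberships(values, centers, m)


# Invented "peak areas" forming two obvious groups; starting centers assumed.
peak_areas = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2]
start = [0.0, 6.0]
final_centers, final_memberships = fcm_1d(peak_areas, start)
print(final_centers)
print(fcm_objective(peak_areas, final_centers, final_memberships))
```

    Unlike hard clustering, each sample keeps a graded membership in every cluster, and the objective value can be compared across settings of m and the number of clusters.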

  6. Clustered frequency analysis of shear Alfven modes in stellarators

    Energy Technology Data Exchange (ETDEWEB)

    Spong, Donald A [ORNL; D' Azevedo, Ed F [ORNL; Todo, Yasushi [National Institute for Fusion Science, Toki, Japan

    2010-01-01

    The shear Alfven spectrum in three-dimensional configurations, such as stellarators and rippled tokamaks, is more densely populated due to the larger number of mode couplings caused by the variation in the magnetic field in the toroidal dimension. This implies more significant computational requirements that can rapidly become prohibitive as more resolution is requested. Alfven eigenfrequencies and mode structures are a primary point of contact between theory and experiment. A new algorithm based on the Jacobi-Davidson method is developed here and applied for a reduced magnetohydrodynamics model to several stellarator configurations. This technique focuses on finding a subset of eigenmodes clustered about a specified input frequency. This approach can be especially useful in modeling experimental observations, where the mode frequency can generally be measured with good accuracy and several different simultaneous frequency lines may be of interest. For cases considered in this paper, it can be a factor of 10^2-10^3 times faster than more conventional methods.

  7. Analysis of cardiac tissue by gold cluster ion bombardment

    Science.gov (United States)

    Aranyosiova, M.; Chorvatova, A.; Chorvat, D.; Biro, Cs.; Velic, D.

    2006-07-01

    Specific molecules in cardiac tissue of spontaneously hypertensive rats are studied by using time-of-flight secondary ion mass spectrometry (TOF-SIMS). The investigation determines phospholipids, cholesterol, fatty acids and their fragments in the cardiac tissue, with special focus on cardiolipin. Cardiolipin is a unique phospholipid typical of the cardiomyocyte mitochondrial membrane, and its decrease is involved in pathologic conditions. In the positive polarity, the fragments of phosphatidylcholine are observed in the mass region of 700-850 u. Peaks over mass 1400 u correspond to intact and cationized molecules of cardiolipin. In animal tissue, cardiolipin contains almost exclusively 18-carbon fatty acids, mostly linoleic acid. Linoleic acid at 279 u, other fatty acids, and phosphatidylglycerol fragments, as precursors of cardiolipin synthesis, are identified in the negative polarity. These data demonstrate that the SIMS technique with an Au3+ cluster primary ion beam is a good tool for the detection of higher mass biomolecules, providing approximately 10 times higher yield in comparison with Au+.

  8. Characteristics of cyclist crashes in Italy using latent class analysis and association rule mining

    Science.gov (United States)

    De Angelis, Marco; Marín Puchades, Víctor; Fraboni, Federico; Pietrantoni, Luca

    2017-01-01

    The factors associated with severity of the bicycle crashes may differ across different bicycle crash patterns. Therefore, it is important to identify distinct bicycle crash patterns with homogeneous attributes. The current study aimed at identifying subgroups of bicycle crashes in Italy and analyzing separately the different bicycle crash types. The present study focused on bicycle crashes that occurred in Italy during the period between 2011 and 2013. We analyzed categorical indicators corresponding to the characteristics of infrastructure (road type, road signage, and location type), road user (i.e., opponent vehicle and cyclist’s maneuver, type of collision, age and gender of the cyclist), vehicle (type of opponent vehicle), and the environmental and time period variables (time of the day, day of the week, season, pavement condition, and weather). To identify homogenous subgroups of bicycle crashes, we used latent class analysis. Using latent class analysis, the bicycle crash data set was segmented into 19 classes, which represents 19 different bicycle crash types. Logistic regression analysis was used to identify the association between class membership and severity of the bicycle crashes. Finally, association rules were conducted for each of the latent classes to uncover the factors associated with an increased likelihood of severity. Association rules highlighted different crash characteristics associated with an increased likelihood of severity for each of the 19 bicycle crash types. PMID:28158296

  9. O 18 Brumário e a análise de classe contemporânea The Eighteenth Brumaire and the contemporary class analysis

    Directory of Open Access Journals (Sweden)

    Renato Monseff Perissinotto

    2007-01-01

    Full Text Available This article considers The Eighteenth Brumaire of Louis Bonaparte a kind of summary that condenses all the difficulties inherent in the class analysis of politics. The article is divided into five parts. The first analyses the passages of The Eighteenth Brumaire that state some fundamental propositions about the class analysis of politics; the second shows that contemporary Marxist literature has not solved the problems identified in relation to Marx's propositions; the third and fourth parts discuss some alternative perspectives (class-based and non-class-based) to Marxism; finally, by way of conclusion, the article offers some reflections on possible ways of operationalizing the class analysis of politics and on the problems to be faced in such cases.

  10. Remote sensing clustering analysis based on object-based interval modeling

    Science.gov (United States)

    He, Hui; Liang, Tianheng; Hu, Dan; Yu, Xianchuan

    2016-09-01

    In object-based clustering, image data are segmented into objects (groups of pixels) and then clustered based on the objects' features. This method can be used to automatically classify high-resolution remote sensing images, but requires accurate descriptions of object features. In this paper, we ascertain that an interval-valued data model is appropriate for describing clustering prototype features. With this in mind, we developed an object-based interval modeling method for high-resolution, multiband, remote sensing data. We also designed an adaptive interval-valued fuzzy clustering method. We ran experiments utilizing images from the SPOT-5 satellite sensor, for the Pearl River Delta region and Beijing. The results indicate that the proposed algorithm considers both the anisotropy of the remote sensing data and the ambiguity of objects. Additionally, we present a new dissimilarity measure for interval vectors, which better separates the interval vectors generated by features of the segmentation units (objects). This approach effectively limits classification errors caused by spectral mixing between classes. Compared with the object-based unsupervised classification method proposed earlier, the proposed algorithm improves the classification accuracy without increasing computational complexity.

  11. Optimization of constitutive parameters of foundation soils k-means clustering analysis

    Institute of Scientific and Technical Information of China (English)

    Muge Elif Orakoglu; Cevdet Emin Ekinci

    2013-01-01

    The goal of this study was to optimize the constitutive parameters of foundation soils using a k-means algorithm with clustering analysis. A database was collected from unconfined compression tests, Proctor tests and grain distribution tests of soils taken from three different types of foundation pits: raft foundations, partial raft foundations and strip foundations. The k-means algorithm with clustering analysis was applied to determine the most appropriate foundation type given the unconfined compression strengths and other parameters of the different soils.

  12. Clustering of frequency spectrums from different bearing fault using principle component analysis

    Directory of Open Access Journals (Sweden)

    Yusof M.F.M.

    2017-01-01

    Full Text Available In studies of defects in rolling element bearings, signal clustering is one of the popular approaches used to identify the type of defect. However, noise interference is one of the major issues affecting the effectiveness of the applied clustering method. In this paper, the application of principal component analysis (PCA) as a pre-processing method for hierarchical clustering analysis of the frequency spectrum of the vibration signal is proposed. To this end, vibration signals were acquired from operating bearings under different conditions and speeds. In the next stage, principal component analysis was applied to the frequency spectra of the acquired signals for pattern recognition. The Mahalanobis distance model was then used to cluster the results from PCA. The results show that the change in amplitude at the respective fundamental frequencies can be detected through the application of PCA, and that the Mahalanobis distance is suitable for clustering the results of principal component analysis. Notably, the spectra of healthy bearings and bearings with an inner race defect could be clearly distinguished from each other, even though the change in amplitude pattern for the inner race defect frequency spectrum was very small compared with the healthy one. This work demonstrates that principal component analysis can sensitively detect changes in the pattern of the frequency spectra. Likewise, the Mahalanobis distance model was found to be effective for clustering in bearing defect identification.

  13. Comprehensive behavioral analysis of cluster of differentiation 47 knockout mice.

    Directory of Open Access Journals (Sweden)

    Hisatsugu Koshimizu

    Full Text Available Cluster of differentiation 47 (CD47) is a member of the immunoglobulin superfamily which functions as a ligand for the extracellular region of signal regulatory protein α (SIRPα), a protein which is abundantly expressed in the brain. Previous studies, including ours, have demonstrated that both CD47 and SIRPα fulfill various functions in the central nervous system (CNS), such as the modulation of synaptic transmission and neuronal cell survival. We previously reported that CD47 is involved in the regulation of depression-like behavior of mice in the forced swim test through its modulation of tyrosine phosphorylation of SIRPα. However, other potential behavioral functions of CD47 remain largely unknown. In this study, in an effort to further investigate functional roles of CD47 in the CNS, CD47 knockout (KO) mice and their wild-type littermates were subjected to a battery of behavioral tests. CD47 KO mice displayed decreased prepulse inhibition, while the startle response did not differ between genotypes. The mutants exhibited slightly but significantly decreased sociability and social novelty preference in Crawley's three-chamber social approach test, whereas in social interaction tests in which experimental and stimulus mice have direct contact with each other in a freely moving setting in a novel environment or home cage, there were no significant differences between the genotypes. While previous studies suggested that CD47 regulates fear memory in the inhibitory avoidance test in rodents, our CD47 KO mice exhibited normal fear and spatial memory in the fear conditioning and the Barnes maze tests, respectively. These findings suggest that CD47 is potentially involved in the regulation of sensorimotor gating and social behavior in mice.

  14. Identifying news clusters using Q-analysis and modularity

    OpenAIRE

    2013-01-01

    With online publication and social media taking the main role in dissemination of news, and with the decline of traditional printed media, it has become necessary to devise ways to automatically extract meaningful information from the plethora of sources available and to make that information readily available to interested parties. In this paper we present a method of automated analysis of the underlying structure of online newspapers based on Q-analysis and modularity. We show how the combi...

  15. A cluster analysis to investigating nurses' knowledge, attitudes, and skills regarding the clinical management system.

    Science.gov (United States)

    Chan, M F

    2007-01-01

    Nurses' knowledge, attitudes, and skills regarding the Clinical Management System are explored by identifying profiles of nurses working in Hong Kong. A total of 282 nurses from four hospitals completed a self-reported questionnaire during the period from December 2004 to May 2005. Two-step cluster analysis yielded two clusters. The first cluster (n = 159, 56.4%) was labeled "negative attitudes, less skillful, and average knowledge" group. The second cluster (n = 123, 43.6%) was labeled "positive attitudes, good knowledge, but less skillful." There was a positive correlation in cluster 1 for nurses' knowledge and attitudes (rs = 0.28) and in cluster 2 for nurses' skills and attitudes (rs = 0.25) toward computerization. The study showed that senior and more highly educated nurses generally held more positive attitudes to computerization, whereas the attitudes among younger and less well educated nurses generally were more negative. Such findings should be used to formulate strategies to encourage nurses to resolve actual problems following computer training and to increase the depth and breadth of nurses' computer knowledge and skills and improve their attitudes toward computerization.

  16. ARABIC TEXT SUMMARIZATION BASED ON LATENT SEMANTIC ANALYSIS TO ENHANCE ARABIC DOCUMENTS CLUSTERING

    Directory of Open Access Journals (Sweden)

    Hanane Froud

    2013-01-01

    Full Text Available Arabic document clustering is an important task for obtaining good results with traditional Information Retrieval (IR) systems, especially given the rapid growth in the number of online documents in Arabic. Document clustering aims to automatically group similar documents into one cluster using different similarity/distance measures. This task is often affected by document length: useful information in the documents is often accompanied by a large amount of noise, so it is necessary to eliminate this noise while keeping the useful information in order to boost clustering performance. In this paper, we propose to evaluate the impact of text summarization using the Latent Semantic Analysis model on Arabic document clustering in order to solve the problems cited above, using five similarity/distance measures: Euclidean Distance, Cosine Similarity, Jaccard Coefficient, Pearson Correlation Coefficient and Averaged Kullback-Leibler Divergence, each evaluated twice: without and with stemming. Our experimental results indicate that the proposed approach effectively solves the problems of noisy information and document length, and thus significantly improves clustering performance.
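
    The five similarity/distance measures listed above can be sketched on two small term-weight vectors. Several points are assumptions rather than details from the paper: the Jaccard coefficient is shown in its extended form for real-valued vectors, the averaged Kullback-Leibler divergence is shown as a symmetric divergence of each normalised vector from their mean (the abstract does not give the exact definition used), and the two document vectors are invented.

```python
from math import log, sqrt


def euclidean_distance(p, q):
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def cosine_similarity(p, q):
    dot = sum(a * b for a, b in zip(p, q))
    return dot / (sqrt(sum(a * a for a in p)) * sqrt(sum(b * b for b in q)))


def jaccard_coefficient(p, q):
    # Extended Jaccard for real-valued vectors (an assumed variant).
    dot = sum(a * b for a, b in zip(p, q))
    return dot / (sum(a * a for a in p) + sum(b * b for b in q) - dot)


def pearson_correlation(p, q):
    mean_p, mean_q = sum(p) / len(p), sum(q) / len(q)
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    var_p = sum((a - mean_p) ** 2 for a in p)
    var_q = sum((b - mean_q) ** 2 for b in q)
    return cov / sqrt(var_p * var_q)


def averaged_kl_divergence(p, q):
    # Stand-in definition: average KL divergence of each normalised vector
    # from their mean distribution. The paper's exact formula may differ.
    total_p, total_q = sum(p), sum(q)
    p = [a / total_p for a in p]
    q = [b / total_q for b in q]
    mean = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * log(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, mean) + 0.5 * kl(q, mean)


# Invented term counts for two short documents over four terms.
doc_a = [2, 1, 0, 1]
doc_b = [1, 1, 1, 0]

for measure in (euclidean_distance, cosine_similarity, jaccard_coefficient,
                pearson_correlation, averaged_kl_divergence):
    print(measure.__name__, measure(doc_a, doc_b))
```

    Note that the first and last are distances (smaller means more similar) while the other three are similarities, so they must be converted to a common direction before being passed to a clustering algorithm.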

  17. Arabic Text Summarization Based on Latent Semantic Analysis to Enhance Arabic Documents Clustering

    Directory of Open Access Journals (Sweden)

    Hanane Froud

    2013-02-01

    Full Text Available Arabic document clustering is an important task for obtaining good results with traditional Information Retrieval (IR) systems, especially given the rapid growth in the number of online documents in Arabic. Document clustering aims to automatically group similar documents into one cluster using different similarity/distance measures. This task is often affected by document length: useful information in the documents is often accompanied by a large amount of noise, so it is necessary to eliminate this noise while keeping the useful information in order to boost clustering performance. In this paper, we propose to evaluate the impact of text summarization using the Latent Semantic Analysis model on Arabic document clustering in order to solve the problems cited above, using five similarity/distance measures: Euclidean Distance, Cosine Similarity, Jaccard Coefficient, Pearson Correlation Coefficient and Averaged Kullback-Leibler Divergence, each evaluated twice: without and with stemming. Our experimental results indicate that the proposed approach effectively solves the problems of noisy information and document length, and thus significantly improves clustering performance.

  18. Weighted Clustering

    DEFF Research Database (Denmark)

    Ackerman, Margareta; Ben-David, Shai; Branzei, Simina

    2012-01-01

    We investigate a natural generalization of the classical clustering problem, considering clustering tasks in which different instances may have different weights. We conduct the first extensive theoretical analysis on the influence of weighted data on standard clustering algorithms in both...... the partitional and hierarchical settings, characterizing the conditions under which algorithms react to weights. Extending a recent framework for clustering algorithm selection, we propose intuitive properties that would allow users to choose between clustering algorithms in the weighted setting and classify...... algorithms accordingly....

  19. Chronic medical conditions among jail detainees in residential psychiatric treatment: a latent class analysis.

    Science.gov (United States)

    Swartz, James A

    2011-08-01

    Studies of incarcerates with serious mental illnesses have found elevated rates of chronic medical conditions such as asthma and diabetes, and of infectious diseases such as tuberculosis compared with general population rates. This study explored the pattern of chronic medical conditions in a sample of adult detainees in psychiatric treatment in a large urban jail to develop a clinical profile encompassing the full range of medical conditions. A total of 431 male and female detainees were sampled with certainty from admissions to a residential psychiatric treatment program (overall recruitment rate = 67%). Interviews used the World Mental Health version of the Composite International Diagnostic Interview to assess psychiatric and substance use disorders per DSM-IV criteria and chronic medical conditions. Latent class analysis was conducted using 17 medical conditions as class indicators, yielding a 3-class model composed of: a latent class with a high to intermediate probability of multiple medical conditions (HMC; 12.5% of the sample); an intermediate class with a lower probability of having a smaller number of medical conditions (MMC; 43.2%); and a class with a low probability of any medical condition (44.3%). Those in the HMC class were more likely to report respiratory problems, severe headaches, musculoskeletal pain, hypertension, and arthritis, have greater functional impairment, and have a higher number of co-occurring psychiatric disorders. Being older (50+ years) and female were associated with higher odds of being in the HMC or MMC classes. The policy implications for providing medical care to incarcerates with complex mixtures of medical conditions and psychiatric disorders are considered.

  20. Latent Class Analysis of DSM-5 Alcohol Use Disorder Criteria Among Heavy-Drinking College Students.

    Science.gov (United States)

    Rinker, Dipali Venkataraman; Neighbors, Clayton

    2015-10-01

    The DSM-5 has created significant changes in the definition of alcohol use disorders (AUDs). Limited work has considered the impact of these changes in specific populations, such as heavy-drinking college students. Latent class analysis (LCA) is a person-centered approach that divides a population into mutually exclusive and exhaustive latent classes, based on observable indicator variables. The present research was designed to examine whether there were distinct classes of heavy-drinking college students who met DSM-5 criteria for an AUD and whether gender, perceived social norms, use of protective behavioral strategies (PBS), drinking refusal self-efficacy (DRSE), self-perceptions of drinking identity, psychological distress, and membership in a fraternity/sorority would be associated with class membership. Three-hundred and ninety-four college students who met DSM-5 criteria for an AUD were recruited from three different universities. Two distinct classes emerged: Less Severe (86%), the majority of whom endorsed both drinking more than intended and tolerance, as well as met criteria for a mild AUD; and More Severe (14%), the majority of whom endorsed at least half of the DSM-5 AUD criteria and met criteria for a severe AUD. Relative to the Less Severe class, membership in the More Severe class was negatively associated with DRSE and positively associated with self-identification as a drinker. There is a distinct class of heavy-drinking college students with a more severe AUD and for whom intervention content needs to be more focused and tailored. Clinical implications are discussed.

  1. Mental State Talk Structure in Children’s Narratives: A Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Giuliana Pinto

    2017-01-01

    Full Text Available This study analysed children’s Theory of Mind (ToM) as assessed by mental state talk in oral narratives. We hypothesized that the children’s mental state talk in narratives has an underlying structure, with specific terms organized in clusters. Ninety-eight children attending the last year of kindergarten were asked to tell a story twice, at the beginning and at the end of the school year. Mental state talk was analysed by identifying terms and expressions referring to perceptual, physiological, emotional, willingness, cognitive, moral, and sociorelational states. The cluster analysis showed that children’s mental state talk is organized in two main clusters: perceptual states and affective states. Results from the study confirm the feasibility of narratives as an outlet to inquire into mental state talk and offer a more fine-grained analysis of mental state talk structure.

  2. Clustering of the Parameters of Rhythmographic Analysis of Man’s Electrocardiogram

    Directory of Open Access Journals (Sweden)

    Ekaterina A. Filippova

    2014-12-01

    Full Text Available The article considers the clustering of parameters of human heart rate variability. The technique for calculating the parameters and constructing rhythmographic analysis diagrams is presented. The Cobweb conceptual clustering algorithm, modified for quantitative data, is used to cluster the parameters. The experimental results demonstrate that the training set of electrocardiograms can be effectively divided into groups that are similar in their rhythmographic parameters. Applied as part of electrocardiogram analysis software, the proposed method would enable rapid evaluation of the rhythmographic nature of heart function during screening examinations or in emergency medicine, for diagnosis and prediction.

  3. Alcohol Use as Risk Factors for Older Adults’ Emergency Department Visits: A Latent Class Analysis

    Directory of Open Access Journals (Sweden)

    Namkee G. Choi, PhD

    2015-12-01

    Full Text Available Introduction: Late middle-aged and older adults’ share of emergency department (ED) visits is increasing more than other age groups. ED visits by individuals with substance-related problems are also increasing. This paper was intended to identify subgroups of individuals aged 50+ by their risk for ED visits by examining their health/mental health status and alcohol use patterns. Methods: Data came from the 2013 National Health Interview Survey’s Sample Adult file (n=15,713). Following descriptive analysis of sample characteristics by alcohol use patterns, latent class analysis (LCA) modeling was fit using alcohol use pattern (lifetime abstainers, ex-drinkers, current infrequent/light/moderate drinkers, and current heavy drinkers), chronic health and mental health status, and past-year ED visits as indicators. Results: LCA identified a four-class model. All members of Class 1 (35% of the sample; lowest-risk group) were infrequent/light/moderate drinkers and exhibited the lowest probabilities of chronic health/mental health problems; Class 2 (21%; low-risk group) consisted entirely of lifetime abstainers and, despite being the oldest group, exhibited low probabilities of health/mental health problems; Class 3 (37%; moderate-risk group) was evenly divided between ex-drinkers and heavy drinkers; and Class 4 (7%; high-risk group) included all four groups of drinkers but more ex-drinkers. In addition, Class 4 had the highest probabilities of chronic health/mental problems, unhealthy behaviors, and repeat ED visits, with the highest proportion of Blacks and the lowest proportions of college graduates and employed persons, indicating significant roles of these risk factors. Conclusion: Alcohol nonuse/use (and quantity of use) and chronic health conditions are significant contributors to varying levels of ED visit risk. Clinicians need to help heavy-drinking older adults reduce unhealthy alcohol consumption and help both heavy drinkers and ex

  4. Likelihood analysis of spatial capture-recapture models for stratified or class structured populations

    Science.gov (United States)

    Royle, J. Andrew; Sutherland, Christopher S.; Fuller, Angela K.; Sun, Catherine C.

    2015-01-01

    We develop a likelihood analysis framework for fitting spatial capture-recapture (SCR) models to data collected on class structured or stratified populations. Our interest is motivated by the necessity of accommodating the problem of missing observations of individual class membership. This is particularly problematic in SCR data arising from DNA analysis of scat, hair or other material, which frequently yields individual identity but fails to identify the sex. Moreover, this can represent a large fraction of the data and, given the typically small sample sizes of many capture-recapture studies based on DNA information, utilization of the data with missing sex information is necessary. We develop the class structured likelihood for the case of missing covariate values, and then we address the scaling of the likelihood so that models with and without class structured parameters can be formally compared regardless of missing values. We apply our class structured model to black bear data collected in New York in which sex could be determined for only 62 of 169 uniquely identified individuals. The models containing sex-specificity of both the intercept of the SCR encounter probability model and the distance coefficient, and including a behavioral response are strongly favored by log-likelihood. Estimated population sex ratio is strongly influenced by sex structure in model parameters illustrating the importance of rigorous modeling of sex differences in capture-recapture models.

  5. Prestige, Centrality, and Learning: A Social Network Analysis of an Online Class

    Science.gov (United States)

    Russo, Tracy C.; Koesten, Joy

    2005-01-01

    This study explored relations between social network characteristics in an online graduate class and two learning outcomes: affective and cognitive learning. The social network analysis data were compiled by entering the number of one-to-one postings sent by each student to each other student in a course web site discussion space into a specially…

  6. Properties of Star Clusters -- III: Analysis of 13 FSR Clusters using UKIDSS-GPS and VISTA-VVV

    CERN Document Server

    Buckner, A S M

    2016-01-01

    Discerning the nature of open cluster candidates is essential for both individual and statistical analyses of cluster properties. Here we establish the nature of thirteen cluster candidates from the FSR cluster list using photometry from the 2MASS and deeper, higher resolution UKIDSS-GPS and VISTA-VVV surveys. These clusters were selected because they were flagged in our previous studies as expected to contain a large proportion of pre-main sequence members or are at unusually small/large Galactocentric distances. We employ a decontamination procedure of JHK photometry to identify cluster members. Cluster properties are homogeneously determined and we conduct a cross comparative study of our results with the literature (where available). Seven of the here studied clusters were confirmed to contain PMS stars, one of which is a newly confirmed cluster. Our study of FSR1716 is the deepest to date and is in notable disagreement with previous studies, finding that it has a distance of about 7.3kpc and age of 10-12...

  7. Cluster Analysis of Indonesian Province Based on Household Primary Cooking Fuel Using K-Means

    Science.gov (United States)

    Huda, S. N.

    2017-03-01

    Every household needs some installation for cooking. Kerosene, which is refined from petroleum, once dominated as the primary cooking fuel in Indonesia, but it is expensive and inefficient. Other households use LPG as their primary cooking fuel; however, LPG supply is also limited. In addition, the very diverse environments and cultures in Indonesia have led to diverse types of cooking installation, such as wood-burning stoves and braziers. The government is also promoting alternative fuels, such as charcoal briquettes and fuel from biomass. The use of other fuels is part of an energy diversification that is expected to reduce community dependence on petroleum-based fuels. The variation in cooking fuels from one region to another reflects the distribution of basic fuel use by households. By knowing the characteristics of each province, the government can adopt policies appropriate to each province. A cluster analysis of all provinces in Indonesia based on the type of primary household cooking fuel is therefore useful. Cluster analysis is done using the K-Means method with K ranging from 2 to 5. Cluster results are validated using the Silhouette Coefficient (SC). The results show that the highest SC is achieved at K = 2, with an SC value of 0.39135818388151. The two clusters reflect provinces in Indonesia: one is a cluster of more traditional provinces and the other is a cluster of more modern provinces. The cluster results are then shown on a map using the Google Maps API.
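
    The procedure described in this record (K-Means for K from 2 to 5, with the Silhouette Coefficient used to pick K) can be sketched as follows. This is a minimal illustration, not the author's implementation: the fuel-share figures in `provinces` are made up, and the farthest-point initialisation is an assumption chosen only to keep the run deterministic.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=100):
    """Lloyd's algorithm with farthest-point initialisation (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next centre: the point farthest from all centres chosen so far.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    if len(groups) < 2:
        return 0.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = groups[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(dist(p, q) for q in g) / len(g)
                for other, g in groups.items() if other != lab)
        total += (b - a) / max(a, b)
    return total / len(points)

# Hypothetical fuel shares per province: (kerosene, LPG, firewood).
provinces = [
    (0.10, 0.20, 0.70), (0.12, 0.18, 0.70), (0.08, 0.25, 0.67), (0.15, 0.15, 0.70),
    (0.05, 0.85, 0.10), (0.04, 0.80, 0.16), (0.06, 0.88, 0.06), (0.03, 0.82, 0.15),
]
scores = {k: silhouette(provinces, kmeans(provinces, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

    On this toy data the two groups are far better separated than real provincial data would be, so the silhouette value is much higher than the 0.39 reported in the abstract.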

  8. Applying of hierarchical clustering to analysis of protein patterns in the human cancer-associated liver.

    Directory of Open Access Journals (Sweden)

    Natalia A Petushkova

    Full Text Available There are two ways that statistical methods can learn from biomedical data. One way is to learn classifiers to identify diseases and to predict outcomes using the training dataset with established diagnosis for each sample. When the training dataset is not available the task can be to mine for presence of meaningful groups (clusters) of samples and to explore underlying data structure (unsupervised learning). We investigated the proteomic profiles of the cytosolic fraction of human liver samples using two-dimensional electrophoresis (2DE). Samples were resected upon surgical treatment of hepatic metastases in colorectal cancer. Unsupervised hierarchical clustering of 2DE gel images (n = 18) revealed a pair of clusters, containing 11 and 7 samples. Previously we used the same specimens to measure biochemical profiles based on cytochrome P450-dependent enzymatic activities and also found that samples were clearly divided into two well-separated groups by cluster analysis. It turned out that groups by enzyme activity almost perfectly match the groups identified from proteomic data. Of the 271 reproducible spots on our 2DE gels, we selected 15 to distinguish the human liver cytosolic clusters. Using MALDI-TOF peptide mass fingerprinting, we identified 12 proteins for the selected spots, including known cancer-associated species. Our results highlight the importance of hierarchical cluster analysis of proteomic data, and showed concordance between results of biochemical and proteomic approaches. Grouping of the human liver samples and/or patients into differing clusters may provide insights into possible molecular mechanism of drug metabolism and creates a rationale for personalized treatment.
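
    The idea of clustering the same samples from two kinds of measurement and checking whether the groupings agree can be sketched as below. This is a hedged illustration only: the record does not state the linkage or distance used, so average linkage with Euclidean distance is an assumption, and the `proteomic` and `enzymatic` values are invented.

```python
import math
from itertools import combinations

def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(points, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest mean pairwise distance until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        def gap(pair):
            a, b = pair
            total = sum(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
            return total / (len(clusters[a]) * len(clusters[b]))
        # combinations() yields a < b, so deleting index b is safe.
        a, b = min(combinations(range(len(clusters)), 2), key=gap)
        clusters[a] += clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical measurements for six samples (not data from the study).
proteomic = [(1.0, 1.1), (0.9, 1.0), (1.1, 0.9), (3.0, 3.2), (3.1, 2.9), (2.9, 3.0)]
enzymatic = [(0.2,), (0.3,), (0.25,), (0.9,), (1.0,), (0.95,)]

groups_p = average_linkage(proteomic, 2)
groups_e = average_linkage(enzymatic, 2)
print(groups_p, groups_p == groups_e)
```

    Comparing the two partitions directly works here because each result lists sample indices in a fixed order; for real data an agreement measure that tolerates a few mismatched samples would be more appropriate.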

  9. Cluster Analysis of Velocity Field Derived from Dense GNSS Network of Japan

    Science.gov (United States)

    Takahashi, A.; Hashimoto, M.

    2015-12-01

    Dense GNSS networks have been widely used to observe crustal deformation. Simpson et al. (2012) and Savage and Simpson (2013) conducted cluster analyses of the GNSS velocity field in the San Francisco Bay Area and the Mojave Desert, respectively. They successfully found velocity discontinuities and showed the advantage of cluster analysis for classifying GNSS velocity fields. Because strike-slip events are dominant in the western United States, the geometry there is simple. However, the Japanese Islands are tectonically complicated due to subduction of oceanic plates. There are many types of crustal deformation, such as slow slip events and large postseismic deformation. We propose a modified clustering method for the GNSS velocity field in Japan to separate time-variant and static crustal deformation. Our modification is performing cluster analysis every several months or years, then qualifying cluster member similarity. If a GNSS station moved differently from its neighboring GNSS stations, the station will not belong to the cluster that includes its surrounding stations. With this method, time-variant phenomena were distinguished. We applied our method to GNSS data of Japan from 1996 to 2015. The analyses led to the following conclusions. First, the cluster boundaries are consistent with known active faults. For example, the Arima-Takatsuki-Hanaore fault system and the Shimane-Tottori segment proposed by Nishimura (2015) are recognized without using prior information. Second, the method improves the detectability of time-variable phenomena, such as a slow slip event in the northern part of the Hokkaido region detected by Ohzono et al. (2015). Last, it classifies postseismic deformation caused by large earthquakes. The result suggested velocity discontinuities in the postseismic deformation of the Tohoku-oki earthquake. This result implies that postseismic deformation does not decay continuously in proportion to distance from the epicenter.

  10. Indentifying the major air pollutants base on factor and cluster analysis, a case study in 74 Chinese cities

    Science.gov (United States)

    Zhang, Jing; Zhang, Lan-yue; Du, Ming; Zhang, Wei; Huang, Xin; Zhang, Ya-qi; Yang, Yue-yi; Zhang, Jian-min; Deng, Shi-huai; Shen, Fei; Li, Yuan-wei; Xiao, Hong

    2016-11-01

    This article investigated the major air pollutants and their spatial and seasonal distribution in 74 Chinese cities. Factor analysis and cluster analysis are employed to identify major factors of air pollutants. The following results are obtained. (1) Major factors are obtained in spring, summer, autumn, and winter. The first factor in spring includes NO2, PM10, CO, and PM2.5; the first factor in summer and autumn includes PM10, PM2.5, CO and SO2; in winter, the first factor includes NO2, PM10, PM2.5, and SO2. (2) In spring, cities of cluster 5 are the most severely polluted by emission sources of SO2, CO, PM10, and PM2.5; the emission sources of O3 would significantly influence the air quality in cities of cluster 2; the emission sources of NO2 could significantly influence the air quality in cities of cluster 3 and cluster 5. (3) In summer, cities of cluster 5 are the most severely polluted by automotive emissions and coal flue gas. Cities of cluster 1 are the least polluted. Cities of cluster 3 and cluster 2 are polluted by emission sources of SO2 and O3. (4) In autumn, cities of cluster 3 and 4 are the most severely polluted by the emission sources of SO2, CO, PM10, and PM2.5; the emission sources of NO2 would significantly influence the air quality in cities of cluster 5; the emission sources of O3 could significantly influence the air quality in cities of cluster 1 and cluster 4. (5) In winter, cities of cluster 5 are the most severely polluted by the emission sources of SO2, CO, PM10, PM2.5, and CO; the emission sources of O3 could significantly influence the air quality in cities of cluster 1 and cluster 5.

  11. The Oropharyngeal Airway in Young Adults with Skeletal Class II and Class III Deformities: A 3-D Morphometric Analysis.

    Directory of Open Access Journals (Sweden)

    Yasas Shri Nalaka Jayaratne

    Full Text Available (1) To determine the accuracy and reliability of an automated anthropometric measurement software for the oropharyngeal airway and (2) to compare the anthropometric dimensions of the oropharyngeal airway in skeletal class II and III deformity patients. Cone-beam CT (CBCT) scans of 62 patients with skeletal class II or III deformities were used for this study. Volumetric, linear and surface area measurements of the retroglossal (RG) and retropalatal (RP) compartments of the oropharyngeal airway were made with the 3dMDVultus software. Accuracy of automated anthropometric pharyngeal airway measurements was assessed using an airway phantom. The software was found to be reasonably accurate for measuring dimensions of air passages. The total oropharyngeal volume was significantly greater in the skeletal class III deformity group (16.7 ± 9.04 mm3) compared with class II subjects (11.87 ± 4.01 mm3). The average surface area of both the RG and RP compartments was significantly larger in the class III deformity group. The most constricted area in the RG and RP airway was significantly larger in individuals with skeletal class III deformity. The anterior-posterior (AP) length of this constriction was significantly greater in skeletal class III individuals in both compartments, whereas the width of the constriction was not significantly different between the two groups in both compartments. The RP compartment was larger but less uniform than the RG compartment in both skeletal deformities. Significant differences were observed in morphological characteristics of the oropharyngeal airway in individuals with skeletal class II and III deformities. This information may be valuable for surgeons in orthognathic treatment planning, especially for mandibular setback surgery that might compromise the oropharyngeal patency.

  12. Gene microarray data analysis using parallel point-symmetry-based clustering.

    Science.gov (United States)

    Sarkar, Anasua; Maulik, Ujjwal

    2015-01-01

    Identification of co-expressed genes is the central goal in microarray gene expression analysis. Point-symmetry-based clustering is an important unsupervised learning technique for recognising symmetrical convex- or non-convex-shaped clusters. To enable fast clustering of large microarray data, we propose a distributed time-efficient scalable approach for point-symmetry-based K-Means algorithm. A natural basis for analysing gene expression data using symmetry-based algorithm is to group together genes with similar symmetrical expression patterns. This new parallel implementation also satisfies linear speedup in timing without sacrificing the quality of clustering solution on large microarray data sets. The parallel point-symmetry-based K-Means algorithm is compared with another new parallel symmetry-based K-Means and existing parallel K-Means over eight artificial and benchmark microarray data sets, to demonstrate its superiority, in both timing and validity. The statistical analysis is also performed to establish the significance of this message-passing-interface based point-symmetry K-Means implementation. We also analysed the biological relevance of clustering solutions.

  13. Performance Analysis of a Cluster-Based MAC Protocol for Wireless Ad Hoc Networks

    Directory of Open Access Journals (Sweden)

    Jesús Alonso-Zárate

    2010-01-01

    Full Text Available An analytical model to evaluate the non-saturated performance of the Distributed Queuing Medium Access Control Protocol for Ad Hoc Networks (DQMAN) in single-hop networks is presented in this paper. DQMAN is comprised of a spontaneous, temporary, and dynamic clustering mechanism integrated with a near-optimum distributed queuing Medium Access Control (MAC) protocol. Clustering is executed in a distributed manner using a mechanism inspired by the Distributed Coordination Function (DCF) of the IEEE 802.11. Once a station seizes the channel, it becomes the temporary clusterhead of a spontaneous cluster and it coordinates the peer-to-peer communications between the cluster members. Within each cluster, a near-optimum distributed queuing MAC protocol is executed. The theoretical performance analysis of DQMAN in single-hop networks under non-saturation conditions is presented in this paper. The approach integrates the analysis of the clustering mechanism into the MAC layer model. To the best of the authors' knowledge, this approach is novel in the literature. In addition, the performance of an ad hoc network using DQMAN is compared to that obtained when using the DCF of the IEEE 802.11, as a benchmark reference.

  14. Kraepelin Was Right: A Latent Class Analysis of Symptom Dimensions in Patients and Controls

    OpenAIRE

    Derks, Eske M.; Allardyce, Judith; Boks, Marco P; Vermunt, Jeroen K.; Hijman, Ron; Ophoff, Roel A

    2010-01-01

    Phenotypic heterogeneity within patients and controls may explain why the genetic variants contributing to schizophrenia risk explain only a fraction of the heritability. The aim of this study is to investigate quantitative and qualitative differences in psychosis symptoms in a sample including psychosis patients, their relatives, and community controls. We combined factor analysis and latent class analysis to analyze variation in Comprehensive Assessment of Symptoms and History lifetime-rate...

  15. Scalable classification by clustering: Hybrid can be better than Pure

    Institute of Scientific and Technical Information of China (English)

    Deng Shengchun; He Zengyou; Xu Xiaofei

    2007-01-01

    The problem of scalable classification by clustering in large databases was discussed. The clustering-based classification method first generates clusters using clustering algorithms. To classify new incoming data points, it finds the k nearest clusters of the data point as neighbors, and assigns each data point to the dominant class of these neighbors. Existing algorithms incorporated class information in making clustering decisions and produced pure clusters (each cluster associated with only one class). We presented hybrid cluster-based algorithms, which produce clusters by unsupervised clustering and allow each cluster to be associated with multiple classes. Experimental results show that hybrid cluster-based algorithms outperform pure ones in both classification accuracy and training speed.
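
    The classification step described in this record (find the k nearest clusters, then take the dominant class among them) can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: the `clusters` list is hypothetical, each hybrid cluster is assumed to store per-class member counts, and votes are assumed to be pooled by summing those counts.

```python
import math
from collections import Counter

def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical hybrid clusters: a centroid plus the class counts of its members.
# A "pure" cluster would hold a single class; a hybrid one may hold several.
clusters = [
    {"centroid": (0.0, 0.0), "classes": Counter({"A": 9, "B": 1})},
    {"centroid": (1.0, 0.0), "classes": Counter({"A": 6, "B": 4})},
    {"centroid": (5.0, 5.0), "classes": Counter({"B": 10})},
    {"centroid": (6.0, 5.0), "classes": Counter({"B": 7, "A": 3})},
]

def classify(point, clusters, k=2):
    """Assign the dominant class among the k clusters nearest to the point."""
    nearest = sorted(clusters, key=lambda c: dist(point, c["centroid"]))[:k]
    votes = Counter()
    for c in nearest:
        votes.update(c["classes"])  # pool class counts (assumed voting rule)
    return votes.most_common(1)[0][0]

print(classify((0.5, 0.2), clusters), classify((5.5, 5.0), clusters))
```

    Because a new point is compared only with cluster centroids rather than with every training record, the cost of a prediction grows with the number of clusters, which is what makes this family of methods attractive for large databases.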

  16. 2 x 2 Achievement Goals and Achievement Emotions: A Cluster Analysis of Students' Motivation

    Science.gov (United States)

    Jang, Leong Yeok; Liu, Woon Chia

    2012-01-01

    This study sought to better understand the adoption of multiple achievement goals at an intra-individual level, and its links to emotional well-being, learning, and academic achievement. Participants were 480 Secondary Two students (aged between 13 and 14 years) from two coeducational government schools. Hierarchical cluster analysis revealed the…

  17. Generating Geospatially Realistic Driving Patterns Derived From Clustering Analysis Of Real EV Driving Data

    DEFF Research Database (Denmark)

    Pedersen, Anders Bro; Aabrandt, Andreas; Østergaard, Jacob

    2014-01-01

    scales, which calls for a statistically correct, yet flexible model. This paper describes a method for modelling EV, based on non-categorized data, which takes into account the plug in locations of the vehicles. By using clustering analysis to extrapolate and classify the primary locations where...

  18. Improved Detection of Time Windows of Brain Responses in fMRI Using Modified Temporal Clustering Analysis

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    Temporal clustering analysis (TCA) has been proposed recently as a method to detect time windows of brain responses in functional MRI (fMRI) studies when the timing and location of the activation are completely unknown. Modifications to the TCA technique are introduced in this report to further improve the sensitivity in detecting brain activation.

  19. Classification of shoulder complaints in general practice by means of cluster analysis

    NARCIS (Netherlands)

    Winters, JC; Groenier, KH; Sobel, JS; Arendzen, HH; Meyboom-de Jong, B

    1997-01-01

    Objective: To determine if a classification of shoulder complaints in general practice can be made with a cluster analysis of variables of medical history and physical examination. Method: One hundred one patients with shoulder complaints were examined upon inclusion (week 0) and after 2 weeks. Elev

  20. [Current service invention patents and growth pathways on basis of cluster analysis].

    Science.gov (United States)

    Yang, Xu-jie; Xiao, Shi-ying

    2012-09-01

    This study aims for enhancing quantity and quality of patents of traditional Chinese medicine compounds of traditional Chinese medicine enterprises, traditional Chinese medicine colleges and relevant institutions while building an efficient pathway for patent protection using simple statistics and cluster analysis, with service invention patent holders of traditional Chinese medicine compounds as the study object.

  1. Cluster Analysis of Assessment in Anatomy and Physiology for Health Science Undergraduates

    Science.gov (United States)

    Brown, Stephen; White, Sue; Power, Nicola

    2016-01-01

    Academic content common to health science programs is often taught to a mixed group of students; however, content assessment may be consistent for each discipline. This study used a retrospective cluster analysis on such a group, first to identify high and low achieving students, and second, to determine the distribution of students within…

  2. Traumatic Brain Injury and PTSD Screening Efforts Evaluated Using Latent Class Analysis

    Science.gov (United States)

    2014-01-01

    Carolina). For latent class analyses, SAS PROC LCA, version 1.2.5 was used (PROC LCA; Lanza et al., 2007). Results Analysis yielded four classes with...41, 1284–1292. doi:10.1097/01.MLR.0000093487.78664.3C Lanza, S., Collins, L., Lemmon, D., & Schafer, J. (2007). PROC LCA: A SAS procedure for latent...International Journal of Psychiatry in Clinical Practice, 9, 9–14. doi:10.1185/135525703125002360 PROC LCA, Version 1.2.5. [Computer software]. The

  3. Cluster Analysis of Polyphenols and Organic Acids in 11 Different Brand Cigarette Samples at Home and Abroad

    Institute of Scientific and Technical Information of China (English)

    Lan MI; Bilong DAI; Yu QIN; Wenjun ZHANG; Zhen XIONG; Yanhong WANG; Ting ZHU

    2015-01-01

    The objective of this research was to investigate the differences between local and foreign cigarettes and to provide a basis for improving cigarette quality. Polyphenols and organic acids in 11 cigarette samples of different brands from home and abroad were classified by cluster analysis. The results indicated that the 11 samples could be classified into 2 classes. Suyan, Furongwang, Chinese, Baisha, Dihao, Yunyan, and Hongtashan belonged to type 1; foreign cigarettes, represented by Marboro, Blue Pacific, and Brazil cigarettes, belonged to type 2. The content of malic acid and citric acid was higher in type 1 than in type 2, the content of malonic acid was higher in type 2, and there was no difference between type 1 and type 2 in polyphenol content. In conclusion, the content of malic acid and citric acid in Chinese cigarettes was higher than in foreign cigarettes, but the content of malonic acid was lower. There was no difference between Chinese and foreign cigarettes in polyphenol content.

  4. Unraveling the dha cluster in Citrobacter werkmanii: comparative genomic analysis of bacterial 1,3-propanediol biosynthesis clusters.

    Science.gov (United States)

    Maervoet, Veerle E T; De Maeseneire, Sofie L; Soetaert, Wim K; De Mey, Marjan

    2014-04-01

    In natural 1,3-propanediol (PDO) producing microorganisms such as Klebsiella pneumoniae, Citrobacter freundii and Clostridium sp., the genes coding for PDO producing enzymes are grouped in a dha cluster. This article describes the dha cluster of a novel candidate for PDO production, Citrobacter werkmanii DSM17579 and compares the cluster to the currently known PDO clusters of Enterobacteriaceae and Clostridiaceae. Moreover, we attribute a putative function to two previously unannotated ORFs, OrfW and OrfY, both in C. freundii and in C. werkmanii: both proteins might form a complex and support the glycerol dehydratase by converting cob(I)alamin to the glycerol dehydratase cofactor coenzyme B12. Unraveling this biosynthesis cluster revealed high homology between the deduced amino acid sequence of the open reading frames of C. werkmanii DSM17579 and those of C. freundii DSM30040 and K. pneumoniae MGH78578, i.e., 96 and 87.5 % identity, respectively. On the other hand, major differences between the clusters have also been discovered. For example, only one dihydroxyacetone kinase (DHAK) is present in the dha cluster of C. werkmanii DSM17579, while two DHAK enzymes are present in the cluster of K. pneumoniae MGH78578 and Clostridium butyricum VPI1718.

  5. Joint analysis of time-to-event and multiple binary indicators of latent classes

    DEFF Research Database (Denmark)

    Larsen, Klaus

    2004-01-01

    Multiple categorical variables are commonly used in medical and epidemiological research to measure specific aspects of human health and functioning. To analyze such data, models have been developed considering these categorical variables as imperfect indicators of an individual's "true" status......, such as no additional effect of the observed indicators given latent class. The usefulness of the model framework and the proposed techniques are illustrated in an analysis of data from the Women's Health and Aging Study concerning the effect of severe mobility disability on time-to-death for elderly women....... of health or functioning. In this article, the latent class regression model is used to model the relationship between covariates, a latent class variable (the unobserved status of health or functioning), and the observed indicators (e.g., variables from a questionnaire). The Cox model is extended...

  6. [The craniofacial architecture of class III malocclusion using the Coben analysis].

    Science.gov (United States)

    Vallée-Cussac, V

    1991-01-01

    In this study, longitudinal tracings of a group with dental and skeletal Class III malocclusion are compared to tracings of the COBEN analysis standard values. Cephalometric measurements and superimpositions illustrate the dynamic variations of Class III cranio-facial architecture for two age ranges: 8 years +/- 1 year and 16 years +/- 1 year. The Class III pathology in children aged 8 years +/- 1 year is characterized by alterations in tracing size and position, with excessive length of cranio-facial components and rotation of the cranial base into a more vertical position. At the age of 16 years +/- 1 year, after growth, a growth rate deficiency in length with variable individual adaptation is shown for cranial structures except the mandible.

  7. Degradation Assessment and Fault Diagnosis for Roller Bearing Based on AR Model and Fuzzy Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Lingli Jiang

    2011-01-01

    Full Text Available This paper proposes a new approach combining an autoregressive (AR) model and fuzzy cluster analysis for bearing fault diagnosis and degradation assessment. The AR model is an effective approach to extract fault features and is generally applied to stationary signals. However, the fault vibration signals of a roller bearing are non-stationary and non-Gaussian. To address this problem, the parameters of the AR model are estimated based on higher-order cumulants. The AR parameters are then taken as the feature vectors, and fuzzy cluster analysis is applied to perform classification and pattern recognition. Experimental results show that the proposed method can be used to identify various types and severities of bearing faults. This study is significant for non-stationary and non-Gaussian signal analysis, fault diagnosis and degradation assessment.
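
    The fuzzy clustering stage can be sketched with the standard fuzzy c-means update rules, as below. This is a simplified illustration and not the paper's pipeline: the AR coefficient vectors in `normal` and `faulty` are invented and assumed to be already estimated (the higher-order-cumulant estimation is not reproduced), and the fuzzifier `m = 2.0`, the fixed number of passes and the choice of initial centres are assumptions.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def memberships(points, centers, m=2.0):
    """Fuzzy membership of each point in each cluster (each row sums to 1)."""
    rows = []
    for p in points:
        d = [dist(p, c) for c in centers]
        if min(d) == 0.0:
            # Point sits exactly on a centre: give that centre full membership.
            hits = [1.0 if x == 0.0 else 0.0 for x in d]
            rows.append([h / sum(hits) for h in hits])
        else:
            inv = [x ** (-2.0 / (m - 1.0)) for x in d]
            rows.append([v / sum(inv) for v in inv])
    return rows

def fuzzy_c_means(points, centers, m=2.0, iters=50):
    """Alternate membership and centre updates for a fixed number of passes."""
    dim = len(points[0])
    for _ in range(iters):
        u = memberships(points, centers, m)
        centers = [
            tuple(
                sum(u[i][j] ** m * p[d] for i, p in enumerate(points))
                / sum(u[i][j] ** m for i in range(len(points)))
                for d in range(dim)
            )
            for j in range(len(centers))
        ]
    return centers

# Hypothetical AR(2) coefficient vectors (a1, a2), assumed already estimated.
normal = [(1.50, -0.70), (1.48, -0.72), (1.52, -0.69)]
faulty = [(0.60, -0.20), (0.62, -0.18), (0.58, -0.22)]
features = normal + faulty

centers = fuzzy_c_means(features, [features[0], features[-1]])
u_train = memberships(features, centers)
# A sample lying between the two groups gets split membership.
u_worn = memberships([(1.05, -0.45)], centers)[0]
print([round(x, 2) for x in u_worn])
```

    The graded membership is what makes the fuzzy approach usable for degradation assessment: a bearing drifting away from the normal cluster shows a gradually falling membership instead of an abrupt label change.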

  8. THE USE OF CLUSTER ANALYSIS TO EVALUATE SOCIO-ECONOMIC DEVELOPMENT OF REGIONS (EVIDENCE FROM THE YAROSLAVL REGION)

    Directory of Open Access Journals (Sweden)

    Vera V. Zholudeva

    2014-01-01

    Full Text Available In the article the results of Cluster Analysis of the Central Federal District regions are presented. The use of cluster analysis methods for definition of the resources utilization degree and the improvement of socio-economic development of any region is considered. The article gives the index of socio-economic development of the Yaroslavl region.

  9. Module Cluster: IFE - 001.00 (GSC) Basic Terminology and Analysis of Writings Concerned with Educational Issues.

    Science.gov (United States)

    Zahn, R. D.

    This document is one of the module clusters developed for the Camden Teacher Corps project. The purpose of this module cluster is to enable students to define and use basic terminology in the discussion and analysis of educational issues, to use various approaches in studying an issue, and to apply critical analysis skills to written and spoken…

  10. Design and analysis of 19 pin annular fuel rod cluster for pressure tube type boiling water reactor

    Energy Technology Data Exchange (ETDEWEB)

    Deokule, A.P., E-mail: abhijit.deokule1986@gmail.com [Homi Bhabha National Institute, Trombay 400 085, Mumbai (India); Vishnoi, A.K.; Dasgupta, A.; Umasankari, K.; Chandraker, D.K.; Vijayan, P.K. [Bhabha Atomic Research Centre, Trombay 400 085, Mumbai (India)

    2014-09-15

    Highlights: • Development of 19 pin annular fuel rod cluster. • Reactor physics study of designed annular fuel rod cluster. • Thermal hydraulic study of annular fuel rod cluster. - Abstract: An assessment of 33 pin annular fuel rod cluster has been carried out previously for possible use in a pressure tube type boiling water reactor. Despite the benefits such as negative coolant void reactivity and larger heat transfer area, the 33 pin annular fuel rod cluster is having lower discharge burn up as compared to solid fuel rod cluster when all other parameters are kept the same. The power rating of this design cannot be increased beyond 20% of the corresponding solid fuel rod cluster. The limitation on the power is not due to physics parameters rather it comes from the thermal hydraulics side. In order to increase power rating of the annular fuel cluster, keeping same pressure tube diameter, the pin diameter was increased, achieving larger inside flow area. However, this reduces the number of annular fuel rods. In spite of this, the power of the annular fuel cluster can be increased by 30% compared to the solid fuel rod cluster. This makes the nineteen pin annular fuel rod cluster a suitable option to extract more power without any major changes in the existing design of the fuel. In the present study reactor physics and thermal hydraulic analysis carried out with different annular fuel rod cluster geometry is reported in detail.

  11. Investigating nurses' knowledge, attitudes, and skills patterns towards clinical management system: results of a cluster analysis.

    Science.gov (United States)

    Chan, M F

    2006-09-01

    To determine whether definable subtypes exist within a cohort of Hong Kong nurses as related to the clinical management system use in their clinical practices based on their knowledge, attitudes, skills, and background factors. Data were collected using a structured questionnaire. The sample of 242 registered nurses was recruited from three hospitals in Hong Kong. The study employs personal and demographic variables, knowledge, attitudes, and skills scale. A cluster analysis yielded two clusters. Each cluster represents a different profile of Hong Kong nurses on the clinical management system use in their clinical practices. The first group (Cluster 1) was labeled 'lower attitudes, less skilful and average knowledge' group, and represented 55.4% of the total respondents. The second group (Cluster 2) was labeled as 'positive attitudes, good knowledge but less skilful'. They comprised 44.6% of this nursing sample. Cluster 2 had more older nurses, the majority were educated to the baccalaureate or above level, with more than 10 years working experience, and they held a more senior ranking than Cluster 1. A clear profile of Hong Kong nurses may benefit healthcare professionals in making appropriate education or assistance to prompt the use of the clinical management system by nurses an officially recognized profession. The findings were useful in determining nurse-users' specific needs and their preferences for modification of the clinical management system. Such findings should be used to formulate strategies to encourage nurses to resolve actual problems following computer training and to increase the depth and breadth of nurses' knowledge, attitudes, and skills toward such a system.

  12. Market segmentation for multiple option healthcare delivery systems--an application of cluster analysis.

    Science.gov (United States)

    Jarboe, G R; Gates, R H; McDaniel, C D

    1990-01-01

    Healthcare providers of multiple option plans may be confronted with special market segmentation problems. This study demonstrates how cluster analysis may be used for discovering distinct patterns of preference for multiple option plans. The availability of metric, as opposed to categorical or ordinal, data provides the ability to use sophisticated analysis techniques which may be superior to frequency distributions and cross-tabulations in revealing preference patterns.

  13. Clinical evaluation of nonsyndromic dental anomalies in Dravidian population: A cluster sample analysis

    OpenAIRE

    Yamunadevi, Andamuthu; Selvamani, M.; Vinitha, V.; Srivandhana, R.; Balakrithiga, M.; Prabhu, S; Ganapathy, N

    2015-01-01

    Aim: To record the prevalence rate of dental anomalies in Dravidian population and analyze the percentage of individual anomalies in the population. Methodology: A cluster sample analysis was done, where 244 subjects studying in a dental institution were all included and analyzed for occurrence of dental anomalies by clinical examination, excluding third molars from analysis. Results: 31.55% of the study subjects had dental anomalies and shape anomalies were more prevalent (22.1%), followed b...

  14. Deep observations of the Super-CLASS super-cluster at 325 MHz with the GMRT: the low-frequency source catalogue

    CERN Document Server

    Riseley, C J; Hales, C A; Harrison, I; Birkinshaw, M; Battye, R A; Beswick, R J; Brown, M L; Casey, C M; Chapman, S C; Demetroullas, C; Hung, C -L; Jackson, N J; Muxlow, T; Watson, B

    2016-01-01

    We present the results of 325 MHz GMRT observations of a super-cluster field, known to contain five Abell clusters at redshift $z \sim 0.2$. We achieve a nominal sensitivity of $34\,\mu$Jy beam$^{-1}$ toward the phase centre. We compile a catalogue of 3257 sources with flux densities in the range $183\,\mu\rm{Jy}\,-\,1.5\,\rm{Jy}$ within the entire $\sim 6.5$ square degree field of view. Subsequently, we use available survey data at other frequencies to derive the spectral index distribution for a sub-sample of these sources, recovering two distinct populations -- a dominant population which exhibits spectral index trends typical of steep-spectrum synchrotron emission, and a smaller population of sources with typically flat or rising spectra. We identify a number of sources with ultra-steep spectra or rising spectra for further analysis, finding two candidate high-redshift radio galaxies and three gigahertz-peaked-spectrum radio sources. Finally, we derive the Euclidean-normalised differential source counts us...

  15. 3D BUILDING MODELS SEGMENTATION BASED ON K-MEANS++ CLUSTER ANALYSIS

    Directory of Open Access Journals (Sweden)

    C. Zhang

    2016-10-01

    3D mesh model segmentation has drawn increasing attention from the digital geometry processing field in recent years. The original 3D mesh model needs to be divided into separate meaningful parts or surface patches based on certain standards to support reconstruction, compression, texture mapping, model retrieval, etc. Segmentation is therefore a key problem in 3D mesh model processing. In this paper, we propose a method to segment Collada (a type of mesh model) 3D building models into meaningful parts using cluster analysis. Common clustering methods segment 3D mesh models by K-means, whose performance depends heavily on the randomized initial seed points (i.e., centroids); different randomized centroids can give quite different results. Therefore, we improved the existing method and used the K-means++ clustering algorithm to solve this problem. Our experiments show that K-means++ improves both the speed and the accuracy of K-means and achieves good and meaningful results.
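
    The seeding step that distinguishes K-means++ from plain K-means can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of plain coordinate tuples as stand-ins for mesh-face features, and the assumption that the data contain at least k distinct points are all hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_seeds(points, k, rng):
    """Pick k initial centroids by D^2-weighted sampling (K-means++ seeding).

    Assumes `points` holds at least k distinct points (hypothetical input).
    """
    seeds = [rng.choice(points)]  # first seed: uniform at random
    while len(seeds) < k:
        # Weight every point by its squared distance to the nearest seed,
        # so far-away points are more likely to become the next seed.
        weights = [min(squared_distance(p, s) for s in seeds) for p in points]
        seeds.append(rng.choices(points, weights=weights, k=1)[0])
    return seeds


# Illustrative data: three well-separated groups of 2D points.
pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0), (10.0, 0.1)]
print(kmeans_pp_seeds(pts, 3, random.Random(0)))
```

    Because an already chosen point has weight zero, it cannot be drawn again, which is why the seeds tend to spread across the data instead of collapsing onto one region as purely random seeds can.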

  16. Competitiveness Analysis of Processing Industry Cluster of Livestock Products in Inner Mongolia Based on "Diamond Model"

    Institute of Scientific and Technical Information of China (English)

    YANG Xing-long; REN Ya-tong

    2012-01-01

    Using Michael Porter’s "diamond model" and based on regional development characteristics, we analyze the competitiveness of the livestock product processing industry cluster in Inner Mongolia from six aspects (factor conditions; demand conditions; corporate strategy, structure and competition; related and supporting industries; government; and opportunities). We put forward the following recommendations for improving the competitiveness of the livestock product processing industry cluster in Inner Mongolia: (i) the government should increase capital input, focus on supporting the livestock product processing industry, and give play to the guidance and aggregation effect of financial funds; (ii) in terms of enterprises, it is necessary to vigorously develop leading enterprises and to give full play to the cluster effect of the leading enterprises.

  17. Clustering analysis of western North Pacific Tropical Cyclone tracks using the Self Organizing Map

    Science.gov (United States)

    Kim, H.; Seo, K.

    2013-12-01

    A cluster analysis using the Self Organizing Map (SOM) is used to characterize tropical cyclone (TC) tracks over the western North Pacific (WNP). A False Discovery Rate (FDR) method is used to objectively determine an optimum cluster number. For 620 TC tracks over the WNP from June to October during 1979-2010, five clusters of TC tracks are selected. These can further be categorized into three major patterns: straight-moving track, recurving track, and quasi-random pattern. Each pattern is characterized by its landfall regions: near South and East China, East Asia, and offshore of Japan. In addition, each pattern shows distinctive properties in its traveling distance, lifetime, intensity (mean minimum sea level pressure), and genesis location. It is revealed that these three patterns are associated with large-scale dynamics such as the variability of the western Pacific subtropical high and the Madden-Julian Oscillation. The impacts of El Nino and NAO will be discussed.

  18. Automation of Large-scale Computer Cluster Monitoring Information Analysis

    Science.gov (United States)

    Magradze, Erekle; Nadal, Jordi; Quadt, Arnulf; Kawamura, Gen; Musheghyan, Haykuhi

    2015-12-01

    High-throughput computing platforms consist of a complex infrastructure and provide a number of services that are prone to failure. To mitigate the impact of failures on the quality of the provided services, constant monitoring and timely reaction are required, which is impossible without automation of the system administration processes. This paper introduces a way to automate the analysis of monitoring information in order to provide long and short term predictions of the service response time (SRT) for mass storage and batch systems and to identify the status of a service at a given time. The approach for the SRT predictions is based on the Adaptive Neuro Fuzzy Inference System (ANFIS). An evaluation of the approaches is performed on real monitoring data from the WLCG Tier 2 center GoeGrid. Ten-fold cross validation results demonstrate high efficiency of both approaches in comparison to known methods.
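
    The ten-fold cross validation mentioned above can be sketched as an index split. This is only an illustration of the general procedure: the abstract does not say how the folds were formed, so the contiguous, unshuffled folds used here (a common choice for time-ordered monitoring data) and the function name are assumptions.

```python
def k_fold_indices(n_samples, n_folds=10):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    # Spread the remainder so that fold sizes differ by at most one.
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Each model is fitted on `train` and scored on the held-out `test` indices.
for train, test in k_fold_indices(25, 10):
    print(len(train), test)
```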

  19. The heterogeneity of headache patients who self-medicate: a cluster analysis approach.

    Science.gov (United States)

    Mehuys, Els; Paemeleire, Koen; Crombez, Geert; Adriaens, Els; Van Hees, Thierry; Demarche, Sophie; Christiaens, Thierry; Van Bortel, Luc; Van Tongelen, Inge; Remon, Jean-Paul; Boussery, Koen

    2016-07-01

    Patients with headache often self-treat their condition with over-the-counter analgesics. However, overuse of analgesics can cause medication-overuse headache. The present study aimed to identify subgroups of individuals with headache who self-medicate, as this could be helpful to tailor intervention strategies for prevention of medication-overuse headache. Patients (n = 1021) were recruited from 202 community pharmacies and completed a self-administered questionnaire. A hierarchical cluster analysis was used to group patients as a function of sociodemographics, pain, disability, and medication use for pain. Three patient clusters were identified. Cluster 1 (n = 498, 48.8%) consisted of relatively young individuals, and most of them suffered from migraine. They reported the least number of other pain complaints and the lowest prevalence of medication overuse (MO; 16%). Cluster 2 (n = 301, 29.5%) included older persons with mainly non-migraine headache, a low disability, and on average pain in 2 other locations. Prevalence of MO was 40%. Cluster 3 (n = 222, 21.7%) mostly consisted of patients with migraine who also report pain in many other locations. These patients reported a high disability and a severe limitation of activities. They also showed the highest rates of MO (73%).

  20. Dynamical analysis of NGC 110: cluster of fainter stars or data fluctuation?

    CERN Document Server

    Joshi, Gireesh C

    2016-01-01

    The stellar enhancement of the cluster NGC 110 is investigated in various optical and infrared (IR) bands. The radial density profile of the IR region does not show a stellar enhancement in the central region of the cluster. This stellar deficiency may be caused by fainter stars going undetected due to the contamination effect of massive stars. Since our analysis does not indicate stellar enhancement below 16.5 mag in the I band, the cluster is assumed to be a group of fainter stars. The proposed magnitude scatter factor would be an excellent tool for understanding the colour-scattering characteristics of stars. The most probable members do not coincide with the model isochrone fitting in the optical bands due to the poor data quality of the PPMXL catalogue. Different values of the mean proper motions are found for the fainter stars of the cluster and field regions, whereas similar values are obtained for radial zones of the cluster. The symmetrical distribution of fainter stars of the core are found aro...

  1. Global Analysis of miRNA Gene Clusters and Gene Families Reveals Dynamic and Coordinated Expression

    Directory of Open Access Journals (Sweden)

    Li Guo

    2014-01-01

    To further understand the potential expression relationships of miRNAs in miRNA gene clusters and gene families, a global analysis was performed in 4 paired tumor (breast cancer) and adjacent normal tissue samples using deep sequencing datasets. The compositions of miRNA gene clusters and families are not random, and clustered and homologous miRNAs may have close relationships with overlapped miRNA species. Members in the miRNA group always had various expression levels, and some even showed larger expression divergence. Despite the dynamic expression as well as individual difference, these miRNAs always indicated consistent or similar deregulation patterns. The consistent deregulation expression may contribute to dynamic and coordinated interaction between different miRNAs in the regulatory network. Further, we found that those clustered or homologous miRNAs that were also identified as sense and antisense miRNAs showed larger expression divergence. miRNA gene clusters and families indicated important biological roles, and the specific distribution and expression further enrich and ensure the flexible and robust regulatory network.

  2. Cluster-based analysis for personalized stress evaluation using physiological signals.

    Science.gov (United States)

    Xu, Qianli; Nwe, Tin Lay; Guan, Cuntai

    2015-01-01

    Technology development in wearable sensors and biosignal processing has made it possible to detect human stress from physiological features. However, the intersubject difference in stress responses presents a major challenge for reliable and accurate stress estimation. This research proposes a novel cluster-based analysis method to measure perceived stress using physiological signals, which accounts for the intersubject differences. The physiological data are collected when human subjects undergo a series of task-rest cycles, incurring varying levels of stress that are indicated by an index of the State Trait Anxiety Inventory. Next, a quantitative measurement of stress is developed by analyzing the physiological features in two steps: 1) a k-means clustering process to divide subjects into different categories (clusters), and 2) cluster-wise stress evaluation using the general regression neural network. Experimental results show a significant improvement in evaluation accuracy as compared to traditional methods without clustering. The proposed method is useful in developing intelligent, personalized products for human stress management.
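
    The second step, cluster-wise evaluation, can be sketched as follows, assuming the cluster assignments from step 1 are already available. A general regression neural network is, in its standard form, a Gaussian-kernel weighted average of the training targets; the function names, the smoothing parameter `sigma`, and the toy feature values are hypothetical and not taken from the paper.

```python
import math


def grnn_predict(train_x, train_y, query, sigma=1.0):
    """GRNN output: Gaussian-kernel weighted mean of the training targets."""
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, query)) / (2 * sigma ** 2))
        for x in train_x
    ]
    # Note: a query very far from all training points can underflow the weights.
    return sum(w * y for w, y in zip(weights, train_y)) / sum(weights)


def clusterwise_predict(train_x, train_y, train_cluster, query, query_cluster, sigma=1.0):
    """Evaluate the query with a GRNN built only from its own cluster's subjects."""
    rows = [i for i, c in enumerate(train_cluster) if c == query_cluster]
    return grnn_predict([train_x[i] for i in rows], [train_y[i] for i in rows], query, sigma)


# Toy data: one physiological feature per subject, two subject clusters.
features = [(0.0,), (1.0,), (10.0,), (11.0,)]
stress = [1.0, 3.0, 20.0, 22.0]
clusters = [0, 0, 1, 1]
print(clusterwise_predict(features, stress, clusters, (0.5,), 0))
```

    Restricting each prediction to subjects in the same cluster is what lets the estimate reflect intersubject differences instead of averaging over all response styles.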

  3. A clustering analysis of eddies' spatial distribution in the South China Sea

    Directory of Open Access Journals (Sweden)

    J. Yi

    2012-11-01

    Spatial variation is important for studying the mesoscale eddies in the South China Sea (SCS). To investigate such spatial variations, this study made a clustering analysis on eddies' distribution using the K-means approach. Results showed that the clustering tendencies of anticyclonic eddies (AEs) and cyclonic eddies (CEs) were weak but not random, and the number of clusters was found to be greater than four. Finer clustering results showed 10 regions where AEs are densely populated and 6 regions for CEs in the SCS. Previous studies confirmed these partitions and possible generation mechanisms were related. Comparisons between AEs and CEs revealed that patterns of AE are relatively more aggregated than those of CE, and specific distinctions were summarized: (1) to the southwest of Luzon Island, AEs and CEs are generated spatially apart; AEs are likely located north of 14° N and closer to shore, while CEs are to the south and further offshore; (2) the Central SCS and Nansha Trough are mostly dominated by AEs; (3) along 112° E, clusters of AEs and CEs are located sequentially apart, and the pair off Vietnam represents the dipole eddies; (4) to the southwest of Dongsha Islands, AEs are concentrated to the east of CEs. Overlaps of AEs and CEs in the northeastern and Southern SCS were further examined considering seasonal variations. The northeastern overlap represented near-concentric distributions while the southern one was a mixed effect of seasonal variations, complex circulations and topography influences.

  4. A clustering analysis of eddies' spatial distribution in the South China Sea

    Directory of Open Access Journals (Sweden)

    J. Yi

    2013-02-01

    Spatial variation is important for studying the mesoscale eddies in the South China Sea (SCS). To investigate such spatial variations, this study made a clustering analysis on eddies' distribution using the K-means approach. Results showed that the clustering tendencies of anticyclonic eddies (AEs) and cyclonic eddies (CEs) were weak but not random, and the number of clusters was found to be greater than four. Finer clustering results showed 10 regions where AEs are densely populated and 6 regions for CEs in the SCS. Previous studies confirmed these partitions and possible generation mechanisms were related. Comparisons between AEs and CEs revealed that patterns of AE are relatively more aggregated than those of CE, and specific distinctions were summarized: (1) To the southwest of Luzon Island, AEs and CEs are generated spatially apart; AEs are likely located north of 14° N and closer to shore, while CEs are to the south and further offshore. (2) The central SCS and Nansha Trough are mostly dominated by AEs. (3) Along 112° E, clusters of AEs and CEs are located sequentially apart, and the pairs off Vietnam represent the dipole structures. (4) To the southwest of the Dongsha Islands, AEs are concentrated to the east of CEs. Overlaps of AEs and CEs in the northeastern and southern SCS were further examined considering seasonal variations. The northeastern overlap represented near-concentric distributions while the southern one was a mixed effect of seasonal variations, complex circulations and topography influences.

  5. Phenotype Clustering of Breast Epithelial Cells in Confocal Images based on Nuclear Protein Distribution Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Long, Fuhui; Peng, Hanchuan; Sudar, Damir; Levievre, Sophie A.; Knowles, David W.

    2006-09-05

    Background: The distribution of the chromatin-associated proteins plays a key role in directing nuclear function. Previously, we developed an image-based method to quantify the nuclear distributions of proteins and showed that these distributions depended on the phenotype of human mammary epithelial cells. Here we describe a method that creates a hierarchical tree of the given cell phenotypes and calculates the statistical significance between them, based on the clustering analysis of nuclear protein distributions. Results: Nuclear distributions of nuclear mitotic apparatus protein were previously obtained for non-neoplastic S1 and malignant T4-2 human mammary epithelial cells cultured for up to 12 days. Cell phenotype was defined as S1 or T4-2 and the number of days in culture. A probabilistic ensemble approach was used to define a set of consensus clusters from the results of multiple traditional cluster analysis techniques applied to the nuclear distribution data. Cluster histograms were constructed to show how cells in any one phenotype were distributed across the consensus clusters. Grouping various phenotypes allowed us to build phenotype trees and calculate the statistical difference between each group. The results showed that non-neoplastic S1 cells could be distinguished from malignant T4-2 cells with 94.19 percent accuracy; that proliferating S1 cells could be distinguished from differentiated S1 cells with 92.86 percent accuracy; and showed no significant difference between the various phenotypes of T4-2 cells corresponding to increasing tumor sizes. Conclusion: This work presents a cluster analysis method that can identify significant cell phenotypes, based on the nuclear distribution of specific proteins, with high accuracy.

  6. Validation of hierarchical cluster analysis for identification of bacterial species using 42 bacterial isolates

    Science.gov (United States)

    Ghebremedhin, Meron; Yesupriya, Shubha; Luka, Janos; Crane, Nicole J.

    2015-03-01

    Recent studies have demonstrated the potential advantages of Raman spectroscopy in the biomedical field due to its rapidity and noninvasive nature. In this study, Raman spectroscopy is applied as a method for differentiating between bacterial isolates by Gram status and by genus and species. We created models for identifying 28 bacterial isolates using spectra collected with a 785 nm laser excitation Raman spectroscopic system. To investigate the groupings of these samples, partial least squares discriminant analysis (PLSDA) and hierarchical cluster analysis (HCA) were implemented. In addition, cluster analyses of the isolates were performed using various data types consisting of biochemical tests, gene sequence alignment, high resolution melt (HRM) analysis, and antimicrobial susceptibility tests of minimum inhibitory concentration (MIC) and degree of antimicrobial resistance (SIR). To evaluate the ability of these models to correctly classify bacterial isolates using solely Raman spectroscopic data, a set of 14 validation samples was tested using the PLSDA models and subsequently the HCA models. External cluster evaluation criteria of purity and Rand index were calculated at different taxonomic levels to compare the performance of clustering using Raman spectra as well as the other datasets. Results showed that Raman spectra performed comparably to, and in some cases better than, the other data types, with Rand index and purity values up to 0.933 and 0.947, respectively. This study clearly demonstrates that the discrimination of bacterial species using Raman spectroscopic data and hierarchical cluster analysis is possible and has the potential to be a powerful point-of-care tool in clinical settings.
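
    The two external evaluation criteria named above, purity and the Rand index, can be computed as in the following sketch. It uses the standard textbook definitions; the function names and the toy labels are illustrative and are not data from the study.

```python
from collections import Counter
from itertools import combinations


def purity(true_labels, cluster_ids):
    """Fraction of samples that belong to the majority class of their cluster."""
    members = {}
    for label, cluster in zip(true_labels, cluster_ids):
        members.setdefault(cluster, []).append(label)
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in members.values()
    )
    return majority_total / len(true_labels)


def rand_index(true_labels, cluster_ids):
    """Fraction of sample pairs on which the two partitions agree."""
    pairs = list(combinations(range(len(true_labels)), 2))
    agree = sum(
        (true_labels[i] == true_labels[j]) == (cluster_ids[i] == cluster_ids[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Toy example: six isolates from two genera, clustered into two groups.
genus = ["A", "A", "A", "B", "B", "B"]
cluster = [0, 0, 1, 1, 1, 1]
print(purity(genus, cluster), rand_index(genus, cluster))
```

    In the toy example one isolate of genus A falls into the cluster dominated by B, so purity is 5/6, while the Rand index is 10/15 because it also penalizes every pair that the misplaced isolate splits or joins incorrectly.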

  7. Class-first analysis in a continuum: an approach to the complexities of schools, society, and insurgent science

    Science.gov (United States)

    Valdiviezo, Laura Alicia

    2010-06-01

    This essay addresses Katherine Richardson Bruna's paper, Mexican Immigrant Transnational Social Capital and Class Transformation: Examining the Role of Peer Mediation in Insurgent Science, through five main points. First, I offer a comparison between the traditional analysis of classism in Latin America and Richardson Bruna's call for a class-first analysis in the North American social sciences, where there has been a tendency to obviate the specific examination of class relations and class issues. Second, I argue that a class-first analysis alone cannot suffice to depict the complex dimensions in the relations of schools and society. Thus, I suggest a continuum in the class-first analysis. Third, I argue that social constructions surrounding issues of language, ethnicity, and gender necessarily intersect with issues of class and that, in fact, those other constructions offer compatible epistemologies that aid in representing the complexity of social and institutional practices in the capitalist society. Richardson Bruna's analysis of Augusto's interactions with his teacher and peers in the science class provides a fourth point of discussion in this essay. As a final point in my response I discuss Richardson Bruna's idea of making class-first analysis knowledge accessible to educators and especially to science teachers.

  8. Globular Cluster Abundances from High-resolution, Integrated-light Spectroscopy. II. Expanding the Metallicity Range for Old Clusters and Updated Analysis Techniques

    Science.gov (United States)

    Colucci, Janet E.; Bernstein, Rebecca A.; McWilliam, Andrew

    2017-01-01

    We present abundances of globular clusters (GCs) in the Milky Way and Fornax from integrated-light (IL) spectra. Our goal is to evaluate the consistency of the IL analysis relative to standard abundance analysis for individual stars in those same clusters. This sample includes an updated analysis of seven clusters from our previous publications and results for five new clusters that expand the metallicity range over which our technique has been tested. We find that the [Fe/H] measured from IL spectra agrees to ∼0.1 dex for GCs with metallicities as high as [Fe/H] = ‑0.3, but the abundances measured for more metal-rich clusters may be underestimated. In addition we systematically evaluate the accuracy of abundance ratios, [X/Fe], for Na i, Mg i, Al i, Si i, Ca i, Ti i, Ti ii, Sc ii, V i, Cr i, Mn i, Co i, Ni i, Cu i, Y ii, Zr i, Ba ii, La ii, Nd ii, and Eu ii. The elements for which the IL analysis gives results that are most similar to analysis of individual stellar spectra are Fe i, Ca i, Si i, Ni i, and Ba ii. The elements that show the greatest differences include Mg i and Zr i. Some elements show good agreement only over a limited range in metallicity. More stellar abundance data in these clusters would enable more complete evaluation of the IL results for other important elements. This paper includes data gathered with the 6.5 m Magellan Telescopes located at Las Campanas Observatory, Chile.

  9. CLUSTERING ANALYSIS OF OFFICER'S BEHAVIOURS IN LONDON POLICE FOOT PATROL ACTIVITIES

    Directory of Open Access Journals (Sweden)

    J. Shen

    2015-07-01

    In this short paper we present a framework for the conceptual representation and clustering analysis of police officers’ patrol patterns, obtained by mining their raw movement trajectory data. This is achieved with a model developed to account for the spatio-temporal dynamics of human movements by incorporating both the behaviour features of the travellers and the semantic meaning of the environment they are moving in. Hence, the similarity metric of traveller behaviours is jointly defined according to the stay time allocation in each spatio-temporal region of interest (ST-ROI) to support clustering analysis of patrol behaviours. The proposed framework enables the analysis of behaviour and preferences at a higher level based on raw movement trajectories. The model is first applied to police patrol data provided by the Metropolitan Police and will be tested on other types of dataset afterwards.

  10. Cluster analysis of midlatitude oceanic cloud regimes: mean properties and temperature sensitivity

    Directory of Open Access Journals (Sweden)

    N. D. Gordon

    2010-07-01

    Clouds play an important role in the climate system by reducing the amount of shortwave radiation reaching the surface and the amount of longwave radiation escaping to space. Accurate simulation of clouds in computer models remains elusive, however, pointing to a lack of understanding of the connection between large-scale dynamics and cloud properties. This study uses a k-means clustering algorithm to group 21 years of satellite cloud data over midlatitude oceans into seven clusters, and demonstrates that the cloud clusters are associated with distinct large-scale dynamical conditions. Three clusters correspond to low-level cloud regimes with different cloud fraction and cumuliform or stratiform characteristics, but all occur under large-scale descent and a relatively dry free troposphere. Three clusters correspond to vertically extensive cloud regimes with tops in the middle or upper troposphere, and they differ according to the strength of large-scale ascent and enhancement of tropospheric temperature and humidity. The final cluster is associated with a lower troposphere that is dry and an upper troposphere that is moist and experiencing weak ascent and horizontal moist advection.

    Since the present balance of reflection of shortwave and absorption of longwave radiation by clouds could change as the atmosphere warms from increasing anthropogenic greenhouse gases, we must also better understand how increasing temperature modifies cloud and radiative properties. We therefore undertake an observational analysis of how midlatitude oceanic clouds change with temperature when dynamical processes are held constant (i.e., partial derivative with respect to temperature). For each of the seven cloud regimes, we examine the difference in cloud and radiative properties between warm and cold subsets. To avoid misinterpreting a cloud response to large-scale dynamical forcing as a cloud response to temperature, we require horizontal and vertical

  11. Bayesian Analysis of Two Stellar Populations in Galactic Globular Clusters II: NGC 5024, NGC 5272, and NGC 6352

    CERN Document Server

    Wagner-Kaiser, R; Robinson, E; von Hippel, T; Sarajedini, A; van Dyk, D A; Stein, N; Jefferys, W H

    2016-01-01

    We use Cycle 21 Hubble Space Telescope (HST) observations and HST archival ACS Treasury observations of Galactic Globular Clusters to find and characterize two stellar populations in NGC 5024 (M53), NGC 5272 (M3), and NGC 6352. For these three clusters, both single and double-population analyses are used to determine a best fit isochrone(s). We employ a sophisticated Bayesian analysis technique to simultaneously fit the cluster parameters (age, distance, absorption, and metallicity) that characterize each cluster. For the two-population analysis, unique population level helium values are also fit to each distinct population of the cluster and the relative proportions of the populations are determined. We find differences in helium ranging from $\sim$0.05 to 0.11 for these three clusters. Model grids with solar $\alpha$-element abundances ([$\alpha$/Fe] = 0.0) and enhanced $\alpha$-elements ([$\alpha$/Fe] = 0.4) are adopted.

  12. A tensor analysis to evaluate the effect of high-pull headgear on Class II malocclusions.

    Science.gov (United States)

    Ngan, P; Scheick, J; Florman, M

    1993-03-01

    The inaccuracies inherent in cephalometric analysis of treatment effects are well known. The objective of this article is to present a more reliable research tool in the analysis of cephalometric data. Bookstein introduced a dilation function by means of a homogeneous deformation tensor as a method of describing changes in cephalometric data. His article gave an analytic description of the deformation tensor that permits the rapid and highly accurate calculation of it on a desktop computer. The first part of this article describes the underlying ideas and mathematics. The second part uses the tensor analysis to analyze the cephalometric results of a group of patients treated with high-pull activator (HPA) to demonstrate the application of this research tool. Eight patients with Class II skeletal open bite malocclusions in the mixed dentition were treated with HPA. A control sample consisting of eight untreated children with Class II who were obtained from The Ohio State University Growth Study was used as a comparison group. Lateral cephalograms taken before and at the completion of treatment were traced, digitized, and analyzed with the conventional method and tensor analysis. The results showed that HPA had little or no effect on maxillary skeletal structures. However, reduction in growth rate was found with the skeletal triangle S-N-A, indicating a posterior tipping and torquing of the maxillary incisors. The treatment also induced additional deformation on the mandible in a downward and slightly forward direction. Together with the results from the conventional cephalometric analysis, HPA seemed to provide the vertical and rotational control of the maxilla during orthopedic Class II treatment by inhibiting the downward and forward eruptive path of the upper posterior teeth. The newly designed computer software permits rapid analysis of cephalometric data with the tensor analysis on a desktop computer. This tool may be useful in analyzing growth changes for
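
    The homogeneous deformation of a landmark triangle such as S-N-A can be sketched as follows. This is a minimal illustration, assuming the tensor is summarized by the principal dilatations of the affine map between the triangle before and after treatment; the function name and the coordinates are hypothetical, and this is not the software described in the article.

```python
import math


def principal_dilatations(before, after):
    """Return (largest, smallest) principal stretch of the affine map that
    carries triangle `before` onto triangle `after` (three (x, y) landmarks each)."""
    (x0, y0), (x1, y1), (x2, y2) = before
    (u0, v0), (u1, v1), (u2, v2) = after
    # Edge vectors from the first landmark, stored as matrix columns.
    b11, b12, b21, b22 = x1 - x0, x2 - x0, y1 - y0, y2 - y0
    a11, a12, a21, a22 = u1 - u0, u2 - u0, v1 - v0, v2 - v0
    det_b = b11 * b22 - b12 * b21
    if det_b == 0:
        raise ValueError("landmarks of the 'before' triangle are collinear")
    # Inverse of the 'before' edge matrix B.
    i11, i12, i21, i22 = b22 / det_b, -b12 / det_b, -b21 / det_b, b11 / det_b
    # Deformation gradient F = A * B^-1.
    f11 = a11 * i11 + a12 * i21
    f12 = a11 * i12 + a12 * i22
    f21 = a21 * i11 + a22 * i21
    f22 = a21 * i12 + a22 * i22
    # Symmetric tensor C = F^T F; its eigenvalues are the squared stretches.
    c11 = f11 * f11 + f21 * f21
    c12 = f11 * f12 + f21 * f22
    c22 = f12 * f12 + f22 * f22
    mean = (c11 + c22) / 2
    radius = math.hypot((c11 - c22) / 2, c12)
    return math.sqrt(mean + radius), math.sqrt(max(mean - radius, 0.0))


# A triangle stretched to twice its length along x and unchanged along y.
print(principal_dilatations([(0, 0), (1, 0), (0, 1)], [(0, 0), (2, 0), (0, 1)]))
```

    A pure rotation or translation of the triangle gives stretches of 1 in both directions, so the measure responds only to shape and size change and not to how the cephalogram was positioned.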

  13. Exploring somatization types among patients in Indonesia: latent class analysis using the Adult Symptom Inventory

    Directory of Open Access Journals (Sweden)

    Wahyu Widhiarso

    2014-12-01

    Background: The aim of this study was to explore somatization types by reducing patient complaints to their most basic and parsimonious characteristics. We hypothesized that there were latent groups representing distinct types of somatization. Participants and procedure: Data were collected from patients undergoing both inpatient and outpatient treatment at two hospitals in Yogyakarta, Indonesia (N = 212). Results: Results from latent class analysis revealed four classes of somatization: two classes (Classes 1 and 2) referring to levels of somatization and two classes (Classes 3 and 4) referring to unique types of somatization. The first two classes (Classes 1 and 2; low and high levels of somatization, respectively) corresponded to the number of different symptoms that patients reported out of the list of physical symptoms in the Adult Symptom Inventory. The second two classes (Classes 3 and 4; non-serious and critical complaints, respectively) corresponded to two different sets of symptoms. Patients in Class 3 tended to report temporary mild complaints that are common in daily life, such as dizziness, nausea, and stomach pain. Patients in Class 4 tended to report severe complaints and medical problems that require serious treatment or medication, such as deafness or blindness. Conclusions: The present study does not confirm somatization as a unidimensional experience reflecting a general tendency to report somatic symptoms, but rather supports the understanding of somatization as a multidimensional construct.

  14. Cluster Analysis of Chinese Pumpkin Inbred Lines

    Institute of Scientific and Technical Information of China (English)

    杜晓华; 李小梅; 李新峥

    2008-01-01

    A cluster analysis of Chinese pumpkin inbred lines was carried out using the UPGMA method on Euclidean genetic distances. Seven important agronomic traits of 46 Chinese pumpkin inbred lines were investigated. The results indicated that the 46 inbred lines clustered into 4 groups and that inter-group distances were larger than intra-group distances. The genetic distances between parents were related to F1 performance, and the clustering results could increase the effectiveness of Chinese pumpkin cross-breeding.
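    The UPGMA procedure named in this abstract can be illustrated with a minimal sketch. The trait vectors below are invented for illustration (the abstract does not list the measured values), and the function name `upgma_groups` is hypothetical; the code simply merges the two clusters with the smallest average pairwise Euclidean distance until the requested number of groups remains.

```python
from itertools import combinations
from math import dist


def upgma_groups(points, n_groups):
    """Agglomerate points with average linkage (UPGMA) until n_groups remain."""
    clusters = [[i] for i in range(len(points))]

    def average_distance(a, b):
        # UPGMA: unweighted mean of all pairwise distances between two clusters.
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_groups:
        # Merge the pair of clusters with the smallest average distance.
        a, b = min(combinations(clusters, 2), key=lambda pair: average_distance(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(sorted(a + b))
    return sorted(clusters)


# Hypothetical standardized trait vectors for five inbred lines (not from the study).
lines = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (10.0, 0.0)]
groups = upgma_groups(lines, 3)
print(groups)
```

    In practice the traits would usually be standardized first so that no single trait dominates the Euclidean distance; that step is assumed to have been done here.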

  15. Study and Analysis of K-Means Clustering Algorithm Using Rapidminer

    Directory of Open Access Journals (Sweden)

    Abhinn Pandey

    2014-12-01

    An institution is a place where the teacher explains and the student simply understands and learns the lesson. Every student has his own definition of toughness and easiness, and there is no absolute scale for measuring knowledge, but examination scores indicate a student's performance. In this case study, knowledge of data mining is combined with educational strategies to improve students' performance. Generally, data mining (sometimes called data or knowledge discovery) is the process of analysing data from different perspectives and summarizing it into useful information. Data mining software is one of a number of analytical tools for data. It allows users to analyse data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases. Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). This project describes the use of a clustering data mining technique to improve the efficiency of academic performance in educational institutions. A live experiment was conducted on students: an exam was given to computer science majors using MOODLE (LMS), the data generated were analysed using RapidMiner (data mining software), and clustering was then performed on the data. This method helps to identify the students who need special advising or counselling by the teacher in order to receive a high quality of education.
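    As a rough illustration of the k-means idea described in this abstract, the sketch below clusters a handful of invented exam scores. It is plain Python rather than RapidMiner, and the scores, the starting centroids and the function name `kmeans_1d` are assumptions made for the example, not details from the study.

```python
def kmeans_1d(values, centers, max_iter=100):
    """Lloyd's algorithm on scalar values; `centers` are the starting centroids."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest centroid.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda k: abs(v - centers[k]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its members.
        updated = [sum(c) / len(c) if c else centers[k] for k, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return centers, clusters


# Hypothetical exam scores (not taken from the study).
scores = [35, 40, 42, 60, 65, 68, 88, 90, 95]
centers, clusters = kmeans_1d(scores, centers=[30, 60, 100])
print(clusters[0])  # the lowest-scoring group, i.e. candidates for extra advising
```

    The result of k-means depends on the starting centroids, so a real analysis would typically try several initializations and compare them.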

  16. Cluster Mass Calibration at High Redshift: HST Weak Lensing Analysis of 13 Distant Galaxy Clusters from the South Pole Telescope Sunyaev-Zel'dovich Survey

    CERN Document Server

    Schrabback, T; Dietrich, J P; Hoekstra, H; Bocquet, S; Gonzalez, A H; von der Linden, A; McDonald, M; Morrison, C B; Raihan, S F; Allen, S W; Bayliss, M; Benson, B A; Bleem, L E; Chiu, I; Desai, S; Foley, R J; de Haan, T; High, F W; Hilbert, S; Mantz, A B; Massey, R; Mohr, J; Reichardt, C L; Saro, A; Simon, P; Stern, C; Stubbs, C W; Zenteno, A

    2016-01-01

    We present an HST/ACS weak gravitational lensing analysis of 13 massive high-redshift (z_median=0.88) galaxy clusters discovered in the South Pole Telescope (SPT) Sunyaev-Zel'dovich Survey. This study is part of a larger campaign that aims to robustly calibrate mass-observable scaling relations over a wide range in redshift to enable improved cosmological constraints from the SPT cluster sample. We introduce new strategies to ensure that systematics in the lensing analysis do not degrade constraints on cluster scaling relations significantly. First, we efficiently remove cluster members from the source sample by selecting very blue galaxies in V-I colour. Our estimate of the source redshift distribution is based on CANDELS data, where we carefully mimic the source selection criteria of the cluster fields. We apply a statistical correction for systematic photometric redshift errors as derived from Hubble Ultra Deep Field data and verified through spatial cross-correlations. We account for the impact of lensing...

  17. WHY DO SOME NATIONS SUCCEED AND OTHERS FAIL IN INTERNATIONAL COMPETITION? FACTOR ANALYSIS AND CLUSTER ANALYSIS AT EUROPEAN LEVEL

    Directory of Open Access Journals (Sweden)

    Popa Ion

    2015-07-01

    As stated by Michael Porter (1998: 57), 'this is perhaps the most frequently asked economic question of our times.' However, a widely accepted answer is still missing. The aim of this paper is not to provide the BIG answer for such a BIG question, but rather to provide a different perspective on competitiveness at the national level. To this end, we followed a two-step procedure called "tandem analysis" (OECD, 2008). First we employed a Factor Analysis to reveal the underlying factors of the initial dataset, followed by a Cluster Analysis that classifies the 35 countries according to the main characteristics of competitiveness resulting from the Factor Analysis. The findings show that clustering the 35 states on the first two factors, Smart Growth and Market Development, which together recover almost 76% of the common variability of the twelve original variables, yields four clusters, along with useful information for analysing and discussing the characteristics of those clusters.

  18. Interactive Parallel Data Analysis within Data-Centric Cluster Facilities using the IPython Notebook

    Science.gov (United States)

    Pascoe, S.; Lansdowne, J.; Iwi, A.; Stephens, A.; Kershaw, P.

    2012-12-01

    The data deluge is making traditional analysis workflows for many researchers obsolete. Support for parallelism within popular tools such as MATLAB, IDL and NCO is not well developed and rarely used. However, parallelism is necessary for processing modern data volumes on a timescale conducive to curiosity-driven analysis. Furthermore, for peta-scale datasets such as the CMIP5 archive, it is no longer practical to bring an entire dataset to a researcher's workstation for analysis, or even to their institutional cluster. Therefore, there is an increasing need to develop new analysis platforms which both enable processing at the point of data storage and provide parallelism. Such an environment should, where possible, maintain the convenience and familiarity of our current analysis environments to encourage curiosity-driven research. We describe how we are combining the interactive Python shell (IPython) with our JASMIN data-cluster infrastructure. IPython has been specifically designed to bridge the gap between HPC-style parallel workflows and the opportunistic curiosity-driven analysis usually carried out using domain-specific languages and scriptable tools. IPython offers a web-based interactive environment, the IPython notebook, and a cluster engine for parallelism, all underpinned by the well-respected Python/SciPy scientific programming stack. JASMIN is designed to support the data analysis requirements of the UK and European climate and earth system modeling community. JASMIN, with its sister facility CEMS focusing on the earth observation community, has 4.5 PB of fast parallel disk storage alongside over 370 computing cores which provide local computation. Through the IPython interface to JASMIN, users can make efficient use of JASMIN's multi-core virtual machines to perform interactive analysis on all cores simultaneously or can configure IPython clusters across multiple VMs. Larger-scale clusters can be provisioned through JASMIN's batch scheduling system

  19. Bayesian model accounting for within-class biological variability in Serial Analysis of Gene Expression (SAGE)

    Directory of Open Access Journals (Sweden)

    Brentani Helena

    2004-08-01

    Background: An important challenge for transcript counting methods such as Serial Analysis of Gene Expression (SAGE), "Digital Northern" or Massively Parallel Signature Sequencing (MPSS) is to carry out statistical analyses that account for the within-class variability, i.e., variability due to the intrinsic biological differences among sampled individuals of the same class, and not only variability due to technical sampling error. Results: We introduce a Bayesian model that accounts for the within-class variability by means of a mixture distribution. We show that the previously available approaches of aggregation in pools ("pseudo-libraries") and the Beta-Binomial model are particular cases of the mixture model. We illustrate our method with a brain tumor vs. normal comparison using SAGE data from public databases. We show examples of tags regarded as differentially expressed with high significance if the within-class variability is ignored, but clearly not so significant if one accounts for it. Conclusion: Using available information about biological replicates, one can transform a list of candidate transcripts showing differential expression into a more reliable one. Our method is freely available, under GPL/GNU copyleft, through a user-friendly web-based on-line tool or as R language scripts at the supplemental web-site.

  20. Countries population determination to test rice crisis indicator at national level using k-means cluster analysis

    Science.gov (United States)

    Hidayat, Y.; Purwandari, T.; Sukono; Ariska, Y. D.

    2017-01-01

    This study aimed to identify the population of countries that are similar to Indonesia on three characteristics: democratic atmosphere, rice consumption and purchasing power for rice. It serves as reference material for research testing the strength and predictability of the rice crisis indicator Unprecedented Restlessness (UR). Countries similar to Indonesia were identified using a multivariate method, non-hierarchical k-means cluster analysis, with 38 countries as the data population. The analysis was repeated until it yielded a number of clusters able to show the differentiating power of the three characteristics and to describe high similarity within clusters. The results show that 6 clusters can describe the differentiating power of the characteristics of the formed clusters. However, to answer the purpose of the study, only one cluster is taken, in accordance with the success criteria for the population of countries similar to Indonesia: the cluster contains Indonesia, it includes countries that did and did not sustain a rice crisis in 2008, and it has the largest membership. These criteria are met by cluster 2, which consists of 22 countries, namely Indonesia, Brazil, Costa Rica, Djibouti, Dominican Republic, Ecuador, Fiji, Guinea-Bissau, Haiti, India, Jamaica, Japan, South Korea, Madagascar, Malaysia, Mali, Nicaragua, Panama, Peru, Senegal, Sierra Leone and Suriname.

  1. Principal component cluster analysis of ECG time series based on Lyapunov exponent spectrum

    Institute of Scientific and Technical Information of China (English)

    WANG Nai; RUAN Jiong

    2004-01-01

    In this paper we propose an approach of principal component cluster analysis based on Lyapunov exponent spectrum (LES) to analyze the ECG time series. Analysis results of 22 sample-files of ECG from the MIT-BIH database confirmed the validity of our approach. Another technique named improved teacher selecting student (TSS) algorithm is presented to analyze unknown samples by means of some known ones, which is of better accuracy. This technique combines the advantages of both statistical and nonlinear dynamical methods and is shown to be significant to the analysis of nonlinear ECG time series.

  2. Clustering by Pattern Similarity

    Institute of Scientific and Technical Information of China (English)

    Hai-xun Wang; Jian Pei

    2008-01-01

    The task of clustering is to identify classes of similar objects among a set of objects. The definition of similarity varies from one clustering model to another. However, in most of these models the concept of similarity is often based on such metrics as Manhattan distance, Euclidean distance or other Lp distances. In other words, similar objects must have close values in at least a set of dimensions. In this paper, we explore a more general type of similarity. Under the pCluster model we proposed, two objects are similar if they exhibit a coherent pattern on a subset of dimensions. The new similarity concept models a wide range of applications. For instance, in DNA microarray analysis, the expression levels of two genes may rise and fall synchronously in response to a set of environmental stimuli. Although the magnitude of their expression levels may not be close, the patterns they exhibit can be very much alike. Discovery of such clusters of genes is essential in revealing significant connections in gene regulatory networks. E-commerce applications, such as collaborative filtering, can also benefit from the new model, because it is able to capture not only the closeness of values of certain leading indicators but also the closeness of (purchasing, browsing, etc.) patterns exhibited by the customers. In addition to the novel similarity model, this paper also introduces an effective and efficient algorithm to detect such clusters, and we perform tests on several real and synthetic data sets to show its performance.
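    The abstract does not spell out how a "coherent pattern" is tested. The sketch below assumes the pScore criterion usually associated with the pCluster model: for two objects x, y and two dimensions a, b, pScore = |(x_a - x_b) - (y_a - y_b)|, and a set of objects on a set of dimensions qualifies when every such 2x2 submatrix has pScore at most a threshold delta. The function names and the gene profiles are hypothetical.

```python
from itertools import combinations


def p_score(x, y, a, b):
    """pScore of the 2x2 submatrix formed by objects x, y on dimensions a, b."""
    return abs((x[a] - x[b]) - (y[a] - y[b]))


def is_delta_pcluster(objects, dims, delta):
    """True if every 2x2 submatrix over `objects` x `dims` has pScore <= delta."""
    return all(
        p_score(x, y, a, b) <= delta
        for x, y in combinations(objects, 2)
        for a, b in combinations(dims, 2)
    )


# Hypothetical expression profiles (not from the paper).
gene_1 = [10, 20, 15, 30]
gene_2 = [110, 120, 115, 130]  # same rise-and-fall pattern, shifted by 100
gene_3 = [10, 5, 40, 2]        # unrelated pattern

shifted_pair_ok = is_delta_pcluster([gene_1, gene_2], dims=[0, 1, 2, 3], delta=1)
mixed_ok = is_delta_pcluster([gene_1, gene_2, gene_3], dims=[0, 1, 2, 3], delta=1)
print(shifted_pair_ok, mixed_ok)
```

    Note that gene_1 and gene_2 are far apart under Euclidean distance yet pass the pattern test, which is the point the abstract makes about magnitude versus pattern. This brute-force check is only a definition check, not the efficient detection algorithm the paper describes.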

  3. Diagrammatic analysis of correlations in polymer fluids: Cluster diagrams via Edwards’ field theory

    Science.gov (United States)

    Morse, David C.

    2006-10-01

    Edwards' functional integral approach to the statistical mechanics of polymer liquids is amenable to a diagrammatic analysis in which free energies and correlation functions are expanded as infinite sums of Feynman diagrams. This analysis is shown to lead naturally to a perturbative cluster expansion that is closely related to the Mayer cluster expansion developed for molecular liquids by Chandler and co-workers. Expansion of the functional integral representation of the grand-canonical partition function yields a perturbation theory in which all quantities of interest are expressed as functionals of a monomer-monomer pair potential, as functionals of intramolecular correlation functions of non-interacting molecules, and as functions of molecular activities. In different variants of the theory, the pair potential may be either a bare or a screened potential. A series of topological reductions yields a renormalized diagrammatic expansion in which collective correlation functions are instead expressed diagrammatically as functionals of the true single-molecule correlation functions in the interacting fluid, and as functions of molecular number density. Similar renormalized expansions are also obtained for a collective Ornstein-Zernike direct correlation function, and for intramolecular correlation functions. A concise discussion is given of the corresponding Mayer cluster expansion, and of the relationship between the Mayer and perturbative cluster expansions for liquids of flexible molecules. The application of the perturbative cluster expansion to coarse-grained models of dense multi-component polymer liquids is discussed, and a justification is given for the use of a loop expansion. As an example, the formalism is used to derive a new expression for the wave-number dependent direct correlation function and recover known expressions for the intramolecular two-point correlation function to first-order in a renormalized loop expansion for coarse-grained models of

  4. Analysis of Decision Trees in Context Clustering of Hidden Markov Model Based Thai Speech Synthesis

    Directory of Open Access Journals (Sweden)

    Suphattharachai Chomphan

    2011-01-01

    Problem statement: In Thai speech synthesis using a Hidden Markov model (HMM) based synthesis system, the tonal speech quality is degraded due to tone distortion. This major problem must be treated appropriately to preserve the tone characteristics of each syllable unit. Since tone contributes to the intelligibility of the synthesized speech, the tone questions and other phonetic questions in the tree-based context clustering process need to be established accordingly. Approach: This study describes the analysis of questions in the tree-based context clustering process of an HMM-based speech synthesis system for the Thai language. In the system, spectrum, pitch or F0 and state duration are modeled simultaneously in a unified HMM framework, and their parameter distributions are clustered independently using a decision-tree based context clustering technique. The contextual factors which affect spectrum, pitch and duration, i.e., part of speech, position and number of phones in a syllable, position and number of syllables in a word, position and number of words in a sentence, phone type and tone type, are taken into account for constructing the questions of the decision tree. All in all, thirteen sets of questions are analyzed in comparison. Results: In the experiment, we analyzed the decision trees by counting the number of questions in each node coming from those thirteen sets and by calculating the dominance score given to each question as the reciprocal of the distance from the root node to the question node. The highest number and dominance score belong to the phonetic type set, while the second and third highest belong to the part of speech and tone type sets. Conclusion: By counting the number of questions in each node and calculating the dominance score, we can set the priority of each question set. All in all, the analysis results bring about further development of Thai speech synthesis with efficient context clustering process in
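    The dominance score described in this abstract (the reciprocal of a question node's distance from the root) can be sketched as follows. Two details are assumptions made here because the abstract does not state them: the root is counted at distance 1 so that its reciprocal is defined, and scores are summed per question set. The tree layout and the function name `dominance_scores` are hypothetical.

```python
from collections import defaultdict


def dominance_scores(tree, depth=1, totals=None):
    """Sum 1/depth per question set over all question nodes of a binary tree.

    `tree` is (question_set, yes_subtree, no_subtree), or None for a leaf.
    Assumption: the root sits at depth 1, so it contributes a score of 1.
    """
    if totals is None:
        totals = defaultdict(float)
    if tree is None:
        return totals
    question_set, yes_branch, no_branch = tree
    totals[question_set] += 1.0 / depth
    dominance_scores(yes_branch, depth + 1, totals)
    dominance_scores(no_branch, depth + 1, totals)
    return totals


# Hypothetical toy decision tree with three question sets.
toy_tree = (
    "phone_type",
    ("tone_type", None, None),
    ("phone_type", ("part_of_speech", None, None), None),
)
scores = dict(dominance_scores(toy_tree))
print(sorted(scores.items(), key=lambda item: -item[1]))
```

    Ranking the question sets by this total gives the kind of priority ordering the abstract reports, with questions asked near the root weighing more than those asked deep in the tree.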

  5. Joint Analysis of Galaxy-Galaxy Lensing and Galaxy Clustering: Methodology and Forecasts for DES

    CERN Document Server

    Park, Y; Dodelson, S; Jain, B; Amara, A; Becker, M R; Bridle, S L; Clampitt, J; Crocce, M; Fosalba, P; Gaztanaga, E; Honscheid, K; Rozo, E; Sobreira, F; Sánchez, C; Wechsler, R H; Abbott, T; Abdalla, F B; Allam, S; Benoit-Lévy, A; Bertin, E; Brooks, D; Buckley-Geer, E; Burke, D L; Rosell, A Carnero; Kind, M Carrasco; Carretero, J; Castander, F J; da Costa, L N; DePoy, D L; Desai, S; Dietrich, J P; Gerdes, D W; Gruen, D; Gruendl, R A; Gutierrez, G; James, D J; Kent, S; Kuehn, K; Kuropatkin, N; Lima, M; Maia, M A G; Marshall, J L; Melchior, P; Miller, C J; Sanchez, E; Scarpine, V; Schubnell, M; Sevilla-Noarbe, I; Soares-Santos, M; Suchyta, E; Swanson, M E C; Tarle, G; Thaler, J; Vikram, V; Walker, A R; Weller, J; Zuntz, J

    2015-01-01

    The joint analysis of galaxy-galaxy lensing and galaxy clustering is a promising method for inferring the growth function of large scale structure. This analysis will be carried out on data from the Dark Energy Survey (DES), with its measurements of both the distribution of galaxies and the tangential shears of background galaxies induced by these foreground lenses. We develop a practical approach to modeling the assumptions and systematic effects affecting small scale lensing, which provides halo masses, and large scale galaxy clustering. Introducing parameters that characterize the halo occupation distribution (HOD), photometric redshift uncertainties, and shear measurement errors, we study how external priors on different subsets of these parameters affect our growth constraints. Degeneracies within the HOD model, as well as between the HOD and the growth function, are identified as the dominant source of complication, with other systematic effects sub-dominant. The impact of HOD parameters and their degen...

  6. Data Clustering

    Science.gov (United States)

    Wagstaff, Kiri L.

    2012-03-01

    On obtaining a new data set, the researcher is immediately faced with the challenge of obtaining a high-level understanding from the observations. What does a typical item look like? What are the dominant trends? How many distinct groups are included in the data set, and how is each one characterized? Which observable values are common, and which rarely occur? Which items stand out as anomalies or outliers from the rest of the data? This challenge is exacerbated by the steady growth in data set size [11] as new instruments push into new frontiers of parameter space, via improvements in temporal, spatial, and spectral resolution, or by the desire to "fuse" observations from different modalities and instruments into a larger-picture understanding of the same underlying phenomenon. Data clustering algorithms provide a variety of solutions for this task. They can generate summaries, locate outliers, compress data, identify dense or sparse regions of feature space, and build data models. It is useful to note up front that "clusters" in this context refer to groups of items within some descriptive feature space, not (necessarily) to "galaxy clusters" which are dense regions in physical space. The goal of this chapter is to survey a variety of data clustering methods, with an eye toward their applicability to astronomical data analysis. In addition to improving the individual researcher’s understanding of a given data set, clustering has led directly to scientific advances, such as the discovery of new subclasses of stars [14] and gamma-ray bursts (GRBs) [38]. All clustering algorithms seek to identify groups within a data set that reflect some observed, quantifiable structure. Clustering is traditionally an unsupervised approach to data analysis, in the sense that it operates without any direct guidance about which items should be assigned to which clusters. There has been a recent trend in the clustering literature toward supporting semisupervised or constrained

  7. Accuracy of a class of concurrent algorithms for transient finite element analysis

    Science.gov (United States)

    Ortiz, Michael; Sotelino, Elisa D.; Nour-Omid, Bahram

    1988-01-01

    The accuracy of a new class of concurrent procedures for transient finite element analysis is examined. A phase error analysis is carried out which shows that wave retardation leading to unacceptable loss of accuracy may occur if a Courant condition based on the dimensions of the subdomains is violated. Numerical tests suggest that this Courant condition is conservative for typical structural applications and may lead to a marked increase in accuracy as the number of subdomains is increased. Theoretical speed-up ratios are derived which suggest that the algorithms under consideration can be expected to exhibit a performance superior to that of globally implicit methods when implemented on parallel machines.

  8. Emergent team roles in organizational meetings: Identifying communication patterns via cluster analysis.

    OpenAIRE

    Lehmann-Willenbrock, N.K.; Beck, S.J.; Kauffeld, S.

    2016-01-01

    Previous team role taxonomies have largely relied on self-report data, focused on functional roles, and described individual predispositions or personality traits. Instead, this study takes a communicative approach and proposes that team roles are produced, shaped, and sustained in communicative behaviors. To identify team roles communicatively, 59 regular organizational meetings were videotaped and analyzed. Cluster analysis revealed five emergent roles: the solution seeker, the problem anal...

  9. A Study on Differences of China’s Regional Economic Development Level Based on Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Qi Yaoyuan

    2015-01-01

    An evaluation index system of regional economic development is established in this paper, and STATA 11.0 is used for cluster analysis on a sample of 31 provincial regions. Results indicate that, except for a few developed regions, the economy of most regions is still at a backward stage and that China’s economic polarization is quite serious. This study provides a reference for the coordinated and rapid development of China’s economy.

  10. Dietary Patterns Derived by Cluster Analysis are Associated with Cognitive Function among Korean Older Adults

    OpenAIRE

    Jihye Kim; Areum Yu; Bo Youl Choi; Jung Hyun Nam; Mi Kyung Kim; Dong Hoon Oh; Yoon Jung Yang

    2015-01-01

    The objective of this study was to investigate major dietary patterns among older Korean adults through cluster analysis and to determine an association between dietary patterns and cognitive function. This is a cross-sectional study. Data from the Korean Multi-Rural Communities Cohort Study were used. The sample included 765 participants aged 60 years and over. A quantitative food frequency questionnaire with 106 items was used to investigate dietary intake. The Korean version of the M...

  11. Using cluster analysis for the estimation of efficiency of strategic management of the region enterprises

    Directory of Open Access Journals (Sweden)

    Feklistova Inessa

    2016-02-01

    The article presents a methodical approach to estimating the efficiency of strategic management of enterprises in the region using cluster analysis, implemented by means of a specially developed application package. The necessity of applying it in the analytical work of the economic services of the region's enterprises is demonstrated. It will make it possible to improve the quality of monitoring and to scientifically substantiate strategic administrative decisions.

  12. A THREE-STEP SPATIAL-TEMPORAL-SEMANTIC CLUSTERING METHOD FOR HUMAN ACTIVITY PATTERN ANALYSIS

    Directory of Open Access Journals (Sweden)

    W. Huang

    2016-06-01

    How people move in cities and what they do in various locations at different times form human activity patterns. Human activity patterns play a key role in urban planning, traffic forecasting, public health and safety, emergency response, friend recommendation, and so on. Therefore, scholars from different fields, such as social science, geography, transportation, physics and computer science, have made great efforts in modelling and analysing human activity patterns or human mobility patterns. One of the essential tasks in such studies is to find the locations or places where individuals stay to perform some kind of activity before further activity pattern analysis. In the era of Big Data, the emergence of social media along with wearable devices enables human activity data to be collected more easily and efficiently. Furthermore, the dimension of the accessible human activity data has been extended from two to three (space or space-time) to four dimensions (space, time and semantics). More specifically, not only the location and time at which people stay are collected, but also what people “say” at a location at a given time can be obtained. The characteristics of these datasets shed new light on the analysis of human mobility, and some new methodologies should accordingly be developed to handle them. Traditional methods such as neural networks, statistics and clustering have been applied to study human activity patterns using geosocial media data. Among them, clustering methods have been widely used to analyse spatiotemporal patterns. However, to the best of our knowledge, few clustering algorithms have been specifically developed for handling datasets that contain spatial, temporal and semantic aspects all together. In this work, we propose a three-step human activity clustering method based on space, time and semantics to fill this gap. One year of Twitter data, posted in Toronto, Canada, is used to test the clustering-based method. The

  13. Cluster analysis for identifying sub-groups and selecting potential discriminatory variables in human encephalitis

    Directory of Open Access Journals (Sweden)

    Crowcroft Natasha S

    2010-12-01

    Background: Encephalitis is an acute clinical syndrome of the central nervous system (CNS), often associated with fatal outcome or permanent damage, including cognitive and behavioural impairment, affective disorders and epileptic seizures. Infection of the central nervous system is considered to be a major cause of encephalitis and more than 100 different pathogens have been recognized as causative agents. However, a large proportion of cases have unknown disease etiology. Methods: We perform hierarchical cluster analysis on a multicenter England encephalitis data set with the aim of identifying sub-groups in human encephalitis. We use the simple matching similarity measure, which is appropriate for binary data sets, and perform variable selection using cluster heatmaps. We also use heatmaps to visually assess underlying patterns in the data, identify the main clinical and laboratory features and identify potential risk factors associated with encephalitis. Results: Our results identified fever, personality and behavioural change, headache and lethargy as the main characteristics of encephalitis. Diagnostic variables such as brain scan and measurements from cerebrospinal fluid are also identified as main indicators of encephalitis. Our analysis revealed six major clusters in the England encephalitis data set. However, marked within-cluster heterogeneity is observed in some of the large clusters, indicating possible sub-groups. Overall, the results show that patients are clustered according to symptom and diagnostic variables rather than causal agents. Exposure variables such as recent infection, sick person contact and animal contact have been identified as potential risk factors. Conclusions: It is generally assumed, and is common practice, to group encephalitis cases according to disease etiology. However, our results indicate that patients are clustered mainly with respect to symptom and diagnostic variables rather than causal agents
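    The simple matching similarity measure mentioned in the Methods is the fraction of binary attributes on which two records agree, counting both shared presences and shared absences. A minimal sketch follows; the patient records and the symptom ordering are invented for illustration and are not taken from the data set.

```python
def simple_matching(a, b):
    """Fraction of binary attributes on which two records agree (1-1 or 0-0)."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical presence/absence records: fever, headache, lethargy, seizures.
patient_a = [1, 1, 0, 0]
patient_b = [1, 0, 0, 0]
patient_c = [0, 0, 1, 1]

sim_ab = simple_matching(patient_a, patient_b)  # 3 of 4 attributes agree
sim_ac = simple_matching(patient_a, patient_c)  # no attribute agrees
print(sim_ab, sim_ac)
```

    For hierarchical clustering, 1 minus this similarity can serve as the dissimilarity between patients. Unlike the Jaccard coefficient, simple matching treats a shared absence as informative, which is a modelling choice worth checking against the data.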

  14. Seismic clusters analysis in North-Eastern Italy by the nearest-neighbor approach

    Science.gov (United States)

    Peresan, Antonella; Gentili, Stefania

    2016-04-01

    The main features of earthquake clusters in the Friuli Venezia Giulia Region (North Eastern Italy) are explored, with the aim to get some new insights on local scale patterns of seismicity in the area. The study is based on a systematic analysis of robustly and uniformly detected seismic clusters of small-to-medium magnitude events, as opposed to selected clusters analyzed in earlier studies. To characterize the features of seismicity for FVG, we take advantage of updated information from local OGS bulletins, compiled at the National Institute of Oceanography and Experimental Geophysics, Centre of Seismological Research, since 1977. A preliminary reappraisal of the earthquake bulletins is carried out, in order to identify possible missing events and to remove spurious records (e.g. duplicates and explosions). The area of sufficient completeness is outlined; for this purpose, different techniques are applied, including a comparative analysis with global ISC data, which are available in the region for large and moderate size earthquakes. Various techniques are considered to estimate the average parameters that characterize the earthquake occurrence in the region, including the b-value and the fractal dimension of epicenters distribution. Specifically, besides the classical Gutenberg-Richter Law, the Unified Scaling Law for Earthquakes, USLE, is applied. Using the updated and revised OGS data, a new formal method for detection of earthquake clusters, based on nearest-neighbor distances of events in space-time-energy domain, is applied. The bimodality of the distribution, which characterizes the earthquake nearest-neighbor distances, is used to decompose the seismic catalog into sequences of individual clusters and background seismicity. Accordingly, the method allows for a data-driven identification of main shocks (first event with the largest magnitude in the cluster), foreshocks and aftershocks. Average robust estimates of the USLE parameters (particularly, b
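    The abstract does not give the formula behind the nearest-neighbor distance in the space-time-energy domain. The sketch below assumes the commonly used form eta = dt * r^d * 10^(-b * m), where dt is the time from the earlier (parent) event to the later one, r their epicentral distance, m the parent magnitude, b the Gutenberg-Richter b-value and d the fractal dimension of epicenters. The parameter values, units and events are hypothetical, and the bimodality-based split into clusters and background is not implemented.

```python
from math import hypot, inf


def proximity(parent, child, b=1.0, d=1.6):
    """Space-time-magnitude distance from an earlier event to a later one."""
    dt = child["t"] - parent["t"]
    if dt <= 0:
        return inf  # a parent must precede its child
    r = hypot(child["x"] - parent["x"], child["y"] - parent["y"])
    return dt * r**d * 10 ** (-b * parent["m"])


def nearest_parents(events):
    """For each event, index of the earlier event with the smallest proximity."""
    parents = []
    for j, child in enumerate(events):
        candidates = [(proximity(p, child), i) for i, p in enumerate(events) if i != j]
        eta, i = min(candidates, default=(inf, None))
        parents.append(i if eta < inf else None)
    return parents


# Hypothetical catalog: time in days, position in km, magnitude.
events = [
    {"t": 0.0, "x": 0.0, "y": 0.0, "m": 5.0},
    {"t": 1.0, "x": 1.0, "y": 0.0, "m": 2.0},
    {"t": 2.0, "x": 1.5, "y": 0.0, "m": 2.5},
]
parents = nearest_parents(events)
print(parents)
```

    Here both later events attach to the magnitude 5 event even though the third event is spatially and temporally closer to the second, because the magnitude term strongly favors large parents. Thresholding the resulting distances would then separate clustered events from background seismicity.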

  15. A Three-Step Spatial-Temporal Clustering Method for Human Activity Pattern Analysis

    Science.gov (United States)

    Huang, W.; Li, S.; Xu, S.

    2016-06-01

    How people move in cities and what they do in various locations at different times form human activity patterns. Human activity patterns play a key role in urban planning, traffic forecasting, public health and safety, emergency response, friend recommendation, and so on. Therefore, scholars from different fields, such as social science, geography, transportation, physics and computer science, have made great efforts in modelling and analysing human activity patterns or human mobility patterns. One of the essential tasks in such studies is to find the locations or places where individuals stay to perform some kind of activity before further activity pattern analysis. In the era of Big Data, the emergence of social media, along with wearable devices, enables human activity data to be collected more easily and efficiently. Furthermore, the dimension of the accessible human activity data has been extended from two or three dimensions (space or space-time) to four (space, time and semantics). More specifically, not only are the location and time at which people stay collected, but what people "say" at a given location and time can also be obtained. The characteristics of these datasets shed new light on the analysis of human mobility, and new methodologies should accordingly be developed to handle them. Traditional methods such as neural networks, statistics and clustering have been applied to study human activity patterns using geosocial media data. Among them, clustering methods have been widely used to analyse spatiotemporal patterns. However, to the best of our knowledge, few clustering algorithms have been specifically developed for handling datasets that contain spatial, temporal and semantic aspects all together. In this work, we propose a three-step human activity clustering method based on space, time and semantics to fill this gap. One year of Twitter data posted in Toronto, Canada, is used to test the clustering-based method. The results show that the

  16. Gamma-rays from annihilating dark matter in galaxy clusters: stacking vs single source analysis

    CERN Document Server

    Nezri, E; Combet, C; Maurin, D; Pointecouteau, E; Hinton, J A

    2012-01-01

    Clusters of galaxies are potentially important targets for indirect searches for dark matter annihilation. Here, we reassess the detection prospects for annihilation in massive halos, based on a statistical investigation of 1743 clusters from the recent MCXC meta-catalogue. We derive a new data-driven limit for the extra-galactic DM annihilation background $J_{\mathrm{extra-gal}} > J_{\mathrm{Gal}}/5$ and consider a source-stacking approach. The number of clusters scales with their brightness (boosted by DM substructures) to the power of -2 for an integration angle of 0.1 deg. It suggests that stacking may provide a significant improvement over a single target analysis for gamma-ray observations at high energies where the angular resolution achievable is comparable to this angle. In our study the mean angle containing 80% of the dark-matter signal for the entire sample (assuming an NFW DM profile) is 0.15 deg. It indicates that instruments with this angular resolution or better would be optimal for a cluster annihilation search based on stac...

  17. Tully-Fisher analysis of the multiple cluster system Abell 901/902

    CERN Document Server

    Bösch, Benjamin; Wolf, Christian; Aragón-Salamanca, Alfonso; Ziegler, Bodo L; Barden, Marco; Gray, Meghan E; Balogh, Michael; Meisenheimer, Klaus; Schindler, Sabine

    2013-01-01

    We derive rotation curves from optical emission lines of 182 disk galaxies (96 in the cluster and 86 in the field) in the region of Abell 901/902 located at $z\sim 0.165$. We focus on the analysis of B-band and stellar-mass Tully-Fisher relations. We examine possible environmental dependencies and differences between normal spirals and "dusty red" galaxies, i.e. disk galaxies that have red colors due to relatively low star formation rates. We find no significant differences between the best-fit TF slope of cluster and field galaxies. At fixed slope, the field population with high-quality rotation curves (57 objects) is brighter by $\Delta M_{B}=-0.42\pm0.15$ mag than the cluster population (55 objects). We show that this slight difference is at least in part an environmental effect. The scatter of the cluster TFR increases for galaxies closer to the core region, also indicating an environmental effect. Interestingly, dusty red galaxies become fainter towards the core at given rotation velocity (i.e. total mas...

  18. Sequencing and transcriptional analysis of the biosynthesis gene cluster of putrescine-producing Lactococcus lactis.

    Science.gov (United States)

    Ladero, Victor; Rattray, Fergal P; Mayo, Baltasar; Martín, María Cruz; Fernández, María; Alvarez, Miguel A

    2011-09-01

    Lactococcus lactis is a prokaryotic microorganism with great importance as a culture starter and has become the model species among the lactic acid bacteria. The long and safe history of use of L. lactis in dairy fermentations has resulted in the classification of this species as GRAS (Generally Recognized As Safe) or QPS (Qualified Presumption of Safety). However, our group has identified several strains of L. lactis subsp. lactis and L. lactis subsp. cremoris that are able to produce putrescine from agmatine via the agmatine deiminase (AGDI) pathway. Putrescine is a biogenic amine that confers undesirable flavor characteristics and may even have toxic effects. The AGDI cluster of L. lactis is composed of a putative regulatory gene, aguR, followed by the genes (aguB, aguD, aguA, and aguC) encoding the catabolic enzymes. These genes are transcribed as an operon that is induced in the presence of agmatine. In some strains, an insertion sequence (IS) element interrupts the transcription of the cluster, which results in a non-putrescine-producing phenotype. Based on this knowledge, a PCR-based test was developed in order to differentiate nonproducing L. lactis strains from those with a functional AGDI cluster. The analysis of the AGDI cluster and its flanking regions revealed that the capacity to produce putrescine via the AGDI pathway could be a specific characteristic that was lost during the adaptation to the milk environment by a process of reductive genome evolution.

  19. A Method for Traffic Congestion Clustering Judgment Based on Grey Relational Analysis

    Directory of Open Access Journals (Sweden)

    Yingya Zhang

    2016-05-01

    Traffic congestion clustering judgment is a fundamental problem in the study of traffic jam warning. However, it is not satisfactory to judge traffic congestion degrees using only vehicle speed. In this paper, we collect traffic flow information with three properties (traffic flow velocity, traffic flow density and traffic volume) of urban trunk roads, which is used to judge the traffic congestion degree. We first define a grey relational clustering model by leveraging grey relational analysis and rough set theory to mine relationships of multidimensional-attribute information. Then, we propose a grey relational membership degree rank clustering algorithm (GMRC) to discriminate clustering priority and further analyze the urban traffic congestion degree. Our experimental results show that the average accuracy of the GMRC algorithm is 24.9% greater than that of the K-means algorithm and 30.8% greater than that of the Fuzzy C-Means (FCM) algorithm. Furthermore, we find that our method can be more conducive to dynamic traffic warnings.
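
    As a rough illustration of the grey relational analysis that this abstract builds on, the sketch below computes the textbook (Deng) grey relational grade of several sequences against a reference sequence. It is not the GMRC algorithm itself, which the abstract does not specify; the function name `grey_relational_grades`, the distinguishing coefficient `rho = 0.5` and the sample values are assumptions for illustration only.

```python
def grey_relational_grades(reference, candidates, rho=0.5):
    """Deng's grey relational grade of each candidate against the reference.

    reference: list of normalised attribute values, e.g. velocity, density
               and volume scaled to [0, 1] (the scaling is assumed done).
    candidates: list of sequences with the same length as reference.
    rho: distinguishing coefficient, conventionally 0.5.
    """
    deltas = [[abs(r - c) for r, c in zip(reference, cand)]
              for cand in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        # Every candidate coincides with the reference.
        return [1.0] * len(candidates)
    grades = []
    for row in deltas:
        # Grey relational coefficient per attribute, then its mean.
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))
    return grades

# Hypothetical normalised (velocity, density, volume) profiles.
free_flow = [1.0, 1.0, 1.0]
segments = [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]
grades = grey_relational_grades(free_flow, segments)
```

    A grade close to 1 means the segment behaves like the reference profile. Ranking or clustering segments by such grades is one plausible reading of the approach, stated here as an assumption.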

  20. A spatial cluster analysis of tractor overturns in Kentucky from 1960 to 2002

    Science.gov (United States)

    Saman, D.M.; Cole, H.P.; Odoi, A.; Myers, M.L.; Carey, D.I.; Westneat, S.C.

    2012-01-01

    Background: Agricultural tractor overturns without rollover protective structures are the leading cause of farm fatalities in the United States. To our knowledge, no studies have incorporated the spatial scan statistic in identifying high-risk areas for tractor overturns. The aim of this study was to determine whether tractor overturns cluster in certain parts of Kentucky and identify factors associated with tractor overturns. Methods: A spatial statistical analysis using Kulldorff's spatial scan statistic was performed to identify county clusters at greatest risk for tractor overturns. A regression analysis was then performed to identify factors associated with tractor overturns. Results: The spatial analysis revealed a cluster of higher than expected tractor overturns in four counties in northern Kentucky (RR = 2.55) and 10 counties in eastern Kentucky (RR = 1.97). Higher rates of tractor overturns were associated with steeper average percent slope of pasture land by county (p = 0.0002) and a greater percent of total tractors with less than 40 horsepower by county (p [...] tractor overturns exist in Kentucky and identifies factors associated with overturns. This study provides policymakers a guide to targeted county-level interventions (e.g., roll-over protective structures promotion interventions) with the intention of reducing tractor overturns in the highest risk counties in Kentucky. © 2012 Saman et al.
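
    The relative risk (RR) values reported in this abstract come from a spatial scan statistic. The sketch below shows, for a single candidate zone, the standard Poisson-model log-likelihood ratio and relative risk. The function names and the counts in the example are hypothetical, and a full scan would also enumerate candidate zones and assess significance by Monte Carlo replication, which is omitted here.

```python
import math

def poisson_scan_llr(cases_in, expected_in, cases_total):
    """Log-likelihood ratio for one candidate zone under a Poisson model.

    expected_in is the expected count inside the zone under the null,
    scaled so that expected counts sum to cases_total over the study area.
    Returns 0.0 when the zone does not show an excess of cases.
    """
    c, e, total = cases_in, expected_in, cases_total
    if not 0 < e < total:
        raise ValueError("expected_in must lie strictly between 0 and cases_total")
    if c <= e:
        return 0.0  # only high-rate zones are of interest
    inside = c * math.log(c / e)
    outside = (total - c) * math.log((total - c) / (total - e)) if total > c else 0.0
    return inside + outside

def relative_risk(cases_in, expected_in, cases_total):
    """Observed/expected ratio inside the zone divided by the ratio outside."""
    outside_cases = cases_total - cases_in
    if outside_cases == 0:
        return math.inf
    return (cases_in / expected_in) / (outside_cases / (cases_total - expected_in))

# Hypothetical zone: 20 observed overturns where 10 were expected, 100 in total.
llr = poisson_scan_llr(20, 10.0, 100)
rr = relative_risk(20, 10.0, 100)
```

    The zone with the largest log-likelihood ratio over all candidates is the most likely cluster. The RR of 2.25 in this toy example is read the same way as the RR = 2.55 and RR = 1.97 in the abstract: risk inside the zone relative to risk outside it.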