WorldWideScience

Sample records for mining cluster analysis

  1. Using Cluster Analysis for Data Mining in Educational Technology Research

    Science.gov (United States)

    Antonenko, Pavlo D.; Toy, Serkan; Niederhauser, Dale S.

    2012-01-01

    Cluster analysis is a group of statistical methods that has great potential for analyzing the vast amounts of web server-log data to understand student learning from hyperlinked information resources. In this methodological paper we provide an introduction to cluster analysis for educational technology researchers and illustrate its use through…

  2. URL Mining Using Agglomerative Clustering Algorithm

    Directory of Open Access Journals (Sweden)

    Chinmay R. Deshmukh

    2015-02-01

    The tremendous growth of the web has driven the application of data mining techniques to web logs. Data mining and the World Wide Web together form an important and active area of research. Web log mining is the analysis of web log files containing sequences of web pages. Web mining is broadly classified into web content mining, web usage mining and web structure mining. Web usage mining is a technique for discovering usage patterns from Web data in order to understand and better serve the needs of Web-based applications. URL mining is a subclass of Web mining that helps us investigate the details of a Uniform Resource Locator, and it can be advantageous in the fields of security and protection. The paper introduces a technique for mining a collection of user transactions with an Internet search engine to discover clusters of similar queries and similar URLs. The information exploited is clickthrough data: each record consists of a user's query to a search engine together with the URL that the user selected from among the candidates offered by the search engine. By viewing this dataset as a bipartite graph, with the vertices on one side corresponding to queries and on the other side to URLs, one can apply an agglomerative clustering algorithm to the graph's vertices to identify related queries and URLs.
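    As a rough illustration of the bipartite view described above, the sketch below clusters only the query side of the graph: it repeatedly merges the two query clusters whose sets of clicked URLs overlap most. The Jaccard similarity, the `threshold` stopping rule and the name `cluster_queries` are assumptions made for illustration, not the paper's algorithm; a fuller version would alternate merges between the query and URL sides.

```python
from itertools import combinations


def cluster_queries(clicks, threshold=0.5):
    """Greedy agglomerative clustering of the query vertices of a query-URL graph.

    clicks: iterable of (query, clicked_url) pairs (clickthrough records).
    The most similar pair of clusters is merged while its Jaccard overlap of
    clicked-URL sets is at least `threshold` (a hypothetical stopping rule).
    """
    # Each query starts as its own cluster, keyed by a frozenset of queries
    # and mapped to the set of URLs clicked for those queries.
    clusters = {}
    for query, url in clicks:
        clusters.setdefault(frozenset([query]), set()).add(url)

    while len(clusters) > 1:
        best_sim, best_pair = 0.0, None
        for a, b in combinations(clusters, 2):
            urls_a, urls_b = clusters[a], clusters[b]
            sim = len(urls_a & urls_b) / len(urls_a | urls_b)  # Jaccard overlap
            if sim > best_sim:
                best_sim, best_pair = sim, (a, b)
        if best_pair is None or best_sim < threshold:
            break  # no remaining pair is similar enough to merge
        a, b = best_pair
        # Merge the two query clusters; their URL neighbourhoods are united.
        clusters[a | b] = clusters.pop(a) | clusters.pop(b)
    return clusters
```

    For example, two queries that led to clicks on the same two travel sites end up in one cluster, while an unrelated query stays on its own.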

  3. Android Malware Clustering through Malicious Payload Mining

    OpenAIRE

    Li, Yuping; Jang, Jiyong; Hu, Xin; Ou, Xinming

    2017-01-01

    Clustering has been well studied for desktop malware analysis as an effective triage method. Conventional similarity-based clustering techniques, however, cannot be immediately applied to Android malware analysis due to the excessive use of third-party libraries in Android application development and the widespread use of repackaging in malware development. We design and implement an Android malware clustering system through iterative mining of malicious payload and checking whether malware s...

  4. Cluster analysis to evaluate stable chemical elements and physical-chemical parameters behavior on uranium mining waste

    Energy Technology Data Exchange (ETDEWEB)

    Pereira, Wagner de Souza; Py Junior, Delcy de Azevedo; Goncalves, Simone, E-mail: wspereira@inb.gov.br [Unidade de Tratamento de Minerio (UTM/INB), Pocos de Caldas, MG (Brazil). Coordenacao de Protecao Radiologica. Grupo Multidisciplinar de Radioprotecao]; Kelecom, Alphonse [Universidade Federal Fluminense (UFF), Niteroi, RJ (Brazil). Inst. de Biologia. Lab. de Radiobiologia e Radiometria Pedro Lopes dos Santos]; Morais, Gustavo Ferrari de; Campelo, Emanuele Lazzaretti Cordova [Unidade de Tratamento de Minerio (UTM/INB), Pocos de Caldas, MG (Brazil). Coordenacao de Desenvolvimento de Processos]; Dores, Luis Augusto de Carvalho Bresser [Unidade de Tratamento de Minerio (UTM/INB), Pocos de Caldas, MG (Brazil). Gerencia de Descomissionamento]

    2011-07-01

    The Ore Treating Unit (UTM, in Portuguese) is a deactivated uranium mine. A cluster analysis was used to evaluate the behavior of stable chemical elements and physical-chemical parameters in its effluents. The cluster analysis proved effective in the assessment, allowing the identification of groups of chemical elements, of physical-chemical parameters, and of both analyzed jointly (elements and parameters). As a result, we may assert, based on data analysis, that there is a strong link between calcium and magnesium and between aluminum and rare-earth oxides in UTM's effluents. Sulphate was also identified as strongly linked to total and dissolved solids, and those to electrical conductivity. There were other associations, but they were less strongly linked. Further data gathering for seasonal evaluation is required to confirm these analyses. Additional statistical analysis (factor analysis) must be used to try to identify the origin of the groups identified in this analysis. (author)

  5. Cluster analysis to evaluate stable chemical elements and physical-chemical parameters behavior on uranium mining waste

    International Nuclear Information System (INIS)

    Pereira, Wagner de Souza; Py Junior, Delcy de Azevedo; Goncalves, Simone; Kelecom, Alphonse; Morais, Gustavo Ferrari de; Campelo, Emanuele Lazzaretti Cordova; Dores, Luis Augusto de Carvalho Bresser

    2011-01-01

    The Ore Treating Unit (UTM, in Portuguese) is a deactivated uranium mine. A cluster analysis was used to evaluate the behavior of stable chemical elements and physical-chemical parameters in its effluents. The cluster analysis proved effective in the assessment, allowing the identification of groups of chemical elements, of physical-chemical parameters, and of both analyzed jointly (elements and parameters). As a result, we may assert, based on data analysis, that there is a strong link between calcium and magnesium and between aluminum and rare-earth oxides in UTM's effluents. Sulphate was also identified as strongly linked to total and dissolved solids, and those to electrical conductivity. There were other associations, but they were less strongly linked. Further data gathering for seasonal evaluation is required to confirm these analyses. Additional statistical analysis (factor analysis) must be used to try to identify the origin of the groups identified in this analysis. (author)

  6. Frequent Pattern Mining Algorithms for Data Clustering

    DEFF Research Database (Denmark)

    Zimek, Arthur; Assent, Ira; Vreeken, Jilles

    2014-01-01

    Discovering clusters in subspaces, or subspace clustering and related clustering paradigms, is a research field where we find many influences related to frequent pattern mining. In fact, as the first algorithms for subspace clustering were based on frequent pattern mining algorithms, it is fair to say that frequent pattern mining was at the cradle of subspace clustering; yet it quickly developed into an independent research field. In this chapter, we discuss how frequent pattern mining algorithms have been extended and generalized towards the discovery of local clusters in high-dimensional data. In particular, we discuss several example algorithms for subspace clustering or projected clustering, and point out recent research questions and open topics in this area relevant to researchers in either clustering or pattern mining...
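    CLIQUE is one of the early subspace clustering algorithms built on this frequent-pattern idea: grid cells ("units") are treated like items, and Apriori's monotonicity (a unit can only be dense if its lower-dimensional projections are dense) prunes the search. The sketch below is a simplified, hypothetical illustration limited to 1-D and 2-D units; `dense_units`, `xi` (intervals per dimension) and `tau` (minimum point count) are assumed names and parameters, and this is not code from the chapter.

```python
from collections import Counter
from itertools import combinations


def dense_units(points, xi, tau, lo=0.0, hi=1.0):
    """Find dense 1-D and 2-D grid units, CLIQUE style (illustrative sketch).

    Each dimension of [lo, hi] is split into `xi` equal intervals; a unit is
    dense if it holds at least `tau` points.
    """
    width = (hi - lo) / xi

    def cell(value):
        # Interval index of a coordinate; the upper edge goes into the last cell.
        return min(int((value - lo) / width), xi - 1)

    dims = len(points[0])
    # Level 1: a (dimension, interval) pair plays the role of a frequent "item".
    counts = Counter((d, cell(p[d])) for p in points for d in range(dims))
    dense1 = {unit for unit, c in counts.items() if c >= tau}

    # Level 2: a dense 2-D unit can only be built from dense 1-D units
    # (Apriori-style monotonicity), so only those pairs are counted.
    dense2 = set()
    for (d1, i1), (d2, i2) in combinations(sorted(dense1), 2):
        if d1 == d2:
            continue  # two intervals of the same dimension do not form a 2-D unit
        support = sum(1 for p in points if cell(p[d1]) == i1 and cell(p[d2]) == i2)
        if support >= tau:
            dense2.add(((d1, i1), (d2, i2)))
    return dense1, dense2
```

    With points that are tightly grouped in dimensions 0 and 1 but spread out in dimension 2, only the (0, 1) subspace yields a dense 2-D unit, which is the kind of local, subspace-specific cluster the chapter describes.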

  7. Cluster Analysis-Based Approaches for Geospatiotemporal Data Mining of Massive Data Sets for Identification of Forest Threats

    Energy Technology Data Exchange (ETDEWEB)

    Mills, Richard T [ORNL]; Hoffman, Forrest M [ORNL]; Kumar, Jitendra [ORNL]; Hargrove Jr., William Walter [USDA Forest Service]

    2011-01-01

    We investigate methods for geospatiotemporal data mining of multi-year land surface phenology data (in this study, Normalized Difference Vegetation Index (NDVI) values at 250 m resolution derived from the Moderate Resolution Imaging Spectroradiometer (MODIS)) for the conterminous United States (CONUS) as part of an early warning system for detecting threats to forest ecosystems. The approaches explored here are based on k-means cluster analysis of this massive data set, which provides a basis for defining the bounds of the expected or normal phenological patterns that indicate healthy vegetation at a given geographic location. We briefly describe the computational approaches we have used to make cluster analysis of such massive data sets feasible, describe approaches we have explored for distinguishing between normal and abnormal phenology, and present some examples in which we have applied these approaches to identify various forest disturbances in the CONUS.
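    A toy version of the idea: fit k-means (Lloyd's algorithm) to baseline profiles, then treat a new profile as abnormal if it lies farther than some distance from every centroid. The short 3-value vectors stand in for multi-date NDVI profiles, and the farthest-point initialisation, the `threshold` rule and the names `kmeans` and `flag_abnormal` are assumptions for illustration; the authors' own normal/abnormal criteria are not given in the abstract.

```python
import math


def kmeans(points, k, iters=100):
    """Lloyd's k-means with a deterministic farthest-point initialisation."""
    # Start from the first point, then repeatedly add the point farthest from
    # the centroids chosen so far (a simplification of k-means++ seeding).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))

    def assign(cents):
        # Assignment step: index of the nearest centroid for every point.
        return [min(range(k), key=lambda j: math.dist(p, cents[j])) for p in points]

    for _ in range(iters):
        labels = assign(centroids)
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Update step: move the centroid to the mean of its members.
                new.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new.append(centroids[j])  # keep an empty cluster's centroid
        if new == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = new
    return centroids, assign(centroids)


def flag_abnormal(observations, centroids, threshold):
    """Indices of observations farther than `threshold` from every centroid."""
    return [i for i, obs in enumerate(observations)
            if min(math.dist(obs, c) for c in centroids) > threshold]
```

    Fitting on baseline years and then scoring new observations against the fixed centroids keeps a disturbed pixel from pulling the definition of "normal" towards itself.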

  8. Clustering analysis

    International Nuclear Information System (INIS)

    Romli

    1997-01-01

    Cluster analysis is the name of a group of multivariate techniques whose principal purpose is to group similar entities on the basis of the characteristics they possess. Several algorithms can be used to perform this analysis. This paper therefore discusses similarity measures and hierarchical clustering algorithms, namely the single linkage, complete linkage and average linkage methods. The non-hierarchical clustering method popularly known as the k-means method is also discussed. Finally, the paper describes the advantages and disadvantages of each method.
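    The three linkage rules differ only in how the distance between two clusters is computed from the distances between their members. The following is a minimal, unoptimised sketch of agglomerative clustering under that assumption (Euclidean distance, stopping at `k` clusters); the function names are hypothetical and not taken from the paper.

```python
import math


def linkage_distance(a, b, points, method):
    """Distance between clusters a and b (lists of point indices) under a linkage rule."""
    dists = [math.dist(points[i], points[j]) for i in a for j in b]
    if method == "single":    # nearest pair of members
        return min(dists)
    if method == "complete":  # farthest pair of members
        return max(dists)
    if method == "average":   # mean over all cross-cluster pairs
        return sum(dists) / len(dists)
    raise ValueError(f"unknown linkage method: {method}")


def agglomerate(points, k, method="single"):
    """Repeatedly merge the two closest clusters until only k clusters remain."""
    clusters = [[i] for i in range(len(points))]  # start with singletons
    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage_distance(clusters[x], clusters[y], points, method)
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is still valid
    return clusters
```

    This naive loop recomputes all pairwise linkages at every merge, which is fine for a handful of points; production libraries update the distance matrix incrementally instead.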

  9. Cluster analysis

    CERN Document Server

    Everitt, Brian S; Leese, Morven; Stahl, Daniel

    2011-01-01

    Cluster analysis comprises a range of methods for classifying multivariate data into subgroups. By organizing multivariate data into such subgroups, clustering can help reveal the characteristics of any structure or patterns present. These techniques have proven useful in a wide range of areas such as medicine, psychology, market research and bioinformatics. This fifth edition of the highly successful Cluster Analysis includes coverage of the latest developments in the field and a new chapter dealing with finite mixture models for structured data. Real-life examples are used throughout to demons…

  10. Data Mining of University Philanthropic Giving: Cluster-Discriminant Analysis and Pareto Effects

    Science.gov (United States)

    Le Blanc, Louis A.; Rucks, Conway T.

    2009-01-01

    A large sample of 33,000 university alumni records was cluster-analyzed to generate six groups, each relatively unique in its attribute values. The attributes used to cluster the former students included average gift to the university's foundation and to the alumni association for the same institution. Cluster detection is useful in this…

  11. Application of Learning Analytics Using Clustering Data Mining for Students' Disposition Analysis

    Science.gov (United States)

    Bharara, Sanyam; Sabitha, Sai; Bansal, Abhay

    2018-01-01

    Learning Analytics (LA) is an emerging field in which sophisticated analytic tools are used to improve learning and education. It draws from, and is closely tied to, a series of other fields of study like business intelligence, web analytics, academic analytics, educational data mining, and action analytics. The main objective of this research…

  12. Cluster analysis

    OpenAIRE

    Mucha, Hans-Joachim; Sofyan, Hizir

    2000-01-01

    As an explorative technique, cluster analysis provides a description or a reduction in the dimension of the data. It classifies a set of observations into two or more mutually exclusive unknown groups based on combinations of many variables. Its aim is to construct groups in such a way that the profiles of objects in the same group are relatively homogeneous whereas the profiles of objects in different groups are relatively heterogeneous. Clustering is distinct from classification techniques, ...

  13. Environmental conflict analysis using an integrated grey clustering and entropy-weight method: A case study of a mining project in Peru.

    OpenAIRE

    Delgado-Villanueva, Kiko Alexi; Romero Gil, Inmaculada

    2016-01-01

    Environmental conflict analysis (henceforth ECA) has become a key factor for the viability of projects and the welfare of affected populations. In this study, we propose an approach to ECA using an integrated grey clustering and entropy-weight method (the IGCEW method). The case study considered a mining project in northern Peru. Three stakeholder groups and seven criteria were identified. The data were gathered by conducting field interviews. The results revealed that for the groups urban ...

  14. Clustering-based approaches to SAGE data mining

    Directory of Open Access Journals (Sweden)

    Wang Haiying

    2008-07-01

    Serial analysis of gene expression (SAGE) is one of the most powerful tools for global gene expression profiling. It has led to several biological discoveries and biomedical applications, such as the prediction of new gene functions and the identification of biomarkers in human cancer research. Clustering techniques have become fundamental approaches in these applications. This paper reviews relevant clustering techniques specifically designed for this type of data. It places an emphasis on current limitations and opportunities in this area for supporting biologically meaningful data mining and visualisation.

  15. Cytokine profile determined by data-mining analysis set into clusters of non-small-cell lung cancer patients according to prognosis.

    Science.gov (United States)

    Barrera, L; Montes-Servín, E; Barrera, A; Ramírez-Tirado, L A; Salinas-Parra, F; Bañales-Méndez, J L; Sandoval-Ríos, M; Arrieta, Ó

    2015-02-01

    Immunoregulatory cytokines may play a fundamental role in tumor growth and metastases. Their effects are mediated through complex regulatory networks. Human cytokine profiles could define patient subgroups and represent new potential biomarkers. The aim of this study was to associate a cytokine profile obtained through data mining with the clinical characteristics of patients with advanced non-small-cell lung cancer (NSCLC). We conducted a prospective study of the plasma levels of 14 immunoregulatory cytokines by ELISA and a cytometric bead array assay in 110 NSCLC patients before chemotherapy and 25 control subjects. Cytokine levels and data-mining profiles were associated with clinical, quality of life and pathological outcomes. NSCLC patients had higher levels of interleukin (IL)-6, IL-8, IL-12p70, IL-17a and interferon (IFN)-γ, and lower levels of IL-33 and IL-29 compared with controls. The pro-inflammatory cytokines IL-1b, IL-6 and IL-8 were associated with lower hemoglobin levels, worse functional performance status (Eastern Cooperative Oncology Group, ECOG), fatigue and hyporexia. The anti-inflammatory cytokines IL-4, IL-10 and IL-33 were associated with anorexia and lower body mass index. We identified three clusters of patients according to data-mining analysis with different overall survival (OS; 25.4, 16.8 and 5.09 months, respectively, P = 0.0012). Multivariate analysis showed that ECOG performance status and data-mining clusters were significantly associated with OS (RR 3.59, [95% CI 1.9-6.7], P < 0.001 and 2.2, [1.2-3.8], P = 0.005). Our results provide evidence that complex cytokine networks may be used to identify patient subgroups with different prognoses in advanced NSCLC. These cytokines may represent potential biomarkers, particularly in the immunotherapy era in cancer research. © The Author 2014. Published by Oxford University Press on behalf of the European Society for Medical Oncology.

  16. Cluster analysis for applications

    CERN Document Server

    Anderberg, Michael R

    1973-01-01

    Cluster Analysis for Applications deals with methods and various applications of cluster analysis. Topics covered range from variables and scales to measures of association among variables and among data units. Conceptual problems in cluster analysis are discussed, along with hierarchical and non-hierarchical clustering methods. The necessary elements of data analysis, statistics, cluster analysis, and computer implementation are integrated vertically to cover the complete path from raw data to a finished analysis. Comprised of 10 chapters, this book begins with an introduction to the subject o…

  17. Cluster Analysis-Based Approaches for Geospatiotemporal Data Mining of Massive Data Sets for Identification of Forest Threats

    Science.gov (United States)

    Richard Tran Mills; Forrest M Hoffman; Jitendra Kumar; William W. Hargrove

    2011-01-01

    We investigate methods for geospatiotemporal data mining of multi-year land surface phenology data (in this study, Normalized Difference Vegetation Index (NDVI) values at 250 m resolution derived from the Moderate Resolution Imaging Spectroradiometer (MODIS)) for the conterminous United States (CONUS) as part of an early warning system for detecting threats to forest ecosystems. The...

  18. Marketing research cluster analysis

    Directory of Open Access Journals (Sweden)

    Marić Nebojša

    2002-01-01

    One area of application of cluster analysis in marketing is the identification of groups of cities and towns with similar demographic profiles. This paper considers the main aspects of cluster analysis through an example of clustering 12 cities using Minitab software.

  19. Marketing research cluster analysis

    OpenAIRE

    Marić Nebojša

    2002-01-01

    One area of application of cluster analysis in marketing is the identification of groups of cities and towns with similar demographic profiles. This paper considers the main aspects of cluster analysis through an example of clustering 12 cities using Minitab software.

  20. Text Mining in Biomedical Domain with Emphasis on Document Clustering.

    Science.gov (United States)

    Renganathan, Vinaitheerthan

    2017-07-01

    With the exponential increase in the number of articles published every year in the biomedical domain, there is a need to build automated systems to extract unknown information from the articles published. Text mining techniques enable the extraction of unknown knowledge from unstructured documents. This paper reviews text mining processes in detail and the software tools available to carry out text mining. It also reviews the roles and applications of text mining in the biomedical domain. Text mining processes, such as search and retrieval of documents, pre-processing of documents, natural language processing, methods for text clustering, and methods for text classification are described in detail. Text mining techniques can facilitate the mining of vast amounts of knowledge on a given topic from published biomedical research articles and draw meaningful conclusions that are not possible otherwise.

  1. Mining the National Career Assessment Examination Result Using Clustering Algorithm

    Science.gov (United States)

    Pagudpud, M. V.; Palaoag, T. T.; Padirayon, L. M.

    2018-03-01

    Education is an essential process today, one that prompts authorities to discover and establish innovative strategies for educational improvement. This study applied data mining using a clustering technique for knowledge extraction from the National Career Assessment Examination (NCAE) results in the Division of Quirino. The NCAE is an examination given to all grade 9 students in the Philippines to assess their aptitudes in the different domains. Clustering the students is helpful in identifying students’ learning considerations. Using the RapidMiner tool, clustering algorithms such as Density-Based Spatial Clustering of Applications with Noise (DBSCAN), k-means, k-medoid, expectation maximization clustering, and support vector clustering were analyzed. The silhouette indexes of these clustering algorithms were compared, and the results showed that the k-means algorithm with k = 3 and a silhouette index of 0.196 is the most appropriate clustering algorithm for grouping the students. Three groups were formed: 477 students in the determined group (cluster 0), 310 proficient students (cluster 1) and 396 developing students (cluster 2). The data mining technique used in this study is essential in extracting useful information from the NCAE results to better understand the abilities of students, which in turn is a good basis for adopting teaching strategies.
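    The silhouette index used above to compare algorithms scores each point by s(i) = (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is its mean distance to the nearest other cluster; values near 1 indicate tight, well-separated clusters. The sketch below assumes Euclidean distance and assigns s(i) = 0 to singleton clusters, as scikit-learn does; RapidMiner's implementation details may differ, and `silhouette` is a hypothetical name.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient s(i) = (b - a) / max(a, b) over all points."""
    scores = []
    for i, p in enumerate(points):
        # Group the distances from point i to every other point by cluster label.
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], [])
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster, or no other cluster: s(i) = 0
            continue
        a = sum(own) / len(own)                                  # mean intra-cluster distance
        b = min(sum(d) / len(d) for d in by_cluster.values())    # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)
```

    To choose k as in the study, one would cluster the same data for several candidate values of k and keep the labelling with the highest mean silhouette.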

  2. Data Mining and Analysis

    Science.gov (United States)

    Samms, Kevin O.

    2015-01-01

    The Data Mining project seeks to bring the capability of data visualization to NASA anomaly and problem reporting systems for the purpose of improving data trending, evaluations, and analyses. Currently, NASA systems are tailored to meet the specific needs of its organizations. This tailoring has led to a variety of nomenclatures and levels of annotation for procedures, parts, and anomalies, making it difficult to recognize the common causes of anomalies. Without a common way to view large data sets, making significant observations and recognizing the connections between these causes is difficult to impossible. In the first phase of the Data Mining project, a portal was created to present a common visualization of normalized sensitive data to customers with the appropriate security access. The visualization tool itself was also developed and fine-tuned. In the second phase of the project, we took on the difficult task of searching and analyzing the target data set for common causes between anomalies. In the final part of the second phase, we learned more about how much of the analysis work will be the job of the Data Mining team, how to perform that work, and how that work may be used by different customers in different ways. In this paper I detail how our perspective has changed after gaining more insight into how the customers wish to interact with the output, and how that has changed the product.

  3. Clustering for data mining a data recovery approach

    CERN Document Server

    Mirkin, Boris

    2005-01-01

    Often considered more an art than a science, the field of clustering has been dominated by learning through examples and by techniques chosen almost through trial and error. Even the most popular clustering methods, K-Means for partitioning the data set and Ward's method for hierarchical clustering, have lacked the theoretical attention that would establish a firm relationship between the two methods and relevant interpretation aids. Rather than the traditional set of ad hoc techniques, Clustering for Data Mining: A Data Recovery Approach presents a theory that not only closes gaps in K-Mean…
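    One concrete link between the two methods is a standard identity: Ward's method merges the pair of clusters that least increases the K-Means criterion (the total within-cluster sum of squared distances to centroids), and that increase equals n_a n_b / (n_a + n_b) times the squared distance between the two centroids. The sketch below checks the identity numerically; it is a textbook illustration with hypothetical function names, not code from the book.

```python
def sse(cluster):
    """Within-cluster sum of squared distances to the centroid (the K-Means criterion)."""
    n = len(cluster)
    centroid = [sum(xs) / n for xs in zip(*cluster)]
    return sum(sum((x - c) ** 2 for x, c in zip(p, centroid)) for p in cluster)


def ward_cost(a, b):
    """Ward distance: increase in total SSE caused by merging clusters a and b."""
    na, nb = len(a), len(b)
    ca = [sum(xs) / na for xs in zip(*a)]
    cb = [sum(xs) / nb for xs in zip(*b)]
    return na * nb / (na + nb) * sum((x - y) ** 2 for x, y in zip(ca, cb))
```

    For A = {(0, 0), (2, 0)} and B = {(10, 0)}, the merged SSE is 56 against 2 + 0 before merging, and the Ward formula gives (2 * 1 / 3) * 81 = 54, the same increase.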

  4. Sentiment Analysis and Opinion Mining

    CERN Document Server

    Liu, Bing

    2012-01-01

    Sentiment analysis and opinion mining is the field of study that analyzes people's opinions, sentiments, evaluations, attitudes, and emotions from written language. It is one of the most active research areas in natural language processing and is also widely studied in data mining, Web mining, and text mining. In fact, this research has spread outside of computer science to the management sciences and social sciences due to its importance to business and society as a whole. The growing importance of sentiment analysis coincides with the growth of social media such as reviews, forum discussions

  5. Fuzzy Modeled K-Cluster Quality Mining of Hidden Knowledge for Decision Support

    OpenAIRE

    S. Parkash Kumar; K. S. Ramaswami

    2011-01-01

    Problem statement: The work presents Fuzzy Modeled K-means Cluster Quality Mining of hidden knowledge for Decision Support. Cluster quality metrics are measured based on the number of clusters, the number of objects in each cluster and its cohesiveness, and precision and recall values. The fuzzy k-means approach is adapted using a heuristic method that iterates over the clusters to form efficient, valid clusters. With the obtained data clusters, quality assessment is made by predictive mining using...

  6. Comprehensive cluster analysis with Transitivity Clustering.

    Science.gov (United States)

    Wittkop, Tobias; Emig, Dorothea; Truss, Anke; Albrecht, Mario; Böcker, Sebastian; Baumbach, Jan

    2011-03-01

    Transitivity Clustering is a method for the partitioning of biological data into groups of similar objects, such as genes, for instance. It provides integrated access to various functions addressing each step of a typical cluster analysis. To facilitate this, Transitivity Clustering is accessible online and offers three user-friendly interfaces: a powerful stand-alone version, a web interface, and a collection of Cytoscape plug-ins. In this paper, we describe three major workflows: (i) protein (super)family detection with Cytoscape, (ii) protein homology detection with incomplete gold standards and (iii) clustering of gene expression data. This protocol guides the user through the most important features of Transitivity Clustering and takes ∼1 h to complete.

  7. Marine data users clustering using data mining technique

    Directory of Open Access Journals (Sweden)

    Farnaz Ghiasi

    2015-09-01

    The objective of this research is to cluster marine data users using a data mining technique. Achieving this objective will enable marine organizations to know their data and their users' requirements. In this research, the CRISP-DM standard model was used to implement the data mining technique. The required data were extracted from a database of 500 marine data user profiles of the Iranian National Institute for Oceanography and Atmospheric Sciences (INIOAS) covering the years 1386 to 1393 (Iranian calendar). The TwoStep algorithm was used for clustering. In this research, clustering was used for the first time to discover patterns between marine data users, such as students, organizations and scientists, and their data requests (data source, data type, data set, parameter and geographic area). The most important clusters are: students with an international data source, the chemistry data type, the “World Ocean Database” data set and the Persian Gulf geographic area; and organizations with the nitrate parameter. Senior managers of marine organizations will be able to make correct decisions concerning their existing data and will be guided in planning better data collection in the future. Data users will also be guided with respect to their requests. Finally, suggestions are offered to improve the performance of marine organizations.

  8. [Cluster analysis in biomedical researches].

    Science.gov (United States)

    Akopov, A S; Moskovtsev, A A; Dolenko, S A; Savina, G D

    2013-01-01

    Cluster analysis is one of the most popular methods for the analysis of multi-parameter data. It reveals the internal structure of the data by grouping individual observations according to their degree of similarity. The review defines the basic concepts of cluster analysis and discusses the most popular clustering algorithms: k-means, hierarchical algorithms and Kohonen network algorithms. Examples of the use of these algorithms in biomedical research are given.

  9. Identification of nitrogen-fixing genes and gene clusters from metagenomic library of acid mine drainage.

    Science.gov (United States)

    Dai, Zhimin; Guo, Xue; Yin, Huaqun; Liang, Yili; Cong, Jing; Liu, Xueduan

    2014-01-01

    Biological nitrogen fixation is an essential function of acid mine drainage (AMD) microbial communities. However, most acidophiles in AMD environments are uncultured microorganisms, and little is known about the diversity of nitrogen-fixing genes and the structure of the nif gene cluster in AMD microbial communities. In this study, we used metagenomic sequencing to isolate nif genes in the AMD microbial community from Dexing Copper Mine, China. Meanwhile, a metagenome microarray containing 7,776 large-insertion fosmids was constructed to screen for novel nif gene clusters. Metagenomic analyses identified 742 sequences as nif genes, including the structural subunit genes nifH, nifD and nifK and various additional genes. The AMD community is massively dominated by the genus Acidithiobacillus. However, the phylogenetic diversity of nitrogen-fixing microorganisms in the AMD community is much higher than previously thought. Furthermore, a 32.5-kb genomic sequence harboring nif, fix and associated genes was screened by metagenome microarray. Comparative genome analysis indicated that most nif genes in this cluster are most similar to those of Herbaspirillum seropedicae, but the organization of the nif gene cluster differed significantly from that of H. seropedicae. Sequence analysis and reverse transcription PCR also suggested that distinct transcription units of nif genes exist in this gene cluster. The nifQ gene falls into the same transcription unit as the fixABCX genes, an arrangement not previously reported in other diazotrophs. All of these results indicate that more novel diazotrophs survive in the AMD community.

  10. Identification of nitrogen-fixing genes and gene clusters from metagenomic library of acid mine drainage.

    Directory of Open Access Journals (Sweden)

    Zhimin Dai

    Biological nitrogen fixation is an essential function of acid mine drainage (AMD) microbial communities. However, most acidophiles in AMD environments are uncultured microorganisms, and little is known about the diversity of nitrogen-fixing genes and the structure of the nif gene cluster in AMD microbial communities. In this study, we used metagenomic sequencing to isolate nif genes in the AMD microbial community from Dexing Copper Mine, China. Meanwhile, a metagenome microarray containing 7,776 large-insertion fosmids was constructed to screen for novel nif gene clusters. Metagenomic analyses identified 742 sequences as nif genes, including the structural subunit genes nifH, nifD and nifK and various additional genes. The AMD community is massively dominated by the genus Acidithiobacillus. However, the phylogenetic diversity of nitrogen-fixing microorganisms in the AMD community is much higher than previously thought. Furthermore, a 32.5-kb genomic sequence harboring nif, fix and associated genes was screened by metagenome microarray. Comparative genome analysis indicated that most nif genes in this cluster are most similar to those of Herbaspirillum seropedicae, but the organization of the nif gene cluster differed significantly from that of H. seropedicae. Sequence analysis and reverse transcription PCR also suggested that distinct transcription units of nif genes exist in this gene cluster. The nifQ gene falls into the same transcription unit as the fixABCX genes, an arrangement not previously reported in other diazotrophs. All of these results indicate that more novel diazotrophs survive in the AMD community.

  11. Identification of Nitrogen-Fixing Genes and Gene Clusters from Metagenomic Library of Acid Mine Drainage

    Science.gov (United States)

    Yin, Huaqun; Liang, Yili; Cong, Jing; Liu, Xueduan

    2014-01-01

    Biological nitrogen fixation is an essential function of acid mine drainage (AMD) microbial communities. However, most acidophiles in AMD environments are uncultured microorganisms, and little is known about the diversity of nitrogen-fixing genes and the structure of the nif gene cluster in AMD microbial communities. In this study, we used metagenomic sequencing to isolate nif genes in the AMD microbial community from Dexing Copper Mine, China. Meanwhile, a metagenome microarray containing 7,776 large-insertion fosmids was constructed to screen for novel nif gene clusters. Metagenomic analyses identified 742 sequences as nif genes, including the structural subunit genes nifH, nifD and nifK and various additional genes. The AMD community is massively dominated by the genus Acidithiobacillus. However, the phylogenetic diversity of nitrogen-fixing microorganisms in the AMD community is much higher than previously thought. Furthermore, a 32.5-kb genomic sequence harboring nif, fix and associated genes was screened by metagenome microarray. Comparative genome analysis indicated that most nif genes in this cluster are most similar to those of Herbaspirillum seropedicae, but the organization of the nif gene cluster differed significantly from that of H. seropedicae. Sequence analysis and reverse transcription PCR also suggested that distinct transcription units of nif genes exist in this gene cluster. The nifQ gene falls into the same transcription unit as the fixABCX genes, an arrangement not previously reported in other diazotrophs. All of these results indicate that more novel diazotrophs survive in the AMD community. PMID:24498417

  12. EOQ estimation for imperfect quality items using association rule mining with clustering

    Directory of Open Access Journals (Sweden)

    Mandeep Mittal

    2015-09-01

    Full Text Available Timely identification of newly emerging trends is needed in business processes. Data mining techniques such as clustering, association rule mining and classification are very important for business support and decision making. This paper presents a method for redesigning the ordering policy by including the cross-selling effect. Initially, association rules are mined on the transactional database and the EOQ (economic order quantity) is estimated together with the revenue earned. Then, transactions are clustered to obtain homogeneous clusters, and association rules are mined in each cluster to estimate the EOQ and the revenue earned for each cluster. Further, this paper compares ordering policies for imperfect quality items developed by applying rules derived from the Apriori algorithm, viz. (a) without clustering the transactions and (b) after clustering the transactions. A numerical example is presented to validate the results.
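
    The per-cluster rule mining step can be illustrated with a minimal sketch. It assumes transactions are lists of item names and that cluster labels are already available; it only mines one-item-to-one-item rules (a level-2 Apriori pass with support and confidence thresholds) and does not reproduce the paper's EOQ model. The function name `pair_rules`, the thresholds and the toy baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Mine rules X -> Y between single items (a level-2 Apriori pass).

    Returns sorted tuples (antecedent, consequent, support, confidence).
    """
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in set(t))
    # Apriori pruning: only frequent single items can appear in frequent pairs
    frequent = {i for i, c in item_counts.items() if c / n >= min_support}
    pair_counts = Counter(
        pair
        for t in transactions
        for pair in combinations(sorted(set(t) & frequent), 2)
    )
    rules = []
    for (a, b), c in pair_counts.items():
        support = c / n
        if support < min_support:
            continue
        # A frequent pair yields up to two rules, one per direction
        for x, y in ((a, b), (b, a)):
            confidence = c / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, round(support, 3), round(confidence, 3)))
    return sorted(rules)


if __name__ == "__main__":
    # Hypothetical transactions already split into two clusters
    clusters = {
        0: [["bread", "milk"], ["bread", "milk", "eggs"], ["bread", "eggs"]],
        1: [["soap", "shampoo"], ["soap", "shampoo", "towel"]],
    }
    for label, txns in clusters.items():
        print(label, pair_rules(txns))
```

    Running the miner once on all transactions and once per cluster mirrors the "without clustering" versus "after clustering" comparison; the rules would then feed whatever demand model the EOQ estimate uses.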

  13. Fuzzy C-Means Clustering Model Data Mining For Recognizing Stock Data Sampling Pattern

    Directory of Open Access Journals (Sweden)

    Sylvia Jane Annatje Sumarauw

    2007-06-01

    Full Text Available Abstract The capital market benefits both companies and investors. For investors, the capital market provides two economic advantages, namely dividends and capital gains, and a non-economic one, namely a voting share in the General Meeting of Shareholders. However, it can also penalize shareholders. To protect themselves from this risk, investors should predict the prospects of their companies. Because shares are an abstract commodity, share quality is determined by the validity of the company's profile information. Information on stock value fluctuations from the Jakarta Stock Exchange can be a useful consideration and a good measure for data analysis. With the aim of protecting shareholders from risk, this research focuses on stock data sample categories, or stock data sample patterns, using the Fuzzy c-Means Clustering Model, which provides useful information for investors. The research analyzes stock data such as the Individual Index, Volume and Amount for the Property and Real Estate Emitter Group at the Jakarta Stock Exchange from January 1 to December 31, 2004. The mining process follows the Cross Industry Standard Process for Data Mining (CRISP-DM), a cycle with the following steps: Business Understanding, Data Understanding, Data Preparation, Modelling, Evaluation and Deployment. In the modelling step, the Fuzzy c-Means Clustering Model is applied. The Fuzzy c-Means Clustering Model can analyze stock data in a large database with many complex variables, especially for finding data sample patterns, and can then be used to build a Fuzzy Inference System that maps inputs to outputs using fuzzy logic based on the recognized patterns. Keywords: Data Mining, Fuzzy c-Means Clustering Model, Pattern Recognition
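
    For readers unfamiliar with the model, here is a minimal NumPy sketch of the standard fuzzy c-means iteration: membership-weighted cluster centers, then an inverse-distance membership update. The fuzzifier `m = 2`, the tolerance and the random initialization are common defaults assumed here, not values from the paper, and stock variables such as Individual Index, Volume and Amount would need to be scaled before clustering.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Return (centers, U) where U[i, k] is the membership of sample i in cluster k."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships; each row sums to 1
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        W = U ** m
        # Cluster centers are membership-weighted means of the samples
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distance from every sample to every center (epsilon avoids division by zero)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # Standard FCM update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1))
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U
```

    Unlike k-means, every sample keeps a graded membership in every cluster, which is what makes the output convenient as input to a fuzzy inference system.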

  14. Fuzzy Clustering: An Approach for Mining Usage Profiles from Web

    OpenAIRE

    Ms. Archana N. Boob; Prof. D. M. Dakhane

    2012-01-01

    Web usage mining is an application of data mining technology to the data in web server log files. It can discover users' browsing patterns and correlations between web pages. Web usage mining supports web site design, personalization servers and other business decision making. Web mining applies data mining, artificial intelligence, chart technology and so on to web data and traces users' visiting characteris...

  15. Integrative cluster analysis in bioinformatics

    CERN Document Server

    Abu-Jamous, Basel; Nandi, Asoke K

    2015-01-01

    Clustering techniques are increasingly being put to use in the analysis of high-throughput biological datasets. Novel computational techniques to analyse high throughput data in the form of sequences, gene and protein expressions, pathways, and images are becoming vital for understanding diseases and future drug discovery. This book details the complete pathway of cluster analysis, from the basics of molecular biology to the generation of biological knowledge. The book also presents the latest clustering methods and clustering validation, thereby offering the reader a comprehensive review o

  16. antiSMASH 3.0—a comprehensive resource for the genome mining of biosynthetic gene clusters

    DEFF Research Database (Denmark)

    Weber, Tilmann; Blin, Kai; Duddela, Srikanth

    2015-01-01

    Microbial secondary metabolism constitutes a rich source of antibiotics, chemotherapeutics, insecticides and other high-value chemicals. Genome mining of gene clusters that encode the biosynthetic pathways for these metabolites has become a key methodology for novel compound discovery. In 2011, we introduced antiSMASH, a web server and stand-alone tool for the automatic genomic identification and analysis of biosynthetic gene clusters, available at http://antismash.secondarymetabolites.org. Here, we present version 3.0 of antiSMASH, which has undergone major improvements. A full integration of the recently published ClusterFinder algorithm now allows using this probabilistic algorithm to detect putative gene clusters of unknown types. Also, a new dereplication variant of the ClusterBlast module now identifies similarities of identified clusters to any of 1172 clusters with known end products...

  17. Cluster analysis of track structure

    International Nuclear Information System (INIS)

    Michalik, V.

    1991-01-01

    One possible way to classify track structures is to apply conventional partitioning techniques for multidimensional data analysis to the track structure. Using these cluster algorithms, this paper attempts to find characteristics of radiation that reflect the spatial distribution of ionizations in the primary particle track. An absolute frequency distribution of clusters of ionizations, giving the mean number of clusters produced by radiation per unit of deposited energy, can serve as such a characteristic. The general computation techniques used and the methods for calculating cluster distributions for different radiations are discussed. 8 refs.; 5 figs

  18. An improved Pearson's correlation proximity-based hierarchical clustering for mining biological association between genes.

    Science.gov (United States)

    Booma, P M; Prabhakaran, S; Dhanalakshmi, R

    2014-01-01

    Microarray gene expression datasets have attracted great attention among molecular biologists, statisticians and computer scientists. Data mining, which extracts hidden and useful information from datasets, fails to identify the most significant biological associations between genes. A heuristic search for standard biological processes measures only the gene expression level, threshold and response time. Heuristic search identifies and mines the best biological solution, but the association process was not efficiently addressed. To monitor higher rates of expression levels between genes, a hierarchical clustering model was proposed in which the biological association between genes is measured simultaneously using an improved Pearson's correlation proximity measure (PCPHC). Additionally, the Seed Augment algorithm adopts average linkage methods on rows and columns in order to expand a seed PCPHC model into a maximal global PCPHC (GL-PCPHC) model and to identify associations between the clusters. Moreover, GL-PCPHC applies a pattern-growing method to mine the PCPHC patterns. Compared with existing gene expression analysis, the PCPHC model achieves better performance. Experimental evaluations of the GL-PCPHC model are conducted with standard benchmark gene expression datasets extracted from the UCI repository and the GenBank database in terms of execution time, pattern size, significance level, biological association efficiency and pattern quality.
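
    To show the basic idea of Pearson-correlation proximity combined with average linkage, here is a naive NumPy sketch. It uses the common distance 1 - r between gene expression profiles and plain average linkage; it is not the improved PCPHC measure or the Seed Augment algorithm described above, and the function names are hypothetical. The cubic-time loop is only suitable for small examples.

```python
import numpy as np


def pearson_distance(X):
    """Distance 1 - Pearson r between rows (genes) of an expression matrix."""
    return 1.0 - np.corrcoef(np.asarray(X, dtype=float))


def average_linkage(D, n_clusters):
    """Naive agglomerative clustering with average linkage on a distance matrix."""
    clusters = [[i] for i in range(D.shape[0])]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean pairwise distance between members of the two clusters
                dist = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        # a < b, so deleting index b leaves index a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)
```

    Because 1 - r ignores scale, genes whose expression rises and falls together end up close even when their absolute levels differ.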

  19. MINING ON CAR DATABASE EMPLOYING LEARNING AND CLUSTERING ALGORITHMS

    OpenAIRE

    Muhammad Rukunuddin Ghalib; Shivam Vohra; Sunish Vohra; Akash Juneja

    2013-01-01

    In data mining, classification is a form of data analysis that can be used to extract models describing important data classes. Two well-known learning algorithms are Naïve Bayesian (NB) and SMO (Sequential Minimal Optimization). These two learning algorithms are applied to a car review database, and the resulting model, once trained, predicts the characteristic of a review comment. It was found that the model successfully predicted correctly about the review comm...
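
    The Naïve Bayesian learner named above can be illustrated with a minimal multinomial model using add-one (Laplace) smoothing, written in standard-library Python. The whitespace tokenization, the two labels and the toy reviews are hypothetical; SMO (an algorithm for training support vector machines) is not shown, and this is not the authors' experimental setup.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with add-alpha smoothing."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, y in zip(docs, labels):
        word_counts[y].update(doc.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_docs, word_counts, vocab, alpha


def predict_nb(model, doc):
    """Return the label with the highest log posterior for a review."""
    class_docs, word_counts, vocab, alpha = model
    total_docs = sum(class_docs.values())
    best, best_score = None, -math.inf
    for y, n_y in class_docs.items():
        total_words = sum(word_counts[y].values())
        # log P(y) plus the sum of smoothed log P(word | y)
        score = math.log(n_y / total_docs)
        for w in doc.lower().split():
            score += math.log((word_counts[y][w] + alpha) / (total_words + alpha * len(vocab)))
        if score > best_score:
            best, best_score = y, score
    return best
```

    Working in log space avoids numeric underflow when a review contains many words.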

  20. Fatal accidents analysis in Peruvian mining industry

    International Nuclear Information System (INIS)

    Candia, R. C.; Hennies, W. T.; Azevedo, R. C.; Almeida, I. G.; Soto, J. F.

    2010-01-01

    Although reductions in the rate of injuries and accidents have been observed in recent years, mining is still one of the highest-risk industries. The basic causes of fatalities can be attributed to unsafe conditions and unsafe acts. In this context, it is necessary to identify safety problems and to pursue effective solutions. On the other hand, the dependence of developing countries on primary industries such as mining is evident. In the Peruvian economy, approximately 16% of GNP and more than 50% of exports are due to the mining sector, highlighting its competitive position in worldwide mining. This paper presents an analysis of fatal accidents in the Peruvian mining industry based on the register of fatal accidents that occurred from 2000 to 2007, identifying the main types of accidents. The primary source of information is the General Mining Directorate (DGM) of the Peruvian Ministry of Energy and Mines (MEM). The majority of victims belong to outsourced contractor companies that provide services to mining companies. The results of the analysis also show that the majority of accidents happened in underground mines, and that effective risk-management solutions are needed to reduce fatal accident rates. (Author)

  1. Data mining theories, algorithms, and examples

    CERN Document Server

    Ye, Nong

    2013-01-01

    AN OVERVIEW OF DATA MINING METHODOLOGIES: Introduction to data mining methodologies. METHODOLOGIES FOR MINING CLASSIFICATION AND PREDICTION PATTERNS: Regression models; Bayes classifiers; Decision trees; Multi-layer feedforward artificial neural networks; Support vector machines; Supervised clustering. METHODOLOGIES FOR MINING CLUSTERING AND ASSOCIATION PATTERNS: Hierarchical clustering; Partitional clustering; Self-organized map; Probability distribution estimation; Association rules; Bayesian networks. METHODOLOGIES FOR MINING DATA REDUCTION PATTERNS: Principal components analysis; Multi-dimensional scaling; Latent variable anal...

  2. Critical analysis of the Colombian mining legislation

    International Nuclear Information System (INIS)

    Vargas P, Elkin; Gonzalez S, Carmen Lucia

    2003-01-01

    The document analyzes the Colombian mining legislation, Act 685 of 2001, based on the reasons expressed by the government and the miners for its conception and approval. The document tries to determine the progress achieved by this new Mining Code with respect to international mining competitiveness and its adaptation to the constitutional rules on the environment, indigenous communities, decentralization and sustainable development. The analysis formulates general and specific hypotheses about the proposed objectives of the reform, which are confronted with the arguments and critical evaluations of the results. Most hypotheses are not verified, demonstrating that the Colombian mining legislation is far from being the instrument needed to promote mining activities, make them competitive by international standards, and adapt them to the principles of sustainable development, a healthy environment, community participation, ethnic minorities and regional autonomy.

  3. A survey of text clustering techniques used for web mining

    Directory of Open Access Journals (Sweden)

    Dan MUNTEANU

    2005-12-01

    Full Text Available This paper contains an overview of basic formulations and approaches to clustering. Then it presents two important clustering paradigms: a bottom-up agglomerative technique, which collects similar documents into larger and larger groups, and a top-down partitioning technique, which divides a corpus into topic-oriented partitions.
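
    The bottom-up paradigm can be sketched in a few lines of standard-library Python. This assumes whitespace tokenization, raw term counts and cosine similarity between summed group vectors, all simplifications chosen for illustration; the survey does not prescribe these choices, and the toy documents are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def agglomerate(docs, n_groups):
    """Start with one group per document and repeatedly merge the most similar pair."""
    groups = [[i] for i in range(len(docs))]
    # Each group is represented by its summed term counts (a centroid up to scaling)
    vectors = [Counter(doc.lower().split()) for doc in docs]
    while len(groups) > n_groups:
        _, a, b = max(
            (cosine(vectors[a], vectors[b]), a, b)
            for a in range(len(groups))
            for b in range(a + 1, len(groups))
        )
        # a < b, so popping index b does not shift index a
        groups[a] += groups.pop(b)
        vectors[a] = vectors[a] + vectors.pop(b)
    return groups
```

    A top-down partitioning approach would instead start from the whole corpus and split it, for example with repeated two-way k-means.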

  4. The effect of mining data k-means clustering toward students profile model drop out potential

    Science.gov (United States)

    Purba, Windania; Tamba, Saut; Saragih, Jepronel

    2018-04-01

    A high rate of student success and a low rate of student failure can reflect the quality of a college. One factor in student failure is dropping out. To address this problem, data mining with K-means clustering was applied to cluster students by their dropout potential. First, the data were clustered to obtain information about the condition of all students. The resulting model showed that students potentially drop out because of a lack of enthusiasm for learning, lack of parental support, lack of self-confidence and poor use of study time. The K-means clustering results showed that the students most likely to drop out were in Cluster 1, which had the lowest Credit Total System, Quality Total and Grade Point Average (GPA) values compared with Clusters 2 and 3.
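
    A minimal NumPy sketch of the K-means (Lloyd) procedure applied to per-student feature vectors is shown below. The two feature columns (for example credits earned and GPA) and the random initialization are assumptions for illustration; the abstract does not specify the exact feature encoding or initialization, and features should normally be standardized first.

```python
import numpy as np


def kmeans(X, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-center assignment and center recomputation."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centers with k distinct randomly chosen students
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Assign each student to the nearest center (squared Euclidean distance)
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster becomes empty
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

    Inspecting the returned centers (for example, the cluster with the lowest mean GPA) is how a cluster would be characterized as the high-dropout-risk group.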

  5. Cluster analysis for portfolio optimization

    OpenAIRE

    Vincenzo Tola; Fabrizio Lillo; Mauro Gallegati; Rosario N. Mantegna

    2005-01-01

    We consider the problem of the statistical uncertainty of the correlation matrix in the optimization of a financial portfolio. We show that the use of clustering algorithms can improve the reliability of the portfolio in terms of the ratio between predicted and realized risk. Bootstrap analysis indicates that this improvement is obtained in a wide range of the parameters N (number of assets) and T (investment horizon). The predicted and realized risk level and the relative portfolio compositi...
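
    Clustering of assets from a correlation matrix is often illustrated with the metric d_ij = sqrt(2(1 - rho_ij)) and single linkage, which is equivalent to cutting a minimum spanning tree. The sketch below shows only that generic construction; the abstract does not say exactly how the paper filters the correlation matrix before portfolio optimization, and the function names are hypothetical.

```python
import numpy as np


def correlation_distance(returns):
    """d_ij = sqrt(2 (1 - rho_ij)) between asset return series stored as columns."""
    rho = np.corrcoef(np.asarray(returns, dtype=float), rowvar=False)
    # Clip tiny negative values caused by floating-point rounding when rho is ~1
    return np.sqrt(np.clip(2.0 * (1.0 - rho), 0.0, None))


def single_linkage(D, n_clusters):
    """Single-linkage clusters via Kruskal's minimum spanning tree with union-find."""
    n = D.shape[0]
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted((D[i, j], i, j) for i in range(n) for j in range(i + 1, n))
    components = n
    for _, i, j in edges:
        if components == n_clusters:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            # Adding the shortest remaining edge merges two clusters
            parent[ri] = rj
            components -= 1
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

    Stopping Kruskal's algorithm early leaves exactly `n_clusters` connected components, which are the single-linkage clusters at that level.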

  6. Availability analysis of selected mining machinery

    Directory of Open Access Journals (Sweden)

    Brodny Jarosław

    2017-06-01

    Full Text Available Underground coal extraction is characterized by highly variable mining and geological conditions. Despite increasingly effective methods and tools for identifying the factors that influence this process, the machinery used in underground mining works in difficult and not always foreseeable conditions, so these machines must be very versatile and reliable. Additionally, strong competition on the coal market makes it necessary to reduce production costs, e.g. by using machines more efficiently. The analysis presented in this paper serves this objective. It concerns the availability of selected mining machinery and was conducted using the OEE (Overall Equipment Effectiveness) model, a tool for the quantitative assessment of the TPM (Total Productive Maintenance) strategy. In this article we considered the machines that form part of the mechanized longwall complex, and the analysis was based on data recorded by the industrial automation system. Using this data set, we evaluated the availability of the studied machines and the structure of the recorded breaks in their work. The results should be an important source of information for maintenance staff and mining plant management, helping to improve the economic efficiency of underground mining.
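
    OEE is conventionally computed as the product of availability, performance and quality. A minimal sketch of that arithmetic follows, with hypothetical shift figures (480 planned minutes, 60 minutes of downtime); the paper itself focuses on the availability component and the structure of recorded breaks, so the numbers are illustrative only.

```python
def oee(planned_time, downtime, ideal_cycle_time, total_count, good_count):
    """Overall Equipment Effectiveness = Availability x Performance x Quality."""
    run_time = planned_time - downtime
    # Availability: share of planned time the machine was actually running
    availability = run_time / planned_time
    # Performance: ideal time for the output actually produced vs. time actually run
    performance = (ideal_cycle_time * total_count) / run_time
    # Quality: share of output that meets requirements
    quality = good_count / total_count
    return availability, performance, quality, availability * performance * quality


if __name__ == "__main__":
    # Hypothetical shift: 480 min planned, 60 min of breaks, 1 min ideal cycle, 400 units, 380 good
    a, p, q, total = oee(480, 60, 1.0, 400, 380)
    print(f"availability={a:.3f} performance={p:.3f} quality={q:.3f} OEE={total:.3f}")
```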

  7. antiSMASH 3.0-a comprehensive resource for the genome mining of biosynthetic gene clusters.

    Science.gov (United States)

    Weber, Tilmann; Blin, Kai; Duddela, Srikanth; Krug, Daniel; Kim, Hyun Uk; Bruccoleri, Robert; Lee, Sang Yup; Fischbach, Michael A; Müller, Rolf; Wohlleben, Wolfgang; Breitling, Rainer; Takano, Eriko; Medema, Marnix H

    2015-07-01

    Microbial secondary metabolism constitutes a rich source of antibiotics, chemotherapeutics, insecticides and other high-value chemicals. Genome mining of gene clusters that encode the biosynthetic pathways for these metabolites has become a key methodology for novel compound discovery. In 2011, we introduced antiSMASH, a web server and stand-alone tool for the automatic genomic identification and analysis of biosynthetic gene clusters, available at http://antismash.secondarymetabolites.org. Here, we present version 3.0 of antiSMASH, which has undergone major improvements. A full integration of the recently published ClusterFinder algorithm now allows using this probabilistic algorithm to detect putative gene clusters of unknown types. Also, a new dereplication variant of the ClusterBlast module now identifies similarities of identified clusters to any of 1172 clusters with known end products. At the enzyme level, active sites of key biosynthetic enzymes are now pinpointed through a curated pattern-matching procedure and Enzyme Commission numbers are assigned to functionally classify all enzyme-coding genes. Additionally, chemical structure prediction has been improved by incorporating polyketide reduction states. Finally, in order for users to be able to organize and analyze multiple antiSMASH outputs in a private setting, a new XML output module allows offline editing of antiSMASH annotations within the Geneious software. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  8. Building clusters for CRM strategies by mining airlines customer data

    OpenAIRE

    Miranda, Helena Sofia Guerreiro de

    2013-01-01

    Project work presented as a partial requirement for obtaining the Master's degree in Statistics and Information Management. As airlines strive to gain market share and sustain profitability in today's economically challenging environment, they should develop new ways to optimize their frequent flyer programs while increasing revenues. Aware of the challenges, airlines want to implement a customer relationship management (CRM) strategy based on customer analytics and data mining ...

  9. Functional Genome Mining for Metabolites Encoded by Large Gene Clusters through Heterologous Expression of a Whole-Genome Bacterial Artificial Chromosome Library in Streptomyces spp.

    Science.gov (United States)

    Xu, Min; Wang, Yemin; Zhao, Zhilong; Gao, Guixi; Huang, Sheng-Xiong; Kang, Qianjin; He, Xinyi; Lin, Shuangjun; Pang, Xiuhua; Deng, Zixin

    2016-01-01

    ABSTRACT Genome sequencing projects in the last decade revealed numerous cryptic biosynthetic pathways for unknown secondary metabolites in microbes, revitalizing drug discovery from microbial metabolites by approaches called genome mining. In this work, we developed a heterologous expression and functional screening approach for genome mining from genomic bacterial artificial chromosome (BAC) libraries in Streptomyces spp. We demonstrate mining from a strain of Streptomyces rochei, which is known to produce streptothricins and borrelidin, by expressing its BAC library in the surrogate host Streptomyces lividans SBT5, and screening for antimicrobial activity. In addition to the successful capture of the streptothricin and borrelidin biosynthetic gene clusters, we discovered two novel linear lipopeptides and their corresponding biosynthetic gene cluster, as well as a novel cryptic gene cluster for an unknown antibiotic from S. rochei. This high-throughput functional genome mining approach can be easily applied to other streptomycetes, and it is very suitable for the large-scale screening of genomic BAC libraries for bioactive natural products and the corresponding biosynthetic pathways. IMPORTANCE Microbial genomes encode numerous cryptic biosynthetic gene clusters for unknown small metabolites with potential biological activities. Several genome mining approaches have been developed to activate and bring these cryptic metabolites to biological tests for future drug discovery. Previous sequence-guided procedures relied on bioinformatic analysis to predict potentially interesting biosynthetic gene clusters. In this study, we describe an efficient approach based on heterologous expression and functional screening of a whole-genome library for the mining of bioactive metabolites from Streptomyces. 
The usefulness of this function-driven approach was demonstrated by the capture of four large biosynthetic gene clusters for metabolites of various chemical types, including

  10. Real Options Analysis of Mining Projects

    OpenAIRE

    Rudolf Zdravlje

    2011-01-01

    When long-life assets are evaluated based on constant predictions of future variables and the assumption of zero management flexibility, is value being missed? In project evaluation today, the most common methods for calculating a net present value are discounted cash flow (DCF) analysis, decision tree analysis and Monte Carlo simulation. A fourth method, which is beginning to gain ground in the mining industry, is real options analysis (ROA). ROA utilizes ...
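
    To make the contrast with a static NPV concrete, here is a minimal sketch of one common ROA technique, a Cox-Ross-Rubinstein binomial lattice for an option to defer an investment. The project value `V0`, investment cost `I`, volatility and risk-free rate are hypothetical inputs, the model ignores cash flows forgone while waiting, and the abstract does not say which valuation method the thesis uses.

```python
import math


def defer_option_value(V0, I, sigma, r, T, steps):
    """Value of the right to invest I in a project worth V0 at any time up to T (CRR tree)."""
    dt = T / steps
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    p = (math.exp(r * dt) - d) / (u - d)  # risk-neutral probability of an up move
    disc = math.exp(-r * dt)
    # Payoff at the horizon: invest only if the project is worth more than its cost
    values = [max(V0 * u**j * d**(steps - j) - I, 0.0) for j in range(steps + 1)]
    # Roll back through the tree, comparing "invest now" with "keep waiting"
    for n in range(steps - 1, -1, -1):
        values = [
            max(V0 * u**j * d**(n - j) - I,
                disc * (p * values[j + 1] + (1 - p) * values[j]))
            for j in range(n + 1)
        ]
    return values[0]


if __name__ == "__main__":
    V0, I = 100.0, 100.0
    print("static NPV:", V0 - I)
    print("value with option to defer:", round(defer_option_value(V0, I, 0.3, 0.05, 2.0, 100), 2))
```

    With a zero static NPV the lattice still assigns positive value to waiting, which is the "missed value" the abstract asks about.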

  11. Multiscale visual quality assessment for cluster analysis with self-organizing maps

    Science.gov (United States)

    Bernard, Jürgen; von Landesberger, Tatiana; Bremm, Sebastian; Schreck, Tobias

    2011-01-01

    Cluster analysis is an important data mining technique for analyzing large amounts of data, reducing many objects to a limited number of clusters. Cluster visualization techniques aim at supporting the user in better understanding the characteristics and relationships among the found clusters. While promising approaches to visual cluster analysis already exist, these usually fall short of incorporating the quality of the obtained clustering results. However, due to the nature of the clustering process, quality is an important consideration, as for most practical data sets many different clusterings are typically possible. Being aware of clustering quality is important to judge the expressiveness of a given cluster visualization, or to adjust the clustering process with refined parameters, among others. In this work, we present an encompassing suite of visual tools for quality assessment of an important visual cluster algorithm, namely, the Self-Organizing Map (SOM) technique. We define, measure, and visualize the notion of SOM cluster quality along a hierarchy of cluster abstractions. The quality abstractions range from simple scalar-valued quality scores up to the structural comparison of a given SOM clustering with output of additional supportive clustering methods. The suite of methods allows the user to assess the SOM quality on the appropriate abstraction level, and arrive at improved clustering results. We implement our tools in an integrated system, apply it to experimental data sets, and show its applicability.
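
    The simplest scalar SOM quality scores can be illustrated with two widely used measures, quantization error and topographic error. The sketch assumes an already trained codebook and a rectangular grid in which units at Chebyshev distance 1 count as neighbours; it does not reproduce the paper's hierarchy of visual quality abstractions.

```python
import numpy as np


def som_quality(X, codebook, grid):
    """Quantization and topographic error of a trained SOM.

    codebook: (n_units, n_features) prototype vectors
    grid:     (n_units, 2) integer grid coordinates of each unit
    """
    X = np.asarray(X, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    grid = np.asarray(grid)
    d = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)
    order = np.argsort(d, axis=1)
    bmu, second = order[:, 0], order[:, 1]
    # Quantization error: mean distance from each sample to its best-matching unit
    qe = d[np.arange(len(X)), bmu].mean()
    # Topographic error: share of samples whose two closest units are not grid neighbours
    grid_gap = np.abs(grid[bmu] - grid[second]).max(axis=1)
    te = np.mean(grid_gap > 1)
    return qe, te
```

    A low quantization error with a high topographic error signals a map that fits the data well but folds over itself, which is exactly the kind of trade-off a single scalar score hides.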

  12. Mining Hierarchies and Similarity Clusters from Value Set Repositories.

    Science.gov (United States)

    Peterson, Kevin J; Jiang, Guoqian; Brue, Scott M; Shen, Feichen; Liu, Hongfang

    2017-01-01

    A value set is a collection of permissible values used to describe a specific conceptual domain for a given purpose. By helping to establish a shared semantic understanding across use cases, these artifacts are important enablers of interoperability and data standardization. As the size of repositories cataloging these value sets expands, knowledge management challenges become more pronounced. Specifically, discovering value sets applicable to a given use case may be challenging in a large repository. In this study, we describe methods to extract implicit relationships between value sets, and utilize these relationships to overlay organizational structure onto value set repositories. We successfully extract two different structurings, hierarchy and clustering, and show how tooling can leverage these structures to enable more effective value set discovery.
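
    One simple way to obtain the two structurings is set-based: treat each value set as a set of codes, link it to the smallest value set that strictly contains it, and pair value sets whose Jaccard similarity passes a threshold. This is an assumed illustration; the abstract does not state which relationship measures the authors used, and the threshold of 0.5, the function names and the toy codes are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two code sets."""
    return len(a & b) / len(a | b) if a | b else 1.0


def structure_value_sets(value_sets, threshold=0.5):
    """Derive parent links (strict subset) and similarity pairs (Jaccard >= threshold)."""
    names = sorted(value_sets)
    parents = {}
    similar = []
    for x in names:
        # Candidate parents: value sets that strictly contain x; choose the smallest one
        supersets = [y for y in names if value_sets[x] < value_sets[y]]
        if supersets:
            parents[x] = min(supersets, key=lambda y: (len(value_sets[y]), y))
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            if jaccard(value_sets[x], value_sets[y]) >= threshold:
                similar.append((x, y))
    return parents, similar
```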

  13. Text-mining analysis of mHealth research

    Science.gov (United States)

    Zengul, Ferhat; Oner, Nurettin; Delen, Dursun

    2017-01-01

    In recent years, because of the advancements in communication and networking technologies, mobile technologies have been developing at an unprecedented rate. mHealth, the use of mobile technologies in medicine, and the related research have also surged in parallel with these technological advancements. Although there have been several attempts to review mHealth research through manual processes such as systematic reviews, the sheer magnitude of the number of studies published in recent years makes this task very challenging. The most recent developments in machine learning and text mining offer some potential solutions to address this challenge by allowing analyses of large volumes of texts through semi-automated processes. The objective of this study is to analyze the evolution of mHealth research by utilizing text-mining and natural language processing (NLP) analyses. The study sample included abstracts of 5,644 mHealth research articles, which were gathered from five academic search engines by using search terms such as “mobile health” and “mHealth”. The analysis used the Text Explorer module of JMP Pro 13 and an iterative semi-automated process involving tokenizing, phrasing, and terming. After the document term matrix (DTM) was developed, analyses such as singular value decomposition (SVD), topic analysis, and hierarchical document clustering were performed, along with the topic-informed document clustering approach. The results were presented in the form of word-clouds and trend analyses. There were several major findings regarding research clusters and trends. First, our results confirmed the time-dependent nature of terminology use in mHealth research. For example, in earlier versus recent years the use of terminology changed from “mobile phone” to “smartphone” and from “applications” to “apps”. Second, ten clusters for mHealth research were identified including (I) Clinical Research on Lifestyle Management, (II) Community Health, (III) Literature Review, (IV) Medical

  14. Text-mining analysis of mHealth research.

    Science.gov (United States)

    Ozaydin, Bunyamin; Zengul, Ferhat; Oner, Nurettin; Delen, Dursun

    2017-01-01

    In recent years, because of the advancements in communication and networking technologies, mobile technologies have been developing at an unprecedented rate. mHealth, the use of mobile technologies in medicine, and the related research have also surged in parallel with these technological advancements. Although there have been several attempts to review mHealth research through manual processes such as systematic reviews, the sheer magnitude of the number of studies published in recent years makes this task very challenging. The most recent developments in machine learning and text mining offer some potential solutions to address this challenge by allowing analyses of large volumes of texts through semi-automated processes. The objective of this study is to analyze the evolution of mHealth research by utilizing text-mining and natural language processing (NLP) analyses. The study sample included abstracts of 5,644 mHealth research articles, which were gathered from five academic search engines by using search terms such as "mobile health" and "mHealth". The analysis used the Text Explorer module of JMP Pro 13 and an iterative semi-automated process involving tokenizing, phrasing, and terming. After the document term matrix (DTM) was developed, analyses such as singular value decomposition (SVD), topic analysis, and hierarchical document clustering were performed, along with the topic-informed document clustering approach. The results were presented in the form of word-clouds and trend analyses. There were several major findings regarding research clusters and trends. First, our results confirmed the time-dependent nature of terminology use in mHealth research. For example, in earlier versus recent years the use of terminology changed from "mobile phone" to "smartphone" and from "applications" to "apps". Second, ten clusters for mHealth research were identified including (I) Clinical Research on Lifestyle Management, (II) Community Health, (III) Literature Review, (IV) Medical Interventions
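
    The document term matrix and SVD step named in the abstract can be sketched with NumPy. Raw counts and whitespace tokenization are simplifying assumptions (the study used the JMP Pro Text Explorer module, whose tokenizing, phrasing and terming are not reproduced here), and the function name and toy documents are hypothetical.

```python
import numpy as np


def dtm_and_svd(docs, k=2):
    """Build a raw-count document-term matrix and project documents onto k SVD components."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: j for j, w in enumerate(vocab)}
    dtm = np.zeros((len(docs), len(vocab)))
    for i, d in enumerate(docs):
        for w in d.lower().split():
            dtm[i, index[w]] += 1
    # Truncated SVD: dtm ~ U_k S_k Vt_k; rows of U_k S_k are low-dimensional document coordinates
    U, S, Vt = np.linalg.svd(dtm, full_matrices=False)
    doc_coords = U[:, :k] * S[:k]
    return vocab, dtm, doc_coords, S


if __name__ == "__main__":
    docs = ["mobile phone app", "smartphone app", "mobile phone", "community health"]
    vocab, dtm, coords, S = dtm_and_svd(docs)
    print(vocab)
    print(np.round(coords, 3))
```

    The document coordinates can then be clustered (for example hierarchically) in place of the much wider raw term space.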

  15. Cluster analysis in phenotyping a Portuguese population.

    Science.gov (United States)

    Loureiro, C C; Sa-Couto, P; Todo-Bom, A; Bousquet, J

    2015-09-03

    Unbiased cluster analysis using clinical parameters has identified asthma phenotypes. Adding inflammatory biomarkers to this analysis provided a better insight into the disease mechanisms. This approach has not yet been applied to asthmatic Portuguese patients. The aim of this study was to identify asthma phenotypes using cluster analysis in a Portuguese asthmatic population treated in secondary medical care. Consecutive patients with asthma were recruited from the outpatient clinic. Patients were optimally treated according to GINA guidelines and enrolled in the study. Procedures were performed according to a standard evaluation of asthma. Phenotypes were identified by cluster analysis using Ward's clustering method. Of the 72 patients enrolled, 57 had full data and were included in the cluster analysis. Patients were distributed into 5 clusters, described as follows: cluster (C) 1, early onset mild allergic asthma; C2, moderate allergic asthma, with long evolution, female prevalence and mixed inflammation; C3, allergic brittle asthma in young females with early disease onset and no evidence of inflammation; C4, severe asthma in obese females with late disease onset, highly symptomatic despite low Th2 inflammation; C5, severe asthma with chronic airflow obstruction, late disease onset and eosinophilic inflammation. In our study population, the identified clusters largely coincided with those of other, larger-scale cluster analyses. Variables such as age at disease onset, obesity, lung function, FeNO (Th2 biomarker) and disease severity were important for cluster distinction. Copyright © 2015. Published by Elsevier España, S.L.U.
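
    Ward's criterion merges, at each step, the two clusters whose union least increases the total within-cluster sum of squares, Delta(A, B) = n_A n_B / (n_A + n_B) * ||c_A - c_B||^2, where c_A and c_B are cluster means. A naive sketch follows; it assumes the clinical variables are already standardized numeric columns (the study's variable coding is not given here), and a real analysis would use an optimized library implementation.

```python
import numpy as np


def ward(X, n_clusters):
    """Naive Ward agglomeration: merge the pair that least increases within-cluster variance."""
    X = np.asarray(X, dtype=float)
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                ca, cb = X[clusters[a]].mean(axis=0), X[clusters[b]].mean(axis=0)
                # Increase in total within-cluster sum of squares caused by merging a and b
                cost = na * nb / (na + nb) * np.sum((ca - cb) ** 2)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        # a < b, so popping index b does not shift index a
        clusters[a] += clusters.pop(b)
    return sorted(sorted(c) for c in clusters)
```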

  16. Data mining with unsupervised clustering using photonic micro-ring resonators

    Science.gov (United States)

    McAulay, Alastair D.

    2013-09-01

    Data is commonly moved through optical fiber in modern data centers and may be stored optically. We propose an optical method of data mining for future data centers to enhance performance. For example, in clustering, a form of unsupervised learning, we propose that parameters corresponding to information in a database are converted from analog values to frequencies, as in the brain's neurons, where similar data will have close frequencies. We describe the Wilson-Cowan model for oscillating neurons. In optics we implement the frequencies with micro ring resonators. Due to the influence of weak coupling, a group of resonators will form clusters of similar frequencies that will indicate the desired parameters having close relations. Fewer clusters are formed as clustering proceeds, which allows the creation of a tree showing topics of importance and their relationships in the database. The tree can be used for instance to target advertising and for planning.
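
    The Wilson-Cowan model describes the mean activity of coupled excitatory (E) and inhibitory (I) neural populations. A minimal forward-Euler sketch of a common simplified form, tau_E dE/dt = -E + S(w_EE E - w_EI I + P) and tau_I dI/dt = -I + S(w_IE E - w_II I), is shown below. The parameter values are illustrative assumptions, whether the unit settles or oscillates depends on them, and nothing here models the optical micro-ring implementation.

```python
import math


def sigmoid(x, gain=1.0, threshold=4.0):
    """Logistic response function S of a neural population."""
    return 1.0 / (1.0 + math.exp(-gain * (x - threshold)))


def wilson_cowan(P=1.25, steps=5000, dt=0.01, tau_e=1.0, tau_i=2.0,
                 w_ee=16.0, w_ei=12.0, w_ie=15.0, w_ii=3.0):
    """Forward-Euler integration of a simplified excitatory/inhibitory Wilson-Cowan unit.

    Returns the trace of excitatory activity E over time.
    """
    E, I = 0.1, 0.05
    trace = []
    for _ in range(steps):
        dE = (-E + sigmoid(w_ee * E - w_ei * I + P, gain=1.3, threshold=4.0)) / tau_e
        dI = (-I + sigmoid(w_ie * E - w_ii * I, gain=2.0, threshold=3.7)) / tau_i
        # Simultaneous update so both equations use the previous state
        E, I = E + dt * dE, I + dt * dI
        trace.append(E)
    return trace
```

    With dt much smaller than the time constants, each step is a convex mix of the old activity and a sigmoid output, so activities stay between 0 and 1.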

  17. Data mining approach to bipolar cognitive map development and decision analysis

    Science.gov (United States)

    Zhang, Wen-Ran

    2002-03-01

    A data mining approach to cognitive mapping is presented based on bipolar logic, bipolar relations, and bipolar clustering. It is shown that a correlation network derived from a database can be converted to a bipolar cognitive map (or bipolar relation). A transitive, symmetric, and reflexive bipolar relation (equilibrium relation) can be used to identify focal links in decision analysis. It can also be used to cluster a set of events or itemsets into three different clusters: coalition sets, conflict sets, and harmony sets. The coalition sets are positively correlated events or itemsets; each conflict set is a negatively correlated set of two coalition subsets; and a harmony set consists of events that are both negatively and positively correlated. A cognitive map and the clusters can then be used for online decision analysis. This approach combines knowledge discovery with the views of decision makers and provides an effective means for online analytical processing (OLAP) and online analytical mining (OLAM).

  18. Hierarchical Aligned Cluster Analysis for Temporal Clustering of Human Motion.

    Science.gov (United States)

    Zhou, Feng; De la Torre, Fernando; Hodgins, Jessica K

    2013-03-01

    Temporal segmentation of human motion into plausible motion primitives is central to understanding and building computational models of human motion. Several issues contribute to the challenge of discovering motion primitives: the exponential nature of all possible movement combinations, the variability in the temporal scale of human actions, and the complexity of representing articulated motion. We pose the problem of learning motion primitives as one of temporal clustering, and derive an unsupervised hierarchical bottom-up framework called hierarchical aligned cluster analysis (HACA). HACA finds a partition of a given multidimensional time series into m disjoint segments such that each segment belongs to one of k clusters. HACA combines kernel k-means with the generalized dynamic time alignment kernel to cluster time series data. Moreover, it provides a natural framework to find a low-dimensional embedding for time series. HACA is efficiently optimized with a coordinate descent strategy and dynamic programming. Experimental results on motion capture and video data demonstrate the effectiveness of HACA for segmenting complex motions and as a visualization tool. We also compare the performance of HACA to state-of-the-art algorithms for temporal clustering on data of a honey bee dance. The HACA code is available online.
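HACA's clustering step combines kernel k-means with a time-alignment kernel. The sketch below shows only plain kernel k-means on a precomputed kernel matrix. It does not implement the generalized dynamic time alignment kernel, the segmentation search or the coordinate-descent optimization. The RBF kernel and the toy points are illustrative assumptions.

```python
import math


def kernel_kmeans(K, labels, n_iter=20):
    """Kernel k-means on a precomputed n x n kernel (Gram) matrix K.

    labels: initial cluster index for each sample; clusters are 0..max(labels).
    Returns the refined list of labels.
    """
    n = len(K)
    k = max(labels) + 1
    labels = list(labels)
    for _ in range(n_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # (1/|c|^2) * sum of K over all pairs inside cluster c; constant per cluster
        within = [sum(K[j][l] for j in m for l in m) / len(m) ** 2 if m else 0.0
                  for m in members]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], float("inf")
            for c, m in enumerate(members):
                if not m:
                    continue  # skip clusters that have become empty
                # squared feature-space distance from sample i to the mean of cluster c
                d = K[i][i] - 2.0 * sum(K[i][j] for j in m) / len(m) + within[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:
            break  # converged: no sample changed cluster
        labels = new_labels
    return labels


# Usage: an RBF kernel on six 1-D points forming two well-separated groups,
# started from a deliberately mixed assignment.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
K = [[math.exp(-(a - b) ** 2) for b in points] for a in points]
result = kernel_kmeans(K, [0, 1, 0, 1, 0, 1])
print(result)
```

Because only the kernel matrix is used, any positive semidefinite similarity between segments (such as a time-alignment kernel) could be substituted for the RBF kernel without changing the loop.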

  19. Research of the Space Clustering Method for the Airport Noise Data Minings

    Directory of Open Access Journals (Sweden)

    Jiwen Xie

    2014-03-01

    Mining the distribution pattern and evolution of airport noise from airport noise data and the geographic information of the monitoring points is of great significance for the scientific and rational governance of airport noise pollution. However, most traditional clustering methods are based on either the closeness of spatial locations or the similarity of non-spatial features, which splits the dual nature of spatial elements, so the clustering results have difficulty satisfying both spatial closeness and non-spatial similarity. This paper therefore proposes a spatial clustering algorithm based on dual distance. The algorithm uses, as its similarity measure, a distance function that combines spatial and non-spatial features. The experimental results show that the proposed algorithm can effectively discover the noise distribution pattern around the airport.
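The abstract does not give the dual-distance formula. One plausible form, assumed here for illustration only, is a weighted sum of a normalized spatial distance and a normalized distance between non-spatial attributes (for example, noise levels at monitoring points). The weight `w`, the scale parameters and the example points are hypothetical.

```python
import math


def dual_distance(p, q, w=0.5, spatial_scale=1.0, attr_scale=1.0):
    """Hypothetical dual distance between two monitoring points.

    p, q: dicts with "xy" (coordinates) and "attrs" (non-spatial features).
    w: weight of the spatial part; the non-spatial part gets 1 - w.
    spatial_scale, attr_scale: divisors that put both parts on comparable scales.
    """
    d_space = math.dist(p["xy"], q["xy"]) / spatial_scale
    d_attr = math.dist(p["attrs"], q["attrs"]) / attr_scale
    return w * d_space + (1 - w) * d_attr


a = {"xy": (0.0, 0.0), "attrs": (60.0,)}  # invented point: location and noise level
b = {"xy": (3.0, 4.0), "attrs": (66.0,)}
print(dual_distance(a, b))  # 5.5
```

Setting `w = 1` recovers a purely spatial distance and `w = 0` a purely attribute-based one, which are the two one-sided cases the abstract criticizes.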

  20. Cluster analysis as a prediction tool for pregnancy outcomes.

    Science.gov (United States)

    Banjari, Ines; Kenjerić, Daniela; Šolić, Krešimir; Mandić, Milena L

    2015-03-01

    Considering the specific physiological changes during gestation and regarding pregnancy as a "critical window", classifying pregnant women in early pregnancy can be considered crucial. The paper demonstrates the use of cluster analysis, a method drawn from intelligent data mining. Cluster analysis is a statistical method that makes it possible to group individuals based on sets of identifying variables. The method was chosen to determine whether pregnant women can be classified in early pregnancy and to analyze unknown correlations between different variables so that certain outcomes could be predicted. A total of 222 pregnant women from two general obstetric offices were recruited. The main focus was on three characteristics of these women: age, pre-pregnancy body mass index (BMI) and haemoglobin value. Cluster analysis achieved a 94.1% classification accuracy rate, with three branches or groups of pregnant women showing statistically significant correlations with pregnancy outcomes. The results show that pregnant women of both older age and higher pre-pregnancy BMI have a significantly higher incidence of delivering babies of higher birth weight, but they gain significantly less weight during pregnancy. Their babies are also longer, and these women have a significantly higher probability of complications during pregnancy (gestosis) and of induced or caesarean delivery. We conclude that cluster analysis can appropriately classify pregnant women in early pregnancy to predict certain outcomes.

  1. Data mining for clustering naming of the village at Java Island

    Science.gov (United States)

    Setiawan Abdullah, Atje; Nurani Ruchjana, Budi; Hidayat, Akik; Akmal; Setiana, Deni

    2017-10-01

    Query-based data mining with clustering was used to identify the meaning of village names on Java island by exploring the village database in three categories: the prefix of the village name, the syllables contained in the village name, and the full word actually used as the village name. The syllables contained in village names are classified according to the cultural behaviour and character of each province, describing business, feelings, circumstances, places, nature, respect, plants, fruits, and animals. The data used to cluster village names on Java island were obtained from the Geospatial Information Agency (BIG) as complete village-name data with coordinates for six provinces in Java, arranged in a hierarchy of provinces, districts/cities, districts and villages. The research method uses KDD (Knowledge Discovery in Databases), with preprocessing, data mining and postprocessing stages to obtain knowledge. In this study, a data mining application built with Java software supports search queries based on village names, while map contours were processed using ArcGIS software. The results can give stakeholders such as the Department of Tourism recommendations for describing the meaning of village-name classes according to the character of each province on Java island.

  2. Mining Co-Location Patterns with Clustering Items from Spatial Data Sets

    Science.gov (United States)

    Zhou, G.; Li, Q.; Deng, G.; Yue, T.; Zhou, X.

    2018-05-01

    The explosive growth of spatial data and the widespread use of spatial databases emphasize the need for spatial data mining. Co-location pattern discovery is an important branch of spatial data mining. Spatial co-locations represent subsets of features that are frequently located together in geographic space. However, the appearance of a spatial feature C is often determined not by a single spatial feature A or B but by A and B jointly: where A and B appear together, C often appears. This co-location pattern differs from the traditional co-location pattern. This paper therefore presents a new concept called clustering terms, and calls this pattern a co-location pattern with clustering items. Because traditional algorithms cannot mine this pattern, we introduce the related concepts in detail and propose a novel algorithm that extends the join-based approach proposed by Huang. Finally, we evaluate the performance of this algorithm.
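The clustering-items algorithm extends Huang's join-based approach, which scores co-location patterns with a participation index: for each feature in a pattern, take the fraction of its instances that appear in at least one row instance (a set of mutually neighbouring instances, one per feature), then take the minimum over features. The brute-force sketch below computes that classic measure on toy data. It is not the paper's new algorithm, and the distance-threshold neighbour relation is an assumption.

```python
import math
from itertools import combinations, product


def participation_index(instances, pattern, d):
    """Participation index of a co-location pattern (classic measure).

    instances: dict mapping feature name -> list of (x, y) instance locations.
    pattern: tuple of feature names, e.g. ("A", "B", "C").
    d: neighbour threshold; two instances are neighbours if at most d apart.
    A row instance takes one instance per feature, all pairwise neighbours.
    Brute force over all combinations, so only suitable for toy data.
    """
    participating = {f: set() for f in pattern}
    for combo in product(*(range(len(instances[f])) for f in pattern)):
        pts = [instances[f][i] for f, i in zip(pattern, combo)]
        if all(math.dist(p, q) <= d for p, q in combinations(pts, 2)):
            for f, i in zip(pattern, combo):
                participating[f].add(i)
    # participation ratio of each feature; the index is the smallest ratio
    return min(len(participating[f]) / len(instances[f]) for f in pattern)


instances = {
    "A": [(0.0, 0.0), (10.0, 10.0)],
    "B": [(1.0, 0.0), (20.0, 20.0)],
    "C": [(0.0, 1.0)],
}
print(participation_index(instances, ("A", "B", "C"), d=1.5))  # 0.5
```

A join-based miner builds row instances of size k + 1 from those of size k instead of enumerating every combination, which is what makes it practical on real data.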

  3. Robust cluster analysis and variable selection

    CERN Document Server

    Ritter, Gunter

    2014-01-01

    Clustering remains a vibrant area of research in statistics. Although there are many books on this topic, there are relatively few that are well founded in the theoretical aspects. In Robust Cluster Analysis and Variable Selection, Gunter Ritter presents an overview of the theory and applications of probabilistic clustering and variable selection, synthesizing the key research results of the last 50 years. The author focuses on the robust clustering methods he found to be the most useful on simulated data and real-time applications. The book provides clear guidance for the varying needs of bot

  4. Depth data research of GIS based on clustering analysis algorithm

    Science.gov (United States)

    Xiong, Yan; Xu, Wenli

    2018-03-01

    GIS data are spatially distributed: geographic data have both spatial and attribute characteristics and also change over time, so the volume of data is very large. Many industries and departments now use GIS. However, without a proper data analysis and mining scheme, GIS cannot deliver its full effectiveness and much of the data goes to waste. In this paper, we take the geographic information demands of a national security department as the experimental object and apply a cluster analysis algorithm that accounts for the characteristics of GIS data, including time, space and attributes. We further study a mining scheme for depth data and derive an algorithm model. The algorithm can automatically classify sample data and then carry out exploratory analysis. The research shows that the algorithm model and the information mining scheme can quickly find hidden depth information in the surface data of GIS, thus improving the efficiency of the security department. The algorithm can also be extended to other fields.

  5. Exact WKB analysis and cluster algebras

    International Nuclear Information System (INIS)

    Iwaki, Kohei; Nakanishi, Tomoki

    2014-01-01

    We develop the mutation theory in the exact WKB analysis using the framework of cluster algebras. Under a continuous deformation of the potential of the Schrödinger equation on a compact Riemann surface, the Stokes graph may change the topology. We call this phenomenon the mutation of Stokes graphs. Along the mutation of Stokes graphs, the Voros symbols, which are monodromy data of the equation, also mutate due to the Stokes phenomenon. We show that the Voros symbols mutate as variables of a cluster algebra with surface realization. As an application, we obtain the identities of Stokes automorphisms associated with periods of cluster algebras. The paper also includes an extensive introduction of the exact WKB analysis and the surface realization of cluster algebras for nonexperts. This article is part of a special issue of Journal of Physics A: Mathematical and Theoretical devoted to ‘Cluster algebras in mathematical physics’. (paper)
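On the cluster-algebra side, mutation is driven by the Fomin-Zelevinsky rule for exchange matrices. The sketch applies that standard rule to a skew-symmetric integer matrix. It illustrates cluster-algebra mutation in general, not the paper's Stokes-graph construction, and the 3 x 3 example matrix is arbitrary.

```python
def mutate(B, k):
    """Mutate a skew-symmetric integer exchange matrix B at index k.

    Fomin-Zelevinsky rule:
      b'_ij = -b_ij                                  if i == k or j == k
      b'_ij = b_ij + sgn(b_ik) * max(b_ik * b_kj, 0)  otherwise
    """
    n = len(B)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == k or j == k:
                out[i][j] = -B[i][j]
            else:
                sgn = (B[i][k] > 0) - (B[i][k] < 0)
                out[i][j] = B[i][j] + sgn * max(B[i][k] * B[k][j], 0)
    return out


# Arbitrary example: an exchange matrix of type A3 (a path 0 -> 1 -> 2)
B = [[0, 1, 0], [-1, 0, 1], [0, -1, 0]]
B1 = mutate(B, 1)
print(B1)
```

Mutating at the middle index turns the path into an oriented 3-cycle, and mutating twice at the same index returns the original matrix, since mutation is an involution.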

  6. Study on text mining algorithm for ultrasound examination of chronic liver diseases based on spectral clustering

    Science.gov (United States)

    Chang, Bingguo; Chen, Xiaofei

    2018-05-01

    Ultrasonography is an important examination for diagnosing chronic liver disease. Based on the description in the ultrasound report, the doctor gives the liver indicators and suggests the patient's condition. With the rapid increase in the number of ultrasound reports, the workload of physicians who manually interpret ultrasound results has increased significantly. In this paper, we use spectral clustering to analyze the descriptions in ultrasound reports and use machine learning to generate ultrasound diagnoses automatically. A total of 110 groups of ultrasound examination reports of chronic liver disease were selected as test samples, and the spectral clustering results were validated and compared with the k-means clustering algorithm. The results show that the accuracy of spectral clustering is 92.73%, higher than that of k-means clustering, which provides powerful ultrasound-assisted diagnosis for patients with chronic liver disease.
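Comparing a clustering with reference diagnoses requires matching cluster ids to labels before accuracy can be computed. The sketch scores a clustering under the best one-to-one mapping, found by brute force; the paper's exact scoring procedure is not stated, so this is an assumption. As a consistency check on the reported figure, if all 110 samples were scored, 92.73% corresponds to 102 correct assignments (102/110 ≈ 0.9273).

```python
from itertools import permutations


def clustering_accuracy(true_labels, cluster_ids):
    """Fraction of samples correctly grouped under the best one-to-one mapping
    from cluster ids to reference labels.

    Brute force over label permutations, so only suitable for a handful of
    clusters; assumes there are no more clusters than reference labels.
    """
    labels = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    best = 0
    for perm in permutations(labels, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[c] == t for c, t in zip(cluster_ids, true_labels))
        best = max(best, hits)
    return best / len(true_labels)


# Invented example: five reports, two reference diagnoses, two clusters
print(clustering_accuracy(["a", "a", "b", "b", "b"], [1, 1, 0, 0, 1]))  # 0.8
```

For more than a few clusters, the same mapping is usually found with the Hungarian algorithm instead of enumerating permutations.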

  7. ESTminer: a Web interface for mining EST contig and cluster databases.

    Science.gov (United States)

    Huang, Yecheng; Pumphrey, Janie; Gingle, Alan R

    2005-03-01

    ESTminer is a Web application and database schema for interactive mining of expressed sequence tag (EST) contig and cluster datasets. The Web interface contains a query frame that allows the selection of contigs/clusters with specific cDNA library makeup or a threshold number of members. The results are displayed as color-coded tree nodes, where the color indicates the fractional size of each cDNA library component. The nodes are expandable, revealing library statistics as well as EST or contig members, with links to sequence data, GenBank records or user configurable links. Also, the interface allows 'queries within queries' where the result set of a query is further filtered by the subsequent query. ESTminer is implemented in Java/JSP and the package, including MySQL and Oracle schema creation scripts, is available from http://cggc.agtec.uga.edu/Data/download.asp. Contact: agingle@uga.edu.

  8. Clustering Educational Digital Library Usage Data: A Comparison of Latent Class Analysis and K-Means Algorithms

    Science.gov (United States)

    Xu, Beijie; Recker, Mimi; Qi, Xiaojun; Flann, Nicholas; Ye, Lei

    2013-01-01

    This article examines clustering as an educational data mining method. In particular, two clustering algorithms, the widely used K-means and the model-based Latent Class Analysis, are compared, using usage data from an educational digital library service, the Instructional Architect (IA.usu.edu). Using a multi-faceted approach and multiple data…

  9. Clustering Analysis for Credit Default Probabilities in a Retail Bank Portfolio

    Directory of Open Access Journals (Sweden)

    Elena ANDREI (DRAGOMIR)

    2012-08-01

    Methods underlying cluster analysis are very useful in data analysis, especially when the volume of processed data is so large that essential information cannot be extracted unless specific instruments are used to summarize and structure the raw information. In this context, cluster analysis techniques are used particularly for systematic information analysis. The aim of this article is to build a useful model for the banking field, based on data mining techniques, by dividing borrowers into clusters in order to obtain a profile of the customers (debtors and good payers). We assume that a class is appropriate if its members have a high degree of similarity, so that the standard method for measuring similarity within a group shows the lowest variance. After clustering, data mining techniques are applied to the cluster of bad debtors, reaching very high accuracy. The paper is structured as follows: Section 2 describes the data analysis model based on a specific scoring model that we propose. Section 3 presents a cluster analysis using the K-means algorithm and applies the DM models to a specific cluster. Section 4 presents the conclusions.
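The banking model groups borrowers with K-means and treats a grouping as good when within-group variance is low. A minimal Lloyd's-algorithm sketch follows. It reports the within-cluster sum of squares, and the two features and six borrower points are invented, not the paper's scoring model or data.

```python
import math


def nearest(p, centers):
    """Index of the centre closest to point p (Euclidean distance)."""
    return min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))


def kmeans(points, centers, n_iter=100):
    """Lloyd's k-means. points and centers are lists of equal-length tuples.

    Returns (labels, centers, wcss), where wcss is the within-cluster sum of
    squares, the quantity k-means tries to make small.
    """
    centers = [tuple(c) for c in centers]
    for _ in range(n_iter):
        labels = [nearest(p, centers) for p in points]
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # leave an empty cluster's centre in place
        if new_centers == centers:
            break  # centres stopped moving
        centers = new_centers
    labels = [nearest(p, centers) for p in points]
    wcss = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centers, wcss


# Usage with two invented, already-scaled borrower features
borrowers = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centers, wcss = kmeans(borrowers, [borrowers[0], borrowers[3]])
print(labels, centers, wcss)
```

Features on very different scales (for example income and a ratio) should be standardized first, otherwise the Euclidean distance is dominated by the largest-scale feature.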

  10. Genome cluster database. A sequence family analysis platform for Arabidopsis and rice.

    Science.gov (United States)

    Horan, Kevin; Lauricha, Josh; Bailey-Serres, Julia; Raikhel, Natasha; Girke, Thomas

    2005-05-01

    The genome-wide protein sequences from Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa) spp. japonica were clustered into families using sequence similarity and domain-based clustering. The two fundamentally different methods resulted in separate cluster sets with complementary properties that compensate for each method's limitations in accurate family analysis. Functional names for the identified families were assigned with an efficient computational approach that uses the description of the most common molecular function gene ontology node within each cluster. Subsequently, multiple alignments and phylogenetic trees were calculated for the assembled families. All clustering results and their underlying sequences were organized in the Web-accessible Genome Cluster Database (http://bioinfo.ucr.edu/projects/GCD) with rich interactive and user-friendly sequence family mining tools to facilitate the analysis of any given family of interest for the plant science community. An automated clustering pipeline ensures current information for future updates in the annotations of the two genomes and clustering improvements. The analysis allowed the first systematic identification of family and singlet proteins present in both organisms as well as those restricted to one of them. In addition, the established Web resources for mining these data provide a road map for future studies of the composition and structure of protein families between the two species.
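The database's similarity-clustering pipeline is not described in detail in the abstract. As a generic illustration only, sequence-similarity clustering is often sketched as the connected components of a graph whose edges link sequences with a pairwise score at or above a threshold; components of size one then play the role of singlets. The scores and threshold below are hypothetical.

```python
def similarity_families(n, hits, threshold):
    """Group n sequences into families as connected components of a similarity
    graph: an edge joins two sequences whose pairwise score meets the threshold.

    hits: iterable of (i, j, score). Uses union-find with path halving.
    Returns the families as sorted lists of sequence indices.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j, score in hits:
        if score >= threshold:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri  # merge the two families
    families = {}
    for x in range(n):
        families.setdefault(find(x), []).append(x)
    return sorted(families.values())


# Hypothetical pairwise scores for five sequences
print(similarity_families(5, [(0, 1, 0.9), (1, 2, 0.8), (3, 4, 0.3)], 0.5))
```

This single-linkage style grouping can chain distantly related sequences together, one reason a domain-based second clustering is a useful complement.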

  11. The accident analysis of mobile mine machinery in Indian opencast coal mines.

    Science.gov (United States)

    Kumar, R; Ghosh, A K

    2014-01-01

    This paper presents an analysis of accidents related to large mining machinery in Indian opencast coal mines. Trends in coal production, the share of mining methods in production, machinery deployment in opencast mines, the size and population of machinery, accidents due to machinery, and the types and causes of accidents have been analysed for the years 1995 to 2008. Scrutiny of accidents during this period reveals that the main responsible factors are machine reversal, haul road design, human fault, operator's fault, machine fault, visibility and dump design. Considering dumpers, excavators, dozers and loaders together, the largest number of fatal accidents from 1995 to 2008 was caused jointly by operator's faults and human faults. The novel finding of this analysis is that large machines with state-of-the-art safety systems did not reduce fatal accidents in Indian opencast coal mines.

  12. An intelligent hybrid system for surface coal mine safety analysis

    Energy Technology Data Exchange (ETDEWEB)

    Lilic, N.; Obradovic, I.; Cvjetic, A. [University of Belgrade, Belgrade (Serbia)]

    2010-06-15

    Analysis of safety in surface coal mines is a very complex process. Published studies on mine safety analysis are usually based on research related to accident statistics and hazard identification with risk assessment within the mining industry. This paper focuses on applying AI methods to the analysis of safety in the mining environment. The complexity of the subject matter requires a high level of expert knowledge and great experience. The solution was found in the creation of a hybrid system, PROTECTOR, whose knowledge base formalizes expert knowledge in the mine safety field. The main goal of the system is to estimate the state of the mining environment as one of the significant components of the general safety state in a mine. This global goal is subdivided into a hierarchical structure of subgoals, where each subgoal can be viewed as the estimation of a set of parameters (gas, dust, climate, noise, vibration, illumination, geotechnical hazard) that determine the general mine safety state and the category of hazard in the mining environment. Both the hybrid nature of the system and the possibilities it offers are illustrated through a case study using field data from an existing Serbian surface coal mine.

  13. BGDMdocker: a Docker workflow for data mining and visualization of bacterial pan-genomes and biosynthetic gene clusters

    Directory of Open Access Journals (Sweden)

    Gong Cheng

    2017-11-01

    Recently, Docker technology has received increasing attention throughout the bioinformatics community. However, its implementation has not yet been mastered by most biologists; accordingly, its application in biological research has been limited. In order to popularize this technology in the field of bioinformatics and to promote the use of publicly available bioinformatics tools, such as Dockerfiles and Images from communities, government sources, and private owners in the Docker Hub Registry and other Docker-based resources, we introduce here a complete and accurate bioinformatics workflow based on Docker. The present workflow enables analysis and visualization of pan-genomes and biosynthetic gene clusters of bacteria. This provides a new solution for bioinformatics mining of big data from various publicly available biological databases. The present step-by-step guide creates an integrative workflow through a Dockerfile to allow researchers to build their own Image and run a Container easily.

  14. BGDMdocker: a Docker workflow for data mining and visualization of bacterial pan-genomes and biosynthetic gene clusters.

    Science.gov (United States)

    Cheng, Gong; Lu, Quan; Ma, Ling; Zhang, Guocai; Xu, Liang; Zhou, Zongshan

    2017-01-01

    Recently, Docker technology has received increasing attention throughout the bioinformatics community. However, its implementation has not yet been mastered by most biologists; accordingly, its application in biological research has been limited. In order to popularize this technology in the field of bioinformatics and to promote the use of publicly available bioinformatics tools, such as Dockerfiles and Images from communities, government sources, and private owners in the Docker Hub Registry and other Docker-based resources, we introduce here a complete and accurate bioinformatics workflow based on Docker. The present workflow enables analysis and visualization of pan-genomes and biosynthetic gene clusters of bacteria. This provides a new solution for bioinformatics mining of big data from various publicly available biological databases. The present step-by-step guide creates an integrative workflow through a Dockerfile to allow researchers to build their own Image and run a Container easily.

  15. Cluster analysis of obesity and asthma phenotypes.

    Directory of Open Access Journals (Sweden)

    E Rand Sutherland

    Asthma is a heterogeneous disease with variability among patients in characteristics such as lung function, symptoms and control, body weight, markers of inflammation, and responsiveness to glucocorticoids (GC). Cluster analysis of well-characterized cohorts can advance understanding of disease subgroups in asthma and point to unsuspected disease mechanisms. We utilized a hypothesis-free cluster analytical approach to define the contribution of obesity and related variables to asthma phenotype. In a cohort of clinical trial participants (n = 250), minimum-variance hierarchical clustering was used to identify clinical and inflammatory biomarkers important in determining disease cluster membership in mild and moderate persistent asthmatics. In a subset of participants, GC sensitivity was assessed via expression of GC receptor alpha (GCRα) and induction of MAP kinase phosphatase-1 (MKP-1) expression by dexamethasone. Four asthma clusters were identified, with body mass index (BMI, kg/m²) and severity of asthma symptoms (AEQ score) the most significant determinants of cluster membership (F = 57.1, p < 0.0001 and F = 44.8, p < 0.0001, respectively). Two clusters were composed of predominantly obese individuals; these two obese asthma clusters differed from one another with regard to age of asthma onset, measures of asthma symptoms (AEQ) and control (ACQ), exhaled nitric oxide concentration (FENO) and airway hyperresponsiveness (methacholine PC20), but were similar with regard to measures of lung function (FEV1 (%) and FEV1/FVC), airway eosinophilia, IgE, leptin, adiponectin and C-reactive protein (hsCRP). Members of obese clusters demonstrated evidence of reduced expression of GCRα, a finding which was correlated with a reduced induction of MKP-1 expression by dexamethasone. Obesity is an important determinant of asthma phenotype in adults. There is heterogeneity in expression of clinical and inflammatory biomarkers of asthma across obese individuals.
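Minimum-variance hierarchical clustering is Ward's method: at each step, merge the two clusters whose union increases the total within-cluster sum of squares the least. That increase equals n_a n_b / (n_a + n_b) times the squared distance between the two centroids. Below is a naive sketch, assuming Euclidean distance on already-standardized variables; the toy points are invented, not the cohort's biomarkers.

```python
import math


def ward(points, n_clusters):
    """Naive agglomerative clustering with Ward's minimum-variance criterion.

    points: list of equal-length tuples (assumed already standardized).
    Returns a list of clusters, each a list of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    dim = len(points[0])

    def centroid(c):
        return [sum(points[i][d] for i in c) / len(c) for d in range(dim)]

    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                # increase in total within-cluster sum of squares if a and b merge
                cost = na * nb / (na + nb) * math.dist(centroid(clusters[a]), centroid(clusters[b])) ** 2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (5.0, 20.0)]
print(ward(pts, 3))
```

Recording the merge order instead of stopping at a fixed count gives the full dendrogram, from which a cluster count such as the study's four can be chosen.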

  16. Are clusters of dietary patterns and cluster membership stable over time? Results of a longitudinal cluster analysis study.

    Science.gov (United States)

    Walthouwer, Michel Jean Louis; Oenema, Anke; Soetens, Katja; Lechner, Lilian; de Vries, Hein

    2014-11-01

    Developing nutrition education interventions based on clusters of dietary patterns can only be done adequately when it is clear if distinctive clusters of dietary patterns can be derived and reproduced over time, if cluster membership is stable, and if it is predictable which type of people belong to a certain cluster. Hence, this study aimed to: (1) identify clusters of dietary patterns among Dutch adults, (2) test the reproducibility of these clusters and stability of cluster membership over time, and (3) identify sociodemographic predictors of cluster membership and cluster transition. This study had a longitudinal design with online measurements at baseline (N=483) and 6 months follow-up (N=379). Dietary intake was assessed with a validated food frequency questionnaire. A hierarchical cluster analysis was performed, followed by a K-means cluster analysis. Multinomial logistic regression analyses were conducted to identify the sociodemographic predictors of cluster membership and cluster transition. At baseline and follow-up, a comparable three-cluster solution was derived, distinguishing a healthy, moderately healthy, and unhealthy dietary pattern. Male and lower educated participants were significantly more likely to have a less healthy dietary pattern. Further, 251 (66.2%) participants remained in the same cluster, 45 (11.9%) participants changed to an unhealthier cluster, and 83 (21.9%) participants shifted to a healthier cluster. Men and people living alone were significantly more likely to shift toward a less healthy dietary pattern. Distinctive clusters of dietary patterns can be derived. Yet, cluster membership is unstable and only a few sociodemographic factors were associated with cluster membership and cluster transition. These findings imply that clusters based on dietary intake may not be suitable as a basis for nutrition education interventions. Copyright © 2014 Elsevier Ltd. All rights reserved.
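The transition figures can be recomputed from the 379 participants measured at both time points (251 + 45 + 83 = 379). The sketch counts stayers and movers from paired cluster labels, assuming clusters are coded by healthiness rank (0 = healthy, 1 = moderately healthy, 2 = unhealthy). That coding is hypothetical, and the usage rebuilds only the reported counts, not the study data.

```python
def transitions(baseline, follow_up):
    """Count participants who stayed in their cluster, moved to a less healthy one,
    or moved to a healthier one, with percentages rounded to one decimal.

    Clusters are coded by rank (hypothetical coding): 0 = healthy,
    1 = moderately healthy, 2 = unhealthy, so a higher code is less healthy.
    """
    pairs = list(zip(baseline, follow_up))
    counts = {
        "same": sum(b == f for b, f in pairs),
        "unhealthier": sum(f > b for b, f in pairs),
        "healthier": sum(f < b for b, f in pairs),
    }
    n = len(pairs)
    return {name: (c, round(100 * c / n, 1)) for name, c in counts.items()}


# Synthetic label lists that reproduce only the reported counts (not the study data)
baseline = [1] * 379
follow_up = [1] * 251 + [2] * 45 + [0] * 83
print(transitions(baseline, follow_up))
```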

  17. Process mining : overview and opportunities

    NARCIS (Netherlands)

    Aalst, van der W.M.P.

    2012-01-01

    Over the last decade, process mining emerged as a new research field that focuses on the analysis of processes using event data. Classical data mining techniques such as classification, clustering, regression, association rule learning, and sequence/episode mining do not focus on business process

  18. A Dimensionality Reduction-Based Multi-Step Clustering Method for Robust Vessel Trajectory Analysis

    Directory of Open Access Journals (Sweden)

    Huanhuan Li

    2017-08-01

    The Shipboard Automatic Identification System (AIS) is crucial for navigation safety and maritime surveillance, and data mining and pattern analysis of AIS information have attracted considerable attention in both basic research and practical applications. Clustering of spatio-temporal AIS trajectories can be used to identify abnormal patterns and mine customary route data for transportation safety, so that navigation safety and maritime traffic monitoring capacities can be enhanced correspondingly. However, trajectory clustering is often sensitive to undesirable outliers and is inherently more complex than traditional point clustering. To overcome this limitation, this paper proposes a multi-step trajectory clustering method for robust AIS trajectory clustering. In the first step, Dynamic Time Warping (DTW), a similarity measurement method, is introduced to measure the distances between different trajectories. In the second step, the calculated distances, inversely proportional to the similarities, constitute a distance matrix. Furthermore, Principal Component Analysis (PCA), a widely used dimensionality reduction method, is exploited to decompose the obtained distance matrix. In particular, the top k principal components with a cumulative contribution rate above 95% are extracted by PCA, and the number of centers k is chosen. The k centers are found by the improved center automatically selection algorithm. In the last step, the improved center clustering algorithm with k clusters is applied to the distance matrix to obtain the final AIS trajectory clustering results. To improve the accuracy of the proposed multi-step clustering algorithm, an automatic algorithm for choosing the k clusters is developed according to the similarity distance. Numerous experiments on realistic AIS trajectory datasets in the bridge area waterway and Mississippi River have been implemented to compare our

  19. A Dimensionality Reduction-Based Multi-Step Clustering Method for Robust Vessel Trajectory Analysis.

    Science.gov (United States)

    Li, Huanhuan; Liu, Jingxian; Liu, Ryan Wen; Xiong, Naixue; Wu, Kefeng; Kim, Tai-Hoon

    2017-08-04

    The Shipboard Automatic Identification System (AIS) is crucial for navigation safety and maritime surveillance, and data mining and pattern analysis of AIS information have attracted considerable attention in both basic research and practical applications. Clustering of spatio-temporal AIS trajectories can be used to identify abnormal patterns and mine customary route data for transportation safety, so that navigation safety and maritime traffic monitoring capacities can be enhanced correspondingly. However, trajectory clustering is often sensitive to undesirable outliers and is inherently more complex than traditional point clustering. To overcome this limitation, this paper proposes a multi-step trajectory clustering method for robust AIS trajectory clustering. In the first step, Dynamic Time Warping (DTW), a similarity measurement method, is introduced to measure the distances between different trajectories. In the second step, the calculated distances, inversely proportional to the similarities, constitute a distance matrix. Furthermore, Principal Component Analysis (PCA), a widely used dimensionality reduction method, is exploited to decompose the obtained distance matrix. In particular, the top k principal components with a cumulative contribution rate above 95% are extracted by PCA, and the number of centers k is chosen. The k centers are found by the improved center automatically selection algorithm. In the last step, the improved center clustering algorithm with k clusters is applied to the distance matrix to obtain the final AIS trajectory clustering results. To improve the accuracy of the proposed multi-step clustering algorithm, an automatic algorithm for choosing the k clusters is developed according to the similarity distance. Numerous experiments on realistic AIS trajectory datasets in the bridge area waterway and Mississippi River have been implemented to compare our proposed method with
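The first two steps, DTW distances between trajectories collected into a distance matrix, can be sketched as below. This uses the classic DTW recurrence with Euclidean point distance, assumed here; the paper's exact DTW variant, the PCA step and the center-selection algorithm are not reproduced, and the toy trajectories are invented.

```python
import math


def dtw(a, b):
    """Classic dynamic time warping distance between two trajectories.

    a, b: lists of (x, y) points; the local cost is Euclidean point distance.
    Fills a (len(a) + 1) x (len(b) + 1) cumulative-cost table.
    """
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            # extend the cheapest path: step in a only, step in b only, or both
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def dtw_matrix(trajectories):
    """Pairwise DTW distance matrix, the input to the later PCA and clustering steps."""
    return [[dtw(a, b) for b in trajectories] for a in trajectories]


t1 = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
t2 = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.0), (2.0, 0.0)]  # same path, one repeated fix
t3 = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]  # parallel path shifted by 1
M = dtw_matrix([t1, t2, t3])
print(M)
```

Because DTW can align trajectories of different lengths, t1 and t2 are at distance 0 even though t2 has an extra, repeated AIS fix.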

  20. Factor Analysis for Clustered Observations.

    Science.gov (United States)

    Longford, N. T.; Muthen, B. O.

    1992-01-01

    A two-level model for factor analysis is defined, and formulas for a scoring algorithm for this model are derived. A simple noniterative method based on decomposition of total sums of the squares and cross-products is discussed and illustrated with simulated data and data from the Second International Mathematics Study. (SLD)
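The noniterative method is based on decomposing the total sums of squares and cross-products (SSCP) of clustered observations into within-cluster and between-cluster parts, T = W + B. The sketch computes that split for invented two-variable data. It does not carry out the factor-analytic estimation itself.

```python
def sscp_decomposition(groups):
    """Split total SSCP into within-cluster and between-cluster parts.

    groups: list of clusters, each a list of observation vectors of length p.
    Returns (T, W, B) as p x p nested lists, with T = W + B.
    """
    obs = [x for g in groups for x in g]
    p = len(obs[0])
    grand = [sum(x[d] for x in obs) / len(obs) for d in range(p)]

    def outer_add(M, u, v, w=1.0):
        # M += w * u v^T
        for r in range(p):
            for c in range(p):
                M[r][c] += w * u[r] * v[c]

    T = [[0.0] * p for _ in range(p)]
    W = [[0.0] * p for _ in range(p)]
    B = [[0.0] * p for _ in range(p)]
    for g in groups:
        mean = [sum(x[d] for x in g) / len(g) for d in range(p)]
        dm = [mean[d] - grand[d] for d in range(p)]
        outer_add(B, dm, dm, len(g))  # cluster mean around the grand mean
        for x in g:
            outer_add(T, [x[d] - grand[d] for d in range(p)], [x[d] - grand[d] for d in range(p)])
            outer_add(W, [x[d] - mean[d] for d in range(p)], [x[d] - mean[d] for d in range(p)])
    return T, W, B


# Invented data: two clusters (e.g. classes) of two observations each
T, W, B = sscp_decomposition([[(1, 2), (3, 4)], [(5, 5), (7, 9)]])
print(T, W, B)
```

Dividing W and B by their degrees of freedom gives the within- and between-cluster covariance matrices that a two-level factor model describes separately.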

  1. Cluster analysis for determining distribution center location

    Science.gov (United States)

    Lestari Widaningrum, Dyah; Andika, Aditya; Murphiyanto, Richard Dimas Julian

    2017-12-01

    Determining the location of distribution facilities is highly important for surviving the intense competition in today's business world. Companies can operate multiple distribution centers to mitigate supply chain risk, which raises new problems: how many facilities should be provided, and where. This study examines a fast-food restaurant brand located in Greater Jakarta that is among the top 5 fast-food restaurant chains by retail sales. The study had three stages: compiling spatial data, cluster analysis, and network analysis. The cluster analysis results are used to consider the location of an additional distribution center. The network analysis results show a more efficient process, reflected in shorter distribution distances.

  2. Analysis on present radon ventilation situation of Chinese uranium mines

    International Nuclear Information System (INIS)

    Li Xianjie; Hu Penghua

    2010-01-01

    Ventilation is the most important means of lowering radon levels in uranium mines. At present, with the same investment in protective ventilation, the radon and radon daughter concentrations in the underground air of Chinese uranium mines using compaction methods are 3 to 5 times higher than those of advanced uranium mines abroad. In this paper, through analysis of the status of ventilation-based radon reduction in Chinese uranium mines and comparison of the advantages and shortcomings of various ventilation and radon reduction methods, the reasons for the higher radon and radon daughter concentrations in Chinese uranium mines are identified, and problems are raised in three areas: ventilation radon reduction theory, ventilation radon reduction measures, and ventilation management. To address these problems, the paper proposes measures such as strengthening examination, verification, and monitoring of the actual situation; drawing up clear ventilation plans that strictly follow the mining sequence; training ventilation technicians intensively; enhancing ventilation system management; developing ventilation radon reduction technology for uranium mines; and putting ventilation equipment into service as soon as possible. (authors)

  3. Manipulating measurement scales in medical statistical analysis and data mining: A review of methodologies

    Directory of Open Access Journals (Sweden)

    Hamid Reza Marateb

    2014-01-01

    Full Text Available Background: Selecting the correct statistical test and data mining method depends highly on the measurement scale of the data, the type of variables, and the purpose of the analysis. Different measurement scales are studied in detail, and statistical comparison, modeling, and data mining methods are examined using several medical examples. We present two clustering examples with ordinal variables, which are more challenging to analyze, using the Wisconsin Breast Cancer Data (WBCD). Ordinal-to-interval scale conversion example: a breast cancer database of nine 10-level ordinal variables for 683 patients was analyzed by two ordinal-scale clustering methods. The performance of the clustering methods was assessed by comparison with the gold-standard groups of malignant and benign cases that had been identified by clinical tests. Results: The sensitivity and accuracy of the two clustering methods were 98% and 96%, respectively. Their specificity was comparable. Conclusion: Using a clustering algorithm appropriate to the measurement scale of the study variables yields high performance. Moreover, descriptive and inferential statistics, as well as the modeling approach, must be selected based on the scale of the variables.

  4. Manipulating measurement scales in medical statistical analysis and data mining: A review of methodologies

    Science.gov (United States)

    Marateb, Hamid Reza; Mansourian, Marjan; Adibi, Peyman; Farina, Dario

    2014-01-01

    Background: Selecting the correct statistical test and data mining method depends highly on the measurement scale of the data, the type of variables, and the purpose of the analysis. Different measurement scales are studied in detail, and statistical comparison, modeling, and data mining methods are examined using several medical examples. We present two clustering examples with ordinal variables, which are more challenging to analyze, using the Wisconsin Breast Cancer Data (WBCD). Ordinal-to-interval scale conversion example: a breast cancer database of nine 10-level ordinal variables for 683 patients was analyzed by two ordinal-scale clustering methods. The performance of the clustering methods was assessed by comparison with the gold-standard groups of malignant and benign cases that had been identified by clinical tests. Results: The sensitivity and accuracy of the two clustering methods were 98% and 96%, respectively. Their specificity was comparable. Conclusion: Using a clustering algorithm appropriate to the measurement scale of the study variables yields high performance. Moreover, descriptive and inferential statistics, as well as the modeling approach, must be selected based on the scale of the variables. PMID:24672565
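    Because clusters carry no class labels, comparing them with a gold standard requires mapping each cluster to a class first. A minimal sketch, assuming a majority-vote mapping and "malignant" as the positive class (both our assumptions, not stated in the abstract), computes sensitivity, specificity, and accuracy from the resulting confusion counts:

```python
from collections import Counter


def evaluate_clusters(cluster_ids, gold, positive="malignant"):
    """Map each cluster to its majority gold class, then score the mapping."""
    by_cluster = {}
    for c, g in zip(cluster_ids, gold):
        by_cluster.setdefault(c, Counter())[g] += 1
    mapping = {c: cnt.most_common(1)[0][0] for c, cnt in by_cluster.items()}

    tp = fp = tn = fn = 0
    for c, g in zip(cluster_ids, gold):
        predicted_pos = mapping[c] == positive
        actual_pos = g == positive
        if predicted_pos and actual_pos:
            tp += 1
        elif predicted_pos:
            fp += 1
        elif actual_pos:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }
```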

  5. Analysis on safety production in coal mines Henan Province

    Institute of Scientific and Technical Information of China (English)

    KONG Liu-an; ZHANG Wen-yong

    2006-01-01

    Given the severe situation of safety production in coal mines, the paper analyzed statistical data on recent accident indexes in Henan's coal mines. Using investigation and comparative analysis, it examined mining conditions, the technical level of facilities, safety investment, and the vocational quality of workers in Henan's coal mines. The results indicate that the main safety production problems are weak safety management, low-level facilities, inadequate safety investment, and poor vocational quality of workers. Finally, the paper proposes solutions such as establishing and improving a coal mining supervision and management system, increasing safety investment in techniques and facilities, strengthening workers' safety education, and introducing more high-level professional talent.

  6. Changing cluster composition in cluster randomised controlled trials: design and analysis considerations

    Science.gov (United States)

    2014-01-01

    Background: There are many methodological challenges in the conduct and analysis of cluster randomised controlled trials, but one that has received little attention is that of post-randomisation changes to cluster composition. To illustrate this, we focus on the issue of cluster merging, considering the impact on the design, analysis and interpretation of trial outcomes. Methods: We explored the effects of merging clusters on study power using standard methods of power calculation. We assessed the potential impacts on study findings of both homogeneous cluster merges (involving clusters randomised to the same arm of a trial) and heterogeneous merges (involving clusters randomised to different arms of a trial) by simulation. To determine the impact on bias and precision of treatment effect estimates, we applied standard methods of analysis to different populations under analysis. Results: Cluster merging produced a systematic reduction in study power. This effect depended on the number of merges and was most pronounced when variability in cluster size was at its greatest. Simulations demonstrate that the impact on analysis was minimal when cluster merges were homogeneous, with impact on study power being balanced by a change in observed intracluster correlation coefficient (ICC). We found a decrease in study power when cluster merges were heterogeneous, and the estimate of treatment effect was attenuated. Conclusions: Examples of cluster merges found in previously published reports of cluster randomised trials were typically homogeneous rather than heterogeneous. Simulations demonstrated that trial findings in such cases would be unbiased. However, simulations also showed that any heterogeneous cluster merges would introduce bias that would be hard to quantify, as well as having negative impacts on the precision of estimates obtained. Further methodological development is warranted to better determine how to analyse such trials appropriately. Interim recommendations…
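    Why merging lowers power can be seen from the commonly used design-effect approximation for unequal cluster sizes, DE = 1 + ((cv^2 + 1) m - 1) * ICC, where m is the mean cluster size and cv its coefficient of variation. The sketch below assumes a two-arm comparison of means with a normal approximation; it illustrates only the power effect and does not model the bias from heterogeneous merges. Function names are ours.

```python
from statistics import NormalDist, mean, pstdev


def design_effect(cluster_sizes, icc):
    """Design effect allowing for unequal cluster sizes."""
    m = mean(cluster_sizes)
    cv = pstdev(cluster_sizes) / m   # coefficient of variation of cluster size
    return 1 + ((cv**2 + 1) * m - 1) * icc


def power_two_arm(cluster_sizes_per_arm, icc, effect_size, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two means.

    Both arms are assumed to have the same cluster sizes; effect_size is the
    standardised mean difference.
    """
    n_per_arm = sum(cluster_sizes_per_arm)
    n_eff = n_per_arm / design_effect(cluster_sizes_per_arm, icc)
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(effect_size * (n_eff / 2) ** 0.5 - z_crit)
```

    For example, with 20 clusters of 50 per arm and ICC = 0.05, merging pairs into 10 clusters of 100 raises the design effect from 3.45 to 5.95, so the effective sample size and the power both fall; unequal cluster sizes raise the design effect further.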

  7. Semi-supervised consensus clustering for gene expression data analysis

    OpenAIRE

    Wang, Yunli; Pan, Youlian

    2014-01-01

    Background: Simple clustering methods such as hierarchical clustering and k-means are widely used for gene expression data analysis, but they are unable to deal with the noise and high dimensionality associated with microarray gene expression data. Consensus clustering appears to improve the robustness and quality of clustering results. Incorporating prior knowledge in the clustering process (semi-supervised clustering) has been shown to improve the consistency between the data partitioning and do...
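    A generic consensus-clustering sketch (not the authors' semi-supervised algorithm): k-means is run repeatedly on random subsamples, and the consensus matrix M records, for each pair of samples, the fraction of runs in which they were clustered together among the runs that sampled both. As an illustrative assumption, prior knowledge is injected by fixing known must-link pairs to full consensus; a final partition could then be obtained, for example, by hierarchical clustering on 1 - M.

```python
import numpy as np


def _kmeans_labels(X, k, rng, n_iter=50):
    """Lloyd's k-means with farthest-first initialisation; returns labels."""
    centers = [X[rng.integers(len(X))]]
    while len(centers) < k:
        d = np.linalg.norm(X[:, None] - np.array(centers)[None], axis=2)
        centers.append(X[d.min(axis=1).argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)


def consensus_matrix(X, k, n_runs=50, frac=0.8, must_link=(), seed=0):
    """Fraction of runs in which each pair was co-clustered, among runs sampling both."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    rng = np.random.default_rng(seed)
    together = np.zeros((n, n))
    sampled = np.zeros((n, n))
    for _ in range(n_runs):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        labels = _kmeans_labels(X[idx], k, rng)
        same = (labels[:, None] == labels[None, :]).astype(float)
        together[np.ix_(idx, idx)] += same
        sampled[np.ix_(idx, idx)] += 1.0
    M = np.divide(together, sampled, out=np.zeros_like(together), where=sampled > 0)
    for i, j in must_link:   # hypothetical prior knowledge: pairs known to belong together
        M[i, j] = M[j, i] = 1.0
    return M
```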

  8. A Dimensionally Reduced Clustering Methodology for Heterogeneous Occupational Medicine Data Mining.

    Science.gov (United States)

    Saâdaoui, Foued; Bertrand, Pierre R; Boudet, Gil; Rouffiac, Karine; Dutheil, Frédéric; Chamoux, Alain

    2015-10-01

    Clustering is a family of statistical learning techniques aimed at partitioning heterogeneous data into homogeneous groups called clusters. Clustering has been applied successfully in several fields, such as medicine, biology, finance, and economics. In this paper, we introduce clustering in multifactorial data analysis problems. A case study is conducted on an occupational medicine problem with the purpose of analyzing patterns in a population of 813 individuals. To reduce the dimensionality of the data set, we base our approach on Principal Component Analysis (PCA), the statistical tool most commonly used in factorial analysis. However, real-world problems, especially in medicine, often involve heterogeneous qualitative and quantitative measurements, whereas PCA processes only quantitative ones. Moreover, qualitative data are originally unobservable quantitative responses that are usually binary-coded. Hence, we propose a new set of strategies that handle quantitative and qualitative data simultaneously. The principle of this approach is to project the qualitative variables onto the subspaces spanned by the quantitative ones. Subsequently, an optimal model is allocated to the resulting PCA-regressed subspaces.
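    A minimal sketch of the projection idea, under our own assumptions (ordinary least squares on centred columns, then standardised PCA via SVD); the function names are hypothetical and the paper's model-allocation step is not reproduced. The projection replaces each binary-coded qualitative column by its fitted values on the quantitative columns, so the residual is orthogonal to their span, and PCA can then be run on the quantitative columns together with the projected ones.

```python
import numpy as np


def project_qualitative(Q, B):
    """Least-squares projection of binary-coded qualitative columns B onto the
    space spanned by the centred quantitative columns Q."""
    Q = np.asarray(Q, dtype=float)
    B = np.asarray(B, dtype=float)
    Qc = Q - Q.mean(axis=0)
    Bc = B - B.mean(axis=0)
    coef, *_ = np.linalg.lstsq(Qc, Bc, rcond=None)
    return Qc @ coef      # fitted values: the part of B explained by Q


def pca_scores(X, n_components):
    """Scores on the leading principal components of the standardised data."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    sd = Xc.std(axis=0)
    Xs = Xc / np.where(sd > 0, sd, 1.0)   # avoid division by zero for constant columns
    U, s, _ = np.linalg.svd(Xs, full_matrices=False)
    return U[:, :n_components] * s[:n_components]
```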

  9. Topic modeling for cluster analysis of large biological and medical datasets.

    Science.gov (United States)

    Zhao, Weizhong; Zou, Wen; Chen, James J

    2014-01-01

    The big data moniker is nowhere better deserved than in describing the ever-increasing size and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracy and effectiveness of traditional clustering methods diminish for large and high-dimensional datasets. Topic modeling is an active research field in machine learning and has been used mainly as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering, or for overcoming clustering difficulties, in large biological and medical datasets. In this study, three topic model-derived clustering methods (highest probable topic assignment, feature selection, and feature extraction) are proposed and tested on the cluster analysis of three large datasets: a Salmonella pulsed-field gel electrophoresis (PFGE) dataset, a lung cancer dataset, and a breast cancer dataset, which represent various types of large biological or medical datasets. All three methods are shown to improve the efficacy and effectiveness of clustering results on the three datasets in comparison with traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic model-derived clustering methods yield clustering improvements for the three different data types. Clusters more efficaciously represent truthful groupings and subgroupings in the data than traditional methods, suggesting…
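    Assuming a topic model (for example LDA) has already been fitted, giving a sample-by-topic matrix theta and a topic-by-feature matrix phi, two of the three strategies reduce to simple array operations. The sketch below shows highest probable topic assignment (argmax over topics) and one plausible form of feature selection (keeping each topic's top-weighted features); for feature extraction, the rows of theta would instead be passed to a conventional clustering algorithm. These details are our assumptions, not the authors' code, and fitting the topic model itself is not shown.

```python
import numpy as np


def highest_probable_topic(theta):
    """Cluster label of each sample = index of its most probable topic."""
    return np.asarray(theta).argmax(axis=1)


def select_features(phi, top_n):
    """Union of the top_n highest-weighted features (columns) of every topic."""
    phi = np.asarray(phi)
    top = np.argsort(phi, axis=1)[:, ::-1][:, :top_n]
    return np.unique(top)
```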

  10. MANNER OF STOCKS SORTING USING CLUSTER ANALYSIS METHODS

    Directory of Open Access Journals (Sweden)

    Jana Halčinová

    2014-06-01

    Full Text Available The aim of this article is to show how cluster analysis methods can be used to classify stocks of finished products. Cluster analysis creates groups (clusters) of finished products according to similarity in demand, i.e. customer requirements for each product. The sorting of finished-product stocks by cluster is described with a practical example. The resulting clusters are incorporated into the draft layout of the distribution warehouse.

  11. Accounting and Financial Data Analysis Data Mining Tools

    Directory of Open Access Journals (Sweden)

    Diana Elena Codreanu

    2011-05-01

    Full Text Available Computerized accounting systems have grown more complex in recent years because of the competitive economic environment, but data analysis solutions such as OLAP and data mining make multidimensional data analysis possible, can detect fraud, and can discover knowledge hidden in data, providing information that is useful for decision making within the organization. The literature offers many definitions of data mining, but all boil down to the same idea: a process that extracts new information from large data collections, information that would be very difficult to obtain without data mining tools. Information obtained through data mining has the advantage that it not only answers the question of what happens but also explains why certain things happen. In this paper we present advanced techniques for analyzing and exploiting data stored in a multidimensional database.

  12. Advanced analysis of forest fire clustering

    Science.gov (United States)

    Kanevski, Mikhail; Pereira, Mario; Golay, Jean

    2017-04-01

    Analysis of point pattern clustering is an important topic in spatial statistics and in many applications: biodiversity, epidemiology, natural hazards, geomarketing, etc. There are several fundamental approaches to quantifying spatial data clustering using topological, statistical and fractal measures. In the present research, the recently introduced multi-point Morisita index (mMI) is applied to study the spatial clustering of forest fires in Portugal. The data set consists of more than 30000 fire events covering the period from 1975 to 2013. The distribution of forest fires is very complex and highly variable in space. mMI is a multi-point extension of the classical two-point Morisita index. In essence, mMI is estimated by covering the region under study with a grid and computing how many times more likely it is that m points selected at random will fall in the same grid cell than under a completely random Poisson process. By changing the number of grid cells (the size of the grid cells), mMI characterizes the scaling properties of spatial clustering. The intrinsic dimension (fractal dimension) of the point distribution can also be estimated from mMI. In this study, the mMI of forest fires is compared with the mMI of random patterns (RPs) generated within the validity domain, defined as the forest area of Portugal. The forest fires turn out to be highly clustered inside the validity domain in comparison with the RPs. Moreover, they exhibit different scaling properties at different spatial scales. The results of the mMI analysis are also compared with those of fractal measures of clustering: the box-counting and sandbox-counting approaches.
    REFERENCES
    Golay J., Kanevski M., Vega Orozco C., Leuenberger M., 2014: The multipoint Morisita index for the analysis of spatial patterns. Physica A, 406, 191-202.
    Golay J., Kanevski M., 2015: A new estimator of intrinsic dimension based on the multipoint Morisita index
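    For reference, the definition described above amounts to I_{m,δ} = Q^(m-1) Σ_i n_i(n_i - 1)...(n_i - m + 1) / [N(N - 1)...(N - m + 1)], with Q grid cells of size δ, n_i points in cell i, and N points in total. The sketch below evaluates it on a regular grid over a rectangular domain; the function name and the rectangular domain are assumptions (the study uses the forest area of Portugal as its validity domain). Values near 1 indicate complete spatial randomness and values above 1 indicate clustering; repeating the computation for several grid sizes gives the scaling behaviour.

```python
import numpy as np


def multipoint_morisita(points, m, n_cells_per_side, bounds=((0.0, 1.0), (0.0, 1.0))):
    """Multipoint Morisita index I_{m,delta} on a regular grid over a rectangle."""
    if m < 2:
        raise ValueError("m must be at least 2")
    pts = np.asarray(points, dtype=float)
    (x0, x1), (y0, y1) = bounds
    counts, _, _ = np.histogram2d(pts[:, 0], pts[:, 1], bins=n_cells_per_side,
                                  range=[[x0, x1], [y0, y1]])
    n = counts.ravel()          # points per grid cell
    N = n.sum()                 # total number of points
    Q = n.size                  # number of grid cells
    num = np.ones_like(n)
    den = 1.0
    for r in range(m):
        num = num * (n - r)     # n_i (n_i - 1) ... (n_i - m + 1); zero when n_i < m
        den *= N - r            # N (N - 1) ... (N - m + 1)
    return float(Q ** (m - 1) * num.sum() / den)
```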

  13. Cluster Analysis in Rapeseed (Brassica Napus L.)

    International Nuclear Information System (INIS)

    Mahasi, J.M

    2002-01-01

    With a widening edible oil deficit, Kenya has become increasingly dependent on imported edible oils. Many oilseed crops (e.g. sunflower, soya beans, rapeseed/mustard, sesame, groundnuts) can be grown in Kenya, but oilseed rape is preferred because it is very high yielding (1.5-4.0 tons/ha), with an oil content of 42-46%. It also fits into various cropping systems as a relay crop or intercrop, rotational crop, trap crop, and fodder. It is soft seeded, so oil extraction is relatively easy. The meal is high in protein and very useful as a livestock supplement. Rapeseed can be straight combined using adjusted wheat combines. The priority is to expand domestic oilseed production, hence the need to introduce improved rapeseed germplasm from other countries. The success of any crop improvement programme depends on the extent of genetic diversity in the material. Hence, it is essential to understand the adaptation of introduced genotypes and any similarities among them. Evaluation trials were carried out on 17 rapeseed genotypes (nine of Canadian origin and eight of European origin) grown at four locations (Endebess, Njoro, Timau and Mau Narok) over three years (1992, 1993 and 1994). Results for 1993 were discarded because of severe drought. An analysis of variance was carried out on seed yields only, and the treatments were found to be significantly different. Cluster analysis was then carried out on mean seed yields; according to this analysis, only one major group exists within the material. In 1992, varieties 2, 3, 8 and 9 did not fall in the same cluster as the rest. Variety 8 was the only one not classified with the rest of the Canadian varieties, whereas three European varieties (2, 3 and 9) were not classified with the others. In 1994, varieties 10 and 6 did not fall in the major cluster; of these two, variety 10 is of Canadian origin. Varieties were more similar in 1994 than in 1992 because of favorable weather.
It is evident that genotypes from different geographical…

  14. Data Mining and Knowledge Management in Higher Education -Potential Applications.

    Science.gov (United States)

    Luan, Jing

    This paper introduces a new decision support tool, data mining, in the context of knowledge management. The most striking features of data mining techniques are clustering and prediction. The clustering aspect of data mining offers a comprehensive analysis of student characteristics, while the predictive function estimates the likelihood for a…

  15. Tweets clustering using latent semantic analysis

    Science.gov (United States)

    Rasidi, Norsuhaili Mahamed; Bakar, Sakhinah Abu; Razak, Fatimah Abdul

    2017-04-01

    Social media are becoming overloaded with information due to the increasing number of information feeds. Unlike other social media, Twitter allows users to broadcast a short message called a "tweet". In this study, we extract tweets related to MH370 over a certain period of time. In this paper, we present an overview of our approach to clustering tweets in order to analyze users' responses to the MH370 tragedy. The tweets were clustered based on the frequency of terms obtained from the classification process. The method we used for the text classification is Latent Semantic Analysis. As a result, tweets responding to the MH370 tragedy fall into two types: emotional and non-emotional. We show some of our initial results to demonstrate the effectiveness of our approach.
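    The core LSA step can be sketched as a truncated SVD of a term-document count matrix: each document is represented by its coordinates in the top-k latent dimensions, and cosine similarity in that space can then drive clustering. Whitespace tokenisation and raw counts are simplifying assumptions of ours, and the toy documents in the test are invented for illustration.

```python
import numpy as np


def term_document_matrix(docs):
    """Raw term counts: one row per vocabulary term, one column per document."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            A[index[w], j] += 1
    return A, vocab


def lsa_doc_vectors(A, k):
    """Document coordinates in the top-k latent semantic dimensions."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (np.diag(s[:k]) @ Vt[:k]).T   # one k-dimensional row per document


def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```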

  16. Recent development of antiSMASH and other computational approaches to mine secondary metabolite biosynthetic gene clusters

    DEFF Research Database (Denmark)

    Blin, Kai; Kim, Hyun Uk; Medema, Marnix H.

    2017-01-01

    Many drugs are derived from small molecules produced by microorganisms and plants, so-called natural products. Natural products have diverse chemical structures, but the biosynthetic pathways producing those compounds are often organized as biosynthetic gene clusters (BGCs) and follow a highly conserved biosynthetic logic. This allows for the identification of core biosynthetic enzymes using genome mining strategies that are based on the sequence similarity of the involved enzymes/genes. However, mining for a variety of BGCs quickly approaches a complexity level where manual analyses are no longer possible and require the use of automated genome mining pipelines, such as the antiSMASH software. In this review, we discuss the principles underlying the predictions of antiSMASH and other tools and provide practical advice for their application. Furthermore, we discuss important caveats...

  17. TIME SERIES ANALYSIS ON STOCK MARKET FOR TEXT MINING CORRELATION OF ECONOMY NEWS

    Directory of Open Access Journals (Sweden)

    Sadi Evren SEKER

    2014-01-01

    Full Text Available This paper proposes an information retrieval method for economy news. The effect of economy news is studied at the word level, and stock market values are taken as the ground truth. The correlation between stock market prices and economy news is an already addressed problem for most countries. The most well-known approach is to apply text mining to the news and time series analysis techniques to stock market closing values, and then to apply classification or clustering algorithms to the extracted features. This study goes further and asks which time series analysis techniques are available for stock market closing values and which one is the most suitable. In this study, the news items and their dates are collected into a database and text mining is applied to the news; the text mining part has been kept simple, using only the term frequency-inverse document frequency (TF-IDF) method. For the time series analysis part, we studied 10 different methods, such as random walk, moving average, acceleration, Bollinger band, price rate of change, periodic average, difference, momentum, and relative strength index, and their variations. We also explain these techniques comparatively and apply them to Turkish Stock Market closing values over a period of more than 2 years. In addition, we applied the TF-IDF method to the economy news of one of the high-circulation newspapers in Turkey.
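    Textbook forms of the TF-IDF weighting and of several indicators named above (moving average, Bollinger band, momentum, price rate of change) are sketched below. The window lengths, the Bollinger multiplier k = 2, and the tf = count/length variant are our assumptions; the paper's exact variants and parameters are not given in the abstract.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def tf_idf(docs):
    """Per-document {term: tf * idf}, with tf = count / length and idf = log(N / df)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    weights = []
    for toks in tokenized:
        tf = Counter(toks)
        weights.append({t: (c / len(toks)) * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


def sma(prices, window):
    """Simple moving average over a sliding window."""
    return [mean(prices[i - window + 1:i + 1]) for i in range(window - 1, len(prices))]


def bollinger(prices, window, k=2.0):
    """(lower, middle, upper) bands: moving average +/- k population std devs."""
    bands = []
    for i in range(window - 1, len(prices)):
        w = prices[i - window + 1:i + 1]
        m, s = mean(w), pstdev(w)
        bands.append((m - k * s, m, m + k * s))
    return bands


def momentum(prices, n):
    """Price difference over n periods."""
    return [prices[i] - prices[i - n] for i in range(n, len(prices))]


def rate_of_change(prices, n):
    """Percentage price change over n periods."""
    return [100.0 * (prices[i] - prices[i - n]) / prices[i - n] for i in range(n, len(prices))]
```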

  18. Clustering Spam Domains and Destination Websites: Digital Forensics with Data Mining

    Directory of Open Access Journals (Sweden)

    Chun Wei

    2010-03-01

    Full Text Available Spam-related cyber crimes have become a serious threat to society. Current spam research mainly aims to detect spam more effectively. We believe the prosecution of spammers is a more effective way of stopping spam emails than filtering, so more research is needed to help forensic investigators collect useful evidence. This research proposes an algorithm for clustering spam domains extracted from spam emails based on their hosting IP addresses, and for tracing the domains over a period of time. The results reveal several facts that merit law enforcement attention: many seemingly unrelated spam campaigns are actually related; spammers have a sophisticated mechanism for combating URL blacklisting, registering many new domain names every day and flushing out old domains; the domains are hosted at different IP addresses across several networks, mostly in China, where legislation is not as tight as in the US; and old IP addresses are replaced by new ones from time to time but still show strong correlation among them. These facts lead to the conclusion that spam-related cyber crimes are operated by well-organized criminal syndicates that have sufficient manpower to distribute a huge volume of spam through bots, purchase a large number of domain names and hosting servers, and maintain websites to sell counterfeit products online. Traditional law enforcement technology has not scaled well in cases involving millions of data elements. This paper demonstrates an effective use of data mining to respond to this challenge.
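    One simple way to realise IP-based grouping, assumed here for illustration (the abstract does not detail the algorithm or its time tracing), is to link any two domains that were hosted on a common IP address and take the connected components, computed below with a union-find structure. The domain names and documentation-range IP addresses in the test data are hypothetical.

```python
def cluster_domains(domain_ips):
    """Group domains that share at least one hosting IP, transitively.

    domain_ips: mapping of domain name -> iterable of IP address strings.
    """
    parent = {d: d for d in domain_ips}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_domain_for_ip = {}
    for domain, ips in domain_ips.items():
        for ip in ips:
            if ip in first_domain_for_ip:
                union(first_domain_for_ip[ip], domain)  # shared IP links the domains
            else:
                first_domain_for_ip[ip] = domain

    clusters = {}
    for d in domain_ips:
        clusters.setdefault(find(d), set()).add(d)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])
```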

  19. CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks.

    Science.gov (United States)

    Li, Min; Li, Dongyan; Tang, Yu; Wu, Fangxiang; Wang, Jianxin

    2017-08-31

    Nowadays, cluster analysis of biological networks has become one of the most important approaches to identifying functional modules and predicting protein complexes and network biomarkers. Furthermore, visualization of clustering results is crucial for displaying the structure of biological networks. Here we present CytoCluster, a Cytoscape plugin integrating six clustering algorithms, HC-PIN (Hierarchical Clustering algorithm in Protein Interaction Networks), OH-PIN (identifying Overlapping and Hierarchical modules in Protein Interaction Networks), IPCA (Identifying Protein Complex Algorithm), ClusterONE (Clustering with Overlapping Neighborhood Expansion), DCU (Detecting Complexes based on Uncertain graph model), and IPC-MCE (Identifying Protein Complexes based on Maximal Complex Extension), together with the BinGO (the Biological networks Gene Ontology) function. Users can select different clustering algorithms according to their requirements. The main function of these six clustering algorithms is to detect protein complexes or functional modules. In addition, BinGO is used to determine which Gene Ontology (GO) categories are statistically overrepresented in a set of genes or a subgraph of a biological network. CytoCluster can be easily extended, so that more clustering algorithms and functions can be added to this plugin. Since it was created in July 2013, CytoCluster has been downloaded more than 9700 times from the Cytoscape App Store and has already been applied to the analysis of different biological networks. CytoCluster is available from http://apps.cytoscape.org/apps/cytocluster.
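    Over-representation of a GO category in a cluster is commonly assessed with a hypergeometric test: given N genes in the reference set, K of them annotated with the category, and a cluster of n genes containing k annotated ones, the p-value is P(X >= k). The minimal sketch below (the function name is ours) omits the multiple-testing correction that a real analysis over many GO categories would need.

```python
from math import comb


def hypergeom_pvalue(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn)."""
    upper = min(K, n)
    # math.comb returns 0 when asked for more items than are available
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)
```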

  20. Analysis of Occupational Accidents in Underground and Surface Mining in Spain Using Data-Mining Techniques.

    Science.gov (United States)

    Sanmiquel, Lluís; Bascompta, Marc; Rossell, Josep M; Anticoi, Hernán Francisco; Guash, Eduard

    2018-03-07

    An analysis of occupational accidents in the mining sector was conducted using data from the Spanish Ministry of Employment and Social Safety between 2005 and 2015, to which data-mining techniques were applied. Data were processed with the Weka software. Two scenarios were chosen from the accidents database: surface and underground mining. The most important variables involved in occupational accidents and their association rules were determined. These rules are composed of several predictor variables that cause accidents, defining their characteristics and context. This study presents the 20 most important association rules in the sector (either surface or underground mining), based on the statistical confidence level of each rule as obtained by Weka. The outcomes display the most typical immediate causes, along with the percentage of accidents with a basis in each association rule. The most important immediate cause is body movement with physical effort or overexertion, and the type of accident is physical effort or overexertion. On the other hand, the second most important immediate cause and type of accident differ between the two scenarios. Data-mining techniques were chosen as a useful tool for finding out the root causes of the accidents.
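    Weka's own implementation is not reproduced here. To show what support and confidence mean for such rules, the sketch below enumerates itemsets by brute force (rather than with Apriori's pruning) over accident records encoded as sets of attribute values; the attribute names in the test data are hypothetical. The confidence of a rule A -> B is support(A ∪ B) / support(A).

```python
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions containing every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def association_rules(transactions, min_support=0.2, min_confidence=0.6, max_len=3):
    """Brute-force rules (lhs, rhs, support, confidence), sorted by decreasing confidence."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    found = []
    for size in range(2, max_len + 1):
        for itemset in combinations(items, size):
            s = support(transactions, itemset)
            if s == 0 or s < min_support:
                continue
            for r in range(1, size):
                for lhs in combinations(itemset, r):
                    conf = s / support(transactions, lhs)   # support(lhs) >= s > 0
                    if conf >= min_confidence:
                        rhs = tuple(i for i in itemset if i not in lhs)
                        found.append((lhs, rhs, s, conf))
    return sorted(found, key=lambda rule: -rule[3])
```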

  1. The legacy of war: an epidemiological study of cluster weapon and land mine accidents in Quang Tri Province, Vietnam.

    Science.gov (United States)

    Phung, Tran Kim; Le, Viet; Husum, Hans

    2012-07-01

    The study examines the epidemiology of cluster weapon and land mine accidents in Quang Tri Province since the end of the Vietnam War. The province is located just south of the demarcation line and was the province most affected during the war. In 2009, a cross-sectional household study was conducted in all nine districts of the province. During the study period of 1975-2009, 7,030 persons in the study area, or 1.1% of the provincial population, were involved in unexploded ordnance (UXO) or land mine accidents. There were 2,620 fatalities and 4,410 accident survivors. The study documents that the main problem is cluster weapons and other unexploded ordnance; only 4.3% of casualties were caused by land mines. The legacy of the war affects poor people the most; the accident rate was highest among villagers living in mountainous areas, ethnic minorities, and low-income families. The most common activities leading to the accidents were farming (38.6%), collecting scrap metal (11.2%), and herding cattle (8.3%). The study documents that the people of Quang Tri Province still suffer heavily today from the legacy of war. Mine risk education programs should take these epidemiological findings into account when future accident prevention programs are designed to target high-risk areas and activities.

  2. Multisource Images Analysis Using Collaborative Clustering

    Directory of Open Access Journals (Sweden)

    Pierre Gançarski

    2008-04-01

    Full Text Available The development of very high-resolution (VHR) satellite imagery has produced a huge amount of data. The multiplication of satellites carrying different types of sensors provides many heterogeneous images. Consequently, the image analyst often has many different images available representing the same area of the Earth's surface. These images can be from different dates, produced by different sensors, or even at different resolutions. The lack of machine learning tools that use all these representations in an overall process forces a sequential analysis of these various images. To use all the available information simultaneously, we propose a framework in which different algorithms can use different views of the scene. Each one works on a different remotely sensed image and thus produces different and useful information. These algorithms work together collaboratively through an automatic and mutual refinement of their results, so that all the results have almost the same number of clusters, which are statistically similar. Finally, a unique result is produced, representing a consensus among the information obtained by each clustering method on its own image. The unified result and the complementarity of the single results (i.e., the agreement between the clustering methods as well as their disagreement) lead to a better understanding of the scene. Experiments on multispectral remote sensing images have shown that this method is efficient at extracting relevant information and improving scene understanding.

  3. Pharmacovigilance from social media: mining adverse drug reaction mentions using sequence labeling with word embedding cluster features.

    Science.gov (United States)

    Nikfarjam, Azadeh; Sarker, Abeed; O'Connor, Karen; Ginn, Rachel; Gonzalez, Graciela

    2015-05-01

    Social media is becoming increasingly popular as a platform for sharing personal health-related information. This information can be utilized for public health monitoring tasks, particularly for pharmacovigilance, via the use of natural language processing (NLP) techniques. However, the language in social media is highly informal, and user-expressed medical concepts are often nontechnical, descriptive, and challenging to extract. There has been limited progress in addressing these challenges, and thus far, advanced machine learning-based NLP techniques have been underutilized. Our objective is to design a machine learning-based approach to extract mentions of adverse drug reactions (ADRs) from highly informal text in social media. We introduce ADRMine, a machine learning-based concept extraction system that uses conditional random fields (CRFs). ADRMine utilizes a variety of features, including a novel feature for modeling words' semantic similarities. The similarities are modeled by clustering words based on unsupervised, pretrained word representation vectors (embeddings) generated from unlabeled user posts in social media using a deep learning technique. ADRMine outperforms several strong baseline systems in the ADR extraction task by achieving an F-measure of 0.82. Feature analysis demonstrates that the proposed word cluster features significantly improve extraction performance. It is possible to extract complex medical concepts, with relatively high performance, from informal, user-generated content. Our approach is particularly scalable, suitable for social media mining, as it relies on large volumes of unlabeled data, thus diminishing the need for large, annotated training data sets. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.

  4. Grey Wolf Optimizer Based on Powell Local Optimization Method for Clustering Analysis

    Directory of Open Access Journals (Sweden)

    Sen Zhang

    2015-01-01

    One recently proposed heuristic evolutionary algorithm is the grey wolf optimizer (GWO), inspired by the leadership hierarchy and hunting mechanism of grey wolves in nature. This paper presents an extended GWO algorithm based on the Powell local optimization method, which we call PGWO. The PGWO algorithm significantly improves on the original GWO in solving complex optimization problems. Because clustering is a popular data analysis and data mining technique, PGWO can also be applied to clustering problems. In this study, the PGWO algorithm is first tested on seven benchmark functions and then used for data clustering on nine data sets. Compared with other state-of-the-art evolutionary algorithms, the benchmark and data clustering results demonstrate the superior performance of the PGWO algorithm.
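
    The abstract does not spell out the GWO update equations. Below is a minimal Python sketch of the basic GWO loop, assuming the commonly published form: a control parameter a decreasing linearly from 2 to 0, coefficients A = 2*a*r1 - a and C = 2*r2, and each wolf moving to the mean of three positions guided by the alpha, beta and delta wolves. The Powell local search that turns GWO into PGWO is not included, and the function name and parameters are illustrative choices, not the author's code.

```python
import random
from typing import Callable, List, Sequence, Tuple


def grey_wolf_optimizer(
    f: Callable[[Sequence[float]], float],
    dim: int,
    lower: float,
    upper: float,
    n_wolves: int = 20,
    n_iter: int = 200,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize f over the box [lower, upper]^dim with the basic GWO update.

    No Powell local search is performed, so this is plain GWO, not PGWO.
    """
    rng = random.Random(seed)
    wolves = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_wolves)]
    # Best three (score, position) pairs seen so far: alpha, beta, delta.
    leaders: List[Tuple[float, List[float]]] = [(float("inf"), [0.0] * dim) for _ in range(3)]

    def update_leaders() -> None:
        for w in wolves:
            score = f(w)
            for rank in range(3):
                if score < leaders[rank][0]:
                    leaders.insert(rank, (score, list(w)))  # copy: wolves keep moving
                    leaders.pop()  # keep exactly three leaders
                    break

    update_leaders()
    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 towards 0
        guides = [pos for _, pos in leaders]
        for w in wolves:
            for d in range(dim):
                estimates = []
                for g in guides:
                    A = 2.0 * a * rng.random() - a  # |A| > 1 explores, |A| < 1 exploits
                    C = 2.0 * rng.random()
                    D = abs(C * g[d] - w[d])
                    estimates.append(g[d] - A * D)
                # Move to the mean of the three leader-guided estimates, kept in bounds.
                w[d] = min(upper, max(lower, sum(estimates) / 3.0))
        update_leaders()
    best_score, best_pos = leaders[0]
    return best_pos, best_score


# Usage: minimize the 5-dimensional sphere function.
best, score = grey_wolf_optimizer(lambda x: sum(v * v for v in x), dim=5, lower=-10.0, upper=10.0)
print(best, score)
```

For clustering, f would typically be a within-cluster sum of squared distances, with the position vector encoding k cluster centres; that encoding is an assumption, since the abstract does not say how PGWO represents a clustering solution.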

  5. Mining

    Directory of Open Access Journals (Sweden)

    Khairullah Khan

    2014-09-01

    Opinion mining is an interesting area of research because of its applications in various fields. Collecting people's opinions about products and about social and political events and problems through the Web is becoming more popular every day. Users' opinions help the public and stakeholders make certain decisions. Opinion mining is a way to retrieve information through search engines, Web blogs and social networks. Because of the huge number of reviews in the form of unstructured text, summarizing the information manually is impossible. Efficient computational methods are therefore needed for mining and summarizing reviews from corpora and Web documents. This study presents a systematic literature survey of the computational techniques, models and algorithms for mining opinion components from unstructured reviews.

  6. Constructing storyboards based on hierarchical clustering analysis

    Science.gov (United States)

    Hasebe, Satoshi; Sami, Mustafa M.; Muramatsu, Shogo; Kikuchi, Hisakazu

    2005-07-01

    There is a growing need for quick previews of video content, both to improve the accessibility of video archives and to reduce network traffic. In this paper, a storyboard containing a user-specified number of keyframes is produced from a given video sequence. It is based on hierarchical cluster analysis of feature vectors derived from the wavelet coefficients of video frames. Consistent reuse of the extracted feature vectors is the key to avoiding repeated, computationally intensive parsing of the same video sequence. Experimental results suggest that this strategy yields a significant reduction in computational time.

  7. Data mining in radiology

    International Nuclear Information System (INIS)

    Kharat, Amit T; Singh, Amarjit; Kulkarni, Vilas M; Shah, Digish

    2014-01-01

    Data mining facilitates the study of radiology data in various dimensions. It converts large patient image and text datasets into useful information that helps improve patient care and provides informative reports. Data mining technology analyzes data within the Radiology Information System and Hospital Information System using specialized software that assesses relationships and agreement in the available information. Using similar data analysis tools, radiologists can make informed decisions and predict the future outcome of a particular imaging finding. Data, information and knowledge are the components of data mining. Classes, clusters, associations, sequential patterns, classification, prediction and decision trees are the various types of data mining. Data mining has the potential to make health care delivery affordable and to ensure that the best imaging practices are followed. It is also a tool for academic research. Data mining is considered ethically neutral; however, concerns regarding privacy and legality exist and need to be addressed to ensure its success.

  8. Clustering box office movie with Partition Around Medoids (PAM) Algorithm based on Text Mining of Indonesian subtitle

    Science.gov (United States)

    Alfarizy, A. D.; Indahwati; Sartono, B.

    2017-03-01

    In 2015, Indonesia was the largest target market for the Hollywood movie industry in Southeast Asia. Hollywood movies distributed in Indonesia target people of all ages, including children. Because awareness of the need to guide children while they watch movies is low, children may watch films of any rating, even those unsuitable for their age. Even after being translated into Bahasa Indonesia and passing the censorship phase, words that are uncomfortable for children still appear. The purpose of this research is to cluster box office Hollywood movies based on their Indonesian subtitles, revenue, IMDb user rating and genres, as one reference for adults choosing suitable movies for their children. Text mining is used to extract words from the subtitles and count the frequency of three groups of words (bad words, sexual words and terror words), while the Partition Around Medoids (PAM) algorithm, with the Gower similarity coefficient as the proximity matrix, is used as the clustering method. We clustered 624 movies from IMDb released between 2006 and the first half of 2016. The solution with the highest silhouette coefficient (0.36) has 5 clusters. High-revenue Animation, Adventure and Comedy movies, such as those in cluster 5, are recommended for children, while high-revenue Comedy movies, such as those in cluster 4, should be avoided.
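
    The abstract pairs PAM with Gower's coefficient, which handles mixed numeric and categorical attributes, and uses the silhouette coefficient to choose the number of clusters. The Python sketch below assumes equal weights for all attributes and works with Gower dissimilarity (1 minus the Gower similarity), since PAM minimizes dissimilarities. The toy rows are hypothetical and are not the 624-movie data set, and the naive BUILD/SWAP loop is intended only for small examples.

```python
from typing import List, Sequence, Tuple, Union

Value = Union[float, str]
Matrix = List[List[float]]


def gower_dissimilarity(rows: Sequence[Sequence[Value]], numeric: Sequence[bool]) -> Matrix:
    """Equal-weight Gower dissimilarity: range-scaled |x - y| for numeric
    columns, 0/1 mismatch for categorical columns, averaged over columns."""
    n, p = len(rows), len(numeric)
    ranges = [
        (max(float(r[j]) for r in rows) - min(float(r[j]) for r in rows)) if numeric[j] else 0.0
        for j in range(p)
    ]
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1, n):
            total = 0.0
            for j in range(p):
                if not numeric[j]:
                    total += 0.0 if rows[i][j] == rows[k][j] else 1.0
                elif ranges[j] > 0:
                    total += abs(float(rows[i][j]) - float(rows[k][j])) / ranges[j]
            D[i][k] = D[k][i] = total / p
    return D


def total_cost(D: Matrix, medoids: Sequence[int]) -> float:
    # Sum of each object's dissimilarity to its nearest medoid.
    return sum(min(D[i][m] for m in medoids) for i in range(len(D)))


def pam(D: Matrix, k: int) -> Tuple[List[int], List[int]]:
    n = len(D)
    # BUILD: greedily add the medoid that lowers the total cost the most.
    medoids: List[int] = []
    for _ in range(k):
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=lambda c: total_cost(D, medoids + [c])))
    cost = total_cost(D, medoids)
    # SWAP: apply the best medoid/non-medoid exchange until none improves the cost.
    while True:
        best_cost, best_set = cost, None
        for idx in range(k):
            for o in range(n):
                if o in medoids:
                    continue
                trial = medoids[:idx] + [o] + medoids[idx + 1:]
                c = total_cost(D, trial)
                if c < best_cost - 1e-12:
                    best_cost, best_set = c, trial
        if best_set is None:
            break
        medoids, cost = best_set, best_cost
    labels = [min(range(k), key=lambda m: D[i][medoids[m]]) for i in range(n)]
    return medoids, labels


def mean_silhouette(D: Matrix, labels: List[int]) -> float:
    """Average silhouette width; needs at least two clusters. Singletons score 0."""
    n = len(D)
    clusters = sorted(set(labels))
    scores = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(D[i][j] for j in own) / len(own)
        b = min(
            sum(D[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n


# Hypothetical toy rows: (bad-word count, revenue in $M, genre).
movies = [
    [2, 300.0, "Animation"], [3, 280.0, "Animation"], [1, 310.0, "Animation"],
    [40, 90.0, "Comedy"], [45, 100.0, "Comedy"], [38, 95.0, "Comedy"],
]
D = gower_dissimilarity(movies, numeric=[True, True, False])
medoids, labels = pam(D, k=2)
print(labels, round(mean_silhouette(D, labels), 2))
```

Running `pam` for several values of k and keeping the one with the highest mean silhouette mirrors the model-selection step the abstract describes.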

  9. Analysis of Bonds as an Instrument for Financing Mining Investments

    Science.gov (United States)

    Ranosz, Robert

    2017-06-01

    The purpose of this article is to examine the structure of financing for mining enterprises in the years 2007-2013, with particular emphasis on bonds. The document pays special attention to Polish mining enterprises. The analysis of the financing structure was based on data collected from the financial statements (cash flows) of the largest mining companies in Poland, compared with the results of global mining enterprises as reported by international advisory firms. The article takes into account capital sources such as corporate bonds, bank loans and share issues. The analysis indicates that mining enterprises, both worldwide and in Poland, are increasingly eager to obtain business financing by issuing corporate bonds. It should also be recognized that, in the analyzed period, both global and Polish mining enterprises moved away from forms of financing such as share issues. This may be because the bond market in Poland is becoming increasingly popular, mainly because interest rates on bonds are lower than those on bank loans. Another reason may be that banks and potential buyers of shares are less eager to finance this type of investment because of its relatively substantial risk, which bondholders are willing to accept.

  10. Social big data mining

    CERN Document Server

    Ishikawa, Hiroshi

    2015-01-01

    Social Media. Big Data and Social Data. Hypotheses in the Era of Big Data. Social Big Data Applications. Basic Concepts in Data Mining. Association Rule Mining. Clustering. Classification. Prediction. Web Structure Mining. Web Content Mining. Web Access Log Mining, Information Extraction and Deep Web Mining. Media Mining. Scalability and Outlier Detection.

  11. Analysis of water control in an underground mine under strong karst media influence (Vazante mine, Brazil)

    Science.gov (United States)

    Ninanya, Hugo; Guiguer, Nilson; Vargas, Eurípedes A.; Nascimento, Gustavo; Araujo, Edmar; Cazarin, Caroline L.

    2018-05-01

    This work presents an analysis of groundwater flow conditions and groundwater control measures for the Vazante underground mine, located in the state of Minas Gerais, Brazil. According to field observations, groundwater flow processes in this mine are strongly influenced by karst features in the near-surface terrain next to the Santa Catarina River. These karst features, such as caves, sinkholes, dolines and conduits, are in direct contact with the aquifer and tend to increase water flow into the mine. These effects are more acute in areas affected by groundwater-level drawdown due to pumping. Numerical analyses of this condition were carried out using the computer program FEFLOW, which represents karst features as one-dimensional discrete flow conduits inside a three-dimensional finite element structure representing the geologic medium, following a combined discrete-continuum approach to the karst system. These features create preferential flow paths between the river and the mine, and incorporating them allows the model to represent the hydrogeological environment around the mine more realistically. To mitigate the water-inflow problems, impermeabilization of the river by constructing a reinforced concrete channel was incorporated into the hydrogeological model. Different channelization lengths for the most critical zones along the river were studied, and the results allowed the effectiveness of the different river channelization scenarios to be compared. It was also possible to determine whether these impermeabilization measures could substantially reduce the high cost of pumping inside the mine.

  12. Communication Base Station Log Analysis Based on Hierarchical Clustering

    Directory of Open Access Journals (Sweden)

    Zhang Shao-Hua

    2017-01-01

    Communication base stations generate massive amounts of data every day, and these base station logs are valuable for mining business circles. This paper uses data mining technology and a hierarchical clustering algorithm to group base stations into business circles based on their recorded data. By extracting features from the data of different business circles and comparing the characteristics of each business circle category, operators can choose suitable areas for commercial marketing.

  13. Cluster Analysis of Maize Inbred Lines

    Directory of Open Access Journals (Sweden)

    Jiban Shrestha

    2016-12-01

    The determination of diversity among inbred lines is important for heterosis breeding. Sixty maize inbred lines were evaluated for eight agro-morphological traits during the winter season of 2011 to analyze their genetic diversity. Clustering was done by the average linkage method, and the inbred lines were grouped into six clusters. Inbred lines in Cluster II had taller plants with the maximum number of leaves, whereas Cluster III was characterized by shorter plants with the minimum number of leaves. Inbred lines in Cluster V flowered early, whereas those in Cluster VI flowered late. The inbred lines in Cluster III had higher values of anthesis-silking interval (ASI), while those in Cluster VI had lower values of ASI. These results show that inbred lines from widely divergent clusters can be utilized in hybrid breeding programmes.
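
    Average linkage (UPGMA) defines the distance between two clusters as the mean of all pairwise distances between their members. The Python sketch below shows the idea on hypothetical trait values; the abstract does not state whether the eight traits were standardized or which distance was used, so the z-scoring and Euclidean distance here are assumptions. The naive merge loop is adequate for a few dozen lines such as the sixty inbreds, although library implementations use more efficient distance-update formulas.

```python
import math
import statistics
from itertools import combinations
from typing import List, Sequence


def standardize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    # z-score each trait so that traits measured in different units weigh equally.
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against constant traits
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]


def average_linkage(points: Sequence[Sequence[float]], n_clusters: int) -> List[List[int]]:
    """Naive agglomerative clustering with average linkage (UPGMA).

    Starts from singletons and repeatedly merges the pair of clusters with the
    smallest mean pairwise Euclidean distance until n_clusters remain.
    """
    clusters = [[i] for i in range(len(points))]

    def avg_dist(a: List[int], b: List[int]) -> float:
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: avg_dist(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


# Hypothetical traits per inbred line: plant height (cm), leaf count, days to anthesis.
lines = [[180, 14, 60], [185, 15, 61], [178, 14, 59],
         [120, 10, 50], [118, 9, 49], [125, 10, 51]]
print(average_linkage(standardize(lines), n_clusters=2))
```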

  14. Analysis of dynamic parameters of mine fans

    Science.gov (United States)

    Russky, E. Yu

    2018-03-01

    The design of the rotor of an axial fan and its main units, namely the double-leaf blade impeller and the main shaft, is discussed. The parameters of a disturbed mine air flow under sudden outbursts are determined, and the influence of these disturbances on the frequencies of the axial fan units is assessed. The assessment covers the effect of the disturbances on the blades and on the torsional vibrations of the main shaft. The dependences of the stresses in the rotor elements on the disturbed air flow parameters are derived.

  15. Analysis of Occupational Accidents in Underground and Surface Mining in Spain Using Data-Mining Techniques

    Directory of Open Access Journals (Sweden)

    Lluís Sanmiquel

    2018-03-01

    An analysis of occupational accidents in the mining sector was conducted using data from the Spanish Ministry of Employment and Social Safety between 2005 and 2015, and data-mining techniques were applied. The data were processed with the software Weka. Two scenarios were chosen from the accidents database: surface and underground mining. The most important variables involved in occupational accidents and their association rules were determined. These rules are composed of several predictor variables that cause accidents, defining their characteristics and context. This study presents the 20 most important association rules in the sector, for either surface or underground mining, based on the statistical confidence level of each rule as obtained by Weka. The outcomes show the most typical immediate causes, along with the percentage of accidents that each association rule accounts for. The most important immediate cause is body movement with physical effort or overexertion, and the type of accident is physical effort or overexertion. On the other hand, the second most important immediate cause and type of accident differ between the two scenarios. Data-mining techniques were chosen as a useful tool for finding the root causes of the accidents.
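
    Rule strength in association mining is usually expressed through support (the fraction of records containing all items of a rule) and confidence, conf(X -> y) = support(X ∪ {y}) / support(X). A brute-force Python sketch that enumerates and ranks such rules is given below. The accident attributes are hypothetical, and the exhaustive search stands in for an Apriori-style algorithm such as the one Weka provides; the abstract does not say which Weka algorithm was used.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence, Tuple

# (antecedent, consequent, support, confidence)
Rule = Tuple[FrozenSet[str], str, float, float]


def support(transactions: Sequence[FrozenSet[str]], itemset: FrozenSet[str]) -> float:
    # Fraction of records that contain every item of the itemset.
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rank_rules(
    transactions: Sequence[FrozenSet[str]], min_support: float, max_antecedent: int = 2
) -> List[Rule]:
    """Exhaustively enumerate rules X -> y with a single-item consequent and
    rank them by confidence, then by support. Fine for small toy data only."""
    items = sorted(set().union(*transactions))
    rules: List[Rule] = []
    for size in range(1, max_antecedent + 1):
        for antecedent in combinations(items, size):
            x = frozenset(antecedent)
            sup_x = support(transactions, x)
            if sup_x == 0:
                continue
            for y in items:
                if y in x:
                    continue
                sup_xy = support(transactions, x | {y})
                if sup_xy >= min_support:
                    rules.append((x, y, sup_xy, sup_xy / sup_x))
    # Highest confidence first; ties broken by higher support.
    rules.sort(key=lambda r: (-r[3], -r[2]))
    return rules


# Hypothetical accident records, each a set of attribute values.
records = [
    frozenset({"underground", "overexertion", "manual_handling"}),
    frozenset({"underground", "overexertion", "manual_handling"}),
    frozenset({"underground", "fall", "slippery_floor"}),
    frozenset({"surface", "overexertion", "manual_handling"}),
    frozenset({"surface", "struck_by_object", "machinery"}),
]
for x, y, sup, conf in rank_rules(records, min_support=0.4)[:5]:
    print(sorted(x), "->", y, f"support={sup:.2f} confidence={conf:.2f}")
```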

  16. Analysis of Occupational Accidents in Underground and Surface Mining in Spain Using Data-Mining Techniques

    Science.gov (United States)

    Sanmiquel, Lluís; Bascompta, Marc; Rossell, Josep M.; Anticoi, Hernán Francisco; Guash, Eduard

    2018-01-01

    An analysis of occupational accidents in the mining sector was conducted using data from the Spanish Ministry of Employment and Social Safety between 2005 and 2015, and data-mining techniques were applied. The data were processed with the software Weka. Two scenarios were chosen from the accidents database: surface and underground mining. The most important variables involved in occupational accidents and their association rules were determined. These rules are composed of several predictor variables that cause accidents, defining their characteristics and context. This study presents the 20 most important association rules in the sector, for either surface or underground mining, based on the statistical confidence level of each rule as obtained by Weka. The outcomes show the most typical immediate causes, along with the percentage of accidents that each association rule accounts for. The most important immediate cause is body movement with physical effort or overexertion, and the type of accident is physical effort or overexertion. On the other hand, the second most important immediate cause and type of accident differ between the two scenarios. Data-mining techniques were chosen as a useful tool for finding the root causes of the accidents. PMID:29518921

  17. Environmental impact analysis of mine tailing reservoir

    Science.gov (United States)

    Gong, J. Z.

    2016-08-01

    Under certain conditions, landscape topography altered by mine tailing reservoir construction is likely to enlarge lateral recharge source regions, resulting in dramatic changes to the local hydrological dynamic field and to the recharge of downstream areas through runoff and discharge conditions, elevated groundwater depth, shallow groundwater, direct communication with rainfall, and thinning of the vadose zone. Corrosive leaching of topsoil exposed to chemical fertilizers and pesticides over many years may dissolve these substances into the groundwater system, which may lead to excessive amounts of many harmful chemicals, thereby affecting the physical and mental health of local residents and increasing the environmental vulnerability and risk associated with water and soil. Based on field survey data from Yujiakan, Qian'an City, Hebei Province, this paper analyzes the hydrogeological environmental mechanisms of areas adjacent to mine tailing reservoirs and establishes a conceptual model of the local groundwater system and the concentration-response function between NO3⁻ content in groundwater and the incidence of cancer among local residents.

  18. Online Nonparametric Bayesian Activity Mining and Analysis From Surveillance Video.

    Science.gov (United States)

    Bastani, Vahid; Marcenaro, Lucio; Regazzoni, Carlo S

    2016-05-01

    A method for online incremental mining of activity patterns from surveillance video streams is presented in this paper. The framework consists of a learning block in which a Dirichlet process mixture model is employed for the incremental clustering of trajectories. Stochastic trajectory pattern models are formed using Gaussian process regression of the corresponding flow functions. Moreover, a sequential Monte Carlo method based on a Rao-Blackwellized particle filter is proposed for tracking and online classification, as well as for detecting abnormality while an object is being observed. Experimental results on real surveillance video data show the performance of the proposed algorithm in trajectory clustering, classification, and abnormality detection.

  19. Unsupervised text mining methods for literature analysis: a case study for Thomas Pynchon's V.

    Directory of Open Access Journals (Sweden)

    Christos Iraklis Tsatsoulis

    2013-08-01

    We investigate the use of unsupervised text mining methods for the analysis of prose literature, using Thomas Pynchon's novel V. as a case study. Our results suggest that such methods may be employed to reveal meaningful information regarding the novel’s structure. We report results using a wide variety of clustering algorithms, several distinct distance functions, and different visualization techniques. The application of a simple topic model is also demonstrated. We discuss the meaningfulness of our results along with the limitations of our approach, and we suggest some possible paths for further study.

  20. Cluster analysis of word frequency dynamics

    Science.gov (United States)

    Maslennikova, Yu S.; Bochkarev, V. V.; Belashova, I. A.

    2015-01-01

    This paper describes the analysis and modelling of word usage frequency time series. In a previous study, it was hypothesized that all word usage frequencies have uniform dynamics approaching the shape of a Gaussian function. This hypothesis can be checked using the frequency dictionaries of the Google Books Ngram database. This database includes 5.2 million books published between 1500 and 2008. The corpus contains over 500 billion words in American English, British English, French, German, Spanish, Russian, Hebrew, and Chinese. We clustered time series of word usage frequencies using a Kohonen neural network. The similarity between input vectors was estimated using several algorithms. The neural network training procedure revealed more than ten distinct forms of time series, which describe the dynamics of word usage frequencies from the birth to the death of individual words. Different groups of word forms were found to have different dynamics of word usage frequency variations.
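
    A Kohonen network groups time series by letting each input pull its best-matching node, and that node's neighbours on the map, towards itself. The Python sketch below trains a one-dimensional map on z-normalized series. The abstract says several similarity measures were tried without naming them, so the squared Euclidean distance, the linear decay schedules and the synthetic Gaussian-shaped curves are all assumptions, not the authors' setup.

```python
import math
import random
import statistics
from typing import List, Sequence


def z_normalize(series: Sequence[float]) -> List[float]:
    # Remove level and scale so that only the shape of the curve is compared.
    m = statistics.fmean(series)
    s = statistics.pstdev(series) or 1.0
    return [(v - m) / s for v in series]


def best_matching_unit(weights: Sequence[Sequence[float]], x: Sequence[float]) -> int:
    # Index of the node whose weight vector is closest (squared Euclidean) to x.
    return min(range(len(weights)),
               key=lambda k: sum((w - v) ** 2 for w, v in zip(weights[k], x)))


def train_som(data: Sequence[Sequence[float]], n_nodes: int, n_epochs: int = 50,
              lr0: float = 0.5, seed: int = 0) -> List[List[float]]:
    """Online training of a 1-D Kohonen map with a Gaussian neighbourhood."""
    rng = random.Random(seed)
    weights = [list(rng.choice(data)) for _ in range(n_nodes)]  # initialize from samples
    sigma0 = max(1.0, n_nodes / 2.0)
    total_steps = n_epochs * len(data)
    step = 0
    for _ in range(n_epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                     # learning rate decays linearly to 0
            sigma = sigma0 * (1.0 - frac) + 0.5 * frac  # neighbourhood shrinks towards 0.5
            bmu = best_matching_unit(weights, x)
            for k, w in enumerate(weights):
                h = math.exp(-((k - bmu) ** 2) / (2.0 * sigma ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
            step += 1
    return weights


# Synthetic frequency curves: three peak early, three peak late.
curves = [z_normalize([math.exp(-((t - c) ** 2) / 4.0) for t in range(20)])
          for c in (2, 3, 4, 14, 15, 16)]
som = train_som(curves, n_nodes=4)
print([best_matching_unit(som, c) for c in curves])
```

Series that share a best-matching node fall into the same shape class; with more nodes, neighbouring nodes tend to represent gradually changing curve shapes.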

  1. Cluster analysis of word frequency dynamics

    International Nuclear Information System (INIS)

    Maslennikova, Yu S; Bochkarev, V V; Belashova, I A

    2015-01-01

    This paper describes the analysis and modelling of word usage frequency time series. In a previous study, it was hypothesized that all word usage frequencies have uniform dynamics approaching the shape of a Gaussian function. This hypothesis can be checked using the frequency dictionaries of the Google Books Ngram database. This database includes 5.2 million books published between 1500 and 2008. The corpus contains over 500 billion words in American English, British English, French, German, Spanish, Russian, Hebrew, and Chinese. We clustered time series of word usage frequencies using a Kohonen neural network. The similarity between input vectors was estimated using several algorithms. The neural network training procedure revealed more than ten distinct forms of time series, which describe the dynamics of word usage frequencies from the birth to the death of individual words. Different groups of word forms were found to have different dynamics of word usage frequency variations.

  2. HC StratoMineR: A Web-Based Tool for the Rapid Analysis of High-Content Datasets.

    Science.gov (United States)

    Omta, Wienand A; van Heesbeen, Roy G; Pagliero, Romina J; van der Velden, Lieke M; Lelieveld, Daphne; Nellen, Mehdi; Kramer, Maik; Yeong, Marley; Saeidi, Amir M; Medema, Rene H; Spruit, Marco; Brinkkemper, Sjaak; Klumperman, Judith; Egan, David A

    2016-10-01

    High-content screening (HCS) can generate large multidimensional datasets, and when combined with appropriate data mining tools, it can yield valuable insights into the mechanism of action of bioactive molecules. However, easy-to-use data mining tools are not widely available, with the result that these datasets are frequently underutilized. Here, we present HC StratoMineR, a web-based tool for high-content data analysis. It is a decision-supportive platform that guides even non-expert users through a high-content data analysis workflow. HC StratoMineR is built using MySQL for storage and querying, PHP as the main programming language, and jQuery for additional user interface functionality. R is used for statistical calculations, logic and data visualizations. Furthermore, C++ and graphics processing unit (GPU) power are embedded in R through the Rcpp and rpud libraries for computationally intensive operations. We show that HC StratoMineR can be used to analyze multivariate data from a high-content siRNA knock-down screen and a small-molecule screen. It can be used to rapidly filter out undesirable data; to select relevant data; and to perform quality control, data reduction, data exploration, morphological hit picking, and data clustering. Our results demonstrate that HC StratoMineR can be used to functionally categorize HCS hits and thus provide valuable information for hit prioritization.

  3. From virtual clustering analysis to self-consistent clustering analysis: a mathematical study

    Science.gov (United States)

    Tang, Shaoqiang; Zhang, Lei; Liu, Wing Kam

    2018-03-01

    In this paper, we propose a new homogenization algorithm, virtual clustering analysis (VCA), and provide a mathematical framework for the recently proposed self-consistent clustering analysis (SCA) (Liu et al. in Comput Methods Appl Mech Eng 306:319-341, 2016). In the mathematical theory, we clarify the key assumptions and ideas of VCA and SCA and derive the continuous and discrete Lippmann-Schwinger equations. Based on the key postulate "once response similarly, always response similarly", clustering is performed in an offline stage by machine learning techniques (k-means and SOM), which substantially reduces computational complexity in the online predictive stage. The clear mathematical setup allows, for the first time, a convergence study of clustering refinement in one space dimension. Convergence is proved rigorously and is found to be of second order in numerical investigations. Furthermore, we propose suitably enlarging the domain in VCA so that the boundary terms may be neglected in the Lippmann-Schwinger equation, by virtue of Saint-Venant's principle. These boundary terms were not obtained in the original SCA paper, and we find that they may well be responsible for the numerical dependency on the choice of reference material property. Since VCA improves accuracy by overcoming this modeling error, and reduces numerical cost by avoiding the outer loop iteration that SCA needs to attain material property consistency, its efficiency is expected to be even higher than that of the recently proposed SCA algorithm.
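
    The offline stage groups material points whose responses are similar. A minimal Python sketch of Lloyd's k-means with k-means++ seeding is shown below. In SCA/VCA the input vectors would be per-point response descriptors (for example, strain concentration data from a high-fidelity offline computation), which is an assumption here, and none of the Lippmann-Schwinger solution step is shown.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def kmeans(points: Sequence[Point], k: int, n_iter: int = 100,
           seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Lloyd's k-means with k-means++ seeding (Euclidean distance)."""
    rng = random.Random(seed)
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        # k-means++: sample the next center with probability proportional to D(x)^2.
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        r = rng.uniform(0.0, sum(d2))
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centers.append(list(p))
                break
    labels: Optional[List[int]] = None
    for _ in range(n_iter):
        # Assignment step: nearest center for every point.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centers will not move again
        labels = new_labels
        # Update step: each center moves to the mean of its members;
        # an empty cluster keeps its previous center.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return centers, labels or []


# Hypothetical 2-D response descriptors forming two groups.
pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centers, labels = kmeans(pts, k=2)
print(labels, centers)
```

Each resulting cluster would then be treated as a single unknown in the reduced online problem, which is where the reduction in computational complexity comes from.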

  4. Performance Analysis of Indonesia’s Mining Sector Price Index

    Directory of Open Access Journals (Sweden)

    Hastra Reza Satyatama

    2017-07-01

    The 2008 subprime mortgage crisis in the United States affected global capital markets, especially the stock price index of Indonesia's mining sector. This research analyzes the effect of the BI Rate, exchange rate, world gold price, crude oil price, and Dow Jones Industrial Average on the stock price index of the mining sector. It employs monthly time series data for 2009-2016, with the Error Correction Model-Engle Granger (ECM-EG) as the method. The analysis showed that the BI Rate, exchange rate and world gold price have a negative and significant effect. World oil prices have a positive but not significant effect, while the Dow Jones Industrial Average has a positive and significant impact on the stock price index of the mining sector. Investors in the mining sector should therefore pay attention to the rupiah exchange rate and the Dow Jones index, which significantly affect the mining sector stock price index. DOI: 10.15408/sjie.v6i2.5395

  5. An analysis of hospital brand mark clusters.

    Science.gov (United States)

    Vollmers, Stacy M; Miller, Darryl W; Kilic, Ozcan

    2010-07-01

    This study analyzed brand mark clusters (i.e., various types of brand marks displayed in combination) used by hospitals in the United States. The brand marks were assessed against several normative criteria for creating brand marks that are memorable and that elicit positive affect. Overall, results show a reasonably high level of adherence to many of these normative criteria. Many of the clusters exhibited pictorial elements that reflected benefits and that were conceptually consistent with the verbal content of the cluster. Also, many clusters featured icons that were balanced and moderately complex. However, only a few contained interactive imagery or taglines communicating benefits.

  6. Genome mining of the sordarin biosynthetic gene cluster from Sordaria araneosa Cain ATCC 36386: characterization of cycloaraneosene synthase and GDP-6-deoxyaltrose transferase.

    Science.gov (United States)

    Kudo, Fumitaka; Matsuura, Yasunori; Hayashi, Takaaki; Fukushima, Masayuki; Eguchi, Tadashi

    2016-07-01

    Sordarin is a glycoside antibiotic with a unique tetracyclic diterpene aglycone structure called sordaricin. To understand its intriguing biosynthetic pathway, which may include a Diels-Alder-type [4+2] cycloaddition, genome mining of the gene cluster was carried out on the draft genome sequence of the producer strain, Sordaria araneosa Cain ATCC 36386. A contiguous 67 kb gene cluster consisting of 20 open reading frames, encoding a putative diterpene cyclase, a glycosyltransferase, a type I polyketide synthase, and six cytochrome P450 monooxygenases, was identified. In vitro enzymatic analysis of the putative diterpene cyclase SdnA showed that it catalyzes the transformation of geranylgeranyl diphosphate to cycloaraneosene, a known biosynthetic intermediate of sordarin. Furthermore, a putative glycosyltransferase SdnJ was found to catalyze the glycosylation of sordaricin in the presence of GDP-6-deoxy-d-altrose to give 4'-O-demethylsordarin. These results suggest that the identified sdn gene cluster is responsible for the biosynthesis of sordarin. Based on the isolated potential biosynthetic intermediates and bioinformatics analysis, a plausible biosynthetic pathway for sordarin is proposed.

  7. Smartness and Italian Cities. A Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Flavio Boscacci

    2014-05-01

    Smart cities have recently been recognized as the most pleasing and attractive places to live; because of this, both scholars and policy-makers pay close attention to the topic. Specifically, urban “smartness” has been identified through many characteristics that can be grouped into six dimensions (Giffinger et al. 2007): smart Economy (competitiveness), smart People (social and human capital), smart Governance (participation), smart Mobility (both ICTs and transport), smart Environment (natural resources), and smart Living (quality of life). Following this analytical framework, the present paper investigates the relation between urban attractiveness and the “smart” characteristics in the 103 Italian NUTS3 province capitals in the year 2011. To this aim, descriptive statistics were followed by a regression analysis (OLS), in which the dependent variable, urban attractiveness, was proxied by housing market prices. In addition, a Cluster Analysis (CA) was developed to find differences and commonalities among the province capitals. The OLS results indicate that living, people and economy are the key drivers of urban attractiveness, while environment continues to play a minor role. Besides, the CA groups the province capitals a

  8. Development of Database for Accident Analysis in Indian Mines

    Science.gov (United States)

    Tripathy, Debi Prasad; Guru Raghavendra Reddy, K.

    2016-10-01

    Mining is a hazardous industry, and the high accident rates associated with underground mining are a cause of deep concern. Despite technological developments, the rates of fatal accidents and reportable incidents have not shown a corresponding decline. This paper argues that the adoption of appropriate safety standards by both mine management and the government may result in an appreciable reduction in accident frequency. This can be achieved by using technology to improve working conditions and by sensitising workers and managers about the causes and prevention of accidents. Inputs required for a detailed analysis of an accident include information on the location, time, type and cost of the accident, the victim, the nature of the injury, personal and environmental factors, etc. Such information can be generated from the data available in the standard coded accident report form. This paper presents a web-based application for the analysis of accidents in Indian mines during 2001-2013. An accident database (SafeStat) prototype based on a TCP/IP intranet, developed by the authors, is also discussed.

  9. CLUSTERING ANALYSIS OF OFFICER'S BEHAVIOURS IN LONDON POLICE FOOT PATROL ACTIVITIES

    Directory of Open Access Journals (Sweden)

    J. Shen

    2015-07-01

    In this short paper, we present a framework for the conceptual representation and clustering analysis of police officers' patrol patterns obtained by mining their raw movement trajectory data. This is achieved with a model that accounts for the spatio-temporal dynamics of human movement by incorporating both the behavioural features of the travellers and the semantic meaning of the environment they move in. The similarity metric between traveller behaviours is thus defined jointly according to the allocation of stay time in each spatio-temporal region of interest (ST-ROI), to support clustering analysis of patrol behaviours. The proposed framework enables higher-level analysis of behaviour and preferences based on raw movement trajectories. The model is first applied to police patrol data provided by the Metropolitan Police and will subsequently be tested on other types of datasets.

  10. Detection of land mines using fast and thermal neutron analysis

    International Nuclear Information System (INIS)

    Bach, P.

    1998-01-01

    The detection of land mines is made possible by a nuclear sensor based on neutron interrogation. Neutron interrogation allows detection of the sensitive elements (C, H, O, N) of the explosives in land mines or unexploded shells: evaluating the characteristic N/O and C/O ratios in a volume element gives a signature of high explosives. Fast neutron interrogation has been qualified in our laboratories as a powerful close-distance method for identifying the presence of a mine or explosive. This method could be combined with a multisensor detection system (for instance IR or microwave) to reduce the false alarm rate by addressing the suspected area. The principle of operation is based on measuring the gamma rays induced by neutron interaction with irradiated nuclei from the soil and from a possible mine. The specific energy of these gamma rays makes it possible to recognise the elements at the origin of the neutron interaction. Several detection methods can be used, depending on the nuclei to be identified. Analysis of physical data, computations by simulation codes, and experiments performed in our laboratory have shown the interest of Fast Neutron Analysis (FNA) combined with Thermal Neutron Analysis (TNA) techniques, especially for the detection of nitrogen ¹⁴N, carbon ¹²C and oxygen ¹⁶O. The FNA technique can be implemented using a 14 MeV sealed neutron tube and a set of detectors. Mine detection has been demonstrated in our investigations using a low-power neutron generator working in the 10⁸ n/s range, which is reasonable with respect to safety rules. A fieldable demonstrator would consist of a detection head including the tube and detectors, with the remote electronics, power supplies and computer installed in a vehicle. (author)

  11. Taxonomical analysis of the Cancer cluster of galaxies

    International Nuclear Information System (INIS)

    Perea, J.; Olmo, A. del; Moles, M.

    1986-01-01

    A description is presented of the Cancer cluster of galaxies, based on a taxonomical analysis in (α, δ, V_r) space. Earlier results by previous authors on the lack of dynamical entity of the cluster are confirmed. The present analysis points out the existence of a binary structure in the most populated region of the complex. (author)

  12. Simultaneous Two-Way Clustering of Multiple Correspondence Analysis

    Science.gov (United States)

    Hwang, Heungsun; Dillon, William R.

    2010-01-01

    A two-way clustering approach to multiple correspondence analysis is proposed to account for cluster-level heterogeneity of both respondents and variable categories in multivariate categorical data. Specifically, in the proposed method, multiple correspondence analysis is combined with k-means in a unified framework in which k-means is…

  13. Mining survey data for SWOT analysis

    OpenAIRE

    Phadermrod, Boonyarat

    2016-01-01

    Strengths, Weaknesses, Opportunities and Threats (SWOT) analysis is one of the most important tools for strategic planning. The traditional method of conducting SWOT analysis does not prioritize factors and is prone to subjective views that may result in improper strategic action. Accordingly, this research exploits Importance-Performance Analysis (IPA), a technique for measuring customers' satisfaction based on survey data, to systematically generate prioritized SWOT factors based on custom...
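The abstract is cut off before it explains how IPA results become SWOT factors, so only the standard IPA step is sketched here: attributes are placed on the classic four-quadrant grid, using the mean importance and mean performance as cut-offs. The function name and sample scores are hypothetical, and the thesis's mapping from quadrants to strengths or weaknesses is not reproduced.

```python
from statistics import mean


def ipa_quadrants(scores):
    """Assign each attribute to a classic IPA quadrant (hypothetical helper).

    scores: {attribute: (mean_importance, mean_performance)} from a survey.
    The grand means of importance and performance are used as cut-offs.
    """
    imp_cut = mean(imp for imp, _ in scores.values())
    perf_cut = mean(perf for _, perf in scores.values())
    quadrants = {}
    for attr, (imp, perf) in scores.items():
        if imp >= imp_cut and perf >= perf_cut:
            quadrants[attr] = "keep up the good work"
        elif imp >= imp_cut:
            quadrants[attr] = "concentrate here"
        elif perf >= perf_cut:
            quadrants[attr] = "possible overkill"
        else:
            quadrants[attr] = "low priority"
    return quadrants
```

An attribute that customers rate as important but that performs below average lands in "concentrate here", which is the kind of factor a prioritized SWOT list would surface first.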

  14. Cluster analysis of activity-time series in motor learning

    DEFF Research Database (Denmark)

    Balslev, Daniela; Nielsen, Finn Å; Futiger, Sally A

    2002-01-01

    Neuroimaging studies of learning focus on brain areas where the activity changes as a function of time. To circumvent the difficult problem of model selection, we used a data-driven analytic tool, cluster analysis, which extracts representative temporal and spatial patterns from the voxel-time series. The optimal number of clusters was chosen using a cross-validated likelihood method, which highlights the clustering pattern that generalizes best over the subjects. Data were acquired with PET at different time points during practice of a visuomotor task. The results from cluster analysis show...

  15. Two-Way Regularized Fuzzy Clustering of Multiple Correspondence Analysis.

    Science.gov (United States)

    Kim, Sunmee; Choi, Ji Yeh; Hwang, Heungsun

    2017-01-01

    Multiple correspondence analysis (MCA) is a useful tool for investigating the interrelationships among dummy-coded categorical variables. MCA has been combined with clustering methods to examine whether there exist heterogeneous subclusters of a population, which exhibit cluster-level heterogeneity. These combined approaches aim to classify either observations only (one-way clustering of MCA) or both observations and variable categories (two-way clustering of MCA). The latter approach is favored because its solutions are easier to interpret by providing explicitly which subgroup of observations is associated with which subset of variable categories. Nonetheless, the two-way approach has been built on hard classification that assumes observations and/or variable categories to belong to only one cluster. To relax this assumption, we propose two-way fuzzy clustering of MCA. Specifically, we combine MCA with fuzzy k-means simultaneously to classify a subgroup of observations and a subset of variable categories into a common cluster, while allowing both observations and variable categories to belong partially to multiple clusters. Importantly, we adopt regularized fuzzy k-means, thereby enabling us to decide the degree of fuzziness in cluster memberships automatically. We evaluate the performance of the proposed approach through the analysis of simulated and real data, in comparison with existing two-way clustering approaches.
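Fuzzy clustering replaces hard assignments with graded memberships. The sketch below shows only the membership update of standard fuzzy c-means (fuzzy k-means) for fixed cluster centres and a user-chosen fuzzifier m. It is not the regularized variant described above, which chooses the degree of fuzziness automatically, and it skips the MCA step. The function name is hypothetical.

```python
from math import dist


def fuzzy_memberships(point, centers, m=2.0):
    """Standard fuzzy c-means membership of one point in each cluster.

    u_k = 1 / sum_j (d_k / d_j) ** (2 / (m - 1)), where d_k is the distance
    to centre k and m > 1 is the fuzzifier. Memberships sum to 1.
    Centre updates and the paper's regularization are not shown.
    """
    d = [dist(point, c) for c in centers]
    if 0.0 in d:
        # A point lying exactly on a centre belongs fully to that centre
        # (split evenly if several centres coincide with it).
        hits = [k for k, dk in enumerate(d) if dk == 0.0]
        return [1.0 / len(hits) if k in hits else 0.0 for k in range(len(centers))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((dk / dj) ** exponent for dj in d) for dk in d]
```

With m = 2, a point at distance 1 from one centre and 3 from the other gets memberships 0.9 and 0.1; raising m spreads membership more evenly (m = 3 gives 0.75 and 0.25).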

  16. Analysis of queuing mine-cars affecting shaft station radon concentrations in Quzhou uranium mine, eastern China

    Directory of Open Access Journals (Sweden)

    Changshou Hong

    2018-04-01

    Full Text Available Shaft stations of underground uranium mines in China serve not only as waiting space for loaded mine-cars queuing to be hoisted but also as the principal channel through which fresh air is taken to working places. Therefore, an assessment was carried out of how mine-car queuing processes affect shaft station radon concentration. The queuing network of mine-cars was analyzed in an underground uranium mine located in Quzhou, Zhejiang province, eastern China. On the basis of a mathematical analysis of the queuing network, a MATLAB-based quasi-random number generating program using Monte Carlo methods was developed. Extensive simulations were then run in MATLAB on a Dell PC. Thereafter, theoretical calculations and field measurements of shaft station radon concentrations were performed for several working conditions. The queuing performance measures of interest, such as average queue length and waiting time, were found to be significantly affected by (and positively correlated with) the utilization rate. However, even in the “worst case”, the shaft station radon concentration was always lower than 200 Bq/m³. The model predictions were compared with the measurement results, and satisfactory agreement was found. Under current working conditions, queuing-induced variations of shaft station radon concentration in the study mine are not significant. Keywords: Hoist and Transport Systems, Mine-cars, Queuing Simulation, Radon Concentration, Underground Uranium Mine
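The abstract does not give the queuing model, so the following is a simplified, hypothetical stand-in. The hoist is treated as a single first-in-first-out server with exponential inter-arrival and service times (an M/M/1 queue), simulated by Monte Carlo with Lindley's recursion and checked against the closed-form mean wait. It illustrates why waiting time grows with the utilization rate rho = lambda / mu; it is not the authors' MATLAB program.

```python
import random


def simulate_queue(arrival_rate, service_rate, n_cars=100_000, seed=1):
    """Monte Carlo mean queueing delay for a single FIFO server (M/M/1).

    Uses Lindley's recursion W[n+1] = max(0, W[n] + S[n] - A[n+1]), where
    S is the service time of car n and A the gap before car n+1 arrives.
    Treating the hoist as one exponential server is a simplifying assumption.
    """
    rng = random.Random(seed)
    wait = 0.0   # the first car finds an empty queue
    total = 0.0
    for _ in range(n_cars):
        total += wait
        service = rng.expovariate(service_rate)
        gap = rng.expovariate(arrival_rate)
        wait = max(0.0, wait + service - gap)
    return total / n_cars


def mm1_mean_wait(arrival_rate, service_rate):
    """Closed-form M/M/1 mean wait in queue: Wq = rho / (mu - lambda)."""
    rho = arrival_rate / service_rate  # utilization rate
    return rho / (service_rate - arrival_rate)
```

For lambda = 0.5 and mu = 1 (rho = 0.5) the closed form gives a mean wait of 1 time unit; at rho = 0.8 it rises to 4, consistent with the positive correlation with utilization reported above.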

  17. The Analysis of Object-Based Change Detection in Mining Area: a Case Study with Pingshuo Coal Mine

    Science.gov (United States)

    Zhang, M.; Zhou, W.; Li, Y.

    2017-09-01

    Accurate information on mining land use and land cover change is crucial for monitoring and environmental change studies. In this paper, a RapidEye remote sensing image (2012) and a SPOT7 remote sensing image (2015) of the Pingshuo mining area are used to monitor changes by combining object-based classification with change vector analysis; we also used R to classify mining land in the high-resolution images, which demonstrated the feasibility and flexibility of open-source software. The results show that (1) the classification of reclaimed mining land achieves high precision: the overall accuracy and kappa coefficient of the classification of the change region map were 86.67 % and 89.44 %. Object-based classification combined with change vector analysis, which substantially improves monitoring accuracy, can therefore be used to monitor mining land, especially reclaimed mining land; (2) from 2012 to 2015 the share of vegetation in the total area decreased from 46 % to 40 %, mostly converted to arable land, while the combined share of arable land and vegetation increased from 51 % to 70 %; meanwhile, built-up land increased to some degree and part of the water area was converted to arable land, but neither change is pronounced. The results illustrate the transformation of the reclaimed mining area; at the same time, some land is still being converted to mining land, showing that the mine is still operating and that mining land use and land cover are changing dynamically.
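Overall accuracy and the kappa coefficient are both computed from a confusion matrix of reference versus mapped classes. A minimal sketch using Cohen's kappa; the function name is hypothetical and the matrix in the tests is made up for illustration.

```python
def accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    confusion[i][j] counts samples of reference class i mapped to class j.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa
```

For [[45, 5], [10, 40]] the overall accuracy is 0.85 and kappa is 0.7. Because chance agreement is subtracted, kappa computed this way never exceeds overall accuracy for the same matrix.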

  18. The smart cluster method. Adaptive earthquake cluster identification and analysis in strong seismic regions

    Science.gov (United States)

    Schaefer, Andreas M.; Daniell, James E.; Wenzel, Friedemann

    2017-07-01

    Earthquake clustering is an essential part of almost any statistical analysis of the spatial and temporal properties of seismic activity. The nature of earthquake clusters and the subsequent declustering of earthquake catalogues play a crucial role in determining the magnitude-dependent earthquake return period and its spatial variation for probabilistic seismic hazard assessment. This study introduces the Smart Cluster Method (SCM), a new methodology for identifying earthquake clusters that uses an adaptive point process for spatio-temporal cluster identification. It uses the magnitude-dependent spatio-temporal earthquake density to adjust the search properties, then analyses the identified clusters to determine directional variation and adjusts its search space accordingly. For rapid successive ruptures such as the 1992 Landers sequence or the 2010-2011 Darfield-Christchurch sequence, a reclassification procedure is applied to separate the successive ruptures using near-field searches, nearest-neighbour classification and temporal splitting. The method is capable of identifying and classifying earthquake clusters in space and time. It has been tested and validated using earthquake data from California and New Zealand. In total, more than 1500 clusters have been found in both regions since 1980 with M_min = 2.0. Using the knowledge of cluster classification, the method has been adapted into an earthquake declustering algorithm, which has been compared to existing methods; its performance is comparable to established methodologies. The analysis of earthquake clustering statistics leads to various new and updated correlation functions, e.g. for ratios between mainshock and strongest aftershock and for general aftershock activity metrics.

  19. ANALYSIS METHODS OF BANKRUPTCY RISK IN ROMANIAN ENERGY MINING INDUSTRY

    Directory of Open Access Journals (Sweden)

    CORICI MARIAN CATALIN

    2016-12-01

    Full Text Available The study analyzes bankruptcy risk and assesses the economic performance of the entity in charge of the energy mining industry in the south-west region. It assesses the risk of bankruptcy using the score method and several indicators that reflect the results obtained, together with balance sheet items of the organization, which operates in mining and energy and contributes to the stability of the national energy system. The analysis focuses on applying business organization models that allow a comprehensive assessment of the risk of bankruptcy and serve as an instrument for forecasting it. The study traces the evolution of bankruptcy risk within the organization using the Altman model and the Conan-Holder model, in order to give a rounded picture of the organization's ability to ensure business continuity.
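The Altman model is a weighted sum of financial ratios. A minimal sketch of the original 1968 Z-score for publicly traded manufacturing firms with its usual cut-offs (1.81 and 2.99). The study may well use a revised variant suited to non-manufacturing firms, and the Conan-Holder model is not shown. Function names are hypothetical.

```python
def altman_z(working_capital, retained_earnings, ebit, market_value_equity,
             sales, total_assets, total_liabilities):
    """Original (1968) Altman Z-score for public manufacturing firms.

    All inputs must be in the same currency unit.
    """
    x1 = working_capital / total_assets
    x2 = retained_earnings / total_assets
    x3 = ebit / total_assets
    x4 = market_value_equity / total_liabilities
    x5 = sales / total_assets
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5


def altman_zone(z):
    """Classify a 1968 Z-score using the usual 1.81 / 2.99 cut-offs."""
    if z > 2.99:
        return "safe"
    if z >= 1.81:
        return "grey"
    return "distress"
```

A firm with working capital 100, retained earnings 200, EBIT 100, market value of equity 500, sales 1,000, total assets 1,000 and total liabilities 500 scores 2.33, which falls in the grey zone.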

  20. Data Analysis and Data Mining: Current Issues in Biomedical Informatics

    Science.gov (United States)

    Bellazzi, Riccardo; Diomidous, Marianna; Sarkar, Indra Neil; Takabayashi, Katsuhiko; Ziegler, Andreas; McCray, Alexa T.

    2011-01-01

    Summary Background Medicine and biomedical sciences have become data-intensive fields, which, at the same time, enable the application of data-driven approaches and require sophisticated data analysis and data mining methods. Biomedical informatics provides a proper interdisciplinary context to integrate data and knowledge when processing available information, with the aim of giving effective decision-making support in clinics and translational research. Objectives To reflect on different perspectives related to the role of data analysis and data mining in biomedical informatics. Methods On the occasion of the 50th year of Methods of Information in Medicine, a symposium was organized that reflected on opportunities, challenges and priorities of organizing, representing and analysing data, information and knowledge in biomedicine and health care. The contributions of experts with a variety of backgrounds in the area of biomedical data analysis have been collected as one outcome of this symposium, in order to provide a broad, though coherent, overview of some of the most interesting aspects of the field. Results The paper presents sections on data accumulation and data-driven approaches in medical informatics, data and knowledge integration, statistical issues for the evaluation of data mining models, translational bioinformatics and bioinformatics aspects of genetic epidemiology. Conclusions Biomedical informatics represents a natural framework to properly and effectively apply data analysis and data mining methods in a decision-making context. In the future, it will be necessary to preserve the inclusive nature of the field and to foster an increasing sharing of data and methods between researchers. PMID:22146916

  1. Cluster Analysis of Customer Reviews Extracted from Web Pages

    Directory of Open Access Journals (Sweden)

    S. Shivashankar

    2010-01-01

    Full Text Available As e-commerce gains popularity day by day, the web has become an excellent source of customer reviews and opinions for market researchers. The number of customer reviews that a product receives is growing very fast (it can run into hundreds or thousands). Customer reviews posted on websites vary greatly in quality. A potential customer must read all the reviews, irrespective of their quality, to decide whether or not to purchase the product. In this paper, we attempt to assess a review based on its quality, to help the customer make a proper buying decision. The quality of a customer review is assessed as most significant, more significant, significant or insignificant. A novel and effective web mining technique is proposed for assessing a customer review of a particular product based on feature clustering techniques, namely the k-means method and the fuzzy c-means method. This is performed in three steps: (1) identify review regions and extract reviews from them, (2) extract and cluster the features of reviews with a clustering technique and then assign weights to the features belonging to each of the clusters (groups), and (3) assess the review by considering the feature weights and group membership. The k-means and fuzzy c-means clustering techniques are implemented and tested on customer reviews extracted from web pages, and their performance is analyzed.

  2. Allergen Sensitization Pattern by Sex: A Cluster Analysis in Korea.

    Science.gov (United States)

    Ohn, Jungyoon; Paik, Seung Hwan; Doh, Eun Jin; Park, Hyun-Sun; Yoon, Hyun-Sun; Cho, Soyun

    2017-12-01

    Allergens tend to sensitize simultaneously. The etiology of this phenomenon has been suggested to be allergen cross-reactivity or concurrent exposure. However, little is known about specific allergen sensitization patterns. The objective was to investigate allergen sensitization characteristics according to gender. The multiple allergen simultaneous test (MAST) is widely used as a screening tool for detecting allergen sensitization in dermatologic clinics. We retrospectively reviewed the medical records of patients with MAST results between 2008 and 2014 in our Department of Dermatology. A cluster analysis was performed to elucidate the allergen-specific immunoglobulin (Ig)E cluster pattern. The results of MAST (39 allergen-specific IgEs) from 4,360 cases were analyzed. By cluster analysis, the 39 items were grouped into 8 clusters, each with characteristic features. Compared with the female group, the male group tended to be sensitized more frequently to all tested allergens except for the fungus allergen cluster. The cluster and comparative analysis results demonstrate that allergen sensitization is clustered, reflecting allergen similarity or co-exposure. Only the fungus cluster allergens tended to sensitize the female group more frequently than the male group.

  3. Study on Adaptive Parameter Determination of Cluster Analysis in Urban Management Cases

    Science.gov (United States)

    Fu, J. Y.; Jing, C. F.; Du, M. Y.; Fu, Y. L.; Dai, P. P.

    2017-09-01

    The fine-grained management of cities is an important way to realize the smart city. Data mining that applies spatial clustering analysis to urban management cases can be used to evaluate the deployment of urban public facilities, support policy decisions, and provide technical support for the fine-grained management of the city. To address the problem that the density-based DBSCAN algorithm cannot determine its parameters adaptively, this paper proposes a method for adaptive parameter determination based on spatial analysis. First, Ripley's K function is analysed for the data set to determine the global parameter Eps adaptively, setting the maximum aggregation scale as the range of data clustering. Then, for every point object, the number of neighbours within Eps is computed using a K-D tree, and the most frequent count is taken as the clustering density, which determines the global parameter MinPts adaptively. The R language was then used to optimize this process and accomplish precise clustering of typical urban management cases. Experimental results based on typical urban management cases in XiCheng District, Beijing, show that the new DBSCAN clustering algorithm takes full account of the spatial and statistical characteristics of data with obvious clustering features, and has better applicability and high quality. The results of the study are helpful not only for formulating urban management policies and allocating urban management supervisors in XiCheng District of Beijing, but also for other cities and related fields.
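The two-step parameter selection described above can be sketched as follows, under stated assumptions: Ripley's K is estimated naively without edge correction, the "maximum aggregation scale" is read as the radius that maximizes L(r) − r with L(r) = sqrt(K(r)/π) over a user-supplied list of candidate radii, and neighbour counts are brute-forced instead of using a K-D tree. The function names are hypothetical, and this is not the authors' R implementation.

```python
from collections import Counter
from math import dist, pi, sqrt


def ripley_k(points, r, area):
    """Naive Ripley's K estimate without edge correction:
    K(r) = area / (n * (n - 1)) * number of ordered pairs i != j with d_ij <= r.
    """
    n = len(points)
    pairs = sum(1 for i in range(n) for j in range(n)
                if i != j and dist(points[i], points[j]) <= r)
    return area * pairs / (n * (n - 1))


def choose_eps(points, radii, area):
    """Candidate radius where L(r) - r peaks, L(r) = sqrt(K(r) / pi):
    the scale of strongest aggregation relative to a random pattern."""
    return max(radii, key=lambda r: sqrt(ripley_k(points, r, area) / pi) - r)


def choose_minpts(points, eps):
    """Most frequent neighbour count within eps (the point itself included)."""
    counts = [sum(1 for q in points if dist(p, q) <= eps) for p in points]
    return Counter(counts).most_common(1)[0][0]
```

For two tight groups of five points in a 100 × 100 area and candidate radii 1, 2, 5, 10 and 20, this picks Eps = 2 and MinPts = 5. The values could then be passed to a standard DBSCAN implementation, for example scikit-learn's `DBSCAN(eps=..., min_samples=...)`, whose `min_samples` also counts the point itself.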

  4. STUDY ON ADAPTIVE PARAMETER DETERMINATION OF CLUSTER ANALYSIS IN URBAN MANAGEMENT CASES

    Directory of Open Access Journals (Sweden)

    J. Y. Fu

    2017-09-01

    Full Text Available The fine-grained management of cities is an important way to realize the smart city. Data mining that applies spatial clustering analysis to urban management cases can be used to evaluate the deployment of urban public facilities, support policy decisions, and provide technical support for the fine-grained management of the city. To address the problem that the density-based DBSCAN algorithm cannot determine its parameters adaptively, this paper proposes a method for adaptive parameter determination based on spatial analysis. First, Ripley's K function is analysed for the data set to determine the global parameter Eps adaptively, setting the maximum aggregation scale as the range of data clustering. Then, for every point object, the number of neighbours within Eps is computed using a K-D tree, and the most frequent count is taken as the clustering density, which determines the global parameter MinPts adaptively. The R language was then used to optimize this process and accomplish precise clustering of typical urban management cases. Experimental results based on typical urban management cases in XiCheng District, Beijing, show that the new DBSCAN clustering algorithm takes full account of the spatial and statistical characteristics of data with obvious clustering features, and has better applicability and high quality. The results of the study are helpful not only for formulating urban management policies and allocating urban management supervisors in XiCheng District of Beijing, but also for other cities and related fields.

  5. Incremental temporal pattern mining using efficient batch-free stream clustering

    NARCIS (Netherlands)

    Lu, Y.; Hassani, M.; Seidl, T.

    2017-01-01

    This paper addresses the problem of temporal pattern mining from multiple data streams containing temporal events. Temporal events are considered as real-world events with comprehensive start and end timing information rather than simple integer timestamps. Predefined relations, such as

  6. Clustering of users of digital libraries through log file analysis

    Directory of Open Access Journals (Sweden)

    Juan Antonio Martínez-Comeche

    2017-09-01

    Full Text Available This study analyzes how users perform information retrieval tasks when submitting queries to the Hispanic Digital Library. Clusters of users are differentiated based on their distinct information behavior. The study used the log files collected by the server over one year, and several candidate clustering algorithms are compared. The k-means algorithm is found to be a suitable clustering method for analyzing large log files from digital libraries. For the Hispanic Digital Library, the results show three clusters of users, and the characteristic information behavior of each group is described.
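The abstract does not list the features used, so the sketch below assumes each user has already been summarized as a numeric vector (for example, queries per session and mean query length, both hypothetical) and runs plain Lloyd's k-means on those vectors. Initial centres are simply the first k vectors, a deterministic simplification; k-means++ seeding with several restarts is the usual practice.

```python
from math import dist


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means (hypothetical helper).

    Initial centres are the first k points, a deterministic simplification.
    Returns the final centres and one cluster label per point.
    """
    centers = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centre.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centers[c]))
            clusters[nearest].append(p)
        # Update step: move each centre to the mean of its points
        # (an empty cluster keeps its previous centre).
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged: assignments no longer change the centres
        centers = new_centers
    labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
    return centers, labels
```

In practice the features should be standardized first, since k-means uses Euclidean distance and a feature with a large range would otherwise dominate.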

  7. Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale.

    Science.gov (United States)

    Emmons, Scott; Kobourov, Stephen; Gallant, Mike; Börner, Katy

    2016-01-01

    Notions of community quality underlie the clustering of networks. While studies of network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is lacking. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely used network clustering algorithms: Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters.
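Three of the stand-alone metrics named above can be computed directly from an edge list and a community assignment. A minimal sketch using textbook definitions for an undirected, unweighted graph: coverage as the fraction of intra-community edges, Newman modularity, and conductance per community. The information recovery metrics are not shown, the function name is hypothetical, and the paper's exact variants may differ.

```python
from collections import defaultdict


def partition_quality(edges, community):
    """Coverage, modularity and per-community conductance (textbook forms).

    edges: list of (u, v) pairs of an undirected, unweighted graph.
    community: mapping node -> community label.
    """
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)   # edges with both ends in the community
    cut = defaultdict(int)        # edges leaving the community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        cu, cv = community[u], community[v]
        if cu == cv:
            internal[cu] += 1
        else:
            cut[cu] += 1
            cut[cv] += 1
    volume = defaultdict(int)     # sum of degrees inside each community
    for node, d in degree.items():
        volume[community[node]] += d
    total_volume = 2 * m
    coverage = sum(internal.values()) / m
    # Q = sum over communities of (l_c / m - (d_c / 2m)^2)
    modularity = sum(internal[c] / m - (volume[c] / total_volume) ** 2 for c in volume)
    conductance = {}
    for c in volume:
        denom = min(volume[c], total_volume - volume[c])
        conductance[c] = cut[c] / denom if denom else 0.0
    return coverage, modularity, conductance
```

Two triangles joined by a single edge, split into the two triangles, give coverage 6/7, modularity 6/7 − 1/2 ≈ 0.357 and conductance 1/7 for each community.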

  8. K-Line Patterns’ Predictive Power Analysis Using the Methods of Similarity Match and Clustering

    Directory of Open Access Journals (Sweden)

    Lv Tao

    2017-01-01

    Full Text Available Stock price prediction based on K-line patterns is the essence of candlestick technical analysis. However, there is academic dispute over whether K-line patterns have predictive power. To help resolve the debate, this paper uses the data mining methods of pattern recognition, pattern clustering, and pattern knowledge mining to study the predictive power of K-line patterns. A similarity match model and a nearest-neighbor clustering algorithm are proposed for the similarity matching and clustering of K-line series, respectively. The experiment tests the predictive power of the Three Inside Up and Three Inside Down patterns on K-line series data of the Shanghai 180 index component stocks over the latest 10 years. Experimental results show that (1) the predictive power of a pattern varies greatly with its shape and (2) each of the existing K-line patterns requires further classification based on shape features to improve prediction performance.
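The two patterns tested in the paper have common textbook definitions in terms of open and close prices. A minimal sketch that checks three consecutive candles against those definitions. It ignores trend context and any minimum body size, and it does not reproduce the paper's similarity match model or nearest-neighbor clustering. The `Candle` type and function names are hypothetical.

```python
from typing import NamedTuple


class Candle(NamedTuple):
    """One K-line reduced to its body (highs and lows are ignored here)."""
    open: float
    close: float


def is_three_inside_up(c1, c2, c3):
    """Bearish candle, then a bullish candle whose body sits inside the
    first body (a bullish harami), then a bullish candle closing above
    the first candle's open."""
    return (c1.close < c1.open
            and c2.close > c2.open
            and c2.open >= c1.close and c2.close <= c1.open
            and c3.close > c3.open
            and c3.close > c1.open)


def is_three_inside_down(c1, c2, c3):
    """Mirror image: bullish candle, bearish harami, then a bearish candle
    closing below the first candle's open."""
    return (c1.close > c1.open
            and c2.close < c2.open
            and c2.open <= c1.close and c2.close >= c1.open
            and c3.close < c3.open
            and c3.close < c1.open)
```

Scanning a price series with these checks yields the raw pattern occurrences; the paper's finding that shape variations matter suggests grouping those occurrences further before measuring predictive power.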

  9. A SURVEY ON DOCUMENT CLUSTERING APPROACH FOR COMPUTER FORENSIC ANALYSIS

    OpenAIRE

    Monika Raghuvanshi, Rahul Patel

    2016-01-01

    In forensic analysis, large numbers of files are examined. Much of the information is in unstructured form, which makes such analysis a difficult task for computer forensic examiners. Forensic analysis of documents within a limited period of time therefore requires a special approach such as document clustering. This paper reviews different document clustering algorithms and methodologies, for example K-means, K-medoids, single link, complete link and average link, in accordance...
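Single, complete and average link differ only in how the distance between two clusters is derived from pairwise distances. A naive agglomerative sketch, assuming documents are already represented as numeric vectors (for instance TF-IDF) and using Euclidean distance for simplicity, although cosine distance is common for text. The function name is hypothetical, and the brute-force search is roughly cubic in the number of documents, so it only illustrates the merge rule.

```python
from itertools import combinations
from math import dist


def agglomerative(points, k, linkage="single"):
    """Naive hierarchical agglomerative clustering down to k clusters.

    linkage: 'single' (minimum), 'complete' (maximum) or 'average'
    of the pairwise distances between members of two clusters.
    """
    link = {
        "single": min,
        "complete": max,
        "average": lambda ds: sum(ds) / len(ds),
    }[linkage]
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: link([dist(a, b)
                                 for a in clusters[ij[0]]
                                 for b in clusters[ij[1]]]),
        )
        # i < j, so deleting index j leaves cluster i in place.
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters
```

Single link tends to chain elongated groups together, while complete link favours compact clusters of similar diameter; average link sits between the two.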

  10. A data mining approach to dinoflagellate clustering according to sterol composition: Correlations with evolutionary history.

    Science.gov (United States)

    This study examined the sterol compositions of 102 dinoflagellates (including several previously unexamined species) using clustering techniques as a means of determining the relatedness of the organisms. In addition, dinoflagellate sterol-based relationships were compared statistically to dinoflag...

  11. SURVEY ON CRIME ANALYSIS AND PREDICTION USING DATA MINING TECHNIQUES

    Directory of Open Access Journals (Sweden)

    H Benjamin Fredrick David

    2017-04-01

    Full Text Available Data mining is the process of evaluating and examining large pre-existing databases in order to generate new information that may be essential to an organization. New information is extracted and predicted from existing datasets. Many approaches to analysis and prediction in data mining have been proposed, but few efforts have been made in the criminology field, and few studies have compared the information these approaches produce. Police stations and other criminal justice agencies hold many large databases of information that can be used to predict or analyze criminal movements and involvement in criminal activity in society. Criminals can also be predicted based on crime data. The main aim of this work is to survey the supervised and unsupervised learning techniques that have been applied to criminal identification. This paper presents a survey of crime analysis and crime prediction using several data mining techniques.

  12. Merging Galaxy Clusters: Analysis of Simulated Analogs

    Science.gov (United States)

    Nguyen, Jayke; Wittman, David; Cornell, Hunter

    2018-01-01

    The nature of dark matter can be better constrained by observing merging galaxy clusters. However, uncertainty in the viewing angle leads to uncertainty in dynamical quantities such as 3-d velocities, 3-d separations, and time since pericenter. The classic timing argument links these quantities via equations of motion, but neglects effects of nonzero impact parameter (i.e. it assumes velocities are parallel to the separation vector), dynamical friction, substructure, and larger-scale environment. We present a new approach using n-body cosmological simulations that naturally incorporate these effects. By uniformly sampling viewing angles about simulated cluster analogs, we see projected merger parameters in the many possible configurations of a given cluster. We select comparable simulated analogs and evaluate the likelihood of particular merger parameters as a function of viewing angle. We present viewing angle constraints for a sample of observed mergers including the Bullet cluster and El Gordo, and show that the separation vectors are closer to the plane of the sky than previously reported.

  13. Analysis of Aspects of Innovation in a Brazilian Cluster

    Directory of Open Access Journals (Sweden)

    Adriana Valélia Saraceni

    2012-09-01

    Full Text Available Innovation through clustering has become very important given the increased significance of interaction for innovation and for the learning process. This study aims to identify what a case analysis of the innovation process in a cluster reveals about the learning process. The study is therefore developed in two stages. In the first stage, a preliminary case study analysed innovation in a cluster and its Innovation Index, and then explored a combined body of theory and practice. The second stage explored the learning process concept. Together, both stages allowed us to build a theoretical model for learning process development in clusters. The main result of the model development is a mechanism for implementing improvements in clusters when case studies are applied.

  14. Mining Predictors of Success in Air Force Flight Training Regiments via Semantic Analysis of Instructor Evaluations

    Science.gov (United States)

    2018-03-01

    ...the flight-training course. Subject terms: text mining, feedback analysis, semantic network, binary classification. Number of pages: 105.

  15. Uranium solution mining cost estimating technique: means for rapid comparative analysis of deposits

    International Nuclear Information System (INIS)

    Anon.

    1978-01-01

    Twelve graphs provide a technique for determining relative cost ranges for uranium solution mining projects. The technique can provide a consistent framework for rapid comparative analysis of various properties of mining situations. It is also useful for determining the sensitivity of cost figures to incremental changes in mining factors or deposit characteristics.

  16. Traversability analysis for a mine safety inspection robot

    CSIR Research Space (South Africa)

    Senekal, F

    2013-09-01

    Full Text Available A new fast algorithm for traversability analysis of an arbitrary three-dimensional point cloud is presented. The algorithm segments a three-dimensional point cloud into vertical sections, each of which is clustered into bins and further analysed...

  17. A Flocking Based algorithm for Document Clustering Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Cui, Xiaohui [ORNL]; Gao, Jinzhu [ORNL]; Potok, Thomas E [ORNL]

    2006-01-01

    Social animals or insects in nature often exhibit a form of emergent collective behavior known as flocking. In this paper, we present a novel Flocking-based approach for document clustering analysis. Our Flocking clustering algorithm uses stochastic and heuristic principles discovered from observing bird flocks or fish schools. Unlike other partitional clustering algorithms such as K-means, the Flocking-based algorithm does not require initial partition seeds. The algorithm clusters a given data set by embedding the high-dimensional data items on a two-dimensional grid, which makes clustering results easy to retrieve and visualize. Inspired by the self-organized behavior of bird flocks, we represent each document object with a flock boid. The simple local rules followed by each boid cause the document flock as a whole to generate complex global behaviors, which eventually result in a clustering of the documents. We evaluate the efficiency of our algorithm with both a synthetic dataset and a real document collection that includes 100 news articles collected from the Internet. Our results show that the Flocking clustering algorithm achieves better performance than the K-means and Ant clustering algorithms for real document clustering.

  18. Reproducibility of Cognitive Profiles in Psychosis Using Cluster Analysis.

    Science.gov (United States)

    Lewandowski, Kathryn E; Baker, Justin T; McCarthy, Julie M; Norris, Lesley A; Öngür, Dost

    2018-04-01

    Cognitive dysfunction is a core symptom dimension that cuts across the psychoses. Recent findings support classification of patients along the cognitive dimension using cluster analysis; however, data-derived groupings may be highly determined by sampling characteristics and the measures used to derive the clusters, and so their interpretability must be established. We examined cognitive clusters in a cross-diagnostic sample of patients with psychosis and associations with clinical and functional outcomes. We then compared our findings to a previous report of cognitive clusters in a separate sample using a different cognitive battery. Participants with affective or non-affective psychosis (n=120) and healthy controls (n=31) were administered the MATRICS Consensus Cognitive Battery, and clinical and community functioning assessments. Cluster analyses were performed on cognitive variables, and clusters were compared on demographic, cognitive, and clinical measures. Results were compared to findings from our previous report. A four-cluster solution provided a good fit to the data; profiles included a neuropsychologically normal cluster, a globally impaired cluster, and two clusters of mixed profiles. Cognitive burden was associated with symptom severity and poorer community functioning. The patterns of cognitive performance by cluster were highly consistent with our previous findings. We found evidence of four cognitive subgroups of patients with psychosis, with cognitive profiles that map closely to those produced in our previous work. Clusters were associated with clinical and community variables and a measure of premorbid functioning, suggesting that they reflect meaningful groupings: replicable, and related to clinical presentation and functional outcomes. (JINS, 2018, 24, 382-390).

  19. ANALYSIS OF WEB MINING APPLICATIONS AND BENEFICIAL AREAS

    Directory of Open Access Journals (Sweden)

    Khaleel Ahmad

    2011-10-01

    The main purpose of this paper is to study Web mining techniques, their features, their applications (e-commerce and e-business), and their beneficial areas. Web mining has become more popular and is widely used in various application areas (such as business intelligence systems, e-commerce, and e-business). E-commerce and e-business results are improved by the application of mining techniques such as data mining and text mining; among all the mining techniques, web mining performs best.

  20. Network Analysis Tools: from biological networks to clusters and pathways.

    Science.gov (United States)

    Brohée, Sylvain; Faust, Karoline; Lima-Mendez, Gipsi; Vanderstocken, Gilles; van Helden, Jacques

    2008-01-01

    Network Analysis Tools (NeAT) is a suite of computer tools that integrate various algorithms for the analysis of biological networks: comparison between graphs, between clusters, or between graphs and clusters; network randomization; analysis of degree distribution; network-based clustering and path finding. The tools are interconnected to enable a stepwise analysis of the network through a complete analytical workflow. In this protocol, we present a typical case of utilization, where the tasks above are combined to decipher a protein-protein interaction network retrieved from the STRING database. The results returned by NeAT are typically subnetworks, networks enriched with additional information (i.e., clusters or paths) or tables displaying statistics. Typical networks comprising several thousands of nodes and arcs can be analyzed within a few minutes. The complete protocol can be read and executed in approximately 1 h.
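NeAT runs its degree-distribution analysis through its own web tools, and the sketch below is not NeAT code. It only illustrates, assuming an undirected edge list, what a degree distribution is: how many nodes have each degree. The function name `degree_distribution` is hypothetical.

```python
from collections import Counter

def degree_distribution(edges):
    """Map each degree k to the number of nodes with degree k (undirected edge list)."""
    degree = Counter()
    for u, v in edges:
        # each endpoint gains one; a self-loop (u == v) therefore counts twice
        degree[u] += 1
        degree[v] += 1
    return dict(sorted(Counter(degree.values()).items()))

edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
print(degree_distribution(edges))  # {1: 1, 2: 2, 3: 1}
```

Plotting this mapping on log-log axes is the usual way to inspect whether a network's degree distribution is heavy-tailed.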

  1. The ClusTree: indexing micro-clusters for anytime stream mining

    DEFF Research Database (Denmark)

    Kranen, Philipp; Assent, Ira; Baldauf, Corinna

    2011-01-01

    -arrival times of the stream. Likewise, memory is limited, making it impossible to store all data. For clustering, we are faced with the challenge of maintaining a current result that can be presented to the user at any given time. In this work, we propose a parameter-free algorithm that automatically adapts ... introduce the ClusTree, a compact and self-adaptive index structure for maintaining stream summaries. Additionally we present solutions to handle very fast streams through aggregation mechanisms and propose novel descent strategies that improve the clustering result on slower streams as long as time permits...
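The ClusTree stores stream summaries in its index nodes. This sketch assumes one common form for such summaries: an additive cluster feature (N, LS, SS) that decays exponentially with age, so that old points fade. The class name, the decay base of 2 and the rate `lam` are illustrative assumptions, not the paper's code. The index structure and the descent strategies are not reproduced.

```python
class MicroCluster:
    """Additive summary (N, LS, SS) of the points a micro-cluster has absorbed.

    Weights decay as 2 ** (-lam * elapsed_time); `lam` is a hypothetical rate
    (lam = 0 disables ageing)."""

    def __init__(self, dim, lam=0.0):
        self.n = 0.0            # (decayed) number of points
        self.ls = [0.0] * dim   # linear sum per dimension
        self.ss = [0.0] * dim   # sum of squares per dimension
        self.lam = lam
        self.t = 0.0            # time of the last update

    def _age(self, t):
        # down-weight everything absorbed so far by the elapsed time
        w = 2.0 ** (-self.lam * (t - self.t))
        self.n *= w
        self.ls = [x * w for x in self.ls]
        self.ss = [x * w for x in self.ss]
        self.t = t

    def insert(self, point, t):
        self._age(t)
        self.n += 1.0
        for i, x in enumerate(point):
            self.ls[i] += x
            self.ss[i] += x * x

    def centroid(self):
        return [s / self.n for s in self.ls]

    def variance(self):
        # per-dimension variance recovered from the summary: SS/N - (LS/N)^2
        return [q / self.n - (s / self.n) ** 2 for s, q in zip(self.ls, self.ss)]

mc = MicroCluster(dim=2)
for p in [(1.0, 2.0), (3.0, 4.0)]:
    mc.insert(p, t=0.0)
print(mc.centroid(), mc.variance())  # [2.0, 3.0] [1.0, 1.0]
```

All three components are sums, so two summaries can be merged by adding them component-wise. That additivity is what makes such summaries convenient to aggregate in the inner nodes of an index.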

  2. Cluster analysis of typhoid cases in Kota Bharu, Kelantan, Malaysia

    Directory of Open Access Journals (Sweden)

    Nazarudin Safian

    2008-09-01

    Typhoid fever is still a major public health problem globally as well as in Malaysia. This study was done to identify the spatial epidemiology of typhoid fever in the Kota Bharu District of Malaysia as a first step toward a more advanced analysis of the whole country. The main characteristic of the epidemiological pattern that interested us was whether typhoid cases occurred in clusters or whether they were evenly distributed throughout the area. We also wanted to know at what spatial distances they were clustered. All confirmed typhoid cases that were reported to the Kota Bharu District Health Department from 2001 to June 2005 were taken as the sample. From each case's home address, the house was located and its coordinates were recorded using handheld GPS devices. Spatial statistical analysis was performed with CrimeStat III software to determine whether the distribution of typhoid cases was clustered, random, or dispersed, and at what distances clustering occurred. Of the 736 cases in the study, there was significant clustering for cases occurring in 2001, 2002, 2003, and 2005, but no significant clustering in 2004. Typhoid clustering was also strong at distances up to 6 km. This study shows that typhoid cases occur in clusters, and this method could be applied to describe the spatial epidemiology of a specific area. (Med J Indones 2008; 17: 175-82) Keywords: typhoid, clustering, spatial epidemiology, GIS
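The abstract does not name the exact statistic computed in CrimeStat III. One standard test of whether points are clustered, random, or dispersed is the Clark-Evans nearest-neighbour index, and CrimeStat offers a nearest-neighbour analysis. The sketch below is a minimal Python version of that index. `nearest_neighbour_index` is a hypothetical name, and the caller supplies the study-area size.

```python
import math

def nearest_neighbour_index(points, area):
    """Clark-Evans ratio R = observed mean NN distance / expected mean under randomness.

    R < 1 suggests clustering, R close to 1 a random pattern, R > 1 dispersion.
    Edge effects and the significance test are ignored in this sketch."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    observed = sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / n
    expected = 0.5 / math.sqrt(n / area)  # expected NN distance for a random pattern
    return observed / expected

pairs = [(0.0, 0.0), (0.1, 0.0), (9.0, 9.0), (9.1, 9.0)]
print(round(nearest_neighbour_index(pairs, area=100.0), 3))  # 0.04
```

A single ratio only answers "clustered or not". Questions about the distances at which clustering occurs, such as the 6 km finding above, are usually addressed with distance-based functions like Ripley's K instead.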

  3. Planning, implementation and analysis of mine-surveying measurements to detect rock movements at the Asse salt mine

    International Nuclear Information System (INIS)

    Hensel, G.

    1991-01-01

    At the Asse pit, a former salt mine, research has been carried out since 1965, mainly on the ultimate disposal of radioactive wastes. Within this framework, a mine-surveying measurement program has been developed to detect local and extensive rock movements in the mine structure and on the surface. The rock observation program consists of surface levelling, levellings in the mine structure, measurement of shaft depth, shaft sounding, position and gyroscopic measurements, as well as cavity convergence and extensometer measurements. The results of this measuring program are taken into account in judging stability. The subject of this work is to analyse the position measurements by priority to determine to what extent the results, that is, the horizontal displacement components, are interpretable. This analysis is carried out according to the rules of least-squares adjustment, by means of a strict adjustment by indirect observations. (HS) [de]
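The analysis described above is a least-squares adjustment in which each observation is written as a function of the unknowns (adjustment by indirect observations, the Gauss-Markov model), and the normal equations are solved for the estimate. The sketch applies this, unweighted, to a small hypothetical levelling loop rather than to the horizontal position network analysed at Asse. All names and numbers are illustrative.

```python
def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def adjust_indirect(design, obs):
    """Unweighted least-squares adjustment by indirect observations:
    solve the normal equations (A^T A) x = A^T l for the unknowns x."""
    k = range(len(design[0]))
    normal = [[sum(row[i] * row[j] for row in design) for j in k] for i in k]
    rhs = [sum(row[i] * val for row, val in zip(design, obs)) for i in k]
    return solve(normal, rhs)


# Hypothetical levelling loop: benchmark A fixed at 100.0 m, unknown heights H1, H2.
# Observed height differences: A->1 = 1.0, 1->2 = 2.0, A->2 = 3.1 (misclosure 0.1 m).
design = [[1, 0],    # H1 - A  = 1.0, with A moved to the right-hand side
          [-1, 1],   # H2 - H1 = 2.0
          [0, 1]]    # H2 - A  = 3.1, with A moved to the right-hand side
obs = [101.0, 2.0, 103.1]
h1, h2 = adjust_indirect(design, obs)
print(round(h1, 4), round(h2, 4))  # 101.0333 103.0667
```

The 0.1 m misclosure is spread evenly, leaving a residual of about 0.033 m on each of the three observations. A real network adjustment would also weight the observations by their precision.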

  4. Effects of Group Size and Lack of Sphericity on the Recovery of Clusters in K-Means Cluster Analysis

    Science.gov (United States)

    de Craen, Saskia; Commandeur, Jacques J. F.; Frank, Laurence E.; Heiser, Willem J.

    2006-01-01

    K-means cluster analysis is known for its tendency to produce spherical and equally sized clusters. To assess the magnitude of these effects, a simulation study was conducted, in which populations were created with varying departures from sphericity and group sizes. An analysis of the recovery of clusters in the samples taken from these…
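The tendency toward spherical clusters follows from the assignment step of standard K-means (Lloyd's algorithm). Each point joins the centroid nearest to it in Euclidean distance, so the partition is a set of convex Voronoi cells around the centroids. A minimal pure-Python sketch follows. Seeding with the first k points is a simplification chosen here to make the result deterministic; it is not part of the cited study.

```python
import math

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm. Seeds with the first k points, a simplification;
    practical implementations use random or k-means++ seeding and several restarts."""
    centroids = [list(p) for p in points[:k]]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the centroid nearest in Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append([sum(axis) / len(members) for axis in zip(*members)])
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid in place
        if updated == centroids:  # converged: assignments can no longer change
            break
        centroids = updated
    return labels, centroids

labels, centroids = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)
print(labels)     # [0, 0, 1, 1]
print(centroids)  # [[0.0, 0.5], [10.0, 10.5]]
```

Because only distances to centroids matter, an elongated or unevenly sized group tends to be cut along straight boundaries, which is the kind of recovery error the simulation above measures.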

  5. A Novel Double Cluster and Principal Component Analysis-Based Optimization Method for the Orbit Design of Earth Observation Satellites

    Directory of Open Access Journals (Sweden)

    Yunfeng Dong

    2017-01-01

    The weighted sum and genetic algorithm-based hybrid method (WSGA-based HM), which has been applied to multiobjective orbit optimizations, is negatively affected by human factors, through the subjective choice of the weight coefficients in the weighted sum method, and by the slow convergence of the GA. To address these two problems, a cluster and principal component analysis-based optimization method (CPC-based OM) is proposed, in which candidate orbits are generated randomly and incrementally until the optimal orbit is obtained using a data mining method, namely cluster analysis based on principal components. Then, a second cluster analysis of the orbital elements is introduced into CPC-based OM to improve convergence, yielding a novel double cluster and principal component analysis-based optimization method (DCPC-based OM). In DCPC-based OM, the cluster analysis based on principal components has the advantage of reducing human influence, and the cluster analysis based on the six orbital elements reduces the search space and thereby accelerates convergence. The test results from a multiobjective numerical benchmark function and the orbit design results of an Earth observation satellite show that DCPC-based OM converges more efficiently than WSGA-based HM. DCPC-based OM also reduces, to some degree, the influence of human factors present in WSGA-based HM.

  6. ArrayMining: a modular web-application for microarray analysis combining ensemble and consensus methods with cross-study normalization

    Directory of Open Access Journals (Sweden)

    Krasnogor Natalio

    2009-10-01

    Background: Statistical analysis of DNA microarray data provides a valuable diagnostic tool for the investigation of genetic components of diseases. To take advantage of the multitude of available data sets and analysis methods, it is desirable to combine both different algorithms and data from different studies. Applying ensemble learning, consensus clustering and cross-study normalization methods for this purpose in an almost fully automated process and linking different analysis modules together under a single interface would simplify many microarray analysis tasks. Results: We present ArrayMining.net, a web-application for microarray analysis that provides easy access to a wide choice of feature selection, clustering, prediction, gene set analysis and cross-study normalization methods. In contrast to other microarray-related web-tools, multiple algorithms and data sets for an analysis task can be combined using ensemble feature selection, ensemble prediction, consensus clustering and cross-platform data integration. By interlinking different analysis tools in a modular fashion, new exploratory routes become available, e.g. ensemble sample classification using features obtained from a gene set analysis and data from multiple studies. The analysis is further simplified by automatic parameter selection mechanisms and linkage to web tools and databases for functional annotation and literature mining. Conclusion: ArrayMining.net is a free web-application for microarray analysis combining a broad choice of algorithms based on ensemble and consensus methods, using automatic parameter selection and integration with annotation databases.
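This abstract does not specify ArrayMining.net's own consensus-clustering method. A common building block of consensus clustering, assumed here for illustration, is the co-association matrix. For each pair of samples, it records the fraction of the input clusterings that place the two samples together. `co_association` is a hypothetical name.

```python
def co_association(labelings):
    """Entry (i, j) = fraction of the clusterings that put samples i and j in the same cluster.
    Only co-membership is compared, so arbitrary label numbering does not matter."""
    m = len(labelings)
    n = len(labelings[0])
    return [[sum(lab[i] == lab[j] for lab in labelings) / m for j in range(n)]
            for i in range(n)]

runs = [[0, 0, 1], [1, 1, 0], [0, 1, 1]]  # three clusterings of three samples
for row in co_association(runs):
    print([round(v, 2) for v in row])
```

The matrix can then be clustered again, for example with hierarchical clustering on one minus each entry, to obtain a single consensus partition.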

  7. Using cluster analysis to organize and explore regional GPS velocities

    Science.gov (United States)

    Simpson, Robert W.; Thatcher, Wayne; Savage, James C.

    2012-01-01

    Cluster analysis offers a simple visual exploratory tool for the initial investigation of regional Global Positioning System (GPS) velocity observations, which are providing increasingly precise mappings of actively deforming continental lithosphere. The deformation fields from dense regional GPS networks can often be concisely described in terms of relatively coherent blocks bounded by active faults, although the choice of blocks, their number and size, can be subjective and is often guided by the distribution of known faults. To illustrate our method, we apply cluster analysis to GPS velocities from the San Francisco Bay Region, California, to search for spatially coherent patterns of deformation, including evidence of block-like behavior. The clustering process identifies four robust groupings of velocities that we identify with four crustal blocks. Although the analysis uses no prior geologic information other than the GPS velocities, the cluster/block boundaries track three major faults, both locked and creeping.

  8. Data Exploration and Analysis of Alternative Learning System Accreditation and Equivalency Test Result Using Data Mining

    Science.gov (United States)

    Talingdan, J. A.; Trinidad, J. T., Jr.; Palaoag, T. D.

    2018-03-01

    Alternative Learning System (ALS) is a subsystem of the Department of Education (DepEd) that serves as an option for learners who cannot afford formal education. The research focuses on the data exploration and analysis of ALS Accreditation and Equivalency (A & E) test results using data mining. The ALS 2014 to 2016 A & E test results at the secondary level were used as data sets in the study. The A & E test results revealed that the passing rate doubled each year. The results were clustered using the k-means clustering algorithm and grouped into good-, medium-, and low-standard learners to identify students who need special attention for improvement. The clustered data showed that learners are weakest in Strand 4, Development of Self and a Sense of Community, with a general average of 84.23. They also revealed that the essay-type exam got the lowest score, with a general average of 2.14, compared to the multiple-choice exam that covers the five learning strands. Furthermore, decision tree and naive Bayes classifiers were employed in the study to predict the performance of the learners in the A & E test and to determine which is better for prediction. It was concluded that naive Bayes performs better because its accuracy rate is higher than that of the decision tree algorithm.

  9. A novel procedure on next generation sequencing data analysis using text mining algorithm.

    Science.gov (United States)

    Zhao, Weizhong; Chen, James J; Perkins, Roger; Wang, Yuping; Liu, Zhichao; Hong, Huixiao; Tong, Weida; Zou, Wen

    2016-05-13

    Next-generation sequencing (NGS) technologies have provided researchers with vast possibilities in various biological and biomedical research areas. Efficient data mining strategies are in high demand for large-scale comparative and evolutionary studies to be performed on the large amounts of data derived from NGS projects. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. We report a novel procedure to analyse NGS data using topic modeling. It consists of four major steps: NGS data retrieval, preprocessing, topic modeling, and data mining using Latent Dirichlet Allocation (LDA) topic outputs. The NGS data set of Salmonella enterica strains was used as a case study to show the workflow of this procedure. The perplexity measurement of the topic numbers and the convergence efficiencies of Gibbs sampling were calculated and discussed for achieving the best result from the proposed procedure. The output topics of the LDA algorithm could be treated as features of Salmonella strains to accurately describe the genetic diversity of the fliC gene in various serotypes. The results of a two-way hierarchical clustering and data matrix analysis on LDA-derived matrices successfully classified Salmonella serotypes based on the NGS data. The implementation of topic modeling in the NGS data analysis procedure provides a new way to elucidate genetic information from NGS data, and to identify gene-phenotype relationships and biomarkers, especially in the era of biological and medical big data.

  10. Mining concepts of health responsibility using text mining and exploratory graph analysis.

    Science.gov (United States)

    Kjellström, Sofia; Golino, Hudson

    2018-05-24

    Occupational therapists need to know about people's beliefs about personal responsibility for health to help them pursue everyday activities. The study aims to employ state-of-the-art quantitative approaches to understand people's views of health and responsibility at different ages. A mixed-methods approach was adopted, using text mining to extract information from 233 interviews with participants aged 5 to 96 years, and then exploratory graph analysis to estimate the number of latent variables. The fit of the structure estimated via the exploratory graph analysis was verified using confirmatory factor analysis. Exploratory graph analysis estimated three dimensions of health responsibility: (1) creating good health habits and feeling good; (2) thinking about one's own health and wanting to improve it; and (3) adopting explicitly normative attitudes to take care of one's health. The comparison of the three dimensions among age groups showed, in general, that children and adolescents, as well as the old elderly (>73 years old), expressed ideas about personal responsibility for health less often than young adults, adults, and the young elderly. Occupational therapists' knowledge of the concepts of health responsibility is of value when working with a patient's health, but an identified challenge is how to engage children and older persons.

  11. A Novel Divisive Hierarchical Clustering Algorithm for Geospatial Analysis

    Directory of Open Access Journals (Sweden)

    Shaoning Li

    2017-01-01

    In the fields of geographic information systems (GIS) and remote sensing (RS), clustering algorithms have been widely used for image segmentation, pattern recognition, and cartographic generalization. Although clustering analysis plays a key role in geospatial modelling, traditional clustering methods are limited by their computational complexity, noise resistance, and robustness. Furthermore, traditional methods focus mainly on the adjacent spatial context, which makes them hard to apply to multi-density discrete objects. In this paper, a new method, cell-dividing hierarchical clustering (CDHC), is proposed based on convex hull retraction. The main steps are as follows. First, a convex hull structure is constructed to describe the global spatial context of geospatial objects. Then, the retracting structure of each borderline is established in sequence by setting the initial parameter. The objects are split into two clusters (i.e., “sub-clusters”) if the retracting structure intersects with the borderlines. Finally, clusters are repeatedly split and the initial parameter is updated until the termination condition is satisfied. The experimental results show that CDHC adequately separates multi-density objects from noise and also reduces complexity compared to the traditional agglomerative hierarchical clustering algorithm.
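Only the first step of CDHC, building the convex hull of the point set, is standard enough to sketch. The retraction of borderlines and the splitting rule are specific to the paper and are not reproduced. The sketch uses Andrew's monotone-chain algorithm, which is one common hull algorithm and not necessarily the one the authors used.

```python
def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order, collinear points dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a left (counter-clockwise) turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:  # build the lower hull left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):  # build the upper hull right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # the last point of each chain is the first point of the other
    return lower[:-1] + upper[:-1]

print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))  # [(0, 0), (2, 0), (2, 2), (0, 2)]
```

Sorting dominates the cost, so the hull takes O(n log n) time.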

  12. A Distributed Flocking Approach for Information Stream Clustering Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Cui, Xiaohui [ORNL]; Potok, Thomas E [ORNL]

    2006-01-01

    Intelligence analysts are currently overwhelmed by the amount of information streams generated every day. There is a lack of comprehensive tools that can analyze information streams in real time. Document clustering analysis plays an important role in improving the accuracy of information retrieval. However, most clustering technologies can only be applied to static document collections because they normally require a large amount of computational resources and a long time to produce accurate results. It is very difficult to cluster dynamically changing text information streams on an individual computer. Our earlier research resulted in a dynamic reactive flock clustering algorithm that can continually refine the clustering result and quickly react to changes in document contents. This characteristic makes the algorithm suitable for clustering dynamically changing document information, such as text information streams. Because the algorithm is decentralized, a distributed approach is a natural way to increase its clustering speed. In this paper, we present a distributed multi-agent flocking approach for text information stream clustering and discuss the decentralized architectures and communication schemes for load balancing and status information synchronization in this approach.

  13. A Neural-Network Clustering-Based Algorithm for Privacy Preserving Data Mining

    Science.gov (United States)

    Tsiafoulis, S.; Zorkadis, V. C.; Karras, D. A.

    The increasing use of fast and efficient data mining algorithms on huge collections of personal data, facilitated by the exponential growth of technology, in particular in electronic data storage media and processing power, has raised serious ethical, philosophical and legal issues related to privacy protection. To cope with these concerns, several privacy-preserving methodologies have been proposed, classified into two categories: methodologies that aim at protecting the sensitive data and those that aim at protecting the mining results. In our work, we focus on sensitive data protection and compare existing techniques according to the anonymity degree achieved, the information loss suffered, and their performance characteristics. The ℓ-diversity principle is combined with k-anonymity concepts, so that background information cannot be exploited to successfully attack the privacy of the data subjects to whom the data refer. Based on Kohonen Self-Organizing Feature Maps (SOMs), we first organize data sets into subspaces according to their information-theoretic distance to each other, then create the most relevant classes, paying special attention to rare sensitive attribute values, and finally generalize attribute values to the minimum extent required so that both the data disclosure probability and the information loss are kept as low as possible. Furthermore, we propose information-theoretic measures for assessing the anonymity degree achieved and empirical tests to demonstrate it.
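The abstract combines k-anonymity with ℓ-diversity. The check below is a hypothetical helper, not the authors' SOM-based method. It groups records by their generalised quasi-identifiers and tests both properties, using the simplest "distinct" variant of ℓ-diversity: each group must contain at least ℓ different sensitive values. The record layout and field names are illustrative assumptions.

```python
from collections import defaultdict

def check_privacy(rows, quasi_ids, sensitive, k, ell):
    """Return (is_k_anonymous, is_distinct_l_diverse) for a list of dict records."""
    groups = defaultdict(list)
    for row in rows:
        # records with identical (generalised) quasi-identifiers form one equivalence class
        groups[tuple(row[q] for q in quasi_ids)].append(row[sensitive])
    k_ok = all(len(values) >= k for values in groups.values())
    l_ok = all(len(set(values)) >= ell for values in groups.values())
    return k_ok, l_ok

rows = [
    {"zip": "130**", "age": "<30", "disease": "flu"},
    {"zip": "130**", "age": "<30", "disease": "cancer"},
    {"zip": "148**", "age": ">=40", "disease": "flu"},
    {"zip": "148**", "age": ">=40", "disease": "flu"},
]
print(check_privacy(rows, ("zip", "age"), "disease", k=2, ell=2))  # (True, False)
```

In the example, the second group is 2-anonymous, but every record in it has the same disease. Anyone who knows a person belongs to that group therefore learns the diagnosis. This homogeneity problem is what ℓ-diversity is meant to block.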

  14. Cluster analysis of clinical data identifies fibromyalgia subgroups.

    Directory of Open Access Journals (Sweden)

    Elisa Docampo

    INTRODUCTION: Fibromyalgia (FM) is mainly characterized by widespread pain and multiple accompanying symptoms, which hinder FM assessment and management. In order to reduce FM heterogeneity, we classified clinical data into simplified dimensions that were used to define FM subgroups. MATERIAL AND METHODS: 48 variables were evaluated in 1,446 Spanish FM cases fulfilling the 1990 ACR FM criteria. A partitioning analysis was performed to find groups of variables similar to each other. Similarities between variables were identified and the variables were grouped into dimensions. This was performed in a subset of 559 patients and cross-validated in the remaining 887 patients. For each sample and dimension, a composite index was obtained based on the weights of the variables included in the dimension. Finally, a clustering procedure was applied to the indexes, resulting in FM subgroups. RESULTS: Variables clustered into three independent dimensions: "symptomatology", "comorbidities" and "clinical scales". Only the first two dimensions were considered for the construction of FM subgroups. The resulting scores classified FM samples into three subgroups: low symptomatology and comorbidities (Cluster 1), high symptomatology and comorbidities (Cluster 2), and high symptomatology but low comorbidities (Cluster 3), showing differences in measures of disease severity. CONCLUSIONS: We have identified three subgroups of FM samples in a large cohort of FM by clustering clinical data. Our analysis stresses the importance of family and personal history of FM comorbidities. Also, the resulting patient clusters could indicate different forms of the disease, relevant to future research, and might have an impact on clinical assessment.

  15. Hyperspectral analysis for qualitative and quantitative features related to acid mine drainage at a remediated open-pit mine

    Science.gov (United States)

    Davies, G.; Calvin, W. M.

    2015-12-01

    The exposure of pyrite to oxygen and water in mine waste environments is known to generate acidity and the accumulation of secondary iron minerals. Sulfates and secondary iron minerals associated with acid mine drainage (AMD) exhibit diverse spectral properties in the ultraviolet, visible and near-infrared regions of the electromagnetic spectrum. The use of hyperspectral imagery for identification of AMD mineralogy and contamination has been well studied. Fewer studies have examined the impacts of hydrologic variations on mapping AMD or the unique spectral signatures of mine waters. Open-pit mine lakes are an additional environmental hazard and have not been widely studied using imaging spectroscopy. A better understanding of AMD variation related to climate fluctuations and the spectral signatures of contaminated surface waters will aid future assessments of environmental contamination. This study examined the ability of multi-season airborne hyperspectral data to identify the geochemical evolution of substances and contaminant patterns at the Leviathan Mine Superfund site. The mine is located 24 miles southeast of Lake Tahoe and contains remnant tailings piles and several AMD collection ponds. The objectives were to (1) distinguish temporal changes in mineralogy at the remediated open-pit sulfur mine, (2) identify the absorption features of mine-affected waters, and (3) quantitatively link water spectra to known dissolved iron concentrations. Images from NASA's AVIRIS instrument were collected in the spring, summer, and fall seasons for two consecutive years at Leviathan (HyspIRI campaign). Images had a spatial resolution of 15 meters at nadir. Ground-based surveys using the ASD FieldSpecPro spectrometer and laboratory spectral and chemical analysis complemented the remote sensing data. Temporal changes in surface mineralogy were difficult to distinguish. However, seasonal changes in pond water quality were identified. Dissolved ferric iron and chlorophyll

  16. Clustering Trajectories by Relevant Parts for Air Traffic Analysis.

    Science.gov (United States)

    Andrienko, Gennady; Andrienko, Natalia; Fuchs, Georg; Garcia, Jose Manuel Cordero

    2018-01-01

    Clustering of trajectories of moving objects by similarity is an important technique in movement analysis. Existing distance functions assess the similarity between trajectories based on properties of the trajectory points or segments. The properties may include the spatial positions, times, and thematic attributes. There may be a need to focus the analysis on certain parts of trajectories, i.e., points and segments that have particular properties. According to the analysis focus, the analyst may need to cluster trajectories by similarity of their relevant parts only. Throughout the analysis process, the focus may change, and different parts of trajectories may become relevant. We propose an analytical workflow in which interactive filtering tools are used to attach relevance flags to elements of trajectories, clustering is done using a distance function that ignores irrelevant elements, and the resulting clusters are summarized for further analysis. We demonstrate how this workflow can be useful for different analysis tasks in three case studies with real data from the domain of air traffic. We propose a suite of generic techniques and visualization guidelines to support movement data analysis by means of relevance-aware trajectory clustering.
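The abstract does not give the paper's distance function. As a rough illustration of "a distance function that ignores irrelevant elements", the sketch makes three assumptions. Trajectories have equal length and are time-aligned. Each point carries a relevance flag. The distance is the average point-wise Euclidean distance over positions flagged relevant in both trajectories. The function name and data layout are hypothetical.

```python
import math

def relevance_aware_distance(a, b):
    """Mean distance between index-aligned points flagged relevant in BOTH trajectories.

    Each trajectory is a list of (x, y, relevant) tuples; equal length and time
    alignment are simplifying assumptions of this sketch."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length, time-aligned trajectories")
    dists = [math.hypot(xa - xb, ya - yb)
             for (xa, ya, ra), (xb, yb, rb) in zip(a, b) if ra and rb]
    # no shared relevant part: treat the trajectories as maximally dissimilar
    return sum(dists) / len(dists) if dists else math.inf

a = [(0, 0, True), (1, 0, True), (2, 0, False)]
b = [(0, 1, True), (1, 1, True), (50, 50, False)]
print(relevance_aware_distance(a, b))  # 1.0
```

The far-away third point of `b` does not affect the result because it is flagged irrelevant. Changing the flags, as an analyst would with the interactive filters described above, changes which parts drive the clustering.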

  17. Spectral signature verification using statistical analysis and text mining

    Science.gov (United States)

    DeCoster, Mallory E.; Firpi, Alexe H.; Jacobs, Samantha K.; Cone, Shelli R.; Tzeng, Nigel H.; Rodriguez, Benjamin M.

    2016-05-01

    In the spectral science community, numerous spectral signatures are stored in databases representative of many sample materials collected from a variety of spectrometers and spectroscopists. Due to the variety and variability of the spectra that comprise many spectral databases, it is necessary to establish a metric for validating the quality of spectral signatures. This has been an area of great discussion and debate in the spectral science community. This paper discusses a method that independently validates two different aspects of a spectral signature, the textual meta-data and the numerical spectral data, to arrive at a final qualitative assessment. Results associated with the spectral data stored in the Signature Database [1] (SigDB) are presented. The numerical data comprising a sample material's spectrum is validated based on statistical properties derived from an ideal population set. The quality of the test spectrum is ranked based on a spectral angle mapper (SAM) comparison to the mean spectrum derived from the population set. Additionally, the contextual data of a test spectrum is qualitatively analyzed using lexical-analysis text mining. This technique analyzes the syntax of the meta-data to reveal local learning patterns and trends within the spectral data that are indicative of the test spectrum's quality. Text mining has been successfully applied in security [2] (text encryption/decryption), biomedical [3], and marketing [4] applications. The text mining lexical analysis algorithm is trained on the meta-data patterns of a subset of high- and low-quality spectra, in order to have a model to apply to the entire SigDB data set. The statistical and textual methods combine to assess the quality of a test spectrum existing in a database without the need for an expert user. This method has been compared to other validation methods accepted by the spectral science community, and has provided promising results when a baseline spectral signature is
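The spectral angle mapper treats two spectra as vectors over the same bands and scores them by the angle between them: angle = arccos((t . r) / (|t| |r|)). A small angle means a similar spectral shape. Scaling a spectrum, for example by overall brightness, leaves the angle unchanged. The sketch ranks hypothetical test spectra against the mean of a hypothetical population set, mirroring the ranking step described above. The SigDB data and the text-mining half of the method are not reproduced.

```python
import math

def spectral_angle(test, reference):
    """Spectral angle (radians) between two spectra sampled at the same wavelengths."""
    if len(test) != len(reference):
        raise ValueError("spectra must share the same band sampling")
    dot = sum(t * r for t, r in zip(test, reference))
    norm = math.sqrt(sum(t * t for t in test)) * math.sqrt(sum(r * r for r in reference))
    # clamp to [-1, 1] to guard against rounding error before acos
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def mean_spectrum(population):
    """Band-wise mean of a list of equally sampled spectra."""
    return [sum(band) / len(population) for band in zip(*population)]

population = [[0.10, 0.20, 0.30], [0.12, 0.22, 0.31], [0.09, 0.19, 0.29]]
reference = mean_spectrum(population)
candidates = {"scaled copy": [0.2, 0.4, 0.6], "odd shape": [0.3, 0.2, 0.1]}
ranked = sorted(candidates, key=lambda name: spectral_angle(candidates[name], reference))
print(ranked)  # ['scaled copy', 'odd shape']
```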

  18. Nonlinear coupling analysis of coal seam floor during mining based on FLAC3D

    Institute of Scientific and Technical Information of China (English)

    YAO Duo-xi; XU Ji-ying; LU Hai-feng

    2011-01-01

    Based on the hydro-geological conditions of the 1028 mining face in Suntuan Coal Mine, the mining-induced seepage-strain mechanism of the seam floor was simulated by a nonlinear coupling method using the fluid-solid coupling analysis module of FLAC3D. The results indicate that the permeability coefficient of the adjoining rock changes considerably due to mining. The maximum value reaches 1379.9 times the original value, at the immediate roof of the mined-out area. According to the analysis of the seepage field, mining does not destroy the water resistance of the floor aquiclude. The mining fissures do not connect to the limestone aquifer, so damage is unlikely to form. The plastic zone does not exactly correspond to the seepage area, and the altered seepage area is much larger than the plastic zone.

  19. Preliminary analysis about reducing production costs in uranium mining and metallurgy at Fuzhou uranium mine

    International Nuclear Information System (INIS)

    Wu Sanmao

    1999-01-01

    The production costs in uranium mining and metallurgy have been analyzed quantitatively, item by item, according to the present production situation of the Uranium Mining and Metallurgy Corp, which is part of Fuzhou Uranium Mine. The principal factors influencing production costs, and the main means of reducing them, have been identified.

  20. Cluster analysis of Southeastern U.S. climate stations

    Science.gov (United States)

    Stooksbury, D. E.; Michaels, P. J.

    1991-09-01

    A two-step cluster analysis of 449 Southeastern climate stations is used to objectively determine general climate clusters (groups of climate stations) for eight southeastern states. The purpose is to objectively define regions of climatic homogeneity that should perform more robustly in subsequent climatic impact models. This type of analysis has been successfully used in many related climate research problems, including the determination of corn/climate districts in Iowa (Ortiz-Valdez, 1985) and the classification of synoptic climate types (Davis, 1988). These general climate clusters may be more appropriate for climate research than the standard climate division (CD) groupings of climate stations, which are modifications of the agro-economic United States Department of Agriculture crop reporting districts. Unlike the CDs, these objectively determined climate clusters are not restricted by state borders and thus have reduced multicollinearity, which makes them more appropriate for the study of the impact of climate and climatic change.

  1. Grouping of Cities In Terms Of Primary Health Indicators in Turkey: An Application of Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Bilgehan TEKİN

    2015-12-01

    Determining the differences between cities in Turkey in terms of primary health care indicators is considered important. The subject of this study is the classification of cities in Turkey in terms of health indicators. Cluster analysis, one of the data mining and multivariate statistical methods, is used for the classification. The main objective of the study is to examine the results of the transformation in health in terms of basic health indicators at the city level. In this context, the 81 cities of Turkey are grouped using sixteen health indicators for 2013 that are assumed to demonstrate the effectiveness of health care services. The groupings are also compared with the health and socio-economic development rankings of previous studies. Provinces are grouped into 21, 13, 11, 7 and 5 clusters, and the 11-, 7- and 5-cluster solutions are determined to be the most significant. The results reveal a development gap between eastern and western provinces in terms of the health variables.

  2. Clustering analysis of line indices for LAMOST spectra with AstroStat

    Science.gov (United States)

    Chen, Shu-Xin; Sun, Wei-Min; Yan, Qi

    2018-06-01

    The application of data mining in astronomical surveys, such as the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) survey, provides an effective approach to automatically analyze a large amount of complex survey data. Unsupervised clustering could help astronomers find associations and outliers in a big data set. In this paper, we employ the k-means method to cluster the line indices of LAMOST spectra with the AstroStat software. The line index approach is an effective way to extract spectral features from low-resolution spectra, and the indices can represent the main spectral characteristics of stars. A total of 144 340 line indices of A-type stars are analyzed by calculating their intra- and inter-distances between pairs of stars. For the intra distance, we use the Mahalanobis distance to explore the degree of clustering for each class, while for outlier detection we define a local outlier factor for each spectrum. AstroStat furnishes a set of visualization tools for illustrating the analysis results. Checking the spectra detected as outliers, we find that most of them are problematic data and only a few correspond to rare astronomical objects. We show two examples of these outliers: a spectrum with an abnormal continuum and a spectrum with emission lines. Our work demonstrates that line index clustering is a good method for examining data quality and identifying rare objects.
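The k-means step can be illustrated with a plain-Python sketch of Lloyd's algorithm, assuming each spectrum's line indices have already been collected into a numeric feature vector. It does not use AstroStat, whose interface is not described here, and it covers only the clustering; the Mahalanobis intra-distance and the local outlier factor from the paper are not shown. All names are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Vector = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point: Sequence[float], centroids: Sequence[Vector]) -> int:
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: squared_distance(point, centroids[j]))


def kmeans(points: Sequence[Vector], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[Optional[List[int]], List[Vector]]:
    """Lloyd's k-means: one cluster label per point, plus the final centroids."""
    rng = random.Random(seed)
    # Initialise with k input points drawn at random (seeded for reproducibility).
    centroids: List[Vector] = [tuple(p) for p in rng.sample(list(points), k)]
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: every point joins its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids
```

k-means is sensitive to the starting centroids, so in practice it is usually restarted from several seeds and the solution with the lowest within-cluster sum of squares is kept.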

  3. Development of small scale cluster computer for numerical analysis

    Science.gov (United States)

    Zulkifli, N. H. N.; Sapit, A.; Mohammed, A. N.

    2017-09-01

    In this study, two personal computers were successfully networked together to form a small-scale cluster. Each computer has a multicore processor with four cores, giving the cluster eight processors in total. The cluster runs an Ubuntu 14.04 Linux environment with the MPICH2 implementation of MPI. Two main tests were conducted on the cluster: a communication test and a performance test. The communication test, which made sure that the computers could pass the required information without any problem, used a simple MPI Hello program written in C. A performance test was also done to show that the cluster's calculation performance is much better than that of a single-CPU computer. In this performance test, the same code was run four times, using a single node and 2, 4, and 8 processors. The results show that the time required to solve the problem decreases as processors are added; the calculation time is halved when the number of processors is doubled. In conclusion, we successfully developed a small-scale cluster computer from common hardware that is capable of higher computing power than a single-CPU computer, which can benefit research requiring high computing power, especially numerical analysis such as finite element analysis, computational fluid dynamics, and computational physics.
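The scaling result above is usually summarized as speedup $S_p = T_1/T_p$ and parallel efficiency $E_p = S_p/p$. The sketch below computes both, assuming the single-processor run is the baseline; the timings are hypothetical values chosen to halve with each doubling, not the paper's measurements, and the function names are illustrative.

```python
from typing import Dict, Tuple


def speedup(serial_time: float, parallel_time: float) -> float:
    """Speedup S_p = T_1 / T_p relative to the single-processor run."""
    return serial_time / parallel_time


def efficiency(serial_time: float, parallel_time: float, processors: int) -> float:
    """Parallel efficiency E_p = S_p / p; 1.0 means perfectly linear scaling."""
    return speedup(serial_time, parallel_time) / processors


def scaling_report(timings: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Map processor count -> (speedup, efficiency); timings must include p = 1."""
    t1 = timings[1]
    return {p: (speedup(t1, t), efficiency(t1, t, p)) for p, t in sorted(timings.items())}


# Hypothetical wall-clock times in seconds, NOT measurements from the paper:
# each doubling of processors halves the time, as the abstract reports.
example = scaling_report({1: 80.0, 2: 40.0, 4: 20.0, 8: 10.0})
for p, (s, e) in example.items():
    print(f"{p} processors: speedup {s:.1f}, efficiency {e:.2f}")
```

"Time halves when processors double" corresponds to an efficiency of 1.0 at every processor count; real codes usually fall below that as communication overhead grows.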

  4. Integrating text mining, data mining, and network analysis for identifying genetic breast cancer trends.

    Science.gov (United States)

    Jurca, Gabriela; Addam, Omar; Aksac, Alper; Gao, Shang; Özyer, Tansel; Demetrick, Douglas; Alhajj, Reda

    2016-04-26

    Breast cancer is a serious disease that affects many women and may lead to death. It has received considerable attention from the research community, and biomedical researchers aim to find genetic biomarkers indicative of the disease. Novel biomarkers can be elucidated from the existing literature; however, the vast number of scientific publications on breast cancer makes this a daunting task. This paper presents a framework that investigates existing literature data for informative discoveries. It integrates text mining and social network analysis in order to identify new potential biomarkers for breast cancer. We used PubMed for testing. We investigated gene-gene interactions, as well as novel interactions such as gene-year, gene-country, and abstract-country, to find out how the discoveries varied over time and how much the discoveries and interests of research groups in different countries overlap or diverge. Interesting trends have been identified and discussed; e.g., different genes are highlighted in relation to different countries, though the various genes were found to share functionality. Some text-analysis-based results have been validated against results from other tools that predict gene-gene relations and gene functions.

  5. Applications of Data Mining in Higher Education

    OpenAIRE

    Monika Goyal; Rajan Vohra

    2012-01-01

    Data analysis plays an important role in decision support regardless of the type of industry, from manufacturing units to education systems. There are many domains in which data mining techniques play an important role. This paper proposes the use of data mining techniques to improve the efficiency of higher education institutions. If data mining techniques such as clustering, decision trees and association are applied to higher education processes, it would help to improve students' performa...

  6. Predicting healthcare outcomes in prematurely born infants using cluster analysis.

    Science.gov (United States)

    MacBean, Victoria; Lunt, Alan; Drysdale, Simon B; Yarzi, Muska N; Rafferty, Gerrard F; Greenough, Anne

    2018-05-23

    Prematurely born infants are at high risk of respiratory morbidity following neonatal unit discharge, though prediction of outcomes is challenging. We have tested the hypothesis that cluster analysis would identify discrete groups of prematurely born infants with differing respiratory outcomes during infancy. A total of 168 infants (median (IQR) gestational age 33 (31-34) weeks) were recruited in the neonatal period from consecutive births in a tertiary neonatal unit. The baseline characteristics of the infants were used to classify them into hierarchical agglomerative clusters. Rates of viral lower respiratory tract infections (LRTIs) were recorded for 151 infants in the first year after birth. Infants could be classified into three clusters according to birth weight (BW) and duration of neonatal invasive mechanical ventilation (MV). Cluster one (MV ≤5 days) had few LRTIs. Clusters two and three (both MV ≥6 days, with BW ≥882 g and <882 g, respectively) had significantly higher LRTI rates. Cluster two had a higher proportion of infants experiencing respiratory syncytial virus LRTIs (P = 0.01) and cluster three a higher proportion of rhinovirus LRTIs (P < 0.001). Conclusions: Readily available clinical data allowed classification of prematurely born infants into one of three distinct groups with differing subsequent respiratory morbidity in infancy. © 2018 Wiley Periodicals, Inc.

  7. Analysis of post-blasting source mechanisms of mining-induced seismic events in Rudna copper mine, Poland

    Directory of Open Access Journals (Sweden)

    Caputa Alicja

    2015-10-01

    The exploitation of georesources by underground mining can be responsible for seismic activity in areas considered aseismic. Since strong seismic events are connected with rockburst hazard, reducing seismic risk is a continuous requirement. One of the most effective methods to do so is blasting in potentially hazardous mining panels. In this way, small to moderate tremors are provoked and stress accumulation is substantially reduced. In this paper we present an analysis of post-blasting events using full moment tensor (MT) inversion with data from the underground seismic network of the Rudna mine, Poland. In addition, we describe the problems we faced when analyzing seismic signals. Our studies show that the focal mechanisms of events that occurred after blasts exhibit common features in the MT solution. The strong isotropic and small double-couple (DC) components of the MT indicate that these events were provoked by detonations. On the other hand, the post-blasting MT is considerably different from the MT obtained for strong mining events. We believe that seismological analysis of provoked and unprovoked events can be a very useful tool in confirming the effectiveness of blasting in seismic hazard reduction in mining areas.
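For reference, the isotropic part mentioned in this abstract comes from the standard split of a moment tensor $M$ into isotropic and deviatoric parts (summation over the repeated index $k$). This is textbook background rather than the authors' inversion procedure; the further division of the deviatoric part into DC and CLVD components depends on the convention chosen and is not shown.

```latex
M_{ij} = \underbrace{\tfrac{1}{3} M_{kk}\,\delta_{ij}}_{M^{\mathrm{ISO}}_{ij}}
       + \underbrace{\left(M_{ij} - \tfrac{1}{3} M_{kk}\,\delta_{ij}\right)}_{M^{\mathrm{DEV}}_{ij}},
\qquad \operatorname{tr} M^{\mathrm{DEV}} = 0
```

The scalar $\tfrac{1}{3}\operatorname{tr} M$ is positive for a net volume increase and negative for a net volume decrease, so its size relative to the deviatoric part indicates how far an event departs from pure shear faulting.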

  8. A knowledge discovery approach to urban analysis: Beyoglu Preservation Area as a data mine

    Directory of Open Access Journals (Sweden)

    Ahu Sokmenoglu Sohtorik

    2017-11-01

    Enhancing our knowledge of the complexities of cities in order to empower ourselves to make more informed decisions has always been a challenge for urban research. Recent developments in large-scale computing, together with new techniques and automated tools for data collection and analysis, are opening up promising opportunities for addressing this problem. The main motivation behind this research is the question of how these developments may contribute to urban data analysis. On this basis, the thesis focuses on urban data analysis in order to search for findings that can enhance our knowledge of urban environments, using the generic process of knowledge discovery using data mining. A knowledge discovery process based on data mining is a fully automated or semi-automated process that involves the application of computational tools and techniques to explore the “previously unknown, and potentially useful information” (Witten & Frank, 2005) hidden in large, often complex and multi-dimensional databases. This information can be obtained in the form of correlations amongst variables, data groupings (classes and clusters), or more complex hypotheses (probabilistic rules of co-occurrence, performance vectors of prediction models, etc.). This research targets researchers and practitioners working in the field of urban studies who are interested in quantitative/computational approaches to urban data analysis, and specifically aims to engage the interest of architects, urban designers and planners who do not have a background in statistics or in using data mining methods in their work. Accordingly, the overall aim of the thesis is the development of a knowledge discovery approach to urban analysis: a domain-specific adaptation of the generic process of knowledge discovery using data mining that enables the analyst to discover ‘relational urban knowledge’. ‘Relational urban knowledge’ is a term employed in this thesis to refer

  9. A Case Study for Student Performance Analysis based on Educational Data Mining (EDM)

    OpenAIRE

    Daxa Kundariya; Prof. Vaseem Ghada

    2016-01-01

    Educational Data Mining (EDM) is a study methodology and an application of data mining techniques to students’ data from academic databases. Like other domains, the educational domain produces vast amounts of data. Student performance analysis plays an important role in decision support for enhancing the quality of the education system. This paper presents a study of various educational data mining techniques and how they could be used in the educational system to analyze student perfor...

  10. Text mining analysis of public comments regarding high-level radioactive waste disposal

    International Nuclear Information System (INIS)

    Kugo, Akihide; Yoshikawa, Hidekazu; Shimoda, Hiroshi; Wakabayashi, Yasunaga

    2005-01-01

    In order to narrow the risk perception gap, as seen in social investigations, between the general public and people involved in the nuclear industry, a public comment process on high-level radioactive waste (HLW) disposal was conducted to find the significant talking points with the general public, with the aim of constructing an effective risk communication model for social risk information regarding HLW disposal. Text mining was introduced to examine the public comments and identify the core public interest underlying them. The text mining method used clusters specific groups of words with negative meanings and then analyzes public understanding by employing text structural analysis to extract words from subjective expressions. Using these procedures, it was found that the public does not trust the nuclear fuel cycle promotion policy and shows signs of anxiety about the long-term technological reliability of waste storage. To develop effective social risk communication on HLW issues, these findings are expected to help experts in the nuclear industry communicate with the general public more effectively and gain their trust. (author)

  11. Text mining factor analysis (TFA) in green tea patent data

    Science.gov (United States)

    Rahmawati, Sela; Suprijadi, Jadi; Zulhanif

    2017-03-01

    Factor analysis has become one of the most widely used multivariate statistical procedures in applied research across a multitude of domains. There are two main types of factor analysis: exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). Both EFA and CFA aim to model the observed relationships between a group of indicators and a latent variable, but they differ fundamentally in the a priori specifications and restrictions placed on the factor model. These methods are applied to patent data from the green tea technology sector to determine how green tea technology is developing worldwide. Patent analysis is useful for identifying future technological trends in a specific field of technology. The patent data are obtained from the European Patent Organization (EPO). In this paper, the CFA model is applied to nominal data obtained from a presence-absence matrix; the CFA of these nominal data is based on a tetrachoric correlation matrix. Meanwhile, the EFA model is applied to the patent titles of the dominant technology sector, which are first preprocessed using text mining.
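A presence-absence matrix yields a 2×2 table (a, b, c, d) for each pair of binary indicators, and the tetrachoric matrix holds one correlation estimate per pair. The sketch below uses the classical cosine-pi approximation, $r \approx \cos\!\big(\pi / (1 + \sqrt{ad/bc})\big)$, instead of the maximum-likelihood estimate that statistical packages normally compute, so it is an illustration rather than the paper's procedure. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def contingency(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Cells (a, b, c, d) of the 2x2 table for two 0/1 presence-absence columns."""
    a = sum(1 for i, j in zip(x, y) if i and j)          # both present
    b = sum(1 for i, j in zip(x, y) if i and not j)      # only x present
    c = sum(1 for i, j in zip(x, y) if not i and j)      # only y present
    d = sum(1 for i, j in zip(x, y) if not i and not j)  # both absent
    return a, b, c, d


def tetrachoric_cos_pi(a: int, b: int, c: int, d: int) -> float:
    """Cosine-pi approximation to the tetrachoric correlation of a 2x2 table."""
    if a * d == 0 and b * c == 0:
        raise ValueError("correlation is undefined for this table")
    if b * c == 0:
        return 1.0   # an empty off-diagonal cell pushes the estimate to +1
    if a * d == 0:
        return -1.0  # an empty diagonal cell pushes the estimate to -1
    return math.cos(math.pi / (1.0 + math.sqrt((a * d) / (b * c))))


def tetrachoric_matrix(columns: Sequence[Sequence[int]]) -> List[List[float]]:
    """Symmetric matrix of pairwise approximate tetrachoric correlations."""
    n = len(columns)
    return [[1.0 if i == j else tetrachoric_cos_pi(*contingency(columns[i], columns[j]))
             for j in range(n)] for i in range(n)]
```

When ad equals bc (no association) the estimate is 0, and it moves toward +1 or -1 as the concordant or discordant cells dominate.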

  12. Cluster analysis of radionuclide concentrations in beach sand

    NARCIS (Netherlands)

    de Meijer, R.J.; James, I.; Jennings, P.J.; Keoyers, J.E.

    This paper presents a method in which natural radionuclide concentrations of beach sand minerals are traced along a stretch of coast by cluster analysis. This analysis yields two groups of mineral deposits with different origins. The method deviates from standard methods of following dispersal of

  13. Principal Component Clustering Approach to Teaching Quality Discriminant Analysis

    Science.gov (United States)

    Xian, Sidong; Xia, Haibo; Yin, Yubo; Zhai, Zhansheng; Shang, Yan

    2016-01-01

    Teaching quality is the lifeline of higher education. Many universities have made effective progress in evaluating teaching quality. In this paper, we establish a students' evaluation of teaching (SET) discriminant analysis model and algorithm based on principal component clustering analysis. Additionally, we classify the SET…

  14. Characterizing Suicide in Toronto: An Observational Study and Cluster Analysis

    Science.gov (United States)

    Sinyor, Mark; Schaffer, Ayal; Streiner, David L

    2014-01-01

    Objective: To determine whether people who have died from suicide in a large epidemiologic sample form clusters based on demographic, clinical, and psychosocial factors. Method: We conducted a coroner’s chart review for 2886 people who died in Toronto, Ontario, from 1998 to 2010, and whose death was ruled as suicide by the Office of the Chief Coroner of Ontario. A cluster analysis using known suicide risk factors was performed to determine whether suicide deaths separate into distinct groups. Clusters were compared according to person- and suicide-specific factors. Results: Five clusters emerged. Cluster 1 had the highest proportion of females and nonviolent methods, and all had depression and a past suicide attempt. Cluster 2 had the highest proportion of people with a recent stressor and violent suicide methods, and all were married. Cluster 3 had mostly males between the ages of 20 and 64, and all had either experienced recent stressors, suffered from mental illness, or had a history of substance abuse. Cluster 4 had the youngest people and the highest proportion of deaths by jumping from height; few were married, and nearly one-half had bipolar disorder or schizophrenia. Cluster 5 consisted entirely of unmarried people with no prior suicide attempts, who were the least likely to have an identified mental illness and the most likely to leave a suicide note. Conclusions: People who die from suicide assort into different patterns of demographic, clinical, and death-specific characteristics. Identifying and studying subgroups of suicides may advance our understanding of the heterogeneous nature of suicide and help to inform development of more targeted suicide prevention strategies. PMID:24444321

  15. Pattern recognition in menstrual bleeding diaries by statistical cluster analysis

    Directory of Open Access Journals (Sweden)

    Wessel Jens

    2009-07-01

    Background: The aim of this paper is to empirically identify a treatment-independent statistical method to describe clinically relevant bleeding patterns by using bleeding diaries of clinical studies on various sex-hormone-containing drugs. Methods: We used four cluster analysis methods (single, average and complete linkage, and Ward's method) for pattern recognition in menstrual bleeding diaries. The optimal number of clusters was determined using the semi-partial R², the cubic clustering criterion, and the pseudo-F and pseudo-t² statistics. Finally, the interpretability of the results from a gynecological point of view was assessed. Results: Ward's method yielded distinct clusters of the bleeding diaries, whereas the other methods successively chained the observations into one cluster. The optimal number of distinctive bleeding patterns was six: two desirable and four undesirable bleeding patterns. Cyclic and non-cyclic bleeding patterns were well separated. Conclusion: Using this cluster analysis with Ward's method, medications and devices that affect bleeding can be easily compared and categorized.

  16. Technology Clusters Exploration for Patent Portfolio through Patent Abstract Analysis

    Directory of Open Access Journals (Sweden)

    Gabjo Kim

    2016-12-01

    This study explores technology clusters through patent analysis. The aim of exploring technology clusters is to grasp competitors’ levels of sustainable research and development (R&D) and to establish a sustainable strategy for entering an industry. To achieve this, we first grouped patent documents with similar technologies by applying affinity propagation (AP) clustering, which is effective for grouping large amounts of data. Next, in order to define the technology clusters, we adopted the term frequency-inverse document frequency (TF-IDF) weight, which lists terms in order of importance. We collected the patent data of Korean electric car companies from the United States Patent and Trademark Office (USPTO) to verify our proposed methodology. As a result, our proposed methodology presents more detailed information on the Korean electric car industry than previous studies.
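The cluster-labelling step (ranking terms by TF-IDF) can be sketched as follows, assuming the abstracts of each AP cluster have been concatenated into one document. This is one common TF-IDF variant, tf = count/length and idf = ln(N/df), with whitespace tokenization and no stop-word removal or stemming; the paper's exact variant is not stated, AP clustering itself is not shown, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def tf_idf(documents: Sequence[str]) -> List[Dict[str, float]]:
    """Per-document TF-IDF weights using tf = count/length and idf = ln(N/df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({term: (count / total) * math.log(n_docs / df[term])
                        for term, count in counts.items()})
    return weights


def top_terms(weights: Dict[str, float], n: int = 3) -> List[str]:
    """Highest-weighted terms, ties broken alphabetically for stable output."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]
```

A term that occurs in every cluster document gets an idf of ln(1) = 0, so shared vocabulary drops out and each cluster is labelled by the terms that distinguish it.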

  17. Practical graph mining with R

    CERN Document Server

    Hendrix, William; Jenkins, John; Padmanabhan, Kanchana; Chakraborty, Arpan

    2014-01-01

    Practical Graph Mining with R presents a "do-it-yourself" approach to extracting interesting patterns from graph data. It covers many basic and advanced techniques for the identification of anomalous or frequently recurring patterns in a graph, the discovery of groups or clusters of nodes that share common patterns of attributes and relationships, the extraction of patterns that distinguish one category of graphs from another, and the use of those patterns to predict the category of new graphs. Hands-On Application of Graph Data Mining. Each chapter in the book focuses on a graph mining task, such as link analysis, cluster analysis, and classification. Through applications using real data sets, the book demonstrates how computational techniques can help solve real-world problems. The applications covered include network intrusion detection, tumor cell diagnostics, face recognition, predictive toxicology, mining metabolic and protein-protein interaction networks, and community detection in social networks. De...

  18. Analysis of case studies - mining, milling and discharges

    International Nuclear Information System (INIS)

    McEwan, A.C.

    2000-01-01

    This analysis paper reviews case studies on mining and milling and on radioactive discharges. An outline is given of each of the case studies presented from the perspectives of the study background, the criteria followed in remediation, the decision making process, outcomes achieved, and an evaluation in relation to radiological criteria that are recommended internationally. Site remediation after mining and milling operations may be driven more by aesthetic and environmental concerns than radiological criteria, particularly in more populated areas. The cases illustrated that it is highly desirable that stakeholders, including the public, are involved in decision making at an early stage with agreement on remediation outcomes. In particular, the exposure pathways and dose assessment models employed should generally be agreed with or approved by the regulatory authority prior to remediation work. In the case of remediated properties at Grand Junction, Colorado, it appears the cleanup criteria employed were below or consistent with those applicable to practices, although the situation was one of intervention, and this raises a question as to the cost effectiveness of the cleanup. For some remediation projects there are long-term ownership issues arising out of the need for extended public or state oversight of engineered structures or active water treatment facilities, but for these cases ownership issues did not arise for purely radiological reasons. (author)

  19. Big data mining analysis method based on cloud computing

    Science.gov (United States)

    Cai, Qing Qiu; Cui, Hong Gang; Tang, Hao

    2017-08-01

    In the era of information explosion, the super-large scale, discreteness, and unstructured or semi-structured nature of big data have gone far beyond what traditional data management can handle. Cloud computing provides a new technical approach to mining massive data, which can effectively overcome the inability of traditional data mining methods to adapt to massive data. This paper introduces the meaning and characteristics of cloud computing, analyzes the advantages of using cloud computing technology to realize data mining, designs an association rule mining algorithm based on the MapReduce parallel processing architecture, and verifies it experimentally. The parallel association rule mining algorithm based on the cloud computing platform can greatly improve the execution speed of data mining.
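The map/reduce split behind parallel association rule mining can be simulated in a single process. In the sketch below, a mapper emits (itemset, 1) for each candidate itemset in one transaction and a reducer sums the counts per key; on Hadoop these would run on different nodes with a shuffle in between. The paper's candidate generation and pruning are not described, so this counts fixed-size itemsets without Apriori pruning. All names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Iterator, Sequence, Tuple

Itemset = Tuple[str, ...]


def map_transaction(transaction: Iterable[str], size: int) -> Iterator[Tuple[Itemset, int]]:
    """Mapper: emit (itemset, 1) for every sorted itemset of `size` items in one transaction."""
    for itemset in combinations(sorted(set(transaction)), size):
        yield itemset, 1


def reduce_counts(pairs: Iterable[Tuple[Itemset, int]]) -> Dict[Itemset, int]:
    """Reducer: sum the emitted counts per itemset key."""
    totals: Dict[Itemset, int] = defaultdict(int)
    for itemset, count in pairs:
        totals[itemset] += count
    return dict(totals)


def frequent_itemsets(transactions: Sequence[Iterable[str]], size: int,
                      min_support: int) -> Dict[Itemset, int]:
    """One map/reduce pass, run serially here, keeping itemsets with enough support."""
    emitted = (pair for t in transactions for pair in map_transaction(t, size))
    counts = reduce_counts(emitted)
    return {k: v for k, v in counts.items() if v >= min_support}


def confidence(transactions: Sequence[Iterable[str]], antecedent: Iterable[str],
               consequent: Iterable[str]) -> float:
    """conf(X -> Y) = support(X ∪ Y) / support(X)."""
    x = set(antecedent)
    xy = x | set(consequent)
    baskets = [set(t) for t in transactions]
    n_x = sum(1 for b in baskets if x <= b)
    n_xy = sum(1 for b in baskets if xy <= b)
    return n_xy / n_x if n_x else 0.0
```

Each mapper call depends only on its own transaction, which is why this counting step parallelizes well across a cluster.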

  20. CLUSTER ANALYSIS UKRAINIAN REGIONAL DISTRIBUTION BY LEVEL OF INNOVATION

    Directory of Open Access Journals (Sweden)

    Roman Shchur

    2016-07-01

    A SWOT analysis of the threats and benefits of the innovation development strategy of the Ivano-Frankivsk region was conducted in the context of financial support. A methodical approach was identified for determining the potential of public-private partnerships as a tool for financing innovative economic development. A cluster analysis of the possibilities for forming public-private partnerships in a particular region was carried out. An optimal set of problem areas that require urgent solutions and financial provision is defined on the basis of the cluster approach. This will help to form practical recommendations for an effective financial mechanism in the regions of Ukraine. Keywords: mechanism of financial provision of innovation development, innovation development, public-private partnerships, cluster analysis, innovative development strategy.

  1. The analysis of strategies for the mining regions’ development in Russia as a condition of effective management of economy

    Directory of Open Access Journals (Sweden)

    Zaruba Natalya

    2017-01-01

    The article considers the conceptual issues of a new approach to the strategic management of coal-mining region development as a condition for effective government regulation of the economy at the macro level. The purpose of the study is to justify the use of marketing techniques in the strategic management of the region, clustering based on the territorial concentration and combination of all available resources, and the integration of regional economic networks. A comparative analysis of the main strategic directions of development of the coal-mining regions is carried out from the point of view of the leading economic development strategies. The main result is an estimate of the value of the synergy effects that arise when the resources of the region's sectors and industries are combined. The results of the study can be recommended for use in developing strategies for the sustainable development of a «mono-territory».

  2. Cluster-based analysis of multi-model climate ensembles

    Science.gov (United States)

    Hyde, Richard; Hossaini, Ryan; Leeson, Amber A.

    2018-06-01

    Clustering - the automated grouping of similar data - can provide powerful and unique insight into large and complex data sets, in a fast and computationally efficient manner. While clustering has been used in a variety of fields (from medical image processing to economics), its application within atmospheric science has been fairly limited to date, and the potential benefits of the application of advanced clustering techniques to climate data (both model output and observations) have yet to be fully realised. In this paper, we explore the specific application of clustering to a multi-model climate ensemble. We hypothesise that clustering techniques can provide (a) a flexible, data-driven method of testing model-observation agreement and (b) a mechanism with which to identify model development priorities. We focus our analysis on chemistry-climate model (CCM) output of tropospheric ozone - an important greenhouse gas - from the recent Atmospheric Chemistry and Climate Model Intercomparison Project (ACCMIP). Tropospheric column ozone from the ACCMIP ensemble was clustered using the Data Density based Clustering (DDC) algorithm. We find that a multi-model mean (MMM) calculated using members of the most-populous cluster identified at each location offers a reduction of up to ~20 % in the global absolute mean bias between the MMM and an observed satellite-based tropospheric ozone climatology, with respect to a simple, all-model MMM. On a spatial basis, the bias is reduced at ~62 % of all locations, with the largest bias reductions occurring in the Northern Hemisphere - where ozone concentrations are relatively large. However, the bias is unchanged at 9 % of all locations and increases at 29 %, particularly in the Southern Hemisphere. The latter demonstrates that although cluster-based subsampling acts to remove outlier model data, such data may in fact be closer to observed values in some locations. We further demonstrate that clustering can provide a viable and
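The "most-populous cluster" multi-model mean can be illustrated at a single location. This sketch does NOT implement the DDC algorithm: it substitutes a simple one-dimensional gap-based grouping (a new cluster starts wherever sorted model values are more than `max_gap` apart), assumed here only to show how the cluster-based MMM differs from the all-model mean. The values and function names are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def gap_clusters(values: Sequence[float], max_gap: float) -> List[List[float]]:
    """Group sorted 1-D values, starting a new cluster wherever the gap exceeds max_gap."""
    ordered = sorted(values)  # assumes at least one model value at this location
    clusters = [[ordered[0]]]
    for v in ordered[1:]:
        if v - clusters[-1][-1] > max_gap:
            clusters.append([v])
        else:
            clusters[-1].append(v)
    return clusters


def cluster_based_mmm(values: Sequence[float], max_gap: float) -> float:
    """Mean of the most-populous cluster (the first one wins on ties)."""
    largest = max(gap_clusters(values, max_gap), key=len)
    return mean(largest)


# Hypothetical column-ozone values from four models at one grid cell:
models = [30.0, 31.0, 29.0, 45.0]
print(cluster_based_mmm(models, max_gap=2.0), mean(models))
```

Applied cell by cell, the outlying model is excluded wherever it disagrees with the majority, which, as the abstract notes, helps only where the majority is in fact closer to the observations.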

  3. Application of microarray analysis on computer cluster and cloud platforms.

    Science.gov (United States)

    Bernau, C; Boulesteix, A-L; Knaus, J

    2013-01-01

    Analysis of recent high-dimensional biological data tends to be computationally intensive, as many common approaches such as resampling or permutation tests require the basic statistical analysis to be repeated many times. A crucial advantage of these methods is that they can be easily parallelized due to the computational independence of the resampling or permutation iterations, which has led many statistics departments to establish their own computer clusters. An alternative is to rent computing resources in the cloud, e.g. at Amazon Web Services. In this article we analyze whether a selection of statistical projects, recently implemented at our department, can be efficiently realized on these cloud resources. Moreover, we illustrate an opportunity to combine computer cluster and cloud resources. To compare the efficiency of computer cluster and cloud implementations and their respective parallelizations, we use microarray analysis procedures and compare their runtimes on the different platforms. Amazon Web Services provides various instance types that meet the particular needs of the different statistical projects we analyzed in this paper. Moreover, the network capacity is sufficient and the parallelization is comparable in efficiency to standard computer cluster implementations. Our results suggest that many statistical projects can be efficiently realized on cloud resources. It is important to mention, however, that workflows can change substantially as a result of a shift from computer cluster to cloud computing.
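The independence of permutation iterations can be made concrete with a two-sample permutation test split into chunks, each with its own random seed. This is a minimal standard-library sketch, assuming an absolute difference in means as the test statistic; it is not the department's microarray code. The `mapper` argument is a hypothetical hook: the built-in `map` runs the chunks serially, and any map-like function that distributes them could be swapped in.

```python
import random
from statistics import mean
from typing import Callable, Sequence, Tuple

Task = Tuple[Sequence[float], Sequence[float], int, int]


def abs_mean_difference(x: Sequence[float], y: Sequence[float]) -> float:
    return abs(mean(x) - mean(y))


def permutation_chunk(task: Task) -> int:
    """Run one independent block of permutations; count those at least as extreme."""
    x, y, n_perm, seed = task
    rng = random.Random(seed)  # a separate seed keeps each chunk independent
    observed = abs_mean_difference(x, y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        permuted = abs_mean_difference(pooled[:len(x)], pooled[len(x):])
        if permuted >= observed - 1e-12:  # small tolerance for floating-point ties
            hits += 1
    return hits


def permutation_pvalue(x: Sequence[float], y: Sequence[float], n_perm: int = 10_000,
                       n_chunks: int = 4, mapper: Callable = map) -> float:
    """Combine chunk counts into a p-value with the usual +1 correction."""
    per_chunk = n_perm // n_chunks
    tasks = [(list(x), list(y), per_chunk, seed) for seed in range(n_chunks)]
    hits = sum(mapper(permutation_chunk, tasks))
    total = per_chunk * n_chunks
    return (hits + 1) / (total + 1)
```

For example, inside an `if __name__ == "__main__":` guard, `with concurrent.futures.ProcessPoolExecutor() as pool: permutation_pvalue(x, y, mapper=pool.map)` spreads the chunks over local cores; the same chunked structure is what makes distribution over cluster or cloud nodes straightforward.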

  4. Cluster Analysis as an Analytical Tool of Population Policy

    Directory of Open Access Journals (Sweden)

    Oksana Mikhaylovna Shubat

    2017-12-01

    Full Text Available The predicted negative trends in Russian demography (falling birth rates, population decline) make it more urgent to strengthen family and population policy measures. Our research purpose is to identify groups of Russian regions with similar characteristics in the family sphere using cluster analysis. The findings should make an important contribution to the field of family policy. We used hierarchical cluster analysis based on the Ward method and the Euclidean distance to segment Russian regions. Clustering is based on four variables, which together characterize the family institution in each region. The authors used data from the Federal State Statistics Service for 2010 to 2015. Clustering and profiling of each segment made it possible to build a model of Russian regions according to the features of the family institution in these regions. The authors revealed four clusters grouping regions with similar problems in the family sphere. This segmentation makes it possible to develop the most relevant family policy measures for each group of regions. Thus, the analysis has shown a high degree of differentiation of the family institution across regions. This suggests that a unified approach to solving population problems is far from effective. To achieve greater results in the implementation of family policy, a differentiated approach is needed. Methods of multidimensional data classification can be successfully applied as a relevant analytical toolkit. Further research could adapt multidimensional classification methods to the analysis of population problems in Russian regions. In particular, nonparametric cluster analysis algorithms may be relevant in future studies.
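The method named here, hierarchical clustering with Ward's criterion on Euclidean distances, can be sketched in a few lines. This is a naive illustration, not the authors' code. It assumes two already-standardized variables per region instead of the study's four, and the region values are hypothetical. At each step it merges the pair of clusters whose union increases the total within-cluster sum of squares the least.

```python
def centroid(cluster):
    """Component-wise mean of a list of equal-length points."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]


def ward_cost(a, b):
    """Increase in the total within-cluster sum of squares if clusters a and b
    are merged: |a||b|/(|a|+|b|) * squared Euclidean distance of the centroids."""
    sq = sum((x - y) ** 2 for x, y in zip(centroid(a), centroid(b)))
    return len(a) * len(b) / (len(a) + len(b)) * sq


def ward(points, k):
    """Naive O(n^3) agglomerative clustering with Ward's criterion, stopped at k clusters."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is still valid
    return clusters


# Hypothetical standardized indicator pairs for five regions
regions = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (20.0, 0.0)]
print(ward(regions, k=3))
# → [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)], [(20.0, 0.0)]]
```

Ward's criterion is sensitive to variable scale, which is why the sketch assumes standardized inputs. Production code would typically use an optimized library routine rather than this cubic-time loop.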

  5. Quantitative microbial community analysis of three different sulfidic mine tailing dumps generating acid mine drainage.

    Science.gov (United States)

    Kock, Dagmar; Schippers, Axel

    2008-08-01

    The microbial communities of three different sulfidic and acidic mine waste tailing dumps located in Botswana, Germany, and Sweden were quantitatively analyzed using quantitative real-time PCR (Q-PCR), fluorescence in situ hybridization (FISH), catalyzed reporter deposition-FISH (CARD-FISH), Sybr green II direct counting, and the most probable number (MPN) cultivation technique. Depth profiles of cell numbers showed that the compositions of the microbial communities are greatly different at the three sites and also strongly varied between zones of oxidized and unoxidized tailings. Maximum cell numbers of up to 10^9 cells g^-1 dry weight were determined in the pyrite or pyrrhotite oxidation zones, whereas cell numbers in unoxidized tailings were significantly lower. Bacteria dominated over Archaea and Eukarya at all tailing sites. The acidophilic Fe(II)- and/or sulfur-oxidizing Acidithiobacillus spp. dominated over the acidophilic Fe(II)-oxidizing Leptospirillum spp. among the Bacteria at two sites. The two genera were equally abundant at the third site. The acidophilic Fe(II)- and sulfur-oxidizing Sulfobacillus spp. were generally less abundant. The acidophilic Fe(III)-reducing Acidiphilium spp. could be found at only one site. The neutrophilic Fe(III)-reducing Geobacteraceae as well as the dsrA gene of sulfate reducers were quantifiable at all three sites. FISH analysis provided reliable data only for tailing zones with high microbial activity, whereas CARD-FISH, Q-PCR, Sybr green II staining, and MPN were suitable methods for a quantitative microbial community analysis of tailings in general.

  6. Preliminary analysis of surface mining options for Naval Oil Shale Reserve 1

    Energy Technology Data Exchange (ETDEWEB)

    1981-07-20

    The study was undertaken to determine the economic viability of surface mining to exploit the reserves. It is based on resource information already developed for NOSR 1 and conceptual designs of mining systems compatible with this resource. Environmental considerations as they relate to surface mining have been addressed qualitatively. The conclusions on economic viability were based primarily on mining costs projected from other industries using surface mining. An analysis of surface mining for the NOSR 1 resource was performed based on its particular overburden thickness, oil shale thickness, oil shale grade, and topography. This evaluation considered reclamation of the surface as part of its design and cost estimate. The capital costs for mining 25 GPT and 30 GPT shale and the operating costs for mining 25 GPT, 30 GPT, and 35 GPT shale are presented. The relationship between operating cost and stripping ratio, and the break-even stripping ratio (BESR) for surface mining to be competitive with room-and-pillar mining, are shown. Identification of potential environmental impacts shows that environmental control procedures for surface mining are more difficult to implement than those for underground mining. The following three areas are of prime concern: maintaining air quality standards during the disruption, movement, and placement of large quantities of overburden; disruption or cutting of aquifers during the mining process, which could affect area water supplies; and potential mineral leaching from spent shales into the aquifers. Although placing spent shale in the open pit is an operational benefit, leaching of the spent shales and contamination of the water are detrimental. It is therefore concluded that surface mining on NOSR 1 is currently neither economically desirable nor environmentally safe. Stringent mitigation measures would have to be implemented to overcome some of the potential environmental hazards.

  7. Automated analysis of organic particles using cluster SIMS

    Energy Technology Data Exchange (ETDEWEB)

    Gillen, Greg; Zeissler, Cindy; Mahoney, Christine; Lindstrom, Abigail; Fletcher, Robert; Chi, Peter; Verkouteren, Jennifer; Bright, David; Lareau, Richard T.; Boldman, Mike

    2004-06-15

    Cluster primary ion bombardment combined with secondary ion imaging is used on an ion microscope secondary ion mass spectrometer for the spatially resolved analysis of organic particles on various surfaces. Compared to the use of monoatomic primary ion beam bombardment, the use of a cluster primary ion beam (SF₅⁺ or C₈⁻) provides significant improvement in molecular ion yields and a reduction in beam-induced degradation of the analyte molecules. These characteristics of cluster bombardment, along with automated sample stage control and custom image analysis software, are used to rapidly characterize the spatial distribution of trace explosive particles, narcotics and inkjet-printed microarrays on a variety of surfaces.

  8. Assessment of surface water quality using hierarchical cluster analysis

    Directory of Open Access Journals (Sweden)

    Dheeraj Kumar Dabgerwal

    2016-02-01

    Full Text Available This study was carried out to assess the physicochemical quality of the river Varuna in Varanasi, India. Water samples were collected from 10 sites during January-June 2015. Pearson correlation analysis was used to assess the direction and strength of the relationships between physicochemical parameters. Hierarchical cluster analysis was also performed to determine the sources of pollution in the river Varuna. The results showed high values of DO, nitrate, BOD, COD and total alkalinity, above the BIS permissible limits. The correlation analysis identified pH, electrical conductivity, total alkalinity and nitrate as key water parameters that influence the concentrations of other water parameters. Cluster analysis grouped the 10 sampling sites into three major clusters according to the similarity in water quality. This study illustrated the usefulness of correlation and cluster analysis for obtaining better information about river water quality. International Journal of Environment, Vol. 5 (1), 2016, pp. 32-44
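The Pearson correlation step can be illustrated with a short sketch. The coefficient r is the covariance of two parameters divided by the product of their standard deviations, so its sign gives the direction of the relationship and its magnitude the strength. The site readings below are hypothetical and are not the study's data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(params):
    """params: dict parameter name -> list of values, one value per sampling site."""
    names = list(params)
    return {(p, q): pearson(params[p], params[q]) for p in names for q in names}


# Hypothetical readings at five sites (not the study's data)
sites = {
    "EC": [410, 520, 600, 650, 720],
    "alkalinity": [150, 180, 205, 230, 260],
    "pH": [8.1, 7.9, 7.8, 7.6, 7.4],
}
r = correlation_matrix(sites)
print(round(r[("EC", "alkalinity")], 3), round(r[("EC", "pH")], 3))
```

A strongly positive r (EC vs alkalinity here) means the two parameters rise together across sites. A strongly negative r (EC vs pH here) means one falls as the other rises.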

  9. Application of single-linkage clustering method in the analysis of ...

    African Journals Online (AJOL)

    Admin

    Analysis of growth rate of gross domestic product (GDP) at ... The end result of the algorithm is a tree of clusters called a dendrogram, which shows how the clusters are ... Number of cluster sum from from observations of ...
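Single-linkage clustering merges, at each step, the two clusters whose closest members are nearest. The sequence of merges and their heights is what a dendrogram draws. A minimal sketch follows, assuming one-dimensional values such as growth rates. The values and the `single_linkage` helper are hypothetical. It uses the standard equivalence with Kruskal's minimum-spanning-tree algorithm: process point pairs in order of distance and merge whenever they lie in different clusters.

```python
from itertools import combinations


def single_linkage(points, dist):
    """Return the dendrogram as a list of (merge height, merged member indices)."""
    parent = list(range(len(points)))
    members = {i: [i] for i in range(len(points))}

    def find(i):
        # Union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    merges = []
    pairs = sorted(combinations(range(len(points)), 2),
                   key=lambda ij: dist(points[ij[0]], points[ij[1]]))
    for i, j in pairs:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # already in the same cluster
        parent[rj] = ri
        members[ri] = members[ri] + members.pop(rj)
        merges.append((dist(points[i], points[j]), sorted(members[ri])))
    return merges


# Hypothetical 1-D values (e.g. annual growth rates)
values = [0, 1, 5, 6, 20]
for height, group in single_linkage(values, lambda a, b: abs(a - b)):
    print(height, group)
```

Cutting the dendrogram at a chosen height (for example, keeping only merges below 10) yields the flat clusters.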

  10. Cluster Analysis of Clinical Data Identifies Fibromyalgia Subgroups

    Science.gov (United States)

    Docampo, Elisa; Collado, Antonio; Escaramís, Geòrgia; Carbonell, Jordi; Rivera, Javier; Vidal, Javier; Alegre, José

    2013-01-01

    Introduction Fibromyalgia (FM) is mainly characterized by widespread pain and multiple accompanying symptoms, which hinder FM assessment and management. In order to reduce FM heterogeneity we classified clinical data into simplified dimensions that were used to define FM subgroups. Material and Methods 48 variables were evaluated in 1,446 Spanish FM cases fulfilling 1990 ACR FM criteria. A partitioning analysis was performed to find groups of variables similar to each other. Similarities between variables were identified and the variables were grouped into dimensions. This was performed in a subset of 559 patients, and cross-validated in the remaining 887 patients. For each sample and dimension, a composite index was obtained based on the weights of the variables included in the dimension. Finally, a clustering procedure was applied to the indexes, resulting in FM subgroups. Results Variables clustered into three independent dimensions: “symptomatology”, “comorbidities” and “clinical scales”. Only the first two dimensions were considered for the construction of FM subgroups. Resulting scores classified FM samples into three subgroups: low symptomatology and comorbidities (Cluster 1), high symptomatology and comorbidities (Cluster 2), and high symptomatology but low comorbidities (Cluster 3), showing differences in measures of disease severity. Conclusions We have identified three subgroups of FM samples in a large cohort of FM by clustering clinical data. Our analysis stresses the importance of family and personal history of FM comorbidities. Also, the resulting patient clusters could indicate different forms of the disease, relevant to future research, and might have an impact on clinical assessment. PMID:24098674

  11. Transcriptional analysis of ESAT-6 cluster 3 in Mycobacterium smegmatis

    Directory of Open Access Journals (Sweden)

    Riccardi Giovanna

    2009-03-01

    Full Text Available Abstract Background The ESAT-6 (early secreted antigenic target, 6 kDa) family collects small mycobacterial proteins secreted by Mycobacterium tuberculosis, particularly in the early phase of growth. There are 23 ESAT-6 family members in M. tuberculosis H37Rv. In a previous work, we identified the Zur-dependent regulation of five proteins of the ESAT-6/CFP-10 family (esxG, esxH, esxQ, esxR, and esxS). esxG and esxH are part of ESAT-6 cluster 3, whose expression was already known to be induced by iron starvation. Results In this research, we performed EMSA experiments and transcriptional analysis of ESAT-6 cluster 3 in Mycobacterium smegmatis (msmeg0615-msmeg0625) and M. tuberculosis. In contrast to what we had observed in M. tuberculosis, we found that in M. smegmatis ESAT-6 cluster 3 responds only to iron and not to zinc. In both organisms we identified an internal promoter, a finding which suggests the presence of two transcriptional units and, consequently, a differential expression of cluster 3 genes. We compared the expression of msmeg0615 and msmeg0620 in different growth and stress conditions by means of relative quantitative PCR. The expression of msmeg0615 and msmeg0620 genes was essentially similar; they appeared to be repressed in most of the tested conditions, with the exception of acid stress (pH 4.2), where msmeg0615 was about 4-fold induced, while msmeg0620 was repressed. Analysis revealed that in acid stress conditions the M. tuberculosis rv0282 gene was also 3-fold induced, while rv0287 induction was almost insignificant. Conclusion In contrast with what has been reported for M. tuberculosis, our results suggest that in M. smegmatis only IdeR-dependent regulation is retained, while zinc has no effect on gene expression. The role of cluster 3 in M. tuberculosis virulence is still to be defined; however, iron- and zinc-dependent expression strongly suggests that cluster 3 is highly expressed in the infective process, and that the cluster

  12. Graph analysis of cell clusters forming vascular networks

    Science.gov (United States)

    Alves, A. P.; Mesquita, O. N.; Gómez-Gardeñes, J.; Agero, U.

    2018-03-01

    This manuscript describes the experimental observation of vasculogenesis in chick embryos by means of network analysis. The formation of the vascular network was observed in the area opaca of embryos from 40 to 55 h of development. In the area opaca, endothelial cell clusters self-organize as a primitive and approximately regular network of capillaries. The process was observed by bright-field microscopy in control embryos and in embryos treated with Bevacizumab (Avastin), an antibody that inhibits the signalling of the vascular endothelial growth factor (VEGF). The image sequences of vascular growth were thresholded and used to quantify the forming network in control and Avastin-treated embryos. This characterization is made by measuring vessel density, the number of cell clusters and the largest cluster density. From the original images, the topology of the vascular network was extracted and characterized by means of the usual network metrics, such as the degree distribution, average clustering coefficient, average shortest path length and assortativity, among others. This analysis makes it possible to monitor how the largest connected cluster of the vascular network evolves in time and provides quantitative evidence of the disruptive effects that Avastin has on the tree structure of vascular networks.
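Two of the network metrics listed, the degree distribution and the average clustering coefficient, are simple to compute once the vessel skeleton has been turned into a graph. The sketch below works on an undirected edge list. The toy graph is hypothetical, and giving degree-0/1 nodes a coefficient of 0 is a common convention assumed here, not something the abstract specifies.

```python
from collections import Counter
from itertools import combinations


def build_adjacency(edges):
    """Undirected graph as dict node -> set of neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj, v):
    """Fraction of pairs of v's neighbours that are themselves connected.
    Nodes with fewer than two neighbours get 0 by convention."""
    k = len(adj[v])
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
    return 2 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


def degree_distribution(adj):
    """Map degree -> number of nodes with that degree."""
    return Counter(len(nbrs) for nbrs in adj.values())


# Hypothetical skeleton: a closed loop of three branch points plus one dead end
g = build_adjacency([(0, 1), (1, 2), (2, 0), (2, 3)])
print(average_clustering(g), dict(degree_distribution(g)))
```

In this toy graph, nodes 0 and 1 have coefficient 1 and node 2 has 1/3 (one of its three neighbour pairs is linked). The dead end has 0, giving an average of 7/12. A purely tree-like network would have an average of 0.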

  13. Analysis of the planned post-mining landscape of MIBRAG's open-cast mines with regard to a possible environmental impact of alteration processes in mixed dumps

    International Nuclear Information System (INIS)

    Jolas, P.; Hofmann, B.

    2010-01-01

    There has been an increasing body of knowledge with regard to hydro- and geochemical alteration processes in overburden dumps and their impact on groundwater quality in lignite mining and reclamation operations associated with post-mining landscapes in Germany. The operators of the MIBRAG mines have examined issues regarding alteration processes and how they affect the environment and which opportunities exist to actively influence the dumping process. The objectives were to counteract any possible negative impact of the alteration processes. Special emphasis was on the impact caused by oxidation of sulfur containing minerals. This paper presented an analysis of the situation at United Schleenhain Mine and how it reflects on the work to date for MIBRAG's mines. A future outlook was also presented. Specifically, the paper discussed the development of the United Schleenhain mine and the post-mining landscape. The potential for discharge of substances was also evaluated along with acidification. 1 tab., 5 figs.

  14. clusters

    Indian Academy of Sciences (India)

    2017-09-27

    Sep 27, 2017 ... Author for correspondence (zh4403701@126.com). MS received 15 ... lic clusters using density functional theory (DFT)-GGA of the DMOL3 package. ... In the process of geometric optimization, convergence thresholds ... and Postgraduate Research & Practice Innovation Program of Jiangsu Province ...

  15. clusters

    Indian Academy of Sciences (India)

    environmental as well as technical problems during fuel gas utilization. ... adsorption on some alloys of Pd, namely PdAu, PdAg ... ried out on small neutral and charged Au,[24,26,27] Cu,[28] ... study of Zanti et al.[29] on Pd_n (n = 1–9) clusters.

  16. Sentiment analysis of Arabic tweets using text mining techniques

    Science.gov (United States)

    Al-Horaibi, Lamia; Khan, Muhammad Badruddin

    2016-07-01

    Sentiment analysis has become a flourishing field of text mining and natural language processing. Sentiment analysis aims to determine whether a text is written to express positive, negative, or neutral emotions about a certain domain. Most sentiment analysis researchers focus on English texts, with very limited resources available for other complex languages, such as Arabic. In this study, the target was to develop an initial model that performs satisfactorily and measures Arabic Twitter sentiment using a machine learning approach, with Naïve Bayes and Decision Tree as the classification algorithms. The dataset used contains more than 2,000 Arabic tweets collected from Twitter. We performed several experiments to check the performance of the two classifiers using different combinations of text-processing functions. We found that the available facilities for Arabic text processing need to be built from scratch or improved to develop accurate classifiers. The small utilities we developed in Python helped improve the results and showed that sentiment analysis in the Arabic domain needs a lot of work on the lexicon side.
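One of the two classifiers named, Naïve Bayes, can be sketched compactly. The sketch assumes the multinomial variant with add-one (Laplace) smoothing, applied to text that has already been tokenized. The abstract notes that preparing Arabic tokens is the hard part, and that step is not shown. The training examples use placeholder English tokens and are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """docs: list of (tokens, label) pairs. Returns class priors,
    per-class token counts and the training vocabulary."""
    label_counts = Counter(label for _, label in docs)
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        token_counts[label].update(tokens)
        vocab.update(tokens)
    priors = {c: n / len(docs) for c, n in label_counts.items()}
    return priors, token_counts, vocab


def predict_nb(model, tokens):
    """Return the label with the highest log posterior, using add-one
    (Laplace) smoothing; tokens unseen in training are skipped."""
    priors, token_counts, vocab = model
    best_label, best_score = None, -math.inf
    for label, prior in priors.items():
        total = sum(token_counts[label].values())
        score = math.log(prior)
        for t in tokens:
            if t in vocab:
                score += math.log((token_counts[label][t] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical, already-tokenized training data (placeholder English tokens)
train = [
    (["good", "happy"], "pos"),
    (["great", "good"], "pos"),
    (["bad", "sad"], "neg"),
    (["awful", "bad"], "neg"),
]
model = train_nb(train)
print(predict_nb(model, ["good", "day"]), predict_nb(model, ["bad", "sad"]))  # → pos neg
```

Working in log space avoids underflow when many token probabilities are multiplied. Smoothing keeps a single token that never appeared with a class from forcing that class's probability to zero.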

  17. Cluster Analysis of International Information and Social Development.

    Science.gov (United States)

    Lau, Jesus

    1990-01-01

    Analyzes information activities in relation to socioeconomic characteristics in low, middle, and highly developed economies for the years 1960 and 1977 through the use of cluster analysis. Results of data from 31 countries suggest that information development is achieved mainly by countries that have also achieved social development. (26…

  18. Making Sense of Cluster Analysis: Revelations from Pakistani Science Classes

    Science.gov (United States)

    Pell, Tony; Hargreaves, Linda

    2011-01-01

    Cluster analysis has been applied to quantitative data in educational research over several decades and has been a feature of Maurice Galton's research in primary and secondary classrooms. It has offered potentially useful insights for teaching, yet its implications for practice are rarely implemented. It has also been subject to negative…

  19. Cluster analysis for validated climatology stations using precipitation in Mexico

    NARCIS (Netherlands)

    Bravo Cabrera, J. L.; Azpra-Romero, E.; Zarraluqui-Such, V.; Gay-García, C.; Estrada Porrúa, F.

    2012-01-01

    Annual average of daily precipitation was used to group climatological stations into clusters using the k-means procedure and principal component analysis with varimax rotation. After a careful selection of the stations deployed in Mexico since 1950, we selected 349 characterized by having 35 to 40

  20. A Cluster Analysis of Personality Style in Adults with ADHD

    Science.gov (United States)

    Robin, Arthur L.; Tzelepis, Angela; Bedway, Marquita

    2008-01-01

    Objective: The purpose of this study was to use hierarchical linear cluster analysis to examine the normative personality styles of adults with ADHD. Method: A total of 311 adults with ADHD completed the Millon Index of Personality Styles, which consists of 24 scales assessing motivating aims, cognitive modes, and interpersonal behaviors. Results:…

  1. Characterization of population exposure to organochlorines: A cluster analysis application

    NARCIS (Netherlands)

    R.M. Guimarães (Raphael Mendonça); S. Asmus (Sven); A. Burdorf (Alex)

    2013-01-01

    This study aimed to show the results from a cluster analysis application in the characterization of population exposure to organochlorines through variables related to time and exposure dose. Characteristics of 354 subjects in a population exposed to organochlorine pesticides residues

  2. Robustness in cluster analysis in the presence of anomalous observations

    NARCIS (Netherlands)

    Zhuk, EE

    Cluster analysis of multivariate observations in the presence of "outliers" (anomalous observations) in a sample is studied. The expected (mean) fraction of erroneous decisions for the decision rule is computed analytically by minimizing the intraclass scatter. A robust decision rule (stable to

  3. Language Learner Motivational Types: A Cluster Analysis Study

    Science.gov (United States)

    Papi, Mostafa; Teimouri, Yasser

    2014-01-01

    The study aimed to identify different second language (L2) learner motivational types drawing on the framework of the L2 motivational self system. A total of 1,278 secondary school students learning English in Iran completed a questionnaire survey. Cluster analysis yielded five different groups based on the strength of different variables within…

  4. Technological and mining analysis of mechanized systems used in roadways in Polish mines

    Energy Technology Data Exchange (ETDEWEB)

    Sikora, W; Giza, T; Siwiec, J [Politechnika Slaska, Gliwice (Poland). Instytut Mechanizacji Gornictwa

    1987-01-01

    Analyzes methods of mine drivage in Poland and materials handling systems. Of 1,620 km of roadways driven in 1982, 12% were driven in coal and 88% in stone or stone and coal. Roadways driven in coal were in most cases situated at depths from 500 to 700 m. Roadway cross-sections ranged from 12 to 18 m². Roadways in stone or stone and coal were driven by drilling and blasting, with loaders used for stone handling. Roadways in coal were driven by heading machines. Advance rates of mine drivage by heading machines were 2 to 3 times higher than those by drilling and blasting with loaders for stone handling. Basic statistical data characterizing roadways and drivage methods are evaluated: roadway dimensions and depth, advance rates depending on drivage method and mining conditions, and types of heading machines and loaders.

  5. Performance Analysis of Unsupervised Clustering Methods for Brain Tumor Segmentation

    Directory of Open Access Journals (Sweden)

    Tushar H Jaware

    2013-10-01

    Full Text Available Medical image processing is a challenging and emerging field of neuroscience. The ultimate goal of medical image analysis in brain MRI is to extract important clinical features that would improve methods of diagnosis and treatment of disease. This paper focuses on methods to detect and extract brain tumours from brain MR images. MATLAB is used to design a software tool for locating brain tumours based on unsupervised clustering methods. The K-means clustering algorithm is implemented and tested on a database of 30 images. A performance evaluation of unsupervised clustering methods is presented.
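The K-means step can be sketched on pixel intensities. Each pixel is assigned to its nearest centre, each centre is recomputed as the mean of its pixels, and the two steps repeat until the centres stop moving. The sketch assumes grey-level intensities only, fixed initial centres for reproducibility, and hypothetical pixel values. It is not the paper's MATLAB tool. Treating the brightest cluster as a tumour candidate would be an additional heuristic that is not shown here.

```python
def kmeans_1d(values, centers, iters=100):
    """Lloyd's K-means on scalar values from the given initial centres.
    Returns the final centres and each value's cluster index."""
    centers = list(centers)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for v in values:
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[idx].append(v)
        # Keep an empty cluster's centre unchanged rather than dividing by zero
        new = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if new == centers:
            break  # converged: assignments no longer change the centres
        centers = new
    labels = [min(range(len(centers)), key=lambda i: abs(v - centers[i])) for v in values]
    return centers, labels


# Hypothetical grey levels from a small image patch; 3 clusters seeded at 0, 128, 255
pixels = [10, 12, 11, 100, 105, 200, 210, 205]
centers, labels = kmeans_1d(pixels, [0, 128, 255])
print(centers, labels)  # → [11.0, 102.5, 205.0] [0, 0, 0, 1, 1, 2, 2, 2]
```

In a real MR slice the labels would be reshaped back to the image grid to give a segmentation map. The result depends on the initial centres, which is why implementations often use several random restarts.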

  6. Identifying clinical course patterns in SMS data using cluster analysis

    DEFF Research Database (Denmark)

    Kent, Peter; Kongsted, Alice

    2012-01-01

    ABSTRACT: BACKGROUND: Recently, there has been interest in using the short message service (SMS or text messaging) to gather frequent information on the clinical course of individual patients. One possible role for identifying clinical course patterns is to assist in exploring clinically important... showed that clinical course patterns can be identified by cluster analysis using all SMS time points as cluster variables. This method is simple, intuitive and does not require a high level of statistical skill. However, there are alternative ways of managing SMS data and many different methods...

  7. Analysis on evaluation ability of nonlinear safety assessment model of coal mines based on artificial neural network

    Institute of Scientific and Technical Information of China (English)

    SHI Shi-liang; LIU Hai-bo; LIU Ai-hua

    2004-01-01

    Based on an integrated analysis of the strengths and shortcomings of various methods used in the safety assessment of coal mines, and taking into account the nonlinear features of the mine safety subsystem, this paper establishes a neural network assessment model of mine safety, analyzes the ability of artificial neural networks to evaluate the mine safety state, and lays the theoretical foundation for using artificial neural networks in the systematic optimization of mine safety assessment and for obtaining reasonably accurate safety assessment results.

  8. Data mining learning bootstrap through semantic thumbnail analysis

    Science.gov (United States)

    Battiato, Sebastiano; Farinella, Giovanni Maria; Giuffrida, Giovanni; Tribulato, Giuseppe

    2007-01-01

    The rapid increase of technological innovations in the mobile phone industry induces the research community to develop new and advanced systems to optimize services offered by mobile phone operators (telcos) to maximize their effectiveness and improve their business. Data mining algorithms can run over data produced by mobile phone usage (e.g. image, video, text and log files) to discover users' preferences and predict the most likely (to be purchased) offer for each individual customer. One of the main challenges is the reduction of the learning time and cost of these automatic tasks. In this paper we discuss an experiment where a commercial offer is composed of a small picture augmented with a short text describing the offer itself. Each customer's purchase is properly logged with all relevant information. Upon arrival of new items we need to learn who the best customers (prospects) for each item are, that is, the ones most likely to be interested in purchasing that specific item. Such learning activity is time consuming and, in our specific case, is not feasible given the large number of new items arriving every day. Basically, given the current customer base, we are not able to learn on all new items. Thus, we need to select among those new items to identify the best candidates. We do so by using a joint analysis of visual features and text to estimate how good each new item could be, that is, whether or not it is worth learning on it. Preliminary results show the effectiveness of the proposed approach to improve classical data mining techniques.

  9. Outcome-Driven Cluster Analysis with Application to Microarray Data.

    Directory of Open Access Journals (Sweden)

    Jessie J Hsu

    Full Text Available One goal of cluster analysis is to sort characteristics into groups (clusters) so that those in the same group are more highly correlated to each other than they are to those in other groups. An example is the search for groups of genes whose expression of RNA is correlated in a population of patients. These genes would be of greater interest if their common level of RNA expression were additionally predictive of the clinical outcome. This issue arose in the context of a study of trauma patients on whom RNA samples were available. The question of interest was whether there were groups of genes that were behaving similarly, and whether each gene in the cluster would have a similar effect on who would recover. For this, we develop an algorithm to simultaneously assign characteristics (genes) into groups of highly correlated genes that have the same effect on the outcome (recovery). We propose a random effects model where the genes within each group (cluster) equal the sum of a random effect, specific to the observation and cluster, and an independent error term. The outcome variable is a linear combination of the random effects of each cluster. To fit the model, we implement a Markov chain Monte Carlo algorithm based on the likelihood of the observed data. We evaluate the effect of including outcome in the model through simulation studies and describe a strategy for prediction. These methods are applied to trauma data from the Inflammation and Host Response to Injury research program, revealing a clustering of the genes that are informed by the recovery outcome.

  10. GPR Detection of Buried Symmetrically Shaped Mine-like Objects using Selective Independent Component Analysis

    DEFF Research Database (Denmark)

    Karlsen, Brian; Sørensen, Helge Bjarup Dissing; Larsen, Jan

    2003-01-01

    This paper addresses the detection of mine-like objects in stepped-frequency ground penetrating radar (SF-GPR) data as a function of object size, object content, and burial depth. The detection approach is based on a Selective Independent Component Analysis (SICA). SICA provides an automatic ranking of components, which enables the suppression of clutter, and hence the extraction of components carrying mine information. The goal of the investigation is to evaluate various time and frequency domain ICA approaches based on SICA. Performance comparison is based on a series of specially designed mine-like objects ranging from small-scale anti-personnel (AP) mines to large-scale anti-tank (AT) mines. Large-scale SF-GPR measurements were performed on this series of mine-like objects buried in soil. The SF-GPR data were acquired using a wideband monostatic bow-tie antenna operating in the frequency range 750...

  11. High-dimensional cluster analysis with the Masked EM Algorithm

    Science.gov (United States)

    Kadir, Shabnam N.; Goodman, Dan F. M.; Harris, Kenneth D.

    2014-01-01

    Cluster analysis faces two problems in high dimensions: first, the “curse of dimensionality” that can lead to overfitting and poor generalization performance; and second, the sheer time taken for conventional algorithms to process large amounts of high-dimensional data. We describe a solution to these problems, designed for the application of “spike sorting” for next-generation high channel-count neural probes. In this problem, only a small subset of features provides information about the cluster membership of any one data vector, but this informative feature subset is not the same for all data points, rendering classical feature selection ineffective. We introduce a “Masked EM” algorithm that allows accurate and time-efficient clustering of up to millions of points in thousands of dimensions. We demonstrate its applicability to synthetic data, and to real-world high-channel-count spike sorting data. PMID:25149694

  12. A cluster analysis investigation of workaholism as a syndrome.

    Science.gov (United States)

    Aziz, Shahnaz; Zickar, Michael J

    2006-01-01

    Workaholism has been conceptualized as a syndrome although there have been few tests that explicitly consider its syndrome status. The authors analyzed a three-dimensional scale of workaholism developed by Spence and Robbins (1992) using cluster analysis. The authors identified three clusters of individuals, one of which corresponded to Spence and Robbins's profile of the workaholic (high work involvement, high drive to work, low work enjoyment). Consistent with previously conjectured relations with workaholism, individuals in the workaholic cluster were more likely to label themselves as workaholics, more likely to have acquaintances label them as workaholics, and more likely to have lower life satisfaction and higher work-life imbalance. The importance of considering workaholism as a syndrome and the implications for effective interventions are discussed. Copyright 2006 APA.

  13. Integrating Process Mining and Cognitive Analysis to Study EHR Workflow.

    Science.gov (United States)

    Furniss, Stephanie K; Burton, Matthew M; Grando, Adela; Larson, David W; Kaufman, David R

    2016-01-01

    There are numerous methods to study workflow. However, few produce the kinds of in-depth analyses needed to understand EHR-mediated workflow. Here we investigated variations in clinicians' EHR workflow by integrating quantitative analysis of patterns of users' EHR interactions with in-depth qualitative analysis of user performance. We characterized 6 clinicians' patterns of information-gathering using a sequential process-mining approach. The analysis revealed 519 different screen transition patterns performed across 1569 patient cases. No one pattern was followed for more than 10% of patient cases, the 15 most frequent patterns accounted for over half of patient cases (53%), and 27% of cases exhibited unique patterns. By triangulating quantitative and qualitative analyses, we found that participants' EHR-interactive behavior was associated with their routine processes, patient case complexity, and EHR default settings. The proposed approach has significant potential to inform resource allocation for observation and training. In-depth observations helped us to explain variation across users.
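The summary statistics quoted (number of distinct patterns, share of cases covered by the most frequent pattern and by the top N patterns, and share of cases with a unique pattern) can be computed from per-case screen sequences. This sketch assumes each case's complete ordered screen sequence counts as one pattern; the abstract does not define "pattern" more precisely. The screen names and cases are hypothetical.

```python
from collections import Counter


def pattern_stats(cases, top_n):
    """cases: dict case_id -> ordered list of screens visited.
    Treats each case's full screen sequence as one pattern."""
    counts = Counter(tuple(seq) for seq in cases.values())
    n = len(cases)
    return {
        "distinct_patterns": len(counts),
        "max_share": counts.most_common(1)[0][1] / n,
        "top_n_share": sum(c for _, c in counts.most_common(top_n)) / n,
        "unique_case_share": sum(1 for c in counts.values() if c == 1) / n,
    }


# Hypothetical screen sequences for five patient cases
cases = {
    "A": ["summary", "labs", "notes"],
    "B": ["summary", "labs", "notes"],
    "C": ["summary", "labs", "notes"],
    "D": ["summary", "notes"],
    "E": ["labs"],
}
print(pattern_stats(cases, top_n=2))
```

Here one pattern covers 60% of cases and two cases (40%) follow a pattern nobody else used. These are the same kinds of figures as the abstract's 10%, 53% and 27%.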

  14. Cosmological analysis of galaxy clusters surveys in X-rays

    International Nuclear Information System (INIS)

    Clerc, N.

    2012-01-01

    Clusters of galaxies are the most massive objects in equilibrium in our Universe. Studying them allows cosmological scenarios of structure formation to be tested with precision, yielding constraints complementary to those from the cosmic microwave background, supernovae or galaxies. They are identified through the X-ray emission of their hot gas, which makes it possible to map them at different epochs of the Universe. This report presents two surveys of galaxy clusters detected in X-rays and puts forward a method for their cosmological interpretation. Thanks to its multi-wavelength coverage of over 10 sq. deg. and a decade of accumulated expertise, the XMM-LSS allows a systematic census of clusters in a large volume of the Universe. In the framework of this survey, the first part of this report describes the techniques developed for characterizing the detected objects. Particular emphasis is placed on the most distant ones (z ≥ 1), exploiting the complementarity of observations in the X-ray, optical and infrared bands. The X-CLASS survey is then described in full. Based on XMM archival data, it provides a new catalogue of 800 clusters detected in X-rays. A cosmological analysis of this survey is performed using 'CR-HR' diagrams. This new method self-consistently includes selection effects and scaling relations and makes it possible to bypass the computation of individual cluster masses. Proposals are made for applying this method to future surveys such as XMM-XXL and eRosita. (author) [fr

  15. Cluster analysis by optimal decomposition of induced fuzzy sets

    Energy Technology Data Exchange (ETDEWEB)

    Backer, E

    1978-01-01

    Nonsupervised pattern recognition is addressed, and the concept of fuzzy sets is explored in order to provide the investigator (data analyst) with additional information, supplied by the pattern class membership values, beyond the classical pattern class assignments. The basic ideas behind the pattern recognition problem, the clustering problem, and the concept of fuzzy sets in cluster analysis are discussed, and a brief review of the literature of the fuzzy cluster analysis is given. Some mathematical aspects of fuzzy set theory are briefly discussed; in particular, a measure of fuzziness is suggested. The optimization-clustering problem is characterized. Then the fundamental idea behind affinity decomposition is considered. Next, further analysis takes place with respect to the partitioning-characterization functions. The iterative optimization procedure is then addressed. The reclassification function is investigated and convergence properties are examined. Finally, several experiments in support of the method suggested are described. Four object data sets serve as appropriate test cases. 120 references, 70 figures, 11 tables. (RWR)
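    The report's own measure of fuzziness is not given in the abstract. To show what such a measure does, the sketch below swaps in two widely used ones, Bezdek's partition coefficient and partition entropy, computed from a membership matrix whose rows (objects) sum to 1 across classes. These are stand-ins, not Backer's measure.

```python
import math


def partition_coefficient(u):
    """Mean of squared memberships: 1.0 for a crisp partition, 1/c when every membership is 1/c."""
    return sum(m * m for row in u for m in row) / len(u)


def partition_entropy(u):
    """0.0 for a crisp partition, log(c) when memberships are maximally fuzzy."""
    return -sum(m * math.log(m) for row in u for m in row if m > 0) / len(u)


crisp = [[1.0, 0.0], [0.0, 1.0]]   # each object fully in one class
fuzzy = [[0.5, 0.5], [0.5, 0.5]]   # each object split evenly between two classes
print(partition_coefficient(crisp), partition_coefficient(fuzzy))
print(partition_entropy(crisp), partition_entropy(fuzzy))
```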

  16. Reliability analysis of mining equipment: A case study of a crushing plant at Jajarm Bauxite Mine in Iran

    International Nuclear Information System (INIS)

    Barabady, Javad; Kumar, Uday

    2008-01-01

    The performance of mining machines depends on the reliability of the equipment used, the operating environment, the maintenance efficiency, the operation process, the technical expertise of the miners, etc. As the size and complexity of mining equipment continue to increase, the implications of equipment failure become ever more critical. Therefore, reliability analysis is required to identify the bottlenecks in the system and to find the components or subsystems with low reliability for a given designed performance. It is important to select a suitable method for data collection as well as for reliability analysis. This paper presents a case study describing reliability and availability analysis of the crushing plant number 3 at Jajarm Bauxite Mine in Iran. In this study, the crushing plant number 3 is divided into six subsystems. The parameters of several probability distributions, such as the Weibull, exponential and lognormal distributions, were estimated using ReliaSoft's Weibull++6 software. The results of the analysis show that the conveyer subsystem and secondary screen subsystem are critical from a reliability point of view, and the secondary crusher subsystem and conveyer subsystem are critical from an availability point of view. The study also shows that reliability analysis is very useful for deciding maintenance intervals.
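    Once Weibull parameters have been fitted (in the study, with Weibull++6), the reliability, availability and maintenance-interval quantities mentioned above follow from standard formulas. The sketch below assumes a two-parameter Weibull (shape `beta`, scale `eta`), independent subsystems in series, and hypothetical parameter values; it is not the paper's calculation.

```python
import math


def weibull_reliability(t, beta, eta):
    """Probability that a subsystem runs t hours without failure (two-parameter Weibull)."""
    return math.exp(-((t / eta) ** beta))


def weibull_mtbf(beta, eta):
    """Mean time between failures for a two-parameter Weibull: eta * Gamma(1 + 1/beta)."""
    return eta * math.gamma(1 + 1 / beta)


def maintenance_interval(target_reliability, beta, eta):
    """Longest run time t with R(t) >= target, from inverting R(t) = exp(-(t/eta)^beta)."""
    return eta * (-math.log(target_reliability)) ** (1 / beta)


def availability(mtbf, mttr):
    """Inherent availability: fraction of time the subsystem is able to operate."""
    return mtbf / (mtbf + mttr)


def series_reliability(subsystem_reliabilities):
    """A series system works only if every subsystem works (independence assumed)."""
    return math.prod(subsystem_reliabilities)


# Hypothetical subsystem: beta = 1.2, eta = 150 h, mean time to repair 6 h.
print(round(weibull_reliability(24, 1.2, 150), 3))
print(round(maintenance_interval(0.9, 1.2, 150), 1))
print(round(availability(weibull_mtbf(1.2, 150), 6), 3))
```

    With `beta = 1` the Weibull reduces to the exponential distribution, which is a quick sanity check on the formulas.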

  17. Performance Analysis Tool for HPC and Big Data Applications on Scientific Clusters

    Energy Technology Data Exchange (ETDEWEB)

    Yoo, Wucherl [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Koo, Michelle [Univ. of California, Berkeley, CA (United States); Cao, Yu [California Inst. of Technology (CalTech), Pasadena, CA (United States); Sim, Alex [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Nugent, Peter [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Univ. of California, Berkeley, CA (United States); Wu, Kesheng [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

    2016-09-17

    Big data is prevalent in high-performance computing (HPC). Many HPC projects rely on complex workflows to analyze terabytes or petabytes of data. These workflows often require running over thousands of CPU cores and performing simultaneous data accesses, data movements, and computation. Analyzing performance is challenging when it involves terabytes or petabytes of workflow data or execution measurements from complex workflows running over a large number of nodes with multiple parallel task executions. To help identify performance bottlenecks or debug performance issues in large-scale scientific applications and scientific clusters, we have developed a performance analysis framework using state-of-the-art open-source big data processing tools. Our tool can ingest system logs and application performance measurements to extract key performance features, and apply advanced statistical tools and data mining methods to the performance data. It uses an efficient data processing engine to allow users to interactively analyze large amounts of different types of logs and measurements. To illustrate the functionality of the big data analysis framework, we conduct case studies on the workflows from an astronomy project known as the Palomar Transient Factory (PTF) and the job logs from the genome analysis scientific cluster. Our study processed many terabytes of system logs and application performance measurements collected on the HPC systems at NERSC. The implementation of our tool is generic enough to be used for analyzing the performance of other HPC systems and Big Data workflows.
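    As a small-scale illustration of turning extracted performance features into bottleneck candidates, the sketch below flags unusually slow tasks with a robust (median/MAD) z-score. The task durations, the `flag_slow_tasks` helper and the 3.5 threshold are assumptions for illustration; this is not the LBNL tool's method.

```python
import statistics


def flag_slow_tasks(durations, threshold=3.5):
    """durations: {task_id: seconds}. Returns task ids whose modified z-score exceeds threshold."""
    values = list(durations.values())
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread, so no meaningful outliers
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score under normality;
    # only the slow side is flagged, since fast tasks are not bottlenecks.
    return sorted(t for t, v in durations.items() if 0.6745 * (v - med) / mad > threshold)


durations = {"task-a": 10.0, "task-b": 11.0, "task-c": 9.0, "task-d": 10.0, "task-e": 60.0}
print(flag_slow_tasks(durations))  # ['task-e']
```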

  18. DGA Clustering and Analysis: Mastering Modern, Evolving Threats, DGALab

    Directory of Open Access Journals (Sweden)

    Alexander Chailytko

    2016-05-01

    Domain Generation Algorithms (DGAs) are a basic building block used in almost all modern malware. Malware researchers have attempted to tackle the DGA problem with various tools and techniques, with varying degrees of success. We present a comprehensive solution for populating a DGA feed using reversed DGAs, third-party feeds, and smart DGA extraction and clustering based on emulation of a large number of samples. Smart DGA extraction requires no reverse engineering and works regardless of the DGA type or initialization vector, while enabling cluster-based analysis. Our method also allows automatic analysis of an entire malware family, a specific campaign, etc. We present our system and demonstrate its capabilities on more than 20 malware families. This includes showing connections between different campaigns, as well as comparing results. Most importantly, we discuss how to use the outcome of the analysis to create smarter protections against similar malware.
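    To make the idea concrete: a DGA is a deterministic function of a seed (initialization vector) and the date, so any party that can run it, for example by emulating a sample, obtains the full list of rendezvous domains for a given day. The toy generator below is entirely hypothetical and does not reproduce any real malware family; it only shows that determinism and the shared structure (fixed length, alphabet, TLD) that clustering can exploit.

```python
import hashlib
from datetime import date


def toy_dga(seed, day, count=5, length=12, tld=".com"):
    """Hypothetical, illustrative DGA: hashes (seed, date, index) into letter-only domain names."""
    domains = []
    for i in range(count):
        digest = hashlib.sha256(f"{seed}|{day.isoformat()}|{i}".encode()).digest()
        # Map each of the first `length` bytes to a lowercase letter.
        label = "".join(chr(ord("a") + b % 26) for b in digest[:length])
        domains.append(label + tld)
    return domains


print(toy_dga("example-seed", date(2016, 5, 1), count=3))
```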

  19. Analysis of RXTE data on Clusters of Galaxies

    Science.gov (United States)

    Petrosian, Vahe

    2004-01-01

    This grant provided support for the reduction, analysis and interpretation of hard X-ray (HXR, for short) observations of the cluster of galaxies RXJ0658-5557, scheduled for the week of August 23, 2002 under the RXTE Cycle 7 program (PI Vahe Petrosian, Obs. ID 70165). The goal of the observation was to search for and characterize the shape of the HXR component beyond the well-established thermal soft X-ray (SXR) component. Such hard components have been detected in several nearby clusters. The planned observation of a relatively distant cluster would provide information on the characteristics of this radiation at a different epoch in the evolution of the universe and shed light on its origin. We (Petrosian, 2001) have argued that thermal bremsstrahlung, as proposed earlier, cannot be the mechanism for the production of the HXRs and that the most likely mechanism is Compton upscattering of the cosmic microwave radiation by relativistic electrons, which are known to be present in the clusters and to be responsible for the observed radio emission. Based on this picture, we estimated that this cluster, in spite of its relatively large distance, would have an HXR signal comparable to those of the nearby ones. The proposed RXTE observations were carried out and the data have been analyzed. We detect a hard X-ray tail in the spectrum of this cluster with a flux very nearly equal to our predicted value. This has strengthened the case for the Compton scattering model. We intend the data obtained via this observation to be part of a larger data set. We have identified other clusters of galaxies (in archival RXTE and other instrument data sets) with sufficiently high-quality data where we can search for and measure (or at least put meaningful limits on) the strength of the hard component. With these studies we expect to clarify the mechanism for acceleration of particles in the intercluster medium and provide guidance for future observations of this intriguing phenomenon by instrument
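    A back-of-the-envelope check on why Compton upscattering of the microwave background can produce hard X-rays: in the Thomson regime, an electron with Lorentz factor γ boosts a photon of energy ε to roughly (4/3)γ²ε on average. The sketch below applies this to the mean CMB photon energy (about 2.70 kT at T = 2.725 K). The Lorentz factors are illustrative values, not figures from this grant report.

```python
K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant, eV per kelvin
T_CMB_K = 2.725                 # present-day CMB temperature, kelvin


def ic_photon_energy_kev(gamma):
    """Typical energy (keV) of a CMB photon after inverse-Compton scattering off an electron.

    Uses the mean blackbody photon energy (about 2.701 kT) and the isotropic-average
    boost of (4/3) * gamma**2, valid in the Thomson regime (gamma * photon energy
    much smaller than the electron rest energy).
    """
    mean_photon_ev = 2.701 * K_B_EV_PER_K * T_CMB_K
    return (4.0 / 3.0) * gamma ** 2 * mean_photon_ev / 1e3


for gamma in (3e3, 1e4, 3e4):
    print(f"gamma = {gamma:.0e}: ~{ic_photon_energy_kev(gamma):.1f} keV")
```

    For γ of order 10^4 the result is roughly 85 keV, i.e. in the hard X-ray band that RXTE probes.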

  20. Mobility in Europe: Recent Trends from a Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Ioana Manafi

    2017-08-01

    During the past decade, Europe was confronted with major changes and events that created significant opportunities for mobility. The EU enlargement process, EU policies regarding youth, the economic crisis affecting national economies at different levels, political instability in some European countries, high rates of unemployment and the increasing number of refugees are only a few of the factors influencing net migration in Europe. Based on a set of socio-economic indicators for EU/EFTA countries and cluster analysis, the paper provides an overview of regional differences across European countries related to migration magnitude in the identified clusters. The obtained clusters agree with previous migration studies and appear stable over 2005-2013, with only a few exceptions. The analysis revealed three country clusters: EU/EFTA center-receiving countries, EU/EFTA periphery-sending countries and EU/EFTA outlier countries, the names suggesting not only the geographical position within Europe but also the trends in net migration flows over the years. Furthermore, the results provide evidence for a persistent movement from periphery to center countries, which is correlated with recent flows of mobility in Europe.
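    The abstract does not state which clustering algorithm was applied to the indicators. One common route for country-level data is to z-score each indicator (so that, say, GDP does not dominate unemployment) and then run agglomerative clustering. The sketch below assumes average linkage and uses made-up indicator rows for five anonymous countries; it is an illustration, not the paper's analysis.

```python
import statistics


def standardize(rows):
    """z-score each indicator column so variables on different scales weigh equally."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [tuple((v - m) / s for v, m, s in zip(row, means, sds)) for row in rows]


def euclid(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def average_linkage(rows, n_clusters):
    """Naive agglomerative clustering: merge the pair of clusters with the smallest mean pairwise distance."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = statistics.mean(euclid(rows[i], rows[j]) for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


# Hypothetical rows: (net migration rate, unemployment rate %, GDP per capita index).
rows = [(5.0, 5.0, 130.0), (4.5, 6.0, 125.0), (-3.0, 15.0, 60.0), (-2.5, 14.0, 65.0), (20.0, 4.0, 250.0)]
print(average_linkage(standardize(rows), n_clusters=3))
```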

  1. Full text clustering and relationship network analysis of biomedical publications.

    Directory of Open Access Journals (Sweden)

    Renchu Guan

    Rapid developments in the biomedical sciences have increased the demand for automatic clustering of biomedical publications. In contrast to current approaches to text clustering, which focus exclusively on the contents of abstracts, a novel method is proposed for clustering and analysis of complete biomedical article texts. To reduce dimensionality, the cosine coefficient is computed on a sub-space of only two vectors, instead of computing the Euclidean distance within the space of all vectors. Then a strategy and algorithm for Semi-supervised Affinity Propagation (SSAP) is introduced to improve analysis efficiency, using biomedical journal names as an evaluation background. Experimental results show that by avoiding high-dimensional sparse matrix computations, SSAP outperforms conventional k-means methods and improves upon the standard Affinity Propagation algorithm. By constructing a directed relationship network and distribution matrix for the clustering results, overlaps in scope and interests among BioMed publications can be easily identified, providing a valuable analytical tool for editors, authors and readers.
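    The cosine coefficient between two documents can be computed using only the terms the two documents contain, which is one way to read "a sub-space of only two vectors": no full corpus-wide vector space is materialized. The sketch below uses raw term frequencies and whitespace tokenization, both simplifying assumptions; the paper's weighting scheme is not given in the abstract.

```python
import math
from collections import Counter


def cosine_coefficient(text_a, text_b):
    """Cosine of the angle between the term-frequency vectors of two texts."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    # Only terms present in both documents contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


print(cosine_coefficient("gene expression cluster analysis", "cluster analysis of gene networks"))
```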

  2. The Productivity Analysis of Chennai Automotive Industry Cluster

    Science.gov (United States)

    Bhaskaran, E.

    2014-07-01

    Chennai, also called the Detroit of India, is India's second fastest growing auto market and exports auto components and vehicles to the US, Germany, Japan and Brazil. For inclusive growth and sustainable development, 250 auto component industries in the Ambattur, Thirumalisai and Thirumudivakkam Industrial Estates in Chennai have adopted the Cluster Development Approach, called the Automotive Component Cluster. The objective is to study the value chain, correlations and Data Envelopment Analysis (DEA) of 100 auto component industries in the three estates by determining their technical efficiency, peer weights, and input and output slacks. The methodology uses an output-oriented Banker-Charnes-Cooper (BCC) DEA model with net worth, fixed assets and employment as inputs and gross output as the output. The non-zero values represent the peer weights of the efficient clusters. Higher slacks reveal excess net worth, fixed assets and employment, and shortfalls in gross output. To conclude, the variables are highly correlated, and the inefficient industries should increase their gross output or decrease their fixed assets or employment. Moreover, for sustainable development, the cluster should strengthen infrastructure, technology, procurement, production and marketing interrelationships to decrease costs and increase productivity and efficiency so as to compete in the domestic and export markets.

  3. Sirenomelia in Argentina: Prevalence, geographic clusters and temporal trends analysis.

    Science.gov (United States)

    Groisman, Boris; Liascovich, Rosa; Gili, Juan Antonio; Barbero, Pablo; Bidondo, María Paz

    2016-07-01

    Sirenomelia is a severe malformation of the lower body characterized by a single medial lower limb and a variable combination of visceral abnormalities. Given that sirenomelia is a very rare birth defect, epidemiological studies are scarce. The aim of this study is to evaluate the prevalence, geographic clusters and time trends of sirenomelia in Argentina, using data from the National Network of Congenital Anomalies of Argentina (RENAC) from November 2009 until December 2014. This is a descriptive study using data from the RENAC, a hospital-based surveillance system for newborns affected with major morphological congenital anomalies. We calculated sirenomelia prevalence throughout the period, searched for geographical clusters, and evaluated time trends. The prevalence of confirmed cases of sirenomelia throughout the period was 2.35 per 100,000 births. Cluster analysis showed no statistically significant geographical aggregates. Time-trend analysis showed that the prevalence was higher in 2009 to 2010. The observed prevalence was higher than that observed in previous epidemiological studies in other geographic regions. We observed a likely real increase in the initial period of our study. We used strict diagnostic criteria, excluding cases that only had a clinical diagnosis of sirenomelia; therefore, the real prevalence could be even higher. This study did not show any geographic clusters. Because the etiology of sirenomelia has not yet been established, studies of the epidemiological features of this defect may help define its causes. Birth Defects Research (Part A) 106:604-611, 2016. © 2016 Wiley Periodicals, Inc.
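    A prevalence "per 100,000 births" is the case count divided by the number of births, scaled by 100,000. The sketch below shows that arithmetic with a crude normal approximation to the Poisson count for an interval. The case and birth counts are hypothetical (the abstract reports only the rate), and for very rare events an exact Poisson interval is preferable.

```python
import math


def prevalence_per_100k(cases, births):
    """Cases per 100,000 births."""
    return cases / births * 100_000


def approx_ci_per_100k(cases, births, z=1.96):
    """Crude normal approximation to a Poisson count; exact methods are preferable for rare events."""
    half_width = z * math.sqrt(cases)
    low = max(cases - half_width, 0.0)
    return low / births * 100_000, (cases + half_width) / births * 100_000


# Hypothetical: 10 confirmed cases among 400,000 births.
print(prevalence_per_100k(10, 400_000), approx_ci_per_100k(10, 400_000))
```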

  4. Transcriptional analysis of exopolysaccharides biosynthesis gene clusters in Lactobacillus plantarum.

    Science.gov (United States)

    Vastano, Valeria; Perrone, Filomena; Marasco, Rosangela; Sacco, Margherita; Muscariello, Lidia

    2016-04-01

    Exopolysaccharides (EPS) from lactic acid bacteria contribute to the specific rheology and texture of fermented milk products and also find applications in non-dairy foods and in therapeutics. Recently, four clusters of genes (cps) associated with surface polysaccharide production have been identified in Lactobacillus plantarum WCFS1, a probiotic and food-associated lactobacillus. These clusters are involved in cell surface architecture and probably in the release and/or exposure of immunomodulating bacterial molecules. Here we present a transcriptional analysis of these clusters. RT-PCR experiments revealed that the cps loci are organized in five operons. Moreover, by reverse transcription-qPCR analysis performed on L. plantarum WCFS1 (wild type) and WCFS1-2 (ΔccpA), we demonstrated that expression of three cps clusters is under the control of the global regulator CcpA. These results, together with the identification of putative CcpA target sequences (catabolite responsive element, CRE) in the regulatory region of four out of five transcriptional units, strongly suggest, for the first time, a role for the master regulator CcpA in EPS gene transcription among lactobacilli.
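    Wild-type versus mutant comparisons by RT-qPCR are often reported as a fold change using the 2^-ΔΔCt (Livak) method. The abstract does not say how expression was normalized, so the sketch below is an assumption: it uses one reference gene and assumes near-100% amplification efficiency. The Ct values are hypothetical.

```python
def relative_expression(ct_target_mut, ct_ref_mut, ct_target_wt, ct_ref_wt):
    """Livak 2^-ddCt fold change of a target transcript in the mutant relative to wild type.

    Assumes roughly 100% amplification efficiency for both target and reference genes.
    """
    delta_mut = ct_target_mut - ct_ref_mut  # normalize target to reference in the mutant
    delta_wt = ct_target_wt - ct_ref_wt     # same normalization in the wild type
    return 2 ** -(delta_mut - delta_wt)


# Hypothetical Ct values for one cps gene: a fold change > 1 means higher expression in the mutant.
print(relative_expression(ct_target_mut=20.0, ct_ref_mut=15.0, ct_target_wt=22.0, ct_ref_wt=15.0))
```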

  5. Full text clustering and relationship network analysis of biomedical publications.

    Science.gov (United States)

    Guan, Renchu; Yang, Chen; Marchese, Maurizio; Liang, Yanchun; Shi, Xiaohu

    2014-01-01

    Rapid developments in the biomedical sciences have increased the demand for automatic clustering of biomedical publications. In contrast to current approaches to text clustering, which focus exclusively on the contents of abstracts, a novel method is proposed for clustering and analysis of complete biomedical article texts. To reduce dimensionality, the cosine coefficient is computed on a sub-space of only two vectors, instead of computing the Euclidean distance within the space of all vectors. Then a strategy and algorithm for Semi-supervised Affinity Propagation (SSAP) is introduced to improve analysis efficiency, using biomedical journal names as an evaluation background. Experimental results show that by avoiding high-dimensional sparse matrix computations, SSAP outperforms conventional k-means methods and improves upon the standard Affinity Propagation algorithm. By constructing a directed relationship network and distribution matrix for the clustering results, overlaps in scope and interests among BioMed publications can be easily identified, providing a valuable analytical tool for editors, authors and readers.

  6. Latent cluster analysis of ALS phenotypes identifies prognostically differing groups.

    Directory of Open Access Journals (Sweden)

    Jeban Ganesalingam

    2009-09-01

    Amyotrophic lateral sclerosis (ALS) is a degenerative disease predominantly affecting motor neurons and manifesting as several different phenotypes. Whether these phenotypes correspond to different underlying disease processes is unknown. We used latent cluster analysis to identify groupings of clinical variables in an objective and unbiased way to improve phenotyping for clinical and research purposes. Latent class cluster analysis was applied to a large database consisting of 1467 records of people with ALS, using discrete variables that can be readily determined at the first clinic appointment. The model was tested for clinical relevance by survival analysis of the phenotypic groupings using the Kaplan-Meier method. The best model generated five distinct phenotypic classes that strongly predicted survival (p<0.0001). Eight variables were used for the latent class analysis, but a good estimate of the classification could be obtained using just two variables: site of first symptoms (bulbar or limb) and time from symptom onset to diagnosis (p<0.00001). The five phenotypic classes identified using latent cluster analysis can predict prognosis. They could be used to stratify patients recruited into clinical trials and to generate more homogeneous disease groups for genetic, proteomic and risk factor research.
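    The Kaplan-Meier method mentioned above estimates survival as a product of conditional survival probabilities at each observed death time, with censored patients leaving the risk set without causing a drop. A minimal sketch for a single phenotypic group follows; the follow-up times (in months) and event flags are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each death time.

    events[i] is 1 if death was observed at times[i], 0 if the patient was censored then.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Handle all subjects sharing this time together (tied times).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censorings both leave the risk set
    return curve


print(kaplan_meier([6, 7, 10, 15, 19, 25], [1, 0, 1, 1, 0, 1]))
```

    Comparing such curves across the five classes (for example with a log-rank test) is how prognostic differences between groups are usually assessed.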

  7. The Quantitative Analysis of Chennai Automotive Industry Cluster

    Science.gov (United States)

    Bhaskaran, Ethirajan

    2016-07-01

    Chennai is also called the Detroit of India because its automotive industry produces over 40% of India's vehicles and components. During 2001-2002, the Automotive Component Industries (ACI) in the Ambattur, Thirumalizai and Thirumudivakkam Industrial Estates, Chennai, faced problems with infrastructure, technology, procurement, production and marketing. The objective is to study the quantitative performance of the Chennai Automotive Industry Cluster before (2001-2002) and after (2008-2009) the Cluster Development Approach (CDA). The methodology consists of collecting primary data from 100 ACI using a quantitative questionnaire and analyzing it with Correlation Analysis (CA), Regression Analysis (RA), the Friedman Test (FMT) and the Kruskal-Wallis Test (KWT). The CA computed for the different sets of variables reveals a high degree of relationship between the variables studied. The RA models establish a strong relationship between the dependent variable and a host of independent variables. The models proposed here approximate these relationships closely. The KWT shows no significant difference between the three location clusters with respect to net profit, production cost, marketing costs, procurement costs and gross output, which suggests that each location contributed uniformly to the development of the automobile component cluster. The FMT shows no significant difference between industrial units with respect to production, infrastructure, technology and marketing costs and net profit. To conclude, the automotive industries have fully utilized the physical infrastructure and centralised facilities by adopting CDA and now export their products to North America, South America, Europe, Australia, Africa and Asia. Value chain analysis models have been implemented in all the cluster units. This CDA model can be implemented in industries of underdeveloped and developing countries for cost reduction and productivity
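    The Kruskal-Wallis test used above compares several groups (here, three estates) by ranking the pooled observations and checking whether the mean ranks differ. A minimal sketch of the H statistic with average ranks for ties (no tie correction) follows. The net-profit figures are hypothetical. H is compared with a chi-square distribution with k - 1 degrees of freedom (about 5.99 at α = 0.05 for three groups).

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (no tie correction), using average ranks for tied values."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        i = j
    total = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


# Hypothetical net profit (in lakh rupees) for units in three estates.
estate_a = [12.0, 15.5, 11.2, 14.8]
estate_b = [13.1, 12.7, 16.0, 11.9]
estate_c = [14.2, 12.4, 15.1, 13.6]
print(round(kruskal_wallis_h(estate_a, estate_b, estate_c), 3))
```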

  8. Analysis of Mining-induced Valley Closure Movements

    Science.gov (United States)

    Zhang, C.; Mitra, R.; Oh, J.; Hebblewhite, B.

    2016-05-01

    Valley closure movements have been observed for decades in Australia and overseas when underground mining has occurred beneath or close to valleys and other forms of irregular topography. Valley closure is defined as the inward movement of the valley sides towards the valley centreline. Due to the complexity of the local geology and the interplay between several geological, topographical and mining factors, the underlying mechanisms that cause this behaviour are not completely understood. A comprehensive programme of numerical modelling investigations has been carried out to further evaluate and quantify the influence of a number of these mining and geological factors and their inter-relationships. The factors investigated in this paper include longwall positional factors, horizontal stress, panel width, depth of cover and geological structures around the valley. It was found that mining in a series passing beneath the valley dramatically increases valley closure, and that mining parallel to the valley induces much more closure than other mining orientations. The redistribution of horizontal stress and the influence of mining activity have also been recognised as important factors promoting valley closure, while the effect of geological structure around the valley is found to be relatively small. This paper provides further insight into both the valley closure mechanisms and how these mechanisms should be considered in valley closure prediction models.

  9. Analysis and Optimization of Entry Stability in Underground Longwall Mining

    Directory of Open Access Journals (Sweden)

    Yubing Gao

    2017-11-01

    For sustainable utilization of limited coal resources, it is important to increase the coal recovery rate and reduce mine accidents, especially those occurring in the entry (gateroad). Entry stability is vital for ventilation, transportation and other essential services in underground coal mining. In the present study, a finite difference model was built to investigate stress evolution around the entry, and true triaxial tests were carried out in the laboratory to explore entry wall stability under different mining conditions. The modeling and experimental results indicated that a wide coal pillar favored entry stability, but oversized pillars caused a serious waste of coal resources. As the width of the entry wall decreased, the vertical stresses induced by the two adjacent mining panels became coupled and increased on the entry wall, which inevitably weakened the stability of the entry. Therefore, mining with coal pillars always involves a tradeoff between economy and safety. To address this problem, an innovative non-pillar mining technique that optimizes the structures surrounding the entry was proposed. Numerical simulation showed that the deformation of the entry roof decreased by approximately 66% with the new approach compared with the conventional mining method. Field monitoring indicated that the stress condition of the entry was significantly improved and the average roof pressure decreased by approximately 60.33% after adopting the new technique. This work provides an economical and effective approach to achieve sustainable exploitation of underground coal resources.

  10. Applications of Cluster Analysis to the Creation of Perfectionism Profiles: A Comparison of two Clustering Approaches

    Directory of Open Access Journals (Sweden)

    Jocelyn H Bolin

    2014-04-01

    Although traditional clustering methods (e.g., K-means) have been shown to be useful in the social sciences, it is often difficult for such methods to handle situations where clusters in the population overlap or are ambiguous. Fuzzy clustering, a method already recognized in many disciplines, provides a more flexible alternative to these traditional clustering methods. Fuzzy clustering differs from other traditional clustering methods in that it allows for a case to belong to multiple clusters simultaneously. Unfortunately, fuzzy clustering techniques remain relatively unused in the social and behavioral sciences. The purpose of this paper is to introduce fuzzy clustering to these audiences who are currently relatively unfamiliar with the technique. In order to demonstrate the advantages associated with this method, cluster solutions of a common perfectionism measure were created using both fuzzy clustering and K-means clustering, and the results compared. Results of these analyses reveal that different cluster solutions are found by the two methods, and the similarity between the different clustering solutions depends on the amount of cluster overlap allowed for in fuzzy clustering.

  11. Applications of cluster analysis to the creation of perfectionism profiles: a comparison of two clustering approaches.

    Science.gov (United States)

    Bolin, Jocelyn H; Edwards, Julianne M; Finch, W Holmes; Cassady, Jerrell C

    2014-01-01

    Although traditional clustering methods (e.g., K-means) have been shown to be useful in the social sciences, it is often difficult for such methods to handle situations where clusters in the population overlap or are ambiguous. Fuzzy clustering, a method already recognized in many disciplines, provides a more flexible alternative to these traditional clustering methods. Fuzzy clustering differs from other traditional clustering methods in that it allows for a case to belong to multiple clusters simultaneously. Unfortunately, fuzzy clustering techniques remain relatively unused in the social and behavioral sciences. The purpose of this paper is to introduce fuzzy clustering to these audiences who are currently relatively unfamiliar with the technique. In order to demonstrate the advantages associated with this method, cluster solutions of a common perfectionism measure were created using both fuzzy clustering and K-means clustering, and the results compared. Results of these analyses reveal that different cluster solutions are found by the two methods, and the similarity between the different clustering solutions depends on the amount of cluster overlap allowed for in fuzzy clustering.
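    The key difference described above is that fuzzy clustering gives each case a membership in every cluster instead of a single label. The sketch below implements standard fuzzy c-means (fuzzifier m = 2) with deterministic farthest-first seeding (an illustrative choice). It runs on hypothetical one-dimensional scores in which one case sits exactly between two groups. K-means would force that case into one cluster; fuzzy c-means splits it roughly 50/50. The abstract does not state which fuzzy algorithm or settings the authors used.

```python
def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-9):
    """Fuzzy c-means with fuzzifier m > 1; returns (memberships, centers)."""
    centers = [points[0]]
    while len(centers) < c:  # farthest-first seeding of the initial centers
        centers.append(max(points, key=lambda p: min(dist(p, q) for q in centers)))
    u = []
    for _ in range(iters):
        # Membership step: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)); each row sums to 1.
        u = []
        for p in points:
            d = [max(dist(p, q), 1e-12) for q in centers]  # guard against division by zero
            u.append([1.0 / sum((d[j] / d[k]) ** (2 / (m - 1)) for k in range(c)) for j in range(c)])
        # Center step: means weighted by membership ** m.
        new_centers = []
        for j in range(c):
            w = [row[j] ** m for row in u]
            new_centers.append(tuple(
                sum(wi * p[dim] for wi, p in zip(w, points)) / sum(w) for dim in range(len(points[0]))
            ))
        moved = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if moved < tol:
            break
    return u, centers


# Hypothetical one-dimensional scores; 2.6 lies midway between the two groups.
points = [(0.0,), (0.1,), (0.2,), (2.6,), (5.0,), (5.1,), (5.2,)]
u, centers = fuzzy_c_means(points, c=2)
for p, row in zip(points, u):
    print(p[0], [round(x, 2) for x in row])
```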

  12. Citation-related reliability analysis for a pilot sample of underground coal mines

    Energy Technology Data Exchange (ETDEWEB)

    Kinilakodi, H.; Grayson, R.L. [Penn State University, University Park, PA (United States)

    2011-05-15

    The scrutiny of underground coal mine safety was heightened because of the disasters that occurred in 2006-2007, and more recently in 2010. In the aftermath of the 2006 incidents, the U.S. Congress passed the Mine Improvement and New Emergency Response Act of 2006 (MINER Act), which strengthened the existing regulations and mandated new laws to address various issues related to emergency preparedness and response, escape from an emergency situation, and protection of miners. The National Mining Association-sponsored Mine Safety Technology and Training Commission study highlighted the role of risk management in identifying and controlling major hazards, which are elements that could come together and cause a mine disaster. In 2007 MSHA revised its approach to the 'Pattern of Violations' (POV) process in order to target unsafe mines and then force them to remediate conditions in their mines. The POV approach has certain limitations that make it difficult to enforce. One very understandable way to focus on removing threats from major-hazard conditions is to use citation-related reliability analysis. The citation reliability approach, which focuses on the probability of not getting a citation on a given inspector day, is considered an analogue to the maintenance reliability approach, which many mine operators understand and use. In this study, the citation reliability approach was applied to a stratified random sample of 31 underground coal mines to examine its potential for broader application. The results clearly show the best-performing and worst-performing mines for compliance with mine safety standards, and they highlight differences among different mine sizes.
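    The central quantity here, the probability of not receiving a citation on a given inspector day, can be estimated directly from inspection records or, by analogy with reliability under a constant failure rate, as exp(-rate). The sketch below shows both estimates. The daily citation counts are hypothetical, and the paper's exact estimator is not stated in the abstract.

```python
import math


def empirical_citation_reliability(citations_per_day):
    """Share of inspector days on which no citation was issued."""
    clean_days = sum(1 for n in citations_per_day if n == 0)
    return clean_days / len(citations_per_day)


def poisson_citation_reliability(citations_per_day):
    """Same quantity under an assumed constant citation rate: P(0 citations in one day) = exp(-rate)."""
    rate = sum(citations_per_day) / len(citations_per_day)
    return math.exp(-rate)


# Hypothetical citations issued on five inspector days at one mine.
days = [0, 0, 1, 0, 3]
print(empirical_citation_reliability(days), round(poisson_citation_reliability(days), 3))
```

    A large gap between the two estimates hints that citations are clustered on a few days rather than issued at a steady rate.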

  13. Analysis of radon reduction and ventilation systems in uranium mines in China.

    Science.gov (United States)

    Hu, Peng-hua; Li, Xian-jie

    2012-09-01

    Mine ventilation is the most important way of reducing radon in uranium mines. At present, the radon and radon progeny levels in Chinese uranium mines where the cut and fill stoping method is used are 3-5 times higher than those in foreign uranium mines, as there is not much difference in the investments for ventilation protection between Chinese uranium mines and international advanced uranium mines with compaction methodology. Through an analysis of radon reduction and ventilation systems in Chinese uranium mines and a comparison of the advantages and disadvantages of various ventilation systems in terms of radon control, the authors try to explain the higher radon and radon progeny levels in Chinese uranium mines and identify problems in three areas, namely the theory of radon control and ventilation systems, radon reduction ventilation measures and ventilation management. To address these problems, the paper puts forward proposals such as strengthening scrutiny, verifying and monitoring the practical situation, making clear ventilation plans, strictly following the mining sequence, promoting training of ventilation staff, enhancing ventilation system management, developing radon reduction ventilation technology, purchasing ventilation equipment as soon as possible, and so on.

  14. Analysis of radon reduction and ventilation systems in uranium mines in China

    International Nuclear Information System (INIS)

    Hu Penghua; Li Xianjie

    2012-01-01

    Mine ventilation is the most important means of reducing radon in uranium mines. At present, radon and radon progeny levels in Chinese uranium mines using the cut-and-fill stoping method are 3–5 times higher than those in foreign uranium mines, even though there is little difference in ventilation-protection investment between Chinese uranium mines and advanced international uranium mines using compaction methods. By analysing radon reduction and ventilation systems in Chinese uranium mines and comparing the advantages and disadvantages of various ventilation systems for radon control, the authors try to explain the higher radon and radon progeny levels in Chinese uranium mines, and they identify problems in three areas: the theory of radon control and ventilation systems, radon reduction ventilation measures, and ventilation management. To address these problems, the paper makes proposals such as strengthening scrutiny, verifying and monitoring actual conditions, drawing up clear ventilation plans, strictly following the mining sequence, training ventilation staff, improving ventilation system management, developing radon reduction ventilation technology, and purchasing ventilation equipment as soon as possible.

  15. Web Mining of Hotel Customer Survey Data

    Directory of Open Access Journals (Sweden)

    Richard S. Segall

    2008-12-01

    This paper provides an extensive literature review and list of references on the background of web mining as applied specifically to hotel customer survey data. The research applies web mining techniques to the actual text of written comments by hotel customers using Megaputer PolyAnalyst®. The web mining functions used include clustering, link analysis, keyword and phrase extraction, taxonomy, and dimension matrices. The paper provides screenshots of the web mining applications in Megaputer PolyAnalyst®. Conclusions and future directions of the research are presented.

  16. Statistical and Machine-Learning Data Mining Techniques for Better Predictive Modeling and Analysis of Big Data

    CERN Document Server

    Ratner, Bruce

    2011-01-01

    The second edition of a bestseller, Statistical and Machine-Learning Data Mining: Techniques for Better Predictive Modeling and Analysis of Big Data is still the only book, to date, to distinguish between statistical data mining and machine-learning data mining. The first edition, titled Statistical Modeling and Analysis for Database Marketing: Effective Techniques for Mining Big Data, contained 17 chapters of innovative and practical statistical data mining techniques. In this second edition, renamed to reflect the increased coverage of machine-learning data mining techniques, the author has

  17. Diagnostic analysis of electrodialysis in mine tailing materials

    DEFF Research Database (Denmark)

    Hansen, Henrik K.; Ribeiro, Alexandra B.; Mateus, Eduardo

    2007-01-01

    Removal of heavy metals from mine tailings and soil contaminated by copper mining activities was studied under batch electrodialytic conditions. Two types of mine tailings were treated: (i) freshly produced tailings coming directly from the flotation process, and (ii) tailings deposited...... in a tailings pond, for approximately 20 years. The main contaminant was copper, found in concentrations of around 800-1800 ppm. The fractionation of copper and other characteristics differ between the two tailings, indicating natural oxidation reactions in the old deposited ones. Electrodialytical...

  18. Statistical analysis of the spatial distribution of galaxies and clusters

    International Nuclear Information System (INIS)

    Cappi, Alberto

    1993-01-01

    This thesis deals with the analysis of the distribution of galaxies and clusters, describing some observational problems and statistical results. The first chapter gives a theoretical introduction describing the framework of structure formation, tracing the history of the Universe from the Planck time, $t_P = 10^{-43}$ s, and a temperature corresponding to $10^{19}$ GeV, to the present epoch. The most common statistical tools and models of the galaxy distribution, with their advantages and limitations, are described in chapter two. A study of the main observed properties of galaxy clustering, together with a detailed statistical analysis of the effects of selecting galaxies by apparent magnitude or diameter, is reported in chapter three. Chapter four outlines some properties of groups of galaxies, explaining the reasons for discrepant results on group distributions. Chapter five studies the distribution of galaxy clusters with different statistical tools, such as correlations, percolation, the void probability function and counts in cells, and finds the same scale-invariant behaviour as for galaxies. Chapter six describes our finding that rich galaxy clusters also belong to the fundamental plane of elliptical galaxies, and discusses its possible implications. Finally, chapter seven reviews the possibilities offered by multi-slit and multi-fibre spectrographs, and I present some observational work on nearby and distant galaxy clusters. In particular, I show the opportunities offered by ongoing galaxy surveys coupled with multi-object fibre spectrographs, focusing on the ESO Key Programme A galaxy redshift survey in the south galactic pole region, on which I collaborate, and on MEFOS, a multi-fibre instrument with automatic positioning. Published papers related to the work described in this thesis are reported in the last appendix. (author) [fr]
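    Two of the tools named in chapter five, counts in cells and the void probability function (VPF), can be sketched in a few lines. The example below is a simplified two-dimensional toy, assuming a uniform random (unclustered) point set in a unit square: it counts points in square cells and estimates the VPF as the fraction of empty cells, P0, which for an unclustered field should be close to exp(-n V), where n is the point density and V the cell area. Real galaxy catalogues are three-dimensional and need survey-geometry corrections that are omitted here.

```python
import numpy as np


def counts_in_cells(points, cell_size, box=1.0):
    """Number of points in each square cell of a regular grid over [0, box)^2.
    Assumes cell_size divides box evenly."""
    n_side = int(round(box / cell_size))
    idx = np.clip(np.floor(points / cell_size).astype(int), 0, n_side - 1)
    counts = np.zeros((n_side, n_side), dtype=int)
    np.add.at(counts, (idx[:, 0], idx[:, 1]), 1)  # unbuffered add handles repeated cells
    return counts.ravel()


def void_probability(points, cell_size, box=1.0):
    """VPF estimate P0: fraction of cells that contain no points."""
    return float(np.mean(counts_in_cells(points, cell_size, box) == 0))


rng = np.random.default_rng(0)
n_points = 2000
pts = rng.uniform(0.0, 1.0, size=(n_points, 2))  # unclustered toy "catalogue"
for cell in (0.01, 0.02, 0.04):
    p0 = void_probability(pts, cell)
    expected = np.exp(-n_points * cell ** 2)  # Poisson P0 = exp(-n V), box area 1
    print(f"cell = {cell}: P0 = {p0:.3f}, Poisson expectation = {expected:.3f}")
```

    A clustered catalogue would show P0 systematically above the Poisson expectation at a given cell size, because points crowd together and leave more cells empty.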

  19. Analysis of the situation of Spanish metal mining

    International Nuclear Information System (INIS)

    Espi Rodriguez, J. A.; Vazquez Guzman, F.; Leon Altamirano, C.; Perez Macias, D.

    2015-01-01

    This article is a summary of the original document drawn up by the Task Force on Mineral Resources and Reserves of the National Association of Mining Engineers, which is freely available in digital format. (Author)

  20. Analysis of mining-medicinal waters regulation. Possible treatments

    Directory of Open Access Journals (Sweden)

    María del Mar Corral Lledó

    2006-12-01

    Mineral waters have been part of our lives for many centuries, since they are believed to be beneficial and to have therapeutic properties for human health. This makes it essential to analyse the current legislation on the subject (Royal Decree-Law 743/1928 of 25 April, which approves the statute on the exploitation of springs of mineral and medicinal water; the Mining Law of 1973 and its regulations) in order to determine the treatments that may be applied to these waters, mainly when Legionella appears or as a preventive measure against it. The results of this analysis show that there is no limitation or prohibition on the treatments to which these waters may be subjected, as long as their physico-chemical characteristics remain unaltered. Contamination by the Legionella bacterium is currently an issue of concern, which is why the Public Health Authority has established guidelines for spas to avoid or fight a possible outbreak.

  1. Project X: competitive intelligence data mining and analysis

    Science.gov (United States)

    Gilmore, John F.; Pagels, Michael A.; Palk, Justin

    2001-03-01

    Competitive Intelligence (CI) is a systematic and ethical program for gathering and analyzing information about your competitors' activities and general business trends to further your own company's goals. CI allows companies to gather extensive information on their competitors and to analyze what the competition is doing in order to maintain or gain a competitive edge. In commercial business this potentially translates into millions of dollars in annual savings or losses. The Internet provides an overwhelming portal of information for CI analysis. The problem is how a company can automate the translation of voluminous information into valuable and actionable knowledge. This paper describes Project X, an agent-based data mining system specifically developed for extracting and analyzing competitive information from the Internet. Project X gathers CI information from a variety of sources, including online newspapers, corporate websites, industry sector reporting sites, speech archiving sites, video newscasts, stock news sites, weather sites, and rumor sites. It uses industry-specific (e.g., pharmaceutical, financial, aerospace) commercial sector ontologies to form the knowledge filtering and discovery structures and content required to filter and identify valuable competitive knowledge. Project X is described in detail, and an example competitive intelligence case is shown demonstrating the system's performance and utility for business intelligence.

  2. Utilization of Selected Data Mining Methods for Communication Network Analysis

    Directory of Open Access Journals (Sweden)

    V. Ondryhal

    2011-06-01

    The aim of the project was to analyze the behavior of military communication networks using real data collected continuously since 2005. Given the nature and amount of the data, data mining methods were selected for the analyses and experiments. The quality of real data is often insufficient for immediate analysis. The article presents the data cleaning operations carried out to improve the input data sample and obtain reliable models. Gradually, using suitably chosen software, network models were developed to verify generally valid patterns of network behavior as a bulk service. Furthermore, unlike commercially available communication network simulators, the models allowed us to capture nonstandard network behavior under increased load, verify the correct sizing of the network for the increased load, and thus test its reliability. Finally, based on previous experience, the models enabled us to predict emergency situations with reasonable accuracy.

  3. Pyrosequencing Based Microbial Community Analysis of Stabilized Mine Soils

    Science.gov (United States)

    Park, J. E.; Lee, B. T.; Son, A.

    2015-12-01

    Heavy metals leached from exhausted mines have been causing severe environmental problems in nearby soils and groundwater. In Korea, environmental mitigation was performed by stabilizing the heavy metals with calcite and steel slag. Since soil stabilization only temporarily immobilizes the contaminants in the soil matrix, the risk of heavy metals re-leaching still exists. Follow-up management of stabilized soils, and corresponding evaluation methods, are therefore required to prevent subsequent contamination from the stabilized soils. In this study, microbial community analysis using pyrosequencing was performed to assess the potential leaching of the stabilized soils. Rarefaction curves and the Chao1 and Shannon indices showed that the stabilized soil had lower richness and diversity than the non-contaminated negative control. At the phylum level, most phyla decreased as the degree of contamination increased, with the sole exception of Proteobacteria, which increased. Among the Proteobacteria, Gammaproteobacteria increased with heavy metal contamination. At the species level, Methylobacter tundripaludum (Gammaproteobacteria) accounted for the highest relative proportion of the microbial community, indicating that methanotrophs may play an important role in either solubilizing or immobilizing heavy metals in stabilized soils.
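    The richness and diversity measures mentioned in this abstract have standard closed forms: the Shannon index H = -Σ p_i ln p_i, the Chao1 estimator S_obs + F1²/(2 F2) (F1 singletons, F2 doubletons; the bias-corrected form S_obs + F1(F1 - 1)/2 is used when F2 = 0), and the rarefied richness E[S_n] = Σ_i [1 - C(N - N_i, n)/C(N, n)]. The sketch below computes them from lists of OTU read counts; the counts are hypothetical and the code is not the study's pipeline.

```python
import math


def shannon(counts):
    """Shannon index H = -sum p_i ln p_i over nonzero OTU counts."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def chao1(counts):
    """Chao1 richness: S_obs + F1^2 / (2 F2), bias-corrected form if F2 = 0."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2


def rarefy(counts, n):
    """Expected number of OTUs in a random subsample of n reads (no replacement)."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must be between 0 and the total read count")
    denom = math.comb(total, n)
    return sum(1 - math.comb(total - c, n) / denom for c in counts if c > 0)


# Hypothetical OTU read counts for two samples
control = [30, 22, 15, 9, 6, 4, 3, 2, 2, 1, 1, 1]
stabilized = [70, 12, 5, 3, 2, 1, 1]
for name, counts in [("control", control), ("stabilized", stabilized)]:
    print(name, round(shannon(counts), 3), round(chao1(counts), 1),
          [round(rarefy(counts, n), 1) for n in (10, 50, 90)])
```

    Evaluating `rarefy` over a range of subsample sizes traces the rarefaction curve, which lets samples with different sequencing depths be compared at equal effort.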

  4. Sensory over responsivity and obsessive compulsive symptoms: A cluster analysis.

    Science.gov (United States)

    Ben-Sasson, Ayelet; Podoly, Tamar Yonit

    2017-02-01

    Several studies have examined the sensory component in Obsessive Compulsive Disorder (OCD) and described an OCD subtype with a unique profile, of which Sensory Phenomena (SP) are a significant component. SP has some commonalities with Sensory Over Responsivity (SOR) and might partly characterize this subtype. Although some studies have examined SOR and its relation to Obsessive Compulsive Symptoms (OCS), the literature lacks sufficient data on this interplay. Our first goal was to further examine the correlations between OCS and SOR, and to explore the correlations between SOR modalities (e.g. smell, touch) and OCS subscales (e.g. washing, ordering). Our second goal was to apply cluster analysis to the SOR and OCS dimensions in adults, that is, to classify the sample using the sensory scores to determine whether a sensory OCD subtype can be specified. Our third goal was to explore the psychometric features of a new sensory questionnaire, the Sensory Perception Quotient (SPQ). A sample of non-clinical adults (n=350) was recruited via e-mail, social media and social networks. Participants completed questionnaires measuring SOR, OCS, and anxiety. SOR and OCI-F scores were moderately and significantly correlated (n=274); significant correlations were found between all SOR modalities and OCS subscales, with no single modality correlating especially strongly with any one OCS subscale. Cluster analysis revealed four distinct clusters: (1) no OC or SOR symptoms (NONE; n=100), (2) high OC and SOR symptoms (BOTH; n=28), (3) moderate OC symptoms (OCS; n=63), (4) moderate SOR symptoms (SOR; n=83). The BOTH cluster had significantly higher anxiety levels than the other clusters and shared OC subscale scores with the OCS cluster. The BOTH cluster also reported higher SOR scores across the tactile, visual, taste and olfactory modalities. The SPQ was found to be reliable and suitable for detecting SOR, and the sample's SPQ scores were normally distributed (n=350).
SOR is a
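    The abstract does not say which clustering algorithm produced the four groups. As one common option, the sketch below runs k-means (Lloyd's algorithm with k-means++ seeding and several restarts) on standardized two-dimensional scores, here a hypothetical OCS score and SOR score per participant generated synthetically; it is illustrative only and not the authors' analysis.

```python
import numpy as np


def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: pick new centres with probability proportional to D(x)^2."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)


def kmeans(X, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's algorithm, keeping the restart with the lowest within-cluster sum of squares."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        centers = kmeans_pp_init(X, k, rng)
        for _ in range(max_iter):
            labels = ((X[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        # final assignment consistent with the final centres
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        inertia = ((X - centers[labels]) ** 2).sum()
        if best is None or inertia < best[0]:
            best = (inertia, labels, centers)
    return best[1], best[2]


# Synthetic (OCS, SOR) scores: four hypothetical groups of 50 participants each
rng = np.random.default_rng(1)
means = np.array([[-1.0, -1.0], [1.0, 1.0], [1.0, -1.0], [-1.0, 1.0]])
raw = np.vstack([m + 0.2 * rng.standard_normal((50, 2)) for m in means])
X = (raw - raw.mean(axis=0)) / raw.std(axis=0)  # z-score each dimension
labels, centers = kmeans(X, k=4)
print(np.bincount(labels))
```

    Standardizing first keeps one questionnaire's scale from dominating the Euclidean distances; in practice the number of clusters would also be checked with a criterion such as the silhouette or the elbow of the inertia curve.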

  5. Analysis of plasmaspheric plumes: CLUSTER and IMAGE observations

    Directory of Open Access Journals (Sweden)

    F. Darrouzet

    2006-07-01

    Plasmaspheric plumes have been routinely observed by CLUSTER and IMAGE. The CLUSTER mission provides high-time-resolution four-point measurements of the plasmasphere near perigee. Total electron density profiles have been derived from the electron plasma frequency identified by the WHISPER sounder, supplemented between soundings by relative variations of the spacecraft potential measured by the electric field instrument EFW; ion velocity is also measured onboard these satellites. The EUV imager onboard the IMAGE spacecraft provides global images of the plasmasphere with a spatial resolution of 0.1 R_E every 10 min; such images, acquired near apogee from high above the pole, show the geometry of plasmaspheric plumes, their evolution and their motion. We present coordinated observations of three plume events and compare CLUSTER in-situ data with global images of the plasmasphere obtained by IMAGE. In particular, we study the geometry and orientation of plasmaspheric plumes using four-point analysis methods. We compare several aspects of plume motion as determined by different methods: (i) inner and outer plume boundary velocity calculated from time delays of this boundary as observed by the wave experiment WHISPER on the four spacecraft, (ii) drift velocity measured by the electron drift instrument EDI onboard CLUSTER, and (iii) global velocity determined from successive EUV images. These different techniques consistently indicate that plasmaspheric plumes rotate around the Earth, with their foot fully co-rotating but their tip rotating more slowly and moving farther out.
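    A standard way to turn boundary time delays into a velocity, as in method (i), is four-spacecraft timing: assuming a planar boundary moving at constant speed V along its unit normal n, each crossing satisfies n·(r_α - r_0) = V (t_α - t_0), a 3×3 linear system for the slowness vector m = n/V. The sketch below solves it with NumPy; the spacecraft positions and crossing times are synthetic, and real analyses also treat timing errors and non-planar boundaries, which this sketch ignores.

```python
import numpy as np


def timing_analysis(positions, times):
    """Boundary normal and speed from four spacecraft crossings,
    assuming a planar boundary moving at constant velocity."""
    r = np.asarray(positions, dtype=float)  # shape (4, 3), e.g. km
    t = np.asarray(times, dtype=float)      # shape (4,), e.g. s
    D = r[1:] - r[0]                        # separations from reference spacecraft
    dt = t[1:] - t[0]                       # crossing delays
    m = np.linalg.solve(D, dt)              # slowness m = n / V (fails if tetrahedron is flat)
    speed = 1.0 / np.linalg.norm(m)
    return m * speed, speed                 # unit normal n, speed V


# Synthetic test: boundary with normal (1, 2, 2)/3 moving at 5 km/s
true_n = np.array([1.0, 2.0, 2.0]) / 3.0
true_v = 5.0
pos = np.array([[0, 0, 0], [100, 0, 0], [0, 100, 0], [0, 0, 100]], dtype=float)
tim = (pos - pos[0]) @ true_n / true_v  # crossing times relative to spacecraft 1
n_est, v_est = timing_analysis(pos, tim)
print(n_est, v_est)
```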

  6. Herd Clustering: A synergistic data clustering approach using collective intelligence

    KAUST Repository

    Wong, Kachun

    2014-10-01

    Traditional data mining methods emphasize analytical abilities to decipher data, assuming that data are static during the mining process. We challenge this assumption, arguing that the analysis can be improved by vitalizing the data. In this paper, this principle is used to develop a new clustering algorithm. Inspired by herd behavior, the clustering method is a synergistic approach using collective intelligence, called Herd Clustering (HC). Its novelty lies in the first stage, where data instances are represented by moving particles. Particles attract each other locally and form clusters by themselves, as shown in the reported case studies. To demonstrate its effectiveness, HC is compared with other state-of-the-art clustering methods on more than thirty datasets using four performance metrics. An application to DNA motif discovery is also presented. The results support the effectiveness of HC and thus the underlying philosophy. © 2014 Elsevier B.V.

  7. Co-clustering Analysis of Weblogs Using Bipartite Spectral Projection Approach

    DEFF Research Database (Denmark)

    Xu, Guandong; Zong, Yu; Dolog, Peter

    2010-01-01

    Web clustering is an approach for aggregating Web objects into various groups according to underlying relationships among them. Finding co-clusters of Web objects is an interesting topic in the context of Web usage mining, which is able to capture the underlying user navigational interest...... and content preference simultaneously. In this paper we present an algorithm using bipartite spectral clustering to co-cluster Web users and pages. The usage data of users visiting Web sites are modeled as a bipartite graph, and spectral clustering is then applied to the graph representation of usage...... data. The proposed approach is evaluated by experiments performed on real datasets, and the impact of using various clustering algorithms is also investigated. Experimental results demonstrate that the method can effectively reveal the subset aggregates of Web users and pages which...
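    For two co-clusters, a bipartite spectral split can be sketched with the well-known normalized-SVD formulation (Dhillon, 2001): scale the user×page matrix A to D1^(-1/2) A D2^(-1/2), take the second left and right singular vectors, rescale them by the degrees, and split users and pages by sign. This is an assumed, simplified variant (sign split instead of k-means, k = 2), not necessarily the exact "bipartite spectral projection" procedure of the paper, and the visit counts are made up.

```python
import numpy as np


def spectral_cocluster_2(A):
    """Split rows (users) and columns (pages) of a nonnegative matrix into two
    co-clusters using the second singular vectors of D1^-1/2 A D2^-1/2."""
    A = np.asarray(A, dtype=float)
    d1 = A.sum(axis=1)  # user degrees (must be > 0)
    d2 = A.sum(axis=0)  # page degrees (must be > 0)
    An = A / np.sqrt(d1)[:, None] / np.sqrt(d2)[None, :]
    U, s, Vt = np.linalg.svd(An)
    z_users = U[:, 1] / np.sqrt(d1)   # second left singular vector, rescaled
    z_pages = Vt[1, :] / np.sqrt(d2)  # second right singular vector, rescaled
    return (z_users >= 0).astype(int), (z_pages >= 0).astype(int)


# Hypothetical visit counts: 6 users x 6 pages with two weakly linked groups
A = [[3, 2, 1, 0, 0, 0],
     [2, 3, 1, 0, 0, 0],
     [1, 2, 3, 1, 0, 0],
     [0, 0, 0, 3, 2, 1],
     [0, 0, 1, 2, 3, 1],
     [0, 0, 0, 1, 2, 3]]
user_labels, page_labels = spectral_cocluster_2(A)
print("users:", user_labels, "pages:", page_labels)
```

    Because each rescaled user value is a weighted average of its pages' values (and vice versa), a user group and the page group it mostly visits receive the same sign, which is what makes the split a co-clustering rather than two separate clusterings. The sign of SVD vectors is arbitrary, so only the grouping, not the 0/1 label, is meaningful.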

  8. HORIZONTAL BRANCH MORPHOLOGY OF GLOBULAR CLUSTERS: A MULTIVARIATE STATISTICAL ANALYSIS

    International Nuclear Information System (INIS)

    Jogesh Babu, G.; Chattopadhyay, Tanuka; Chattopadhyay, Asis Kumar; Mondal, Saptarshi

    2009-01-01

    The proper interpretation of horizontal branch (HB) morphology is crucial to understanding the formation history of stellar populations. In the present study, a multivariate analysis (principal component analysis) is used to select an appropriate HB morphology parameter, which in our case is the logarithm of the effective temperature extent of the HB, $\log T_{\mathrm{eff}}^{\mathrm{HB}}$. This parameter is then expressed, through a stepwise multiple regression technique, in terms of the most significant observed independent parameters of Galactic globular clusters (GGCs), separately for coherent groups obtained in a previous work. It is found that metallicity ([Fe/H]), central surface brightness ($\mu_V$), and core radius ($r_c$) are the significant parameters explaining most of the variation in HB morphology (multiple $R^2 \sim 0.86$) for GGCs belonging to the bulge/disk, while metallicity ([Fe/H]) and absolute magnitude ($M_V$) are responsible for GGCs belonging to the inner halo (multiple $R^2 \sim 0.52$). Robustness is tested using 1000 bootstrap samples. A cluster analysis is performed for the red giant branch (RGB) stars of the GGCs belonging to the Galactic inner halo (Cluster 2). Multi-episodic star formation is preferred for the RGB stars of GGCs in this group. This supports the asymptotic giant branch (AGB) model with three episodes instead of the two suggested by Carretta et al. for halo GGCs, while the AGB model should be revisited for bulge/disk GGCs.

  9. A cost-benefit analysis of landfill mining and material recycling in China

    International Nuclear Information System (INIS)

    Zhou, Chuanbin; Gong, Zhe; Hu, Junsong; Cao, Aixin; Liang, Hanwen

    2015-01-01

    Highlights: • Assessing the economic feasibility of landfill mining. • We applied a cost-benefit analysis model for landfill mining. • Four material cycling and energy recovery scenarios were designed. • We used net present value to evaluate the cost-benefit efficiency. - Abstract: Landfill mining is an environmentally-friendly technology that combines the concepts of material recycling and sustainable waste management, and it has received a great deal of worldwide attention because of its significant environmental and economic potential in material recycling, energy recovery, land reclamation and pollution prevention. This work applied a cost-benefit analysis model for assessing the economic feasibility, which is important for promoting landfill mining. The model includes eight indicators of costs and nine indicators of benefits. Four landfill mining scenarios were designed and analyzed based on field data. The economic feasibility of landfill mining was then evaluated by the indicator of net present value (NPV). According to our case study of a typical old landfill mining project in China (Yingchun landfill), rental of excavation and hauling equipment, waste processing and material transportation were the top three costs of landfill mining, accounting for 88.2% of the total cost, and the average cost per unit of stored waste was 12.7 USD per ton. The top three benefits of landfill mining were electricity generation by incineration, land reclamation and recycling soil-like materials. The NPV analysis of the four different scenarios indicated that the Yingchun landfill mining project could obtain a net positive benefit varying from 1.92 million USD to 16.63 million USD. However, the NPV was sensitive to the mode of land reuse, the availability of energy recovery facilities and the possibility of obtaining financial support by avoiding post-closure care.

  10. A cost-benefit analysis of landfill mining and material recycling in China

    Energy Technology Data Exchange (ETDEWEB)

    Zhou, Chuanbin, E-mail: cbzhou@rcees.ac.cn; Gong, Zhe; Hu, Junsong; Cao, Aixin; Liang, Hanwen

    2015-01-15

    Highlights: • Assessing the economic feasibility of landfill mining. • We applied a cost-benefit analysis model for landfill mining. • Four material cycling and energy recovery scenarios were designed. • We used net present value to evaluate the cost-benefit efficiency. - Abstract: Landfill mining is an environmentally-friendly technology that combines the concepts of material recycling and sustainable waste management, and it has received a great deal of worldwide attention because of its significant environmental and economic potential in material recycling, energy recovery, land reclamation and pollution prevention. This work applied a cost-benefit analysis model for assessing the economic feasibility, which is important for promoting landfill mining. The model includes eight indicators of costs and nine indicators of benefits. Four landfill mining scenarios were designed and analyzed based on field data. The economic feasibility of landfill mining was then evaluated by the indicator of net present value (NPV). According to our case study of a typical old landfill mining project in China (Yingchun landfill), rental of excavation and hauling equipment, waste processing and material transportation were the top three costs of landfill mining, accounting for 88.2% of the total cost, and the average cost per unit of stored waste was 12.7 USD per ton. The top three benefits of landfill mining were electricity generation by incineration, land reclamation and recycling soil-like materials. The NPV analysis of the four different scenarios indicated that the Yingchun landfill mining project could obtain a net positive benefit varying from 1.92 million USD to 16.63 million USD. However, the NPV was sensitive to the mode of land reuse, the availability of energy recovery facilities and the possibility of obtaining financial support by avoiding post-closure care.
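    The net present value indicator used in the landfill mining study is the discounted sum of yearly benefits minus costs, NPV = Σ_t (B_t - C_t)/(1 + r)^t. A minimal sketch follows; the discount rate and cash flows are hypothetical placeholders, not the Yingchun data, and the study's eight cost and nine benefit indicators could be aggregated into the yearly C_t and B_t before discounting.

```python
def npv(benefits, costs, rate):
    """Net present value of yearly benefit and cost streams.
    Year 0 is undiscounted; all amounts in the same currency unit."""
    if len(benefits) != len(costs):
        raise ValueError("benefits and costs must cover the same years")
    return sum((b - c) / (1.0 + rate) ** t
               for t, (b, c) in enumerate(zip(benefits, costs)))


# Hypothetical scenario (million USD): heavy excavation costs early,
# benefits from energy recovery and land reclamation later
benefits = [0.0, 2.0, 4.0, 4.0, 6.0]
costs = [5.0, 3.0, 1.0, 1.0, 0.5]
print(f"NPV at 8%: {npv(benefits, costs, 0.08):.2f} million USD")
```

    Re-running `npv` with different benefit streams (for example, with or without land-reclamation income) is a simple way to reproduce the kind of scenario sensitivity the abstract reports.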

  11. Poisson cluster analysis of cardiac arrest incidence in Columbus, Ohio.

    Science.gov (United States)

    Warden, Craig; Cudnik, Michael T; Sasson, Comilla; Schwartz, Greg; Semple, Hugh

    2012-01-01

    Scarce resources in disease prevention and emergency medical services (EMS) need to be focused on high-risk areas of out-of-hospital cardiac arrest (OHCA). Cluster analysis using geographic information systems (GISs) was used to find these high-risk areas and test potential predictive variables. This was a retrospective cohort analysis of EMS-treated adults with OHCAs occurring in Columbus, Ohio, from April 1, 2004, through March 31, 2009. The OHCAs were aggregated to census tracts and incidence rates were calculated based on their adult populations. Poisson cluster analysis determined significant clusters of high-risk census tracts. Both census tract-level and case-level characteristics were tested for association with high-risk areas by multivariate logistic regression. A total of 2,037 eligible OHCAs occurred within the city limits during the study period. The mean incidence rate was 0.85 OHCAs/1,000 population/year. There were five significant geographic clusters with 76 high-risk census tracts out of the total of 245 census tracts. In the case-level analysis, being in a high-risk cluster was associated with a slightly younger age (-3 years, adjusted odds ratio [OR] 0.99, 95% confidence interval [CI] 0.99-1.00), not being white, non-Hispanic (OR 0.54, 95% CI 0.45-0.64), cardiac arrest occurring at home (OR 1.53, 95% CI 1.23-1.71), and not receiving bystander cardiopulmonary resuscitation (CPR) (OR 0.77, 95% CI 0.62-0.96), but with higher survival to hospital discharge (OR 1.78, 95% CI 1.30-2.46). In the census tract-level analysis, high-risk census tracts were also associated with a slightly lower average age (-0.1 years, OR 1.14, 95% CI 1.06-1.22) and a lower proportion of white, non-Hispanic patients (-0.298, OR 0.04, 95% CI 0.01-0.19), but also a lower proportion of high-school graduates (-0.184, OR 0.00, 95% CI 0.00-0.00). This analysis identified high-risk census tracts and associated census tract-level and case-level characteristics that can be used to
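    The abstract does not specify how the Poisson clusters were tested. One widely used option is Kulldorff's spatial scan statistic, whose Poisson log-likelihood ratio for a candidate zone with c observed and E expected cases, out of C total, is c ln(c/E) + (C - c) ln((C - c)/(C - E)) when c > E, and 0 otherwise. The sketch below computes that ratio plus the crude incidence rate; significance would normally come from Monte Carlo replications, which are omitted, and the zone counts are hypothetical. The population figure is only what the reported numbers imply: 2,037 cases at 0.85 per 1,000 per year over five years corresponds to roughly 479,000 adults.

```python
import math


def incidence_per_1000(cases, population, years):
    """Crude incidence per 1,000 population per year."""
    return 1000.0 * cases / (population * years)


def poisson_scan_llr(c, expected, total):
    """Kulldorff Poisson log-likelihood ratio for one candidate zone.
    c: observed cases inside, expected: cases expected inside under the null,
    total: all cases in the study area."""
    if c <= expected:
        return 0.0  # only zones with an excess of cases count as high-risk clusters
    inside = c * math.log(c / expected)
    outside_obs = total - c
    outside_exp = total - expected
    outside = outside_obs * math.log(outside_obs / outside_exp) if outside_obs > 0 else 0.0
    return inside + outside


print(f"{incidence_per_1000(2037, 479_000, 5):.2f} per 1,000 per year")
# Hypothetical candidate zone: 300 observed vs 200 expected cases
print(f"LLR = {poisson_scan_llr(300, 200.0, 2037):.1f}")
```

    In a full scan, the ratio is maximized over many candidate zones (for example, circles of growing radius around each census tract centroid), and the maximum is compared with maxima from simulated datasets.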

  12. Performance Based Clustering for Benchmarking of Container Ports: an Application of DEA and Cluster Analysis Technique

    Directory of Open Access Journals (Sweden)

    Jie Wu

    2010-12-01

    The operational performance of container ports has received increasing attention in both academic and practitioner circles, and the performance evaluation and process improvement of container ports have been the focus of several studies. In this paper, Data Envelopment Analysis (DEA), an effective tool for relative efficiency assessment, is used to measure the performance of, and benchmark, 77 world container ports in 2007. The approaches used consider four inputs (Capacity of Cargo Handling Machines, Number of Berths, Terminal Area and Storage Capacity) and a single output (Container Throughput). The efficiency scores are analyzed, and a unique ordering of the ports based on average cross efficiency is provided. In addition, a cluster analysis technique is used to select more appropriate benchmark targets for poorly performing ports.
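    For readers unfamiliar with DEA, the constant-returns-to-scale (CCR) efficiency of a port o can be computed as a linear program in multiplier form: maximize u·y_o subject to v·x_o = 1, u·y_j - v·x_j ≤ 0 for every port j, and u, v ≥ 0. The sketch below uses SciPy's `linprog` and a simple average cross-efficiency built from each port's optimal weights. It is an illustrative assumption, not the paper's exact model: cross-efficiency studies usually add a secondary (aggressive or benevolent) objective because optimal weights are often not unique, and the port data here are invented.

```python
import numpy as np
from scipy.optimize import linprog


def ccr_weights(X, Y, o):
    """Input-oriented CCR multiplier model for DMU o.
    X: (n, m) inputs, Y: (n, s) outputs, all strictly positive.
    Returns (efficiency, input weights v, output weights u)."""
    n, m = X.shape
    s = Y.shape[1]
    c = np.concatenate([np.zeros(m), -Y[o]])             # maximise u.y_o
    A_ub = np.hstack([-X, Y])                            # u.y_j - v.x_j <= 0
    A_eq = np.concatenate([X[o], np.zeros(s)])[None, :]  # v.x_o = 1
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0],
                  bounds=(0, None), method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return -res.fun, res.x[:m], res.x[m:]


def cross_efficiency(X, Y):
    """Average of each DMU's ratio under every DMU's optimal weights
    (naive version: no secondary goal, self-appraisal included)."""
    n = X.shape[0]
    E = np.empty((n, n))
    for k in range(n):
        _, v, u = ccr_weights(X, Y, k)
        E[k] = (Y @ u) / (X @ v)
    return E.mean(axis=0)


# Hypothetical ports: inputs (berths, terminal area in ha), output (million TEU)
X = np.array([[10.0, 50.0], [8.0, 60.0], [12.0, 40.0], [6.0, 30.0]])
Y = np.array([[5.0], [4.0], [6.0], [2.0]])
eff = [ccr_weights(X, Y, o)[0] for o in range(len(X))]
print("CCR efficiency:", np.round(eff, 3))
print("average cross-efficiency:", np.round(cross_efficiency(X, Y), 3))
```

    Cross-efficiency gives the unique ordering that plain CCR scores cannot, since several ports often tie at an efficiency of 1.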

  13. Functional Principal Component Analysis and Randomized Sparse Clustering Algorithm for Medical Image Analysis

    Science.gov (United States)

    Lin, Nan; Jiang, Junhai; Guo, Shicheng; Xiong, Momiao

    2015-01-01

    Owing to advances in sensor technology, the growing volume of medical image data makes it possible to visualize anatomical changes in biological tissues. As a consequence, medical images have the potential to enhance the diagnosis of disease, the prediction of clinical outcomes and the characterization of disease progression. At the same time, however, the growing data dimensions pose great methodological and computational challenges for the representation and selection of features in image cluster analysis. To address these challenges, we first extend functional principal component analysis (FPCA) from one dimension to two dimensions to fully capture the spatial variation of the image signals. The image signals contain a large number of redundant features that provide no additional information for clustering analysis. The widely used methods for removing irrelevant features are sparse clustering algorithms that use a lasso-type penalty to select features. However, the accuracy of clustering with a lasso-type penalty depends on the choice of the penalty parameters and the threshold value, which are difficult to determine in practice. Recently, randomized algorithms have received a great deal of attention in big data analysis. This paper presents a randomized algorithm for accurate feature selection in image clustering analysis. The proposed method is applied to both liver and kidney cancer histology image data from the TCGA database. The results demonstrate that the randomized feature selection method coupled with functional principal component analysis substantially outperforms current sparse clustering algorithms in image cluster analysis. PMID:26196383

  14. Critical analysis of the Colombian mining legislation; Analisis critico de la legislacion minera colombiana

    Energy Technology Data Exchange (ETDEWEB)

    Vargas P, Elkin; Gonzalez S, Carmen Lucia

    2003-12-15

    The document analyses the Colombian mining legislation, Act 685 of 2001, based on the reasons given by the government and the miners for its conception and approval. It tries to determine the progress achieved by this new Mining Code with regard to international mining competitiveness and its adaptation to the constitutional rules on the environment, indigenous communities, decentralization and sustainable development. The analysis formulates general and specific hypotheses about the stated objectives of the reform, which are contrasted with the arguments and critical evaluations of the results. Most hypotheses are not verified, demonstrating that the Colombian mining legislation is far from being the instrument needed to promote mining activities, make them competitive by international standards, and adapt them to the principles of sustainable development, a healthy environment, community participation, ethnic minorities and regional autonomy.

  15. Opinion Mining in Latvian Text Using Semantic Polarity Analysis and Machine Learning Approach

    Directory of Open Access Journals (Sweden)

    Gatis Špats

    2016-07-01

    In this paper we demonstrate approaches to opinion mining in Latvian text. The authors applied, combined and extended the results of several previous studies and public resources to perform opinion mining in Latvian text using two approaches: semantic polarity analysis and machine learning. One of the most significant constraints that makes opinion mining for classifying written content in Latvian challenging is the limited amount of publicly available text corpora for classifier training. We have combined several sources and created a publicly available extended lexicon. Our results are comparable to or better than current achievements in opinion mining in Latvian. Experiments show that lexicon-based methods provide more accurate opinion mining than a Naive Bayes machine learning classifier on Latvian tweets. The methods used in this study could be further extended using human annotators, unsupervised machine learning and bootstrapping to create larger corpora of classified text.
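    The lexicon-based approach can be illustrated with a toy scorer that sums word polarities from a lexicon and labels a text by the sign of the total. The lexicon below is a tiny hypothetical English placeholder, not the authors' extended Latvian lexicon, and real systems add negation handling, lemmatization for Latvian inflection, and emoticon rules.

```python
import re

# Hypothetical polarity lexicon; a real one would hold thousands of Latvian entries
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "awful": -2, "hate": -2}


def polarity_score(text: str, lexicon: dict[str, int] = LEXICON) -> int:
    """Sum of lexicon polarities over the tokens of `text`."""
    tokens = re.findall(r"\w+", text.lower())  # \w is Unicode-aware, so diacritics survive
    return sum(lexicon.get(tok, 0) for tok in tokens)


def classify(text: str, lexicon: dict[str, int] = LEXICON) -> str:
    """Label a text positive, negative or neutral by the sign of its score."""
    score = polarity_score(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for tweet in ["Great service, love it!", "Awful queue and bad coffee", "Meeting at noon"]:
    print(classify(tweet), "|", tweet)
```

    Unlike a Naive Bayes classifier, this scorer needs no labelled training corpus, which is one reason lexicon methods remain competitive when annotated data are scarce.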

  16. plantiSMASH: automated identification, annotation and expression analysis of plant biosynthetic gene clusters

    DEFF Research Database (Denmark)

    Kautsar, Satria A.; Suarez Duran, Hernando G.; Blin, Kai

    2017-01-01

    exploration of the nature and dynamics of gene clustering in plant metabolism. Moreover, spurred by the continuing decrease in costs of plant genome sequencing, they will allow genome mining technologies to be applied to plant natural product discovery. The plantiSMASH web server, precalculated results...

  17. Exploratory analysis of textual data from the Mother and Child Handbook using a text mining method (II): Monthly changes in the words recorded by mothers.

    Science.gov (United States)

    Tagawa, Miki; Matsuda, Yoshio; Manaka, Tomoko; Kobayashi, Makiko; Ohwada, Michitaka; Matsubara, Shigeki

    2017-01-01

    The aim of the study was to examine the possibility of converting subjective textual data written in the free column space of the Mother and Child Handbook (MCH) into objective information using text mining and to compare any monthly changes in the words written by the mothers. Pregnant women without complications (n = 60) were divided into two groups according to State-Trait Anxiety Inventory grade: low trait anxiety (group I, n = 39) and high trait anxiety (group II, n = 21). Exploratory analysis of the textual data from the MCH was conducted by text mining using the Word Miner software program. Using 1203 structural elements extracted after processing, a comparison of monthly changes in the words used in the mothers' comments was made between the two groups. The data were mainly analyzed by correspondence analysis. The structural elements in groups I and II were divided into seven and six clusters, respectively, by cluster analysis. Correspondence analysis revealed clear monthly changes in the words used in the mothers' comments as the pregnancy progressed in group I, whereas the association was not clear in group II. The text mining method was useful for exploratory analysis of the textual data obtained from pregnant women, and the monthly change in the words used in the mothers' comments as pregnancy progressed differed according to their degree of unease. © 2016 Japan Society of Obstetrics and Gynecology.

  18. Diagnostics of subtropical plants functional state by cluster analysis

    Directory of Open Access Journals (Sweden)

    Oksana Belous

    2016-05-01

    Full Text Available The article presents an example of applying statistical methods of data analysis to diagnose the adaptive capacity of subtropical plant varieties. We describe the selection indicators and the basic physiological parameters that were defined as diagnostic. We used a set of water-regime parameters: the water deficit of the leaves, the fractional composition of water, and the concentration of cell sap (CCS) (for tea, measured in the flushes). These parameters are highly labile and highly responsive to many abiotic factors, which required particular care in selecting plant material for analysis and in accounting for the impact on sustainability. From the experimental data, we calculated pairwise correlation coefficients between climatic factors and the physiological indicators used. The result was a selection of physiological and biochemical indicators proposed to assess adaptability, which were included in methodical recommendations on diagnosing the functional state of the studied cultures. Analyzing complex studies that involve a large number of indicators is quite difficult; in particular, it does not allow quickly identifying the similarity of new varieties in their adaptive responses to adverse factors and, therefore, setting general requirements for cultivation conditions. Cluster analysis requires using only quantitative data, defining the set of variables used to assess the varieties (the larger the sample, the more accurate the clustering), and establishing a measure of similarity (or difference) between objects. It is shown that the choice of diagnostic features subjected to statistical processing affects the accuracy of the variety classification. Selection resulting from the mono-cluster analysis (tea variety Kolhida; hazelnut Lombardsky red; kiwi variety Monty

  19. Cluster analysis for DNA methylation profiles having a detection threshold

    Directory of Open Access Journals (Sweden)

    Siegmund Kimberly D

    2006-07-01

    Full Text Available Abstract Background DNA methylation, a molecular feature used to investigate tumor heterogeneity, can be measured on many genomic regions using the MethyLight technology. Due to the combination of the underlying biology of DNA methylation and the MethyLight technology, the measurements, while being generated on a continuous scale, have a large number of 0 values. This suggests that conventional clustering methodology may not perform well on these data. Results We compare the performance of existing methodology (such as k-means) with two novel methods that explicitly allow for the preponderance of values at 0. We also consider how the ability to successfully cluster such data depends upon the number of informative genes for which methylation is measured and the correlation structure of the methylation values for those genes. We show that when data are collected for a sufficient number of genes, our models do improve clustering performance compared to methods, such as k-means, that do not explicitly respect the supposed biological realities of the situation. Conclusion The performance of analysis methods depends upon how well the assumptions of those methods reflect the properties of the data being analyzed. Differing technologies will lead to data with differing properties, and should therefore be analyzed differently. Consequently, it is prudent to give thought to what the properties of the data are likely to be, and which analysis method might therefore be likely to best capture those properties.

  20. Cluster Analysis of the International Stellarator Confinement Database

    International Nuclear Information System (INIS)

    Kus, A.; Dinklage, A.; Preuss, R.; Ascasibar, E.; Harris, J. H.; Okamura, S.; Yamada, H.; Sano, F.; Stroth, U.; Talmadge, J.

    2008-01-01

    A heterogeneous data structure is one of the problems that arise when deriving scalings for the energy confinement time, and its analysis turns out to be a broad and complicated matter. The International Stellarator Confinement Database [1] (ISCDB for short) comprises, in its latest version 21, a total of 3647 observations from 8 experimental devices, 2067 of which have so far been completed for upcoming analyses. For confinement scaling studies, 1933 observations were chosen as the standard dataset. Here we describe a statistical method of cluster analysis for identifying possible cohesive substructures in the ISCDB and present some preliminary results.

  1. Accommodating error analysis in comparison and clustering of molecular fingerprints.

    Science.gov (United States)

    Salamon, H; Segal, M R; Ponce de Leon, A; Small, P M

    1998-01-01

    Molecular epidemiologic studies of infectious diseases rely on pathogen genotype comparisons, which usually yield patterns comprising sets of DNA fragments (DNA fingerprints). We use a highly developed genotyping system, IS6110-based restriction fragment length polymorphism analysis of Mycobacterium tuberculosis, to develop a computational method that automates comparison of large numbers of fingerprints. Because error in fragment length measurements is proportional to fragment length and is positively correlated for fragments within a lane, an align-and-count method that compensates for relative scaling of lanes reliably counts matching fragments between lanes. Results of a two-step method we developed to cluster identical fingerprints agree closely with 5 years of computer-assisted visual matching among 1,335 M. tuberculosis fingerprints. Fully documented and validated methods of automated comparison and clustering will greatly expand the scope of molecular epidemiology.
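
    The idea of counting matching fragments with a length-proportional tolerance after compensating for relative lane scaling can be sketched as follows. This is a simplified illustration, not the authors' algorithm: the relative tolerance `rel_tol`, the greedy matching on sorted lengths and the brute-force search over candidate scale factors are all assumptions.

```python
def count_matches(a, b, rel_tol):
    """Greedily pair fragment lengths whose difference is within rel_tol * length."""
    a, b = sorted(a), sorted(b)
    i = j = matches = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= rel_tol * max(a[i], b[j]):
            matches += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return matches


def align_and_count(a, b, rel_tol=0.02):
    """Try rescaling lane b so that some fragment pair coincides; keep the best count."""
    best = count_matches(a, b, rel_tol)
    for x in a:
        for y in b:
            scale = x / y  # candidate relative scaling between the two lanes
            best = max(best, count_matches(a, [v * scale for v in b], rel_tol))
    return best


# Hypothetical lanes: the second is the first read 5% too long.
print(align_and_count([1000, 2000, 4000], [1050, 2100, 4200]))  # 3
```

    Without the rescaling step, the same two lanes share no fragments at a 2% tolerance, which is the kind of systematic lane error the align-and-count idea is meant to absorb.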

  2. Accident patterns for construction-related workers: a cluster analysis

    Science.gov (United States)

    Liao, Chia-Wen; Tyan, Yaw-Yauan

    2012-01-01

    The construction industry has been identified as one of the most hazardous industries. The risk to construction-related workers is far greater than in manufacturing-based industries. However, some steps can be taken to reduce worker risk through effective injury prevention strategies. In this article, k-means clustering is employed to specify the factors related to different worker types and to identify the patterns of industrial occupational accidents. Accident reports from the period 1998 to 2008 are extracted from case reports of the Northern Region Inspection Office of the Council of Labor Affairs of Taiwan. The results show that cluster analysis can reveal some patterns of occupational injuries in the construction industry. Inspection plans should be proposed according to the type of construction-related workers. The findings provide a direction for more effective inspection strategies and injury prevention programs.
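
    For readers unfamiliar with k-means, here is a minimal pure-Python sketch of its assignment and update steps (Lloyd's algorithm). The deterministic farthest-point initialization and the toy 2-D points are assumptions for illustration; the features the study derived from accident reports are not shown.

```python
import math


def farthest_point_init(points, k):
    """Start from the first point, then repeatedly add the point farthest from all chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return [list(c) for c in centroids]


def kmeans(points, k, max_iter=100):
    """Alternate nearest-centroid assignment and centroid averaging until labels stop changing."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments unchanged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return labels, centroids


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(pts, 2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

    Real accident records are usually categorical (worker type, cause, injury type), so they would first need a numeric encoding before a Euclidean k-means like this one applies.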

  3. Cluster analysis in systems of magnetic spheres and cubes

    Energy Technology Data Exchange (ETDEWEB)

    Pyanzina, E.S., E-mail: elena.pyanzina@urfu.ru [Ural Federal University, Lenin Av. 51, Ekaterinburg (Russian Federation); Gudkova, A.V. [Ural Federal University, Lenin Av. 51, Ekaterinburg (Russian Federation); Donaldson, J.G. [University of Vienna, Sensengasse 8, Vienna (Austria); Kantorovich, S.S. [Ural Federal University, Lenin Av. 51, Ekaterinburg (Russian Federation); University of Vienna, Sensengasse 8, Vienna (Austria)

    2017-06-01

    In the present work we use molecular dynamics simulations and graph-theory based cluster analysis to compare self-assembly in systems of magnetic spheres, and cubes where the dipole moment is oriented along the side of the cube in the [001] crystallographic direction. We show that under the same conditions cubes aggregate far less than their spherical counterparts. This difference can be explained in terms of the volume of phase space in which the formation of the bond is thermodynamically advantageous. It follows that this volume is much larger for a dipolar sphere than for a dipolar cube. - Highlights: • A comparison of the degree of self-assembly in systems of magnetic spheres and cubes. • Spheres are more likely to form larger clusters than cubes. • Differences in microstructure will manifest in the magnetic response of each system.
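
    Graph-based cluster analysis treats particles as vertices and bonded pairs as edges, so that clusters are the connected components of that graph. The sketch below assumes a simple distance cutoff `r_c` as the bond criterion and ignores periodic boundary conditions; the bond definition used for dipolar spheres and cubes in the paper may differ.

```python
import math
from collections import deque


def clusters_by_cutoff(positions, r_c):
    """Connected components of the graph whose edges join particles closer than r_c."""
    n = len(positions)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) < r_c:
                adj[i].append(j)
                adj[j].append(i)
    seen = [False] * n
    clusters = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        component, queue = [], deque([start])
        while queue:  # breadth-first search over bonded neighbours
            u = queue.popleft()
            component.append(u)
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    queue.append(v)
        clusters.append(sorted(component))
    return clusters


pos = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 0, 0), (10, 1, 0)]
print(clusters_by_cutoff(pos, 1.5))  # [[0, 1, 2], [3, 4]]
```

    The cluster-size distribution, e.g. `[len(c) for c in clusters]`, is then the quantity compared between the sphere and cube systems.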

  4. Image Registration Algorithm Based on Parallax Constraint and Clustering Analysis

    Science.gov (United States)

    Wang, Zhe; Dong, Min; Mu, Xiaomin; Wang, Song

    2018-01-01

    To resolve the problems of slow computation and low matching accuracy in image registration, a new image registration algorithm based on the parallax constraint and clustering analysis is proposed. First, the Harris corner detection algorithm is used to extract the feature points of the two images. Second, the Normalized Cross Correlation (NCC) function is used to approximately match the feature points, yielding the initial feature pairs. Then, according to the parallax constraint condition, the initial feature pairs are preprocessed by the K-means clustering algorithm, which removes feature point pairs with obvious errors from the approximate matching. Finally, the Random Sample Consensus (RANSAC) algorithm is adopted to refine the feature points and obtain the final matching result, achieving fast and accurate image registration. The experimental results show that the proposed algorithm improves matching accuracy while preserving real-time performance.
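
    The NCC step scores candidate matches by the correlation of mean-centred patch intensities, which makes it insensitive to uniform brightness offsets and contrast scaling. A minimal sketch, assuming patches are given as equal-length flattened lists of grey values (Harris detection, the K-means parallax filter and RANSAC are not shown):

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equal-length patches, in [-1, 1]."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den else 0.0  # flat (constant) patches get score 0


def best_match(patch, candidates):
    """Index of the candidate patch with the highest NCC score."""
    return max(range(len(candidates)), key=lambda i: ncc(patch, candidates[i]))


print(best_match([1, 2, 3, 4], [[4, 3, 2, 1], [10, 20, 30, 40], [5, 5, 5, 5]]))  # 1
```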

  5. Network clustering coefficient approach to DNA sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Gerhardt, Guenther J.L. [Universidade Federal do Rio Grande do Sul-Hospital de Clinicas de Porto Alegre, Rua Ramiro Barcelos 2350/sala 2040/90035-003 Porto Alegre (Brazil); Departamento de Fisica e Quimica da Universidade de Caxias do Sul, Rua Francisco Getulio Vargas 1130, 95001-970 Caxias do Sul (Brazil); Lemke, Ney [Programa Interdisciplinar em Computacao Aplicada, Unisinos, Av. Unisinos, 950, 93022-000 Sao Leopoldo, RS (Brazil); Corso, Gilberto [Departamento de Biofisica e Farmacologia, Centro de Biociencias, Universidade Federal do Rio Grande do Norte, Campus Universitario, 59072 970 Natal, RN (Brazil)]. E-mail: corso@dfte.ufrn.br

    2006-05-15

    In this work we propose an alternative DNA sequence analysis tool based on graph-theoretical concepts. The methodology investigates the path topology of an organism's genome through a triplet network. In this network, triplets in the DNA sequence are vertices, and two vertices are connected if they occur juxtaposed in the genome. We characterize the network topology by measuring the clustering coefficient. We test our methodology against two main biases: the guanine-cytosine (GC) content and the 3-bp (base pair) periodicity of DNA sequences. We perform the test by constructing random networks with variable GC content and imposed 3-bp periodicity. A test group of organisms is constructed, and we investigate the methodology in the light of the constructed random networks. We conclude that the clustering coefficient is a valuable tool, since it gives information that is not trivially contained in either the 3-bp periodicity or the variable GC content.
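
    The triplet network and its clustering coefficient can be sketched in a few lines of Python. Reading "juxtaposed" as consecutive, non-overlapping triplets and dropping self-loops are assumptions; a sliding window of step 1 would be a one-line change. The average counts vertices of degree below 2 as zeros.

```python
from itertools import combinations


def triplet_network(seq):
    """Undirected graph: triplets are vertices, consecutive triplets are joined."""
    # Assumption: non-overlapping triplets in reading frame 0.
    trips = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    adj = {}
    for a, b in zip(trips, trips[1:]):
        if a == b:
            continue  # ignore self-loops
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def average_clustering(adj):
    """Mean local clustering coefficient; vertices with degree < 2 contribute 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for u, w in combinations(nbrs, 2) if w in adj[u])
        total += 2 * links / (k * (k - 1))
    return total / len(adj) if adj else 0.0


print(average_clustering(triplet_network("AAACCCGGGAAA")))  # 1.0 (a triangle)
```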

  6. An Application of Multiplier Analysis in Analyzing the Role of Mining Sectors on Indonesian National Economy

    Science.gov (United States)

    Subanti, S.; Hakim, A. R.; Hakim, I. M.

    2018-03-01

    The purpose of this study is to apply multiplier analysis to the mining sector in Indonesia. The mining sectors are defined as coal and metal; crude oil, natural gas, and geothermal; and other mining and quarrying. The multiplier analysis is based on input-output analysis and is divided into income multipliers and output multipliers. The results show that (1) the Indonesian mining sectors rank 6th, contributing 6.81% of total national output; (2) based on total gross value added, the sector contributes 12.13%, ranking 4th; (3) the income multiplier is 0.7062 and the output multiplier is 1.2426.
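
    In input-output analysis, the output multiplier of sector j is usually the column sum of the Leontief inverse L = (I - A)^-1, and a simple income multiplier weights that column by the household-income coefficients h. The sketch below assumes these textbook definitions and a made-up 2-sector table; it does not use the study's data and may not match its exact formulas.

```python
def invert(m):
    """Gauss-Jordan inversion with partial pivoting (assumes m is non-singular)."""
    n = len(m)
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def multipliers(A, h):
    """Output multipliers (column sums of L) and simple income multipliers (h-weighted)."""
    n = len(A)
    L = invert([[(1.0 if i == j else 0.0) - A[i][j] for j in range(n)] for i in range(n)])
    output = [sum(L[i][j] for i in range(n)) for j in range(n)]
    income = [sum(h[i] * L[i][j] for i in range(n)) for j in range(n)]
    return output, income


# Hypothetical 2-sector technical coefficients A and household-income coefficients h.
out, inc = multipliers([[0.2, 0.3], [0.4, 0.1]], [0.3, 0.2])
print([round(x, 4) for x in out], [round(x, 4) for x in inc])  # [2.1667, 1.8333] [0.5833, 0.4167]
```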

  7. Steady state subchannel analysis of AHWR fuel cluster

    International Nuclear Information System (INIS)

    Dasgupta, A.; Chandraker, D.K.; Vijayan, P.K.; Saha, D.

    2006-09-01

    Subchannel analysis is a technique used to predict the thermal hydraulic behavior of reactor fuel assemblies. The rod cluster is subdivided into a number of parallel interacting flow subchannels. The conservation equations are solved for each of these subchannels, taking into account subchannel interactions. Subchannel analysis of AHWR D-5 fuel cluster has been carried out to determine the variations in thermal hydraulic conditions of coolant and fuel temperatures along the length of the fuel bundle. The hottest regions within the AHWR fuel bundle have been identified. The effect of creep on the fuel performance has also been studied. MCHFR has been calculated using Jansen-Levy correlation. The calculations have been backed by sensitivity analysis for parameters whose values are not known accurately. The sensitivity analysis showed the calculations to have a very low sensitivity to these parameters. Apart from the analysis, the report also includes a brief introduction of a few subchannel codes. A brief description of the equations and solution methodology used in COBRA-IIIC and COBRA-IV-I is also given. (author)

  8. Application of multivariate analysis to investigate the trace element contamination in top soil of coal mining district in Jorong, South Kalimantan, Indonesia

    Science.gov (United States)

    Pujiwati, Arie; Nakamura, K.; Watanabe, N.; Komai, T.

    2018-02-01

    Multivariate analysis is applied to investigate the geochemistry of several trace elements in topsoils and their relation to the contamination source, namely the influence of coal mines in Jorong, South Kalimantan. The total concentrations of Cd, V, Co, Ni, Cr, Zn, As, Pb, Sb, Cu and Ba were determined in 20 soil samples by bulk analysis. Pearson correlation is applied to specify the linear correlation among the elements. Principal Component Analysis (PCA) and Cluster Analysis (CA) were applied to classify the trace elements and contamination sources. The results suggest that the contamination loading is contributed by Cr, Cu, Ni, Zn, As, and Pb. The elemental loading mostly affects the non-coal-mining area, for instance the areas near settlements and agricultural land. Moreover, the contamination sources are classified into areas influenced by coal mining activity, by agricultural land-use types, and by the river mixing zone. Multivariate analysis could elucidate the elemental loading and the contamination sources of trace elements in the vicinity of the coal mine area.

  9. Analysis of the Potential for Use of Floating Photovoltaic Systems on Mine Pit Lakes: Case Study at the Ssangyong Open-Pit Limestone Mine in Korea

    Directory of Open Access Journals (Sweden)

    Jinyoung Song

    2016-02-01

    Full Text Available Recently, the mining industry has introduced renewable energy technologies to resolve power supply problems at mines operating in polar regions or other remote areas, and to foster substitute industries able to benefit from the abandoned sites of exhausted mines. However, little attention has been paid to the potential placement of floating photovoltaic (PV) systems on mine pit lakes, because it was assumed that the topographic characteristics of open-pit mines make them unsuitable for any type of PV system. This study analyzed the potential of floating PV systems on a mine pit lake in Korea to correct this misconception. Using a fish-eye lens camera and digital elevation models, a shading analysis was performed to identify the area suitable for installing a floating PV system. The layout of the floating PV system was designed considering the optimal tilt angle and array spacing of the PV panels. The System Advisor Model (SAM) by the National Renewable Energy Laboratory, USA, was used to conduct energy simulations based on weather data and the system design. The results indicated that the proposed PV system could generate 971.57 MWh/year. The economic analysis (accounting for the discount rate and a 20-year operational lifetime) showed a net present value of USD 897,000 and a payback period of about 12.3 years. The economic effect of the floating PV system on the mine pit lake is therefore relatively higher than that of PV systems at other abandoned mines in Korea. The annual reduction in greenhouse gas emissions was found to be 471.21 tCO2/year, twice the reduction achieved by forest restoration of an abandoned mine site. The economic feasibility of a floating PV system on the pit lake of an abandoned mine was thus established, and it may be considered an efficient reuse option for abandoned mines.
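
    The two economic indicators reported above, net present value and payback period, can be computed as follows. This is a generic sketch with hypothetical inputs: it assumes year-end cash flows, a constant discount rate and an undiscounted payback period, any of which may differ from the conventions used in the study.

```python
def npv(rate, initial_cost, cash_flows):
    """Net present value of year-end cash flows (years 1, 2, ...) minus the upfront cost."""
    return -initial_cost + sum(cf / (1 + rate) ** t
                               for t, cf in enumerate(cash_flows, start=1))


def payback_period(initial_cost, cash_flows):
    """Undiscounted payback in years, interpolating within the year it is reached."""
    remaining = initial_cost
    if remaining <= 0:
        return 0.0
    for t, cf in enumerate(cash_flows, start=1):
        if cf >= remaining:
            return t - 1 + remaining / cf
        remaining -= cf
    return None  # not paid back within the horizon


# Hypothetical project: cost 1000, income 200 per year for 20 years, 5% discount rate.
flows = [200.0] * 20
print(npv(0.05, 1000.0, flows), payback_period(1000.0, flows))
```

    A discounted payback period would divide each cash flow by (1 + rate) ** t before accumulating, and is always at least as long as the undiscounted one.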

  10. Comparative analysis of data mining techniques for business data

    Science.gov (United States)

    Jamil, Jastini Mohd; Shaharanee, Izwan Nizal Mohd

    2014-12-01

    Data mining is the process of employing one or more computer learning techniques to automatically analyze and extract knowledge from data contained within a database. Companies are using this tool to better understand their customers, design targeted sales and marketing campaigns, predict which products customers will buy and how often, and spot trends in customer preferences that can lead to new product development. In this paper, we take a systematic approach to exploring several data mining techniques in business applications. The experimental results reveal that all the data mining techniques accomplish their goals perfectly, but each technique has its own characteristics and specifications that determine its accuracy, proficiency and preference.

  11. SegMine workflows for semantic microarray data analysis in Orange4WS

    Directory of Open Access Journals (Sweden)

    Kulovesi Kimmo

    2011-10-01

    Full Text Available Abstract Background In experimental data analysis, bioinformatics researchers increasingly rely on tools that enable the composition and reuse of scientific workflows. The utility of current bioinformatics workflow environments can be significantly increased by offering advanced data mining services as workflow components. Such services can support, for instance, knowledge discovery from diverse distributed data and knowledge sources (such as GO, KEGG, PubMed, and experimental databases). Specifically, cutting-edge data analysis approaches, such as semantic data mining, link discovery, and visualization, have not yet been made available to researchers investigating complex biological datasets. Results We present a new methodology, SegMine, for semantic analysis of microarray data by exploiting general biological knowledge, and a new workflow environment, Orange4WS, with integrated support for web services in which the SegMine methodology is implemented. The SegMine methodology consists of two main steps. First, the semantic subgroup discovery algorithm is used to construct elaborate rules that identify enriched gene sets. Then, a link discovery service is used for the creation and visualization of new biological hypotheses. The utility of SegMine, implemented as a set of workflows in Orange4WS, is demonstrated in two microarray data analysis applications. In the analysis of senescence in human stem cells, the use of SegMine resulted in three novel research hypotheses that could improve understanding of the underlying mechanisms of senescence and identification of candidate marker genes. Conclusions Compared to the available data analysis systems, SegMine offers improved hypothesis generation and data interpretation for bioinformatics in an easy-to-use integrated workflow environment.

  12. Statistic analysis of grouping in evaluation of the behavior of stable chemical elements and physical-chemical parameters in effluent from uranium mining

    International Nuclear Information System (INIS)

    Pereira, Wagner de S.

    2013-01-01

    The Ore Treatment Unit (UTM) is a deactivated uranium mine. Statistical cluster analysis was used to evaluate the behavior of stable chemical elements and physico-chemical variables in its effluents. Cluster analysis proved effective in this evaluation, identifying groups of chemical elements, groups of physico-chemical variables, and groups combining both (elements and variables). Based on the analysis of the data, there is a strong link between Ca and Mg and between Al and TR2O3 (rare earth oxides) in the UTM effluents. SO4 was also identified as strongly linked to total and dissolved solids, which in turn were linked to electrical conductivity. Other associations existed but were not as strong. Additional sampling for seasonal evaluation is required to confirm these assessments. Additional statistical analyses (ordination techniques) should be used to help identify the origins of the groups identified in this analysis. (author)
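
    Clustering variables rather than samples is commonly done with a correlation-based distance such as 1 - r. The sketch below assumes Pearson correlation, average linkage and a stopping threshold, and uses made-up series named after the elements above purely for illustration; it does not reproduce the study's data or software. Constant series (zero variance) are not handled.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cluster_variables(data, threshold):
    """Average-linkage agglomerative clustering of variables with distance 1 - r."""
    names = list(data)
    dist = {frozenset((a, b)): 1 - pearson(data[a], data[b])
            for a, b in combinations(names, 2)}
    clusters = [[name] for name in names]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = sum(dist[frozenset((a, b))] for a in clusters[i] for b in clusters[j])
            d /= len(clusters[i]) * len(clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break  # closest pair is too dissimilar: stop merging
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Made-up measurements, one value per sampling campaign.
data = {"Ca": [1, 2, 3, 4, 5], "Mg": [2, 4, 6, 8, 10.5],
        "Al": [5, 3, 4, 1, 2], "TR": [10, 6, 8, 2, 4]}
print(cluster_variables(data, 0.5))  # [['Ca', 'Mg'], ['Al', 'TR']]
```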

  13. Operational analysis of the tailings bund wall drainage system at mirny ore mining and processing enterprise

    Directory of Open Access Journals (Sweden)

    Aniskin Nikolay Alekseevich

    2016-12-01

    Full Text Available Issues of environmental safety of the tailings of ore mining and processing enterprises are considered; the drainage parameters of bund walls are of great significance for environmental safety. A description of the bund wall of the Mirny ore mining and processing enterprise and of the tailings filling layouts is given. Results of field observations and a model study of the tailings bund wall drainage system at the Mirny ore mining and processing enterprise are presented. The drainage system rebuilding project was analyzed, and proposals for its improvement were put forward.

  14. Analysis of Learning Development With Sugeno Fuzzy Logic And Clustering

    Directory of Open Access Journals (Sweden)

    Maulana Erwin Saputra

    2017-06-01

    Full Text Available In this first paper, I attempt to analyze the factors affecting student achievement, which naturally vary from school to school, because students are central to the success of educational organizations. Students' mental state, emotions and behavior influence their learning performance. Fuzzy logic can be applied in various fields, and clustering can be used for grouping, as in this analysis of learning development. The process is performed on students based on the symptoms observed. This research uses fuzzy logic and clustering. Fuzzy logic deals with uncertainty, but its advantage is that it supports linguistic reasoning, so its design does not require complicated mathematical equations. The clustering method used is K-Means, in which the data are partitioned into k groups (k = 1, 2, 3, ..., k) to find the optimal number of performance groups. In the results, questionnaire responses entered into MATLAB produce values that are displayed as graphs, making it easier for the school to assess student performance in the learning process using certain criteria. The system thus provides results to support the school's decision-making.

  15. IGSA: Individual Gene Sets Analysis, including Enrichment and Clustering.

    Science.gov (United States)

    Wu, Lingxiang; Chen, Xiujie; Zhang, Denan; Zhang, Wubing; Liu, Lei; Ma, Hongzhe; Yang, Jingbo; Xie, Hongbo; Liu, Bo; Jin, Qing

    2016-01-01

    Analysis of gene sets has been widely applied in various high-throughput biological studies. One weakness of traditional methods is that they neglect the heterogeneity of gene expression across samples, which may lead to the omission of some specific and important gene sets. It is also difficult for them to reflect the severity of disease and to provide expression profiles of gene sets for individuals. We developed an application called IGSA that provides powerful analytical capacity for gene set enrichment and sample clustering. IGSA calculates gene set expression scores for each sample and uses an accumulating clustering strategy that gathers the samples into sets according to the progression of disease from mild to severe. We evaluated the performance of IGSA on gastric, pancreatic and ovarian cancer data sets. We also compared the KEGG pathway enrichment results of IGSA with those of DAVID, GSEA, SPIA and ssGSEA, and analyzed the results of IGSA clustering with different similarity measures. Notably, IGSA proved more sensitive and specific in finding significant pathways, and can indicate pathway changes related to disease severity. In addition, IGSA provides a profile of significant gene sets for each sample.
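
    A per-sample gene set score can be illustrated by z-scoring each gene across samples and averaging over the genes in the set. This is an assumed, generic scoring scheme for illustration only; IGSA's own scoring and its accumulating clustering strategy are not reproduced here.

```python
import statistics


def gene_set_scores(expr, gene_set):
    """Per-sample score: mean z-score (across samples) of the genes in gene_set."""
    members = [g for g in gene_set if g in expr]
    n_samples = len(next(iter(expr.values())))
    z = {}
    for g in members:
        mu = statistics.mean(expr[g])
        sd = statistics.pstdev(expr[g])
        z[g] = [(x - mu) / sd if sd else 0.0 for x in expr[g]]  # constant genes score 0
    return [statistics.mean(z[g][s] for g in members) for s in range(n_samples)]


# Hypothetical expression matrix: gene -> values in samples 1..3.
expr = {"G1": [1.0, 2.0, 3.0], "G2": [2.0, 4.0, 6.0], "G3": [5.0, 5.0, 5.0]}
print([round(s, 3) for s in gene_set_scores(expr, {"G1", "G2"})])  # [-1.225, 0.0, 1.225]
```

    Scores like these give one value per sample and gene set, which is what allows samples, rather than whole groups, to be clustered and ordered by severity.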

  16. Segmentation of Residential Gas Consumers Using Clustering Analysis

    Directory of Open Access Journals (Sweden)

    Marta P. Fernandes

    2017-12-01

    Full Text Available The growing environmental concerns and liberalization of energy markets have resulted in an increased competition between utilities and a strong focus on efficiency. To develop new energy efficiency measures and optimize operations, utilities seek new market-related insights and customer engagement strategies. This paper proposes a clustering-based methodology to define the segmentation of residential gas consumers. The segments of gas consumers are obtained through a detailed clustering analysis using smart metering data. Insights are derived from the segmentation, where the segments result from the clustering process and are characterized based on the consumption profiles, as well as according to information regarding consumers’ socio-economic and household key features. The study is based on a sample of approximately one thousand households over one year. The representative load profiles of consumers are essentially characterized by two evident consumption peaks, one in the morning and the other in the evening, and an off-peak consumption. Significant insights can be derived from this methodology regarding typical consumption curves of the different segments of consumers in the population. This knowledge can assist energy utilities and policy makers in the development of consumer engagement strategies, demand forecasting tools and in the design of more sophisticated tariff systems.

  17. Prediction accident triangle in maintenance of underground mine facilities using Poisson distribution analysis

    Science.gov (United States)

    Khuluqi, M. H.; Prapdito, R. R.; Sambodo, F. P.

    2018-04-01

    In Indonesia, mining is categorized as a hazardous industry. In recent years, a dramatic increase in mining equipment and technological complexity has resulted in higher maintenance expectations, accompanied by changes in working conditions, especially regarding safety. Ensuring safety while conducting maintenance work in underground mines is important as an integral part of accident prevention programs. The accident triangle has helped safety practitioners draw up road maps for preventing accidents. The Poisson distribution is appropriate for analyzing accidents at a specific site over a given time period. Based on the analysis of accident statistics for underground mine maintenance at PT. Freeport Indonesia from 2011 through 2016, the new accident-triangle ratios were found to be 12 minor accidents and 66 equipment-damage incidents for every major accident. The results can be used to improve future accident prevention programs.
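
    Under a Poisson model, the number of accidents in a fixed period has probability P(X = k) = λ^k e^(-λ) / k!, where the rate λ can be estimated as the mean count per period. A minimal sketch with hypothetical counts (not the PT. Freeport Indonesia data):

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def estimate_rate(counts):
    """Maximum-likelihood estimate of lam: the mean count per period."""
    return sum(counts) / len(counts)


def prob_at_least_one(lam):
    """P(X >= 1) = 1 - P(X = 0)."""
    return 1.0 - math.exp(-lam)


# Hypothetical yearly counts of major accidents (not the study's data).
lam = estimate_rate([1, 0, 2, 1, 0, 2])
print(lam, round(prob_at_least_one(lam), 3))  # 1.0 0.632
```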

  18. Feasibility Study of Parallel Finite Element Analysis on Cluster-of-Clusters

    Science.gov (United States)

    Muraoka, Masae; Okuda, Hiroshi

    With the rapid growth of WAN infrastructure and the development of Grid middleware, it has become a realistic and attractive methodology to connect cluster machines over a wide-area network to execute computation-demanding applications. However, many existing parallel finite element (FE) applications have been designed and developed with a single computing resource in mind, since such applications require frequent synchronization and communication among processes. So far, there have been few FE applications that can exploit the distributed environment. In this study, we explore the feasibility of FE applications on the cluster-of-clusters. First, we classify FE applications into two types, tightly coupled applications (TCA) and loosely coupled applications (LCA), based on their communication pattern. A prototype of each application is implemented on the cluster-of-clusters. We perform numerical experiments executing TCA and LCA on both the cluster-of-clusters and a single cluster. Through these experiments, by comparing the performance and communication cost in each case, we evaluate the feasibility of FEA on the cluster-of-clusters.

  19. Archetypal analysis for machine learning and data mining

    DEFF Research Database (Denmark)

    Mørup, Morten; Hansen, Lars Kai

    2012-01-01

    of the observed data. We further demonstrate that the AA model is relevant for feature extraction and dimensionality reduction for a large variety of machine learning problems taken from computer vision, neuroimaging, chemistry, text mining and collaborative filtering leading to highly interpretable...

  20. Mining the archives: a cross-platform analysis of gene ...

    Science.gov (United States)

    Formalin-fixed paraffin-embedded (FFPE) tissue samples represent a potentially invaluable resource for genomic research into the molecular basis of disease. However, use of FFPE samples in gene expression studies has been limited by technical challenges resulting from degradation of nucleic acids. Here we evaluated gene expression profiles derived from fresh-frozen (FRO) and FFPE mouse liver tissues using two DNA microarray protocols and two whole transcriptome sequencing (RNA-seq) library preparation methodologies. The ribo-depletion protocol outperformed the other three methods by having the highest correlations of differentially expressed genes (DEGs) and best overlap of pathways between FRO and FFPE groups. We next tested the effect of sample time in formalin (18 hours or 3 weeks) on gene expression profiles. Hierarchical clustering of the datasets indicated that test article treatment, and not preservation method, was the main driver of gene expression profiles. Meta- and pathway analyses indicated that biological responses were generally consistent for 18-hour and 3-week FFPE samples compared to FRO samples. However, clear erosion of signal intensity with time in formalin was evident, and DEG numbers differed by platform and preservation method. Lastly, we investigated the effect of age in FFPE block on genomic profiles. RNA-seq analysis of 8-, 19-, and 26-year-old control blocks using the ribo-depletion protocol resulted in comparable quality metrics, inc

  1. RHSEG and Subdue: Background and Preliminary Approach for Combining these Technologies for Enhanced Image Data Analysis, Mining and Knowledge Discovery

    Science.gov (United States)

    Tilton, James C.; Cook, Diane J.

    2008-01-01

    Under a project recently selected for funding by NASA's Science Mission Directorate under the Applied Information Systems Research (AISR) program, Tilton and Cook will design and implement the integration of the Subdue graph based knowledge discovery system, developed at the University of Texas Arlington and Washington State University, with image segmentation hierarchies produced by the RHSEG software, developed at NASA GSFC, and perform pilot demonstration studies of data analysis, mining and knowledge discovery on NASA data. Subdue represents a method for discovering substructures in structural databases. Subdue is devised for general-purpose automated discovery, concept learning, and hierarchical clustering, with or without domain knowledge. Subdue was developed by Cook and her colleague, Lawrence B. Holder. For Subdue to be effective in finding patterns in imagery data, the data must be abstracted up from the pixel domain. An appropriate abstraction of imagery data is a segmentation hierarchy: a set of several segmentations of the same image at different levels of detail in which the segmentations at coarser levels of detail can be produced from simple merges of regions at finer levels of detail. The RHSEG program, a recursive approximation to a Hierarchical Segmentation approach (HSEG), can produce segmentation hierarchies quickly and effectively for a wide variety of images. RHSEG and HSEG were developed at NASA GSFC by Tilton. In this presentation we provide background on the RHSEG and Subdue technologies and present a preliminary analysis on how RHSEG and Subdue may be combined to enhance image data analysis, mining and knowledge discovery.

  2. Cluster analysis in systems of magnetic spheres and cubes

    Science.gov (United States)

    Pyanzina, E. S.; Gudkova, A. V.; Donaldson, J. G.; Kantorovich, S. S.

    2017-06-01

    In the present work we use molecular dynamics simulations and graph-theory-based cluster analysis to compare self-assembly in systems of magnetic spheres and of magnetic cubes, where for the cubes the dipole moment is oriented along the side of the cube in the [001] crystallographic direction. We show that under the same conditions cubes aggregate far less than their spherical counterparts. This difference can be explained in terms of the volume of phase space in which bond formation is thermodynamically advantageous: this volume is much larger for a dipolar sphere than for a dipolar cube.
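
    Graph-theory-based cluster analysis of this kind typically treats particles as vertices, joins "bonded" pairs with edges, and reports the connected components as clusters. The sketch below assumes a simple distance cutoff as the bond criterion (the paper may well use an energy-based criterion) and ignores periodic boundaries; the function names are illustrative, not taken from the paper.

```python
import numpy as np

def find(parent, i):
    # Union-find root lookup with path halving.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def clusters_by_cutoff(positions, cutoff):
    """Connected components of the graph whose edges join particle pairs
    closer than `cutoff` (an assumed bond criterion)."""
    n = len(positions)
    parent = list(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(positions[i] - positions[j]) < cutoff:
                ri, rj = find(parent, i), find(parent, j)
                if ri != rj:
                    parent[ri] = rj          # merge the two clusters
    groups = {}
    for i in range(n):
        groups.setdefault(find(parent, i), []).append(i)
    return sorted(groups.values(), key=len, reverse=True)

pos = np.array([[0.0, 0, 0], [1.0, 0, 0], [2.0, 0, 0], [10.0, 0, 0]])
print(clusters_by_cutoff(pos, cutoff=1.5))  # [[0, 1, 2], [3]]
```

    A chain counts as one cluster even though its end particles are farther apart than the cutoff, which is the usual connected-component definition of an aggregate.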

  3. Cluster analysis of activity-time series in motor learning

    DEFF Research Database (Denmark)

    Balslev, Daniela; Nielsen, Finn Årup; Frutiger, Sally A.

    2002-01-01

    Neuroimaging studies of learning focus on brain areas where the activity changes as a function of time. To circumvent the difficult problem of model selection, we used a data-driven analytic tool, cluster analysis, which extracts representative temporal and spatial patterns from the voxel...... practice-related activity in a fronto-parieto-cerebellar network, in agreement with previous studies of motor learning. These voxels were separated from a group of voxels showing an unspecific time-effect and another group of voxels, whose activation was an artifact from smoothing. Hum. Brain Mapping 15...

  4. A cluster analysis on road traffic accidents using genetic algorithms

    Science.gov (United States)

    Saharan, Sabariah; Baragona, Roberto

    2017-04-01

    The analysis of road traffic accidents is increasingly important because of the cost of accidents and public road safety. The availability of large data sets makes it viable to study the factors that affect the frequency and severity of accidents. However, the data are often highly unbalanced and overlapping. We deal with the data set of road traffic accidents recorded in Christchurch, New Zealand, from 2000 to 2009, a total of 26,440 accidents. The data are binary, with 50 road traffic accident factors and four levels of severity. We used a genetic algorithm for the analysis because we are dealing with a large unbalanced data set, for which standard clustering such as the k-means algorithm may not be suitable. Genetic algorithm-based clustering for unknown K (GCUK) was used to identify the factors associated with accidents of different levels of severity. The results provide interesting insight into the relationship between factors and accident severity level, and suggest that the two main factors contributing to fatal accidents are "Speed greater than 60 km/h" and "Did not see other people until it was too late". A comparison with the k-means algorithm and independent component analysis is performed to validate the results.

  5. Adaptation of chemical methods of analysis to the matrix of pyrite-acidified mining lakes

    International Nuclear Information System (INIS)

    Herzsprung, P.; Friese, K.

    2000-01-01

    Owing to the unusual matrix of pyrite-acidified mining lakes, the analysis of chemical parameters may be difficult. A number of methodological improvements have been developed so far, and a comprehensive validation of methods is envisaged. The adaptation of the available methods to small-volume samples of sediment pore waters and the adaptation of sensitivity to the expected concentration ranges is an important element of the methods applied in analyses of biogeochemical processes in mining lakes [de

  6. Physicochemical properties of different corn varieties by principal components analysis and cluster analysis

    International Nuclear Information System (INIS)

    Zeng, J.; Li, G.; Sun, J.

    2013-01-01

    Principal components analysis and cluster analysis were used to investigate the properties of different corn varieties. The chemical compositions and some properties of corn flour processed by dry milling were determined. The results showed that the chemical compositions and physicochemical properties differed significantly among twenty-six corn varieties. The quality of corn flour was associated with five principal components from principal component analysis, in which starch pasting properties made an important contribution, accounting for 48.90%. The twenty-six corn varieties could be classified into four groups by cluster analysis. The consistency between principal components analysis and cluster analysis indicated that multivariate analyses are feasible in the study of corn variety properties. (author)

  7. Cluster analysis of autoantibodies in 852 patients with systemic lupus erythematosus from a single center.

    Science.gov (United States)

    Artim-Esen, Bahar; Çene, Erhan; Şahinkaya, Yasemin; Ertan, Semra; Pehlivan, Özlem; Kamali, Sevil; Gül, Ahmet; Öcal, Lale; Aral, Orhan; Inanç, Murat

    2014-07-01

    Associations between autoantibodies and clinical features have been described in systemic lupus erythematosus (SLE). Herein, we aimed to define autoantibody clusters and their clinical correlations in a large cohort of patients with SLE. We analyzed 852 patients with SLE who attended our clinic. Seven autoantibodies were selected for cluster analysis: anti-DNA, anti-Sm, anti-RNP, anticardiolipin (aCL) immunoglobulin (Ig)G or IgM, lupus anticoagulant (LAC), anti-Ro, and anti-La. Two-step clustering and Kaplan-Meier survival analyses were used. Five clusters were identified: one of patients with only anti-dsDNA antibodies, one with anti-Sm and anti-RNP, one with aCL IgG/M and LAC, and one with anti-Ro and anti-La antibodies. The fifth cluster consisted of patients who did not belong to any of the clusters formed by the antibodies chosen for cluster analysis. The Sm/RNP cluster had a significantly higher incidence of pulmonary hypertension and Raynaud phenomenon. The dsDNA cluster had the highest incidence of renal involvement. The aCL/LAC cluster had significantly more patients with neuropsychiatric involvement, antiphospholipid syndrome, autoimmune hemolytic anemia, and thrombocytopenia. According to the Systemic Lupus International Collaborating Clinics damage index, damage was most frequent in the aCL/LAC cluster. Comparison of 10- and 20-year survival showed reduced survival in the aCL/LAC cluster. This study supports the existence of autoantibody clusters with distinct clinical features in SLE and shows that forming clinical subsets according to autoantibody clusters may be useful in predicting the outcome of the disease. Autoantibody clusters in SLE may differ according to the clinical setting or population.

  8. [Typologies of Madrid's citizens (Spain) at the end-of-life: cluster analysis].

    Science.gov (United States)

    Ortiz-Gonçalves, Belén; Perea-Pérez, Bernardo; Labajo González, Elena; Albarrán Juan, Elena; Santiago-Sáez, Andrés

    2018-03-06

    To establish typologies of Madrid's citizens (Spain) with regard to the end of life by cluster analysis. The SPAD 8 programme was applied to a sample from a health care centre in the autonomous region of Madrid (Spain). A multiple correspondence analysis was performed, followed by a cluster analysis to create a dendrogram. A cross-sectional study had been carried out beforehand using the results of the questionnaire. Five clusters stand out. Cluster 1: a group who preferred not to answer numerous questions (5%). Cluster 2: in favour of receiving palliative care and euthanasia (40%). Cluster 3: would oppose assisted suicide and would not ask for spiritual assistance (15%). Cluster 4: would like to receive palliative care and assisted suicide (16%). Cluster 5: would oppose assisted suicide and would ask for spiritual assistance (24%). The clusters also shared the following features. Clusters 2 and 4 would like to receive palliative care, together with euthanasia (2) and assisted suicide (4). Clusters 4 and 5 regularly practised their faith, and their family members did not receive palliative care. Clusters 3 and 5 would be opposed to euthanasia and, in particular, assisted suicide. Clusters 2, 4 and 5 had not completed an advance directive document. Clusters 2 and 3 seldom practised their faith. This study could be taken into consideration to improve the quality of end-of-life care choices. Copyright © 2017 SESPAS. Published by Elsevier España, S.L.U. All rights reserved.

  9. Coal Mine Permit Boundaries

    Data.gov (United States)

    Earth Data Analysis Center, University of New Mexico — ESRI ArcView shapefile depicting New Mexico coal mines permitted under the Surface Mining Control and Reclamation Act of 1977 (SMCRA), by either the NM Mining these...

  10. Reliability analysis of cluster-based ad-hoc networks

    International Nuclear Information System (INIS)

    Cook, Jason L.; Ramirez-Marquez, Jose Emmanuel

    2008-01-01

    The mobile ad-hoc wireless network (MAWN) is a new and emerging network scheme that is being employed in a variety of applications. The MAWN differs from traditional networks because it is self-forming and dynamic. The MAWN is free of infrastructure; only the mobile nodes comprise the network. Pairs of nodes communicate either directly or through other nodes, so each node acts, in turn, as a source, destination, and relay of messages. This flexibility is the virtue of a MAWN, but the same feature is what makes reliability analysis challenging. The variability and volatility of the MAWN configuration make typical reliability methods (e.g. reliability block diagrams) inappropriate, because no single structure or configuration represents all manifestations of a MAWN. For this reason, new methods are being developed to analyze the reliability of this networking technology. Recently published methods address this feature by treating the configuration probabilistically or by embedding mobility models. This paper combines both approaches and extends them by modifying the problem formulation to address the reliability analysis of a cluster-based MAWN, which is deployed in applications with constrained networking resources such as bandwidth and energy. The paper presents the problem formulation, a discussion of reliability metrics applicable to the MAWN, and an illustration of a Monte Carlo simulation method through the analysis of several example networks.
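
    A rough Monte Carlo sketch of two-terminal reliability for a plain (not cluster-based) MAWN. It assumes nodes placed uniformly at random in a unit square and a fixed radio range; these are simplifying assumptions, not the paper's mobility model or metrics, and the function names are hypothetical. Each trial draws one configuration and checks, by breadth-first search over nodes acting as relays, whether node 0 can reach node 1.

```python
import math
import random
from collections import deque

def connected(nodes, src, dst, radio_range):
    """Breadth-first search over the unit-disk graph: two nodes can talk
    directly when they are within radio_range of each other."""
    seen = {src}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return True
        for v in range(len(nodes)):
            if v not in seen and math.dist(nodes[u], nodes[v]) <= radio_range:
                seen.add(v)
                queue.append(v)
    return False

def mc_reliability(n_nodes, radio_range, trials=10_000, seed=0):
    """Estimate P(node 0 can reach node 1) over random node placements
    in a unit square (an assumed snapshot of the mobile network)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        nodes = [(rng.random(), rng.random()) for _ in range(n_nodes)]
        hits += connected(nodes, 0, 1, radio_range)
    return hits / trials

print(mc_reliability(n_nodes=20, radio_range=0.3, trials=2_000))
```

    The estimate's standard error shrinks roughly as 1/sqrt(trials), which is the usual trade-off when choosing the number of simulated configurations.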

  11. Shape Analysis of HII Regions - I. Statistical Clustering

    Science.gov (United States)

    Campbell-White, Justyn; Froebrich, Dirk; Kume, Alfred

    2018-04-01

    We present our shape analysis method for a sample of 76 Galactic HII regions from MAGPIS 1.4 GHz data. The main goal is to determine whether the physical properties and initial conditions of massive star cluster formation are linked to the shape of the regions. We outline a systematic procedure for extracting region shapes and perform hierarchical clustering on the shape data. We identified six groups that categorise HII regions by common morphologies. We confirmed the validity of these groupings by bootstrap re-sampling and by the ordination technique of multidimensional scaling. We then investigated associations between physical parameters and the assigned groups. Location is mostly independent of group, with a small preference for regions of similar longitudes to share common morphologies. The shapes are homogeneously distributed across Galactocentric distance and latitude. One group contains regions that are all younger than 0.5 Myr and ionised by low- to intermediate-mass sources. Those in another group are all driven by intermediate- to high-mass sources. One group was distinctly separated from the other five and contained regions at the surface brightness detection limit for the survey. We find that our hierarchical procedure is most sensitive to the spatial sampling resolution used, which is determined for each region from its distance. We discuss how these errors can be further quantified and reduced in future work by utilising synthetic observations from numerical simulations of HII regions. We also outline how this shape analysis has further applications to other diffuse astronomical objects.

  12. Time series clustering analysis of health-promoting behavior

    Science.gov (United States)

    Yang, Chi-Ta; Hung, Yu-Shiang; Deng, Guang-Feng

    2013-10-01

    Health promotion must be emphasized to achieve the World Health Organization goal of health for all. Because the global population is aging rapidly, the ComCare elder health-promoting service was developed by the Taiwan Institute for Information Industry in 2011. Based on the Pender health promotion model, the ComCare service offers five categories of health-promoting functions to address the everyday needs of seniors: nutrition management, social support, exercise management, health responsibility, and stress management. To assess the overall ComCare service and to improve understanding of the health-promoting behavior of elders, this study analyzed health-promoting behavioral data automatically collected by the ComCare monitoring system. In 30,638 session records collected for 249 elders from January 2012 to March 2013, behavior patterns were identified by a fuzzy c-means time series clustering algorithm combined with autocorrelation-based representation schemes. The analysis showed that the time series data for elder health-promoting behavior can be classified into four clusters, each revealing different health-promoting needs, frequencies, function numbers and behaviors. The results can assist policymakers, health-care providers, and experts in medicine, public health, nursing and psychology, and have been provided to the Taiwan National Health Insurance Administration to assess elder health-promoting behavior.
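
    The clustering step described above can be sketched as follows: each elder's session series is mapped to a vector of autocorrelations, and the vectors are grouped by fuzzy c-means. This is a generic NumPy implementation of Bezdek's fuzzy c-means with fuzzifier m = 2, not the ComCare code; the function names, the choice of lags, and the tolerance are assumptions.

```python
import numpy as np

def acf_features(series, max_lag):
    """Autocorrelation at lags 1..max_lag as a fixed-length representation."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])

def fuzzy_c_means(X, c, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Fuzzy c-means on the rows of X (n_samples, n_features).
    Returns (centers, U), where U[i, k] is the membership of sample i in cluster k."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)              # each row of memberships sums to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                      # guard against a point sitting on a center
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    W = U ** m                                     # centers consistent with the final U
    centers = (W.T @ X) / W.sum(axis=0)[:, None]
    return centers, U

# Hypothetical usage: rows of `series` would be per-elder activity counts.
# X = np.vstack([acf_features(s, max_lag=7) for s in series])
# centers, U = fuzzy_c_means(X, c=4)
```

    Hard labels, if needed, are `U.argmax(axis=1)`. A constant series has zero variance and yields NaN autocorrelations, so such series would need to be filtered out first.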

  13. Redundancy in electronic health record corpora: analysis, impact on text mining performance and mitigation strategies.

    Science.gov (United States)

    Cohen, Raphael; Elhadad, Michael; Elhadad, Noémie

    2013-01-16

    The increasing availability of Electronic Health Record (EHR) data, and specifically of free-text patient notes, presents opportunities for phenotype extraction. Text-mining methods in particular can help disease modeling by mapping named-entity mentions to terminologies and clustering semantically related terms. EHR corpora, however, exhibit specific statistical and linguistic characteristics when compared with corpora in the biomedical literature domain. We focus on copy-and-paste redundancy: clinicians typically copy and paste information from previous notes when documenting a current patient encounter. Thus, within a longitudinal patient record, one expects to observe heavy redundancy. In this paper, we ask three research questions: (i) How can redundancy be quantified in large-scale text corpora? (ii) Conventional wisdom is that larger corpora yield better results in text mining, but how does the observed EHR redundancy affect text mining? Does such redundancy introduce a bias that distorts learned models? Or does it introduce benefits by highlighting stable and important subsets of the corpus? (iii) How can one mitigate the impact of redundancy on text mining? We analyze a large-scale EHR corpus and quantify redundancy in terms of both word and semantic concept repetition. We observe redundancy levels of about 30% and non-standard distributions of both words and concepts. We measure the impact of redundancy on two standard text-mining applications: collocation identification and topic modeling. We compare the results of these methods on synthetic data with controlled levels of redundancy and observe significant performance variation. Finally, we compare two mitigation strategies to avoid redundancy-induced bias: (i) a baseline strategy, keeping only the last note for each patient in the corpus; (ii) removing redundant notes with an efficient fingerprinting-based algorithm. (a) For text mining, preprocessing the EHR corpus with fingerprinting yields
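
    The abstract does not describe the fingerprinting algorithm itself. As a hedged illustration of the general idea only, the sketch below fingerprints each note by hashed word 5-grams and drops a note when its Jaccard overlap with an already kept note reaches a threshold; the n-gram size, the threshold, and the function names are all assumptions rather than the authors' method.

```python
import hashlib

def fingerprints(text, n=5):
    """Set of hashed word n-grams (shingles) for one note."""
    words = text.lower().split()
    grams = (" ".join(words[i:i + n]) for i in range(max(len(words) - n + 1, 1)))
    return {hashlib.md5(g.encode("utf-8")).hexdigest()[:16] for g in grams}

def jaccard(a, b):
    """Overlap of two fingerprint sets; two empty sets count as identical."""
    return len(a & b) / len(a | b) if a or b else 1.0

def drop_redundant(notes, threshold=0.8, n=5):
    """Keep a note only if its overlap with every previously kept note
    stays below `threshold` (a hypothetical cut-off)."""
    kept, kept_fps = [], []
    for note in notes:
        fp = fingerprints(note, n)
        if all(jaccard(fp, other) < threshold for other in kept_fps):
            kept.append(note)
            kept_fps.append(fp)
    return kept
```

    Pairwise comparison is quadratic in the number of notes per patient; for large corpora, a sketching technique such as MinHash is a common way to approximate the Jaccard comparisons.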

  14. Quantitative analysis of the taxation of uranium mines in Australia and Canada

    International Nuclear Information System (INIS)

    Barnett, D.W.; Anderson, D.L.

    1984-01-01

    The degree of neutrality of a tax policy is a gauge of how willing a government is to share in the risk of mineral development. This paper analyzes the practical characteristics of the uranium taxation policies of the Northern Territory in Australia and Saskatchewan in Canada. It superimposes these two policies on a large Australian uranium mine, based on the Ranger mine, and on a slightly larger Canadian mine, based on the Key Lake mine. The analysis focuses on the impact on the net-present-value of the producers' returns, the sharing of economic rent between the arms of government and the producer, and on the apparent neutrality of the tax policies. 24 references, 6 figures

  15. Analysis of radon reduction by ventilation in uranium mines in China

    International Nuclear Information System (INIS)

    Hu Penghua; Li Xianjie

    2011-01-01

    Mine ventilation is the most important way to reduce radon in uranium mines. At present, the concentrations of radon and its daughters in the underground air of Chinese uranium mines are 3-5 times higher than those in other countries under the same protection conditions. In this paper, by analyzing the status of radon reduction in Chinese uranium mines and comparing the advantages and shortcomings of various ventilation and radon reduction measures, the reasons for the higher radon and radon daughter concentrations are discussed, and problems are identified in three areas: radon reduction ventilation theory, measures, and management. Based on these problems, the paper proposes measures such as strengthening examination, verification and monitoring of the practical situation, drawing up a clear ventilation plan, training ventilation technicians, enhancing ventilation system management, developing radon reduction ventilation research, and putting ventilation equipment in place as soon as possible. (authors)

  16. Numerical Analysis on Failure Modes and Mechanisms of Mine Pillars under Shear Loading

    Directory of Open Access Journals (Sweden)

    Tianhui Ma

    2016-01-01

    Full Text Available Severe damage occurs frequently in mine pillars subjected to shear stresses. The empirical design charts or formulas for mine pillars are not applicable to orebodies under shear. In this paper, the failure process of pillars under shear stresses was investigated by numerical simulations using the rock failure process analysis software (RFPA2D). The simulation results indicate that the strength of mine pillars and the corresponding failure mode vary with the width-to-height ratio and the dip angle. With increasing dip angle, stress concentration first occurs at the intersection between the pillar and the roof, leading to the formation of microcracks. Damage gradually develops from the surface to the core of the pillar. The damage process is tracked with acoustic emission monitoring. The study can provide an effective means for understanding the failure mechanism and for the planning and design of mine pillars.

  17. Cluster, adaptation and extroversion : a cognitive and entrepreneurial analysis of the Marche music cluster

    NARCIS (Netherlands)

    Tappi, D.

    2005-01-01

    Over recent decades, clusters like industrial districts have increasingly attracted attention in economic debate. The study of clusters, particularly in the Italian literature, highlights the inadequacy of the mainstream body of explanation to provide a theory of the emergence and transformation

  18. Mine Clearance Industry: Background, Geography, Funding, Analysis and Future Projections

    Science.gov (United States)

    2007-12-01

    and against the Zulu. In Sudan, during the defense of Khartoum, British officers believed that landmines were an effective form of defense... parties from attack during the Zulu Wars (1879). During the Boer War (1899-1902) the British used mines to protect a railway; a Royal Engineer... to fight the landmine problem was the Hazardous Area Life-Support Organization (HALO Trust) in 1988. The founder of HALO Trust was former British

  19. Phenotypes Determined by Cluster Analysis in Moderate to Severe Bronchial Asthma.

    Science.gov (United States)

    Youroukova, Vania M; Dimitrova, Denitsa G; Valerieva, Anna D; Lesichkova, Spaska S; Velikova, Tsvetelina V; Ivanova-Todorova, Ekaterina I; Tumangelova-Yuzeir, Kalina D

    2017-06-01

    Bronchial asthma is a heterogeneous disease that includes various subtypes. They may share similar clinical characteristics, but probably have different pathological mechanisms. To identify phenotypes using cluster analysis in moderate to severe bronchial asthma and to compare differences in clinical, physiological, immunological and inflammatory data between the clusters. Forty adult patients with moderate to severe bronchial asthma, outside of exacerbation, were included. All underwent clinical assessment, anthropometric measurements, skin prick testing, standard spirometry and measurement of the fraction of exhaled nitric oxide. Blood eosinophil count, serum total IgE and periostin levels were determined. A two-step cluster approach, hierarchical clustering and k-means analysis were used to identify the clusters. We identified four clusters: Cluster 1 (n=14), late-onset, non-atopic asthma with impaired lung function; Cluster 2 (n=13), late-onset, atopic asthma; Cluster 3 (n=6), late-onset, aspirin-sensitive, eosinophilic asthma; and Cluster 4 (n=7), early-onset, atopic asthma. Our study is the first in Bulgaria to apply cluster analysis to asthmatic patients. The variables with the greatest power of differentiation in our study were age of asthma onset, duration of disease, atopy, smoking, blood eosinophils, nonsteroidal anti-inflammatory drug hypersensitivity, baseline FEV1/FVC and symptom severity. Our results support the concept of heterogeneity of bronchial asthma and demonstrate that cluster analysis can be a useful tool for phenotyping the disease and for a personalized approach to patient treatment.

  20. Assessment of genetic divergence in tomato through agglomerative hierarchical clustering and principal component analysis

    International Nuclear Information System (INIS)

    Iqbal, Q.; Saleem, M.Y.; Hameed, A.; Asghar, M.

    2014-01-01

    For the improvement of qualitative and quantitative traits, the existence of variability is of prime importance in plant breeding. Data on different morphological and reproductive traits of 47 tomato genotypes were analyzed by correlation, agglomerative hierarchical clustering and principal component analysis (PCA) to select genotypes and traits for a future breeding program. Correlation analysis revealed a significant positive association between yield and yield components such as fruit diameter, single fruit weight and number of fruits per plant. Principal component (PC) analysis showed that the first three PCs, with eigenvalues greater than 1, contributed 81.72% of the total variability for the different traits. PC-1 showed positive factor loadings for all traits except number of fruits per plant, and the contributions of single fruit weight and fruit diameter were highest in PC-1. Cluster analysis grouped all genotypes into five divergent clusters. The genotypes in cluster-II and cluster-V exhibited uniform maturity and higher yield. The D² statistics confirmed the greatest distance between cluster-III and cluster-V, while maximum similarity was observed between cluster-II and cluster-III. It is therefore suggested that crosses between genotypes of cluster-II and cluster-V and those of cluster-I and cluster-III may exhibit heterosis in F1 for hybrid breeding and allow selection of superior genotypes in succeeding generations for a cross-breeding programme. (author)
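
    The PCA-then-clustering workflow can be sketched with NumPy and SciPy: principal components are taken from the correlation matrix and retained when their eigenvalue exceeds 1 (the Kaiser criterion applied above), and genotypes are grouped by agglomerative clustering. Ward linkage on standardized traits is an assumption, since the abstract does not state the linkage or distance used, and the D² statistics are not reproduced; the function names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def kaiser_pca(X):
    """PCA on the correlation matrix; keep components with eigenvalue > 1.
    Returns (scores, all eigenvalues sorted descending, share of variance kept)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]              # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0
    explained = eigvals[keep].sum() / eigvals.sum()
    return Z @ eigvecs[:, keep], eigvals, explained

def ward_groups(X, n_groups):
    """Agglomerative (Ward) clustering of standardized traits, cut into n_groups."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    return fcluster(linkage(Z, method="ward"), t=n_groups, criterion="maxclust")

# Hypothetical data: 47 genotypes x 8 traits of random values, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(47, 8))
scores, eigvals, explained = kaiser_pca(X)
labels = ward_groups(X, n_groups=5)
print(scores.shape, round(explained, 3), np.bincount(labels)[1:])
```

    Because the correlation matrix has unit diagonal, its eigenvalues sum to the number of traits, so "eigenvalue > 1" keeps components that explain more than an average single trait.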

  1. Numerical analysis of the resonance mechanism of the lumped parameter system model for acoustic mine detection

    International Nuclear Information System (INIS)

    Wang Chi; Zhou Yu-Qiu; Shen Gao-Wei; Wu Wen-Wen; Ding Wei

    2013-01-01

    The method of numerical analysis is employed to study the resonance mechanism of the lumped parameter system model for acoustic mine detection. Based on the basic principle of the acoustic resonance technique for mine detection and the characteristics of low-frequency acoustics, the "soil-mine" system can be represented by an equivalent damped "mass-spring" resonance model using a lumped parameter analysis method. The dynamic simulation software Adams is adopted to analyze the lumped parameter system model numerically. The simulated resonance and anti-resonance frequencies are 151 Hz and 512 Hz, respectively, in good agreement with the published experimentally measured resonance frequency of 155 Hz and anti-resonance frequency of 513 Hz. This validates the potential of numerical simulation for quantitatively analyzing the acoustic mine detection model. The influence of soil and mine parameters on the resonance characteristics of the soil-mine system can be investigated flexibly by changing the parameter setup. (electromagnetism, optics, acoustics, heat transfer, classical mechanics, and fluid dynamics)

  2. Sensitization trajectories in childhood revealed by using a cluster analysis

    DEFF Research Database (Denmark)

    Schoos, Ann-Marie M.; Chawes, Bo L.; Melen, Erik

    2017-01-01

    BACKGROUND: Assessment of sensitization at a single time point during childhood provides limited clinical information. We hypothesized that sensitization develops as specific patterns with respect to age at debut, development over time, and involved allergens and that such patterns might be more... biologically and clinically relevant. OBJECTIVE: We sought to explore latent patterns of sensitization during the first 6 years of life and investigate whether such patterns associate with the development of asthma, rhinitis, and eczema. METHODS: We investigated 398 children from the at-risk Copenhagen Prospective Studies on Asthma in Childhood 2000 (COPSAC2000) birth cohort with specific IgE against 13 common food and inhalant allergens at the ages of ½, 1½, 4, and 6 years. An unsupervised cluster analysis for 3-dimensional data (nonnegative sparse parallel factor analysis) was used to extract latent...

  3. A novel model for Time-Series Data Clustering Based on piecewise SVD and BIRCH for Stock Data Analysis on Hadoop Platform

    Directory of Open Access Journals (Sweden)

    Ibgtc Bowala

    2017-06-01

    Full Text Available With the rapid growth of financial markets, analysts are paying more attention to predictions. Stock data are time series data in huge amounts. A feasible solution for handling the increasing amount of data is to use a computing cluster for parallel processing, and the Hadoop parallel computing platform is a typical representative. There are various statistical models for forecasting time series data, but accurate clusters are a prerequisite. Clustering analysis of time series data is one of the main methods of mining time series data for many other analysis processes. However, general clustering algorithms cannot cluster time series data well, because such data have a special structure and high dimensionality, with highly correlated values and a high noise level. A novel model for time series clustering is presented that applies BIRCH to features obtained from piecewise SVD, a novel dimension reduction approach. Highly correlated features are handled by SVD-based dimensionality reduction that keeps the correlated behavior optimal, and BIRCH is then used for clustering. The model can handle massive time series data. Finally, the model is successfully applied to real stock time series data from Yahoo Finance with satisfactory results.
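
    A simplified single-machine sketch of the pipeline, using scikit-learn's `Birch`: the time axis is split into windows, each window is reduced by SVD to its leading component scores, and the concatenated scores are clustered. How the paper's "piecewise SVD" partitions the data is not specified in the abstract, so the windowing scheme, the rank, and the function names are assumptions, and no Hadoop distribution is shown.

```python
import numpy as np
from sklearn.cluster import Birch

def piecewise_svd_features(S, n_pieces, rank):
    """S: (n_series, length). Split the time axis into n_pieces windows and keep,
    for each window, the scores on its top-`rank` singular directions."""
    feats = []
    for block in np.array_split(S, n_pieces, axis=1):
        block = block - block.mean(axis=0)             # center each time point across series
        U, s, Vt = np.linalg.svd(block, full_matrices=False)
        feats.append(U[:, :rank] * s[:rank])           # equals block @ Vt[:rank].T
    return np.hstack(feats)

def cluster_series(S, n_clusters, n_pieces=4, rank=2):
    """Reduce each series with piecewise SVD, then cluster the features with BIRCH."""
    X = piecewise_svd_features(S, n_pieces, rank)
    return Birch(n_clusters=n_clusters).fit_predict(X)

# Synthetic example (not stock data): ten rising and ten falling noisy series.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 64)
up = 10 * t + rng.normal(0, 0.05, (10, 64))
down = -10 * t + rng.normal(0, 0.05, (10, 64))
print(cluster_series(np.vstack([up, down]), n_clusters=2))
```

    BIRCH's `threshold` (0.5 by default) is scale-dependent, so in practice the features would usually be standardized or the threshold tuned to the data.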

  4. MINE-NEC - A Game for the Analysis of Regional Water Policies in Open-Pit Lignite Mining Areas: An Improved Implementation for the NEC PC-8201A

    OpenAIRE

    Kaden, S.; Varis, O.

    1986-01-01

    The game MINE was developed for the analysis of regional water policies in open-pit lignite mining areas. It is implemented for a GDR test area. The purpose of the game is above all to teach decision makers and their staff in mining regions, so that they gain a better understanding of the complex, interrelated socio-economic processes with respect to water management in such regions. The game is designed to be played by five groups of players representing municipal and industrial water supply, a...

  5. Integrating PROOF Analysis in Cloud and Batch Clusters

    International Nuclear Information System (INIS)

    Rodríguez-Marrero, Ana Y; Fernández-del-Castillo, Enol; López García, Álvaro; Marco de Lucas, Jesús; Matorras Weinig, Francisco; González Caballero, Isidro; Cuesta Noriega, Alberto

    2012-01-01

    High Energy Physics (HEP) analyses are becoming more complex and demanding due to the large amount of data collected by current experiments. The Parallel ROOT Facility (PROOF) provides researchers with an interactive tool to speed up the analysis of huge volumes of data by exploiting parallel processing on both multicore machines and computing clusters. The typical PROOF deployment scenario is a permanent set of cores configured to run the PROOF daemons. However, this approach cannot adapt to the dynamic nature of interactive usage. Several initiatives, such as Proof on Demand (PoD) or PROOF Cluster, seek to improve the use of computing resources by integrating PROOF with a batch system. These solutions are currently in production at Universidad de Oviedo and IFCA and are positively evaluated by users. Although they are able to adapt to the computing needs of users, they must comply with the specific configuration, OS and software installed at the batch nodes. Furthermore, they share the machines with other workloads, which may disrupt the interactive service for users. These limitations make PROOF a typical use case for cloud computing. In this work we take advantage of the Cloud Infrastructure at IFCA to provide a dynamic PROOF environment where users can control the software configuration of the machines. The Proof Analysis Framework (PAF) facilitates the development of new analyses and offers transparent access to PROOF resources. Several performance measurements are presented for the different scenarios (PoD, SGE and Cloud), showing a speed improvement closely correlated with the number of cores used.

  6. Determining wood chip size: image analysis and clustering methods

    Directory of Open Access Journals (Sweden)

    Paolo Febbi

    2013-09-01

    Full Text Available One of the standard methods for determining the size distribution of wood chips is the oscillating screen method (EN 15149-1:2010). Recent literature has demonstrated that image analysis can return highly accurate measurements of the dimensions defined for each individual particle, and could support a new method based on geometrical shape to determine chip size more accurately. A sample of wood chips (8 litres) was sieved through horizontally oscillating sieves using five different screen hole diameters (3.15, 8, 16, 45, 63 mm); the wood chips were sorted into decreasing size classes, and the mass of all fractions was used to determine the particle size distribution. Since chip shape and size influence the sieving results, Wang’s theory, which concerns geometric forms, was considered. A cluster analysis on shape descriptors (Fourier descriptors) and size descriptors (area, perimeter, Feret diameters, eccentricity) was applied to observe the chip distribution. The UPGMA algorithm was applied on Euclidean distance. The resulting dendrogram shows a group separation in accordance with the original three sieving fractions. The traditional sieve results were compared with the clustering results. This preliminary result shows that the image analysis-based method has high potential for characterizing wood chip size distribution and could be investigated further. Moreover, this method could be implemented in an online detection machine for chip size characterization. The results are expected to improve with supervised multivariate methods that use known class memberships. The main objective of future work will be to shift the analysis from a 2-dimensional method to a 3-dimensional acquisition process.
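
    UPGMA (unweighted pair group method with arithmetic mean) repeatedly merges the two clusters whose average pairwise distance is smallest; the sequence of merges and their heights is what a dendrogram like the one described above draws. A minimal pure-Python sketch on Euclidean distance, assuming each chip has already been turned into a numeric descriptor vector (shape and size descriptors); the function name `upgma` is hypothetical, and the naive search is meant only to show the linkage rule, not to scale.

```python
import math
from itertools import combinations


def upgma(points):
    """Return merge steps (members_a, members_b, height) using UPGMA (average linkage)."""
    clusters = [frozenset([i]) for i in range(len(points))]

    def avg_dist(a, b):
        # Unweighted mean of all Euclidean distances between members of a and b.
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: avg_dist(*pair))
        height = avg_dist(a, b)
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((a, b, height))
    return merges
```

    For four one-dimensional descriptors at 0, 1, 10 and 11, the two close pairs merge at height 1, and the final merge joins them at the average cross-distance (10 + 11 + 9 + 10) / 4 = 10.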

  7. Using Dynamic Fourier Analysis to Discriminate Between Seismic Signals from Natural Earthquakes and Mining Explosions

    Directory of Open Access Journals (Sweden)

    Maria C. Mariani

    2017-08-01

    Full Text Available A sequence of intraplate earthquakes occurred in Arizona at the same location where mining explosions were carried out in previous years. The explosions and some of the earthquakes generated very similar seismic signals. In this study, Dynamic Fourier Analysis is used to discriminate between signals originating from natural earthquakes and mining explosions. Frequency analysis of seismograms recorded at regional distances shows that, compared with the mining explosions, the earthquake signals have larger amplitudes in the frequency interval ~6 to 8 Hz and significantly smaller amplitudes in the frequency interval ~2 to 4 Hz. This type of analysis identifies characteristic features in the frequency content of seismograms, making it possible to detect potentially risky seismic events.
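
    The abstract does not spell out the Dynamic Fourier Analysis procedure itself. As a rough illustration of the idea it exploits, the sketch below computes spectral amplitude in the ~2 to 4 Hz and ~6 to 8 Hz bands over sliding windows of a seismogram, so the band balance can be followed through time. Treating "relatively more 6 to 8 Hz energy" as earthquake-like is a hypothetical rule of thumb drawn from the abstract, not the authors' classifier, and the function names, window length and step are assumptions.

```python
import cmath
import math


def band_amplitude(window, fs, f_lo, f_hi):
    """Sum of DFT magnitudes for frequencies in [f_lo, f_hi] Hz (naive O(n^2) DFT)."""
    n = len(window)
    total = 0.0
    for k in range(n // 2 + 1):
        freq = k * fs / n
        if f_lo <= freq <= f_hi:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(window))
            total += abs(coeff)
    return total


def sliding_band_amplitudes(signal, fs, win, step, bands=((2, 4), (6, 8))):
    """Band amplitudes for each window: a time-varying ('dynamic') spectrum summary."""
    rows = []
    for start in range(0, len(signal) - win + 1, step):
        window = signal[start:start + win]
        rows.append([band_amplitude(window, fs, lo, hi) for lo, hi in bands])
    return rows
```

    In practice one would use an FFT library and tapered windows; the plain DFT here keeps the example dependency-free.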

  8. Cluster analysis of received constellations for optical performance monitoring

    NARCIS (Netherlands)

    van Weerdenburg, J.J.A.; van Uden, R.; Sillekens, E.; de Waardt, H.; Koonen, A.M.J.; Okonkwo, C.

    2016-01-01

    Performance monitoring based on centroid clustering to investigate constellation generation offsets. The tool allows flexibility in constellation generation tolerances by forwarding centroids to the demapper. The relation of fibre nonlinearities and singular value decomposition of intra-cluster

  9. Analysis of gold and silver concentration on gold mining tailings by neutron activation analysis

    International Nuclear Information System (INIS)

    Sadikov, I.I.; Salimov, M.I.; Sadykova, Z.O.

    2014-01-01

    Full text: Instrumental neutron activation analysis without radiochemical separation is one of the most applicable and most often used methods for analyzing the concentration of gold, silver and other rare and noble metals in gold ores. This method is not suitable for analyzing low concentrations of gold and silver in gold mining tailings because of the rather high concentrations of some elements. Samples are dissolved by boiling in a mixture of concentrated hydrochloric and nitric acids to extract gold and silver into the solution. The chemical yield of gold and silver after dissolution of the sample and subsequent chromatographic separation is between 92 and 95 percent.

  10. The composite sequential clustering technique for analysis of multispectral scanner data

    Science.gov (United States)

    Su, M. Y.

    1972-01-01

    The clustering technique consists of two parts: (1) a sequential statistical clustering which is essentially a sequential variance analysis, and (2) a generalized K-means clustering. In this composite clustering technique, the output of (1) is a set of initial clusters which are input to (2) for further improvement by an iterative scheme. This unsupervised composite technique was employed for automatic classification of two sets of remote multispectral earth resource observations. The classification accuracy by the unsupervised technique is found to be comparable to that by traditional supervised maximum likelihood classification techniques. The mathematical algorithms for the composite sequential clustering program and a detailed computer program description with job setup are given.
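
    The two-stage structure (a single sequential pass that proposes initial clusters, followed by iterative K-means refinement) can be sketched as below. The first stage here is a simple leader (distance-threshold) algorithm standing in for the paper's sequential variance analysis, which the abstract does not detail; the threshold, function names and iteration cap are assumptions.

```python
import math


def leader_clustering(points, threshold):
    """Single sequential pass: a point starts a new cluster if it is farther than
    `threshold` from every existing leader; the leaders become initial centres."""
    leaders = []
    for p in points:
        if not leaders or min(math.dist(p, c) for c in leaders) > threshold:
            leaders.append(tuple(p))
    return leaders


def kmeans_refine(points, centres, max_iter=50):
    """K-means (Lloyd) iterations starting from the given centres."""
    centres = [tuple(c) for c in centres]
    for _ in range(max_iter):
        groups = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)), key=lambda i: math.dist(p, centres[i]))
            groups[nearest].append(p)
        # Recompute each centre as the mean of its group; keep empty centres in place.
        new_centres = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centres[i]
            for i, g in enumerate(groups)
        ]
        if new_centres == centres:
            break
        centres = new_centres
    return centres
```

    The design choice mirrors the abstract: the cheap first pass fixes the number and rough location of clusters, so K-means does not need to be seeded at random.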

  11. Genetic Diversity and Relationships of Neolamarckia cadamba (Roxb.) Bosser progenies through cluster analysis

    Directory of Open Access Journals (Sweden)

    M. Preethi Shree

    2018-04-01

    Full Text Available Genetic diversity analysis was conducted for biometric attributes in 20 progenies of Neolamarckia cadamba. Applying the D2 clustering technique to Neolamarckia cadamba genetic resources resolved the 20 progenies into five clusters. The maximum intra-cluster distance was shown by cluster II. The maximum inter-cluster distance was recorded between clusters III and V, which indicates the presence of wider genetic distance between Neolamarckia cadamba progenies. Among the growth attributes, volume (36.84%) contributed the most to genetic divergence, followed by bole height, basal diameter, tree height, and number of branches in Neolamarckia cadamba progenies.
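
    D2 clustering in genetic divergence studies rests on Mahalanobis' D² distance between progeny mean vectors. A minimal sketch of that distance, assuming a pooled within-progeny covariance matrix of the biometric traits is already available; `solve` and `mahalanobis_d2` are illustrative helpers, and the step that groups progenies into clusters from the resulting D² matrix is not shown.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    m = [list(map(float, row)) + [float(rhs[i])] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def mahalanobis_d2(mean_a, mean_b, pooled_cov):
    """D^2 = d' S^-1 d, where d is the difference of two trait mean vectors."""
    d = [a - b for a, b in zip(mean_a, mean_b)]
    x = solve(pooled_cov, d)  # x = S^-1 d without forming the inverse explicitly
    return sum(di * xi for di, xi in zip(d, x))
```

    With an identity covariance D² reduces to squared Euclidean distance; correlated traits (off-diagonal covariance) shrink the contribution of differences that move together.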

  12. QTL global meta-analysis: are trait determining genes clustered?

    Directory of Open Access Journals (Sweden)

    Adelson David L

    2009-04-01

    Full Text Available Abstract Background A key open question in biology is whether genes are physically clustered with respect to their known functions or phenotypic effects. This is of particular interest for Quantitative Trait Loci (QTL), where a QTL region could contain a number of genes that contribute to the trait being measured. Results We observed a significant increase in gene density within QTL regions compared to non-QTL regions and/or the entire bovine genome. By grouping QTL from the Bovine QTL Viewer database into 8 categories of non-redundant regions, we were able to analyze gene density and gene function distribution, based on Gene Ontology (GO), in relation to their location within QTL regions, outside of QTL regions and across the entire bovine genome. We identified a number of GO terms that were significantly over-represented within particular QTL categories. Furthermore, selected GO terms expected to be associated with the QTL category based on common biological knowledge also proved to be significantly over-represented in QTL regions. Conclusion Our analysis provides evidence of over-represented GO terms in QTL regions. This increased GO term density indicates possible clustering of gene functions within QTL regions of the bovine genome. Genes with similar functions may be grouped in specific locales and could be contributing to QTL traits. Moreover, we have identified over-represented GO terminology that, from a biological standpoint, makes sense with respect to QTL category type.

  13. Cluster decay analysis and related structure effects of fissionable ...

    Indian Academy of Sciences (India)

    2015-08-01

    Collective clusterization approach of dynamical cluster decay model (DCM) has been ... fusion–fission process resulting in the emission of symmetric and/or ... represents the relative separation distance between two fragments or clusters ... decay constant λ or decay half-life T1/2 is defined as λ = (ln 2/T1/2) ...
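
    The relation λ = ln 2 / T1/2 quoted in the snippet converts directly between a decay half-life and a decay constant. A small sketch; the function names are illustrative, and λ comes out in the inverse of whatever time unit the half-life is given in.

```python
import math


def decay_constant(half_life):
    """lambda = ln 2 / T_1/2."""
    return math.log(2) / half_life


def half_life(decay_const):
    """T_1/2 = ln 2 / lambda (inverse of decay_constant)."""
    return math.log(2) / decay_const
```

    For example, a half-life of 10 s corresponds to λ ≈ 0.0693 s⁻¹.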

  14. Maximum-entropy clustering algorithm and its global convergence analysis

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    By constructing a batch of differentiable entropy functions that uniformly approximate an objective function by means of the maximum-entropy principle, a new clustering algorithm, called the maximum-entropy clustering algorithm, is proposed based on optimization theory. This algorithm is a soft generalization of the hard C-means algorithm and possesses global convergence. Its relations with other clustering algorithms are discussed.
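
    The abstract describes the algorithm only at a high level. A common maximum-entropy formulation (as used in deterministic annealing) gives each point Gibbs memberships u_ij ∝ exp(-β‖x_i - c_j‖²) and recomputes each centre as a membership-weighted mean; as β grows, memberships harden and the updates approach hard C-means. The sketch below implements that generic form at a fixed β, which is an assumption, not necessarily the authors' exact objective or convergence scheme.

```python
import math


def max_entropy_clustering(points, centers, beta, iters=100):
    """Soft C-means with Gibbs (maximum-entropy) memberships at a fixed beta.

    Large beta approaches hard C-means; small beta gives fuzzier memberships.
    """
    centers = [list(c) for c in centers]
    dim = len(points[0])
    memberships = []
    for _ in range(iters):
        memberships = []
        for p in points:
            d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            shift = min(d2)  # subtract the minimum for numerical stability
            w = [math.exp(-beta * (d - shift)) for d in d2]
            total = sum(w)
            memberships.append([x / total for x in w])
        for j in range(len(centers)):
            mass = sum(u[j] for u in memberships)
            if mass > 0:  # leave a centre with no membership mass where it is
                centers[j] = [
                    sum(u[j] * p[k] for u, p in zip(memberships, points)) / mass
                    for k in range(dim)
                ]
    return centers, memberships
```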

  15. Hybrid Tracking Algorithm Improvements and Cluster Analysis Methods.

    Science.gov (United States)

    1982-02-26

    UPGMA), and Ward’s method. Ling’s papers describe a (k,r) clustering method. Each of these methods has individual characteristics which make them...Reference 7), UPGMA is probably the most frequently used clustering strategy. UPGMA tries to group new points into an existing cluster by using an

  16. Analysis on the influence of rainfall and mine water ratio against pH in East pit 3 West Banko coal mine

    Directory of Open Access Journals (Sweden)

    Rochyani Neny

    2017-01-01

    Full Text Available In coal mining areas, mine water tends to have a low pH and to be acidic. To increase the pH, treatment of acid mine drainage using lime should be considered, given the pollution indicators. This work focuses on the influence of rainfall volume on the pH of acid mine drainage. The research used ratios of mine water to rainfall water that vary across 9 (nine) conditions: 1:1, 1:2, 1:3, 1:4, 1:5, 5:4, 5:3, 5:2 and 5:1. The results were then measured and tested with statistical analysis. The ratio of rainfall to mine water showed a significant effect on the pH: higher rainfall led to a higher pH. This affects the water neutralization process using lime, since the dose of lime needed to neutralize acid mine drainage may differ between the rainy season and the dry season.

  17. CHOOSING A HEALTH INSTITUTION WITH MULTIPLE CORRESPONDENCE ANALYSIS AND CLUSTER ANALYSIS IN A POPULATION BASED STUDY

    Directory of Open Access Journals (Sweden)

    ASLI SUNER

    2013-06-01

    Full Text Available Multiple correspondence analysis is a method that makes categorical variables given in contingency tables easy to interpret, showing the similarities and associations as well as the divergences among these variables graphically in a lower-dimensional space. Clustering methods help classify grouped data according to their similarities and obtain useful summaries from them. In this study, interpretations of multiple correspondence analysis are supported by cluster analysis; factors affecting the choice of health institution, such as age, disease group and health insurance, are examined, with the aim of comparing the results of the two methods.

  18. Analysis of Air Particles Around Site Plan of Gold Mining, North Sumatera

    International Nuclear Information System (INIS)

    Gatot-Suhariyono; Erizal-Tanjung

    2004-01-01

    Air particles around the site plan of a gold mine in North Sumatra have been analyzed. Air particles (TSP, Total Suspended Particulate, with a maximum diameter of around 45 μm, and PM10/PM2.5) were sampled at four places using a cascade impactor. The measurement results indicate that the concentrations of TSP and PM10/PM2.5 at the center of the mining site plan were below the ambient air quality standard (PP RI no. 41/1999), while the concentrations in the surrounding areas exceeded it. The concentrations in the surrounding areas were not caused by air particles from the center of the mining site plan. Based on BAPEDAL Head Regulation no. Kep-107/BAPEDAL/11/1997, the PM10/PM2.5 and TSP concentrations at the center of the mining site plan fall in the moderate category, while those in the surrounding areas fall in the unhealthy category. The unhealthy category implies reduced visibility and widespread dust pollution, while the moderate category implies only reduced visibility. (author)

  19. MMPI profiles of males accused of severe crimes: a cluster analysis

    NARCIS (Netherlands)

    Spaans, M.; Barendregt, M.; Muller, E.; Beurs, E. de; Nijman, H.L.I.; Rinne, T.

    2009-01-01

    In studies attempting to classify criminal offenders by cluster analysis of Minnesota Multiphasic Personality Inventory-2 (MMPI-2) data, the number of clusters found varied between 10 (the Megargee System) and two (one cluster indicating no psychopathology and one exhibiting serious

  20. Cluster analysis of rural, urban, and curbside atmospheric particle size data.

    Science.gov (United States)

    Beddows, David C S; Dall'Osto, Manuel; Harrison, Roy M

    2009-07-01

    Particle size is a key determinant of the hazard posed by airborne particles. Continuous multivariate particle size data have been collected using aerosol particle size spectrometers sited at four locations within the UK: Harwell (Oxfordshire); Regents Park (London); British Telecom Tower (London); and Marylebone Road (London). These data have been analyzed using k-means cluster analysis, deduced to be the preferred cluster analysis technique from a choice of four partitional cluster packages, namely: Fuzzy; k-means; k-median; and Model-Based clustering. Using cluster validation indices, k-means clustering was shown to produce clusters with the smallest size, furthest separation, and, importantly, the highest degree of similarity between the elements within each partition. Using k-means clustering, the complexity of the data set is reduced, allowing characterization of the data according to the temporal and spatial trends of the clusters. At Harwell, the rural background measurement site, the cluster analysis showed that the spectra may be differentiated by their modal diameters and average temporal trends, showing either high counts during the daytime or nighttime hours. Likewise, for the urban sites, the cluster analysis differentiated the spectra into a small number of size distributions according to their modal diameter, the location of the measurement site, and time of day. The responsible aerosol emission, formation, and dynamic processes can be inferred from the cluster characteristics and their correlation with concurrently measured meteorological, gas phase, and particle phase measurements.
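
    The abstract does not name the cluster validation indices used. As one widely used example of an index that rewards compact, well-separated partitions, the sketch below computes the mean silhouette width in plain Python; choosing the silhouette here is an assumption, and `silhouette` is an illustrative name. It needs at least two distinct labels.

```python
import math


def silhouette(points, labels):
    """Mean silhouette width: near 1 for tight, well-separated clusters, below 0 for misassigned points."""
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)  # singleton clusters get 0 by convention
            continue
        a = sum(same) / len(same)  # mean distance within the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)
```

    Comparing this score across candidate partitions (for example from different algorithms or different k) is one way to make the "smallest size, furthest separation" comparison concrete.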

  1. Cluster analysis of spontaneous preterm birth phenotypes identifies potential associations among preterm birth mechanisms.

    Science.gov (United States)

    Esplin, M Sean; Manuck, Tracy A; Varner, Michael W; Christensen, Bryce; Biggio, Joseph; Bukowski, Radek; Parry, Samuel; Zhang, Heping; Huang, Hao; Andrews, William; Saade, George; Sadovsky, Yoel; Reddy, Uma M; Ilekis, John

    2015-09-01

    We sought to use an innovative tool that is based on common biologic pathways to identify specific phenotypes among women with spontaneous preterm birth (SPTB), to enhance investigators' ability to identify and highlight common mechanisms and underlying genetic factors that are responsible for SPTB. We performed a secondary analysis of a prospective case-control multicenter study of SPTB. All cases delivered a preterm singleton at ≤34.0 weeks' gestation. Each woman was assessed for the presence of underlying SPTB causes. A hierarchic cluster analysis was used to identify groups of women with homogeneous phenotypic profiles. One of the phenotypic clusters was selected for candidate gene association analysis with the use of VEGAS software. One thousand twenty-eight women with SPTB were assigned phenotypes. Hierarchic clustering of the phenotypes revealed 5 major clusters. Cluster 1 (n = 445) was characterized by maternal stress; cluster 2 (n = 294) by premature membrane rupture; cluster 3 (n = 120) by familial factors; and cluster 4 (n = 63) by maternal comorbidities. Cluster 5 (n = 106) was multifactorial and characterized by infection (INF), decidual hemorrhage (DH), and placental dysfunction (PD). These 3 phenotypes were highly correlated by χ² analysis (PD and DH, P ... cluster 3 of SPTB. We identified 5 major clusters of SPTB based on a phenotype tool and hierarchical clustering. There was significant correlation between several of the phenotypes. The INS gene was associated with the familial factors underlying SPTB. Copyright © 2015 Elsevier Inc. All rights reserved.
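
    Pairwise correlation between two binary phenotypes (present/absent) is typically tested on a 2×2 contingency table. The sketch below computes Pearson's χ² statistic without continuity correction and its upper-tail p-value for 1 degree of freedom; it illustrates the test in general, assumes no particular counts from the study, and the function names are illustrative.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denom


def p_value_1df(chi2):
    """Upper-tail p-value of chi-square with 1 df: P(Z^2 > chi2) = erfc(sqrt(chi2 / 2))."""
    return math.erfc(math.sqrt(chi2 / 2))
```

    Here a counts women with both phenotypes, b and c those with only one, and d those with neither; a χ² of about 3.84 corresponds to p = 0.05.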

  2. Complementing the Numbers: A Text Mining Analysis of College Course Withdrawals

    Science.gov (United States)

    Michalski, Greg V.

    2011-01-01

    Excessive college course withdrawals are costly to the student and the institution in terms of time to degree completion, available classroom space, and other resources. Although generally well quantified, detailed analysis of the reasons given by students for course withdrawal is less common. To address this, a text mining analysis was performed…

  3. The relationship between supplier networks and industrial clusters: an analysis based on the cluster mapping method

    Directory of Open Access Journals (Sweden)

    Ichiro IWASAKI

    2010-06-01

    Full Text Available Michael Porter’s concept of competitive advantages emphasizes the importance of regional cooperation of various actors in order to gain competitiveness on globalized markets. Foreign investors may play an important role in forming such cooperation networks. Their local suppliers tend to concentrate regionally. They can form, together with local institutions of education, research, financial and other services, development agencies, the nucleus of cooperative clusters. This paper deals with the relationship between supplier networks and clusters. Two main issues are discussed in more detail: the interest of multinational companies in entering regional clusters and the spillover effects that may stem from their participation. After the discussion on the theoretical background, the paper introduces a relatively new analytical method: “cluster mapping” - a method that can spot regional hot spots of specific economic activities with cluster building potential. Experience with the method was gathered in the US and in the European Union. After the discussion on the existing empirical evidence, the authors introduce their own cluster mapping results, which they obtained by using a refined version of the original methodology.

  4. Higgs Pair Production: Choosing Benchmarks With Cluster Analysis

    CERN Document Server

    Carvalho, Alexandra; Dorigo, Tommaso; Goertz, Florian; Gottardo, Carlo A.; Tosi, Mia

    2016-01-01

    New physics theories often depend on a large number of free parameters. The precise values of those parameters in some cases drastically affect the resulting phenomenology of fundamental physics processes, while in others finite variations can leave it basically invariant at the level of detail experimentally accessible. When designing a strategy for the analysis of experimental data in the search for a signal predicted by a new physics model, it appears advantageous to categorize the parameter space describing the model according to the corresponding kinematical features of the final state. A multi-dimensional test statistic can be used to gauge the degree of similarity in the kinematics of different models; a clustering algorithm using that metric may then allow the division of the space into homogeneous regions, each of which can be successfully represented by a benchmark point. Searches targeting those benchmark points are then guaranteed to be sensitive to a large area of the parameter space. In this doc...

  5. An Analysis of Trainers' Perspectives within an Ecological Framework: Factors that Influence Mine Safety Training Processes.

    Science.gov (United States)

    Haas, Emily J; Hoebbel, Cassandra L; Rost, Kristen A

    2014-09-01

    Satisfactory completion of mine safety training is a prerequisite for being hired and for continued employment in the coal industry. Although training includes content to develop skills in a variety of mineworker competencies, research and recommendations continue to specify that specific limitations in the self-escape portion of training still exist and that mineworkers need to be better prepared to respond to emergencies that could occur in their mine. Ecological models are often used to inform the development of health promotion programs but have not been widely applied to occupational health and safety training programs. Nine mine safety trainers participated in in-depth semi-structured interviews. A theoretical analysis of the interviews was completed via an ecological lens. Each level of the social ecological model was used to examine factors that could be addressed both during and after mine safety training. The analysis suggests that problems surrounding communication and collaboration, leadership development, and responsibility and accountability at different levels within the mining industry contribute to deficiencies in mineworkers' mastery and maintenance of skills. This study offers a new technique to identify limitations in safety training systems and processes. The analysis suggests that training should be developed and disseminated with consideration of various levels-individual, interpersonal, organizational, and community-to promote skills. If factors identified within and between levels are addressed, it may be easier to sustain mineworker competencies that are established during safety training.

  6. Clustering Game Behavior Data

    DEFF Research Database (Denmark)

    Bauckhage, C.; Drachen, Anders; Sifa, Rafet

    2015-01-01

    of the causes, the proliferation of behavioral data poses the problem of how to derive insights therefrom. Behavioral data sets can be large, time-dependent and high-dimensional. Clustering offers a way to explore such data and to discover patterns that can reduce the overall complexity of the data. Clustering...... and other techniques for player profiling and play style analysis have, therefore, become popular in the nascent field of game analytics. However, the proper use of clustering techniques requires expertise and an understanding of games is essential to evaluate results. With this paper, we address game data...... scientists and present a review and tutorial focusing on the application of clustering techniques to mine behavioral game data. Several algorithms are reviewed and examples of their application shown. Key topics such as feature normalization are discussed and open problems in the context of game analytics...
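
    Feature normalization matters because distance-based clustering is dominated by whichever feature has the largest numeric range (for example, playtime in seconds next to a count of levels completed). A minimal sketch of z-score standardization, one common choice among several the review may discuss; the function name and the example features are hypothetical.

```python
import statistics


def zscore_columns(rows):
    """Standardize each feature column to mean 0 and population standard deviation 1."""
    cols = list(zip(*rows))
    stats = [(statistics.fmean(c), statistics.pstdev(c)) for c in cols]
    return [
        # Constant columns carry no information for clustering, so map them to 0.
        [(x - mu) / sd if sd > 0 else 0.0 for x, (mu, sd) in zip(row, stats)]
        for row in rows
    ]
```

    After standardization, a column measured in seconds and a column measured in counts contribute on comparable scales to Euclidean distances.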

  7. Survey of Analysis of Crime Detection Techniques Using Data Mining and Machine Learning

    Science.gov (United States)

    Prabakaran, S.; Mitra, Shilpa

    2018-04-01

    Data mining is the field concerned with procedures for finding patterns in huge datasets; it includes methods at the intersection of machine learning and database systems. It can be applied to various fields such as healthcare, market basket analysis, education, manufacturing engineering, and crime investigation. Among these, crime investigation is an interesting application that processes crime characteristics to help society live better. This paper surveys various data mining techniques used in this domain. This study may be helpful in designing new strategies for crime prediction and analysis.

  8. Design database for quantitative trait loci (QTL) data warehouse, data mining, and meta-analysis.

    Science.gov (United States)

    Hu, Zhi-Liang; Reecy, James M; Wu, Xiao-Lin

    2012-01-01

    A database can be used to warehouse quantitative trait loci (QTL) data from multiple sources for comparison, genomic data mining, and meta-analysis. A robust database design involves sound data structure logistics, meaningful data transformations, normalization, and proper user interface designs. This chapter starts with a brief review of relational database basics and concentrates on issues associated with curation of QTL data into a relational database, with emphasis on the principles of data normalization and structure optimization. In addition, some simple examples of QTL data mining and meta-analysis are included. These examples are provided to help readers better understand the potential and importance of sound database design.
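
    Normalization here means storing each trait and each source publication once and letting QTL records reference them by key, rather than repeating trait names and citations on every row. A minimal sketch using Python's built-in `sqlite3` module; the table and column names and the placeholder rows are hypothetical, not the chapter's actual schema or data.

```python
import sqlite3

SCHEMA = """
CREATE TABLE trait (
    trait_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE
);
CREATE TABLE publication (
    pub_id   INTEGER PRIMARY KEY,
    citation TEXT NOT NULL
);
CREATE TABLE qtl (
    qtl_id     INTEGER PRIMARY KEY,
    trait_id   INTEGER NOT NULL REFERENCES trait(trait_id),
    pub_id     INTEGER NOT NULL REFERENCES publication(pub_id),
    chromosome TEXT NOT NULL,
    start_cm   REAL,
    end_cm     REAL
);
"""


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    # Placeholder rows for illustration only; not real QTL data.
    conn.execute("INSERT INTO trait (trait_id, name) VALUES (1, 'trait_A')")
    conn.execute("INSERT INTO publication (pub_id, citation) VALUES (1, 'placeholder study')")
    conn.executemany(
        "INSERT INTO qtl (trait_id, pub_id, chromosome, start_cm, end_cm) VALUES (?, ?, ?, ?, ?)",
        [(1, 1, "1", 10.0, 25.0), (1, 1, "5", 40.0, 52.5)],
    )
    conn.commit()
    return conn


def qtl_counts_by_trait(conn):
    """A simple mining query: number of QTL records per trait."""
    return conn.execute(
        "SELECT t.name, COUNT(q.qtl_id) FROM trait t "
        "JOIN qtl q ON q.trait_id = t.trait_id GROUP BY t.name ORDER BY t.name"
    ).fetchall()
```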

  9. Advances in research methods for information systems research data mining, data envelopment analysis, value focused thinking

    CERN Document Server

    Osei-Bryson, Kweku-Muata

    2013-01-01

    Advances in social science research methodologies and data analytic methods are changing the way research in information systems is conducted. New developments in statistical software technologies for data mining (DM) such as regression splines or decision tree induction can be used to assist researchers in systematic post-positivist theory testing and development. Established management science techniques like data envelopment analysis (DEA), and value focused thinking (VFT) can be used in combination with traditional statistical analysis and data mining techniques to more effectively explore

  10. Quantitative analysis of raw materials mining of Sverdlovsk region in Russia

    Science.gov (United States)

    Tarasyev, Alexander M.; Vasilev, Julian; Turygina, Victoria F.

    2016-06-01

    The purpose of this article is to show the application of some qualitative methods for the analysis of a dataset for raw materials. The main approaches used are related to the correlation analysis and forecasting with trend lines. It is proved that the future mining of particular ores can be predicted on the basis of mathematical modeling. It is also shown that there exists a strong correlation between the mining of some specific raw materials. Some of the revealed correlations have meaningful explanations, and for others one should look for sophisticated interpretations. The applied approach can be used for forecasting of raw materials exploitation in various regions of Russia and in other countries.
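
    Correlation analysis and trend-line forecasting of this kind reduce to two standard formulas: Pearson's r and an ordinary least-squares line. A plain-Python sketch with hypothetical function names; Python 3.10+ also offers `statistics.correlation` and `statistics.linear_regression` for the same computations.

```python
import math


def linear_trend(xs, ys):
    """Least-squares trend line y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

    For a hypothetical output series 3, 5, 7, 9 over periods 1 to 4, the trend line is y = 2x + 1, so the forecast for period 5 is 11; extrapolation like this assumes the past trend continues.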

  11. The Top Ten Algorithms in Data Mining

    CERN Document Server

    Wu, Xindong

    2009-01-01

    From classification and clustering to statistical learning, association analysis, and link mining, this book covers the most important topics in data mining research. It presents the ten most influential algorithms used in the data mining community today. Each chapter provides a detailed description of the algorithm, a discussion of available software implementation, advanced topics, and exercises. With a simple data set, examples illustrate how each algorithm works and highlight the overall performance of each algorithm in a real-world application. Featuring contributions from leading researc

  12. Data Mining and Statistics for Decision Making

    CERN Document Server

    Tufféry, Stéphane

    2011-01-01

    Data mining is the process of automatically searching large volumes of data for models and patterns using computational techniques from statistics, machine learning and information theory; it is the ideal tool for such an extraction of knowledge. Data mining is usually associated with a business or an organization's need to identify trends and profiles, allowing, for example, retailers to discover patterns on which to base marketing objectives. This book looks at both classical and recent techniques of data mining, such as clustering, discriminant analysis, logistic regression, generalized lin

  13. Mining Sequential Update Summarization with Hierarchical Text Analysis

    Directory of Open Access Journals (Sweden)

    Chunyun Zhang

    2016-01-01

    Full Text Available The outbreak of unexpected news events, such as a large-scale accident or a natural disaster, creates a new information access problem where traditional approaches fail. News about such events is typically sparse early on and redundant later. Hence, it is very important to obtain updates and provide individuals with timely and important information about these incidents as they develop, especially in wireless and mobile Internet of Things (IoT) applications. In this paper, we define the problem of sequential update summarization extraction and present a new hierarchical update mining system that can broadcast useful, new, and timely sentence-length updates about a developing event. The system uses a novel method that incorporates techniques from topic-level and sentence-level summarization. To evaluate its performance, we apply it to the sequential update summarization task of the Temporal Summarization (TS) track at the Text Retrieval Conference (TREC) 2013 and compute four measurements of the update mining system: expected gain, expected latency gain, comprehensiveness, and latency comprehensiveness. Experimental results show that our proposed method performs well.

  14. Investigating spousal concordance of diabetes through statistical analysis and data mining.

    Directory of Open Access Journals (Sweden)

    Jong-Yi Wang

    Full Text Available Spousal clustering of diabetes merits attention. Whether old-age vulnerability or a shared family environment determines the concordance of diabetes is also uncertain. This study investigated the spousal concordance of diabetes and compared the risk of diabetes concordance between couples and noncouples by using nationally representative data. A total of 22,572 individuals identified from the 2002-2013 National Health Insurance Research Database of Taiwan constituted 5,643 couples and 5,643 noncouples through 1:1 dual propensity score matching (PSM). Factors associated with concordance in both spouses with diabetes were analyzed at the individual level. The risk of diabetes concordance between couples and noncouples was compared at the couple level. Logistic regression was the main statistical method. Statistical data were analyzed using SAS 9.4. C&RT and Apriori data mining, conducted in IBM SPSS Modeler 13, served as a supplement to the statistics. High odds of spousal concordance of diabetes were associated with old age, middle levels of urbanization, and high comorbidities (all P < 0.05). The dual PSM analysis revealed that the risk of diabetes concordance was significantly higher in couples (5.19%) than in noncouples (0.09%; OR = 61.743, P < 0.0001). A high concordance rate of diabetes in couples may indicate the influences of assortative mating and a shared environment. Diabetes in one spouse signals a risk of diabetes in the partner. Family-based diabetes care that emphasizes screening couples at risk of diabetes by using the identified risk factors is suggested for prospective clinical practice interventions.
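
    The couple-versus-noncouple comparison above is reported as an odds ratio (OR). As a minimal illustration, assuming a simple 2x2 table rather than the logistic regression the authors actually used, an OR and its Wald 95% confidence interval can be computed as below. The function name and the counts in the usage line are hypothetical, not taken from the study.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio with a Wald confidence interval for a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    All four cells must be non-zero for this simple formula.
    """
    or_value = (a * d) / (b * c)
    # Standard error of log(OR) (Woolf's method)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_value) - z * se_log)
    upper = math.exp(math.log(or_value) + z * se_log)
    return or_value, lower, upper


# Hypothetical counts: 50 of 1,000 couples concordant vs 5 of 1,000 noncouples
print(odds_ratio(50, 950, 5, 995))
```

    A zero cell would need a continuity correction or an exact method, which this sketch deliberately leaves out.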

  15. Performance analysis of clustering techniques over microarray data: A case study

    Science.gov (United States)

    Dash, Rasmita; Misra, Bijan Bihari

    2018-03-01

    Handling big data is one of the major issues in the field of statistical data analysis. In such investigations, cluster analysis plays a vital role in dealing with large-scale data. Many clustering techniques exist, each taking a different approach, but which one suits a particular dataset is difficult to predict. To address this problem, a grading approach is applied across many clustering techniques to identify a stable technique. Because the grading depends on the characteristics of the dataset as well as on the validity indices, a two-stage grading approach is implemented. In this study the grading approach is applied to five clustering techniques: hybrid swarm based clustering (HSC), k-means, partitioning around medoids (PAM), vector quantization (VQ) and agglomerative nesting (AGNES). The experiments are conducted on five microarray datasets with seven validity indices. The grading approach's finding that a clustering technique is significantly better is also confirmed by the Nemenyi post-hoc hypothesis test.
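
    The abstract does not spell out the two-stage grading scheme. As a hedged sketch of the ranking step that the Nemenyi post-hoc test builds on, the code ranks techniques on each dataset by a validity index (higher is assumed to be better), averages those ranks, and computes the Nemenyi critical difference CD = q_alpha * sqrt(k(k+1)/(6N)) for k techniques and N datasets. The function names and score values are hypothetical. q_alpha has to be read from a table of Nemenyi critical values (about 2.728 for k = 5 at alpha = 0.05).

```python
import math


def average_ranks(scores):
    """scores: dict technique -> list of validity-index values, one per dataset.

    Higher values are assumed to be better. Returns dict technique -> mean
    rank (1 = best); tied techniques share the average of their ranks.
    """
    names = list(scores)
    n_sets = len(scores[names[0]])
    totals = {name: 0.0 for name in names}
    for j in range(n_sets):
        ordered = sorted(names, key=lambda n: scores[n][j], reverse=True)
        i = 0
        while i < len(ordered):
            # Extend k over the block of techniques tied with ordered[i]
            k = i
            while k + 1 < len(ordered) and scores[ordered[k + 1]][j] == scores[ordered[i]][j]:
                k += 1
            shared_rank = (i + k) / 2 + 1  # mean of ranks i+1 .. k+1
            for m in range(i, k + 1):
                totals[ordered[m]] += shared_rank
            i = k + 1
    return {name: totals[name] / n_sets for name in names}


def nemenyi_cd(q_alpha, k, n_datasets):
    """Two techniques differ significantly if their mean ranks differ by more than this."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n_datasets))


# Hypothetical index values for three techniques on two datasets
ranks = average_ranks({"A": [0.9, 0.8], "B": [0.7, 0.8], "C": [0.5, 0.1]})
print(ranks, nemenyi_cd(2.728, 5, 5))
```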

  16. Mining the protein data bank to differentiate error from structural variation in clustered static structures: an examination of HIV protease.

    Science.gov (United States)

    Venkatakrishnan, Balasubramanian; Palii, Miorel-Lucian; Agbandje-McKenna, Mavis; McKenna, Robert

    2012-03-01

    The Protein Data Bank (PDB) contains over 71,000 structures. Extensively studied proteins have hundreds of submissions available, including mutations, different complexes, and space groups, allowing for application of data-mining algorithms to analyze an array of static structures and gain insight about a protein's structural variation and possibly its dynamics. This investigation is a case study of HIV protease (PR) using in-house algorithms for data mining and structure superposition through generalized formulæ that account for multiple conformations and fractional occupancies. Temperature factors (B-factors) are compared with spatial displacement from the mean structure over the entire study set and separately over bound and ligand-free structures, to assess the significance of structural deviation in a statistical context. Space group differences are also examined.
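
    The comparison between B-factors and displacement from the mean structure can be made concrete as follows. For structures that are assumed to be already superposed, the sketch computes each atom's mean position, its root-mean-square displacement from that mean, and the isotropic B-factor such a displacement would correspond to through the standard relation B = (8*pi^2/3)<dr^2>. Unlike the authors' generalized formulae, it ignores alternate conformations and fractional occupancies. The function names and coordinates are illustrative.

```python
import math


def mean_structure(structures):
    """structures: list of models, each a list of (x, y, z) per atom, already superposed."""
    n_models = len(structures)
    n_atoms = len(structures[0])
    return [
        tuple(sum(model[i][d] for model in structures) / n_models for d in range(3))
        for i in range(n_atoms)
    ]


def displacement_stats(structures):
    """Per atom: (RMS displacement from the mean position, equivalent isotropic B)."""
    mean = mean_structure(structures)
    stats = []
    for i, centre in enumerate(mean):
        # Mean-square 3D displacement of atom i across all models
        msd = sum(
            sum((model[i][d] - centre[d]) ** 2 for d in range(3)) for model in structures
        ) / len(structures)
        b_equiv = 8 * math.pi ** 2 * msd / 3  # B = 8*pi^2*<u^2>, with <u^2> = <dr^2>/3
        stats.append((math.sqrt(msd), b_equiv))
    return stats


# Two hypothetical single-atom models 2 Å apart along x
print(displacement_stats([[(0.0, 0.0, 0.0)], [(2.0, 0.0, 0.0)]]))
```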

  17. Data mining analysis of Professor Liu Shangyi’s prescription characteristics in clinical medicine for the treatment of cancer patients with stomachache

    Directory of Open Access Journals (Sweden)

    Wen-Qi Huang

    2018-01-01

    Full Text Available Objective: To analyze the prescription characteristics of National Chinese Medicine Master Liu Shangyi in the clinical treatment of cancer patients with stomachache. Methods: Data on prescriptions for cancer patients with stomachache between January 2014 and July 2016 were collected. The composing principles were analyzed by unsupervised data mining methods, including the Apriori algorithm for association rules and complex system entropy clustering. Results: Based on the analysis of 120 prescriptions, the frequency of each herb and the association rules among the herbs were computed. Four core combinations and two new prescriptions were mined from the database. Compared with pretreatment, the clinical symptom grade of stomachache was lower after treatment (P < 0.001). Conclusion: Professor Liu has been successful in treating cancer patients with stomachache by prescribing medication that aids in activating blood circulation, removing dampness, and alleviating pain.
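
    The Apriori step named in the Methods works by growing frequent herb combinations one item at a time and pruning any candidate with an infrequent subset before its support is counted. The code is a generic, minimal implementation, not the authors' software. The herb labels, the support threshold and the function names are hypothetical, and the complex system entropy clustering step is not shown.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: single items
    level = {frozenset([item]) for b in baskets for item in b}
    current = {c: support(c) for c in level}
    current = {c: s for c, s in current.items() if s >= min_support}
    frequent = {}
    size = 1
    while current:
        frequent.update(current)
        size += 1
        keys = list(current)
        # Join: union pairs of frequent itemsets that yield exactly one more item
        candidates = {a | b for a in keys for b in keys if len(a | b) == size}
        # Prune: every (size-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, size - 1))
        }
        current = {c: support(c) for c in candidates}
        current = {c: s for c, s in current.items() if s >= min_support}
    return frequent


def rule_confidence(frequent, antecedent, consequent):
    """Confidence of antecedent -> consequent: support(A and C) / support(A)."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


prescriptions = [
    {"herb_A", "herb_B", "herb_C"},
    {"herb_A", "herb_B"},
    {"herb_A", "herb_C"},
    {"herb_B", "herb_C"},
]
frequent = apriori(prescriptions, min_support=0.5)
print(rule_confidence(frequent, {"herb_A"}, {"herb_B"}))
```

    `rule_confidence` only works for rules whose combined itemset is frequent. Otherwise the lookup raises a `KeyError`, which is a reasonable signal that the rule falls below the support threshold.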

  18. Exploring the potential of data mining techniques for the analysis of accident patterns

    DEFF Research Database (Denmark)

    Prato, Carlo Giacomo; Bekhor, Shlomo; Galtzur, Ayelet

    2010-01-01

    Research in road safety faces major challenges: identification of the most significant determinants of traffic accidents, recognition of the most recurrent accident patterns, and allocation of the resources necessary to address the most relevant issues. This paper intends to comprehend which data mining...... and association rules) data mining techniques are implemented for the analysis of traffic accidents that occurred in Israel between 2001 and 2004. Results show that descriptive techniques are useful for classifying the large number of analyzed accidents, even though they introduce problems with respect to the clear...... importance of input and intermediate neurons, and the relative importance of hundreds of association rules. Further research should investigate whether limiting the analysis to fatal accidents would simplify the task of data mining techniques in recognizing accident patterns without the “noise” probably...

  19. Identification of mine waters by statistical multivariate methods

    Energy Technology Data Exchange (ETDEWEB)

    Mali, N [IGGG, Ljubljana (Slovenia)]

    1992-01-01

    Three water-bearing aquifers are present in the Velenje lignite mine. The aquifer waters differ in chemical composition, so geochemical water analysis can determine the source of mine water influx. Mine water samples from different locations in the mine were analyzed, and the chemical content and electrical conductivity results were statistically processed with the MICROGAS, SPSS-X and IN STATPAC computer programs, which apply three multivariate statistical methods (discriminant, cluster and factor analysis). The reliability of the calculated values was determined with Kolmogorov-Smirnov tests. It is concluded that laboratory analysis of single water samples can produce measurement errors, but statistical processing of water sample data can identify the origin and movement of mine water. 15 refs.

  20. Hydrologic analysis for ecological risk assessment of watersheds with abandoned mine lands

    International Nuclear Information System (INIS)

    Gallagher, D.; Babendreier, J.; Cherry, D.

    1999-01-01

    As part of an ongoing study of acid mine drainage (AMD), a comprehensive ecological risk assessment was conducted in the Leading Creek Watershed in southeast Ohio. The watershed is influenced by agriculture and by active and abandoned coal-mining operations. This work presents a broad overview of several quantitative measures of hydrologic and hydraulic watershed properties available for use in risk assessment and evaluates their relation to ecological metrics. Data analysis included statistical comparisons of metrics of ecology, ecotoxicology, water quality, and physically based parameters describing land use, geomorphology, flow, velocity, and particle size. A multiple regression analysis indicated that abandoned mining operations dominated impacts on aquatic ecology. It also indicated that low-flow velocity measurements and the ratio of maximum to average velocity at low flow were helpful in describing variation in macroinvertebrate Total Taxa scores. Other key parameters with strong impact relationships to biodiversity trends included pH, simple knowledge of any mining upstream, the calculated percentage of the subshed covered by strip mines, and the measured depth of streambed sediments from site to site.

  1. Strategic Mine Planning: A SWOT Analysis Applied to KOV Open Pit Mine in the Democratic Republic of Congo

    OpenAIRE

    Patrick May Mukonki

    2017-01-01

    KOV pit (Kamoto Oliveira Virgule) is located 10 km from Kolwezi, one of the mineral-rich towns in the Lualaba province of the Democratic Republic of Congo. The KOV pit is currently operated by Katanga Mining Limited (KML), a Glencore-Gecamines joint venture (Gecamines being a state-owned company). Recently, the mine optimization process provided a life of mine of approximately 10 years with nice pushbacks using the Datamine NPV Scheduler software. In previous KOV pit studies, we recently outlined t...

  2. Characterizing Heterogeneity within Head and Neck Lesions Using Cluster Analysis of Multi-Parametric MRI Data.

    Directory of Open Access Journals (Sweden)

    Marco Borri

    Full Text Available To describe a methodology, based on cluster analysis, to partition multi-parametric functional imaging data into groups (or clusters) of similar functional characteristics, with the aim of characterizing functional heterogeneity within head and neck tumour volumes, and to evaluate the performance of the proposed approach on a set of longitudinal MRI data by analysing the evolution of the obtained subsets with treatment. The cluster analysis workflow was applied to a combination of dynamic contrast-enhanced and diffusion-weighted MRI data from a cohort of patients with squamous cell carcinoma of the head and neck. Cumulative distributions of voxels, containing pre- and post-treatment data and including both primary tumours and lymph nodes, were partitioned into k clusters (k = 2, 3 or 4). Principal component analysis and cluster validation were employed to investigate data composition and to independently determine the optimal number of clusters. The evolution of the resulting sub-regions with induction chemotherapy treatment was assessed relative to the number of clusters. The clustering algorithm was able to separate clusters that significantly reduced in voxel number following induction chemotherapy from clusters with a non-significant reduction. Partitioning with the optimal number of clusters (k = 4), determined with cluster validation, produced the best separation between reducing and non-reducing clusters. The proposed methodology was able to identify tumour sub-regions with distinct functional properties, independently separating clusters that were affected differently by treatment. This work demonstrates that unsupervised cluster analysis, with no prior knowledge of the data, can be employed to provide a multi-parametric characterization of functional heterogeneity within tumour volumes.
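
    The abstract names neither the clustering algorithm nor the validation index. As a hedged illustration only, the sketch partitions per-voxel feature vectors with k-means, using a deterministic farthest-point initialization, and then picks k from {2, 3, 4} by the mean silhouette width, which is one common cluster-validation criterion. The features, the toy data and the function names are assumptions. Real multi-parametric MRI features would normally be standardized first.

```python
import math


def kmeans(points, k, iters=100):
    """Lloyd's k-means with farthest-point initialization (deterministic)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from all existing seeds
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Mean of the members; an empty cluster keeps its previous center
            new_centers.append(
                tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centers[j]
            )
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def _mean_dist(points, labels, i, cluster):
    """Mean distance from point i to the other members of `cluster`."""
    d = [math.dist(points[i], q) for j, q in enumerate(points) if labels[j] == cluster and j != i]
    return sum(d) / len(d) if d else 0.0


def silhouette(points, labels):
    """Mean silhouette width (needs at least two clusters); singletons score 0."""
    widths = []
    for i in range(len(points)):
        own = labels[i]
        if labels.count(own) == 1:
            widths.append(0.0)
            continue
        a = _mean_dist(points, labels, i, own)
        b = min(_mean_dist(points, labels, i, c) for c in set(labels) if c != own)
        denom = max(a, b)
        widths.append((b - a) / denom if denom else 0.0)
    return sum(widths) / len(widths)


def best_k(points, candidates=(2, 3, 4)):
    """Choose the k with the highest mean silhouette width."""
    return max(candidates, key=lambda k: silhouette(points, kmeans(points, k)[0]))


# Hypothetical 2-D voxel features: four tight groups
offsets = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
voxels = [(cx + dx, cy + dy) for cx, cy in [(0, 0), (10, 0), (0, 10), (10, 10)] for dx, dy in offsets]
print(best_k(voxels))
```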

  3. Haplotyping Problem, A Clustering Approach

    International Nuclear Information System (INIS)

    Eslahchi, Changiz; Sadeghi, Mehdi; Pezeshk, Hamid; Kargar, Mehdi; Poormohammadi, Hadi

    2007-01-01

    Constructing two haplotypes from a set of Single Nucleotide Polymorphism (SNP) fragments is called the haplotype reconstruction problem. One of the most popular computational models for this problem is Minimum Error Correction (MEC). Since MEC is NP-hard, we propose a novel heuristic algorithm for haplotype reconstruction based on clustering analysis from data mining. Using the Hamming distance and the similarity between two fragments, our iterative algorithm produces two clusters of fragments; in each iteration, the algorithm assigns a fragment to one of the clusters. Our results suggest that the algorithm has a lower reconstruction error rate than other algorithms.
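
    The sketch below is a simplified take on the idea. It assumes fragments are strings over {0, 1}, with '-' marking uncovered SNP sites. The two clusters are seeded with the most dissimilar pair of fragments, and each fragment is assigned to the nearer consensus haplotype by Hamming distance over covered sites. Each consensus is then rebuilt by majority vote, and the loop repeats until nothing changes. The final MEC score counts entries that disagree with their assigned haplotype. This is a generic two-cluster heuristic written for illustration, not the authors' algorithm. The function names and fragment data are hypothetical.

```python
def mismatches(fragment, haplotype):
    """Hamming distance over sites covered by both strings ('-' = not covered)."""
    return sum(1 for a, b in zip(fragment, haplotype) if a != '-' and b != '-' and a != b)


def consensus(fragments, length):
    """Majority vote per site; '-' where no fragment covers the site, ties go to '0'."""
    sites = []
    for i in range(length):
        ones = sum(1 for f in fragments if f[i] == '1')
        zeros = sum(1 for f in fragments if f[i] == '0')
        if ones == zeros == 0:
            sites.append('-')
        else:
            sites.append('1' if ones > zeros else '0')
    return ''.join(sites)


def reconstruct(fragments, max_iter=20):
    length = len(fragments[0])
    # Seed the two clusters with the most dissimilar pair of fragments
    h1, h2 = max(((a, b) for a in fragments for b in fragments), key=lambda pair: mismatches(*pair))
    for _ in range(max_iter):
        # Assign each fragment to the nearer haplotype (ties go to h1)
        c1 = [f for f in fragments if mismatches(f, h1) <= mismatches(f, h2)]
        c2 = [f for f in fragments if mismatches(f, h1) > mismatches(f, h2)]
        new1, new2 = consensus(c1, length), consensus(c2, length)
        if (new1, new2) == (h1, h2):
            break
        h1, h2 = new1, new2
    # MEC score: entries that disagree with the haplotype each fragment is closest to
    mec = sum(min(mismatches(f, h1), mismatches(f, h2)) for f in fragments)
    return h1, h2, mec


fragments = ["01--", "-10-", "--01", "10--", "-01-", "--10"]
print(reconstruct(fragments))
```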

  4. Mining Views : database views for data mining

    NARCIS (Netherlands)

    Blockeel, H.; Calders, T.; Fromont, É.; Goethals, B.; Prado, A.; Nijssen, S.; De Raedt, L.

    2007-01-01

    We propose a relational database model towards the integration of data mining into relational database systems, based on the so called virtual mining views. We show that several types of patterns and models over the data, such as itemsets, association rules, decision trees and clusterings, can be

  5. Analysis of the dynamical cluster approximation for the Hubbard model

    OpenAIRE

    Aryanpour, K.; Hettler, M. H.; Jarrell, M.

    2002-01-01

    We examine a central approximation of the recently introduced Dynamical Cluster Approximation (DCA), using the Hubbard model as an example. By both analytical and numerical means, we study non-compact and compact contributions to the thermodynamic potential. We show that approximating non-compact diagrams by their cluster analogs results in a larger systematic error than approximating the compact diagrams. Consequently, only the compact contributions should be taken from the cluster, whereas non-compact ...

  6. Decision-making on the integration of renewable energy in the mining industry: A case studies analysis, a cost analysis and a SWOT analysis

    Directory of Open Access Journals (Sweden)

    Kateryna Zharan

    2017-01-01

    Full Text Available The mining industry is showing increasing interest in using renewable energy (RE) technologies as one of the principles of sustainable mining. This is witnessed in several pilot projects in major mining countries around the world. Positive factors favoring this interest are gaining importance, while negative barrier factors seem to be less relevant. For a mine operator, the switch from fossil fuel to RE technologies is the outcome of decision-making processes. So far, research on such decision making about the use of RE in mining is underdeveloped. The purpose of this paper is to present a practical decision rule based on a principle of indifference between RE and fossil fuel technologies and on appropriate time management. To achieve this objective, three investigations are made: (i) a case studies analysis, (ii) a comparative cost analysis, and (iii) a SWOT analysis.

  7. Examining Mobile Learning Trends 2003-2008: A Categorical Meta-Trend Analysis Using Text Mining Techniques

    Science.gov (United States)

    Hung, Jui-Long; Zhang, Ke

    2012-01-01

    This study investigated the longitudinal trends of academic articles in Mobile Learning (ML) using text mining techniques. One hundred and nineteen (119) refereed journal articles and proceedings papers from the SCI/SSCI database were retrieved and analyzed. The taxonomies of ML publications were grouped into twelve clusters (topics) and four…

  8. X-Ray Morphological Analysis of the Planck ESZ Clusters

    Energy Technology Data Exchange (ETDEWEB)

    Lovisari, Lorenzo; Forman, William R.; Jones, Christine; Andrade-Santos, Felipe; Randall, Scott; Kraft, Ralph [Harvard-Smithsonian Center for Astrophysics, 60 Garden Street, Cambridge, MA 02138 (United States); Ettori, Stefano [INAF, Osservatorio Astronomico di Bologna, via Ranzani 1, I-40127 Bologna (Italy); Arnaud, Monique; Démoclès, Jessica; Pratt, Gabriel W. [Laboratoire AIM, IRFU/Service d’Astrophysique—CEA/DRF—CNRS—Université Paris Diderot, Bât. 709, CEA-Saclay, F-91191 Gif-sur-Yvette Cedex (France)

    2017-09-01

    X-ray observations show that galaxy clusters have a very large range of morphologies. The most disturbed systems, which are well suited to studying how clusters form and grow and to testing physical models, may complicate cosmological studies because their mass determination becomes more challenging. Thus, we need to understand the cluster properties of our samples to reduce possible biases. This is complicated by the fact that different experiments may detect different cluster populations. For example, Sunyaev–Zeldovich (SZ) selected cluster samples have been found to include a greater fraction of disturbed systems than X-ray selected samples. In this paper we determine eight morphological parameters for the Planck Early Sunyaev–Zeldovich (ESZ) objects observed with XMM-Newton. We found that two parameters, concentration and centroid shift, are the best at distinguishing between relaxed and disturbed systems. For each parameter we provide the values that allow selecting the most relaxed or most disturbed objects from a sample. We found no dependence of the cluster dynamical state on mass. By comparing our results with those obtained for the REXCESS clusters, we also confirm that the ESZ clusters indeed tend to be more disturbed, as found by previous studies.
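
    The two most discriminating parameters can be illustrated with definitions commonly used in the literature. These definitions are assumed here, because the abstract does not restate them. The centroid shift w is the standard deviation (with an N-1 denominator) of the offsets between the X-ray peak and the emission centroids measured in a series of apertures, normalized by a reference radius such as R500. The concentration c is the ratio of the flux inside a small aperture to the flux inside a larger one. The sketch assumes the peak, centroids and fluxes have already been measured; all inputs and function names are hypothetical. Relaxed systems tend to show low w and high c.

```python
import math
import statistics


def centroid_shift(peak, centroids, r_ref):
    """Centroid shift w: spread of peak-to-centroid offsets, normalized by r_ref.

    peak: (x, y) position of the X-ray peak.
    centroids: (x, y) emission centroids, one per aperture.
    r_ref: normalizing radius (for example R500) in the same units.
    """
    offsets = [math.dist(peak, c) for c in centroids]
    return statistics.stdev(offsets) / r_ref  # stdev uses the N-1 denominator


def concentration(flux_inner, flux_outer):
    """Concentration c: flux within the inner aperture over flux within the outer one."""
    return flux_inner / flux_outer


# Hypothetical measurements in arbitrary pixel and flux units
print(centroid_shift((0.0, 0.0), [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)], r_ref=2.0))
print(concentration(30.0, 100.0))
```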

  9. [Analysis of the characteristics of the older adults with depression using data mining decision tree analysis].

    Science.gov (United States)

    Park, Myonghwa; Choi, Sora; Shin, A Mi; Koo, Chul Hoi

    2013-02-01

    The purpose of this study was to develop a prediction model for the characteristics of older adults with depression using the decision tree method. A large dataset from the 2008 Korean Elderly Survey was used, and data from 14,970 elderly people were analyzed. The target variable was depression, and the 53 input variables covered general characteristics, family and social relationships, economic status, health status, health behavior, functional status, leisure and social activity, quality of life, and living environment. Data were analyzed by decision tree analysis, a data mining technique, using SPSS for Windows 19.0 and Clementine 12.0. The decision trees yielded five different rules defining the characteristics of older adults with depression. Among the data mining models, Classification & Regression Tree (C&RT) showed the best prediction, with an accuracy of 80.81%. Factors in the rules were life satisfaction, nutritional status, daily activity difficulty due to pain, functional limitation in basic or instrumental daily activities, number of chronic diseases, and daily activity difficulty due to disease. The different rules classified by the decision tree model in this study should serve as baseline data for discovering informative knowledge and developing interventions tailored to these individual characteristics.
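
    C&RT grows a tree by repeatedly choosing the split that most reduces node impurity, usually measured by the Gini index. The sketch below shows that single step for one numeric input. It tries every threshold between consecutive distinct values and keeps the one with the lowest size-weighted Gini impurity of the two child nodes. It is a minimal, generic illustration rather than the SPSS/Clementine implementation. The example variable, labels and function names are hypothetical.

```python
def gini(labels):
    """Gini impurity 1 - sum(p_c^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best rule 'value <= threshold'.

    The threshold is None if no split improves on the parent node's impurity.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, weighted)
    return best


# Hypothetical life-satisfaction scores and depression labels (1 = depressed)
scores = [1, 2, 3, 10, 11, 12]
depressed = [1, 1, 1, 0, 0, 0]
print(best_split(scores, depressed))
```

    A full tree applies this search to every input variable at every node and recurses on the two children until a stopping rule is met.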

  10. A STUDY OF TEXT MINING METHODS, APPLICATIONS, AND TECHNIQUES

    OpenAIRE

    R. Rajamani & S. Saranya

    2017-01-01

    Data mining is used to extract useful information from large amounts of data and to solve different types of research problems. Research areas related to data mining include text mining, web mining, image mining, sequential pattern mining, spatial mining, medical mining, multimedia mining, structure mining and graph mining. Text mining, also referred to as text data mining, is also called knowledge discovery in text (KDT) or intelligent text analysis. T...

  11. Identification and validation of asthma phenotypes in Chinese population using cluster analysis.

    Science.gov (United States)

    Wang, Lei; Liang, Rui; Zhou, Ting; Zheng, Jing; Liang, Bing Miao; Zhang, Hong Ping; Luo, Feng Ming; Gibson, Peter G; Wang, Gang

    2017-10-01

    Asthma is a heterogeneous airway disease, so it is crucial to clearly identify clinical phenotypes to achieve better asthma management. To identify and prospectively validate asthma clusters in a Chinese population. Two hundred eighty-four patients were consecutively recruited and 18 sociodemographic and clinical variables were collected. Hierarchical cluster analysis was performed by the Ward method followed by k-means cluster analysis. Then, a prospective 12-month cohort study was used to validate the identified clusters. Five clusters were successfully identified. Clusters 1 (n = 71) and 3 (n = 81) were mild asthma phenotypes with slight airway obstruction and low exacerbation risk, but with a sex differential. Cluster 2 (n = 65) described an "allergic" phenotype, cluster 4 (n = 33) featured a "fixed airflow limitation" phenotype with smoking, and cluster 5 (n = 34) was a "low socioeconomic status" phenotype. Patients in clusters 2, 4, and 5 had distinctly lower socioeconomic status and more psychological symptoms. Cluster 2 had a significantly increased risk of exacerbations (risk ratio [RR] 1.13, 95% confidence interval [CI] 1.03-1.25), unplanned visits for asthma (RR 1.98, 95% CI 1.07-3.66), and emergency visits for asthma (RR 7.17, 95% CI 1.26-40.80). Cluster 4 had an increased risk of unplanned visits (RR 2.22, 95% CI 1.02-4.81), and cluster 5 had increased emergency visits (RR 12.72, 95% CI 1.95-69.78). Kaplan-Meier analysis confirmed that cluster grouping was predictive of time to the first asthma exacerbation, unplanned visit, emergency visit, and hospital admission (P clusters as "allergic asthma," "fixed airflow limitation," and "low socioeconomic status" phenotypes that are at high risk of severe asthma exacerbations and that have management implications for clinical practice in developing countries. Copyright © 2017 American College of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.
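
    The two-step procedure in this abstract runs Ward hierarchical clustering first, then k-means. Ward's method repeatedly merges the pair of clusters whose union least increases the within-cluster sum of squares, using the merge cost n_a*n_b/(n_a+n_b)*||c_a - c_b||^2. The resulting centroids then seed a k-means refinement. The sketch is a generic, unoptimized O(n^3) illustration. It assumes the 18 variables have already been numerically coded and standardized. The data and function names are hypothetical.

```python
import math


def ward(points, k):
    """Agglomerate with Ward's criterion until k clusters remain."""
    clusters = [[i] for i in range(len(points))]

    def centroid(members):
        return tuple(sum(points[i][d] for i in members) / len(members) for d in range(len(points[0])))

    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                # Increase in within-cluster sum of squares if a and b merge
                cost = na * nb / (na + nb) * math.dist(centroid(clusters[a]), centroid(clusters[b])) ** 2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters, [centroid(c) for c in clusters]


def kmeans_from(points, centers, iters=100):
    """Lloyd's k-means refinement starting from the given centers."""
    labels = []
    for _ in range(iters):
        labels = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j])) for p in points]
        new = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append(tuple(sum(x) / len(members) for x in zip(*members)) if members else old)
        if new == centers:
            break
        centers = new
    return labels, centers


# Hypothetical standardized patient features with two obvious groups
patients = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
_, seeds = ward(patients, 2)
labels, centers = kmeans_from(patients, seeds)
print(labels)
```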

  12. Genome-scale analysis of positional clustering of mouse testis-specific genes

    Directory of Open Access Journals (Sweden)

    Lee Bernett TK

    2005-01-01

    Full Text Available Abstract Background Contrary to what was once thought, genes are not randomly distributed on a chromosome, even after removal of tandem repeats. The positional clustering of co-expressed genes is known in prokaryotes and has recently been reported in several eukaryotic organisms such as Caenorhabditis elegans, Drosophila melanogaster, and Homo sapiens. To further investigate the mode of tissue-specific gene clustering in higher eukaryotes, we performed a genome-scale analysis of positional clustering of the mouse testis-specific genes. Results Our computational analysis shows that a large proportion of testis-specific genes are clustered in groups of 2 to 5 genes in the mouse genome. The number of clusters is much higher than expected by chance, even after removal of tandem repeats. Conclusion Our result suggests that testis-specific genes tend to cluster on the mouse chromosomes. This provides further evidence for the hypothesis that clusters of tissue-specific genes exist.
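
    A permutation test is one way to judge whether the observed number of clusters is "much higher than expected by chance". First, count the runs of at least two adjacent testis-specific genes along the chromosome. Then compare that count with the counts obtained after randomly shuffling which genes are testis-specific. The sketch illustrates this general idea and is not the authors' procedure. It does not remove tandem repeats, and the gene order, flags, seed and function names are hypothetical.

```python
import random


def count_clusters(flags, min_size=2):
    """Count runs of at least min_size consecutive flagged (True) genes."""
    clusters, run = 0, 0
    for flagged in list(flags) + [False]:  # sentinel closes a trailing run
        if flagged:
            run += 1
        else:
            if run >= min_size:
                clusters += 1
            run = 0
    return clusters


def permutation_p(flags, n_perm=1000, seed=0):
    """Observed cluster count and a one-sided permutation p-value."""
    rng = random.Random(seed)
    observed = count_clusters(flags)
    shuffled = list(flags)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if count_clusters(shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical chromosome of 100 genes with five adjacent testis-specific pairs
flags = [False] * 100
for start in (10, 30, 50, 70, 90):
    flags[start] = flags[start + 1] = True
observed, p = permutation_p(flags)
print(observed, p)
```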

  13. Text Mining in Organizational Research.

    Science.gov (United States)

    Kobayashi, Vladimer B; Mol, Stefan T; Berkers, Hannah A; Kismihók, Gábor; Den Hartog, Deanne N

    2018-07-01

    Despite the ubiquity of textual data, so far few researchers have applied text mining to answer organizational research questions. Text mining, which essentially entails a quantitative approach to the analysis of (usually) voluminous textual data, helps accelerate knowledge discovery by radically increasing the amount of data that can be analyzed. This article aims to acquaint organizational researchers with the fundamental logic underpinning text mining, the analytical stages involved, and contemporary techniques that may be used to achieve different types of objectives. The specific analytical techniques reviewed are (a) dimensionality reduction, (b) distance and similarity computing, (c) clustering, (d) topic modeling, and (e) classification. We describe how text mining may extend contemporary organizational research by allowing the testing of existing or new research questions with data that are likely to be rich, contextualized, and ecologically valid. After an exploration of how evidence for the validity of text mining output may be generated, we conclude the article by illustrating the text mining process in a job analysis setting using a dataset composed of job vacancies.
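
    Of the techniques listed, distance and similarity computing is the easiest to show in a few lines. The sketch builds TF-IDF weights (term frequency times log inverse document frequency) for a few hypothetical job-vacancy snippets and compares them with cosine similarity. To keep the example self-contained, tokenization is assumed to be plain lower-casing and whitespace splitting. A real pipeline would also handle punctuation, stop words and stemming. The function names are illustrative.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (c / len(toks)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


vacancies = [
    "data analyst sql python",
    "data analyst sql excel",
    "nurse patient care ward",
]
vectors = tfidf(vacancies)
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```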

  14. ANALYSIS OF DEVELOPING BATIK INDUSTRY CLUSTER IN BAKARAN VILLAGE CENTRAL JAVA PROVINCE

    Directory of Open Access Journals (Sweden)

    Hermanto Hermanto

    2017-06-01

    Full Text Available SMEs grow in clusters in particular geographical areas, and entrepreneurs grow and thrive through the business cluster. Central Java Province has many business clusters that improve the regional economy, one of which is the batik industry cluster. Pati Regency is one of the regencies/cities in Central Java with the lowest turnover. The batik industry cluster in Pati is developing quite well, as can be seen from the increasing number of batik businesses incorporated in the cluster. This research examines the strategy for developing the batik industry cluster in Pati Regency, with the purpose of determining the proper strategy for developing the batik industry clusters in Pati. The research method is quantitative, and the analysis tool is the Strengths, Weaknesses, Opportunities, Threats (SWOT) analysis. The SWOT analysis shows that the proper strategy for developing the batik industry cluster in Pati is to optimize the management of the batik business cluster in Bakaran Village; for the local government to provide information on business capital loan facilities; to employ workers from Bakaran Village while improving their quality through training; and to market Bakaran batik to broader markets while maintaining its quality. The research offers advice to the parties that have a role in batik industry cluster development in Bakaran Village, Pati Regency, such as the local government.

  15. Analysis of genetic association using hierarchical clustering and cluster validation indices.

    Science.gov (United States)

    Pagnuco, Inti A; Pastore, Juan I; Abras, Guillermo; Brun, Marcel; Ballarin, Virginia L

    2017-10-01

    It is usually assumed that co-expressed genes suggest co-regulation in the underlying regulatory network. Determining sets of co-expressed genes is an important task, based on some criteria of similarity. This task is usually performed by clustering algorithms, where the genes are clustered into meaningful groups based on their expression values in a set of experiments. In this work, we propose a method to find sets of co-expressed genes, based on cluster validation indices as a measure of similarity for individual gene groups, and a combination of variants of hierarchical clustering to generate the candidate groups. We evaluated its ability to retrieve significant sets on simulated correlated data and real genomics data, measuring performance by its ability to detect co-regulated sets compared with a full search. Additionally, we analyzed the quality of the best-ranked groups using an online bioinformatics tool that provides network information for the selected genes. Copyright © 2017 Elsevier Inc. All rights reserved.
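
    The abstract uses cluster validation indices as a similarity score for individual gene groups but does not say which indices. As a hedged stand-in, the sketch scores each candidate group by the mean pairwise Pearson correlation of its genes' expression profiles and ranks the groups by that score. This is a simple homogeneity measure chosen for illustration, not necessarily one of the authors' indices. The expression values and gene names are hypothetical, and genes with constant expression would make the correlation undefined.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def group_score(expr, genes):
    """Mean pairwise Pearson correlation within a candidate gene group."""
    pairs = list(combinations(genes, 2))
    return sum(pearson(expr[a], expr[b]) for a, b in pairs) / len(pairs)


def rank_groups(expr, groups):
    """Candidate groups ordered from most to least co-expressed."""
    return sorted(groups, key=lambda g: group_score(expr, g), reverse=True)


# Hypothetical expression values across four experiments
expr = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8],
    "g3": [4, 3, 2, 1],
    "g4": [1, 3, 2, 4],
}
print(rank_groups(expr, [["g1", "g3"], ["g1", "g2"], ["g1", "g4"]]))
```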

  16. WebGimm: An integrated web-based platform for cluster analysis, functional analysis, and interactive visualization of results.

    Science.gov (United States)

    Joshi, Vineet K; Freudenberg, Johannes M; Hu, Zhen; Medvedovic, Mario

    2011-01-17

    Cluster analysis methods have been extensively researched, but the adoption of new methods is often hindered by technical barriers in their implementation and use. WebGimm is a free cluster analysis web service and an open-source, general-purpose clustering web-server infrastructure designed to facilitate easy deployment of integrated cluster analysis servers based on clustering and functional annotation algorithms implemented in R. Integrated functional analyses and interactive browsing of both the clustering structure and the functional annotations provide a complete analytical environment for cluster analysis and interpretation of results. The Java Web Start client-based interface is modeled after the familiar cluster/treeview packages, making its use intuitive to a wide array of biomedical researchers. For biomedical researchers, WebGimm provides an avenue to access state-of-the-art clustering procedures. For Bioinformatics methods developers, WebGimm offers a convenient avenue to deploy their newly developed clustering methods. WebGimm server, software and manuals can be freely accessed at http://ClusterAnalysis.org/.

  17. An improved clustering algorithm based on reverse learning in intelligent transportation

    Science.gov (United States)

    Qiu, Guoqing; Kou, Qianqian; Niu, Ting

    2017-05-01

    With the development of artificial intelligence and data mining technology, big data has gradually come into view. When processing big data, clustering is an important method. This paper introduces the reverse learning method into the clustering process of the PAM (partitioning around medoids) algorithm to overcome the limitations of one-time clustering in unsupervised learning and to increase the diversity of clusters, thereby improving clustering quality. The algorithm analysis and experimental results show that the algorithm is feasible.
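
    The abstract does not define "reverse learning". Assume it refers to opposition-based learning, where each candidate x in a range [lo, hi] is paired with its opposite lo + hi - x. Under that assumption, a minimal sketch proceeds in three steps. It generates random medoid sets together with their opposites, each opposite snapped to the nearest data point. It starts PAM from the cheapest of these candidates. Finally, it runs the usual PAM swap phase. This is an illustrative reconstruction, not the authors' algorithm. The function names and the `n_init` and `seed` parameters are hypothetical.

```python
import math
import random


def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid (indices into points)."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def opposite_medoids(points, medoids):
    """Opposition-based candidate: reflect each medoid through the data's bounding box
    (x' = lo + hi - x per coordinate) and snap it to the nearest data point."""
    dims = range(len(points[0]))
    lo = [min(p[d] for p in points) for d in dims]
    hi = [max(p[d] for p in points) for d in dims]
    opposite = []
    for m in medoids:
        target = tuple(lo[d] + hi[d] - points[m][d] for d in dims)
        opposite.append(min(range(len(points)), key=lambda i: math.dist(points[i], target)))
    return opposite


def pam_with_opposition(points, k, n_init=5, seed=0):
    rng = random.Random(seed)
    candidates = []
    for _ in range(n_init):
        medoids = rng.sample(range(len(points)), k)
        candidates.append(medoids)
        opposite = opposite_medoids(points, medoids)
        if len(set(opposite)) == k:  # skip if snapping produced duplicate medoids
            candidates.append(opposite)
    medoids = min(candidates, key=lambda m: total_cost(points, m))
    # PAM swap phase: accept any medoid/non-medoid swap that lowers the total cost
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for h in range(len(points)):
                if h in medoids:
                    continue
                trial = medoids[:i] + [h] + medoids[i + 1:]
                if total_cost(points, trial) < total_cost(points, medoids):
                    medoids, improved = trial, True
    return sorted(medoids), total_cost(points, medoids)


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(pam_with_opposition(points, 2))
```

    The opposition step only changes where the swap phase starts. It can shorten the search or avoid a poor starting configuration, but the swap phase itself is standard PAM.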

  18. Data warehousing as a basis for web-based documentation of data mining and analysis.

    Science.gov (United States)

    Karlsson, J; Eklund, P; Hallgren, C G; Sjödin, J G

    1999-01-01

    In this paper we present a case study for data warehousing intended to support data mining and analysis. We also describe a prototype for data retrieval. Further we discuss some technical issues related to a particular choice of a patient record environment.

  19. Practical application of solid phase spectrophotometry in analysis of materials and goods of mining and metallurgy

    International Nuclear Information System (INIS)

    Duan Qunzhang

    1999-01-01

    The author reviewed recent developments and practical applications of solid phase spectrophotometry in the analysis of mining and metallurgy materials and goods. Separation and preconcentration, the conditions of color determination, sensitivity and detection range, as well as interferences in the corresponding methods, are discussed.

  20. Comparison analysis for classification algorithm in data mining and the study of model use

    Science.gov (United States)

    Chen, Junde; Zhang, Defu

    2018-04-01

    As a key technique in data mining, classification algorithms have received extensive attention. Through experiments with classification algorithms on UCI data sets, we present a method for comparative analysis of different algorithms using a statistical test. In addition, an adaptive diagnosis model for preventing electricity theft and leakage is presented as a specific case in the paper.

  1. A practitioner’s guide to resampling for data analysis, data mining, and modeling: A cookbook for starters

    NARCIS (Netherlands)

    van den Broek, Egon

    A practitioner’s guide to resampling for data analysis, data mining, and modeling provides a gentle and pragmatic introduction to the proposed topics. Its supporting Web site was offline and, hence, its potential added value could not be verified. The book refrains from using advanced mathematics

  2. What Satisfies Students?: Mining Student-Opinion Data with Regression and Decision Tree Analysis

    Science.gov (United States)

    Thomas, Emily H.; Galambos, Nora

    2004-01-01

    To investigate how students' characteristics and experiences affect satisfaction, this study uses regression and decision tree analysis with the CHAID algorithm to analyze student-opinion data. A data mining approach identifies the specific aspects of students' university experience that most influence three measures of general satisfaction. The…

  3. The Determination of Children's Knowledge of Global Lunar Patterns from Online Essays Using Text Mining Analysis

    Science.gov (United States)

    Cheon, Jongpil; Lee, Sangno; Smith, Walter; Song, Jaeki; Kim, Yongjin

    2013-01-01

    The purpose of this study was to use text mining analysis of early adolescents' online essays to determine their knowledge of global lunar patterns. Australian and American students in grades five to seven wrote about global lunar patterns they had discovered by sharing observations with each other via the Internet. These essays were analyzed for…

  4. Analysis on the choice of the most suitable metal prices in a mining investment project

    International Nuclear Information System (INIS)

    Torre, L. de la; Espi, J. A.

    2014-01-01

    The mineral price assigned in mining project design is critical to determining the economic feasibility of a project. Nevertheless, although literature on market metal prices is not difficult to find, it is much harder to establish a specific methodology for calculating the value or to decide which justifications are appropriate to include. This study presents an analysis of various methods for selecting metal prices and investigates the mechanisms and motives underlying price selections. The results describe various attitudes adopted by the designers of mining investment projects, and show how the price can be determined not just by forecasting but also by considering other relevant parameters. (Author)

  5. Highly Robust Methods in Data Mining

    Czech Academy of Sciences Publication Activity Database

    Kalina, Jan

    2013-01-01

    Vol. 8, no. 1 (2013), pp. 9-24. ISSN 1452-4864. Institutional support: RVO:67985807. Keywords: data mining * robust statistics * high-dimensional data * cluster analysis * logistic regression * neural networks. Subject RIV: BB - Applied Statistics, Operational Research

  6. Cluster analysis of HZE particle tracks as applied to space radiobiology problems

    International Nuclear Information System (INIS)

    Batmunkh, M.; Bayarchimeg, L.; Lkhagva, O.; Belov, O.

    2013-01-01

    A cluster analysis is performed of ionizations in tracks produced by the most abundant nuclei in the charge and energy spectra of the galactic cosmic rays. The frequency distribution of clusters is estimated for cluster sizes comparable to the DNA molecule at different packaging levels. For this purpose, an improved K-means-based algorithm is proposed. This technique can process particle tracks containing a large number of ionization events without requiring the number of clusters as an input parameter. Using this method, the ionization distribution pattern is analyzed as a function of cluster size and the particle's linear energy transfer.

  7. Application of cluster analysis and unsupervised learning to multivariate tissue characterization

    International Nuclear Information System (INIS)

    Momenan, R.; Insana, M.F.; Wagner, R.F.; Garra, B.S.; Loew, M.H.

    1987-01-01

    This paper describes a procedure for classifying tissue types from unlabeled acoustic measurements (data type unknown) using unsupervised cluster analysis. These techniques are being applied to unsupervised ultrasonic image segmentation and tissue characterization. The performance of a new clustering technique is measured and compared with supervised methods, such as a linear Bayes classifier. In these comparisons two objectives are sought: a) How well does the clustering method group the data? b) Do the clusters correspond to known tissue classes? The first question is investigated by a measure of cluster similarity and dispersion. The second question involves a comparison with a supervised technique using labeled data.

  8. Participant intimacy: A cluster analysis of the intranuclear cascade

    International Nuclear Information System (INIS)

    Cugnon, J.; Knoll, J.; Randrup, J.

    1981-01-01

    The intranuclear cascade for relativistic nuclear collisions is analyzed in terms of clusters consisting of groups of nucleons which are dynamically linked to each other by violent interactions. The formation cross sections for the different cluster types as well as their intrinsic dynamics are studied and compared with the predictions of the linear cascade model (rows-on-rows). (orig.)
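
    Assuming the cascade output has already been reduced to a list of qualifying two-body collisions (the criterion for a "violent" interaction is not given here and is left to the caller), the clusters described above are the connected components of the collision graph. A union-find structure finds them in near-linear time. Nucleon indices and collisions below are hypothetical.

```python
def linked_clusters(n_nucleons, collisions):
    """Group nucleons: two nucleons share a cluster if a chain of recorded
    collisions links them (connected components via union-find)."""
    parent = list(range(n_nucleons))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in collisions:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two clusters

    groups = {}
    for i in range(n_nucleons):
        groups.setdefault(find(i), []).append(i)
    # Largest clusters first, ties broken by lowest nucleon index
    return sorted(groups.values(), key=lambda g: (-len(g), g[0]))

# Hypothetical cascade: 7 nucleons, collisions recorded as index pairs
print(linked_clusters(7, [(0, 1), (1, 2), (4, 5)]))
```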

  9. An evaluation of centrality measures used in cluster analysis

    Science.gov (United States)

    Engström, Christopher; Silvestrov, Sergei

    2014-12-01

    Clustering of data into groups of similar objects plays an important part in analysing many types of data, especially when the datasets are large, as they often are in, for example, bioinformatics, social networks and computational linguistics. Many clustering algorithms, such as K-means and some types of hierarchical clustering, need a number of centroids representing the 'center' of the clusters. The choice of centroids for the initial clusters often plays an important role in the quality of the clusters. Since a data point with high centrality supposedly lies close to the 'center' of some cluster, centrality can be used to assign centroids instead of some other method such as picking them at random. Some work has been done to evaluate the use of centrality measures such as degree, betweenness and eigenvector centrality in clustering algorithms. The aim of this article is to compare and evaluate the usefulness of a number of common centrality measures, such as those mentioned above and others such as PageRank and related measures.
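
    A minimal sketch of centrality-based seeding, assuming degree centrality on a radius graph (points within `radius` of each other are linked). The rule that later seeds must not be neighbours of earlier ones is an added assumption meant to keep two seeds out of the same dense region; the article compares several centrality measures rather than prescribing this procedure. Points and radius are hypothetical.

```python
import math

def degree_seeds(points, k, radius):
    """Pick up to k initial centroids by degree centrality in a radius graph.

    The highest-degree point is chosen first; later picks skip points adjacent
    to an earlier pick. Fewer than k seeds are returned if every remaining
    point is blocked.
    """
    n = len(points)
    adj = [[j for j in range(n) if j != i and math.dist(points[i], points[j]) <= radius]
           for i in range(n)]
    order = sorted(range(n), key=lambda i: (-len(adj[i]), i))  # degree, descending
    seeds, blocked = [], set()
    for i in order:
        if i in blocked:
            continue
        seeds.append(points[i])
        blocked.update(adj[i])
        blocked.add(i)
        if len(seeds) == k:
            break
    return seeds

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10), (10, 11), (11, 10)]
print(degree_seeds(pts, 2, 1.0))
```

    The returned seeds can then be passed to any K-means implementation in place of random initial centroids.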

  10. Security and Correctness Analysis on Privacy-Preserving k-Means Clustering Schemes

    Science.gov (United States)

    Su, Chunhua; Bao, Feng; Zhou, Jianying; Takagi, Tsuyoshi; Sakurai, Kouichi

    Due to the rapid development of the Internet and related IT technologies, it has become increasingly easy to access large amounts of data. k-means clustering is a powerful and frequently used technique in data mining, and many research papers on privacy-preserving k-means clustering have been published. In this paper, we analyze existing privacy-preserving k-means clustering schemes based on cryptographic techniques. We show that these schemes cause privacy breaches and cannot output correct results because of faults in their protocol construction. Furthermore, we analyze our own proposal as an option for mitigating these problems, although it still leaks intermediate information during the computation.

  11. Project management in mine actions using Multi-Criteria-Analysis-based decision support system

    Directory of Open Access Journals (Sweden)

    Marko Mladineo

    2014-12-01

    Full Text Available In this paper, a Web-based Decision Support System (Web DSS) that supports humanitarian demining operations and the restoration of mine-contaminated areas is presented. Financial shortages usually trigger a need for priority setting in mine action project management. As part of the FP7 project TIRAMISU, a specialized Web DSS has been developed to achieve a fully transparent priority-setting process. It allows stakeholders and donors to join the decision-making process actively through a user-friendly and intuitive Web application. The main advantage of this Web DSS is its unique way of managing a mine action project using Multi-Criteria Analysis (MCA), namely the PROMETHEE method, to select priorities for demining actions. The developed Web DSS allows decision makers to use several predefined scenarios (different criteria weights) or to develop their own, so project managers can easily compare different demining possibilities.
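
    PROMETHEE II ranks alternatives by their net outranking flow, phi(a) = phi_plus(a) - phi_minus(a), where phi_plus(a) averages the weighted preference of a over every other alternative and phi_minus(a) averages the preference of the others over a. The sketch uses the simplest ("usual") preference function, P = 1 if a is strictly better on a criterion and 0 otherwise; the DSS described above may use other preference functions and thresholds. The areas, criteria and weights are hypothetical; changing `weights` plays the role of a scenario.

```python
def promethee_ii(scores, weights, maximize):
    """Net outranking flows (PROMETHEE II) with the 'usual' preference function.

    scores[a][j]: value of alternative a on criterion j.
    maximize[j]: True if larger is better on criterion j.
    """
    n = len(scores)
    total_w = sum(weights)
    w = [x / total_w for x in weights]  # normalise weights to sum to 1

    def pi(a, b):
        # Aggregated preference of a over b
        s = 0.0
        for j, wj in enumerate(w):
            d = scores[a][j] - scores[b][j]
            if not maximize[j]:
                d = -d
            s += wj * (1.0 if d > 0 else 0.0)
        return s

    net = []
    for a in range(n):
        plus = sum(pi(a, b) for b in range(n) if b != a) / (n - 1)
        minus = sum(pi(b, a) for b in range(n) if b != a) / (n - 1)
        net.append(plus - minus)
    return net

# Hypothetical areas: [people affected, clearance cost, clearance months]
areas = [[500, 20, 3], [800, 35, 5], [300, 10, 1]]
net = promethee_ii(areas, weights=[0.6, 0.3, 0.1], maximize=[True, False, False])
print([round(x, 3) for x in net])
```

    Net flows always sum to zero; the alternative with the largest net flow is ranked first.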

  12. Residual subsidence analysis after the end of coal mine work. Example from Lorraine Colliery, France

    International Nuclear Information System (INIS)

    Al Heib, M.; Nicolas, M.; Noirel, J.F.; Wojtkowiak, F.

    2005-01-01

    This paper describes the residual movements associated with deep coal mines. The case studied concerns workings in the Lorraine coal basin. The paper is divided into two sections. The first describes subsidence phenomena, especially the residual phase, in terms of amplitude, duration and location. The second focuses on the Morsbach case: the total and residual subsidence measurements are analyzed and compared with the state of the art and current knowledge. The results of the analysis show that residual movements last no more than 24 months and that their amplitude is about 5% of total subsidence. We also analyze the mining-damage claims made during and after the mining period. Damage reported after this period is probably due to late observations. (authors)

  13. A comparison of heuristic and model-based clustering methods for dietary pattern analysis.

    Science.gov (United States)

    Greve, Benjamin; Pigeot, Iris; Huybrechts, Inge; Pala, Valeria; Börnhorst, Claudia

    2016-02-01

    Cluster analysis is widely applied to identify dietary patterns. A new method based on Gaussian mixture models (GMM) seems to be more flexible than the commonly applied k-means and Ward's method. In the present paper, these clustering approaches are compared to find the most appropriate one for clustering dietary data. The clustering methods were applied to simulated data sets with different cluster structures to compare their performance knowing the true cluster membership of observations. Furthermore, the three methods were applied to FFQ data assessed in 1791 children participating in the IDEFICS (Identification and Prevention of Dietary- and Lifestyle-Induced Health Effects in Children and Infants) Study to explore their performance in practice. The GMM outperformed the other methods in the simulation study in 72% to 100% of cases, depending on the simulated cluster structure. Comparing the computationally less complex k-means and Ward's methods, the performance of k-means was better in 64-100% of cases. Applied to real data, all methods identified three similar dietary patterns which may be roughly characterized as a 'non-processed' cluster with a high consumption of fruits, vegetables and wholemeal bread, a 'balanced' cluster with only slight preferences for single foods and a 'junk food' cluster. The simulation study suggests that clustering via GMM should be preferred due to its higher flexibility regarding cluster volume, shape and orientation. The k-means seems to be a good alternative, being easier to use while giving similar results when applied to real data.
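
    For reference, the k-means method compared in the study alternates two steps: assign each observation to its nearest centroid, then move each centroid to the mean of its members. These hard Euclidean assignments favour compact, roughly spherical clusters, which is the rigidity that the GMM's flexibility in cluster volume, shape and orientation addresses. A minimal pure-Python sketch with hypothetical two-dimensional data and fixed starting centroids (real analyses usually try several random starts):

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Lloyd's k-means: alternate nearest-centroid assignment and centroid
    update until the assignments stop changing."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
                      for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster empties
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

data = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
labels, centers = kmeans(data, [(0, 0), (10, 10)])
print(labels, centers)
```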

  14. Common Factor Analysis Versus Principal Component Analysis: Choice for Symptom Cluster Research

    Directory of Open Access Journals (Sweden)

    Hee-Ju Kim, PhD, RN

    2008-03-01

    Conclusion: If the study purpose is to explain correlations among variables and to examine the structure of the data (as in most symptom cluster research), CFA provides a more accurate result. If the purpose of a study is to summarize data with a smaller number of variables, PCA is the choice. PCA can also be used as an initial step in CFA because it provides information regarding the maximum number and nature of factors. In using factor analysis for symptom cluster research, several issues need to be considered, including subjectivity of the solution, sample size, symptom selection, and level of measurement.
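
    The distinction can be made concrete. PCA works directly on the eigen-decomposition of the correlation matrix, with ones on the diagonal, whereas common factor analysis replaces the diagonal with communality estimates and fits a latent-factor model (not shown here). A minimal numpy sketch of the PCA step on simulated data with two latent factors; the eigenvalues give the kind of information about the maximum number of factors described above, and the eigenvalue-greater-than-1 rule is one common heuristic. The data are invented.

```python
import numpy as np

def pca_eigen(data):
    """Eigenvalues (descending) and eigenvectors (columns) of the
    correlation matrix of an (n_samples, n_variables) array."""
    x = np.asarray(data, dtype=float)
    corr = np.corrcoef(x, rowvar=False)  # variables are columns
    eigvals, eigvecs = np.linalg.eigh(corr)  # symmetric input, ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

# Hypothetical data: 6 symptoms driven by 2 latent factors plus noise
rng = np.random.default_rng(0)
n = 500
f1, f2 = rng.standard_normal(n), rng.standard_normal(n)
noise = 0.5 * rng.standard_normal((n, 6))
symptoms = np.column_stack([f1, f1, f1, f2, f2, f2]) + noise
eigvals, _ = pca_eigen(symptoms)
print(np.round(eigvals, 2))
print("share of variance:", np.round(eigvals / eigvals.sum(), 2))
```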

  15. Identifying novel phenotypes of acute heart failure using cluster analysis of clinical variables.

    Science.gov (United States)

    Horiuchi, Yu; Tanimoto, Shuzou; Latif, A H M Mahbub; Urayama, Kevin Y; Aoki, Jiro; Yahagi, Kazuyuki; Okuno, Taishi; Sato, Yu; Tanaka, Tetsu; Koseki, Keita; Komiyama, Kota; Nakajima, Hiroyoshi; Hara, Kazuhiro; Tanabe, Kengo

    2018-07-01

    Acute heart failure (AHF) is a heterogeneous disease caused by various cardiovascular (CV) pathophysiologies and multiple non-CV comorbidities. We aimed to identify clinically important subgroups to improve our understanding of the pathophysiology of AHF and inform clinical decision-making. We evaluated detailed clinical data of 345 consecutive AHF patients using non-hierarchical cluster analysis of 77 variables, including age, sex, HF etiology, comorbidities, physical findings, laboratory data, electrocardiogram, echocardiogram and treatment during hospitalization. Cox proportional hazards regression analysis was performed to estimate the association between the clusters and clinical outcomes. Three clusters were identified. Cluster 1 (n=108) represented "vascular failure". This cluster had the highest average systolic blood pressure at admission and lung congestion with type 2 respiratory failure. Cluster 2 (n=89) represented "cardiac and renal failure". They had the lowest ejection fraction (EF) and worst renal function. Cluster 3 (n=148) comprised mostly older patients and had the highest prevalence of atrial fibrillation and preserved EF. Death or HF hospitalization within 12 months occurred in 23% of Cluster 1, 36% of Cluster 2 and 36% of Cluster 3 (p=0.034). Compared with Cluster 1, risk of death or HF hospitalization was 1.74 (95% CI, 1.03-2.95, p=0.037) for Cluster 2 and 1.82 (95% CI, 1.13-2.93, p=0.014) for Cluster 3. Cluster analysis may be effective in producing clinically relevant categories of AHF, and may suggest underlying pathophysiology and potential utility in predicting clinical outcomes. Copyright © 2018 Elsevier B.V. All rights reserved.

  16. Human Behavior Analysis by Means of Multimodal Context Mining

    Directory of Open Access Journals (Sweden)

    Oresti Banos

    2016-08-01

    Full Text Available There is sufficient evidence proving the impact that negative lifestyle choices have on people’s health and wellness. Changing unhealthy behaviours requires raising people’s self-awareness and also providing healthcare experts with a thorough and continuous description of the user’s conduct. Several monitoring techniques have been proposed in the past to track users’ behaviour; however, these approaches are either subjective and prone to misreporting, such as questionnaires, or focus only on a specific component of context, such as activity counters. This work presents an innovative multimodal context mining framework to inspect and infer human behaviour in a more holistic fashion. The proposed approach extends beyond the state of the art, since it does not explore only a single type of context but combines diverse levels of context in an integrated manner. Namely, low-level contexts, including activities, emotions and locations, are identified from heterogeneous sensory data through machine learning techniques. Low-level contexts are combined using ontological mechanisms to derive a more abstract representation of the user’s context, here referred to as high-level context. An initial implementation of the proposed framework supporting real-time context identification is also presented. The developed system is evaluated for various realistic scenarios making use of a novel multimodal context open dataset and data on-the-go, demonstrating prominent context-aware capabilities at both low and high levels.

  17. A novel water quality data analysis framework based on time-series data mining.

    Science.gov (United States)

    Deng, Weihui; Wang, Guoyin

    2017-07-01

    The rapid development of time-series data mining provides an emerging method for water resource management research. In this paper, based on the time-series data mining methodology, we propose a novel and general analysis framework for water quality time-series data. It consists of two parts: implementation components and common tasks of time-series data mining in water quality data. In the first part, we propose to granulate the time series into several two-dimensional normal clouds and calculate the similarities at the granulated level. On the basis of the similarity matrix, the similarity search, anomaly detection, and pattern discovery tasks in the water quality time-series instance dataset can be easily implemented in the second part. We present a case study of this analysis framework on weekly dissolved oxygen (DO) time-series data collected from five monitoring stations on the upper reaches of the Yangtze River, China. It revealed the relationship of water quality between the mainstream and the tributary as well as the main changing patterns of DO. The experimental results show that the proposed analysis framework is a feasible and efficient method for mining hidden and valuable knowledge from historical water quality time-series data. Copyright © 2017 Elsevier Ltd. All rights reserved.
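
    The normal-cloud granulation can be sketched as follows, as a one-dimensional simplification of the two-dimensional clouds used in the paper. Each window of the series is summarised by cloud parameters (Ex, En, He) estimated with the standard backward cloud generator: Ex is the sample mean, En = sqrt(pi/2) times the mean absolute deviation, and He = sqrt(S^2 - En^2). The window length, the similarity measure (1 / (1 + mean Euclidean distance between parameter triples)) and the DO values are hypothetical stand-ins, not the paper's.

```python
import math
from statistics import mean, variance

def backward_cloud(samples):
    """Estimate normal-cloud parameters (Ex, En, He) from >= 2 samples using
    the backward cloud generator without certainty degrees."""
    ex = mean(samples)
    en = math.sqrt(math.pi / 2) * mean(abs(x - ex) for x in samples)
    s2 = variance(samples)  # sample variance, n - 1 denominator
    he = math.sqrt(max(s2 - en ** 2, 0.0))  # clamp: the estimate can go negative
    return ex, en, he

def granulate(series, window):
    """Split a series into non-overlapping windows, one cloud per window."""
    return [backward_cloud(series[i:i + window])
            for i in range(0, len(series) - window + 1, window)]

def cloud_similarity(g1, g2):
    """Hypothetical similarity between two granulated series of equal length."""
    d = mean(math.dist(a, b) for a, b in zip(g1, g2))
    return 1.0 / (1.0 + d)

# Hypothetical weekly DO readings (mg/L) at two stations
station_a = [8.1, 8.3, 7.9, 8.0, 7.2, 7.0, 7.4, 7.1, 8.5, 8.7, 8.4, 8.6]
station_b = [8.0, 8.2, 8.0, 7.9, 7.3, 7.1, 7.2, 7.0, 8.4, 8.8, 8.5, 8.5]
ga, gb = granulate(station_a, 4), granulate(station_b, 4)
print(round(cloud_similarity(ga, gb), 3))
```

    A matrix of such pairwise similarities across stations is the kind of input the similarity-search and anomaly-detection tasks above would work from.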

  18. The Flemish frozen-vegetable industry as an example of cluster analysis : Flanders Vegetable Valley

    NARCIS (Netherlands)

    Vanhaverbeke, W.P.M.; Larosse, J.; Winnen, W.; Hulsink, W.; Dons, J.J.M.

    2008-01-01

    In this contribution we present a strategic analysis of the cluster dynamics in the frozen-vegetable industry in Flanders (Belgium). The main purpose of this case is twofold. First, we determine the added value of using data about customer and supplier relationships in cluster analysis. Second, we

  19. Tracking Undergraduate Student Achievement in a First-Year Physiology Course Using a Cluster Analysis Approach

    Science.gov (United States)

    Brown, S. J.; White, S.; Power, N.

    2015-01-01

    A cluster analysis data classification technique was used on assessment scores from 157 undergraduate nursing students who passed 2 successive compulsory courses in human anatomy and physiology. Student scores in five summative assessment tasks, taken in each of the courses, were used as inputs for a cluster analysis procedure. We aimed to group…

  20. Performance Analysis of Cluster Formation in Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Edgar Romo Montiel

    2017-12-01

    Full Text Available Cluster-based wireless sensor networks have been used extensively in the literature to achieve considerable reductions in energy consumption. However, two aspects of such systems have been largely overlooked, namely the transmission probability used during the cluster formation phase and the way in which cluster heads are selected. Both of these issues have an important impact on system performance. For the former, it is common to assume that sensor nodes in a cluster-based Wireless Sensor Network (WSN) use a fixed transmission probability to send control data in order to build the clusters. However, due to the highly variable conditions experienced by these networks, a fixed transmission probability may lead to extra energy consumption. In view of this, three different transmission probability strategies are studied: optimal, fixed and adaptive. In this context, we also investigate cluster head selection schemes; specifically, we consider two intelligent schemes based on the fuzzy C-means and k-medoids algorithms and a random selection with no intelligence. We show that the use of intelligent schemes greatly improves system performance, but their use entails higher complexity and selection delay. The main performance metrics considered in this work are energy consumption, successful transmission probability and cluster formation latency. In addition, we study the effect of errors in the wireless channel and their impact on system performance under the different transmission probability schemes.
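
    A simple textbook contention model shows why a fixed transmission probability can waste energy. Assume, for illustration only, that n nodes contend in a slot and each transmits independently with probability p; a slot succeeds only when exactly one node transmits, so P_s = n p (1 - p)^(n - 1), which is maximised at p = 1/n. The paper's own channel model and its "optimal" and "adaptive" strategies may differ from this sketch.

```python
def success_probability(n, p):
    """Probability that exactly one of n contending nodes transmits in a slot,
    each transmitting independently with probability p."""
    return n * p * (1 - p) ** (n - 1)

# A fixed p tuned for 10 contenders, versus the per-n optimum p = 1/n
fixed_p = 0.1
for n in (5, 10, 50):
    fixed = success_probability(n, fixed_p)
    best = success_probability(n, 1 / n)
    print(f"n={n:2d}  fixed p: {fixed:.3f}  p=1/n: {best:.3f}")
```

    With p fixed at 0.1, the success probability is about 0.387 for 10 contenders (where 0.1 happens to equal 1/n) but drops to about 0.029 for 50 contenders, whereas p = 1/50 still gives about 0.372.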

  1. Performance Analysis of Cluster Formation in Wireless Sensor Networks.

    Science.gov (United States)

    Montiel, Edgar Romo; Rivero-Angeles, Mario E; Rubino, Gerardo; Molina-Lozano, Heron; Menchaca-Mendez, Rolando; Menchaca-Mendez, Ricardo

    2017-12-13

    Cluster-based wireless sensor networks have been used extensively in the literature to achieve considerable reductions in energy consumption. However, two aspects of such systems have been largely overlooked, namely the transmission probability used during the cluster formation phase and the way in which cluster heads are selected. Both of these issues have an important impact on system performance. For the former, it is common to assume that sensor nodes in a cluster-based Wireless Sensor Network (WSN) use a fixed transmission probability to send control data in order to build the clusters. However, due to the highly variable conditions experienced by these networks, a fixed transmission probability may lead to extra energy consumption. In view of this, three different transmission probability strategies are studied: optimal, fixed and adaptive. In this context, we also investigate cluster head selection schemes; specifically, we consider two intelligent schemes based on the fuzzy C-means and k-medoids algorithms and a random selection with no intelligence. We show that the use of intelligent schemes greatly improves system performance, but their use entails higher complexity and selection delay. The main performance metrics considered in this work are energy consumption, successful transmission probability and cluster formation latency. In addition, we study the effect of errors in the wireless channel and their impact on system performance under the different transmission probability schemes.
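
    A minimal sketch of k-medoids-based cluster head selection, assuming nodes are described only by their positions: each node joins its nearest medoid, each medoid moves to the member with the smallest total distance to the rest of its cluster, and the final medoids serve as cluster heads. This is the simple alternating variant rather than full PAM, it assumes distinct node positions, and the paper's scheme may weigh other factors not shown here. Positions are hypothetical.

```python
import math

def assign(points, medoids):
    """Map each medoid index to the list of node indices nearest to it."""
    clusters = {m: [] for m in medoids}
    for i, p in enumerate(points):
        nearest = min(medoids, key=lambda m: math.dist(p, points[m]))
        clusters[nearest].append(i)
    return clusters

def k_medoids(points, initial, max_iter=50):
    """Alternating k-medoids; `initial` holds indices of the starting medoids."""
    medoids = sorted(initial)
    for _ in range(max_iter):
        clusters = assign(points, medoids)
        # Move each medoid to the member minimising total in-cluster distance
        new = sorted(min(members, key=lambda c: sum(math.dist(points[c], points[j]) for j in members))
                     for members in clusters.values())
        if new == medoids:
            break
        medoids = new
    return medoids, assign(points, medoids)

nodes = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5),
         (10, 10), (11, 10), (10, 11), (11, 11), (10.5, 10.5)]
heads, members = k_medoids(nodes, [0, 5])
print(heads, members)
```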

  2. Higgs pair production: choosing benchmarks with cluster analysis

    Energy Technology Data Exchange (ETDEWEB)

    Carvalho, Alexandra; Dall’Osso, Martino; Dorigo, Tommaso [Dipartimento di Fisica e Astronomia and INFN, Sezione di Padova, Via Marzolo 8, I-35131 Padova (Italy)]; Goertz, Florian [CERN, 1211 Geneva 23 (Switzerland)]; Gottardo, Carlo A. [Physikalisches Institut, Universität Bonn, Nussallee 12, 53115 Bonn (Germany)]; Tosi, Mia [CERN, 1211 Geneva 23 (Switzerland)]

    2016-04-20

    New physics theories often depend on a large number of free parameters. The phenomenology they predict for fundamental physics processes is in some cases drastically affected by the precise values of those free parameters, while in other cases it is left basically invariant at the level of detail experimentally accessible. When designing a strategy for the analysis of experimental data in the search for a signal predicted by a new physics model, it appears advantageous to categorize the parameter space describing the model according to the corresponding kinematical features of the final state. A multi-dimensional test statistic can be used to gauge the degree of similarity in the kinematics predicted by different models; a clustering algorithm using that metric may allow the division of the space into homogeneous regions, each of which can be successfully represented by a benchmark point. Searches targeting those benchmarks are then guaranteed to be sensitive to a large area of the parameter space. In this document we show a practical implementation of the above strategy for the study of non-resonant production of Higgs boson pairs in the context of extensions of the standard model with anomalous couplings of the Higgs bosons. A non-standard value of those couplings may significantly enhance the Higgs boson pair-production cross section, such that the process could be detectable with the data that the LHC will collect in Run 2.
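
    The abstract does not specify the multi-dimensional test statistic. As a simpler, hypothetical stand-in, the sketch compares two models through one binned kinematic distribution using the standard chi-square statistic for two histograms with unequal totals. A small value means similar shapes regardless of normalisation, which is the property a clustering of parameter-space points needs. Bin contents are invented.

```python
import math

def two_histogram_chi2(r, s):
    """Chi-square statistic comparing the shapes of two binned distributions
    with possibly different totals; bins empty in both are skipped."""
    big_r, big_s = sum(r), sum(s)
    scale_r = math.sqrt(big_s / big_r)  # rescales r up to s's total
    scale_s = math.sqrt(big_r / big_s)  # rescales s down to r's total
    chi2 = 0.0
    for ri, si in zip(r, s):
        if ri + si > 0:
            chi2 += (scale_r * ri - scale_s * si) ** 2 / (ri + si)
    return chi2

# Hypothetical di-Higgs invariant-mass histograms for three coupling choices
model_a = [5, 40, 80, 50, 10]
model_b = [10, 80, 160, 100, 20]  # same shape as model_a, twice the events
model_c = [30, 60, 50, 30, 20]
print(two_histogram_chi2(model_a, model_b))
print(two_histogram_chi2(model_a, model_c))
```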

  3. Clusters of galaxies as tools in observational cosmology : results from x-ray analysis

    International Nuclear Information System (INIS)

    Weratschnig, J.M.

    2009-01-01

    Clusters of galaxies are the largest gravitationally bound structures in the universe. They can be used as ideal tools to study large-scale structure formation (e.g. when studying merger clusters) and provide highly interesting environments for analysing several characteristic interaction processes (such as ram pressure stripping of galaxies and magnetic fields). In this dissertation, we have studied several clusters of galaxies using X-ray observations. To obtain scientific results, we have applied different data reduction and analysis methods. With a combination of morphological and spectral analysis, the merger cluster Abell 514 was studied in great detail. It has a highly interesting morphology and shows signs of an ongoing merger as well as a shock. Using a new method to detect substructure, we have analysed several clusters to determine whether any substructure is present in the X-ray image. Such substructure hints at real structure in the distribution of the intra-cluster medium (ICM) and is evidence for ongoing mergers. The results from this analysis are used extensively for the cluster of galaxies Abell S1136. Here, we study the ICM distribution and compare its structure with the spatial distribution of star-forming galaxies. Cluster magnetic fields are another important topic of my thesis. They can be studied in radio observations, which can be related to results from X-ray observations. Using observational data from several clusters, we could support the theory that cluster magnetic fields are frozen into the ICM. (author)

  4. Interactive K-Means Clustering Method Based on User Behavior for Different Analysis Target in Medicine.

    Science.gov (United States)

    Lei, Yang; Yu, Dai; Bin, Zhang; Yang, Yang

    2017-01-01

    Clustering algorithms, as a basis of data analysis, are widely used in analysis systems. However, for high-dimensional data, a clustering algorithm may overlook the business relations between dimensions, especially in medical fields. As a result, the clustering result often does not meet the users' business goals. If the clustering process can incorporate the users' knowledge, that is, the doctor's knowledge or analysis intent, the clustering result can be more satisfactory. In this paper, we propose an interactive K-means clustering method to improve users' satisfaction with the result. The core of this method is to obtain the user's feedback on the clustering result and use it to optimize the result. A particle swarm optimization algorithm is used to optimize the parameters, especially the weight settings in the clustering algorithm, so that it reflects the user's business preferences as closely as possible. Based on this parameter optimization and adjustment, the clustering result can come closer to the user's requirements. Finally, we test our method on a breast cancer example. The experiments show that our algorithm performs better.
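
    The abstract does not give the distance function, but "weight settings in the clustering algorithm" suggests per-feature weights inside the K-means distance. Under that assumption, the assignment step looks like the sketch below, which shows how a change of weights alone can move a patient to a different cluster. The particle swarm search that tunes the weights from user feedback is not shown, and the features and values are hypothetical.

```python
def weighted_assign(points, centroids, weights):
    """Assign each point to the centroid with the smallest feature-weighted
    squared Euclidean distance sum_j w_j * (x_j - c_j) ** 2."""
    def dist(p, c):
        return sum(w * (a - b) ** 2 for w, a, b in zip(weights, p, c))
    return [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points]

# Hypothetical scaled features: (age, tumour size)
patient = [(0.4, 0.8)]
centres = [(0.1, 0.5), (0.9, 0.9)]
print(weighted_assign(patient, centres, [1.0, 1.0]))  # equal weights
print(weighted_assign(patient, centres, [0.2, 0.8]))  # tumour size emphasised
```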

  5. Phenotypic clustering: a novel method for microglial morphology analysis.

    Science.gov (United States)

    Verdonk, Franck; Roux, Pascal; Flamant, Patricia; Fiette, Laurence; Bozza, Fernando A; Simard, Sébastien; Lemaire, Marc; Plaud, Benoit; Shorte, Spencer L; Sharshar, Tarek; Chrétien, Fabrice; Danckaert, Anne

    2016-06-17

    Microglial cells are tissue-resident macrophages of the central nervous system. They are extremely dynamic, sensitive to their microenvironment and present a characteristic complex and heterogeneous morphology and distribution within the brain tissue. Many experimental clues highlight a strong link between their morphology and their function in response to aggression. However, because of the complex "dendritic-like" aspect of the major pool of murine microglial cells and their dense network, precise and powerful morphological studies are difficult to carry out, which complicates correlation with molecular or clinical parameters. Using the knock-in mouse model CX3CR1(GFP/+), we developed a 3D automated confocal tissue imaging system coupled with morphological modelling of many thousands of microglial cells, enabling precise and quantitative assessment of major cell features: cell density, cell body area, cytoplasm area and number of primary, secondary and tertiary processes. We determined two morphological criteria, the complexity index (CI) and the covered environment area (CEA), allowing an innovative approach based on (i) an accurate and objective study of morphological changes in healthy or pathological conditions, (ii) an in situ mapping of the microglial distribution in different neuroanatomical regions and (iii) a study of the clustering of numerous cells, allowing us to discriminate different sub-populations. Our results on more than 20,000 cells per condition confirm at baseline a regional heterogeneity of the microglial distribution and phenotype that persists after induction of neuroinflammation by systemic injection of lipopolysaccharide (LPS). Using clustering analysis, we show that, at resting state, microglial cells are distributed in four microglial sub-populations defined by their CI and CEA, with a regional pattern and a specific behaviour after challenge. Our results challenge the classical view of a homogenous regional resting

  6. Cluster Computing For Real Time Seismic Array Analysis.

    Science.gov (United States)

    Martini, M.; Giudicepietro, F.

    A seismic array is an instrument composed of a dense distribution of seismic sensors that allows the directional properties of the wavefield (slowness or wavenumber vector) radiated by a seismic source to be measured. In recent years, arrays have been widely used in different fields of seismological research. In particular, they are applied in the investigation of seismic sources on volcanoes, where they can be successfully used to study volcanic microtremor and long-period events, which are critical for obtaining information on the evolution of volcanic systems. For this reason arrays could be usefully employed for volcano monitoring; however, the huge amount of data produced by this type of instrument and the quite time-consuming processing techniques have limited their potential for this application. To favor a direct application of array techniques to continuous volcano monitoring, we designed and built a small PC cluster able to compute, in near real time, the kinematic properties of the wavefield (slowness or wavenumber vector) produced by local seismic sources. The cluster is composed of 8 dual-processor Intel Pentium III PCs working at 550 MHz, and has 4 gigabytes of RAM. It runs the Linux operating system. The developed analysis software package is based on the Multiple SIgnal Classification (MUSIC) algorithm and is written in Fortran. The message-passing part is based upon the LAM programming environment package, an open-source implementation of the Message Passing Interface (MPI). The developed software system includes modules devoted to receiving data over the Internet and graphical applications for continuously displaying the processing results. The system has been tested with a data set collected during a seismic experiment conducted on Etna in 1999, when two dense seismic arrays were deployed on the northeast and southeast flanks of the volcano. A real-time continuous acquisition system has been simulated by
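
    The MUSIC idea can be sketched for the simplest case, a uniform linear array receiving a single narrowband plane wave: eigen-decompose the sample covariance of the sensor signals, keep the eigenvectors of the smallest eigenvalues (the noise subspace), and scan candidate directions for steering vectors nearly orthogonal to that subspace. This is an assumption-laden illustration in Python with numpy, not the Fortran/MPI package described above; a seismic application would scan two-dimensional slowness rather than a single angle, and the array size, spacing, noise level and 20-degree arrival are all invented.

```python
import numpy as np

def music_spectrum(snapshots, n_sources, spacing, angles_deg):
    """MUSIC pseudospectrum for a uniform linear array.

    snapshots: (n_sensors, n_snapshots) complex array.
    spacing: sensor spacing in wavelengths.
    """
    m = snapshots.shape[0]
    r = snapshots @ snapshots.conj().T / snapshots.shape[1]  # sample covariance
    _, vecs = np.linalg.eigh(r)  # Hermitian input, ascending eigenvalues
    noise_sub = vecs[:, : m - n_sources]  # eigenvectors of smallest eigenvalues
    theta = np.deg2rad(angles_deg)
    steer = np.exp(-2j * np.pi * spacing * np.outer(np.arange(m), np.sin(theta)))
    proj = noise_sub.conj().T @ steer
    return 1.0 / np.sum(np.abs(proj) ** 2, axis=0)  # peaks where orthogonal

# Simulated array data: 8 sensors, half-wavelength spacing, one source at 20 deg
rng = np.random.default_rng(1)
m, n_snap, d, true_deg = 8, 200, 0.5, 20.0
a = np.exp(-2j * np.pi * d * np.arange(m) * np.sin(np.deg2rad(true_deg)))
s = rng.standard_normal(n_snap) + 1j * rng.standard_normal(n_snap)
w = 0.3 * (rng.standard_normal((m, n_snap)) + 1j * rng.standard_normal((m, n_snap)))
x = np.outer(a, s) + w
grid = np.arange(-90.0, 90.25, 0.25)
p = music_spectrum(x, 1, d, grid)
print("estimated angle:", grid[np.argmax(p)])
```

    The scan over the grid is independent for each candidate direction, which is one reason this kind of processing lends itself to the parallel cluster described above.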

  7. Global classification of human facial healthy skin using PLS discriminant analysis and clustering analysis.

    Science.gov (United States)

    Guinot, C; Latreille, J; Tenenhaus, M; Malvy, D J

    2001-04-01

    Today's classifications of healthy skin are predominantly based on a very limited number of skin characteristics, such as skin oiliness or susceptibility to sun exposure. The aim of the present analysis was to set up a global classification of healthy facial skin, using mathematical models. This classification is based on clinical, biophysical skin characteristics and self-reported information related to the skin, as well as the results of a theoretical skin classification assessed separately for the frontal and the malar zones of the face. In order to maximize the predictive power of the models with a minimum of variables, the Partial Least Squares (PLS) discriminant analysis method was used. The resulting PLS components were subjected to clustering analyses to identify the plausible number of clusters and to group the individuals according to their proximities. Using this approach, four PLS components could be constructed and six clusters were found relevant. Thus, from the 36 hypothetical combinations of the theoretical skin-type classification, we arrived at a strengthened six-class proposal. Our data suggest that the association of the PLS discriminant analysis and the clustering methods leads to a valid and simple way to classify healthy human skin and represents a potentially useful tool for cosmetic and dermatological research.

  8. SPATIAL CLUSTER AND OUTLIER IDENTIFICATION OF GEOCHEMICAL ASSOCIATION OF ELEMENTS: A CASE STUDY IN JUIRUI COPPER MINING AREA

    Directory of Open Access Journals (Sweden)

    Tien Thanh NGUYEN

    2016-12-01

    Full Text Available Spatial clusters and spatial outliers play an important role in the study of the spatial distribution patterns of geochemical data. They characterize the fundamental properties of mineralization processes, the spatial distribution of mineral deposits, and ore element concentrations in mineral districts. In this study, a new method for studying the spatial distribution patterns of multivariate data is proposed, based on a combination of robust Mahalanobis distance and local Moran's I_i. To construct the spatial weight matrix, the Moran's I spatial correlogram was first used to determine the range. The robust Mahalanobis distances were then computed for an association of elements. Finally, local Moran's I_i statistics were used to measure the degree of spatial association and to discover the spatial distribution patterns, including spatial clusters and spatial outliers, of the association of Cu, Au, Mo, Ag, Pb, Zn, As, and Sb. Spatial patterns were analyzed at six different spatial scales (2 km, 4 km, 6 km, 8 km, 10 km and 12 km) for both the raw data and the Box-Cox transformed data. The results show that the spatial cluster and spatial outlier areas identified using local Moran's I_i and the robust Mahalanobis distance agree with reality and conform well to the known deposits in the study area.
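
    Local Moran's I_i itself is straightforward to compute. The Python sketch below assumes Anselin's form with row-standardized binary distance-band weights (neighbors within a range d, standing in for the range read off the correlogram). The grid of sites and the scores, which stand in for robust Mahalanobis distances of the element association, are hypothetical, and the permutation test normally used to judge significance is omitted.

```python
import math


def distance_band_weights(coords, d):
    """Row-standardized binary weights: w_ij = 1 if 0 < dist(i, j) <= d, rows scaled to sum 1."""
    n = len(coords)
    W = []
    for i in range(n):
        row = [1.0 if j != i and math.dist(coords[i], coords[j]) <= d else 0.0 for j in range(n)]
        s = sum(row)
        W.append([w / s for w in row] if s > 0 else row)
    return W


def local_morans_i(values, W):
    """Anselin's local Moran's I_i = (z_i / m2) * sum_j w_ij z_j, with z = deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(zi * zi for zi in z) / n
    return [z[i] / m2 * sum(W[i][j] * z[j] for j in range(n)) for i in range(n)]


# Hypothetical 5 x 5 grid of sample sites (km): high scores in the lower-left 2 x 2 block
# plus one isolated high score at (4, 4).
coords = [(x, y) for x in range(5) for y in range(5)]
scores = [10.0 if (x < 2 and y < 2) or (x, y) == (4, 4) else 1.0 for x, y in coords]
I = local_morans_i(scores, distance_band_weights(coords, d=1.0))
print(round(I[0], 3), round(I[24], 3))  # -> 4.0 -1.0
```

    A positive I_i at a high-valued site marks a high-high spatial cluster (site (0, 0) here), while a negative I_i marks a spatial outlier, such as the high value at (4, 4) surrounded by low ones.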

  9. Comparative analysis of clustering methods for gene expression time course data

    Directory of Open Access Journals (Sweden)

    Ivan G. Costa

    2004-01-01

    Full Text Available This work presents a data-driven comparative study of clustering methods used in the analysis of gene expression time courses (or time series). Five clustering methods found in the gene expression literature are compared: agglomerative hierarchical clustering, CLICK, dynamical clustering, k-means and self-organizing maps. To evaluate the methods, a k-fold cross-validation procedure adapted to unsupervised methods is applied. The accuracy of the results is assessed by comparing the partitions obtained in these experiments with gene annotation, such as protein function and series classification.
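
    The abstract does not say which index was used to compare partitions with gene annotation. One common choice is the adjusted Rand index (ARI), sketched below in Python purely as an illustration; the cluster labels and annotation classes are hypothetical, and the cross-validation procedure itself is not reproduced.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Hubert-Arabie adjusted Rand index between two partitions of the same n >= 2 items."""
    n = len(labels_a)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)  # expected pair agreement under random labeling
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions put everything together
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


# Hypothetical clustering of six genes compared with a functional annotation.
clusters = [0, 0, 1, 1, 2, 2]
annotation = ["ribosome", "ribosome", "ribosome", "cell cycle", "cell cycle", "cell cycle"]
print(adjusted_rand_index(clusters, annotation))
```

    The ARI equals 1 for identical partitions (up to relabeling) and is close to 0 for chance-level agreement, which makes scores comparable across methods that return different numbers of clusters.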

  10. Modeling N Cycling during Succession after Forest Disturbance: an Analysis of N Mining and Retention Hypothesis

    Science.gov (United States)

    Zhou, Z.; Ollinger, S. V.; Ouimette, A.; Lovett, G. M.; Fuss, C. B.; Goodale, C. L.

    2017-12-01

    Dissolved inorganic nitrogen (DIN) losses at the Hubbard Brook Experimental Forest (HBEF), New Hampshire, USA, have declined in recent decades, a pattern that counters expectations based on prevailing theory. An unbalanced ecosystem nitrogen (N) budget implies that an N sink is missing. Hypotheses to explain this discrepancy include increasing rates of denitrification and accumulation of N in mineral soil pools following N mining by plants. Here, we conducted a modeling analysis combined with field measurements of N cycling, specifically examining the hypothesis of N mining and retention in mineral soils. We included simplified representations of both mechanisms, N mining and retention, in a revised ecosystem process model, PnET-SOM, to evaluate the dynamics of N cycling during succession after forest disturbance at the HBEF. The predicted N mining during early succession was regulated by a metric representing the potential demand for extra soil N for large wood growth. The accumulation of nitrate in mineral soil pools was a function of net aboveground biomass accumulation and soil N availability, and was parameterized using field 15N tracer incubation data. The predicted patterns of forest N dynamics were consistent with observations. The addition of the new algorithms also improved the predicted DIN export in stream water, with an R squared of 0.35 (P [...] pay back the mined N in mineral soils. Predicted ecosystem N balance showed that N gas loss could account for 14-46% of total N deposition, soil mining for about 103% during early succession, and soil retention for about 35% at the current forest stage at the HBEF.

  11. Evaluation model of commercial geological exploration and mining development project and analysis of some technical problems in commercial negotiation

    International Nuclear Information System (INIS)

    Yao Zhenkai

    2012-01-01

    A composite evaluation model for commercial geological exploration and mining development projects was discussed. The new model consists of a polity-economy-technique (PET) synthetic evaluation sub-model and a geology-mining-metallurgy (GMM) technique evaluation sub-model. In addition, some key technical problems in commercial negotiation, such as information screening, price quotation and deadline analysis, were briefly analyzed. (author)

  12. Hierarchical cluster analysis of progression patterns in open-angle glaucoma patients with medical treatment.

    Science.gov (United States)

    Bae, Hyoung Won; Rho, Seungsoo; Lee, Hye Sun; Lee, Naeun; Hong, Samin; Seong, Gong Je; Sung, Kyung Rim; Kim, Chan Yun

    2014-04-29

    To classify medically treated open-angle glaucoma (OAG) by pattern of progression using hierarchical cluster analysis, and to determine OAG progression characteristics by comparing clusters. Ninety-five eyes of 95 OAG patients who received medical treatment and had undergone visual field (VF) testing at least once per year for 5 or more years were studied. OAG was classified into subgroups using hierarchical cluster analysis based on five variables: baseline mean deviation (MD), baseline visual field index (VFI), MD slope, VFI slope, and Glaucoma Progression Analysis (GPA) printout. Other parameters were then compared between clusters. The hierarchical cluster analysis yielded two clusters. Cluster 1 showed -4.06 ± 2.43 dB baseline MD, 92.58% ± 6.27% baseline VFI, -0.28 ± 0.38 dB per year MD slope, -0.52% ± 0.81% per year VFI slope, and all "no progression" cases in GPA printout, whereas cluster 2 showed -8.68 ± 3.81 dB baseline MD, 77.54% ± 12.98% baseline VFI, -0.72 ± 0.55 dB per year MD slope, -2.22% ± 1.89% per year VFI slope, and seven "possible" and four "likely" progression cases in GPA printout. There were no significant differences in age, sex, mean IOP, central corneal thickness, or axial length between clusters. However, cluster 2 included more high-tension glaucoma patients and used significantly more antiglaucoma eye drops than cluster 1. Hierarchical cluster analysis of progression patterns divided OAG into slow and fast progression groups, as shown by the parameters of glaucomatous progression in VF testing. In the fast progression group, the prevalence of high-tension glaucoma was greater and more antiglaucoma medications were administered than in the slow progression group. Copyright 2014 The Association for Research in Vision and Ophthalmology, Inc.

  13. OMERACT-based fibromyalgia symptom subgroups: an exploratory cluster analysis.

    Science.gov (United States)

    Vincent, Ann; Hoskin, Tanya L; Whipple, Mary O; Clauw, Daniel J; Barton, Debra L; Benzo, Roberto P; Williams, David A

    2014-10-16

    The aim of this study was to identify subsets of patients with fibromyalgia with similar symptom profiles using the Outcome Measures in Rheumatology (OMERACT) core symptom domains. Female patients with a diagnosis of fibromyalgia and currently meeting fibromyalgia research survey criteria completed the Brief Pain Inventory, the 30-item Profile of Mood States, the Medical Outcomes Sleep Scale, the Multidimensional Fatigue Inventory, the Multiple Ability Self-Report Questionnaire, the Fibromyalgia Impact Questionnaire-Revised (FIQ-R) and the Short Form-36 between 1 June 2011 and 31 October 2011. Hierarchical agglomerative clustering was used to identify subgroups of patients with similar symptom profiles. To validate the results from this sample, hierarchical agglomerative clustering was repeated in an external sample of female patients with fibromyalgia with similar inclusion criteria. A total of 581 females with a mean age of 55.1 (range, 20.1 to 90.2) years were included. A four-cluster solution best fit the data, and each clustering variable differed significantly (P [...] FIQ-R total scores (P = 0.0004)). In our study, we incorporated core OMERACT symptom domains, which allowed for clustering based on a comprehensive symptom profile. Although our exploratory cluster solution needs confirmation in a longitudinal study, this approach could provide a rationale to support the study of individualized clinical evaluation and intervention.

  14. Comparison of Outputs for Variable Combinations Used in Cluster Analysis on Polarimetric Imagery

    National Research Council Canada - National Science Library

    Petre, Melinda

    2008-01-01

    ... More specifically, two techniques, Cluster Analysis (CA) and Principal Component Analysis (PCA), can be combined to process Stokes imagery by distinguishing between pixels and producing groups of pixels with similar characteristics...

  15. Symptom Clusters in People Living with HIV Attending Five Palliative Care Facilities in Two Sub-Saharan African Countries: A Hierarchical Cluster Analysis.

    Science.gov (United States)

    Moens, Katrien; Siegert, Richard J; Taylor, Steve; Namisango, Eve; Harding, Richard

    2015-01-01

    Symptom research across conditions has historically focused on single symptoms, and the burden of multiple symptoms and their interactions has been relatively neglected, especially in people living with HIV. Symptom cluster studies are required to set priorities in treatment planning and to lessen the total symptom burden. This study aimed to identify and compare symptom clusters among people living with HIV attending five palliative care facilities in two sub-Saharan African countries. Data from cross-sectional self-report of seven-day symptom prevalence on the 32-item Memorial Symptom Assessment Scale-Short Form were used. A hierarchical cluster analysis was conducted using Ward's method, with squared Euclidean distance as the similarity measure, to determine the clusters. Contingency tables, χ² tests and ANOVA were used to compare the clusters by patient-specific characteristics and distress scores. Among the sample (N=217), the mean age was 36.5 (SD 9.0) years, 73.2% were female, and 49.1% were on antiretroviral therapy (ART). The cluster analysis produced five symptom clusters, identified as: 1) dermatological; 2) generalised anxiety and elimination; 3) social and image; 4) persistently present; and 5) a gastrointestinal-related symptom cluster. The patients in the first three symptom clusters reported the highest physical and psychological distress scores. Patient characteristics varied significantly across the five clusters by functional status (worst functional physical status in cluster one, p [...] people living with HIV with longitudinally collected symptom data to test cluster stability and identify common symptom trajectories is recommended.
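
    Ward's method with squared Euclidean distance merges, at each step, the pair of clusters whose union least increases the total within-cluster sum of squares. A naive Python sketch follows. In the study the objects being clustered are the symptoms themselves; the four binary symptom profiles below (presence across six patients) are invented for illustration.

```python
def ward_clustering(points, k):
    """Naive agglomerative clustering with Ward's criterion on squared Euclidean distance.

    Each merge joins the pair (A, B) with the smallest increase in total
    within-cluster sum of squares: n_A * n_B / (n_A + n_B) * ||c_A - c_B||^2.
    Returns k clusters as lists of point indices.
    """
    dim = len(points[0])

    def centroid(members):
        return [sum(points[i][d] for i in members) / len(members) for d in range(dim)]

    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        cents = [centroid(c) for c in clusters]
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                sq = sum((x - y) ** 2 for x, y in zip(cents[a], cents[b]))
                cost = na * nb / (na + nb) * sq
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


# Hypothetical symptom profiles: 1 = symptom reported by that patient (six patients).
names = ["itching", "skin changes", "nausea", "vomiting"]
profiles = [(1, 1, 1, 0, 0, 0), (1, 1, 0, 0, 0, 0), (0, 0, 0, 1, 1, 1), (0, 0, 0, 1, 1, 0)]
groups = ward_clustering(profiles, 2)
print([[names[i] for i in g] for g in groups])  # -> [['itching', 'skin changes'], ['nausea', 'vomiting']]
```

    Recomputing all pairwise costs at every step is fine for a few dozen symptoms; larger problems would normally use the Lance-Williams update in an optimized library.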

  16. The quantitative analysis of silicon carbide surface smoothing by Ar and Xe cluster ions

    Science.gov (United States)

    Ieshkin, A. E.; Kireev, D. S.; Ermakov, Yu. A.; Trifonov, A. S.; Presnov, D. E.; Garshev, A. V.; Anufriev, Yu. V.; Prokhorova, I. G.; Krupenin, V. A.; Chernysh, V. S.

    2018-04-01

    The gas cluster ion beam (GCIB) technique was used to smooth silicon carbide crystal surfaces. The effects of processing with two inert-gas cluster ions, argon and xenon, were quantitatively compared. While argon is the standard working gas for GCIB, results for xenon clusters had not been reported before. Scanning probe microscopy and high-resolution transmission electron microscopy were used to analyze the surface roughness and the quality of the surface crystal layer. GCIB processing smooths the surface relief down to an average roughness of about 1 nm for both gases. Xenon was shown to be the more effective working gas: the sputtering rate for xenon clusters is 2.5 times higher than for argon at the same beam energy. High-resolution transmission electron microscopy analysis of the surface defect layer gives values of 7 ± 2 nm and 8 ± 2 nm for treatment with argon and xenon clusters, respectively.

  17. [Principal component analysis and cluster analysis of inorganic elements in sea cucumber Apostichopus japonicus].

    Science.gov (United States)

    Liu, Xiao-Fang; Xue, Chang-Hu; Wang, Yu-Ming; Li, Zhao-Jie; Xue, Yong; Xu, Jie

    2011-11-01

    This study investigates the feasibility of multi-element analysis for determining the geographical origin of the sea cucumber Apostichopus japonicus, and selects effective tracers for assessing its geographical origin. The contents of Al, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, As, Se, Mo, Cd, Hg and Pb in Apostichopus japonicus samples from seven geographical origins were determined by ICP-MS, and the results were used to build an element database. Cluster analysis (CA) and principal component analysis (PCA) were applied to differentiate the geographical origins. Three principal components, which accounted for over 89% of the total variance, were extracted from the standardized data. Q-type cluster analysis showed that the 26 samples could be reasonably clustered into five groups, and the classification was significantly associated with the marine distribution of the samples. CA and PCA were effective methods for element analysis of Apostichopus japonicus samples, and the mineral element contents were good chemical descriptors for differentiating their geographical origins.
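
    A figure such as "over 89% of the total variance" is a cumulative explained-variance ratio. For standardized variables, each ratio is an eigenvalue of the correlation matrix divided by the number of variables. The Python sketch below computes these ratios with power iteration and deflation, a simple stand-in for a proper symmetric eigensolver; the three element columns are hypothetical, not the study's ICP-MS data.

```python
import math


def correlation_matrix(columns):
    """Correlation matrix of variables given as columns (z-scores with population SD)."""
    z = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        z.append([(v - mean) / sd for v in col])
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


def eigenvalues_by_deflation(A, iters=500):
    """Eigenvalues of a symmetric PSD matrix via power iteration plus Hotelling deflation."""
    p = len(A)
    A = [row[:] for row in A]
    values = []
    for _ in range(p):
        v = [1.0 + i for i in range(p)]  # arbitrary start, assumed not orthogonal to the target
        for _ in range(iters):
            w = [sum(A[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * A[i][j] * v[j] for i in range(p) for j in range(p))  # Rayleigh quotient
        values.append(lam)
        A = [[A[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]  # remove component
    return values


# Hypothetical element contents in eight samples: Cu tracks Zn closely, Cd does not.
zn = [1, 2, 3, 4, 5, 6, 7, 8]
cu = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
cd = [5, 3, 6, 2, 7, 1, 8, 4]
eig = eigenvalues_by_deflation(correlation_matrix([zn, cu, cd]))
ratios = [lam / len(eig) for lam in eig]  # trace of a correlation matrix = number of variables
print([round(r, 3) for r in ratios])
```

    Summing the leading ratios gives the cumulative share of variance retained by the first components, which is the quantity behind the "three components, over 89%" statement.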

  18. Global myeloma research clusters, output, and citations: a bibliometric mapping and clustering analysis.

    Directory of Open Access Journals (Sweden)

    Jens Peter Andersen

    Full Text Available International collaborative research is a mechanism for improving the development of disease-specific therapies and for improving health at the population level. However, limited data are available to assess trends in research output related to orphan diseases. We used bibliometric mapping and clustering methods to illustrate the level of fragmentation in myeloma research and the development of collaborative efforts. Publication data from Thomson Reuters Web of Science were retrieved for 2005-2009 and followed until 2013. We created a database of multiple myeloma publications and analysed impact and co-authorship density to identify scientific collaborations, developments, and international key players over time. The global annual publication volume for studies on multiple myeloma increased from 1,144 in 2005 to 1,628 in 2009, which represents a 43% increase. This increase is high compared with the 24% and 14% increases observed for lymphoma and leukaemia. The major proportion (>90%) of publications was from the US and EU over the study period. Analysis of output and citation impact identified several successful groups with a large number of intra-cluster collaborations in the US and EU. The US-based myeloma clusters clearly stand out as the most productive and highly cited, and the European Myeloma Network members exhibited a doubling of collaborative publications from 2005 to 2009, still increasing up to 2013. Multiple myeloma research output has increased substantially in the past decade. The fragmented European myeloma research activities based on national or regional groups are progressing, but they require a broad range of targeted research investments to improve multiple myeloma health care.

  19. Clinical Characteristics of Exacerbation-Prone Adult Asthmatics Identified by Cluster Analysis.

    Science.gov (United States)

    Kim, Mi Ae; Shin, Seung Woo; Park, Jong Sook; Uh, Soo Taek; Chang, Hun Soo; Bae, Da Jeong; Cho, You Sook; Park, Hae Sim; Yoon, Ho Joo; Choi, Byoung Whui; Kim, Yong Hoon; Park, Choon Sik

    2017-11-01

    Asthma is a heterogeneous disease characterized by various types of airway inflammation and obstruction. Using cluster analysis, it has therefore been classified into several subphenotypes, such as early-onset atopic, obese non-eosinophilic, benign, and eosinophilic asthma. A number of asthmatics frequently experience exacerbations over long-term follow-up, but the exacerbation-prone subphenotype has rarely been evaluated by cluster analysis. This prompted us to identify clusters reflecting asthma exacerbation. A uniform cluster analysis method was applied to 259 adult asthmatics who were regularly followed up for over 1 year, using 12 variables selected on the basis of their contribution to asthma phenotypes. After clustering, clinical profiles and exacerbation rates during follow-up were compared among the clusters. Four subphenotypes were identified: cluster 1 comprised patients with early-onset atopic asthma with preserved lung function, cluster 2 late-onset non-atopic asthma with impaired lung function, cluster 3 early-onset atopic asthma with severely impaired lung function, and cluster 4 late-onset non-atopic asthma with well-preserved lung function. The patients in clusters 2 and 3 were identified as exacerbation-prone asthmatics, showing a higher risk of asthma exacerbation. Two different phenotypes of exacerbation-prone asthma were identified among Korean asthmatics using cluster analysis; both were characterized by impaired lung function, but the age at asthma onset and atopic status differed between the two. Copyright © 2017 The Korean Academy of Asthma, Allergy and Clinical Immunology · The Korean Academy of Pediatric Allergy and Respiratory Disease

  20. Towards educational data mining: Using data mining methods for automated chat analysis to understand and support inquiry learning processes

    OpenAIRE

    Anjewierden, Anjo; Kolloffel, Bas; Hulshof, Casper

    2007-01-01

    In this paper we investigate the application of data mining methods to provide learners with real-time adaptive feedback on the nature and patterns of their online communication while learning collaboratively. We derived two models for classifying chat messages using data mining techniques and tested these on an actual data set [16]. The reliability of the classification of chat messages is established by comparing the models' performance to that of humans. Results indicate that the classifica...

  1. Cluster analysis of tropical cyclone tracks in the Southern Hemisphere

    Energy Technology Data Exchange (ETDEWEB)

    Ramsay, Hamish A. [Monash University, Monash Weather and Climate, School of Mathematical Sciences, Clayton, VIC (Australia)]; Camargo, Suzana J.; Kim, Daehyun [Columbia University, Lamont-Doherty Earth Observatory, Palisades, NY (United States)]

    2012-08-15

    A probabilistic clustering method is used to describe various aspects of tropical cyclone (TC) tracks in the Southern Hemisphere for the period 1969-2008. A total of 7 clusters are examined: three in the South Indian Ocean, three in the Australian Region, and one in the South Pacific Ocean. Large-scale environmental variables related to TC genesis in each cluster are explored, including sea surface temperature, low-level relative vorticity, deep-layer vertical wind shear, outgoing longwave radiation, El Nino-Southern Oscillation (ENSO) and the Madden-Julian Oscillation (MJO). Composite maps, constructed 2 days prior to genesis, show some of these to be significant precursors to TC formation, most prominently westerly wind anomalies equatorward of the main development regions. Clusters are also evaluated with respect to their genesis location, seasonality, mean peak intensity, track duration, landfall location, and intensity at landfall. ENSO is found to play a significant role in modulating annual frequency and mean genesis location in three of the seven clusters (two in the South Indian Ocean and one in the Pacific). The ENSO-modulating effect on genesis frequency is caused primarily by changes in low-level zonal flow between the equator and 10°S, and associated relative vorticity changes in the main development regions. ENSO also has a significant effect on mean genesis location in three clusters, with TCs forming further equatorward (poleward) during El Nino (La Nina), in addition to large shifts in mean longitude. The MJO has a strong influence on TC genesis in all clusters, though the amount of modulation is found to be sensitive to the definition of the MJO. (orig.)

  2. A clustering analysis of lipoprotein diameters in the metabolic syndrome

    Directory of Open Access Journals (Sweden)

    Frazier-Wood Alexis C

    2011-12-01

    Full Text Available Abstract Background The presence of smaller low-density lipoproteins (LDL) has been associated with atherosclerosis risk and with the insulin resistance (IR) underlying the metabolic syndrome (MetS). In addition, some research has supported the association of very low-, low- and high-density lipoprotein (VLDL, LDL and HDL) particle diameters with components of the MetS, although this has been the focus of less research. We aimed to explore the relationship of VLDL, LDL and HDL diameters to the MetS and its features and, by clustering individuals by the diameters of their VLDL, LDL and HDL particles, to capture information across all three lipoprotein fractions in a unified phenotype. Methods We used nuclear magnetic resonance spectroscopy measurements on fasting plasma samples from a general population sample of 1,036 adults (mean ± SD, 48.8 ± 16.2 y of age). Using latent class analysis, the sample was grouped by the diameters of their fasting lipoproteins, and mixed effects models tested whether the distribution of MetS components varied across the groups. Results Eight discrete groups were identified. Two groups (N = 251) were enriched with individuals meeting criteria for the MetS and were characterized by the smallest LDL/HDL diameters. Of those two groups, one was additionally distinguished by large VLDL and had significantly higher blood pressure, fasting glucose, triglycerides, and waist circumference (WC; P [...] Conclusions While small LDL diameters remain associated with IR and the MetS, their occurrence in conjunction with a shift to overall larger VLDL diameter may identify those with the highest fasting glucose, TG and WC within the MetS. If replicated, the association of this phenotype with more severe IR features indicates that it may help identify those most at risk for incident type II diabetes and cardiometabolic disease.

  3. Methodology comparative statistical analysis of Russian industry based on cluster analysis

    Directory of Open Access Journals (Sweden)

    Sergey S. Shishulin

    2017-01-01

    Full Text Available The article investigates the possibilities of applying multidimensional statistical analysis to the study of industrial production by comparing its growth rates and structure with those of other developed and developing countries. The purpose of the article is to determine the optimal set of statistical methods, and the results of applying them to industrial production data, that give the best basis for analyzing the results. The data include such indicators as output, gross value added, the number of employed and other indicators of the system of national accounts and of operational business statistics. The objects of observation are the industries of the countries of the Customs Union, the United States, Japan and Europe in 2005-2015. The research tools range from the simplest methods of data transformation and graphical and tabular visualization to methods of statistical analysis. In particular, using a specialized software package (SPSS), the principal components method, discriminant analysis, hierarchical methods of cluster analysis, Ward's method and k-means were applied. Applying the principal components method to the initial data makes it possible to reduce the initial space of industrial production data substantially and effectively. For example, in analyzing the structure of industrial production, the reduction was from fifteen industries to three basic, well-interpretable factors: relatively extractive industries (with a low degree of processing), high-tech industries and consumer goods (medium-technology sectors). At the same time, comparing the results of cluster analysis applied to the initial data with those obtained on the basis of the principal components established that clustering industrial production data on the basis of the new factors significantly improves the clustering results. As a result of analyzing the parameters of

  4. Cluster-cluster clustering

    International Nuclear Information System (INIS)

    Barnes, J.; Dekel, A.; Efstathiou, G.; Frenk, C.S. (Yale Univ., New Haven, CT; California Univ., Santa Barbara; Cambridge Univ., England; Sussex Univ., Brighton, England)

    1985-01-01

    The cluster correlation function ξ_c(r) is compared with the particle correlation function ξ(r) in cosmological N-body simulations with a wide range of initial conditions. The experiments include scale-free initial conditions, pancake models with a coherence length in the initial density field, and hybrid models. Three N-body techniques and two cluster-finding algorithms are used. In scale-free models with white noise initial conditions, ξ_c and ξ are essentially identical. In scale-free models with more power on large scales, the amplitude of ξ_c is found to increase with cluster richness; in this case the clusters give a biased estimate of the particle correlations. In the pancake and hybrid models (with n = 0 or 1), ξ_c is steeper than ξ, but the cluster correlation length exceeds that of the points by less than a factor of 2, independent of cluster richness. Thus the high amplitude of ξ_c found in studies of rich clusters of galaxies is inconsistent with white noise and pancake models and may indicate a primordial fluctuation spectrum with substantial power on large scales. 30 references
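
    A two-point correlation function such as ξ(r) or ξ_c(r) is usually estimated by comparing pair counts in a catalog with pair counts in a random catalog. The Python sketch below uses the so-called natural estimator, DD/RR - 1 with pair-count normalization, in a hypothetical 2-D periodic unit box; the abstract does not state which estimator the authors used, and the clumpy toy catalog is invented.

```python
import math
import random


def pair_counts(points, r_edges, box=1.0):
    """Histogram of separations of all distinct pairs in a periodic box."""
    counts = [0] * (len(r_edges) - 1)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d2 = 0.0
            for a, b in zip(points[i], points[j]):
                d = abs(a - b) % box
                d = min(d, box - d)  # minimum-image convention
                d2 += d * d
            r = math.sqrt(d2)
            for k in range(len(counts)):
                if r_edges[k] <= r < r_edges[k + 1]:
                    counts[k] += 1
                    break
    return counts


def xi_natural(data, randoms, r_edges):
    """Natural estimator xi(r) = [N_R(N_R-1) / N_D(N_D-1)] * DD(r)/RR(r) - 1."""
    dd = pair_counts(data, r_edges)
    rr = pair_counts(randoms, r_edges)
    nd, nr = len(data), len(randoms)
    norm = (nr * (nr - 1)) / (nd * (nd - 1))
    return [norm * dd_k / rr_k - 1 if rr_k > 0 else float("nan") for dd_k, rr_k in zip(dd, rr)]


# Hypothetical catalog: 20 tight clumps of 10 points each in a unit periodic box.
rng = random.Random(1)
data = []
for _ in range(20):
    cx, cy = rng.random(), rng.random()
    for _ in range(10):
        data.append(((cx + rng.uniform(-0.01, 0.01)) % 1.0, (cy + rng.uniform(-0.01, 0.01)) % 1.0))
randoms = [(rng.random(), rng.random()) for _ in range(500)]
edges = [0.0, 0.02, 0.05, 0.1, 0.2]
xi = xi_natural(data, randoms, edges)
print([round(x, 2) for x in xi])  # strongly positive in the smallest bin for this clumpy catalog
```

    The same function applies whether the input points are particles or cluster positions; comparing the two outputs is the ξ_c versus ξ comparison described in the abstract.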

  5. Cluster Analysis of Acute Care Use Yields Insights for Tailored Pediatric Asthma Interventions.

    Science.gov (United States)

    Abir, Mahshid; Truchil, Aaron; Wiest, Dawn; Nelson, Daniel B; Goldstick, Jason E; Koegel, Paul; Lozon, Marie M; Choi, Hwajung; Brenner, Jeffrey

    2017-09-01

    We undertook this study to understand patterns of pediatric asthma-related acute care use in order to inform interventions aimed at reducing potentially avoidable hospitalizations. Hospital claims data from 3 Camden city facilities for 2010 to 2014 were used to perform a cluster analysis classifying patients aged 0 to 17 years according to their asthma-related hospital use. Clusters were based on 2 variables: asthma-related ED visits and hospitalizations. Demographics and a number of sociobehavioral and use characteristics were compared across clusters. A total of 3,170 children who met the criteria were included in the analysis. Examination of a scree plot showing the decline in within-cluster heterogeneity as the number of clusters increased confirmed that clusters of pediatric asthma patients according to hospital use exist in the data. Five clusters of patients with distinct asthma-related acute care use patterns were observed. Cluster 1 (62% of patients) showed the lowest rates of acute care use. These patients were least likely to have a mental health-related diagnosis, were less likely to have visited multiple facilities, and had no hospitalizations for asthma. Cluster 2 (19% of patients) had a low number of asthma ED visits and a one-time hospitalization. Cluster 3 (11% of patients) had a high number of ED visits, low hospitalization rates, and the highest rates of multiple facility use. Cluster 4 (7% of patients) had moderate ED use for both asthma and other illnesses and high rates of asthma hospitalizations; nearly one quarter received care at all facilities, and 1 in 10 had a mental health diagnosis. Cluster 5 (1% of patients) had extreme rates of acute care use. Differences observed between groups across multiple sociobehavioral factors suggest that these clusters may represent children who differ along multiple dimensions in addition to patterns of service use, with implications for tailored interventions. Copyright © 2017 American College of Emergency Physicians
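
    A scree plot of within-cluster heterogeneity can be produced by running a clustering algorithm for increasing k and recording the within-cluster sum of squares (WSS). The abstract does not name the algorithm, so the Python sketch below assumes k-means (Lloyd's algorithm with k-means++ seeding and several restarts); the (ED visits, hospitalizations) pairs are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: each new center is drawn with probability proportional to D(x)^2."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        r = rng.uniform(0, sum(d2))
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if w > 0 and acc >= r:
                centers.append(p)
                break
        else:  # fallback, e.g. when every point coincides with an existing center
            centers.append(rng.choice(points))
    return centers


def lloyd(points, centers, max_iter=100):
    """Lloyd's algorithm; returns final centers and the within-cluster sum of squares (WSS)."""
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        # Move each center to its group mean; keep the old center if the group is empty.
        new = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
               for j, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return centers, sum(min(sq_dist(p, c) for c in centers) for p in points)


def best_wss(points, k, restarts=10, seed=0):
    """Lowest WSS over several k-means++ restarts."""
    rng = random.Random(seed)
    return min(lloyd(points, kmeans_pp_init(points, k, rng))[1] for _ in range(restarts))


# Hypothetical (asthma ED visits, asthma hospitalizations) per child.
children = [(0, 0), (1, 0), (0, 1), (1, 1), (0, 0), (1, 0),
            (6, 1), (7, 1), (6, 2), (7, 2),
            (15, 5), (16, 5), (15, 6)]
wss = [best_wss(children, k) for k in range(1, 6)]
print([round(w, 1) for w in wss])  # the "elbow" is where the decline flattens
```

    Plotting wss against k gives the scree plot; with these three well-separated toy groups the curve drops steeply up to k = 3 and flattens afterwards.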

  6. Assessment of Random Assignment in Training and Test Sets using Generalized Cluster Analysis Technique

    Directory of Open Access Journals (Sweden)

    Sorana D. BOLBOACĂ

    2011-06-01

    Full Text Available Aim: The appropriateness of random assignment of compounds to training and validation sets was assessed using the generalized cluster technique. Material and Method: A quantitative structure-activity relationship model using Molecular Descriptors Family on Vertices was evaluated in terms of the assignment of carboquinone derivatives to training and test sets during leave-many-out analysis. Assignment of compounds was investigated using five variables: observed anticancer activity and four structure descriptors. Generalized cluster analysis with the K-means algorithm was applied to investigate whether the assignment of compounds was proper. The Euclidean distance and maximization of the initial distance were applied, using v-fold cross-validation with v = 10. Results: All five variables included in the analysis proved to contribute significantly to the identification of clusters. Three clusters were identified, each containing carboquinone derivatives from both the training and the test sets. The observed activity of carboquinone derivatives proved to be normally distributed in every cluster. The presence of training and test compounds in all clusters identified by generalized cluster analysis with the K-means algorithm, together with the distribution of observed activity within clusters, supports a proper assignment of compounds to the training and test sets. Conclusion: Generalized cluster analysis using the K-means algorithm proved to be a valid method for assessing the random assignment of carboquinone derivatives to training and test sets.

  7. Cluster analysis in severe emphysema subjects using phenotype and genotype data: an exploratory investigation

    Directory of Open Access Journals (Sweden)

    Martinez Fernando J

    2010-03-01

    Full Text Available Abstract Background Numerous studies have demonstrated associations between genetic markers and COPD, but results have been inconsistent. One reason may be heterogeneity in disease definition. Unsupervised learning approaches may assist in understanding disease heterogeneity. Methods We selected 31 phenotypic variables and 12 SNPs from five candidate genes in 308 subjects in the National Emphysema Treatment Trial (NETT) Genetics Ancillary Study cohort. We used factor analysis to select a subset of phenotypic variables, and then used cluster analysis to identify subtypes of severe emphysema. We examined the phenotypic and genotypic characteristics of each cluster. Results We identified six factors accounting for 75% of the shared variability among our initial phenotypic variables. We selected four phenotypic variables from these factors for cluster analysis: (1) post-bronchodilator FEV1 percent predicted, (2) percent bronchodilator responsiveness, and quantitative CT measurements of (3) apical emphysema and (4) airway wall thickness. K-means cluster analysis revealed four clusters, though separation between clusters was modest: (1) emphysema predominant; (2) bronchodilator responsive, with higher FEV1; (3) discordant, with a lower FEV1 despite less severe emphysema and lower airway wall thickness; and (4) airway predominant. Of the genotypes examined, membership in cluster 1 (emphysema predominant) was associated with TGFB1 SNP rs1800470. Conclusions Cluster analysis may identify meaningful disease subtypes and/or groups of related phenotypic variables even in a highly selected group of severe emphysema subjects, and may be useful for genetic association studies.

  8. Analysis of US underground thin seam mining potential. Volume 1. Text. Final technical report, December 1978. [In thin seams

    Energy Technology Data Exchange (ETDEWEB)

    Pimental, R. A; Barell, D.; Fine, R. J.; Douglas, W. J.

    1979-06-01

    An analysis of the potential for US underground thin seam (< 28'') coal mining is undertaken to provide basic information for use in making a decision on further thin seam mining equipment development. The characteristics of the present low seam mines and their mining methods are determined, in order to establish baseline data against which changes in mine characteristics can be monitored as a function of time. A detailed data base of thin seam coal resources is developed through a quantitative and qualitative analysis at the bed, county and state level. By establishing present and future coal demand and relating demand to production and resources, the market for thin seam coal has been identified. No thin seam coal demand of significance is forecast before the year 2000. Current uncertainty as to coal's future does not permit market forecasts beyond the year 2000 with a sufficient level of reliability.

  9. Mining environmental high-throughput sequence data sets to identify divergent amplicon clusters for phylogenetic reconstruction and morphotype visualization.

    Science.gov (United States)

    Gimmler, Anna; Stoeck, Thorsten

    2015-08-01

    Environmental high-throughput sequencing (envHTS) is a very powerful tool, which in protistan ecology is predominantly used for the exploration of diversity and its geographic and local patterns. We here used a pyrosequenced V4-SSU rDNA data set from a solar saltern pond as a test case to exploit such massive protistan amplicon data sets beyond this descriptive purpose. To this end, we combined a Swarm-based blastn network including 11 579 ciliate V4 amplicons, used to identify divergent amplicon clusters, with targeted polymerase chain reaction (PCR) primer design for retrieval of the full-length small subunit ribosomal DNA and probe design for fluorescence in situ hybridization (FISH). This powerful strategy allows researchers to use envHTS data sets to (i) reveal the phylogenetic position of the taxon behind divergent amplicons; (ii) improve phylogenetic resolution and evolutionary history of specific taxon groups; (iii) solidly assess an amplicon's (species') degree of similarity to its closest described relative; (iv) visualize the morphotype behind a divergent amplicon cluster; (v) rapidly FISH-screen many environmental samples for the geographic/habitat distribution and abundances of the respective organism; and (vi) monitor the success of enrichment strategies in live samples for the cultivation and isolation of the respective organisms. © 2015 Society for Applied Microbiology and John Wiley & Sons Ltd.

  10. Development and optimization of SPECT gated blood pool cluster analysis for the prediction of CRT outcome

    Energy Technology Data Exchange (ETDEWEB)

    Lalonde, Michel, E-mail: mlalonde15@rogers.com; Wassenaar, Richard [Department of Physics, Carleton University, Ottawa, Ontario K1S 5B6 (Canada); Wells, R. Glenn; Birnie, David; Ruddy, Terrence D. [Division of Cardiology, University of Ottawa Heart Institute, Ottawa, Ontario K1Y 4W7 (Canada)

    2014-07-15

    Purpose: Phase analysis of single photon emission computed tomography (SPECT) radionuclide angiography (RNA) has been investigated for its potential to predict the outcome of cardiac resynchronization therapy (CRT). However, phase analysis may be limited in its ability to predict CRT outcome, as valuable information may be lost by assuming that time-activity curves (TACs) follow a simple sinusoidal shape. A new method, cluster analysis, is proposed that directly evaluates the TACs and may lead to a better understanding of dyssynchrony patterns and CRT outcome. Cluster analysis algorithms were developed and optimized to maximize their ability to predict CRT response. Methods: Forty-nine patients (N = 27 of ischemic etiology) received a SPECT RNA scan as well as positron emission tomography (PET) perfusion and viability scans prior to undergoing CRT. A semiautomated algorithm sampled the left ventricular wall to produce 568 TACs from the SPECT RNA data. The TACs were then subjected to two different cluster analysis techniques, K-means and normal average, and several input metrics were varied to determine the optimal settings for predicting CRT outcome. Each TAC was assigned to a cluster group based on the comparison criteria, and global and segmental cluster sizes and scores were used as measures of dyssynchrony to predict response to CRT. A repeated random twofold cross-validation technique was used to train and validate the cluster algorithm. Receiver operating characteristic (ROC) analysis was used to calculate the area under the curve (AUC) and to compare results with those obtained for SPECT RNA phase analysis and PET scar size analysis. Results: Using the normal average cluster analysis approach, the septal wall produced statistically significant results for predicting CRT response in the ischemic population (ROC AUC = 0.73; p < 0.05 vs. equal-chance ROC AUC = 0.50), with an optimal operating point of 71% sensitivity and 60% specificity.
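
    The ROC AUC reported above has a simple rank interpretation: it is the probability that a randomly chosen responder scores higher than a randomly chosen non-responder, with ties counted as one half (the Mann-Whitney statistic), so an uninformative predictor gives 0.50. The sketch below assumes, purely for illustration, that a higher hypothetical dyssynchrony score means "predicted responder"; the function names and scores are hypothetical, and the study's actual scores, thresholds and cross-validation are not reproduced.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as P(score of a positive > score of a negative), ties counted as 1/2."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def sensitivity_specificity(pos_scores, neg_scores, threshold):
    """Treat score >= threshold as a predicted responder."""
    sens = sum(s >= threshold for s in pos_scores) / len(pos_scores)
    spec = sum(s < threshold for s in neg_scores) / len(neg_scores)
    return sens, spec

responders = [0.9, 0.8, 0.4]       # hypothetical scores of CRT responders
non_responders = [0.7, 0.3, 0.2]   # hypothetical scores of non-responders
print(roc_auc(responders, non_responders))  # 8 of 9 pairs ordered correctly
print(sensitivity_specificity(responders, non_responders, 0.5))
```

    Sweeping the threshold and keeping the point that best balances sensitivity and specificity is one common way to choose an optimal operating point such as the 71%/60% reported above.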

  11. Development and optimization of SPECT gated blood pool cluster analysis for the prediction of CRT outcome

    International Nuclear Information System (INIS)

    Lalonde, Michel; Wassenaar, Richard; Wells, R. Glenn; Birnie, David; Ruddy, Terrence D.

    2014-01-01

    Purpose: Phase analysis of single photon emission computed tomography (SPECT) radionuclide angiography (RNA) has been investigated for its potential to predict the outcome of cardiac resynchronization therapy (CRT). However, phase analysis may be limited in its ability to predict CRT outcome, as valuable information may be lost by assuming that time-activity curves (TACs) follow a simple sinusoidal shape. A new method, cluster analysis, is proposed that directly evaluates the TACs and may lead to a better understanding of dyssynchrony patterns and CRT outcome. Cluster analysis algorithms were developed and optimized to maximize their ability to predict CRT response. Methods: Forty-nine patients (N = 27 of ischemic etiology) received a SPECT RNA scan as well as positron emission tomography (PET) perfusion and viability scans prior to undergoing CRT. A semiautomated algorithm sampled the left ventricular wall to produce 568 TACs from the SPECT RNA data. The TACs were then subjected to two different cluster analysis techniques, K-means and normal average, and several input metrics were varied to determine the optimal settings for predicting CRT outcome. Each TAC was assigned to a cluster group based on the comparison criteria, and global and segmental cluster sizes and scores were used as measures of dyssynchrony to predict response to CRT. A repeated random twofold cross-validation technique was used to train and validate the cluster algorithm. Receiver operating characteristic (ROC) analysis was used to calculate the area under the curve (AUC) and to compare results with those obtained for SPECT RNA phase analysis and PET scar size analysis. Results: Using the normal average cluster analysis approach, the septal wall produced statistically significant results for predicting CRT response in the ischemic population (ROC AUC = 0.73; p < 0.05 vs. equal-chance ROC AUC = 0.50), with an optimal operating point of 71% sensitivity and 60% specificity.

  12. Nurses' beliefs about nursing diagnosis: A study with cluster analysis.

    Science.gov (United States)

    D'Agostino, Fabio; Pancani, Luca; Romero-Sánchez, José Manuel; Lumillo-Gutierrez, Iris; Paloma-Castro, Olga; Vellone, Ercole; Alvaro, Rosaria

    2018-06-01

    To identify clusters of nurses in relation to their beliefs about nursing diagnosis in two populations (Italian and Spanish); to investigate differences among clusters of nurses in each population considering the nurses' socio-demographic data, attitudes towards nursing diagnosis, intentions to make nursing diagnosis and actual behaviours in making nursing diagnosis. Nurses' beliefs concerning nursing diagnosis may influence its use in practice, but this relationship is still unclear. A cross-sectional design. A convenience sample of nurses in Italy and Spain was enrolled. Data were collected in 2014-2015 using a socio-demographic questionnaire and scales measuring behavioural, normative and control beliefs, attitudes, intentions and behaviours. The sample included 499 nurses (272 Italian and 227 Spanish). Of these, 66.5% of the Italian and 90.7% of the Spanish sample were female. The mean age was 36.5 years in the Italian sample and 45.2 years in the Spanish sample. Six clusters of nurses were identified in Spain and four in Italy. Three clusters were similar between the two populations. In each population, cluster membership showed similar significant associations with age, years of work, attitudes towards nursing diagnosis, intentions to make nursing diagnosis and behaviours in making nursing diagnosis. Belief profiles identified unique subsets of nurses with distinct characteristics. Categorizing nurses by belief patterns may help administrators and educators to tailor interventions aimed at improving the use of nursing diagnosis in practice. © 2018 John Wiley & Sons Ltd.

  13. Identification and comparative analysis of the protocadherin cluster in a reptile, the green anole lizard.

    Directory of Open Access Journals (Sweden)

    Xiao-Juan Jiang

    Full Text Available BACKGROUND: The vertebrate protocadherins are a subfamily of cell adhesion molecules that are predominantly expressed in the nervous system and are believed to play an important role in establishing the complex neural network during animal development. Genes encoding these molecules are organized into a cluster in the genome. Comparative analysis of the protocadherin subcluster organization and gene arrangements in different vertebrates has provided interesting insights into the history of vertebrate genome evolution. Among tetrapods, protocadherin clusters have been fully characterized only in mammals. In this study, we report the identification and comparative analysis of the protocadherin cluster in a reptile, the green anole lizard (Anolis carolinensis). METHODOLOGY/PRINCIPAL FINDINGS: We show that the anole protocadherin cluster spans over a megabase and encodes a total of 71 genes. The number of genes in the anole protocadherin cluster is significantly higher than that in the coelacanth (49 genes) and mammalian (54-59 genes) clusters. The anole protocadherin genes are organized into four subclusters: the delta, alpha, beta and gamma. This subcluster organization is identical to that of the coelacanth protocadherin cluster, but differs from the mammalian clusters, which lack the delta subcluster. The gene number expansion in the anole protocadherin cluster is largely due to extensive gene duplication in the gammab subgroup. Similar to coelacanth and elephant shark protocadherin genes, the anole protocadherin genes have experienced a low frequency of gene conversion. CONCLUSIONS/SIGNIFICANCE: Our results suggest that, as with the protocadherin clusters in other vertebrates, the evolution of the anole protocadherin cluster is driven mainly by lineage-specific gene duplications and degeneration. Our analysis also shows that loss of the protocadherin delta subcluster in the mammalian lineage occurred after the divergence of mammals and reptiles.

  14. U3O8 production cost analysis study. Sandstone deposit mine model EA-730, Volume 1

    International Nuclear Information System (INIS)

    1978-08-01

    The objective was the development and testing of a model for estimating the production cost of conventional uranium mining. The model evolved from a base-case underground mine with an output of 1000 tons per day at a nominal depth of 900 feet, and from base-case open pit mines with an output of 2000 tons per day at depths of 30, 120, and 240 feet. In addition, an alternate production method employing heap leaching was partially investigated, to be merged with similar work performed by another contractor. The model was internally structured into component submodels capable of reflecting the contributory factors that aggregate into the computed production cost. A financial submodel based on last-quarter 1976 prices used conventional accounting practices to generate a cash flow and profit-and-loss record over the mine life. From this, a selling price was obtained based on a desired discounted cash flow return on equity. This submodel can also accept input inflation rates, so that costs in current dollars for future years can be estimated. A Monte Carlo analysis of variance was applied to 50 model runs to obtain a statistical estimate of the expected variance in production cost.

  15. Spatiotemporal analysis of changes in lode mining claims around the McDermitt Caldera, northern Nevada and southern Oregon

    Science.gov (United States)

    Coyan, Joshua; Zientek, Michael L.; Mihalasky, Mark J.

    2017-01-01

    Resource managers and agencies involved with planning for future federal land needs are required to complete an assessment of and forecast for future land use every ten years. Predicting mining activities on federal lands is difficult, as current regulations do not require disclosure of exploration results. In these cases, historic mining claims may serve as a useful proxy for determining where mining-related activities may occur. We assess the utility of using a space–time cube (STC) and associated analyses to evaluate and characterize mining claim activities around the McDermitt Caldera in northern Nevada and southern Oregon. The most significant advantage of arranging the mining claim data into an STC is the ability to visualize and compare the data, which allows scientists to better understand patterns and results. Additional analyses of the STC (i.e., Trend, Emerging Hot Spot, Hot Spot, and Cluster and Outlier Analyses) provide extra insights into the data and may aid in predicting future mining claim activities.

  16. Working with Data: Discovering Knowledge through Mining and Analysis; Systematic Knowledge Management and Knowledge Discovery; Text Mining; Methodological Approach in Discovering User Search Patterns through Web Log Analysis; Knowledge Discovery in Databases Using Formal Concept Analysis; Knowledge Discovery with a Little Perspective.

    Science.gov (United States)

    Qin, Jian; Jurisica, Igor; Liddy, Elizabeth D.; Jansen, Bernard J; Spink, Amanda; Priss, Uta; Norton, Melanie J.

    2000-01-01

    These six articles discuss knowledge discovery in databases (KDD). Topics include data mining; knowledge management systems; applications of knowledge discovery; text and Web mining; text mining and information retrieval; user search patterns through Web log analysis; concept analysis; data collection; and data structure inconsistency. (LRW)

  17. Clustering Methods Application for Customer Segmentation to Manage Advertisement Campaign

    OpenAIRE

    Maciej Kutera; Mirosława Lasek

    2010-01-01

    Clustering methods have recently become such advanced algorithms for analyzing large data collections that they are now included among data mining methods. Clustering methods form an ever larger group of methods that is evolving very quickly and finding an increasing variety of applications. In the article, we present our research on the usefulness of clustering methods in customer segmentation for managing an advertisement campaign. We introduce results obtained by using four sel...

  18. Clustering analysis of malware behavior using Self Organizing Map

    DEFF Research Database (Denmark)

    Pirscoveanu, Radu-Stefan; Stevanovic, Matija; Pedersen, Jens Myrup

    2016-01-01

    At present, malware behavioral classification is performed by means of Anti-Virus (AV) generated labels. The paper investigates the inconsistencies associated with current practices by evaluating the differences identified between current vendors. In this paper we rely on Self Organizing...... Map, an unsupervised machine learning algorithm, for generating clusters that capture the similarities between malware behaviors. A data set of approximately 270,000 samples was used to generate the behavioral profiles of malicious types in order to compare the outcome of the proposed clustering...... approach with the labels collected from 57 antivirus vendors using VirusTotal. Upon evaluating the results, the paper points out the shortcomings of relying on AV vendors for labeling malware samples. To solve the problem, a cluster-based classification is proposed, which should provide more...
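
    A Self Organizing Map of the kind mentioned above can be illustrated with a very small one-dimensional map. This is a generic textbook-style sketch, not the authors' setup: the two-feature behavior vectors, the map size, the evenly spread initialization, the linear learning-rate decay and the Gaussian neighborhood schedule are all hypothetical choices, and real behavioral profiles would have far more features.

```python
import math
import random

def best_matching_unit(weights, x):
    """Index of the node whose weight vector is closest (squared Euclidean) to x."""
    return min(range(len(weights)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))

def train_som(data, n_nodes, epochs=100, lr0=0.5, seed=0):
    """Train a 1-D self-organizing map and return its node weight vectors."""
    rng = random.Random(seed)
    # Spread the initial nodes over the input list (assumes n_nodes >= 2).
    weights = [list(data[round(i * (len(data) - 1) / (n_nodes - 1))]) for i in range(n_nodes)]
    radius0 = max(1.0, n_nodes / 2)
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):       # one shuffled pass over the data
            frac = step / total
            lr = lr0 * (1 - frac)                   # linearly decaying learning rate
            radius = radius0 * (1 - frac)           # shrinking neighborhood, stays > 0
            bmu = best_matching_unit(weights, x)
            for i, w in enumerate(weights):
                # Gaussian neighborhood on the distance between grid positions.
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
            step += 1
    return weights

# Hypothetical behavior profiles: two groups of samples.
profiles = [(0.0, 0.1), (0.1, 0.0), (0.05, 0.05), (1.0, 0.9), (0.9, 1.0), (0.95, 0.95)]
som = train_som(profiles, n_nodes=2)
clusters = [best_matching_unit(som, p) for p in profiles]
print(clusters)
```

    Samples that share a best-matching node form one cluster; with more nodes than underlying groups, neighboring nodes tend to represent similar behaviors, which is what makes the map useful for comparing against vendor labels.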

  19. Comparative analysis of bacteria in uranium mining wastes

    International Nuclear Information System (INIS)

    Tzvetkova, T.; Flemming, K.; Selenska-Pobell, S.

    2002-01-01

    Compositional analysis of the predominant bacterial groups in three different kinds of uranium wastes gives indications of different biogeological processes running at the studied sites, which seem to be influenced by the anthropogenic activities involved in the production of uranium. (orig.)

  20. Analysis of rockburst and rockfall accidents in relation to class of stope support, regional support, energy of seismic events and mining layout

    CSIR Research Space (South Africa)

    Cichowicz, A

    1994-01-01

    Full Text Available This report discusses the assessment of safety risk and the analysis of falls of ground (FOG) in mines due to seismic events and mining layout during the period 1991-1992 at a single mine. Multivariate analysis was used to obtain a...

  1. Marketing Mix Formulation for Higher Education: An Integrated Analysis Employing Analytic Hierarchy Process, Cluster Analysis and Correspondence Analysis

    Science.gov (United States)

    Ho, Hsuan-Fu; Hung, Chia-Chi

    2008-01-01

    Purpose: The purpose of this paper is to examine how a graduate institute at National Chiayi University (NCYU), by using a model that integrates analytic hierarchy process, cluster analysis and correspondence analysis, can develop effective marketing strategies. Design/methodology/approach: This is primarily a quantitative study aimed at…

  2. Influence of birth cohort on age of onset cluster analysis in bipolar I disorder

    DEFF Research Database (Denmark)

    Bauer, M; Glenn, T; Alda, M

    2015-01-01

    Purpose: Two common approaches to identify subgroups of patients with bipolar disorder are clustering methodology (mixture analysis) based on the age of onset, and a birth cohort analysis. This study investigates whether a birth cohort effect will influence the results of clustering on the age of onset...... cohort. Model-based clustering (mixture analysis) was then performed on the age of onset data using the residuals. Clinical variables in subgroups were compared. Results: There was a strong birth cohort effect. Without adjusting for the birth cohort, three subgroups were found by clustering. After...... on the age of onset, and that there is a birth cohort effect. Including the birth cohort adjustment altered the number and characteristics of subgroups detected when clustering by age of onset. Further investigation is needed to determine whether combining both approaches will identify subgroups that are more...

  3. Parkinson's Disease Subtypes Identified from Cluster Analysis of Motor and Non-motor Symptoms.

    Science.gov (United States)

    Mu, Jesse; Chaudhuri, Kallol R; Bielza, Concha; de Pedro-Cuesta, Jesus; Larrañaga, Pedro; Martinez-Martin, Pablo

    2017-01-01

    Parkinson's disease is now considered a complex, multi-peptide, central, and peripheral nervous system disorder with considerable clinical heterogeneity. Non-motor symptoms play a key role in the trajectory of Parkinson's disease, from prodromal premotor to end stages. To understand the clinical heterogeneity of Parkinson's disease, this study used cluster analysis to search for subtypes from a large, multi-center, international, and well-characterized cohort of Parkinson's disease patients across all motor stages, using a combination of cardinal motor features (bradykinesia, rigidity, tremor, axial signs) and, for the first time, specific validated rater-based non-motor symptom scales. Two independent international cohort studies were used: (a) the validation study of the Non-Motor Symptoms Scale (n = 411) and (b) baseline data from the global Non-Motor International Longitudinal Study (n = 540). k-means cluster analyses were performed on the non-motor and motor domains (domains clustering) and the 30 individual non-motor symptoms alone (symptoms clustering), and hierarchical agglomerative clustering was performed to group symptoms together. Four clusters are identified from the domains clustering supporting previous studies: mild, non-motor dominant, motor-dominant, and severe. In addition, six new smaller clusters are identified from the symptoms clustering, each characterized by clinically-relevant non-motor symptoms. The clusters identified in this study present statistical confirmation of the increasingly important role of non-motor symptoms (NMS) in Parkinson's disease heterogeneity and take steps toward subtype-specific treatment packages.

  4. iterClust: a statistical framework for iterative clustering analysis.

    Science.gov (United States)

    Ding, Hongxu; Wang, Wanxin; Califano, Andrea

    2018-03-22

    In a scenario where populations A, B1 and B2 (subpopulations of B) exist, pronounced differences between A and B may mask subtle differences between B1 and B2. Here we present iterClust, an iterative clustering framework, which can separate more pronounced differences (e.g. A and B) in starting iterations, followed by relatively subtle differences (e.g. B1 and B2), providing a comprehensive clustering trajectory. iterClust is implemented as a Bioconductor R package. andrea.califano@columbia.edu, hd2326@columbia.edu. Supplementary information is available at Bioinformatics online.
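
    The iterative idea described above, separating the most pronounced differences first and then re-clustering each group to expose subtler ones, can be sketched as recursive bisection. This is not the iterClust Bioconductor API (which is an R package): the 1-D data, the exact two-group split, the function names and the stopping rule (keep a split only when the gap between group means exceeds `ratio` pooled within-group standard deviations) are hypothetical choices for illustration.

```python
import math

def _sse(xs):
    """Within-group sum of squared deviations from the mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

def best_split(values, min_size=2):
    """Exact 1-D two-means: try every cut of the sorted values (each side
    >= min_size) and keep the cut with the smallest within-group SSE."""
    xs = sorted(values)
    best = None
    for cut in range(min_size, len(xs) - min_size + 1):
        left, right = xs[:cut], xs[cut:]
        sse = _sse(left) + _sse(right)
        if best is None or sse < best[0]:
            best = (sse, left, right)
    return best  # None when the group is too small to split

def iterative_clustering(values, ratio=5.0, min_size=2, depth=0, trajectory=None):
    """Recursively bisect; record each accepted split as (depth, left, right)."""
    if trajectory is None:
        trajectory = []
    split = best_split(values, min_size)
    if split is not None:
        sse, left, right = split
        within_sd = math.sqrt(sse / len(values))
        gap = sum(right) / len(right) - sum(left) / len(left)
        if gap > 0 and (within_sd == 0 or gap / within_sd > ratio):
            trajectory.append((depth, left, right))
            leaves = (iterative_clustering(left, ratio, min_size, depth + 1, trajectory)[0]
                      + iterative_clustering(right, ratio, min_size, depth + 1, trajectory)[0])
            return leaves, trajectory
    return [sorted(values)], trajectory

# Hypothetical data: A is far from B; B1 and B2 differ more subtly.
data = [10.1, 0.0, 12.2, 0.2, 10.0, 12.0, 0.1, 10.2, 0.3, 12.1]
leaves, trajectory = iterative_clustering(data)
print(leaves)
print([(depth, min(l), min(r)) for depth, l, r in trajectory])
```

    The first recorded split separates A from B, and only the second, at depth 1, separates B1 from B2, mirroring the "clustering trajectory" the abstract describes.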

  5. Dynamic analysis of clustered building structures using substructures methods

    International Nuclear Information System (INIS)

    Leimbach, K.R.; Krutzik, N.J.

    1989-01-01

    The dynamic substructure approach to the building cluster on a common base mat starts with the generation of Ritz-vectors for each building on a rigid foundation. The base mat plus the foundation soil is subjected to kinematic constraint modes, for example constant, linear, quadratic or cubic constraints. These constraint modes are also imposed on the buildings. By enforcing kinematic compatibility of the complete structural system on the basis of the constraint modes a reduced Ritz model of the complete cluster is obtained. This reduced model can now be analyzed by modal time history or response spectrum methods

  6. Applying Clustering to Statistical Analysis of Student Reasoning about Two-Dimensional Kinematics

    Science.gov (United States)

    Springuel, R. Padraic; Wittman, Michael C.; Thompson, John R.

    2007-01-01

    We use clustering, an analysis method not presently common to the physics education research community, to group and characterize student responses to written questions about two-dimensional kinematics. Previously, clustering has been used to analyze multiple-choice data; we analyze free-response data that includes both sketches of vectors and…

  7. Differences Between Ward's and UPGMA Methods of Cluster Analysis: Implications for School Psychology.

    Science.gov (United States)

    Hale, Robert L.; Dougherty, Donna

    1988-01-01

    Compared the efficacy of two methods of cluster analysis, the unweighted pair-groups method using arithmetic averages (UPGMA) and Ward's method, for students grouped on intelligence, achievement, and social adjustment by both clustering methods. Found UPGMA more efficacious based on output, on cophenetic correlation coefficients generated by each…
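
    Both linkage rules compared above, and the cophenetic correlation coefficient used to judge them, can be sketched with a naive agglomerative loop and the Lance-Williams update formulas. This is an illustrative sketch, not the authors' software: the Ward update is applied to Euclidean distances in the squared form that, to our understanding, SciPy's `ward` linkage also uses, and the one-dimensional student scores are hypothetical.

```python
import math
from itertools import combinations

def agglomerate(dist, method):
    """Naive agglomerative clustering of a symmetric distance matrix.
    Returns the cophenetic matrix: the merge height at which each pair first joins."""
    n = len(dist)
    d = {(i, j): dist[i][j] for i, j in combinations(range(n), 2)}  # keys (a, b), a < b
    members = {i: [i] for i in range(n)}  # active cluster id -> original item indices
    coph = [[0.0] * n for _ in range(n)]
    next_id = n

    def get(a, b):
        return d[(a, b)] if a < b else d[(b, a)]

    while len(members) > 1:
        i, j = min(d, key=d.get)  # closest pair of active clusters
        h = d[(i, j)]
        for a in members[i]:
            for b in members[j]:
                coph[a][b] = coph[b][a] = h
        ni, nj = len(members[i]), len(members[j])
        for k in members:
            if k in (i, j):
                continue
            nk, dki, dkj = len(members[k]), get(k, i), get(k, j)
            if method == "upgma":    # size-weighted average of the two old distances
                new = (ni * dki + nj * dkj) / (ni + nj)
            elif method == "ward":   # Lance-Williams form of Ward's criterion
                sq = ((ni + nk) * dki ** 2 + (nj + nk) * dkj ** 2 - nk * h ** 2) / (ni + nj + nk)
                new = math.sqrt(max(sq, 0.0))
            else:
                raise ValueError(f"unknown method: {method}")
            d[(k, next_id)] = new    # next_id exceeds every active id, so key order holds
        for key in [key for key in d if i in key or j in key]:
            del d[key]
        members[next_id] = members.pop(i) + members.pop(j)
        next_id += 1
    return coph

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cophenetic_correlation(dist, method):
    """Correlation between original distances and cophenetic (tree) distances."""
    coph = agglomerate(dist, method)
    pairs = list(combinations(range(len(dist)), 2))
    return pearson([dist[i][j] for i, j in pairs], [coph[i][j] for i, j in pairs])

scores = [0.0, 1.0, 5.0, 6.0]  # hypothetical 1-D scores for four students
dist = [[abs(a - b) for b in scores] for a in scores]
for method in ("upgma", "ward"):
    print(method, cophenetic_correlation(dist, method))
```

    A higher cophenetic correlation means the dendrogram preserves the original pairwise distances more faithfully; on this tiny example both methods yield the same tree shape and therefore the same coefficient, so real comparisons need richer data.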

  8. The use of a cluster analysis in across herd genetic evaluation for ...

    African Journals Online (AJOL)

    To investigate the possibility of a genotype x environment interaction in Bonsmara cattle, a cluster analysis was performed on weaning weight records of 72 811 Bonsmara calves, the progeny of 1 434 sires and 24 186 dams in 35 herds. The following environmental factors were used to classify herds into clusters: solution ...

  9. The reflection of hierarchical cluster analysis of co-occurrence matrices in SPSS

    NARCIS (Netherlands)

    Zhou, Q.; Leng, F.; Leydesdorff, L.

    2015-01-01

    Purpose: To discuss the problems arising from hierarchical cluster analysis of co-occurrence matrices in SPSS, and the corresponding solutions. Design/methodology/approach: We design different methods of using the SPSS hierarchical clustering module for co-occurrence matrices in order to compare

  10. Identifying At-Risk Students in General Chemistry via Cluster Analysis of Affective Characteristics

    Science.gov (United States)

    Chan, Julia Y. K.; Bauer, Christopher F.

    2014-01-01

    The purpose of this study is to identify academically at-risk students in first-semester general chemistry using affective characteristics via cluster analysis. Through the clustering of six preselected affective variables, three distinct affective groups were identified: low (at-risk), medium, and high. Students in the low affective group…

  11. Social Learning Network Analysis Model to Identify Learning Patterns Using Ontology Clustering Techniques and Meaningful Learning

    Science.gov (United States)

    Firdausiah Mansur, Andi Besse; Yusof, Norazah

    2013-01-01

    Clustering on social learning networks has still not been widely explored, especially when the network focuses on an e-learning system. Conventional methods are not well suited to e-learning data. Social network analysis (SNA) requires content analysis, which involves human intervention and needs to be carried out manually. Some of the previous clustering techniques need…

  12. Symptom Cluster Research With Biomarkers and Genetics Using Latent Class Analysis.

    Science.gov (United States)

    Conley, Samantha

    2017-12-01

    The purpose of this article is to provide an overview of latent class analysis (LCA) and examples from symptom cluster research that includes biomarkers and genetics. A review of LCA with genetics and biomarkers was conducted using Medline, Embase, PubMed, and Google Scholar. LCA is a robust latent variable model used to cluster categorical data, and it allows symptom clusters to be determined empirically. Researchers should consider using LCA to link empirically determined symptom clusters to biomarkers and genetics to better understand the underlying etiology of symptom clusters. The full potential of LCA in symptom cluster research has not yet been realized, because it has been used in limited populations and researchers have explored only a limited number of biologic pathways.
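
    The latent class idea described above can be sketched as an EM fit of a mixture of independent binary items, where each latent class has its own probability of endorsing each symptom. This is a minimal sketch, not a replacement for dedicated LCA software: the function name and symptom patterns are hypothetical, the number of classes is fixed in advance rather than chosen by fit indices, and a tiny smoothing constant (an assumption of this sketch) keeps probabilities away from 0 and 1.

```python
import math
import random

def lca_em(data, n_classes, n_iter=200, seed=0):
    """Fit a latent class model to binary items by EM.
    Returns (class weights, item probabilities per class, posteriors, log-likelihood)."""
    rng = random.Random(seed)
    n_items = len(data[0])
    weights = [1.0 / n_classes] * n_classes
    probs = [[rng.uniform(0.25, 0.75) for _ in range(n_items)] for _ in range(n_classes)]
    for _ in range(n_iter):
        # E-step: posterior probability of each class for each respondent.
        post, loglik = [], 0.0
        for row in data:
            joint = []
            for c in range(n_classes):
                p = weights[c]
                for x, theta in zip(row, probs[c]):
                    p *= theta if x else 1 - theta  # items independent within a class
                joint.append(p)
            total = sum(joint)
            loglik += math.log(total)
            post.append([j / total for j in joint])
        # M-step: re-estimate class sizes and smoothed endorsement probabilities.
        for c in range(n_classes):
            nc = sum(r[c] for r in post)
            weights[c] = nc / len(data)
            for k in range(n_items):
                endorsed = sum(r[c] * row[k] for r, row in zip(post, data))
                probs[c][k] = (endorsed + 1e-6) / (nc + 2e-6)
    return weights, probs, post, loglik

A = [1, 1, 1, 0, 0, 0]  # hypothetical pattern: first three symptoms present
B = [0, 0, 0, 1, 1, 1]  # hypothetical pattern: last three symptoms present
data = [A] * 5 + [B] * 5
weights, probs, post, loglik = lca_em(data, n_classes=2)
classes = [max(range(2), key=lambda c: row[c]) for row in post]
print(weights, classes)
```

    Each respondent is assigned to the class with the highest posterior probability; linking those empirically derived classes to biomarker or genotype data is then a separate association step.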

  13. Clusters of Insomnia Disorder: An Exploratory Cluster Analysis of Objective Sleep Parameters Reveals Differences in Neurocognitive Functioning, Quantitative EEG, and Heart Rate Variability.

    Science.gov (United States)

    Miller, Christopher B; Bartlett, Delwyn J; Mullins, Anna E; Dodds, Kirsty L; Gordon, Christopher J; Kyle, Simon D; Kim, Jong Won; D'Rozario, Angela L; Lee, Rico S C; Comas, Maria; Marshall, Nathaniel S; Yee, Brendon J; Espie, Colin A; Grunstein, Ronald R

    2016-11-01

    To empirically derive and evaluate potential clusters of Insomnia Disorder through cluster analysis of polysomnography (PSG) data. We hypothesized that clusters would differ on neurocognitive performance, sleep-onset measures of quantitative (q)-EEG and heart rate variability (HRV). Research volunteers with Insomnia Disorder (DSM-5) completed a neurocognitive assessment, and overnight PSG measures of total sleep time (TST), wake time after sleep onset (WASO), and sleep onset latency (SOL) were used to determine clusters. From 96 volunteers with Insomnia Disorder, cluster analysis derived at least two clusters from objective sleep parameters: Insomnia with normal objective sleep duration (I-NSD: n = 53) and Insomnia with short sleep duration (I-SSD: n = 43). At sleep onset, differences in HRV between I-NSD and I-SSD clusters suggest attenuated parasympathetic activity in I-SSD (P …) … insomnia clusters derived from cluster analysis differ in sleep-onset HRV. Preliminary data suggest evidence for three clusters in insomnia with differences for sustained attention and sleep-onset q-EEG. Insomnia 100 sleep study: Australia New Zealand Clinical Trials Registry (ANZCTR) identification number 12612000049875. URL: https://www.anzctr.org.au/Trial/Registration/TrialReview.aspx?id=347742. © 2016 Associated Professional Sleep Societies, LLC.

  14. The identification of credit card encoders by hierarchical cluster analysis of the jitters of magnetic stripes.

    Science.gov (United States)

    Leung, S C; Fung, W K; Wong, K H

    1999-01-01

    The relative bit density variation graphs of 207 specimen credit cards processed by 12 encoding machines were examined first visually, and then classified by means of hierarchical cluster analysis. Twenty-nine credit cards being treated as 'questioned' samples were tested by way of cluster analysis against 'controls' derived from known encoders. It was found that hierarchical cluster analysis provided a high accuracy of identification with all 29 'questioned' samples classified correctly. On the other hand, although visual comparison of jitter graphs was less discriminating, it was nevertheless capable of giving a reasonably accurate result.

  15. Profiling physical activity motivation based on self-determination theory: a cluster analysis approach.

    Science.gov (United States)

    Friederichs, Stijn Ah; Bolman, Catherine; Oenema, Anke; Lechner, Lilian

    2015-01-01

    In order to promote physical activity uptake and maintenance in individuals who do not comply with physical activity guidelines, it is important to increase our understanding of physical activity motivation among this group. The present study aimed to examine motivational profiles in a large sample of adults who do not comply with physical activity guidelines. The sample for this study consisted of 2473 individuals (31.4% male; age 44.6 ± 12.9). In order to generate motivational profiles based on motivational regulation, a cluster analysis was conducted. One-way analyses of variance were then used to compare the clusters in terms of demographics, physical activity level, motivation to be active and subjective experience while being active. Three motivational clusters were derived based on motivational regulation scores: a low motivation cluster, a controlled motivation cluster and an autonomous motivation cluster. These clusters differed significantly from each other with respect to physical activity behavior, motivation to be active and subjective experience while being active. Overall, the autonomous motivation cluster displayed more favorable characteristics compared to the other two clusters. The results of this study provide additional support for the importance of autonomous motivation in the context of physical activity behavior. The three derived clusters may be relevant in the context of physical activity interventions as individuals within the different clusters might benefit most from different intervention approaches. In addition, this study shows that cluster analysis is a useful method for differentiating between motivational profiles in large groups of individuals who do not comply with physical activity guidelines.

  16. Deconstructing Bipolar Disorder and Schizophrenia: A cross-diagnostic cluster analysis of cognitive phenotypes.

    Science.gov (United States)

    Lee, Junghee; Rizzo, Shemra; Altshuler, Lori; Glahn, David C; Miklowitz, David J; Sugar, Catherine A; Wynn, Jonathan K; Green, Michael F

    2017-02-01

    Bipolar disorder (BD) and schizophrenia (SZ) show substantial overlap. It has been suggested that a subgroup of patients might contribute to these overlapping features. This study employed a cross-diagnostic cluster analysis to identify subgroups of individuals with shared cognitive phenotypes. A total of 143 participants (68 BD patients, 39 SZ patients and 36 healthy controls) completed a battery of EEG and performance assessments of perception, nonsocial cognition and social cognition. A K-means cluster analysis was conducted with all participants across diagnostic groups. Clinical symptoms, functional capacity, and functional outcome were assessed in patients. A two-cluster solution across the three groups was the most stable. One cluster, including 44 BD patients, 31 controls and 5 SZ patients, showed better cognition (High cluster) than the other cluster, with 24 BD patients, 35 SZ patients and 5 controls (Low cluster). BD patients in the High cluster performed better than BD patients in the Low cluster across cognitive domains. Within each cluster, participants with different clinical diagnoses showed different profiles across cognitive domains. All patients were in the chronic phase and not in a mood episode at the time of assessment, and most of the assessments were behavioral measures. This study identified two clusters with shared cognitive phenotype profiles that were not proxies for clinical diagnoses. The finding of better social cognitive performance of BD patients than SZ patients in the Low cluster suggests that relatively preserved social cognition may be important for identifying disease processes distinct to each disorder. Copyright © 2016 Elsevier B.V. All rights reserved.
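    The K-means step named above can be sketched in a few lines. This is a minimal illustration of Lloyd's algorithm and does not reproduce the authors' analysis. It assumes numeric, already standardized score vectors and Euclidean distance. It also uses a fixed initialization (the first k points) so that runs are reproducible. The function name `kmeans` and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    # Hypothetical deterministic initialization: the first k points become the centroids.
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its current members.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's old centroid
        if new_labels == labels and new_centroids == centroids:
            break  # assignments and centroids are stable: converged
        labels, centroids = new_labels, new_centroids
    return labels, centroids
```

    Practical analyses usually try several values of k and several random initializations, then compare how stable the resulting solutions are.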

  17. Molecular-dynamics analysis of mobile helium cluster reactions near surfaces of plasma-exposed tungsten

    Energy Technology Data Exchange (ETDEWEB)

    Hu, Lin; Maroudas, Dimitrios, E-mail: maroudas@ecs.umass.edu [Department of Chemical Engineering, University of Massachusetts, Amherst, Massachusetts 01003-9303 (United States); Hammond, Karl D. [Department of Chemical Engineering, University of Missouri, Columbia, Missouri 65211 (United States); Wirth, Brian D. [Department of Nuclear Engineering, University of Tennessee, Knoxville, Tennessee 37996 (United States)

    2015-10-28

    We report the results of a systematic atomic-scale analysis of the reactions of small mobile helium clusters (Heₙ, 4 ≤ n ≤ 7) near low-Miller-index tungsten (W) surfaces, aiming at a fundamental understanding of the near-surface dynamics of helium-carrying species in plasma-exposed tungsten. These small mobile helium clusters are attracted to the surface and migrate to the surface by Fickian diffusion and drift due to the thermodynamic driving force for surface segregation. As the clusters migrate toward the surface, trap mutation (TM) and cluster dissociation reactions are activated at rates higher than in the bulk. TM produces W adatoms and immobile complexes of helium clusters surrounding W vacancies located within the lattice planes at a short distance from the surface. These reactions are identified and characterized in detail based on the analysis of a large number of molecular-dynamics trajectories for each such mobile cluster near W(100), W(110), and W(111) surfaces. TM is found to be the dominant cluster reaction for all cluster and surface combinations, except for the He₄ and He₅ clusters near W(100), where cluster partial dissociation following TM dominates. We find that there exists a critical cluster size, n = 4 near W(100) and W(111) and n = 5 near W(110), beyond which the formation of multiple W adatoms and vacancies in the TM reactions is observed. The identified cluster reactions are responsible for important structural, morphological, and compositional features in the plasma-exposed tungsten, including surface adatom populations, near-surface immobile helium-vacancy complexes, and retained helium content, which are expected to influence the amount of hydrogen re-cycling and tritium retention in fusion tokamaks.

  18. Is Toscana A Formal Concept Analysis Based Solution In Web Usage Mining?

    Directory of Open Access Journals (Sweden)

    Dan-Andrei SITAR-TĂUT

    2012-01-01

    Full Text Available Analyzing the large amounts of data that come from web logs is a complex but challenging present-day problem with implications in various fields, which opens the way to a theoretically infinite range of approaches and implementations. The main goal of our paper is to assess whether formal concept analysis is a viable way of supporting the web mining process, based on an open-source tool called TOSCANA.
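    TOSCANA itself is not reproduced here. The sketch below is a hedged illustration of the formal concept analysis it builds on. It enumerates the formal concepts of a tiny hypothetical context in which web sessions are the objects and visited pages are the attributes. A concept is a pair (A, B) where B is exactly the set of attributes shared by all objects in A, and A is exactly the set of objects having every attribute in B. The brute-force enumeration is only suitable for toy contexts. The context and function names are illustrative assumptions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical formal context: web sessions (objects) x pages visited (attributes).
CONTEXT: Dict[str, Set[str]] = {
    "s1": {"home", "search"},
    "s2": {"home", "search", "cart"},
    "s3": {"home", "news"},
}

Concept = Tuple[FrozenSet[str], FrozenSet[str]]


def extent(attrs: FrozenSet[str], ctx: Dict[str, Set[str]]) -> FrozenSet[str]:
    """Objects that have every attribute in attrs (the derivation B')."""
    return frozenset(o for o, a in ctx.items() if attrs <= a)


def intent(objs: FrozenSet[str], ctx: Dict[str, Set[str]]) -> FrozenSet[str]:
    """Attributes shared by every object in objs (the derivation A')."""
    shared = set().union(*ctx.values())  # the empty object set shares all attributes
    for o in objs:
        shared &= ctx[o]
    return frozenset(shared)


def concepts(ctx: Dict[str, Set[str]]) -> List[Concept]:
    """Enumerate all concepts by closing every attribute subset: (B', B'')."""
    attrs = sorted(set().union(*ctx.values()))
    found = set()
    for r in range(len(attrs) + 1):
        for combo in combinations(attrs, r):
            a = extent(frozenset(combo), ctx)
            found.add((a, intent(a, ctx)))
    # Order from the most general concept (largest extent) to the most specific.
    return sorted(found, key=lambda c: (-len(c[0]), sorted(c[1])))
```

    In this toy context, every session shares the concept whose intent is {home}, and the sessions that used search form a narrower concept below it. Such a concept lattice is the structure that tools built on formal concept analysis let users browse.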

  19. The computer system of automatic microscope analysis of mines' individual dosimeters

    International Nuclear Information System (INIS)

    Zorawski, A.; Hawrynski, M.; Kluszczynski, D.

    1988-01-01

    The Institute of Occupational Medicine (IOM) carries out routine investigations of miners' individual exposure to radon and its alpha-radioactive daughters in Polish mines [1]. Evaluation of miners' exposure is based on automatic analysis of track detectors by the computer SYSTEM RADON. The IOM used detectors of size 2 × 3 cm cut from Kodak LT115 or LR115 dosimetry foil. The scheme of the system is presented in Fig. 1, and Table 1 gives the specification of its elements.

  20. Analysis of the planned post-mining landscape of MIBRAG's open-cast mines with regard to a possible environmental impact of alteration processes in mixed dumps

    Energy Technology Data Exchange (ETDEWEB)

    Jolas, P.; Hofmann, B. [Mitteldeutsche Braunkohlengesellschaft, Theissen (Germany)]

    2010-07-01

    There has been an increasing body of knowledge with regard to hydro- and geochemical alteration processes in overburden dumps and their impact on groundwater quality in lignite mining and reclamation operations associated with post-mining landscapes in Germany. The operators of the MIBRAG mines have examined how alteration processes affect the environment and what opportunities exist to actively influence the dumping process. The objective was to counteract any possible negative impact of the alteration processes. Special emphasis was placed on the impact caused by the oxidation of sulfur-containing minerals. This paper presented an analysis of the situation at the United Schleenhain Mine and how it reflects on the work to date for MIBRAG's mines. A future outlook was also presented. Specifically, the paper discussed the development of the United Schleenhain mine and the post-mining landscape. The potential for discharge of substances was also evaluated along with acidification. 1 tab., 5 figs.

  1. Radiological impacts of Jackpile-Paguate uranium mines: an analysis of alternatives of decommissioning

    International Nuclear Information System (INIS)

    Momeni, M.H.; Tsai, S.Y.H.; Yang, J.Y.; Gureghian, A.B.; Dungey, C.E.

    1983-03-01

    Potential pathways of radiation exposure and radiation-induced genetic and somatic effects from materials at the mine complex under five alternatives of decommissioning were analyzed using UDAD and PRIM computer codes. The principal pathways of exposure included in the analysis were inhalation of airborne radionuclides, ingestion of food and water containing radionuclides, and extended exposure to gamma and beta radiation from either airborne or ground-deposited radionuclides. The alternatives of decommissioning include (A) No Action (site will be fenced, otherwise left as it is), (B) No Future Use (site will be fenced and all disturbed area will be covered with 30 cm of soil, no grazing on the site); (C1) Grazing Land Use as developed by Anaconda Company (protore, waste piles, and open pits covered with 120 cm of soil, the remainder of the disturbed areas covered with 30 cm of soil, pits backfilled 90 cm above the equilibrium groundwater recovery level, no human habitation or farming allowed on the mine site, but grazing would be allowed); (C2) Grazing Land Use as developed by US Department of the Interior (similar to Alternative C1, but the pits covered with 300 cm of soil above the groundwater recovery level); and (D) Maximum Future Use (similar to Alternative C2, except construction of commercial and industrial facilities, storage, recreation, and further mining would be allowed). Radiation doses from atmospheric transport and ingestion of radionuclides were calculated, and somatic and genetic effects in individuals living within 80 km from the mine complex were predicted. Hydrological flow patterns in the mine area were analyzed to determine the potential for future contamination of surface water and groundwater and to determine the groundwater recovery level after reclamation, thus permitting incorporation of corrective actions into the reclamation procedures

  2. Trends in business process analysis: from verification to process mining

    NARCIS (Netherlands)

    Aalst, van der W.M.P.; Cardoso, J.; Cordeiro, J.; Filipe, J.

    2007-01-01

    Business process analysis ranges from model verification at design-time to the monitoring of processes at runtime. Much progress has been achieved in process verification. Today we are able to verify the entire reference model of SAP without any problems. Moreover, more and more processes leave

  3. Spatiotemporal Data Mining, Analysis, and Visualization of Human Activity Data

    Science.gov (United States)

    Li, Xun

    2012-01-01

    This dissertation addresses the research challenge of developing efficient new methods for discovering useful patterns and knowledge in large volumes of electronically collected spatiotemporal activity data. I propose to analyze three types of such spatiotemporal activity data in a methodological framework that integrates spatial analysis, data…

  4. Crowd Analysis by Using Optical Flow and Density Based Clustering

    DEFF Research Database (Denmark)

    Santoro, Francesco; Pedro, Sergio; Tan, Zheng-Hua

    2010-01-01

    In this paper, we present a system to detect and track crowds in a video sequence captured by a camera. In the first step, we compute optical flow by means of pyramidal Lucas-Kanade feature tracking. Afterwards, density-based clustering is used to group similar vectors. In the last step...
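    The abstract does not say which density-based algorithm was used. The sketch below assumes DBSCAN, one common choice, and shows how flow vectors could be grouped with it. Each vector is a hypothetical (x, y, dx, dy) tuple of position and displacement. The values of `eps` and `min_pts` are illustrative and would need tuning to the video. Because position and motion share one Euclidean metric here, their relative scaling strongly affects the result.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Tuple[float, ...]  # hypothetical (x, y, dx, dy) for one tracked feature
NOISE = -1


def dbscan(vectors: Sequence[Vector], eps: float, min_pts: int) -> List[int]:
    """Return a cluster id per vector; NOISE (-1) marks vectors in sparse regions."""
    labels: List[Optional[int]] = [None] * len(vectors)

    def neighbours(i: int) -> List[int]:
        # Brute-force O(n) range query; the point itself is included.
        return [j for j in range(len(vectors)) if math.dist(vectors[i], vectors[j]) <= eps]

    cluster = 0
    for i in range(len(vectors)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_pts:  # j is also core: expand the cluster through it
                queue.extend(more)
        cluster += 1
    return labels  # type: ignore[return-value]
```

    For example, three features drifting right near the origin, three drifting left near (10, 10) and one isolated fast-moving feature yield two clusters and one noise label.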

  5. Weighted Clustering

    DEFF Research Database (Denmark)

    Ackerman, Margareta; Ben-David, Shai; Branzei, Simina

    2012-01-01

    We investigate a natural generalization of the classical clustering problem, considering clustering tasks in which different instances may have different weights. We conduct the first extensive theoretical analysis on the influence of weighted data on standard clustering algorithms in both...... the partitional and hierarchical settings, characterizing the conditions under which algorithms react to weights. Extending a recent framework for clustering algorithm selection, we propose intuitive properties that would allow users to choose between clustering algorithms in the weighted setting and classify...

  6. [Text mining, a method for computer-assisted analysis of scientific texts, demonstrated by an analysis of author networks].

    Science.gov (United States)

    Hahn, P; Dullweber, F; Unglaub, F; Spies, C K

    2014-06-01

    Searching for relevant publications is becoming more difficult with the increasing number of scientific articles. Text mining, as a specific form of computer-based data analysis, may be helpful in this context. Highlighting relations between authors and finding relevant publications on a specific subject using text analysis programs are illustrated graphically with two worked examples. © Georg Thieme Verlag KG Stuttgart · New York.

  7. Accidental Water Pollution Risk Analysis of Mine Tailings Ponds in Guanting Reservoir Watershed, Zhangjiakou City, China.

    Science.gov (United States)

    Liu, Renzhi; Liu, Jing; Zhang, Zhijiao; Borthwick, Alistair; Zhang, Ke

    2015-12-02

    Over the past half century, a surprising number of major pollution incidents have occurred due to tailings dam failures. Most previous studies of such incidents comprised forensic analyses of environmental impacts after a tailings dam failure, with few considering the combined pollution risk at a watershed scale before incidents occur. We therefore propose Watershed-scale Tailings-pond Pollution Risk Analysis (WTPRA), designed for multiple mine tailings ponds and stemming from previous watershed-scale accidental pollution risk assessments. Transferred and combined risk is embedded in the WTPRA using risk rankings of multiple "source-pathway-target" routes. The previous approach is modified using multi-criteria analysis, dam failure models, and instantaneous water quality models, which are adapted for application to multiple tailings ponds. The study area covers the basin of Guanting Reservoir (the largest backup drinking water source for Beijing) in Zhangjiakou City, where many mine tailings ponds are located. The resultant map shows that risk is higher downstream of Guanting Reservoir and in its two tributary basins (i.e., Qingshui River and Longyang River). Conversely, risk is lower in the midstream and upstream reaches. The analysis also indicates that the most hazardous mine tailings ponds are located in Chongli and Xuanhua, and that Guanting Reservoir is the most vulnerable receptor. Sensitivity and uncertainty analyses are performed to validate the robustness of the WTPRA method.

  8. Accidental Water Pollution Risk Analysis of Mine Tailings Ponds in Guanting Reservoir Watershed, Zhangjiakou City, China

    Science.gov (United States)

    Liu, Renzhi; Liu, Jing; Zhang, Zhijiao; Borthwick, Alistair; Zhang, Ke

    2015-01-01

    Over the past half century, a surprising number of major pollution incidents have occurred due to tailings dam failures. Most previous studies of such incidents comprised forensic analyses of environmental impacts after a tailings dam failure, with few considering the combined pollution risk at a watershed scale before incidents occur. We therefore propose Watershed-scale Tailings-pond Pollution Risk Analysis (WTPRA), designed for multiple mine tailings ponds and stemming from previous watershed-scale accidental pollution risk assessments. Transferred and combined risk is embedded in the WTPRA using risk rankings of multiple “source-pathway-target” routes. The previous approach is modified using multi-criteria analysis, dam failure models, and instantaneous water quality models, which are adapted for application to multiple tailings ponds. The study area covers the basin of Guanting Reservoir (the largest backup drinking water source for Beijing) in Zhangjiakou City, where many mine tailings ponds are located. The resultant map shows that risk is higher downstream of Guanting Reservoir and in its two tributary basins (i.e., Qingshui River and Longyang River). Conversely, risk is lower in the midstream and upstream reaches. The analysis also indicates that the most hazardous mine tailings ponds are located in Chongli and Xuanhua, and that Guanting Reservoir is the most vulnerable receptor. Sensitivity and uncertainty analyses are performed to validate the robustness of the WTPRA method. PMID:26633450

  9. Accidental Water Pollution Risk Analysis of Mine Tailings Ponds in Guanting Reservoir Watershed, Zhangjiakou City, China

    Directory of Open Access Journals (Sweden)

    Renzhi Liu

    2015-12-01

    Full Text Available Over the past half century, a surprising number of major pollution incidents have occurred due to tailings dam failures. Most previous studies of such incidents comprised forensic analyses of environmental impacts after a tailings dam failure, with few considering the combined pollution risk at a watershed scale before incidents occur. We therefore propose Watershed-scale Tailings-pond Pollution Risk Analysis (WTPRA), designed for multiple mine tailings ponds and stemming from previous watershed-scale accidental pollution risk assessments. Transferred and combined risk is embedded in the WTPRA using risk rankings of multiple “source-pathway-target” routes. The previous approach is modified using multi-criteria analysis, dam failure models, and instantaneous water quality models, which are adapted for application to multiple tailings ponds. The study area covers the basin of Guanting Reservoir (the largest backup drinking water source for Beijing) in Zhangjiakou City, where many mine tailings ponds are located. The resultant map shows that risk is higher downstream of Guanting Reservoir and in its two tributary basins (i.e., Qingshui River and Longyang River). Conversely, risk is lower in the midstream and upstream reaches. The analysis also indicates that the most hazardous mine tailings ponds are located in Chongli and Xuanhua, and that Guanting Reservoir is the most vulnerable receptor. Sensitivity and uncertainty analyses are performed to validate the robustness of the WTPRA method.

  10. A Proposed Data Fusion Architecture for Micro-Zone Analysis and Data Mining

    Energy Technology Data Exchange (ETDEWEB)

    Kevin McCarthy; Milos Manic

    2012-08-01

    Data Fusion requires the ability to combine or “fuse” data from multiple data sources. Time Series Analysis is a data mining technique used to predict future values in a data set based on past values. Unlike other data mining techniques, however, Time Series Analysis places special emphasis on periodicity and on how seasonal and other time-based factors tend to affect trends over time. One of the difficulties encountered in developing generic time series techniques is the wide variability of the data sets available for analysis. This presents challenges all the way from the data gathering stage to results presentation. This paper presents an architecture designed and used to facilitate the collection of disparate data sets well suited to Time Series analysis as well as other predictive data mining techniques. Results show that this architecture provides a flexible, dynamic framework for the capture and storage of a myriad of dissimilar data sets and can serve as a foundation from which to build a complete data fusion architecture.

  11. Performance comparison analysis library communication cluster system using merge sort

    Science.gov (United States)

    Wulandari, D. A. R.; Ramadhan, M. E.

    2018-04-01

    Computing began with single processors; to reduce computation time, multiprocessor systems were introduced. This second paradigm is known as parallel computing, of which a cluster is one example. A cluster needs a communication protocol for processing, one of which is the Message Passing Interface (MPI). MPI has many libraries, two of which are OpenMPI and MPICH2. The performance of a cluster machine depends on how well the performance characteristics of the communication library suit the characteristics of the problem, so this study aims to compare the performance of these libraries in handling parallel computing processes. The case studies in this research are MPICH2 and OpenMPI. The research executes a sorting problem, using the merge sort method, to measure the performance of the cluster system. The research method is to implement OpenMPI and MPICH2 on a Linux-based cluster of five virtual computers and then analyze the performance of the system under different test scenarios using three parameters: execution time, speedup and efficiency. The results of this study showed that as the data size increases, the average speedup and efficiency of both OpenMPI and MPICH2 tend to increase, but they decrease at large data sizes. An increase in data size does not necessarily increase speedup and efficiency, only execution time, for example at a data size of 100000. OpenMPI has an execution time greater than MPICH2; for example, at a data size of 1000 the average execution time is 0.009721 with MPICH2 and 0.003895 with OpenMPI. OpenMPI can be customized to communication needs.
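    The study's MPI code is not given. The sketch below is a hedged, single-process illustration of the ideas involved. It sorts with merge sort and mimics the parallel decomposition by sorting p chunks independently and then merging the sorted runs pairwise. In an MPI program each chunk would be sorted on a different process; that part is not shown. It also computes the two derived metrics using their standard definitions: speedup S = T₁ / Tₚ and efficiency E = S / p. Function names and timing values are illustrative.

```python
from typing import List, Sequence


def merge(left: Sequence[int], right: Sequence[int]) -> List[int]:
    """Merge two already-sorted sequences into one sorted list."""
    out: List[int] = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def merge_sort(xs: Sequence[int]) -> List[int]:
    """Recursive top-down merge sort."""
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    return merge(merge_sort(xs[:mid]), merge_sort(xs[mid:]))


def chunked_merge_sort(xs: Sequence[int], p: int) -> List[int]:
    """Mimic the parallel scheme: sort p chunks independently, then merge runs pairwise."""
    size = max(1, -(-len(xs) // p))  # ceiling division, at least 1
    runs = [merge_sort(xs[k:k + size]) for k in range(0, len(xs), size)]
    while len(runs) > 1:
        # One merge round; an odd run out is carried over unchanged.
        runs = [merge(runs[k], runs[k + 1]) if k + 1 < len(runs) else runs[k]
                for k in range(0, len(runs), 2)]
    return runs[0] if runs else []


def speedup(t_serial: float, t_parallel: float) -> float:
    """S = T1 / Tp."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, p: int) -> float:
    """E = S / p, the fraction of ideal linear speedup achieved on p processors."""
    return speedup(t_serial, t_parallel) / p
```

    For instance, hypothetical timings of 10 s serially and 4 s on five nodes give S = 2.5 and E = 0.5. Communication overhead and the final sequential merges are among the reasons efficiency can fall as data size grows.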

  12. Phenotypes of asthma in low-income children and adolescents: cluster analysis

    Directory of Open Access Journals (Sweden)

    Anna Lucia Barros Cabral

    Full Text Available ABSTRACT Objective: Studies characterizing asthma phenotypes have predominantly included adults or have involved children and adolescents in developed countries. Therefore, their applicability to other populations, such as those of developing countries, remains indeterminate. Our objective was to determine how low-income children and adolescents with asthma in Brazil are distributed in a cluster analysis. Methods: We included 306 children and adolescents (6-18 years of age) with a clinical diagnosis of asthma and under medical treatment for at least one year of follow-up. At enrollment, all the patients were clinically stable. For the cluster analysis, we selected 20 variables commonly measured in clinical practice and considered important in defining asthma phenotypes. Variables with high multicollinearity were excluded. A cluster analysis was applied using a two-step agglomerative test and a log-likelihood distance measure. Results: Three clusters were defined for our population. Cluster 1 (n = 94) included subjects with normal pulmonary function, mild eosinophilic inflammation, few exacerbations, later age at asthma onset, and mild atopy. Cluster 2 (n = 87) included those with normal pulmonary function, a moderate number of exacerbations, early age at asthma onset, more severe eosinophilic inflammation, and moderate atopy. Cluster 3 (n = 108) included those with poor pulmonary function, frequent exacerbations, severe eosinophilic inflammation, and severe atopy. Conclusions: Asthma was characterized by the presence of atopy, number of exacerbations, and lung function in low-income children and adolescents in Brazil. The many similarities with previous cluster analyses of phenotypes indicate that this approach shows good generalizability.

  13. In-depth motivic analysis based on multiparametric closed pattern and cyclic sequence mining

    DEFF Research Database (Denmark)

    Lartillot, Olivier

    2014-01-01

    presents a much simpler description and justification of this general strategy, as well as significant simplifications of the model, in particular concerning the management of pattern cyclicity. A new method for automated bundling of patterns belonging to the same motivic or thematic classes is also presented. ... The good performance of the method is shown through the analysis of a piece from the JKUPDD database. Ground-truth motives are detected, while additional relevant information complements the ground-truth musicological analysis. The system, implemented in Matlab, is made publicly available as part of MiningSuite, a new open-source framework for audio and music analysis.

  14. Multiple Regression Analysis of Unconfined Compression Strength of Mine Tailings Matrices

    Directory of Open Access Journals (Sweden)

    Mahmood Ali A.

    2017-01-01

    Full Text Available As part of a novel approach to the sustainable development of mine tailings, experimental and numerical analyses are carried out on newly formulated tailings matrices. Several physical characterization tests are carried out, including the unconfined compression strength test, to ascertain the integrity of these matrices when subjected to loading. The current paper attempts a multiple regression analysis of the unconfined compressive strength test results of these matrices to investigate the most pertinent factors affecting their strength. The results of this analysis showed that the suggested equation is reasonably applicable to the range of binder combinations used.
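    The abstract does not give the fitted equation or its predictors. The sketch below is a hedged illustration of the general technique: ordinary least-squares multiple regression with NumPy, plus the coefficient of determination R². The two predictors are hypothetical stand-ins for factors such as binder proportions. The synthetic data are generated exactly from y = 1 + 2x₁ + 3x₂ so that the fit can be checked.

```python
import numpy as np


def fit_ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Return [intercept, b1, ..., bk] minimizing the sum of squared residuals."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def r_squared(X: np.ndarray, y: np.ndarray, coef: np.ndarray) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    A = np.column_stack([np.ones(len(X)), X])
    resid = y - A @ coef
    centred = y - y.mean()
    return float(1.0 - (resid @ resid) / (centred @ centred))


# Hypothetical predictors (two columns) and a response built without noise.
X = np.array([[1, 0], [0, 1], [1, 1], [2, 1], [1, 3]], dtype=float)
y = 1 + 2 * X[:, 0] + 3 * X[:, 1]
coef = fit_ols(X, y)
```

    With real strength data, the residuals, R² and coefficient significance indicate which factors are most pertinent and over what range the equation holds.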

  15. SNAP: A General Purpose Network Analysis and Graph Mining Library.

    Science.gov (United States)

    Leskovec, Jure; Sosič, Rok

    2016-10-01

    Large networks are becoming a widely used abstraction for studying complex systems in a broad set of disciplines, ranging from social network analysis to molecular biology and neuroscience. Despite an increasing need to analyze and manipulate large networks, only a limited number of tools are available for this task. Here, we describe Stanford Network Analysis Platform (SNAP), a general-purpose, high-performance system that provides easy-to-use, high-level operations for analysis and manipulation of large networks. We present SNAP functionality, describe its implementation details, and give performance benchmarks. SNAP has been developed for single big-memory machines, and it balances the trade-off between maximum performance, compact in-memory graph representation, and the ability to handle dynamic graphs where nodes and edges are being added or removed over time. SNAP can process massive networks with hundreds of millions of nodes and billions of edges. SNAP offers over 140 different graph algorithms that can efficiently manipulate large graphs, calculate structural properties, generate regular and random graphs, and handle attributes and meta-data on nodes and edges. Besides being able to handle large graphs, an additional strength of SNAP is that networks and their attributes are fully dynamic; they can be modified during the computation at low cost. SNAP is provided as an open source library in C++ as well as a module in Python. We also describe the Stanford Large Network Dataset, a set of social and information real-world networks and datasets, which we make publicly available. The collection is a complementary resource to our SNAP software and is widely used for development and benchmarking of graph analytics algorithms.

  16. Data mining techniques for performance analysis of onshore wind farms

    International Nuclear Information System (INIS)

    Astolfi, Davide; Castellani, Francesco; Garinei, Alberto; Terzi, Ludovico

    2015-01-01

    Highlights: • Indicators are formulated for monitoring the quality of wind turbine performance. • State dynamics are processed to formulate two Malfunctioning Indexes. • Power curve analysis is revisited. • A novel definition of polar efficiency is formulated and its consistency is checked. • Mechanical effects of wakes are analyzed as nacelle stationarity and misalignment. - Abstract: Wind turbines are an energy conversion system with a low density over the territory, and they therefore need accurate condition monitoring in the operative phase. Supervisory Control And Data Acquisition (SCADA) control systems have become ubiquitous in wind energy technology, and they pose the challenge of extracting from them simple and explanatory information on goodness of operation and performance. In the present work, post-processing methods are applied to the SCADA measurements of two onshore wind farms sited in southern Italy. Innovative and meaningful indicators of goodness of performance are formulated. The analysis proceeds with increasing granularity: first, Malfunctioning Indexes are proposed, which quantify the goodness of the merely operational behavior of the machine, irrespective of the quality of output. Subsequently, the focus shifts to the analysis of the farms in the productive phase: the dependency of farm efficiency on wind direction is investigated through the polar plot, which is revisited in a novel way to make it consistent for onshore wind farms. Finally, the inability of the nacelle to optimally follow meandering wind due to wakes is analysed through a Stationarity Index and a Misalignment Index, which are shown to capture the relation between the mechanical behavior of the turbine and the degradation of the power output.

  17. Patterns of Brucellosis Infection Symptoms in Azerbaijan: A Latent Class Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Rita Ismayilova

    2014-01-01

    Full Text Available Brucellosis infection is a multisystem disease with a broad spectrum of symptoms. We investigated the existence of clusters of infected patients according to their clinical presentation. Using national surveillance data from the Electronic-Integrated Disease Surveillance System, we applied a latent class cluster (LCC) analysis to symptoms to determine clusters of brucellosis cases. A total of 454 cases reported between July 2011 and July 2013 were analyzed. LCC analysis identified a two-cluster model, and the Vuong-Lo-Mendell-Rubin likelihood ratio test supported the cluster model. Brucellosis cases in the second cluster (19%) reported higher percentages of poly-lymphadenopathy, hepatomegaly, arthritis, myositis, and neuritis and changes in liver function tests compared to cases in the first cluster. Patients in the second cluster had a severe brucellosis disease course and were associated with longer delays in seeking medical attention. Moreover, most of them were from Beylagan, a region focused on sheep and goat livestock production in south-central Azerbaijan. Patients in cluster 2 accounted for one-quarter of brucellosis cases and had a more severe clinical presentation. Delay in seeking medical care may explain severe illness. Future work needs to determine the factors that influence care seeking among brucellosis cases and to identify brucellosis species, particularly among cases from Beylagan.

  18. Clusters of Insomnia Disorder: An Exploratory Cluster Analysis of Objective Sleep Parameters Reveals Differences in Neurocognitive Functioning, Quantitative EEG, and Heart Rate Variability

    Science.gov (United States)

    Miller, Christopher B.; Bartlett, Delwyn J.; Mullins, Anna E.; Dodds, Kirsty L.; Gordon, Christopher J.; Kyle, Simon D.; Kim, Jong Won; D'Rozario, Angela L.; Lee, Rico S.C.; Comas, Maria; Marshall, Nathaniel S.; Yee, Brendon J.; Espie, Colin A.; Grunstein, Ronald R.

    2016-01-01

    Study Objectives: To empirically derive and evaluate potential clusters of Insomnia Disorder through cluster analysis from polysomnography (PSG). We hypothesized that clusters would differ on neurocognitive performance, sleep-onset measures of quantitative (q)-EEG and heart rate variability (HRV). Methods: Research volunteers with Insomnia Disorder (DSM-5) completed a neurocognitive assessment, and overnight PSG measures of total sleep time (TST), wake time after sleep onset (WASO), and sleep onset latency (SOL) were used to determine clusters. Results: From 96 volunteers with Insomnia Disorder, cluster analysis derived at least two clusters from objective sleep parameters: Insomnia with normal objective sleep duration (I-NSD: n = 53) and Insomnia with short sleep duration (I-SSD: n = 43). At sleep onset, differences in HRV between I-NSD and I-SSD clusters suggest attenuated parasympathetic activity in I-SSD (P insomnia clusters derived from cluster analysis differ in sleep onset HRV. Preliminary data suggest evidence for three clusters in insomnia with differences for sustained attention and sleep-onset q-EEG. Clinical Trial Registration: Insomnia 100 sleep study: Australia New Zealand Clinical Trials Registry (ANZCTR) identification number 12612000049875. URL: https://www.anzctr.org.au/Trial/Registration/TrialReview.aspx?id=347742. Citation: Miller CB, Bartlett DJ, Mullins AE, Dodds KL, Gordon CJ, Kyle SD, Kim JW, D'Rozario AL, Lee RS, Comas M, Marshall NS, Yee BJ, Espie CA, Grunstein RR. Clusters of Insomnia Disorder: an exploratory cluster analysis of objective sleep parameters reveals differences in neurocognitive functioning, quantitative EEG, and heart rate variability. SLEEP 2016;39(11):1993–2004. PMID:27568796

  19. Cluster analysis and ecology of living benthonic foraminiferids from inner shelf off Ratnagiri, West Coast, India

    Digital Repository Service at National Institute of Oceanography (India)

    Nigam, R.; Sarupria, J.S.

    Q-mode cluster analysis explains the spatial distribution of living benthonic foraminiferids from the inner shelf off Ratnagiri. Two main biotopes and two sub-biotopes are recognised within the study area; biotope A, characterised by ...

  20. An Evaluation of Practical Applicability of Multi-Assortment Production Break-Even Analysis based on Mining Companies

    Science.gov (United States)

    Fuksa, Dariusz; Trzaskuś-Żak, Beata; Gałaś, Zdzisław; Utrata, Arkadiusz

    2017-03-01

    In practice, the vast majority of mining companies produce more than one product. For them, break-even analysis, referred to as CVP (Cost-Volume-Profit) analysis (Wilkinson, 2005; Czopek, 2003), is significantly complicated by the need to include a multi-assortment structure, which may comprise more than 20 assortments (depending on grain size), as in the case of open-pit mines. The article presents methods for evaluating break-even (in volume and value) for both single-assortment and multi-assortment production. The complexity of break-even evaluation for multi-assortment production has given rise to many methods and, simultaneously, to various approaches to its analysis, differing especially in how fixed costs are accounted for: they may be allocated entirely among particular assortments, treated as relating to the company as a whole, or allocated partially among particular assortments and partially to the company as a whole. The chosen break-even methods were evaluated, subject to data availability, on two examples of mining companies: an open-pit mine of rock materials and an underground hard coal mine. The selection of methods was determined by the data the companies provided. The data for the analysis come from the mines' internal documentation: financial statements, breakdowns and cost calculations.
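As an illustration only, the sketch below uses the textbook weighted-average contribution-margin method for multi-product break-even. It assumes a constant sales mix and treats all fixed costs at company level (one of the fixed-cost treatments listed above); it is not claimed to be any of the specific methods the authors evaluated. The function name, prices, unit costs and mix shares are hypothetical.

```python
def break_even_mix(fixed_costs, assortments):
    """Break-even for multi-assortment production with a fixed sales mix.

    assortments: list of (unit_price, unit_variable_cost, volume_share);
    volume shares must sum to 1. All fixed costs are treated at company level.
    Returns (total break-even volume, per-assortment volumes, break-even revenue).
    """
    if abs(sum(share for _, _, share in assortments) - 1.0) > 1e-9:
        raise ValueError("volume shares must sum to 1")
    # Weighted-average unit contribution margin across the mix.
    wacm = sum(share * (price - var_cost) for price, var_cost, share in assortments)
    if wacm <= 0:
        raise ValueError("mix has no positive contribution margin")
    total_volume = fixed_costs / wacm
    volumes = [share * total_volume for _, _, share in assortments]
    revenue = sum(v * price for v, (price, _, _) in zip(volumes, assortments))
    return total_volume, volumes, revenue

# Hypothetical: two aggregate grain-size assortments (price and variable cost per tonne).
mix = [(50.0, 30.0, 0.6), (20.0, 15.0, 0.4)]
total_t, per_assortment_t, be_revenue = break_even_mix(1_400_000.0, mix)
```

At the returned volumes the combined contribution margin exactly covers the fixed costs; allocating part of the fixed costs to individual assortments would require a different calculation.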

  1. Statistical Techniques Applied to Aerial Radiometric Surveys (STAARS): cluster analysis. National Uranium Resource Evaluation

    International Nuclear Information System (INIS)

    Pirkle, F.L.; Stablein, N.K.; Howell, J.A.; Wecksung, G.W.; Duran, B.S.

    1982-11-01

    One objective of the aerial radiometric surveys flown as part of the US Department of Energy's National Uranium Resource Evaluation (NURE) program was to ascertain the regional distribution of near-surface radioelement abundances. Some method for identifying groups of observations with similar radioelement values was therefore required. It is shown in this report that cluster analysis can identify such groups even when no a priori knowledge of the geology of an area exists. A method of convergent k-means cluster analysis coupled with a hierarchical cluster analysis is used to classify 6991 observations (three radiometric variables at each observation location) from the Precambrian rocks of the Copper Mountain, Wyoming, area. Another method, one that combines a principal components analysis with a convergent k-means analysis, is applied to the same data. These two methods are compared with a convergent k-means analysis that utilizes available geologic knowledge. All three methods identify four clusters. Three of the clusters represent background values for the Precambrian rocks of the area, and one represents outliers (anomalously high ²¹⁴Bi). A segmentation of the data corresponding to geologic reality as discovered by other methods has been achieved based solely on analysis of aerial radiometric data. The techniques employed are composites of classical clustering methods designed to handle the special problems presented by large data sets. 20 figures, 7 tables
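A minimal sketch of the k-means core, reading "convergent" as iterating assignment and centroid updates until no observation changes cluster (Lloyd's algorithm). How STAARS seeded the centroids (for example from the hierarchical step) is not reproduced here; the caller simply supplies initial centroids. The function name and the three-channel observation values are hypothetical.

```python
import math

def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's k-means: repeat assign/update until the assignments stop changing."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each observation goes to its nearest centroid (Euclidean).
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # converged: no observation changed cluster
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep an empty cluster's centroid where it was
                centroids[j] = [sum(vals) / len(members) for vals in zip(*members)]
    return labels, centroids

# Hypothetical observations: three radiometric variables per location.
obs = [(1.0, 2.0, 8.0), (1.1, 2.1, 8.2), (0.9, 1.9, 7.9),
       (3.0, 5.0, 20.0), (3.1, 5.2, 19.8), (2.9, 4.9, 20.1)]
labels, centers = kmeans(obs, [obs[0], obs[3]])
```

With well-separated seeds like these, the loop settles after one update; real survey data usually need a better seeding strategy, which is where the hierarchical step helps.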

  2. The cost of respirable coal mine dust: an analysis based on new black lung claims

    Energy Technology Data Exchange (ETDEWEB)

    Page, S.J.; Organiscak, J.A.; Lichtman, K. [US Bureau of Mines, Pittsburgh, PA (United States). Dept. of the Interior]

    1997-12-01

    The article provides a summation of the monetary costs of new compensation claims associated with levels of unmitigated respirable coal mine dust and the resulting lung disease known as black lung, and compares these compensation costs with the cost of dust control technology research by the US Bureau of Mines. It presents an analysis of these expenditures and projects these costs over the period from 1991 to 2010, based on projected future new claims that are assumed to be approved for federal and state benefit payment. Since current and future dust control research efforts cannot change past claim histories, a valid comparison of future research spending with other incurred costs must examine only the cost of future new claims. The bias of old claim costs was eliminated in this analysis by examining only claims since 1980. The results estimate that, for an expected 339 new approved claims annually from 1991 to 2010, Federal Trust Fund costs will be 985 million dollars. During the same period, state black lung compensation is estimated to be 18.2 billion dollars. The Bureau of Mines dust control research expenditures are estimated as 0.44% of the projected future black lung-related costs. 9 refs., 4 figs., 3 tabs.

  3. Groundwater Mixing Process Identification in Deep Mines Based on Hydrogeochemical Property Analysis

    Directory of Open Access Journals (Sweden)

    Bo Liu

    2016-12-01

    Full Text Available Karst collapse columns, as potential water passageways for mine water inrush, are always considered a critical problem for the development of deep mining techniques. This study aims to identify the mixing process of groundwater derived from two different limestone karst-fissure aquifer systems. Based on an analysis of the hydrogeochemical properties of mine groundwater, the hydraulic connection between the target karst-fissure aquifer systems was revealed. In this paper, a Piper diagram was used to calculate the mixing ratios at different sampling points in the aquifer systems, and the PHREEQC Interactive model (Version 2.5, USGS, Reston, VA, USA, 2001) was applied to adjust the mixing ratios and model the water–rock interactions during the mixing processes. The results show that the highest mixing ratio, 0.905, occurs in borehole C12, which is located nearest to the #2 karst collapse column, and that the mixing ratio decreases with increasing distance from the #2 karst collapse column. This demonstrates that groundwater from the two aquifers mixed through the passage of the #2 karst collapse column. As a result, the proposed Piper-PHREEQC based method can provide accurate identification of the water conductivity of karst collapse columns and can be applied in practice.
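The study derived its mixing ratios from a Piper diagram and refined them with PHREEQC. As a simpler illustration of what a two-aquifer mixing ratio means, the sketch below applies a mass balance on one conservative tracer (for example Cl⁻); this swapped-in method, the function name and the concentrations are assumptions, not the authors' procedure or data.

```python
def mixing_fraction(c_mix, c_a, c_b):
    """Fraction of end-member A in a two-end-member mixture, from one conservative tracer.

    Mass balance: c_mix = f * c_a + (1 - f) * c_b  =>  f = (c_mix - c_b) / (c_a - c_b)
    """
    if c_a == c_b:
        raise ValueError("end-members must differ in tracer concentration")
    f = (c_mix - c_b) / (c_a - c_b)
    if not 0.0 <= f <= 1.0:
        # Outside [0, 1] suggests a non-conservative tracer or a missing end-member.
        raise ValueError(f"mixing fraction {f:.3f} outside [0, 1]")
    return f

# Hypothetical Cl- concentrations (mg/L) for aquifer A, aquifer B and a borehole sample.
f_a = mixing_fraction(c_mix=120.0, c_a=200.0, c_b=40.0)
```

A single tracer ignores water–rock reactions, which is why a geochemical model such as PHREEQC is needed when ions are not conservative.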

  4. Analysis of candidates for interacting galaxy clusters. I. A1204 and A2029/A2033

    Science.gov (United States)

    Gonzalez, Elizabeth Johana; de los Rios, Martín; Oio, Gabriel A.; Lang, Daniel Hernández; Tagliaferro, Tania Aguirre; Domínguez R., Mariano J.; Castellón, José Luis Nilo; Cuevas L., Héctor; Valotto, Carlos A.

    2018-04-01

    Context. Merging galaxy clusters allow the different mass components, dark and baryonic, to be studied separately. Their occurrence also makes it possible to test the ΛCDM scenario and to place constraints on the self-interacting cross-section of the dark-matter particle. Aim. A homogeneous analysis of these systems is necessary. Hence, based on a recently presented sample of candidates for interacting galaxy clusters, we present the analysis of two of these catalogued systems. Methods: In this work, the first of a series devoted to characterizing galaxy clusters in merger processes, we perform a weak lensing analysis of clusters A1204 and A2029/A2033 to derive the total masses of each identified interacting structure, together with a dynamical study based on a two-body model. We also describe the gas and mass distributions in the field through lensing and X-ray analyses. Results: Neither merging cluster candidate shows evidence of a recent merger event. Nevertheless, there is dynamical evidence that these systems could be interacting or could interact in the future. Conclusions: More constraints need to be included to improve the methodology for classifying merging galaxy clusters. Characterizing these clusters is important for properly understanding the nature of these systems and their connection with dynamical studies.

  5. Problems of accounting, cost concerns and economic analysis in the mining enrichment industry

    Energy Technology Data Exchange (ETDEWEB)

    Slabinskiy, V T

    1980-01-01

    Mining enrichment enterprises of ferrous and nonferrous metallurgy and of the coal and chemical industries have much in common in production technology, technical base, and the organization of labor and production. This in turn makes it possible to develop a common procedure for accounting production expenditures, calculating the net cost of output and analyzing the production and economic activity of enterprises. Based on scientific research and a generalization of the advanced experience of practitioners, ways of improving economic operation in mining enrichment enterprises are outlined in line with the increasing demands of production control. A scheme of analytic expenditure accounting that supports multi-purpose use of information has been developed: for operational control over the formation of the net cost of output, for determining the results of the self-supporting activities of an enterprise's structural subdivisions, and for computing the efficiency of scientific and technical progress. Experience in applying economic and mathematical methods on computers for this purpose is discussed.

  6. Analysis of the electrical disturbances in CERN power distribution network with pattern mining methods

    CERN Document Server

    Abramenko, Oleksii

    2017-01-01

    The current research focuses on perturbations within the electrical network of the LHC and its subsystems, analyzing measurements collected from oscilloscopes installed across different CERN sites together with alarms raised by electrical equipment. We analyze the amplitude and duration of the glitches and, together with other relevant variables, correlate them with beam-stopping events. The work also tries to identify assets affected by such perturbations using data mining, in particular frequent pattern mining methods. On the practical side, we summarize the results of our work by putting forward a prototype software tool that enables online monitoring of the alarms coming from the electrical network and facilitates glitch detection and analysis by a technical operator.
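The abstract does not name the frequent pattern mining algorithm used. The sketch below shows the classic Apriori idea as an illustration: each transaction is the set of events seen in one time window, and itemsets are kept only if they appear in at least `min_support` windows. The event labels (`glitch`, `beam_dump`, `UPS_alarm`) and the windowing are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Apriori: return {frozenset: count} for itemsets in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    current = [frozenset([i]) for i in items]
    result = {}
    k = 1
    while current:
        # Count support of each candidate itemset of size k.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        keys = list(frequent)
        candidates = {a | b for a in keys for b in keys if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in frequent for s in combinations(c, k - 1))]
    return result

# Hypothetical event windows from network monitoring.
windows = [{"glitch", "beam_dump", "UPS_alarm"}, {"glitch", "beam_dump"},
           {"glitch", "UPS_alarm"}, {"beam_dump"}]
patterns = frequent_itemsets(windows, min_support=2)
```

Here `{glitch, beam_dump}` survives with support 2, while the three-event set is pruned because `{beam_dump, UPS_alarm}` occurs only once.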

  7. Financial planning and analysis techniques of mining firms: a note on Canadian practice

    Energy Technology Data Exchange (ETDEWEB)

    Blanco, H.; Zanibbi, L.R. (Laurentian University, Sudbury, ON (Canada). School of Commerce and Administration)

    1992-06-01

    This paper reports on the results of a survey of the financial planning and analysis techniques in use in the mining industry in Canada. The study was undertaken to determine the current status of these practices within mining firms in Canada and to investigate the extent to which the techniques are grouped together within individual firms. In addition, tests were performed on the relationship between these groups of techniques and both organizational size and price volatility of end product. The results show that a few techniques are widely utilized in this industry but that the techniques used most frequently are not as sophisticated as reported in previous, more broadly based surveys. The results also show that firms tend to use 'bundles' of techniques and that the relative use of some of these groups of techniques is weakly associated with both organizational size and type of end product. 19 refs., 7 tabs.

  8. Analysis of Economic and Social Effects of Pueblo Viejo Mining Project

    DEFF Research Database (Denmark)

    Parra, Cristian; Pacheco Cueva, Vladimir

    The research was conducted during the last quarter of 2010 and the first semester of 2011. It included planning the field work, conducting interviews, collecting data, reviewing secondary sources of information, carrying out analysis, internal and external session discussions and writing up the final report. This document contributes to the ongoing understanding of the potential impacts and effects of the Pueblo Viejo mining project on the human development and general progress of the people of the Dominican Republic. We explain the most significant and potential effects of the project and provide some key recommendations to effectively transform its effects into social capital and an improved level of human development. Current market conditions and new mining projects are providing extraordinary, positive and long-term possibilities for improving the social and economic wellbeing in host...

  9. Analysis of Land Subsidence Monitoring in Mining Area with Time-Series InSAR Technology

    Science.gov (United States)

    Sun, N.; Wang, Y. J.

    2018-04-01

    Time-series InSAR technology has become a popular land subsidence monitoring method in recent years because of advantages such as high accuracy, wide coverage, low cost, dense monitoring points and freedom from accessibility restrictions. In this paper, we applied two kinds of satellite data, ALOS PALSAR and RADARSAT-2, to obtain subsidence monitoring results for the study area in two time periods using time-series InSAR. By analyzing the deformation range, rate and amount, we carried out a time-series analysis of land subsidence in the mining area. The results show that InSAR technology can be used to monitor land subsidence over large areas and meets the demands of subsidence monitoring in mining areas.
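The processing chain used in the paper is not described in the abstract. The sketch below shows only two basic steps behind "deformation amount and rate": converting unwrapped phase to line-of-sight (LOS) displacement, where one full 2π cycle equals half a wavelength of motion, and fitting a linear rate to a displacement series. The wavelengths are approximate L-band and C-band values; the sign convention, function names and series values are assumptions.

```python
import math

# Approximate radar wavelengths in metres (L-band and C-band).
WAVELENGTH_M = {"ALOS PALSAR": 0.236, "RADARSAT-2": 0.0555}

def phase_to_los(delta_phase_rad, wavelength_m):
    """Convert unwrapped interferometric phase change to LOS displacement (m).

    Two-way path: one full 2*pi cycle corresponds to wavelength / 2 of LOS motion.
    Sign convention (assumed here): positive phase change = motion away from the
    satellite, reported as negative (subsidence-like) displacement.
    """
    return -wavelength_m * delta_phase_rad / (4.0 * math.pi)

def linear_rate(times_yr, displacements_m):
    """Ordinary least-squares slope of displacement vs time (m/yr)."""
    n = len(times_yr)
    t_mean = sum(times_yr) / n
    d_mean = sum(displacements_m) / n
    num = sum((t - t_mean) * (d - d_mean) for t, d in zip(times_yr, displacements_m))
    den = sum((t - t_mean) ** 2 for t in times_yr)
    return num / den

one_fringe_c = phase_to_los(2 * math.pi, WAVELENGTH_M["RADARSAT-2"])  # about -2.8 cm
rate = linear_rate([0.0, 0.5, 1.0, 1.5], [0.0, -0.02, -0.04, -0.06])  # hypothetical series
```

The longer L-band wavelength means one fringe represents roughly four times more motion than at C-band, which is one reason L-band data tolerate the large gradients typical of mining subsidence.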

  10. Subtypes of autism by cluster analysis based on structural MRI data.

    Science.gov (United States)

    Hrdlicka, Michal; Dudova, Iva; Beranova, Irena; Lisy, Jiri; Belsan, Tomas; Neuwirth, Jiri; Komarek, Vladimir; Faladova, Ludvika; Havlovicova, Marketa; Sedlacek, Zdenek; Blatny, Marek; Urbanek, Tomas

    2005-05-01

    The aim of our study was to subcategorize Autistic Spectrum Disorders (ASD) using a multidisciplinary approach. Sixty-four autistic patients (mean age 9.4 ± 5.6 years) were entered into a cluster analysis based on MRI data. The clusters obtained did not differ significantly in the overall severity of autistic symptomatology as measured by the total score on the Childhood Autism Rating Scale (CARS). The clusters could be characterized by the following significant differences. Cluster 1 showed the largest sizes of the genu and splenium of the corpus callosum (CC), the lowest pregnancy order and the lowest frequency of facial dysmorphic features. Cluster 2 showed the largest sizes of the amygdala and hippocampus (HPC), the least abnormal visual response on the CARS, the lowest frequency of epilepsy and the least frequent abnormal psychomotor development during the first year of life. Cluster 3 showed the largest sizes of the caput of the nucleus caudatus (NC) and the smallest sizes of the HPC, and facial dysmorphic features were always present. Cluster 4 showed the smallest sizes of the genu and splenium of the CC, the amygdala and the caput of the NC, the most abnormal visual response on the CARS, the highest frequency of epilepsy and the highest pregnancy order; abnormal psychomotor development during the first year of life and facial dysmorphic features were always present. This multidisciplinary approach seems to be a promising method for subtyping autism.
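The abstract does not state which clustering algorithm or linkage was applied to the MRI data. Purely as an illustration of hierarchical clustering, the sketch below runs naive agglomerative clustering with average linkage (an assumed choice) until a requested number of clusters remains. The function name and the standardized two-measure patient values are hypothetical.

```python
import math
from itertools import combinations

def average_linkage(points, n_clusters):
    """Naive agglomerative clustering with average linkage (Euclidean distances)."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Mean of all pairwise distances between members of clusters a and b.
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Merge the closest pair; combinations() yields i < j, so deleting j is safe.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i].extend(clusters[j])
        del clusters[j]
    return sorted(sorted(c) for c in clusters)

# Hypothetical standardized MRI measures per patient, e.g. (corpus callosum, amygdala).
mri = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (3.0, 3.0), (3.1, 2.9), (2.9, 3.1)]
groups = average_linkage(mri, n_clusters=2)
```

This brute-force version recomputes linkages at every merge and is only suitable for small samples such as the 64 patients here; library implementations cache the distance matrix.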

  11. Schedulability Analysis and Optimization for the Synthesis of Multi-Cluster Distributed Embedded Systems

    DEFF Research Database (Denmark)

    Pop, Paul; Eles, Petru; Peng, Zebo

    2003-01-01

    We present an approach to schedulability analysis for the synthesis of multi-cluster distributed embedded systems consisting of time-triggered and event-triggered clusters, interconnected via gateways. We have also proposed a buffer size and worst case queuing delay analysis for the gateways, responsible for routing inter-cluster traffic. Optimization heuristics for the priority assignment and synthesis of bus access parameters aimed at producing a schedulable system with minimal buffer needs have been proposed. Extensive experiments and a real-life example show the efficiency of our approaches.
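The multi-cluster analysis in this work (time-triggered and event-triggered clusters, gateways, bus access) is not reproduced here. As background only, the sketch below shows the classic uniprocessor response-time analysis for fixed-priority preemptive tasks with deadlines equal to periods, the kind of iteration that event-triggered schedulability analyses build on: R = C_i + Σ_{j<i} ⌈R / T_j⌉ C_j, repeated until R stops changing or exceeds the deadline. Blocking, jitter and communication delays are ignored, and the task set is hypothetical.

```python
import math

def response_times(tasks):
    """Worst-case response times for fixed-priority preemptive tasks.

    tasks: list of (C, T) = (worst-case execution time, period), highest priority
    first; deadline = period. Returns R per task, or None if R exceeds the deadline.
    """
    results = []
    for i, (c_i, t_i) in enumerate(tasks):
        r = c_i
        while True:
            # Interference from every higher-priority task released within the window r.
            interference = sum(math.ceil(r / t_j) * c_j for c_j, t_j in tasks[:i])
            r_next = c_i + interference
            if r_next > t_i:        # deadline missed: unschedulable
                results.append(None)
                break
            if r_next == r:         # fixed point reached
                results.append(r)
                break
            r = r_next
    return results

tasks = [(1, 4), (2, 6), (3, 12)]  # hypothetical (C, T) pairs
print(response_times(tasks))
```

A priority-assignment heuristic like the ones the abstract mentions would repeatedly call an analysis of this kind to test candidate orderings.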

  12. Schedulability Analysis and Optimization for the Synthesis of Multi-Cluster Distributed Embedded Systems

    DEFF Research Database (Denmark)

    Pop, Paul; Eles, Petru; Peng, Zebo

    2003-01-01

    An approach to schedulability analysis for the synthesis of multi-cluster distributed embedded systems consisting of time-triggered and event-triggered clusters, interconnected via gateways, is presented. A buffer size and worst case queuing delay analysis for the gateways, responsible for routing inter-cluster traffic, is also proposed. Optimisation heuristics for the priority assignment and synthesis of bus access parameters aimed at producing a schedulable system with minimal buffer needs have been proposed. Extensive experiments and a real-life example show the efficiency of the approaches.

  13. Clustering applications in financial and economic analysis of the crop production in the Russian regions

    Directory of Open Access Journals (Sweden)

    Gromov Vladislav Vladimirovich

    2013-08-01

    Full Text Available We used complex mathematical modeling, multivariate statistical analysis and fuzzy sets to analyze the financial and economic state of crop production in the Russian regions. Based on the results of correlation, factor and cluster analysis and on statistics from the Federal State Statistics Service, we developed a system of indicators describing the state of the agricultural sector in a region. We performed cluster analyses to divide the regions of Russia into five groups according to selected factors. Qualitative and quantitative characteristics of each cluster were obtained.

  14. Factors and competitiveness analysis in rare earth mining, new methodology: case study from Brazil

    Directory of Open Access Journals (Sweden)

    Gustavo A. Silva

    2018-03-01

    Full Text Available Rare earths are increasingly being applied in high-tech industries, such as green energy (e.g. wind power), hybrid cars, electric cars, permanent high-performance magnets, superconductors, luminophores and many other industrial sectors involved in modern technologies. Given that China dominates this market and imposes restrictions on production and exports whenever opportunities arise, it is becoming more and more challenging to develop business ventures in this sector. Several initiatives were taken to prospect new resources and develop the production chain, including the mining of these mineral assets around the world, but several uncertainty factors, including current low prices, have increased the challenge of transforming current resources into deposits or productive mines. Thus, analyzing the competitiveness of advanced projects becomes indispensable. This work introduces a new methodology of competitiveness analysis in which certain variables are considered main factors that can contribute strongly to rendering a mining enterprise for rare earth elements (REE) unfeasible. With this methodology, which is quite practical and reproducible, it was possible to verify some real facts: the Lynas Mount Weld CLD (AUS) Project is resilient to the uncertainties of the RE sector, while the Molycorp Project is facing major financial difficulties (under judicial reorganization). It was also possible to verify that the Araxá Project of CBMM in Brazil is one of the most competitive in the country. Thus, we contribute to the existing literature by providing a new methodology for competitiveness analysis in rare earth mining. Keywords: Earth sciences, Business, Economics, Industry

  15. Application of Elements of TPM Strategy for Operation Analysis of Mining Machine

    Science.gov (United States)

    Brodny, Jaroslaw; Tutak, Magdalena

    2017-12-01

    Total Productive Maintenance (TPM) strategy comprises a group of activities and actions aimed at keeping machines in a failure-free state, without breakdowns, by limiting failures, unplanned shutdowns, defects and unplanned servicing of machines. These actions are intended to increase the effectiveness with which a company uses the devices and machines it owns. A very significant element of this strategy is linking technical actions with changes in how employees perceive them. The fundamental aim of introducing this strategy, however, is to improve the economic efficiency of the enterprise. Increasing competition and the need to reduce production costs are forcing mining enterprises to introduce this strategy as well. The paper presents examples of using the OEE (Overall Equipment Effectiveness) model for the quantitative evaluation of selected mining devices. The OEE model is a quantitative tool of the TPM strategy and can be the basis for further work connected with its introduction. The OEE indicator is the product of three components: the availability and performance of the studied machine and the quality of the obtained product. The paper presents the results of an effectiveness analysis of the use of a set of mining machines included in the longwall system, which is the first and most important link in the technological line of coal production. The analyzed set of machines included the longwall shearer, armored face conveyor and crusher. From a reliability point of view, the analyzed set of machines is a system with a serial structure. The analysis was based on data recorded by the industrial automation system used in the mines. This method of data acquisition ensured high data credibility and full time synchronization. Conclusions from the research and analyses should be used to reduce breakdowns, failures and unplanned downtime, increase performance and improve production quality.
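A minimal sketch of the two calculations described above: OEE as the product of availability, performance and quality, and the availability of a serial system, which is up only when every machine is up. The serial formula assumes independent failures, which the abstract does not state; the function names and all component values are hypothetical.

```python
from math import prod

def oee(availability, performance, quality):
    """Overall Equipment Effectiveness as the product of its three components (each in [0, 1])."""
    for name, value in (("availability", availability),
                        ("performance", performance),
                        ("quality", quality)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return availability * performance * quality

def series_availability(component_availabilities):
    """Availability of a serial system assuming independent failures: all parts must be up."""
    return prod(component_availabilities)

machine_oee = oee(0.90, 0.80, 0.95)                 # hypothetical shearer figures
longwall = series_availability([0.95, 0.90, 0.98])  # shearer, conveyor, crusher (hypothetical)
```

Because the structure is serial, the set's availability (about 0.84 here) is lower than that of its weakest machine, which is why a stoppage of any one link halts the whole longwall.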

  16. Factors and competitiveness analysis in rare earth mining, new methodology: case study from Brazil.

    Science.gov (United States)

    Silva, Gustavo A; Petter, Carlos O; Albuquerque, Nelson R

    2018-03-01

    Rare earths are increasingly being applied in high-tech industries, such as green energy (e.g. wind power), hybrid cars, electric cars, permanent high-performance magnets, superconductors, luminophores and many other industrial sectors involved in modern technologies. Given that China dominates this market and imposes restrictions on production and exports whenever opportunities arise, it is becoming more and more challenging to develop business ventures in this sector. Several initiatives were taken to prospect new resources and develop the production chain, including the mining of these mineral assets around the world, but several uncertainty factors, including current low prices, have increased the challenge of transforming current resources into deposits or productive mines. Thus, analyzing the competitiveness of advanced projects becomes indispensable. This work introduces a new methodology of competitiveness analysis in which certain variables are considered main factors that can contribute strongly to rendering a mining enterprise for rare earth elements (REE) unfeasible. With this methodology, which is quite practical and reproducible, it was possible to verify some real facts: the Lynas Mount Weld CLD (AUS) Project is resilient to the uncertainties of the RE sector, while the Molycorp Project is facing major financial difficulties (under judicial reorganization). It was also possible to verify that the Araxá Project of CBMM in Brazil is one of the most competitive in the country. Thus, we contribute to the existing literature by providing a new methodology for competitiveness analysis in rare earth mining.

  17. FLOCK cluster analysis of mast cell event clustering by high-sensitivity flow cytometry predicts systemic mastocytosis.

    Science.gov (United States)

    Dorfman, David M; LaPlante, Charlotte D; Pozdnyakova, Olga; Li, Betty

    2015-11-01

    In our high-sensitivity flow cytometric approach for systemic mastocytosis (SM), we identified mast cell event clustering as a new diagnostic criterion for the disease. To objectively characterize mast cell gated event distributions, we performed cluster analysis using FLOCK, a computational approach to identify cell subsets in multidimensional flow cytometry data in an unbiased, automated fashion. FLOCK identified discrete mast cell populations in most cases of SM (56/75 [75%]) but only a minority of non-SM cases (17/124 [14%]). FLOCK-identified mast cell populations accounted for 2.46% of total cells on average in SM cases and 0.09% of total cells on average in non-SM cases (P < .0001) and were predictive of SM, with a sensitivity of 75%, a specificity of 86%, a positive predictive value of 76%, and a negative predictive value of 85%. FLOCK analysis provides useful diagnostic information for evaluating patients with suspected SM, and may be useful for the analysis of other hematopoietic neoplasms. Copyright© by the American Society for Clinical Pathology.
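The four accuracy figures follow from a 2×2 table built from the counts given above: 56 of 75 SM cases and 17 of 124 non-SM cases were FLOCK-positive. Assuming the reported metrics were computed from these same counts, the sketch below recovers them; sensitivity, specificity and NPV round to the reported 75%, 86% and 85%, and PPV computes to about 76.7% against the reported 76%. The function name is illustrative.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Standard 2x2 diagnostic accuracy measures."""
    return {
        "sensitivity": tp / (tp + fn),  # positives among true SM cases
        "specificity": tn / (tn + fp),  # negatives among non-SM cases
        "ppv": tp / (tp + fp),          # true SM among FLOCK-positive cases
        "npv": tn / (tn + fn),          # non-SM among FLOCK-negative cases
    }

# Counts from the abstract: 56 of 75 SM cases and 17 of 124 non-SM cases were FLOCK-positive.
m = diagnostic_metrics(tp=56, fn=75 - 56, fp=17, tn=124 - 17)
```

Note that PPV and NPV depend on the proportion of SM cases in the cohort (75 of 199 here), so they would shift in a population with different prevalence, whereas sensitivity and specificity would not.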

  18. SU-E-J-98: Radiogenomics: Correspondence Between Imaging and Genetic Features Based On Clustering Analysis

    International Nuclear Information System (INIS)

    Harmon, S; Wendelberger, B; Jeraj, R

    2014-01-01

    Purpose: Radiogenomics aims to establish relationships between patient genotypes and imaging phenotypes. An open question remains on how best to integrate information from these distinct datasets. This work investigates whether similarities in genetic features across patients correspond to similarities in PET-imaging features, assessed with various clustering algorithms. Methods: [¹⁸F]FDG PET data were obtained for 26 NSCLC patients from a public database (TCIA). Tumors were contoured using an in-house segmentation algorithm combining gradient and region-growing techniques; the resulting ROIs were used to extract 54 PET-based features. Corresponding genetic microarray data containing 48,778 elements were also obtained for each tumor. Given the mismatch in feature sizes, two dimension reduction techniques were also applied to the genetic data: principal component analysis (PCA) and selective filtering of 25 NSCLC-associated genes-of-interest (GOI). Gene datasets (full, PCA, and GOI) and PET feature datasets were independently clustered using K-means and hierarchical clustering with a variable number of clusters (K). The Jaccard Index (JI) was used to score the similarity of cluster assignments across different datasets. Results: Patient clusters from imaging data showed poor similarity to clusters from gene datasets, regardless of clustering algorithm or number of clusters (mean JI = 0.3429 ± 0.1623). Notably, we found that clustering algorithms had different sensitivities to data reduction techniques. Using hierarchical clustering, the PCA dataset showed perfect cluster agreement with the full-gene set (JI = 1) for all values of K, and the agreement between the GOI set and the full-gene set decreased as the number of clusters increased (JI = 0.9231 and 0.5769 for K = 2 and 5, respectively). K-means clustering assignments were highly sensitive to data reduction and showed poor stability for different values of K (JI range: 0.2301–1). Conclusion: Using commonly-used clustering algorithms, we found
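The abstract does not specify which variant of the Jaccard Index was used to compare cluster assignments. A common choice for comparing two partitions is the pair-counting form: among all patient pairs, count those grouped together in both clusterings (a), only in the first (b), or only in the second (c), and take JI = a / (a + b + c). It is invariant to relabeling, which matters because K-means cluster IDs are arbitrary, and it equals 1 only for identical partitions. The function name and label vectors are hypothetical.

```python
from itertools import combinations

def pair_jaccard(labels_a, labels_b):
    """Pair-counting Jaccard index between two cluster assignments of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("assignments must cover the same items")
    a = b = c = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            a += 1          # pair together in both clusterings
        elif same_a:
            b += 1          # together only in the first
        elif same_b:
            c += 1          # together only in the second
    denom = a + b + c
    return a / denom if denom else 1.0

relabelled = pair_jaccard([0, 0, 1, 1], [1, 1, 0, 0])  # same partition, different IDs
partial = pair_jaccard([0, 0, 1, 1], [0, 0, 0, 1])
```

In the second example one pair agrees, one pair is grouped only in the first assignment and two only in the second, giving 1/4.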

  19. SU-E-J-98: Radiogenomics: Correspondence Between Imaging and Genetic Features Based On Clustering Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Harmon, S; Wendelberger, B [University of Wisconsin-Madison, Madison, WI (United States); Jeraj, R [University of Wisconsin-Madison, Madison, WI (United States); University of Ljubljana (Slovenia)

    2014-06-01

    Purpose: Radiogenomics aims to establish relationships between patient genotypes and imaging phenotypes. An open question remains on how best to integrate information from these distinct datasets. This work investigates whether similarities in genetic features across patients correspond to similarities in PET-imaging features, assessed with various clustering algorithms. Methods: [¹⁸F]FDG PET data were obtained for 26 NSCLC patients from a public database (TCIA). Tumors were contoured using an in-house segmentation algorithm combining gradient and region-growing techniques; the resulting ROIs were used to extract 54 PET-based features. Corresponding genetic microarray data containing 48,778 elements were also obtained for each tumor. Given the mismatch in feature sizes, two dimension reduction techniques were also applied to the genetic data: principal component analysis (PCA) and selective filtering of 25 NSCLC-associated genes-of-interest (GOI). Gene datasets (full, PCA, and GOI) and PET feature datasets were independently clustered using K-means and hierarchical clustering with a variable number of clusters (K). The Jaccard Index (JI) was used to score the similarity of cluster assignments across different datasets. Results: Patient clusters from imaging data showed poor similarity to clusters from gene datasets, regardless of clustering algorithm or number of clusters (mean JI = 0.3429 ± 0.1623). Notably, we found that clustering algorithms had different sensitivities to data reduction techniques. Using hierarchical clustering, the PCA dataset showed perfect cluster agreement with the full-gene set (JI = 1) for all values of K, and the agreement between the GOI set and the full-gene set decreased as the number of clusters increased (JI = 0.9231 and 0.5769 for K = 2 and 5, respectively). K-means clustering assignments were highly sensitive to data reduction and showed poor stability for different values of K (JI range: 0.2301–1). Conclusion: Using commonly-used clustering algorithms

  20. 75 FR 17529 - High-Voltage Continuous Mining Machine Standard for Underground Coal Mines

    Science.gov (United States)

    2010-04-06

    ... High-Voltage Continuous Mining Machine Standard for Underground Coal Mines AGENCY: Mine Safety and... of high-voltage continuous mining machines in underground coal mines. It also revises MSHA's design...-- Underground Coal Mines III. Section-by-Section Analysis A. Part 18--Electric Motor-Driven Mine Equipment and...