WorldWideScience

Sample records for pattern mining algorithm

  1. Frequent Pattern Mining Algorithms for Data Clustering

    DEFF Research Database (Denmark)

    Zimek, Arthur; Assent, Ira; Vreeken, Jilles

    2014-01-01

    Discovering clusters in subspaces, or subspace clustering and related clustering paradigms, is a research field where we find many frequent pattern mining related influences. In fact, as the first algorithms for subspace clustering were based on frequent pattern mining algorithms, it is fair to say that frequent pattern mining was at the cradle of subspace clustering; yet it quickly developed into an independent research field. In this chapter, we discuss how frequent pattern mining algorithms have been extended and generalized towards the discovery of local clusters in high-dimensional data. In particular, we discuss several example algorithms for subspace clustering or projected clustering as well as point out recent research questions and open topics in this area relevant to researchers in either clustering or pattern mining...

  2. Research on parallel algorithm for sequential pattern mining

    Science.gov (United States)

    Zhou, Lijuan; Qin, Bai; Wang, Yu; Hao, Zhongxiao

    2008-03-01

    Sequential pattern mining is the mining of frequent sequences related to time or other orders from a sequence database. Its initial motivation was to discover the laws of customer purchasing over a period of time by finding the frequent sequences. In recent years, sequential pattern mining has become an important direction of data mining, and its application field is no longer confined to business databases but has extended to new data sources such as the Web and advanced science fields such as DNA analysis. The data used in sequential pattern mining has two characteristics: massive data volume and distributed storage. Most existing sequential pattern mining algorithms have not considered these characteristics together. Based on these traits and on parallel computing theory, this paper puts forward a new distributed parallel algorithm, SPP (Sequential Pattern Parallel). The algorithm abides by the principle of pattern reduction and uses a divide-and-conquer strategy for parallelization. The first parallel task is to construct frequent item sets by applying the frequent concept and search space partition theory, and the second task is to construct frequent sequences using depth-first search at each processor. The algorithm needs to access the database only twice and does not generate candidate sequences, which reduces access time and improves mining efficiency. Using a random data generation procedure and the different information structures designed, this paper simulated the SPP algorithm in a concrete parallel environment and implemented the AprioriAll algorithm. The experiments demonstrate that, compared with AprioriAll, the SPP algorithm achieves excellent speedup and efficiency.

  3. Efficient frequent pattern mining algorithm based on node sets in cloud computing environment

    Science.gov (United States)

    Billa, V. N. Vinay Kumar; Lakshmanna, K.; Rajesh, K.; Reddy, M. Praveen Kumar; Nagaraja, G.; Sudheer, K.

    2017-11-01

    The ultimate goal of data mining is to uncover hidden information, useful for decision making, in the large databases collected by an organization. Data mining involves many tasks. Mining frequent itemsets is one of the most important tasks for transactional databases. These databases hold data on a very large scale, and mining them consumes physical memory and time in proportion to the size of the database. A frequent pattern mining algorithm is said to be efficient only if it consumes less memory and time to mine the frequent itemsets from a given large database. With these points in mind, in this thesis we propose a system that mines frequent itemsets in a way optimized for memory and time, using cloud computing to make the process parallel and providing the application as a service. The complete framework uses a proven efficient algorithm called FIN, which works on Nodesets and a POC (pre-order coding) tree. To evaluate the performance of the system, we conduct experiments comparing the efficiency of the same algorithm applied in a standalone manner and in a cloud computing environment on a real-time data set, the traffic accidents data set. The results show that the memory consumption and execution time of the proposed system are much lower than those of the standalone system.
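
    To make the task of mining frequent itemsets concrete, here is a minimal Python sketch. It uses a plain level-wise (Apriori-style) search, not the FIN algorithm or its Nodeset and POC-tree structures, and it runs on a single machine; the function name `frequent_itemsets` and the absolute `min_support` count are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) frequent itemset mining.

    Illustrative only: this is NOT the FIN algorithm from the abstract.
    transactions: iterable of item collections.
    min_support: minimum number of transactions that must contain an itemset.
    Returns a dict mapping frozenset -> support count.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = Counter(item for t in transactions for item in t)
    current = {frozenset([i]): c for i, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Join frequent (k-1)-itemsets into k-item candidates and keep only
        # those whose (k-1)-subsets are all frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in current for s in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Count the support of each surviving candidate.
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                current[cand] = support
        result.update(current)
        k += 1
    return result
```

    Each level rescans the transactions, which is the memory and time cost the abstract says grows with database size; structures such as the POC tree are designed to avoid those repeated scans.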

  4. A Comparative Study of Frequent and Maximal Periodic Pattern Mining Algorithms in Spatiotemporal Databases

    Science.gov (United States)

    Obulesu, O.; Rama Mohan Reddy, A., Dr; Mahendra, M.

    2017-08-01

    Detecting regular and efficient cyclic models is a demanding activity for data analysts because of the unstructured, vigorous and enormous raw information produced from the web. Many existing approaches generate large numbers of candidate patterns when databases are huge and complex. In this work, two novel algorithms are proposed and a comparative examination is performed by considering scalability and performance parameters. The first algorithm, EFPMA (Extended Regular Model Detection Algorithm), is used to find frequent sequential patterns from the spatiotemporal dataset, and the second, ETMA (Enhanced Tree-based Mining Algorithm), detects effective cyclic models with a symbolic database representation. EFPMA grows models from both ends (prefixes and suffixes) of detected patterns, which results in faster pattern growth because it needs fewer levels of database projection than existing approaches such as PrefixSpan and SPADE. ETMA uses distinct notions, such as segment, sequence and individual symbols, to store and manage transaction data horizontally. ETMA exploits a partition-and-conquer method to find maximal patterns by using symbolic notations. Using this algorithm, we can mine cyclic models in full-series sequential patterns, including subsection series. ETMA reduces memory consumption and makes use of efficient symbolic operations. Furthermore, ETMA records only time-series instances dynamically, in terms of character, series and section approaches respectively. Determining the extent of the pattern and proving the efficiency of the reduction and retrieval techniques on synthetic and actual datasets remains an open and challenging mining problem. These techniques are useful in data streams, traffic risk analysis, medical diagnosis, DNA sequence mining and earthquake prediction applications. Extensive experimental results illustrate that the algorithms outperform the ECLAT, STNR and MAFIA approaches in efficiency and scalability.

  5. Pattern recognition algorithms for data mining scalability, knowledge discovery and soft granular computing

    CERN Document Server

    Pal, Sankar K

    2004-01-01

    Pattern Recognition Algorithms for Data Mining addresses different pattern recognition (PR) tasks in a unified framework with both theoretical and experimental results. Tasks covered include data condensation, feature selection, case generation, clustering/classification, and rule generation and evaluation. This volume presents various theories, methodologies, and algorithms, using both classical approaches and hybrid paradigms. The authors emphasize large datasets with overlapping, intractable, or nonlinear boundary classes, and datasets that demonstrate granular computing in soft frameworks. Organized into eight chapters, the book begins with an introduction to PR, data mining, and knowledge discovery concepts. The authors analyze the tasks of multi-scale data condensation and dimensionality reduction, then explore the problem of learning with support vector machine (SVM). They conclude by highlighting the significance of granular computing for different mining tasks in a soft paradigm.

  6. Personal continuous route pattern mining

    Institute of Scientific and Technical Information of China (English)

    Qian YE; Ling CHEN; Gen-cai CHEN

    2009-01-01

    In daily life, people often repeat regular routes in certain periods. In this paper, a mining system is developed to find the continuous route patterns of personal past trips. To account for the diversity of personal moving status, the mining system employs adaptive GPS data recording and five data filters to guarantee clean trip data. The mining system uses a client/server architecture to protect personal privacy and to reduce the computational load. The server conducts the main mining procedure but with insufficient information to recover real personal routes. To improve the scalability of sequential pattern mining, a novel pattern mining algorithm, continuous route pattern mining (CRPM), is proposed. This algorithm can tolerate the different disturbances in real routes and extract the frequent patterns. Experimental results based on nine persons' trips show that CRPM can extract route patterns more than twice as long as those found by traditional route pattern mining algorithms.

  7. An novel frequent probability pattern mining algorithm based on circuit simulation method in uncertain biological networks

    Science.gov (United States)

    2014-01-01

    Background: Motif mining has always been a hot research topic in bioinformatics. Most current research on biological networks focuses on exact motif mining. However, due to inevitable experimental error and noisy data, biological network data represented as a probability model can better reflect authenticity and biological significance; therefore, it is more biologically meaningful to discover probability motifs in uncertain biological networks. One of the key steps in probability motif mining is frequent pattern discovery, which is usually based on the possible world model and has a relatively high computational complexity. Methods: In this paper, we present a novel method for detecting frequent probability patterns based on circuit simulation in uncertain biological networks. First, a partition-based efficient search is applied to the mining of non-tree-like subgraphs, whose probability of occurrence in random networks is small. Then, a probability isomorphism algorithm based on circuit simulation is proposed. It combines the analysis of circuit topology structure with the related physical properties of voltage in order to evaluate the probability isomorphism between probability subgraphs. The circuit simulation based probability isomorphism algorithm avoids using the traditional possible world model. Finally, based on the probability subgraph isomorphism algorithm, a two-step hierarchical clustering method is used to cluster subgraphs and discover frequent probability patterns from the clusters. Results: The experimental results on data sets of Protein-Protein Interaction (PPI) networks and the transcriptional regulatory networks of E. coli and S. cerevisiae show that the proposed method can efficiently discover the frequent probability subgraphs. The discovered subgraphs in our study contain all probability motifs reported in the experiments published in other related papers. Conclusions: The algorithm of probability graph isomorphism

  8. Algorithms for Regular Tree Grammar Network Search and Their Application to Mining Human-viral Infection Patterns.

    Science.gov (United States)

    Smoly, Ilan; Carmel, Amir; Shemer-Avni, Yonat; Yeger-Lotem, Esti; Ziv-Ukelson, Michal

    2016-03-01

    Network querying is a powerful approach to mine molecular interaction networks. Most state-of-the-art network querying tools either confine the search to a prespecified topology in the form of some template subnetwork, or do not specify any topological constraints at all. Another approach is grammar-based queries, which are more flexible and expressive as they allow for expressing the topology of the sought pattern according to some grammar-based logic. Previous grammar-based network querying tools were confined to the identification of paths. In this article, we extend the patterns identified by grammar-based query approaches from paths to trees. For this, we adopt a higher order query descriptor in the form of a regular tree grammar (RTG). We introduce a novel problem and propose an algorithm to search a given graph for the k highest scoring subgraphs matching a tree accepted by an RTG. Our algorithm is based on the combination of dynamic programming with color coding, and includes an extension of previous k-best parsing optimization approaches to avoid isomorphic trees in the output. We implement the new algorithm and exemplify its application to mining viral infection patterns within molecular interaction networks. Our code is available online.

  9. Making Pattern Mining Useful

    NARCIS (Netherlands)

    Vreeken, J.

    2009-01-01

    The discovery of patterns plays an important role in data mining. A pattern can be any type of regularity displayed in the data, for example which items are typically sold together or which genes are mostly active for patients with a certain disease. Generally speaking, finding a pattern is

  10. URL Mining Using Agglomerative Clustering Algorithm

    Directory of Open Access Journals (Sweden)

    Chinmay R. Deshmukh

    2015-02-01

    The tremendous growth of the web has led to the application of data mining techniques to web logs. Data mining on the World Wide Web is an important and active area of research. Web log mining is the analysis of web log files containing sequences of web pages. Web mining is broadly classified into web content mining, web usage mining and web structure mining. Web usage mining is a technique to discover usage patterns from Web data in order to understand and better serve the needs of Web-based applications. URL mining refers to a subclass of Web mining that helps us to investigate the details of a Uniform Resource Locator. URL mining can be advantageous in the fields of security and protection. The paper introduces a technique for mining a collection of user transactions with an Internet search engine to discover clusters of similar queries and similar URLs. The information we exploit is clickthrough data: each record consists of a user's query to a search engine along with the URL that the user selected from among the candidates offered by the search engine. By viewing this dataset as a bipartite graph, with the vertices on one side corresponding to queries and on the other side to URLs, one can apply an agglomerative clustering algorithm to the graph's vertices to identify related queries and URLs.
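
    The clustering idea in this abstract can be sketched in a few lines of Python. The sketch below is a simplification under stated assumptions: it clusters only the query side of the bipartite graph, measures similarity with the Jaccard overlap of clicked URLs, and stops at a fixed `threshold`. The function names, the similarity measure and the threshold are illustrative choices, not details taken from the paper.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_queries(clicks, threshold=0.5):
    """Greedy agglomerative clustering of queries in a query-URL click graph.

    clicks: iterable of (query, url) pairs, one per clickthrough record.
    Returns a list of sets of queries. Hypothetical helper for illustration.
    """
    # Build the bipartite graph as query -> set of clicked URLs.
    neighbors = {}
    for query, url in clicks:
        neighbors.setdefault(query, set()).add(url)

    # Each cluster holds (queries in the cluster, URLs reached from them).
    clusters = [({q}, set(urls)) for q, urls in sorted(neighbors.items())]

    while len(clusters) > 1:
        # Find the most similar pair of clusters.
        best, pair = 0.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = jaccard(clusters[i][1], clusters[j][1])
                if sim > best:
                    best, pair = sim, (i, j)
        if pair is None or best < threshold:
            break  # nothing left that is similar enough to merge
        i, j = pair
        merged = (clusters[i][0] | clusters[j][0], clusters[i][1] | clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

    return [queries for queries, _ in clusters]
```

    Two queries that lead to the same clicked URLs end up in one cluster even if they share no words, which is the appeal of using clickthrough data rather than query text.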

  11. Efficient constraint-based Sequential Pattern Mining (SPM) algorithm to understand customers’ buying behaviour from time stamp-based sequence dataset

    Directory of Open Access Journals (Sweden)

    Niti Ashish Kumar Desai

    2015-12-01

    Business strategies are formulated based on an understanding of customer needs. This requires development of a strategy to understand customer behaviour and buying patterns, both current and future. This involves understanding, first, how an organization currently understands customer needs and, second, predicting future trends to drive growth. This article focuses on the purchase trend of customers, where the timing of a purchase is more important than the association of items to be purchased, and which can be found with Sequential Pattern Mining (SPM) methods. Conventional SPM algorithms worked purely on frequency, identifying patterns that were more frequent but suffering from challenges such as generation of a huge number of uninteresting patterns, lack of patterns of interest to the user, the rare item problem, etc. The article attempts a solution through development of an SPM algorithm based on various constraints such as Gap, Compactness, Item, Recency, Profitability and Length, along with the Frequency constraint. The six additional constraints are incorporated to ensure that all patterns are recently active (Recency), active for a certain time span (Compactness), profitable and indicative of the next timeline for purchase (Length-Item-Gap). The article also attempts to throw light on how the proposed Constraint-based Prefix Span algorithm helps to understand the buying behaviour of customers, which is in a formative stage.

  12. A direct mining approach to efficient constrained graph pattern discovery

    DEFF Research Database (Denmark)

    Zhu, Feida; Zhang, Zequn; Qu, Qiang

    2013-01-01

    Despite the wealth of research on frequent graph pattern mining, how to efficiently mine the complete set of those with constraints still poses a huge challenge to the existing algorithms mainly due to the inherent bottleneck in the mining paradigm. In essence, mining requests with explicitly-spe...

  13. Parallel Algorithms and Patterns

    Energy Technology Data Exchange (ETDEWEB)

    Robey, Robert W. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2016-06-16

    This is a powerpoint presentation on parallel algorithms and patterns. A parallel algorithm is a well-defined, step-by-step computational procedure that emphasizes concurrency to solve a problem. Examples of problems include: Sorting, searching, optimization, matrix operations. A parallel pattern is a computational step in a sequence of independent, potentially concurrent operations that occurs in diverse scenarios with some frequency. Examples are: Reductions, prefix scans, ghost cell updates. We only touch on parallel patterns in this presentation. It really deserves its own detailed discussion which Gabe Rockefeller would like to develop.
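
    Two of the parallel patterns named above, reductions and prefix scans, can be illustrated with a short Python sketch. It runs sequentially and only shows the structure that a parallel runtime would exploit; the function names are assumptions for illustration, and the tree-shaped reduction assumes the operator is associative.

```python
import operator
from itertools import accumulate


def tree_reduce(values, op):
    """Pairwise (tree-shaped) reduction.

    All pairs within one level are independent, so a parallel runtime could
    evaluate them concurrently. Requires an associative operator.
    """
    level = list(values)
    if not level:
        raise ValueError("tree_reduce needs at least one value")
    while len(level) > 1:
        # Combine neighbouring pairs; an odd element is carried to the next level.
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]


def inclusive_scan(values, op=operator.add):
    """Inclusive prefix scan: element i combines inputs 0..i."""
    return list(accumulate(values, op))
```

    For example, `tree_reduce([1, 2, 3, 4, 5], operator.add)` combines the values in three levels rather than four sequential steps, and `inclusive_scan([1, 2, 3, 4])` yields the running totals.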

  14. A Mining Algorithm for Extracting Decision Process Data Models

    Directory of Open Access Journals (Sweden)

    Cristina-Claudia DOLEAN

    2011-01-01

    The paper introduces an algorithm that mines logs of user interaction with simulation software. It outputs a model that explicitly shows the data perspective of the decision process, namely the Decision Data Model (DDM). In the first part of the paper we focus on how the DDM is extracted by our mining algorithm. We introduce it as pseudo-code and then provide explanations and examples of how it actually works. In the second part of the paper, we use a series of small case studies to prove the robustness of the mining algorithm and how it deals with the most common patterns we found in real logs.

  15. Contrast data mining concepts, algorithms, and applications

    CERN Document Server

    Dong, Guozhu

    2012-01-01

    A Fruitful Field for Researching Data Mining Methodology and for Solving Real-Life Problems. Contrast Data Mining: Concepts, Algorithms, and Applications collects recent results from this specialized area of data mining that have previously been scattered in the literature, making them more accessible to researchers and developers in data mining and other fields. The book not only presents concepts and techniques for contrast data mining, but also explores the use of contrast mining to solve challenging problems in various scientific, medical, and business domains. Learn from Real Case Studies

  16. The Top Ten Algorithms in Data Mining

    CERN Document Server

    Wu, Xindong

    2009-01-01

    From classification and clustering to statistical learning, association analysis, and link mining, this book covers the most important topics in data mining research. It presents the ten most influential algorithms used in the data mining community today. Each chapter provides a detailed description of the algorithm, a discussion of available software implementation, advanced topics, and exercises. With a simple data set, examples illustrate how each algorithm works and highlight the overall performance of each algorithm in a real-world application. Featuring contributions from leading researc

  17. Frequent pattern mining

    CERN Document Server

    Aggarwal, Charu C

    2014-01-01

    Proposes numerous methods to solve some of the most fundamental problems in data mining and machine learning. Presents various simplified perspectives, providing a range of information to benefit both students and practitioners. Includes surveys on key research content, case studies and future research directions.

  18. An Incremental High-Utility Mining Algorithm with Transaction Insertion

    Science.gov (United States)

    Gan, Wensheng; Zhang, Binbin

    2015-01-01

    Association-rule mining is commonly used to discover useful and meaningful patterns from a very large database. It only considers the occurrence frequencies of items to reveal the relationships among itemsets. Traditional association-rule mining is, however, not suitable in real-world applications since the items purchased by a customer may involve various factors, such as profit or quantity. High-utility mining was designed to overcome the limitations of association-rule mining by considering both the quantity and profit measures. Most high-utility mining algorithms are designed to handle a static database. Few studies handle dynamic high-utility mining with transaction insertion, which otherwise requires database rescans and suffers from the combinatorial explosion of the pattern-growth mechanism. In this paper, an efficient incremental algorithm with transaction insertion is designed to reduce computations without candidate generation based on the utility-list structures. The enumeration tree and the relationships between 2-itemsets are also adopted in the proposed algorithm to speed up the computations. Several experiments are conducted to show the performance of the proposed algorithm in terms of runtime, memory consumption, and number of generated patterns. PMID:25811038
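
    The difference between frequency and utility can be shown with a small Python sketch. It computes the utility of an itemset as quantity times unit profit, summed over the transactions that contain the whole itemset, and then enumerates all itemsets by brute force. This is only an illustration of the measure: it is not the incremental utility-list algorithm described in the abstract, and the function names and data layout are assumptions.

```python
from itertools import combinations


def itemset_utility(itemset, transactions, profit):
    """Utility of an itemset: sum of quantity * unit profit of its items,
    taken over the transactions that contain every item in the itemset.

    transactions: list of dicts mapping item -> purchased quantity.
    profit: dict mapping item -> unit profit.
    """
    total = 0
    for quantities in transactions:
        if all(item in quantities for item in itemset):
            total += sum(quantities[item] * profit[item] for item in itemset)
    return total


def high_utility_itemsets(transactions, profit, min_utility):
    """Brute-force enumeration (exponential); for illustration only."""
    items = sorted({item for t in transactions for item in t})
    found = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, transactions, profit)
            if utility >= min_utility:
                found[itemset] = utility
    return found
```

    With suitable data, an itemset can reach the utility threshold even though none of its single items does, so the downward-closure pruning used for frequent itemsets does not carry over directly. That is one reason high-utility mining relies on dedicated structures such as utility lists.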

  19. An Incremental High-Utility Mining Algorithm with Transaction Insertion

    Directory of Open Access Journals (Sweden)

    Jerry Chun-Wei Lin

    2015-01-01

    Association-rule mining is commonly used to discover useful and meaningful patterns from a very large database. It only considers the occurrence frequencies of items to reveal the relationships among itemsets. Traditional association-rule mining is, however, not suitable in real-world applications since the items purchased by a customer may involve various factors, such as profit or quantity. High-utility mining was designed to overcome the limitations of association-rule mining by considering both the quantity and profit measures. Most high-utility mining algorithms are designed to handle a static database. Few studies handle dynamic high-utility mining with transaction insertion, which otherwise requires database rescans and suffers from the combinatorial explosion of the pattern-growth mechanism. In this paper, an efficient incremental algorithm with transaction insertion is designed to reduce computations without candidate generation based on the utility-list structures. The enumeration tree and the relationships between 2-itemsets are also adopted in the proposed algorithm to speed up the computations. Several experiments are conducted to show the performance of the proposed algorithm in terms of runtime, memory consumption, and number of generated patterns.

  20. An Evolutionary Algorithm to Mine High-Utility Itemsets

    Directory of Open Access Journals (Sweden)

    Jerry Chun-Wei Lin

    2015-01-01

    High-utility itemset mining (HUIM) has become a critical issue in recent years since it can be used to reveal profitable products by considering both the quantity and profit factors, instead of frequent itemset mining (FIM) of association rules (ARs). In this paper, an evolutionary algorithm is presented to efficiently mine high-utility itemsets (HUIs) based on binary particle swarm optimization. A maximal pattern (MP)-tree structure is further designed to solve the combinational problem in the evolution process. Substantial experiments on real-life datasets show that the proposed binary PSO-based algorithm has better results compared to the state-of-the-art GA-based algorithm.

  1. Handling Dynamic Weights in Weighted Frequent Pattern Mining

    Science.gov (United States)

    Ahmed, Chowdhury Farhan; Tanbeer, Syed Khairuzzaman; Jeong, Byeong-Soo; Lee, Young-Koo

    Even though weighted frequent pattern (WFP) mining is more effective than traditional frequent pattern mining because it can consider different semantic significances (weights) of items, existing WFP algorithms assume that each item has a fixed weight. But in real world scenarios, the weight (price or significance) of an item can vary with time. Reflecting these changes in item weight is necessary in several mining applications, such as retail market data analysis and web click stream analysis. In this paper, we introduce the concept of a dynamic weight for each item, and propose an algorithm, DWFPM (dynamic weighted frequent pattern mining), that makes use of this concept. Our algorithm can address situations where the weight (price or significance) of an item varies dynamically. It exploits a pattern growth mining technique to avoid the level-wise candidate set generation-and-test methodology. Furthermore, it requires only one database scan, so it is eligible for use in stream data mining. An extensive performance analysis shows that our algorithm is efficient and scalable for WFP mining using dynamic weights.
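
    A small Python sketch can show what a time-varying weight changes in the support calculation. The definition used here is an assumption made for illustration, not necessarily the one in the paper: the database is split into batches, each batch carries its own item weights, a pattern's weight in a batch is the average weight of its items, and the dynamic weighted support is the sum over batches of that weight times the pattern's frequency in the batch. The function name and data layout are hypothetical.

```python
def dynamic_weighted_support(pattern, batches):
    """Weighted support of a pattern when item weights change per batch.

    ASSUMED definition (see lead-in): sum over batches of
    (average item weight of the pattern in that batch) * (pattern frequency).

    pattern: collection of items.
    batches: list of (transactions, weights) pairs, where transactions is a
             list of item collections and weights maps item -> weight.
    """
    items = set(pattern)
    total = 0.0
    for transactions, weights in batches:
        # Pattern weight for this batch: mean of its items' current weights.
        pattern_weight = sum(weights[item] for item in items) / len(items)
        # Frequency: transactions in this batch that contain the whole pattern.
        frequency = sum(1 for t in transactions if items <= set(t))
        total += pattern_weight * frequency
    return total
```

    Because every batch is visited once with the weights that were valid at that time, the quantity can be accumulated in a single pass, which matches the one-scan property the abstract highlights for stream data.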

  2. A node linkage approach for sequential pattern mining.

    Directory of Open Access Journals (Sweden)

    Osvaldo Navarro

    Sequential Pattern Mining is a widely addressed problem in data mining, with applications such as analyzing Web usage, examining purchase behavior, and text mining, among others. Nevertheless, with the dramatic increase in data volume, the current approaches prove inefficient when dealing with large input datasets, a large number of different symbols and low minimum supports. In this paper, we propose a new sequential pattern mining algorithm, which follows a pattern-growth scheme to discover sequential patterns. Unlike most pattern growth algorithms, our approach does not build a data structure to represent the input dataset, but instead accesses the required sequences through pseudo-projection databases, achieving better runtime and reducing memory requirements. Our algorithm traverses the search space in a depth-first fashion and only preserves in memory a pattern node linkage and the pseudo-projections required for the branch being explored at the time. Experimental results show that our new approach, the Node Linkage Depth-First Traversal algorithm (NLDFT), has better performance and scalability in comparison with state-of-the-art algorithms.
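
    The idea of pseudo-projection with depth-first pattern growth can be sketched in Python as follows. This is a generic PrefixSpan-style illustration, not the NLDFT algorithm itself: it assumes each sequence is a list of single items (no itemsets per position), and a projected database is stored only as (sequence index, start offset) pairs instead of copied suffixes. The function name `mine_sequences` is an assumption.

```python
from collections import defaultdict


def mine_sequences(db, min_support):
    """Depth-first sequential pattern growth using pseudo-projections.

    db: list of sequences, each a list of items.
    min_support: minimum number of sequences that must contain a pattern.
    Returns a dict mapping pattern (tuple of items) -> support.
    """
    results = {}

    def grow(prefix, projections):
        # projections: (sequence index, offset) pairs pointing into db;
        # no sequence data is copied.
        occurrences = defaultdict(list)
        for seq_id, start in projections:
            seq = db[seq_id]
            seen = set()
            for pos in range(start, len(seq)):
                item = seq[pos]
                if item not in seen:
                    # Keep only the first occurrence per sequence so that
                    # each sequence counts once toward the support.
                    seen.add(item)
                    occurrences[item].append((seq_id, pos + 1))
        for item, projs in occurrences.items():
            if len(projs) >= min_support:
                pattern = prefix + (item,)
                results[pattern] = len(projs)
                grow(pattern, projs)  # explore this branch before its siblings

    grow((), [(i, 0) for i in range(len(db))])
    return results
```

    Only the projections along the branch currently being explored are alive at any time, which is the memory saving the abstract attributes to depth-first traversal with pseudo-projection.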

  3. An algorithm, implementation and execution ontology design pattern

    NARCIS (Netherlands)

    Lawrynowicz, A.; Esteves, D.; Panov, P.; Soru, T.; Dzeroski, S.; Vanschoren, J.

    2016-01-01

    This paper describes an ontology design pattern for modeling algorithms, their implementations and executions. This pattern is derived from the research results on data mining/machine learning ontologies, but is more generic. We argue that the proposed pattern will foster the development of

  4. A partition enhanced mining algorithm for distributed association rule mining systems

    Directory of Open Access Journals (Sweden)

    A.O. Ogunde

    2015-11-01

    The extraction of patterns and rules from large distributed databases through existing Distributed Association Rule Mining (DARM) systems still faces enormous challenges such as high response times, high communication costs and the inability to adapt to constantly changing databases. In this work, a Partition Enhanced Mining Algorithm (PEMA) is presented to address these problems. In PEMA, the Association Rule Mining Coordinating Agent receives a request and decides the appropriate data sites, partitioning strategy and mining agents to use. The mining process is divided into two stages. In the first stage, the data agents horizontally segment databases with small average transaction length into relatively smaller partitions based on the number of available sites and the available memory, whereas databases with relatively large average transaction length are vertically partitioned. After this, the Mobile Agent-Based Association Rule Mining-Agents, which are the mining agents, discover the local frequent itemsets. In the second stage, the local frequent itemsets are incrementally integrated from one data site to another to obtain the global frequent itemsets. This reduces the response time and communication cost in the system. Results from experiments conducted on real datasets show that PEMA improves the average response time over existing algorithms. Similarly, PEMA incurs lower communication costs, with a smaller average size of messages exchanged than benchmark DARM systems. These results show that PEMA can be deployed for efficient discovery of valuable knowledge in distributed databases.

  5. Quantum algorithm for association rules mining

    Science.gov (United States)

    Yu, Chao-Hua; Gao, Fei; Wang, Qing-Le; Wen, Qiao-Yan

    2016-10-01

    Association rules mining (ARM) is one of the most important problems in knowledge discovery and data mining. Given a transaction database that has a large number of transactions and items, the task of ARM is to acquire consumption habits of customers by discovering the relationships between itemsets (sets of items). In this paper, we address ARM in the quantum settings and propose a quantum algorithm for the key part of ARM, finding frequent itemsets from the candidate itemsets and acquiring their supports. Specifically, for the case in which there are $M_f^{(k)}$ frequent $k$-itemsets in the $M_c^{(k)}$ candidate $k$-itemsets ($M_f^{(k)} \le M_c^{(k)}$), our algorithm can efficiently mine these frequent $k$-itemsets and estimate their supports by using parallel amplitude estimation and amplitude amplification with complexity $O\left(k\sqrt{M_c^{(k)} M_f^{(k)}}/\epsilon\right)$, where $\epsilon$ is the error for estimating the supports. Compared with the classical counterpart, i.e., the classical sampling-based algorithm, whose complexity is $O\left(k M_c^{(k)}/\epsilon^2\right)$, our quantum algorithm quadratically improves the dependence on both $\epsilon$ and $M_c^{(k)}$ in the best case when $M_f^{(k)} \ll M_c^{(k)}$ and on $\epsilon$ alone in the worst case when $M_f^{(k)} \approx M_c^{(k)}$.
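
    The size of the claimed speedup can be checked with a short calculation. This is a sketch that assumes the two bounds are read as $k\sqrt{M_c^{(k)} M_f^{(k)}}/\epsilon$ (quantum) and $k M_c^{(k)}/\epsilon^2$ (classical), and it ignores the constants hidden by the $O(\cdot)$ notation; the numerical values are illustrative assumptions, not figures from the paper.

```latex
% Ratio of the classical bound to the quantum bound
\[
\frac{k\,M_c^{(k)}/\epsilon^{2}}{k\sqrt{M_c^{(k)} M_f^{(k)}}/\epsilon}
  = \frac{1}{\epsilon}\sqrt{\frac{M_c^{(k)}}{M_f^{(k)}}}
\]
% Illustrative values: M_c^{(k)} = 10^{4}, M_f^{(k)} = 10^{2}, \epsilon = 10^{-2}
\[
\frac{1}{10^{-2}}\sqrt{\frac{10^{4}}{10^{2}}} = 100 \times 10 = 10^{3}
\]
```

    When $M_f^{(k)} \approx M_c^{(k)}$ the square root tends to 1 and the ratio reduces to $1/\epsilon$, which matches the statement that only the dependence on $\epsilon$ improves in the worst case.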

  6. Developing and Implementing the Data Mining Algorithms in RAVEN

    International Nuclear Information System (INIS)

    Sen, Ramazan Sonat; Maljovec, Daniel Patrick; Alfonsi, Andrea; Rabiti, Cristian

    2015-01-01

    The RAVEN code is becoming a comprehensive tool to perform probabilistic risk assessment, uncertainty quantification, and verification and validation. The RAVEN code is being developed to support many programs and to provide a set of methodologies and algorithms for advanced analysis. Scientific computer codes can generate enormous amounts of data. To post-process and analyze such data might, in some cases, take longer than the initial software runtime. Data mining algorithms/methods help in recognizing and understanding patterns in the data, and thus in discovering knowledge in databases. The methodologies used in dynamic probabilistic risk assessment or in uncertainty and error quantification analysis couple system/physics codes with simulation controller codes, such as RAVEN. RAVEN introduces both deterministic and stochastic elements into the simulation while the system/physics codes model the dynamics deterministically. A typical analysis is performed by sampling values of a set of parameters. A major challenge in using dynamic probabilistic risk assessment or uncertainty and error quantification analysis for a complex system is to analyze the large number of scenarios generated. Data mining techniques are typically used to better organize and understand data, i.e., to recognize patterns in the data. This report focuses on development and implementation of Application Programming Interfaces (APIs) for different data mining algorithms, and the application of these algorithms to different databases.

  7. Developing and Implementing the Data Mining Algorithms in RAVEN

    Energy Technology Data Exchange (ETDEWEB)

    Sen, Ramazan Sonat [Idaho National Lab. (INL), Idaho Falls, ID (United States); Maljovec, Daniel Patrick [Idaho National Lab. (INL), Idaho Falls, ID (United States); Alfonsi, Andrea [Idaho National Lab. (INL), Idaho Falls, ID (United States); Rabiti, Cristian [Idaho National Lab. (INL), Idaho Falls, ID (United States)

    2015-09-01

    The RAVEN code is becoming a comprehensive tool to perform probabilistic risk assessment, uncertainty quantification, and verification and validation. The RAVEN code is being developed to support many programs and to provide a set of methodologies and algorithms for advanced analysis. Scientific computer codes can generate enormous amounts of data. To post-process and analyze such data might, in some cases, take longer than the initial software runtime. Data mining algorithms/methods help in recognizing and understanding patterns in the data, and thus discover knowledge in databases. The methodologies used in the dynamic probabilistic risk assessment or in uncertainty and error quantification analysis couple system/physics codes with simulation controller codes, such as RAVEN. RAVEN introduces both deterministic and stochastic elements into the simulation while the system/physics codes model the dynamics deterministically. A typical analysis is performed by sampling values of a set of parameters. A major challenge in using dynamic probabilistic risk assessment or uncertainty and error quantification analysis for a complex system is to analyze the large number of scenarios generated. Data mining techniques are typically used to better organize and understand data, i.e. recognizing patterns in the data. This report focuses on development and implementation of Application Programming Interfaces (APIs) for different data mining algorithms, and the application of these algorithms to different databases.

  8. Classification of Internet banking customers using data mining algorithms

    Directory of Open Access Journals (Sweden)

    Reza Radfar

    2014-03-01

    Classifying customers using data mining algorithms enables banks to keep the loyalty of old customers while attracting new ones. Using a decision tree as a data mining technique, we can optimize customer classification provided that the appropriate decision tree is selected. In this article we present an appropriate model to classify customers who use an Internet banking service. The model is developed based on the CRISP-DM standard, and we have used real data from Sina bank’s Internet bank. Compared to other decision trees, ours is based on both optimization and accuracy factors and recognizes new potential Internet banking customers using a three-level classification: low, medium, and high. This is a practical, documentary-based research. Mining customer rules enables managers to make policies based on the discovered patterns in order to have a better perception of what customers really desire.

  9. Mining a database of single amplified genomes from Red Sea brine pool extremophiles—improving reliability of gene function prediction using a profile and pattern matching algorithm (PPMA)

    Science.gov (United States)

    Grötzinger, Stefan W.; Alam, Intikhab; Ba Alawi, Wail; Bajic, Vladimir B.; Stingl, Ulrich; Eppinger, Jörg

    2014-01-01

    Reliable functional annotation of genomic data is the key-step in the discovery of novel enzymes. Intrinsic sequencing data quality problems of single amplified genomes (SAGs) and poor homology of novel extremophile's genomes pose significant challenges for the attribution of functions to the coding sequences identified. The anoxic deep-sea brine pools of the Red Sea are a promising source of novel enzymes with unique evolutionary adaptation. Sequencing data from Red Sea brine pool cultures and SAGs are annotated and stored in the Integrated Data Warehouse of Microbial Genomes (INDIGO) data warehouse. Low sequence homology of annotated genes (no similarity for 35% of these genes) may translate into false positives when searching for specific functions. The Profile and Pattern Matching (PPM) strategy described here was developed to eliminate false positive annotations of enzyme function before progressing to labor-intensive hyper-saline gene expression and characterization. It utilizes InterPro-derived Gene Ontology (GO)-terms (which represent enzyme function profiles) and annotated relevant PROSITE IDs (which are linked to an amino acid consensus pattern). The PPM algorithm was tested on 15 protein families, which were selected based on scientific and commercial potential. An initial list of 2577 enzyme commission (E.C.) numbers was translated into 171 GO-terms and 49 consensus patterns. A subset of INDIGO-sequences consisting of 58 SAGs from six different taxons of bacteria and archaea were selected from six different brine pool environments. Those SAGs code for 74,516 genes, which were independently scanned for the GO-terms (profile filter) and PROSITE IDs (pattern filter). Following stringent reliability filtering, the non-redundant hits (106 profile hits and 147 pattern hits) are classified as reliable, if at least two relevant descriptors (GO-terms and/or consensus patterns) are present. 
Scripts for annotation, as well as for the PPM algorithm, are available

  10. Mining a database of single amplified genomes from Red Sea brine pool extremophiles – Improving reliability of gene function prediction using a profile and pattern matching algorithm (PPMA)

    Directory of Open Access Journals (Sweden)

    Stefan Wolfgang Grötzinger

    2014-04-01

    Reliable functional annotation of genomic data is the key-step in the discovery of novel enzymes. Intrinsic sequencing data quality problems of single amplified genomes (SAGs) and poor homology of novel extremophile’s genomes pose significant challenges for the attribution of functions to the coding sequences identified. The anoxic deep-sea brine pools of the Red Sea are a promising source of novel enzymes with unique evolutionary adaptation. Sequencing data from Red Sea brine pool cultures and SAGs are annotated and stored in the INDIGO data warehouse. Low sequence homology of annotated genes (no similarity for 35% of these genes) may translate into false positives when searching for specific functions. The Profile & Pattern Matching (PPM) strategy described here was developed to eliminate false positive annotations of enzyme function before progressing to labor-intensive hyper-saline gene expression and characterization. It utilizes InterPro-derived Gene Ontology (GO)-terms (which represent enzyme function profiles) and annotated relevant PROSITE IDs (which are linked to an amino acid consensus pattern). The PPM algorithm was tested on 15 protein families, which were selected based on scientific and commercial potential. An initial list of 2,577 E.C. numbers was translated into 171 GO-terms and 49 consensus patterns. A subset of INDIGO-sequences consisting of 58 SAGs from six different taxons of bacteria and archaea were selected from 6 different brine pool environments. Those SAGs code for 74,516 genes, which were independently scanned for the GO-terms (profile filter) and PROSITE IDs (pattern filter). Following stringent reliability filtering, the non-redundant hits (106 profile hits and 147 pattern hits) are classified as reliable, if at least two relevant descriptors (GO-terms and/or consensus patterns) are present. Scripts for annotation, as well as for the PPM algorithm, are available through the INDIGO website.
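The final reliability rule in this abstract (a hit counts as reliable if at least two relevant descriptors, GO-terms and/or consensus patterns, are present) can be sketched in a few lines. This is only an illustration of that counting rule, not the authors' scripts: the `GeneHits` class, the `is_reliable` function, and all identifiers in the example are hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class GeneHits:
    """Descriptors found for one gene (hypothetical container, not from the paper)."""
    gene_id: str
    go_terms: set = field(default_factory=set)      # hits from the profile filter
    prosite_ids: set = field(default_factory=set)   # hits from the pattern filter


def is_reliable(hit, relevant_go, relevant_prosite, min_descriptors=2):
    """Return True if the gene carries at least `min_descriptors` relevant descriptors.

    Relevant GO-terms and relevant PROSITE IDs are counted together, following
    the "GO-terms and/or consensus patterns" wording of the abstract.
    """
    n_profile = len(hit.go_terms & relevant_go)
    n_pattern = len(hit.prosite_ids & relevant_prosite)
    return n_profile + n_pattern >= min_descriptors


# Placeholder identifiers, invented for the example only.
relevant_go = {"GO:hypothetical-1", "GO:hypothetical-2"}
relevant_prosite = {"PS:hypothetical-1"}

gene_a = GeneHits("gene_a", {"GO:hypothetical-1"}, {"PS:hypothetical-1"})
gene_b = GeneHits("gene_b", {"GO:hypothetical-1", "GO:unrelated"}, set())

print(is_reliable(gene_a, relevant_go, relevant_prosite))
print(is_reliable(gene_b, relevant_go, relevant_prosite))
```

Here `gene_a` has one relevant GO-term and one relevant pattern, so it passes; `gene_b` has only one relevant descriptor and is filtered out.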

  11. Mining a database of single amplified genomes from Red Sea brine pool extremophiles-improving reliability of gene function prediction using a profile and pattern matching algorithm (PPMA).

    KAUST Repository

    Grötzinger, Stefan W.

    2014-04-07

    Reliable functional annotation of genomic data is the key-step in the discovery of novel enzymes. Intrinsic sequencing data quality problems of single amplified genomes (SAGs) and poor homology of novel extremophile's genomes pose significant challenges for the attribution of functions to the coding sequences identified. The anoxic deep-sea brine pools of the Red Sea are a promising source of novel enzymes with unique evolutionary adaptation. Sequencing data from Red Sea brine pool cultures and SAGs are annotated and stored in the Integrated Data Warehouse of Microbial Genomes (INDIGO) data warehouse. Low sequence homology of annotated genes (no similarity for 35% of these genes) may translate into false positives when searching for specific functions. The Profile and Pattern Matching (PPM) strategy described here was developed to eliminate false positive annotations of enzyme function before progressing to labor-intensive hyper-saline gene expression and characterization. It utilizes InterPro-derived Gene Ontology (GO)-terms (which represent enzyme function profiles) and annotated relevant PROSITE IDs (which are linked to an amino acid consensus pattern). The PPM algorithm was tested on 15 protein families, which were selected based on scientific and commercial potential. An initial list of 2577 enzyme commission (E.C.) numbers was translated into 171 GO-terms and 49 consensus patterns. A subset of INDIGO-sequences consisting of 58 SAGs from six different taxons of bacteria and archaea were selected from six different brine pool environments. Those SAGs code for 74,516 genes, which were independently scanned for the GO-terms (profile filter) and PROSITE IDs (pattern filter). Following stringent reliability filtering, the non-redundant hits (106 profile hits and 147 pattern hits) are classified as reliable, if at least two relevant descriptors (GO-terms and/or consensus patterns) are present. 
Scripts for annotation, as well as for the PPM algorithm, are available

  12. An algorithm of opinion leaders mining based on signed network

    Science.gov (United States)

    Cao, Linlin; Zheng, Mingchun; Zhang, Yuanyuan; Zhang, Fuming

    2018-04-01

    With the rapid development of the mobile Internet, users have gradually become the leaders of social media, and the abrupt rise of new media has changed the traditional patterns and regularities of information dissemination. Opinion leaders, the gatekeepers of the classical theory of mass communication, take on new significance in this era, and the concept has been expanded and extended to a certain extent. Existing work on mining opinion leaders mainly studies network structure and user behavior without considering an important attribute: whether the user has a real impact. In this paper, we take the signed network as the research tool. By giving each link a sign that represents support for or opposition to a point of view between users, and by combining traditional mining algorithms with signs that can describe the change of view between users, we obtain the opinion leaders who have a real impact on users, which makes the result more accurate and effective.

  13. Recommending Learning Activities in Social Network Using Data Mining Algorithms

    Science.gov (United States)

    Mahnane, Lamia

    2017-01-01

    In this paper, we show how data mining algorithms (e.g. Apriori Algorithm (AP) and Collaborative Filtering (CF)) are useful in a New Social Network (NSN-AP-CF). "NSN-AP-CF" processes the clusters based on different learning styles. Next, it analyzes the habits and the interests of the users through mining the frequent episodes by the…

  14. An efficient, versatile and scalable pattern growth approach to mine frequent patterns in unaligned protein sequences.

    Science.gov (United States)

    Ye, Kai; Kosters, Walter A; Ijzerman, Adriaan P

    2007-03-15

    Pattern discovery in protein sequences is often based on multiple sequence alignments (MSA). The procedure can be computationally intensive and often requires manual adjustment, which may be particularly difficult for a set of deviating sequences. In contrast, two algorithms, PRATT2 (http://www.ebi.ac.uk/pratt/) and TEIRESIAS (http://cbcsrv.watson.ibm.com/) are used to directly identify frequent patterns from unaligned biological sequences without an attempt to align them. Here we propose a new algorithm with more efficiency and more functionality than both PRATT2 and TEIRESIAS, and discuss some of its applications to G protein-coupled receptors, a protein family of important drug targets. In this study, we designed and implemented six algorithms to mine three different pattern types from either one or two datasets using a pattern growth approach. We compared our approach to PRATT2 and TEIRESIAS in efficiency, completeness and the diversity of pattern types. Compared to PRATT2, our approach is faster, capable of processing large datasets and able to identify the so-called type III patterns. Our approach is comparable to TEIRESIAS in the discovery of the so-called type I patterns but has additional functionality such as mining the so-called type II and type III patterns and finding discriminating patterns between two datasets. The source code for pattern growth algorithms and their pseudo-code are available at http://www.liacs.nl/home/kosters/pg/.

  15. Data mining theories, algorithms, and examples

    CERN Document Server

    Ye, Nong

    2013-01-01

    AN OVERVIEW OF DATA MINING METHODOLOGIES: Introduction to data mining methodologies
    METHODOLOGIES FOR MINING CLASSIFICATION AND PREDICTION PATTERNS: Regression models; Bayes classifiers; Decision trees; Multi-layer feedforward artificial neural networks; Support vector machines; Supervised clustering
    METHODOLOGIES FOR MINING CLUSTERING AND ASSOCIATION PATTERNS: Hierarchical clustering; Partitional clustering; Self-organized map; Probability distribution estimation; Association rules; Bayesian networks
    METHODOLOGIES FOR MINING DATA REDUCTION PATTERNS: Principal components analysis; Multi-dimensional scaling; Latent variable anal

  16. Mining of high utility-probability sequential patterns from uncertain databases.

    Directory of Open Access Journals (Sweden)

    Binbin Zhang

    High-utility sequential pattern mining (HUSPM) has become an important issue in the field of data mining. Several HUSPM algorithms have been designed to mine high-utility sequential patterns (HUSPs). They have been applied in several real-life situations such as consumer behavior analysis and event detection in sensor networks. Nonetheless, most studies on HUSPM have focused on mining HUSPs in precise data. In real life, however, uncertainty is an important factor, as data is collected using various types of sensors that are more or less accurate. Hence, data collected in a real-life database can be annotated with existing probabilities. This paper presents a novel pattern mining framework called high utility-probability sequential pattern mining (HUPSPM) for mining high utility-probability sequential patterns (HUPSPs) in uncertain sequence databases. A baseline algorithm with three optional pruning strategies is presented to mine HUPSPs. Moreover, to speed up the mining process, a projection mechanism is designed to create a database projection for each processed sequence, which is smaller than the original database. Thus, the number of unpromising candidates can be greatly reduced, as well as the execution time for mining HUPSPs. Substantial experiments on both real-life and synthetic datasets show that the designed algorithm performs well in terms of runtime, number of candidates, memory usage, and scalability for different minimum utility and minimum probability thresholds.

  17. Large-Scale Constraint-Based Pattern Mining

    Science.gov (United States)

    Zhu, Feida

    2009-01-01

    We studied the problem of constraint-based pattern mining for three different data formats, item-set, sequence and graph, and focused on mining patterns of large sizes. Colossal patterns in each data formats are studied to discover pruning properties that are useful for direct mining of these patterns. For item-set data, we observed robustness of…

  18. SPATIO-TEMPORAL PATTERN MINING ON TRAJECTORY DATA USING ARM

    Directory of Open Access Journals (Sweden)

    S. Khoshahval

    2017-09-01

    Initially, the mobile phone was considered a device for making human connections easier. Today, however, its use has evolved into a platform for gaming, web surfing, and GPS-enabled applications. Embedding GPS in handheld devices has turned them into significant trajectory data gathering facilities. Raw GPS trajectory data is a series of points that contains hidden information. Revealing the hidden information in traces requires trajectory data analysis. Among the most beneficial information concealed in trajectory data are user activity patterns. Each pattern contains multiple stops and moves that identify the places a user visited and the tasks performed. This paper proposes an approach to discover users' daily activity patterns from GPS trajectories using association rules. Finding user patterns requires extracting the user’s visited places from the stops and moves of GPS trajectories. In order to locate stops and moves, we implemented a place recognition algorithm. After the visited points were extracted, an advanced association rule mining algorithm called Apriori was used to extract user activity patterns. This study shows that each trajectory contains useful patterns that can be extracted from raw GPS data using association rule mining techniques in order to learn about the behaviour of multiple users in a system, and that these patterns can be utilized in various location-based applications.

  19. Spatio-Temporal Pattern Mining on Trajectory Data Using ARM

    Science.gov (United States)

    Khoshahval, S.; Farnaghi, M.; Taleai, M.

    2017-09-01

    Initially, the mobile phone was considered a device for making human connections easier. Today, however, its use has evolved into a platform for gaming, web surfing, and GPS-enabled applications. Embedding GPS in handheld devices has turned them into significant trajectory data gathering facilities. Raw GPS trajectory data is a series of points that contains hidden information. Revealing the hidden information in traces requires trajectory data analysis. Among the most beneficial information concealed in trajectory data are user activity patterns. Each pattern contains multiple stops and moves that identify the places a user visited and the tasks performed. This paper proposes an approach to discover users' daily activity patterns from GPS trajectories using association rules. Finding user patterns requires extracting the user's visited places from the stops and moves of GPS trajectories. In order to locate stops and moves, we implemented a place recognition algorithm. After the visited points were extracted, an advanced association rule mining algorithm called Apriori was used to extract user activity patterns. This study shows that each trajectory contains useful patterns that can be extracted from raw GPS data using association rule mining techniques in order to learn about the behaviour of multiple users in a system, and that these patterns can be utilized in various location-based applications.
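To make the Apriori step concrete, here is a minimal sketch of level-wise frequent itemset mining over sets of visited places, plus rule confidence. It is a generic textbook-style Apriori, not the authors' implementation; the place labels, the per-day transaction layout, and the support threshold are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset seen in at least
    `min_support` transactions, built level by level (1-itemsets, 2-itemsets, ...)."""
    transactions = [frozenset(t) for t in transactions]
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        items = sorted({i for s in current for i in s})
        candidates = set()
        for combo in combinations(items, k):
            # Apriori pruning: every (k-1)-subset of a candidate must be frequent.
            if all(frozenset(sub) in current for sub in combinations(combo, k - 1)):
                candidates.add(frozenset(combo))
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent: supp(A u B) / supp(A)."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Hypothetical data: one set of visited places per day for a single user.
days = [
    {"home", "work", "gym"},
    {"home", "work"},
    {"home", "work", "cafe"},
    {"home", "gym"},
]
frequent = apriori(days, min_support=2)
print(frequent[frozenset({"home", "work"})])        # support count of {home, work}
print(confidence(frequent, {"home"}, {"work"}))     # confidence of home -> work
```

In this toy data, {home, work} appears on three of four days, so the rule home -> work has confidence 3/4, while {gym, work} falls below the threshold and is pruned before any 3-itemset is counted.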

  20. Useful Pattern Mining on Time Series

    DEFF Research Database (Denmark)

    Goumatianos, Nikitas; Christou, Ioannis T; Lindgren, Peter

    2013-01-01

    We present the architecture of a “useful pattern” mining system that is capable of detecting thousands of different candlestick sequence patterns at the tick or any higher granularity levels. The system architecture is highly distributed and performs most of its highly compute-intensive aggregation...... calculations as complex but efficient distributed SQL queries on the relational databases that store the time-series. We present initial results from mining all frequent candlestick sequences with the characteristic property that when they occur then, with an average at least 60% probability, they signal a 2...

  1. Zips : mining compressing sequential patterns in streams

    NARCIS (Netherlands)

    Hoang, T.L.; Calders, T.G.K.; Yang, J.; Mörchen, F.; Fradkin, D.; Chau, D.H.; Vreeken, J.; Leeuwen, van M.; Faloutsos, C.

    2013-01-01

    We propose a streaming algorithm, based on the minimal description length (MDL) principle, for extracting non-redundant sequential patterns. For static databases, the MDL-based approach that selects patterns based on their capacity to compress data rather than their frequency, was shown to be

  2. Towards an evaluation framework for process mining algorithms

    NARCIS (Netherlands)

    Rozinat, A.; Alves De Medeiros, A.K.; Günther, C.W.; Weijters, A.J.M.M.; Aalst, van der W.M.P.

    2007-01-01

    Although there has been a lot of progress in developing process mining algorithms in recent years, no effort has been put in developing a common means of assessing the quality of the models discovered by these algorithms. In this paper, we outline elements of an evaluation framework that is intended

  3. Randomized algorithms in automatic control and data mining

    CERN Document Server

    Granichin, Oleg; Toledano-Kitai, Dvora

    2015-01-01

    In the fields of data mining and control, the huge amount of unstructured data and the presence of uncertainty in system descriptions have always been critical issues. The book Randomized Algorithms in Automatic Control and Data Mining introduces the readers to the fundamentals of randomized algorithm applications in data mining (especially clustering) and in automatic control synthesis. The methods proposed in this book guarantee that the computational complexity of classical algorithms and the conservativeness of standard robust control techniques will be reduced. It is shown that when a problem requires "brute force" in selecting among options, algorithms based on random selection of alternatives offer good results with certain probability for a restricted time and significantly reduce the volume of operations.

  4. pubmed.mineR: An R package with text-mining algorithms to ...

    Indian Academy of Sciences (India)

    2015-09-29

    Sep 29, 2015 ... using text-mining algorithms for biomedical research purposes. ... studies are described to illustrate some potential uses of ... This is the most applied task. ... other alphabets (for example, Greek alphabets) and hyphens.

  5. A New Fast Vertical Method for Mining Frequent Patterns

    Directory of Open Access Journals (Sweden)

    Zhihong Deng

    2010-12-01

    Vertical mining methods are very effective for mining frequent patterns and usually outperform horizontal mining methods. However, the vertical methods become ineffective since the intersection time starts to be costly when the cardinality of the tidset (tid-list or diffset) is very large or there are a very large number of transactions. In this paper, we propose a novel vertical algorithm called PPV for fast frequent pattern discovery. PPV works based on a data structure called Node-lists, which is obtained from a coding prefix-tree called PPC-tree. The efficiency of PPV is achieved with three techniques. First, the Node-list is much more compact than previously proposed vertical structures (such as tid-lists or diffsets) since transactions with common prefixes share the same nodes of the PPC-tree. Second, the counting of support is transformed into the intersection of Node-lists, and the complexity of intersecting two Node-lists can be reduced to O(m+n) by an efficient strategy, where m and n are the cardinalities of the two Node-lists respectively. Third, the ancestor-descendant relationship of two nodes, which is the basic step of intersecting Node-lists, can be very efficiently verified by Pre-Post codes of nodes. We experimentally compare our algorithm with FP-growth and two prominent vertical algorithms (Eclat and dEclat) on a number of databases. The experimental results show that PPV is an efficient algorithm that outperforms FP-growth, Eclat, and dEclat.
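The third technique, checking the ancestor-descendant relationship with pre-order and post-order codes, can be illustrated with a small prefix tree. The sketch below is not the PPV algorithm or its Node-list intersection; it only shows the standard property that a node is an ancestor of another exactly when its pre-order rank is smaller and its post-order rank is larger. The class and function names and the fixed item order are assumptions.

```python
class Node:
    """One prefix-tree node; `pre` and `post` hold its traversal ranks."""
    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}
        self.pre = -1
        self.post = -1


def build_tree(transactions, order):
    """Insert each transaction, sorted by a fixed item order, into a prefix tree."""
    root = Node(None)
    for t in transactions:
        node = root
        for item in sorted(t, key=order.index):
            if item not in node.children:
                node.children[item] = Node(item)
            node = node.children[item]
            node.count += 1   # number of transactions sharing this prefix
    return root


def assign_codes(root):
    """Label every node with its pre-order and post-order rank."""
    pre_counter = 0
    post_counter = 0

    def visit(node):
        nonlocal pre_counter, post_counter
        node.pre = pre_counter
        pre_counter += 1
        for child in node.children.values():
            visit(child)
        node.post = post_counter
        post_counter += 1

    visit(root)


def is_ancestor(a, b):
    """Constant-time test: a is a proper ancestor of b."""
    return a.pre < b.pre and a.post > b.post


root = build_tree([["a", "b", "c"], ["a", "b"], ["a", "d"]], order=["a", "b", "c", "d"])
assign_codes(root)
a = root.children["a"]
b = a.children["b"]
c = b.children["c"]
d = a.children["d"]
print(is_ancestor(a, c), is_ancestor(b, d))
```

Because the test needs only two integer comparisons per pair of nodes, it avoids walking parent pointers, which is what makes a linear-time merge of two code-sorted lists plausible.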

  6. An Efficient Approach to Mining Maximal Contiguous Frequent Patterns from Large DNA Sequence Databases

    Directory of Open Access Journals (Sweden)

    Md. Rezaul Karim

    2012-03-01

    Mining interesting patterns from DNA sequences is one of the most challenging tasks in bioinformatics and computational biology. Maximal contiguous frequent patterns are preferable for expressing the function and structure of DNA sequences and hence can capture the common data characteristics among related sequences. Biologists are interested in finding frequent orderly arrangements of motifs that are responsible for similar expression of a group of genes. In order to reduce mining time and complexity, however, most existing sequence mining algorithms either focus on finding short DNA sequences or require explicit specification of sequence lengths in advance. The challenge is to find longer sequences without specifying sequence lengths in advance. In this paper, we propose an efficient approach to mining maximal contiguous frequent patterns from large DNA sequence datasets. The experimental results show that our proposed approach is memory-efficient and mines maximal contiguous frequent patterns within a reasonable time.
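To pin down what "maximal contiguous frequent pattern" means, the following brute-force sketch enumerates every substring, keeps those occurring in at least `min_support` sequences, and then discards any pattern contained in a longer frequent one. It is deliberately naive and is not the efficient approach the paper proposes; the function name, the per-sequence definition of support, and the toy sequences are assumptions.

```python
from collections import Counter


def maximal_contiguous_frequent(sequences, min_support):
    """Return {pattern: support} for maximal contiguous frequent patterns.

    Support is the number of sequences that contain the pattern as a substring.
    """
    support = Counter()
    for seq in sequences:
        # A set, so each sequence contributes at most once per pattern.
        substrings = {seq[i:j]
                      for i in range(len(seq))
                      for j in range(i + 1, len(seq) + 1)}
        support.update(substrings)
    frequent = {p for p, c in support.items() if c >= min_support}
    # Maximal: not a proper substring of any other frequent pattern.
    maximal = {p for p in frequent
               if not any(p != q and p in q for q in frequent)}
    return {p: support[p] for p in maximal}


# Toy DNA sequences, invented for the example.
result = maximal_contiguous_frequent(["ACGTAC", "TACGT", "GACGA"], min_support=3)
print(result)
```

No pattern length is specified in advance: the longest shared substring emerges from the maximality filter, at the cost of enumerating a quadratic number of substrings per sequence.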

  7. Analysing Customer Opinions with Text Mining Algorithms

    Science.gov (United States)

    Consoli, Domenico

    2009-08-01

    Knowing what the customer thinks of a particular product/service helps top management to introduce improvements in processes and products, thus differentiating the company from its competitors and gaining competitive advantages. The customers, with their preferences, determine the success or failure of a company. To learn the opinions of customers, we can use technologies available from the web 2.0 (blog, wiki, forums, chat, social networking, social commerce). From these web sites, useful information must be extracted, for strategic purposes, using techniques of sentiment analysis or opinion mining.

  8. An Application of Data Mining Algorithms for Shipbuilding Cost Estimation

    NARCIS (Netherlands)

    Kaluzny, B.L.; Barbici, S.; Berg, G.; Chiomento, R.; Derpanis, D.; Jonsson, U.; Shaw, R.H.A.D.; Smit, M.C.; Ramaroson, F.

    2011-01-01

    This article presents a novel application of known data mining algorithms to the problem of estimating the cost of ship development and construction. The work is a product of North Atlantic Treaty Organization Research and Technology Organization Systems Analysis and Studies 076 Task Group “NATO

  9. Predicting mining activity with parallel genetic algorithms

    Science.gov (United States)

    Talaie, S.; Leigh, R.; Louis, S.J.; Raines, G.L.; Beyer, H.G.; O'Reilly, U.M.; Banzhaf, Arnold D.; Blum, W.; Bonabeau, C.; Cantu-Paz, E.W.

    2005-01-01

    We explore several different techniques in our quest to improve the overall model performance of a genetic algorithm calibrated probabilistic cellular automata. We use the Kappa statistic to measure correlation between ground truth data and data predicted by the model. Within the genetic algorithm, we introduce a new evaluation function sensitive to spatial correctness and we explore the idea of evolving different rule parameters for different subregions of the land. We reduce the time required to run a simulation from 6 hours to 10 minutes by parallelizing the code and employing a 10-node cluster. Our empirical results suggest that using the spatially sensitive evaluation function does indeed improve the performance of the model and our preliminary results also show that evolving different rule parameters for different regions tends to improve overall model performance. Copyright 2005 ACM.
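The Kappa statistic mentioned above compares observed agreement with the agreement expected by chance. A minimal sketch follows, assuming the standard Cohen's kappa over per-cell class labels; the abstract does not say which variant was used, and the label vectors are invented.

```python
from collections import Counter


def cohen_kappa(truth, predicted):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two equal-length label lists."""
    if len(truth) != len(predicted) or not truth:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(truth)
    # Observed agreement: fraction of cells where both labels match.
    observed = sum(t == p for t, p in zip(truth, predicted)) / n
    # Chance agreement: product of the marginal class frequencies, summed over classes.
    t_counts, p_counts = Counter(truth), Counter(predicted)
    expected = sum(t_counts[c] * p_counts[c] for c in t_counts) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both inputs use a single class")
    return (observed - expected) / (1 - expected)


# Hypothetical cells: 1 = mining activity, 0 = no activity.
ground_truth = [1, 1, 0, 0]
model_output = [1, 0, 0, 0]
print(cohen_kappa(ground_truth, model_output))
```

For these four cells the observed agreement is 0.75 and the chance agreement is 0.5, giving a kappa of 0.5; plain kappa ignores where the disagreements lie spatially, which is the gap a spatially sensitive evaluation function is meant to address.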

  10. Genetic algorithms in loading pattern optimization

    International Nuclear Information System (INIS)

    Yilmazbayhan, A.; Tombakoglu, M.; Bekar, K. B.; Erdemli, A. Oe

    2001-01-01

    Genetic Algorithm (GA) based systems are used for the loading pattern optimization. The use of Genetic Algorithm operators such as regional crossover, crossover and mutation, and the selection of initial population size for PWRs are discussed. Antithetic variates are used to generate the initial population. The performance of GA with antithetic variates is compared to traditional GA. The results of multi-cycle optimization are discussed for an objective function that takes into account cycle burn-up and discharge burn-up.
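One way to read "antithetic variates are used to generate the initial population" is to pair every random individual u with its mirror image 1 - u, so that the starting population covers the search space more evenly. The sketch below assumes real-valued genomes in [0, 1); how such keys would be decoded into an actual core loading pattern is not described in the abstract and is left out. All names are hypothetical.

```python
import random


def antithetic_population(pop_size, genome_length, seed=None):
    """Build an initial GA population from antithetic pairs (u, 1 - u)."""
    rng = random.Random(seed)
    population = []
    while len(population) < pop_size:
        u = [rng.random() for _ in range(genome_length)]
        population.append(u)
        if len(population) < pop_size:
            # The antithetic partner mirrors every gene around 0.5.
            population.append([1.0 - x for x in u])
    return population


population = antithetic_population(pop_size=6, genome_length=4, seed=0)
for individual in population:
    print([round(x, 3) for x in individual])
```

Each consecutive pair sums to 1 gene by gene, so the pair's mean sits exactly at 0.5 in every coordinate; a traditional GA would instead draw all individuals independently.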

  11. A Survey on Accessing Data over Cloud Environment using Data mining Algorithms

    OpenAIRE

    B.Prasanalakshmi; A.Selvaraj

    2015-01-01

    In today's world, accessing large sets of data has become more complex because the data may be structured or unstructured, in the form of text, images, videos, etc., and cannot be controlled by Internet users; this is known as Big Data. Useful data can be extracted from big data with the help of data mining algorithms. Data mining is a technique for determining patterns, classifying data, and clustering within large sets of data. In this paper we will discuss how large s...

  12. Rare itemsets mining algorithm based on RP-Tree and spark framework

    Science.gov (United States)

    Liu, Sainan; Pan, Haoan

    2018-05-01

    To address rare itemset mining in big data, this paper proposes a rare itemset mining algorithm based on RP-Tree and the Spark framework. First, it arranges the data vertically according to the transaction identifier; to avoid scanning the entire data set, the vertical datasets are divided into frequent vertical datasets and rare vertical datasets. Then, it adopts the RP-Tree algorithm to construct the frequent pattern tree that contains rare items and generates rare 1-itemsets. After that, it calculates the support of the itemsets by scanning the two vertical datasets. Finally, it uses an iterative process to generate rare itemsets. The experiments show that the algorithm can effectively mine rare itemsets and has a considerable advantage in execution time.
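The vertical arrangement described here can be sketched as a mapping from each item to the set of transaction identifiers (a tidset) in which it occurs, after which the support of an itemset is the size of the intersection of its tidsets. The sketch covers only that layout and the frequent/rare split; it does not reproduce the RP-Tree construction or the Spark parallelisation, and the function names and threshold are assumptions.

```python
from collections import defaultdict


def to_vertical(transactions):
    """Map each item to the set of transaction ids (tids) that contain it."""
    vertical = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            vertical[item].add(tid)
    return dict(vertical)


def split_by_support(vertical, min_freq_support):
    """Split the vertical data into frequent and rare items by tidset size."""
    frequent = {i: t for i, t in vertical.items() if len(t) >= min_freq_support}
    rare = {i: t for i, t in vertical.items() if len(t) < min_freq_support}
    return frequent, rare


def support(itemset, vertical):
    """Support of a non-empty itemset: size of the intersection of its tidsets."""
    tidsets = [vertical[item] for item in itemset]
    return len(set.intersection(*tidsets))


transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"a", "d"}]
vertical = to_vertical(transactions)
frequent, rare = split_by_support(vertical, min_freq_support=2)
print(sorted(frequent), sorted(rare))
print(support({"a", "d"}, vertical), support({"b", "c"}, vertical))
```

Once the tidsets exist, the support of any candidate itemset is computed from them directly, so the original transactions need not be rescanned for each candidate.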

  13. Algorithms for adaptive nonlinear pattern recognition

    Science.gov (United States)

    Schmalz, Mark S.; Ritter, Gerhard X.; Hayden, Eric; Key, Gary

    2011-09-01

    In Bayesian pattern recognition research, static classifiers have featured prominently in the literature. A static classifier is essentially based on a static model of input statistics, thereby assuming input ergodicity that is not realistic in practice. Classical Bayesian approaches attempt to circumvent the limitations of static classifiers, which can include brittleness and narrow coverage, by training extensively on a data set that is assumed to cover more than the subtense of expected input. Such assumptions are not realistic for more complex pattern classification tasks, for example, object detection using pattern classification applied to the output of computer vision filters. In contrast, we have developed a two-step process that can render the majority of static classifiers adaptive, such that the tracking of input nonergodicities is supported. First, we developed operations that dynamically insert training patterns into, or delete them from, the classifier's pattern database, without requiring that the classifier's internal representation of its training database be completely recomputed. Second, we developed and applied a pattern replacement algorithm that uses the aforementioned pattern insertion/deletion operations. This algorithm is designed to optimize the pattern database for a given set of performance measures, thereby supporting closed-loop, performance-directed optimization. This paper presents theory and algorithmic approaches for the efficient computation of adaptive linear and nonlinear pattern recognition operators that use our pattern insertion/deletion technology, in particular tabular nearest-neighbor encoding (TNE) and lattice associative memories (LAMs). Of particular interest is the classification of nonergodic datastreams that have noise corruption with time-varying statistics. The TNE and LAM based classifiers discussed herein have been successfully applied to the computation of object classification in hyperspectral

  14. An Optimization Routing Algorithm for Green Communication in Underground Mines

    Directory of Open Access Journals (Sweden)

    Heng Xu

    2018-06-01

    With the long-term dependence of humans on ore-based energy, underground mines are utilized around the world, and underground mining is often dangerous. Therefore, many underground mines have established networks that manage and acquire information from sensor nodes deployed on miners and in other places. Since the power supplies of many mobile sensor nodes are batteries, green communication is an effective approach to reducing the energy consumption of a network and extending its longevity. To reduce the energy consumption of networks, all factors that negatively influence the lifetime should be considered. The degree constraint minimum spanning tree (DCMST) is introduced in this study to consider all the heterogeneous factors and assign weights for the next step of the evaluation. Then, a genetic algorithm (GA) is introduced to cluster sensor nodes in the network and balance energy consumption according to several heterogeneous factors and routing paths from DCMST. Based on a comparison of the simulation results, the optimization routing algorithm proposed in this study for use in green communication in underground mines can effectively reduce the network energy consumption and extend the lifetimes of networks.

  15. MINING ON CAR DATABASE EMPLOYING LEARNING AND CLUSTERING ALGORITHMS

    OpenAIRE

    Muhammad Rukunuddin Ghalib; Shivam Vohra; Sunish Vohra; Akash Juneja

    2013-01-01

    In data mining, classification is a form of data analysis that can be used to extract models describing important data classes. Two well-known learning algorithms are Naïve Bayes (NB) and SMO (Sequential Minimal Optimization). These two learning algorithms are applied to a car review database to build a model that, once trained, predicts the characteristic of a review comment. It was found that model successfully predicted correctly about the review comm...

  16. Mining the National Career Assessment Examination Result Using Clustering Algorithm

    Science.gov (United States)

    Pagudpud, M. V.; Palaoag, T. T.; Padirayon, L. M.

    2018-03-01

    Education is an essential process today which elicits authorities to discover and establish innovative strategies for educational improvement. This study applied data mining using clustering technique for knowledge extraction from the National Career Assessment Examination (NCAE) result in the Division of Quirino. The NCAE is an examination given to all grade 9 students in the Philippines to assess their aptitudes in the different domains. Clustering the students is helpful in identifying students’ learning considerations. With the use of the RapidMiner tool, clustering algorithms such as Density-Based Spatial Clustering of Applications with Noise (DBSCAN), k-means, k-medoid, expectation maximization clustering, and support vector clustering algorithms were analyzed. The silhouette indexes of the said clustering algorithms were compared, and the result showed that the k-means algorithm with k = 3 and silhouette index equal to 0.196 is the most appropriate clustering algorithm to group the students. Three groups were formed having 477 students in the determined group (cluster 0), 310 proficient students (cluster 1) and 396 developing students (cluster 2). The data mining technique used in this study is essential in extracting useful information from the NCAE result to better understand the abilities of students which in turn is a good basis for adopting teaching strategies.
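
    The silhouette index used above to compare clusterings can be sketched in a few lines. This is a minimal pure-Python illustration, not the RapidMiner implementation used in the study; it assumes Euclidean distance and at least two clusters, and the function name and the four sample points are made up for the example.

```python
from math import dist
from statistics import mean


def silhouette_index(points, labels):
    """Mean silhouette coefficient over all points (range -1 to 1).

    Assumes Euclidean distance and at least two distinct cluster labels.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention: singleton clusters score 0
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's distance to itself is 0, so divide by len(own) - 1)
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            mean(dist(point, q) for q in members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b))
    return mean(scores)


# Made-up data: two tight, well-separated groups.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = silhouette_index(points, [0, 0, 1, 1])  # labels follow the groups
bad = silhouette_index(points, [0, 1, 0, 1])   # labels cut across the groups
```

    A labeling that follows the natural groups scores close to 1, while one that cuts across them scores below 0, which is why the index can be used to rank candidate clusterings.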

  17. Loading pattern optimization using ant colony algorithm

    International Nuclear Information System (INIS)

    Hoareau, Fabrice

    2008-01-01

    Electricite de France (EDF) operates 58 nuclear power plants (NPP), of the Pressurized Water Reactor type. The loading pattern optimization of these NPP is currently done by EDF expert engineers. Within this framework, EDF R and D has developed automatic optimization tools that assist the experts. LOOP is an industrial tool, developed by EDF R and D and based on a simulated annealing algorithm. In order to improve the results of such automatic tools, new optimization methods have to be tested. Ant Colony Optimization (ACO) algorithms are recent methods that have given very good results on combinatorial optimization problems. In order to evaluate the performance of such methods on loading pattern optimization, direct comparisons between LOOP and a mock-up based on the Max-Min Ant System algorithm (a particular variant of ACO algorithms) were made on realistic test-cases. It is shown that the results obtained by the ACO mock-up are very similar to those of LOOP. Future research will consist in improving these encouraging results by using parallelization and by hybridizing the ACO algorithm with local search procedures. (author)

  18. Mining Co-Location Patterns with Clustering Items from Spatial Data Sets

    Science.gov (United States)

    Zhou, G.; Li, Q.; Deng, G.; Yue, T.; Zhou, X.

    2018-05-01

    The explosive growth of spatial data and widespread use of spatial databases emphasize the need for the spatial data mining. Co-location patterns discovery is an important branch in spatial data mining. Spatial co-locations represent the subsets of features which are frequently located together in geographic space. However, the appearance of a spatial feature C is often not determined by a single spatial feature A or B but by the two spatial features A and B, that is to say where A and B appear together, C often appears. We note that this co-location pattern is different from the traditional co-location pattern. Thus, this paper presents a new concept called clustering terms, and this co-location pattern is called co-location patterns with clustering items. And the traditional algorithm cannot mine this co-location pattern, so we introduce the related concept in detail and propose a novel algorithm. This algorithm is extended by join-based approach proposed by Huang. Finally, we evaluate the performance of this algorithm.

  19. Web Usage Mining, Pattern Discovery dan Log File

    OpenAIRE

    Tri Suratno; Toni Prahasto; Adian Fatchur Rochim

    2014-01-01

    Analysis of server access data can provide significant and useful information for performance improvement, restructuring and improving the effectiveness of a web site. Data mining is one of the most effective ways to detect a series of patterns of information from large amounts of data. The application of data mining to Internet use, called web mining, is a set of data mining techniques used for the web. Web mining technologies and data mining is a combination o...

  20. Hospitalization patterns associated with Appalachian coal mining.

    Science.gov (United States)

    Hendryx, Michael; Ahern, Melissa M; Nurkiewicz, Timothy R

    2007-12-01

    The goal of this study was to test whether the volume of coal mining was related to population hospitalization risk for diseases postulated to be sensitive or insensitive to coal mining by-products. The study was a retrospective analysis of 2001 adult hospitalization data (n = 93,952) for West Virginia, Kentucky, and Pennsylvania, merged with county-level coal production figures. Hospitalization data were obtained from the Health Care Utilization Project National Inpatient Sample. Diagnoses postulated to be sensitive to coal mining by-product exposure were contrasted with diagnoses postulated to be insensitive to exposure. Data were analyzed using hierarchical nonlinear models, controlling for patient age, gender, insurance, comorbidities, hospital teaching status, county poverty, and county social capital. Controlling for covariates, the volume of coal mining was significantly related to hospitalization risk for two conditions postulated to be sensitive to exposure: hypertension and chronic obstructive pulmonary disease (COPD). The odds for a COPD hospitalization increased 1% for each 1462 tons of coal, and the odds for a hypertension hospitalization increased 1% for each 1873 tons of coal. Other conditions were not related to mining volume. Exposure to particulates or other pollutants generated by coal mining activities may be linked to increased risk of COPD and hypertension hospitalizations. Limitations in the data likely result in an underestimate of associations.

  1. Application of Data Mining Algorithm to Recipient of Motorcycle Installment

    Directory of Open Access Journals (Sweden)

    Harry Dhika

    2015-12-01

    The study was conducted in subsidiaries that provide financing for the purchase of motorcycles on credit. When applying, consumers enter their personal data, and based on these data the credit application is either approved or rejected. From 224 consumer data obtained, it is known that the number of consumers whose applications are approved is 87% or about 217 consumers and consumers whose application is rejected is 16% or as much as 6 consumers. The data mining process follows CRISP-DM, the industry standard for data mining projects, and the algorithm used for decision making is C4.5. The accuracy of the results is measured with the Confusion Matrix and the Receiver Operating Characteristic (ROC). Evaluation with the Confusion Matrix yields the accuracy, precision, and recall values, while the Receiver Operating Characteristic (ROC) is used to find data tables and compare the Area Under Curve (AUC).
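
    The accuracy, precision, and recall values mentioned above follow directly from the four cells of a binary confusion matrix. The sketch below is illustrative only: the function names and the ten approved/rejected labels are invented for the example and are not the study's data.

```python
def confusion_counts(actual, predicted, positive):
    """Return (tp, fp, fn, tn) for a binary problem with the given positive label."""
    tp = fp = fn = tn = 0
    for truth, guess in zip(actual, predicted):
        if guess == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluate(tp, fp, fn, tn):
    """Accuracy, precision and recall derived from the confusion-matrix cells."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical labels, invented for illustration only.
actual = ["approved"] * 6 + ["rejected"] * 4
predicted = ["approved"] * 5 + ["rejected"] * 4 + ["approved"]
cells = confusion_counts(actual, predicted, positive="approved")
scores = evaluate(*cells)
```

    With these made-up labels the cells are 5 true positives, 1 false positive, 1 false negative and 3 true negatives, so accuracy is 8/10 and both precision and recall are 5/6.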

  2. Mining for Social Media: Usage Patterns of Small Businesses

    OpenAIRE

    Balan, Shilpa; Rege, Janhavi

    2017-01-01

    Background: Information can now be rapidly exchanged due to social media. Due to its openness, Twitter has generated massive amounts of data. In this paper, we apply data mining and analytics to extract the usage patterns of social media by small businesses. Objectives: The aim of this paper is to describe with an example how data mining can be applied to social media. This paper further examines the impact of social media on small businesses. The Twitter posts related to small businesses are...

  3. AN EFFECTIVE RECOMMENDATIONS BY DIFFUSION ALGORITHM FOR WEB GRAPH MINING

    Directory of Open Access Journals (Sweden)

    S. Vasukipriya

    2013-04-01

    The information on the World Wide Web grows at an explosive rate. Societies are relying more on the Web for their miscellaneous information needs. Recommendation systems are active information filtering systems that attempt to present information items such as movies, music, images, book recommendations, tag recommendations, query suggestions, etc., to users. Various kinds of databases are used for recommendations; fundamentally, these databases can be modeled as many types of graphs. This paper aims to provide a general framework for an effective DR (Recommendations by Diffusion) algorithm for web graph mining. It first introduces a novel graph diffusion model based on heat diffusion. This method can be applied to both undirected graphs and directed graphs. It then shows how to convert different Web data sources into correct graphs in our models.
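
    To make the idea of heat diffusion on a graph concrete, the following sketch spreads heat from a source node over an undirected graph using a simple explicit Euler update. It is an assumption-laden illustration, not the paper's model: the update rule, the parameter names `alpha` and `steps`, and the four-node path graph are all chosen here for the example.

```python
def diffuse_heat(neighbors, heat, alpha=0.5, steps=50):
    """Approximate heat diffusion on an undirected graph over one unit of time.

    neighbors: dict mapping each node to a list of adjacent nodes
    heat:      dict mapping each node to its initial heat
    alpha:     thermal conductivity (illustrative parameter)
    steps:     number of Euler steps; keep alpha * max_degree / steps well below 1
    """
    dt = 1.0 / steps
    current = dict(heat)
    for _ in range(steps):
        updated = {}
        for node, value in current.items():
            # Net flow: heat received from neighbours minus heat given to them
            flow = sum(current[n] - value for n in neighbors[node])
            updated[node] = value + alpha * dt * flow
        current = updated
    return current


# Hypothetical path graph a - b - c - d with all heat starting at "a".
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
result = diffuse_heat(graph, {"a": 1.0, "b": 0.0, "c": 0.0, "d": 0.0})

# Rank the other nodes by the heat they received from the source.
ranking = sorted((n for n in result if n != "a"), key=result.get, reverse=True)
```

    Nodes closer to the source end up with more heat, so sorting by heat gives a natural recommendation order; on an undirected graph the total amount of heat is conserved.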

  4. Ecosystem Health Assessment of Mining Cities Based on Landscape Pattern

    Science.gov (United States)

    Yu, W.; Liu, Y.; Lin, M.; Fang, F.; Xiao, R.

    2017-09-01

    Ecosystem health assessment (EHA) is one of the most important aspects of ecosystem management. Nowadays, the ecological environment of mining cities is facing various problems. In this study, based on ecosystem health theory and remote sensing images from 2005, 2009 and 2013, landscape pattern analysis and the Vigor-Organization-Resilience (VOR) model were applied to set up an evaluation index system of ecosystem health for mining cities and to assess the ecosystem health level of Panji District, Huainan city. Results showed a temporally stable but spatially highly heterogeneous landscape pattern during 2005-2013. The regional ecosystem health index declined rapidly after a slight increase and finally remained at an ordinary level. Significant differences were found among towns. The ecosystem health of Tianjijiedao town, the regional administrative centre, declined rapidly during the study period and fell to the worst level in the study area, while Hetuan Town, located in the northwestern suburban area of Panji District, stayed at a relatively better level than other towns. The impacts of coal mining collapse areas and land reclamation on the landscape pattern and ecosystem health status of mining cities were also discussed. As a result of underground coal mining, land subsidence has become an inevitable problem in the study area. In addition, the coal mining subsidence area has destroyed farmland, construction land and water bodies, changing the regional landscape pattern and making the evaluation of ecosystem health in mining areas more difficult. Therefore, this study provides an ecosystem health approach that relevant departments can use to make scientific decisions.

  5. A genetic algorithm approach to recognition and data mining

    Energy Technology Data Exchange (ETDEWEB)

    Punch, W.F.; Goodman, E.D.; Min, Pei [Michigan State Univ., East Lansing, MI (United States)] [and others

    1996-12-31

    We review here our use of genetic algorithm (GA) and genetic programming (GP) techniques to perform "data mining," the discovery of particular/important data within large datasets, by finding optimal data classifications using known examples. Our first experiments concentrated on the use of a K-nearest neighbor (knn) algorithm in combination with a GA. The GA selected weights for each feature so as to optimize knn classification based on a linear combination of features. This combined GA-knn approach was successfully applied to both generated and real-world data. We later extended this work by substituting a GP for the GA. The GP-knn could not only optimize data classification via linear combinations of features but also determine functional relationships among the features. This allowed for improved performance and new information on important relationships among features. We review the effectiveness of the overall approach on examples from biology and compare the effectiveness of the GA and GP.
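
    The role of the feature weights in a GA-knn scheme can be illustrated with the fitness function alone. The sketch below assumes a weighted squared Euclidean distance and leave-one-out accuracy as the fitness; both are assumptions made here, the GA search loop is omitted, and the six data points are invented so that only the first feature is informative.

```python
from collections import Counter


def weighted_knn_predict(train, query, weights, k=3):
    """Classify `query` by majority vote among its k nearest training items,
    using a feature-weighted squared Euclidean distance."""
    def distance(features):
        return sum(w * (a - b) ** 2 for w, a, b in zip(weights, features, query))

    nearest = sorted(train, key=lambda item: distance(item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def leave_one_out_accuracy(data, weights, k=3):
    """Fitness a GA could maximise: fraction of items classified correctly
    when each item is predicted from all the others."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += weighted_knn_predict(rest, features, weights, k) == label
    return hits / len(data)


# Hypothetical data: feature 0 separates the classes, feature 1 is noise.
data = [
    ((0.0, 50.0), "a"), ((0.1, -40.0), "a"), ((0.2, 30.0), "a"),
    ((1.0, 45.0), "b"), ((1.1, -35.0), "b"), ((1.2, 25.0), "b"),
]
fitness_equal = leave_one_out_accuracy(data, weights=(1.0, 1.0), k=1)
fitness_tuned = leave_one_out_accuracy(data, weights=(1.0, 0.0), k=1)
```

    Down-weighting the noisy feature raises the fitness, which is the signal a genetic algorithm would follow when evolving the weight vector.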

  6. Filter Pattern Search Algorithms for Mixed Variable Constrained Optimization Problems

    National Research Council Canada - National Science Library

    Abramson, Mark A; Audet, Charles; Dennis, Jr, J. E

    2004-01-01

    .... This class combines and extends the Audet-Dennis Generalized Pattern Search (GPS) algorithms for bound constrained mixed variable optimization, and their GPS-filter algorithms for general nonlinear constraints...

  7. The Smallest Valid Extension-Based Efficient, Rare Graph Pattern Mining, Considering Length-Decreasing Support Constraints and Symmetry Characteristics of Graphs

    Directory of Open Access Journals (Sweden)

    Unil Yun

    2016-05-01

    Frequent graph mining has been proposed to find interesting patterns (i.e., frequent sub-graphs) from databases composed of graph transaction data, which can effectively express complex and large data in the real world. In addition, various applications for graph mining have been suggested. Traditional graph pattern mining methods use a single minimum support threshold factor in order to check whether or not mined patterns are interesting. However, it is not a sufficient factor that can consider valuable characteristics of graphs such as graph sizes and features of graph elements. That is, previous methods cannot consider such important characteristics in their mining operations since they only use a fixed minimum support threshold in the mining process. For this reason, in this paper, we propose a novel graph mining algorithm that can consider multiple minimum support constraints according to the types of graph elements and changeable minimum support conditions, depending on lengths of graph patterns. In addition, the proposed algorithm performs mining operations more efficiently because it can minimize duplicated operations and computational overheads by considering symmetry features of graphs. Experimental results provided in this paper demonstrate that the proposed algorithm outperforms previous mining approaches in terms of pattern generation, runtime and memory usage.

  8. Mining algorithm for association rules in big data based on Hadoop

    Science.gov (United States)

    Fu, Chunhua; Wang, Xiaojing; Zhang, Lijun; Qiao, Liying

    2018-04-01

    Traditional association rule mining algorithms can no longer meet the efficiency and scalability needs of mining large amounts of data. To address this problem, taking FP-Growth as an example, the algorithm is parallelized on the Hadoop framework with the MapReduce model. On this basis, it is improved with a transaction reduction method to further enhance its mining efficiency. The experiments, run on a Hadoop cluster, verify the parallel mining results, compare the efficiency of serial and parallel execution, and examine how mining time varies with the number of nodes and with the amount of data. They show that the parallel FP-Growth implementation mines frequent item sets accurately, with better performance and scalability. It better meets the requirements of big data mining and efficiently mines frequent item sets and association rules from large datasets.
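
    The map and reduce roles in such a parallel scheme can be illustrated with the first counting pass only, in which each partition counts item occurrences and the partial counts are merged. This is an in-process sketch, not Hadoop code and not the paper's implementation; the function names, the partitioning and the toy transactions are assumptions made for the example, and the FP-tree construction that would follow is omitted.

```python
from collections import Counter
from functools import reduce


def map_count(partition):
    """Map step: within one partition, count the transactions containing each item."""
    counts = Counter()
    for transaction in partition:
        counts.update(set(transaction))  # count each item once per transaction
    return counts


def frequent_items(partitions, min_support):
    """Reduce step: merge the per-partition counts and keep the frequent items."""
    total = reduce(lambda left, right: left + right,
                   map(map_count, partitions), Counter())
    return {item: n for item, n in total.items() if n >= min_support}


# Hypothetical transactions split across two partitions (standing in for nodes).
partitions = [
    [["milk", "bread"], ["milk", "eggs"]],
    [["bread", "milk"], ["eggs"]],
]
frequent = frequent_items(partitions, min_support=2)
```

    Because the per-partition counts are simply added, the map step can run independently on each node, which is what makes this pass easy to distribute.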

  9. A Framework for Mining Actionable Navigation Patterns from In-Store RFID Datasets via Indoor Mapping

    Directory of Open Access Journals (Sweden)

    Bin Shen

    2015-03-01

    With the quick development of RFID technology and the decreasing prices of RFID devices, RFID is becoming widely used in various intelligent services. Especially in the retail application domain, RFID is increasingly adopted to capture the shopping tracks and behavior of in-store customers. To further enhance the potential of this promising application, in this paper, we propose a unified framework for RFID-based path analytics, which uses both in-store shopping paths and RFID-based purchasing data to mine actionable navigation patterns. Four modules of this framework are discussed, which are: (1) mapping from the physical space to the cyber space, (2) data preprocessing, (3) pattern mining and (4) knowledge understanding and utilization. In the data preprocessing module, the critical problem of how to capture the mainstream shopping path sequences while wiping out unnecessary redundant and repeated details is addressed in detail. To solve this problem, two types of redundant patterns, i.e., loop repeat pattern and palindrome-contained pattern are recognized and the corresponding processing algorithms are proposed. The experimental results show that the redundant pattern filtering functions are effective and scalable. Overall, this work builds a bridge between indoor positioning and advanced data mining technologies, and provides a feasible way to study customers’ shopping behaviors via multi-source RFID data.

  10. A Framework for Mining Actionable Navigation Patterns from In-Store RFID Datasets via Indoor Mapping

    Science.gov (United States)

    Shen, Bin; Zheng, Qiuhua; Li, Xingsen; Xu, Libo

    2015-01-01

    With the quick development of RFID technology and the decreasing prices of RFID devices, RFID is becoming widely used in various intelligent services. Especially in the retail application domain, RFID is increasingly adopted to capture the shopping tracks and behavior of in-store customers. To further enhance the potential of this promising application, in this paper, we propose a unified framework for RFID-based path analytics, which uses both in-store shopping paths and RFID-based purchasing data to mine actionable navigation patterns. Four modules of this framework are discussed, which are: (1) mapping from the physical space to the cyber space, (2) data preprocessing, (3) pattern mining and (4) knowledge understanding and utilization. In the data preprocessing module, the critical problem of how to capture the mainstream shopping path sequences while wiping out unnecessary redundant and repeated details is addressed in detail. To solve this problem, two types of redundant patterns, i.e., loop repeat pattern and palindrome-contained pattern are recognized and the corresponding processing algorithms are proposed. The experimental results show that the redundant pattern filtering functions are effective and scalable. Overall, this work builds a bridge between indoor positioning and advanced data mining technologies, and provides a feasible way to study customers’ shopping behaviors via multi-source RFID data. PMID:25751076

  11. Mining Temporal Patterns to Improve Agents Behavior: Two Case Studies

    Science.gov (United States)

    Fournier-Viger, Philippe; Nkambou, Roger; Faghihi, Usef; Nguifo, Engelbert Mephu

    We propose two mechanisms for agent learning based on the idea of mining temporal patterns from agent behavior. The first one consists of extracting temporal patterns from the perceived behavior of other agents accomplishing a task, to learn the task. The second learning mechanism consists in extracting temporal patterns from an agent's own behavior. In this case, the agent then reuses patterns that brought self-satisfaction. In both cases, no assumption is made on how the observed agents' behavior is internally generated. A case study with a real application is presented to illustrate each learning mechanism.

  12. Brick: Mining Pedagogically Interesting Sequential Patterns

    NARCIS (Netherlands)

    Anjewierden, Anjo; Gijlers, Hannie; Saab, Nadira; de Hoog, Robert; Pechenizkiy, Mykola; Calders, Toon; Conati, Cristina; Ventura, Sebastian; Romero, Cristobal; Stamper, John

    2011-01-01

    One of the goals of the SCY project (www.scy-net.eu) is to make (inquiry) learning environments adaptive. The idea is to develop “pedagogical agents” that monitor learner behaviour through the actions they perform and identify patterns that point to systematic behaviour, or lack thereof. To achieve

  13. Study on the Method of Association Rules Mining Based on Genetic Algorithm and Application in Analysis of Seawater Samples

    Directory of Open Access Journals (Sweden)

    Qiuhong Sun

    2014-04-01

    This paper briefly introduces data mining based on genetic algorithms and discusses the two theoretical foundations of genetic algorithms, the schema theorem and implicit parallelism. Focusing on the application of genetic algorithms to association rule mining, it proposes improvements to the structure of the fitness function and to the data encoding. In particular, building on earlier work, an improved adaptive Pc, Pm algorithm is applied to the genetic algorithm, thereby improving its efficiency. Finally, a genetic-algorithm-based association rule mining algorithm is applied to a seawater sample database and shown to be effective.

  14. Using an improved association rules mining optimization algorithm in web-based mobile-learning system

    Science.gov (United States)

    Huang, Yin; Chen, Jianhua; Xiong, Shaojun

    2009-07-01

    Mobile-Learning (M-learning) gives many learners the advantages of both traditional learning and E-learning. Currently, Web-based Mobile-Learning Systems have created many new ways and defined new relationships between educators and learners. Association rule mining is one of the most important fields in data mining and knowledge discovery in databases. Rule explosion is a serious problem that causes great concern, as conventional mining algorithms often produce too many rules for decision makers to digest. Since a Web-based Mobile-Learning System collects vast amounts of student profile data, data mining and knowledge discovery techniques can be applied to find interesting relationships between attributes of learners, assessments, the solution strategies adopted by learners and so on. Therefore, this paper focuses on a new data-mining algorithm that combines the advantages of the genetic algorithm and the simulated annealing algorithm, called ARGSA (Association rules based on an improved Genetic Simulated Annealing Algorithm), to mine the association rules. This paper first takes advantage of the Parallel Genetic Algorithm and Simulated Annealing Algorithm designed specifically for discovering association rules. Moreover, analysis and experiments show that the proposed method is superior to the Apriori algorithm in this Mobile-Learning system.

  15. Comparison analysis for classification algorithm in data mining and the study of model use

    Science.gov (United States)

    Chen, Junde; Zhang, Defu

    2018-04-01

    As a key technique in data mining, classification algorithms have received extensive attention. Through an experiment with classification algorithms on UCI data sets, we present a comparison analysis method for the different algorithms, using a statistical test. In addition, an adaptive diagnosis model for preventing electricity stealing and leakage is given as a specific case in the paper.

  16. A new taxonomy of sublinear keyword pattern matching algorithms

    NARCIS (Netherlands)

    Cleophas, L.G.W.A.; Watson, B.W.; Zwaan, G.

    2004-01-01

    This paper presents a new taxonomy of sublinear (multiple) keyword pattern matching algorithms. Based on an earlier taxonomy by Watson and Zwaan [WZ96, WZ95], this new taxonomy includes not only suffix-based algorithms related to the Boyer-Moore, Commentz-Walter and Fan-Su algorithms, but

  17. A hybrid heuristic algorithm for the open-pit-mining operational planning problem.

    OpenAIRE

    Souza, Marcone Jamilson Freitas; Coelho, Igor Machado; Ribas, Sabir; Santos, Haroldo Gambini; Merschmann, Luiz Henrique de Campos

    2010-01-01

    This paper deals with the Open-Pit-Mining Operational Planning problem with dynamic truck allocation. The objective is to optimize mineral extraction in the mines by minimizing the number of mining trucks used to meet production goals and quality requirements. According to the literature, this problem is NP-hard, so a heuristic strategy is justified. We present a hybrid algorithm that combines characteristics of two metaheuristics: Greedy Randomized Adaptive Search Procedures and General Varia...

  18. On the Suitability of Genetic-Based Algorithms for Data Mining

    NARCIS (Netherlands)

    Choenni, R.S.

    1998-01-01

    The goal of data mining is to extract knowledge from large databases. A database may be considered as a search space consisting of an enormous number of elements, and a mining algorithm as a search strategy. In general, an exhaustive search of the space is infeasible. Therefore, efficient search

  19. pubmed.mineR: an R package with text-mining algorithms to analyse PubMed abstracts.

    Science.gov (United States)

    Rani, Jyoti; Shah, A B Rauf; Ramachandran, Srinivasan

    2015-10-01

    The PubMed literature database is a valuable source of information for scientific research. It is rich in biomedical literature with more than 24 million citations. Data-mining of voluminous literature is a challenging task. Although several text-mining algorithms have been developed in recent years with focus on data visualization, they have limitations: they are slow, rigid and not available as open source. We have developed an R package, pubmed.mineR, wherein we have combined the advantages of existing algorithms, overcome their limitations, and offer user flexibility and link with other packages in Bioconductor and the Comprehensive R Archive Network (CRAN) in order to expand the user capabilities for executing multifaceted approaches. Three case studies are presented, namely, 'Evolving role of diabetes educators', 'Cancer risk assessment' and 'Dynamic concepts on disease and comorbidity' to illustrate the use of pubmed.mineR. The package generally runs fast with small elapsed times in regular workstations even on large corpus sizes and with compute intensive functions. The pubmed.mineR is available at http://cran.r-project.org/web/packages/pubmed.mineR.

  20. Application of a genetic algorithm to core reload pattern optimization

    International Nuclear Information System (INIS)

    Tanker, E.; Tanker, A.Z.

    1994-01-01

    A genetic algorithm is applied to reload pattern optimization of a PWR core. Evaluating all different distributions of a given batch load separately is found slow and ineffective. By allowing patterns from different distributions to combine and reproduce, an optimized pattern better than that obtained from linear programming is found, albeit in a longer time. (authors). 5 refs., 2 tabs

  1. DNA pattern recognition using canonical correlation algorithm.

    Science.gov (United States)

    Sarkar, B K; Chakraborty, Chiranjib

    2015-10-01

    We performed canonical correlation analysis as an unsupervised statistical tool to describe related views of the same semantic object for identifying patterns. A pattern recognition technique based on canonical correlation analysis (CCA) was proposed for finding required genetic code in the DNA sequence. Two related but different objects were considered: one was a particular pattern, and other was test DNA sequence. CCA found correlations between two observations of the same semantic pattern and test sequence. It is concluded that the relationship possesses maximum value in the position where the pattern exists. As a case study, the potential of CCA was demonstrated on the sequence found from HIV-1 preferred integration sites. The subsequences on the left and right flanking from the integration site were considered as the two views, and statistically significant relationships were established between these two views to elucidate the viral preference as an important factor for the correlation.

  2. Star pattern recognition algorithm aided by inertial information

    Science.gov (United States)

    Liu, Bao; Wang, Ke-dong; Zhang, Chao

    2011-08-01

    Star pattern recognition is one of the key problems of the celestial navigation. The traditional star pattern recognition approaches, such as the triangle algorithm and the star angular distance algorithm, are a kind of all-sky matching method whose recognition speed is slow and recognition success rate is not high. Therefore, the real time and reliability of CNS (Celestial Navigation System) is reduced to some extent, especially for the maneuvering spacecraft. However, if the direction of the camera optical axis can be estimated by other navigation systems such as INS (Inertial Navigation System), the star pattern recognition can be fulfilled in the vicinity of the estimated direction of the optical axis. The benefits of the INS-aided star pattern recognition algorithm include at least the improved matching speed and the improved success rate. In this paper, the direction of the camera optical axis, the local matching sky, and the projection of stars on the image plane are estimated by the aiding of INS firstly. Then, the local star catalog for the star pattern recognition is established in real time dynamically. The star images extracted in the camera plane are matched in the local sky. Compared to the traditional all-sky star pattern recognition algorithms, the memory of storing the star catalog is reduced significantly. Finally, the INS-aided star pattern recognition algorithm is validated by simulations. The results of simulations show that the algorithm's computation time is reduced sharply and its matching success rate is improved greatly.
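
    The step of building a local star catalog around the estimated optical axis can be sketched as an angular-distance filter. This is a simplified illustration under stated assumptions, not the authors' algorithm: the catalog is a list of (name, right ascension, declination) tuples in degrees, the search region is a circular cone of radius `radius_deg`, and all star entries are invented.

```python
from math import acos, cos, degrees, radians, sin


def unit_vector(ra_deg, dec_deg):
    """Convert right ascension / declination in degrees to a unit vector."""
    ra, dec = radians(ra_deg), radians(dec_deg)
    return (cos(dec) * cos(ra), cos(dec) * sin(ra), sin(dec))


def angular_separation_deg(a, b):
    """Angle in degrees between two unit vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return degrees(acos(max(-1.0, min(1.0, dot))))  # clamp against rounding


def local_catalog(catalog, axis_ra, axis_dec, radius_deg):
    """Keep only the stars within radius_deg of the estimated optical axis."""
    axis = unit_vector(axis_ra, axis_dec)
    return [
        name
        for name, ra, dec in catalog
        if angular_separation_deg(axis, unit_vector(ra, dec)) <= radius_deg
    ]


# Hypothetical catalog entries: (name, RA in degrees, Dec in degrees).
catalog = [("s1", 10.0, 20.0), ("s2", 12.0, 21.0), ("s3", 200.0, -40.0)]
nearby = local_catalog(catalog, axis_ra=11.0, axis_dec=20.0, radius_deg=5.0)
```

    Only the stars inside the cone need to be held in memory and matched against the image, which is the source of the speed and memory savings described above.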

  3. Data mining algorithms for land cover change detection: a review

    Indian Academy of Sciences (India)

    Sangram Panigrahi

    2017-11-24

    Nov 24, 2017 ... values, poor quality measurement, high resolution and high dimensional data. The land cover .... These data sets also include quality assurance information, ...... 2012 A new data mining framework for forest fire mapping.

  4. Mining Productive-Associated Periodic-Frequent Patterns in Body Sensor Data for Smart Home Care.

    Science.gov (United States)

    Ismail, Walaa N; Hassan, Mohammad Mehedi

    2017-04-26

    The understanding of various health-oriented vital sign data generated from body sensor networks (BSNs) and discovery of the associations between the generated parameters is an important task that may assist and promote important decision making in healthcare. For example, in a smart home scenario where occupants' health status is continuously monitored remotely, it is essential to provide the required assistance when an unusual or critical situation is detected in their vital sign data. In this paper, we present an efficient approach for mining the periodic patterns obtained from BSN data. In addition, we employ a correlation test on the generated patterns and introduce productive-associated periodic-frequent patterns as the set of correlated periodic-frequent items. The combination of these measures has the advantage of empowering healthcare providers and patients to raise the quality of diagnosis as well as improve treatment and smart care, especially for elderly people in smart homes. We develop an efficient algorithm named PPFP-growth (Productive Periodic-Frequent Pattern-growth) to discover all productive-associated periodic frequent patterns using these measures. PPFP-growth is efficient and the productiveness measure removes uncorrelated periodic items. An experimental evaluation on synthetic and real datasets shows the efficiency of the proposed PPFP-growth algorithm, which can filter a huge number of periodic patterns to reveal only the correlated ones.

  5. Mining Productive-Associated Periodic-Frequent Patterns in Body Sensor Data for Smart Home Care

    Directory of Open Access Journals (Sweden)

    Walaa N. Ismail

    2017-04-01

    Full Text Available The understanding of various health-oriented vital sign data generated from body sensor networks (BSNs) and discovery of the associations between the generated parameters is an important task that may assist and promote important decision making in healthcare. For example, in a smart home scenario where occupants’ health status is continuously monitored remotely, it is essential to provide the required assistance when an unusual or critical situation is detected in their vital sign data. In this paper, we present an efficient approach for mining the periodic patterns obtained from BSN data. In addition, we employ a correlation test on the generated patterns and introduce productive-associated periodic-frequent patterns as the set of correlated periodic-frequent items. The combination of these measures has the advantage of empowering healthcare providers and patients to raise the quality of diagnosis as well as improve treatment and smart care, especially for elderly people in smart homes. We develop an efficient algorithm named PPFP-growth (Productive Periodic-Frequent Pattern-growth) to discover all productive-associated periodic frequent patterns using these measures. PPFP-growth is efficient and the productiveness measure removes uncorrelated periodic items. An experimental evaluation on synthetic and real datasets shows the efficiency of the proposed PPFP-growth algorithm, which can filter a huge number of periodic patterns to reveal only the correlated ones.

  6. Locating previously unknown patterns in data-mining results: a dual data- and knowledge-mining method

    Directory of Open Access Journals (Sweden)

    Knaus William A

    2006-03-01

    Full Text Available Abstract. Background: Data mining can be utilized to automate analysis of substantial amounts of data produced in many organizations. However, data mining produces large numbers of rules and patterns, many of which are not useful. Existing methods for pruning uninteresting patterns have only begun to automate the knowledge acquisition step (which is required for subjective measures of interestingness), hence leaving a serious bottleneck. In this paper we propose a method for automatically acquiring knowledge to shorten the pattern list by locating the novel and interesting ones. Methods: The dual-mining method is based on automatically comparing the strength of patterns mined from a database with the strength of equivalent patterns mined from a relevant knowledgebase. When these two estimates of pattern strength do not match, a high "surprise score" is assigned to the pattern, identifying the pattern as potentially interesting. The surprise score captures the degree of novelty or interestingness of the mined pattern. In addition, we show how to compute p values for each surprise score, thus filtering out noise and attaching statistical significance. Results: We have implemented the dual-mining method using scripts written in Perl and R. We applied the method to a large patient database and a biomedical literature citation knowledgebase. The system estimated association scores for 50,000 patterns, composed of disease entities and lab results, by querying the database and the knowledgebase. It then computed the surprise scores by comparing the pairs of association scores. Finally, the system estimated statistical significance of the scores. Conclusion: The dual-mining method eliminates more than 90% of patterns with strong associations, thus identifying them as uninteresting. We found that the pruning of patterns using the surprise score matched the biomedical evidence in the 100 cases that were examined by hand. 
The method automates the acquisition of

  7. Mining

    Directory of Open Access Journals (Sweden)

    Khairullah Khan

    2014-09-01

    Full Text Available Opinion mining is an interesting area of research because of its applications in various fields. Collecting opinions of people about products and about social and political events and problems through the Web is becoming increasingly popular every day. The opinions of users are helpful for the public and for stakeholders when making certain decisions. Opinion mining is a way to retrieve information through search engines, Web blogs and social networks. Because of the huge number of reviews in the form of unstructured text, it is impossible to summarize the information manually. Accordingly, efficient computational methods are needed for mining and summarizing the reviews from corpuses and Web documents. This study presents a systematic literature survey regarding the computational techniques, models and algorithms for mining opinion components from unstructured reviews.

  8. AC-600 reactor reloading pattern optimization by using genetic algorithms

    International Nuclear Information System (INIS)

    Wu Hongchun; Xie Zhongsheng; Yao Dong; Li Dongsheng; Zhang Zongyao

    2000-01-01

    The use of genetic algorithms to optimize the reloading pattern of a nuclear power plant reactor is proposed, and a new encoding and translating method is given. Optimization results of minimizing core power peak and maximizing cycle length for both low-leakage and out-in loading patterns of the AC-600 reactor are obtained.

  9. An Efficient Association Rule Hiding Algorithm for Privacy Preserving Data Mining

    OpenAIRE

    Yogendra Kumar Jain,; Vinod Kumar Yadav,; Geetika S. Panday

    2011-01-01

    The security of a large database that contains certain crucial information becomes a serious issue when the data are shared over a network and must be protected against unauthorized access. Privacy preserving data mining is a new research trend in privacy data for data mining and statistical databases. Association analysis is a powerful tool for discovering relationships which are hidden in large databases. Association rules hiding algorithms get strong and efficient performance for protecting confidential and cru...

  10. Mining Experiential Patterns from Game-Logs of Board Game

    Directory of Open Access Journals (Sweden)

    Liang Wang

    2015-01-01

    Full Text Available In board games, game-logs record past game processes, which can be regarded as an accumulation of experience. Similar to a real person, a computer player can gradually increase its skill by learning from game-logs. Therefore, the game becomes more interesting. This paper proposes an extensible approach to mine experiential patterns from increasing game-logs. The computer player improves its strategies by utilizing these growing patterns, just as it acquires experience. To evaluate the effect and performance of the approach, we designed a sample board game as a test platform and elaborated an experiment consisting of a series of tests. Experimental results show that our approach is effective and efficient.

  11. Mining for Social Media: Usage Patterns of Small Businesses

    Directory of Open Access Journals (Sweden)

    Balan Shilpa

    2017-03-01

    Full Text Available Background: Information can now be rapidly exchanged due to social media. Due to its openness, Twitter has generated massive amounts of data. In this paper, we apply data mining and analytics to extract the usage patterns of social media by small businesses. Objectives: The aim of this paper is to describe with an example how data mining can be applied to social media. This paper further examines the impact of social media on small businesses. The Twitter posts related to small businesses are analyzed in detail. Methods/Approach: The patterns of social media usage by small businesses are observed using IBM Watson Analytics. In this paper, we particularly analyze tweets on Twitter for the hashtag #smallbusiness. Results: It is found that the number of females posting topics related to small business on Twitter is greater than the number of males. It is also found that the number of negative posts in Twitter is relatively low. Conclusions: Small firms are beginning to understand the importance of social media to realize their business goals. For future research, further analysis can be performed on the date and time the tweets were posted.

  12. Practical mine ventilation optimization based on genetic algorithms for free splitting networks

    Energy Technology Data Exchange (ETDEWEB)

    Acuna, E.; Maynard, R.; Hall, S. [Laurentian Univ., Sudbury, ON (Canada). Mirarco Mining Innovation; Hardcastle, S.G.; Li, G. [Natural Resources Canada, Sudbury, ON (Canada). CANMET Mining and Mineral Sciences Laboratories; Lowndes, I.S. [Nottingham Univ., Nottingham (United Kingdom). Process and Environmental Research Division; Tonnos, A. [Bestech, Sudbury, ON (Canada)

    2010-07-01

    The method used to optimize the design and operation of mine ventilation has generally been based on case studies and expert knowledge. It has yet to benefit from optimization techniques used and proven in other fields of engineering. Currently, optimization of mine ventilation systems is a manual based decision process performed by an experienced mine ventilation specialist assisted by commercial ventilation distribution solvers. These analysis tools are widely used in the mining industry to evaluate the practical and economic viability of alternative ventilation system configurations. The scenario which is usually selected is the one that reports the lowest energy consumption while delivering the required airflow distribution. Since most commercial solvers do not have an integrated optimization algorithm network, the process of generating a series of potential ventilation solutions using the conventional iterative design strategy can be time consuming. For that reason, a genetic algorithm (GA) optimization routine was developed in combination with a ventilation solver to determine the potential optimal solutions of a primary mine ventilation system based on a free splitting network. The optimization method was used in a small size mine ventilation network. The technique was shown to have the capacity to generate good feasible solutions and improve upon the manual results obtained by mine ventilation specialists. 9 refs., 7 tabs., 3 figs.

  13. Research on Health State Perception Algorithm of Mining Equipment Based on Frequency Closeness

    Directory of Open Access Journals (Sweden)

    Gang Wang

    2014-06-01

    Full Text Available The health state perception of mining equipment is intended to provide online, real-time knowledge and analysis of the running conditions of large mining equipment. Because the failure modes are unknown, traditional fault diagnosis of mining equipment faces a challenge. A health state perception algorithm for mining equipment is introduced in this paper. Through continuous sampling of the machine vibration data, a time-series data set was set up; subsequently, the mode set based on frequency closeness was constructed by the d neighborhood method combined with the TSDM algorithm, and a forecast method based on the dual mode set was eventually formed. In the calculation of the frequency closeness, the Goertzel algorithm was introduced to effectively decrease the amount of computation. A simulation test on the vibration data of the drum shaft base indicated that the health state of the device could be effectively distinguished. The algorithm has been successfully applied to equipment monitoring in the Huoer Xinhe Coal Mine of Shanxi Coal Imp&Exp. Group Co., Ltd.
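
    The abstract above names the Goertzel algorithm as the way the frequency computation is kept cheap. As a minimal sketch only (not the authors' implementation), the standard Goertzel recurrence evaluates the power of a single DFT bin without computing a full FFT. The function name `goertzel_power`, the sample rate and the test frequencies below are illustrative assumptions, not values from the paper.

```python
import math


def goertzel_power(samples, sample_rate, target_freq):
    """Squared magnitude of the DFT bin nearest to target_freq (hypothetical helper)."""
    n = len(samples)
    k = round(n * target_freq / sample_rate)  # index of the nearest DFT bin
    omega = 2.0 * math.pi * k / n
    coeff = 2.0 * math.cos(omega)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        # Second-order recurrence: one multiply and two adds per sample.
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


if __name__ == "__main__":
    # Assumed test signal: a 50 Hz sine sampled at 1 kHz for 200 samples.
    fs, n = 1000, 200
    signal = [math.sin(2.0 * math.pi * 50 * t / fs) for t in range(n)]
    print(goertzel_power(signal, fs, 50))   # large: energy sits in this bin
    print(goertzel_power(signal, fs, 120))  # close to zero: no energy here
```

    Evaluating only the few bins of interest costs time proportional to the number of samples per bin, which is the usual reason to prefer this recurrence over a full transform when a handful of frequencies is monitored.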

  14. Monitoring, analyzing and simulating of spatial-temporal changes of landscape pattern over mining area

    Science.gov (United States)

    Liu, Pei; Han, Ruimei; Wang, Shuangting

    2014-11-01

    Building on the strengths of remotely sensed data in depicting regional land cover and land changes, multi-objective information processing is applied to remote sensing images to analyze and simulate land cover in mining areas. In this paper, multi-temporal remotely sensed data were selected to monitor the pattern, distribution and trend of LUCC and to predict its impacts on the ecological environment and human settlement in a mining area. The monitoring, analysis and simulation of LUCC in this coal mining area are divided into five steps: information integration of optical and SAR data, LULC type extraction with an SVM classifier, LULC trend simulation with a CA Markov model, landscape temporal change monitoring and analysis with confusion matrices and landscape indices. The results demonstrate that the improved data fusion algorithm could make full use of information extracted from optical and SAR data; the SVM classifier has an efficient and stable ability to obtain land cover maps, which could provide a good basis for both land cover change analysis and trend simulation; the CA Markov model is able to predict LULC trends with good performance, and it is an effective way to integrate remotely sensed data with a spatial-temporal model for analysis of land use/cover change and corresponding environmental impacts in mining areas. Evaluation and analysis combining confusion matrices with landscape indices show that there was a sustained downward trend in agricultural land and bare land but a continuous growth trend in water bodies, forest and other lands, while building area showed a wave-like change, first increasing and then decreasing; the mining landscape has undergone a process of fragmentation from small to large and then from large to small; agricultural land is the most strongly influenced landscape type in this area, and human activities are the primary cause, so the problem deserves more attention from government and other organizations.

  15. A MapReduce-Based Parallel Frequent Pattern Growth Algorithm for Spatiotemporal Association Analysis of Mobile Trajectory Big Data

    Directory of Open Access Journals (Sweden)

    Dawen Xia

    2018-01-01

    Full Text Available Frequent pattern mining is an effective approach for spatiotemporal association analysis of mobile trajectory big data in data-driven intelligent transportation systems. While existing parallel algorithms have been successfully applied to frequent pattern mining of large-scale trajectory data, two major challenges are how to overcome the inherent defects of Hadoop to cope with taxi trajectory big data including massive small files and how to discover the implicitly spatiotemporal frequent patterns with MapReduce. To conquer these challenges, this paper presents a MapReduce-based Parallel Frequent Pattern growth (MR-PFP) algorithm to analyze the spatiotemporal characteristics of taxi operating using large-scale taxi trajectories with massive small file processing strategies on a Hadoop platform. More specifically, we first implement three methods, that is, Hadoop Archives (HAR), CombineFileInputFormat (CFIF), and Sequence Files (SF), to overcome the existing defects of Hadoop and then propose two strategies based on their performance evaluations. Next, we incorporate SF into Frequent Pattern growth (FP-growth) algorithm and then implement the optimized FP-growth algorithm on a MapReduce framework. Finally, we analyze the characteristics of taxi operating in both spatial and temporal dimensions by MR-PFP in parallel. The results demonstrate that MR-PFP is superior to existing Parallel FP-growth (PFP) algorithm in efficiency and scalability.

  16. Historical Feature Pattern Extraction Based Network Attack Situation Sensing Algorithm

    OpenAIRE

    Zeng, Yong; Liu, Dacheng; Lei, Zhou

    2014-01-01

    The situation sequence contains a series of complicated and multivariate random trends, which are very sudden, uncertain, and difficult to recognize and describe its principle by traditional algorithms. To solve the above questions, estimating parameters of super long situation sequence is essential, but very difficult, so this paper proposes a situation prediction method based on historical feature pattern extraction (HFPE). First, HFPE algorithm seeks similar indications from the history si...

  17. Mining Spatiotemporal Patterns of the Elder's Daily Movement

    Science.gov (United States)

    Chen, C. R.; Chen, C. F.; Liu, M. E.; Tsai, S. J.; Son, N. T.; Kinh, L. V.

    2016-06-01

    With rapid developments in wearable device technology, a vast amount of spatiotemporal data, such as people's movement and physical activities, is generated. Information derived from the data reveals important knowledge that can contribute to long-term care and psychological assessment of elders' living conditions, especially in long-term care institutions. This study aims to develop a method to investigate the spatial-temporal movement patterns of elders using their outdoor trajectory information. To achieve the goal, GPS-based location data of elderly subjects from long-term care institutions are collected and analysed with a geographic information system (GIS). A GIS statistical model is developed to mine the elderly subjects' spatiotemporal patterns from the location data and represent their daily movement pattern at particular times. The proposed method first finds the meaningful trajectory and extracts the frequent patterns from the time-stamped location data. Then, a density-based clustering method is used to identify the major moving range and the gather/stay hotspots in both spatial and temporal dimensions. The preliminary results indicate that the major moving area of the elderly people encompasses their dorm, and that they move only short distances and often stay at the same site. Subjects' outdoor appearances correspond to their daily routines. The results can be useful for understanding elders' social network construction, risky area identification and medical care monitoring.

  18. Distributed Storage Algorithm for Geospatial Image Data Based on Data Access Patterns.

    Directory of Open Access Journals (Sweden)

    Shaoming Pan

    Full Text Available Declustering techniques are widely used in distributed environments to reduce query response time through parallel I/O by splitting large files into several small blocks and then distributing those blocks among multiple storage nodes. Unfortunately, however, many small geospatial image data files cannot be further split for distributed storage. In this paper, we propose a complete theoretical system for the distributed storage of small geospatial image data files based on mining the access patterns of geospatial image data using their historical access log information. First, an algorithm is developed to construct an access correlation matrix based on the analysis of the log information, which reveals the patterns of access to the geospatial image data. Then, a practical heuristic algorithm is developed to determine a reasonable solution based on the access correlation matrix. Finally, a number of comparative experiments are presented, demonstrating that our algorithm displays a higher total parallel access probability than those of other algorithms by approximately 10-15% and that the performance can be further improved by more than 20% by simultaneously applying a copy storage strategy. These experiments show that the algorithm can be applied in distributed environments to help realize parallel I/O and thereby improve system performance.

  19. Distributed Storage Algorithm for Geospatial Image Data Based on Data Access Patterns.

    Science.gov (United States)

    Pan, Shaoming; Li, Yongkai; Xu, Zhengquan; Chong, Yanwen

    2015-01-01

    Declustering techniques are widely used in distributed environments to reduce query response time through parallel I/O by splitting large files into several small blocks and then distributing those blocks among multiple storage nodes. Unfortunately, however, many small geospatial image data files cannot be further split for distributed storage. In this paper, we propose a complete theoretical system for the distributed storage of small geospatial image data files based on mining the access patterns of geospatial image data using their historical access log information. First, an algorithm is developed to construct an access correlation matrix based on the analysis of the log information, which reveals the patterns of access to the geospatial image data. Then, a practical heuristic algorithm is developed to determine a reasonable solution based on the access correlation matrix. Finally, a number of comparative experiments are presented, demonstrating that our algorithm displays a higher total parallel access probability than those of other algorithms by approximately 10-15% and that the performance can be further improved by more than 20% by simultaneously applying a copy storage strategy. These experiments show that the algorithm can be applied in distributed environments to help realize parallel I/O and thereby improve system performance.
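
    The two steps described above (an access correlation matrix built from historical logs, then a heuristic placement) can be illustrated with a small sketch. This is a hedged illustration under assumptions, not the authors' algorithm: the log is assumed to be already grouped into sessions of file identifiers, correlation is taken to be a simple co-access count, and the greedy rule places each file on the node where it is least correlated with the files already stored there. The names `access_correlation` and `greedy_placement` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def access_correlation(sessions):
    """Count how often each pair of files appears in the same session."""
    corr = defaultdict(int)
    for session in sessions:
        for a, b in combinations(sorted(set(session)), 2):
            corr[(a, b)] += 1
    return dict(corr)


def greedy_placement(files, corr, num_nodes):
    """Spread co-accessed files over different nodes (assumed heuristic)."""
    nodes = [[] for _ in range(num_nodes)]

    def cost(f, node):
        # Total co-access count between f and the files already on this node.
        return sum(corr.get((min(f, g), max(f, g)), 0) for g in node)

    for f in files:
        # Prefer the lowest correlation cost, then the emptiest node, then the lowest index.
        best = min(range(num_nodes),
                   key=lambda i: (cost(f, nodes[i]), len(nodes[i]), i))
        nodes[best].append(f)
    return nodes


if __name__ == "__main__":
    # Made-up sessions of image tile identifiers.
    sessions = [["a", "b"], ["a", "b"], ["b", "c"], ["a", "b", "d"]]
    corr = access_correlation(sessions)
    print(greedy_placement(["a", "b", "c", "d"], corr, num_nodes=2))
```

    Placing strongly co-accessed files on different nodes is what lets a single request be served by several nodes in parallel; the tie-breaking on node size is only there to keep the sketch deterministic and roughly balanced.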

  20. Genetic Algorithm Calibration of Probabilistic Cellular Automata for Modeling Mining Permit Activity

    Science.gov (United States)

    Louis, S.J.; Raines, G.L.

    2003-01-01

    We use a genetic algorithm to calibrate a spatially and temporally resolved cellular automaton to model mining activity on public land in Idaho and western Montana. The genetic algorithm searches through a space of transition rule parameters of a two-dimensional cellular automaton model to find rule parameters that fit observed mining activity data. Previous work by one of the authors in calibrating the cellular automaton took weeks; the genetic algorithm takes a day and produces rules leading to about the same (or better) fit to observed data. These preliminary results indicate that genetic algorithms are a viable tool in calibrating cellular automata for this application. Experience gained during the calibration of this cellular automaton suggests that mineral resource information is a critical factor in the quality of the results. With automated calibration, further refinements of how the mineral-resource information is provided to the cellular automaton will probably improve our model.

  1. Mining association patterns of drug-interactions using post marketing FDA's spontaneous reporting data.

    Science.gov (United States)

    Ibrahim, Heba; Saad, Amr; Abdo, Amany; Sharaf Eldin, A

    2016-04-01

    Pharmacovigilance (PhV) is an important clinical activity with strong implications for population health and clinical research. The main goal of PhV is the timely detection of adverse drug events (ADEs) that are novel in their clinical nature, severity and/or frequency. Drug interactions (DI) pose an important problem in the development of new drugs and post marketing PhV that contribute to 6-30% of all unexpected ADEs. Therefore, the early detection of DI is vital. Spontaneous reporting systems (SRS) have served as the core data collection system for post marketing PhV since the 1960s. The main objective of our study was to particularly identify signals of DI from SRS. In addition, we are presenting an optimized tailored mining algorithm called "hybrid Apriori". The proposed algorithm is based on an optimized and modified association rule mining (ARM) approach. A hybrid Apriori algorithm has been applied to the SRS of the United States Food and Drug Administration's (U.S. FDA) adverse events reporting system (FAERS) in order to extract significant association patterns of drug interaction-adverse event (DIAE). We have assessed the resulting DIAEs qualitatively and quantitatively using two different triage features: a three-element taxonomy and three performance metrics. These features were applied on two random samples of 100 interacting and 100 non-interacting DIAE patterns. Additionally, we have employed logistic regression (LR) statistic method to quantify the magnitude and direction of interactions in order to test for confounding by co-medication in unknown interacting DIAE patterns. Hybrid Apriori extracted 2933 interacting DIAE patterns (including 1256 serious ones) and 530 non-interacting DIAE patterns. Referring to the current knowledge using four different reliable resources of DI, the results showed that the proposed method can extract signals of serious interacting DIAEs. 
Various association patterns could be identified based on the relationships among
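
    The record above builds on association rule mining, whose basic measures are support and confidence. The sketch below shows only those two textbook measures for a rule of the form {drug, drug} -> adverse event; it is not the "hybrid Apriori" algorithm of the paper, and the report contents and the names `support` and `confidence` are made up for illustration.

```python
def support(reports, itemset):
    """Fraction of reports that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= report for report in reports) / len(reports)


def confidence(reports, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the reports."""
    base = support(reports, antecedent)
    if base == 0:
        return 0.0
    return support(reports, set(antecedent) | set(consequent)) / base


if __name__ == "__main__":
    # Hypothetical spontaneous reports: each is the set of drugs and events it mentions.
    reports = [
        {"drugA", "drugB", "rash"},
        {"drugA", "drugB", "rash"},
        {"drugA", "drugB"},
        {"drugA", "nausea"},
    ]
    print(support(reports, {"drugA", "drugB", "rash"}))
    print(confidence(reports, {"drugA", "drugB"}, {"rash"}))
```

    Apriori-style miners keep only itemsets whose support exceeds a threshold before rules are scored, which is why both measures are usually computed together.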

  2. Formulations and algorithms for problems on rock mass and support deformation during mining

    Science.gov (United States)

    Seryakov, VM

    2018-03-01

    An analysis of the problem formulations used in rock mechanics to calculate the stress-strain state of mine support and the surrounding rock mass shows that such formulations incompletely describe the mechanical features of joint deformation in the rock mass–support system. The present paper proposes an algorithm that takes into account the actual conditions of rock mass and support interaction, and a method of implementing the algorithm that ensures efficient calculation of stresses in rocks and support.

  3. A study of the Bienstock-Zuckerberg algorithm, Applications in Mining and Resource Constrained Project Scheduling

    OpenAIRE

    Muñoz, Gonzalo; Espinoza, Daniel; Goycoolea, Marcos; Moreno, Eduardo; Queyranne, Maurice; Rivera, Orlando

    2016-01-01

    We study a Lagrangian decomposition algorithm recently proposed by Dan Bienstock and Mark Zuckerberg for solving the LP relaxation of a class of open pit mine project scheduling problems. In this study we show that the Bienstock-Zuckerberg (BZ) algorithm can be used to solve LP relaxations corresponding to a much broader class of scheduling problems, including the well-known Resource Constrained Project Scheduling Problem (RCPSP), and multi-modal variants of the RCPSP that consider batch proc...

  4. Algorithmic acquisition of diagnostic patterns in district heating billing system

    International Nuclear Information System (INIS)

    Kiluk, Sebastian

    2012-01-01

    An application of algorithmic exploration of billing data is examined for fault detection and diagnosis (FDD) based on evaluation of the present state and detection of unexpected changes in the energy efficiency of buildings. Large data sets from district heating (DH) billing systems are used for construction of the feature space, diagnostic rules and classification of the buildings according to their energy efficiency properties. The algorithmic approach automates the discovery of knowledge about common, and thus accepted, changes in buildings’ properties, in equipment and in inhabitants’ behavior reflecting progress in technology and lifestyle. In this article, the implementation of a Data Mining and Knowledge Discovery (DMKD) method in a supervision system is presented, with exemplary results based on real data. Crucial steps of data processing influencing diagnostic results are described in detail.

  5. Effective Application of Improved Profit-Mining Algorithm for the Interday Trading Model

    Directory of Open Access Journals (Sweden)

    Yu-Lung Hsieh

    2014-01-01

    Full Text Available Many real world applications of association rule mining from large databases help users make better decisions. However, they do not work well in financial markets at this time. In addition to a high profit, an investor also looks for a low risk trading with a better rate of winning. The traditional approach of using minimum confidence and support thresholds needs to be changed. Based on an interday model of trading, we proposed effective profit-mining algorithms which provide investors with profit rules including information about profit, risk, and winning rate. Since profit-mining in the financial market is still in its infant stage, it is important to detail the inner working of mining algorithms and illustrate the best way to apply them. In this paper we go into details of our improved profit-mining algorithm and showcase effective applications with experiments using real world trading data. The results show that our approach is practical and effective with good performance for various datasets.

  6. Effective application of improved profit-mining algorithm for the interday trading model.

    Science.gov (United States)

    Hsieh, Yu-Lung; Yang, Don-Lin; Wu, Jungpin

    2014-01-01

    Many real world applications of association rule mining from large databases help users make better decisions. However, they do not work well in financial markets at this time. In addition to a high profit, an investor also looks for a low risk trading with a better rate of winning. The traditional approach of using minimum confidence and support thresholds needs to be changed. Based on an interday model of trading, we proposed effective profit-mining algorithms which provide investors with profit rules including information about profit, risk, and winning rate. Since profit-mining in the financial market is still in its infant stage, it is important to detail the inner working of mining algorithms and illustrate the best way to apply them. In this paper we go into details of our improved profit-mining algorithm and showcase effective applications with experiments using real world trading data. The results show that our approach is practical and effective with good performance for various datasets.

  7. An imperialist competitive algorithm for solving the production scheduling problem in open pit mine

    Directory of Open Access Journals (Sweden)

    Mojtaba Mokhtarian Asl

    2016-06-01

    Full Text Available Production scheduling (planning) of an open-pit mine is the procedure during which the rock blocks are assigned to different production periods in a way that the highest net present value of the project is achieved subject to operational constraints. The paper introduces a new and computationally less expensive meta-heuristic technique known as the imperialist competitive algorithm (ICA) for long-term production planning of open pit mines. The proposed algorithm modifies the original rules of the assimilation process. The ICA performance for different levels of the control factors has been studied and the results are presented. The results showed that ICA could be efficiently applied to the mine production planning problem.

  8. Assessment of the information content of patterns: an algorithm

    Science.gov (United States)

    Daemi, M. Farhang; Beurle, R. L.

    1991-12-01

    A preliminary investigation confirmed the possibility of assessing the translational and rotational information content of simple artificial images. The calculation is tedious, and for more realistic patterns it is essential to implement the method on a computer. This paper describes an algorithm developed for this purpose which confirms the results of the preliminary investigation. Use of the algorithm facilitates much more comprehensive analysis of the combined effect of continuous rotation and fine translation, and paves the way for analysis of more realistic patterns. Owing to the volume of calculation involved in these algorithms, extensive computing facilities were necessary. The major part of the work was carried out using an ICL 3900 series mainframe computer as well as other powerful workstations such as a RISC architecture MIPS machine.

  9. Data mining and Pattern Recognizing Models for Identifying Inherited Diseases: Challenges and Implications

    OpenAIRE

    Lahiru Iddamalgoda; Partha Sarathi Das; Achala Aponso; Vijayaraghava Seshadri Sundararajan; Prashanth Suravajhala; Jayaraman K Valadi

    2016-01-01

    Data mining and pattern recognition methods reveal interesting findings in genetic studies, especially on how genetic makeup is associated with inherited diseases. Although researchers have proposed various data mining models for biomedical approaches, there remains a challenge in accurately determining the responsible genetic factors for prioritizing the single nucleotide polymorphisms (SNP) associated with the disease. In this commentary, we review the state-of-art data mining and pattern r...

  10. Data Mining and Pattern Recognition Models for Identifying Inherited Diseases: Challenges and Implications

    OpenAIRE

    Iddamalgoda, Lahiru; Das, Partha S.; Aponso, Achala; Sundararajan, Vijayaraghava S.; Suravajhala, Prashanth; Valadi, Jayaraman K.

    2016-01-01

    Data mining and pattern recognition methods reveal interesting findings in genetic studies, especially on how the genetic makeup is associated with inherited diseases. Although researchers have proposed various data mining models for biomedical approaches, there remains a challenge in accurately prioritizing the single nucleotide polymorphisms (SNP) associated with the disease. In this commentary, we review the state-of-art data mining and pattern recognition models for identifying inherited ...

  11. pubmed.mineR: An R package with text-mining algorithms to ...

    Indian Academy of Sciences (India)

    2016-08-26

    Aug 26, 2016 ... Three case studies are presented, namely, 'Evolving role of diabetes educators', 'Cancer risk assessment' and 'Dynamic concepts on disease and comorbidity', to illustrate the use of pubmed.mineR. The package generally runs fast with small elapsed times in regular workstations even on large corpus ...

  12. Mining the Relationship between Spatial Mobility Patterns and POIs

    Directory of Open Access Journals (Sweden)

    Liping Huang

    2018-01-01

    Passengers move between urban places for diverse interests and drive metropolitan regions, as aggregations of urban places, to group into network communities. This paper aims to examine the relationship between the spatial patterns (represented by the network communities) of mobility flows and places of interest (POIs). Further, it intends to identify the categories of POIs that play the most significant role in shaping the spatial patterns of mobility flows. To achieve these purposes, we partition the study area into disjoint regions and construct a network with each partitioned region as a node and the connections between them as links weighted by the mobility flows. A community detection algorithm is applied to the network to discover spatial mobility patterns, and multiclass classification based on the logistic regression method is adopted to classify spatial communities featured by POIs. Taking the taxi systems of Shanghai and Beijing as examples, we detect spatial communities based on the movement strengths among regions. Then we investigate their correlations with POIs. We find that communities’ modularity correlates linearly with POIs; in particular, governments, hotels, and traffic facilities are of the most significance for generating the mobility patterns. This study can provide valuable insight into understanding the spatial mobility patterns from the perspective of POIs.
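
    The network construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the community detection algorithm, so the sketch only builds the flow-weighted region network and scores a given partition with the standard modularity measure. The function names, the trip format (pairs of region labels), and the choice to treat flows as undirected are all assumptions.

```python
from collections import defaultdict


def build_flow_network(trips):
    """Aggregate (origin_region, destination_region) trips into undirected edge weights."""
    weights = defaultdict(float)
    for origin, dest in trips:
        if origin == dest:
            continue  # assumption: trips that stay inside one region are ignored
        edge = tuple(sorted((origin, dest)))
        weights[edge] += 1.0
    return dict(weights)


def modularity(weights, community_of):
    """Modularity Q of a partition of a weighted undirected graph.

    Q = sum over communities c of  L_c / m - (d_c / (2 m))^2,
    where m is the total edge weight, L_c the weight inside c,
    and d_c the summed weighted degree of the nodes in c.
    """
    total = sum(weights.values())
    strength = defaultdict(float)   # weighted degree of each region
    internal = defaultdict(float)   # edge weight inside each community
    for (u, v), w in weights.items():
        strength[u] += w
        strength[v] += w
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += w
    community_strength = defaultdict(float)
    for node, s in strength.items():
        community_strength[community_of[node]] += s
    return sum(
        internal[c] / total - (community_strength[c] / (2 * total)) ** 2
        for c in community_strength
    )


# Hypothetical toy data: four regions, heavy flows A-B and C-D, one trip B-C.
trips = [("A", "B")] * 4 + [("C", "D")] * 4 + [("B", "C")]
network = build_flow_network(trips)
q_split = modularity(network, {"A": 0, "B": 0, "C": 1, "D": 1})
q_single = modularity(network, {"A": 0, "B": 0, "C": 0, "D": 0})
```

    In this toy case the two-community partition scores higher than the single-community one, which is the kind of signal a community detection algorithm would search for.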

  13. A Contextualized, Differential Sequence Mining Method to Derive Students' Learning Behavior Patterns

    Science.gov (United States)

    Kinnebrew, John S.; Loretz, Kirk M.; Biswas, Gautam

    2013-01-01

    Computer-based learning environments can produce a wealth of data on student learning interactions. This paper presents an exploratory data mining methodology for assessing and comparing students' learning behaviors from these interaction traces. The core algorithm employs a novel combination of sequence mining techniques to identify differentially…

  14. Finding occupational accident patterns in the extractive industry using a systematic data mining approach

    International Nuclear Information System (INIS)

    Silva, Joaquim F.; Jacinto, Celeste

    2012-01-01

    This paper deals with occupational accident patterns in the Portuguese Extractive Industry. It constitutes a significant advance in relation to a previous study made in 2008, both in terms of methodology and extended knowledge of the patterns’ details. This work uses more recent data (2005–2007) and this time the identification of the “typical accident” shifts from a bivariate to a multivariate pattern, to characterise the accident mechanisms more accurately. Instead of crossing only two variables (Deviation x Contact), the new methodology developed here uses data mining techniques to associate nine variables, through their categories, and to quantify the statistical cohesion of each pattern. The results confirmed the “typical accident” of the 2008 study, but went much further: they reveal three statistically significant patterns (the top-3 categories in frequency); moreover, each pattern now includes more variables (4–5 categories) and indicates their statistical cohesion. This approach allowed a more accurate view of reality, which is fundamental for risk management. The methodology is best suited for large groups, such as national Authorities, Insurers or Corporate Groups, to assist them in planning target-oriented safety strategies. Not least importantly, researchers can apply the same algorithm to other study areas, as it is restricted neither to accidents nor to safety.

  15. User Activity Recognition in Smart Homes Using Pattern Clustering Applied to Temporal ANN Algorithm.

    Science.gov (United States)

    Bourobou, Serge Thomas Mickala; Yoo, Younghwan

    2015-05-21

    This paper discusses the possibility of recognizing and predicting user activities in the IoT (Internet of Things) based smart environment. The activity recognition is usually done through two steps: activity pattern clustering and activity type decision. Although many related works have been suggested, they had some limited performance because they focused only on one part between the two steps. This paper tries to find the best combination of a pattern clustering method and an activity decision algorithm among various existing works. For the first step, in order to classify so varied and complex user activities, we use a relevant and efficient unsupervised learning method called the K-pattern clustering algorithm. In the second step, the training of smart environment for recognizing and predicting user activities inside his/her personal space is done by utilizing the artificial neural network based on the Allen's temporal relations. The experimental results show that our combined method provides the higher recognition accuracy for various activities, as compared with other data mining classification algorithms. Furthermore, it is more appropriate for a dynamic environment like an IoT based smart home.

  16. User Activity Recognition in Smart Homes Using Pattern Clustering Applied to Temporal ANN Algorithm

    Directory of Open Access Journals (Sweden)

    Serge Thomas Mickala Bourobou

    2015-05-01

    This paper discusses the possibility of recognizing and predicting user activities in the IoT (Internet of Things) based smart environment. The activity recognition is usually done through two steps: activity pattern clustering and activity type decision. Although many related works have been suggested, they had some limited performance because they focused only on one part between the two steps. This paper tries to find the best combination of a pattern clustering method and an activity decision algorithm among various existing works. For the first step, in order to classify so varied and complex user activities, we use a relevant and efficient unsupervised learning method called the K-pattern clustering algorithm. In the second step, the training of smart environment for recognizing and predicting user activities inside his/her personal space is done by utilizing the artificial neural network based on the Allen’s temporal relations. The experimental results show that our combined method provides the higher recognition accuracy for various activities, as compared with other data mining classification algorithms. Furthermore, it is more appropriate for a dynamic environment like an IoT based smart home.

  17. Differential harmony search algorithm to optimize PWRs loading pattern

    Energy Technology Data Exchange (ETDEWEB)

    Poursalehi, N., E-mail: npsalehi@yahoo.com [Engineering Department, Shahid Beheshti University, G.C, P.O.Box: 1983963113, Tehran (Iran, Islamic Republic of); Zolfaghari, A.; Minuchehr, A. [Engineering Department, Shahid Beheshti University, G.C, P.O.Box: 1983963113, Tehran (Iran, Islamic Republic of)

    2013-04-15

    Highlights: ► Exploiting the DHS algorithm in LP optimization reveals its flexibility, robustness and reliability. ► The outcome of our experiments with DHS shows that the search approaches the optimal LP quickly. ► On average, the final band width of DHS fitness values is narrow relative to HS and GHS. -- Abstract: The objective of this work is to develop a core loading optimization technique using the differential harmony search algorithm in the context of obtaining an optimal configuration of fuel assemblies in pressurized water reactors. To implement and evaluate the proposed technique, a differential harmony search nodal expansion package for 2-D geometry, DHSNEP-2D, is developed. The package includes two modules: the first module implements differential harmony search (DHS), and the second module is a nodal expansion code which solves the two-dimensional multi-group neutron diffusion equations using a fourth degree flux expansion with one node per fuel assembly. For evaluation of the DHS algorithm, the classical harmony search (HS) and global-best harmony search (GHS) algorithms are also included in DHSNEP-2D in order to compare the outcomes of the techniques. For this purpose, two PWR test cases have been investigated to demonstrate the capability of the DHS algorithm in obtaining a near optimal loading pattern. Results show that the convergence rate and execution times of DHS are quite promising and that the method is reliable for the fuel management operation. Moreover, numerical results show the good performance of DHS relative to other competitive algorithms such as the genetic algorithm (GA), classical harmony search (HS) and global-best harmony search (GHS) algorithms.

  18. Differential harmony search algorithm to optimize PWRs loading pattern

    International Nuclear Information System (INIS)

    Poursalehi, N.; Zolfaghari, A.; Minuchehr, A.

    2013-01-01

    Highlights: ► Exploiting the DHS algorithm in LP optimization reveals its flexibility, robustness and reliability. ► The outcome of our experiments with DHS shows that the search approaches the optimal LP quickly. ► On average, the final band width of DHS fitness values is narrow relative to HS and GHS. -- Abstract: The objective of this work is to develop a core loading optimization technique using the differential harmony search algorithm in the context of obtaining an optimal configuration of fuel assemblies in pressurized water reactors. To implement and evaluate the proposed technique, a differential harmony search nodal expansion package for 2-D geometry, DHSNEP-2D, is developed. The package includes two modules: the first module implements differential harmony search (DHS), and the second module is a nodal expansion code which solves the two-dimensional multi-group neutron diffusion equations using a fourth degree flux expansion with one node per fuel assembly. For evaluation of the DHS algorithm, the classical harmony search (HS) and global-best harmony search (GHS) algorithms are also included in DHSNEP-2D in order to compare the outcomes of the techniques. For this purpose, two PWR test cases have been investigated to demonstrate the capability of the DHS algorithm in obtaining a near optimal loading pattern. Results show that the convergence rate and execution times of DHS are quite promising and that the method is reliable for the fuel management operation. Moreover, numerical results show the good performance of DHS relative to other competitive algorithms such as the genetic algorithm (GA), classical harmony search (HS) and global-best harmony search (GHS) algorithms.

  19. Large Scale Frequent Pattern Mining using MPI One-Sided Model

    Energy Technology Data Exchange (ETDEWEB)

    Vishnu, Abhinav; Agarwal, Khushbu

    2015-09-08

    In this paper, we propose a work-stealing runtime, Library for Work Stealing (LibWS), using the MPI one-sided model for designing scalable FP-Growth, the de facto frequent pattern mining algorithm, on large scale systems. LibWS provides locality efficient and highly scalable work-stealing techniques for load balancing on a variety of data distributions. We also propose a novel communication algorithm for the FP-Growth data exchange phase, which reduces the communication complexity from the state-of-the-art O(p) to O(f + p/f) for p processes and f frequent attributed-ids. FP-Growth is implemented using LibWS and evaluated on several work distributions and support counts. An experimental evaluation of FP-Growth on LibWS using 4096 processes on an InfiniBand cluster demonstrates excellent efficiency for several work distributions (87% efficiency for Power-law and 91% for Poisson). The proposed distributed FP-Tree merging algorithm provides 38x communication speedup on 4096 cores.
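
    To get a feel for the two complexity expressions quoted above, the leading terms can be tabulated for p = 4096 processes. This is only an arithmetic illustration: big-O notation hides constant factors, the values of f below are hypothetical, and the helper name is an assumption, so the numbers say nothing about measured communication time.

```python
def exchange_terms(p, f):
    """Return the leading terms p and f + p/f, ignoring big-O constants."""
    return p, f + p / f


# Hypothetical numbers of frequent ids; p matches the 4096 processes in the abstract.
for f in (8, 64, 512):
    baseline, proposed = exchange_terms(4096, f)
    print(f"f={f}: O(p) term = {baseline}, O(f + p/f) term = {proposed:g}")
```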

  20. Research reactor loading pattern optimization using estimation of distribution algorithms

    Energy Technology Data Exchange (ETDEWEB)

    Jiang, S. [Dept. of Earth Science and Engineering, Applied Modeling and Computation Group AMCG, Imperial College, London, SW7 2AZ (United Kingdom); Ziver, K. [Dept. of Earth Science and Engineering, Applied Modeling and Computation Group AMCG, Imperial College, London, SW7 2AZ (United Kingdom); AMCG Group, RM Consultants, Abingdon (United Kingdom); Carter, J. N.; Pain, C. C.; Eaton, M. D.; Goddard, A. J. H. [Dept. of Earth Science and Engineering, Applied Modeling and Computation Group AMCG, Imperial College, London, SW7 2AZ (United Kingdom); Franklin, S. J.; Phillips, H. J. [Imperial College, Reactor Centre, Silwood Park, Buckhurst Road, Ascot, Berkshire, SL5 7TE (United Kingdom)

    2006-07-01

    A new evolutionary search based approach for solving the nuclear reactor loading pattern optimization problems is presented based on the Estimation of Distribution Algorithms. The optimization technique developed is then applied to the maximization of the effective multiplication factor (K_eff) of the Imperial College CONSORT research reactor (the last remaining civilian research reactor in the United Kingdom). A new elitism-guided searching strategy has been developed and applied to improve the local convergence together with some problem-dependent information based on the 'stand-alone K_eff with fuel coupling calculations. A comparison study between the EDAs and a Genetic Algorithm with Heuristic Tie Breaking Crossover operator has shown that the new algorithm is efficient and robust. (authors)

  1. Research reactor loading pattern optimization using estimation of distribution algorithms

    International Nuclear Information System (INIS)

    Jiang, S.; Ziver, K.; Carter, J. N.; Pain, C. C.; Eaton, M. D.; Goddard, A. J. H.; Franklin, S. J.; Phillips, H. J.

    2006-01-01

    A new evolutionary search based approach for solving the nuclear reactor loading pattern optimization problems is presented based on the Estimation of Distribution Algorithms. The optimization technique developed is then applied to the maximization of the effective multiplication factor (K_eff) of the Imperial College CONSORT research reactor (the last remaining civilian research reactor in the United Kingdom). A new elitism-guided searching strategy has been developed and applied to improve the local convergence together with some problem-dependent information based on the 'stand-alone K_eff with fuel coupling calculations. A comparison study between the EDAs and a Genetic Algorithm with Heuristic Tie Breaking Crossover operator has shown that the new algorithm is efficient and robust. (authors)

  2. Historical feature pattern extraction based network attack situation sensing algorithm.

    Science.gov (United States)

    Zeng, Yong; Liu, Dacheng; Lei, Zhou

    2014-01-01

    The situation sequence contains a series of complicated and multivariate random trends, which are very sudden, uncertain, and difficult to recognize and describe its principle by traditional algorithms. To solve the above questions, estimating parameters of super long situation sequence is essential, but very difficult, so this paper proposes a situation prediction method based on historical feature pattern extraction (HFPE). First, HFPE algorithm seeks similar indications from the history situation sequence recorded and weighs the link intensity between occurred indication and subsequent effect. Then it calculates the probability that a certain effect reappears according to the current indication and makes a prediction after weighting. Meanwhile, HFPE method gives an evolution algorithm to derive the prediction deviation from the views of pattern and accuracy. This algorithm can continuously promote the adaptability of HFPE through gradual fine-tuning. The method preserves the rules in sequence at its best, does not need data preprocessing, and can track and adapt to the variation of situation sequence continuously.

  3. Historical Feature Pattern Extraction Based Network Attack Situation Sensing Algorithm

    Directory of Open Access Journals (Sweden)

    Yong Zeng

    2014-01-01

    The situation sequence contains a series of complicated and multivariate random trends, which are very sudden, uncertain, and difficult to recognize and describe its principle by traditional algorithms. To solve the above questions, estimating parameters of super long situation sequence is essential, but very difficult, so this paper proposes a situation prediction method based on historical feature pattern extraction (HFPE). First, HFPE algorithm seeks similar indications from the history situation sequence recorded and weighs the link intensity between occurred indication and subsequent effect. Then it calculates the probability that a certain effect reappears according to the current indication and makes a prediction after weighting. Meanwhile, HFPE method gives an evolution algorithm to derive the prediction deviation from the views of pattern and accuracy. This algorithm can continuously promote the adaptability of HFPE through gradual fine-tuning. The method preserves the rules in sequence at its best, does not need data preprocessing, and can track and adapt to the variation of situation sequence continuously.

  4. PWR loading pattern optimization using Harmony Search algorithm

    International Nuclear Information System (INIS)

    Poursalehi, N.; Zolfaghari, A.; Minuchehr, A.

    2013-01-01

    Highlights: ► Numerical results reveal that the HS method is reliable. ► The great advantage of HS is significant gain in computational cost. ► On the average, the final band width of search fitness values is narrow. ► Our experiments show that the search approaches the optimal value fast. - Abstract: In this paper a core reloading technique using Harmony Search, HS, is presented in the context of finding an optimal configuration of fuel assemblies, FA, in pressurized water reactors. To implement and evaluate the proposed technique a Harmony Search along Nodal Expansion Code for 2-D geometry, HSNEC2D, is developed to obtain nearly optimal arrangement of fuel assemblies in PWR cores. This code consists of two sections including Harmony Search algorithm and Nodal Expansion modules using fourth degree flux expansion which solves two dimensional-multi group diffusion equations with one node per fuel assembly. Two optimization test problems are investigated to demonstrate the HS algorithm capability in converging to near optimal loading pattern in the fuel management field and other subjects. Results, convergence rate and reliability of the method are quite promising and show the HS algorithm performs very well and is comparable to other competitive algorithms such as Genetic Algorithm and Particle Swarm Intelligence. Furthermore, implementation of nodal expansion technique along HS causes considerable reduction of computational time to process and analysis optimization in the core fuel management problems
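
    For readers unfamiliar with the method, the classical Harmony Search loop can be sketched as below. This is a generic textbook-style sketch and not the HSNEC2D code: it minimizes a toy continuous function rather than scoring a discrete fuel-assembly arrangement with a nodal expansion solver, and the function name, parameter names, and default values (memory size, HMCR, PAR, bandwidth) are assumptions chosen for illustration.

```python
import random


def harmony_search(objective, bounds, memory_size=10, hmcr=0.9, par=0.3,
                   bandwidth=0.05, iterations=2000, seed=0):
    """Minimize `objective` over box `bounds` with a basic Harmony Search."""
    rng = random.Random(seed)
    # Initialise the harmony memory with random candidate solutions.
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds]
              for _ in range(memory_size)]
    scores = [objective(h) for h in memory]
    for _ in range(iterations):
        candidate = []
        for d, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                # Memory consideration: reuse a value stored for this variable.
                value = memory[rng.randrange(memory_size)][d]
                if rng.random() < par:
                    # Pitch adjustment: small perturbation of the stored value.
                    value += rng.uniform(-1.0, 1.0) * bandwidth * (hi - lo)
            else:
                # Random selection from the allowed range.
                value = rng.uniform(lo, hi)
            candidate.append(min(hi, max(lo, value)))
        score = objective(candidate)
        # Replace the worst stored harmony if the new one is better.
        worst = max(range(memory_size), key=scores.__getitem__)
        if score < scores[worst]:
            memory[worst], scores[worst] = candidate, score
    best = min(range(memory_size), key=scores.__getitem__)
    return memory[best], scores[best]


def sphere(x):
    """Toy objective (an assumption): sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_score = harmony_search(sphere, [(-5.0, 5.0)] * 3)
```

    Applying the same loop to a loading pattern problem would require an encoding of fuel-assembly positions and a core-physics fitness evaluation, neither of which is specified in the abstract.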

  5. Data mining scenarios for the discovery of subtypes and the comparison of algorithms

    NARCIS (Netherlands)

    Colas, Fabrice Pierre Robert

    2009-01-01

    A data mining scenario is a logical sequence of steps to infer patterns from data. In this thesis, we present two scenarios. Our first scenario aims to identify homogeneous subtypes in data. It was applied to clinical research on Osteoarthritis (OA) and Parkinson’s disease (PD) and in drug

  6. Gas Emission Prediction Model of Coal Mine Based on CSBP Algorithm

    Directory of Open Access Journals (Sweden)

    Xiong Yan

    2016-01-01

    In view of the nonlinear characteristics of gas emission in a coal working face, a prediction method is proposed based on a cuckoo search algorithm optimized BP neural network (CSBP). In the CSBP algorithm, the cuckoo search is adopted to optimize the weight and threshold parameters of the BP network and obtain the global optimal solutions. Furthermore, the twelve main factors affecting the gas emission in the coal working face are taken as input vectors of the CSBP algorithm, the gas emission serves as the output vector, and then the prediction model of the BP neural network with optimal parameters is established. The results show that the CSBP algorithm has better generalization ability and higher prediction accuracy, and can be utilized effectively in the prediction of coal mine gas emission.

  7. Attribute Index and Uniform Design Based Multiobjective Association Rule Mining with Evolutionary Algorithm

    Directory of Open Access Journals (Sweden)

    Jie Zhang

    2013-01-01

    In association rule mining, evaluating an association rule requires repeatedly scanning the database to compare the whole database with the antecedent, the consequent, and the whole rule. In order to decrease the number of comparisons and the time consumption, we present an attribute index strategy. It only needs to scan the database once to create the attribute index of each attribute. After that, all metric values used to evaluate an association rule are obtained from the attribute indices alone, without scanning the database any further. The paper treats association rule mining as a multiobjective problem rather than a single objective one. In order to make the acquired solutions scatter uniformly toward the Pareto frontier in the objective space, elitism policy and uniform design are introduced. The paper presents the algorithm of attribute index and uniform design based multiobjective association rule mining with evolutionary algorithm, abbreviated as IUARMMEA. It no longer requires the user-specified minimum support and minimum confidence, but uses a simple attribute index. It uses a well-designed real encoding so as to extend its application scope. Experiments performed on several databases demonstrate that the proposed algorithm has excellent performance, and it can significantly reduce the number of comparisons and time consumption.

  8. Attribute index and uniform design based multiobjective association rule mining with evolutionary algorithm.

    Science.gov (United States)

    Zhang, Jie; Wang, Yuping; Feng, Junhong

    2013-01-01

    In association rule mining, evaluating an association rule requires repeatedly scanning the database to compare the whole database with the antecedent, the consequent, and the whole rule. In order to decrease the number of comparisons and the time consumption, we present an attribute index strategy. It only needs to scan the database once to create the attribute index of each attribute. After that, all metric values used to evaluate an association rule are obtained from the attribute indices alone, without scanning the database any further. The paper treats association rule mining as a multiobjective problem rather than a single objective one. In order to make the acquired solutions scatter uniformly toward the Pareto frontier in the objective space, elitism policy and uniform design are introduced. The paper presents the algorithm of attribute index and uniform design based multiobjective association rule mining with evolutionary algorithm, abbreviated as IUARMMEA. It no longer requires the user-specified minimum support and minimum confidence, but uses a simple attribute index. It uses a well-designed real encoding so as to extend its application scope. Experiments performed on several databases demonstrate that the proposed algorithm has excellent performance, and it can significantly reduce the number of comparisons and time consumption.
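
    The single-scan indexing idea can be illustrated with a small sketch. The abstract does not describe the exact index structure used by IUARMMEA, so this example assumes the simplest form: each item maps to the set of transaction ids that contain it, and support and confidence are then computed by set intersection. All function names and the toy transactions are hypothetical.

```python
from collections import defaultdict


def build_attribute_index(transactions):
    """Scan the database once; map each item to the ids of transactions containing it."""
    index = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            index[item].add(tid)
    return index


def support_count(index, itemset, n_transactions):
    """Count transactions containing every item, using only the index."""
    tids = set(range(n_transactions))
    for item in itemset:
        tids &= index.get(item, set())
    return len(tids)


def rule_metrics(index, antecedent, consequent, n_transactions):
    """Support and confidence of the rule antecedent -> consequent."""
    both = support_count(index, antecedent | consequent, n_transactions)
    ante = support_count(index, antecedent, n_transactions)
    support = both / n_transactions
    confidence = both / ante if ante else 0.0
    return support, confidence


# Hypothetical toy database of four transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
index = build_attribute_index(transactions)
support, confidence = rule_metrics(index, {"bread"}, {"milk"}, len(transactions))
```

    Once the index exists, evaluating any number of candidate rules touches only the per-item id sets, which is the property the abstract credits for the reduced number of comparisons.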

  9. Feature Reduction Based on Genetic Algorithm and Hybrid Model for Opinion Mining

    Directory of Open Access Journals (Sweden)

    P. Kalaivani

    2015-01-01

    With the rapid growth of websites and web forms, a large number of product reviews is available on the sites. An opinion mining system is needed to help people evaluate the emotions, opinions, attitudes, and behavior of others, which is used to make decisions based on user preference. In this paper, we propose an optimized feature reduction that incorporates an ensemble method of machine learning approaches and uses information gain and a genetic algorithm as feature reduction techniques. We conducted comparative experiments on a multidomain review dataset and a movie review dataset in opinion mining. The effectiveness of the single classifiers Naïve Bayes, logistic regression, and support vector machine and of the ensemble technique for opinion mining is compared on five datasets. The proposed hybrid method is evaluated, and experimental results show that information gain and the genetic algorithm with the ensemble technique perform better in terms of various measures for multidomain reviews and movie reviews. Classification algorithms are evaluated using McNemar’s test to compare the level of significance of the classifiers.

  10. A Partial Join Approach for Mining Co-Location Patterns: A Summary of Results

    National Research Council Canada - National Science Library

    Yoo, Jin S; Shekhar, Shashi

    2005-01-01

    .... They propose a novel partial-join approach for mining co-location patterns efficiently. It transactionizes continuous spatial data while keeping track of the spatial information not modeled by transactions...

  11. Plant succession patterns on residual open-pit gravel mines deposits Bogota

    OpenAIRE

    Ricardo A. Mora Goyes

    1999-01-01

    Based on both the study of the composition and structure of plant communities and the analysis of the physico-chemical characteristics of mining wastes, the initial patterns of primary succession were determined. These patterns were present in three deposits of waste material abandoned for 18, 36 and 120 months respectively. Such materials originated in open-pit gravel mines located to the south of Bogota (Colombia). This study aims to contribute to the knowledge of the mechanisms of ...

  12. Fringe pattern analysis for optical metrology theory, algorithms, and applications

    CERN Document Server

    Servin, Manuel; Padilla, Moises

    2014-01-01

    The main objective of this book is to present the basic theoretical principles and practical applications for the classical interferometric techniques and the most advanced methods in the field of modern fringe pattern analysis applied to optical metrology. A major novelty of this work is the presentation of a unified theoretical framework based on the Fourier description of phase shifting interferometry using the Frequency Transfer Function (FTF) along with the theory of Stochastic Process for the straightforward analysis and synthesis of phase shifting algorithms with desired properties such

  13. A hybrid GA-TS algorithm for open vehicle routing optimization of coal mines material

    Energy Technology Data Exchange (ETDEWEB)

    Yu, S.W.; Ding, C.; Zhu, K.J. [China University of Geoscience, Wuhan (China)

    2011-08-15

    In the open vehicle routing problem (OVRP), the objective is to minimize the number of vehicles and the total distance (or time) traveled. This study primarily focuses on solving an OVRP by applying a novel hybrid of the genetic algorithm and Tabu search (GA-TS), which combines the GA's parallel computing and global optimization with TS's Tabu search skill and fast local search. Firstly, the proposed algorithm uses natural number coding according to the customer demands and the capacity of the vehicle for global optimization. Secondly, individuals of the population undergo TS local search with a certain probability, namely, local routing optimization of all customer sites belonging to one vehicle. The mechanism not only improves the ability of global optimization, but also ensures the speed of operation. The algorithm was used in Zhengzhou Coal Mine and Power Supply Co., Ltd.'s transport vehicle routing optimization.

  14. Application of an empiric Bayesian data mining algorithm to reports of pancreatitis associated with atypical antipsychotics.

    Science.gov (United States)

    Hauben, Manfred

    2004-09-01

    To compare the results from one frequently cited data mining algorithm with those from a study, which was published in a peer-reviewed journal, that examined the association of pancreatitis with selected atypical antipsychotics observed by traditional rule-based methods of signal detection. Retrospective pharmacovigilance study. The widely studied data mining algorithm known as the Multi-item Gamma Poisson Shrinker (MGPS) was applied to adverse-event reports from the United States Food and Drug Administration's Adverse Event Reporting System database through the first quarter of 2003 for clozapine, olanzapine, and risperidone to determine if a significant signal of pancreatitis would have been generated by this method in advance of their review or the addition of these events to the respective product labels. Data mining was performed by using nine preferred terms relevant to drug-induced pancreatitis from the Medical Dictionary for Regulatory Activities (MedDRA). Results from a previous study on the antipsychotics were reviewed and analyzed. Physicians' Desk References (PDRs) starting from 1994 were manually reviewed to determine the first year that pancreatitis was listed as an adverse event in the product label for each antipsychotic. This information was used as a surrogate marker of the timing of initial signal detection by traditional criteria. Pancreatitis was listed as an adverse event in a PDR for all three atypical antipsychotics. Despite the presence of up to 88 reports/drug-event combination in the Food and Drug Administration's Adverse Event Reporting System database, the MGPS failed to generate a signal of disproportional reporting of pancreatitis associated with the three antipsychotics despite the signaling of these drug-event combinations by traditional rule-based methods, as reflected in product labeling and/or the literature. These discordant findings illustrate key principles in the application of data mining algorithms to drug safety

  15. Continuous firefly algorithm applied to PWR core pattern enhancement

    Energy Technology Data Exchange (ETDEWEB)

    Poursalehi, N., E-mail: npsalehi@yahoo.com [Engineering Department, Shahid Beheshti University, G.C., P.O. Box 1983963113, Tehran (Iran, Islamic Republic of); Zolfaghari, A.; Minuchehr, A.; Moghaddam, H.K. [Engineering Department, Shahid Beheshti University, G.C., P.O. Box 1983963113, Tehran (Iran, Islamic Republic of)

    2013-05-15

    Highlights: ► Numerical results indicate the reliability of CFA for the nuclear reactor LPO. ► The major advantages of CFA are its light computational cost and fast convergence. ► Our experiments demonstrate the ability of CFA to obtain the near optimal loading pattern. -- Abstract: In this research, the new meta-heuristic optimization strategy, firefly algorithm, is developed for the nuclear reactor loading pattern optimization problem. Two main goals in reactor core fuel management optimization are maximizing the core multiplication factor (K_eff) in order to extract the maximum cycle energy and minimizing the power peaking factor due to safety constraints. In this work, we define a multi-objective fitness function according to the above goals for the core fuel arrangement enhancement. In order to evaluate and demonstrate the ability of continuous firefly algorithm (CFA) to find the near optimal loading pattern, we developed CFA nodal expansion code (CFANEC) for the fuel management operation. This code consists of two main modules: a CFA optimization program and a core analysis code that implements the nodal expansion method on coarse meshes the size of fuel assemblies. First, CFA is applied to the Foxholes test case with continuous variables in order to validate CFA, and then to KWU PWR using a decoding strategy for discrete variables. Results indicate the efficiency and relatively fast convergence of CFA in obtaining near optimal loading pattern with respect to the considered fitness function. Finally, our experience with the CFA confirms that the CFA is easy to implement and reliable.
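
    As a rough illustration of the continuous firefly update that the abstract refers to, the sketch below minimizes a generic cost function. It is not the CFANEC code: the function name `firefly_minimize`, the parameters `beta0`, `gamma` and `alpha`, their default values, and the step-size decay are all assumptions chosen for the example, and the reactor fitness function is replaced by whatever `f` the caller supplies.

```python
import math
import random

def firefly_minimize(f, dim, bounds, n_fireflies=15, n_iter=60,
                     beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    """Minimize f over a box with a basic firefly scheme (illustrative only)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial population inside the box.
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_fireflies)]
    cost = [f(x) for x in pop]
    best_i = min(range(n_fireflies), key=cost.__getitem__)
    best_x, best_cost = list(pop[best_i]), cost[best_i]
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # firefly j is "brighter" (lower cost)
                    r2 = sum((a - b) ** 2 for a, b in zip(pop[i], pop[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness decays with distance
                    # Move i toward j, add a small random step, clip to the box.
                    pop[i] = [min(hi, max(lo, a + beta * (b - a)
                                          + alpha * (rng.random() - 0.5)))
                              for a, b in zip(pop[i], pop[j])]
                    cost[i] = f(pop[i])
                    if cost[i] < best_cost:
                        best_x, best_cost = list(pop[i]), cost[i]
        alpha *= 0.97  # assumed schedule: shrink the random step over time
    return best_x, best_cost
```

    For a discrete problem such as a loading pattern, a decoding step from the continuous position to a fuel arrangement would be needed before evaluating `f`; the abstract mentions such a strategy but does not describe it, so it is left out here.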

  16. Continuous firefly algorithm applied to PWR core pattern enhancement

    International Nuclear Information System (INIS)

    Poursalehi, N.; Zolfaghari, A.; Minuchehr, A.; Moghaddam, H.K.

    2013-01-01

    Highlights: ► Numerical results indicate the reliability of CFA for the nuclear reactor LPO. ► The major advantages of CFA are its light computational cost and fast convergence. ► Our experiments demonstrate the ability of CFA to obtain the near optimal loading pattern. -- Abstract: In this research, the new meta-heuristic optimization strategy, firefly algorithm, is developed for the nuclear reactor loading pattern optimization problem. Two main goals in reactor core fuel management optimization are maximizing the core multiplication factor (K_eff) in order to extract the maximum cycle energy and minimizing the power peaking factor due to safety constraints. In this work, we define a multi-objective fitness function according to the above goals for the core fuel arrangement enhancement. In order to evaluate and demonstrate the ability of continuous firefly algorithm (CFA) to find the near optimal loading pattern, we developed CFA nodal expansion code (CFANEC) for the fuel management operation. This code consists of two main modules: a CFA optimization program and a core analysis code that implements the nodal expansion method on coarse meshes the size of fuel assemblies. First, CFA is applied to the Foxholes test case with continuous variables in order to validate CFA, and then to KWU PWR using a decoding strategy for discrete variables. Results indicate the efficiency and relatively fast convergence of CFA in obtaining near optimal loading pattern with respect to the considered fitness function. Finally, our experience with the CFA confirms that the CFA is easy to implement and reliable.

  17. Multilevel Association Rule Mining for Bridge Resource Management Based on Immune Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    Yang Ou

    2014-01-01

    This paper is concerned with the problem of multilevel association rule mining for bridge resource management (BRM), which was announced by IMO in 2010. The goal of this paper is to mine the association rules among the items of BRM and the vessel accidents. However, because only indirect data can be collected, which seem useless for analyzing the relationship between items of BRM and the accidents, cross-level association rules need to be studied to build the relation between the indirect data and items of BRM. In this paper, firstly, a cross-level coding scheme for mining the multilevel association rules is proposed. Secondly, we execute the immune genetic algorithm with the coding scheme for analyzing BRM. Thirdly, based on the basic maritime investigation reports, some important association rules of the items of BRM are mined and studied. Finally, according to the results of the analysis, we provide suggestions for seafarer training, assessment, and management.

  18. Urinary metabolic profiling of asymptomatic acute intermittent porphyria using a rule-mining-based algorithm.

    Science.gov (United States)

    Luck, Margaux; Schmitt, Caroline; Talbi, Neila; Gouya, Laurent; Caradeuc, Cédric; Puy, Hervé; Bertho, Gildas; Pallet, Nicolas

    2018-01-01

    Metabolomic profiling combines Nuclear Magnetic Resonance spectroscopy with supervised statistical analysis that may allow a better understanding of the mechanisms of a disease. In this study, the urinary metabolic profiling of individuals with porphyrias was performed to predict different types of disease, and to propose new pathophysiological hypotheses. Urine 1H-NMR spectra of 73 patients with asymptomatic acute intermittent porphyria (aAIP) and familial or sporadic porphyria cutanea tarda (f/sPCT) were compared using a supervised rule-mining algorithm. NMR spectrum buckets (bins), corresponding to rules, were extracted and a logistic regression was trained. The results generated by our rule-mining algorithm were consistent with those obtained using partial least square discriminant analysis (PLS-DA), and the predictive performance of the model was significant. Buckets that were identified by the algorithm corresponded to metabolites involved in glycolysis and energy-conversion pathways, notably acetate, citrate, and pyruvate, which were found in higher concentrations in the urines of aAIP compared with PCT patients. Metabolic profiling did not discriminate sPCT from fPCT patients. These results suggest that metabolic reprogramming occurs in aAIP individuals, even in the absence of overt symptoms, and support the relationship that occurs between heme synthesis and mitochondrial energetic metabolism.

  19. Fast Adapting Ensemble: A New Algorithm for Mining Data Streams with Concept Drift

    Science.gov (United States)

    Ortíz Díaz, Agustín; Ramos-Jiménez, Gonzalo; Frías Blanco, Isvani; Caballero Mota, Yailé; Morales-Bueno, Rafael

    2015-01-01

    The treatment of large data streams in the presence of concept drifts is one of the main challenges in the field of data mining, particularly when the algorithms have to deal with concepts that disappear and then reappear. This paper presents a new algorithm, called Fast Adapting Ensemble (FAE), which adapts very quickly to both abrupt and gradual concept drifts, and has been specifically designed to deal with recurring concepts. FAE processes the learning examples in blocks of the same size, but it does not have to wait for the batch to be complete in order to adapt its base classification mechanism. FAE incorporates a drift detector to improve the handling of abrupt concept drifts and stores a set of inactive classifiers that represent old concepts, which are activated very quickly when these concepts reappear. We compare our new algorithm with various well-known learning algorithms on common benchmark datasets. The experiments show promising results from the proposed algorithm (regarding accuracy and runtime) in handling different types of concept drifts. PMID:25879051

  20. Mining the multigroup-discrete ordinates algorithm for high quality solutions

    International Nuclear Information System (INIS)

    Ganapol, B.D.; Kornreich, D.E.

    2005-01-01

    A novel approach to the numerical solution of the neutron transport equation via the discrete ordinates (SN) method is presented. The new technique is referred to as 'mining' low order (SN) numerical solutions to obtain high order accuracy. The new numerical method, called the Multigroup Converged SN (MGCSN) algorithm, is a combination of several sequence accelerators: Romberg and Wynn-epsilon. The extreme accuracy obtained by the method is demonstrated through self consistency and comparison to the independent semi-analytical benchmark BLUE. (authors)
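
    The abstract names Wynn-epsilon as one of the sequence accelerators but gives no details of how MGCSN combines them. As a generic illustration only, the sketch below applies the scalar Wynn epsilon recursion to a slowly converging sequence; the function names and the ln 2 test series are assumptions made for the example and have nothing to do with transport solutions.

```python
import math

def wynn_epsilon(seq):
    """Return an accelerated limit estimate for a non-empty scalar sequence."""
    n = len(seq)
    prev = [0.0] * (n + 1)      # column eps_{-1}: all zeros
    curr = [float(s) for s in seq]  # column eps_0: the raw sequence
    best = curr[-1]
    for k in range(1, n):
        nxt = []
        for j in range(len(curr) - 1):
            diff = curr[j + 1] - curr[j]
            if diff == 0.0:
                return best     # neighbours coincide; stop with the last estimate
            # eps_k^(j) = eps_{k-2}^(j+1) + 1 / (eps_{k-1}^(j+1) - eps_{k-1}^(j))
            nxt.append(prev[j + 1] + 1.0 / diff)
        prev, curr = curr, nxt
        if k % 2 == 0:          # only even columns approximate the limit
            best = curr[-1]
    return best

def ln2_partial_sums(n_terms):
    """Partial sums of 1 - 1/2 + 1/3 - ..., which converge slowly to ln 2."""
    sums, s = [], 0.0
    for n in range(1, n_terms + 1):
        s += (-1) ** (n + 1) / n
        sums.append(s)
    return sums
```

    With eleven partial sums the raw sequence is still off in the second decimal place, while `wynn_epsilon(ln2_partial_sums(11))` lands much closer to `math.log(2)`, which is the sense in which low-order results can be "mined" for higher accuracy.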

  1. Mining Long, Sharable Patterns in Trajectories of Moving Objects

    DEFF Research Database (Denmark)

    Gidofalvi, Gyozo; Pedersen, Torben Bach

    2009-01-01

    The efficient analysis of spatio-temporal data, generated by moving objects, is an essential requirement for intelligent location-based services. Spatio-temporal rules can be found by constructing spatio-temporal baskets, from which traditional association rule mining methods can discover spatio...

  2. The Determination of Children's Knowledge of Global Lunar Patterns from Online Essays Using Text Mining Analysis

    Science.gov (United States)

    Cheon, Jongpil; Lee, Sangno; Smith, Walter; Song, Jaeki; Kim, Yongjin

    2013-01-01

    The purpose of this study was to use text mining analysis of early adolescents' online essays to determine their knowledge of global lunar patterns. Australian and American students in grades five to seven wrote about global lunar patterns they had discovered by sharing observations with each other via the Internet. These essays were analyzed for…

  3. Calibration of Mine Ventilation Network Models Using the Non-Linear Optimization Algorithm

    Directory of Open Access Journals (Sweden)

    Guang Xu

    2017-12-01

    Effective ventilation planning is vital to underground mining. To ensure stable operation of the ventilation system and to avoid airflow disorder, mine ventilation network (MVN) models have been widely used in simulating and optimizing the mine ventilation system. However, one of the challenges for MVN model simulation is that the simulated airflow distribution results do not match the measured data. To solve this problem, a simple and effective calibration method is proposed based on the non-linear optimization algorithm. The calibrated model not only brings the simulated airflow distribution results into accordance with the on-site measured data, but also controls the errors of other parameters within a minimum range. The proposed method was then applied to calibrate an MVN model in a real case, which was built based on ventilation survey results and Ventsim software. Finally, airflow simulation experiments were carried out using data before and after calibration, and the results were compared and analyzed. This showed that the simulated airflows in the calibrated model agreed much better with the ventilation survey data, which verifies the effectiveness of the calibration method.

  4. An overview of data mining algorithms in drug induced toxicity prediction.

    Science.gov (United States)

    Omer, Ankur; Singh, Poonam; Yadav, N K; Singh, R K

    2014-04-01

    The growth in chemical diversity has increased the need to assess the toxicity of different chemical compounds, raising the demand for animal testing. Toxicity evaluation is a time-consuming and expensive undertaking, which limits the methods available for screening chemicals and points towards the need to develop more efficient toxicity assessment systems. Computational approaches have reduced the time as well as the cost of evaluating the toxicity and kinetic behavior of any chemical. The accessibility of a large amount of data and the intense need to turn this data into useful information have drawn attention towards data mining. Machine learning, one of the powerful data mining techniques, has evolved as the most effective and potent tool for exploring new insights into combinatorial relationships among the various experimental data generated. The article reviews some sophisticated machine learning algorithms such as Artificial Neural Networks (ANN), Support Vector Machine (SVM), k-means clustering and Self Organizing Maps (SOM), along with some of the available tools used for classification, sorting and toxicological evaluation of data, clarifying how data mining and machine learning interact cooperatively to facilitate knowledge discovery. Addressing the association of some commonly used expert systems, we briefly outline some real world applications to consider the crucial role of data set partitioning.

  5. Mining known attack patterns from security-related events

    Directory of Open Access Journals (Sweden)

    Nicandro Scarabeo

    2015-10-01

    Managed Security Services (MSS) have become an essential asset for companies to have in order to protect their infrastructure from hacking attempts such as unauthorized behaviour, denial of service (DoS), malware propagation, and anomalies. A proliferation of attacks has created the need to install more network probes and collect more security-related events in order to assure the best coverage, which is necessary for generating incident responses. The increase in the volume of data to analyse has created a demand for specific tools that automatically correlate events and gather them in pre-defined scenarios of attacks. Motivated by Above Security, a specialized company in the sector, and by National Research Council Canada (NRC), we propose a new data mining system that employs text mining techniques to dynamically relate security-related events in order to reduce analysis time, increase the quality of the reports, and automatically build correlated scenarios.

  6. Trace element patterns in lichens following uranium mine closures

    International Nuclear Information System (INIS)

    Fahselt, D.; Wu, T.W.; Mott, B.

    1995-01-01

    Instrumental neutron activation analysis was used to determine trace elements in Cladina mitis (Sandst.) Hale & Culb. along transects extending from uranium mines at Elliot Lake and Agnew Lake in central Ontario, Canada. Levels of 11 elements were reported and the presence of uranium (U) was confirmed, although U concentrations were much less than in Cladina rangiferina 10 years earlier. Among the elements identified in lichen thalli was Th, which occurred in higher concentrations than U. All trace elements, including the two radionuclides, were found in deteriorating thallus parts as well as living podetia, and five of these seem to have originated as airborne particulates from minesites. In spite of mine closures, levels of Th and U remained higher near sources of ore dust and there was little relationship between radionuclide concentrations in thallus and substrate. 24 refs., 4 figs., 3 tabs

  7. Monitoring coal mine changes and their impact on landscape patterns in an alpine region: a case study of the Muli coal mine in the Qinghai-Tibet Plateau.

    Science.gov (United States)

    Qian, Dawen; Yan, Changzhen; Xing, Zanpin; Xiu, Lina

    2017-10-14

    The Muli coal mine is the largest open-cast coal mine in the Qinghai-Tibet Plateau, and it consists of two independent mining sites named Juhugeng and Jiangcang. It has received much attention due to the ecological problems caused by rapid expansion in recent years. The objective of this paper was to monitor the mining area and its surrounding land cover over the period 1976-2016 utilizing Landsat images, and the network structure of land cover changes was determined to visualize the relationships and pattern of the mining-induced land cover changes. In addition, the responses of the surrounding landscape pattern were analysed by constructing gradient transects. The results show that the mining area was increasing in size, especially after 2000 (increased by 71.68 km²), and this caused shrinkage of the surrounding lands, including alpine meadow wetland (53.44 km²), alpine meadow (6.28 km²) and water (6.24 km²). The network structure of the mining area revealed the changes in lands surrounding the mining area. The impact of mining development on landscape patterns was mainly distributed within a range of 1-6 km. Alpine meadow wetland was most affected in Juhugeng, while alpine meadow was most affected in Jiangcang. The results of this study provide a reference for the ecological assessment and restoration of the Muli coal mine land.

  8. Optimizing Fukushima Emissions Through Pattern Matching and Genetic Algorithms

    Science.gov (United States)

    Lucas, D. D.; Simpson, M. D.; Philip, C. S.; Baskett, R.

    2017-12-01

    Hazardous conditions during the Fukushima Daiichi nuclear power plant (NPP) accident hindered direct observations of the emissions of radioactive materials into the atmosphere. A wide range of emissions are estimated from bottom-up studies using reactor inventories and top-down approaches based on inverse modeling. We present a new inverse modeling estimate of cesium-137 emitted from the Fukushima NPP. Our estimate considers weather uncertainty through a large ensemble of Weather Research and Forecasting model simulations and uses the FLEXPART atmospheric dispersion model to transport and deposit cesium. The simulations are constrained by observations of the spatial distribution of cumulative cesium deposited on the surface of Japan through April 2, 2012. Multiple spatial metrics are used to quantify differences between observed and simulated deposition patterns. In order to match the observed pattern, we use a multi-objective genetic algorithm to optimize the time-varying emissions. We find that large differences with published bottom-up estimates are required to explain the observations. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

  9. General asymmetric neural networks and structure design by genetic algorithms: A learning rule for temporal patterns

    Energy Technology Data Exchange (ETDEWEB)

    Bornholdt, S. [Heidelberg Univ., (Germany). Inst., fuer Theoretische Physik; Graudenz, D. [Lawrence Berkeley Lab., CA (United States)

    1993-07-01

    A learning algorithm based on genetic algorithms for asymmetric neural networks with an arbitrary structure is presented. It is suited for the learning of temporal patterns and leads to stable neural networks with feedback.

  10. General asymmetric neural networks and structure design by genetic algorithms: A learning rule for temporal patterns

    International Nuclear Information System (INIS)

    Bornholdt, S.

    1993-07-01

    A learning algorithm based on genetic algorithms for asymmetric neural networks with an arbitrary structure is presented. It is suited for the learning of temporal patterns and leads to stable neural networks with feedback

  11. Mining biological information from 3D short time-series gene expression data: the OPTricluster algorithm.

    Science.gov (United States)

    Tchagang, Alain B; Phan, Sieu; Famili, Fazel; Shearer, Heather; Fobert, Pierre; Huang, Yi; Zou, Jitao; Huang, Daiqing; Cutler, Adrian; Liu, Ziying; Pan, Youlian

    2012-04-04

    Nowadays, it is possible to collect expression levels of a set of genes from a set of biological samples during a series of time points. Such data have three dimensions: gene-sample-time (GST). Thus they are called 3D microarray gene expression data. To take advantage of the 3D data collected, and to fully understand the biological knowledge hidden in the GST data, novel subspace clustering algorithms have to be developed to effectively address the biological problem in the corresponding space. We developed a subspace clustering algorithm called Order Preserving Triclustering (OPTricluster), for 3D short time-series data mining. OPTricluster is able to identify 3D clusters with coherent evolution from a given 3D dataset using a combinatorial approach on the sample dimension, and the order preserving (OP) concept on the time dimension. The fusion of the two methodologies allows one to study similarities and differences between samples in terms of their temporal expression profile. OPTricluster has been successfully applied to four case studies: immune response in mice infected by malaria (Plasmodium chabaudi), systemic acquired resistance in Arabidopsis thaliana, similarities and differences between inner and outer cotyledon in Brassica napus during seed development, and to Brassica napus whole seed development. These studies showed that OPTricluster is robust to noise and is able to detect the similarities and differences between biological samples. Our analysis showed that OPTricluster generally outperforms other well known clustering algorithms such as the TRICLUSTER, gTRICLUSTER and K-means; it is robust to noise and can effectively mine the biological knowledge hidden in the 3D short time-series gene expression data.
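
    The order preserving (OP) concept on the time dimension can be illustrated with a small sketch: two genes share a pattern when their expression values rise and fall in the same order across time points, regardless of magnitude. This is not the OPTricluster implementation; the function names and the tie handling (ties broken by time index) are assumptions made for the example.

```python
from collections import defaultdict

def order_pattern(values):
    """Return the time indices sorted by increasing expression value."""
    # sorted() is stable, so equal values keep their original time order.
    return tuple(sorted(range(len(values)), key=lambda t: values[t]))

def group_by_order(expression):
    """Group gene names by the ordering of their time-series values."""
    groups = defaultdict(list)
    for gene, series in expression.items():
        groups[order_pattern(series)].append(gene)
    return dict(groups)
```

    Running `group_by_order` once per sample and intersecting the resulting groups across subsets of samples would give a crude picture of the sample-dimension comparison the abstract describes, although the actual combinatorial procedure is not specified there.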

  12. Characteristic statistic algorithm (CSA) for in-core loading pattern optimization

    International Nuclear Information System (INIS)

    Liu Zhihong; Hu Yongming; Shi Gong

    2007-01-01

    To solve the problem of PWR in-core loading pattern optimization, a more suitable global optimization algorithm, i.e., Characteristic statistic algorithm (CSA), is used. The searching process of this algorithm and how to apply it to this problem are presented. Loading pattern optimization code SCYCLE is developed. Two different problems on real PWR models are calculated and the results are compared with other algorithms. It is shown that SCYCLE has high efficiency and good global performance on this problem. (authors)

  13. Plant succession patterns on residual open-pit gravel mines deposits Bogota

    Directory of Open Access Journals (Sweden)

    Ricardo A. Mora Goyes

    1999-07-01

    Based on both the study of the composition and structure of plant communities and the analysis of the physico-chemical characteristics of mining wastes, the initial patterns of primary succession were determined. These patterns were present in three deposits of waste material abandoned for 18, 36 and 120 months respectively. These materials originated in open-pit gravel mines located to the south of Bogota (Colombia). This study aims to contribute to the knowledge of the mechanisms of natural restoration of tropical ecosystems subjected to man-made degradation.

  14. Machine Learning Algorithms for Statistical Patterns in Large Data Sets

    Science.gov (United States)

    2018-02-01

    SUBJECT TERMS Text Analysis, Text Exploitation, Situation Awareness of Text, Document Processing, Document Ingestion, Full Text Search, Information...Assortativity: Proclivity Index for Attributed Networks (PRONE).” Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2017. pp. 225-237...international conference on Knowledge discovery and data mining, 2013. pp. 212-220. [18] Sutherland, D.J., Xiong, L., Póczos, B., and Schneider, J

  15. Mining continuous activity patterns from animal trajectory data

    Science.gov (United States)

    Wang, Y.; Luo, Ze; Baoping, Yan; Takekawa, John Y.; Prosser, Diann J.; Newman, Scott H.

    2014-01-01

    The increasing availability of animal tracking data brings us opportunities and challenges to intuitively understand the mechanisms of animal activities. In this paper, we aim to discover animal movement patterns from animal trajectory data. In particular, we propose a notion of continuous activity pattern as the concise representation of underlying similar spatio-temporal movements, and develop an extension and refinement framework to discover the patterns. We first preprocess the trajectories into significant semantic locations with time property. Then, we apply a projection-based approach to generate candidate patterns and refine them to generate true patterns. A sequence graph structure and a simple and effective processing strategy are further developed to reduce the computational overhead. The proposed approaches are extensively validated on both real GPS datasets and large synthetic datasets.

  16. Comparison of predictive performance of data mining algorithms in predicting body weight in Mengali rams of Pakistan

    Directory of Open Access Journals (Sweden)

    Senol Celik

    The present study aimed at comparing the predictive performance of some data mining algorithms (CART, CHAID, Exhaustive CHAID, MARS, MLP, and RBF) in biometrical data of Mengali rams. To compare the predictive capability of the algorithms, the biometrical data regarding body (body length, withers height, and heart girth) and testicular (testicular length, scrotal length, and scrotal circumference) measurements of Mengali rams in predicting live body weight were evaluated by most goodness of fit criteria. In addition, age was considered as a continuous independent variable. In this context, the MARS data mining algorithm was used for the first time to predict body weight in two forms, without (MARS_1) and with interaction (MARS_2) terms. The superiority order in the predictive accuracy of the algorithms was found as CART > CHAID ≈ Exhaustive CHAID > MARS_2 > MARS_1 > RBF > MLP. Moreover, all tested algorithms provided a strong predictive accuracy for estimating body weight. However, MARS is the only algorithm that generated a prediction equation for body weight. Therefore, it is hoped that the available results might present a valuable contribution in terms of predicting body weight and describing the relationship between the body weight and body and testicular measurements in revealing breed standards and the conservation of indigenous gene sources for Mengali sheep breeding. Therefore, it will be possible to achieve more profitable and productive sheep production. Use of data mining algorithms is useful for revealing the relationship between body weight and testicular traits in describing breed standards of Mengali sheep.

  17. Pattern recognition and data mining software based on artificial neural networks applied to proton transfer in aqueous environments

    International Nuclear Information System (INIS)

    Tahat Amani; Marti Jordi; Khwaldeh Ali; Tahat Kaher

    2014-01-01

    In computational physics, proton transfer phenomena could be viewed as pattern classification problems based on a set of input features allowing classification of the proton motion into two categories: transfer ‘occurred’ and transfer ‘not occurred’. The goal of this paper is to evaluate the use of artificial neural networks in the classification of proton transfer events, based on the feed-forward back propagation neural network, used as a classifier to distinguish between the two transfer cases. In this paper, we use a newly developed data mining and pattern recognition tool for automating, controlling, and drawing charts of the output data of an existing Empirical Valence Bond code. The study analyzes the need for pattern recognition in aqueous proton transfer processes and how the learning approach in error back propagation (multilayer perceptron algorithms) could be satisfactorily employed in the present case. We present a tool for pattern recognition and validate the code including a real physical case study. The results of applying the artificial neural networks methodology to crowd patterns based upon selected physical properties (e.g., temperature, density) show the abilities of the network to learn proton transfer patterns corresponding to properties of the aqueous environments, which is in turn proved to be fully compatible with previous proton transfer studies. (condensed matter: structural, mechanical, and thermal properties)

  18. A novel procedure on next generation sequencing data analysis using text mining algorithm.

    Science.gov (United States)

    Zhao, Weizhong; Chen, James J; Perkins, Roger; Wang, Yuping; Liu, Zhichao; Hong, Huixiao; Tong, Weida; Zou, Wen

    2016-05-13

    Next-generation sequencing (NGS) technologies have provided researchers with vast possibilities in various biological and biomedical research areas. Efficient data mining strategies are in high demand for large scale comparative and evolutional studies to be performed on the large amounts of data derived from NGS projects. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. We report a novel procedure to analyse NGS data using topic modeling. It consists of four major procedures: NGS data retrieval, preprocessing, topic modeling, and data mining using Latent Dirichlet Allocation (LDA) topic outputs. The NGS data set of the Salmonella enterica strains was used as a case study to show the workflow of this procedure. The perplexity measurement of the topic numbers and the convergence efficiencies of Gibbs sampling were calculated and discussed for achieving the best result from the proposed procedure. The output topics by LDA algorithms could be treated as features of Salmonella strains to accurately describe the genetic diversity of fliC gene in various serotypes. The results of a two-way hierarchical clustering and data matrix analysis on LDA-derived matrices successfully classified Salmonella serotypes based on the NGS data. The implementation of topic modeling in NGS data analysis procedure provides a new way to elucidate genetic information from NGS data, and identify the gene-phenotype relationships and biomarkers, especially in the era of biological and medical big data.

  19. Exploring the potential of data mining techniques for the analysis of accident patterns

    DEFF Research Database (Denmark)

    Prato, Carlo Giacomo; Bekhor, Shlomo; Galtzur, Ayelet

    2010-01-01

    Research in road safety faces major challenges: individuation of the most significant determinants of traffic accidents, recognition of the most recurrent accident patterns, and allocation of resources necessary to address the most relevant issues. This paper intends to comprehend which data mining...... and association rules) data mining techniques are implemented for the analysis of traffic accidents occurred in Israel between 2001 and 2004. Results show that descriptive techniques are useful to classify the large amount of analyzed accidents, even though introduce problems with respect to the clear...... importance of input and intermediate neurons, and the relative importance of hundreds of association rules. Further research should investigate whether limiting the analysis to fatal accidents would simplify the task of data mining techniques in recognizing accident patterns without the “noise” probably...

  20. Data Mining and Pattern Recognition Models for Identifying Inherited Diseases: Challenges and Implications.

    Science.gov (United States)

    Iddamalgoda, Lahiru; Das, Partha S; Aponso, Achala; Sundararajan, Vijayaraghava S; Suravajhala, Prashanth; Valadi, Jayaraman K

    2016-01-01

    Data mining and pattern recognition methods reveal interesting findings in genetic studies, especially on how the genetic makeup is associated with inherited diseases. Although researchers have proposed various data mining models for biomedical approaches, there remains a challenge in accurately prioritizing the single nucleotide polymorphisms (SNP) associated with the disease. In this commentary, we review the state-of-the-art data mining and pattern recognition models for identifying inherited diseases and deliberate the need for binary classification- and scoring-based prioritization methods in determining causal variants. While we discuss the pros and cons associated with these known methods, we argue that the gene prioritization methods and the protein interaction (PPI) methods in conjunction with the K nearest neighbors could be used in accurately categorizing the genetic factors in disease causation.

  1. Data mining and Pattern Recognizing Models for Identifying Inherited Diseases: Challenges and Implications

    Directory of Open Access Journals (Sweden)

    Lahiru Iddamalgoda

    2016-08-01

    Data mining and pattern recognition methods reveal interesting findings in genetic studies, especially on how genetic makeup is associated with inherited diseases. Although researchers have proposed various data mining models for biomedical approaches, there remains a challenge in accurately determining the responsible genetic factors for prioritizing the single nucleotide polymorphisms (SNP) associated with the disease. In this commentary, we review the state-of-the-art data mining and pattern recognition models for identifying inherited diseases and deliberate on the need for binary classification and scoring-based prioritization methods for determining causal variants. While we discuss the known pros and cons associated with these methods, we argue that the gene prioritization methods and the protein interaction (PPI) methods, in conjunction with K-nearest neighbors, could be used in accurately categorizing the genetic factors in disease causation.

  2. A Neural-Network Clustering-Based Algorithm for Privacy Preserving Data Mining

    Science.gov (United States)

    Tsiafoulis, S.; Zorkadis, V. C.; Karras, D. A.

    The increasing use of fast and efficient data mining algorithms in huge collections of personal data, facilitated through the exponential growth of technology, in particular in the field of electronic data storage media and processing power, has raised serious ethical, philosophical and legal issues related to privacy protection. To cope with these concerns, several privacy preserving methodologies have been proposed, classified into two categories: methodologies that aim at protecting the sensitive data and those that aim at protecting the mining results. In our work, we focus on sensitive data protection and compare existing techniques according to the degree of anonymity achieved, the information loss suffered and their performance characteristics. The ℓ-diversity principle is combined with k-anonymity concepts, so that background information cannot be exploited to successfully attack the privacy of the data subjects to whom the data refer. Based on Kohonen Self Organizing Feature Maps (SOMs), we first organize data sets in subspaces according to their information theoretical distance to each other, then create the most relevant classes paying special attention to rare sensitive attribute values, and finally generalize attribute values to the minimum extent required so that both the data disclosure probability and the information loss are kept as negligible as possible. Furthermore, we propose information theoretical measures for assessing the anonymity degree achieved and empirical tests to demonstrate it.
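
    The two privacy criteria named in this abstract can be checked mechanically on an already-generalized table. The following is a minimal sketch, not the authors' SOM-based method: it assumes records are dictionaries, uses the simplest "distinct" reading of ℓ-diversity, and all function names, column names and the toy table are hypothetical.

```python
from collections import defaultdict


def group_by_quasi_identifiers(records, quasi_ids):
    """Group records into equivalence classes that share all quasi-identifier values."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[q] for q in quasi_ids)
        groups[key].append(rec)
    return groups


def is_k_anonymous(records, quasi_ids, k):
    """True if every equivalence class contains at least k records."""
    groups = group_by_quasi_identifiers(records, quasi_ids)
    return all(len(g) >= k for g in groups.values())


def is_distinct_l_diverse(records, quasi_ids, sensitive, l):
    """True if every equivalence class holds at least l distinct sensitive values."""
    groups = group_by_quasi_identifiers(records, quasi_ids)
    return all(len({r[sensitive] for r in g}) >= l for g in groups.values())


# Hypothetical toy table whose quasi-identifiers are already generalized.
table = [
    {"age": "30-39", "zip": "130**", "disease": "flu"},
    {"age": "30-39", "zip": "130**", "disease": "asthma"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
]

print(is_k_anonymous(table, ["age", "zip"], 2))
print(is_distinct_l_diverse(table, ["age", "zip"], "disease", 2))
```

    In this toy table every class has two records, so it is 2-anonymous, but the second class contains a single disease value, so it is not 2-diverse. That gap is the kind of background-knowledge attack the combination of the two criteria is meant to close.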

  3. A data mining approach to optimize pellets manufacturing process based on a decision tree algorithm.

    Science.gov (United States)

    Ronowicz, Joanna; Thommes, Markus; Kleinebudde, Peter; Krysiński, Jerzy

    2015-06-20

    The present study is focused on the thorough analysis of cause-effect relationships between pellet formulation characteristics (pellet composition as well as process parameters) and the selected quality attribute of the final product. The quality of the pellets was expressed by their shape, measured as the aspect ratio. A data matrix for chemometric analysis consisted of 224 pellet formulations prepared with eight different active pharmaceutical ingredients and several excipients, using different extrusion/spheronization process conditions. The data set contained 14 input variables (both formulation and process variables) and one output variable (pellet aspect ratio). A tree regression algorithm consistent with the Quality by Design concept was applied to obtain deeper understanding and knowledge of formulation and process parameters affecting the final pellet sphericity. A clear, interpretable set of decision rules was generated. The spheronization speed, spheronization time, number of holes and water content of the extrudate have been recognized as the key factors influencing pellet aspect ratio. The most spherical pellets were achieved by using a large number of holes during extrusion, a high spheronizer speed and a longer spheronization time. The described data mining approach enhances knowledge about the pelletization process and simultaneously facilitates searching for the optimal process conditions which are necessary to achieve ideal spherical pellets, resulting in good flow characteristics. This data mining approach can be taken into consideration by industrial formulation scientists to support rational decision making in the field of pellets technology. Copyright © 2015 Elsevier B.V. All rights reserved.

  4. Incremental temporal pattern mining using efficient batch-free stream clustering

    NARCIS (Netherlands)

    Lu, Y.; Hassani, M.; Seidl, T.

    2017-01-01

    This paper addresses the problem of temporal pattern mining from multiple data streams containing temporal events. Temporal events are considered as real world events aligned with comprehensive starting and ending timing information rather than simple integer timestamps. Predefined relations, such as

  5. Toward a Progress Indicator for Machine Learning Model Building and Data Mining Algorithm Execution: A Position Paper

    Science.gov (United States)

    Luo, Gang

    2017-01-01

    For user-friendliness, many software systems offer progress indicators for long-duration tasks. A typical progress indicator continuously estimates the remaining task execution time as well as the portion of the task that has been finished. Building a machine learning model often takes a long time, but no existing machine learning software supplies a non-trivial progress indicator. Similarly, running a data mining algorithm often takes a long time, but no existing data mining software provides a nontrivial progress indicator. In this article, we consider the problem of offering progress indicators for machine learning model building and data mining algorithm execution. We discuss the goals and challenges intrinsic to this problem. Then we describe an initial framework for implementing such progress indicators and two advanced, potential uses of them, with the goal of inspiring future research on this topic. PMID:29177022

  6. A genetic algorithm approach for open-pit mine production scheduling

    Directory of Open Access Journals (Sweden)

    Aref Alipour

    2017-06-01

    In an Open-Pit Production Scheduling (OPPS) problem, the goal is to determine the mining sequence of an orebody as a block model. In this article, a linear programming formulation is used to achieve this goal. The OPPS problem is known to be NP-hard, so an exact mathematical model cannot be applied to solve it in realistic settings. Genetic Algorithm (GA) is a well-known member of the evolutionary algorithms that are widely utilized to solve NP-hard problems. Herein, GA is implemented in a hypothetical two-dimensional (2D) copper orebody model. The orebody is represented as a 2D array of blocks. Likewise, a counterpart 2D GA array is used to represent the OPPS problem’s solution space. The fitness function is then defined according to the OPPS problem’s objective function to assess the solution domain. In addition, a new normalization method is used to handle the block sequencing constraint. A numerical study is performed to compare the solutions of the exact and GA-based methods. It is shown that the gap between the GA and the optimal solution by the exact method is less than 5%; hence, GA is found to be efficient in solving the OPPS problem.

  7. Quick Mining of Isomorphic Exact Large Patterns from Large Graphs

    KAUST Repository

    Almasri, Islam

    2014-12-01

    The applications of the sub graph isomorphism search are growing with the growing number of areas that model their systems using graphs or networks. Specifically, many biological systems, such as protein interaction networks, molecular structures and protein contact maps, are modeled as graphs. The sub graph isomorphism search is concerned with finding all sub graphs that are isomorphic to a relevant query graph; the existence of such sub graphs can reflect on the characteristics of the modeled system. The most computationally expensive step in the search for isomorphic sub graphs is the backtracking algorithm that traverses the nodes of the target graph. In this paper, we propose a pruning approach that is inspired by the minimum remaining value heuristic and achieves greater scalability over large query and target graphs. Our testing on various biological networks shows that the performance enhancement of our approach over existing state-of-the-art approaches varies between 6x and 53x. © 2014 IEEE.

  8. Quick Mining of Isomorphic Exact Large Patterns from Large Graphs

    KAUST Repository

    Almasri, Islam; Gao, Xin; Fedoroff, Nina V.

    2014-01-01

    The applications of the sub graph isomorphism search are growing with the growing number of areas that model their systems using graphs or networks. Specifically, many biological systems, such as protein interaction networks, molecular structures and protein contact maps, are modeled as graphs. The sub graph isomorphism search is concerned with finding all sub graphs that are isomorphic to a relevant query graph; the existence of such sub graphs can reflect on the characteristics of the modeled system. The most computationally expensive step in the search for isomorphic sub graphs is the backtracking algorithm that traverses the nodes of the target graph. In this paper, we propose a pruning approach that is inspired by the minimum remaining value heuristic and achieves greater scalability over large query and target graphs. Our testing on various biological networks shows that the performance enhancement of our approach over existing state-of-the-art approaches varies between 6x and 53x. © 2014 IEEE.
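
    To illustrate the general idea of backtracking with a minimum-remaining-values ordering, here is a minimal sketch. It is not the authors' pruning approach: it assumes unlabeled, undirected graphs stored as adjacency dictionaries, finds non-induced embeddings (every query edge must map to a target edge), and the function names are hypothetical.

```python
def candidates(q_node, mapping, query, target):
    """Target nodes that q_node may still map to, given the partial mapping."""
    used = set(mapping.values())
    result = []
    for t in target:
        # Skip used nodes and nodes with too few neighbours to host q_node.
        if t in used or len(target[t]) < len(query[q_node]):
            continue
        # Every already-mapped query neighbour must be adjacent to t in the target.
        if all(mapping[n] in target[t] for n in query[q_node] if n in mapping):
            result.append(t)
    return result


def find_embeddings(query, target):
    """Yield every injective node mapping that preserves all query edges."""
    def backtrack(mapping):
        if len(mapping) == len(query):
            yield dict(mapping)
            return
        # MRV-style choice: expand the unmapped query node with the fewest candidates.
        options = {q: candidates(q, mapping, query, target)
                   for q in query if q not in mapping}
        q = min(options, key=lambda n: len(options[n]))
        for t in options[q]:
            mapping[q] = t
            yield from backtrack(mapping)
            del mapping[q]

    yield from backtrack({})


# Hypothetical example: a triangle query in a triangle with one pendant node.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
embeddings = list(find_embeddings(triangle, graph))
print(len(embeddings))
```

    Picking the most constrained query node first means a branch with an empty candidate list is abandoned immediately instead of after deeper exploration, which is the intuition behind the heuristic.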

  9. Investigation on the improvement of genetic algorithm for PWR loading pattern search and its benchmark verification

    International Nuclear Information System (INIS)

    Li Qianqian; Jiang Xiaofeng; Zhang Shaohong

    2009-01-01

    In this study, the age technique and the concepts of relativeness degree and worth function are exploited to improve the performance of the genetic algorithm (GA) for PWR loading pattern search. Among them, the age technique enables the algorithm to learn from previous search 'experience' and guides it to search better in the vicinity of a local optimum; the introduction of the relativeness degree checks the relativeness of two loading patterns before performing crossover between them, which can significantly reduce the possibility of premature convergence of the algorithm; and the application of the worth function enables the algorithm to generate new loading patterns based on the statistics of common features of evaluated good loading patterns. Numerical verification against a loading pattern search benchmark problem of a two-loop reactor demonstrates that the adoption of these techniques significantly enhances the efficiency of the genetic algorithm while also improving the quality of the final solution. (authors)

  10. An Autonomous Star Identification Algorithm Based on One-Dimensional Vector Pattern for Star Sensors.

    Science.gov (United States)

    Luo, Liyan; Xu, Luping; Zhang, Hua

    2015-07-07

    In order to enhance the robustness and accelerate the recognition speed of star identification, an autonomous star identification algorithm for star sensors is proposed based on the one-dimensional vector pattern (one_DVP). In the proposed algorithm, the space geometry information of the observed stars is used to form the one-dimensional vector pattern of the observed star. The one-dimensional vector pattern of the same observed star remains unchanged when the stellar image rotates, so the problem of star identification is simplified as the comparison of the two feature vectors. The one-dimensional vector pattern is adopted to build the feature vector of the star pattern, which makes it possible to identify the observed stars robustly. The characteristics of the feature vector and the proposed search strategy for the matching pattern make it possible to achieve the recognition result as quickly as possible. The simulation results demonstrate that the proposed algorithm can effectively accelerate the star identification. Moreover, the recognition accuracy and robustness by the proposed algorithm are better than those by the pyramid algorithm, the modified grid algorithm, and the LPT algorithm. The theoretical analysis and experimental results show that the proposed algorithm outperforms the other three star identification algorithms.

  11. Pattern Nulling of Linear Antenna Arrays Using Backtracking Search Optimization Algorithm

    Directory of Open Access Journals (Sweden)

    Kerim Guney

    2015-01-01

    An evolutionary method based on the backtracking search optimization algorithm (BSA) is proposed for linear antenna array pattern synthesis with prescribed nulls at interference directions. Pattern nulling is obtained by controlling only the amplitude, position, and phase of the antenna array elements. BSA is an innovative metaheuristic technique based on an iterative process. Various numerical examples of linear array patterns with the prescribed single, multiple, and wide nulls are given to illustrate the performance and flexibility of BSA. The results obtained by BSA are compared with the results of the following seventeen algorithms: particle swarm optimization (PSO), genetic algorithm (GA), modified touring ant colony algorithm (MTACO), quadratic programming method (QPM), bacterial foraging algorithm (BFA), bees algorithm (BA), clonal selection algorithm (CLONALG), plant growth simulation algorithm (PGSA), tabu search algorithm (TSA), memetic algorithm (MA), nondominated sorting GA-2 (NSGA-2), multiobjective differential evolution (MODE), decomposition with differential evolution (MOEA/D-DE), comprehensive learning PSO (CLPSO), harmony search algorithm (HSA), seeker optimization algorithm (SOA), and mean variance mapping optimization (MVMO). The simulation results show that the linear antenna array synthesis using BSA provides low side-lobe levels and deep null levels.

  12. Risk of hepatotoxicity associated with the use of telithromycin: a signal detection using data mining algorithms.

    Science.gov (United States)

    Chen, Yan; Guo, Jeff J; Healy, Daniel P; Lin, Xiaodong; Patel, Nick C

    2008-12-01

    With the exception of case reports, limited data are available regarding the risk of hepatotoxicity associated with the use of telithromycin. To detect the safety signal regarding the reporting of hepatotoxicity associated with the use of telithromycin using 4 commonly employed data mining algorithms (DMAs). Based on the Adverse Events Reporting System (AERS) database of the Food and Drug Administration, 4 DMAs, including the reporting odds ratio (ROR), the proportional reporting ratio (PRR), the information component (IC), and the Gamma Poisson Shrinker (GPS), were applied to examine the association between the reporting of hepatotoxicity and the use of telithromycin. The study period was from the first quarter of 2004 to the second quarter of 2006. The reporting of hepatotoxicity was identified using the preferred terms indexed in the Medical Dictionary for Regulatory Activities. The drug name was used to identify reports regarding the use of telithromycin. A total of 226 reports describing hepatotoxicity associated with the use of telithromycin were recorded in the AERS. A safety problem of telithromycin associated with increased reporting of hepatotoxicity was clearly detected by 4 algorithms as early as 2005, signaling the problem in the first quarter by the ROR and the IC, in the second quarter by the PRR, and in the fourth quarter by the GPS. A safety signal was indicated by the 4 DMAs suggesting an association between the reporting of hepatotoxicity and the use of telithromycin. Given the wide use of telithromycin and serious consequences of hepatotoxicity, clinicians should be cautious when selecting telithromycin for treatment of an infection. In addition, further observational studies are required to evaluate the utility of signal detection systems for early recognition of serious, life-threatening, low-frequency drug-induced adverse events.
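
    Two of the four measures named in this abstract, the ROR and the PRR, are simple ratios over a 2x2 table of report counts. The sketch below uses the standard textbook definitions and a Wald-type 95% interval for the ROR; the counts are hypothetical and are not taken from the study, and the Bayesian measures (IC and GPS) are omitted because they need priors and shrinkage that do not fit in a few lines.

```python
import math


def prr(a, b, c, d):
    """Proportional reporting ratio.

    a: reports with the drug and the event     b: drug, other events
    c: other drugs, the event                  d: other drugs, other events
    """
    return (a / (a + b)) / (c / (c + d))


def ror(a, b, c, d):
    """Reporting odds ratio for the same 2x2 table."""
    return (a * d) / (b * c)


def ror_confidence_interval(a, b, c, d, z=1.96):
    """Approximate 95% interval for the ROR on the log scale (all counts > 0)."""
    log_ror = math.log(ror(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_ror - z * se), math.exp(log_ror + z * se)


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
a, b, c, d = 20, 80, 100, 9800
print(prr(a, b, c, d))
print(ror(a, b, c, d))
print(ror_confidence_interval(a, b, c, d))
```

    A common screening convention treats a lower interval bound above 1 as a signal of disproportionate reporting; the thresholds applied in the study itself are not stated in the abstract.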

  13. The Application of Machine Learning Algorithms for Text Mining based on Sentiment Analysis Approach

    Directory of Open Access Journals (Sweden)

    Reza Samizade

    2018-06-01

    Classification of cyber texts and comments into two categories of positive and negative sentiment among social media users is of high importance in the research area related to text mining. In this research, we applied supervised classification methods to classify Persian texts based on sentiment in cyber space. The result of this research is a system that can decide whether a comment published in cyber space, such as on social networks, is positive or negative. The comments published on Persian movie and movie review websites from 1392 to 1395 are used as the data set for this research. One part of the data is used for training and the rest for testing. Prior to implementing the algorithms, pre-processing activities such as tokenizing, removing stop words, and n-gram processing were applied to the texts. Naïve Bayes, Neural Networks and support vector machine were used for text classification in this study. Out-of-sample tests showed that there is no evidence indicating that the accuracy of the SVM approach is statistically higher than Naïve Bayes or that the accuracy of Naïve Bayes is not statistically higher than the NN approach. However, the researchers can conclude that the accuracy of the classification using the SVM approach is statistically higher than the accuracy of the NN approach at the 5% confidence level.

  14. Fuzzy C-Means Clustering Model Data Mining For Recognizing Stock Data Sampling Pattern

    Directory of Open Access Journals (Sweden)

    Sylvia Jane Annatje Sumarauw

    2007-06-01

    The capital market has been beneficial to companies and investors. For investors, the capital market provides two economic advantages, namely dividends and capital gains, and a non-economic one, that is, a voting share in the Shareholders General Meeting. But it can also penalize the share owners. In order to protect themselves from the risk, the investors should predict the prospects of their companies. As a consequence of having an abstract commodity, the share quality will be determined by the validity of their company profile information. Any information on stock value fluctuation from the Jakarta Stock Exchange can be a useful consideration and a good measurement for data analysis. In the context of protecting the shareholders from the risk, this research focuses on stock data sample category or stock data sample pattern by using the Fuzzy c-Means Clustering Model, which provides useful information for the investors. The research analyses stock data such as Individual Index, Volume and Amount on the Property and Real Estate Emitter Group at the Jakarta Stock Exchange from January 1 till December 31 of 204. The mining process follows the Cross Industry Standard Process model for Data Mining (CRISP-DM) in the form of a circle with these steps: Business Understanding, Data Understanding, Data Preparation, Modelling, Evaluation and Deployment. At the modelling step, the Fuzzy c-Means Clustering Model is applied. The Data Mining Fuzzy c-Means Clustering Model can analyze stock data in a big database with many complex variables, especially for finding the data sample pattern, and then building a Fuzzy Inference System for stimulating inputs to be outputs based on Fuzzy Logic by recognising the pattern. Keywords: Data Mining, Fuzzy c-Means Clustering Model, Pattern Recognition
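
    For readers unfamiliar with the clustering model named here, the following is a minimal sketch of the textbook fuzzy c-means iteration, which alternates a membership update and a weighted-centre update. It is not the study's implementation: the fuzzifier m = 2, the explicit initial centres, the stopping tolerance and the toy one-dimensional data are all assumptions made for illustration.

```python
import math


def update_memberships(points, centers, m=2.0):
    """Return u[j][i], the degree to which point j belongs to cluster i (rows sum to 1)."""
    power = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [math.dist(x, c) for c in centers]
        if 0.0 in dists:
            # The point sits exactly on a centre: assign it fully there.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
        else:
            memberships.append(
                [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]
            )
    return memberships


def update_centers(points, memberships, n_clusters, m=2.0):
    """Recompute each centre as the mean of all points weighted by u ** m."""
    dim = len(points[0])
    centers = []
    for i in range(n_clusters):
        weights = [u[i] ** m for u in memberships]
        total = sum(weights)
        centers.append(tuple(
            sum(w * x[d] for w, x in zip(weights, points)) / total
            for d in range(dim)
        ))
    return centers


def fuzzy_c_means(points, initial_centers, m=2.0, max_iter=100, tol=1e-6):
    """Alternate the two updates until the centres move less than tol."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iter):
        memberships = update_memberships(points, centers, m)
        new_centers = update_centers(points, memberships, len(centers), m)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, update_memberships(points, centers, m)


# Hypothetical one-dimensional data with two obvious groups.
data = [(1.0,), (1.2,), (0.8,), (5.0,), (5.2,), (4.8,)]
centers, memberships = fuzzy_c_means(data, initial_centers=[(0.0,), (6.0,)])
print(centers)
```

    Unlike hard clustering, each sample keeps a graded membership in every cluster, which is what a downstream fuzzy inference system of the kind mentioned in the abstract would consume.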

  15. A Pilot-Pattern Based Algorithm for MIMO-OFDM Channel Estimation

    Directory of Open Access Journals (Sweden)

    Guomin Li

    2016-12-01

    An improved pilot pattern algorithm for facilitating the channel estimation in multiple input multiple output-orthogonal frequency division multiplexing (MIMO-OFDM) systems is proposed in this paper. The presented algorithm reconfigures the parameter in the least square (LS) algorithm, which belongs to the space-time block-coded (STBC) category for channel estimation in pilot-based MIMO-OFDM systems. Simulation results show that the algorithm has better performance in contrast to the classical single symbol scheme. In contrast to the double symbols scheme, the proposed algorithm can achieve nearly the same performance with only half of the complexity of the double symbols scheme.

  16. Design Pattern Mining Using Distributed Learning Automata and DNA Sequence Alignment

    Science.gov (United States)

    Esmaeilpour, Mansour; Naderifar, Vahideh; Shukur, Zarina

    2014-01-01

    Context: Over the last decade, design patterns have been used extensively to generate reusable solutions to frequently encountered problems in software engineering and object oriented programming. A design pattern is a repeatable software design solution that provides a template for solving various instances of a general problem. Objective: This paper describes a new method for pattern mining, isolating design patterns and the relationships between them, and a related tool, DLA-DNA, for all implemented patterns and all projects used for evaluation. Based on distributed learning automata (DLA) and deoxyribonucleic acid (DNA) sequence alignment, DLA-DNA achieves acceptable precision and recall compared with the other evaluated tools. Method: The proposed method mines structural design patterns in the object oriented source code and extracts the strong and weak relationships between them, enabling analyzers and programmers to determine the dependency rate of each object, component, and other section of the code for parameter passing and modular programming. The proposed model can detect design patterns, and the strengths of their relationships, better than other available tools, namely Pinot, PTIDEJ and DPJF. Results: The results demonstrate that whether the source code is built in a standard or non-standard way, based on the design patterns, the result of the proposed method is close to that of DPJF and better than those of Pinot and PTIDEJ. The proposed model is tested on several source codes and compared with other related models and available tools; the results show that the precision and recall of the proposed method are, on average, 20% and 9.6% higher than Pinot, 27% and 31% higher than PTIDEJ, and 3.3% and 2% higher than DPJF, respectively. Conclusion: The proposed method is organized in two steps: in the first step, elemental design patterns are identified; in the second step, they are composed to recognize actual design patterns. PMID:25243670

  17. Design pattern mining using distributed learning automata and DNA sequence alignment.

    Directory of Open Access Journals (Sweden)

    Mansour Esmaeilpour

    CONTEXT: Over the last decade, design patterns have been used extensively to generate reusable solutions to frequently encountered problems in software engineering and object oriented programming. A design pattern is a repeatable software design solution that provides a template for solving various instances of a general problem. OBJECTIVE: This paper describes a new method for pattern mining, isolating design patterns and the relationships between them, and a related tool, DLA-DNA, for all implemented patterns and all projects used for evaluation. Based on distributed learning automata (DLA) and deoxyribonucleic acid (DNA) sequence alignment, DLA-DNA achieves acceptable precision and recall compared with the other evaluated tools. METHOD: The proposed method mines structural design patterns in the object oriented source code and extracts the strong and weak relationships between them, enabling analyzers and programmers to determine the dependency rate of each object, component, and other section of the code for parameter passing and modular programming. The proposed model can detect design patterns, and the strengths of their relationships, better than other available tools, namely Pinot, PTIDEJ and DPJF. RESULTS: The results demonstrate that whether the source code is built in a standard or non-standard way, based on the design patterns, the result of the proposed method is close to that of DPJF and better than those of Pinot and PTIDEJ. The proposed model is tested on several source codes and compared with other related models and available tools; the results show that the precision and recall of the proposed method are, on average, 20% and 9.6% higher than Pinot, 27% and 31% higher than PTIDEJ, and 3.3% and 2% higher than DPJF, respectively. CONCLUSION: The proposed method is organized in two steps: in the first step, elemental design patterns are identified; in the second step, they are composed to recognize actual design patterns.

  18. Design pattern mining using distributed learning automata and DNA sequence alignment.

    Science.gov (United States)

    Esmaeilpour, Mansour; Naderifar, Vahideh; Shukur, Zarina

    2014-01-01

    Over the last decade, design patterns have been used extensively to generate reusable solutions to frequently encountered problems in software engineering and object oriented programming. A design pattern is a repeatable software design solution that provides a template for solving various instances of a general problem. This paper describes a new method for pattern mining, isolating design patterns and the relationships between them, and a related tool, DLA-DNA, for all implemented patterns and all projects used for evaluation. Based on distributed learning automata (DLA) and deoxyribonucleic acid (DNA) sequence alignment, DLA-DNA achieves acceptable precision and recall compared with the other evaluated tools. The proposed method mines structural design patterns in the object oriented source code and extracts the strong and weak relationships between them, enabling analyzers and programmers to determine the dependency rate of each object, component, and other section of the code for parameter passing and modular programming. The proposed model can detect design patterns, and the strengths of their relationships, better than other available tools, namely Pinot, PTIDEJ and DPJF. The results demonstrate that whether the source code is built in a standard or non-standard way, based on the design patterns, the result of the proposed method is close to that of DPJF and better than those of Pinot and PTIDEJ. The proposed model is tested on several source codes and compared with other related models and available tools; the results show that the precision and recall of the proposed method are, on average, 20% and 9.6% higher than Pinot, 27% and 31% higher than PTIDEJ, and 3.3% and 2% higher than DPJF, respectively. The proposed method is organized in two steps: in the first step, elemental design patterns are identified; in the second step, they are composed to recognize actual design patterns.

  19. Algorithm that mimics human perceptual grouping of dot patterns

    NARCIS (Netherlands)

    Papari, G.; Petkov, N.; Gregorio, MD; DiMaio,; Frucci, M; Musio, C

    2005-01-01

    We propose an algorithm that groups points similarly to how human observers do. It is simple, totally unsupervised and able to find clusters of complex and not necessarily convex shape. Groups are identified as the connected components of a Reduced Delaunay Graph (RDG) that we define in this paper.

  20. Sequential Pattern Mining of Electronic Healthcare Reimbursement Claims: Experiences and Challenges in Uncovering How Patients are Treated by Physicians

    Energy Technology Data Exchange (ETDEWEB)

    Pullum, Laura L [ORNL; Ramanathan, Arvind [ORNL; Hobson, Tanner C [ORNL

    2015-01-01

    We examine the use of electronic healthcare reimbursement claims (EHRC) for analyzing healthcare delivery and practice patterns across the United States (US). We show that EHRCs are correlated with disease incidence estimates published by the Centers for Disease Control. Further, by analyzing over 1 billion EHRCs, we track patterns of clinical procedures administered to patients with autism spectrum disorder (ASD), heart disease (HD) and breast cancer (BC) using sequential pattern mining algorithms. Our analyses reveal that in contrast to treating HD and BC, clinical procedures for ASD diagnoses are highly varied leading up to and after the ASD diagnoses. The discovered clinical procedure sequences also reveal significant differences in the overall costs incurred across different parts of the US, indicating a lack of consensus amongst practitioners in treating ASD patients. We show that a data-driven approach to understand clinical trajectories using EHRC can provide quantitative insights into how to better manage and treat patients. Based on our experience, we also discuss emerging challenges in using EHRC datasets for gaining insights into the state of contemporary healthcare delivery and practice in the US.

  1. GRAMI: Frequent subgraph and pattern mining in a single large graph

    KAUST Repository

    Elseidy, M.

    2014-01-01

    Mining frequent subgraphs is an important operation on graphs; it is defined as finding all subgraphs that appear frequently in a database according to a given frequency threshold. Most existing work assumes a database of many small graphs, but modern applications, such as social networks, citation graphs, or protein-protein interactions in bioinformatics, are modeled as a single large graph. In this paper we present GRAMI, a novel framework for frequent subgraph mining in a single large graph. GRAMI undertakes a novel approach that only finds the minimal set of instances to satisfy the frequency threshold and avoids the costly enumeration of all instances required by previous approaches. We accompany our approach with a heuristic and optimizations that significantly improve performance. Additionally, we present an extension of GRAMI that mines frequent patterns. Compared to subgraphs, patterns offer a more powerful version of matching that captures transitive interactions between graph nodes (like friend of a friend) which are very common in modern applications. Finally, we present CGRAMI, a version supporting structural and semantic constraints, and AGRAMI, an approximate version producing results with no false positives. Our experiments on real data demonstrate that our framework is up to 2 orders of magnitude faster and discovers more interesting patterns than existing approaches. © 2014 VLDB Endowment.

  2. Analysis of gas migration patterns in fractured coal rocks under actual mining conditions

    Directory of Open Access Journals (Sweden)

    Gao Mingzhong

    2017-01-01

    Full Text Available Fracture fields in coal rocks are the main channels for gas seepage, migration, and extraction. The development, evolution, and spatial distribution of fractures in coal rocks directly affect the permeability of the coal rock as well as gas migration and flow. In this work, the Ji-15-14120 mining face at the No. 8 Coal Mine of Pingdingshan Tian’an Coal Mining Co. Ltd., Pingdingshan, China, was selected as the test site to develop a full-parameter fracture observation instrument and a dynamic fracture observation technique. The acquired video information of fractures in the walls of the boreholes was vectorized and converted to planarly expanded images on a computer-aided design platform. Based on the relative spatial distances between the openings of the boreholes, simultaneous planar images of isolated fractures in the walls of the boreholes along the mining direction were obtained from the boreholes located at various distances from the mining face. Using this information, a 3-D fracture network under mining conditions was established. The gas migration pattern was calculated using a COMSOL computation platform. The results showed that between 10 hours and 1 day the fracture network controlled the gas-flow, rather than the coal seam itself. After one day, the migration of gas was completely controlled by the fractures. The presence of fractures in the overlying rock enables the gas in coal seam to migrate more easily to the surrounding rocks or extraction tunnels situated relatively far away from the coal rock. These conclusions provide an important theoretical basis for gas extraction.

  3. Pattern-set generation algorithm for the one-dimensional multiple stock sizes cutting stock problem

    Science.gov (United States)

    Cui, Yaodong; Cui, Yi-Ping; Zhao, Zhigang

    2015-09-01

    A pattern-set generation algorithm (PSG) for the one-dimensional multiple stock sizes cutting stock problem (1DMSSCSP) is presented. The solution process contains two stages. In the first stage, the PSG solves the residual problems repeatedly to generate the patterns in the pattern set, where each residual problem is solved by the column-generation approach, and each pattern is generated by solving a single large object placement problem. In the second stage, the integer linear programming model of the 1DMSSCSP is solved using a commercial solver, where only the patterns in the pattern set are considered. The computational results of benchmark instances indicate that the PSG outperforms existing heuristic algorithms and rivals the exact algorithm in solution quality.

  4. In-depth motivic analysis based on multiparametric closed pattern and cyclic sequence mining

    DEFF Research Database (Denmark)

    Lartillot, Olivier

    2014-01-01

    presents a much simpler description and justification of this general strategy, as well as significant simplifications of the model, in particular concerning the management of pattern cyclicity. A new method for automated bundling of patterns belonging to same motivic or thematic classes is also presented....... The good performance of the method is shown through the analysis of a piece from the JKUPDD database. Ground-truth motives are detected, while additional relevant information completes the ground-truth musicological analysis. The system, implemented in Matlab, is made publicly available as part of Mining......Suite, a new open-source framework for audio and music analysis....

  5. GraMi: Generalized Frequent Pattern Mining in a Single Large Graph

    KAUST Repository

    Saeedy, Mohammed El

    2011-11-01

    Mining frequent subgraphs is an important operation on graphs. Most existing work assumes a database of many small graphs, but modern applications, such as social networks, citation graphs or protein-protein interaction in bioinformatics, are modeled as a single large graph. Interesting interactions in such applications may be transitive (e.g., friend of a friend). Existing methods, however, search for frequent isomorphic (i.e., exact match) subgraphs and cannot discover many useful patterns. In this paper the authors propose GRAMI, a framework that generalizes frequent subgraph mining in a large single graph. GRAMI discovers frequent patterns. A pattern is a graph where edges are generalized to distance-constrained paths. Depending on the definition of the distance function, many instantiations of the framework are possible. Both directed and undirected graphs, as well as multiple labels per vertex, are supported. The authors developed an efficient implementation of the framework that models the frequency resolution phase as a constraint satisfaction problem, in order to avoid the costly enumeration of all instances of each pattern in the graph. The authors also implemented CGRAMI, a version that supports structural and semantic constraints; and AGRAMI, an approximate version that supports very large graphs. The experiments on real data demonstrate that the authors' framework is up to 3 orders of magnitude faster and discovers more interesting patterns than existing approaches.

  6. The use of antenna radiation pattern in node localisation algorithms for wireless sensor networks

    CSIR Research Space (South Africa)

    Mwila, MK

    2014-08-01

    Full Text Available due to the limited accuracy inherent to the current ranging model. These models, however, make the assumption that the antenna radiation pattern is omnidirectional targeted to simplifying the complexity of the algorithms. An increasing number of sensor...

  7. Self-karaoke patterns: an interactive audio-visual system for handsfree live algorithm performance

    OpenAIRE

    Eldridge, Alice

    2014-01-01

    Self-karaoke Patterns is an audiovisual study for improvised cello and live algorithms. The work is motivated in part by addressing the practical needs of the performer in ‘handsfree’ live algorithm contexts and in part by an aesthetic concern with resolving the tension between conceptual dedication to autonomous algorithms and musical dedication to coherent performance. The elected approach is inspired by recent work investigating the role of ‘shape’ in musical performance.

  8. Development of pattern recognition algorithms for the central drift chamber of the Belle II detector

    Energy Technology Data Exchange (ETDEWEB)

    Trusov, Viktor

    2016-11-04

    In this thesis, the development of one of the pattern recognition algorithms for the Belle II experiment based on conformal and Legendre transformations is presented. In order to optimize the performance of the algorithm (CPU time and efficiency), specialized processing steps have been introduced. To show the achieved results, Monte-Carlo based efficiency measurements of the tracking algorithms in the Central Drift Chamber (CDC) have been performed.

  9. Quantifying Surface Coal-Mining Patterns to Promote Regional Sustainability in Ordos, Inner Mongolia

    Directory of Open Access Journals (Sweden)

    Xiaoji Zeng

    2018-04-01

    Full Text Available Ordos became the new “coal capital” of China within a few decades since the country’s economic reform in 1978, as large-scale surface coal mining dramatically propelled its per capita GDP from being one of the lowest to one of the highest in China, exceeding Hong Kong in 2009. Surface coal-mining areas (SCMAs) have continued to expand in this region during recent decades, resulting in serious environmental and socioeconomic consequences. To understand these impacts and promote regional sustainability, quantifying the spatiotemporal patterns of SCMAs is urgently needed. Thus, the main objectives of this study were to quantify the spatiotemporal patterns of SCMAs in the Ordos region from 1990 to 2015, and to examine some of the major environmental and socioeconomic impacts in the study region. We extracted the SCMAs using remote-sensing data, and then quantified their spatiotemporal patterns using landscape metrics. The loss of natural habitat and several socioeconomic indicators were examined in relation to surface coal mining. Our results show that the area of SCMAs increased from 7.12 km² to 355.95 km², an increase of nearly 49 times from 1990 to 2015 in the Ordos region. The number of SCMAs in this region increased from 82 to 651, a nearly seven-fold increase. In particular, Zhungeer banner (an administrative division), Yijinhuoluo banner, Dongsheng District and Dalate banner in the north-eastern part of the Ordos region had higher growth rates of SCMAs. The income gap between urban and rural residents increased along with the growth in SCMAs, undermining social equity in the Ordos region. Moreover, the rapid increase in SCMAs resulted in natural habitat loss (including grasslands, forests, and deserts) across this region. Thus, we suggest that regional sustainability in Ordos needs to emphasize effective measures to curb large-scale surface coal mining in order to reduce the urban–rural income gap, and to restore degraded natural

  10. A Boyer-Moore (or Watson-Watson) type algorithm for regular tree pattern matching

    NARCIS (Netherlands)

    Watson, B.W.; Aarts, E.H.L.; Eikelder, ten H.M.M.; Hemerik, C.; Rem, M.

    1995-01-01

    In this chapter, I outline a new algorithm for regular tree pattern matching. The existence of this algorithm was first mentioned in the statements accompanying my dissertation, [2]. In order to avoid repeating the material in my dissertation, it is assumed that the reader is familiar with Chapters

  11. Two related algorithms for root-to-frontier tree pattern matching

    NARCIS (Netherlands)

    Cleophas, L.G.W.A.; Hemerik, C.; Zwaan, G.

    2006-01-01

    Tree pattern matching (TPM) algorithms on ordered, ranked trees play an important role in applications such as compilers and term rewriting systems. Many TPM algorithms appearing in the literature are based on tree automata. For efficiency, these automata should be deterministic, yet deterministic

  12. An Interval Bound Algorithm of optimizing reactor core loading pattern by using reactivity interval schema

    International Nuclear Information System (INIS)

    Gong Zhaohu; Wang Kan; Yao Dong

    2011-01-01

    Highlights: → We present a new Loading Pattern Optimization method - Interval Bound Algorithm (IBA). → IBA directly uses the reactivity of fuel assemblies and burnable poison. → IBA can optimize fuel assembly orientation in a coupled way. → Numerical experiment shows that IBA outperforms genetic algorithm and engineers. → We devise DDWF technique to deal with multiple objectives and constraints. - Abstract: In order to optimize the core loading pattern in Nuclear Power Plants, the paper presents a new optimization method - Interval Bound Algorithm (IBA). Similar to the typical population based algorithms, e.g. genetic algorithm, IBA maintains a population of solutions and evolves them during the optimization process. IBA acquires the solution by statistical learning and sampling the control variable intervals of the population in each iteration. The control variables are the transforms of the reactivity of fuel assemblies or the worth of burnable poisons, which are the crucial heuristic information for loading pattern optimization problems. IBA can deal with the relationship between the dependent variables by defining the control variables. Based on the IBA algorithm, a parallel Loading Pattern Optimization code, named IBALPO, has been developed. To deal with multiple objectives and constraints, the Dynamic Discontinuous Weight Factors (DDWF) for the fitness function have been used in IBALPO. Finally, the code system has been used to solve a realistic reloading problem and a better pattern has been obtained compared with the ones searched by engineers and genetic algorithm, thus the performance of the code is proved.

  13. Mining Emerging Patterns for Recognizing Activities of Multiple Users in Pervasive Computing

    DEFF Research Database (Denmark)

    Gu, Tao; Wu, Zhanqing; Wang, Liang

    2009-01-01

    Understanding and recognizing human activities from sensor readings is an important task in pervasive computing. Existing work on activity recognition mainly focuses on recognizing activities for a single user in a smart home environment. However, in real life, there are often multiple inhabitants...... activity models, and propose an Emerging Pattern based Multi-user Activity Recognizer (epMAR) to recognize both single-user and multiuser activities. We conduct our empirical studies by collecting real-world activity traces done by two volunteers over a period of two weeks in a smart home environment...... sensor readings in a home environment, and propose a novel pattern mining approach to recognize both single-user and multi-user activities in a unified solution. We exploit Emerging Pattern – a type of knowledge pattern that describes significant changes between classes of data – for constructing our...

  14. Trace metal depositional patterns from an open pit mining activity as revealed by archived avian gizzard contents

    Energy Technology Data Exchange (ETDEWEB)

    Bendell, L.I., E-mail: bendell@sfu.ca

    2011-02-15

    Archived samples of blue grouse (Dendragapus obscurus) gizzard contents, inclusive of grit, collected yearly between 1959 and 1970 were analyzed for cadmium, lead, zinc, and copper content. Approximately halfway through the 12-year sampling period, an open-pit copper mine began activities, then ceased operations 2 years later. Thus the archived samples provided a unique opportunity to determine if avian gizzard contents, inclusive of grit, could reveal patterns in the anthropogenic deposition of trace metals associated with mining activities. Gizzard concentrations of cadmium and copper strongly coincided with the onset of opening and the closing of the pit mining activity. Gizzard zinc and lead demonstrated significant among-year variation; however, maximum concentrations did not correlate to mining activity. The archived gizzard contents did provide a useful tool for documenting trends in metal depositional patterns related to an anthropogenic activity. Further, blue grouse ingesting grit particles during the time of active mining activity would have been exposed to toxicologically significant levels of cadmium. Gizzard lead concentrations were also of toxicological significance but not related to mining activity. This type of 'pulse' toxic metal exposure as a consequence of open-pit mining activity would not necessarily have been revealed through a 'snap-shot' of soil, plant or avian tissue trace metal analysis post-mining activity. - Research Highlights: → Archived gizzard samples reveal mining history. → Grit ingestion exposes grouse to cadmium and lead. → Grit selection includes particles enriched in cadmium. → Cadmium enriched particles are of toxicological significance.

  15. Reload pattern optimization by application of multiple cyclic interchange algorithms

    Energy Technology Data Exchange (ETDEWEB)

    Geemert, R. van; Quist, A.J.; Hoogenboom, J.E. [Technische Univ. Delft (Netherlands)]

    1996-09-01

    Reload pattern optimization procedures are proposed which are based on the multiple cyclic interchange approach, according to which the search for the reload pattern associated with the highest objective function value can be thought of as divided in multiple stages. The transition from the initial to the final stage is characterized by an increase in the degree of locality of the search procedure. The general idea is that, during the first stages, the 'elite' cluster containing the group of best patterns must be located, after which the solution space is sampled in a more and more local sense to find the local optimum in this cluster. The transition(s) from global search behaviour to local search behaviour can be either prompt, by defining strictly separate search regimes, or gradual by introducing stochastic tests for the number of fuel bundles involved in a cyclic interchange. Equilibrium cycle optimization results are reported for a test PWR reactor core of modest size. (author)

  16. Reload pattern optimization by application of multiple cyclic interchange algorithms

    International Nuclear Information System (INIS)

    Geemert, R. van; Quist, A.J.; Hoogenboom, J.E.

    1996-01-01

    Reload pattern optimization procedures are proposed which are based on the multiple cyclic interchange approach, according to which the search for the reload pattern associated with the highest objective function value can be thought of as divided in multiple stages. The transition from the initial to the final stage is characterized by an increase in the degree of locality of the search procedure. The general idea is that, during the first stages, the 'elite' cluster containing the group of best patterns must be located, after which the solution space is sampled in a more and more local sense to find the local optimum in this cluster. The transition(s) from global search behaviour to local search behaviour can be either prompt, by defining strictly separate search regimes, or gradual by introducing stochastic tests for the number of fuel bundles involved in a cyclic interchange. Equilibrium cycle optimization results are reported for a test PWR reactor core of modest size. (author)

  17. Efficient Mining and Detection of Sequential Intrusion Patterns for Network Intrusion Detection Systems

    Science.gov (United States)

    Shyu, Mei-Ling; Huang, Zifang; Luo, Hongli

    In recent years, pervasive computing infrastructures have greatly improved the interaction between human and system. As we put more reliance on these computing infrastructures, we also face threats of network intrusion and/or any new forms of undesirable IT-based activities. Hence, network security has become an extremely important issue, which is closely connected with homeland security, business transactions, and people's daily life. Accurate and efficient intrusion detection technologies are required to safeguard the network systems and the critical information transmitted in the network systems. In this chapter, a novel network intrusion detection framework for mining and detecting sequential intrusion patterns is proposed. The proposed framework consists of a Collateral Representative Subspace Projection Modeling (C-RSPM) component for supervised classification, and an inter-transactional association rule mining method based on Layer Divided Modeling (LDM) for temporal pattern analysis. Experiments on the KDD99 data set and the traffic data set generated by a private LAN testbed show promising results with high detection rates, low processing time, and low false alarm rates in mining and detecting sequential intrusion detections.

  18. Algorithm Design for Grip-Pattern Verification in Smart Gun

    NARCIS (Netherlands)

    Shang, X.; Veldhuis, Raymond N.J.; Bazen, A.M.; Ganzevoort, W.P.T.

    2005-01-01

    The Secure Grip project focuses on the development of a hand-grip pattern recognition system, as part of the smart gun. Its target customer is the police. To explore the authentication performance of this system, we collected data from a group of police officers, and made authentication simulations

  19. Pattern Recognition of Signals for the Fault-Slip Type of Rock Burst in Coal Mines

    Directory of Open Access Journals (Sweden)

    X. S. Liu

    2015-01-01

    Full Text Available The fault-slip type of rock burst is a major threat to the safety of coal mining, and effectively recognizing its signal patterns is the foundation for early warning and prevention. First, a mechanical model of the fault-slip was established and the mechanism of the rock burst induced by the fault-slip was revealed. Then, the patterns of the electromagnetic radiation, acoustic emission (AE), and microseismic signals in the fault-slip type of rock burst were proposed: before the rock burst occurs, the electromagnetic radiation intensity near the sliding surface increases rapidly, the AE energy rises exponentially, and the energy released by microseismic events experiences at least one peak and is close to the next peak. Finally, in situ investigations were performed at the number 1412 coal face in the Huafeng Mine, China. Results showed that the proposed signal patterns are in good agreement with the process of the fault-slip type of rock burst. The pattern recognition can provide a basis for the early warning and the implementation of relief measures of the fault-slip type of rock burst.

  20. Zoning method for environmental engineering geological patterns in underground coal mining areas.

    Science.gov (United States)

    Liu, Shiliang; Li, Wenping; Wang, Qiqing

    2018-09-01

    Environmental engineering geological patterns (EEGPs) are used to express the trend and intensity of eco-geological environment caused by mining in underground coal mining areas, a complex process controlled by multiple factors. A new zoning method for EEGPs was developed based on the variable-weight theory (VWT), where the weights of factors vary with their value. The method was applied to the Yushenfu mining area, Shaanxi, China. First, the mechanism of the EEGPs caused by mining was elucidated, and four types of EEGPs were proposed. Subsequently, 13 key control factors were selected from mining conditions, lithosphere, hydrosphere, ecosphere, and climatic conditions; their thematic maps were constructed using ArcGIS software and remote-sensing technologies. Then, a stimulation-punishment variable-weight model derived from the partition of basic evaluation unit of study area, construction of partition state-variable-weight vector, and determination of variable-weight interval was built to calculate the variable weights of each factor. On this basis, a zoning mathematical model of EEGPs was established, and the zoning results were analyzed. For comparison, the traditional constant-weight theory (CWT) was also applied to divide the EEGPs. Finally, the zoning results obtained using VWT and CWT were compared. The verification of field investigation indicates that VWT is more accurate and reliable than CWT. The zoning results are consistent with the actual situations and the key of planning design for the rational development of coal resources and protection of eco-geological environment. Copyright © 2018 Elsevier B.V. All rights reserved.
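
    The idea that factor weights vary with the factor values can be illustrated with a minimal sketch. It assumes the commonly used normalized form in which each constant weight is multiplied by a state function of the factor value and the products are rescaled to sum to one. The state function, its interval bounds (`low`, `high`) and its `strength` parameter are hypothetical choices made for illustration only; they are not the stimulation-punishment model or the intervals determined in the paper.

```python
import math


def state_weight(x, low=0.3, high=0.7, strength=1.0):
    """Hypothetical state function for a factor value x scaled to [0, 1].

    Values below `low` are "punished" (multiplier < 1), values above `high`
    are "stimulated" (multiplier > 1), values in between are left unchanged.
    """
    if x < low:
        return math.exp(-strength * (low - x))
    if x > high:
        return math.exp(strength * (x - high))
    return 1.0


def variable_weights(constant_weights, values):
    """Rescale constant weights by the state function and renormalize."""
    raw = [w * state_weight(x) for w, x in zip(constant_weights, values)]
    total = sum(raw)
    return [r / total for r in raw]


constant = [0.5, 0.3, 0.2]  # illustrative constant weights, not from the paper
neutral = variable_weights(constant, [0.5, 0.5, 0.5])
shifted = variable_weights(constant, [0.5, 0.9, 0.1])
```

    With all factor values in the neutral interval the variable weights reduce to the constant weights, which is the behaviour of the constant-weight baseline; the second call raises the weight of the high-valued factor and lowers that of the low-valued one.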

  1. Analysis of the electrical disturbances in CERN power distribution network with pattern mining methods

    CERN Document Server

    Abramenko, Oleksii

    2017-01-01

    The current research focuses on the perturbations within the electrical network of the LHC and its subsystems by analyzing measurements collected from oscilloscopes installed across different CERN sites, and alarms raised by electrical equipment. We analyze the amplitude and duration of the glitches and, together with other relevant variables, correlate them with beam stopping events. The work also tries to identify assets affected by such perturbations using data mining and, in particular, frequent pattern mining methods. On the practical side we summarize the results of our work by putting forward a prototype of a software tool enabling online monitoring of the alarms coming from the electrical network and facilitating glitch detection and analysis by a technical operator.

  2. An AUTONOMOUS STAR IDENTIFICATION ALGORITHM BASED ON THE DIRECTED CIRCULARITY PATTERN

    Directory of Open Access Journals (Sweden)

    J. Xie

    2012-07-01

    Full Text Available The accuracy of the angular distance may decrease due to many factors, for example when the parameters of the stellar camera are not calibrated on-orbit or the location accuracy of the star image points is low, which can lead to low success rates of star identification. A robust directed circularity pattern algorithm is proposed in this paper, developed on the basis of the matching probability algorithm. The improved algorithm retains the matching probability strategy to identify the master star, and constructs a directed circularity pattern with the adjacent stars for unitary matching. The candidate matching group which has the longest chain is selected as the final result. Simulation experiments indicate that the improved algorithm achieves a higher identification success rate and reliability than the original algorithm. Experiments with real data are used to verify it.

  3. Structural optimization of a motorcycle chassis by pattern search algorithm

    Science.gov (United States)

    Scappaticci, Lorenzo; Bartolini, Nicola; Guglielmino, Eugenio; Risitano, Giacomo

    2017-08-01

    Changes to the technical regulations of the motorcycle racing world classes introduced the new Moto2 category. The vehicles are prototypes that use single-brand tyres and engines derived from series production, supplied by a single manufacturer. The stability and handling of the vehicle are highly dependent on the geometric properties of the chassis. The performance of a racing motorcycle chassis can be primarily evaluated in terms of weight and stiffness. The aim of this work is to maximize the performance of a tubular frame designed for a motorcycle racing in the Moto2 category. The goal is the implementation of an optimization algorithm that acts on the dimensions of the single pipes of the frame and involves the design of an objective function to minimize the weight of the frame by controlling its stiffnesses.
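
    As a rough illustration of what a pattern search optimizer does, the sketch below implements a basic compass (coordinate) search: it probes each design variable in both directions, accepts any improving move, and halves the step when no move helps. The toy quadratic objective is a hypothetical stand-in for the frame weight and stiffness objective, which the abstract does not specify, and the function and parameter names are illustrative assumptions.

```python
def pattern_search(f, x0, step=1.0, tol=1e-6, max_iter=10_000):
    """Minimize f by compass search, starting from x0.

    Returns the best point found and its objective value.
    """
    x = list(x0)
    fx = f(x)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = x.copy()
                trial[i] += delta
                ft = f(trial)
                if ft < fx:
                    # Accept the improving move and go on to the next variable.
                    x, fx = trial, ft
                    improved = True
                    break
        if not improved:
            # No direction helped at this scale: refine the mesh.
            step *= 0.5
    return x, fx


def toy_objective(x):
    # Hypothetical smooth objective with its minimum at (1, -2).
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


best_x, best_f = pattern_search(toy_objective, [0.0, 0.0])
```

    The method needs only objective evaluations and no gradients, which is one reason this family of algorithms is often paired with simulation-based objectives.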

  4. Engineering Algorithms for Finding Patterns in Biological Data

    DEFF Research Database (Denmark)

    Nielsen, Jesper

    2011-01-01

    similarity scores. Association mapping is a technique based on using large amounts of data on Single Nucleotide Polymorphisms (SNPs) to statistically infer associations between segments of DNA and effects in the host. Within the area of association mapping we develop an efficient file format and software...... library, called SNPFile. The file format is able to store both large amounts of SNP data and associated metadata, such as ids and affected-status of samples. Thus the file format can both speed-up SNP data access and simplify data management significantly. On the topic of molecular biological data, we...... analyze data from an experiment on exosome knockout. The exosome is a complex with a role in RNA degradation. We find that knockout of the exosome stabilize hitherto unknown RNA transcripts upstream active transcription start sites. With respect to Hidden Markov Models we develop two fast algorithms. We...

  5. Trace metal depositional patterns from an open pit mining activity as revealed by archived avian gizzard contents.

    Science.gov (United States)

    Bendell, L I

    2011-02-15

    Archived samples of blue grouse (Dendragapus obscurus) gizzard contents, inclusive of grit, collected yearly between 1959 and 1970 were analyzed for cadmium, lead, zinc, and copper content. Approximately halfway through the 12-year sampling period, an open-pit copper mine began activities, then ceased operations 2 years later. Thus the archived samples provided a unique opportunity to determine if avian gizzard contents, inclusive of grit, could reveal patterns in the anthropogenic deposition of trace metals associated with mining activities. Gizzard concentrations of cadmium and copper strongly coincided with the onset of opening and the closing of the pit mining activity. Gizzard zinc and lead demonstrated significant among year variation; however, maximum concentrations did not correlate to mining activity. The archived gizzard contents did provide a useful tool for documenting trends in metal depositional patterns related to an anthropogenic activity. Further, blue grouse ingesting grit particles during the time of active mining activity would have been exposed to toxicologically significant levels of cadmium. Gizzard lead concentrations were also of toxicological significance but not related to mining activity. This type of "pulse" toxic metal exposure as a consequence of open-pit mining activity would not necessarily have been revealed through a "snap-shot" of soil, plant or avian tissue trace metal analysis post-mining activity. Copyright © 2010 Elsevier B.V. All rights reserved.

  6. Integrating Entropy and Closed Frequent Pattern Mining for Social Network Modelling and Analysis

    Science.gov (United States)

    Adnan, Muhaimenul; Alhajj, Reda; Rokne, Jon

    The recent increase in the explicitly available social networks has attracted the attention of the research community to investigate how it would be possible to benefit from such a powerful model in producing effective solutions for problems in other domains where the social network is implicit; we argue that social networks do exist around us but the key issue is how to realize and analyze them. This chapter presents a novel approach for constructing a social network model by an integrated framework that first prepares the data to be analyzed and then applies entropy and frequent closed pattern mining for network construction. For a given problem, we first prepare the data by identifying items and transactions, which are the basic ingredients for frequent closed pattern mining. Items are the main objects in the problem and a transaction is a set of items that could exist together at one time (e.g., items purchased in one visit to the supermarket). Transactions could be analyzed to discover frequent closed patterns using any of the well-known techniques. Frequent closed patterns have the advantage that they successfully capture the inherent information content of the dataset and are applicable to a broader set of domains. Entropies of the frequent closed patterns are used to keep the dimensionality of the feature vectors to a reasonable size; it is a kind of feature reduction process. Finally, we analyze the dynamic behavior of the constructed social network. Experiments were conducted on a synthetic dataset and on the Enron corpus email dataset. The results presented in the chapter show that social networks extracted from a feature set as frequent closed patterns successfully carry the community structure information. Moreover, for the Enron email dataset, we present an analysis to dynamically indicate the deviations from each user's individual and community profile. These indications of deviations can be very useful to identify unusual events.
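
    The notions of items, transactions and frequent closed patterns can be made concrete with a small brute-force sketch. An itemset is frequent when at least `min_support` transactions contain it, and closed when no proper superset has the same support. The exhaustive enumeration and the supermarket-style data are illustrative assumptions; the chapter does not state which mining technique it uses, and practical miners avoid enumerating every candidate.

```python
from itertools import combinations


def closed_frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all closed frequent itemsets.

    Brute force over every candidate itemset, so only suitable for tiny inputs.
    """
    items = sorted(set().union(*transactions))
    support = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            count = sum(1 for t in transactions if itemset <= t)
            if count >= min_support:
                support[itemset] = count
    closed = {}
    for itemset, count in support.items():
        # A superset with equal support is itself frequent, so checking
        # the frequent itemsets is sufficient.
        absorbed = any(itemset < other and count == support[other]
                       for other in support)
        if not absorbed:
            closed[itemset] = count
    return closed


transactions = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "eggs"},
    {"milk"},
)]
result = closed_frequent_itemsets(transactions, min_support=2)
```

    Here {eggs} is frequent but not closed, because every transaction containing eggs also contains bread, so {bread, eggs} carries the same support and subsumes it.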

  7. Rating Algorithm for Pronunciation of English Based on Audio Feature Pattern Matching

    Directory of Open Access Journals (Sweden)

    Li Kun

    2015-01-01

    Full Text Available With the increasing internationalization of China, language communication has become an important channel for us to adapt to the political and economic environment. How to improve English learners’ language learning efficiency under limited conditions has become a problem demanding prompt solution. This paper applies two pronunciation patterns according to the actual needs of English pronunciation rating: the to-be-evaluated pronunciation pattern and the standard pronunciation pattern. It translates the patterns into English pronunciation rating results through the Euclidean distance. Besides, this paper introduces the design philosophy of the whole algorithm in combination with the CHMM matching pattern. Each link of the CHMM pattern is given selective analysis, and a contrast experiment between the CHMM matching pattern and the other two patterns is conducted. From the experiment results, it can be concluded that the CHMM pattern is the best option.
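
    Reading the distance in this abstract as the ordinary Euclidean distance between two feature vectors (an assumption), a distance-based rating could be sketched as follows. The feature vectors, the `scale` parameter and the mapping from distance to a 0-100 score are hypothetical; the paper's actual features and scoring rule are not given here.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rating(distance, scale=1.0):
    """Hypothetical score: 100 at distance 0, falling as the distance grows."""
    return 100.0 / (1.0 + distance / scale)


standard = [0.0, 0.0]    # illustrative standard pronunciation features
evaluated = [3.0, 4.0]   # illustrative learner pronunciation features
d = euclidean_distance(standard, evaluated)
score = rating(d, scale=5.0)
```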

  8. Brand Switching Pattern Discovery by Data Mining Techniques for the Telecommunication Industry in Australia

    Directory of Open Access Journals (Sweden)

    Md Zahidul Islam

    2016-11-01

    Full Text Available There is more than one mobile-phone subscription per member of the Australian population. The number of complaints against the mobile-phone-service providers is also high. Therefore, the mobile service providers are facing a huge challenge in retaining their customers. There are a number of existing models to analyse customer behaviour and switching patterns. A number of switching models may also exist within a large market. These models are often not useful due to the heterogeneous nature of the market. Therefore, in this study we use data mining techniques to let the data talk to help us discover switching patterns without requiring us to use any models and domain knowledge. We use a variety of decision tree and decision forest techniques on a real mobile-phone-usage dataset in order to demonstrate the effectiveness of data mining techniques in knowledge discovery. We report many interesting patterns, and discuss them from a brand-switching and marketing perspective, through which they are found to be very sensible and interesting.

  9. Walking pattern classification and walking distance estimation algorithms using gait phase information.

    Science.gov (United States)

    Wang, Jeen-Shing; Lin, Che-Wei; Yang, Ya-Ting C; Ho, Yu-Jen

    2012-10-01

    This paper presents a walking pattern classification and a walking distance estimation algorithm using gait phase information. A gait phase information retrieval algorithm was developed to analyze the duration of the phases in a gait cycle (i.e., stance, push-off, swing, and heel-strike phases). Based on the gait phase information, a decision tree based on the relations between gait phases was constructed for classifying three different walking patterns (level walking, walking upstairs, and walking downstairs). Gait phase information was also used for developing a walking distance estimation algorithm. The walking distance estimation algorithm consists of the processes of step count and step length estimation. The proposed walking pattern classification and walking distance estimation algorithm have been validated by a series of experiments. The accuracy of the proposed walking pattern classification was 98.87%, 95.45%, and 95.00% for level walking, walking upstairs, and walking downstairs, respectively. The accuracy of the proposed walking distance estimation algorithm was 96.42% over a walking distance.
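
    The two-part structure of the distance estimate, step counting followed by step length estimation, can be sketched in a few lines. The sketch assumes that gait phases arrive as a per-sample label sequence and that each new heel-strike marks one step; it also uses a single fixed step length, whereas the paper estimates step length by a method the abstract does not describe. All names here are illustrative.

```python
def count_steps(phases):
    """Count steps as transitions into the heel-strike phase."""
    steps = 0
    previous = None
    for phase in phases:
        if phase == "heel-strike" and previous != "heel-strike":
            steps += 1
        previous = phase
    return steps


def walking_distance(phases, step_length_m):
    """Distance = number of steps x (assumed constant) step length in metres."""
    return count_steps(phases) * step_length_m


phases = [
    "stance", "push-off", "swing", "heel-strike", "heel-strike",
    "stance", "push-off", "swing", "heel-strike",
]
steps = count_steps(phases)
distance = walking_distance(phases, step_length_m=0.7)
```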

  10. REGULAR PATTERN MINING (WITH JITTER) ON WEIGHTED-DIRECTED DYNAMIC GRAPHS

    Directory of Open Access Journals (Sweden)

    A. GUPTA

    2017-02-01

    Full Text Available Real world graphs are mostly dynamic in nature, exhibiting time-varying behaviour in the structure of the graph, weight on the edges and direction of the edges. Mining regular patterns in the occurrence of edge parameters gives an insight into the consumer trends over time in ecommerce co-purchasing networks. But such patterns need not necessarily be precise, as in the case when some product goes out of stock or a group of customers becomes unavailable for a short period of time. Ignoring them may lead to loss of useful information and thus taking jitter into account becomes vital. To the best of our knowledge, no work has yet been reported to extract regular patterns considering a jitter of length greater than unity. In this article, we propose a novel method to find quasi regular patterns on weight and direction sequences of such graphs. The method involves analysing the dynamic network considering the inconsistencies in the occurrence of edges. It utilizes the relation between the occurrence sequence and the corresponding weight and direction sequences to speed up this process. Further, these patterns are used to determine the most central nodes (such as the most profit yielding products). To accomplish this we introduce the concept of dynamic closeness centrality and dynamic betweenness centrality. Experiments on the Enron e-mail dataset and a synthetic dynamic network show that the presented approach is efficient, so it can be used to find patterns in large scale networks consisting of many timestamps.

  11. Classifying unstructured textual data using the Product Score Model: an alternative text mining algorithm

    NARCIS (Netherlands)

    He, Qiwei; Veldkamp, Bernard P.; Eggen, T.J.H.M.; Veldkamp, B.P.

    2012-01-01

    Unstructured textual data such as students’ essays and life narratives can provide helpful information in educational and psychological measurement, but often contain irregularities and ambiguities, which creates difficulties in analysis. Text mining techniques that seek to extract useful

  12. Cryptosystem Based On Finger Vein Patterns Using Vas Algorithm

    Directory of Open Access Journals (Sweden)

    G.Kanimozhi

    2015-08-01

    Full Text Available Cryptosystems based on biometric authentication are a developing area in the field of modern security schemes. Elastic distortion of fingerprints is one of the major causes of false non-matches. While this problem affects all fingerprint identification functions, it is especially dangerous in opposite identification functions such as note list and reduplication functions. In such functions, malicious users may purposely distort their fingerprints to evade identification. Distortion rectification, or equivalently distortion field estimation, is viewed as a regression problem where the input is a distorted fingerprint and the output is the distortion field. The current document deals with the application of finger vein patterns as an approach for user confirmation and encryption key generation. The design of the optical capture scheme using near infrared is described. We propose a step for locating the vein crossing points and quantifying the angles between the vein branches; this information is used to generate a personal key that allows the user to encrypt information after the confirmation is approved. In order to demonstrate the potential of the suggested approach, a model of image encryption is developed. All actions (biometric capture, image presetting, key generation and image encryption) are performed on the identical hidden platform, adding important portability and diminishing the execution time.

  13. Grade Distribution Modeling within the Bauxite Seams of the Wachangping Mine, China, Using a Multi-Step Interpolation Algorithm

    Directory of Open Access Journals (Sweden)

    Shaofeng Wang

    2017-05-01

    Full Text Available Mineral reserve estimation and mining design depend on a precise modeling of the mineralized deposit. A multi-step interpolation algorithm, including 1D biharmonic spline estimator for interpolating floor altitudes, 2D nearest neighbor, linear, natural neighbor, cubic, biharmonic spline, inverse distance weighted, simple kriging, and ordinary kriging interpolations for grade distribution on the two vertical sections at roadways, and 3D linear interpolation for grade distribution between sections, was proposed to build a 3D grade distribution model of the mineralized seam in a longwall mining panel with a U-shaped layout having two roadways at both sides. Compared to field data from exploratory boreholes, this multi-step interpolation using a natural neighbor method shows an optimal stability and a minimal difference between interpolation and field data. Using this method, the 97,576 m3 of bauxite, in which the mass fraction of Al2O3 (Wa) and the mass ratio of Al2O3 to SiO2 (Wa/s) are 61.68% and 27.72, respectively, was delimited from the 189,260 m3 mineralized deposit in the 1102 longwall mining panel in the Wachangping mine, Southwest China. The mean absolute errors, the root mean squared errors and the relative standard deviations of errors between interpolated data and exploratory grade data at six boreholes are 2.544, 2.674, and 32.37% of Wa; and 1.761, 1.974, and 67.37% of Wa/s, respectively. The proposed method can be used for characterizing the grade distribution in a mineralized seam between two roadways at both sides of a longwall mining panel.
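
    The first two error measures named in this abstract, the mean absolute error and the root mean squared error between interpolated and borehole grades, can be sketched as follows. This is a minimal illustration only; the function names and the sample values in the usage note are hypothetical and are not taken from the paper.

```python
import math

def mean_absolute_error(interpolated, observed):
    """Average of |interpolated - observed| over all sample points."""
    errors = [abs(p - o) for p, o in zip(interpolated, observed)]
    return sum(errors) / len(errors)

def root_mean_squared_error(interpolated, observed):
    """Square root of the average squared difference."""
    squared = [(p - o) ** 2 for p, o in zip(interpolated, observed)]
    return math.sqrt(sum(squared) / len(squared))
```

    For example, with hypothetical interpolated grades [60, 62, 58] and borehole grades [61, 60, 58], the errors are -1, 2 and 0, so the mean absolute error is 1.0 and the root mean squared error is the square root of 5/3.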

  14. A DATA-MINING BASED METHOD FOR THE GAIT PATTERN ANALYSIS

    Directory of Open Access Journals (Sweden)

    Marcelo Rudek

    2015-12-01

    Full Text Available The paper presents a method developed for gait classification based on the analysis of the trajectory of the pressure centres (CoP) extracted from the contact points of the feet with the ground during walking. The data acquisition is performed by means of a walkway with embedded tactile sensors. The proposed method includes capturing procedures, standardization of data, creation of an organized repository (data warehouse), and development of a mining process. A graphical analysis is applied to look at the footprint signature patterns. The aim is to obtain a visual interpretation of the grouping by situating it into the normal walking patterns or deviations associated with an individual way of walking. The method consists of automated data classification, which divides the data into healthy and non-healthy subjects in order to assist in rehabilitation treatments for people with related mobility problems.

  15. Mining Research on Vibration Signal Association Rules of Quayside Container Crane Hoisting Motor Based on Apriori Algorithm

    Science.gov (United States)

    Yang, Chencheng; Tang, Gang; Hu, Xiong

    2017-07-01

    A shore-hoisting motor produces a large amount of vibration signal data in daily work. In order to analyze the correlation among the data and discover faults and potential safety hazards of the motor, the data are first discretized, and then the Apriori algorithm is used to mine the strong association rules among the data. The results show that day 1 and day 16 are the most closely related, which can guide the staff to analyze the operation of the motor on these two days to find and solve fault and safety problems.
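
    The two stages described here, mining frequent itemsets level by level and then keeping only high-confidence rules, can be sketched with a textbook Apriori. This is an illustrative sketch, not the authors' implementation; the function names and the discretized labels in the usage note (for example "rms=high") are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item that occurs in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Keep the candidates whose support reaches the threshold.
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-item candidates, then prune
        # any candidate that has an infrequent k-item subset.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

def strong_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for rules above the threshold."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # confidence(A -> B) = support(A and B) / support(A)
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules
```

    Each transaction would be one observation window of discretized signal features, for instance {"rms=high", "peak=high"}. Calling apriori(transactions, 0.5) and then strong_rules(frequent, 0.9) returns the rules whose confidence is at least 0.9; the thresholds are arbitrary choices for illustration.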

  16. An Incremental Classification Algorithm for Mining Data with Feature Space Heterogeneity

    Directory of Open Access Journals (Sweden)

    Yu Wang

    2014-01-01

    Full Text Available Feature space heterogeneity often exists in many real world data sets so that some features are of different importance for classification over different subsets. Moreover, the pattern of feature space heterogeneity might dynamically change over time as more and more data are accumulated. In this paper, we develop an incremental classification algorithm, Supervised Clustering for Classification with Feature Space Heterogeneity (SCCFSH), to address this problem. In our approach, supervised clustering is implemented to obtain a number of clusters such that samples in each cluster are from the same class. After the removal of outliers, relevance of features in each cluster is calculated based on their variations in this cluster. The feature relevance is incorporated into distance calculation for classification. The main advantage of SCCFSH lies in the fact that it is capable of solving a classification problem with feature space heterogeneity in an incremental way, which is favorable for online classification tasks with continuously changing data. Experimental results on a series of data sets and application to a database marketing problem show the efficiency and effectiveness of the proposed approach.

  17. Next Place Prediction Based on Spatiotemporal Pattern Mining of Mobile Device Logs

    Directory of Open Access Journals (Sweden)

    Sungjun Lee

    2016-01-01

    Full Text Available Due to the recent explosive growth of location-aware services based on mobile devices, predicting the next places of a user is of increasing importance to enable proactive information services. In this paper, we introduce a data-driven framework that aims to predict the user’s next places using his/her past visiting patterns analyzed from mobile device logs. Specifically, the notion of the spatiotemporal-periodic (STP) pattern is proposed to capture the visits with spatiotemporal periodicity by focusing on a detail level of location for each individual. Subsequently, we present algorithms that extract the STP patterns from a user’s past visiting behaviors and predict the next places based on the patterns. The experiment results obtained by using a real-world dataset show that the proposed methods are more effective in predicting the user’s next places than the previous approaches considered in most cases.

  18. Next Place Prediction Based on Spatiotemporal Pattern Mining of Mobile Device Logs.

    Science.gov (United States)

    Lee, Sungjun; Lim, Junseok; Park, Jonghun; Kim, Kwanho

    2016-01-23

    Due to the recent explosive growth of location-aware services based on mobile devices, predicting the next places of a user is of increasing importance to enable proactive information services. In this paper, we introduce a data-driven framework that aims to predict the user's next places using his/her past visiting patterns analyzed from mobile device logs. Specifically, the notion of the spatiotemporal-periodic (STP) pattern is proposed to capture the visits with spatiotemporal periodicity by focusing on a detail level of location for each individual. Subsequently, we present algorithms that extract the STP patterns from a user's past visiting behaviors and predict the next places based on the patterns. The experiment results obtained by using a real-world dataset show that the proposed methods are more effective in predicting the user's next places than the previous approaches considered in most cases.

  19. Algorithmic Information Dynamics of Persistent Patterns and Colliding Particles in the Game of Life

    KAUST Repository

    Zenil, Hector

    2018-02-18

    We demonstrate the way to apply and exploit the concept of algorithmic information dynamics in the characterization and classification of dynamic and persistent patterns, motifs and colliding particles in, without loss of generalization, Conway's Game of Life (GoL) cellular automaton as a case study. We analyze the distribution of prevailing motifs that occur in GoL from the perspective of algorithmic probability. We demonstrate how the tools introduced are an alternative to computable measures such as entropy and compression algorithms which are often nonsensitive to small changes and features of non-statistical nature in the study of evolving complex systems and their emergent structures.

  20. Genetic algorithm for the optimization of the loading pattern for reactor core fuel management

    International Nuclear Information System (INIS)

    Zhou Sheng; Hu Yongming; Zheng Wenxiang

    2000-01-01

    The paper discusses the application of a genetic algorithm to the optimization of the loading pattern for in-core fuel management with the NP characteristics. The algorithm develops a matrix model for the fuel assembly loading pattern. The burnable poisons matrix was assigned randomly considering the distributed nature of the poisons. A method based on the traveling salesman problem was used to solve the problem. An integrated code for in-core fuel management was formed by combining this code with a reactor physics code.

  1. Below a Historic Mercury Mine: Non-linear Patterns of Mercury Bioaccumulation in Aquatic Organisms

    Science.gov (United States)

    Haas, J.; Ichikawa, G.; Ode, P.; Salsbery, D.; Abel, J.

    2001-12-01

    Unlike most heavy metals, mercury is capable of bioaccumulating in aquatic food-chains, primarily because it is methylated by bacteria in sediment to the more toxic methylmercury form. Mercury concentrations in a number of riparian systems in California are highly elevated as a result of historic mining activities. These activities included both the mining of cinnabar in the coastal ranges to recover elemental mercury and the use of elemental mercury in the gold fields of the Sierra Nevada Mountains. The most productive mercury mining area was the New Almaden District, now a county park, located in the Guadalupe River drainage of Santa Clara County, where cinnabar was mined and retorted for over 100 years. As a consequence, riparian systems in several subwatersheds of the Guadalupe River drainage are contaminated with total mercury concentrations that exceed state hazardous waste criteria. Mercury concentrations in fish tissue frequently exceed human health guidelines. However, the potential ecological effects of these elevated mercury concentrations have not been thoroughly evaluated. One difficulty is in extrapolating sediment concentrations to fish tissue concentrations without accounting for physical and biological processes that determine bioaccumulation patterns. Many processes, such as methylation and demethylation of mercury by bacteria, assimilation efficiency in invertebrates, and metabolic rates in fish, are nonlinear, a factor that often confounds attempts to evaluate the effects of mercury contamination on aquatic food webs. Sediment, benthic macroinvertebrate, and fish tissue samples were collected in 1998 from the Guadalupe River drainage in Santa Clara County at 13 sites upstream and downstream from the historic mining district. Sediment and macroinvertebrate samples were analyzed for total mercury and methylmercury. Fish samples were analyzed for total mercury as whole bodies, composited by species and size. While linear correlations of sediment

  2. Comparison of optimization of loading patterns on the basis of SA and PMA algorithms

    International Nuclear Information System (INIS)

    Beliczai, Botond

    2007-01-01

    Optimization of loading patterns is a very important task from an economic point of view in a nuclear power plant. The optimization algorithms used for this purpose fall basically into two categories: deterministic ones and stochastic ones. In the Paks nuclear power plant a deterministic optimization procedure is used to optimize the loading pattern at BOC, so that the core would have maximal reactivity reserve. The group of stochastic optimization procedures consists mainly of simulated annealing (SA) procedures and genetic algorithms (GA). There are new procedures as well, which try to combine the advantages of SAs and GAs. One of them is called population mutation annealing algorithm (PMA). In the Paks NPP we would like to introduce fuel assemblies including burnable poison (Gd) in the near future. In order to be able to find the optimal loading pattern (or near-optimal loading patterns) in that case, we have to optimize our core not only for objective functions defined at BOC, but at EOC as well. For this purpose I used stochastic algorithms (SA and PMA) to investigate loading pattern optimization results for different objective functions at BOC. (author)

  3. Exploiting Sequential Patterns Found in Users' Solutions and Virtual Tutor Behavior to Improve Assistance in ITS

    Science.gov (United States)

    Fournier-Viger, Philippe; Faghihi, Usef; Nkambou, Roger; Nguifo, Engelbert Mephu

    2010-01-01

    We propose to mine temporal patterns in Intelligent Tutoring Systems (ITSs) to uncover useful knowledge that can enhance their ability to provide assistance. To discover patterns, we suggest using a custom, sequential pattern-mining algorithm. Two ways of applying the algorithm to enhance an ITS's capabilities are addressed. The first is to…

  4. An IPSO-SVM algorithm for security state prediction of mine production logistics system

    Science.gov (United States)

    Zhang, Yanliang; Lei, Junhui; Ma, Qiuli; Chen, Xin; Bi, Runfang

    2017-06-01

    A theoretical basis for the regulation of corporate security warning and resources was provided in order to reveal the laws behind the security state in mine production logistics. Because the mine production logistics system is complex and its variables are difficult to acquire, a superior security status prediction model of the mine production logistics system based on improved particle swarm optimization and a support vector machine (IPSO-SVM) is proposed in this paper. Firstly, through linear adjustments of the inertia weight and learning weights, the convergence speed and search accuracy are enhanced, with the aim of dealing with the changeable complexity and the difficulty of data acquisition. The improved particle swarm optimization (IPSO) is then introduced to resolve the problem of parameter settings in traditional support vector machines (SVM). At the same time, a security status index system is built to determine the classification standards of safety status. The feasibility and effectiveness of this method are finally verified using the experimental results.

  5. Data Mining Learning Models and Algorithms on a Scada System Data Repository

    Directory of Open Access Journals (Sweden)

    Mircea Rîşteiu

    2010-06-01

    Full Text Available This paper presents three data mining techniques applied on a SCADA system data repository: Naive Bayes, k-Nearest Neighbor and Decision Trees. Finally, based on the mining results and their reasonable explanation, it is concluded that k-Nearest Neighbor is a suitable method to classify the large amount of data considered. The experiments are built on the training data set and evaluated using the new test set with the machine learning tool WEKA.

  6. Algorithms

    Indian Academy of Sciences (India)

    polynomial) division have been found in Vedic Mathematics which are dated much before Euclid's algorithm. A programming language is used to describe an algorithm for execution on a computer. An algorithm expressed using a programming.

  7. Comparison of phase unwrapping algorithms for topography reconstruction based on digital speckle pattern interferometry

    Science.gov (United States)

    Li, Yuanbo; Cui, Xiaoqian; Wang, Hongbei; Zhao, Mengge; Ding, Hongbin

    2017-10-01

    Digital speckle pattern interferometry (DSPI) can diagnose topography evolution in a real-time, continuous and non-destructive manner, and has been considered a most promising technique for Plasma-Facing Components (PFCs) topography diagnostics in the complicated environment of a tokamak. It is important for the study of digital speckle pattern interferometry to enhance speckle patterns and obtain the real topography of the ablated crater. In this paper, two kinds of numerical model based on the flood-fill algorithm have been developed to obtain the real profile by unwrapping the wrapped phase in the speckle interference pattern, which can be calculated from four intensity images by means of the 4-step phase-shifting technique. During phase unwrapping by means of the flood-fill algorithm, noise pollution and other inevitable factors lead to poor quality of the reconstruction results, which affects the authenticity of the restored topography. The calculation of quality parameters was introduced to obtain a quality map from the wrapped phase map; this work presents two different methods to calculate the quality parameters. The quality parameters are then used to guide the path of the flood-fill algorithm, and pixels with good quality parameters are given priority in the calculation, so that the quality of the speckle interference pattern reconstruction results is improved. According to the comparison between the flood-fill algorithm suitable for speckle pattern interferometry and the quality-guided flood-fill algorithm (with two different calculation approaches), the errors caused by noise pollution and the discontinuity of the strips were successfully reduced.
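
    The step of computing a wrapped phase from four intensity images can be illustrated with the standard 4-step formula. The sketch assumes intensities of the form I_k = A + B cos(phi + (k-1)*pi/2), which the abstract does not state, and it uses a plain one-dimensional unwrapping as a simplified stand-in; it is not the quality-guided flood-fill method of the paper. All names are hypothetical.

```python
import math

def wrapped_phase(i1, i2, i3, i4):
    """Wrapped phase in (-pi, pi] from four frames shifted by 0, pi/2, pi, 3*pi/2.

    Under the assumed model, i4 - i2 = 2B sin(phi) and i1 - i3 = 2B cos(phi).
    """
    return math.atan2(i4 - i2, i1 - i3)

def unwrap_1d(phases):
    """Remove 2*pi jumps along a line, assuming neighbours differ by less than pi."""
    unwrapped = [phases[0]]
    for p in phases[1:]:
        step = p - unwrapped[-1]
        # Map the step into [-pi, pi) before accumulating it.
        step = (step + math.pi) % (2 * math.pi) - math.pi
        unwrapped.append(unwrapped[-1] + step)
    return unwrapped
```

    A two-dimensional method such as the flood fill described above applies the same step correction between neighbouring pixels, and a quality-guided variant visits the more reliable pixels first.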

  8. A multi-pattern hash-binary hybrid algorithm for URL matching in the HTTP protocol.

    Directory of Open Access Journals (Sweden)

    Ping Zeng

    Full Text Available In this paper, based on our previous multi-pattern uniform resource locator (URL) binary-matching algorithm called HEM, we propose an improved multi-pattern matching algorithm called MH that is based on hash tables and binary tables. The MH algorithm can be applied to the fields of network security, data analysis, load balancing, cloud robotic communications, and so on, all of which require string matching from a fixed starting position. Our approach effectively solves the performance problems of the classical multi-pattern matching algorithms. This paper explores ways to improve string matching performance under the HTTP protocol by using a hash method combined with a binary method that transforms the symbol-space matching problem into a digital-space numerical-size comparison and hashing problem. The MH approach has a fast matching speed, requires little memory, performs better than both the classical algorithms and HEM for matching fields in an HTTP stream, and it has great promise for use in real-world applications.
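
    The general idea of combining a hash lookup with numerical comparison can be sketched as follows. This is a loose illustration and not the MH or HEM algorithm: it assumes exact matching of a whole URL field, buckets patterns by a fixed-length prefix, and binary-searches integer encodings of the remainder. PREFIX_LEN and all function names are hypothetical.

```python
from bisect import bisect_left

PREFIX_LEN = 4  # hypothetical: number of leading characters used as the hash key

def encode(text):
    """Encode a string as an integer so comparison becomes numeric.

    The leading 0x01 byte keeps distinct strings from colliding.
    """
    return int.from_bytes(b"\x01" + text.encode("utf-8"), "big")

def build_index(patterns):
    """Map each prefix to a sorted list of integer-encoded remainders."""
    index = {}
    for pattern in patterns:
        index.setdefault(pattern[:PREFIX_LEN], []).append(encode(pattern[PREFIX_LEN:]))
    for tails in index.values():
        tails.sort()
    return index

def matches(index, url):
    """Hash on the prefix, then binary-search the encoded remainder."""
    tails = index.get(url[:PREFIX_LEN])
    if tails is None:
        return False
    key = encode(url[PREFIX_LEN:])
    position = bisect_left(tails, key)
    return position < len(tails) and tails[position] == key
```

    The hash step narrows the search to one bucket in constant expected time, and the binary search then needs only a logarithmic number of integer comparisons within that bucket.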

  9. Climatic zonation and land suitability determination for saffron in Khorasan-Razavi province using data mining algorithms

    Directory of Open Access Journals (Sweden)

    Mehdi Bashiri

    2017-12-01

    Full Text Available Yield prediction for agricultural crops plays an important role in export-import planning, purchase guarantees, pricing, securing profits and increasing agricultural productivity. Crop yield is affected by several parameters, especially climate. In this study, the saffron yield in the Khorasan-Razavi province was evaluated by different classification algorithms including artificial neural networks, regression models, local linear trees, decision trees, discriminant analysis, random forest, support vector machine and nearest neighbor analysis. These algorithms analyzed data for 20 years (1989-2009) including 11 climatological parameters. The results showed that only a few climatological parameters affect the saffron yield. The minimum, mean and maximum of temperature had the highest positive correlations, and the relative humidity of 6.5h, sunny hours, relative humidity of 18.5h, evaporation, relative humidity of 12.5h and absolute humidity had the highest negative correlations with saffron cultivation areas, respectively. In addition, in classification of saffron cultivation areas, the discriminant analysis and support vector machine had higher accuracies. The correlation between saffron cultivation area and saffron yield values was relatively high (r=0.38). The nearest neighbor analysis had the best prediction accuracy for classification of cultivation areas. For this algorithm the coefficients of determination were 1 and 0.944 for training and testing stages, respectively. However, the algorithms' accuracy for prediction of crop yield from climatological parameters was low (the average coefficients of determination equal to 0.48 and 0.05 for training and testing stages). The best algorithm, i.e. nearest neighbor analysis, had coefficients of determination equal to 1 and 0.177 for saffron yield prediction. Results showed that climatological parameters and data mining algorithms can be used to classify cultivation areas. By this way it is possible

  10. Length-Bounded Hybrid CPU/GPU Pattern Matching Algorithm for Deep Packet Inspection

    Directory of Open Access Journals (Sweden)

    Yi-Shan Lin

    2017-01-01

    Full Text Available Since frequent communication between applications takes place in high speed networks, deep packet inspection (DPI) plays an important role in the network application awareness. The signature-based network intrusion detection system (NIDS) contains a DPI technique that examines the incoming packet payloads by employing a pattern matching algorithm that dominates the overall inspection performance. Existing studies focused on implementing efficient pattern matching algorithms by parallel programming on software platforms because of the advantages of lower cost and higher scalability. Either the central processing unit (CPU) or the graphic processing unit (GPU) was involved. Our studies focused on designing a pattern matching algorithm based on the cooperation between both CPU and GPU. In this paper, we present an enhanced design for our previous work, a length-bounded hybrid CPU/GPU pattern matching algorithm (LHPMA). In the preliminary experiment, the performance and comparison with the previous work are displayed, and the experimental results show that the LHPMA can achieve not only effective CPU/GPU cooperation but also higher throughput than the previous method.

  11. Multiwavelength Absolute Phase Retrieval from Noisy Diffractive Patterns: Wavelength Multiplexing Algorithm

    Directory of Open Access Journals (Sweden)

    Vladimir Katkovnik

    2018-05-01

    Full Text Available We study the problem of multiwavelength absolute phase retrieval from noisy diffraction patterns. The system is lensless with multiwavelength coherent input light beams and random phase masks applied for wavefront modulation. The light beams are formed by light sources radiating all wavelengths simultaneously. A sensor equipped with a Color Filter Array (CFA) is used for spectral measurement registration. The developed algorithm, which targets optimal phase retrieval from noisy observations, is based on the maximum likelihood technique. The algorithm is specified for Poissonian and Gaussian noise distributions. One of the key elements of the algorithm is an original sparse modeling of the multiwavelength complex-valued wavefronts based on the complex-domain block-matching 3D filtering. Presented numerical experiments are restricted to noisy Poissonian observations. They demonstrate that the developed algorithm leads to effective solutions explicitly using the sparsity for noise suppression and enabling accurate reconstruction of absolute phase of high-dynamic range.

  12. Predicting the Location and Time of Mobile Phone Users by Using Sequential Pattern Mining Techniques

    DEFF Research Database (Denmark)

    Ozer, Mert; Keles, Ilkcan; Toroslu, Hakki

    2016-01-01

    In recent years, using cell phone log data to model human mobility patterns became an active research area. This problem is a challenging data mining problem due to huge size and non-uniformity of the log data, which introduces several granularity levels for the specification of temporal...... and spatial dimensions. This paper focuses on the prediction of the location of the next activity of the mobile phone users. There are several versions of this problem. In this work, we have concentrated on the following three problems: predicting the location and the time of the next user activity...... the success of these methods with real data obtained from one of the largest mobile phone operators in Turkey. Our results are very encouraging, since we were able to obtain quite high accuracy results under small prediction sets....

  13. Synthesis of Steered Flat-top Beam Pattern Using Evolutionary Algorithm

    Directory of Open Access Journals (Sweden)

    D. Mandal

    2016-12-01

    Full Text Available In this paper a pattern synthesis method based on Evolutionary Algorithm is presented. A Flat-top beam pattern has been generated from a concentric ring array of isotropic elements by finding out the optimum set of element amplitudes and phases using the Differential Evolution algorithm. The said pattern is generated in three predefined azimuth planes instead of a single phi plane and is also verified for a range of azimuth planes for the same optimum excitations. The main beam is steered to an elevation angle of 30 degrees with lower peak SLL and ripple. Dynamic range ratio (DRR) is also improved by eliminating the weakly excited array elements, which simplifies the design complexity of the feed networks.

  14. Performance evaluation of Genetic Algorithms on loading pattern optimization of PWRs

    International Nuclear Information System (INIS)

    Tombakoglu, M.; Bekar, K.B.; Erdemli, A.O.

    2001-01-01

    Genetic Algorithm (GA) based systems are used for search and optimization problems. There are several applications of GAs in literature successfully applied for loading pattern optimization problems. In this study, we have selected loading pattern optimization problem of Pressurised Water Reactor (PWR). The main objective of this work is to evaluate the performance of Genetic Algorithm operators such as regional crossover, crossover and mutation, and selection and construction of initial population and its size for PWR loading pattern optimization problems. The performance of GA with antithetic variates is compared to traditional GA. Antithetic variates are used to generate the initial population and its use with GA operators are also discussed. Finally, the results of multi-cycle optimization problems are discussed for objective function taking into account cycle burn-up and discharge burn-up.(author)

  15. An Efficient System Based On Closed Sequential Patterns for Web Recommendations

    OpenAIRE

    Utpala Niranjan; R.B.V. Subramanyam; V-Khana

    2010-01-01

    Sequential pattern mining has, since its introduction, received considerable attention among researchers, with broad applications. Sequential pattern algorithms generally face problems when mining long sequential patterns or when using a very low support threshold. One possible solution to such problems is mining the closed sequential patterns, which are a condensed representation of sequential patterns. Recently, several researchers have utilized the sequential pattern discovery for d...

  16. LPaMI: A Graph-Based Lifestyle Pattern Mining Application Using Personal Image Collections in Smartphones

    Directory of Open Access Journals (Sweden)

    Kifayat Ullah Khan

    2017-11-01

    Full Text Available Normally, individuals use smartphones for a variety of purposes like photography, schedule planning, playing games, and so on, apart from benefiting from the core tasks of call-making and short messaging. These services are sources of personal data generation. Therefore, any application that utilises personal data of a user from his/her smartphone is truly a great witness of his/her interests and this information can be used for various personalised services. In this paper, we present Lifestyle Pattern MIning (LPaMI), which is a personalised application for mining the lifestyle patterns of a smartphone user. LPaMI uses the personal photograph collections of a user, which reflect the day-to-day photos taken by a smartphone, to recognise scenes (called objects of interest in our work). These are then mined to discover lifestyle patterns. The uniqueness of LPaMI lies in our graph-based approach to mining the patterns of interest. Modelling of data in the form of graphs is effective in preserving the lifestyle behaviour maintained over the passage of time. Graph-modelled lifestyle data enables us to apply a variety of graph mining techniques for pattern discovery. To demonstrate the effectiveness of our proposal, we have developed a prototype system for LPaMI to implement its end-to-end pipeline. We have also conducted an extensive evaluation for various phases of LPaMI using different real-world datasets. We understand that the output of LPaMI can be utilised for a variety of pattern discovery application areas like trip and food recommendations, shopping, and so on.

  17. hs-CRP is strongly associated with coronary heart disease (CHD): A data mining approach using decision tree algorithm.

    Science.gov (United States)

    Tayefi, Maryam; Tajfard, Mohammad; Saffar, Sara; Hanachi, Parichehr; Amirabadizadeh, Ali Reza; Esmaeily, Habibollah; Taghipour, Ali; Ferns, Gordon A; Moohebati, Mohsen; Ghayour-Mobarhan, Majid

    2017-04-01

    Coronary heart disease (CHD) is an important public health problem globally. Algorithms incorporating the assessment of clinical biomarkers together with several established traditional risk factors can help clinicians to predict CHD and support clinical decision making with respect to interventions. Decision tree (DT) is a data mining model for extracting hidden knowledge from large databases. We aimed to establish a predictive model for coronary heart disease using a decision tree algorithm. Here we used a dataset of 2346 individuals including 1159 healthy participants and 1187 participants who had undergone coronary angiography (405 participants with negative angiography and 782 participants with positive angiography). We entered 10 variables of a total of 12 variables into the DT algorithm (including age, sex, FBG, TG, hs-CRP, TC, HDL, LDL, SBP and DBP). Our model could identify the associated risk factors of CHD with sensitivity, specificity, and accuracy of 96%, 87%, and 94%, respectively. Serum hs-CRP level was at the top of the tree in our model, followed by FBG, gender and age. Our model appears to be an accurate, specific and sensitive model for identifying the presence of CHD, but will require validation in prospective studies. Copyright © 2017 Elsevier B.V. All rights reserved.
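
    The three performance figures quoted above follow from a confusion matrix. The sketch below shows the standard definitions; the counts in the usage line are hypothetical and are not taken from the study, which reports only the percentages.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # share of true cases that were detected
    specificity = tn / (tn + fp)                 # share of non-cases correctly ruled out
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # share of all predictions that were correct
    return sensitivity, specificity, accuracy


# Hypothetical counts, for illustration only.
sens, spec, acc = classification_metrics(tp=90, fp=20, tn=80, fn=10)
```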

  18. Algorithms bio-inspired for the pattern obtention of control bars in BWR reactors

    International Nuclear Information System (INIS)

    Ortiz, J.J.; Perusquia, R.; Montes, J.L.

    2003-01-01

    In this work, methods based on genetic algorithms and on ant colony systems for obtaining the control bar patterns of an 18-month equilibrium cycle for the Laguna Verde nuclear power station are presented. The results obtained with these methods are compared with each other and with those of the design of that equilibrium cycle. The study found that the algorithm based on ant colonies shortened the coast-down period (the decrease of power at the end of the cycle) by five and a half days with respect to the original design, which represents an annual saving of US$ 100,000. (Author)

  19. Computational mining for hypothetical patterns of amino acid side chains in protein data bank (PDB)

    Science.gov (United States)

    Ghani, Nur Syatila Ab; Firdaus-Raih, Mohd

    2018-04-01

    The three-dimensional structure of a protein can provide insights regarding its function. Functional relationships between proteins can be inferred from fold and sequence similarities. In certain cases, sequence or fold comparison fails to establish homology between proteins with similar mechanisms. Since structure is more conserved than sequence, a constellation of functional residues can be similarly arranged among proteins with a similar mechanism. Local structural similarity searches are able to detect such constellations of amino acids among distinct proteins, which can be useful for annotating proteins of unknown function. Detecting such patterns of amino acids on a large scale can increase the repertoire of important 3D motifs, since the currently known 3D motifs cannot keep pace with the ever-increasing number of uncharacterized proteins to be annotated. Here, a computational platform for the automated detection of 3D motifs is described. A fuzzy-pattern searching algorithm derived from IMagine an Amino Acid 3D Arrangement search EnGINE (IMAAAGINE) was implemented to develop an automated method for searching for hypothetical patterns of amino acid side chains in the Protein Data Bank (PDB), without the need for prior knowledge of the sequence or structure related to the pattern of interest. We present an example of the searches, which is the detection of a hypothetical pattern derived from the known C2H2 structural motif of zinc fingers. The conservation of particular patterns of amino acid side chains in unrelated proteins is highlighted. This approach can act as a complementary method for available structure- and sequence-based platforms and may contribute to improving functional association between proteins.

  20. A polynomial time biclustering algorithm for finding approximate expression patterns in gene expression time series

    Directory of Open Access Journals (Sweden)

    Madeira Sara C

    2009-06-01

    Full Text Available Abstract. Background: The ability to monitor the change in expression patterns over time, and to observe the emergence of coherent temporal responses using gene expression time series, obtained from microarray experiments, is critical to advance our understanding of complex biological processes. In this context, biclustering algorithms have been recognized as an important tool for the discovery of local expression patterns, which are crucial to unravel potential regulatory mechanisms. Although most formulations of the biclustering problem are NP-hard, when working with time series expression data the interesting biclusters can be restricted to those with contiguous columns. This restriction leads to a tractable problem and enables the design of efficient biclustering algorithms able to identify all maximal contiguous column coherent biclusters. Methods: In this work, we propose e-CCC-Biclustering, a biclustering algorithm that finds and reports all maximal contiguous column coherent biclusters with approximate expression patterns in time polynomial in the size of the time series gene expression matrix. This polynomial time complexity is achieved by manipulating a discretized version of the original matrix using efficient string processing techniques. We also propose extensions to deal with missing values, discover anticorrelated and scaled expression patterns, and different ways to compute the errors allowed in the expression patterns. We propose a scoring criterion combining the statistical significance of expression patterns with a similarity measure between overlapping biclusters. Results: We present results in real data showing the effectiveness of e-CCC-Biclustering and its relevance in the discovery of regulatory modules describing the transcriptomic expression patterns occurring in Saccharomyces cerevisiae in response to heat stress. In particular, the results show the advantage of considering approximate patterns when compared to state of

  1. Analysis Of Data Mining For Car Sales Sparepart Using Apriori Algorithm (Case Study: PT. IDK 1 FIELD

    Directory of Open Access Journals (Sweden)

    Khairul Ummi

    2016-10-01

    Full Text Available PT. IDK 1 is a branch office of a Honda car dealership that sells parts for various Honda car variants, automatic or manual, as well as motorcycle parts. Every sale is recorded in a database connected directly to the central office. However, PT. IDK 1 does not know which pairs of parts are frequently purchased together. When the stock of a spare part runs low, the office simply asks the central office to send more of that part, without knowing which other parts tend to be purchased along with it. This makes restocking difficult because of the many types of auto parts. Data mining techniques have been widely used to solve such problems; one of them is the Apriori algorithm, which extracts associations between products from a transaction database. Sales transaction data for Honda car parts at PT. IDK 1 can be processed with a data mining application to produce association rules, that is, strong links between itemsets of spare part sales, which can provide recommendations and make it easier to restock and to arrange or place strongly interdependent goods together.
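
    The idea of finding parts that are bought together can be illustrated with a minimal Apriori-style sketch limited to item pairs. The transactions, part names, and the 0.5 support threshold below are hypothetical and do not come from the PT. IDK 1 data; the record does not describe the authors' implementation.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return {pair: support} for item pairs whose support meets min_support."""
    n = len(transactions)
    # Pass 1: count single items and keep only the frequent ones.
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent_items = {i for i, c in item_counts.items() if c / n >= min_support}
    # Pass 2: count only pairs built from frequent items (the Apriori pruning step).
    pair_counts = Counter()
    for t in transactions:
        kept = sorted(set(t) & frequent_items)
        pair_counts.update(combinations(kept, 2))
    return {p: c / n for p, c in pair_counts.items() if c / n >= min_support}


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    with_antecedent = [t for t in transactions if antecedent in t]
    if not with_antecedent:
        return 0.0
    return sum(consequent in t for t in with_antecedent) / len(with_antecedent)


# Hypothetical sales transactions, for illustration only.
transactions = [
    {"oil filter", "engine oil"},
    {"oil filter", "engine oil", "spark plug"},
    {"brake pad"},
    {"oil filter", "engine oil"},
]
pairs = frequent_pairs(transactions, min_support=0.5)
rule_confidence = confidence(transactions, "oil filter", "engine oil")
```

    A rule with high support and high confidence, such as the one above, is the kind of link that could inform joint restocking.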

  2. A Tutorial on Nonlinear Time-Series Data Mining in Engineering Asset Health and Reliability Prediction: Concepts, Models, and Algorithms

    Directory of Open Access Journals (Sweden)

    Ming Dong

    2010-01-01

    Full Text Available The primary objective of engineering asset management is to optimize assets' service delivery potential and to minimize the related risks and costs over their entire life through the development and application of asset health and usage management, in which health and reliability prediction plays an important role. In real-life situations where an engineering asset operates under dynamic operational and environmental conditions, the lifetime of an engineering asset is generally described as monitored nonlinear time-series data and is subject to high levels of uncertainty and unpredictability. It has been shown that applying data mining techniques is very useful for extracting relevant features that can be used as parameters for asset diagnosis and prognosis. In this paper, a tutorial on nonlinear time-series data mining in engineering asset health and reliability prediction is given. Besides giving an overview of health and reliability prediction techniques for engineering assets, this tutorial focuses on concepts, models, algorithms, and applications of hidden Markov models (HMMs) and hidden semi-Markov models (HSMMs) in engineering asset health prognosis, which are representative of recent engineering asset health prediction techniques.

  3. Algorithms

    Indian Academy of Sciences (India)

    to as 'divide-and-conquer'. Although there has been a large effort in realizing efficient algorithms, there are not many universally accepted algorithm design paradigms. In this article, we illustrate algorithm design techniques such as balancing, greedy strategy, dynamic programming strategy, and backtracking or traversal of ...

  4. Development of a BWR loading pattern design system based on modified genetic algorithms and knowledge

    International Nuclear Information System (INIS)

    Martin-del-Campo, Cecilia; Francois, Juan Luis; Avendano, Linda; Gonzalez, Mario

    2004-01-01

    An optimization system based on Genetic Algorithms (GAs), in combination with expert knowledge coded in heuristic rules, was developed for the design of optimized boiling water reactor (BWR) fuel loading patterns. The system was coded in a computer program named Loading Pattern Optimization System based on Genetic Algorithms, in which the optimization code uses GAs to select candidate solutions, and the core simulator code CM-PRESTO to evaluate them. A multi-objective function was built to maximize the cycle energy length while satisfying power and reactivity constraints used as BWR design parameters. Heuristic rules were applied to satisfy standard fuel management recommendations such as the Control Cell Core and Low Leakage loading strategies, and octant symmetry. To test the system performance, an optimized cycle was designed and compared against an actual operating cycle of Laguna Verde Nuclear Power Plant, Unit I.

  5. Study of high speed complex number algorithms. [for determining antenna for field radiation patterns

    Science.gov (United States)

    Heisler, R.

    1981-01-01

    A method of evaluating the radiation integral on the curved surface of a reflecting antenna is presented. A three dimensional Fourier transform approach is used to generate a two dimensional radiation cross-section along a planar cut at any angle phi through the far field pattern. Salient to the method is an algorithm for evaluating a subset of the total three dimensional discrete Fourier transform results. The subset elements are selectively evaluated to yield data along a geometric plane of constant. The algorithm is extremely efficient so that computation of the induced surface currents via the physical optics approximation dominates the computer time required to compute a radiation pattern. Application to paraboloid reflectors with off-focus feeds is presented, but the method is easily extended to offset antenna systems and reflectors of arbitrary shapes. Numerical results were computed for both gain and phase and are compared with other published work.

  6. Searching for full power control rod patterns in a boiling water reactor using genetic algorithms

    Energy Technology Data Exchange (ETDEWEB)

    Montes, Jose Luis [Departamento Sistemas Nucleares, ININ, Carr. Mexico-Toluca Km. 36.5, Ocoyoacac, Edo. de Mexico (Mexico)]. E-mail: jlmt@nuclear.inin.mx; Ortiz, Juan Jose [Departamento Sistemas Nucleares, ININ, Carr. Mexico-Toluca Km. 36.5, Ocoyoacac, Edo. de Mexico (Mexico)]. E-mail: jjortiz@nuclear.inin.mx; Requena, Ignacio [Departamento Ciencias Computacion e I.A. ETSII, Informatica, Universidad de Granada, C. Daniel Saucedo Aranda s/n. 18071 Granada (Spain)]. E-mail: requena@decsai.ugr.es; Perusquia, Raul [Departamento Sistemas Nucleares, ININ, Carr. Mexico-Toluca Km. 36.5, Ocoyoacac, Edo. de Mexico (Mexico)]. E-mail: rpc@nuclear.inin.mx

    2004-11-01

    One of the most important questions related to both the safety and the economic aspects of nuclear power reactor operation is, without any doubt, its reactivity control. During normal operation of a boiling water reactor, the reactivity control of its core is strongly determined by the efficiency of its control rod patterns. In this paper, the GACRP system, based on the concepts of genetic algorithms, is proposed for the full power control rod pattern search. This system was developed using LVNPP transition cycle characteristics and was also applied to an equilibrium cycle. Several operation scenarios, including core water flow variation throughout the cycle and different target axial power distributions, are considered. The genetic algorithm fitness function includes reactor security parameters, such as MLHGR, MCPR, reactor k_eff and axial power density.

  7. An improved genetic algorithm for designing optimal temporal patterns of neural stimulation

    Science.gov (United States)

    Cassar, Isaac R.; Titus, Nathan D.; Grill, Warren M.

    2017-12-01

    Objective. Electrical neuromodulation therapies typically apply constant frequency stimulation, but non-regular temporal patterns of stimulation may be more effective and more efficient. However, the design space for temporal patterns is exceedingly large, and model-based optimization is required for pattern design. We designed and implemented a modified genetic algorithm (GA) intended for designing optimal temporal patterns of electrical neuromodulation. Approach. We tested and modified standard GA methods for application to designing temporal patterns of neural stimulation. We evaluated each modification individually and all modifications collectively by comparing performance to the standard GA across three test functions and two biophysically-based models of neural stimulation. Main results. The proposed modifications of the GA significantly improved performance across the test functions and performed best when all were used collectively. The standard GA found patterns that outperformed fixed-frequency, clinically-standard patterns in biophysically-based models of neural stimulation, but the modified GA, in many fewer iterations, consistently converged to higher-scoring, non-regular patterns of stimulation. Significance. The proposed improvements to standard GA methodology reduced the number of iterations required for convergence and identified superior solutions.

  8. Reloading pattern optimization of VVER-1000 reactors in transient cycles using genetic algorithm

    International Nuclear Information System (INIS)

    Rahmani, Yashar

    2017-01-01

    Highlights: • The genetic algorithm (GA) and the innovative weighting factors method were used. • The coupling of WIMSD5-B and CITATION-LDI2 neutronic codes with the thermohydraulic WERL code was employed. • Optimization of reloading patterns was carried out in two states. • First, an arrangement with satisfactory excess reactivity and the flattest power distribution was sought. • Second, an arrangement with a satisfactory safety threshold and the maximum K_eff was sought. Abstract: The present paper proposes application of the genetic algorithm (GA) and the innovative weighting factor method to optimize the reloading pattern of the Bushehr VVER-1000 reactor in the second cycle. To estimate the composition of fuel assemblies remaining from the first cycle and precisely calculate the objective parameters of each reloading pattern in the second cycle, coupling of the WIMSD5-B and CITATION-LDI2 codes in the neutronic section and the WERL code in the thermo-hydraulic section was employed. Optimization of the reloading patterns was carried out in two states. In the first state, the weighting factor method was applied to determine the type and quantity of the loadable fresh assemblies that enable the reactor core to maintain criticality over the entire cycle length. Afterwards, the genetic algorithm was used to optimize the reloading pattern of the reactor to obtain an arrangement with a flat radial power distribution. In the second state, the optimization algorithm was free to select the type and number of fresh fuel assemblies in order to search for an arrangement with the maximum effective multiplication factor and a safe power peaking factor. In addition, in order to ensure the safety and desirability of the proposed patterns in both states, a time-dependent examination of the thermo-neutronic behavior of the reactor core was carried out during the second cycle. With consideration of the new

  9. Biosorption of metal and salt tolerant microbial isolates from a former uranium mining area. Their impact on changes in rare earth element patterns in acid mine drainage.

    Science.gov (United States)

    Haferburg, Götz; Merten, Dirk; Büchel, Georg; Kothe, Erika

    2007-12-01

    The concentration of metals in microbial habitats influenced by mining operations can reach enormous values. Worldwide, much emphasis is placed on the research of resistance and biosorptive capacities of microorganisms suitable for bioremediation purposes. Using a collection of isolates from a former uranium mining area in Eastern Thuringia, Germany, this study presents three Gram-positive bacterial strains with distinct metal tolerances. These strains were identified as members of the genera Bacillus, Micrococcus and Streptomyces. Acid mine drainage (AMD) originating from the same mining area is characterized by high metal concentrations of a broad range of elements and a very low pH. AMD was analyzed and used as incubation solution. The sorption of rare earth elements (REE), aluminum, cobalt, copper, manganese, nickel, strontium, and uranium through selected strains was studied during a time course of four weeks. Biosorption was investigated after one hour, one week and four weeks by analyzing the concentrations of metals in supernatant and biomass. Additionally, dead biomass was investigated after four weeks of incubation. The maximum of metal removal was reached after one week. Up to 80% of both Al and Cu, and more than 60% of U was shown to be removed from the solution. High concentrations of metals could be bound to the biomass, as for example 2.2 mg/g U. The strains could survive four weeks of incubation. Distinct and different patterns of rare earth elements of the inoculated and non-inoculated AMD water were observed. Changes in REE patterns hint at different binding types of heavy metals regarding incubation time and metabolic activity of the cells. (c) 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  10. Optimal Refueling Pattern Search for a CANDU Reactor Using a Genetic Algorithm

    International Nuclear Information System (INIS)

    Quang Binh, DO; Gyuhong, ROH; Hangbok, CHOI

    2006-01-01

    This paper presents the results from the application of genetic algorithms to a refueling optimization of a Canada deuterium uranium (CANDU) reactor. This work aims at making a mathematical model of the refueling optimization problem including the objective function and constraints and developing a method based on genetic algorithms to solve the problem. The model of the optimization problem and the proposed method comply with the key features of the refueling strategy of the CANDU reactor which adopts an on-power refueling operation. In this study, a genetic algorithm combined with an elitism strategy was used to automatically search for the refueling patterns. The objective of the optimization was to maximize the discharge burn-up of the refueling bundles, minimize the maximum channel power, or minimize the maximum change in the zone controller unit (ZCU) water levels. A combination of these objectives was also investigated. The constraints include the discharge burn-up, maximum channel power, maximum bundle power, channel power peaking factor and the ZCU water level. A refueling pattern that represents the refueling rate and channels was coded by a one-dimensional binary chromosome, which is a string of binary numbers 0 and 1. A computer program was developed in FORTRAN 90 running on an HP 9000 workstation to conduct the search for the optimal refueling patterns for a CANDU reactor at the equilibrium state. The results showed that it was possible to apply genetic algorithms to automatically search for the refueling channels of the CANDU reactor. The optimal refueling patterns were compared with the solutions obtained from the AUTOREFUEL program and the results were consistent with each other. (authors)
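
    The combination of a one-dimensional binary chromosome with an elitism strategy can be sketched as follows. This is a generic illustration, not the authors' FORTRAN 90 program: the tournament selection, single-point crossover, mutation rate, and the toy bit-counting fitness function are all assumptions made for the example, standing in for a real refueling objective and its constraints.

```python
import random


def evolve(fitness, n_bits, pop_size=20, generations=30, elite=2, p_mut=0.02, seed=0):
    """Maximise `fitness` over binary chromosomes; return (best, best_fitness_history)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        # Elitism: the best chromosomes are copied unchanged into the next generation.
        next_pop = [chrom[:] for chrom in pop[:elite]]
        while len(next_pop) < pop_size:
            # Binary tournament selection of two parents (an assumed operator).
            a = max(rng.sample(pop, 2), key=fitness)
            b = max(rng.sample(pop, 2), key=fitness)
            # Single-point crossover followed by bit-flip mutation.
            cut = rng.randrange(1, n_bits)
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history


# Toy fitness (number of 1 bits), a hypothetical stand-in for a refueling objective.
best, history = evolve(fitness=sum, n_bits=16)
```

    Because the elite chromosomes survive each generation unchanged, the best fitness recorded in `history` never decreases.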

  11. Image/Time Series Mining Algorithms: Applications to Developmental Biology, Document Processing and Data Streams

    Science.gov (United States)

    Tataw, Oben Moses

    2013-01-01

    Interdisciplinary research in computer science requires the development of computational techniques for practical application in different domains. This usually requires careful integration of different areas of technical expertise. This dissertation presents image and time series analysis algorithms, with practical interdisciplinary applications…

  12. Genetic algorithms and artificial neural networks for loading pattern optimisation of advanced gas-cooled reactors

    Energy Technology Data Exchange (ETDEWEB)

    Ziver, A.K. E-mail: a.k.ziver@imperial.ac.uk; Pain, C.C; Carter, J.N.; Oliveira, C.R.E. de; Goddard, A.J.H.; Overton, R.S

    2004-03-01

    A non-generational genetic algorithm (GA) has been developed for fuel management optimisation of Advanced Gas-Cooled Reactors, which are operated by British Energy and produce around 20% of the UK's electricity requirements. An evolutionary search is coded using the genetic operators, namely selection by tournament, two-point crossover, mutation and random assessment of population, for multi-cycle loading pattern (LP) optimisation. A detailed description of the chromosomes in the genetic algorithm coded is presented. Artificial Neural Networks (ANNs) have been constructed and trained to accelerate the GA-based search during the optimisation process. The whole package, called GAOPT, is linked to the reactor analysis code PANTHER, which performs fresh fuel loading, burn-up and power shaping calculations for each reactor cycle by imposing station-specific safety and operational constraints. GAOPT has been verified by performing a number of tests, which are applied to the Hinkley Point B and Hartlepool reactors. The test results giving loading pattern (LP) scenarios obtained from single and multi-cycle optimisation calculations applied to realistic reactor states of the Hartlepool and Hinkley Point B reactors are discussed. The results have shown that the GA/ANN algorithms developed can help the fuel engineer to optimise loading patterns in an efficient and more profitable way than currently available for multi-cycle refuelling of AGRs. Research leading to parallel GAs applied to LP optimisation is outlined, which can be adapted to present day LWR fuel management problems.

  13. A Hybrid CPU/GPU Pattern-Matching Algorithm for Deep Packet Inspection.

    Directory of Open Access Journals (Sweden)

    Chun-Liang Lee

    Full Text Available The large quantities of data now being transferred via high-speed networks have made deep packet inspection indispensable for security purposes. Scalable and low-cost signature-based network intrusion detection systems have been developed for deep packet inspection for various software platforms. Traditional approaches that only involve central processing units (CPUs) are now considered inadequate in terms of inspection speed. Graphic processing units (GPUs) have superior parallel processing power, but transmission bottlenecks can reduce optimal GPU efficiency. In this paper we describe our proposal for a hybrid CPU/GPU pattern-matching algorithm (HPMA) that divides and distributes the packet-inspecting workload between a CPU and GPU. All packets are initially inspected by the CPU and filtered using a simple pre-filtering algorithm, and packets that might contain malicious content are sent to the GPU for further inspection. Test results indicate that in terms of random payload traffic, the matching speed of our proposed algorithm was 3.4 times and 2.7 times faster than those of the AC-CPU and AC-GPU algorithms, respectively. Further, HPMA achieved higher energy efficiency than the other tested algorithms.
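
    The two-stage division of work can be illustrated with a small sketch in which a cheap pre-filter decides which packets need full signature matching. The abstract does not specify the pre-filtering algorithm, so the prefix lookup used here is an assumption, the signatures and payloads are hypothetical, and the second stage runs on the CPU as a stand-in for the GPU stage.

```python
def build_prefilter(signatures, k=2):
    """Collect the first k bytes of every signature (assumes each has at least k bytes)."""
    return {sig[:k] for sig in signatures}


def maybe_malicious(payload, prefixes, k=2):
    """Cheap first stage: does any k-byte window of the payload start a signature?"""
    return any(payload[i:i + k] in prefixes for i in range(len(payload) - k + 1))


def full_match(payload, signatures):
    """Expensive second stage (offloaded to the GPU in HPMA; plain CPU code here)."""
    return [sig for sig in signatures if sig in payload]


def inspect(packets, signatures, k=2):
    """Run full matching only on packets that pass the pre-filter."""
    prefixes = build_prefilter(signatures, k)
    suspicious = [p for p in packets if maybe_malicious(p, prefixes, k)]
    return {p: full_match(p, signatures) for p in suspicious}


# Hypothetical signatures and payloads, for illustration only.
signatures = [b"evil", b"worm"]
packets = [b"hello world", b"an evil packet", b"plain text"]
report = inspect(packets, signatures)
```

    The pre-filter may pass harmless packets (here `b"hello world"`, which contains the prefix `b"wo"`), but it never drops a packet that contains a full signature, so the second stage only has to clear up false positives.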

  14. Order Batching in Warehouses by Minimizing Total Tardiness: A Hybrid Approach of Weighted Association Rule Mining and Genetic Algorithms

    Directory of Open Access Journals (Sweden)

    Amir Hossein Azadnia

    2013-01-01

    Full Text Available One of the cost-intensive issues in managing warehouses is the order picking problem, which deals with the retrieval of items from their storage locations in order to meet customer requests. Many solution approaches have been proposed to minimize traveling distance in the process of order picking. However, in practice, customer orders have to be completed by certain due dates in order to avoid tardiness, which is neglected in most of the related scientific papers. Consequently, we propose a novel solution approach to minimize tardiness which consists of four phases. First, weighted association rule mining is used to calculate associations between orders with respect to their due dates. Next, a batching model based on binary integer programming is formulated to maximize the associations between orders within each batch. Subsequently, in the order picking phase, a Genetic Algorithm integrated with the Traveling Salesman Problem is used to identify the most suitable travel path. Finally, the Genetic Algorithm is applied to sequence the constructed batches in order to minimize tardiness. Illustrative examples and comparisons are presented to demonstrate the proficiency and solution quality of the proposed approach.

  15. An Unsupervised Opinion Mining Approach for Japanese Weblog Reputation Information Using an Improved SO-PMI Algorithm

    Science.gov (United States)

    Wang, Guangwei; Araki, Kenji

    In this paper, we propose an improved SO-PMI (Semantic Orientation Using Pointwise Mutual Information) algorithm, for use in Japanese Weblog Opinion Mining. SO-PMI is an unsupervised approach proposed by Turney that has been shown to work well for English. When this algorithm was translated into Japanese naively, most phrases, whether positive or negative in meaning, received a negative SO. For dealing with this slanting phenomenon, we propose three improvements: to expand the reference words to sets of words, to introduce a balancing factor and to detect neutral expressions. In our experiments, the proposed improvements obtained a well-balanced result: both positive and negative accuracy exceeded 62%, when evaluated on 1,200 opinion sentences sampled from three different domains (reviews of Electronic Products, Cars and Travels from Kakaku.com). In a comparative experiment on the same corpus, a supervised approach (SA-Demo) achieved a very similar accuracy to our method. This shows that our proposed approach effectively adapted SO-PMI for Japanese, and it also shows the generality of SO-PMI.
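
    For orientation, the baseline SO-PMI score that the paper builds on can be written as a log ratio of co-occurrence counts. The sketch below follows Turney's original formulation with a single positive and a single negative reference word and a small smoothing constant; it does not include the three improvements proposed in the paper, and the hit counts in the usage lines are hypothetical.

```python
import math


def so_pmi(near_pos, near_neg, hits_pos, hits_neg, smoothing=0.01):
    """Semantic orientation of a phrase from co-occurrence hit counts.

    near_pos / near_neg: hits for the phrase near the positive / negative reference word.
    hits_pos / hits_neg: total hits for each reference word on its own.
    """
    # Smoothing keeps the ratio defined when a co-occurrence count is zero.
    ratio = ((near_pos + smoothing) * hits_neg) / ((near_neg + smoothing) * hits_pos)
    return math.log2(ratio)


# Hypothetical hit counts, for illustration only.
balanced = so_pmi(near_pos=10, near_neg=10, hits_pos=100, hits_neg=100)
positive = so_pmi(near_pos=40, near_neg=10, hits_pos=100, hits_neg=100)
negative = so_pmi(near_pos=10, near_neg=40, hits_pos=100, hits_neg=100)
```

    A score above zero indicates a positive orientation and a score below zero a negative one; a consistent bias in the reference-word counts shifts every score in the same direction, which is the kind of slant the proposed balancing factor is meant to correct.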

  16. Opinion mining on book review using CNN-L2-SVM algorithm

    Science.gov (United States)

    Rozi, M. F.; Mukhlash, I.; Soetrisno; Kimura, M.

    2018-03-01

    A review of a product can reflect the quality of the product itself. Extracting information from a review can reveal the sentiment of the opinion. The process of extracting useful information from user reviews is called opinion mining. Deep learning is a review extraction approach that is currently advancing, and it has been used by many researchers to obtain excellent performance in natural language processing. In this research, a deep learning model, the Convolutional Neural Network (CNN), is used for feature extraction, with an L2 Support Vector Machine (SVM) as the classifier. These methods are applied to determine the sentiment of book review data. The method shows state-of-the-art performance of 83.23% in the training phase and 64.6% in the testing phase.

  17. Learning Analytics Through Serious Games: Data Mining Algorithms for Performance Measurement and Improvement Purposes

    Directory of Open Access Journals (Sweden)

    Abdelali Slimani

    2018-01-01

    Full Text Available Learning analytics is an emerging discipline focused on the measurement, collection, analysis and reporting of learner interaction data through e-learning content. Serious games provide a potential source of relevant educational user data; they can offer an interactive environment for training and an effective learning process. This paper presents methods and approaches of educational data mining, such as EM and K-Means, to discuss learning analytics through serious games, and then provides an analysis of the player experience data collected from the educational game “ELISA”, used to teach biology students the immunological technique for the determination of anti-HIV antibodies. Finally, we propose a critical evaluation of our results, including the limitations of our study, and make suggestions for future research that links learning analytics and serious gaming.

  18. KID - an algorithm for fast and efficient text mining used to automatically generate a database containing kinetic information of enzymes

    Directory of Open Access Journals (Sweden)

    Schomburg Dietmar

    2010-07-01

    Full Text Available Abstract. Background: The amount of available biological information is rapidly increasing and the focus of biological research has moved from single components to networks and even larger projects aiming at the analysis, modelling and simulation of biological networks as well as large scale comparison of cellular properties. It is therefore essential that biological knowledge is easily accessible. However, most information is contained in the written literature in an unstructured way, so that methods for the systematic extraction of knowledge directly from the primary literature have to be deployed. Description: Here we present a text mining algorithm for the extraction of kinetic information such as KM, Ki, kcat etc. as well as associated information such as enzyme names, EC numbers, ligands, organisms, localisations, pH and temperatures. Using this rule- and dictionary-based approach, it was possible to extract 514,394 kinetic parameters of 13 categories (KM, Ki, kcat, kcat/KM, Vmax, IC50, S0.5, Kd, Ka, t1/2, pI, nH, specific activity, Vmax/KM) from about 17 million PubMed abstracts and combine them with other data in the abstract. A manual verification of approx. 1,000 randomly chosen results yielded a recall between 51% and 84% and a precision ranging from 55% to 96%, depending on the category searched. The results were stored in a database and are available as "KID the KInetic Database" via the internet. Conclusions: The presented algorithm delivers a considerable amount of information and therefore may help to accelerate the research and the automated analysis required for today's systems biology approaches. The database obtained by analysing PubMed abstracts may be a valuable help in the field of chemical and biological kinetics. It is completely based upon text mining and therefore complements manually curated databases. The database is available at http://kid.tu-bs.de.
The source code of the algorithm is provided under the GNU General Public

  19. Study on text mining algorithm for ultrasound examination of chronic liver diseases based on spectral clustering

    Science.gov (United States)

    Chang, Bingguo; Chen, Xiaofei

    2018-05-01

    Ultrasonography is an important examination for the diagnosis of chronic liver disease. The doctor reports the liver indicators and assesses the patient's condition according to the description in the ultrasound report. With the rapid increase in the volume of ultrasound report data, the workload of physicians who manually interpret ultrasound results has increased significantly. In this paper, we use spectral clustering to cluster the descriptions in ultrasound reports and automatically generate the ultrasound diagnosis by machine learning. A total of 110 groups of ultrasound examination reports of chronic liver disease were selected as test samples in this experiment, and the results of spectral clustering were validated and compared with those of the k-means clustering algorithm. The results show that the accuracy of spectral clustering is 92.73%, which is higher than that of the k-means clustering algorithm, and that the method provides powerful ultrasound-assisted diagnosis for patients with chronic liver disease.

  20. PedMine – A simulated annealing algorithm to identify maximally unrelated individuals in population isolates

    OpenAIRE

    Douglas, Julie A.; Sandefur, Conner I.

    2008-01-01

    In family-based genetic studies, it is often useful to identify a subset of unrelated individuals. When such studies are conducted in population isolates, however, most if not all individuals are often detectably related to each other. To identify a set of maximally unrelated (or equivalently, minimally related) individuals, we have implemented simulated annealing, a general-purpose algorithm for solving difficult combinatorial optimization problems. We illustrate our method on data from a ge...
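
    The abstract above is truncated, so the details of PedMine are not available here. As a rough illustration of how simulated annealing can pick a minimally related subset, the sketch below assumes a fixed subset size `k` and an objective equal to the sum of pairwise kinship coefficients; the function name `anneal_unrelated`, the cooling schedule and the toy matrix are all hypothetical and not taken from the paper.

```python
import math
import random


def anneal_unrelated(kinship, k, steps=5000, t0=1.0, cooling=0.999, seed=0):
    """Pick k individuals with low total pairwise kinship (illustrative only)."""
    rng = random.Random(seed)
    n = len(kinship)

    def cost(subset):
        # Assumed objective: sum of kinship over all pairs in the subset.
        return sum(kinship[a][b] for i, a in enumerate(subset) for b in subset[i + 1:])

    current = rng.sample(range(n), k)
    cur_cost = cost(current)
    best, best_cost = list(current), cur_cost
    temperature = t0
    for _ in range(steps):
        outside = [i for i in range(n) if i not in current]
        if not outside:
            break
        # Neighbour move: swap one selected individual for an unselected one.
        candidate = list(current)
        candidate[rng.randrange(k)] = rng.choice(outside)
        cand_cost = cost(candidate)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if cand_cost <= cur_cost or rng.random() < math.exp((cur_cost - cand_cost) / temperature):
            current, cur_cost = candidate, cand_cost
            if cand_cost < best_cost:
                best, best_cost = list(candidate), cand_cost
        temperature *= cooling
    return sorted(best), best_cost


# Toy kinship matrix (made up): individuals 0-2 and 3-5 form two families.
toy = [[0.25 if (i < 3) == (j < 3) and i != j else 0.0 for j in range(6)] for i in range(6)]
subset, total = anneal_unrelated(toy, k=2)
```

    With this toy matrix, any pair drawn from different families has zero total kinship, which is the kind of solution the search is expected to settle on.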

  1. Indexing amyloid peptide diffraction from serial femtosecond crystallography: new algorithms for sparse patterns

    Energy Technology Data Exchange (ETDEWEB)

    Brewster, Aaron S. [Lawrence Berkeley National Laboratory, Berkeley, CA 94720 (United States)]; Sawaya, Michael R. [University of California, Los Angeles, CA 90095-1570 (United States)]; Rodriguez, Jose [University of California, Los Angeles, CA 90095-1570 (United States)]; Hattne, Johan; Echols, Nathaniel [Lawrence Berkeley National Laboratory, Berkeley, CA 94720 (United States)]; McFarlane, Heather T. [University of California, Los Angeles, CA 90095-1570 (United States)]; Cascio, Duilio [University of California, Los Angeles, CA 90095-1570 (United States)]; Adams, Paul D. [Lawrence Berkeley National Laboratory, Berkeley, CA 94720 (United States); University of California, Berkeley, CA 94720 (United States)]; Eisenberg, David S. [University of California, Los Angeles, CA 90095-1570 (United States)]; Sauter, Nicholas K., E-mail: nksauter@lbl.gov [Lawrence Berkeley National Laboratory, Berkeley, CA 94720 (United States)]

    2015-02-01

    Special methods are required to interpret sparse diffraction patterns collected from peptide crystals at X-ray free-electron lasers. Bragg spots can be indexed from composite-image powder rings, with crystal orientations then deduced from a very limited number of spot positions. Still diffraction patterns from peptide nanocrystals with small unit cells are challenging to index using conventional methods owing to the limited number of spots and the lack of crystal orientation information for individual images. New indexing algorithms have been developed as part of the Computational Crystallography Toolbox (cctbx) to overcome these challenges. Accurate unit-cell information derived from an aggregate data set from thousands of diffraction patterns can be used to determine a crystal orientation matrix for individual images with as few as five reflections. These algorithms are potentially applicable not only to amyloid peptides but also to any set of diffraction patterns with sparse properties, such as low-resolution virus structures or high-throughput screening of still images captured by raster-scanning at synchrotron sources. As a proof of concept for this technique, successful integration of X-ray free-electron laser (XFEL) data to 2.5 Å resolution for the amyloid segment GNNQQNY from the Sup35 yeast prion is presented.

  2. Big Data Mining and Adverse Event Pattern Analysis in Clinical Drug Trials.

    Science.gov (United States)

    Federer, Callie; Yoo, Minjae; Tan, Aik Choon

    2016-12-01

    Drug adverse events (AEs) are a major health threat to patients seeking medical treatment and a significant barrier in drug discovery and development. AEs are now required to be submitted during clinical trials and can be extracted from ClinicalTrials.gov (https://clinicaltrials.gov/), a database of clinical studies around the world. By extracting drug and AE information from ClinicalTrials.gov and structuring it into a database, drug-AE relationships could be established for future drug development and repositioning. To our knowledge, current AE databases contain mainly U.S. Food and Drug Administration (FDA)-approved drugs. However, our database contains both FDA-approved and experimental compounds extracted from ClinicalTrials.gov. Our database contains 8,161 clinical trials of 3,102,675 patients and 713,103 reported AEs. We extracted the information from ClinicalTrials.gov using a set of Python scripts, and then used regular expressions and a drug dictionary to process and structure relevant information into a relational database. We performed data mining and pattern analysis of drug-AEs in our database. Our database can serve as a tool to assist researchers in discovering drug-AE relationships for developing, repositioning, and repurposing drugs.

  3. Improving Fishing Pattern Detection from Satellite AIS Using Data Mining and Machine Learning.

    Directory of Open Access Journals (Sweden)

    Erico N de Souza

    A key challenge in contemporary ecology and conservation is the accurate tracking of the spatial distribution of various human impacts, such as fishing. While coastal fisheries in national waters are closely monitored in some countries, existing maps of fishing effort elsewhere are fraught with uncertainty, especially in remote areas and the High Seas. Better understanding of the behavior of the global fishing fleets is required in order to prioritize and enforce fisheries management and conservation measures worldwide. Satellite-based Automatic Information Systems (S-AIS) are now commonly installed on most ocean-going vessels and have been proposed as a novel tool to explore the movements of fishing fleets in near real time. Here we present approaches to identify fishing activity from S-AIS data for three dominant fishing gear types: trawl, longline and purse seine. Using a large dataset containing worldwide fishing vessel tracks from 2011-2015, we developed three methods to detect and map fishing activities. For trawlers we produced a Hidden Markov Model (HMM) using vessel speed as the observation variable. For longliners we designed a Data Mining (DM) approach using an algorithm inspired by studies on animal movement. For purse seiners a multi-layered filtering strategy based on vessel speed and operation time was implemented. Validation against expert-labeled datasets showed average detection accuracies of 83% for trawler and longliner, and 97% for purse seiner. Our study represents the first comprehensive approach to detect and identify potential fishing behavior for three major gear types operating on a global scale. We hope that this work will enable new efforts to assess the spatial and temporal distribution of global fishing effort and make global fisheries activities transparent to ocean scientists, managers and the public.
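
    To make the trawler approach more concrete, the following is a minimal sketch of Viterbi decoding for a two-state HMM whose only observation is vessel speed. The state names, the 5-knot speed threshold and every probability are illustrative assumptions; the abstract does not give the fitted model.

```python
import math

# Hypothetical two-state model; none of these numbers come from the study.
STATES = ("fishing", "steaming")
START = {"fishing": 0.5, "steaming": 0.5}
TRANS = {
    "fishing": {"fishing": 0.9, "steaming": 0.1},
    "steaming": {"fishing": 0.1, "steaming": 0.9},
}
# Emission probabilities over two discretized speed bins.
EMIT = {
    "fishing": {"slow": 0.8, "fast": 0.2},
    "steaming": {"slow": 0.2, "fast": 0.8},
}


def speed_bin(knots, threshold=5.0):
    """Discretize speed; the threshold is an assumption for illustration."""
    return "slow" if knots < threshold else "fast"


def viterbi(speeds):
    """Return the most likely state sequence for a list of speeds in knots."""
    if not speeds:
        return []
    obs = [speed_bin(s) for s in speeds]
    # Work in log space to avoid underflow on long tracks.
    score = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}
    backpointers = []
    for o in obs[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            score[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][o])
            pointers[s] = best
        backpointers.append(pointers)
    # Trace the best path backwards from the best final state.
    path = [max(STATES, key=lambda s: score[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


track = [12, 12, 12, 2, 2, 2, 2, 2, 12, 12, 12]
labels = viterbi(track)
```

    Because the assumed transition matrix penalizes state changes, a single slow reading inside a fast transit would not be labeled as fishing; only a sustained slow run, as in `track`, flips the decoded state.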

  4. Multicycle Optimization of Advanced Gas-Cooled Reactor Loading Patterns Using Genetic Algorithms

    International Nuclear Information System (INIS)

    Ziver, A. Kemal; Carter, Jonathan N.; Pain, Christopher C.; Oliveira, Cassiano R.E. de; Goddard, Antony J. H.; Overton, Richard S.

    2003-01-01

    A genetic algorithm (GA)-based optimizer (GAOPT) has been developed for in-core fuel management of advanced gas-cooled reactors (AGRs) at HINKLEY B and HARTLEPOOL, which employ on-load and off-load refueling, respectively. The optimizer has been linked to the reactor analysis code PANTHER for the automated evaluation of loading patterns in a two-dimensional geometry, which is collapsed from the three-dimensional reactor model. GAOPT uses a directed stochastic (Monte Carlo) algorithm to generate initial population members, within predetermined constraints, for use in GAs, which apply the standard genetic operators: selection by tournament, crossover, and mutation. GAOPT is able to generate and optimize loading patterns for successive reactor cycles (multicycle) within acceptable CPU times even on single-processor systems. The algorithm allows radial shuffling of fuel assemblies in a multicycle refueling optimization, which is constructed to aid long-term core management planning decisions. This paper presents the application of the GA-based optimization to two AGR stations, which apply different in-core management operational rules. Results obtained from the testing of GAOPT are discussed.
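
    The three operators named above (tournament selection, crossover, mutation) can be sketched on a toy permutation problem. This is not GAOPT: the permutation encoding, order crossover, swap mutation and the stand-in fitness function are assumptions chosen for illustration, and a real run would score each pattern with a reactor code such as PANTHER.

```python
import random


def tournament(population, fitness, k, rng):
    """Return the fittest of k randomly drawn individuals."""
    return max(rng.sample(population, k), key=fitness)


def order_crossover(a, b, rng):
    """Keep a slice of parent a in place, fill the rest in parent b's order."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    middle = a[i:j]
    rest = [gene for gene in b if gene not in middle]
    return rest[:i] + middle + rest[i:]


def swap_mutation(pattern, rate, rng):
    """With probability `rate`, exchange two positions of the pattern."""
    pattern = list(pattern)
    if rng.random() < rate:
        i, j = rng.sample(range(len(pattern)), 2)
        pattern[i], pattern[j] = pattern[j], pattern[i]
    return pattern


def evolve(fitness, n_genes, pop_size=30, generations=60, seed=0):
    rng = random.Random(seed)
    population = [rng.sample(range(n_genes), n_genes) for _ in range(pop_size)]
    for _ in range(generations):
        population = [
            swap_mutation(
                order_crossover(
                    tournament(population, fitness, 3, rng),
                    tournament(population, fitness, 3, rng),
                    rng,
                ),
                0.2,
                rng,
            )
            for _ in range(pop_size)
        ]
    return max(population, key=fitness)


def toy_fitness(pattern):
    # Stand-in objective: number of assemblies sitting at their own index.
    return sum(1 for position, gene in enumerate(pattern) if gene == position)


best = evolve(toy_fitness, n_genes=6)
```

    Order crossover is used here because it always yields a valid permutation, so no assembly is duplicated or lost when two parent patterns are combined.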

  5. Performance Evaluation of Machine Learning Algorithms for Urban Pattern Recognition from Multi-spectral Satellite Images

    Directory of Open Access Journals (Sweden)

    Marc Wieland

    2014-03-01

    In this study, a classification and performance evaluation framework for the recognition of urban patterns in medium (Landsat ETM, TM and MSS) and very high resolution (WorldView-2, Quickbird, Ikonos) multi-spectral satellite images is presented. The study aims at exploring the potential of machine learning algorithms in the context of an object-based image analysis and at thoroughly testing the algorithms' performance under varying conditions to optimize their usage for urban pattern recognition tasks. Four classification algorithms, Normal Bayes, K Nearest Neighbors, Random Trees and Support Vector Machines, which represent different concepts in machine learning (probabilistic, nearest neighbor, tree-based, function-based), have been selected and implemented on a free and open-source basis. Particular focus is given to assessing the generalization ability of machine learning algorithms and the transferability of trained learning machines between different image types and image scenes. Moreover, the influence of the number and choice of training data, the influence of the size and composition of the feature vector and the effect of image segmentation on the classification accuracy are evaluated.

  6. Automatic boiling water reactor control rod pattern design using particle swarm optimization algorithm and local search

    Energy Technology Data Exchange (ETDEWEB)

    Wang, Cheng-Der, E-mail: jdwang@iner.gov.tw [Nuclear Engineering Division, Institute of Nuclear Energy Research, No. 1000, Wenhua Rd., Jiaan Village, Longtan Township, Taoyuan County 32546, Taiwan, ROC (China); Lin, Chaung [National Tsing Hua University, Department of Engineering and System Science, 101, Section 2, Kuang Fu Road, Hsinchu 30013, Taiwan (China)

    2013-02-15

    Highlights: ► The PSO algorithm was adopted to automatically design a BWR CRP. ► The local search procedure was added to improve the result of the PSO algorithm. ► The results show that the obtained CRP is as good as that in the previous work. -- Abstract: This study developed a method for the automatic design of a boiling water reactor (BWR) control rod pattern (CRP) using the particle swarm optimization (PSO) algorithm. The PSO algorithm is more random than the rank-based ant system (RAS) that was used to solve the same BWR CRP design problem in the previous work. In addition, a local search procedure was used to make improvements after PSO by adding the single control rod (CR) effect. The design goal was to obtain a CRP such that the thermal limits and shutdown margin would satisfy the design requirements and the cycle length, which is implicitly controlled by the axial power distribution, would be acceptable. The results showed that the same acceptable CRP found in the previous work could be obtained.

  7. Using data mining and OLAP to discover patterns in a database of patients with Y-chromosome deletions.

    Science.gov (United States)

    Dzeroski, S; Hristovski, D; Peterlin, B

    2000-01-01

    The paper presents a database of published Y chromosome deletions and the results of analyzing the database with data mining techniques. The database describes 382 patients for whom 177 different markers were tested: 364 of the 382 patients had deletions. Two data mining techniques, clustering and decision tree induction, were used. Clustering was used to group patients according to the overall presence/absence of deletions at the tested markers. Decision trees and On-Line Analytical Processing (OLAP) were used to inspect the resulting clustering and look for correlations between deletion patterns, populations and the clinical picture of infertility. The results of the analysis indicate that there are correlations between deletion patterns and patient populations, as well as clinical phenotype severity.

  8. Use of a machine learning algorithm to classify expertise: analysis of hand motion patterns during a simulated surgical task.

    Science.gov (United States)

    Watson, Robert A

    2014-08-01

    To test the hypothesis that machine learning algorithms increase the predictive power to classify surgical expertise using surgeons' hand motion patterns. In 2012 at the University of North Carolina at Chapel Hill, 14 surgical attendings and 10 first- and second-year surgical residents each performed two bench model venous anastomoses. During the simulated tasks, the participants wore an inertial measurement unit on the dorsum of their dominant (right) hand to capture their hand motion patterns. The pattern from each bench model task performed was preprocessed into a symbolic time series and labeled as expert (attending) or novice (resident). The labeled hand motion patterns were processed and used to train a Support Vector Machine (SVM) classification algorithm. The trained algorithm was then tested for discriminative/predictive power against unlabeled (blinded) hand motion patterns from tasks not used in the training. The Lempel-Ziv (LZ) complexity metric was also measured from each hand motion pattern, with an optimal threshold calculated to separately classify the patterns. The LZ metric classified unlabeled (blinded) hand motion patterns into expert and novice groups with an accuracy of 70% (sensitivity 64%, specificity 80%). The SVM algorithm had an accuracy of 83% (sensitivity 86%, specificity 80%). The results confirmed the hypothesis. The SVM algorithm increased the predictive power to classify blinded surgical hand motion patterns into expert versus novice groups. With further development, the system used in this study could become a viable tool for low-cost, objective assessment of procedural proficiency in a competency-based curriculum.
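
    The abstract does not say which Lempel-Ziv variant was applied to the symbolic time series. As one common reading, the sketch below counts phrases in the 1976 Lempel-Ziv parsing of a string: each new phrase is the shortest substring that has not already appeared in the text preceding its last symbol. The function name `lz76_complexity` is hypothetical.

```python
def lz76_complexity(sequence):
    """Count the phrases in the Lempel-Ziv (1976) parsing of a string."""
    position, phrases, n = 0, 0, len(sequence)
    while position < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text
        # (the search window may overlap the phrase itself, except its last symbol).
        while (
            position + length <= n
            and sequence[position:position + length] in sequence[:position + length - 1]
        ):
            length += 1
        phrases += 1
        position += length
    return phrases


# Textbook example: parsed as 0 | 001 | 10 | 100 | 1000 | 101.
example = lz76_complexity("0001101001000101")
```

    A classifier of the kind described would compare this count (usually normalized by sequence length) against a threshold; the threshold value and which side corresponds to experts would need to be taken from the study's data.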

  9. Software tool for data mining and its applications

    Science.gov (United States)

    Yang, Jie; Ye, Chenzhou; Chen, Nianyi

    2002-03-01

    A software tool for data mining is introduced, which integrates pattern recognition (PCA, Fisher, clustering, hyperenvelop, regression), artificial intelligence (knowledge representation, decision trees), statistical learning (rough set, support vector machine) and computational intelligence (neural network, genetic algorithm, fuzzy systems). It consists of nine function models: pattern recognition, decision trees, association rule, fuzzy rule, neural network, genetic algorithm, Hyper Envelop, support vector machine and visualization. The principle and knowledge representation of some function models of data mining are described. The software tool is implemented in Visual C++ under Windows 2000. Nonmonotony in data mining is dealt with by concept hierarchy and layered mining. The software tool has been applied satisfactorily to the prediction of regularities in the formation of ternary intermetallic compounds in alloy systems and to the diagnosis of brain glioma.

  10. Advanced and flexible genetic algorithms for BWR fuel loading pattern optimization

    International Nuclear Information System (INIS)

    Martin-del-Campo, Cecilia; Palomera-Perez, Miguel-Angel; Francois, Juan-Luis

    2009-01-01

    This work proposes advances in the implementation of a flexible genetic algorithm (GA) for fuel loading pattern optimization for Boiling Water Reactors (BWRs). In order to avoid specific implementations of genetic operators and to obtain a more flexible treatment, a binary representation of the solution was implemented; this representation had to ensure that a small change in the genotype corresponds to a small change in the phenotype. An identifier number is assigned to each assembly by means of a 7-bit Gray code, and the solution (the loading pattern) is represented by a binary chain 777 bits long. Another important contribution is the use of a Fitness Function which includes a Heuristic Function and an Objective Function. The Heuristic Function is defined to give flexibility in the application of a set of knowledge-based positioning rules, and the Objective Function contains all the parameters which qualify the neutronic and thermal-hydraulic performance of each loading pattern. Experimental results illustrating the effectiveness and flexibility of this optimization algorithm are presented and discussed.
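
    The Gray-code representation can be illustrated with a short sketch. It assumes the 777-bit chain is a plain concatenation of 7-bit codes, which would correspond to 111 assembly positions (777 / 7); that layout is an inference from the numbers in the abstract, and the helper names are hypothetical.

```python
def to_gray(n):
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def from_gray(g):
    """Invert to_gray by folding in successive right shifts."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def encode_pattern(assembly_ids, bits=7):
    """Concatenate fixed-width Gray codes into one bit string."""
    return "".join(format(to_gray(i), "0{}b".format(bits)) for i in assembly_ids)


def decode_pattern(chain, bits=7):
    """Split the bit string into fixed-width codes and decode each one."""
    return [from_gray(int(chain[k:k + bits], 2)) for k in range(0, len(chain), bits)]


# Assumed layout: 111 positions x 7 bits = 777 bits.
ids = list(range(111))
chain = encode_pattern(ids)
```

    The property the abstract relies on is that consecutive identifiers differ in exactly one bit under Gray coding, so flipping a single bit of the genotype tends to produce a neighbouring identifier rather than a distant one.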

  11. Application of frequent itemsets mining to analyze patterns of one-stop visits in Taiwan.

    Directory of Open Access Journals (Sweden)

    Chun-Yi Tu

    BACKGROUND: The free choice of health care facilities without limitations on frequency of visits within the National Health Insurance in Taiwan gives rise to not only a high number of annual ambulatory visits per capita but also a unique "one-stop shopping" phenomenon, which refers to a patient's visits to several specialties of the same healthcare facility in one day. Visits to multiple physicians would increase the potential risk of polypharmacy. The aim of this study was to analyze the frequency and patterns of one-stop visits in Taiwan. METHODOLOGY/PRINCIPAL FINDINGS: The claims datasets of 1 million nationally representative people within Taiwan's National Health Insurance in 2005 were used to calculate the number of patients with one-stop visits. Frequent itemsets mining was applied to compute the combination patterns of specialties in the one-stop visits. Among the total 13,682,469 ambulatory care visits in 2005, one-stop visits occurred 144,132 times and involved 296,822 visits (2.2% of all visits) by 66,294 (6.6%) persons. The behavior became more common with age, and the percentage reached 27.5% (5,662 of 20,579) in the age group ≥80 years. In general, women were more likely to have one-stop visits than men (7.2% vs. 6.0%). Internal medicine plus ophthalmology was the most frequent combination, occurring 3,552 times (2.5%), followed by cardiology plus neurology with 3,183 times (2.2%). The most frequent three-specialty combination, cardiology plus neurology and gastroenterology, occurred only 111 times. CONCLUSIONS/SIGNIFICANCE: Without the novel computational technique, it would hardly be possible to analyze the extremely diverse combination patterns of specialties in one-stop visits. The results of the study could provide useful information either for hospital managers to set up integrated services or for policymakers to rebuild the health care system.
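
    As a small illustration of how specialty combinations can be counted, the sketch below treats each one-stop visit as a transaction (the set of specialties seen that day) and counts every pair and triple by brute force. The study does not name its mining algorithm, so this is not its implementation; an Apriori or FP-growth miner would prune the search on large data. The toy transactions are made up.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Count specialty combinations of size 2..max_size and keep frequent ones."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # canonical order, duplicates removed
        for size in range(2, max_size + 1):
            counts.update(combinations(items, size))
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical one-stop visits; each list is one patient-day at one facility.
visits = [
    ["internal medicine", "ophthalmology"],
    ["cardiology", "neurology"],
    ["ophthalmology", "internal medicine"],
    ["cardiology", "neurology", "gastroenterology"],
]
result = frequent_itemsets(visits, min_support=2)
```

    Sorting each transaction first means that the same combination is always counted under one key regardless of the order in which the specialties were visited.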

  12. Application of the distributed genetic algorithm for loading pattern optimization problems

    International Nuclear Information System (INIS)

    Hashimoto, Hiroshi; Yamamoto, Akio

    2000-01-01

    The distributed genetic algorithm (DGA) is applied to loading pattern optimization problems of pressurized water reactors (PWRs). Due to the stiff nature of loading pattern optimization (e.g. multi-modality and non-linearity), stochastic methods like simulated annealing or the genetic algorithm (GA) are widely applied to these problems. The basic concept of DGA is based on that of GA. However, DGA equally distributes candidate solutions (i.e. loading patterns) to several independent 'islands' and evolves them in each island. Migrations of some candidates are performed among islands at a certain period. Since candidate solutions evolve independently in each island while accepting different genes of migrants from other islands, the premature convergence seen in the traditional GA can be prevented. Because many candidate loading patterns must be evaluated in one generation of GA or DGA, parallelization of these calculations works efficiently. Parallel efficiency was measured using our optimization code, and good load balance was attained even in a heterogeneous cluster environment due to dynamic distribution of the calculation load. The optimization code is based on a client/server architecture with native TCP/IP sockets, in which a client (optimization module) and calculation server modules exchange loading pattern objects with each other. Through a sensitivity study on the optimization parameters of DGA, a suitable set of parameters for a test problem was identified. Finally, the optimization capability of DGA and the traditional GA was compared on the test problem, and DGA provided better optimization results than the traditional GA. (author)

  13. A Genetic Algorithm That Exchanges Neighboring Centers for Fuzzy c-Means Clustering

    Science.gov (United States)

    Chahine, Firas Safwan

    2012-01-01

    Clustering algorithms are widely used in pattern recognition and data mining applications. Due to their computational efficiency, partitional clustering algorithms are better suited for applications with large datasets than hierarchical clustering algorithms. K-means is among the most popular partitional clustering algorithm, but has a major…

  14. Low-complexity transcoding algorithm from H.264/AVC to SVC using data mining

    Science.gov (United States)

    Garrido-Cantos, Rosario; De Cock, Jan; Martínez, Jose Luis; Van Leuven, Sebastian; Cuenca, Pedro; Garrido, Antonio

    2013-12-01

    Nowadays, networks and terminals with diverse characteristics of bandwidth and capabilities coexist. To ensure a good quality of experience, this diverse environment demands adaptability of the video stream. In general, video contents are compressed to save storage capacity and to reduce the bandwidth required for its transmission. Therefore, if these compressed video streams were compressed using scalable video coding schemes, they would be able to adapt to those heterogeneous networks and a wide range of terminals. Since the majority of the multimedia contents are compressed using H.264/AVC, they cannot benefit from that scalability. This paper proposes a low-complexity algorithm to convert an H.264/AVC bitstream without scalability to scalable bitstreams with temporal scalability in baseline and main profiles by accelerating the mode decision task of the scalable video coding encoding stage using machine learning tools. The results show that when our technique is applied, the complexity is reduced by 87% while maintaining coding efficiency.

  15. Algorithms

    Indian Academy of Sciences (India)

    ticians but also forms the foundation of computer science. Two ... with methods of developing algorithms for solving a variety of problems but ... applications of computers in science and engineer- ... numerical calculus are as important. We will ...

  16. Simulation of small-angle scattering patterns using a CPU-efficient algorithm

    Science.gov (United States)

    Anitas, E. M.

    2017-12-01

    Small-angle scattering (of neutrons, x-ray or light; SAS) is a well-established experimental technique for structural analysis of disordered systems at nano and micro scales. For complex systems, such as super-molecular assemblies or protein molecules, analytic solutions of SAS intensity are generally not available. Thus, a frequent approach to simulate the corresponding patterns is to use a CPU-efficient version of the Debye formula. For this purpose, in this paper we implement the well-known DALAI algorithm in Mathematica software. We present calculations for a series of 2D Sierpinski gaskets and respectively of pentaflakes, obtained from chaos game representation.
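
    For orientation, the quantities mentioned above can be sketched in a few lines. The code below generates Sierpinski gasket points with the chaos game and evaluates the direct Debye double sum, I(q) = (1/N^2) * sum over i, j of sin(q r_ij) / (q r_ij), for identical point scatterers. It is the plain O(N^2) formula written in Python, not the CPU-efficient variant or the Mathematica implementation described in the paper, and the burn-in length and point count are arbitrary choices.

```python
import math
import random


def chaos_game_gasket(n_points, seed=0, burn_in=20):
    """Sierpinski gasket points: jump halfway to a random triangle vertex."""
    rng = random.Random(seed)
    vertices = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
    x, y = 0.25, 0.25
    points = []
    for _ in range(n_points + burn_in):
        vx, vy = rng.choice(vertices)
        x, y = (x + vx) / 2, (y + vy) / 2
        points.append((x, y))
    return points[burn_in:]  # drop the first iterates, which may lie off the attractor


def debye_intensity(points, q):
    """Orientationally averaged intensity of N identical point scatterers."""
    n = len(points)
    total = float(n)  # the i == j terms each contribute sin(x)/x -> 1
    for i in range(n):
        for j in range(i + 1, n):
            x = q * math.dist(points[i], points[j])
            total += 2.0 * (math.sin(x) / x if x > 1e-12 else 1.0)
    return total / n ** 2  # normalized so that I(q -> 0) = 1


gasket = chaos_game_gasket(200)
curve = [(q, debye_intensity(gasket, q)) for q in (0.1, 1.0, 10.0, 100.0)]
```

    The double loop is what makes the direct formula expensive for large N, which is the motivation for the faster evaluation schemes the abstract refers to.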

  17. Study on the Detection of Moving Target in the Mining Method Based on Hybrid Algorithm for Sports Video Analysis

    Directory of Open Access Journals (Sweden)

    Huang Tian

    2014-10-01

    Moving object detection and tracking is an active research direction in computer vision and image processing. Based on an analysis of the moving target detection and tracking algorithms in common use, this work focuses on tracking non-rigid targets in sports video. In sports video, athletes are non-rigid bodies that often deform during movement, and the moving target may also be occluded. The surge in media data makes fast search and query more difficult. However, most users want to quickly extract content of interest and implicit knowledge (concepts, rules, models and correlations) from multimedia data, to retrieve and query it quickly, and to use it for decision support in problem solving. Taking moving objects in sports video as the object of study, this paper conducts systematic research at the level of theory and technical framework, mining layer by layer from low-level motion features to high-level video semantics. This not only helps users find information quickly but also provides decision support for solving their problems.

  18. Clustering box office movie with Partition Around Medoids (PAM) Algorithm based on Text Mining of Indonesian subtitle

    Science.gov (United States)

    Alfarizy, A. D.; Indahwati; Sartono, B.

    2017-03-01

    Indonesia was the largest target market for the Hollywood movie industry in Southeast Asia in 2015. Hollywood movies distributed in Indonesia target people of all ages, including children. Low awareness of the need to guide children while they watch movies means that they can watch films of any rating, even those unsuitable for their age. Even after movies are translated into Bahasa and pass the censorship phase, words that are unsuitable for children remain. The purpose of this research is to cluster box office Hollywood movies based on Indonesian subtitles, revenue, IMDb user rating and genre, as a reference for adults choosing suitable movies for their children. Text mining is used to extract words from the subtitles and count the frequency of three groups of words (bad words, sexual words and terror words), while the Partition Around Medoids (PAM) algorithm, with the Gower similarity coefficient as the proximity measure, is used as the clustering method. We clustered 624 movies released from 2006 until the first half of 2016, taken from IMDb. The solution with the highest silhouette coefficient (0.36) is the one with 5 clusters. Animation, Adventure and Comedy movies with high revenue, as in cluster 5, are recommended for children, while Comedy movies with high revenue, as in cluster 4, should be avoided.

  19. Vegetation pattern and heavy metal accumulation at a mine tailing at Gyöngyösoroszi, Hungary.

    Science.gov (United States)

    Tamás, János; Kovács, Alza

    2005-01-01

    Vegetation at an abandoned heavy metal bearing mine tailing may have multifunctional roles such as modification of water balance, erosion control and landscape rehabilitation. Research on the vegetation of mine tailings can provide useful information on tolerance, accumulation and translocation properties of species potentially applicable at moderately contaminated sites. Analyses of the relationship between heavy metal content (Pb, Zn and Cu) and vegetation in a mine tailing were carried out. These analyses included: (1) spatial analysis of relationship among heavy metal distribution, pH and vegetation patterns, and (2) analysis of heavy metal accumulation and translocation in some plant species. Presence of vegetation was found to be significantly dependent on pH value, which confirms that phytotoxicity is a function of element concentration in solution, which is primarily controlled by pH value in mine tailings. Among the most abundant plant species, dewberry (Rubus caesius), vipersbugloss (Echium vulgare), scarlet pimpernel (Anagallis arvensis) and narrowleaf plantain (Plantago lanceolata) accumulate significant amounts of Pb, Cu and Zn, while in the case of annual bluegrass (Poa annua) only Pb can be measured in elevated contents. Considering the translocation features, scarlet pimpernel, narrowleaf plantain, and dewberry accumulate heavy metals primarily in their roots, while heavy metal concentration in vipersbugloss and annual bluegrass is higher in the shoots.

  20. GRAMI: Frequent subgraph and pattern mining in a single large graph

    KAUST Repository

    Elseidy, M.; Abdelhamid, Ehab; Skiadopoulos, S.; Kalnis, Panos

    2014-01-01

    Mining frequent subgraphs is an important operation on graphs; it is defined as finding all subgraphs that appear frequently in a database according to a given frequency threshold. Most existing work assumes a database of many small graphs

  1. A novel tree-based algorithm to discover seismic patterns in earthquake catalogs

    Science.gov (United States)

    Florido, E.; Asencio-Cortés, G.; Aznarte, J. L.; Rubio-Escudero, C.; Martínez-Álvarez, F.

    2018-06-01

    A novel methodology is introduced in this research study to detect seismic precursors. Based on an existing approach, the new methodology searches for patterns in the historical data. Such patterns may contain statistical or soil dynamics information. It improves the original version in several aspects. First, new seismicity indicators have been used to characterize earthquakes. Second, a machine learning clustering algorithm has been applied in a very flexible way, thus allowing the discovery of new data groupings. Third, a novel search strategy is proposed in order to obtain non-overlapping patterns. Fourth, patterns of arbitrary length are searched for, thus discovering long- and short-term behaviors that may influence the occurrence of medium-large earthquakes. The methodology has been applied to seven different datasets from three different regions, namely the Iberian Peninsula, Chile and Japan. Reported results show a remarkable improvement with respect to the former version in terms of all evaluated quality measures. In particular, the number of false positives has decreased and the positive predictive values have increased, both markedly.

  2. Optimization of Boiling Water Reactor Loading Pattern Using Two-Stage Genetic Algorithm

    International Nuclear Information System (INIS)

    Kobayashi, Yoko; Aiyoshi, Eitaro

    2002-01-01

    A new two-stage optimization method based on genetic algorithms (GAs) using an if-then heuristic rule was developed to generate optimized boiling water reactor (BWR) loading patterns (LPs). In the first stage, the LP is optimized using an improved GA operator. In the second stage, an exposure-dependent control rod pattern (CRP) is sought using GA with an if-then heuristic rule. The procedure of the improved GA is based on deterministic operators that consist of crossover, mutation, and selection. The handling of the encoding technique and constraint conditions by that GA reflects the peculiar characteristics of the BWR. In addition, strategies such as elitism and self-reproduction are effectively used in order to improve the search speed. The LP evaluations were performed with a three-dimensional diffusion code that coupled neutronic and thermal-hydraulic models. Strong axial heterogeneities and constraints dependent on three dimensions have always necessitated the use of three-dimensional core simulators for BWRs, so that optimization of computational efficiency is required. The proposed algorithm is demonstrated by successfully generating LPs for an actual BWR plant in two phases. One phase is only LP optimization applying the Haling technique. The other phase is an LP optimization that considers the CRP during reactor operation. In test calculations, candidates that shuffled fresh and burned fuel assemblies within a reasonable computation time were obtained

  3. Application of affinity propagation algorithm based on manifold distance for transformer PD pattern recognition

    Science.gov (United States)

    Wei, B. G.; Huo, K. X.; Yao, Z. F.; Lou, J.; Li, X. Y.

    2018-03-01

    Recognizing partial discharge (PD) patterns is one of the difficult problems in research on condition-based maintenance of transformers. According to the main physical characteristics of PD, three models of oil-paper insulation defects were set up in the laboratory to study the PD of transformers, and phase resolved partial discharge (PRPD) patterns were constructed. Using the least squares method, grey-scale images of the PRPD were constructed, and the features of each grey-scale image were 28 box dimensions and 28 information dimensions. An affinity propagation algorithm based on manifold distance (AP-MD) was established for transformer PD pattern recognition, and the box dimension and information dimension data were clustered with AP-MD. The study shows that the clustering result of AP-MD is better than the results of affinity propagation (AP), k-means and the fuzzy c-means algorithm (FCM). By choosing different k values for the k-nearest neighbor step, we find that the clustering accuracy of AP-MD falls when the k value is too large or too small, and that the optimal k value depends on sample size.

  4. Kajian Algoritma Sequential Pattern Mining Dan Market Basket Analysis Dalam Pengenalan Pola Belanja Customer Untuk Layout Toko

    Directory of Open Access Journals (Sweden)

    Rusito Rusito

    2016-01-01

    This research discusses the associations among items purchased by customers in a retail store. Knowledge of these associations can be used to determine the layout of merchandise in the store. This is important so that consumers can easily find the goods they need, which can increase the store's sales turnover and ultimately the owner's profit. The data mining and purchase-association task is addressed with the Association rule and Market Basket Analysis approaches, while the Sequential Pattern Mining algorithm is used to find the associations among items. It is used because it can handle large databases and performs very well in terms of processing speed. Various applications have been identified, including, for example, cross-selling, Web site analysis, decision support, credit evaluation, crime prediction, customer behavior analysis and fraud detection. The research yielded customer shopping patterns that can be used to form a display layout in the retail store. It also presents a version of the algorithm that works more effectively than the original because the number of iterations is bounded. For combinations of at most 5 items, the execution time was 421.06 seconds for 200 receipts. Keywords: Data Mining, Sequential Pattern Mining Algorithm, Market Basket Analysis, Apriori, Layout, Retail Store
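
    The market basket idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's algorithm: it counts item combinations by brute force rather than pruning candidates the way Apriori does, and the function name, the `min_support` threshold and the sample receipts are all assumptions made for illustration. The `max_size=5` default mirrors the "at most 5 items" bound mentioned in the abstract.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(receipts, min_support, max_size=5):
    """Count every item combination up to max_size and keep the frequent ones.

    Brute-force illustration only: no Apriori-style candidate pruning.
    """
    counts = Counter()
    for receipt in receipts:
        items = sorted(set(receipt))  # de-duplicate and fix a canonical order
        for size in range(1, min(max_size, len(items)) + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    # Keep only combinations seen on at least min_support receipts.
    return {combo: n for combo, n in counts.items() if n >= min_support}


# Hypothetical receipts, invented for this example.
receipts = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
result = frequent_itemsets(receipts, min_support=2)
for itemset, support in sorted(result.items()):
    print(itemset, support)
```

    Item pairs that appear together on many receipts are candidates for adjacent shelf placement. Bounding the combination size keeps the enumeration tractable, since the number of combinations grows quickly with receipt length.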

  5. Algorithms

    Indian Academy of Sciences (India)

    algorithm design technique called 'divide-and-conquer'. One of ... Turtle graphics, September. 1996. 5. ... whole list named 'PO' is a pointer to the first element of the list; ..... Program for computing matrices X and Y and placing the result in C *).

  6. Algorithms

    Indian Academy of Sciences (India)

    algorithm that it is implicitly understood that we know how to generate the next natural ..... Explicit comparisons are made in line (1) where maximum and minimum is ... It can be shown that the function T(n) = 3/2n -2 is the solution to the above ...
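
    The fragment above appears to refer to the classic divide-and-conquer search for the maximum and minimum of a list, whose comparison count satisfies T(n) = 3n/2 - 2 when n is a power of two. The following is a minimal sketch under that assumption. The function name and the sample data are illustrative and not taken from the article.

```python
def min_max(values, lo, hi):
    """Return (minimum, maximum, comparisons) for values[lo..hi] inclusive."""
    if lo == hi:
        # One element: it is both minimum and maximum, no comparison needed.
        return values[lo], values[lo], 0
    if hi == lo + 1:
        # Two elements: a single comparison settles both answers.
        if values[lo] < values[hi]:
            return values[lo], values[hi], 1
        return values[hi], values[lo], 1
    mid = (lo + hi) // 2
    min1, max1, c1 = min_max(values, lo, mid)
    min2, max2, c2 = min_max(values, mid + 1, hi)
    # Combine step: one comparison for the minima, one for the maxima.
    smallest = min1 if min1 < min2 else min2
    largest = max1 if max1 > max2 else max2
    return smallest, largest, c1 + c2 + 2


data = [7, 2, 9, 4, 1, 8, 3, 6]  # n = 8, a power of two
smallest, largest, comparisons = min_max(data, 0, len(data) - 1)
print(smallest, largest, comparisons)
```

    For n = 8 the recurrence T(n) = 2T(n/2) + 2 with T(2) = 1 gives 10 comparisons, which matches 3n/2 - 2.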

  7. Heuristics Miner for E-Commerce Visitor Access Pattern Representation

    OpenAIRE

    Kartina Diah Kesuma Wardhani; Wawan Yunanto

    2017-01-01

    E-commerce click stream data can form a pattern that describes visitor behavior while surfing the e-commerce website. This pattern can be used to initiate a design that determines an alternative access sequence on the website. This research uses the heuristic miner algorithm to determine the pattern. σ-Algorithm and Genetic Mining are methods used for pattern recognition with a frequent sequence item set approach. Heuristic Miner is an evolved form of those methods. σ-Algorithm assume that an activ...

  8. Improving Pattern Recognition and Neural Network Algorithms with Applications to Solar Panel Energy Optimization

    Science.gov (United States)

    Zamora Ramos, Ernesto

    Artificial Intelligence is a big part of automation and with today's technological advances, artificial intelligence has taken great strides towards positioning itself as the technology of the future to control, enhance and perfect automation. Computer vision includes pattern recognition and classification and machine learning. Computer vision is at the core of decision making and it is a vast and fruitful branch of artificial intelligence. In this work, we expose novel algorithms and techniques built upon existing technologies to improve pattern recognition and neural network training, initially motivated by a multidisciplinary effort to build a robot that helps maintain and optimize solar panel energy production. Our contributions detail an improved non-linear pre-processing technique to enhance poorly illuminated images based on modifications to the standard histogram equalization for an image. While the original motivation was to improve nocturnal navigation, the results have applications in surveillance, search and rescue, medical imaging enhancing, and many others. We created a vision system for precise camera distance positioning motivated to correctly locate the robot for capture of solar panel images for classification. The classification algorithm marks solar panels as clean or dirty for later processing. Our algorithm extends past image classification and, based on historical and experimental data, it identifies the optimal moment in which to perform maintenance on marked solar panels as to minimize the energy and profit loss. In order to improve upon the classification algorithm, we delved into feedforward neural networks because of their recent advancements, proven universal approximation and classification capabilities, and excellent recognition rates. 
    We explore state-of-the-art neural network training techniques, offering pointers and insights, culminating in the implementation of a complete library with support for modern deep learning architectures

  9. Unsupervised learning algorithms

    CERN Document Server

    Aydin, Kemal

    2016-01-01

    This book summarizes the state-of-the-art in unsupervised learning. The contributors discuss how with the proliferation of massive amounts of unlabeled data, unsupervised learning algorithms, which can automatically discover interesting and useful patterns in such data, have gained popularity among researchers and practitioners. The authors outline how these algorithms have found numerous applications including pattern recognition, market basket analysis, web mining, social network analysis, information retrieval, recommender systems, market research, intrusion detection, and fraud detection. They present how the difficulty of developing theoretically sound approaches that are amenable to objective evaluation have resulted in the proposal of numerous unsupervised learning algorithms over the past half-century. The intended audience includes researchers and practitioners who are increasingly using unsupervised learning algorithms to analyze their data. Topics of interest include anomaly detection, clustering,...

  10. Algorithms

    Indian Academy of Sciences (India)

    will become clear in the next article when we discuss a simple logo like programming language. ... Rod B may be used as an auxiliary store. The problem is to find an algorithm which performs this task. ... No disks are moved from A to B using C as auxiliary rod. • move_disk (A, C);. (No + l)th disk is moved from A to C directly ...
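
    The fragment above describes the Tower of Hanoi puzzle with rods A, B and C. The recursive idea it hints at can be sketched as follows, assuming the standard formulation: move the top n - 1 disks from A to B using C as the auxiliary rod, move the remaining disk directly from A to C, then move the n - 1 disks from B to C. The function name and the move-list representation are illustrative choices, not the article's.

```python
def hanoi(n, source, target, auxiliary, moves):
    """Append the moves that transfer n disks from source to target."""
    if n == 0:
        return
    # Step 1: park the top n - 1 disks on the auxiliary rod.
    hanoi(n - 1, source, auxiliary, target, moves)
    # Step 2: move the largest remaining disk directly.
    moves.append((source, target))
    # Step 3: bring the n - 1 parked disks onto the target rod.
    hanoi(n - 1, auxiliary, target, source, moves)


moves = []
hanoi(3, "A", "C", "B", moves)
for src, dst in moves:
    print(f"move disk from {src} to {dst}")
```

    Each call makes two recursive calls on n - 1 disks plus one direct move, so n disks need 2^n - 1 moves in total.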

  11. Pattern Extraction Algorithm for NetFlow-Based Botnet Activities Detection

    Directory of Open Access Journals (Sweden)

    Rafał Kozik

    2017-01-01

    As computer and network technologies evolve, the complexity of cybersecurity has dramatically increased. Advanced cyber threats have made current approaches to cyber-attack detection ineffective. Many currently used computer systems and applications have never been deeply tested from a cybersecurity point of view and are an easy target for cyber criminals. The paradigm of security by design is still more of a wish than a reality, especially in the context of constantly evolving systems. On the other hand, protection technologies have also improved. Recently, Big Data technologies have given network administrators a wide spectrum of tools to combat cyber threats. In this paper, we present an innovative system for network traffic analysis and anomaly detection that utilises these tools. The system's architecture is based on a Big Data processing framework, data mining, and innovative machine learning techniques. So far, the proposed system implements pattern extraction strategies that leverage batch processing methods. As a use case we consider the problem of botnet detection by means of data in the form of NetFlows. Results are promising and show that the proposed system can be a useful tool to improve cybersecurity.

  12. Fuel spill identification by gas chromatography -- genetic algorithms/pattern recognition techniques

    International Nuclear Information System (INIS)

    Lavine, B.K.; Moores, A.J.; Faruque, A.

    1998-01-01

    Gas chromatography and pattern recognition methods were used to develop a potential method for typing jet fuels so a spill sample in the environment can be traced to its source. The test data consisted of 256 gas chromatograms of neat jet fuels. 31 fuels that have undergone weathering in a subsurface environment were correctly identified by type using discriminants developed from the gas chromatograms of the neat jet fuels. Coalescing poorly resolved peaks, which occurred during preprocessing, diminished the resolution and hence information content of the GC profiles. Nevertheless a genetic algorithm was able to extract enough information from these profiles to correctly classify the chromatograms of weathered fuels. This suggests that cheaper and simpler GC instruments can be used to type jet fuels

  13. A hybrid firefly algorithm and pattern search technique for SSSC based power oscillation damping controller design

    Directory of Open Access Journals (Sweden)

    Srikanta Mahapatra

    2014-12-01

    In this paper, a novel hybrid Firefly Algorithm and Pattern Search (h-FAPS) technique is proposed for a Static Synchronous Series Compensator (SSSC)-based power oscillation damping controller design. The proposed h-FAPS technique takes advantage of the global search capability of FA and the local search facility of PS. In order to tackle the drawback of using the remote signal that may impact reliability of the controller, a modified signal equivalent to the remote speed deviation signal is constructed from the local measurements. The performances of the proposed controllers are evaluated in SMIB and multi-machine power systems subjected to various transient disturbances. To show the effectiveness and robustness of the proposed design approach, simulation results are presented and compared with some recently published approaches such as Differential Evolution (DE) and Particle Swarm Optimization (PSO). It is observed that the proposed approach yields superior damping performance compared to some recently reported approaches.

  14. Atlas Career Path Guidebook: Patterns and Common Practices in Systems Engineers’ Development

    Science.gov (United States)

    2018-01-16

    text mining principles to be used by systems...statistical and text mining principles facilitate the identification of patterns. Figure 1. Helix methodology for career path analysis In order to...illustrate how text mining algorithms might be used to identify similarities in position titles for systems engineers. In broad

  15. Automatic boiling water reactor loading pattern design using ant colony optimization algorithm

    Energy Technology Data Exchange (ETDEWEB)

    Wang, C.-D. [Department of Engineering and System Science, National Tsing Hua University, 101, Section 2 Kuang Fu Road, Hsinchu 30013, Taiwan (China); Nuclear Engineering Division, Institute of Nuclear Energy Research, No. 1000, Wenhua Rd., Jiaan Village, Longtan Township, Taoyuan County 32546, Taiwan (China)], E-mail: jdwang@iner.gov.tw; Lin Chaung [Department of Engineering and System Science, National Tsing Hua University, 101, Section 2 Kuang Fu Road, Hsinchu 30013, Taiwan (China)

    2009-08-15

    An automatic boiling water reactor (BWR) loading pattern (LP) design methodology was developed using the rank-based ant system (RAS), which is a variant of the ant colony optimization (ACO) algorithm. To reduce design complexity, only the fuel assemblies (FAs) of one-eighth of the core positions were determined using the RAS algorithm, and then the corresponding FAs were loaded into the other parts of the core. Heuristic information was adopted to exclude inappropriate FAs from selection, which reduces the search space and thus the computation time. When the LP was determined, Haling cycle length, beginning of cycle (BOC) shutdown margin (SDM), and Haling end of cycle (EOC) maximum fraction of limit for critical power ratio (MFLCPR) were calculated using the SIMULATE-3 code and were used to evaluate the LP for updating the pheromone of RAS. The developed design methodology was demonstrated using FAs of a reference cycle of the BWR6 nuclear power plant. The results show that the designed LP can be obtained within reasonable computation time and has a longer cycle length than that of the original design.

  16. Algorithms for Image Analysis and Combination of Pattern Classifiers with Application to Medical Diagnosis

    Science.gov (United States)

    Georgiou, Harris

    2009-10-01

    Medical Informatics and the application of modern signal processing in the assistance of the diagnostic process in medical imaging is one of the more recent and active research areas today. This thesis addresses a variety of issues related to the general problem of medical image analysis, specifically in mammography, and presents a series of algorithms and design approaches for all the intermediate levels of a modern system for computer-aided diagnosis (CAD). The diagnostic problem is analyzed with a systematic approach, first defining the imaging characteristics and features that are relevant to probable pathology in mammograms. Next, these features are quantified and fused into new, integrated radiological systems that exhibit embedded digital signal processing, in order to improve the final result and minimize the radiological dose for the patient. In a higher level, special algorithms are designed for detecting and encoding these clinically interesting imaging features, in order to be used as input to advanced pattern classifiers and machine learning models. Finally, these approaches are extended in multi-classifier models under the scope of Game Theory and optimum collective decision, in order to produce efficient solutions for combining classifiers with minimum computational costs for advanced diagnostic systems. The material covered in this thesis is related to a total of 18 published papers, 6 in scientific journals and 12 in international conferences.

  17. Development of pattern recognition algorithms for particles detection from atmospheric images

    International Nuclear Information System (INIS)

    Khatchadourian, S.

    2010-01-01

    The HESS experiment consists of a system of telescopes destined to observe cosmic rays. Since the project has achieved a high level of performances, a second phase of the project has been initiated. This implies the addition of a new telescope which is more sensitive than its predecessors and which is capable of collecting a huge amount of images. In this context, all data collected by the telescope can not be retained because of storage limitations. Therefore, a new real-time system trigger must be designed in order to select interesting events on the fly. The purpose of this thesis was to propose a trigger solution to efficiently discriminate events (images) which are captured by the telescope. The first part of this thesis was to develop pattern recognition algorithms to be implemented within the trigger. A processing chain based on neural networks and Zernike moments has been validated. The second part of the thesis has focused on the implementation of the proposed algorithms onto an FPGA target, taking into account the application constraints in terms of resources and execution time. (author)

  18. Hemodynamic and oxygen transport patterns for outcome prediction, therapeutic goals, and clinical algorithms to improve outcome. Feasibility of artificial intelligence to customize algorithms.

    Science.gov (United States)

    Shoemaker, W C; Patil, R; Appel, P L; Kram, H B

    1992-11-01

    A generalized decision tree or clinical algorithm for treatment of high-risk elective surgical patients was developed from a physiologic model based on empirical data. First, a large data bank was used to do the following: (1) describe temporal hemodynamic and oxygen transport patterns that interrelate cardiac, pulmonary, and tissue perfusion functions in survivors and nonsurvivors; (2) define optimal therapeutic goals based on the supranormal oxygen transport values of high-risk postoperative survivors; (3) compare the relative effectiveness of alternative therapies in a wide variety of clinical and physiologic conditions; and (4) to develop criteria for titration of therapy to the endpoints of the supranormal optimal goals using cardiac index (CI), oxygen delivery (DO2), and oxygen consumption (VO2) as proxy outcome measures. Second, a general purpose algorithm was generated from these data and tested in preoperatively randomized clinical trials of high-risk surgical patients. Improved outcome was demonstrated with this generalized algorithm. The concept that the supranormal values represent compensations that have survival value has been corroborated by several other groups. We now propose a unique approach to refine the generalized algorithm to develop customized algorithms and individualized decision analysis for each patient's unique problems. The present article describes a preliminary evaluation of the feasibility of artificial intelligence techniques to accomplish individualized algorithms that may further improve patient care and outcome.

  19. Evaluation of Documentation Patterns of Trainees and Supervising Physicians Using Data Mining.

    Science.gov (United States)

    Madhavan, Ramesh; Tang, Chi; Bhattacharya, Pratik; Delly, Fadi; Basha, Maysaa M

    2014-09-01

    The electronic health record (EHR) includes a rich data set that may offer opportunities for data mining and natural language processing to answer questions about quality of care, key aspects of resident education, or attributes of the residents' learning environment. We used data obtained from the EHR to report on inpatient documentation practices of residents and attending physicians at a large academic medical center. We conducted a retrospective observational study of deidentified patient notes entered over 7 consecutive months by a multispecialty university physician group at an urban hospital. A novel automated data mining technology was used to extract patient note-related variables. A sample of 26 802 consecutive patient notes was analyzed using the data mining and modeling tool Healthcare Smartgrid. Residents entered most of the notes (33%, 8178 of 24 787) between noon and 4 pm and 31% (7718 of 24 787) of notes between 8 am and noon. Attending physicians placed notes about teaching attestations within 24 hours in only 73% (17 843 of 24 443) of the records. Surgical residents were more likely to place notes before noon (P Data related to patient note entry was successfully used to objectively measure current work flow of resident physicians and their supervising faculty, and the findings have implications for physician oversight of residents' clinical work. We were able to demonstrate the utility of a data mining model as an assessment tool in graduate medical education.

  20. Quick fuzzy backpropagation algorithm.

    Science.gov (United States)

    Nikov, A; Stoeva, S

    2001-03-01

    A modification of the fuzzy backpropagation (FBP) algorithm called QuickFBP algorithm is proposed, where the computation of the net function is significantly quicker. It is proved that the FBP algorithm is of exponential time complexity, while the QuickFBP algorithm is of polynomial time complexity. Convergence conditions of the QuickFBP, resp. the FBP algorithm are defined and proved for: (1) single output neural networks in case of training patterns with different targets; and (2) multiple output neural networks in case of training patterns with equivalued target vector. They support the automation of the weights training process (quasi-unsupervised learning) establishing the target value(s) depending on the network's input values. In these cases the simulation results confirm the convergence of both algorithms. An example with a large-sized neural network illustrates the significantly greater training speed of the QuickFBP rather than the FBP algorithm. The adaptation of an interactive web system to users on the basis of the QuickFBP algorithm is presented. Since the QuickFBP algorithm ensures quasi-unsupervised learning, this implies its broad applicability in areas of adaptive and adaptable interactive systems, data mining, etc. applications.

  1. Heuristic rules embedded genetic algorithm to solve VVER loading pattern optimization problem

    International Nuclear Information System (INIS)

    Fatih, Alim; Kostandi, Ivanov

    2006-01-01

    Full text: Loading Pattern (LP) optimization is one of the most important aspects of the operation of nuclear reactors. A genetic algorithm (GA) code GARCO (Genetic Algorithm Reactor Optimization Code) has been developed with embedded heuristic techniques to perform optimization calculations for in-core fuel management tasks. GARCO is a practical tool that includes a unique methodology applicable for all types of Pressurized Water Reactor (PWR) cores having different geometries with an unlimited number of FA types in the inventory. GARCO was developed by modifying the classical representation of the genotype. Both the genotype representation and the basic algorithm have been modified to incorporate the in-core fuel management heuristic rules so as to obtain the best results in a shorter time. GARCO has three modes. Mode 1 optimizes the locations of the fuel assemblies (FAs) in the nuclear reactor core, Mode 2 optimizes the placement of the burnable poisons (BPs) in a selected LP, and Mode 3 optimizes simultaneously both the LP and the BP placement in the core. This study describes the basic algorithm for Mode 1. The GARCO code is applied to the VVER-1000 reactor hexagonal geometry core in this study. The Moby-Dick code is used as the reactor physics code to deplete FAs in the core. It was developed by SKODA Inc. to analyze VVER reactors. To use these rules for creating the initial population with GA operators, the worth definition application is developed. Each FA has a worth value between 0 and 1 for each location. If the worth of an FA for a location is larger than 0.5, placing this FA in this location is a good choice. When creating the initial population of LPs, a subroutine provides a percentage of individuals whose genes have a worth higher than 0.5. The percentage of the population to be created without using the worth definition is defined in the GARCO input. In addition, an age concept has been developed to accelerate the GA calculation process in reaching the

  2. Mining compressing sequential problems

    NARCIS (Netherlands)

    Hoang, T.L.; Mörchen, F.; Fradkin, D.; Calders, T.G.K.

    2012-01-01

    Compression based pattern mining has been successfully applied to many data mining tasks. We propose an approach based on the minimum description length principle to extract sequential patterns that compress a database of sequences well. We show that mining compressing patterns is NP-Hard and

  3. Automated Detection of Selective Logging in Amazon Forests Using Airborne Lidar Data and Pattern Recognition Algorithms

    Science.gov (United States)

    Keller, M. M.; d'Oliveira, M. N.; Takemura, C. M.; Vitoria, D.; Araujo, L. S.; Morton, D. C.

    2012-12-01

    Selective logging, the removal of several valuable timber trees per hectare, is an important land use in the Brazilian Amazon and may degrade forests through long term changes in structure, loss of forest carbon and species diversity. Similar to deforestation, the annual area affected by selective logging has declined significantly in the past decade. Nonetheless, this land use affects several thousand km2 per year in Brazil. We studied a 1000 ha area of the Antimary State Forest (FEA) in the State of Acre, Brazil (9.304 °S, 68.281 °W) that has a basal area of 22.5 m2 ha-1 and an above-ground biomass of 231 Mg ha-1. Logging intensity was low, approximately 10 to 15 m3 ha-1. We collected small-footprint airborne lidar data using an Optech ALTM 3100EA over the study area once each in 2010 and 2011. The study area contained both recent and older logging that used both conventional and technologically advanced logging techniques. Lidar return density averaged over 20 m-2 for both collection periods with estimated horizontal and vertical precision of 0.30 and 0.15 m. A relative density model comparing returns from 0 to 1 m elevation to returns in the 1-5 m elevation range revealed the pattern of roads and skid trails. These patterns were confirmed by ground-based GPS survey. A GIS model of the road and skid network was built using lidar and ground data. We tested and compared two pattern recognition approaches used to automate logging detection. Both segmentation using the commercial eCognition software and a Frangi filter algorithm identified the road and skid trail network when compared to the GIS model. We report on the effectiveness of these two techniques.

  4. A STUDY OF TEXT MINING METHODS, APPLICATIONS,AND TECHNIQUES

    OpenAIRE

    R. Rajamani; S. Saranya

    2017-01-01

    Data mining is used to extract useful information from large amounts of data. It is used to implement and solve different types of research problems. The research areas related to data mining are text mining, web mining, image mining, sequential pattern mining, spatial mining, medical mining, multimedia mining, structure mining and graph mining. Text mining, also referred to as text data mining, is also called knowledge discovery in text (KDT) or intelligent text analysis. T...

  5. Systematic data mining using a pattern database to accelerate yield ramp

    Science.gov (United States)

    Teoh, Edward; Dai, Vito; Capodieci, Luigi; Lai, Ya-Chieh; Gennari, Frank

    2014-03-01

    Pattern-based approaches to physical verification, such as DRC Plus, which use a library of patterns to identify problematic 2D configurations, have been proven to be effective in capturing the concept of manufacturability where traditional DRC fails. As the industry moves to advanced technology nodes, the manufacturing process window tightens and the number of patterns continues to rapidly increase. This increase in patterns brings about challenges in identifying, organizing, and carrying forward the learning of each pattern from test chip designs to first product and then to multiple product variants. This learning includes results from printability simulation, defect scans and physical failure analysis, which are important for accelerating yield ramp. Using pattern classification technology and a relational database, GLOBALFOUNDRIES has constructed a pattern database (PDB) of more than one million potential yield detractor patterns. In PDB, 2D geometries are clustered based on similarity criteria, such as radius and edge tolerance. Each cluster is assigned a representative pattern and a unique identifier (ID). This ID is then used as a persistent reference for linking together information such as the failure mechanism of the patterns, the process condition where the pattern is likely to fail and the number of occurrences of the pattern in a design. Patterns and their associated information are used to populate DRC Plus pattern matching libraries for design-for-manufacturing (DFM) insertion into the design flow for auto-fixing and physical verification. Patterns are used in a production-ready yield learning methodology to identify and score critical hotspot patterns. Patterns are also used to select sites for process monitoring in the fab. In this paper, we describe the design of PDB, the methodology for identifying and analyzing patterns across multiple design and technology cycles, and the use of PDB to accelerate manufacturing process learning. One such

  6. A hybrid artificial bee colony algorithm and pattern search method for inversion of particle size distribution from spectral extinction data

    Science.gov (United States)

    Wang, Li; Li, Feng; Xing, Jian

    2017-10-01

    In this paper, a hybrid artificial bee colony (ABC) algorithm and pattern search (PS) method is proposed and applied for recovery of particle size distribution (PSD) from spectral extinction data. To be more useful and practical, size distribution function is modelled as the general Johnson's ? function that can overcome the difficulty of not knowing the exact type beforehand encountered in many real circumstances. The proposed hybrid algorithm is evaluated through simulated examples involving unimodal, bimodal and trimodal PSDs with different widths and mean particle diameters. For comparison, all examples are additionally validated by the single ABC algorithm. In addition, the performance of the proposed algorithm is further tested by actual extinction measurements with real standard polystyrene samples immersed in water. Simulation and experimental results illustrate that the hybrid algorithm can be used as an effective technique to retrieve the PSDs with high reliability and accuracy. Compared with the single ABC algorithm, our proposed algorithm can produce more accurate and robust inversion results while taking almost comparative CPU time over ABC algorithm alone. The superiority of ABC and PS hybridization strategy in terms of reaching a better balance of estimation accuracy and computation effort increases its potentials as an excellent inversion technique for reliable and efficient actual measurement of PSD.

  7. Real-time intelligent pattern recognition algorithm for surface EMG signals

    Directory of Open Access Journals (Sweden)

    Jahed Mehran

    2007-12-01

    Background: Electromyography (EMG) is the study of muscle function through the inquiry of electrical signals that the muscles emanate. EMG signals collected from the surface of the skin (Surface Electromyogram: sEMG) can be used in different applications such as recognizing musculoskeletal neural based patterns intercepted for hand prosthesis movements. Current systems designed for controlling prosthetic hands either have limited functions, can only be used to perform simple movements, or use an excessive number of electrodes in order to achieve acceptable results. In an attempt to overcome these problems we have proposed an intelligent system to recognize hand movements and have provided a user assessment routine to evaluate the correctness of executed movements. Methods: We propose to use an intelligent approach based on an adaptive neuro-fuzzy inference system (ANFIS) integrated with a real-time learning scheme to identify hand motion commands. For this purpose, and to consider the effect of user evaluation on recognizing hand movements, vision feedback is applied to increase the capability of our system. By using this scheme the user may assess the correctness of the performed hand movement. In this work a hybrid method for training the fuzzy system, consisting of back-propagation (BP) and least mean square (LMS), is utilized. Also, in order to optimize the number of fuzzy rules, a subtractive clustering algorithm has been developed. To design an effective system, we consider a conventional scheme of EMG pattern recognition system. To design this system we propose to use two different sets of EMG features, namely time domain (TD) and time-frequency representation (TFR). Also, in order to decrease the undesirable effects of the dimension of these feature sets, principal component analysis (PCA) is utilized. Results: In this study, the myoelectric signals considered for classification consist of six unique hand movements.
Features chosen for EMG signal

  8. Diagnosing Breast Cancer with the Aid of Fuzzy Logic Based on Data Mining of a Genetic Algorithm in Infrared Images

    Directory of Open Access Journals (Sweden)

    Hossein Ghayoumi Zadeh

    2012-10-01

    Background: Breast cancer is one of the most prevalent cancers among women today. The importance of breast cancer screening, its role in the timely identification of patients, and the reduction in treatment expenses are considered to be among the highest health priorities of a modern country. Thermal imaging clearly possesses a special role in this stage due to rapid diagnosis and use of harmless rays. Methods: We used a thermal camera for imaging of the patients. Important parameters were derived from the images for their posterior analysis with the aid of a genetic algorithm. The principal components that were entered in a fuzzy neural network for clustering breast cancer were identified. Results: The number of images considered for the test included a database of 200 patients out of whom 15 were diagnosed with breast cancer via mammography. Results of the base method show a sensitivity of 93%. The selection of parameters in the combination module gave rise to measured errors, which in training of the fuzzy neural network were of the order of clustering 1.0923×10^-5, which reached 2%. Conclusion: The study indicates that thermal image scanning coupled with the presented method based on artificial intelligence can possess a special status in screening women for breast cancer due to the use of harmless non-radiation rays. There are cases where physicians cannot decisively say whether the observed pattern in the image is benign or malignant. In such cases, the response of the computer model can be a valuable support tool for the physician, enabling an accurate diagnosis based on the type of imaging pattern as a response from the computer model.

  9. Correlating Microbial Diversity Patterns with Geochemistry in an Extreme and Heterogeneous Environment of Mine Tailings

    Science.gov (United States)

    Liu, Jun; Hua, Zheng-Shuang; Chen, Lin-Xing; Kuang, Jia-Liang; Li, Sheng-Jin; Shu, Wen-Sheng

    2014-01-01

    Recent molecular surveys have advanced our understanding of the forces shaping the large-scale ecological distribution of microbes in Earth's extreme habitats, such as hot springs and acid mine drainage. However, few investigations have attempted dense spatial analyses of specific sites to resolve the local diversity of these extraordinary organisms and how communities are shaped by the harsh environmental conditions found there. We have applied a 16S rRNA gene-targeted 454 pyrosequencing approach to explore the phylogenetic differentiation among 90 microbial communities from a massive copper tailing impoundment generating acidic drainage and coupled these variations in community composition with geochemical parameters to reveal ecological interactions in this extreme environment. Our data showed that the overall microbial diversity estimates and relative abundances of most of the dominant lineages were significantly correlated with pH, with the simplest assemblages occurring under extremely acidic conditions and more diverse assemblages associated with neutral pHs. The consistent shifts in community composition along the pH gradient indicated that different taxa were involved in the different acidification stages of the mine tailings. Moreover, the effect of pH in shaping phylogenetic structure within specific lineages was also clearly evident, although the phylogenetic differentiations within the Alphaproteobacteria, Deltaproteobacteria, and Firmicutes were attributed to variations in ferric and ferrous iron concentrations. Application of the microbial assemblage prediction model further supported pH as the major factor driving community structure and demonstrated that several of the major lineages are readily predictable. 
Together, these results suggest that pH is primarily responsible for structuring whole communities in the extreme and heterogeneous mine tailings, although the diverse microbial taxa may respond differently to various environmental conditions

  10. Know abnormal, find evil : frequent pattern mining for ransomware threat hunting and intelligence

    OpenAIRE

    Homayoun, S; Dehghantanha, A; Ahmadzadeh, M; Hashemi, S; Khayami, R

    2017-01-01

    Emergence of crypto-ransomware has significantly changed the cyber threat landscape. A crypto ransomware removes data custodian access by encrypting valuable data on victims’ computers and requests a ransom payment to reinstantiate custodian access by decrypting data. Timely detection of ransomware very much depends on how quickly and accurately system logs can be mined to hunt abnormalities and stop the evil. In this paper we first set up an environment to collect activity l...

  11. Improvements in seismic event locations in a deep western U.S. coal mine using tomographic velocity models and an evolutionary search algorithm

    Energy Technology Data Exchange (ETDEWEB)

    Adam Lurka; Peter Swanson [Central Mining Institute, Katowice (Poland)]

    2009-09-15

    Methods of improving seismic event locations were investigated as part of a research study aimed at reducing ground control safety hazards. Seismic event waveforms collected with a 23-station three-dimensional sensor array during longwall coal mining provide the data set used in the analyses. A spatially variable seismic velocity model is constructed using seismic event sources in a passive tomographic method. The resulting three-dimensional velocity model is used to relocate seismic event positions. An evolutionary optimization algorithm is implemented and used in both the velocity model development and in seeking improved event location solutions. Results obtained using the different velocity models are compared. The combination of the tomographic velocity model development and evolutionary search algorithm provides improvement to the event locations. 13 refs., 5 figs., 4 tabs.

  12. Working with Data: Discovering Knowledge through Mining and Analysis; Systematic Knowledge Management and Knowledge Discovery; Text Mining; Methodological Approach in Discovering User Search Patterns through Web Log Analysis; Knowledge Discovery in Databases Using Formal Concept Analysis; Knowledge Discovery with a Little Perspective.

    Science.gov (United States)

    Qin, Jian; Jurisica, Igor; Liddy, Elizabeth D.; Jansen, Bernard J; Spink, Amanda; Priss, Uta; Norton, Melanie J.

    2000-01-01

    These six articles discuss knowledge discovery in databases (KDD). Topics include data mining; knowledge management systems; applications of knowledge discovery; text and Web mining; text mining and information retrieval; user search patterns through Web log analysis; concept analysis; data collection; and data structure inconsistency. (LRW)

  13. Using of FPGA coprocessor for improving the execution speed of the pattern recognition algorithm for ATLAS - high energy physics experiment

    CERN Document Server

    Hinkelbein, C; Kugel, A; Männer, R; Müller, M

    2004-01-01

    Pattern recognition algorithms are used in experimental high energy physics to obtain parameters (features) of particle tracks in detectors. It is particularly important to have fast algorithms in the trigger system. This paper investigates the suitability of using an FPGA coprocessor to speed up the TRT-LUT algorithm, one of the feature extraction algorithms for the second level trigger of the ATLAS experiment (CERN). Two realizations of the same algorithm have been compared: a C++ realization tested on a computer equipped with dual Xeon 2.4 GHz CPU, 64-bit, 66 MHz PCI bus, 1024Mb DDR RAM main memory with Red Hat Linux 7.1, and a hybrid C++/VHDL realization tested on the same PC additionally equipped with an MPRACE board (an FPGA coprocessor board based on a Xilinx Virtex-II FPGA, made as a 64-bit, 66 MHz PCI card and developed at the University of Mannheim). Usage of the FPGA coprocessor can give some reasonable speedup in contrast to a general purpose processor only for those algorithms (or parts of algorithms), for which there is a po...

  14. Personalized Recommendation of Learning Material Using Sequential Pattern Mining and Attribute Based Collaborative Filtering

    Science.gov (United States)

    Salehi, Mojtaba; Nakhai Kamalabadi, Isa; Ghaznavi Ghoushchi, Mohammad Bagher

    2014-01-01

    Material recommender system is a significant part of e-learning systems for personalization and recommendation of appropriate materials to learners. However, in the existing recommendation algorithms, dynamic interests and multi-preference of learners and multidimensional-attribute of materials are not fully considered simultaneously. Moreover,…

  15. Supporting Solar Physics Research via Data Mining

    Science.gov (United States)

    Angryk, Rafal; Banda, J.; Schuh, M.; Ganesan Pillai, K.; Tosun, H.; Martens, P.

    2012-05-01

    In this talk we will briefly introduce three pillars of data mining (i.e. frequent patterns discovery, classification, and clustering), and discuss some possible applications of known data mining techniques which can directly benefit solar physics research. In particular, we plan to demonstrate applicability of frequent patterns discovery methods for the verification of hypotheses about co-occurrence (in space and time) of filaments and sigmoids. We will also show how classification/machine learning algorithms can be utilized to verify human-created software modules to discover individual types of solar phenomena. Finally, we will discuss applicability of clustering techniques to image data processing.

  16. Seasonal and spatial patterns of metals at a restored copper mine site. I. Stream copper and zinc

    International Nuclear Information System (INIS)

    Bambic, Dustin G.; Alpers, Charles N.; Green, Peter G.; Fanelli, Eileen; Silk, Wendy K.

    2006-01-01

    Seasonal and spatial variations in metal concentrations and pH were found in a stream at a restored copper mine site located near a massive sulfide deposit in the Foothill copper-zinc belt of the Sierra Nevada, California. At the mouth of the stream, copper concentrations increased and pH decreased with increased streamflow after the onset of winter rain and, unexpectedly, reached extreme values 1 or 2 months after peaks in the seasonal hydrographs. In contrast, aqueous zinc and sulfate concentrations were highest during low-flow periods. Spatial variation was assessed in 400 m of reach encompassing an acidic, metal-laden seep. At this seep, pH remained low (2-3) throughout the year, and copper concentrations were highest. In contrast, the zinc concentrations increased with downstream distance. These spatial patterns were caused by immobilization of copper by hydrous ferric oxides in benthic sediments, coupled with increasing downstream supply of zinc from groundwater seepage. - Seasonal hydrology and benthic sediments control copper and zinc concentrations in a stream through a restored mine site

  17. Articular dysfunction patterns in patients with mechanical low back pain: A clinical algorithm to guide specific mobilization and manipulation techniques.

    Science.gov (United States)

    Dewitte, V; Cagnie, B; Barbe, T; Beernaert, A; Vanthillo, B; Danneels, L

    2015-06-01

    Recent systematic reviews have demonstrated reasonable evidence that lumbar mobilization and manipulation techniques are beneficial. However, knowledge on optimal techniques and doses, and its clinical reasoning is currently lacking. To address this, a clinical algorithm is presented so as to guide therapists in their clinical reasoning to identify patients who are likely to respond to lumbar mobilization and/or manipulation and to direct appropriate technique selection. Key features in subjective and clinical examination suggestive of mechanical nociceptive pain probably arising from articular structures, can categorize patients into distinct articular dysfunction patterns. Based on these patterns, specific mobilization and manipulation techniques are suggested. This clinical algorithm is merely based on empirical clinical expertise and complemented through knowledge exchange between international colleagues. The added value of the proposed articular dysfunction patterns should be considered within a broader perspective. Copyright © 2014 Elsevier Ltd. All rights reserved.

  18. Design of Compressed Sensing Algorithm for Coal Mine IoT Moving Measurement Data Based on a Multi-Hop Network and Total Variation

    Directory of Open Access Journals (Sweden)

    Gang Wang

    2018-05-01

    With the application of the Internet of Things (IoT) in coal mines, mobile measurement devices, such as intelligent mine lamps, generate growing volumes of moving measurement data. How to transmit these large amounts of mobile measurement data effectively has become an urgent problem. This paper presents a compressed sensing algorithm for the large amount of coal mine IoT moving measurement data based on a multi-hop network and total variation. By taking gas data in mobile measurement data as an example, two network models for the transmission of gas data flow, namely single-hop and multi-hop transmission modes, are investigated in depth, and a gas data compressed sensing collection model is built based on a multi-hop network. To utilize the sparse characteristics of gas data, the concept of total variation is introduced and a high-efficiency gas data compression and reconstruction method based on Total Variation Sparsity based on Multi-Hop (TVS-MH) is proposed. According to the simulation results, by using the proposed method, the moving measurement data flow from an underground distributed mobile network can be acquired and transmitted efficiently.

  19. Design of Compressed Sensing Algorithm for Coal Mine IoT Moving Measurement Data Based on a Multi-Hop Network and Total Variation.

    Science.gov (United States)

    Wang, Gang; Zhao, Zhikai; Ning, Yongjie

    2018-05-28

    With the application of the Internet of Things (IoT) in coal mines, mobile measurement devices, such as intelligent mine lamps, generate growing volumes of moving measurement data. How to transmit these large amounts of mobile measurement data effectively has become an urgent problem. This paper presents a compressed sensing algorithm for the large amount of coal mine IoT moving measurement data based on a multi-hop network and total variation. By taking gas data in mobile measurement data as an example, two network models for the transmission of gas data flow, namely single-hop and multi-hop transmission modes, are investigated in depth, and a gas data compressed sensing collection model is built based on a multi-hop network. To utilize the sparse characteristics of gas data, the concept of total variation is introduced and a high-efficiency gas data compression and reconstruction method based on Total Variation Sparsity based on Multi-Hop (TVS-MH) is proposed. According to the simulation results, by using the proposed method, the moving measurement data flow from an underground distributed mobile network can be acquired and transmitted efficiently.
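
    The role of total variation as a sparsity measure can be illustrated with a minimal sketch. It assumes the common one-dimensional definition (the sum of absolute differences between neighbouring samples); the abstract does not give the paper's exact formulation, and the function name and the gas readings below are made up for illustration.

```python
def total_variation(signal):
    """1-D total variation: sum of absolute differences between neighbours."""
    return sum(abs(b - a) for a, b in zip(signal, signal[1:]))


# Hypothetical gas readings: piecewise constant, so the differences are sparse.
gas = [0.50, 0.50, 0.50, 0.75, 0.75, 0.75]

tv = total_variation(gas)
nonzero_jumps = sum(1 for a, b in zip(gas, gas[1:]) if b != a)
```

    A slowly varying signal has few nonzero jumps, which is the kind of structure a total-variation-based reconstruction can exploit.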

  20. Mining Emerging Sequential Patterns for Activity Recognition in Body Sensor Networks

    DEFF Research Database (Denmark)

    Gu, Tao; Wang, Liang; Chen, Hanhua

    2010-01-01

    Body Sensor Networks offer many applications in healthcare, well-being and entertainment. One of the emerging applications is recognizing activities of daily living. In this paper, we introduce a novel knowledge pattern named Emerging Sequential Pattern (ESP), a sequential pattern that discovers...... significant class differences, to recognize both simple (i.e., sequential) and complex (i.e., interleaved and concurrent) activities. Based on ESPs, we build our complex activity models directly upon the sequential model to recognize both activity types. We conduct comprehensive empirical studies to evaluate...

  1. Image processing and pattern recognition algorithms for evaluation of crossed immunoelectrophoretic patterns (crossed radioimmunoelectrophoresis analysis manager; CREAM)

    DEFF Research Database (Denmark)

    Søndergaard, I; Poulsen, L K; Hagerup, M

    1987-01-01

    points along the precipitation curve in the curve-fitting process. The system has been tested on crossed immunoelectrophoretic patterns as well as crossed radioimmunoelectrophoretic patterns and it has been shown that the system can recognize the same precipitation curves on different immunoplates...

  2. GraMi: Generalized Frequent Pattern Mining in a Single Large Graph

    KAUST Repository

    Saeedy, Mohammed El; Kalnis, Panos

    2011-01-01

    resolution phase as a constraint satisfaction problem, in order to avoid the costly enumeration of all instances of each pattern in the graph. The authors also implemented CGRAMI, a version that supports structural and semantic constraints; and AGRAMI

  3. Phrase Mining of Textual Data to Analyze Extracellular Matrix Protein Patterns Across Cardiovascular Disease.

    Science.gov (United States)

    Liem, David Alexandre; Murali, Sanjana; Sigdel, Dibakar; Shi, Yu; Wang, Xuan; Shen, Jiaming; Choi, Howard; Caufield, J Harry; Wang, Wei; Ping, Peipei; Han, Jiawei

    2018-05-18

    Extracellular matrix (ECM) proteins have been shown to play important roles regulating multiple biological processes in an array of organ systems, including the cardiovascular system. By using a novel bioinformatics text-mining tool, we studied six categories of cardiovascular disease (CVD), namely ischemic heart disease (IHD), cardiomyopathies (CM), cerebrovascular accident (CVA), congenital heart disease (CHD), arrhythmias (ARR), and valve disease (VD), anticipating novel ECM protein-disease and protein-protein relationships hidden within vast quantities of textual data. We conducted a phrase-mining analysis, delineating the relationships of 709 ECM proteins with the six groups of CVDs reported in 1,099,254 abstracts. The technology pipeline known as Context-aware Semantic Online Analytical Processing (CaseOLAP) was applied to semantically rank the association of proteins to each and all six CVDs, performing analyses to quantify each protein-disease relationship. We performed principal component analysis and hierarchical clustering of the data, where each protein is visualized as a six dimensional vector. We found that ECM proteins display variable degrees of association with the six CVDs; certain CVDs share groups of associated proteins whereas others have divergent protein associations. We identified 82 ECM proteins sharing associations with all six CVDs. Our bioinformatics analysis ascribed distinct ECM pathways (via Reactome) from this subset of proteins, namely insulin-like growth factor regulation and interleukin-4 and interleukin-13 signaling, suggesting their contribution to the pathogenesis of all six CVDs. Finally, we performed hierarchical clustering analysis and identified protein clusters associated with a targeted CVD; analyses revealed unexpected insights underlying ECM-pathogenesis of CVDs.

  4. Algorithm for real-time detection of signal patterns using phase synchrony: an application to an electrode array

    Science.gov (United States)

    Sadeghi, Saman; MacKay, William A.; van Dam, R. Michael; Thompson, Michael

    2011-02-01

    Real-time analysis of multi-channel spatio-temporal sensor data presents a considerable technical challenge for a number of applications. For example, in brain-computer interfaces, signal patterns originating on a time-dependent basis from an array of electrodes on the scalp (i.e. electroencephalography) must be analyzed in real time to recognize mental states and translate these to commands which control operations in a machine. In this paper we describe a new technique for recognition of spatio-temporal patterns based on performing online discrimination of time-resolved events through the use of correlation of phase dynamics between various channels in a multi-channel system. The algorithm extracts unique sensor signature patterns associated with each event during a training period and ranks importance of sensor pairs in order to distinguish between time-resolved stimuli to which the system may be exposed during real-time operation. We apply the algorithm to electroencephalographic signals obtained from subjects tested in the neurophysiology laboratories at the University of Toronto. The extension of this algorithm for rapid detection of patterns in other sensing applications, including chemical identification via chemical or bio-chemical sensor arrays, is also discussed.

  5. Frequent Pattern Mining of Eye-Tracking Records Partitioned into Cognitive Chunks

    Directory of Open Access Journals (Sweden)

    Noriyuki Matsuda

    2014-01-01

    Assuming that scenes would be visually scanned by chunking information, we partitioned fixation sequences of web page viewers into chunks using isolate gaze point(s) as the delimiter. Fixations were coded in terms of the segments in a 5×5 mesh imposed on the screen. The identified chunks were mostly short, consisting of one or two fixations. These were analyzed with respect to the within- and between-chunk distances in the overall records and the patterns (i.e., subsequences) frequently shared among the records. Although the two types of distances were both dominated by zero- and one-block shifts, the primacy of the modal shifts was less prominent between chunks than within them. The lower primacy was compensated by the longer shifts. The patterns frequently extracted at three threshold levels were mostly simple, consisting of one or two chunks. The patterns revealed interesting properties as to segment differentiation and the directionality of the attentional shifts.
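
    The coding of fixations into mesh segments can be sketched as below. This is only an illustration under assumptions: the screen size, the function name `mesh_segment`, and the sample coordinates are hypothetical, and the abstract does not say how the authors labelled the cells.

```python
def mesh_segment(x, y, width, height, n=5):
    """Map a fixation at pixel (x, y) to a (row, col) cell of an n-by-n mesh."""
    if not (0 <= x < width and 0 <= y < height):
        raise ValueError("fixation lies outside the screen")
    col = int(x * n // width)   # 0 .. n-1, left to right
    row = int(y * n // height)  # 0 .. n-1, top to bottom
    return row, col


# Hypothetical fixations on an assumed 1024x768 screen.
fixations = [(10, 10), (500, 300), (1023, 767)]
codes = [mesh_segment(x, y, 1024, 768) for x, y in fixations]
```

    Once every fixation carries a segment code, a record becomes a sequence of codes that a frequent-subsequence miner can process.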

  6. A constrained tracking algorithm to optimize plug patterns in multiple isocenter Gamma Knife radiosurgery planning

    International Nuclear Information System (INIS)

    Li Kaile; Ma Lijun

    2005-01-01

    We developed a source blocking optimization algorithm for Gamma Knife radiosurgery, which is based on tracking individual source contributions to arbitrarily shaped target and critical structure volumes. A scalar objective function and a direct search algorithm were used to produce near real-time calculation results. The algorithm allows the user to set and vary the total number of plugs for each shot to limit the total beam-on time. We implemented and tested the algorithm for several multiple-isocenter Gamma Knife cases. It was found that the use of a limited number of plugs significantly lowered the integral dose to critical structures such as the optic chiasm in pituitary adenoma cases. The main effect of the source blocking is the faster dose falloff in the junction area between the target and the critical structure. In summary, we demonstrated a useful source-plugging algorithm for improving complex multi-isocenter Gamma Knife treatment planning cases.

  7. DTFP-Growth: Dynamic Threshold-Based FP-Growth Rule Mining Algorithm Through Integrating Gene Expression, Methylation, and Protein-Protein Interaction Profiles.

    Science.gov (United States)

    Mallik, Saurav; Bhadra, Tapas; Mukherji, Ayan

    2018-04-01

    Association rule mining is an important technique for identifying interesting relationships between gene pairs in a biological data set. Earlier methods mostly work on a single biological data set and, in most cases, apply a single minimum support cutoff globally, i.e., across all genesets/itemsets. To overcome this limitation, in this paper, we propose a dynamic threshold-based FP-growth rule mining algorithm that integrates gene expression, methylation and protein-protein interaction profiles based on weighted shortest distance to find novel associations among different pairs of genes in multi-view data sets. For this purpose, we introduce three new thresholds, namely, Distance-based Variable/Dynamic Supports (DVS), Distance-based Variable Confidences (DVC), and Distance-based Variable Lifts (DVL), for each rule by integrating the co-expression, co-methylation, and protein-protein interactions present in the multi-omics data set. We develop the proposed algorithm utilizing these three novel multiple threshold measures. In the proposed algorithm, the values of DVS, DVC, and DVL are computed for each rule separately, and it is then verified whether the support, confidence, and lift of each evolved rule are greater than or equal to the corresponding individual DVS, DVC, and DVL values, respectively. If all three conditions hold for a rule, the rule is treated as a resultant rule. One of the major advantages of the proposed method compared with other related state-of-the-art methods is that it considers both the quantitative and interactive significance among all pairwise genes belonging to each rule. Moreover, the proposed method generates fewer rules, takes less running time, and provides greater biological significance for the resultant top-ranking rules compared to previous methods.
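
    The per-rule acceptance test described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Rule` record, the function names, and all numeric values are hypothetical, and because the abstract does not say how the three cutoffs are derived from the weighted shortest distance, they are taken here as precomputed inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical container for one association rule and its own cutoffs."""
    name: str
    support: float
    confidence: float
    lift: float
    dvs: float  # distance-based variable support cutoff (assumed precomputed)
    dvc: float  # distance-based variable confidence cutoff (assumed precomputed)
    dvl: float  # distance-based variable lift cutoff (assumed precomputed)


def passes_dynamic_thresholds(rule: Rule) -> bool:
    """True only if all three measures meet the rule's own cutoffs."""
    return (
        rule.support >= rule.dvs
        and rule.confidence >= rule.dvc
        and rule.lift >= rule.dvl
    )


def resultant_rules(rules):
    """Return the rules that satisfy all three conditions, in input order."""
    return [r for r in rules if passes_dynamic_thresholds(r)]


# Made-up values for illustration only.
candidates = [
    Rule("geneA -> geneB", support=0.30, confidence=0.80, lift=1.5,
         dvs=0.25, dvc=0.70, dvl=1.2),
    Rule("geneC -> geneD", support=0.30, confidence=0.60, lift=1.5,
         dvs=0.25, dvc=0.70, dvl=1.2),
]
kept = resultant_rules(candidates)
```

    The contrast with a single global minimum support is that each rule carries its own three cutoffs, so two rules with identical support can be treated differently.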

  8. An improved Pattern Search based algorithm to solve the Dynamic Economic Dispatch problem with valve-point effect

    International Nuclear Information System (INIS)

    Alsumait, J.S.; Qasem, M.; Sykulski, J.K.; Al-Othman, A.K.

    2010-01-01

    In this paper, an improved algorithm based on the Pattern Search (PS) method is proposed to solve the Dynamic Economic Dispatch (DED) problem. The algorithm maintains the essential unit ramp rate constraint, along with all other necessary constraints, not only over the time horizon of operation (24 h) but also through the transition period to the next time horizon (next day), in order to avoid discontinuity in power system operation. The Dynamic Economic and Emission Dispatch (DEED) problem is also considered. The load balance constraints, operating limits, valve-point loading and network losses are included in the models of both DED and DEED. The numerical results demonstrate the significance of the improved algorithm and verify its performance.
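
    For readers unfamiliar with pattern search, a basic compass-style variant is sketched below. It is a generic textbook form, not the improved algorithm of this paper: ramp rate limits, valve-point loading, and network losses are not modelled, and `toy_cost` is a hypothetical smooth objective chosen only to make the example runnable.

```python
def pattern_search(f, x0, step=1.0, tol=1e-6, max_iter=10_000):
    """Compass search: poll +/- step along each axis, halve the step on failure."""
    x = list(x0)
    fx = f(x)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = x.copy()
                trial[i] += delta
                f_trial = f(trial)
                if f_trial < fx:  # accept any strictly better poll point
                    x, fx = trial, f_trial
                    improved = True
        if not improved:
            step *= 0.5  # no poll point improved: refine the mesh
    return x, fx


def toy_cost(p):
    """Hypothetical two-variable quadratic cost with its minimum at (3, -1)."""
    return (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2


best_x, best_cost = pattern_search(toy_cost, [0.0, 0.0])
```

    The method needs only function values, not gradients, which is why it is often considered for non-smooth cost curves such as those with valve-point effects.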

  9. Patterns of solidarity: A case study of self-organization in underground mining

    International Nuclear Information System (INIS)

    Vaught, C.

    1991-01-01

    This case study in underground coal mining is informed by some notions of scholars who have written in widely divergent traditions and disciplines. Two major themes dealt with are labor's subjective moment and workplace culture. Regarding the subjective moment of labor, it is argued that there is an expressive element in work which defies reductions to some exchange principle. The struggle, for those articulating capitalist work processes, is to keep this purposive activity from being diverted totally to alien ends. The mediating element in this struggle, which structural Marxists have ignored in their analyses of capitalist workplaces, is culture. There is created a network of lasting relationships in the work group over and above any interdependence engendered by the division of labor. This shared culture allows for a collective recognition of the common product of group work, the shared nature of a particular work process, even the liberating potential of social relations themselves. The group's internalization of these social facts provides a base from which workers can mount an unceasing effort to control their workplace

  10. Online Learners' Navigational Patterns Based on Data Mining in Terms of Learning Achievement

    Science.gov (United States)

    Keskin, Sinan; Sahin, Muhittin; Ozgur, Adem; Yurdugul, Halil

    2016-01-01

    The aim of this study is to determine navigational patterns of university students in a learning management system (LMS). It also investigates whether online learners' navigational behaviors differ in terms of their academic achievement (pass, fail). The data for the study comes from 65 third grade students enrolled in online Computer Network and…

  11. The Plasmodium falciparum Sexual Development Transcriptome: A Microarray Analysis using Ontology-Based Pattern Identification

    National Research Council Canada - National Science Library

    Young, Jason A; Fivelman, Quinton L; Blair, Peter L; de la Vega, Patricia; Le Roch, Karine G; Zhou, Yingyao; Carucci, Daniel J; Baker, David A; Winzeler, Elizabeth A

    2005-01-01

    ... a full-genome high-density oligonucleotide microarray. The interpretation of this transcriptional data was aided by applying a novel knowledge-based data-mining algorithm termed ontology-based pattern identification (OPI...

  12. Extracting Patterns from Educational Traces via Clustering and Associated Quality Metrics

    NARCIS (Netherlands)

    Mihaescu, Marian; Tanasie, Alexandru; Dascalu, Mihai; Trausan-Matu, Stefan

    2016-01-01

    Clustering algorithms, pattern mining techniques and associated quality metrics emerged as reliable methods for modeling learners’ performance, comprehension and interaction in given educational scenarios. The specificity of available data such as missing values, extreme values or outliers,

  13. Theoretical Aspects of the Patterns Recognition Statistical Theory Used for Developing the Diagnosis Algorithms for Complicated Technical Systems

    Science.gov (United States)

    Obozov, A. A.; Serpik, I. N.; Mihalchenko, G. S.; Fedyaeva, G. A.

    2017-01-01

    In the article, the application of pattern recognition (a relatively young area of engineering cybernetics) to the analysis of complicated technical systems is examined. It is shown that a statistical approach could be the most effective for situations that are hard to distinguish. The different recognition algorithms are based on the Bayes approach, which estimates a posteriori probabilities of a certain event and an assumed error. The statistical approach to pattern recognition can be applied to the technical diagnosis of complicated systems, particularly high-power marine diesel engines.
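
    The Bayes step mentioned above can be illustrated with a small sketch. The two engine states, the prior probabilities, and the likelihoods of the observed symptom are invented for the example and do not come from the article; only Bayes' rule itself is standard.

```python
def posterior(priors, likelihoods):
    """Bayes' rule: P(state | obs) = P(obs | state) * P(state) / P(obs)."""
    joint = {s: priors[s] * likelihoods[s] for s in priors}
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("observation has zero probability under every state")
    return {s: j / evidence for s, j in joint.items()}


# Hypothetical numbers: prior state probabilities and the probability of
# observing a given symptom in each state.
priors = {"healthy": 0.9, "faulty": 0.1}
likelihoods = {"healthy": 0.05, "faulty": 0.8}

post = posterior(priors, likelihoods)
diagnosis = max(post, key=post.get)  # pick the most probable state
```

    Choosing the state with the largest posterior minimizes the probability of a wrong diagnosis when all errors are weighted equally.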

  14. Statistically significant relational data mining

    Energy Technology Data Exchange (ETDEWEB)

    Berry, Jonathan W.; Leung, Vitus Joseph; Phillips, Cynthia Ann; Pinar, Ali; Robinson, David Gerald; Berger-Wolf, Tanya; Bhowmick, Sanjukta; Casleton, Emily; Kaiser, Mark; Nordman, Daniel J.; Wilson, Alyson G.

    2014-02-01

    This report summarizes the work performed under the project "Statistically significant relational data mining." The goal of the project was to add more statistical rigor to the fairly ad hoc area of data mining on graphs. Our goal was to develop better algorithms and better ways to evaluate algorithm quality. We concentrated on algorithms for community detection, approximate pattern matching, and graph similarity measures. Approximate pattern matching involves finding an instance of a relatively small pattern, expressed with tolerance, in a large graph of data observed with uncertainty. This report gathers the abstracts and references for the eight refereed publications that have appeared as part of this work. We then archive three pieces of research that have not yet been published. The first is theoretical and experimental evidence that a popular statistical measure for comparison of community assignments favors over-resolved communities over approximations to a ground truth. The second are statistically motivated methods for measuring the quality of an approximate match of a small pattern in a large graph. The third is a new probabilistic random graph model. Statisticians favor these models for graph analysis. The new local structure graph model overcomes some of the issues with popular models such as exponential random graph models and latent variable models.

  15. Patterns of disclosure and antiretroviral treatment adherence in a South African mining workplace programme and implications for HIV prevention.

    Science.gov (United States)

    Bhagwanjee, Anil; Govender, Kaymarlin; Akintola, Olagoke; Petersen, Inge; George, Gavin; Johnstone, Leigh; Naidoo, Kerisha

    2011-01-01

    Social and psychological barriers to the disclosure of one's seropositive HIV status to significant others and poor adherence to taking medications pose significant challenges to the scaling-up of access to antiretroviral treatment (ART) in the workplace. Such barriers are predictive of sub-optimal treatment outcomes and bedevil HIV-prevention interventions at a societal level. Against this background, this article explores the lived experiences of 19 HIV-positive male participants, between the ages of 33 and 57 years, who were enrolled in an ART programme managed at an occupational health clinic at a mining company in South Africa. The majority of these mineworkers had been aware of their HIV status for between 5 and 7 years. The study explored psychological and relational factors, as aspects of these participants' lived experiences, which had a bearing on their adherence to their ART regimen and the disclosure choices that they made regarding their HIV status. In our sample, those participants who were adherent demonstrated higher levels of control and acceptance of their HIV infection and were more confident in their ability to manage their treatment, while the group who were non-adherent presented with lower levels of adherence motivation and self-efficacy, difficulties in maintaining a healthy lifestyle and significant challenges in maintaining control over their lives. While most of the men favoured disclosing their HIV status to their partners for the sake of treatment support, they were less sure about disclosing to family members and non-family members, respectively, because of their need to protect these persons and due to their fear of being stigmatised. It was evident that treatment adherence choices and behaviours were impacted by psychological and relational factors, including disclosure decisions. We conclude with a bivariate model for understanding the adherence behaviours that influenced different patterns of ART adherence among the sample, and

  16. Melodic pattern extraction in large collections of music recordings using time series mining techniques

    OpenAIRE

    Gulati, Sankalp; Serrà, Joan; Ishwar, Vignesh; Serra, Xavier

    2014-01-01

    We demonstrate a data-driven unsupervised approach for the discovery of melodic patterns in large collections of Indian art music recordings. The approach first works on single recordings and subsequently searches in the entire music collection. Melodic similarity is based on dynamic time warping. The task being computationally intensive, lower bounding and early abandoning techniques are applied during distance computation. Our dataset comprises 365 hours of music, containing 1,764 audio rec...
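
    The combination of dynamic time warping, lower bounding, and early abandoning named in this abstract can be sketched as follows. This is a generic illustration, not the authors' implementation: the squared-difference cost, the first/last-point lower bound, and the function names are assumptions made for the example.

```python
import math


def lower_bound(a, b):
    """Cheap lower bound on DTW: any warping path must align the first
    points with each other and the last points with each other."""
    first = (a[0] - b[0]) ** 2
    if len(a) == 1 and len(b) == 1:
        return first
    return first + (a[-1] - b[-1]) ** 2


def dtw_distance(a, b, best_so_far=math.inf):
    """DTW distance between two non-empty numeric sequences.

    Early abandoning: costs are non-negative, so once every cell of a row
    exceeds best_so_far the final distance must exceed it too, and
    math.inf is returned without finishing the computation.
    """
    if not a or not b:
        raise ValueError("sequences must be non-empty")
    prev = [math.inf] * (len(b) + 1)
    prev[0] = 0.0
    for x in a:
        curr = [math.inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cost = (x - y) ** 2
            curr[j] = cost + min(prev[j], prev[j - 1], curr[j - 1])
        if min(curr[1:]) > best_so_far:
            return math.inf
        prev = curr
    return prev[len(b)]


def nearest_neighbor(query, candidates):
    """Index and distance of the candidate closest to query under DTW."""
    best_index, best = -1, math.inf
    for i, cand in enumerate(candidates):
        if lower_bound(query, cand) >= best:
            continue  # the full DTW cannot beat the current best
        d = dtw_distance(query, cand, best_so_far=best)
        if d < best:
            best_index, best = i, d
    return best_index, best
```

    In a nearest-neighbor search the lower bound discards most candidates cheaply, and early abandoning cuts short the remaining full computations once they can no longer improve on the best match found so far.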

  17. An Adaptive Sensor Mining Framework for Pervasive Computing Applications

    Science.gov (United States)

    Rashidi, Parisa; Cook, Diane J.

    Analyzing sensor data in pervasive computing applications brings unique challenges to the KDD community. The challenge is heightened when the underlying data source is dynamic and the patterns change. We introduce a new adaptive mining framework that detects patterns in sensor data, and more importantly, adapts to the changes in the underlying model. In our framework, the frequent and periodic patterns of data are first discovered by the Frequent and Periodic Pattern Miner (FPPM) algorithm; and then any changes in the discovered patterns over the lifetime of the system are discovered by the Pattern Adaptation Miner (PAM) algorithm, in order to adapt to the changing environment. This framework also captures vital context information present in pervasive computing applications, such as the startup triggers and temporal information. In this paper, we present a description of our mining framework and validate the approach using data collected in the CASAS smart home testbed.

  18. An Improved Algorithm Research on the PrefixSpan Based on the Server Session Constraint

    Directory of Open Access Journals (Sweden)

    Cai Hong-Guo

    2017-01-01

    When long sequential patterns are mined with the PrefixSpan algorithm in Web Usage Mining (WUM), the large number of elements and suffix sequences can cause computational problems such as space explosion. To address this problem, a server session-based server log file format is first proposed. Then an improved PrefixSpan algorithm based on the server session constraint is discussed for mining frequent sequential patterns on a website. Finally, experiments demonstrate the validity and superiority of the method.
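
    For readers unfamiliar with PrefixSpan, the basic prefix-projection idea can be sketched as below. This is a simplified textbook version for sequences of single items, written for illustration; it does not include the server session constraint proposed in the paper, and the page names in the usage note are hypothetical.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern (tuple): support} for all frequent sequential patterns.

    Support is the number of input sequences that contain the pattern as a
    (not necessarily contiguous) subsequence.
    """
    results = {}

    def mine(prefix, projected):
        # projected holds (sequence index, start position) pairs, i.e. the
        # suffixes that remain after matching the current prefix.
        first_pos = defaultdict(dict)
        for seq_id, start in projected:
            seq = sequences[seq_id]
            for pos in range(start, len(seq)):
                # Record only the first occurrence of each item per sequence.
                first_pos[seq[pos]].setdefault(seq_id, pos)
        for item, occurrences in first_pos.items():
            if len(occurrences) >= min_support:
                pattern = prefix + (item,)
                results[pattern] = len(occurrences)
                # Project on the new pattern and grow it recursively.
                mine(pattern, [(sid, pos + 1) for sid, pos in occurrences.items()])

    mine((), [(i, 0) for i in range(len(sequences))])
    return results
```

    For example, with the sessions `["home", "search", "cart"]`, `["home", "cart"]` and `["search", "home", "cart"]` and `min_support=2`, the pattern `("home", "cart")` is reported with support 3, while `("home", "search")` occurs in only one session and is dropped. The projected databases are what grow quickly for long sequences, which is the space problem the abstract refers to.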

  19. Mining Social and Affective Data for Recommendation of Student Tutors

    Directory of Open Access Journals (Sweden)

    Elisa Boff

    2013-03-01

    This paper presents a learning environment where a mining algorithm is used to learn patterns of interaction with the user and to represent these patterns in a scheme called item descriptors. The learning environment keeps theoretical information about subjects, as well as tools and exercises where the student can put into practice the knowledge gained. One of the main purposes of the project is to stimulate collaborative learning through the interaction of students with different levels of knowledge. The students' actions, as well as their interactions, are monitored by the system and used to find patterns that can guide the search for students that may play the role of a tutor. Such patterns are found with a particular learning algorithm and represented in item descriptors. The paper presents the educational environment, the representation mechanism and learning algorithm used to mine social-affective data in order to create a recommendation model of tutors.

  20. AprioriGWAS, a new pattern mining strategy for detecting genetic variants associated with disease through interaction effects.

    Science.gov (United States)

    Zhang, Qingrun; Long, Quan; Ott, Jurg

    2014-06-01

    Identifying gene-gene interaction is a hot topic in genome wide association studies. Two fundamental challenges are: (1) how to smartly identify combinations of variants that may be associated with the trait from the astronomical number of all possible combinations; and (2) how to test epistatic interaction when all potential combinations are available. We developed AprioriGWAS, which brings two innovations. (1) Based on Apriori, a successful method in the field of Frequent Itemset Mining (FIM) in which a pattern growth strategy is leveraged to effectively and accurately reduce search space, AprioriGWAS can efficiently identify genetically associated genotype patterns. (2) To test the hypotheses of epistasis, we adopt a new conditional permutation procedure to obtain reliable statistical inference of Pearson's chi-square test for the [Formula: see text] contingency table generated by associated variants. By applying AprioriGWAS to age-related macular degeneration (AMD) data, we found that: (1) angiopoietin 1 (ANGPT1) and four retinal genes interact with Complement Factor H (CFH). (2) GO term "glycosaminoglycan biosynthetic process" was enriched in AMD interacting genes. The epistatic interactions newly found by AprioriGWAS on AMD data are likely true interactions, since genes interacting with CFH are retinal genes, and GO term enrichment also verified that interaction between glycosaminoglycans (GAGs) and CFH plays an important role in disease pathology of AMD. By applying AprioriGWAS to bipolar disorder in WTCCC data, we found that variants without marginal effect show significant interactions, for example multiple-SNP genotype patterns inside the genes GABRB2 and GRIA1 (AMPA subunit 1 receptor gene). AMPARs are found in many parts of the brain and are the most commonly found receptor in the nervous system. GABRB2 mediates the fastest inhibitory synaptic transmission in the central nervous system. GRIA1 and GABRB2 are relevant to mental disorders supported by multiple
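
    The search-space reduction that Apriori provides can be sketched with the classic level-wise algorithm below. This is the generic frequent itemset version, not AprioriGWAS itself; treating genotype labels such as "snp1=AA" as items is an assumption made only for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset (frozenset): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    frequent = {}
    # Level 1 candidates: every single item that occurs in the data.
    level = {frozenset([item]) for t in transactions for item in t}
    k = 1
    while level:
        current = {}
        for candidate in level:
            s = support(candidate)
            if s >= min_support:
                current[candidate] = s
        frequent.update(current)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent,
        # otherwise the candidate cannot be frequent either.
        level = {
            c for c in joined
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
    return frequent
```

    The prune step is what keeps the number of tested combinations manageable: a combination of three items is only counted if all three of its pairs were already found frequent. With hypothetical per-individual genotype sets such as `{"snp1=AA", "snp2=AG"}`, the returned itemsets would correspond to genotype patterns shared by at least `min_support` individuals.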

  1. HSM: Heterogeneous Subspace Mining in High Dimensional Data

    DEFF Research Database (Denmark)

    Müller, Emmanuel; Assent, Ira; Seidl, Thomas

    2009-01-01

    Heterogeneous data, i.e. data with both categorical and continuous values, is common in many databases. However, most data mining algorithms assume either continuous or categorical attributes, but not both. In high dimensional data, phenomena due to the "curse of dimensionality" pose additional...... challenges. Usually, due to locally varying relevance of attributes, patterns do not show across the full set of attributes. In this paper we propose HSM, which defines a new pattern model for heterogeneous high dimensional data. It allows data mining in arbitrary subsets of the attributes that are relevant...... for the respective patterns. Based on this model we propose an efficient algorithm, which is aware of the heterogeneity of the attributes. We extend an indexing structure for continuous attributes such that HSM indexing adapts to different attribute types. In our experiments we show that HSM efficiently mines...

  2. ICRPfinder: a fast pattern design algorithm for coding sequences and its application in finding potential restriction enzyme recognition sites

    Directory of Open Access Journals (Sweden)

    Stafford Phillip

    2009-09-01

    Background: Restriction enzymes can produce easily definable segments from DNA sequences by using a variety of cut patterns. There are, however, no software tools that can aid in gene building -- that is, modifying wild-type DNA sequences to express the same wild-type amino acid sequences but with enhanced codons, specific cut sites, unique post-translational modifications, and other engineered-in components for recombinant applications. A fast DNA pattern design algorithm, ICRPfinder, is provided in this paper and applied to find or create potential recognition sites in target coding sequences. Results: ICRPfinder is applied to find or create restriction enzyme recognition sites by introducing silent mutations. The algorithm is shown capable of mapping existing cut-sites but importantly it also can generate specified new unique cut-sites within a specified region that are guaranteed not to be present elsewhere in the DNA sequence. Conclusion: ICRPfinder is a powerful tool for finding or creating specific DNA patterns in a given target coding sequence. ICRPfinder finds or creates patterns, which can include restriction enzyme recognition sites, without changing the translated protein sequence. ICRPfinder is a browser-based JavaScript application and it can run on any platform, in on-line or off-line mode.

  3. Algorithmic Information Dynamics of Persistent Patterns and Colliding Particles in the Game of Life

    KAUST Repository

    Zenil, Hector; Kiani, Narsis A.; Tegner, Jesper

    2018-01-01

    , Conway's Game of Life (GoL) cellular automaton as a case study. We analyze the distribution of prevailing motifs that occur in GoL from the perspective of algorithmic probability. We demonstrate how the tools introduced are an alternative to computable

  4. Non-Negative Tensor Factorization for Human Behavioral Pattern Mining in Online Games

    Directory of Open Access Journals (Sweden)

    Anna Sapienza

    2018-03-01

    Multiplayer online battle arena is a genre of online games that has become extremely popular. Due to their success, these games also drew the attention of our research community, because they provide a wealth of information about human online interactions and behaviors. A crucial problem is the extraction of activity patterns that characterize this type of data, in an interpretable way. Here, we leverage the Non-negative Tensor Factorization to detect hidden correlated behaviors of playing in a well-known game: League of Legends. To this aim, we collect the entire gaming history of a group of about 1000 players, which accounts for roughly 100K matches. By applying our framework we are able to separate players into different groups. We show that each group exhibits similar features and playing strategies, as well as similar temporal trajectories, i.e., behavioral progressions over the course of their gaming history. We surprisingly discover that playing strategies are stable over time and we provide an explanation for this observation.

  5. Data mining for the social sciences: an introduction

    CERN Document Server

    Attewell, Paul

    2015-01-01

    We live in a world of big data: the amount of information collected on human behavior each day is staggering, and exponentially greater than at any time in the past. Additionally, powerful algorithms are capable of churning through seas of data to uncover patterns. Providing a simple and accessible introduction to data mining, Paul Attewell and David B. Monaghan discuss how data mining substantially differs from conventional statistical modeling familiar to most social scientists. The authors also empower social scientists to tap into these new resources and incorporate data mining

  6. A study on PubMed search tag usage pattern: association rule mining of a full-day PubMed query log.

    Science.gov (United States)

    Mosa, Abu Saleh Mohammad; Yoo, Illhoi

    2013-01-09

    The practice of evidence-based medicine requires efficient biomedical literature search such as PubMed/MEDLINE. Retrieval performance relies highly on the efficient use of search field tags. The purpose of this study was to analyze PubMed log data in order to understand the usage pattern of search tags by the end user in PubMed/MEDLINE search. A PubMed query log file was obtained from the National Library of Medicine containing anonymous user identification, timestamp, and query text. Inconsistent records were removed from the dataset and the search tags were extracted from the query texts. A total of 2,917,159 queries issued by a total of 613,061 users were selected for this study. The analysis of frequent co-occurrences and usage patterns of the search tags was conducted using an association mining algorithm. The percentage of search tag usage was low (11.38% of the total queries) and only 2.95% of queries contained two or more tags. Three out of four users used no search tag and about two-thirds of them issued fewer than four queries. Among the queries containing at least one tagged search term, the average number of search tags was almost half of the number of total search terms. Navigational search tags are more frequently used than informational search tags. While no strong association was observed between informational and navigational tags, six (out of 19) informational tags and six (out of 29) navigational tags showed strong associations in PubMed searches. The low percentage of search tag usage implies that PubMed/MEDLINE users do not utilize the features of PubMed/MEDLINE widely, or are not aware of such features, or depend solely on the high-recall-focused query translation by PubMed's Automatic Term Mapping. The users need further education and an interactive search application for effective use of the search tags in order to fulfill their biomedical information needs from PubMed/MEDLINE.
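
    The kind of co-occurrence analysis described in this abstract is usually expressed through support, confidence, and lift. The sketch below computes these measures for pairs of tags; it is a generic illustration rather than the study's pipeline, and the tag lists in the usage note are made-up examples.

```python
from collections import Counter
from itertools import combinations


def pair_rules(queries, min_support, min_confidence):
    """Return rules (lhs, rhs, support, confidence, lift) between tag pairs.

    queries is a list of tag lists, one list per query (possibly empty).
    """
    n = len(queries)
    item_counts = Counter()
    pair_counts = Counter()
    for tags in queries:
        unique = sorted(set(tags))
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of all queries containing both tags
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]          # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)     # vs. independence
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence, lift))
    return rules
```

    With the five example queries `["au", "dp"]`, `["au", "dp"]`, `["au"]`, `["ti"]` and `[]`, the pair (au, dp) has support 0.4. The rule dp → au has confidence 1.0, whereas au → dp has confidence of only about 0.67, so a confidence threshold of 0.9 keeps just the first rule. A lift above 1 indicates that the two tags co-occur more often than independence would predict.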

  7. Articular dysfunction patterns in patients with mechanical neck pain: a clinical algorithm to guide specific mobilization and manipulation techniques.

    Science.gov (United States)

    Dewitte, Vincent; Beernaert, Axel; Vanthillo, Bart; Barbe, Tom; Danneels, Lieven; Cagnie, Barbara

    2014-02-01

    In view of a didactical approach for teaching cervical mobilization and manipulation techniques to students as well as their use in daily practice, it is mandatory to acquire sound clinical reasoning to optimally apply advanced technical skills. The aim of this Masterclass is to present a clinical algorithm to guide (novice) therapists in their clinical reasoning to identify patients who are likely to respond to mobilization and/or manipulation. The presented clinical reasoning process is situated within the context of pain mechanisms and is narrowed to and applicable in patients with a dominant input pain mechanism. Based on key features in subjective and clinical examination, patients with mechanical nociceptive pain probably arising from articular structures can be categorized into specific articular dysfunction patterns. Depending on these patterns, specific mobilization and manipulation techniques are warranted. The proposed patterns are illustrated in 3 case studies. This clinical algorithm is the corollary of empirical expertise and is complemented by in-depth discussions and knowledge exchange with international colleagues. Consequently, it is intended that a carefully targeted approach contributes to an increase in specificity and safety in the use of cervical mobilizations and manipulation techniques as valuable adjuncts to other manual therapy modalities. Copyright © 2013 Elsevier Ltd. All rights reserved.

  8. Quantification of differences between nailfold capillaroscopy images with a scleroderma pattern and normal pattern using measures of geometric and algorithmic complexity.

    Science.gov (United States)

    Urwin, Samuel George; Griffiths, Bridget; Allen, John

    2017-02-01

    This study aimed to quantify and investigate differences in the geometric and algorithmic complexity of the microvasculature in nailfold capillaroscopy (NFC) images displaying a scleroderma pattern and those displaying a 'normal' pattern. 11 NFC images were qualitatively classified by a capillary specialist as indicative of 'clear microangiopathy' (CM), i.e. a scleroderma pattern, and 11 as 'not clear microangiopathy' (NCM), i.e. a 'normal' pattern. Pre-processing was performed, and fractal dimension (FD) and Kolmogorov complexity (KC) were calculated following image binarisation. FD and KC were compared between groups, and a k-means cluster analysis (n = 2) on all images was performed, without prior knowledge of the group assigned to them (i.e. CM or NCM), using FD and KC as inputs. CM images had significantly reduced FD and KC compared to NCM images, and the cluster analysis displayed promising results that the quantitative classification of images into CM and NCM groups is possible using the mathematical measures of FD and KC. The analysis techniques used show promise for quantitative microvascular investigation in patients with systemic sclerosis.
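
    The two measures named in this abstract can be approximated on a binarised image as sketched below. The abstract does not state how FD and KC were computed, so both choices here are assumptions: FD is estimated by box counting, and KC, which is not computable exactly, is replaced by the length of the zlib-compressed image as a common upper-bound proxy.

```python
import math
import zlib


def box_count(image, size):
    """Number of size x size boxes that contain at least one foreground pixel."""
    rows, cols = len(image), len(image[0])
    count = 0
    for r in range(0, rows, size):
        for c in range(0, cols, size):
            if any(
                image[i][j]
                for i in range(r, min(r + size, rows))
                for j in range(c, min(c + size, cols))
            ):
                count += 1
    return count


def fractal_dimension(image, sizes=(1, 2, 4, 8)):
    """Box-counting estimate: slope of log(count) against log(1 / box size).

    The image must contain at least one foreground pixel.
    """
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(image, s)) for s in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def compression_complexity(image):
    """Compressed size in bytes of the flattened binary image (KC proxy)."""
    flat = bytes(1 if pixel else 0 for row in image for pixel in row)
    return len(zlib.compress(flat, 9))
```

    As a sanity check, a completely filled square yields a dimension close to 2 and a single straight line a dimension close to 1. The resulting (FD, KC proxy) pair per image could then serve as the two input features for a two-cluster k-means analysis of the kind the abstract describes.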

  9. Adaptive enhancement of optical fringe patterns by selective reconstruction using FABEMD algorithm and Hilbert spiral transform.

    Science.gov (United States)

    Trusiak, Maciej; Patorski, Krzysztof; Wielgus, Maciej

    2012-10-08

    The presented method for fringe pattern enhancement has been designed for processing and analyzing low quality fringe patterns. It uses a modified fast and adaptive bidimensional empirical mode decomposition (FABEMD) for the extraction of bidimensional intrinsic mode functions (BIMFs) from an interferogram. The fringe pattern is then selectively reconstructed (SR) taking the regions of selected BIMFs with high modulation values only. Amplitude demodulation and normalization of the reconstructed image is conducted using the spiral phase Hilbert transform (HS). It has been tested using computer generated interferograms and real data. The performance of the presented SR-FABEMD-HS method is compared with other normalization techniques. Its superiority, potential and robustness to high fringe density variations and the presence of noise, modulation and background illumination defects in analyzed fringe patterns has been corroborated.

  10. High Performance Data mining by Genetic Neural Network

    Directory of Open Access Journals (Sweden)

    Dadmehr Rahbari

    2013-10-01

    Data mining in computer science is the process of discovering interesting and useful patterns and relationships in large volumes of data. Most methods for mining problems are based on artificial intelligence algorithms. Neural network optimization based on three basic parameters (topology, weights and the learning rate) is a powerful method. We introduce an optimal method for solving this problem. In this paper, a genetic algorithm with mutation and crossover operators changes the network structure and optimizes it. The dataset used for our work is a stroke disease dataset with twenty features, the number of which is optimized by the new hybrid algorithm. The results of this work compare very well with other similar methods. The low percentage of error shows that our method is an efficient, high-performance approach to data mining problems.

  11. Are Female Applicants Disadvantaged in National Institutes of Health Peer Review? Combining Algorithmic Text Mining and Qualitative Methods to Detect Evaluative Differences in R01 Reviewers' Critiques.

    Science.gov (United States)

    Magua, Wairimu; Zhu, Xiaojin; Bhattacharya, Anupama; Filut, Amarette; Potvien, Aaron; Leatherberry, Renee; Lee, You-Geon; Jens, Madeline; Malikireddy, Dastagiri; Carnes, Molly; Kaatz, Anna

    2017-05-01

    Women are less successful than men in renewing R01 grants from the National Institutes of Health. Continuing to probe text mining as a tool to identify gender bias in peer review, we used algorithmic text mining and qualitative analysis to examine a sample of critiques from men's and women's R01 renewal applications previously analyzed by counting and comparing word categories. We analyzed 241 critiques from 79 Summary Statements for 51 R01 renewals awarded to 45 investigators (64% male, 89% white, 80% PhD) at the University of Wisconsin-Madison between 2010 and 2014. We used latent Dirichlet allocation to discover evaluative "topics" (i.e., words that co-occur with high probability). We then qualitatively examined the context in which evaluative words occurred for male and female investigators. We also examined sex differences in assigned scores controlling for investigator productivity. Text analysis results showed that male investigators were described as "leaders" and "pioneers" in their "fields," with "highly innovative" and "highly significant research." By comparison, female investigators were characterized as having "expertise" and working in "excellent" environments. Applications from men received significantly better priority, approach, and significance scores, which could not be accounted for by differences in productivity. Results confirm our previous analyses suggesting that gender stereotypes operate in R01 grant peer review. Reviewers may more easily view male than female investigators as scientific leaders with significant and innovative research, and score their applications more competitively. Such implicit bias may contribute to sex differences in award rates for R01 renewals.

  12. Spatial distribution patterns of illegal artisanal small scale gold mining (Galamsey) operations in Ghana: A focus on the Western Region.

    Science.gov (United States)

    Owusu-Nimo, F; Mantey, J; Nyarko, K B; Appiah-Effah, Eugene; Aubynn, A

    2018-02-01

    Recently, there have been efforts by stakeholders to monitor illegal mining (galamsey) activities, foster their formalization and reclaim the many abandoned wastelands in Ghana. However, limited information exists on the locations, abundance, scope and scale of galamsey types, which hinders the development of effective policy response. This study attempts to map and analyze the distribution patterns, abundance, activity statuses and the extents of nine (9) galamsey types within eleven (11) Municipal and District Assemblies (MDAs) of Ghana's Western Region. It explores the utility of field-based survey, using the Open Data Kit (ODK) system, ArcGIS and Google Earth Imagery to map and visualize different galamsey types under a hostile working environment. A total of 911 galamsey sightings were recorded, of which 547 were found in clusters (corresponding to approximately 7106 individual operational units) and 364 in stand-alone mode. Overall, a total of 7470 individual galamsey operations were encountered in 312 different communities (towns and villages). Operationally, the Alluvial Washing Board, Mill-House and Chamfi were found to be the three most popular and practiced galamsey types. The three main galamsey hotspot districts (out of the 11) are the Tarkwa Nsuaem (294 sightings and 3648 individual galamsey sites), Amenfi East (223 sightings and 1397 individual galamsey sites) and Prestea Huni-Valley Districts (156 sightings and 1130 individual galamsey sites). In terms of their activity statuses, 199 abandoned operations (entailing 1855 individual operations), 664 active (entailing 5055 individual operations) and 48 semi-active (comprising 560 individuals within clusters) galamsey operations were sighted at the time of the study. While galamsey is generally acknowledged to be widespread in Ghana, the results suggest a scale that probably surpasses any previous estimate or expectation. The findings will adequately inform the prioritization of reclamation efforts.

  13. Computed Tomography Image Origin Identification Based on Original Sensor Pattern Noise and 3-D Image Reconstruction Algorithm Footprints.

    Science.gov (United States)

    Duan, Yuping; Bouslimi, Dalel; Yang, Guanyu; Shu, Huazhong; Coatrieux, Gouenou

    2017-07-01

    In this paper, we focus on the "blind" identification of the computed tomography (CT) scanner that has produced a CT image. To do so, we propose a set of noise features derived from the image chain acquisition and which can be used as CT-scanner footprint. Basically, we propose two approaches. The first one aims at identifying a CT scanner based on an original sensor pattern noise (OSPN) that is intrinsic to the X-ray detectors. The second one identifies an acquisition system based on the way this noise is modified by its three-dimensional (3-D) image reconstruction algorithm. As these reconstruction algorithms are manufacturer dependent and kept secret, our features are used as input to train a support vector machine (SVM) based classifier to discriminate acquisition systems. Experiments conducted on images issued from 15 different CT-scanner models of 4 distinct manufacturers demonstrate that our system identifies the origin of one CT image with a detection rate of at least 94% and that it achieves better performance than sensor pattern noise (SPN) based strategy proposed for general public camera devices.

  14. Multiple Memory Structure Bit Reversal Algorithm Based on Recursive Patterns of Bit Reversal Permutation

    Directory of Open Access Journals (Sweden)

    K. K. L. B. Adikaram

    2014-01-01

    With the increasing demand for online/inline data processing, efficient Fourier analysis becomes more and more relevant. Because the bit reversal process takes up a considerable share of the processing time of the Fast Fourier Transform (FFT) algorithm, it is vital to optimize the bit reversal algorithm (BRA). This paper introduces an efficient BRA with multiple memory structures. In 2009, Elster showed the relation between the first and the second halves of the bit reversal permutation (BRP) and stated that it may cause serious impact on cache performance of the computer, if implemented. We found exceptions, especially when the said index mapping was implemented with multiple one-dimensional memory structures instead of a multidimensional or a single one-dimensional memory structure. Also we found a new index mapping, even after the recursive splitting of BRP into equal sized slots. The four-array and the four-vector versions of BRA with new index mapping reported 34% and 16% improvement in performance in relation to similar versions of Linear BRA of Elster which uses a single one-dimensional memory structure.
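
    The recursive structure of the bit reversal permutation that this abstract builds on can be shown with a short sketch. It is a generic illustration of the relation between the two halves of the permutation, not the multiple-memory-structure algorithm proposed in the paper; the function names are chosen for this example.

```python
def reverse_bits(value, n_bits):
    """Reverse the lowest n_bits bits of value (direct definition)."""
    result = 0
    for _ in range(n_bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def bit_reversal_permutation(n_bits):
    """Build the permutation of length 2**n_bits by repeated doubling.

    At each step the first half is the previous permutation with every
    entry doubled, and the second half is the first half plus one.
    """
    perm = [0]
    for _ in range(n_bits):
        first_half = [2 * x for x in perm]
        perm = first_half + [x + 1 for x in first_half]
    return perm
```

    For `n_bits = 3` this produces `[0, 4, 2, 6, 1, 5, 3, 7]`, which matches reversing the bits of each index directly. Because the second half follows from the first by adding one, only half of the entries need to be computed from scratch, and the halves can be written to separate arrays, which is the kind of layout the abstract's multiple memory structures refer to.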

  15. Information mining in remote sensing imagery

    Science.gov (United States)

    Li, Jiang

    The volume of remotely sensed imagery continues to grow at an enormous rate due to the advances in sensor technology, and our capability for collecting and storing images has greatly outpaced our ability to analyze and retrieve information from the images. This motivates us to develop image information mining techniques, which is very much an interdisciplinary endeavor drawing upon expertise in image processing, databases, information retrieval, machine learning, and software design. This dissertation proposes and implements an extensive remote sensing image information mining (ReSIM) system prototype for mining useful information implicitly stored in remote sensing imagery. The system consists of three modules: image processing subsystem, database subsystem, and visualization and graphical user interface (GUI) subsystem. Land cover and land use (LCLU) information corresponding to spectral characteristics is identified by supervised classification based on support vector machines (SVM) with automatic model selection, while textural features that characterize spatial information are extracted using Gabor wavelet coefficients. Within LCLU categories, textural features are clustered using an optimized k-means clustering approach to acquire search efficient space. The clusters are stored in an object-oriented database (OODB) with associated images indexed in an image database (IDB). A k-nearest neighbor search is performed using a query-by-example (QBE) approach. Furthermore, an automatic parametric contour tracing algorithm and an O(n) time piecewise linear polygonal approximation (PLPA) algorithm are developed for shape information mining of interesting objects within the image. A fuzzy object-oriented database based on the fuzzy object-oriented data (FOOD) model is developed to handle the fuzziness and uncertainty. 
    Three specific applications are presented: integrated land cover and texture pattern mining, shape information mining for change detection of lakes, and

  16. The Pandora multi-algorithm approach to automated pattern recognition of cosmic-ray muon and neutrino events in the MicroBooNE detector

    Energy Technology Data Exchange (ETDEWEB)

    Acciarri, R.; Bagby, L.; Baller, B.; Carls, B.; Castillo Fernandez, R.; Cavanna, F.; Greenlee, H.; James, C.; Jostlein, H.; Ketchum, W.; Kirby, M.; Kobilarcik, T.; Lockwitz, S.; Lundberg, B.; Marchionni, A.; Moore, C.D.; Palamara, O.; Pavlovic, Z.; Raaf, J.L.; Schukraft, A.; Snider, E.L.; Spentzouris, P.; Strauss, T.; Toups, M.; Wolbers, S.; Yang, T.; Zeller, G.P. [Fermi National Accelerator Laboratory (FNAL), Batavia, IL (United States); Adams, C. [Harvard University, Cambridge, MA (United States); Yale University, New Haven, CT (United States); An, R.; Littlejohn, B.R.; Martinez Caicedo, D.A. [Illinois Institute of Technology (IIT), Chicago, IL (United States); Anthony, J.; Escudero Sanchez, L.; De Vries, J.J.; Marshall, J.; Smith, A.; Thomson, M. [University of Cambridge, Cambridge (United Kingdom); Asaadi, J. [University of Texas, Arlington, TX (United States); Auger, M.; Ereditato, A.; Goeldi, D.; Kreslo, I.; Lorca, D.; Luethi, M.; Rudolf von Rohr, C.; Sinclair, J.; Weber, M. [Universitaet Bern, Bern (Switzerland); Balasubramanian, S.; Fleming, B.T.; Gramellini, E.; Hackenburg, A.; Luo, X.; Russell, B.; Tufanli, S. [Yale University, New Haven, CT (United States); Barnes, C.; Mousseau, J.; Spitz, J. [University of Michigan, Ann Arbor, MI (United States); Barr, G.; Bass, M.; Del Tutto, M.; Laube, A.; Soleti, S.R.; De Pontseele, W.V. [University of Oxford, Oxford (United Kingdom); Bay, F. [TUBITAK Space Technologies Research Institute, Ankara (Turkey); Bishai, M.; Chen, H.; Joshi, J.; Kirby, B.; Li, Y.; Mooney, M.; Qian, X.; Viren, B.; Zhang, C. [Brookhaven National Laboratory (BNL), Upton, NY (United States); Blake, A.; Devitt, D.; Lister, A.; Nowak, J. [Lancaster University, Lancaster (United Kingdom); Bolton, T.; Horton-Smith, G.; Meddage, V.; Rafique, A. [Kansas State University (KSU), Manhattan, KS (United States); Camilleri, L.; Caratelli, D.; Crespo-Anadon, J.I.; Fadeeva, A.A.; Genty, V.; Kaleko, D.; Seligman, W.; Shaevitz, M.H. 
[Columbia University, New York, NY (United States); Church, E. [Pacific Northwest National Laboratory (PNNL), Richland, WA (United States); Cianci, D.; Karagiorgi, G. [Columbia University, New York, NY (United States); The University of Manchester (United Kingdom); Cohen, E.; Piasetzky, E. [Tel Aviv University, Tel Aviv (Israel); Collin, G.H.; Conrad, J.M.; Hen, O.; Hourlier, A.; Moon, J.; Wongjirad, T.; Yates, L. [Massachusetts Institute of Technology (MIT), Cambridge, MA (United States); Convery, M.; Eberly, B.; Rochester, L.; Tsai, Y.T.; Usher, T. [SLAC National Accelerator Laboratory, Menlo Park, CA (United States); Dytman, S.; Graf, N.; Jiang, L.; Naples, D.; Paolone, V.; Wickremasinghe, D.A. [University of Pittsburgh, Pittsburgh, PA (United States); Esquivel, J.; Hamilton, P.; Pulliam, G.; Soderberg, M. [Syracuse University, Syracuse, NY (United States); Foreman, W.; Ho, J.; Schmitz, D.W.; Zennamo, J. [University of Chicago, IL (United States); Furmanski, A.P.; Garcia-Gamez, D.; Hewes, J.; Hill, C.; Murrells, R.; Porzio, D.; Soeldner-Rembold, S.; Szelc, A.M. [The University of Manchester (United Kingdom); Garvey, G.T.; Huang, E.C.; Louis, W.C.; Mills, G.B.; De Water, R.G.V. [Los Alamos National Laboratory (LANL), Los Alamos, NM (United States); Gollapinni, S. [Kansas State University (KSU), Manhattan, KS (United States); University of Tennessee, Knoxville, TN (United States); and others

    2018-01-15

    The development and operation of liquid-argon time-projection chambers for neutrino physics has created a need for new approaches to pattern recognition in order to fully exploit the imaging capabilities offered by this technology. Whereas the human brain can excel at identifying features in the recorded events, it is a significant challenge to develop an automated, algorithmic solution. The Pandora Software Development Kit provides functionality to aid the design and implementation of pattern-recognition algorithms. It promotes the use of a multi-algorithm approach to pattern recognition, in which individual algorithms each address a specific task in a particular topology. Many tens of algorithms then carefully build up a picture of the event and, together, provide a robust automated pattern-recognition solution. This paper describes details of the chain of over one hundred Pandora algorithms and tools used to reconstruct cosmic-ray muon and neutrino events in the MicroBooNE detector. Metrics that assess the current pattern-recognition performance are presented for simulated MicroBooNE events, using a selection of final-state event topologies. (orig.)

  17. The Pandora multi-algorithm approach to automated pattern recognition of cosmic-ray muon and neutrino events in the MicroBooNE detector

    Science.gov (United States)

    Acciarri, R.; Adams, C.; An, R.; Anthony, J.; Asaadi, J.; Auger, M.; Bagby, L.; Balasubramanian, S.; Baller, B.; Barnes, C.; Barr, G.; Bass, M.; Bay, F.; Bishai, M.; Blake, A.; Bolton, T.; Camilleri, L.; Caratelli, D.; Carls, B.; Castillo Fernandez, R.; Cavanna, F.; Chen, H.; Church, E.; Cianci, D.; Cohen, E.; Collin, G. H.; Conrad, J. M.; Convery, M.; Crespo-Anadón, J. I.; Del Tutto, M.; Devitt, D.; Dytman, S.; Eberly, B.; Ereditato, A.; Escudero Sanchez, L.; Esquivel, J.; Fadeeva, A. A.; Fleming, B. T.; Foreman, W.; Furmanski, A. P.; Garcia-Gamez, D.; Garvey, G. T.; Genty, V.; Goeldi, D.; Gollapinni, S.; Graf, N.; Gramellini, E.; Greenlee, H.; Grosso, R.; Guenette, R.; Hackenburg, A.; Hamilton, P.; Hen, O.; Hewes, J.; Hill, C.; Ho, J.; Horton-Smith, G.; Hourlier, A.; Huang, E.-C.; James, C.; Jan de Vries, J.; Jen, C.-M.; Jiang, L.; Johnson, R. A.; Joshi, J.; Jostlein, H.; Kaleko, D.; Karagiorgi, G.; Ketchum, W.; Kirby, B.; Kirby, M.; Kobilarcik, T.; Kreslo, I.; Laube, A.; Li, Y.; Lister, A.; Littlejohn, B. R.; Lockwitz, S.; Lorca, D.; Louis, W. C.; Luethi, M.; Lundberg, B.; Luo, X.; Marchionni, A.; Mariani, C.; Marshall, J.; Martinez Caicedo, D. A.; Meddage, V.; Miceli, T.; Mills, G. B.; Moon, J.; Mooney, M.; Moore, C. D.; Mousseau, J.; Murrells, R.; Naples, D.; Nienaber, P.; Nowak, J.; Palamara, O.; Paolone, V.; Papavassiliou, V.; Pate, S. F.; Pavlovic, Z.; Piasetzky, E.; Porzio, D.; Pulliam, G.; Qian, X.; Raaf, J. L.; Rafique, A.; Rochester, L.; Rudolf von Rohr, C.; Russell, B.; Schmitz, D. W.; Schukraft, A.; Seligman, W.; Shaevitz, M. H.; Sinclair, J.; Smith, A.; Snider, E. L.; Soderberg, M.; Söldner-Rembold, S.; Soleti, S. R.; Spentzouris, P.; Spitz, J.; St. John, J.; Strauss, T.; Szelc, A. M.; Tagg, N.; Terao, K.; Thomson, M.; Toups, M.; Tsai, Y.-T.; Tufanli, S.; Usher, T.; Van De Pontseele, W.; Van de Water, R. G.; Viren, B.; Weber, M.; Wickremasinghe, D. A.; Wolbers, S.; Wongjirad, T.; Woodruff, K.; Yang, T.; Yates, L.; Zeller, G. 
P.; Zennamo, J.; Zhang, C.

    2018-01-01

    The development and operation of liquid-argon time-projection chambers for neutrino physics has created a need for new approaches to pattern recognition in order to fully exploit the imaging capabilities offered by this technology. Whereas the human brain can excel at identifying features in the recorded events, it is a significant challenge to develop an automated, algorithmic solution. The Pandora Software Development Kit provides functionality to aid the design and implementation of pattern-recognition algorithms. It promotes the use of a multi-algorithm approach to pattern recognition, in which individual algorithms each address a specific task in a particular topology. Many tens of algorithms then carefully build up a picture of the event and, together, provide a robust automated pattern-recognition solution. This paper describes details of the chain of over one hundred Pandora algorithms and tools used to reconstruct cosmic-ray muon and neutrino events in the MicroBooNE detector. Metrics that assess the current pattern-recognition performance are presented for simulated MicroBooNE events, using a selection of final-state event topologies.

  18. Application of the pessimistic pruning to increase the accuracy of C4.5 algorithm in diagnosing chronic kidney disease

    Science.gov (United States)

    Muslim, M. A.; Herowati, A. J.; Sugiharti, E.; Prasetiyo, B.

    2018-03-01

    Data mining is a technique for extracting valuable, previously unknown patterns that are hidden in very large data collections. Data mining has been applied in the healthcare industry. One data mining technique is classification. Decision trees are a classification method, and C4.5 is a decision tree algorithm. A classifier is designed by applying pessimistic pruning to the C4.5 algorithm for diagnosing chronic kidney disease. Pessimistic pruning is used to identify and remove branches that are not needed, in order to avoid overfitting of the decision tree generated by the C4.5 algorithm. In this paper, the results obtained using these classifiers are presented and discussed. Pessimistic pruning increases the accuracy of the C4.5 algorithm by 1.5 percentage points, from 95% to 96.5%, in diagnosing chronic kidney disease.
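    The pruning idea in this abstract can be sketched in a few lines. The record does not give the authors' formula, so the sketch below assumes a common textbook formulation: the observed error rate of a node is replaced by the upper limit of a confidence interval (normal approximation, z = 0.69 for a 25% confidence level), and a subtree is collapsed into a leaf when the leaf's pessimistic estimate is no worse than the weighted estimate of its children. The function names and the example counts are illustrative only.

```python
import math


def pessimistic_error(errors, n, z=0.69):
    """Upper confidence limit on the error rate of a node.

    errors: misclassified training instances at the node
    n:      total training instances at the node
    z:      normal deviate for the chosen confidence level
            (0.69 corresponds to 25%; an assumed default)
    """
    f = errors / n
    numerator = f + z * z / (2 * n) + z * math.sqrt(
        f / n - f * f / n + z * z / (4 * n * n)
    )
    return numerator / (1 + z * z / n)


def should_prune(node_errors, node_n, children):
    """Decide whether to replace a subtree by a single leaf.

    children: list of (errors, n) pairs, one per leaf of the subtree.
    Returns True when the leaf estimate is no worse than the
    instance-weighted estimate of the subtree.
    """
    leaf_estimate = pessimistic_error(node_errors, node_n)
    total = sum(n for _, n in children)
    subtree_estimate = sum(
        n * pessimistic_error(e, n) for e, n in children
    ) / total
    return leaf_estimate <= subtree_estimate


if __name__ == "__main__":
    # Hypothetical node: 5 errors out of 14 as a leaf, versus three
    # child leaves with 2/6, 1/2 and 2/6 errors.
    print(round(pessimistic_error(2, 6), 3))
    print(should_prune(5, 14, [(2, 6), (1, 2), (2, 6)]))
```

    Small leaves are penalised more heavily than large ones, which is why a subtree with few instances per leaf can lose to a single leaf even when its raw training error is lower.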

  19. Improving diagnostic accuracy using agent-based distributed data mining system.

    Science.gov (United States)

    Sridhar, S

    2013-09-01

    The use of data mining techniques to improve the diagnostic system accuracy is investigated in this paper. The data mining algorithms aim to discover patterns and extract useful knowledge from facts recorded in databases. Generally, the expert systems are constructed for automating diagnostic procedures. The learning component uses the data mining algorithms to extract the expert system rules from the database automatically. Learning algorithms can assist the clinicians in extracting knowledge automatically. As the number and variety of data sources is dramatically increasing, another way to acquire knowledge from databases is to apply various data mining algorithms that extract knowledge from data. As data sets are inherently distributed, the distributed system uses agents to transport the trained classifiers and uses meta learning to combine the knowledge. Commonsense reasoning is also used in association with distributed data mining to obtain better results. Combining human expert knowledge and data mining knowledge improves the performance of the diagnostic system. This work suggests a framework of combining the human knowledge and knowledge gained by better data mining algorithms on a renal and gallstone data set.

  20. Applied Swarm-based medicine: collecting decision trees for patterns of algorithms analysis.

    Science.gov (United States)

    Panje, Cédric M; Glatzer, Markus; von Rappard, Joscha; Rothermundt, Christian; Hundsberger, Thomas; Zumstein, Valentin; Plasswilm, Ludwig; Putora, Paul Martin

    2017-08-16

    The objective consensus methodology has recently been applied in consensus finding in several studies on medical decision-making among clinical experts or guidelines. The main advantages of this method are an automated analysis and comparison of treatment algorithms of the participating centers which can be performed anonymously. Based on the experience from completed consensus analyses, the main steps for the successful implementation of the objective consensus methodology were identified and discussed among the main investigators. The following steps for the successful collection and conversion of decision trees were identified and defined in detail: problem definition, population selection, draft input collection, tree conversion, criteria adaptation, problem re-evaluation, results distribution and refinement, tree finalisation, and analysis. This manuscript provides information on the main steps for successful collection of decision trees and summarizes important aspects at each point of the analysis.

  1. Imaging of chest trauma: radiological patterns of injury and diagnostic algorithms

    International Nuclear Information System (INIS)

    Lomoschitz, Fritz M.; Eisenhuber, Edith; Linnau, Ken F.; Peloschek, Philipp; Schoder, Maria; Bankier, Alexander A.

    2003-01-01

    In patients after chest trauma, imaging plays a key role in both the primary diagnostic work-up and the secondary assessment of potential treatment. Despite its well-known limitations, the anteroposterior chest radiograph remains the starting point of the imaging work-up. Adjunctive imaging with computed tomography, which is now increasingly performed on multidetector computed tomography units, adds essential information not readily available on the conventional radiograph. This allows better definition of trauma-associated thoracic injuries not only in acute traumatic aortic injury, but also in pulmonary, tracheobronchial, cardiac, diaphragmal, and thoracic skeletal injuries. This article reviews common radiographic findings in patients after chest trauma, shows typical imaging features resulting from thoracic injury, presents imaging algorithms, and reminds the reader of less common but clinically relevant entities encountered in patients after thoracic trauma.

  2. A methodology for obtaining the control rod patterns in a BWR using genetic algorithms

    International Nuclear Information System (INIS)

    Ortiz S, J.J.; Montes T, J.L.; Requena R, I.

    2003-01-01

    This work presents the GACRP system, based on the genetic algorithm technique, for obtaining control rod patterns in a BWR. The methodology was applied to a transition cycle and an equilibrium cycle of the Laguna Verde nuclear power station (CNLV). For each of the cycles studied, the methodology was run with a fixed cycle length, and the effective neutron multiplication factor at the end of the cycle obtained with the proposed control rod patterns was compared with the multiplication factor obtained by means of a Haling calculation. It was found that the length of both cycles can be extended by several days relative to the Haling calculation. (Author)

  3. Distributed genetic process mining

    NARCIS (Netherlands)

    Bratosin, C.C.; Sidorova, N.; Aalst, van der W.M.P.

    2010-01-01

    Process mining aims at discovering process models from data logs in order to offer insight into the real use of information systems. Most of the existing process mining algorithms fail to discover complex constructs or have problems dealing with noise and infrequent behavior. The genetic process

  4. PASSion: a pattern growth algorithm-based pipeline for splice junction detection in paired-end RNA-Seq data.

    Science.gov (United States)

    Zhang, Yanju; Lameijer, Eric-Wubbo; 't Hoen, Peter A C; Ning, Zemin; Slagboom, P Eline; Ye, Kai

    2012-02-15

    RNA-seq is a powerful technology for the study of transcriptome profiles that uses deep-sequencing technologies. Moreover, it may be used for cellular phenotyping and help establish the etiology of diseases characterized by abnormal splicing patterns. In RNA-Seq, the exact nature of splicing events is buried in the reads that span exon-exon boundaries. The accurate and efficient mapping of these reads to the reference genome is a major challenge. We developed PASSion, a pattern growth algorithm-based pipeline for splice site detection in paired-end RNA-Seq reads. Comparing the performance of PASSion to three existing RNA-Seq analysis pipelines, TopHat, MapSplice and HMMSplicer, revealed that PASSion is competitive with these packages. Moreover, the performance of PASSion is not affected by read length and coverage. It performs better than the other three approaches when detecting junctions in highly abundant transcripts. PASSion has the ability to detect junctions that do not have known splicing motifs, which cannot be found by the other tools. Of the two public RNA-Seq datasets, PASSion predicted ≈ 137,000 and 173,000 splicing events, of which on average 82% are known junctions annotated in the Ensembl transcript database and 18% are novel. In addition, our package can discover differential and shared splicing patterns among multiple samples. The code and utilities can be freely downloaded from https://trac.nbic.nl/passion and ftp://ftp.sanger.ac.uk/pub/zn1/passion.

  5. Meta-algorithmics patterns for robust, low cost, high quality systems

    CERN Document Server

    Simske, Steven J

    2013-01-01

    The confluence of cloud computing, parallelism and advanced machine intelligence approaches has created a world in which the optimum knowledge system will usually be architected from the combination of two or more knowledge-generating systems. There is a need, then, to provide a reusable, broadly-applicable set of design patterns to empower the intelligent system architect to take advantage of this opportunity. This book explains how to design and build intelligent systems that are optimized for changing system requirements (adaptability), optimized for changing system input (robustness), an

  6. Mining Views : database views for data mining

    NARCIS (Netherlands)

    Blockeel, H.; Calders, T.; Fromont, É.; Goethals, B.; Prado, A.

    2008-01-01

    We present a system towards the integration of data mining into relational databases. To this end, a relational database model is proposed, based on the so called virtual mining views. We show that several types of patterns and models over the data, such as itemsets, association rules and decision

  7. Mining Views : database views for data mining

    NARCIS (Netherlands)

    Blockeel, H.; Calders, T.; Fromont, É.; Goethals, B.; Prado, A.; Nijssen, S.; De Raedt, L.

    2007-01-01

    We propose a relational database model towards the integration of data mining into relational database systems, based on the so called virtual mining views. We show that several types of patterns and models over the data, such as itemsets, association rules, decision trees and clusterings, can be

  8. The Pandora multi-algorithm approach to automated pattern recognition of cosmic-ray muon and neutrino events in the MicroBooNE detector

    CERN Document Server

    Acciarri, R.; An, R.; Anthony, J.; Asaadi, J.; Auger, M.; Bagby, L.; Balasubramanian, S.; Baller, B.; Barnes, C.; Barr, G.; Bass, M.; Bay, F.; Bishai, M.; Blake, A.; Bolton, T.; Camilleri, L.; Caratelli, D.; Carls, B.; Castillo Fernandez, R.; Cavanna, F.; Chen, H.; Church, E.; Cianci, D.; Cohen, E.; Collin, G. H.; Conrad, J. M.; Convery, M.; Crespo-Anadón, J. I.; Del Tutto, M.; Devitt, D.; Dytman, S.; Eberly, B.; Ereditato, A.; Escudero Sanchez, L.; Esquivel, J.; Fadeeva, A. A.; Fleming, B. T.; Foreman, W.; Furmanski, A. P.; Garcia-Gamez, D.; Garvey, G. T.; Genty, V.; Goeldi, D.; Gollapinni, S.; Graf, N.; Gramellini, E.; Greenlee, H.; Grosso, R.; Guenette, R.; Hackenburg, A.; Hamilton, P.; Hen, O.; Hewes, J.; Hill, C.; Ho, J.; Horton-Smith, G.; Hourlier, A.; Huang, E.-C.; James, C.; Jan de Vries, J.; Jen, C.-M.; Jiang, L.; Johnson, R. A.; Joshi, J.; Jostlein, H.; Kaleko, D.; Karagiorgi, G.; Ketchum, W.; Kirby, B.; Kirby, M.; Kobilarcik, T.; Kreslo, I.; Laube, A.; Li, Y.; Lister, A.; Littlejohn, B. R.; Lockwitz, S.; Lorca, D.; Louis, W. C.; Luethi, M.; Lundberg, B.; Luo, X.; Marchionni, A.; Mariani, C.; Marshall, J.; Martinez Caicedo, D. A.; Meddage, V.; Miceli, T.; Mills, G. B.; Moon, J.; Mooney, M.; Moore, C. D.; Mousseau, J.; Murrells, R.; Naples, D.; Nienaber, P.; Nowak, J.; Palamara, O.; Paolone, V.; Papavassiliou, V.; Pate, S. F.; Pavlovic, Z.; Piasetzky, E.; Porzio, D.; Pulliam, G.; Qian, X.; Raaf, J. L.; Rafique, A.; Rochester, L.; Rudolf von Rohr, C.; Russell, B.; Schmitz, D. W.; Schukraft, A.; Seligman, W.; Shaevitz, M. H.; Sinclair, J.; Smith, A.; Snider, E. L.; Soderberg, M.; Söldner-Rembold, S.; Soleti, S. R.; Spentzouris, P.; Spitz, J.; St. John, J.; Strauss, T.; Szelc, A. M.; Tagg, N.; Terao, K.; Thomson, M.; Toups, M.; Tsai, Y.-T.; Tufanli, S.; Usher, T.; Van De Pontseele, W.; Van de Water, R. G.; Viren, B.; Weber, M.; Wickremasinghe, D. A.; Wolbers, S.; Wongjirad, T.; Woodruff, K.; Yang, T.; Yates, L.; Zeller, G. P.; Zennamo, J.; Zhang, C.

    2017-01-01

    The development and operation of Liquid-Argon Time-Projection Chambers for neutrino physics has created a need for new approaches to pattern recognition in order to fully exploit the imaging capabilities offered by this technology. Whereas the human brain can excel at identifying features in the recorded events, it is a significant challenge to develop an automated, algorithmic solution. The Pandora Software Development Kit provides functionality to aid the design and implementation of pattern-recognition algorithms. It promotes the use of a multi-algorithm approach to pattern recognition, in which individual algorithms each address a specific task in a particular topology. Many tens of algorithms then carefully build up a picture of the event and, together, provide a robust automated pattern-recognition solution. This paper describes details of the chain of over one hundred Pandora algorithms and tools used to reconstruct cosmic-ray muon and neutrino events in the MicroBooNE detector. Metrics that assess the...

  9. Comparison Spatial Pattern of Land Surface Temperature with Mono Window Algorithm and Split Window Algorithm: A Case Study in South Tangerang, Indonesia

    Science.gov (United States)

    Bunai, Tasya; Rokhmatuloh; Wibowo, Adi

    2018-05-01

    In this paper, two methods to retrieve the Land Surface Temperature (LST) from thermal infrared data supplied by bands 10 and 11 of the Thermal Infrared Sensor (TIRS) onboard Landsat 8 are compared. The first is the mono window algorithm developed by Qin et al. and the second is the split window algorithm by Rozenstein et al. The purpose of this study is to map the spatial distribution of land surface temperature and to determine which algorithm retrieves land surface temperature more accurately, as measured by the root mean square error (RMSE). Finally, we compare the spatial distributions of land surface temperature produced by both algorithms; the more accurate algorithm is the split window algorithm, with an RMSE of 7.69 °C.
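    The accuracy criterion named in this abstract, the root mean square error, is straightforward to compute. The sketch below is a generic illustration rather than the authors' processing chain; the function name and the sample temperatures are hypothetical, and the reference values stand in for whatever ground measurements a study would use.

```python
import math


def rmse(retrieved, reference):
    """Root mean square error between retrieved and reference values.

    Both arguments are equal-length sequences of temperatures
    in the same unit (for example degrees Celsius).
    """
    if len(retrieved) != len(reference) or not retrieved:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(r - t) ** 2 for r, t in zip(retrieved, reference)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical LST retrievals from two algorithms at four sites.
    reference = [30.0, 31.5, 29.0, 33.0]
    algorithm_a = [33.0, 35.0, 31.0, 36.5]
    algorithm_b = [31.0, 32.0, 30.5, 34.0]
    print(rmse(algorithm_a, reference), rmse(algorithm_b, reference))
```

    The algorithm with the smaller RMSE against the same reference set is the one judged more accurate.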

  10. Data mining methods

    CERN Document Server

    Chattamvelli, Rajan

    2015-01-01

    DATA MINING METHODS, Second Edition discusses both theoretical foundation and practical applications of data mining in a web field including banking, e-commerce, medicine, engineering and management. This book starts by introducing data and information, basic data type, data category and applications of data mining. The second chapter briefly reviews data visualization technology and importance in data mining. Fundamentals of probability and statistics are discussed in chapter 3, and novel algorithm for sample covariants are derived. The next two chapters give an in-depth and useful discussion of data warehousing and OLAP. Decision trees are clearly explained and a new tabular method for decision tree building is discussed. The chapter on association rules discusses popular algorithms and compares various algorithms in summary table form. An interesting application of genetic algorithm is introduced in the next chapter. Foundations of neural networks are built from scratch and the back propagation algorithm is derived...

  11. Analysis and Classification of Stride Patterns Associated with Children Development Using Gait Signal Dynamics Parameters and Ensemble Learning Algorithms

    Directory of Open Access Journals (Sweden)

    Meihong Wu

    2016-01-01

    Measuring stride variability and dynamics in children is useful for the quantitative study of gait maturation and neuromotor development in childhood and adolescence. In this paper, we computed the sample entropy (SampEn) and average stride interval (ASI) parameters to quantify the stride series of 50 gender-matched children participants in three age groups. We also normalized the SampEn and ASI values by leg length and body mass for each participant, respectively. Results show that the original and normalized SampEn values consistently decrease over the significance level of the Mann-Whitney U test (p<0.01) in children of 3–14 years old, which indicates the stride irregularity has been significantly ameliorated with the body growth. The original and normalized ASI values are also significantly changing when comparing between any two groups of young (aged 3–5 years), middle (aged 6–8 years), and elder (aged 10–14 years) children. Such results suggest that healthy children may better modulate their gait cadence rhythm with the development of their musculoskeletal and neurological systems. In addition, the AdaBoost.M2 and Bagging algorithms were used to effectively distinguish the children’s gait patterns. These ensemble learning algorithms both provided excellent gait classification results in terms of overall accuracy (≥90%), recall (≥0.8), and precision (≥0.8077).
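    Sample entropy, the irregularity measure used in this abstract, can be sketched as follows. This follows the standard definition (the negative logarithm of the ratio between template matches of length m+1 and of length m); the embedding dimension m = 2 and tolerance r = 0.2 times the standard deviation are common defaults assumed here, not values taken from the paper, and the function name is illustrative.

```python
import math


def sample_entropy(series, m=2, r=0.2):
    """Sample entropy of a 1-D series (lower means more regular).

    m: template length (assumed default 2)
    r: tolerance as a fraction of the series standard deviation
       (assumed default 0.2)
    """
    n = len(series)
    mean = sum(series) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    tol = r * sd

    def count_matches(length):
        # Use the same number of templates (n - m) for both lengths.
        templates = [series[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                # Chebyshev distance between the two templates.
                dist = max(abs(a - b)
                           for a, b in zip(templates[i], templates[j]))
                if dist <= tol:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined: no matches at this tolerance
    return -math.log(a / b)


if __name__ == "__main__":
    # A perfectly periodic "stride interval" series is fully regular.
    print(sample_entropy([1.0, 1.1] * 25))
```

    A decreasing SampEn across age groups, as reported above, corresponds to stride series that repeat their own short patterns more often.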

  12. Application of data mining techniques for nuclear data and instrumentation

    International Nuclear Information System (INIS)

    Toshniwal, Durga

    2013-01-01

    Data mining is defined as the discovery of previously unknown, valid, novel, potentially useful, and understandable patterns in large databases. It encompasses many different techniques and algorithms which differ in the kinds of data that can be analyzed and the form of knowledge representation used to convey the discovered knowledge. Patterns in the data can be represented in many different forms, including classification rules, association rules, clusters, etc. Data mining thus deals with the discovery of hidden trends and patterns from large quantities of data. The field of data mining is emerging as a new, fundamental research area with important applications to science, engineering, medicine, business, and education. It is an interdisciplinary research area and draws upon several roots, including database systems, machine learning, information systems, statistics and expert systems. Data mining, when performed on time series data, is known as time series data mining (TSDM). A time series is a sequence of real numbers, each number representing a value at a point of time. During the past few years, there has been an explosion of research in the area of time series data mining. This includes attempts to model time series data, to design languages to query such data, and to develop access structures to efficiently process queries on such data. Time series data arises naturally in many real-world applications. Efficient discovery of knowledge through time series data mining can be helpful in several domains such as: Stock market analysis, Weather forecasting etc. An important application area of data mining techniques is in nuclear power plant and related data. Nuclear power plant data can be represented in form of time sequences. Often it may be of prime importance to analyze such data to find trends and anomalies. The general goals of data mining include feature extraction, similarity search, clustering and classification, association rule mining and anomaly

  13. Blackout risk prevention in a smart grid based flexible optimal strategy using Grey Wolf-pattern search algorithms

    International Nuclear Information System (INIS)

    Mahdad, Belkacem; Srairi, K.

    2015-01-01

    Highlights: • A generalized optimal security power system planning strategy for blackout risk prevention is proposed. • A Grey Wolf Optimizer dynamically coordinated with a Pattern Search algorithm is proposed. • A useful optimized database is dynamically generated considering margin loading stability under severe faults. • The robustness and feasibility of the proposed strategy are validated on the standard IEEE 30-Bus system. • The proposed planning strategy will be useful for power system protection coordination and control. - Abstract: Developing a flexible and reliable power system planning strategy under critical situations is of great importance to experts and industry in order to minimize the probability of blackouts. This paper introduces the first stage of this practical strategy by applying the Grey Wolf Optimizer coordinated with a pattern search algorithm to solve the smart grid power system security management problem under critical situations. The main objective of the proposed planning strategy is to protect the practical power system against blackouts caused by faults in generating units or important transmission lines. In the first stage, the system is pushed to its stability margin limit and the critical loads to be shed are selected using a voltage stability index. In the second stage, the generator control variables and the reactive power of shunt and dynamic compensators are adjusted in coordination with the minimization of the active and reactive power at critical loads, to keep the system in a secure state and ensure service continuity. The feasibility and efficiency of the proposed strategy are demonstrated on the IEEE 30-Bus test system. The results are promising and demonstrate the practical efficiency of the proposed strategy in ensuring system security under critical situations.

  14. Efficient discovery of risk patterns in medical data.

    Science.gov (United States)

    Li, Jiuyong; Fu, Ada Wai-chee; Fahey, Paul

    2009-01-01

    This paper studies the problem of efficiently discovering risk patterns in medical data. Risk patterns are defined by a statistical metric, relative risk, which has been widely used in epidemiological research. To avoid fruitless search in the complete exploration of risk patterns, we define the optimal risk pattern set to exclude superfluous patterns, i.e. complicated patterns with lower relative risk than their corresponding simpler-form patterns. We prove that mining optimal risk pattern sets conforms to an anti-monotone property that supports an efficient mining algorithm. We propose an efficient algorithm for mining optimal risk pattern sets based on this property. We also propose a hierarchical structure to present discovered patterns for easy perusal by domain experts. The proposed approach is compared with two well-known rule discovery methods, the decision tree and association rule mining approaches, on benchmark data sets and applied to a real-world application. The proposed method discovers more and better quality risk patterns than a decision tree approach. The decision tree method is not designed for such applications and is inadequate for pattern exploration. The proposed method does not discover a large number of uninteresting superfluous patterns as an association mining approach does. The proposed method is more efficient than an association rule mining method. A real-world case study shows that the method reveals some interesting risk patterns to medical practitioners. The proposed method is an efficient approach to explore risk patterns. It quickly identifies cohorts of patients that are vulnerable to a risk outcome from a large data set. The proposed method is useful for exploratory study on large medical data to generate and refine hypotheses. The method is also useful for designing medical surveillance systems.
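    The two definitions in this abstract, relative risk and superfluous patterns, can be illustrated with a short sketch. It is not the authors' mining algorithm: it computes relative risk from cohort counts and then filters an already-mined set of patterns, dropping any pattern that has a proper sub-pattern with higher relative risk. The function names, the representation of a pattern as a frozenset of conditions, and the sample values are all hypothetical.

```python
def relative_risk(exposed_cases, exposed_total,
                  unexposed_cases, unexposed_total):
    """Risk of the outcome in the cohort matching a pattern,
    divided by the risk in the cohort that does not match it."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed


def optimal_risk_patterns(patterns):
    """Drop superfluous patterns.

    patterns: dict mapping frozenset-of-conditions -> relative risk.
    A pattern is superfluous when one of its proper subsets
    (a simpler form) has a higher relative risk.
    """
    return {
        p: rr
        for p, rr in patterns.items()
        if not any(q < p and rr < patterns[q] for q in patterns)
    }


if __name__ == "__main__":
    mined = {
        frozenset({"smoker"}): 2.0,
        frozenset({"smoker", "male"}): 1.5,      # weaker than "smoker"
        frozenset({"smoker", "age>60"}): 3.0,    # stronger, so kept
    }
    for pattern, rr in optimal_risk_patterns(mined).items():
        print(sorted(pattern), rr)
```

    Filtering after the fact is only for illustration; the point of the anti-monotone property mentioned above is that such patterns can be pruned during the search instead.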

  15. Collaborative mining and transfer learning for relational data

    Science.gov (United States)

    Levchuk, Georgiy; Eslami, Mohammed

    2015-06-01

    Many real-world problems, including human knowledge, communication, biological, and cyber network analysis, deal with data entities for which the essential information is contained in the relations among those entities. Such data must be modeled and analyzed as graphs, with attributes on both objects and relations that encode and differentiate their semantics. Traditional data mining algorithms were originally designed for analyzing discrete objects for which a set of features can be defined, and thus cannot be easily adapted to deal with graph data. This gave rise to the relational data mining field of research, of which graph pattern learning is a key sub-domain [11]. In this paper, we describe a model for learning graph patterns in a collaborative distributed manner. Distributed pattern learning is challenging due to dependencies between the nodes and relations in the graph, and variability across graph instances. We present three algorithms that trade off the benefits of parallelization and data aggregation, compare their performance to centralized graph learning, and discuss the individual benefits and weaknesses of each model. The presented algorithms are designed for linear speedup in distributed computing environments, and learn graph patterns that are both closer to ground truth and provide higher detection rates than a centralized mining algorithm.

  16. Optical pattern recognition architecture implementing the mean-square error correlation algorithm

    Science.gov (United States)

    Molley, Perry A.

    1991-01-01

    An optical architecture implementing the mean-square error correlation algorithm, $\mathrm{MSE} = \sum [I - R]^2$, for discriminating the presence of a reference image $R$ in an input image scene $I$ by computing the mean-square error between a time-varying reference image signal $s_1(t)$ and a time-varying input image signal $s_2(t)$ includes a laser diode light source which is temporally modulated by a double-sideband suppressed-carrier source modulation signal $I_1(t)$ having the form $I_1(t) = A_1 [1 + \sqrt{2}\, m_1 s_1(t) \cos(2\pi f_o t)]$, and the modulated light output from the laser diode source is diffracted by an acousto-optic deflector. The resultant intensity of the +1 diffracted order from the acousto-optic device is given by: $I_2(t) = A_2 [+ 2 m_2^2 s_2^2(t) - 2\sqrt{2}\, m_2 (t) \cos(2\pi f_o t)]$. The time integration of the two signals $I_1(t)$ and $I_2(t)$ on the CCD deflector plane produces the result $R(\tau)$ of the mean-square error having the form: $R(\tau) = A_1 A_2 \{ [T] + [2 m_2^2 \cdot \int s_2^2(t-\tau)\,dt] - [2 m_1 m_2 \cos(2\tau f_o \tau) \cdot \int s_1(t)\, s_2(t-\tau)\,dt] \}$, where: $s_1(t)$ is the signal input to the diode modulation source; $s_2(t)$ is the signal input to the AOD modulation source; $A_1$ is the light intensity; $A_2$ is the diffraction efficiency; $m_1$ and $m_2$ are constants that determine the signal-to-bias ratio; $f_o$ is the frequency offset between the oscillator at $f_c$ and the modulation at $f_c + f_o$; and $a_o$ and $a_1$ are constants chosen to bias the diode source and the acousto-optic deflector into their respective linear operating regions so that the diode source exhibits a linear intensity characteristic and the AOD exhibits a linear amplitude characteristic.
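
    The structure of the result above (a constant term, an energy term, and a subtracted cross-correlation term) mirrors the algebraic expansion of a summed squared error. A minimal digital sketch of that expansion, assuming 1-D sampled signals and NumPy, might look as follows; the function name and sample values are hypothetical, and this is not a model of the optical hardware.

```python
import numpy as np


def sliding_squared_error(scene, ref):
    """Sum of squared differences between `ref` and every window of `scene`.

    Uses the expansion sum((x - r)^2) = sum(x^2) + sum(r^2) - 2 * sum(x * r),
    so the only signal-dependent cross term is a correlation.
    """
    scene = np.asarray(scene, dtype=float)
    ref = np.asarray(ref, dtype=float)
    n = len(ref)
    ref_energy = np.sum(ref ** 2)
    # Energy of each length-n window of the scene (sliding sum of squares).
    window_energy = np.convolve(scene ** 2, np.ones(n), mode="valid")
    # Cross-correlation of the scene with the reference at each offset.
    cross = np.correlate(scene, ref, mode="valid")
    return window_energy + ref_energy - 2.0 * cross


# Hypothetical samples: the reference occurs in the scene at offset 2.
scene = [0, 1, 2, 3, 1, 0]
ref = [2, 3, 1]
errors = sliding_squared_error(scene, ref)
best_offset = int(np.argmin(errors))
print(errors, best_offset)
```

    The offset with the smallest error marks the best match of the reference within the scene.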

  17. Recolonization patterns of ants in a rehabilitated lignite mine in central Italy: Potential for the use of Mediterranean ants as indicators of restoration processes

    Energy Technology Data Exchange (ETDEWEB)

    Ottonetti, L.; Tucci, L.; Santini, G. [University of Florence, Florence (Italy)]

    2006-03-15

    Ant (Hymenoptera: Formicidae) assemblages were sampled with pitfall traps in three different habitats associated with a rehabilitated mine district and in undisturbed forests in Tuscany, Italy. The four habitats were (1) open fields (3-4 years old); (2) a middle-age mixed plantation (10 years); (3) an old-age mixed plantation (20 years); and (4) an oak woodland (40 years) not directly affected by mining activities. The aim of the study was to analyze ant recolonization patterns in order to provide insights on the use of Mediterranean ant fauna as indicators of restoration processes. Species richness and diversity were not significantly different among the four habitats. However, multivariate analyses showed that the assemblages in the different habitats were clearly differentiated, with similarity relationships reflecting a successional gradient among rehabilitated sites. The observed patterns of functional group changes along the gradient broadly accord with those of previous studies in other biogeographic regions. These were (1) a decrease of dominant Dolichoderinae and opportunists; (2) an increase in the proportion of cold-climate specialists; and (3) the appearance of the Cryptic species in the oldest plantations, with a maximum of abundance in the woodland. In conclusion, the results of our study supported the use of Mediterranean ants as a suitable tool for biomonitoring of restoration processes, and in particular, the functional group approach proved a valuable framework to better interpret local trends in terms of global ecological patterns. Further research is, however, needed in order to obtain a reliable classification of Mediterranean ant functional groups.

  18. A novel iris patterns matching algorithm of weighted polar frequency correlation

    Science.gov (United States)

    Zhao, Weijie; Jiang, Linhua

    2014-11-01

    Iris recognition is regarded as one of the most accurate techniques for biometric authentication. In this paper, we present a novel correlation method - Weighted Polar Frequency Correlation (WPFC) - to match and evaluate two iris images; it can also be used for evaluating the similarity of any two images. The WPFC method is a novel matching and evaluating method for iris image matching, which is completely different from the conventional methods. For instance, John Daugman's classical method of iris recognition uses 2D Gabor wavelets to extract the features of an iris image into a compact bit stream, and then matches two bit streams using the Hamming distance. Our new method is based on correlation in the polar coordinate system in the frequency domain with regulated weights. The new method is motivated by the observation that the part of the iris pattern that contains far more information for recognition is the fine structure at high frequency rather than the gross shapes of iris images. Therefore, we transform iris images into the frequency domain and assign different weights to the frequencies. We then calculate the correlation of the two iris images in the frequency domain. We evaluate the iris images by summing the discrete correlation values with regulated weights and comparing the value with a preset threshold to tell whether the two iris images were captured from the same person or not. Experiments are carried out on both the CASIA database and self-obtained images. The results show that our method is functional and reliable. Our method provides a new prospect for iris recognition systems.

  19. Current constrained voltage scaled reconstruction (CCVSR) algorithm for MR-EIT and its performance with different probing current patterns

    International Nuclear Information System (INIS)

    Birguel, Oezlem; Eyueboglu, B Murat; Ider, Y Ziya

    2003-01-01

    Conventional injected-current electrical impedance tomography (EIT) and magnetic resonance imaging (MRI) techniques can be combined to reconstruct high resolution true conductivity images. The magnetic flux density distribution generated by the internal current density distribution is extracted from MR phase images. This information is used to form a finely detailed conductivity image using an Ohm's law based update equation. The reconstructed conductivity image is assumed to differ from the true image by a scale factor. EIT surface potential measurements are then used to scale the reconstructed image in order to find the true conductivity values. This process is iterated until a stopping criterion is met. Several simulations are carried out for opposite and cosine current injection patterns to select the best current injection pattern for a 2D thorax model. The contrast resolution and accuracy of the proposed algorithm are also studied. In all simulation studies, realistic noise models for voltage and magnetic flux density measurements are used. It is shown that, in contrast to the conventional EIT techniques, the proposed method has the capability of reconstructing conductivity images with uniform and high spatial resolution. The spatial resolution is limited by the larger element size of the finite element mesh and twice the magnetic resonance image pixel size.

  20. Use of Data Mining to Reveal Body Mass Index (BMI): Patterns among Pennsylvania Schoolchildren, Pre-K to Grade 12

    Science.gov (United States)

    YoussefAgha, Ahmed H.; Lohrmann, David K.; Jayawardene, Wasantha P.

    2013-01-01

    Background: Health eTools for Schools was developed to assist school nurses with routine entries, including height and weight, on student health records, thus providing a readily accessible database. Data-mining techniques were applied to this database to determine if clinically significant results could be generated. Methods: Body mass index…

  1. Public health implications of changing patterns of recruitment into the South African mining industry, 1973–2012: a database analysis

    Directory of Open Access Journals (Sweden)

    Rodney Ehrlich

    2017-08-01

    Background: The triple epidemic of silicosis, tuberculosis and HIV infection among migrant miners from South Africa and neighbouring countries who have worked in the South African mining industry is currently the target of regional and international control efforts. These initiatives are hampered by a lack of information on this population. Methods: This study analysed the major South African mining recruitment database for the period 1973 to 2012 by calendar intervals and demographic and occupational characteristics. Changes in area of recruitment were mapped using a geographic information system. Results: The database contained over 10 million contracts, reducible to 1.64 million individuals. Major trends relevant to health projection were a decline in gold mining employment, the major source of silicosis; increasing recruitment of female miners; and shifts in recruitment from foreign to South African miners, from the Eastern to the Northwestern parts of South Africa, and from company employees to contractors. Conclusions: These changes portend further externalisation of the burden of mining lung disease to home communities, as miners, particularly from the gold sector, leave the industry. The implications for health, surveillance and health services of the growing number of miners hired as contractors need further research, as does the health experience of female miners. Overall, the information in this report can be used for projection of disease burden and direction of compensation, screening and treatment services for the ex-miner population throughout Southern Africa.

  2. Public health implications of changing patterns of recruitment into the South African mining industry, 1973-2012: a database analysis.

    Science.gov (United States)

    Ehrlich, Rodney; Montgomery, Alex; Akugizibwe, Paula; Gonsalves, Gregg

    2017-08-03

    The triple epidemic of silicosis, tuberculosis and HIV infection among migrant miners from South Africa and neighbouring countries who have worked in the South African mining industry is currently the target of regional and international control efforts. These initiatives are hampered by a lack of information on this population. This study analysed the major South African mining recruitment database for the period 1973 to 2012 by calendar intervals and demographic and occupational characteristics. Changes in area of recruitment were mapped using a geographic information system. The database contained over 10 million contracts, reducible to 1.64 million individuals. Major trends relevant to health projection were a decline in gold mining employment, the major source of silicosis; increasing recruitment of female miners; and shifts in recruitment from foreign to South African miners, from the Eastern to the Northwestern parts of South Africa, and from company employees to contractors. These changes portend further externalisation of the burden of mining lung disease to home communities, as miners, particularly from the gold sector, leave the industry. The implications for health, surveillance and health services of the growing number of miners hired as contractors need further research, as does the health experience of female miners. Overall, the information in this report can be used for projection of disease burden and direction of compensation, screening and treatment services for the ex-miner population throughout Southern Africa.

  3. On Identifying Useful Patterns to Analyze Products in Retail Transaction Databases

    Science.gov (United States)

    Yun, Unil

    Mining correlated patterns in large transaction databases is one of the essential tasks in data mining: a huge number of patterns are usually mined, but it is hard to find the patterns that are actually correlated. The data analysis should be carried out according to the requirements of the particular real application. In previous mining approaches, patterns with weak affinity are found even with a high minimum support. In this paper, we suggest weighted support affinity pattern mining, in which a new measure, weighted support confidence (ws-confidence), is developed to identify correlated patterns with weighted support affinity. To efficiently prune the weak affinity patterns, we prove that the ws-confidence measure satisfies the anti-monotone and cross weighted support properties, which can be applied to eliminate patterns with dissimilar weighted support levels. Based on the two properties, we develop a weighted support affinity pattern mining algorithm (WSP). The weighted support affinity patterns can be useful for answering comparative analysis queries, such as finding itemsets containing items which give similar total selling expense levels within an acceptable error range α% and detecting item lists with similar levels of total profits. In addition, our performance study shows that WSP is efficient and scalable for mining weighted support affinity patterns.

  4. Data mining in radiology

    International Nuclear Information System (INIS)

    Kharat, Amit T; Singh, Amarjit; Kulkarni, Vilas M; Shah, Digish

    2014-01-01

    Data mining facilitates the study of radiology data in various dimensions. It converts large patient image and text datasets into useful information that helps in improving patient care and provides informative reports. Data mining technology analyzes data within the Radiology Information System and Hospital Information System using specialized software which assesses relationships and agreement in available information. By using similar data analysis tools, radiologists can make informed decisions and predict the future outcome of a particular imaging finding. Data, information and knowledge are the components of data mining. Classes, Clusters, Associations, Sequential patterns, Classification, Prediction and Decision tree are the various types of data mining. Data mining has the potential to make delivery of health care affordable and ensure that the best imaging practices are followed. It is a tool for academic research. Data mining is considered to be ethically neutral; however, concerns regarding privacy and legality exist, which need to be addressed to ensure the success of data mining.

  5. Application and Exploration of Big Data Mining in Clinical Medicine.

    Science.gov (United States)

    Zhang, Yue; Guo, Shu-Li; Han, Li-Na; Li, Tie-Ling

    2016-03-20

    To review theories and technologies of big data mining and their application in clinical medicine. Literature published in English or Chinese regarding theories and technologies of big data mining and the concrete applications of data mining technology in clinical medicine was obtained from PubMed and Chinese Hospital Knowledge Database from 1975 to 2015. Original articles regarding big data mining theory/technology and big data mining's application in the medical field were selected. This review characterized the basic theories and technologies of big data mining including fuzzy theory, rough set theory, cloud theory, Dempster-Shafer theory, artificial neural network, genetic algorithm, inductive learning theory, Bayesian network, decision tree, pattern recognition, high-performance computing, and statistical analysis. The application of big data mining in clinical medicine was analyzed in the fields of disease risk assessment, clinical decision support, prediction of disease development, guidance of rational use of drugs, medical management, and evidence-based medicine. Big data mining has the potential to play an important role in clinical medicine.

  6. Application and Exploration of Big Data Mining in Clinical Medicine

    Science.gov (United States)

    Zhang, Yue; Guo, Shu-Li; Han, Li-Na; Li, Tie-Ling

    2016-01-01

    Objective: To review theories and technologies of big data mining and their application in clinical medicine. Data Sources: Literature published in English or Chinese regarding theories and technologies of big data mining and the concrete applications of data mining technology in clinical medicine was obtained from PubMed and Chinese Hospital Knowledge Database from 1975 to 2015. Study Selection: Original articles regarding big data mining theory/technology and big data mining's application in the medical field were selected. Results: This review characterized the basic theories and technologies of big data mining including fuzzy theory, rough set theory, cloud theory, Dempster–Shafer theory, artificial neural network, genetic algorithm, inductive learning theory, Bayesian network, decision tree, pattern recognition, high-performance computing, and statistical analysis. The application of big data mining in clinical medicine was analyzed in the fields of disease risk assessment, clinical decision support, prediction of disease development, guidance of rational use of drugs, medical management, and evidence-based medicine. Conclusion: Big data mining has the potential to play an important role in clinical medicine. PMID:26960378

  7. Data mining in soft computing framework: a survey.

    Science.gov (United States)

    Mitra, S; Pal, S K; Mitra, P

    2002-01-01

    The present article provides a survey of the available literature on data mining using soft computing. A categorization has been provided based on the different soft computing tools and their hybridizations used, the data mining function implemented, and the preference criterion selected by the model. The utility of the different soft computing methodologies is highlighted. Generally fuzzy sets are suitable for handling the issues related to understandability of patterns, incomplete/noisy data, mixed media information and human interaction, and can provide approximate solutions faster. Neural networks are nonparametric, robust, and exhibit good learning and generalization capabilities in data-rich environments. Genetic algorithms provide efficient search algorithms to select a model, from mixed media data, based on some preference criterion/objective function. Rough sets are suitable for handling different types of uncertainty in data. Some challenges to data mining and the application of soft computing methodologies are indicated. An extensive bibliography is also included.

  8. Pattern Discovery and Change Detection of Online Music Query Streams

    Science.gov (United States)

    Li, Hua-Fu

    In this paper, an efficient stream mining algorithm, called FTP-stream (Frequent Temporal Pattern mining of streams), is proposed to find the frequent temporal patterns over melody sequence streams. In the framework of our proposed algorithm, an effective bit-sequence representation is used to reduce the time and memory needed to slide the windows. The FTP-stream algorithm can calculate the support threshold in only a single pass based on the concept of bit-sequence representation. It takes advantage of the "left" and "and" operations of the representation. Experiments show that the proposed algorithm scans the music query stream only once, and runs significantly faster and consumes less memory than existing algorithms, such as SWFI-stream and Moment.
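
    The bit-sequence idea described in this abstract can be sketched generically: each item keeps one bit per transaction in the current window, sliding the window is a left shift, and the support of an itemset is the number of set bits in the bitwise AND of its items' sequences. The class below is a simplified illustration over plain itemsets, not the FTP-stream algorithm itself; the class name, method names and sample data are hypothetical.

```python
class BitSequenceWindow:
    """Sliding window over the last `size` transactions, one bit-sequence per item."""

    def __init__(self, size):
        self.size = size
        self.mask = (1 << size) - 1   # keeps only the newest `size` bits
        self.bits = {}                # item -> integer used as a bit-sequence

    def slide(self, transaction):
        """Append a transaction; the oldest one falls out of the window."""
        items = set(transaction)
        for item in items:
            self.bits.setdefault(item, 0)
        for item in list(self.bits):
            # Left shift drops the oldest bit (after masking) and frees bit 0.
            shifted = (self.bits[item] << 1) & self.mask
            if item in items:
                shifted |= 1
            if shifted:
                self.bits[item] = shifted
            else:
                del self.bits[item]   # item no longer occurs in the window

    def support(self, itemset):
        """Number of window transactions that contain every item in `itemset`."""
        combined = self.mask
        for item in itemset:
            combined &= self.bits.get(item, 0)
        return bin(combined).count("1")


# Hypothetical stream of four transactions with a window of three.
window = BitSequenceWindow(3)
for transaction in [["a", "b"], ["a"], ["a", "b"], ["b"]]:
    window.slide(transaction)
print(window.support(["a"]), window.support(["a", "b"]))
```

    Because each update touches only integers, the stream is read once and no past transactions need to be stored.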

  9. FDG PET/CT patterns of treatment failure of malignant pleural mesothelioma: relationship to histologic type, treatment algorithm, and survival

    Energy Technology Data Exchange (ETDEWEB)

    Gerbaudo, Victor H.; Mamede, Marcelo [Brigham and Women's Hospital, Harvard Medical School, Division of Nuclear Medicine and Molecular Imaging, Boston, MA (United States); Trotman-Dickenson, Beatrice; Hatabu, Hiroto [Brigham and Women's Hospital, Harvard Medical School, Division of Thoracic Radiology, Boston, MA (United States); Sugarbaker, David J. [Brigham and Women's Hospital, Harvard Medical School, Division of Thoracic Surgery, Boston, MA (United States)

    2011-05-15

    This study investigated the diagnostic performance and prognostic value of fluorodeoxyglucose (FDG) positron emission tomography (PET)/CT in suspected malignant pleural mesothelioma (MPM) recurrence, in the context of patterns and intensity of FDG uptake, histologic type, and treatment algorithm. Fifty patients with MPM underwent FDG PET/CT for restaging 11 ± 6 months after therapy. Tumor relapse was confirmed by histopathology, and by clinical evolution and subsequent imaging. Progression-free survival was defined as the time between treatment and the earliest clinical evidence of recurrence. Survival after FDG PET/CT was defined as the time between the scan and death or last follow-up. Overall survival was defined as the time between initial treatment and death or last follow-up date. Treatment failure was confirmed in 42 patients (30 epithelial and 12 non-epithelial MPM). Sensitivity, specificity, accuracy, negative predictive value, and positive predictive value for FDG PET/CT were 97.6, 75, 94, 86, and 95.3%, respectively. FDG PET/CT evidence of single site of recurrence was observed in the ipsilateral hemithorax in 18 patients (44%), contralaterally in 2 (5%), and in the abdomen in 1 patient (2%). Bilateral thoracic relapse was detected in three patients (7%). Simultaneous recurrence in the ipsilateral hemithorax and abdomen was observed in ten (24%) patients and in seven (17%) in all three cavities. Unsuspected distant metastases were detected in 11 patients (26%). Four patterns of uptake were observed in recurrent disease: focal, linear, mixed (focal/linear), and encasing, with a significant difference between the intensity of uptake in malignant lesions compared to benign post-therapeutic changes. Lesion uptake was lower in patients previously treated with more aggressive therapy and higher in intrathoracic lesions of patients with distant metastases. FDG PET/CT helped in the selection of 12 patients (29%) who benefited from additional previously

  10. FDG PET/CT patterns of treatment failure of malignant pleural mesothelioma: relationship to histologic type, treatment algorithm, and survival

    International Nuclear Information System (INIS)

    Gerbaudo, Victor H.; Mamede, Marcelo; Trotman-Dickenson, Beatrice; Hatabu, Hiroto; Sugarbaker, David J.

    2011-01-01

    This study investigated the diagnostic performance and prognostic value of fluorodeoxyglucose (FDG) positron emission tomography (PET)/CT in suspected malignant pleural mesothelioma (MPM) recurrence, in the context of patterns and intensity of FDG uptake, histologic type, and treatment algorithm. Fifty patients with MPM underwent FDG PET/CT for restaging 11 ± 6 months after therapy. Tumor relapse was confirmed by histopathology, and by clinical evolution and subsequent imaging. Progression-free survival was defined as the time between treatment and the earliest clinical evidence of recurrence. Survival after FDG PET/CT was defined as the time between the scan and death or last follow-up. Overall survival was defined as the time between initial treatment and death or last follow-up date. Treatment failure was confirmed in 42 patients (30 epithelial and 12 non-epithelial MPM). Sensitivity, specificity, accuracy, negative predictive value, and positive predictive value for FDG PET/CT were 97.6, 75, 94, 86, and 95.3%, respectively. FDG PET/CT evidence of single site of recurrence was observed in the ipsilateral hemithorax in 18 patients (44%), contralaterally in 2 (5%), and in the abdomen in 1 patient (2%). Bilateral thoracic relapse was detected in three patients (7%). Simultaneous recurrence in the ipsilateral hemithorax and abdomen was observed in ten (24%) patients and in seven (17%) in all three cavities. Unsuspected distant metastases were detected in 11 patients (26%). Four patterns of uptake were observed in recurrent disease: focal, linear, mixed (focal/linear), and encasing, with a significant difference between the intensity of uptake in malignant lesions compared to benign post-therapeutic changes. Lesion uptake was lower in patients previously treated with more aggressive therapy and higher in intrathoracic lesions of patients with distant metastases. FDG PET/CT helped in the selection of 12 patients (29%) who benefited from additional previously
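
    The diagnostic performance figures quoted in this abstract follow from a standard 2×2 confusion matrix. The sketch below shows the arithmetic; the four counts are an assumption, chosen only because they are consistent with 50 patients, 42 confirmed treatment failures and the reported percentages. The abstract itself does not state them.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard diagnostic test metrics, returned as percentages."""
    total = tp + fn + tn + fp
    return {
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
        "accuracy": 100.0 * (tp + tn) / total,
        "npv": 100.0 * tn / (tn + fn),
        "ppv": 100.0 * tp / (tp + fp),
    }


# Assumed counts (not given in the abstract): 41 true positives, 1 false
# negative, 6 true negatives, 2 false positives.
metrics = diagnostic_metrics(tp=41, fn=1, tn=6, fp=2)
for name, value in metrics.items():
    print(f"{name}: {value:.1f}%")
```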

  11. Data clustering algorithms and applications

    CERN Document Server

    Aggarwal, Charu C

    2013-01-01

    Research on the problem of clustering tends to be fragmented across the pattern recognition, database, data mining, and machine learning communities. Addressing this problem in a unified way, Data Clustering: Algorithms and Applications provides complete coverage of the entire area of clustering, from basic methods to more refined and complex data clustering approaches. It pays special attention to recent issues in graphs, social networks, and other domains. The book focuses on three primary aspects of data clustering: Methods, describing key techniques commonly used for clustering, such as fea

  12. 'ISL pattern reserve requirements for today's spot price,' or 'how many in-place pounds are needed for a mining pattern to be profitable in today's market'

    International Nuclear Information System (INIS)

    Anthony, H.L.

    2000-01-01

    Recent uranium spot market values place additional burdens on the geologist and project manager to identify mineralized ore that will yield a profitable return on investment to the mining venture and its investors. The author reviews the various cost components that comprise the total work effort required to produce uranium via ISL methods to arrive at a suitable ore grade that will guarantee profitability. Amortization of costs based on recent expenditures for typical ISL operations is used in conjunction with wellfield development, operating and restoration costs to determine the ore value required to show a positive return on investment. (author)

  13. Investigation on the applicability of Piety's on-line PSD-pattern recognition algorithm to boiling detection by neutron-noise at a swimming-pool reactor

    International Nuclear Information System (INIS)

    Behringer, K.; Spiekerman, G.; Yadigaroglu, G.

    1984-11-01

    The neutron noise signal of an initiation-of-boiling experiment performed at the SAPHIR reactor has been analyzed by the PSD-pattern recognition algorithm of Piety (1977); the results indicate that the onset of boiling can be detected by this method. Improved confidence statements for the statistical decision discriminants are given. (Auth.)

  14. Average correlation clustering algorithm (ACCA) for grouping of co-regulated genes with similar pattern of variation in their expression values.

    Science.gov (United States)

    Bhattacharya, Anindya; De, Rajat K

    2010-08-01

    Distance based clustering algorithms can group genes that show similar expression values under multiple experimental conditions. They are unable to identify a group of genes that have a similar pattern of variation in their expression values. Previously we developed an algorithm called divisive correlation clustering algorithm (DCCA) to tackle this situation, which is based on the concept of correlation clustering. But this algorithm may also fail in certain cases. In order to overcome these situations, we propose a new clustering algorithm, called average correlation clustering algorithm (ACCA), which is able to produce a better clustering solution than that produced by some others. ACCA is able to find groups of genes having more common transcription factors and a similar pattern of variation in their expression values. Moreover, ACCA is more efficient than DCCA with respect to the time of execution. As in DCCA, we use the concept of correlation clustering introduced by Bansal et al. ACCA uses the correlation matrix in such a way that all genes in a cluster have the highest average correlation values with the genes in that cluster. We have applied ACCA and some well-known conventional methods including DCCA to two artificial and nine gene expression datasets, and compared the performance of the algorithms. The clustering results of ACCA are found to be more significantly relevant to the biological annotations than those of the other methods. Analysis of the results shows the superiority of ACCA over some others in determining a group of genes having more common transcription factors and a similar pattern of variation in their expression profiles. Availability of the software: The software has been developed using C and Visual Basic languages, and can be executed on the Microsoft Windows platforms. The software may be downloaded as a zip file from http://www.isical.ac.in/~rajat. Then it needs to be installed. Two word files (included in the zip file) need to
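
    The property stated in this abstract, that every gene ends up in the cluster with which it has the highest average correlation, can be illustrated with a small reassignment loop. This is a simplified sketch assuming NumPy and a given initial labelling; it is not the published ACCA procedure, and the function name and toy expression values are hypothetical.

```python
import numpy as np


def reassign_by_average_correlation(data, labels, max_iter=100):
    """Move each row (gene) to the cluster with the highest average correlation.

    `data` has one row per gene and one column per condition. The average for
    a gene's own cluster includes its self-correlation of 1.
    """
    corr = np.corrcoef(data)              # gene-by-gene Pearson correlation
    labels = np.asarray(labels).copy()
    for _ in range(max_iter):
        clusters = np.unique(labels)
        # avg[i, j] = mean correlation of gene i with members of clusters[j]
        avg = np.column_stack(
            [corr[:, labels == c].mean(axis=1) for c in clusters]
        )
        new_labels = clusters[np.argmax(avg, axis=1)]
        if np.array_equal(new_labels, labels):
            break                         # assignment is stable
        labels = new_labels
    return labels


# Toy expression profiles: two rising genes and two falling genes, with the
# third gene deliberately placed in the wrong starting cluster.
data = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],
    [4.0, 3.0, 2.0, 1.0],
    [8.0, 6.0, 4.5, 2.0],
])
print(reassign_by_average_correlation(data, [0, 0, 0, 1]))
```

    Because correlation compares the shape of the profiles, the two rising genes group together even though their expression values differ in magnitude, which is the case distance based methods miss.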

  15. Parallelizing Gene Expression Programming Algorithm in Enabling Large-Scale Classification

    Directory of Open Access Journals (Sweden)

    Lixiong Xu

    2017-01-01

    As one of the most effective function mining algorithms, the Gene Expression Programming (GEP) algorithm has been widely used in classification, pattern recognition, prediction, and other research fields. Based on self-evolution, GEP is able to mine an optimal function for dealing with further complicated tasks. However, in big data research, GEP suffers from low efficiency due to its long mining processes. To improve the efficiency of GEP in big data research, especially for processing large-scale classification tasks, this paper presents a parallelized GEP algorithm using the MapReduce computing model. The experimental results show that the presented algorithm is scalable and efficient for processing large-scale classification tasks.

  16. Educational Data Mining Application for Estimating Students Performance in Weka Environment

    Science.gov (United States)

    Gowri, G. Shiyamala; Thulasiram, Ramasamy; Amit Baburao, Mahindra

    2017-11-01

    Educational data mining (EDM) is a multi-disciplinary research area that applies artificial intelligence, statistical modeling and data mining to the data generated by an educational institution. EDM uses computational approaches to interpret educational information with the goal of answering educational questions. To make a country stand out among the other nations of the world, the education system has to undergo a major transition by redesigning its framework. The concealed patterns and data from various information repositories can be extracted by adopting the techniques of data mining. In order to summarize the performance of students with their credentials, we examine the use of data mining in the field of academics. The Apriori algorithm is applied extensively to the student database for a wider classification based on various categories. The K-means procedure is applied to the same set of databases in order to group the records into specific categories. The Apriori algorithm mines rules in order to extract similar patterns, along with their associations, across various sets of records. The records can be extracted from academic information repositories. The parameters used in this study give more importance to psychological traits than to academic features. Undesirable student conduct can be clearly observed when information mining frameworks are used. Thus, the algorithms prove efficient at profiling students in any educational environment. The ultimate objective of the study is to determine whether a student is prone to violence or not.
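
    For readers unfamiliar with the Apriori procedure named in this abstract, a compact textbook-style sketch of its frequent-itemset step is given below. It is a generic illustration, not the Weka implementation used in the study, and the attribute values in the sample records are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset found in >= min_support records."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while candidates:
        # Count how many records contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent itemsets into candidates of size k.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical student records described by categorical attributes.
records = [
    {"low_attendance", "late_submission", "low_grade"},
    {"low_attendance", "late_submission"},
    {"low_attendance", "low_grade"},
    {"late_submission", "low_grade"},
    {"low_attendance", "late_submission", "low_grade"},
]
for itemset, count in sorted(apriori(records, 3).items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), count)
```

    The prune step relies on the anti-monotone property of support: an itemset can be frequent only if all of its subsets are frequent.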

  17. Data mining in agriculture

    CERN Document Server

    Mucherino, Antonio; Pardalos, Panos M

    2009-01-01

    Data Mining in Agriculture represents a comprehensive effort to provide graduate students and researchers with an analytical text on data mining techniques applied to agriculture and environmental related fields. This book presents both theoretical and practical insights with a focus on presenting the context of each data mining technique rather intuitively with ample concrete examples represented graphically and with algorithms written in MATLAB®. Examples and exercises with solutions are provided at the end of each chapter to facilitate the comprehension of the material. For each data mining technique described in the book variants and improvements of the basic algorithm are also given. Also by P.J. Papajorgji and P.M. Pardalos: Advances in Modeling Agricultural Systems, 'Springer Optimization and its Applications' vol. 25, ©2009.

  18. Applied data mining

    CERN Document Server

    Xu, Guandong

    2013-01-01

    Data mining has witnessed substantial advances in recent decades. New research questions and practical challenges have arisen from emerging areas and applications within the various fields closely related to human daily life, e.g. social media and social networking. This book aims to bridge the gap between traditional data mining and the latest advances in newly emerging information services. It explores the extension of well-studied algorithms and approaches into these new research arenas.

  19. Mining Data of Noisy Signal Patterns in Recognition of Gasoline Bio-Based Additives using Electronic Nose

    Directory of Open Access Journals (Sweden)

    Osowski Stanisław

    2017-03-01

    Full Text Available The paper analyses the distorted data of an electronic nose in recognizing the gasoline bio-based additives. Different tools of data mining, such as the methods of data clustering, principal component analysis, wavelet transformation, support vector machine and random forest of decision trees are applied. A special stress is put on the robustness of signal processing systems to the noise distorting the registered sensor signals. A special denoising procedure based on application of discrete wavelet transformation has been proposed. This procedure enables to reduce the error rate of recognition in a significant way. The numerical results of experiments devoted to the recognition of different blends of gasoline have shown the superiority of support vector machine in a noisy environment of measurement.

  20. Improve Data Mining and Knowledge Discovery through the use of MatLab

    Science.gov (United States)

    Shaykahian, Gholan Ali; Martin, Dawn Elliott; Beil, Robert

    2011-01-01

    Data mining is widely used to mine business, engineering, and scientific data. Data mining uses pattern based queries, searches, or other analyses of one or more electronic databases/datasets in order to discover or locate a predictive pattern or anomaly indicative of system failure, criminal or terrorist activity, etc. There are various algorithms, techniques and methods used to mine data, including neural networks, genetic algorithms, decision trees, nearest neighbor method, rule induction association analysis, slice and dice, segmentation, and clustering. These algorithms, techniques and methods used to detect patterns in a dataset have been used in the development of numerous open source and commercially available products and technology for data mining. Data mining is best realized when latent information in a large quantity of stored data is discovered. No one technique solves all data mining problems; the challenge is to select algorithms or methods appropriate to strengthen data/text mining and trending within given datasets. In recent years, throughout industry, academia and government agencies, thousands of data systems have been designed and tailored to serve specific engineering and business needs. Many of these systems use databases with relational algebra and structured query language to categorize and retrieve data. In these systems, data analyses are limited and require prior explicit knowledge of metadata and database relations; they lack exploratory data mining and discoveries of latent information. This presentation introduces MatLab(TradeMark) (MATrix LABoratory), an engineering and scientific data analysis tool to perform data mining. MatLab was originally intended to perform purely numerical calculations (a glorified calculator). Now, in addition to having hundreds of mathematical functions, it is a programming language with hundreds of built-in standard functions and numerous available toolboxes.
MatLab's ease of data processing, visualization and

  1. Output-Sensitive Pattern Extraction in Sequences

    DEFF Research Database (Denmark)

    Grossi, Roberto; Menconi, Giulia; Pisanti, Nadia

    2014-01-01

    Genomic Analysis, Plagiarism Detection, Data Mining, Intrusion Detection, Spam Fighting and Time Series Analysis are just some examples of applications where extraction of recurring patterns in sequences of objects is one of the main computational challenges. Several notions of patterns exist...... or extend them causes a loss of significant information (where the number of occurrences changes). Output-sensitive algorithms have been proposed to enumerate and list these patterns, taking polynomial time O(n^c) per pattern for constant c > 1, which is impractical for massive sequences of very large length...

  2. Trust Mines

    Science.gov (United States)

    The United States and the Navajo Nation entered into settlement agreements that provide funds to conduct investigations and any needed cleanup at 16 of the 46 priority mines, including six mines in the Northern Abandoned Uranium Mine Region.

  3. Correlation-maximizing surrogate gene space for visual mining of gene expression patterns in developing barley endosperm tissue

    Directory of Open Access Journals (Sweden)

    Usadel Björn

    2007-05-01

    Full Text Available Abstract Background Micro- and macroarray technologies help acquire thousands of gene expression patterns covering important biological processes during plant ontogeny. Particularly, faithful visualization methods are beneficial for revealing interesting gene expression patterns and functional relationships of coexpressed genes. Such screening helps to gain deeper insights into regulatory behavior and cellular responses, as will be discussed for expression data of developing barley endosperm tissue. For that purpose, high-throughput multidimensional scaling (HiT-MDS), a recent method for similarity-preserving data embedding, is substantially refined and used for (a) assessing the quality and reliability of centroid gene expression patterns, and for (b) derivation of functional relationships of coexpressed genes of endosperm tissue during barley grain development (0–26 days after flowering). Results Temporal expression profiles of 4824 genes at 14 time points are faithfully embedded into two-dimensional displays. Thereby, similar shapes of coexpressed genes get closely grouped by a correlation-based similarity measure. As a main result, by using power transformation of correlation terms, a characteristic cloud of points with bipolar sandglass shape is obtained that is inherently connected to expression patterns of pre-storage, intermediate and storage phase of endosperm development. Conclusion The new HiT-MDS-2 method helps to create global views of expression patterns and to validate centroids obtained from clustering programs. Furthermore, functional gene annotation for developing endosperm barley tissue is successfully mapped to the visualization, making easy localization of major centroids of enriched functional categories possible.

  4. Spatiotemporal patterns of High Mountain Asia's snowmelt season identified with an automated snowmelt detection algorithm, 1987-2016

    Science.gov (United States)

    Smith, Taylor; Bookhagen, Bodo; Rheinwalt, Aljoscha

    2017-10-01

    High Mountain Asia (HMA) - encompassing the Tibetan Plateau and surrounding mountain ranges - is the primary water source for much of Asia, serving more than a billion downstream users. Many catchments receive the majority of their yearly water budget in the form of snow, which is poorly monitored by sparse in situ weather networks. Both the timing and volume of snowmelt play critical roles in downstream water provision, as many applications - such as agriculture, drinking-water generation, and hydropower - rely on consistent and predictable snowmelt runoff. Here, we examine passive microwave data across HMA with five sensors (SSMI, SSMIS, AMSR-E, AMSR2, and GPM) from 1987 to 2016 to track the timing of the snowmelt season - defined here as the time between maximum passive microwave signal separation and snow clearance. We validated our method against climate model surface temperatures, optical remote-sensing snow-cover data, and a manual control dataset (n = 2100, 3 variables at 25 locations over 28 years); our algorithm is generally accurate within 3-5 days. Using the algorithm-generated snowmelt dates, we examine the spatiotemporal patterns of the snowmelt season across HMA. The climatically short (29-year) time series, along with complex interannual snowfall variations, makes determining trends in snowmelt dates at a single point difficult. We instead identify trends in snowmelt timing by using hierarchical clustering of the passive microwave data to determine trends in self-similar regions. We make the following four key observations. (1) The end of the snowmelt season is trending almost universally earlier in HMA (negative trends). Changes in the end of the snowmelt season are generally between 2 and 8 days per decade over the 29-year study period (5-25 days total). The length of the snowmelt season is thus shrinking in many, though not all, regions of HMA.
Some areas exhibit later peak signal separation (positive trends), but with generally smaller magnitudes

  5. Spatiotemporal patterns of High Mountain Asia's snowmelt season identified with an automated snowmelt detection algorithm, 1987–2016

    Directory of Open Access Journals (Sweden)

    T. Smith

    2017-10-01

    Full Text Available High Mountain Asia (HMA) – encompassing the Tibetan Plateau and surrounding mountain ranges – is the primary water source for much of Asia, serving more than a billion downstream users. Many catchments receive the majority of their yearly water budget in the form of snow, which is poorly monitored by sparse in situ weather networks. Both the timing and volume of snowmelt play critical roles in downstream water provision, as many applications – such as agriculture, drinking-water generation, and hydropower – rely on consistent and predictable snowmelt runoff. Here, we examine passive microwave data across HMA with five sensors (SSMI, SSMIS, AMSR-E, AMSR2, and GPM) from 1987 to 2016 to track the timing of the snowmelt season – defined here as the time between maximum passive microwave signal separation and snow clearance. We validated our method against climate model surface temperatures, optical remote-sensing snow-cover data, and a manual control dataset (n = 2100, 3 variables at 25 locations over 28 years); our algorithm is generally accurate within 3–5 days. Using the algorithm-generated snowmelt dates, we examine the spatiotemporal patterns of the snowmelt season across HMA. The climatically short (29-year) time series, along with complex interannual snowfall variations, makes determining trends in snowmelt dates at a single point difficult. We instead identify trends in snowmelt timing by using hierarchical clustering of the passive microwave data to determine trends in self-similar regions. We make the following four key observations. (1) The end of the snowmelt season is trending almost universally earlier in HMA (negative trends). Changes in the end of the snowmelt season are generally between 2 and 8 days per decade over the 29-year study period (5–25 days total). The length of the snowmelt season is thus shrinking in many, though not all, regions of HMA. Some areas exhibit later peak signal separation (positive

  6. Text Mining Applications and Theory

    CERN Document Server

    Berry, Michael W

    2010-01-01

    Text Mining: Applications and Theory presents the state-of-the-art algorithms for text mining from both the academic and industrial perspectives.  The contributors span several countries and scientific domains: universities, industrial corporations, and government laboratories, and demonstrate the use of techniques from machine learning, knowledge discovery, natural language processing and information retrieval to design computational models for automated text analysis and mining. This volume demonstrates how advancements in the fields of applied mathematics, computer science, machine learning

  7. Performance of fusion algorithms for computer-aided detection and classification of mines in very shallow water obtained from testing in navy Fleet Battle Exercise-Hotel 2000

    Science.gov (United States)

    Ciany, Charles M.; Zurawski, William; Kerfoot, Ian

    2001-10-01

    The performance of Computer Aided Detection/Computer Aided Classification (CAD/CAC) Fusion algorithms on side-scan sonar images was evaluated using data taken at the Navy's Fleet Battle Exercise-Hotel held in Panama City, Florida, in August 2000. A 2-of-3 binary fusion algorithm is shown to provide robust performance. The algorithm accepts the classification decisions and associated contact locations from three different CAD/CAC algorithms, clusters the contacts based on Euclidean distance, and then declares a valid target when a clustered contact is declared by at least 2 of the 3 individual algorithms. This simple binary fusion provided a 96 percent probability of correct classification at a false alarm rate of 0.14 false alarms per image per side. The performance represented a 3.8:1 reduction in false alarms over the best performing single CAD/CAC algorithm, with no loss in probability of correct classification.
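
    The 2-of-3 fusion rule described above can be illustrated with a short sketch. The abstract does not state how contacts are clustered, so the greedy centroid clustering, the `radius` threshold, the function name `fuse_two_of_three` and the sample contacts below are all assumptions made for illustration.

```python
import math


def fuse_two_of_three(contacts, radius):
    """Cluster contacts by Euclidean distance and keep clusters that are
    reported by at least two distinct algorithms.

    contacts: list of (algorithm_id, x, y) tuples.
    radius:   hypothetical clustering threshold (same units as x and y).
    Returns a list of (x, y) centroids of the accepted clusters.
    """
    clusters = []
    for contact in contacts:
        _, x, y = contact
        for cluster in clusters:
            cx = sum(c[1] for c in cluster) / len(cluster)
            cy = sum(c[2] for c in cluster) / len(cluster)
            # Assumed rule: join the first cluster whose centroid is close enough.
            if math.hypot(x - cx, y - cy) <= radius:
                cluster.append(contact)
                break
        else:
            clusters.append([contact])

    targets = []
    for cluster in clusters:
        voters = {c[0] for c in cluster}  # distinct algorithms in this cluster
        if len(voters) >= 2:              # the 2-of-3 vote
            cx = sum(c[1] for c in cluster) / len(cluster)
            cy = sum(c[2] for c in cluster) / len(cluster)
            targets.append((cx, cy))
    return targets


# Hypothetical contacts from three detectors labelled "A", "B" and "C".
contacts = [
    ("A", 10.0, 10.0), ("B", 10.5, 9.5),                       # two detectors agree
    ("C", 40.0, 40.0),                                         # single detector only
    ("A", 70.0, 20.0), ("B", 70.2, 20.1), ("C", 69.8, 19.9),   # all three agree
]
targets = fuse_two_of_three(contacts, radius=2.0)
print(targets)
```

    In this toy input the isolated contact from detector "C" is rejected, which is the mechanism by which such a vote can lower the false alarm rate.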

  8. SOMA: A Proposed Framework for Trend Mining in Large UK Diabetic Retinopathy Temporal Databases

    Science.gov (United States)

    Somaraki, Vassiliki; Harding, Simon; Broadbent, Deborah; Coenen, Frans

    In this paper, we present SOMA, a new trend mining framework; and Aretaeus, the associated trend mining algorithm. The proposed framework is able to detect different kinds of trends within longitudinal datasets. The prototype trends are defined mathematically so that they can be mapped onto the temporal patterns. Trends are defined and generated in terms of the frequency of occurrence of pattern changes over time. To evaluate the proposed framework the process was applied to a large collection of medical records, forming part of the diabetic retinopathy screening programme at the Royal Liverpool University Hospital.

  9. An algorithm for the classification of mRNA patterns in eosinophilic esophagitis: Integration of machine learning.

    Science.gov (United States)

    Sallis, Benjamin F; Erkert, Lena; Moñino-Romero, Sherezade; Acar, Utkucan; Wu, Rina; Konnikova, Liza; Lexmond, Willem S; Hamilton, Matthew J; Dunn, W Augustine; Szepfalusi, Zsolt; Vanderhoof, Jon A; Snapper, Scott B; Turner, Jerrold R; Goldsmith, Jeffrey D; Spencer, Lisa A; Nurko, Samuel; Fiebiger, Edda

    2018-04-01

    Diagnostic evaluation of eosinophilic esophagitis (EoE) remains difficult, particularly the assessment of the patient's allergic status. This study sought to establish an automated medical algorithm to assist in the evaluation of EoE. Machine learning techniques were used to establish a diagnostic probability score for EoE, p(EoE), based on esophageal mRNA transcript patterns from biopsies of patients with EoE, gastroesophageal reflux disease and controls. Dimensionality reduction in the training set established weighted factors, which were confirmed by immunohistochemistry. Following weighted factor analysis, p(EoE) was determined by random forest classification. Accuracy was tested in an external test set, and predictive power was assessed with equivocal patients. Esophageal IgE production was quantified with epsilon germ line (IGHE) transcripts and correlated with serum IgE and the Th2-type mRNA profile to establish an IGHE score for tissue allergy. In the primary analysis, a 3-class statistical model generated a p(EoE) score based on common characteristics of the inflammatory EoE profile. A p(EoE) ≥ 25 successfully identified EoE with high accuracy (sensitivity: 90.9%, specificity: 93.2%, area under the curve: 0.985) and improved diagnosis of equivocal cases by 84.6%. The p(EoE) changed in response to therapy. A secondary analysis loop in EoE patients defined an IGHE score of ≥37.5 for a patient subpopulation with increased esophageal allergic inflammation. The development of intelligent data analysis from a machine learning perspective provides exciting opportunities to improve diagnostic precision and improve patient care in EoE. The p(EoE) and the IGHE score are steps toward the development of decision trees to define EoE subpopulations and, consequently, will facilitate individualized therapy. Copyright © 2017 American Academy of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.

  10. Analyzing Patterns of Community Interest at a Legacy Mining Waste Site to Assess and Inform Environmental Health Literacy Efforts

    Science.gov (United States)

    Ramirez-Andreotta, Monica D.; Lothrop, Nathan; Wilkinson, Sarah T.; Root, Robert A.; Artiola, Janick F.; Klimecki, Walter; Loh, Miranda

    2015-01-01

    Understanding a community’s concerns and informational needs is crucial to conducting and improving environmental health research and literacy initiatives. We hypothesized that analysis of community inquiries over time at a legacy mining site would be an effective method for assessing environmental health literacy efforts and determining whether community concerns were thoroughly addressed. Through a qualitative analysis, we determined community concerns at the time of being listed as a Superfund site. We analyzed how community concerns changed from this starting point over the subsequent years, and whether: 1) communication materials produced by the USEPA and other media were aligned with community concerns; and 2) these changes demonstrated a progression of the community’s understanding resulting from community involvement and engaged research efforts. We observed that when the Superfund site was first listed, community members were most concerned with USEPA management, remediation, site-specific issues, health effects, and environmental monitoring efforts related to air/dust and water. Over the next five years, community inquiries shifted significantly to include exposure assessment and reduction methods and issues unrelated to the site, particularly the local public water supply and home water treatment systems. Such documentation of community inquiries over time at contaminated sites is a novel method to assess environmental health literacy efforts and determine whether community concerns were thoroughly addressed. PMID:27595054

  11. Evolving temporal association rules with genetic algorithms

    OpenAIRE

    Matthews, Stephen G.; Gongora, Mario A.; Hopgood, Adrian A.

    2010-01-01

    A novel framework for mining temporal association rules by discovering itemsets with a genetic algorithm is introduced. Metaheuristics have been applied to association rule mining; we show the efficacy of extending this to another variant - temporal association rule mining. Our framework is an enhancement to existing temporal association rule mining methods as it employs a genetic algorithm to simultaneously search the rule space and temporal space. A methodology for validating the ability of...

  12. Data Mining and Machine Learning Methods for Dementia Research.

    Science.gov (United States)

    Li, Rui

    2018-01-01

    Patient data in clinical research often includes large amounts of structured information, such as neuroimaging data, neuropsychological test results, and demographic variables. Given the various sources of information, we can develop computerized methods that can be a great help to clinicians in discovering hidden patterns in the data. The computerized methods often employ data mining and machine learning algorithms, serving as computer-aided diagnosis (CAD) tools that assist clinicians in making diagnostic decisions. In this chapter, we review state-of-the-art methods used in dementia research, and then briefly introduce some recently proposed algorithms.

  13. Interesting association rule mining with consistent and inconsistent rule detection from big sales data in distributed environment

    Directory of Open Access Journals (Sweden)

    Dinesh J. Prajapati

    2017-06-01

    Full Text Available Nowadays, there is an increasing demand for mining interesting patterns from big data. Analyzing such a huge amount of data is a computationally complex task when using traditional methods. The overall purpose of this paper is twofold. First, this paper presents a novel approach to identify consistent and inconsistent association rules from sales data located in a distributed environment. Second, the paper overcomes the main memory bottleneck and computing time overhead of a single computing system by distributing computations across a multi-node cluster. The proposed method initially extracts frequent itemsets for each zone using existing distributed frequent pattern mining algorithms. The paper also compares the time efficiency of a MapReduce-based frequent pattern mining algorithm with the Count Distribution Algorithm (CDA) and Fast Distributed Mining (FDM) algorithms. The association rules generated from frequent itemsets are so numerous that analyzing them becomes complex. Thus, a MapReduce-based consistent and inconsistent rule detection (MR-CIRD) algorithm is proposed to detect the consistent and inconsistent rules from big data and provide useful and actionable knowledge to domain experts. These pruned interesting rules also give useful knowledge for a better marketing strategy. The extracted consistent and inconsistent rules are evaluated and compared based on different interestingness measures, presented together with experimental results that lead to the final conclusions.

  14. A Comparative Analysis of Classification Algorithms on Diverse Datasets

    Directory of Open Access Journals (Sweden)

    M. Alghobiri

    2018-04-01

    Full Text Available Data mining involves the computational process to find patterns from large data sets. Classification, one of the main domains of data mining, involves known structure generalizing to apply to a new dataset and predict its class. There are various classification algorithms being used to classify various data sets. They are based on different methods such as probability, decision tree, neural network, nearest neighbor, boolean and fuzzy logic, kernel-based etc. In this paper, we apply three diverse classification algorithms on ten datasets. The datasets have been selected based on their size and/or number and nature of attributes. Results have been discussed using some performance evaluation measures like precision, accuracy, F-measure, Kappa statistics, mean absolute error, relative absolute error, ROC Area etc. Comparative analysis has been carried out using the performance evaluation measures of accuracy, precision, and F-measure. We specify features and limitations of the classification algorithms for the diverse nature datasets.
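
    Three of the evaluation measures named above (accuracy, precision and F-measure) can be computed from predicted and true labels as in the following sketch. The abstract does not say which F-measure variant was used, so the balanced F1 score is assumed here. The function name `binary_metrics` and the label vectors are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when no positives are predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Assumption: F-measure means the balanced F1 (harmonic mean of P and R).
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels for eight test instances.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

    For multi-class datasets such as those the paper compares, these measures are typically computed per class and then averaged; that step is omitted here for brevity.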

  15. Multiplex protein pattern unmixing using a non-linear variable-weighted support vector machine as optimized by a particle swarm optimization algorithm.

    Science.gov (United States)

    Yang, Qin; Zou, Hong-Yan; Zhang, Yan; Tang, Li-Juan; Shen, Guo-Li; Jiang, Jian-Hui; Yu, Ru-Qin

    2016-01-15

    Most proteins localize to more than one organelle in a cell. Unmixing the localization patterns of proteins is critical for understanding protein functions and other vital cellular processes. Herein, a non-linear machine learning technique is proposed for the first time for protein pattern unmixing. Variable-weighted support vector machine (VW-SVM) is a demonstrated robust modeling technique with flexible and rational variable selection. Optimized by a global stochastic optimization technique, the particle swarm optimization (PSO) algorithm, VW-SVM becomes an adaptive parameter-free method for automated unmixing of protein subcellular patterns. Results obtained by pattern unmixing of a set of fluorescence microscope images of cells indicate that VW-SVM as optimized by PSO is able to extract useful pattern features by optimally rescaling each variable for non-linear SVM modeling, consequently leading to improved performance in multiplex protein pattern unmixing compared with conventional SVM and other existing pattern unmixing methods. Copyright © 2015 Elsevier B.V. All rights reserved.

  16. Trends in spatial patterns of heavy metal deposition on national park service lands along the Red Dog Mine haul road, Alaska, 2001-2006.

    Directory of Open Access Journals (Sweden)

    Peter N Neitlich

    Full Text Available Spatial patterns of Zn, Pb and Cd deposition in Cape Krusenstern National Monument (CAKR), Alaska, adjacent to the Red Dog Mine haul road, were characterized in 2001 and 2006 using Hylocomium moss tissue as a biomonitor. Elevated concentrations of Cd, Pb, and Zn in moss tissue decreased logarithmically away from the haul road and the marine port. The metals concentrations in the two years were compared using Bayesian posterior predictions on a new sampling grid to which both data sets were fit. Posterior predictions were simulated 200 times both on a coarse grid of 2,357 points and by distance-based strata including subsets of these points. Compared to 2001, Zn and Pb concentrations in 2006 were 31 to 54% lower in the 3 sampling strata closest to the haul road (0-100, 100-2000 and 2000-4000 m). Pb decreased by 40% in the stratum 4,000-5,000 m from the haul road. Cd decreased significantly by 38% immediately adjacent to the road (0-100 m), had an 89% probability of a small decrease 100-2000 m from the road, and showed moderate probabilities (56-71%) for increase at greater distances. There was no significant change over time (with probabilities all ≤ 85%) for any of the 3 elements in more distant reference areas (40-60 km). As in 2001, elemental concentrations in 2006 were higher on the north side of the road. Reductions in deposition have followed a large investment in infrastructure to control fugitive dust escapement at the mine and port sites, operational controls, and road dust mitigation. Fugitive dust escapement, while much reduced, is still resulting in elevated concentrations of Zn, Pb and Cd out to 5,000 m from the haul road. Zn and Pb levels were slightly above arctic baseline values in southern CAKR reference areas.

  17. Prediction of black box warning by mining patterns of Convergent Focus Shift in clinical trial study populations using linked public data.

    Science.gov (United States)

    Ma, Handong; Weng, Chunhua

    2016-04-01

    -term BBW acquisition events without compromising prediction accuracy. This study contributes a method for post-marketing pharmacovigilance using Convergent Focus Shift (CFS) patterns in clinical trial study populations mined from linked public data resources. These signals are otherwise unavailable from individual data resources. We demonstrated the added value of linked public data and the feasibility of integrating ClinicalTrials.gov summaries and drug safety labels for post-marketing surveillance. Future research is needed to ensure better accessibility and linkage of heterogeneous drug safety data for efficient pharmacovigilance. Copyright © 2016 Elsevier Inc. All rights reserved.

  18. Data mining, mining data : energy consumption modelling

    Energy Technology Data Exchange (ETDEWEB)

    Dessureault, S. [Arizona Univ., Tucson, AZ (United States)

    2007-09-15

    Most modern mining operations are accumulating large amounts of data on production and business processes. Data, however, provides value only if it can be translated into information that appropriate users can utilize. This paper emphasized that a new technological focus should emerge, notably how to concentrate data into information; analyze information sufficiently to become knowledge; and, act on that knowledge. Researchers at the Mining Information Systems and Operations Management (MISOM) laboratory at the University of Arizona have created a method to transform data into action. The data-to-action approach was exercised in the development of an energy consumption model (ECM), in partnership with a major US-based copper mining company, 2 software companies, and the MISOM laboratory. The approach begins by integrating several key data sources using data warehousing techniques, and increasing the existing level of integration and data cleaning. An online analytical processing (OLAP) cube was also created to investigate the data and identify a subset of several million records. Data mining algorithms were applied using the information that was isolated by the OLAP cube. The data mining results showed that traditional cost drivers of energy consumption are poor predictors. A comparison was made between traditional methods of predicting energy consumption and the prediction formed using data mining. Traditionally, in the mines for which data were available, monthly averages of tons and distance are used to predict diesel fuel consumption. However, this article showed that new information technology can be used to incorporate many more variables into the budgeting process, resulting in more accurate predictions. The ECM helped mine planners improve the prediction of energy use through more data integration, measure development, and workflow analysis. 5 refs., 11 figs.

  19. WildSpan: mining structured motifs from protein sequences

    Directory of Open Access Journals (Sweden)

    Chen Chien-Yu

    2011-03-01

    Full Text Available Abstract Background Automatic extraction of motifs from biological sequences is an important research problem in study of molecular biology. For proteins, it is desired to discover sequence motifs containing a large number of wildcard symbols, as the residues associated with functional sites are usually largely separated in sequences. Discovering such patterns is time-consuming because abundant combinations exist when long gaps (a gap consists of one or more successive wildcards) are considered. Mining algorithms often employ constraints to narrow down the search space in order to increase efficiency. However, improper constraint models might degrade the sensitivity and specificity of the motifs discovered by computational methods. We previously proposed a new constraint model to handle large wildcard regions for discovering functional motifs of proteins. The patterns that satisfy the proposed constraint model are called W-patterns. A W-pattern is a structured motif that groups motif symbols into pattern blocks interleaved with large irregular gaps. Considering large gaps reflects the fact that functional residues are not always from a single region of protein sequences, and restricting motif symbols into clusters corresponds to the observation that short motifs are frequently present within protein families. To efficiently discover W-patterns for large-scale sequence annotation and function prediction, this paper first formally introduces the problem to solve and proposes an algorithm named WildSpan (sequential pattern mining across large wildcard regions) that incorporates several pruning strategies to largely reduce the mining cost. Results WildSpan is shown to efficiently find W-patterns containing conserved residues that are far separated in sequences. We conducted experiments with two mining strategies, protein-based and family-based mining, to evaluate the usefulness of W-patterns and performance of WildSpan.
The protein-based mining mode

  20. Mining frequent binary expressions

    NARCIS (Netherlands)

    Calders, T.; Paredaens, J.; Kambayashi, Y.; Mohania, M.K.; Tjoa, A.M.

    2000-01-01

    In data mining, searching for frequent patterns is a common basic operation. It forms the basis of many interesting decision support processes. In this paper we present a new type of patterns, binary expressions. Based on the properties of a specified binary test, such as reflexivity, transitivity

  1. Study of the distribution patterns of the constituent herbs in classical Chinese medicine prescriptions treating respiratory disease by data mining methods.

    Science.gov (United States)

    Fu, Xian-Jun; Song, Xu-Xia; Wei, Lin-Bo; Wang, Zhen-Guo

    2013-08-01

    The aim was to describe the distribution patterns and compatibility laws of the constituent herbs in prescriptions, to help doctors choose appropriate herbs and prescriptions for treating respiratory disease. Classical prescriptions treating respiratory disease were selected from authoritative prescription books. Data mining methods (frequent itemsets and association rules) were used to analyze the regular patterns and compatibility laws of the constituent herbs in the selected prescriptions. A total of 562 prescriptions were selected for study. The results showed that Radix glycyrrhizae was the most frequently used herb, appearing in 47.2% of prescriptions; other frequently used herbs were Semen armeniacae amarum, Fructus schisandrae Chinese, Herba ephedrae, and Radix ginseng. Herba ephedrae was coupled with Semen armeniacae amarum with a confidence of 73.3%, and many herbs were accompanied by Radix glycyrrhizae with high confidence. Moreover, besides Radix glycyrrhizae and Semen armeniacae amarum, Fructus schisandrae Chinese, Herba ephedrae and Rhizoma pinelliae were most commonly used to treat cough, dyspnoea and associated sputum, respectively. The prescriptions treating dyspnoea often used the two-herb group Herba ephedrae & Radix glycyrrhizae, while prescriptions treating sputum often used the two-herb groups Rhizoma pinelliae & Radix glycyrrhizae and Rhizoma pinelliae & Semen armeniacae amarum, and the three-herb groups Rhizoma pinelliae & Semen armeniacae amarum & Radix glycyrrhizae and Pericarpium citri reticulatae & Rhizoma pinelliae & Radix glycyrrhizae. The prescriptions treating respiratory disease showed common compatibility laws in the use of herbs and special compatibility laws for treating different respiratory symptoms. The principal patterns and special compatibility laws reported here could help doctors choose appropriate herbs and prescriptions for treating respiratory disease.
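
    The support and confidence figures quoted in this abstract follow the usual frequent-itemset definitions. The sketch below is only an illustration of those definitions, not the authors' tooling: the function names and the four toy prescriptions are hypothetical, and the herb labels are shortened placeholders.

```python
from collections import Counter
from itertools import combinations

def itemset_support(transactions, max_size=2):
    """Fraction of transactions containing each itemset of up to max_size items."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    total = len(transactions)
    return {itemset: count / total for itemset, count in counts.items()}

def confidence(support, antecedent, consequent):
    """confidence(A -> B) = support(A and B together) / support(A)."""
    a = frozenset(antecedent)
    return support[a | frozenset(consequent)] / support[a]

# Hypothetical toy data (NOT taken from the study).
prescriptions = [
    {"ephedra", "apricot_kernel", "licorice"},
    {"ephedra", "apricot_kernel"},
    {"ephedra", "licorice"},
    {"pinellia", "licorice"},
]
support = itemset_support(prescriptions)
rule_conf = confidence(support, {"ephedra"}, {"apricot_kernel"})
```

    In this toy data the rule "ephedra -> apricot_kernel" holds in two of the three prescriptions containing ephedra, which is how a figure such as the reported 73.3% would be read.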

  2. Reduct Driven Pattern Extraction from Clusters

    Directory of Open Access Journals (Sweden)

    Shuchita Upadhyaya

    2009-03-01

    Clustering algorithms give a general description of clusters, listing the number of clusters and the member entities in those clusters. However, these algorithms do not generate cluster descriptions in the form of patterns. From a data mining perspective, learning patterns from clusters is as important as finding the clusters. In the proposed approach, reducts derived from rough set theory are employed for pattern formulation. A reduct is the set of attributes that distinguishes the entities within a homogeneous cluster; hence these attributes can be removed from the cluster description outright. The remaining attributes are then ranked by their contribution to the cluster. A pattern is formulated as the conjunction of the most contributing attributes, such that it distinctively describes the cluster with minimum error.

  3. Detection of boiling by Piety's on-line PSD-pattern recognition algorithm applied to neutron noise signals in the SAPHIR reactor

    International Nuclear Information System (INIS)

    Spiekerman, G.

    1988-09-01

    A partial blockage of the cooling channels of a fuel element in a swimming pool reactor could lead to vapour generation and to burn-out. To detect such anomalies, a pattern recognition algorithm based on power spectral density (PSD) proposed by Piety was further developed and implemented on a PDP 11/23 for on-line applications. This algorithm identifies anomalies by measuring the PSD of the process signal and comparing it with a previously formed standard baseline. Up to 8 decision discriminants help to recognize spectral changes due to anomalies. In our application, to detect boiling as quickly as possible with sufficient sensitivity, Piety's algorithm was modified using overlapped Fast-Fourier-Transform processing and the averaging of the PSDs over a large sample of preceding instantaneous PSDs. This processing allows high sensitivity in detecting weak disturbances without reducing response time. The algorithm was tested with simulation-of-boiling experiments in which nitrogen was injected into a cooling channel of a fuel element mock-up. Void fractions higher than 30 % in the channel can be detected. In the case of boiling, it is believed that this limit is lower because collapsing bubbles could give rise to stronger fluctuations. The algorithm was also tested with a boiling experiment where the reactor coolant flow was actually reduced. The results showed that the discriminant D5 of Piety's algorithm, based on neutron noise obtained from the existing neutron chambers of the reactor control system, could sensitively recognize boiling. The detection time amounts to 7-30 s depending on the strength of the disturbances. Other events that arise during a normal reactor run, such as scrams, removal of isotope elements without scramming, or control rod movements, and that could lead to false alarms, can be distinguished from boiling. 49 refs., 104 figs., 5 tabs
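
    The general idea of comparing an averaged, overlapped-segment PSD against a stored baseline can be sketched as follows. This is a minimal illustration under stated assumptions, not Piety's algorithm: the abstract does not define the eight discriminants, so `max_log_ratio` is a hypothetical stand-in, and the segment length, Hann window, 50 % overlap and synthetic signals are all choices made for this example.

```python
import cmath
import math
import random

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def averaged_psd(signal, seg_len=64, overlap=0.5):
    """Average the windowed periodograms of overlapping segments."""
    step = int(seg_len * (1 - overlap))
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (seg_len - 1))
              for i in range(seg_len)]
    half = seg_len // 2 + 1
    acc = [0.0] * half
    count = 0
    for start in range(0, len(signal) - seg_len + 1, step):
        seg = signal[start:start + seg_len]
        mean = sum(seg) / seg_len
        spec = fft([w * (v - mean) for w, v in zip(window, seg)])
        for k in range(half):
            acc[k] += abs(spec[k]) ** 2
        count += 1
    return [a / count for a in acc]

def max_log_ratio(psd, baseline, eps=1e-12):
    """Hypothetical discriminant: largest log PSD ratio, skipping the DC bin."""
    return max(math.log((p + eps) / (b + eps))
               for p, b in zip(psd[1:], baseline[1:]))

# Synthetic demonstration: white noise baseline vs. noise plus a narrow-band disturbance.
rng = random.Random(0)
n = 4096
baseline = averaged_psd([rng.gauss(0, 1) for _ in range(n)])
quiet = averaged_psd([rng.gauss(0, 1) for _ in range(n)])
disturbed = averaged_psd([rng.gauss(0, 1) + math.sin(2 * math.pi * 0.125 * i)
                          for i in range(n)])
d_quiet = max_log_ratio(quiet, baseline)
d_disturbed = max_log_ratio(disturbed, baseline)
```

    Averaging over many overlapping segments reduces the variance of each spectral bin, which is what lets a weak spectral change stand out against the baseline.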

  4. Statistical analysis of lineament patterns and their palinspastic restoration: Coeur d'Alene mining district, Idaho

    Energy Technology Data Exchange (ETDEWEB)

    Venkatakrishnan, R.; McMillan, T.B.; Waller, E.B.

    1985-01-01

    A total of 2644 lineaments obtained from multistage lineament mapping on color infrared and conventional black and white aerial photos and Landsat images were digitized, and length-weighted frequency distributions were calculated at 5 degree intervals on a 10 kilometer square grid. Statistically significant peaks (<95%) and troughs (>5%) within each cell were identified. These significant trends were cross correlated with available structural data such as mapped faults, joints and Pb-Zn-Ag bearing ore veins in each cell. The resulting lineament map was then subjected to a palinspastic restoration. Ambiguity in interpreting the polymodal population was corrected by restoring the lineaments within each of the 21 fault-bound structural domains. Restoration of the patterns was accomplished by translating the digitized block margins with their lineaments to the predetermined palinspastic base. On the basis of this reconstruction it has been possible to identify individual grid cells that had the strongest correlations with available structural data. These cells clearly showed that spatial filtering of lineament patterns into statistically significant trends brings forth the strong structural controls on mineralization exerted by preexisting ac-joints and faults in the NW-SE trending arcuate Kootenay folds that were right-laterally displaced and deformed by the Lewis and Clark shear zone. Statistically significant correlation between ore veins and lineaments was only possible on palinspastic restoration and not on the present day lineament map.
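
    A length-weighted frequency distribution at 5 degree intervals can be sketched as below for a single grid cell. This is only an illustration: the function name and the (azimuth, length) input format are assumptions, and the significance testing described in the abstract is not reproduced.

```python
def length_weighted_distribution(lineaments, bin_deg=5):
    """Share of total lineament length falling in each azimuth bin.

    lineaments: iterable of (azimuth_deg, length) pairs for one grid cell.
    Azimuths are folded into [0, 180) because a lineament has no direction sense.
    """
    nbins = 180 // bin_deg
    hist = [0.0] * nbins
    for azimuth_deg, length in lineaments:
        # min() guards against a float rounding up to exactly 180.0
        idx = min(int((azimuth_deg % 180) // bin_deg), nbins - 1)
        hist[idx] += length
    total = sum(hist)
    return [h / total for h in hist] if total else hist

# Hypothetical cell with three lineaments: two trend about 10 degrees, one about 92 degrees.
cell = [(10.0, 2.0), (190.0, 1.0), (92.0, 1.0)]
distribution = length_weighted_distribution(cell)
```

    Weighting by length means one long lineament counts for more than several short ones in the same azimuth bin.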

  5. Data Stream Mining

    Science.gov (United States)

    Gaber, Mohamed Medhat; Zaslavsky, Arkady; Krishnaswamy, Shonali

    Data mining is concerned with the process of computationally extracting hidden knowledge structures represented in models and patterns from large data repositories. It is an interdisciplinary field of study that has its roots in databases, statistics, machine learning, and data visualization. Data mining has emerged as a direct outcome of the data explosion that resulted from the success in database and data warehousing technologies over the past two decades (Fayyad, 1997; Fayyad, 1998; Kantardzic, 2003).

  6. Genetic Algorithm with Maximum-Minimum Crossover (GA-MMC) Applied in Optimization of Radiation Pattern Control of Phased-Array Radars for Rocket Tracking Systems

    Science.gov (United States)

    Silva, Leonardo W. T.; Barros, Vitor F.; Silva, Sandro G.

    2014-01-01

    In launching operations, Rocket Tracking Systems (RTS) process the trajectory data obtained by radar sensors. To improve functionality and maintenance, radars can be upgraded by replacing parabolic reflector (PR) antennas with phased arrays (PAs). These arrays enable electronic control of the radiation pattern by adjusting the signal supplied to each radiating element. However, in phased array radar (PAR) projects, the problem model is subject to various combinations of excitation signals, producing a complex optimization problem. In this case, solutions can be computed with optimization methods such as genetic algorithms (GAs). For this purpose, the Genetic Algorithm with Maximum-Minimum Crossover (GA-MMC) method was developed to control the radiation pattern of PAs. The GA-MMC uses a reconfigurable algorithm with multiple objectives, differentiated coding and a new crossover genetic operator. This operator differs from the conventional one in that it crosses the fittest individuals with the least fit individuals in order to enhance genetic diversity. GA-MMC was successful in more than 90% of the tests for each application, increased the fitness of the final population by more than 20% and reduced premature convergence. PMID:25196013
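
    The pairing idea behind a maximum-minimum crossover (fittest individuals crossed with least fit ones) can be sketched as follows. This is a hedged illustration, not the GA-MMC operator itself: the abstract does not give the recombination details, so the single-point crossover, the bit-list encoding and the `sum` fitness are assumptions made for this example.

```python
import random

def max_min_pairs(population, fitness):
    """Pair the i-th fittest individual with the i-th least fit one."""
    ranked = sorted(population, key=fitness, reverse=True)
    return [(ranked[i], ranked[-1 - i]) for i in range(len(ranked) // 2)]

def single_point_crossover(parent_a, parent_b, rng):
    """Assumed recombination step: swap the tails after a random cut point."""
    point = rng.randrange(1, len(parent_a))
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])

# Hypothetical population of bit lists; fitness is simply the number of ones.
population = [[1, 1, 1, 1], [1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
pairs = max_min_pairs(population, fitness=sum)
rng = random.Random(1)
child_a, child_b = single_point_crossover(*pairs[0], rng)
```

    Crossing strong with weak individuals keeps genetic material from the low-fitness end of the population in circulation, which is the diversity argument the abstract makes.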

  7. Web Mining and Social Networking

    CERN Document Server

    Xu, Guandong; Li, Lin

    2011-01-01

    This book examines the techniques and applications involved in the Web Mining, Web Personalization and Recommendation and Web Community Analysis domains, including a detailed presentation of the principles, developed algorithms, and systems of the research in these areas. The applications of web mining, and the issue of how to incorporate web mining into web personalization and recommendation systems are also reviewed. Additionally, the volume explores web community mining and analysis to find the structural, organizational and temporal developments of web communities and reveal the societal s

  8. Data warehousing and data mining: A case study

    Directory of Open Access Journals (Sweden)

    Suknović Milija

    2005-01-01

    This paper presents the design and implementation of a data warehouse, as well as the use of data mining algorithms for knowledge discovery as the basic resource for an adequate business decision-making process. The project was carried out for the Student's Service Department of the Faculty of Organizational Sciences (FOS), University of Belgrade, Serbia and Montenegro. The system provides a good basis for analysis and prediction in the coming period, in support of quality business decision-making by top management. The first part of the paper shows the steps in designing and developing the data warehouse of this business system. The second part shows the implementation of data mining algorithms for deducing rules, patterns and knowledge as a resource for decision support.

  9. Fine and Coarse-Scale Patterns of Vegetation Diversity on Reclaimed Surface Mine-land Over a 40-Year Chronosequence.

    Science.gov (United States)

    Bohrer, Stefanie L; Limb, Ryan F; Daigh, Aaron L; Volk, Jay M; Wick, Abbey F

    2017-03-01

    Rangelands are described as heterogeneous, due to patterning in species assemblages and productivity that arise from species dispersal and interactions with environmental gradients and disturbances across multiple scales. The objectives of rangeland reclamation are typically vegetation establishment, plant community productivity, and soil stability. However, while fine-scale diversity is often promoted through species-rich seed mixes, landscape heterogeneity and coarse-scale diversity are largely overlooked. Our objectives were to evaluate fine and coarse-scale vegetation patterns across a 40-year reclamation chronosequence on reclaimed surface coalmine lands. We hypothesized that both α-diversity and β-diversity would increase and community patch size and species dissimilarity to reference sites would decrease on independent sites over 40 years. Plant communities were surveyed on 19 post-coalmine reclaimed sites and four intact native reference sites in central North Dakota mixed-grass prairie. Our results showed no differences in α or β-diversity and plant community patch size over the 40-year chronosequence. However, both α-diversity and β-diversity on reclaimed sites were similar to those on reference sites. Native species establishment was limited due to the presence of non-native species such as Kentucky bluegrass (Poa pratensis) on both the reclaimed and reference sites. Species composition was different between reclaimed and reference sites and community dissimilarity increased on reclaimed sites over the 40-year chronosequence. Plant communities resulting from reclamation followed non-equilibrium succession, even with consistent seed mixes established across all reclaimed years. This suggests post-reclamation management strategies influence species composition outcomes and land management strategies applied uniformly may not increase landscape-level diversity.

  10. Variability in Regularity: Mining Temporal Mobility Patterns in London, Singapore and Beijing Using Smart-Card Data.

    Directory of Open Access Journals (Sweden)

    Chen Zhong

    To discover regularities in human mobility is of fundamental importance to our understanding of urban dynamics, and essential to city and transport planning, urban management and policymaking. Previous research has revealed universal regularities at mainly aggregated spatio-temporal scales but when we zoom into finer scales, considerable heterogeneity and diversity is observed instead. The fundamental question we address in this paper is at what scales are the regularities we detect stable, explicable, and sustainable. This paper thus proposes a basic measure of variability to assess the stability of such regularities focusing mainly on changes over a range of temporal scales. We demonstrate this by comparing regularities in the urban mobility patterns in three world cities, namely London, Singapore and Beijing using one week of smart-card data. The results show that variations in regularity scale as non-linear functions of the temporal resolution, which we measure over a scale from 1 minute to 24 hours thus reflecting the diurnal cycle of human mobility. A particularly dramatic increase in variability occurs up to the temporal scale of about 15 minutes in all three cities and this implies that limits exist when we look forward or backward with respect to making short-term predictions. The degree of regularity varies in fact from city to city with Beijing and Singapore showing higher regularity in comparison to London across all temporal scales. A detailed discussion is provided, which relates the analysis to various characteristics of the three cities. In summary, this work contributes to a deeper understanding of regularities in patterns of transit use from variations in volumes of travellers entering subway stations, it establishes a generic analytical framework for comparative studies using urban mobility data, and it provides key points for the management of variability by policy-makers intent on making the travel experience more

  12. Variability in Regularity: Mining Temporal Mobility Patterns in London, Singapore and Beijing Using Smart-Card Data

    Science.gov (United States)

    Zhong, Chen; Batty, Michael; Manley, Ed; Wang, Jiaqiu; Wang, Zijia; Chen, Feng; Schmitt, Gerhard

    2016-01-01

    To discover regularities in human mobility is of fundamental importance to our understanding of urban dynamics, and essential to city and transport planning, urban management and policymaking. Previous research has revealed universal regularities at mainly aggregated spatio-temporal scales but when we zoom into finer scales, considerable heterogeneity and diversity is observed instead. The fundamental question we address in this paper is at what scales are the regularities we detect stable, explicable, and sustainable. This paper thus proposes a basic measure of variability to assess the stability of such regularities focusing mainly on changes over a range of temporal scales. We demonstrate this by comparing regularities in the urban mobility patterns in three world cities, namely London, Singapore and Beijing using one week of smart-card data. The results show that variations in regularity scale as non-linear functions of the temporal resolution, which we measure over a scale from 1 minute to 24 hours thus reflecting the diurnal cycle of human mobility. A particularly dramatic increase in variability occurs up to the temporal scale of about 15 minutes in all three cities and this implies that limits exist when we look forward or backward with respect to making short-term predictions. The degree of regularity varies in fact from city to city with Beijing and Singapore showing higher regularity in comparison to London across all temporal scales. A detailed discussion is provided, which relates the analysis to various characteristics of the three cities. In summary, this work contributes to a deeper understanding of regularities in patterns of transit use from variations in volumes of travellers entering subway stations, it establishes a generic analytical framework for comparative studies using urban mobility data, and it provides key points for the management of variability by policy-makers intent on making the travel experience more amenable. PMID:26872333

  13. SU-C-BRF-07: A Pattern Fusion Algorithm for Multi-Step Ahead Prediction of Surrogate Motion

    International Nuclear Information System (INIS)

    Zawisza, I; Yan, H; Yin, F

    2014-01-01

    Purpose: To ensure that tumor motion is within the radiation field during high-dose and high-precision radiosurgery, real-time imaging and surrogate monitoring are employed. These methods provide real-time tumor/surrogate motion, but no future information is available. In order to anticipate future tumor/surrogate motion and track target location precisely, an algorithm is developed and investigated for estimating surrogate motion multiple steps ahead. Methods: The study utilized a one-dimensional surrogate motion signal divided into three components: (a) a training component containing the primary data, from the first frame to the beginning of the input subsequence; (b) an input subsequence component of the surrogate signal, used as input to the prediction algorithm; (c) an output subsequence component, the remaining signal, used as the known output of the prediction algorithm for validation. The prediction algorithm consists of three major steps: (1) extracting subsequences from the training component that best match the input subsequence according to a given criterion; (2) calculating weighting factors from these best-matched subsequences; (3) collecting the parts that follow those subsequences and combining them with the assigned weighting factors to form the output. The prediction algorithm was examined for several patients, and its performance was assessed based on the correlation between the prediction and the known output. Results: Respiratory motion data were collected for 20 patients using the RPM system. The output subsequence was the last 50 samples (∼2 seconds) of a surrogate signal, and the input subsequence was the 100 frames (∼3 seconds) prior to the output subsequence. Based on the analysis of the correlation coefficient between predicted and known output subsequences, the average correlation is 0.9644±0.0394 and 0.9789±0.0239 for the equal-weighting and relative-weighting strategies, respectively. Conclusion: Preliminary results indicate that the prediction
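
    The three steps listed under Methods (match, weight, combine) can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the matching criterion or the weighting formula, so Euclidean distance, inverse-distance "relative" weights, the parameter `k` and the synthetic sinusoidal signal are all hypothetical choices.

```python
import math

def predict_ahead(training, query, horizon, k=3, relative=True):
    """Predict `horizon` samples following `query` from similar past subsequences."""
    m = len(query)
    # Step 1: rank every training window by distance to the input subsequence.
    candidates = []
    for start in range(len(training) - m - horizon + 1):
        window = training[start:start + m]
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(window, query)))
        candidates.append((dist, start))
    best = sorted(candidates)[:k]
    # Step 2: weighting factors (assumed inverse-distance, or equal weights).
    weights = [1.0 / (d + 1e-9) if relative else 1.0 for d, _ in best]
    total = sum(weights)
    # Step 3: combine the samples that follow each best-matched window.
    return [sum(w * training[s + m + step] for w, (_, s) in zip(weights, best)) / total
            for step in range(horizon)]

# Synthetic, perfectly periodic surrogate signal (period of 20 samples).
signal = [math.sin(2 * math.pi * i / 20) for i in range(230)]
training, query, truth = signal[:200], signal[200:220], signal[220:230]
prediction = predict_ahead(training, query, horizon=10)
max_error = max(abs(p - t) for p, t in zip(prediction, truth))
```

    On a perfectly periodic signal the best matches are exact, so the prediction reproduces the held-out samples; real respiratory traces are irregular, which is where the choice of weighting strategy starts to matter.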

  14. A New Approach for Mining Order-Preserving Submatrices Based on All Common Subsequences.

    Science.gov (United States)

    Xue, Yun; Liao, Zhengling; Li, Meihang; Luo, Jie; Kuang, Qiuhua; Hu, Xiaohui; Li, Tiechen

    2015-01-01

    Order-preserving submatrices (OPSMs) have been applied in many fields, such as DNA microarray data analysis, automatic recommendation systems, and target marketing systems, as an important unsupervised learning model. Unfortunately, most existing methods are heuristic algorithms that cannot reveal all OPSMs for this NP-complete problem. In particular, deep OPSMs, corresponding to long patterns with few supporting sequences, incur explosive computational costs and are completely pruned by most popular methods. In this paper, we propose an exact method to discover all OPSMs based on frequent sequential pattern mining. First, an existing algorithm was adapted to find all common subsequences (ACS) between every two row sequences, so that no deep OPSMs are missed. Then, an improved prefix tree data structure was used to store and traverse ACS, and the Apriori principle was employed to mine frequent sequential patterns efficiently. Finally, experiments were carried out on gene and synthetic datasets. The results demonstrated the effectiveness and efficiency of this method.
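
    To make the notion of all common subsequences between two row sequences concrete, the sketch below converts each matrix row into the column order induced by its values and then enumerates every common subsequence of two such orders. It is a plain set-based dynamic program written for illustration, with hypothetical function names; it is not the adapted algorithm or the prefix tree from the paper, and it grows exponentially on long rows.

```python
def column_order(row):
    """Column indices sorted by increasing value: the row's sequence."""
    return tuple(sorted(range(len(row)), key=lambda j: row[j]))

def all_common_subsequences(a, b):
    """Set of all non-empty common subsequences of sequences a and b."""
    # table[i][j] holds the common subsequences of a[:i] and b[:j].
    table = [[set() for _ in range(len(b) + 1)] for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            found = table[i - 1][j] | table[i][j - 1]
            if a[i - 1] == b[j - 1]:
                item = (a[i - 1],)
                found = found | {item} | {s + item for s in table[i - 1][j - 1]}
            table[i][j] = found
    return table[len(a)][len(b)]

# Two hypothetical expression rows over three columns.
row_1 = [0.1, 0.5, 0.9]   # column order (0, 1, 2)
row_2 = [0.2, 0.8, 0.4]   # column order (0, 2, 1)
acs = all_common_subsequences(column_order(row_1), column_order(row_2))
```

    Each common subsequence of length two or more is a set of columns on which both rows agree in order, that is, a candidate order-preserving pattern supported by those two rows.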

  15. Using a combination of weighting factor method and imperialist competitive algorithm to improve speed and enhance process of reloading pattern optimization of VVER-1000 reactors in transient cycles

    Energy Technology Data Exchange (ETDEWEB)

    Rahmani, Yashar, E-mail: yashar.rahmani@gmail.com [Department of Physics, Faculty of Engineering, Islamic Azad University, Sari Branch, Sari (Iran, Islamic Republic of); Shahvari, Yaser [Department of Computer Engineering, Payame Noor University (PNU), P.O. Box 19395-3697, Tehran (Iran, Islamic Republic of); Kia, Faezeh [Golestan Institute of Higher Education, Gorgan 49139-83635 (Iran, Islamic Republic of)

    2017-03-15

    Highlights: • This article was an attempt to optimize reloading pattern of Bushehr VVER-1000 reactor. • A combination of weighting factor method and the imperialist competitive algorithm was used. • The speed of optimization and desirability of the proposed pattern increased considerably. • To evaluate arrangements, a coupling of WIMSD5-B, CITATION-LDI2 and WERL codes was used. • Results reflected the considerable superiority of the proposed method over direct optimization. - Abstract: In this research, an innovative solution is described which can be used with a combination of the new imperialist competitive algorithm and the weighting factor method to improve speed and increase globality of search in reloading pattern optimization of VVER-1000 reactors in transient cycles and even obtain more desirable results than conventional direct method. In this regard, to reduce the scope of the assumed searchable arrangements, first using the weighting factor method and based on values of these coefficients in each of the 16 types of loadable fuel assemblies in the second cycle, the fuel assemblies were classified in more limited groups. In consequence, the types of fuel assemblies were reduced from 16 to 6 and consequently the number of possible arrangements was reduced considerably. Afterwards, in the first phase of optimization the imperialist competitive algorithm was used to propose an optimum reloading pattern with 6 groups. In the second phase, the algorithm was reused for finding desirable placement of the subset assemblies of each group in the optimum arrangement obtained from the previous phase, and thus the retransformation of the optimum arrangement takes place from the virtual 6-group mode to the real mode with 16 fuel types. In this research, the optimization process was conducted in two states. In the first state, it was tried to obtain an arrangement with the maximum effective multiplication factor and the smallest maximum power peaking factor. In

  16. A comprehensive data mining study shows that most nuclear receptors act as newly proposed homeostasis-associated molecular pattern receptors.

    Science.gov (United States)

    Wang, Luqiao; Nanayakkara, Gayani; Yang, Qian; Tan, Hongmei; Drummer, Charles; Sun, Yu; Shao, Ying; Fu, Hangfei; Cueto, Ramon; Shan, Huimin; Bottiglieri, Teodoro; Li, Ya-Feng; Johnson, Candice; Yang, William Y; Yang, Fan; Xu, Yanjie; Xi, Hang; Liu, Weiqing; Yu, Jun; Choi, Eric T; Cheng, Xiaoshu; Wang, Hong; Yang, Xiaofeng

    2017-10-24

    Nuclear receptors (NRs) can regulate gene expression; therefore, they are classified as transcription factors. Despite the extensive research carried out on NRs, several issues are still not fully understood, including (1) the expression profile of NRs in human tissues, (2) how NR expression is modulated during atherosclerosis and metabolic diseases, and (3) the overall role of NRs in inflammatory conditions. To determine whether and how the expression of NRs is regulated in physiological/pathological conditions, we performed an experimental database analysis to determine the expression of all 48 known NRs in 21 human and 17 murine tissues as well as in pathological conditions. We made the following significant findings: (1) NRs are differentially expressed in tissues, which may be under regulation by oxygen sensors, angiogenesis pathway, stem cell master regulators, inflammasomes, and tissue hypo-/hypermethylation indexes; (2) NR sequence mutations are associated with increased risks for development of cancers and metabolic, cardiovascular, and autoimmune diseases; (3) NRs have less tendency to be upregulated than downregulated in cancers, and autoimmune and metabolic diseases, which may be regulated by inflammation pathways and mitochondrial energy enzymes; and (4) the innate immune sensor inflammasome/caspase-1 pathway regulates the expression of most NRs. Based on our findings, we propose a new paradigm that most nuclear receptors are anti-inflammatory homeostasis-associated molecular pattern receptors (HAMPRs). Our results provide novel insight into NRs as therapeutic targets in metabolic diseases, inflammations, and malignancies.

  17. Service mining framework and application

    CERN Document Server

    Chang, Wei-Lun

    2014-01-01

    The shifting focus of service from the 1980s to 2000s has proved that IT not only lowers the cost of service but also creates avenues to enhance and increase revenue through service. The new type of service, e-service, is mobile, flexible, interactive, and interchangeable. While service science provides an avenue for future service research, the specific research areas from the IT perspective still need to be elaborated. This book introduces a novel concept, service mining, to address several research areas from technology, model, management, and application perspectives. Service mining is defined as "a systematical process including service discovery, service experience, service recovery, and service retention to discover unique patterns and exceptional values within the existing services." The goal of service mining is similar to data mining, text mining, or web mining, and aims to "detect something new" from the service pool. The major difference is the feature of service is quite distinct from the mining targe...

  18. Mining Building Metadata by Data Stream Comparison

    DEFF Research Database (Denmark)

    Holmegaard, Emil; Kjærgaard, Mikkel Baun

    2016-01-01

    ...ways to annotate sensor and actuation points. This makes it difficult to create intuitive queries for retrieving data streams from points. Another problem is the amount of insufficient or missing metadata. We introduce Metafier, a tool for extracting metadata by comparing data streams. Metafier enables semi-automatic labeling of building instrumentation with metadata. Metafier annotates points with metadata by comparing the data from a set of validated points with unvalidated points. Metafier has three different algorithms for comparing points based on their data. The three algorithms... to handle data streams with only slightly similar patterns. We have evaluated Metafier with points and data from one building located in Denmark. We have evaluated Metafier with 903 points, and the overall accuracy, with only 3 known examples, was 94.71%. Furthermore, we found that using DTW for mining...

  19. Scheduling trucks in cross docking systems with temporary storage and dock repeat truck holding pattern using genetic algorithm

    Directory of Open Access Journals (Sweden)

    Ehsan Ghobadian

    2013-02-01

    Cross docking is one of the most important issues in the management of supply chains. In cross docking, items delivered to a warehouse by inbound trucks are directly sorted and reorganized based on customer demands, then routed and loaded into outbound trucks for delivery to customers, virtually without being kept at the warehouse. If any item is kept in storage, it is normally for a short time, say less than 24 hours. In this paper, we consider a special case of cross docking with temporary storage and implement a genetic algorithm to solve the resulting problem for some realistic test problems. In our method, we first use some heuristics to obtain initial solutions and then improve the final solution using the genetic algorithm. The performance of the proposed model is compared with an alternative solution strategy, the GRASP method.

  20. Theoretical discussion and conceptual pattern analysis with data mining

    Directory of Open Access Journals (Sweden)

    Álvaro Machado Dias

    2011-08-01

    More than just a theory or a model, Theory of Mind represents a field of studies concerned with the ability to infer someone else's intentions. Aiming to contribute to the theoretical discussion and the interpretation of the literature on the matter, this study presents: 1. A conceptual map of the field, based on data mining/text mining techniques; 2. A new and advanced conceptual framework focused on informational ToM studies; 3. A critical discussion of the extent and limits of the most prominent models, based on the outputs of the data/text mining analysis and on the theoretical perspectives previously raised.

  1. AN EFFICIENT DATA MINING METHOD TO FIND FREQUENT ITEM SETS IN LARGE DATABASE USING TR-FCTM

    Directory of Open Access Journals (Sweden)

    Saravanan Suba

    2016-01-01

    Full Text Available Mining association rules in large databases is one of the most popular data mining techniques for business decision makers. Discovering frequent item sets is the core process in association rule mining. Numerous algorithms are available in the literature to find frequent patterns. Apriori and FP-tree are the most common methods for finding frequent items. Apriori finds significant frequent items using candidate generation, which requires a larger number of database scans. FP-tree uses two database scans to find significant frequent items without candidate generation. The proposed TR-FCTM (Transaction Reduction-Frequency Count Table Method) discovers significant frequent items by generating the full set of candidates once to form a frequency count table with one database scan. Experimental results show that TR-FCTM outperforms Apriori and FP-tree.
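
    The idea of building a frequency count table in a single database scan can be sketched as follows. This is a minimal illustration, not the authors' TR-FCTM: the abstract does not specify the transaction reduction step or the table layout, so the function names, the exhaustive subset enumeration, and the sample transactions are all assumptions.

```python
from collections import Counter
from itertools import combinations


def frequency_count_table(transactions):
    """Count every non-empty itemset of every transaction in one pass.

    Assumption: "full candidates" is read here as all subsets of each
    transaction; this is exponential in transaction length and only
    practical for short transactions.
    """
    table = Counter()
    for transaction in transactions:  # the single database scan
        items = sorted(set(transaction))
        for size in range(1, len(items) + 1):
            for itemset in combinations(items, size):
                table[itemset] += 1
    return table


def frequent_itemsets(transactions, min_count):
    """Keep only the itemsets that occur in at least min_count transactions."""
    table = frequency_count_table(transactions)
    return {itemset: n for itemset, n in table.items() if n >= min_count}


# Hypothetical market-basket data.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
result = frequent_itemsets(transactions, min_count=2)
for itemset, n in sorted(result.items()):
    print(itemset, n)
```

    With a minimum count of 2, the three single items and the three pairs survive, while the triple (seen once) is dropped. Apriori would reach the same answer with one scan per itemset size.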

  2. Mine drivage in hydraulic mines

    Energy Technology Data Exchange (ETDEWEB)

    Ehkber, B Ya

    1983-09-01

    From 20 to 25% of labor cost in hydraulic coal mines falls on mine drivage. Range of mine drivage is high due to the large number of shortwalls mined by hydraulic monitors. Reducing mining cost in hydraulic mines depends on lowering drivage cost by use of new drivage systems or by increasing efficiency of drivage systems used at present. The following drivage methods used in hydraulic mines are compared: heading machines with hydraulic haulage of cut rocks and coal, hydraulic monitors with hydraulic haulage, drilling and blasting with hydraulic haulage of blasted rocks. Mining and geologic conditions which influence selection of the optimum mine drivage system are analyzed. Standardized cross sections of mine roadways driven by the 3 methods are shown in schemes. Support systems used in mine roadways are compared: timber supports, roof bolts, roof bolts with steel elements, and roadways driven in rocks without a support system. Heading machines (K-56MG, GPKG, 4PU, PK-3M) and hydraulic monitors (GMDTs-3M, 12GD-2) used for mine drivage are described. Data on mine drivage in hydraulic coal mines in the Kuzbass are discussed. From 40 to 46% of roadways are driven by heading machines with hydraulic haulage and from 12 to 15% by hydraulic monitors with hydraulic haulage.

  3. Practical graph mining with R

    CERN Document Server

    Hendrix, William; Jenkins, John; Padmanabhan, Kanchana; Chakraborty, Arpan

    2014-01-01

    Practical Graph Mining with R presents a "do-it-yourself" approach to extracting interesting patterns from graph data. It covers many basic and advanced techniques for the identification of anomalous or frequently recurring patterns in a graph, the discovery of groups or clusters of nodes that share common patterns of attributes and relationships, the extraction of patterns that distinguish one category of graphs from another, and the use of those patterns to predict the category of new graphs. Hands-On Application of Graph Data Mining Each chapter in the book focuses on a graph mining task, such as link analysis, cluster analysis, and classification. Through applications using real data sets, the book demonstrates how computational techniques can help solve real-world problems. The applications covered include network intrusion detection, tumor cell diagnostics, face recognition, predictive toxicology, mining metabolic and protein-protein interaction networks, and community detection in social networks. De...

  4. Prediction of customer behaviour analysis using classification algorithms

    Science.gov (United States)

    Raju, Siva Subramanian; Dhandayudam, Prabha

    2018-04-01

    Customer relationship management plays a crucial role in analyzing customer behavior patterns and their value to an enterprise. Customer data can be analyzed efficiently using various data mining techniques, with the goal of developing business strategies and enhancing the business. In this paper, three classification models (NB, J48, and MLPNN) are studied and evaluated experimentally. The performance of the three classifiers is compared using three measures (accuracy, sensitivity, specificity), and the experimental results show that the J48 algorithm has better accuracy than the NB and MLPNN algorithms.
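
    The three measures named above follow directly from the confusion counts of a binary classifier. The sketch below uses the standard definitions; the function name and the sample labels are hypothetical and are not taken from the paper's data.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        # Share of all cases that were classified correctly.
        "accuracy": (tp + tn) / len(pairs),
        # True positive rate: share of actual positives that were found.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # True negative rate: share of actual negatives that were found.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical ground truth and predictions for eight customers.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```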

  5. Multi-mode energy management strategy for fuel cell electric vehicles based on driving pattern identification using learning vector quantization neural network algorithm

    Science.gov (United States)

    Song, Ke; Li, Feiqiang; Hu, Xiao; He, Lin; Niu, Wenxu; Lu, Sihao; Zhang, Tong

    2018-06-01

    The development of fuel cell electric vehicles can to a certain extent alleviate worldwide energy and environmental issues. While a single energy management strategy cannot meet the complex road conditions of an actual vehicle, this article proposes a multi-mode energy management strategy for electric vehicles with a fuel cell range extender based on driving condition recognition technology, which contains a patterns recognizer and a multi-mode energy management controller. This paper introduces a learning vector quantization (LVQ) neural network to design the driving patterns recognizer according to a vehicle's driving information. This multi-mode strategy can automatically switch to the genetic algorithm optimized thermostat strategy under specific driving conditions in the light of the differences in condition recognition results. Simulation experiments were carried out based on the model's validity verification using a dynamometer test bench. Simulation results show that the proposed strategy can obtain better economic performance than the single-mode thermostat strategy under dynamic driving conditions.

  6. Data mining for dummies

    CERN Document Server

    Brown, Meta S

    2014-01-01

    Delve into your data for the key to success. Data mining is quickly becoming integral to creating value and business momentum. The ability to detect unseen patterns hidden in the numbers exhaustively generated by day-to-day operations allows savvy decision-makers to exploit every tool at their disposal in the pursuit of better business. By creating models and testing whether patterns hold up, it is possible to discover new intelligence that could change your business's entire paradigm for a more successful outcome. Data Mining for Dummies shows you why it doesn't take a data scientist to gain

  7. Uncertainty modeling for data mining a label semantics approach

    CERN Document Server

    Qin, Zengchang

    2014-01-01

    Outlining a new research direction in fuzzy set theory applied to data mining, this volume proposes a number of new data mining algorithms and includes dozens of figures and illustrations that help the reader grasp the complexities of the concepts.

  8. Web Mining

    Science.gov (United States)

    Fürnkranz, Johannes

    The World-Wide Web provides every internet citizen with access to an abundance of information, but it becomes increasingly difficult to identify the relevant pieces of information. Research in web mining tries to address this problem by applying techniques from data mining and machine learning to Web data and documents. This chapter provides a brief overview of web mining techniques and research areas, most notably hypertext classification, wrapper induction, recommender systems and web usage mining.

  9. Dietary Assessment on a Mobile Phone Using Image Processing and Pattern Recognition Techniques: Algorithm Design and System Prototyping

    Directory of Open Access Journals (Sweden)

    Yasmine Probst

    2015-07-01

    Full Text Available Dietary assessment, while traditionally based on pen-and-paper, is rapidly moving towards automatic approaches. This study describes an Australian automatic food record method and its prototype for dietary assessment via the use of a mobile phone and techniques of image processing and pattern recognition. Common visual features including scale invariant feature transformation (SIFT), local binary patterns (LBP), and colour are used for describing food images. The popular bag-of-words (BoW) model is employed for recognizing the images taken by a mobile phone for dietary assessment. Technical details are provided together with discussions on the issues and future work.

  10. Text Mining.

    Science.gov (United States)

    Trybula, Walter J.

    1999-01-01

    Reviews the state of research in text mining, focusing on newer developments. The intent is to describe the disparate investigations currently included under the term text mining and provide a cohesive structure for these efforts. A summary of research identifies key organizations responsible for pushing the development of text mining. A section…

  11. Surface mining

    Science.gov (United States)

    Robert Leopold; Bruce Rowland; Reed Stalder

    1979-01-01

    The surface mining process consists of four phases: (1) exploration; (2) development; (3) production; and (4) reclamation. A variety of surface mining methods has been developed, including strip mining, auger, area strip, open pit, dredging, and hydraulic. Sound planning and design techniques are essential to implement alternatives to meet the myriad of laws,...

  12. Uranium mining

    International Nuclear Information System (INIS)

    Lange, G.

    1975-01-01

    The winning of uranium ore is the first stage of the fuel cycle. The whole complex of questions to be considered when evaluating the profitability of an ore mine is briefly outlined, and the possible mining techniques are described. Some data on uranium mining in the western world are also given. (RB)

  13. Process Mining Online Assessment Data

    Science.gov (United States)

    Pechenizkiy, Mykola; Trcka, Nikola; Vasilyeva, Ekaterina; van der Aalst, Wil; De Bra, Paul

    2009-01-01

    Traditional data mining techniques have been extensively applied to find interesting patterns, build descriptive and predictive models from large volumes of data accumulated through the use of different information systems. The results of data mining can be used for getting a better understanding of the underlying educational processes, for…

  14. Process mining online assessment data

    NARCIS (Netherlands)

    Pechenizkiy, M.; Trcka, N.; Vasilyeva, E.; Aalst, van der W.M.P.; De Bra, P.M.E.; Barnes, T.; Desmarais, M.; Romero, C.; Ventura, S.

    2009-01-01

    Traditional data mining techniques have been extensively applied to find interesting patterns, build descriptive and predictive models from large volumes of data accumulated through the use of different information systems. The results of data mining can be used for getting a better understanding of

  15. Discovering More Accurate Frequent Web Usage Patterns

    OpenAIRE

    Bayir, Murat Ali; Toroslu, Ismail Hakki; Cosar, Ahmet; Fidan, Guven

    2008-01-01

    Web usage mining is a type of web mining, which exploits data mining techniques to discover valuable information from navigation behavior of World Wide Web users. As in classical data mining, data preparation and pattern discovery are the main issues in web usage mining. The first phase of web usage mining is the data processing phase, which includes the session reconstruction operation from server logs. Session reconstruction success directly affects the quality of the frequent patterns disc...

  16. Comparing sets of patterns with the Jaccard index

    Directory of Open Access Journals (Sweden)

    Sam Fletcher

    2018-03-01

    Full Text Available The ability to extract knowledge from data has been the driving force of Data Mining since its inception, and of statistical modeling long before even that. Actionable knowledge often takes the form of patterns, where a set of antecedents can be used to infer a consequent. In this paper we offer a solution to the problem of comparing different sets of patterns. Our solution allows comparisons between sets of patterns that were derived from different techniques (such as different classification algorithms), or made from different samples of data (such as temporal data or data perturbed for privacy reasons). We propose using the Jaccard index to measure the similarity between sets of patterns by converting each pattern into a single element within the set. Our measure focuses on providing conceptual simplicity, computational simplicity, interpretability, and wide applicability. The results of this measure are compared to prediction accuracy in the context of a real-world data mining scenario.
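
    A minimal sketch of this comparison follows. The Jaccard index itself is standard (size of the intersection divided by size of the union). How the paper turns a pattern into a set element is not stated in the abstract, so the encoding below, a frozen set of antecedents paired with the consequent, is an assumption, as are the example rules.

```python
def as_element(antecedents, consequent):
    """Encode one pattern as a single hashable element.

    Assumption: two patterns are "the same" when they share exactly the
    same antecedents (in any order) and the same consequent.
    """
    return (frozenset(antecedents), consequent)


def jaccard_index(patterns_a, patterns_b):
    """Jaccard similarity |A & B| / |A | B| between two sets of patterns."""
    set_a, set_b = set(patterns_a), set(patterns_b)
    union = set_a | set_b
    if not union:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(set_a & set_b) / len(union)


# Hypothetical rule sets, e.g. from two classifiers or two data samples.
rules_a = {
    as_element(["age>40", "smoker"], "high risk"),
    as_element(["income<30k"], "low spend"),
}
rules_b = {
    as_element(["smoker", "age>40"], "high risk"),  # same rule, other order
    as_element(["region=north"], "low spend"),
}
score = jaccard_index(rules_a, rules_b)
print(round(score, 3))
```

    One rule is shared and three distinct rules exist overall, so the similarity is 1/3.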

  17. Contract Mining versus Owner Mining

    African Journals Online (AJOL)

    Owner

    mining companies can concentrate on their core businesses while using specialists for ... 2 Definition of Contract and Owner. Mining ... equipment maintenance, scheduling and budgeting ..... No. Region. Amount Spent on. Contract Mining. ($ billion). Percent of. Total. 1 ... cost and productivity data based on a large range.

  18. Fuzzy Clustering: An Approach for Mining Usage Profiles from Web

    OpenAIRE

    Ms.Archana N. Boob; Prof. D. M. Dakhane

    2012-01-01

    Web usage mining is an application of data mining technology to the data in web server log files. It can discover users' browsing patterns and certain correlations between web pages. Web usage mining supports web site design, personalization services, and other business decision making. Web mining applies data mining, artificial intelligence, chart technology, and so on to web data and traces users' visiting characteris...

  19. Mining Branching Rules from Past Survey Data with an Illustration Using a Geriatric Assessment Survey for Older Adults with Cancer

    Directory of Open Access Journals (Sweden)

    Daniel R. Jeske

    2016-05-01

    Full Text Available We construct a fast data mining algorithm that can be used to identify high-frequency response patterns in historical surveys. Identification of these patterns leads to the derivation of question branching rules that shorten the time required to complete a survey. The data mining algorithm allows the user to control the error rate that is incurred through the use of implied answers that go along with each branching rule. The context considered is binary response questions, which can be obtained from multi-level response questions through dichotomization. The algorithm is illustrated by the analysis of four sections of a geriatric assessment survey used by oncologists. Reductions in the number of questions that need to be asked in these four sections range from 33% to 54%.

  20. An optimization framework for process discovery algorithms

    NARCIS (Netherlands)

    Weijters, A.J.M.M.; Stahlbock, R.

    2011-01-01

    Today there are many process mining techniques that, based on an event log, allow for the automatic induction of a process model. The process mining algorithms that are able to deal with incomplete event logs, exceptions, and noise typically have many parameters to tune the algorithm. Therefore, the

  1. Pattern recognition, neural networks, genetic algorithms and high performance computing in nuclear reactor diagnostics. Results and perspectives

    International Nuclear Information System (INIS)

    Dzwinel, W.; Pepyolyshev, N.

    1996-01-01

    The main goal of this paper is to present our experience in developing the diagnostic system for the IBR-2 (Russia - Dubna) nuclear reactor. The authors show the principal results of the system modifications made to make it work more reliably and much faster. The former requires the adoption of new data processing techniques; the latter, the implementation of the newest computational facilities. The results of applying clustering techniques and a method for visualizing multi-dimensional information directly on the operator display are presented. The experiences with neural nets, used for prediction of the reactor operation, are discussed. Genetic algorithms were also tested for reducing the quantity of data and extracting the most informative components of the analyzed spectra. (authors)

  2. Support-Less Association Rule Mining Using Tuple Count Cube

    OpenAIRE

    Qin Ding; William Perrizo

    2007-01-01

    Association rule mining is one of the important tasks in data mining and knowledge discovery (KDD). The traditional task of association rule mining is to find all the rules with high support and high confidence. In some applications, we are interested in finding high confidence rules even though the support may be low. This type of problem differs from the traditional association rule mining problem; hence, it is called support-less association rule mining. Existing algorithms for association...
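
    The two measures that define the problem can be computed directly from a transaction list. The sketch below uses the standard definitions of support and confidence to show a rule with low support but high confidence. It does not implement the tuple count cube named in the title, and the function names and transactions are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    both = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return both / base if base else float("nan")


# Hypothetical baskets: a common pair and a rare but reliable pair.
transactions = (
    [{"bread", "milk"}] * 6
    + [{"bread"}] * 2
    + [{"caviar", "vodka"}] * 2
)

common = (support(transactions, {"bread", "milk"}),
          confidence(transactions, {"bread"}, {"milk"}))
rare = (support(transactions, {"caviar", "vodka"}),
        confidence(transactions, {"caviar"}, {"vodka"}))
print("bread -> milk:", common)
print("caviar -> vodka:", rare)
```

    A support threshold of, say, 0.5 would discard the caviar rule even though its confidence is 1.0, which is the situation support-less mining is meant to address.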

  3. Mining dynamic noteworthy functions in software execution sequences.

    Science.gov (United States)

    Zhang, Bing; Huang, Guoyan; Wang, Yuqian; He, Haitao; Ren, Jiadong

    2017-01-01

    As the quality of crucial entities can directly affect that of software, their identification and protection become an important premise for effective software development, management, maintenance and testing, which thus contribute to improving the software quality and its attack-defending ability. Most analysis and evaluation on important entities like codes-based static structure analysis are on the destruction of the actual software running. In this paper, from the perspective of the software execution process, we propose an approach to mine dynamic noteworthy functions (DNFM) in software execution sequences. First, by decompiling the software and tracking stack changes, execution traces composed of a series of function addresses were acquired. These traces were then modeled as execution sequences and simplified to obtain simplified sequences (SFS), from which patterns were extracted with a pattern extraction (PE) algorithm. After that, two evaluation indicators, inner-importance and inter-importance, were designed to measure the noteworthiness of functions in the DNFM algorithm. Finally, the functions were sorted by their noteworthiness. The experimental results were compared with those of two traditional complex network-based node mining methods, namely PageRank and DegreeRank. The results show that the DNFM method can mine noteworthy functions in software effectively and precisely.

  4. Modelling hydrological connectivity in semi-arid flat areas: effect of the flow accumulation algorithm on the spatial pattern

    Science.gov (United States)

    López-Vicente, Manuel; Álvarez, Sara

    2017-04-01

    Much of the water and sediment fluxes in semi-arid landscapes are found to be concentrated in localized pathways. Identifying the location of these pathways is important for management and restoration. This task becomes more complicated in flat areas, such as alluvial terraces, where geomorphic features of concentrated overland flow (rills and ephemeral gullies) are scarce or nonexistent. Field identification of sediment delivery pathways as well as depositional areas is also difficult and challenging. The concept of hydrological connectivity (HC) helps us to express the complexity of landscape non-linear responses to rainfall inputs. One of the unsolved issues in overland flow modelling studies is the choice of the right flow accumulation algorithm (FAA). There is an abundant literature on runoff generation under semi-arid conditions, and relating HC and land use management and changes. However, we found a scientific gap in the literature focussed on modelling of HC in flat areas under semi-arid conditions. This study aims to fill in this gap by modelling HC in alluvial terraces (28 ha) in NE Spain under semi-arid conditions (342 mm / year), mainly devoted to rain-fed cereal fields, by using eight FAA. For this purpose, we applied a modified version of the Borselli's index of runoff and sediment connectivity (IC). The study area includes seven fields on flat alluvial terraces, three fields on a gentle slope, small patches of scrubland, and twelve grass buffer strips that are located between each set of fields. Gentle and flat areas (S ... drone (model eBee by senseFly Ltd.). In order to minimize the effect of the vegetation on the photogrammetry restitution technique, pictures were taken in early spring, before the growth of the cereals. Then, several DEMs were generated independently. For this study, we chose the DEM at 0.5 x 0.5 m of spatial resolution. Before running the IC model, the continuity of the flow path lines throughout the landscape was ensured by removing

  5. Applying genetic algorithms to set the optimal combination of forest fire related variables and model forest fire susceptibility based on data mining models. The case of Dayu County, China.

    Science.gov (United States)

    Hong, Haoyuan; Tsangaratos, Paraskevas; Ilia, Ioanna; Liu, Junzhi; Zhu, A-Xing; Xu, Chong

    2018-07-15

    The main objective of the present study was to utilize Genetic Algorithms (GA) in order to obtain the optimal combination of forest fire related variables and apply data mining methods for constructing a forest fire susceptibility map. In the proposed approach, a Random Forest (RF) and a Support Vector Machine (SVM) were used to produce a forest fire susceptibility map for Dayu County, which is located in the southwest of Jiangxi Province, China. For this purpose, historic forest fires and thirteen forest fire related variables were analyzed, namely: elevation, slope angle, aspect, curvature, land use, soil cover, heat load index, normalized difference vegetation index, mean annual temperature, mean annual wind speed, mean annual rainfall, distance to river network and distance to road network. The Natural Break and the Certainty Factor methods were used to classify and weight the thirteen variables, while a multicollinearity analysis was performed to determine the correlation among the variables and decide on their usability. The optimal set determined by the GA reduced the number of variables to eight, excluding aspect, land use, heat load index, distance to river network and mean annual rainfall from the analysis. The performance of the forest fire models was evaluated by using the area under the Receiver Operating Characteristic curve (ROC-AUC) based on the validation dataset. Overall, the RF models gave higher AUC values. The results also showed that the proposed optimized models outperform the original models. Specifically, the optimized RF model gave the best results (0.8495), followed by the original RF (0.8169), while the optimized SVM gave a lower value (0.7456) than the RF models but a higher one than the original SVM model (0.7148). The study highlights the significance of feature selection techniques in forest fire susceptibility, whereas data mining methods could be considered as a valid approach for forest fire susceptibility modeling
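
    The ROC-AUC used to rank the models can be read as the probability that a randomly chosen fire location receives a higher susceptibility score than a randomly chosen non-fire location. The sketch below computes it with that pairwise definition. It is an illustration only: the function name, labels, and scores are hypothetical, and the study's own evaluation code is not described in the abstract.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for fire, 0 for no fire; scores: predicted susceptibility.
    Ties between a positive and a negative score count as half a win.
    Quadratic in the number of samples, so suited to small examples only.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical validation cells: observed fire (1) or not (0), with scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 4))
```

    Eight of the nine fire/non-fire pairs are ordered correctly here, giving an AUC of 8/9.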

  6. Physics Mining of Multi-source Data Sets, Phase I

    Data.gov (United States)

    National Aeronautics and Space Administration — We propose to implement novel physics mining algorithms with analytical capabilities to derive diagnostic and prognostic numerical models from multi-source...

  7. Use of the theory of recognition of patterns in developing methane metering equipment for blow-out-dangerous mines. [Instrument recognizes rate of change of methane concentration and, if dangerous, shuts off electrical equipment

    Energy Technology Data Exchange (ETDEWEB)

    Medvedev, V N

    1978-01-01

    In the most general form, existing methane-metering equipment, which issues command signals when the maximum permissible methane concentration has been reached, can be viewed as a recognition system. The algorithm for operation on the principle of evaluating the degree of blow-out danger of the mine atmosphere stipulates the recognition of two situations: 1) "not dangerous" (methane concentration below the maximum permissible value); 2) "dangerous" (disorders in the technological process; methane concentration above the maximum permissible value). This approach to constructing gas protection is optimal only for mines working beds that are not prone to sudden blow-outs. However, if we "train" the apparatus to recognize the reason for an increase in methane concentration, the way is opened to creating effective methane-metering equipment for mines with sudden blow-outs. Gas-dynamic processes during sudden blow-outs can be distinguished from standard technological processes, in particular, by the rate of increase in methane concentration. On this basis, a functional scheme is proposed for automatic gas protection in explosion-dangerous mines, which includes a primary measurement of methane concentration, a concentration control block, a process recognition block, a command signal block, an information delay block, a block measuring the rate of change of methane concentration, and a threshold device for the rate of increase in concentration.
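
    The recognition logic described above, a concentration threshold plus a threshold on the rate of increase, can be sketched as follows. All names, units, and threshold values are hypothetical; the abstract gives no numerical limits, and real gas protection equipment would also need the delay and command blocks it mentions.

```python
def classify_methane(readings, interval_s, max_concentration, max_rate):
    """Label each reading from its level and its rate of increase.

    readings: methane concentrations (% CH4) sampled every interval_s seconds.
    max_concentration: assumed maximum permissible concentration (% CH4).
    max_rate: assumed rate threshold (% CH4 per second) above which the rise
    is attributed to a sudden blow-out rather than normal operations.
    """
    states = []
    previous = None
    for concentration in readings:
        rate = 0.0 if previous is None else (concentration - previous) / interval_s
        if rate > max_rate:
            states.append("blow-out")        # rate threshold device trips
        elif concentration > max_concentration:
            states.append("dangerous")       # level threshold only
        else:
            states.append("not dangerous")
        previous = concentration
    return states


# Hypothetical traces sampled every 10 s, with a 1.0 % limit and 0.05 %/s rate limit.
slow_rise = classify_methane([0.8, 0.9, 1.1], 10, 1.0, 0.05)
fast_rise = classify_methane([0.3, 0.4, 0.5, 1.5, 2.5], 10, 1.0, 0.05)
print(slow_rise)
print(fast_rise)
```

    The slow trace crosses the limit gradually and is only flagged as dangerous, while the fast trace is attributed to a blow-out as soon as the rate threshold is exceeded.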

  8. Practical data mining and machine learning for optics applications: introduction to the feature issue.

    Science.gov (United States)

    Abdulla, Ghaleb; Awwal, Abdul; Borne, Kirk; Ho, Tin Kam; Vestrand, W Thomas

    2011-08-01

    Data mining algorithms utilize search techniques to explore hidden patterns and correlations in the data, which otherwise require a tremendous amount of human time to explore. This feature issue explores the use of such techniques to help understand the data, build better simulators, explain outlier behavior, and build better predictive models. We hope that this issue will spur discussions and expose a set of tools that can be useful to the optics community.

  9. Heuristics Miner for E-Commerce Visitor Access Pattern Representation

    Directory of Open Access Journals (Sweden)

    Kartina Diah Kesuma Wardhani

    2017-06-01

    Full Text Available E-commerce click stream data can form patterns that describe visitor behavior while surfing an e-commerce website. Such a pattern can be used to initiate a design that determines alternative access sequences on the website. This research uses the Heuristic Miner algorithm to determine the pattern. σ-Algorithm and Genetic Mining are methods used for pattern recognition with a frequent sequence item set approach. Heuristic Miner is an evolved form of those methods. σ-Algorithm assumes that an activity on a website, as recorded in the data log, is a complete sequence from start to finish, without any tolerance for incomplete or noisy data. Genetic Mining, on the other hand, tolerates incomplete or noisy data, so it can generate a more detailed e-commerce visitor access pattern. In this study, the same sequence of events was obtained from six generated patterns. The resulting visitor access pattern is that visitors often access the home page and then the product category page, or the home page and then the full text search page.
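
    A core step of heuristics-based process discovery is counting how often one page directly follows another and turning the counts into a noise-tolerant dependency score. The sketch below uses the dependency measure commonly associated with the Heuristics Miner, (|a>b| - |b>a|) / (|a>b| + |b>a| + 1). Whether the study used exactly this form is not stated in the abstract, and the session data and function names are hypothetical.

```python
from collections import Counter


def directly_follows(sessions):
    """Count how often page b is visited immediately after page a."""
    counts = Counter()
    for session in sessions:
        for a, b in zip(session, session[1:]):
            counts[(a, b)] += 1
    return counts


def dependency(counts, a, b):
    """Dependency score in (-1, 1); values near 1 suggest a reliably leads to b."""
    ab, ba = counts[(a, b)], counts[(b, a)]
    return (ab - ba) / (ab + ba + 1)


# Hypothetical click stream sessions from a web server log.
sessions = [
    ["home", "category", "product"],
    ["home", "search", "product"],
    ["home", "category", "home", "category"],
]
counts = directly_follows(sessions)
score = dependency(counts, "home", "category")
print(counts[("home", "category")], round(score, 2))
```

    The +1 in the denominator keeps rarely observed transitions from receiving a high score, which is how the measure tolerates noisy or incomplete logs.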

  10. Quantification of Operational Risk Using A Data Mining

    Science.gov (United States)

    Perera, J. Sebastian

    1999-01-01

    What is Data Mining? - Data Mining is the process of finding actionable information hidden in raw data. - Data Mining helps find hidden patterns, trends, and important relationships often buried in a sea of data - Typically, automated software tools based on advanced statistical analysis and data modeling technology can be utilized to automate the data mining process

  11. Use of Pattern Classification Algorithms to Interpret Passive and Active Data Streams from a Walking-Speed Robotic Sensor Platform

    Science.gov (United States)

    Dieckman, Eric Allen

    In order to perform useful tasks for us, robots must have the ability to notice, recognize, and respond to objects and events in their environment. This requires the acquisition and synthesis of information from a variety of sensors. Here we investigate the performance of a number of sensor modalities in an unstructured outdoor environment, including the Microsoft Kinect, thermal infrared camera, and coffee can radar. Special attention is given to acoustic echolocation measurements of approaching vehicles, where an acoustic parametric array propagates an audible signal to the oncoming target and the Kinect microphone array records the reflected backscattered signal. Although useful information about the target is hidden inside the noisy time domain measurements, the Dynamic Wavelet Fingerprint process (DWFP) is used to create a time-frequency representation of the data. A small-dimensional feature vector is created for each measurement using an intelligent feature selection process for use in statistical pattern classification routines. Using our experimentally measured data from real vehicles at 50 m, this process is able to correctly classify vehicles into one of five classes with 94% accuracy. Fully three-dimensional simulations allow us to study the nonlinear beam propagation and interaction with real-world targets to improve classification results.

  12. Development of an optimized algorithm for the characterization of microflow using speckle patterns present in optical coherence tomography signal

    International Nuclear Information System (INIS)

    Pretto, Lucas Ramos de

    2015-01-01

    This work discusses the Optical Coherence Tomography (OCT) system and its application to the microfluidics area. To this end, physical characterization of microfluidic circuits was performed using 3D (three-dimensional) models constructed from OCT images of such circuits. The technique was thus evaluated as a potential tool to aid in the inspection of microchannels. Going further, this work studies and develops analytical techniques for microfluidic flow, in particular techniques based on speckle patterns. In the first instance, existing methods were studied and improved, such as Speckle Variance - OCT, where a gain of 31% was obtained in processing time. Other methods, such as LASCA (Laser Speckle Contrast Analysis), based on speckle autocorrelation, were adapted to OCT images. Derived from LASCA, the developed analysis technique based on intensity autocorrelation motivated the development of a custom OCT system as well as optimized acquisition software, with a sampling rate of 8 kHz. The proposed method was then able to distinguish different flow rates, and limits of detection were tested, proving its feasibility for Brownian motion analysis and flow rates below 10 μl/min. (author)

  13. A novel algorithm to detect glaucoma risk using texton and local configuration pattern features extracted from fundus images.

    Science.gov (United States)

    Acharya, U Rajendra; Bhat, Shreya; Koh, Joel E W; Bhandary, Sulatha V; Adeli, Hojjat

    2017-09-01

    Glaucoma is an optic neuropathy defined by characteristic damage to the optic nerve and accompanying visual field deficits. Early diagnosis and treatment are critical to prevent irreversible vision loss and ultimate blindness. Current techniques for computer-aided analysis of the optic nerve and retinal nerve fiber layer (RNFL) are expensive and require keen interpretation by trained specialists. Hence, an automated system is highly desirable for a cost-effective and accurate screening for the diagnosis of glaucoma. This paper presents a new methodology and a computerized diagnostic system. Adaptive histogram equalization is used to convert color images to grayscale images followed by convolution of these images with Leung-Malik (LM), Schmid (S), and maximum response (MR4 and MR8) filter banks. The basic microstructures in typical images are called textons. The convolution process produces textons. Local configuration pattern (LCP) features are extracted from these textons. The significant features are selected using a sequential floating forward search (SFFS) method and ranked using the statistical t-test. Finally, various classifiers are used for classification of images into normal and glaucomatous classes. A high classification accuracy of 95.8% is achieved using six features obtained from the LM filter bank and the k-nearest neighbor (kNN) classifier. A glaucoma integrative index (GRI) is also formulated to obtain a reliable and effective system. Copyright © 2017 Elsevier Ltd. All rights reserved.

  14. Prediction of Allogeneic Hematopoietic Stem-Cell Transplantation Mortality 100 Days After Transplantation Using a Machine Learning Algorithm: A European Group for Blood and Marrow Transplantation Acute Leukemia Working Party Retrospective Data Mining Study.

    Science.gov (United States)

    Shouval, Roni; Labopin, Myriam; Bondi, Ori; Mishan-Shamay, Hila; Shimoni, Avichai; Ciceri, Fabio; Esteve, Jordi; Giebel, Sebastian; Gorin, Norbert C; Schmid, Christoph; Polge, Emmanuelle; Aljurf, Mahmoud; Kroger, Nicolaus; Craddock, Charles; Bacigalupo, Andrea; Cornelissen, Jan J; Baron, Frederic; Unger, Ron; Nagler, Arnon; Mohty, Mohamad

    2015-10-01

    Allogeneic hematopoietic stem-cell transplantation (HSCT) is potentially curative for acute leukemia (AL), but carries considerable risk. Machine learning algorithms, which are part of the data mining (DM) approach, may serve for transplantation-related mortality risk prediction. This work is a retrospective DM study on a cohort of 28,236 adult HSCT recipients from the AL registry of the European Group for Blood and Marrow Transplantation. The primary objective was prediction of overall mortality (OM) at 100 days after HSCT. Secondary objectives were estimation of nonrelapse mortality, leukemia-free survival, and overall survival at 2 years. Donor, recipient, and procedural characteristics were analyzed. The alternating decision tree machine learning algorithm was applied for model development on 70% of the data set and validated on the remaining data. OM prevalence at day 100 was 13.9% (n=3,936). Of the 20 variables considered, 10 were selected by the model for OM prediction, and several interactions were discovered. By using a logistic transformation function, the crude score was transformed into individual probabilities for 100-day OM (range, 3% to 68%). The model's discrimination for the primary objective performed better than the European Group for Blood and Marrow Transplantation score (area under the receiver operating characteristics curve, 0.701 v 0.646; Prisk evaluation of patients with AL before HSCT, and is available online (http://bioinfo.lnx.biu.ac.il/∼bondi/web1.html). It is presented as a continuous probabilistic score for the prediction of day 100 OM, extending prediction to 2 years. The DM method has proved useful for clinical prediction in HSCT. © 2015 by American Society of Clinical Oncology.
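
    The logistic transformation mentioned in this abstract can be illustrated with a minimal sketch. The function name, the intercept and the slope below are hypothetical placeholders; the abstract does not report the fitted coefficients, so this only shows the general shape of such a mapping from a crude score to a probability.

```python
import math


def logistic_probability(crude_score, intercept=0.0, slope=1.0):
    """Map a crude additive score to a probability in (0, 1).

    intercept and slope are illustrative assumptions, not values
    taken from the study.
    """
    return 1.0 / (1.0 + math.exp(-(intercept + slope * crude_score)))


if __name__ == "__main__":
    # Higher crude scores map to higher predicted probabilities.
    for score in (-2.0, 0.0, 2.0):
        print(score, round(logistic_probability(score), 3))
```

    Because the logistic function is monotonic, patients keep the same ranking under the transformation; only the scale changes from an unbounded score to a probability.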

  15. Implications of Emerging Data Mining

    Science.gov (United States)

    Kulathuramaiyer, Narayanan; Maurer, Hermann

    Data Mining describes a technology that discovers non-trivial hidden patterns in a large collection of data. Although this technology has a tremendous impact on our lives, the invaluable contributions of this invisible technology often go unnoticed. This paper discusses advances in data mining while focusing on the emerging data mining capability. Such data mining applications perform multidimensional mining on a wide variety of heterogeneous data sources, providing solutions to many unresolved problems. This paper also highlights the advantages and disadvantages arising from the ever-expanding scope of data mining. Data Mining augments human intelligence by equipping us with a wealth of knowledge and by empowering us to perform our daily tasks better. As the mining scope and capacity increase, users and organizations become more willing to compromise privacy. The huge data stores of the 'master miners' allow them to gain deep insights into individual lifestyles and their social and behavioural patterns. Data integration and analysis capability of combining business and financial trends together with the ability to deterministically track market changes will drastically affect our lives.

  16. Mining a database of single amplified genomes from Red Sea brine pool extremophiles-improving reliability of gene function prediction using a profile and pattern matching algorithm (PPMA).

    KAUST Repository

    Grötzinger, Stefan W.; Alam, Intikhab; Ba Alawi, Wail; Bajic, Vladimir B.; Stingl, Ulrich; Eppinger, Jörg

    2014-01-01

    Reliable functional annotation of genomic data is the key-step in the discovery of novel enzymes. Intrinsic sequencing data quality problems of single amplified genomes (SAGs) and poor homology of novel extremophile's genomes pose significant

  17. Mining Product Data Models: A Case Study

    Directory of Open Access Journals (Sweden)

    Cristina-Claudia DOLEAN

    2014-01-01

    This paper presents two case studies used to prove the validity of some data-flow mining algorithms. We proposed the data-flow mining algorithms because most mining algorithms focus on the control-flow perspective. The first case study uses event logs generated by an ERP system (Navision) after we set several trackers on the data elements needed in the process analyzed, while the second case study uses the event logs generated by the YAWL system. We offered a general solution for data-flow model extraction from different data sources. In order to apply the data-flow mining algorithms, the event logs must comply with a certain format (using the InputOutput extension). To respect this format, a set of conversion tools is needed. We depicted the conversion tools used and how we got the data-flow models. Moreover, the data-flow model is compared to the control-flow model.

  18. A survey on Big Data Stream Mining

    African Journals Online (AJOL)

    pc

    2018-03-05

    Mar 5, 2018 ... huge amount of stream like telecommunication systems. So, there ... streams have many challenges for data mining algorithm design like using of ..... A. Bifet and R. Gavalda, "Learning from Time-Changing Data with. Adaptive ...

  19. Mine Water Treatment in Hongai Coal Mines

    OpenAIRE

    Dang Phuong Thao; Dang Vu Chi

    2018-01-01

    Acid mine drainage (AMD) is recognized as one of the most serious environmental problems associated with the mining industry. Acid water, also known as acid mine drainage, forms when iron sulfide minerals found in the rock of coal seams are exposed to oxidizing conditions in coal mining. Until 2009, mine drainage in Hongai coal mines was not treated, leading to harmful effects on humans, animals and the aquatic ecosystem. This report has examined the acid mine drainage problem and techniques for acid mine ...

  20. EMiT: a process mining tool

    NARCIS (Netherlands)

    Dongen, van B.F.; Aalst, van der W.M.P.; Cortadella, J.; Reisig, W.

    2004-01-01

    Process mining offers a way to distill process models from event logs originating from transactional systems in logistics, banking, e-business, health-care, etc. The algorithms used for process mining are complex and in practice large logs are needed to derive a high-quality process model. To

  1. Declarative Process Mining for DCR Graphs

    DEFF Research Database (Denmark)

    Debois, Søren; Hildebrandt, Thomas T.; Laursen, Paw Høvsgaard

    2017-01-01

    We investigate process mining for the declarative Dynamic Condition Response (DCR) graphs process modelling language. We contribute (a) a process mining algorithm for DCR graphs, (b) a proposal for a set of metrics quantifying output model quality, and (c) a preliminary example-based comparison...

  2. Extending mine life

    International Nuclear Information System (INIS)

    Anon.

    1984-01-01

    Mine layouts, new machines and techniques, research into problem areas of ground control and so on, are highlighted in this report on extending mine life. The main resources taken into account are coal mining, uranium mining, molybdenum and gold mining

  3. Uranium mining

    International Nuclear Information System (INIS)

    2008-01-01

    Full text: The economic and environmental sustainability of uranium mining has been analysed by Monash University researcher Dr Gavin Mudd in a paper that challenges the perception that uranium mining is an 'infinite quality source' that provides solutions to the world's demand for energy. Dr Mudd says information on the uranium industry touted by politicians and mining companies is not necessarily inaccurate, but it does not tell the whole story, being often just an average snapshot of the costs of uranium mining today without reflecting the escalating costs associated with the process in years to come. 'From a sustainability perspective, it is critical to evaluate accurately the true lifecycle costs of all forms of electricity production, especially with respect to greenhouse emissions,' he says. 'For nuclear power, a significant proportion of greenhouse emissions are derived from the fuel supply, including uranium mining, milling, enrichment and fuel manufacture.' Dr Mudd found that financial and environmental costs escalate dramatically as the uranium ore is used. The deeper the mining process required to extract the ore, the higher the cost for mining companies, the greater the impact on the environment and the more resources needed to obtain the product. 'It is clear that there is a strong sensitivity of energy and water consumption and greenhouse emissions to ore grade, and that ore grades are likely to continue to decline gradually in the medium to long term. These issues are critical to the current debate over nuclear power and greenhouse emissions, especially with respect to ascribing sustainability to such activities as uranium mining and milling. For example, mining at Roxby Downs is responsible for the emission of over one million tonnes of greenhouse gases per year and this could increase to four million tonnes if the mine is expanded.'

  4. Non-communicable disease risk factor patterns among mining industry workers in Papua, Indonesia: longitudinal findings from the Cardiovascular Outcomes in a Papuan Population and Estimation of Risk (COPPER) Study.

    Science.gov (United States)

    Rodriguez-Fernandez, Rodrigo; Rahajeng, Ekowati; Viliani, Francesca; Kushadiwijaya, Haripurnomo; Amiya, Rachel M; Bangs, Michael J

    2015-10-01

    Non-communicable diseases (NCDs) constitute an increasing slice of the global burden of disease, with the South-East Asia region projected to see the highest increase in NCD-related deaths over the next decade. Mining industry employees may be exposed to various factors potentially elevating their NCD risk. This study aimed to assess the distribution and 5-year longitudinal trends of key metabolic NCD risk factors in a cohort of copper-gold mining company workers in Papua, Indonesia. Metabolic indicators of NCD risk were assessed among employees (15 580 at baseline, 6496 prospectively) of a large copper-gold mining operation in Papua, Indonesia, using routinely collected 5-year medical surveillance data. The study cohort comprised individuals aged 18-68 years employed for ≥1 year during 2008-2013. Assessed risk factors were based on repeat measures of cholesterol, blood glucose, blood pressure and body weight, using WHO criteria. Metabolic risk indicator rates were markedly high and increased significantly from baseline through 5-year follow-up (pmining operations setting in Papua, Indonesia, may face elevated NCD risk through various routes. Workplace health promotion interventions and policies targeting modifiable lifestyle patterns and environmental exposures present an important opportunity to reduce such susceptibilities and mitigate associated health risks. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.

  5. FraudMiner: A Novel Credit Card Fraud Detection Model Based on Frequent Itemset Mining

    Directory of Open Access Journals (Sweden)

    K. R. Seeja

    2014-01-01

    This paper proposes an intelligent credit card fraud detection model for detecting fraud from highly imbalanced and anonymous credit card transaction datasets. The class imbalance problem is handled by finding legal as well as fraud transaction patterns for each customer by using frequent itemset mining. A matching algorithm is also proposed to find to which pattern (legal or fraud) the incoming transaction of a particular customer is closer, and a decision is made accordingly. In order to handle the anonymous nature of the data, no preference is given to any of the attributes and each attribute is considered equally for finding the patterns. The performance of the proposed model is evaluated on the UCSD Data Mining Contest 2009 Dataset (anonymous and imbalanced), and it is found that the proposed model has a very high fraud detection rate, balanced classification rate and Matthews correlation coefficient, and a much lower false alarm rate than other state-of-the-art classifiers.

  6. FraudMiner: a novel credit card fraud detection model based on frequent itemset mining.

    Science.gov (United States)

    Seeja, K R; Zareapoor, Masoumeh

    2014-01-01

    This paper proposes an intelligent credit card fraud detection model for detecting fraud from highly imbalanced and anonymous credit card transaction datasets. The class imbalance problem is handled by finding legal as well as fraud transaction patterns for each customer by using frequent itemset mining. A matching algorithm is also proposed to find to which pattern (legal or fraud) the incoming transaction of a particular customer is closer, and a decision is made accordingly. In order to handle the anonymous nature of the data, no preference is given to any of the attributes and each attribute is considered equally for finding the patterns. The performance of the proposed model is evaluated on the UCSD Data Mining Contest 2009 Dataset (anonymous and imbalanced), and it is found that the proposed model has a very high fraud detection rate, balanced classification rate and Matthews correlation coefficient, and a much lower false alarm rate than other state-of-the-art classifiers.
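
    The general idea of this abstract, mining one legal and one fraud pattern per customer and then matching an incoming transaction against both, can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm: transactions are modelled as sets of (attribute, value) pairs, the pattern is taken to be the largest frequent itemset, the match score is a plain count of shared items, and all function names and sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for itemsets whose relative support
    is at least min_support. Brute-force enumeration, suitable only
    for tiny illustrative data."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(transaction)
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    threshold = min_support * len(transactions)
    return {s: c for s, c in counts.items() if c >= threshold}


def largest_pattern(transactions, min_support):
    """Pick the largest frequent itemset (ties broken by support)."""
    frequent = frequent_itemsets(transactions, min_support)
    if not frequent:
        return frozenset()
    return max(frequent, key=lambda s: (len(s), frequent[s]))


def classify(incoming, legal_pattern, fraud_pattern):
    """Label the transaction by whichever pattern shares more items.
    Every attribute counts equally; ties default to 'legal'."""
    legal_matches = len(incoming & legal_pattern)
    fraud_matches = len(incoming & fraud_pattern)
    return "fraud" if fraud_matches > legal_matches else "legal"


# Hypothetical history for one customer.
LEGAL = [
    {("amt", "low"), ("hour", "day"), ("type", "pos")},
    {("amt", "low"), ("hour", "day"), ("type", "web")},
    {("amt", "low"), ("hour", "day"), ("type", "pos")},
]
FRAUD = [
    {("amt", "high"), ("hour", "night"), ("type", "web")},
    {("amt", "high"), ("hour", "night"), ("type", "web")},
]

if __name__ == "__main__":
    legal_pattern = largest_pattern(LEGAL, 0.6)
    fraud_pattern = largest_pattern(FRAUD, 0.6)
    incoming = {("amt", "high"), ("hour", "night"), ("type", "pos")}
    print(classify(incoming, legal_pattern, fraud_pattern))
```

    Mining the two classes separately is what sidesteps the class imbalance in this sketch: the fraud pattern is built only from fraud records, so it is not swamped by the far more numerous legal ones.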

  7. Application of a common spatial pattern-based algorithm for an fNIRS-based motor imagery brain-computer interface.

    Science.gov (United States)

    Zhang, Shen; Zheng, Yanchun; Wang, Daifa; Wang, Ling; Ma, Jianai; Zhang, Jing; Xu, Weihao; Li, Deyu; Zhang, Dan

    2017-08-10

    Motor imagery is one of the most investigated paradigms in the field of brain-computer interfaces (BCIs). The present study explored the feasibility of applying a common spatial pattern (CSP)-based algorithm for a functional near-infrared spectroscopy (fNIRS)-based motor imagery BCI. Ten participants performed kinesthetic imagery of their left- and right-hand movements while 20-channel fNIRS signals were recorded over the motor cortex. The CSP method was implemented to obtain the spatial filters specific for both imagery tasks. The mean, slope, and variance of the CSP filtered signals were taken as features for BCI classification. Results showed that the CSP-based algorithm outperformed two representative channel-wise methods for classifying the two imagery statuses using either data from all channels or averaged data from imagery responsive channels only (oxygenated hemoglobin: CSP-based: 75.3±13.1%; all-channel: 52.3±5.3%; averaged: 64.8±13.2%; deoxygenated hemoglobin: CSP-based: 72.3±13.0%; all-channel: 48.8±8.2%; averaged: 63.3±13.3%). Furthermore, the effectiveness of the CSP method was also observed for the motor execution data to a lesser extent. A partial correlation analysis revealed significant independent contributions from all three types of features, including the often-ignored variance feature. To our knowledge, this is the first study demonstrating the effectiveness of the CSP method for fNIRS-based motor imagery BCIs. Copyright © 2017 Elsevier B.V. All rights reserved.

  8. Improvement to the pattern of control rods of the equilibrium cycle of 18 months for the CLV using bio-inspired algorithms

    International Nuclear Information System (INIS)

    Perusquia, R.; Ortiz, J.J.; Montes, J.L.

    2003-01-01

    The National Institute of Nuclear Research is currently studying several bio-inspired optimization techniques to improve the performance of the fuel cycles of the boiling water reactors of the Laguna Verde power plant (CLV). In the present work, two bio-inspired techniques were applied to improve the performance of an 18-month equilibrium cycle developed for the CLV: genetic algorithms (AG) and ant colony systems (SCH). The design of the reference cycle represents, in several respects, an optimal cycle derived from several decades of operating experience with boiling water reactors (BWRs) worldwide. Trying to improve its performance is therefore a difficult challenge, and it tests the feasibility of optimization methods in reload design. The study of the bio-inspired techniques focused exclusively on obtaining control rod patterns (PBC) that exceed the capacity factor reached in the design of the reference cycle. The cycle length was fixed so that a shorter coast-down period would represent a higher capacity factor for the cycle and thus reduce the annual cost associated with the capital cost of the plant. The study found that the ant colony algorithm shortened the coast-down period by five and a half days with respect to the original equilibrium cycle, which represents an annual saving of $US 74,000. Since the original cycle was already optimized, this result shows the ability of the SCH to optimize the cycle design. The AG came close to the original equilibrium cycle, with a coast-down period seven days longer and an estimated annual penalty of $US 130,000. (Author)

  9. Web Mining and Social Networking

    DEFF Research Database (Denmark)

    Xu, Guandong; Zhang, Yanchun; Li, Lin

    This book examines the techniques and applications involved in the Web Mining, Web Personalization and Recommendation and Web Community Analysis domains, including a detailed presentation of the principles, developed algorithms, and systems of the research in these areas. The applications of web mining, and the issue of how to incorporate web mining into web personalization and recommendation systems are also reviewed. Additionally, the volume explores web community mining and analysis to find the structural, organizational and temporal developments of web communities and reveal the societal sense of individuals or communities. The volume will benefit both academic and industry communities interested in the techniques and applications of web search, web data management, web mining and web knowledge discovery, as well as web community and social network analysis.

  10. In situ solution mining technique

    International Nuclear Information System (INIS)

    Learmont, R.P.

    1978-01-01

    A method of in situ solution mining is disclosed in which a primary leaching process employing an array of 5-spot leaching patterns of production and injection wells is converted to a different pattern by converting to injection wells all the production wells in alternate rows

  11. MO-A-BRD-09: A Data-Mining Algorithm for Large Scale Analysis of Dose-Outcome Relationships in a Database of Irradiated Head-And-Neck (HN) Cancer Patients

    Energy Technology Data Exchange (ETDEWEB)

    Robertson, SP; Quon, H; Kiess, AP; Moore, JA; Yang, W; Cheng, Z; Sharabi, A; McNutt, TR [Johns Hopkins University, Baltimore, MD (United States)

    2014-06-15

    Purpose: To develop a framework for automatic extraction of clinically meaningful dosimetric-outcome relationships from an in-house, analytic oncology database. Methods: Dose-volume histograms (DVH) and clinical outcome-related structured data elements have been routinely stored to our database for 513 HN cancer patients treated from 2007 to 2014. SQL queries were developed to extract outcomes that had been assessed for at least 100 patients, as well as DVH curves for organs-at-risk (OAR) that were contoured for at least 100 patients. DVH curves for paired OAR (e.g., left and right parotids) were automatically combined and included as additional structures for analysis. For each OAR-outcome combination, DVH dose points, D(V_t), at a series of normalized volume thresholds, V_t=[0.01,0.99], were stratified into two groups based on outcomes after treatment completion. The probability, P[D(V_t)], of an outcome was modeled at each V_t by logistic regression. Notable combinations, defined as having P[D(V_t)] increase by at least 5% per Gy (p<0.05), were further evaluated for clinical relevance using a custom graphical interface. Results: A total of 57 individual and combined structures and 115 outcomes were queried, resulting in over 6,500 combinations for analysis. Of these, 528 combinations met the 5%/Gy requirement, with further manual inspection revealing a number of reasonable models based on either reported literature or proximity between neighboring OAR. The data mining algorithm confirmed the following well-known toxicity/outcome relationships: dysphagia/larynx, voice changes/larynx, esophagitis/esophagus, xerostomia/combined parotids, and mucositis/oral mucosa. Other notable relationships included dysphagia/pharyngeal constrictors, nausea/brainstem, nausea/spinal cord, weight-loss/mandible, and weight-loss/combined parotids. Conclusion: Our database platform has enabled large-scale analysis of dose-outcome relationships. The current data-mining

  12. Setup of a testing environment for mission planning in mining

    NARCIS (Netherlands)

    Groenen, J.P.J.; Steinbuch, M.

    2013-01-01

    Mission planning algorithms for surface mining applications are difficult to test as a result of the large scale tasks. To validate these algorithms, a scaled setup is created where the mining excavator is mimicked by an industrial robot. This report discusses the development of a software

  13. A Bayesian Scoring Technique for Mining Predictive and Non-Spurious Rules.

    Science.gov (United States)

    Batal, Iyad; Cooper, Gregory; Hauskrecht, Milos

    Rule mining is an important class of data mining methods for discovering interesting patterns in data. The success of a rule mining method heavily depends on the evaluation function that is used to assess the quality of the rules. In this work, we propose a new rule evaluation score - the Predictive and Non-Spurious Rules (PNSR) score. This score relies on Bayesian inference to evaluate the quality of the rules and considers the structure of the rules to filter out spurious rules. We present an efficient algorithm for finding rules with high PNSR scores. The experiments demonstrate that our method is able to cover and explain the data with a much smaller rule set than existing methods.

  14. Data preprocessing in data mining

    CERN Document Server

    García, Salvador; Herrera, Francisco

    2015-01-01

    Data Preprocessing for Data Mining addresses one of the most important issues within the well-known Knowledge Discovery from Data process. Data taken directly from the source will likely contain inconsistencies and errors or, most importantly, will not be ready to be considered for a data mining process. Furthermore, the increasing amount of data in recent science, industry and business applications calls for more complex tools to analyze it. Thanks to data preprocessing, it is possible to convert the impossible into possible, adapting the data to fulfill the input demands of each data mining algorithm. Data preprocessing includes the data reduction techniques, which aim at reducing the complexity of the data, detecting or removing irrelevant and noisy elements from the data. This book is intended to review the tasks that fill the gap between the data acquisition from the source and the data mining process. A comprehensive look from a practical point of view, including basic concepts and surveying t...

  15. Mechanical Model of Geometric Cell and Topological Algorithm for Cell Dynamics from Single-Cell to Formation of Monolayered Tissues with Pattern

    KAUST Repository

    Kachalo, Sëma

    2015-05-14

    Geometric and mechanical properties of individual cells and interactions among neighboring cells are the basis of formation of tissue patterns. Understanding the complex interplay of cells is essential for gaining insight into embryogenesis, tissue development, and other emerging behavior. Here we describe a cell model and an efficient geometric algorithm for studying the dynamic process of tissue formation in 2D (e.g. epithelial tissues). Our approach improves upon previous methods by incorporating properties of individual cells as well as detailed description of the dynamic growth process, with all topological changes accounted for. Cell size, shape, and division plane orientation are modeled realistically. In addition, cell birth, cell growth, cell shrinkage, cell death, cell division, cell collision, and cell rearrangements are now fully accounted for. Different models of cell-cell interactions, such as lateral inhibition during the process of growth, can be studied in detail. Cellular pattern formation for monolayered tissues from arbitrary initial conditions, including that of a single cell, can also be studied in detail. Computational efficiency is achieved through the employment of a special data structure that ensures access to neighboring cells in constant time, without additional space requirement. We have successfully generated tissues consisting of more than 20,000 cells starting from 2 cells within 1 hour. We show that our model can be used to study embryogenesis, tissue fusion, and cell apoptosis. We give detailed study of the classical developmental process of bristle formation on the epidermis of D. melanogaster and the fundamental problem of homeostatic size control in epithelial tissues. Simulation results reveal significant roles of solubility of secreted factors in both the bristle formation and the homeostatic control of tissue size. Our method can be used to study broad problems in monolayered tissue formation. Our software is publicly

  16. Set-Oriented Mining for Association Rules in Relational Databases

    NARCIS (Netherlands)

    Houtsma, M.A.W.; Houtsma, M.A.W.; Swami, A.

    1995-01-01

    Describe set-oriented algorithms for mining association rules. Such algorithms imply performing multiple joins and may appear to be inherently less efficient than special-purpose algorithms. We develop new algorithms that can be expressed as SQL queries, and discuss the optimization of these
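
    To illustrate what expressing support counting as an SQL query can look like, here is a minimal sketch using Python's standard sqlite3 module. It is not the authors' algorithm: the table name sales, its columns, the function name and the sample rows are all assumptions, and the sketch assumes each (tid, item) pair appears at most once. A self-join on the transaction identifier generates candidate item pairs, and GROUP BY with HAVING keeps those that meet a minimum support count.

```python
import sqlite3


def frequent_pairs(rows, min_support):
    """Return (item_a, item_b, support) for item pairs that occur
    together in at least min_support transactions.

    rows is an iterable of (tid, item) tuples; the schema is a
    hypothetical example.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (tid INTEGER, item TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    query = """
        SELECT a.item, b.item, COUNT(*) AS support
        FROM sales AS a
        JOIN sales AS b
          ON a.tid = b.tid AND a.item < b.item
        GROUP BY a.item, b.item
        HAVING COUNT(*) >= ?
        ORDER BY a.item, b.item
    """
    result = conn.execute(query, (min_support,)).fetchall()
    conn.close()
    return result


# Hypothetical transactions: tid 1 = {A, B, C}, tid 2 = {A, B},
# tid 3 = {A, C}, tid 4 = {B}.
SAMPLE_ROWS = [
    (1, "A"), (1, "B"), (1, "C"),
    (2, "A"), (2, "B"),
    (3, "A"), (3, "C"),
    (4, "B"),
]

if __name__ == "__main__":
    for row in frequent_pairs(SAMPLE_ROWS, 2):
        print(row)
```

    The condition a.item < b.item keeps each unordered pair once and excludes an item paired with itself. Larger itemsets would need one additional join per extra item, which is where the multiple joins mentioned in the abstract come from.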

  17. Social big data mining

    CERN Document Server

    Ishikawa, Hiroshi

    2015-01-01

    Social Media. Big Data and Social Data. Hypotheses in the Era of Big Data. Social Big Data Applications. Basic Concepts in Data Mining. Association Rule Mining. Clustering. Classification. Prediction. Web Structure Mining. Web Content Mining. Web Access Log Mining, Information Extraction and Deep Web Mining. Media Mining. Scalability and Outlier Detection.

  18. A systematic review of data mining and machine learning for air pollution epidemiology.

    Science.gov (United States)

    Bellinger, Colin; Mohomed Jabbar, Mohomed Shazan; Zaïane, Osmar; Osornio-Vargas, Alvaro

    2017-11-28

    Data measuring airborne pollutants, public health and environmental factors are increasingly being stored and merged. These big datasets offer great potential, but also challenge traditional epidemiological methods. This has motivated the exploration of alternative methods to make predictions, find patterns and extract information. To this end, data mining and machine learning algorithms are increasingly being applied to air pollution epidemiology. We conducted a systematic literature review on the application of data mining and machine learning methods in air pollution epidemiology. We carried out our search process in PubMed, the MEDLINE database and Google Scholar. Research articles applying data mining and machine learning methods to air pollution epidemiology were queried and reviewed. Our search queries resulted in 400 research articles. Our fine-grained analysis employed our inclusion/exclusion criteria to reduce the results to 47 articles, which we separate into three primary areas of interest: 1) source apportionment; 2) forecasting/prediction of air pollution/quality or exposure; and 3) generating hypotheses. Early applications had a preference for artificial neural networks. In more recent work, decision trees, support vector machines, k-means clustering and the APRIORI algorithm have been widely applied. Our survey shows that the majority of the research has been conducted in Europe, China and the USA, and that data mining is becoming an increasingly common tool in environmental health. For potential new directions, we have identified that deep learning and geo-spatial pattern mining are two burgeoning areas of data mining that have good potential for future applications in air pollution epidemiology. We carried out a systematic review identifying the current trends, challenges and new directions to explore in the application of data mining methods to air pollution epidemiology. This work shows that data mining is increasingly being applied in air

  19. Mining High-Dimensional Data

    Science.gov (United States)

    Wang, Wei; Yang, Jiong

    With the rapid growth of computational biology and e-commerce applications, high-dimensional data has become very common. Thus, mining high-dimensional data is an urgent problem of great practical importance. However, there are some unique challenges for mining data of high dimensions, including (1) the curse of dimensionality and, more crucially, (2) the meaningfulness of the similarity measure in the high-dimensional space. In this chapter, we present several state-of-the-art techniques for analyzing high-dimensional data, e.g., frequent pattern mining, clustering, and classification. We will discuss how these methods deal with the challenges of high dimensionality.
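
    The concern about the meaningfulness of similarity measures in high dimensions can be made concrete with a small experiment. The sketch below is an illustration under assumed settings (uniform random points in the unit hypercube, Euclidean distance, a fixed seed, a hypothetical function name), not a method from the chapter. It measures the relative contrast (d_max - d_min) / d_min between a query point and a random sample; as the dimension grows, the nearest and farthest points become almost equally distant.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for Euclidean distances from
    one random query point to n_points random points in [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))
    d_min, d_max = min(distances), max(distances)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # The contrast shrinks as the dimension grows.
    for dim in (2, 20, 200, 2000):
        print(dim, round(relative_contrast(dim), 3))
```

    A small relative contrast means that a nearest-neighbour query barely distinguishes the closest point from the farthest one, which is one reason subspace and projected methods restrict attention to a few relevant dimensions.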

  20. Mining Web-based Educational Systems to Predict Student Learning Achievements

    Directory of Open Access Journals (Sweden)

    José del Campo-Ávila

    2015-03-01

    Educational Data Mining (EDM) is gaining importance as a new interdisciplinary research field related to several other areas. It is directly connected with Web-based Educational Systems (WBES) and Data Mining (DM), a fundamental part of Knowledge Discovery in Databases. The former defines the context: WBES store and manage huge amounts of data. Such data are growing steadily and contain hidden knowledge that could be very useful to the users (both teachers and students). It is desirable to identify such knowledge in the form of models, patterns or any other representation schema that allows a better exploitation of the system. The latter reveals itself as the tool to achieve such discovery. Data mining must cope with very complex and varied situations to reach quality solutions. Therefore, data mining is a research field where many advances are being made to accommodate and solve emerging problems. For this purpose, many techniques are usually considered. In this paper we study how data mining can be used to induce student models from the data acquired by a specific Web-based tool for adaptive testing, called SIETTE. Concretely, we have used top-down induction of decision trees algorithms to extract the patterns because these models, decision trees, are easily understandable. In addition, the conducted validation processes have assured high quality models.