WorldWideScience

Sample records for pattern mining algorithm

  1. Frequent Pattern Mining Algorithms for Data Clustering

    DEFF Research Database (Denmark)

    Zimek, Arthur; Assent, Ira; Vreeken, Jilles

    2014-01-01

    that frequent pattern mining was at the cradle of subspace clustering—yet, it quickly developed into an independent research field. In this chapter, we discuss how frequent pattern mining algorithms have been extended and generalized towards the discovery of local clusters in high-dimensional data......Discovering clusters in subspaces, or subspace clustering and related clustering paradigms, is a research field where we find many frequent pattern mining related influences. In fact, as the first algorithms for subspace clustering were based on frequent pattern mining algorithms, it is fair to say....... In particular, we discuss several example algorithms for subspace clustering or projected clustering as well as point out recent research questions and open topics in this area relevant to researchers in either clustering or pattern mining...

  2. Research on parallel algorithm for sequential pattern mining

    Science.gov (United States)

    Zhou, Lijuan; Qin, Bai; Wang, Yu; Hao, Zhongxiao

    2008-03-01

    Sequential pattern mining is the mining of frequent sequences related to time or other orders from the sequence database. Its initial motivation is to discover the laws of customer purchasing in a time section by finding the frequent sequences. In recent years, sequential pattern mining has become an important direction of data mining, and its application field has not been confined to the business database and has extended to new data sources such as Web and advanced science fields such as DNA analysis. The data of sequential pattern mining has characteristics as follows: mass data amount and distributed storage. Most existing sequential pattern mining algorithms haven't considered the above-mentioned characteristics synthetically. According to the traits mentioned above and combining the parallel theory, this paper puts forward a new distributed parallel algorithm SPP(Sequential Pattern Parallel). The algorithm abides by the principal of pattern reduction and utilizes the divide-and-conquer strategy for parallelization. The first parallel task is to construct frequent item sets applying frequent concept and search space partition theory and the second task is to structure frequent sequences using the depth-first search method at each processor. The algorithm only needs to access the database twice and doesn't generate the candidated sequences, which abates the access time and improves the mining efficiency. Based on the random data generation procedure and different information structure designed, this paper simulated the SPP algorithm in a concrete parallel environment and implemented the AprioriAll algorithm. The experiments demonstrate that compared with AprioriAll, the SPP algorithm had excellent speedup factor and efficiency.

  3. A Comparative Study of Frequent and Maximal Periodic Pattern Mining Algorithms in Spatiotemporal Databases

    Science.gov (United States)

    Obulesu, O.; Rama Mohan Reddy, A., Dr; Mahendra, M.

    2017-08-01

    Detecting regular and efficient cyclic models is the demanding activity for data analysts due to unstructured, vigorous and enormous raw information produced from web. Many existing approaches generate large candidate patterns in the occurrence of huge and complex databases. In this work, two novel algorithms are proposed and a comparative examination is performed by considering scalability and performance parameters. The first algorithm is, EFPMA (Extended Regular Model Detection Algorithm) used to find frequent sequential patterns from the spatiotemporal dataset and the second one is, ETMA (Enhanced Tree-based Mining Algorithm) for detecting effective cyclic models with symbolic database representation. EFPMA is an algorithm grows models from both ends (prefixes and suffixes) of detected patterns, which results in faster pattern growth because of less levels of database projection compared to existing approaches such as Prefixspan and SPADE. ETMA uses distinct notions to store and manage transactions data horizontally such as segment, sequence and individual symbols. ETMA exploits a partition-and-conquer method to find maximal patterns by using symbolic notations. Using this algorithm, we can mine cyclic models in full-series sequential patterns including subsection series also. ETMA reduces the memory consumption and makes use of the efficient symbolic operation. Furthermore, ETMA only records time-series instances dynamically, in terms of character, series and section approaches respectively. The extent of the pattern and proving efficiency of the reducing and retrieval techniques from synthetic and actual datasets is a really open & challenging mining problem. These techniques are useful in data streams, traffic risk analysis, medical diagnosis, DNA sequence Mining, Earthquake prediction applications. Extensive investigational outcomes illustrates that the algorithms outperforms well towards efficiency and scalability than ECLAT, STNR and MAFIA approaches.

  4. Personal continuous route pattern mining

    Institute of Scientific and Technical Information of China (English)

    Qian YE; Ling CHEN; Gen-cai CHEN

    2009-01-01

    In the daily life, people often repeat regular routes in certain periods. In this paper, a mining system is developed to find the continuous route patterns of personal past trips. In order to count the diversity of personal moving status, the mining system employs the adaptive GPS data recording and five data filters to guarantee the clean trips data. The mining system uses a client/server architecture to protect personal privacy and to reduce the computational load. The server conducts the main mining procedure but with insufficient information to recover real personal routes. In order to improve the scalability of sequential pattern mining, a novel pattern mining algorithm, continuous route pattern mining (CRPM), is proposed. This algorithm can tolerate the different disturbances in real routes and extract the frequent patterns. Experimental results based on nine persons' trips show that CRPM can extract more than two times longer route patterns than the traditional route pattern mining algorithms.

  5. Pattern recognition algorithms for data mining scalability, knowledge discovery and soft granular computing

    CERN Document Server

    Pal, Sankar K

    2004-01-01

    Pattern Recognition Algorithms for Data Mining addresses different pattern recognition (PR) tasks in a unified framework with both theoretical and experimental results. Tasks covered include data condensation, feature selection, case generation, clustering/classification, and rule generation and evaluation. This volume presents various theories, methodologies, and algorithms, using both classical approaches and hybrid paradigms. The authors emphasize large datasets with overlapping, intractable, or nonlinear boundary classes, and datasets that demonstrate granular computing in soft frameworks.Organized into eight chapters, the book begins with an introduction to PR, data mining, and knowledge discovery concepts. The authors analyze the tasks of multi-scale data condensation and dimensionality reduction, then explore the problem of learning with support vector machine (SVM). They conclude by highlighting the significance of granular computing for different mining tasks in a soft paradigm.

  6. Efficient frequent pattern mining algorithm based on node sets in cloud computing environment

    Science.gov (United States)

    Billa, V. N. Vinay Kumar; Lakshmanna, K.; Rajesh, K.; Reddy, M. Praveen Kumar; Nagaraja, G.; Sudheer, K.

    2017-11-01

    The ultimate goal of Data Mining is to determine the hidden information which is useful in making decisions using the large databases collected by an organization. This Data Mining involves many tasks that are to be performed during the process. Mining frequent itemsets is the one of the most important tasks in case of transactional databases. These transactional databases contain the data in very large scale where the mining of these databases involves the consumption of physical memory and time in proportion to the size of the database. A frequent pattern mining algorithm is said to be efficient only if it consumes less memory and time to mine the frequent itemsets from the given large database. Having these points in mind in this thesis we proposed a system which mines frequent itemsets in an optimized way in terms of memory and time by using cloud computing as an important factor to make the process parallel and the application is provided as a service. A complete framework which uses a proven efficient algorithm called FIN algorithm. FIN algorithm works on Nodesets and POC (pre-order coding) tree. In order to evaluate the performance of the system we conduct the experiments to compare the efficiency of the same algorithm applied in a standalone manner and in cloud computing environment on a real time data set which is traffic accidents data set. The results show that the memory consumption and execution time taken for the process in the proposed system is much lesser than those of standalone system.

  7. Handling Dynamic Weights in Weighted Frequent Pattern Mining

    Science.gov (United States)

    Ahmed, Chowdhury Farhan; Tanbeer, Syed Khairuzzaman; Jeong, Byeong-Soo; Lee, Young-Koo

    Even though weighted frequent pattern (WFP) mining is more effective than traditional frequent pattern mining because it can consider different semantic significances (weights) of items, existing WFP algorithms assume that each item has a fixed weight. But in real world scenarios, the weight (price or significance) of an item can vary with time. Reflecting these changes in item weight is necessary in several mining applications, such as retail market data analysis and web click stream analysis. In this paper, we introduce the concept of a dynamic weight for each item, and propose an algorithm, DWFPM (dynamic weighted frequent pattern mining), that makes use of this concept. Our algorithm can address situations where the weight (price or significance) of an item varies dynamically. It exploits a pattern growth mining technique to avoid the level-wise candidate set generation-and-test methodology. Furthermore, it requires only one database scan, so it is eligible for use in stream data mining. An extensive performance analysis shows that our algorithm is efficient and scalable for WFP mining using dynamic weights.

  8. A node linkage approach for sequential pattern mining.

    Directory of Open Access Journals (Sweden)

    Osvaldo Navarro

    Full Text Available Sequential Pattern Mining is a widely addressed problem in data mining, with applications such as analyzing Web usage, examining purchase behavior, and text mining, among others. Nevertheless, with the dramatic increase in data volume, the current approaches prove inefficient when dealing with large input datasets, a large number of different symbols and low minimum supports. In this paper, we propose a new sequential pattern mining algorithm, which follows a pattern-growth scheme to discover sequential patterns. Unlike most pattern growth algorithms, our approach does not build a data structure to represent the input dataset, but instead accesses the required sequences through pseudo-projection databases, achieving better runtime and reducing memory requirements. Our algorithm traverses the search space in a depth-first fashion and only preserves in memory a pattern node linkage and the pseudo-projections required for the branch being explored at the time. Experimental results show that our new approach, the Node Linkage Depth-First Traversal algorithm (NLDFT, has better performance and scalability in comparison with state of the art algorithms.

  9. An Incremental High-Utility Mining Algorithm with Transaction Insertion

    Science.gov (United States)

    Gan, Wensheng; Zhang, Binbin

    2015-01-01

    Association-rule mining is commonly used to discover useful and meaningful patterns from a very large database. It only considers the occurrence frequencies of items to reveal the relationships among itemsets. Traditional association-rule mining is, however, not suitable in real-world applications since the purchased items from a customer may have various factors, such as profit or quantity. High-utility mining was designed to solve the limitations of association-rule mining by considering both the quantity and profit measures. Most algorithms of high-utility mining are designed to handle the static database. Fewer researches handle the dynamic high-utility mining with transaction insertion, thus requiring the computations of database rescan and combination explosion of pattern-growth mechanism. In this paper, an efficient incremental algorithm with transaction insertion is designed to reduce computations without candidate generation based on the utility-list structures. The enumeration tree and the relationships between 2-itemsets are also adopted in the proposed algorithm to speed up the computations. Several experiments are conducted to show the performance of the proposed algorithm in terms of runtime, memory consumption, and number of generated patterns. PMID:25811038

  10. An Incremental High-Utility Mining Algorithm with Transaction Insertion

    Directory of Open Access Journals (Sweden)

    Jerry Chun-Wei Lin

    2015-01-01

    Full Text Available Association-rule mining is commonly used to discover useful and meaningful patterns from a very large database. It only considers the occurrence frequencies of items to reveal the relationships among itemsets. Traditional association-rule mining is, however, not suitable in real-world applications since the purchased items from a customer may have various factors, such as profit or quantity. High-utility mining was designed to solve the limitations of association-rule mining by considering both the quantity and profit measures. Most algorithms of high-utility mining are designed to handle the static database. Fewer researches handle the dynamic high-utility mining with transaction insertion, thus requiring the computations of database rescan and combination explosion of pattern-growth mechanism. In this paper, an efficient incremental algorithm with transaction insertion is designed to reduce computations without candidate generation based on the utility-list structures. The enumeration tree and the relationships between 2-itemsets are also adopted in the proposed algorithm to speed up the computations. Several experiments are conducted to show the performance of the proposed algorithm in terms of runtime, memory consumption, and number of generated patterns.

  11. Mining of high utility-probability sequential patterns from uncertain databases.

    Directory of Open Access Journals (Sweden)

    Binbin Zhang

    Full Text Available High-utility sequential pattern mining (HUSPM has become an important issue in the field of data mining. Several HUSPM algorithms have been designed to mine high-utility sequential patterns (HUPSPs. They have been applied in several real-life situations such as for consumer behavior analysis and event detection in sensor networks. Nonetheless, most studies on HUSPM have focused on mining HUPSPs in precise data. But in real-life, uncertainty is an important factor as data is collected using various types of sensors that are more or less accurate. Hence, data collected in a real-life database can be annotated with existing probabilities. This paper presents a novel pattern mining framework called high utility-probability sequential pattern mining (HUPSPM for mining high utility-probability sequential patterns (HUPSPs in uncertain sequence databases. A baseline algorithm with three optional pruning strategies is presented to mine HUPSPs. Moroever, to speed up the mining process, a projection mechanism is designed to create a database projection for each processed sequence, which is smaller than the original database. Thus, the number of unpromising candidates can be greatly reduced, as well as the execution time for mining HUPSPs. Substantial experiments both on real-life and synthetic datasets show that the designed algorithm performs well in terms of runtime, number of candidates, memory usage, and scalability for different minimum utility and minimum probability thresholds.

  12. An novel frequent probability pattern mining algorithm based on circuit simulation method in uncertain biological networks

    Science.gov (United States)

    2014-01-01

    Background Motif mining has always been a hot research topic in bioinformatics. Most of current research on biological networks focuses on exact motif mining. However, due to the inevitable experimental error and noisy data, biological network data represented as the probability model could better reflect the authenticity and biological significance, therefore, it is more biological meaningful to discover probability motif in uncertain biological networks. One of the key steps in probability motif mining is frequent pattern discovery which is usually based on the possible world model having a relatively high computational complexity. Methods In this paper, we present a novel method for detecting frequent probability patterns based on circuit simulation in the uncertain biological networks. First, the partition based efficient search is applied to the non-tree like subgraph mining where the probability of occurrence in random networks is small. Then, an algorithm of probability isomorphic based on circuit simulation is proposed. The probability isomorphic combines the analysis of circuit topology structure with related physical properties of voltage in order to evaluate the probability isomorphism between probability subgraphs. The circuit simulation based probability isomorphic can avoid using traditional possible world model. Finally, based on the algorithm of probability subgraph isomorphism, two-step hierarchical clustering method is used to cluster subgraphs, and discover frequent probability patterns from the clusters. Results The experiment results on data sets of the Protein-Protein Interaction (PPI) networks and the transcriptional regulatory networks of E. coli and S. cerevisiae show that the proposed method can efficiently discover the frequent probability subgraphs. The discovered subgraphs in our study contain all probability motifs reported in the experiments published in other related papers. Conclusions The algorithm of probability graph isomorphism

  13. Algorithms for Regular Tree Grammar Network Search and Their Application to Mining Human-viral Infection Patterns.

    Science.gov (United States)

    Smoly, Ilan; Carmel, Amir; Shemer-Avni, Yonat; Yeger-Lotem, Esti; Ziv-Ukelson, Michal

    2016-03-01

    Network querying is a powerful approach to mine molecular interaction networks. Most state-of-the-art network querying tools either confine the search to a prespecified topology in the form of some template subnetwork, or do not specify any topological constraints at all. Another approach is grammar-based queries, which are more flexible and expressive as they allow for expressing the topology of the sought pattern according to some grammar-based logic. Previous grammar-based network querying tools were confined to the identification of paths. In this article, we extend the patterns identified by grammar-based query approaches from paths to trees. For this, we adopt a higher order query descriptor in the form of a regular tree grammar (RTG). We introduce a novel problem and propose an algorithm to search a given graph for the k highest scoring subgraphs matching a tree accepted by an RTG. Our algorithm is based on the combination of dynamic programming with color coding, and includes an extension of previous k-best parsing optimization approaches to avoid isomorphic trees in the output. We implement the new algorithm and exemplify its application to mining viral infection patterns within molecular interaction networks. Our code is available online.

  14. An Evolutionary Algorithm to Mine High-Utility Itemsets

    Directory of Open Access Journals (Sweden)

    Jerry Chun-Wei Lin

    2015-01-01

    Full Text Available High-utility itemset mining (HUIM is a critical issue in recent years since it can be used to reveal the profitable products by considering both the quantity and profit factors instead of frequent itemset mining (FIM of association rules (ARs. In this paper, an evolutionary algorithm is presented to efficiently mine high-utility itemsets (HUIs based on the binary particle swarm optimization. A maximal pattern (MP-tree strcutrue is further designed to solve the combinational problem in the evolution process. Substantial experiments on real-life datasets show that the proposed binary PSO-based algorithm has better results compared to the state-of-the-art GA-based algorithm.

  15. A Mining Algorithm for Extracting Decision Process Data Models

    Directory of Open Access Journals (Sweden)

    Cristina-Claudia DOLEAN

    2011-01-01

    Full Text Available The paper introduces an algorithm that mines logs of user interaction with simulation software. It outputs a model that explicitly shows the data perspective of the decision process, namely the Decision Data Model (DDM. In the first part of the paper we focus on how the DDM is extracted by our mining algorithm. We introduce it as pseudo-code and, then, provide explanations and examples of how it actually works. In the second part of the paper, we use a series of small case studies to prove the robustness of the mining algorithm and how it deals with the most common patterns we found in real logs.

  16. Mining Co-Location Patterns with Clustering Items from Spatial Data Sets

    Science.gov (United States)

    Zhou, G.; Li, Q.; Deng, G.; Yue, T.; Zhou, X.

    2018-05-01

    The explosive growth of spatial data and widespread use of spatial databases emphasize the need for the spatial data mining. Co-location patterns discovery is an important branch in spatial data mining. Spatial co-locations represent the subsets of features which are frequently located together in geographic space. However, the appearance of a spatial feature C is often not determined by a single spatial feature A or B but by the two spatial features A and B, that is to say where A and B appear together, C often appears. We note that this co-location pattern is different from the traditional co-location pattern. Thus, this paper presents a new concept called clustering terms, and this co-location pattern is called co-location patterns with clustering items. And the traditional algorithm cannot mine this co-location pattern, so we introduce the related concept in detail and propose a novel algorithm. This algorithm is extended by join-based approach proposed by Huang. Finally, we evaluate the performance of this algorithm.

  17. A direct mining approach to efficient constrained graph pattern discovery

    DEFF Research Database (Denmark)

    Zhu, Feida; Zhang, Zequn; Qu, Qiang

    2013-01-01

    Despite the wealth of research on frequent graph pattern mining, how to efficiently mine the complete set of those with constraints still poses a huge challenge to the existing algorithms mainly due to the inherent bottleneck in the mining paradigm. In essence, mining requests with explicitly-spe...

  18. An algorithm, implementation and execution ontology design pattern

    NARCIS (Netherlands)

    Lawrynowicz, A.; Esteves, D.; Panov, P.; Soru, T.; Dzeroski, S.; Vanschoren, J.

    2016-01-01

    This paper describes an ontology design pattern for modeling algorithms, their implementations and executions. This pattern is derived from the research results on data mining/machine learning ontologies, but is more generic. We argue that the proposed pattern will foster the development of

  19. Developing and Implementing the Data Mining Algorithms in RAVEN

    International Nuclear Information System (INIS)

    Sen, Ramazan Sonat; Maljovec, Daniel Patrick; Alfonsi, Andrea; Rabiti, Cristian

    2015-01-01

    The RAVEN code is becoming a comprehensive tool to perform probabilistic risk assessment, uncertainty quantification, and verification and validation. The RAVEN code is being developed to support many programs and to provide a set of methodologies and algorithms for advanced analysis. Scientific computer codes can generate enormous amounts of data. To post-process and analyze such data might, in some cases, take longer than the initial software runtime. Data mining algorithms/methods help in recognizing and understanding patterns in the data, and thus discover knowledge in databases. The methodologies used in the dynamic probabilistic risk assessment or in uncertainty and error quantification analysis couple system/physics codes with simulation controller codes, such as RAVEN. RAVEN introduces both deterministic and stochastic elements into the simulation while the system/physics code model the dynamics deterministically. A typical analysis is performed by sampling values of a set of parameter values. A major challenge in using dynamic probabilistic risk assessment or uncertainty and error quantification analysis for a complex system is to analyze the large number of scenarios generated. Data mining techniques are typically used to better organize and understand data, i.e. recognizing patterns in the data. This report focuses on development and implementation of Application Programming Interfaces (APIs) for different data mining algorithms, and the application of these algorithms to different databases.

  20. Developing and Implementing the Data Mining Algorithms in RAVEN

    Energy Technology Data Exchange (ETDEWEB)

    Sen, Ramazan Sonat [Idaho National Lab. (INL), Idaho Falls, ID (United States); Maljovec, Daniel Patrick [Idaho National Lab. (INL), Idaho Falls, ID (United States); Alfonsi, Andrea [Idaho National Lab. (INL), Idaho Falls, ID (United States); Rabiti, Cristian [Idaho National Lab. (INL), Idaho Falls, ID (United States)

    2015-09-01

    The RAVEN code is becoming a comprehensive tool to perform probabilistic risk assessment, uncertainty quantification, and verification and validation. The RAVEN code is being developed to support many programs and to provide a set of methodologies and algorithms for advanced analysis. Scientific computer codes can generate enormous amounts of data. To post-process and analyze such data might, in some cases, take longer than the initial software runtime. Data mining algorithms/methods help in recognizing and understanding patterns in the data, and thus discover knowledge in databases. The methodologies used in the dynamic probabilistic risk assessment or in uncertainty and error quantification analysis couple system/physics codes with simulation controller codes, such as RAVEN. RAVEN introduces both deterministic and stochastic elements into the simulation while the system/physics code model the dynamics deterministically. A typical analysis is performed by sampling values of a set of parameter values. A major challenge in using dynamic probabilistic risk assessment or uncertainty and error quantification analysis for a complex system is to analyze the large number of scenarios generated. Data mining techniques are typically used to better organize and understand data, i.e. recognizing patterns in the data. This report focuses on development and implementation of Application Programming Interfaces (APIs) for different data mining algorithms, and the application of these algorithms to different databases.

  1. URL Mining Using Agglomerative Clustering Algorithm

    Directory of Open Access Journals (Sweden)

    Chinmay R. Deshmukh

    2015-02-01

    Full Text Available Abstract The tremendous growth of the web world incorporates application of data mining techniques to the web logs. Data Mining and World Wide Web encompasses an important and active area of research. Web log mining is analysis of web log files with web pages sequences. Web mining is broadly classified as web content mining web usage mining and web structure mining. Web usage mining is a technique to discover usage patterns from Web data in order to understand and better serve the needs of Web-based applications. URL mining refers to a subclass of Web mining that helps us to investigate the details of a Uniform Resource Locator. URL mining can be advantageous in the fields of security and protection. The paper introduces a technique for mining a collection of user transactions with an Internet search engine to discover clusters of similar queries and similar URLs. The information we exploit is a clickthrough data each record consist of a users query to a search engine along with the URL which the user selected from among the candidates offered by search engine. By viewing this dataset as a bipartite graph with the vertices on one side corresponding to queries and on the other side to URLs one can apply an agglomerative clustering algorithm to the graphs vertices to identify related queries and URLs.

  2. A partition enhanced mining algorithm for distributed association rule mining systems

    Directory of Open Access Journals (Sweden)

    A.O. Ogunde

    2015-11-01

    Full Text Available The extraction of patterns and rules from large distributed databases through existing Distributed Association Rule Mining (DARM systems is still faced with enormous challenges such as high response times, high communication costs and inability to adapt to the constantly changing databases. In this work, a Partition Enhanced Mining Algorithm (PEMA is presented to address these problems. In PEMA, the Association Rule Mining Coordinating Agent receives a request and decides the appropriate data sites, partitioning strategy and mining agents to use. The mining process is divided into two stages. In the first stage, the data agents horizontally segment the databases with small average transaction length into relatively smaller partitions based on the number of available sites and the available memory. On the other hand, databases with relatively large average transaction length were vertically partitioned. After this, Mobile Agent-Based Association Rule Mining-Agents, which are the mining agents, carry out the discovery of the local frequent itemsets. At the second stage, the local frequent itemsets were incrementally integrated by the from one data site to another to get the global frequent itemsets. This reduced the response time and communication cost in the system. Results from experiments conducted on real datasets showed that the average response time of PEMA showed an improvement over existing algorithms. Similarly, PEMA incurred lower communication costs with average size of messages exchanged lower when compared with benchmark DARM systems. This result showed that PEMA could be efficiently deployed for efficient discovery of valuable knowledge in distributed databases.

  3. Efficient constraint-based Sequential Pattern Mining (SPM algorithm to understand customers’ buying behaviour from time stamp-based sequence dataset

    Directory of Open Access Journals (Sweden)

    Niti Ashish Kumar Desai

    2015-12-01

    Full Text Available Business Strategies are formulated based on an understanding of customer needs. This requires development of a strategy to understand customer behaviour and buying patterns, both current and future. This involves understanding, first how an organization currently understands customer needs and second predicting future trends to drive growth. This article focuses on purchase trend of customer, where timing of purchase is more important than association of item to be purchased, and which can be found out with Sequential Pattern Mining (SPM methods. Conventional SPM algorithms worked purely on frequency identifying patterns that were more frequent but suffering from challenges like generation of huge number of uninteresting patterns, lack of user’s interested patterns, rare item problem, etc. Article attempts a solution through development of a SPM algorithm based on various constraints like Gap, Compactness, Item, Recency, Profitability and Length along with Frequency constraint. Incorporation of six additional constraints is as well to ensure that all patterns are recently active (Recency, active for certain time span (Compactness, profitable and indicative of next timeline for purchase (Length―Item―Gap. The article also attempts to throw light on how proposed Constraint-based Prefix Span algorithm is helpful to understand buying behaviour of customer which is in formative stage.

  4. An efficient, versatile and scalable pattern growth approach to mine frequent patterns in unaligned protein sequences.

    Science.gov (United States)

    Ye, Kai; Kosters, Walter A; Ijzerman, Adriaan P

    2007-03-15

    Pattern discovery in protein sequences is often based on multiple sequence alignments (MSA). The procedure can be computationally intensive and often requires manual adjustment, which may be particularly difficult for a set of deviating sequences. In contrast, two algorithms, PRATT2 (http//www.ebi.ac.uk/pratt/) and TEIRESIAS (http://cbcsrv.watson.ibm.com/) are used to directly identify frequent patterns from unaligned biological sequences without an attempt to align them. Here we propose a new algorithm with more efficiency and more functionality than both PRATT2 and TEIRESIAS, and discuss some of its applications to G protein-coupled receptors, a protein family of important drug targets. In this study, we designed and implemented six algorithms to mine three different pattern types from either one or two datasets using a pattern growth approach. We compared our approach to PRATT2 and TEIRESIAS in efficiency, completeness and the diversity of pattern types. Compared to PRATT2, our approach is faster, capable of processing large datasets and able to identify the so-called type III patterns. Our approach is comparable to TEIRESIAS in the discovery of the so-called type I patterns but has additional functionality such as mining the so-called type II and type III patterns and finding discriminating patterns between two datasets. The source code for pattern growth algorithms and their pseudo-code are available at http://www.liacs.nl/home/kosters/pg/.

  5. The Top Ten Algorithms in Data Mining

    CERN Document Server

    Wu, Xindong

    2009-01-01

    From classification and clustering to statistical learning, association analysis, and link mining, this book covers the most important topics in data mining research. It presents the ten most influential algorithms used in the data mining community today. Each chapter provides a detailed description of the algorithm, a discussion of available software implementation, advanced topics, and exercises. With a simple data set, examples illustrate how each algorithm works and highlight the overall performance of each algorithm in a real-world application. Featuring contributions from leading researc

  6. A New Fast Vertical Method for Mining Frequent Patterns

    Directory of Open Access Journals (Sweden)

    Zhihong Deng

    2010-12-01

    Full Text Available Vertical mining methods are very effective for mining frequent patterns and usually outperform horizontal mining methods. However, the vertical methods become ineffective since the intersection time starts to be costly when the cardinality of tidset (tid-list or diffset is very large or there are a very large number of transactions. In this paper, we propose a novel vertical algorithm called PPV for fast frequent pattern discovery. PPV works based on a data structure called Node-lists, which is obtained from a coding prefix-tree called PPC-tree. The efficiency of PPV is achieved with three techniques. First, the Node-list is much more compact compared with previous proposed vertical structure (such as tid-lists or diffsets since transactions with common prefixes share the same nodes of the PPC-tree. Second, the counting of support is transformed into the intersection of Node-lists and the complexity of intersecting two Node-lists can be reduced to O(m+n by an efficient strategy, where m and n are the cardinalities of the two Node-lists respectively. Third, the ancestor-descendant relationship of two nodes, which is the basic step of intersecting Node-lists, can be very efficiently verified by Pre-Post codes of nodes. We experimentally compare our algorithm with FP-growth, and two prominent vertical algorithms (Eclat and dEclat on a number of databases. The experimental results show that PPV is an efficient algorithm that outperforms FP-growth, Eclat, and dEclat.

  7. Spatio-Temporal Pattern Mining on Trajectory Data Using Arm

    Science.gov (United States)

    Khoshahval, S.; Farnaghi, M.; Taleai, M.

    2017-09-01

    Preliminary mobile was considered to be a device to make human connections easier. But today the consumption of this device has been evolved to a platform for gaming, web surfing and GPS-enabled application capabilities. Embedding GPS in handheld devices, altered them to significant trajectory data gathering facilities. Raw GPS trajectory data is a series of points which contains hidden information. For revealing hidden information in traces, trajectory data analysis is needed. One of the most beneficial concealed information in trajectory data is user activity patterns. In each pattern, there are multiple stops and moves which identifies users visited places and tasks. This paper proposes an approach to discover user daily activity patterns from GPS trajectories using association rules. Finding user patterns needs extraction of user's visited places from stops and moves of GPS trajectories. In order to locate stops and moves, we have implemented a place recognition algorithm. After extraction of visited points an advanced association rule mining algorithm, called Apriori was used to extract user activity patterns. This study outlined that there are useful patterns in each trajectory that can be emerged from raw GPS data using association rule mining techniques in order to find out about multiple users' behaviour in a system and can be utilized in various location-based applications.

  8. The Smallest Valid Extension-Based Efficient, Rare Graph Pattern Mining, Considering Length-Decreasing Support Constraints and Symmetry Characteristics of Graphs

    Directory of Open Access Journals (Sweden)

    Unil Yun

    2016-05-01

    Full Text Available Frequent graph mining has been proposed to find interesting patterns (i.e., frequent sub-graphs from databases composed of graph transaction data, which can effectively express complex and large data in the real world. In addition, various applications for graph mining have been suggested. Traditional graph pattern mining methods use a single minimum support threshold factor in order to check whether or not mined patterns are interesting. However, it is not a sufficient factor that can consider valuable characteristics of graphs such as graph sizes and features of graph elements. That is, previous methods cannot consider such important characteristics in their mining operations since they only use a fixed minimum support threshold in the mining process. For this reason, in this paper, we propose a novel graph mining algorithm that can consider various multiple, minimum support constraints according to the types of graph elements and changeable minimum support conditions, depending on lengths of graph patterns. In addition, the proposed algorithm performs in mining operations more efficiently because it can minimize duplicated operations and computational overheads by considering symmetry features of graphs. Experimental results provided in this paper demonstrate that the proposed algorithm outperforms previous mining approaches in terms of pattern generation, runtime and memory usage.

  9. Rare itemsets mining algorithm based on RP-Tree and spark framework

    Science.gov (United States)

    Liu, Sainan; Pan, Haoan

    2018-05-01

    For the issues of the rare itemsets mining in big data, this paper proposed a rare itemsets mining algorithm based on RP-Tree and Spark framework. Firstly, it arranged the data vertically according to the transaction identifier, in order to solve the defects of scan the entire data set, the vertical datasets are divided into frequent vertical datasets and rare vertical datasets. Then, it adopted the RP-Tree algorithm to construct the frequent pattern tree that contains rare items and generate rare 1-itemsets. After that, it calculated the support of the itemsets by scanning the two vertical data sets, finally, it used the iterative process to generate rare itemsets. The experimental show that the algorithm can effectively excavate rare itemsets and have great superiority in execution time.

  10. SPATIO-TEMPORAL PATTERN MINING ON TRAJECTORY DATA USING ARM

    Directory of Open Access Journals (Sweden)

    S. Khoshahval

    2017-09-01

    Full Text Available Preliminary mobile was considered to be a device to make human connections easier. But today the consumption of this device has been evolved to a platform for gaming, web surfing and GPS-enabled application capabilities. Embedding GPS in handheld devices, altered them to significant trajectory data gathering facilities. Raw GPS trajectory data is a series of points which contains hidden information. For revealing hidden information in traces, trajectory data analysis is needed. One of the most beneficial concealed information in trajectory data is user activity patterns. In each pattern, there are multiple stops and moves which identifies users visited places and tasks. This paper proposes an approach to discover user daily activity patterns from GPS trajectories using association rules. Finding user patterns needs extraction of user’s visited places from stops and moves of GPS trajectories. In order to locate stops and moves, we have implemented a place recognition algorithm. After extraction of visited points an advanced association rule mining algorithm, called Apriori was used to extract user activity patterns. This study outlined that there are useful patterns in each trajectory that can be emerged from raw GPS data using association rule mining techniques in order to find out about multiple users’ behaviour in a system and can be utilized in various location-based applications.

  11. An Efficient Approach to Mining Maximal Contiguous Frequent Patterns from Large DNA Sequence Databases

    Directory of Open Access Journals (Sweden)

    Md. Rezaul Karim

    2012-03-01

    Full Text Available Mining interesting patterns from DNA sequences is one of the most challenging tasks in bioinformatics and computational biology. Maximal contiguous frequent patterns are preferable for expressing the function and structure of DNA sequences and hence can capture the common data characteristics among related sequences. Biologists are interested in finding frequent orderly arrangements of motifs that are responsible for similar expression of a group of genes. In order to reduce mining time and complexity, however, most existing sequence mining algorithms either focus on finding short DNA sequences or require explicit specification of sequence lengths in advance. The challenge is to find longer sequences without specifying sequence lengths in advance. In this paper, we propose an efficient approach to mining maximal contiguous frequent patterns from large DNA sequence datasets. The experimental results show that our proposed approach is memory-efficient and mines maximal contiguous frequent patterns within a reasonable time.

  12. Large-Scale Constraint-Based Pattern Mining

    Science.gov (United States)

    Zhu, Feida

    2009-01-01

    We studied the problem of constraint-based pattern mining for three different data formats, item-set, sequence and graph, and focused on mining patterns of large sizes. Colossal patterns in each data formats are studied to discover pruning properties that are useful for direct mining of these patterns. For item-set data, we observed robustness of…

  13. Randomized algorithms in automatic control and data mining

    CERN Document Server

    Granichin, Oleg; Toledano-Kitai, Dvora

    2015-01-01

    In the fields of data mining and control, the huge amount of unstructured data and the presence of uncertainty in system descriptions have always been critical issues. The book Randomized Algorithms in Automatic Control and Data Mining introduces the readers to the fundamentals of randomized algorithm applications in data mining (especially clustering) and in automatic control synthesis. The methods proposed in this book guarantee that the computational complexity of classical algorithms and the conservativeness of standard robust control techniques will be reduced. It is shown that when a problem requires "brute force" in selecting among options, algorithms based on random selection of alternatives offer good results with certain probability for a restricted time and significantly reduce the volume of operations.

  14. Classification of Internet banking customers using data mining algorithms

    Directory of Open Access Journals (Sweden)

    Reza Radfar

    2014-03-01

    Full Text Available Classifying customers using data mining algorithms, enables banks to keep old customers loyality while attracting new ones. Using decision tree as a data mining technique, we can optimize customer classification provided that the appropriate decision tree is selected. In this article we have presented an appropriate model to classify customers who use internet banking service. The model is developed based on CRISP-DM standard and we have used real data of Sina bank’s Internet bank. In compare to other decision trees, ours is based on both optimization and accuracy factors that recognizes new potential internet banking customers using a three level classification, which is low/medium and high. This is a practical, documentary-based research. Mining customer rules enables managers to make policies based on found out patterns in order to have a better perception of what customers really desire.

  15. Mining algorithm for association rules in big data based on Hadoop

    Science.gov (United States)

    Fu, Chunhua; Wang, Xiaojing; Zhang, Lijun; Qiao, Liying

    2018-04-01

    In order to solve the problem that the traditional association rules mining algorithm has been unable to meet the mining needs of large amount of data in the aspect of efficiency and scalability, take FP-Growth as an example, the algorithm is realized in the parallelization based on Hadoop framework and Map Reduce model. On the basis, it is improved using the transaction reduce method for further enhancement of the algorithm's mining efficiency. The experiment, which consists of verification of parallel mining results, comparison on efficiency between serials and parallel, variable relationship between mining time and node number and between mining time and data amount, is carried out in the mining results and efficiency by Hadoop clustering. Experiments show that the paralleled FP-Growth algorithm implemented is able to accurately mine frequent item sets, with a better performance and scalability. It can be better to meet the requirements of big data mining and efficiently mine frequent item sets and association rules from large dataset.

  16. Quantum algorithm for association rules mining

    Science.gov (United States)

    Yu, Chao-Hua; Gao, Fei; Wang, Qing-Le; Wen, Qiao-Yan

    2016-10-01

    Association rules mining (ARM) is one of the most important problems in knowledge discovery and data mining. Given a transaction database that has a large number of transactions and items, the task of ARM is to acquire consumption habits of customers by discovering the relationships between itemsets (sets of items). In this paper, we address ARM in the quantum settings and propose a quantum algorithm for the key part of ARM, finding frequent itemsets from the candidate itemsets and acquiring their supports. Specifically, for the case in which there are Mf(k ) frequent k -itemsets in the Mc(k ) candidate k -itemsets (Mf(k )≤Mc(k ) ), our algorithm can efficiently mine these frequent k -itemsets and estimate their supports by using parallel amplitude estimation and amplitude amplification with complexity O (k/√{Mc(k )Mf(k ) } ɛ ) , where ɛ is the error for estimating the supports. Compared with the classical counterpart, i.e., the classical sampling-based algorithm, whose complexity is O (k/Mc(k ) ɛ2) , our quantum algorithm quadratically improves the dependence on both ɛ and Mc(k ) in the best case when Mf(k )≪Mc(k ) and on ɛ alone in the worst case when Mf(k )≈Mc(k ) .

  17. Contrast data mining concepts, algorithms, and applications

    CERN Document Server

    Dong, Guozhu

    2012-01-01

    A Fruitful Field for Researching Data Mining Methodology and for Solving Real-Life Problems Contrast Data Mining: Concepts, Algorithms, and Applications collects recent results from this specialized area of data mining that have previously been scattered in the literature, making them more accessible to researchers and developers in data mining and other fields. The book not only presents concepts and techniques for contrast data mining, but also explores the use of contrast mining to solve challenging problems in various scientific, medical, and business domains. Learn from Real Case Studies

  18. A Survey on Accessing Data over Cloud Environment using Data mining Algorithms

    OpenAIRE

    B.Prasanalakshmi; A.Selvaraj

    2015-01-01

    In today's world to access the large set of data is more complex, because the data may be structured and unstructured like in the form of text, images, videos, etc., it cannot be controlled from the internet users this is known as Big data. Useful data can be accessed through extracting from big data with the help of data mining algorithms. Data mining is a technique for determine the patterns; classify the data, clustering from the large set of data. In this paper we will discuss how large s...

  19. Making Pattern Mining Useful

    NARCIS (Netherlands)

    Vreeken, J.

    2009-01-01

    The discovery of patterns plays an important role in data mining. A pattern can be any type of regularity displayed in that data, such as, e.g. which items are typically sold together, which genes are mostly active for patients of a certain disease, etc, etc. Generally speaking, finding a pattern is

  20. Recommending Learning Activities in Social Network Using Data Mining Algorithms

    Science.gov (United States)

    Mahnane, Lamia

    2017-01-01

    In this paper, we show how data mining algorithms (e.g. Apriori Algorithm (AP) and Collaborative Filtering (CF)) is useful in New Social Network (NSN-AP-CF). "NSN-AP-CF" processes the clusters based on different learning styles. Next, it analyzes the habits and the interests of the users through mining the frequent episodes by the…

  1. A MapReduce-Based Parallel Frequent Pattern Growth Algorithm for Spatiotemporal Association Analysis of Mobile Trajectory Big Data

    Directory of Open Access Journals (Sweden)

    Dawen Xia

    2018-01-01

    Full Text Available Frequent pattern mining is an effective approach for spatiotemporal association analysis of mobile trajectory big data in data-driven intelligent transportation systems. While existing parallel algorithms have been successfully applied to frequent pattern mining of large-scale trajectory data, two major challenges are how to overcome the inherent defects of Hadoop to cope with taxi trajectory big data including massive small files and how to discover the implicitly spatiotemporal frequent patterns with MapReduce. To conquer these challenges, this paper presents a MapReduce-based Parallel Frequent Pattern growth (MR-PFP algorithm to analyze the spatiotemporal characteristics of taxi operating using large-scale taxi trajectories with massive small file processing strategies on a Hadoop platform. More specifically, we first implement three methods, that is, Hadoop Archives (HAR, CombineFileInputFormat (CFIF, and Sequence Files (SF, to overcome the existing defects of Hadoop and then propose two strategies based on their performance evaluations. Next, we incorporate SF into Frequent Pattern growth (FP-growth algorithm and then implement the optimized FP-growth algorithm on a MapReduce framework. Finally, we analyze the characteristics of taxi operating in both spatial and temporal dimensions by MR-PFP in parallel. The results demonstrate that MR-PFP is superior to existing Parallel FP-growth (PFP algorithm in efficiency and scalability.

  2. pubmed.mineR: An R package with text-mining algorithms to ...

    Indian Academy of Sciences (India)

    2015-09-29

    Sep 29, 2015 ... using text-mining algorithms for biomedical research pur- poses. ... studies are described to illustrate some potential uses of ... This is the most applied task. ... other alphabets (for example, Greek alphabets) and hyphens.

  3. pubmed.mineR: an R package with text-mining algorithms to analyse PubMed abstracts.

    Science.gov (United States)

    Rani, Jyoti; Shah, A B Rauf; Ramachandran, Srinivasan

    2015-10-01

    The PubMed literature database is a valuable source of information for scientific research. It is rich in biomedical literature with more than 24 million citations. Data-mining of voluminous literature is a challenging task. Although several text-mining algorithms have been developed in recent years with focus on data visualization, they have limitations such as speed, are rigid and are not available in the open source. We have developed an R package, pubmed.mineR, wherein we have combined the advantages of existing algorithms, overcome their limitations, and offer user flexibility and link with other packages in Bioconductor and the Comprehensive R Network (CRAN) in order to expand the user capabilities for executing multifaceted approaches. Three case studies are presented, namely, 'Evolving role of diabetes educators', 'Cancer risk assessment' and 'Dynamic concepts on disease and comorbidity' to illustrate the use of pubmed.mineR. The package generally runs fast with small elapsed times in regular workstations even on large corpus sizes and with compute intensive functions. The pubmed.mineR is available at http://cran.rproject. org/web/packages/pubmed.mineR.

  4. Using an improved association rules mining optimization algorithm in web-based mobile-learning system

    Science.gov (United States)

    Huang, Yin; Chen, Jianhua; Xiong, Shaojun

    2009-07-01

    Mobile-Learning (M-learning) makes many learners get the advantages of both traditional learning and E-learning. Currently, Web-based Mobile-Learning Systems have created many new ways and defined new relationships between educators and learners. Association rule mining is one of the most important fields in data mining and knowledge discovery in databases. Rules explosion is a serious problem which causes great concerns, as conventional mining algorithms often produce too many rules for decision makers to digest. Since Web-based Mobile-Learning System collects vast amounts of student profile data, data mining and knowledge discovery techniques can be applied to find interesting relationships between attributes of learners, assessments, the solution strategies adopted by learners and so on. Therefore ,this paper focus on a new data-mining algorithm, combined with the advantages of genetic algorithm and simulated annealing algorithm , called ARGSA(Association rules based on an improved Genetic Simulated Annealing Algorithm), to mine the association rules. This paper first takes advantage of the Parallel Genetic Algorithm and Simulated Algorithm designed specifically for discovering association rules. Moreover, the analysis and experiment are also made to show the proposed method is superior to the Apriori algorithm in this Mobile-Learning system.

  5. Towards an evaluation framework for process mining algorithms

    NARCIS (Netherlands)

    Rozinat, A.; Alves De Medeiros, A.K.; Günther, C.W.; Weijters, A.J.M.M.; Aalst, van der W.M.P.

    2007-01-01

    Although there has been a lot of progress in developing process mining algorithms in recent years, no effort has been put in developing a common means of assessing the quality of the models discovered by these algorithms. In this paper, we outline elements of an evaluation framework that is intended

  6. An algorithm of opinion leaders mining based on signed network

    Science.gov (United States)

    Cao, Linlin; Zheng, Mingchun; Zhang, Yuanyuan; Zhang, Fuming

    2018-04-01

    With the rapid development of mobile Internet, user gradually become the leader of social media, the abruptly rise of new media has changed the traditional information's dissemination pattern and regularity. There is new era significance of opinion leaders, gatekeepers in the classical theory of mass communication, and it has further expansion and extension to a certain extent. In the existing mining of opinion leaders, it is mainly from the research of network structure and user behavior without considering an important attribute: whether the user has a real impact. In this paper, we take the symbolic network as the research tool, by giving symbol which correspondingly represents support or oppose to the link about point of view relationship between users and combining traditional algorithms of mining with symbolism which can describe the change of view between users, we will get the opinion leader who has real impact on users, then the result is more accurate and effective.

  7. Locating previously unknown patterns in data-mining results: a dual data- and knowledge-mining method

    Directory of Open Access Journals (Sweden)

    Knaus William A

    2006-03-01

    Full Text Available Abstract Background Data mining can be utilized to automate analysis of substantial amounts of data produced in many organizations. However, data mining produces large numbers of rules and patterns, many of which are not useful. Existing methods for pruning uninteresting patterns have only begun to automate the knowledge acquisition step (which is required for subjective measures of interestingness, hence leaving a serious bottleneck. In this paper we propose a method for automatically acquiring knowledge to shorten the pattern list by locating the novel and interesting ones. Methods The dual-mining method is based on automatically comparing the strength of patterns mined from a database with the strength of equivalent patterns mined from a relevant knowledgebase. When these two estimates of pattern strength do not match, a high "surprise score" is assigned to the pattern, identifying the pattern as potentially interesting. The surprise score captures the degree of novelty or interestingness of the mined pattern. In addition, we show how to compute p values for each surprise score, thus filtering out noise and attaching statistical significance. Results We have implemented the dual-mining method using scripts written in Perl and R. We applied the method to a large patient database and a biomedical literature citation knowledgebase. The system estimated association scores for 50,000 patterns, composed of disease entities and lab results, by querying the database and the knowledgebase. It then computed the surprise scores by comparing the pairs of association scores. Finally, the system estimated statistical significance of the scores. Conclusion The dual-mining method eliminates more than 90% of patterns with strong associations, thus identifying them as uninteresting. We found that the pruning of patterns using the surprise score matched the biomedical evidence in the 100 cases that were examined by hand. The method automates the acquisition of

  8. A Framework for Mining Actionable Navigation Patterns from In-Store RFID Datasets via Indoor Mapping

    Science.gov (United States)

    Shen, Bin; Zheng, Qiuhua; Li, Xingsen; Xu, Libo

    2015-01-01

    With the quick development of RFID technology and the decreasing prices of RFID devices, RFID is becoming widely used in various intelligent services. Especially in the retail application domain, RFID is increasingly adopted to capture the shopping tracks and behavior of in-store customers. To further enhance the potential of this promising application, in this paper, we propose a unified framework for RFID-based path analytics, which uses both in-store shopping paths and RFID-based purchasing data to mine actionable navigation patterns. Four modules of this framework are discussed, which are: (1) mapping from the physical space to the cyber space, (2) data preprocessing, (3) pattern mining and (4) knowledge understanding and utilization. In the data preprocessing module, the critical problem of how to capture the mainstream shopping path sequences while wiping out unnecessary redundant and repeated details is addressed in detail. To solve this problem, two types of redundant patterns, i.e., loop repeat pattern and palindrome-contained pattern are recognized and the corresponding processing algorithms are proposed. The experimental results show that the redundant pattern filtering functions are effective and scalable. Overall, this work builds a bridge between indoor positioning and advanced data mining technologies, and provides a feasible way to study customers’ shopping behaviors via multi-source RFID data. PMID:25751076

  9. Effective application of improved profit-mining algorithm for the interday trading model.

    Science.gov (United States)

    Hsieh, Yu-Lung; Yang, Don-Lin; Wu, Jungpin

    2014-01-01

    Many real world applications of association rule mining from large databases help users make better decisions. However, they do not work well in financial markets at this time. In addition to a high profit, an investor also looks for a low risk trading with a better rate of winning. The traditional approach of using minimum confidence and support thresholds needs to be changed. Based on an interday model of trading, we proposed effective profit-mining algorithms which provide investors with profit rules including information about profit, risk, and winning rate. Since profit-mining in the financial market is still in its infant stage, it is important to detail the inner working of mining algorithms and illustrate the best way to apply them. In this paper we go into details of our improved profit-mining algorithm and showcase effective applications with experiments using real world trading data. The results show that our approach is practical and effective with good performance for various datasets.

  10. Effective Application of Improved Profit-Mining Algorithm for the Interday Trading Model

    Directory of Open Access Journals (Sweden)

    Yu-Lung Hsieh

    2014-01-01

    Full Text Available Many real world applications of association rule mining from large databases help users make better decisions. However, they do not work well in financial markets at this time. In addition to a high profit, an investor also looks for a low risk trading with a better rate of winning. The traditional approach of using minimum confidence and support thresholds needs to be changed. Based on an interday model of trading, we proposed effective profit-mining algorithms which provide investors with profit rules including information about profit, risk, and winning rate. Since profit-mining in the financial market is still in its infant stage, it is important to detail the inner working of mining algorithms and illustrate the best way to apply them. In this paper we go into details of our improved profit-mining algorithm and showcase effective applications with experiments using real world trading data. The results show that our approach is practical and effective with good performance for various datasets.

  11. Research on Health State Perception Algorithm of Mining Equipment Based on Frequency Closeness

    Directory of Open Access Journals (Sweden)

    Gang Wang

    2014-06-01

    Full Text Available The health state perception of mining equipment is intended to have an online real- time knowledge and analysis of the running conditions of large mining equipments. Due to its unknown failure mode, a challenge was raised to the traditional fault diagnosis of mining equipments. A health state perception algorithm of mining equipment was introduced in this paper, and through continuous sampling of the machine vibration data, the time-series data set was set up; subsequently, the mode set based on the frequency closeness was constructed by the d neighborhood method combined with the TSDM algorithm, thus the forecast method on the basis of the dual mode set was eventually formed. In the calculation of the frequency closeness, the Goertzel algorithm was introduced to effectively decrease the computation amount. It was indicated through the simulation test on the vibration data of the drum shaft base that the health state of the device could be effectively distinguished. The algorithm has been successfully applied to equipment monitoring in the Huoer Xinhe Coal Mine of Shanxi Coal Imp&Exp. Group Co., Ltd.

  12. Study on the Method of Association Rules Mining Based on Genetic Algorithm and Application in Analysis of Seawater Samples

    Directory of Open Access Journals (Sweden)

    Qiuhong Sun

    2014-04-01

    Full Text Available Based on the data mining research, the data mining based on genetic algorithm method, the genetic algorithm is briefly introduced, while the genetic algorithm based on two important theories and theoretical templates principle implicit parallelism is also discussed. Focuses on the application of genetic algorithms for association rule mining method based on association rule mining, this paper proposes a genetic algorithm fitness function structure, data encoding, such as the title of the improvement program, in particular through the early issues study, proposed the improved adaptive Pc, Pm algorithm is applied to the genetic algorithm, thereby improving efficiency of the algorithm. Finally, a genetic algorithm based association rule mining algorithm, and be applied in sea water samples database in data mining and prove its effective.

  13. Large Scale Frequent Pattern Mining using MPI One-Sided Model

    Energy Technology Data Exchange (ETDEWEB)

    Vishnu, Abhinav; Agarwal, Khushbu

    2015-09-08

    In this paper, we propose a work-stealing runtime --- Library for Work Stealing LibWS --- using MPI one-sided model for designing scalable FP-Growth --- {\\em de facto} frequent pattern mining algorithm --- on large scale systems. LibWS provides locality efficient and highly scalable work-stealing techniques for load balancing on a variety of data distributions. We also propose a novel communication algorithm for FP-growth data exchange phase, which reduces the communication complexity from state-of-the-art O(p) to O(f + p/f) for p processes and f frequent attributed-ids. FP-Growth is implemented using LibWS and evaluated on several work distributions and support counts. An experimental evaluation of the FP-Growth on LibWS using 4096 processes on an InfiniBand Cluster demonstrates excellent efficiency for several work distributions (87\\% efficiency for Power-law and 91% for Poisson). The proposed distributed FP-Tree merging algorithm provides 38x communication speedup on 4096 cores.

  14. A Framework for Mining Actionable Navigation Patterns from In-Store RFID Datasets via Indoor Mapping

    Directory of Open Access Journals (Sweden)

    Bin Shen

    2015-03-01

    Full Text Available With the quick development of RFID technology and the decreasing prices of RFID devices, RFID is becoming widely used in various intelligent services. Especially in the retail application domain, RFID is increasingly adopted to capture the shopping tracks and behavior of in-store customers. To further enhance the potential of this promising application, in this paper, we propose a unified framework for RFID-based path analytics, which uses both in-store shopping paths and RFID-based purchasing data to mine actionable navigation patterns. Four modules of this framework are discussed, which are: (1 mapping from the physical space to the cyber space, (2 data preprocessing, (3 pattern mining and (4 knowledge understanding and utilization. In the data preprocessing module, the critical problem of how to capture the mainstream shopping path sequences while wiping out unnecessary redundant and repeated details is addressed in detail. To solve this problem, two types of redundant patterns, i.e., loop repeat pattern and palindrome-contained pattern are recognized and the corresponding processing algorithms are proposed. The experimental results show that the redundant pattern filtering functions are effective and scalable. Overall, this work builds a bridge between indoor positioning and advanced data mining technologies, and provides a feasible way to study customers’ shopping behaviors via multi-source RFID data.

  15. On the Suitability of Genetic-Based Algorithms for Data Mining

    NARCIS (Netherlands)

    Choenni, R.S.

    1998-01-01

    Data mining has as goal to extract knowledge from large databases. A database may be considered as a search space consisting of an enormous number of elements, and a mining algorithm as a search strategy. In general, an exhaustive search of the space is infeasible. Therefore, efficient search

  16. Mining association patterns of drug-interactions using post marketing FDA's spontaneous reporting data.

    Science.gov (United States)

    Ibrahim, Heba; Saad, Amr; Abdo, Amany; Sharaf Eldin, A

    2016-04-01

    Pharmacovigilance (PhV) is an important clinical activity with strong implications for population health and clinical research. The main goal of PhV is the timely detection of adverse drug events (ADEs) that are novel in their clinical nature, severity and/or frequency. Drug interactions (DI) pose an important problem in the development of new drugs and post marketing PhV that contribute to 6-30% of all unexpected ADEs. Therefore, the early detection of DI is vital. Spontaneous reporting systems (SRS) have served as the core data collection system for post marketing PhV since the 1960s. The main objective of our study was to particularly identify signals of DI from SRS. In addition, we are presenting an optimized tailored mining algorithm called "hybrid Apriori". The proposed algorithm is based on an optimized and modified association rule mining (ARM) approach. A hybrid Apriori algorithm has been applied to the SRS of the United States Food and Drug Administration's (U.S. FDA) adverse events reporting system (FAERS) in order to extract significant association patterns of drug interaction-adverse event (DIAE). We have assessed the resulting DIAEs qualitatively and quantitatively using two different triage features: a three-element taxonomy and three performance metrics. These features were applied on two random samples of 100 interacting and 100 non-interacting DIAE patterns. Additionally, we have employed logistic regression (LR) statistic method to quantify the magnitude and direction of interactions in order to test for confounding by co-medication in unknown interacting DIAE patterns. Hybrid Apriori extracted 2933 interacting DIAE patterns (including 1256 serious ones) and 530 non-interacting DIAE patterns. Referring to the current knowledge using four different reliable resources of DI, the results showed that the proposed method can extract signals of serious interacting DIAEs. Various association patterns could be identified based on the relationships among

  17. Distributed Storage Algorithm for Geospatial Image Data Based on Data Access Patterns.

    Directory of Open Access Journals (Sweden)

    Shaoming Pan

    Full Text Available Declustering techniques are widely used in distributed environments to reduce query response time through parallel I/O by splitting large files into several small blocks and then distributing those blocks among multiple storage nodes. Unfortunately, however, many small geospatial image data files cannot be further split for distributed storage. In this paper, we propose a complete theoretical system for the distributed storage of small geospatial image data files based on mining the access patterns of geospatial image data using their historical access log information. First, an algorithm is developed to construct an access correlation matrix based on the analysis of the log information, which reveals the patterns of access to the geospatial image data. Then, a practical heuristic algorithm is developed to determine a reasonable solution based on the access correlation matrix. Finally, a number of comparative experiments are presented, demonstrating that our algorithm displays a higher total parallel access probability than those of other algorithms by approximately 10-15% and that the performance can be further improved by more than 20% by simultaneously applying a copy storage strategy. These experiments show that the algorithm can be applied in distributed environments to help realize parallel I/O and thereby improve system performance.

  18. Distributed Storage Algorithm for Geospatial Image Data Based on Data Access Patterns.

    Science.gov (United States)

    Pan, Shaoming; Li, Yongkai; Xu, Zhengquan; Chong, Yanwen

    2015-01-01

    Declustering techniques are widely used in distributed environments to reduce query response time through parallel I/O by splitting large files into several small blocks and then distributing those blocks among multiple storage nodes. Unfortunately, however, many small geospatial image data files cannot be further split for distributed storage. In this paper, we propose a complete theoretical system for the distributed storage of small geospatial image data files based on mining the access patterns of geospatial image data using their historical access log information. First, an algorithm is developed to construct an access correlation matrix based on the analysis of the log information, which reveals the patterns of access to the geospatial image data. Then, a practical heuristic algorithm is developed to determine a reasonable solution based on the access correlation matrix. Finally, a number of comparative experiments are presented, demonstrating that our algorithm displays a higher total parallel access probability than those of other algorithms by approximately 10-15% and that the performance can be further improved by more than 20% by simultaneously applying a copy storage strategy. These experiments show that the algorithm can be applied in distributed environments to help realize parallel I/O and thereby improve system performance.

  19. Software tool for data mining and its applications

    Science.gov (United States)

    Yang, Jie; Ye, Chenzhou; Chen, Nianyi

    2002-03-01

    A software tool for data mining is introduced, which integrates pattern recognition (PCA, Fisher, clustering, hyperenvelop, regression), artificial intelligence (knowledge representation, decision trees), statistical learning (rough set, support vector machine), computational intelligence (neural network, genetic algorithm, fuzzy systems). It consists of nine function models: pattern recognition, decision trees, association rule, fuzzy rule, neural network, genetic algorithm, Hyper Envelop, support vector machine, visualization. The principle and knowledge representation of some function models of data mining are described. The software tool of data mining is realized by Visual C++ under Windows 2000. Nonmonotony in data mining is dealt with by concept hierarchy and layered mining. The software tool of data mining has satisfactorily applied in the prediction of regularities of the formation of ternary intermetallic compounds in alloy systems, and diagnosis of brain glioma.

  20. Parallel Algorithms and Patterns

    Energy Technology Data Exchange (ETDEWEB)

    Robey, Robert W. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2016-06-16

    This is a powerpoint presentation on parallel algorithms and patterns. A parallel algorithm is a well-defined, step-by-step computational procedure that emphasizes concurrency to solve a problem. Examples of problems include: Sorting, searching, optimization, matrix operations. A parallel pattern is a computational step in a sequence of independent, potentially concurrent operations that occurs in diverse scenarios with some frequency. Examples are: Reductions, prefix scans, ghost cell updates. We only touch on parallel patterns in this presentation. It really deserves its own detailed discussion which Gabe Rockefeller would like to develop.

  1. An Optimization Routing Algorithm for Green Communication in Underground Mines

    Directory of Open Access Journals (Sweden)

    Heng Xu

    2018-06-01

    Full Text Available With the long-term dependence of humans on ore-based energy, underground mines are utilized around the world, and underground mining is often dangerous. Therefore, many underground mines have established networks that manage and acquire information from sensor nodes deployed on miners and in other places. Since the power supplies of many mobile sensor nodes are batteries, green communication is an effective approach of reducing the energy consumption of a network and extending its longevity. To reduce the energy consumption of networks, all factors that negatively influence the lifetime should be considered. The degree constraint minimum spanning tree (DCMST is introduced in this study to consider all the heterogeneous factors and assign weights for the next step of the evaluation. Then, a genetic algorithm (GA is introduced to cluster sensor nodes in the network and balance energy consumption according to several heterogeneous factors and routing paths from DCMST. Based on a comparison of the simulation results, the optimization routing algorithm proposed in this study for use in green communication in underground mines can effectively reduce the network energy consumption and extend the lifetimes of networks.

  2. Web Usage Mining, Pattern Discovery dan Log File

    OpenAIRE

    Tri Suratno; Toni Prahasto; Adian Fatchur Rochim

    2014-01-01

    Analysis  of  data  to  access  the  server  can  provide  significant  and  useful  information  for  performance  improvement,  restructuring  andimproving the effectiveness of a web site. Data mining is one of the most effective way to detect a series of patterns of information from large amounts of data. Application of  data mining  on  Internet use  called web  mining  is a set of  data mining  techniques  are  used  for the web. Web mining technologies and data mining is a combination o...

  3. Practical mine ventilation optimization based on genetic algorithms for free splitting networks

    Energy Technology Data Exchange (ETDEWEB)

    Acuna, E.; Maynard, R.; Hall, S. [Laurentian Univ., Sudbury, ON (Canada). Mirarco Mining Innovation; Hardcastle, S.G.; Li, G. [Natural Resources Canada, Sudbury, ON (Canada). CANMET Mining and Mineral Sciences Laboratories; Lowndes, I.S. [Nottingham Univ., Nottingham (United Kingdom). Process and Environmental Research Division; Tonnos, A. [Bestech, Sudbury, ON (Canada)

    2010-07-01

    The method used to optimize the design and operation of mine ventilation has generally been based on case studies and expert knowledge. It has yet to benefit from optimization techniques used and proven in other fields of engineering. Currently, optimization of mine ventilation systems is a manual based decision process performed by an experienced mine ventilation specialist assisted by commercial ventilation distribution solvers. These analysis tools are widely used in the mining industry to evaluate the practical and economic viability of alternative ventilation system configurations. The scenario which is usually selected is the one that reports the lowest energy consumption while delivering the required airflow distribution. Since most commercial solvers do not have an integrated optimization algorithm network, the process of generating a series of potential ventilation solutions using the conventional iterative design strategy can be time consuming. For that reason, a genetic algorithm (GA) optimization routine was developed in combination with a ventilation solver to determine the potential optimal solutions of a primary mine ventilation system based on a free splitting network. The optimization method was used in a small size mine ventilation network. The technique was shown to have the capacity to generate good feasible solutions and improve upon the manual results obtained by mine ventilation specialists. 9 refs., 7 tabs., 3 figs.

  4. User Activity Recognition in Smart Homes Using Pattern Clustering Applied to Temporal ANN Algorithm.

    Science.gov (United States)

    Bourobou, Serge Thomas Mickala; Yoo, Younghwan

    2015-05-21

    This paper discusses the possibility of recognizing and predicting user activities in the IoT (Internet of Things) based smart environment. The activity recognition is usually done through two steps: activity pattern clustering and activity type decision. Although many related works have been suggested, they had some limited performance because they focused only on one part between the two steps. This paper tries to find the best combination of a pattern clustering method and an activity decision algorithm among various existing works. For the first step, in order to classify so varied and complex user activities, we use a relevant and efficient unsupervised learning method called the K-pattern clustering algorithm. In the second step, the training of smart environment for recognizing and predicting user activities inside his/her personal space is done by utilizing the artificial neural network based on the Allen's temporal relations. The experimental results show that our combined method provides the higher recognition accuracy for various activities, as compared with other data mining classification algorithms. Furthermore, it is more appropriate for a dynamic environment like an IoT based smart home.

  5. User Activity Recognition in Smart Homes Using Pattern Clustering Applied to Temporal ANN Algorithm

    Directory of Open Access Journals (Sweden)

    Serge Thomas Mickala Bourobou

    2015-05-01

    Full Text Available This paper discusses the possibility of recognizing and predicting user activities in the IoT (Internet of Things based smart environment. The activity recognition is usually done through two steps: activity pattern clustering and activity type decision. Although many related works have been suggested, they had some limited performance because they focused only on one part between the two steps. This paper tries to find the best combination of a pattern clustering method and an activity decision algorithm among various existing works. For the first step, in order to classify so varied and complex user activities, we use a relevant and efficient unsupervised learning method called the K-pattern clustering algorithm. In the second step, the training of smart environment for recognizing and predicting user activities inside his/her personal space is done by utilizing the artificial neural network based on the Allen’s temporal relations. The experimental results show that our combined method provides the higher recognition accuracy for various activities, as compared with other data mining classification algorithms. Furthermore, it is more appropriate for a dynamic environment like an IoT based smart home.

  6. Mining the National Career Assessment Examination Result Using Clustering Algorithm

    Science.gov (United States)

    Pagudpud, M. V.; Palaoag, T. T.; Padirayon, L. M.

    2018-03-01

    Education is an essential process today which elicits authorities to discover and establish innovative strategies for educational improvement. This study applied data mining using clustering technique for knowledge extraction from the National Career Assessment Examination (NCAE) result in the Division of Quirino. The NCAE is an examination given to all grade 9 students in the Philippines to assess their aptitudes in the different domains. Clustering the students is helpful in identifying students’ learning considerations. With the use of the RapidMiner tool, clustering algorithms such as Density-Based Spatial Clustering of Applications with Noise (DBSCAN), k-means, k-medoid, expectation maximization clustering, and support vector clustering algorithms were analyzed. The silhouette indexes of the said clustering algorithms were compared, and the result showed that the k-means algorithm with k = 3 and silhouette index equal to 0.196 is the most appropriate clustering algorithm to group the students. Three groups were formed having 477 students in the determined group (cluster 0), 310 proficient students (cluster 1) and 396 developing students (cluster 2). The data mining technique used in this study is essential in extracting useful information from the NCAE result to better understand the abilities of students which in turn is a good basis for adopting teaching strategies.

  7. Feature Reduction Based on Genetic Algorithm and Hybrid Model for Opinion Mining

    Directory of Open Access Journals (Sweden)

    P. Kalaivani

    2015-01-01

    Full Text Available With the rapid growth of websites and web form the number of product reviews is available on the sites. An opinion mining system is needed to help the people to evaluate emotions, opinions, attitude, and behavior of others, which is used to make decisions based on the user preference. In this paper, we proposed an optimized feature reduction that incorporates an ensemble method of machine learning approaches that uses information gain and genetic algorithm as feature reduction techniques. We conducted comparative study experiments on multidomain review dataset and movie review dataset in opinion mining. The effectiveness of single classifiers Naïve Bayes, logistic regression, support vector machine, and ensemble technique for opinion mining are compared on five datasets. The proposed hybrid method is evaluated and experimental results using information gain and genetic algorithm with ensemble technique perform better in terms of various measures for multidomain review and movie reviews. Classification algorithms are evaluated using McNemar’s test to compare the level of significance of the classifiers.

  8. Collaborative mining and transfer learning for relational data

    Science.gov (United States)

    Levchuk, Georgiy; Eslami, Mohammed

    2015-06-01

    Many of the real-world problems, - including human knowledge, communication, biological, and cyber network analysis, - deal with data entities for which the essential information is contained in the relations among those entities. Such data must be modeled and analyzed as graphs, with attributes on both objects and relations encode and differentiate their semantics. Traditional data mining algorithms were originally designed for analyzing discrete objects for which a set of features can be defined, and thus cannot be easily adapted to deal with graph data. This gave rise to the relational data mining field of research, of which graph pattern learning is a key sub-domain [11]. In this paper, we describe a model for learning graph patterns in collaborative distributed manner. Distributed pattern learning is challenging due to dependencies between the nodes and relations in the graph, and variability across graph instances. We present three algorithms that trade-off benefits of parallelization and data aggregation, compare their performance to centralized graph learning, and discuss individual benefits and weaknesses of each model. Presented algorithms are designed for linear speedup in distributed computing environments, and learn graph patterns that are both closer to ground truth and provide higher detection rates than centralized mining algorithm.

  9. A hybrid heuristic algorithm for the open-pit-mining operational planning problem.

    OpenAIRE

    Souza, Marcone Jamilson Freitas; Coelho, Igor Machado; Ribas, Sabir; Santos, Haroldo Gambini; Merschmann, Luiz Henrique de Campos

    2010-01-01

    This paper deals with the Open-Pit-Mining Operational Planning problem with dynamic truck allocation. The objective is to optimize mineral extraction in the mines by minimizing the number of mining trucks used to meet production goals and quality requirements. According to the literature, this problem is NPhard, so a heuristic strategy is justified. We present a hybrid algorithm that combines characteristics of two metaheuristics: Greedy Randomized Adaptive Search Procedures and General Varia...

  10. MINING ON CAR DATABASE EMPLOYING LEARNING AND CLUSTERING ALGORITHMS

    OpenAIRE

    Muhammad Rukunuddin Ghalib; Shivam Vohra; Sunish Vohra; Akash Juneja

    2013-01-01

    In data mining, classification is a form of data analysis that can be used to extract models describing important data classes. Two of the known learning algorithms used are Naïve Bayesian (NB) and SMO (Self-Minimal-Optimisation) .Thus the following two learning algorithms are used on a Car review database and thus a model is hence created which predicts the characteristic of a review comment after getting trained. It was found that model successfully predicted correctly about the review comm...

  11. HSM: Heterogeneous Subspace Mining in High Dimensional Data

    DEFF Research Database (Denmark)

    Müller, Emmanuel; Assent, Ira; Seidl, Thomas

    2009-01-01

    Heterogeneous data, i.e. data with both categorical and continuous values, is common in many databases. However, most data mining algorithms assume either continuous or categorical attributes, but not both. In high dimensional data, phenomena due to the "curse of dimensionality" pose additional...... challenges. Usually, due to locally varying relevance of attributes, patterns do not show across the full set of attributes. In this paper we propose HSM, which defines a new pattern model for heterogeneous high dimensional data. It allows data mining in arbitrary subsets of the attributes that are relevant...... for the respective patterns. Based on this model we propose an efficient algorithm, which is aware of the heterogeneity of the attributes. We extend an indexing structure for continuous attributes such that HSM indexing adapts to different attribute types. In our experiments we show that HSM efficiently mines...

  12. Mining Productive-Associated Periodic-Frequent Patterns in Body Sensor Data for Smart Home Care.

    Science.gov (United States)

    Ismail, Walaa N; Hassan, Mohammad Mehedi

    2017-04-26

    The understanding of various health-oriented vital sign data generated from body sensor networks (BSNs) and discovery of the associations between the generated parameters is an important task that may assist and promote important decision making in healthcare. For example, in a smart home scenario where occupants' health status is continuously monitored remotely, it is essential to provide the required assistance when an unusual or critical situation is detected in their vital sign data. In this paper, we present an efficient approach for mining the periodic patterns obtained from BSN data. In addition, we employ a correlation test on the generated patterns and introduce productive-associated periodic-frequent patterns as the set of correlated periodic-frequent items. The combination of these measures has the advantage of empowering healthcare providers and patients to raise the quality of diagnosis as well as improve treatment and smart care, especially for elderly people in smart homes. We develop an efficient algorithm named PPFP-growth (Productive Periodic-Frequent Pattern-growth) to discover all productive-associated periodic frequent patterns using these measures. PPFP-growth is efficient and the productiveness measure removes uncorrelated periodic items. An experimental evaluation on synthetic and real datasets shows the efficiency of the proposed PPFP-growth algorithm, which can filter a huge number of periodic patterns to reveal only the correlated ones.

  13. Mining Productive-Associated Periodic-Frequent Patterns in Body Sensor Data for Smart Home Care

    Directory of Open Access Journals (Sweden)

    Walaa N. Ismail

    2017-04-01

    Full Text Available The understanding of various health-oriented vital sign data generated from body sensor networks (BSNs and discovery of the associations between the generated parameters is an important task that may assist and promote important decision making in healthcare. For example, in a smart home scenario where occupants’ health status is continuously monitored remotely, it is essential to provide the required assistance when an unusual or critical situation is detected in their vital sign data. In this paper, we present an efficient approach for mining the periodic patterns obtained from BSN data. In addition, we employ a correlation test on the generated patterns and introduce productive-associated periodic-frequent patterns as the set of correlated periodic-frequent items. The combination of these measures has the advantage of empowering healthcare providers and patients to raise the quality of diagnosis as well as improve treatment and smart care, especially for elderly people in smart homes. We develop an efficient algorithm named PPFP-growth (Productive Periodic-Frequent Pattern-growth to discover all productive-associated periodic frequent patterns using these measures. PPFP-growth is efficient and the productiveness measure removes uncorrelated periodic items. An experimental evaluation on synthetic and real datasets shows the efficiency of the proposed PPFP-growth algorithm, which can filter a huge number of periodic patterns to reveal only the correlated ones.

  14. Useful Pattern Mining on Time Series

    DEFF Research Database (Denmark)

    Goumatianos, Nikitas; Christou, Ioannis T; Lindgren, Peter

    2013-01-01

    We present the architecture of a “useful pattern” mining system that is capable of detecting thousands of different candlestick sequence patterns at the tick or any higher granularity levels. The system architecture is highly distributed and performs most of its highly compute-intensive aggregation...... calculations as complex but efficient distributed SQL queries on the relational databases that store the time-series. We present initial results from mining all frequent candlestick sequences with the characteristic property that when they occur then, with an average at least 60% probability, they signal a 2...

  15. Genetic Algorithm Calibration of Probabilistic Cellular Automata for Modeling Mining Permit Activity

    Science.gov (United States)

    Louis, S.J.; Raines, G.L.

    2003-01-01

    We use a genetic algorithm to calibrate a spatially and temporally resolved cellular automata to model mining activity on public land in Idaho and western Montana. The genetic algorithm searches through a space of transition rule parameters of a two dimensional cellular automata model to find rule parameters that fit observed mining activity data. Previous work by one of the authors in calibrating the cellular automaton took weeks - the genetic algorithm takes a day and produces rules leading to about the same (or better) fit to observed data. These preliminary results indicate that genetic algorithms are a viable tool in calibrating cellular automata for this application. Experience gained during the calibration of this cellular automata suggests that mineral resource information is a critical factor in the quality of the results. With automated calibration, further refinements of how the mineral-resource information is provided to the cellular automaton will probably improve our model.

  16. Improving diagnostic accuracy using agent-based distributed data mining system.

    Science.gov (United States)

    Sridhar, S

    2013-09-01

    The use of data mining techniques to improve the diagnostic system accuracy is investigated in this paper. The data mining algorithms aim to discover patterns and extract useful knowledge from facts recorded in databases. Generally, the expert systems are constructed for automating diagnostic procedures. The learning component uses the data mining algorithms to extract the expert system rules from the database automatically. Learning algorithms can assist the clinicians in extracting knowledge automatically. As the number and variety of data sources is dramatically increasing, another way to acquire knowledge from databases is to apply various data mining algorithms that extract knowledge from data. As data sets are inherently distributed, the distributed system uses agents to transport the trained classifiers and uses meta learning to combine the knowledge. Commonsense reasoning is also used in association with distributed data mining to obtain better results. Combining human expert knowledge and data mining knowledge improves the performance of the diagnostic system. This work suggests a framework of combining the human knowledge and knowledge gained by better data mining algorithms on a renal and gallstone data set.

  17. Ecosystem Health Assessment of Mining Cities Based on Landscape Pattern

    Science.gov (United States)

    Yu, W.; Liu, Y.; Lin, M.; Fang, F.; Xiao, R.

    2017-09-01

    Ecosystem health assessment (EHA) is one of the most important aspects in ecosystem management. Nowadays, ecological environment of mining cities is facing various problems. In this study, through ecosystem health theory and remote sensing images in 2005, 2009 and 2013, landscape pattern analysis and Vigor-Organization-Resilience (VOR) model were applied to set up an evaluation index system of ecosystem health of mining city to assess the healthy level of ecosystem in Panji District Huainan city. Results showed a temporal stable but high spatial heterogeneity landscape pattern during 2005-2013. According to the regional ecosystem health index, it experienced a rapid decline after a slight increase, and finally it maintained at an ordinary level. Among these areas, a significant distinction was presented in different towns. It indicates that the ecosystem health of Tianjijiedao town, the regional administrative centre, descended rapidly during the study period, and turned into the worst level in the study area. While the Hetuan Town, located in the northwestern suburb area of Panji District, stayed on a relatively better level than other towns. The impacts of coal mining collapse area, land reclamation on the landscape pattern and ecosystem health status of mining cities were also discussed. As a result of underground coal mining, land subsidence has become an inevitable problem in the study area. In addition, the coal mining subsidence area has brought about the destruction of the farmland, construction land and water bodies, which causing the change of the regional landscape pattern and making the evaluation of ecosystem health in mining area more difficult. Therefore, this study provided an ecosystem health approach for relevant departments to make scientific decisions.

  18. An Efficient Association Rule Hiding Algorithm for Privacy Preserving Data Mining

    OpenAIRE

    Yogendra Kumar Jain,; Vinod Kumar Yadav,; Geetika S. Panday

    2011-01-01

    The security of the large database that contains certain crucial information, it will become a serious issue when sharing data to the network against unauthorized access. Privacy preserving data mining is a new research trend in privacy data for data mining and statistical database. Association analysis is a powerful toolfor discovering relationships which are hidden in large database. Association rules hiding algorithms get strong and efficient performance for protecting confidential and cru...

  19. An imperialist competitive algorithm for solving the production scheduling problem in open pit mine

    Directory of Open Access Journals (Sweden)

    Mojtaba Mokhtarian Asl

    2016-06-01

    Full Text Available Production scheduling (planning of an open-pit mine is the procedure during which the rock blocks are assigned to different production periods in a way that the highest net present value of the project achieved subject to operational constraints. The paper introduces a new and computationally less expensive meta-heuristic technique known as imperialist competitive algorithm (ICA for long-term production planning of open pit mines. The proposed algorithm modifies the original rules of the assimilation process. The ICA performance for different levels of the control factors has been studied and the results are presented. The result showed that ICA could be efficiently applied on mine production planning problem.

  20. An Efficient System Based On Closed Sequential Patterns for Web Recommendations

    OpenAIRE

    Utpala Niranjan; R.B.V. Subramanyam; V-Khana

    2010-01-01

    Sequential pattern mining, since its introduction has received considerable attention among the researchers with broad applications. The sequential pattern algorithms generally face problems when mining long sequential patterns or while using very low support threshold. One possible solution of such problems is by mining the closed sequential patterns, which is a condensed representation of sequential patterns. Recently, several researchers have utilized the sequential pattern discovery for d...

  1. An Application of Data Mining Algorithms for Shipbuilding Cost Estimation

    NARCIS (Netherlands)

    Kaluzny, B.L.; Barbici, S.; Berg, G.; Chiomento, R.; Derpanis,D.; Jonsson, U.; Shaw, R.H.A.D.; Smit, M.C.; Ramaroson, F.

    2011-01-01

    This article presents a novel application of known data mining algorithms to the problem of estimating the cost of ship development and construction. The work is a product of North Atlantic Treaty Organization Research and Technology Organization Systems Analysis and Studies 076 Task Group “NATO

  2. A Contextualized, Differential Sequence Mining Method to Derive Students' Learning Behavior Patterns

    Science.gov (United States)

    Kinnebrew, John S.; Loretz, Kirk M.; Biswas, Gautam

    2013-01-01

    Computer-based learning environments can produce a wealth of data on student learning interactions. This paper presents an exploratory data mining methodology for assessing and comparing students' learning behaviors from these interaction traces. The core algorithm employs a novel combination of sequence mining techniques to identify deferentially…

  3. Statistically significant relational data mining :

    Energy Technology Data Exchange (ETDEWEB)

    Berry, Jonathan W.; Leung, Vitus Joseph; Phillips, Cynthia Ann; Pinar, Ali; Robinson, David Gerald; Berger-Wolf, Tanya; Bhowmick, Sanjukta; Casleton, Emily; Kaiser, Mark; Nordman, Daniel J.; Wilson, Alyson G.

    2014-02-01

    This report summarizes the work performed under the project (3z(BStatitically significant relational data mining.(3y (BThe goal of the project was to add more statistical rigor to the fairly ad hoc area of data mining on graphs. Our goal was to develop better algorithms and better ways to evaluate algorithm quality. We concetrated on algorithms for community detection, approximate pattern matching, and graph similarity measures. Approximate pattern matching involves finding an instance of a relatively small pattern, expressed with tolerance, in a large graph of data observed with uncertainty. This report gathers the abstracts and references for the eight refereed publications that have appeared as part of this work. We then archive three pieces of research that have not yet been published. The first is theoretical and experimental evidence that a popular statistical measure for comparison of community assignments favors over-resolved communities over approximations to a ground truth. The second are statistically motivated methods for measuring the quality of an approximate match of a small pattern in a large graph. The third is a new probabilistic random graph model. Statisticians favor these models for graph analysis. The new local structure graph model overcomes some of the issues with popular models such as exponential random graph models and latent variable models.

  4. Improve Data Mining and Knowledge Discovery through the use of MatLab

    Science.gov (United States)

    Shaykahian, Gholan Ali; Martin, Dawn Elliott; Beil, Robert

    2011-01-01

    Data mining is widely used to mine business, engineering, and scientific data. Data mining uses pattern based queries, searches, or other analyses of one or more electronic databases/datasets in order to discover or locate a predictive pattern or anomaly indicative of system failure, criminal or terrorist activity, etc. There are various algorithms, techniques and methods used to mine data; including neural networks, genetic algorithms, decision trees, nearest neighbor method, rule induction association analysis, slice and dice, segmentation, and clustering. These algorithms, techniques and methods used to detect patterns in a dataset, have been used in the development of numerous open source and commercially available products and technology for data mining. Data mining is best realized when latent information in a large quantity of data stored is discovered. No one technique solves all data mining problems; challenges are to select algorithms or methods appropriate to strengthen data/text mining and trending within given datasets. In recent years, throughout industry, academia and government agencies, thousands of data systems have been designed and tailored to serve specific engineering and business needs. Many of these systems use databases with relational algebra and structured query language to categorize and retrieve data. In these systems, data analyses are limited and require prior explicit knowledge of metadata and database relations; lacking exploratory data mining and discoveries of latent information. This presentation introduces MatLab(TradeMark)(MATrix LABoratory), an engineering and scientific data analyses tool to perform data mining. MatLab was originally intended to perform purely numerical calculations (a glorified calculator). Now, in addition to having hundreds of mathematical functions, it is a programming language with hundreds built in standard functions and numerous available toolboxes. MatLab's ease of data processing, visualization and

  5. Exploiting Sequential Patterns Found in Users' Solutions and Virtual Tutor Behavior to Improve Assistance in ITS

    Science.gov (United States)

    Fournier-Viger, Philippe; Faghihi, Usef; Nkambou, Roger; Nguifo, Engelbert Mephu

    2010-01-01

    We propose to mine temporal patterns in Intelligent Tutoring Systems (ITSs) to uncover useful knowledge that can enhance their ability to provide assistance. To discover patterns, we suggest using a custom, sequential pattern-mining algorithm. Two ways of applying the algorithm to enhance an ITS's capabilities are addressed. The first is to…

  6. An Adaptive Sensor Mining Framework for Pervasive Computing Applications

    Science.gov (United States)

    Rashidi, Parisa; Cook, Diane J.

    Analyzing sensor data in pervasive computing applications brings unique challenges to the KDD community. The challenge is heightened when the underlying data source is dynamic and the patterns change. We introduce a new adaptive mining framework that detects patterns in sensor data, and more importantly, adapts to the changes in the underlying model. In our framework, the frequent and periodic patterns of data are first discovered by the Frequent and Periodic Pattern Miner (FPPM) algorithm; and then any changes in the discovered patterns over the lifetime of the system are discovered by the Pattern Adaptation Miner (PAM) algorithm, in order to adapt to the changing environment. This framework also captures vital context information present in pervasive computing applications, such as the startup triggers and temporal information. In this paper, we present a description of our mining framework and validate the approach using data collected in the CASAS smart home testbed.

  7. Star pattern recognition algorithm aided by inertial information

    Science.gov (United States)

    Liu, Bao; Wang, Ke-dong; Zhang, Chao

    2011-08-01

    Star pattern recognition is one of the key problems of the celestial navigation. The traditional star pattern recognition approaches, such as the triangle algorithm and the star angular distance algorithm, are a kind of all-sky matching method whose recognition speed is slow and recognition success rate is not high. Therefore, the real time and reliability of CNS (Celestial Navigation System) is reduced to some extent, especially for the maneuvering spacecraft. However, if the direction of the camera optical axis can be estimated by other navigation systems such as INS (Inertial Navigation System), the star pattern recognition can be fulfilled in the vicinity of the estimated direction of the optical axis. The benefits of the INS-aided star pattern recognition algorithm include at least the improved matching speed and the improved success rate. In this paper, the direction of the camera optical axis, the local matching sky, and the projection of stars on the image plane are estimated by the aiding of INS firstly. Then, the local star catalog for the star pattern recognition is established in real time dynamically. The star images extracted in the camera plane are matched in the local sky. Compared to the traditional all-sky star pattern recognition algorithms, the memory of storing the star catalog is reduced significantly. Finally, the INS-aided star pattern recognition algorithm is validated by simulations. The results of simulations show that the algorithm's computation time is reduced sharply and its matching success rate is improved greatly.

  8. WildSpan: mining structured motifs from protein sequences

    Directory of Open Access Journals (Sweden)

    Chen Chien-Yu

    2011-03-01

    Full Text Available Abstract Background Automatic extraction of motifs from biological sequences is an important research problem in study of molecular biology. For proteins, it is desired to discover sequence motifs containing a large number of wildcard symbols, as the residues associated with functional sites are usually largely separated in sequences. Discovering such patterns is time-consuming because abundant combinations exist when long gaps (a gap consists of one or more successive wildcards are considered. Mining algorithms often employ constraints to narrow down the search space in order to increase efficiency. However, improper constraint models might degrade the sensitivity and specificity of the motifs discovered by computational methods. We previously proposed a new constraint model to handle large wildcard regions for discovering functional motifs of proteins. The patterns that satisfy the proposed constraint model are called W-patterns. A W-pattern is a structured motif that groups motif symbols into pattern blocks interleaved with large irregular gaps. Considering large gaps reflects the fact that functional residues are not always from a single region of protein sequences, and restricting motif symbols into clusters corresponds to the observation that short motifs are frequently present within protein families. To efficiently discover W-patterns for large-scale sequence annotation and function prediction, this paper first formally introduces the problem to solve and proposes an algorithm named WildSpan (sequential pattern mining across large wildcard regions that incorporates several pruning strategies to largely reduce the mining cost. Results WildSpan is shown to efficiently find W-patterns containing conserved residues that are far separated in sequences. We conducted experiments with two mining strategies, protein-based and family-based mining, to evaluate the usefulness of W-patterns and performance of WildSpan. The protein-based mining mode

  9. Mining Social and Affective Data for Recommendation of Student Tutors

    Directory of Open Access Journals (Sweden)

    Elisa Boff

    2013-03-01

    Full Text Available This paper presents a learning environment where a mining algorithm is used to learn patterns of interaction with the user and to represent these patterns in a scheme called item descriptors. The learning environment keeps theoretical information about subjects, as well as tools and exercises where the student can put into practice the knowledge gained. One of the main purposes of the project is to stimulate collaborative learning through the interaction of students with different levels of knowledge. The students' actions, as well as their interactions, are monitored by the system and used to find patterns that can guide the search for students that may play the role of a tutor. Such patterns are found with a particular learning algorithm and represented in item descriptors. The paper presents the educational environment, the representation mechanism and learning algorithm used to mine social-affective data in order to create a recommendation model of tutors.

  10. Parallelizing Gene Expression Programming Algorithm in Enabling Large-Scale Classification

    Directory of Open Access Journals (Sweden)

    Lixiong Xu

    2017-01-01

    Full Text Available As one of the most effective function mining algorithms, Gene Expression Programming (GEP algorithm has been widely used in classification, pattern recognition, prediction, and other research fields. Based on the self-evolution, GEP is able to mine an optimal function for dealing with further complicated tasks. However, in big data researches, GEP encounters low efficiency issue due to its long time mining processes. To improve the efficiency of GEP in big data researches especially for processing large-scale classification tasks, this paper presents a parallelized GEP algorithm using MapReduce computing model. The experimental results show that the presented algorithm is scalable and efficient for processing large-scale classification tasks.

  11. Comparison analysis for classification algorithm in data mining and the study of model use

    Science.gov (United States)

    Chen, Junde; Zhang, Defu

    2018-04-01

    As a key technique in data mining, classification algorithm was received extensive attention. Through an experiment of classification algorithm in UCI data set, we gave a comparison analysis method for the different algorithms and the statistical test was used here. Than that, an adaptive diagnosis model for preventive electricity stealing and leakage was given as a specific case in the paper.

  12. Unsupervised learning algorithms

    CERN Document Server

    Aydin, Kemal

    2016-01-01

    This book summarizes the state-of-the-art in unsupervised learning. The contributors discuss how with the proliferation of massive amounts of unlabeled data, unsupervised learning algorithms, which can automatically discover interesting and useful patterns in such data, have gained popularity among researchers and practitioners. The authors outline how these algorithms have found numerous applications including pattern recognition, market basket analysis, web mining, social network analysis, information retrieval, recommender systems, market research, intrusion detection, and fraud detection. They present how the difficulty of developing theoretically sound approaches that are amenable to objective evaluation have resulted in the proposal of numerous unsupervised learning algorithms over the past half-century. The intended audience includes researchers and practitioners who are increasingly using unsupervised learning algorithms to analyze their data. Topics of interest include anomaly detection, clustering,...

  13. Data mining and Pattern Recognizing Models for Identifying Inherited Diseases: Challenges and Implications

    OpenAIRE

    Lahiru Iddamalgoda; Partha Sarathi Das; Partha Sarathi Das; Achala Aponso; Vijayaraghava Seshadri Sundararajan; Prashanth Suravajhala; Prashanth Suravajhala; Prashanth Suravajhala; Jayaraman K Valadi

    2016-01-01

    Data mining and pattern recognition methods reveal interesting findings in genetic studies, especially on how genetic makeup is associated with inherited diseases. Although researchers have proposed various data mining models for biomedical approaches, there remains a challenge in accurately determining the responsible genetic factors for prioritizing the single nucleotide polymorphisms (SNP) associated with the disease. In this commentary, we review the state-of-art data mining and pattern r...

  14. Data Mining and Pattern Recognition Models for Identifying Inherited Diseases: Challenges and Implications

    OpenAIRE

    Iddamalgoda, Lahiru; Das, Partha S.; Aponso, Achala; Sundararajan, Vijayaraghava S.; Suravajhala, Prashanth; Valadi, Jayaraman K.

    2016-01-01

    Data mining and pattern recognition methods reveal interesting findings in genetic studies, especially on how the genetic makeup is associated with inherited diseases. Although researchers have proposed various data mining models for biomedical approaches, there remains a challenge in accurately prioritizing the single nucleotide polymorphisms (SNP) associated with the disease. In this commentary, we review the state-of-art data mining and pattern recognition models for identifying inherited ...

  15. Attribute Index and Uniform Design Based Multiobjective Association Rule Mining with Evolutionary Algorithm

    Directory of Open Access Journals (Sweden)

    Jie Zhang

    2013-01-01

    Full Text Available In association rule mining, evaluating an association rule needs to repeatedly scan database to compare the whole database with the antecedent, consequent of a rule and the whole rule. In order to decrease the number of comparisons and time consuming, we present an attribute index strategy. It only needs to scan database once to create the attribute index of each attribute. Then all metrics values to evaluate an association rule do not need to scan database any further, but acquire data only by means of the attribute indices. The paper visualizes association rule mining as a multiobjective problem rather than a single objective one. In order to make the acquired solutions scatter uniformly toward the Pareto frontier in the objective space, elitism policy and uniform design are introduced. The paper presents the algorithm of attribute index and uniform design based multiobjective association rule mining with evolutionary algorithm, abbreviated as IUARMMEA. It does not require the user-specified minimum support and minimum confidence anymore, but uses a simple attribute index. It uses a well-designed real encoding so as to extend its application scope. Experiments performed on several databases demonstrate that the proposed algorithm has excellent performance, and it can significantly reduce the number of comparisons and time consumption.

  16. Attribute index and uniform design based multiobjective association rule mining with evolutionary algorithm.

    Science.gov (United States)

    Zhang, Jie; Wang, Yuping; Feng, Junhong

    2013-01-01

    In association rule mining, evaluating an association rule needs to repeatedly scan database to compare the whole database with the antecedent, consequent of a rule and the whole rule. In order to decrease the number of comparisons and time consuming, we present an attribute index strategy. It only needs to scan database once to create the attribute index of each attribute. Then all metrics values to evaluate an association rule do not need to scan database any further, but acquire data only by means of the attribute indices. The paper visualizes association rule mining as a multiobjective problem rather than a single objective one. In order to make the acquired solutions scatter uniformly toward the Pareto frontier in the objective space, elitism policy and uniform design are introduced. The paper presents the algorithm of attribute index and uniform design based multiobjective association rule mining with evolutionary algorithm, abbreviated as IUARMMEA. It does not require the user-specified minimum support and minimum confidence anymore, but uses a simple attribute index. It uses a well-designed real encoding so as to extend its application scope. Experiments performed on several databases demonstrate that the proposed algorithm has excellent performance, and it can significantly reduce the number of comparisons and time consumption.

  17. High Performance Data mining by Genetic Neural Network

    Directory of Open Access Journals (Sweden)

    Dadmehr Rahbari

    2013-10-01

    Full Text Available Data mining in computer science is the process of discovering interesting and useful patterns and relationships in large volumes of data. Most methods for mining problems is based on artificial intelligence algorithms. Neural network optimization based on three basic parameters topology, weights and the learning rate is a powerful method. We introduce optimal method for solving this problem. In this paper genetic algorithm with mutation and crossover operators change the network structure and optimized that. Dataset used for our work is stroke disease with twenty features that optimized number of that achieved by new hybrid algorithm. Result of this work is very well incomparison with other similar method. Low present of error show that our method is our new approach to efficient, high-performance data mining problems is introduced.

  18. Pattern Discovery and Change Detection of Online Music Query Streams

    Science.gov (United States)

    Li, Hua-Fu

    In this paper, an efficient stream mining algorithm, called FTP-stream (Frequent Temporal Pattern mining of streams), is proposed to find the frequent temporal patterns over melody sequence streams. In the framework of our proposed algorithm, an effective bit-sequence representation is used to reduce the time and memory needed to slide the windows. The FTP-stream algorithm can calculate the support threshold in only a single pass based on the concept of bit-sequence representation. It takes the advantage of "left" and "and" operations of the representation. Experiments show that the proposed algorithm only scans the music query stream once, and runs significant faster and consumes less memory than existing algorithms, such as SWFI-stream and Moment.

  19. Efficient discovery of risk patterns in medical data.

    Science.gov (United States)

    Li, Jiuyong; Fu, Ada Wai-chee; Fahey, Paul

    2009-01-01

    This paper studies a problem of efficiently discovering risk patterns in medical data. Risk patterns are defined by a statistical metric, relative risk, which has been widely used in epidemiological research. To avoid fruitless search in the complete exploration of risk patterns, we define optimal risk pattern set to exclude superfluous patterns, i.e. complicated patterns with lower relative risk than their corresponding simpler form patterns. We prove that mining optimal risk pattern sets conforms an anti-monotone property that supports an efficient mining algorithm. We propose an efficient algorithm for mining optimal risk pattern sets based on this property. We also propose a hierarchical structure to present discovered patterns for the easy perusal by domain experts. The proposed approach is compared with two well-known rule discovery methods, decision tree and association rule mining approaches on benchmark data sets and applied to a real world application. The proposed method discovers more and better quality risk patterns than a decision tree approach. The decision tree method is not designed for such applications and is inadequate for pattern exploring. The proposed method does not discover a large number of uninteresting superfluous patterns as an association mining approach does. The proposed method is more efficient than an association rule mining method. A real world case study shows that the method reveals some interesting risk patterns to medical practitioners. The proposed method is an efficient approach to explore risk patterns. It quickly identifies cohorts of patients that are vulnerable to a risk outcome from a large data set. The proposed method is useful for exploratory study on large medical data to generate and refine hypotheses. The method is also useful for designing medical surveillance systems.

  20. Plant succession patterns on residual open-pit gravel mines deposits Bogota

    OpenAIRE

    Ricardo A. Mora Goyes

    1999-01-01

    Based on both: the study of composition and structure of plant communities and the analysis of the physico-chemical characteristics of mining wastes, the initial patterns of primary succession were determined. These patterns were present in three deposits of waste material abandoned during 18, 36 and 120 months respectively. Sue materials were originated in open-pit gravel mines located to the south of Bogota (Colombia). This study pretends to contribute to the knowledge of the meehanlsms of ...

  1. Gas Emission Prediction Model of Coal Mine Based on CSBP Algorithm

    Directory of Open Access Journals (Sweden)

    Xiong Yan

    2016-01-01

    Full Text Available In view of the nonlinear characteristics of gas emission in a coal working face, a prediction method is proposed based on cuckoo search algorithm optimized BP neural network (CSBP. In the CSBP algorithm, the cuckoo search is adopted to optimize weight and threshold parameters of BP network, and obtains the global optimal solutions. Furthermore, the twelve main affecting factors of the gas emission in the coal working face are taken as input vectors of CSBP algorithm, the gas emission is acted as output vector, and then the prediction model of BP neural network with optimal parameters is established. The results show that the CSBP algorithm has batter generalization ability and higher prediction accuracy, and can be utilized effectively in the prediction of coal mine gas emission.

  2. Algorithms for adaptive nonlinear pattern recognition

    Science.gov (United States)

    Schmalz, Mark S.; Ritter, Gerhard X.; Hayden, Eric; Key, Gary

    2011-09-01

    In Bayesian pattern recognition research, static classifiers have featured prominently in the literature. A static classifier is essentially based on a static model of input statistics, thereby assuming input ergodicity that is not realistic in practice. Classical Bayesian approaches attempt to circumvent the limitations of static classifiers, which can include brittleness and narrow coverage, by training extensively on a data set that is assumed to cover more than the subtense of expected input. Such assumptions are not realistic for more complex pattern classification tasks, for example, object detection using pattern classification applied to the output of computer vision filters. In contrast, we have developed a two step process, that can render the majority of static classifiers adaptive, such that the tracking of input nonergodicities is supported. Firstly, we developed operations that dynamically insert (or resp. delete) training patterns into (resp. from) the classifier's pattern database, without requiring that the classifier's internal representation of its training database be completely recomputed. Secondly, we developed and applied a pattern replacement algorithm that uses the aforementioned pattern insertion/deletion operations. This algorithm is designed to optimize the pattern database for a given set of performance measures, thereby supporting closed-loop, performance-directed optimization. This paper presents theory and algorithmic approaches for the efficient computation of adaptive linear and nonlinear pattern recognition operators that use our pattern insertion/deletion technology - in particular, tabular nearest-neighbor encoding (TNE) and lattice associative memories (LAMs). Of particular interest is the classification of nonergodic datastreams that have noise corruption with time-varying statistics. The TNE and LAM based classifiers discussed herein have been successfully applied to the computation of object classification in hyperspectral

  3. Heuristics Miner for E-Commerce Visitor Access Pattern Representation

    OpenAIRE

    Kartina Diah Kesuma Wardhani; Wawan Yunanto

    2017-01-01

    E-commerce click stream data can form a certain pattern that describe visitor behavior while surfing the e-commerce website. This pattern can be used to initiate a design to determine alternative access sequence on the website. This research use heuristic miner algorithm to determine the pattern. σ-Algorithm and Genetic Mining are methods used for pattern recognition with frequent sequence item set approach. Heuristic Miner is an evolved form of those methods. σ-Algorithm assume that an activ...

  4. A Genetic Algorithm That Exchanges Neighboring Centers for Fuzzy c-Means Clustering

    Science.gov (United States)

    Chahine, Firas Safwan

    2012-01-01

    Clustering algorithms are widely used in pattern recognition and data mining applications. Due to their computational efficiency, partitional clustering algorithms are better suited for applications with large datasets than hierarchical clustering algorithms. K-means is among the most popular partitional clustering algorithm, but has a major…

  5. An Improved Algorithm Research on the PrefixSpan Based on the Server Session Constraint

    Directory of Open Access Journals (Sweden)

    Cai Hong-Guo

    2017-01-01

    Full Text Available When we mine long sequential pattern and discover knowledge by the PrefixSpan algorithm in Web Usage Mining (WUM.The elements and the suffix sequences are much more may cause the problem of the calculation, such as the space explosion. To further solve the problem a more effective way is that. Firstly, a server session-based server log file format is proposed. Then the improved algorithm on the PrefixSpan based on server session constraint is discussed for mining frequent Sequential patterns on the website. Finally, the validity and superiority of the method are presented by the experiment in the paper.

  6. Plant succession patterns on residual open-pit gravel mines deposits Bogota

    Directory of Open Access Journals (Sweden)

    Ricardo A. Mora Goyes

    1999-07-01

    Full Text Available Based on both: the study of composition and structure of plant communities and the analysis of the physico-chemical characteristics of mining wastes, the initial patterns of primary succession were determined. These patterns were present in three deposits of waste material abandoned during 18, 36 and 120 months respectively. Sue materials were originated in open-pit gravel mines located to the south of Bogota (Colombia. This study pretends to contribute to the knowledge of the meehanlsms of natural restauration of tropical ecosystems subjected to man-borne degradation.

  7. Supporting Solar Physics Research via Data Mining

    Science.gov (United States)

    Angryk, Rafal; Banda, J.; Schuh, M.; Ganesan Pillai, K.; Tosun, H.; Martens, P.

    2012-05-01

    In this talk we will briefly introduce three pillars of data mining (i.e. frequent patterns discovery, classification, and clustering), and discuss some possible applications of known data mining techniques which can directly benefit solar physics research. In particular, we plan to demonstrate applicability of frequent patterns discovery methods for the verification of hypotheses about co-occurrence (in space and time) of filaments and sigmoids. We will also show how classification/machine learning algorithms can be utilized to verify human-created software modules to discover individual types of solar phenomena. Finally, we will discuss applicability of clustering techniques to image data processing.

  8. Zips : mining compressing sequential patterns in streams

    NARCIS (Netherlands)

    Hoang, T.L.; Calders, T.G.K.; Yang, J.; Mörchen, F.; Fradkin, D.; Chau, D.H.; Vreeken, J.; Leeuwen, van M.; Faloutsos, C.

    2013-01-01

    We propose a streaming algorithm, based on the minimal description length (MDL) principle, for extracting non-redundant sequential patterns. For static databases, the MDL-based approach that selects patterns based on their capacity to compress data rather than their frequency, was shown to be

  9. Mining for Social Media: Usage Patterns of Small Businesses

    OpenAIRE

    Balan, Shilpa; Rege, Janhavi

    2017-01-01

    Background: Information can now be rapidly exchanged due to social media. Due to its openness, Twitter has generated massive amounts of data. In this paper, we apply data mining and analytics to extract the usage patterns of social media by small businesses. Objectives: The aim of this paper is to describe with an example how data mining can be applied to social media. This paper further examines the impact of social media on small businesses. The Twitter posts related to small businesses are...

  10. Formulations and algorithms for problems on rock mass and support deformation during mining

    Science.gov (United States)

    Seryakov, VM

    2018-03-01

    The analysis of problem formulations to calculate stress-strain state of mine support and surrounding rocks mass in rock mechanics shows that such formulations incompletely describe the mechanical features of joint deformation in the rock mass–support system. The present paper proposes an algorithm to take into account the actual conditions of rock mass and support interaction and the algorithm implementation method to ensure efficient calculation of stresses in rocks and support.

  11. Finding occupational accident patterns in the extractive industry using a systematic data mining approach

    International Nuclear Information System (INIS)

    Silva, Joaquim F.; Jacinto, Celeste

    2012-01-01

    This paper deals with occupational accident patterns of in the Portuguese Extractive Industry. It constitutes a significant advance with relation to a previous study made in 2008, both in terms of methodology and extended knowledge on the patterns’ details. This work uses more recent data (2005–2007) and this time the identification of the “typical accident” shifts from a bivariate, to a multivariate pattern, for characterising more accurately the accident mechanisms. Instead of crossing only two variables (Deviation x Contact), the new methodology developed here uses data mining techniques to associate nine variables, through their categories, and to quantify the statistical cohesion of each pattern. The results confirmed the “typical accident” of the 2008 study, but went much further: it reveals three statistically significant patterns (the top-3 categories in frequency); moreover, each pattern includes now more variables (4–5 categories) and indicates their statistical cohesion. This approach allowed a more accurate vision of the reality, which is fundamental for risk management. The methodology is best suited for large groups, such as national Authorities, Insurers or Corporate Groups, to assist them planning target-oriented safety strategies. Not least importantly, researchers can apply the same algorithm to other study areas, as it is not restricted to accidents, neither to safety.

  12. Interesting association rule mining with consistent and inconsistent rule detection from big sales data in distributed environment

    Directory of Open Access Journals (Sweden)

    Dinesh J. Prajapati

    2017-06-01

    Full Text Available Nowadays, there is an increasing demand in mining interesting patterns from the big data. The process of analyzing such a huge amount of data is really computationally complex task when using traditional methods. The overall purpose of this paper is in twofold. First, this paper presents a novel approach to identify consistent and inconsistent association rules from sales data located in distributed environment. Secondly, the paper also overcomes the main memory bottleneck and computing time overhead of single computing system by applying computations to multi node cluster. The proposed method initially extracts frequent itemsets for each zone using existing distributed frequent pattern mining algorithms. The paper also compares the time efficiency of Mapreduce based frequent pattern mining algorithm with Count Distribution Algorithm (CDA and Fast Distributed Mining (FDM algorithms. The association generated from frequent itemsets are too large that it becomes complex to analyze it. Thus, Mapreduce based consistent and inconsistent rule detection (MR-CIRD algorithm is proposed to detect the consistent and inconsistent rules from big data and provide useful and actionable knowledge to the domain experts. These pruned interesting rules also give useful knowledge for better marketing strategy as well. The extracted consistent and inconsistent rules are evaluated and compared based on different interestingness measures presented together with experimental results that lead to the final conclusions.

  13. Loading pattern optimization using ant colony algorithm

    International Nuclear Information System (INIS)

    Hoareau, Fabrice

    2008-01-01

    Electricite de France (EDF) operates 58 nuclear power plants (NPP), of the Pressurized Water Reactor type. The loading pattern optimization of these NPP is currently done by EDF expert engineers. Within this framework, EDF R and D has developed automatic optimization tools that assist the experts. LOOP is an industrial tool, developed by EDF R and D and based on a simulated annealing algorithm. In order to improve the results of such automatic tools, new optimization methods have to be tested. Ant Colony Optimization (ACO) algorithms are recent methods that have given very good results on combinatorial optimization problems. In order to evaluate the performance of such methods on loading pattern optimization, direct comparisons between LOOP and a mock-up based on the Max-Min Ant System algorithm (a particular variant of ACO algorithms) were made on realistic test-cases. It is shown that the results obtained by the ACO mock-up are very similar to those of LOOP. Future research will consist in improving these encouraging results by using parallelization and by hybridizing the ACO algorithm with local search procedures. (author)

  14. Comparison of predictive performance of data mining algorithms in predicting body weight in Mengali rams of Pakistan

    Directory of Open Access Journals (Sweden)

    Senol Celik

    Full Text Available ABSTRACT The present study aimed at comparing predictive performance of some data mining algorithms (CART, CHAID, Exhaustive CHAID, MARS, MLP, and RBF in biometrical data of Mengali rams. To compare the predictive capability of the algorithms, the biometrical data regarding body (body length, withers height, and heart girth and testicular (testicular length, scrotal length, and scrotal circumference measurements of Mengali rams in predicting live body weight were evaluated by most goodness of fit criteria. In addition, age was considered as a continuous independent variable. In this context, MARS data mining algorithm was used for the first time to predict body weight in two forms, without (MARS_1 and with interaction (MARS_2 terms. The superiority order in the predictive accuracy of the algorithms was found as CART > CHAID ≈ Exhaustive CHAID > MARS_2 > MARS_1 > RBF > MLP. Moreover, all tested algorithms provided a strong predictive accuracy for estimating body weight. However, MARS is the only algorithm that generated a prediction equation for body weight. Therefore, it is hoped that the available results might present a valuable contribution in terms of predicting body weight and describing the relationship between the body weight and body and testicular measurements in revealing breed standards and the conservation of indigenous gene sources for Mengali sheep breeding. Therefore, it will be possible to perform more profitable and productive sheep production. Use of data mining algorithms is useful for revealing the relationship between body weight and testicular traits in describing breed standards of Mengali sheep.

  15. Data mining theories, algorithms, and examples

    CERN Document Server

    Ye, Nong

    2013-01-01

    AN OVERVIEW OF DATA MINING METHODOLOGIESIntroduction to data mining methodologiesMETHODOLOGIES FOR MINING CLASSIFICATION AND PREDICTION PATTERNSRegression modelsBayes classifiersDecision treesMulti-layer feedforward artificial neural networksSupport vector machinesSupervised clusteringMETHODOLOGIES FOR MINING CLUSTERING AND ASSOCIATION PATTERNSHierarchical clusteringPartitional clusteringSelf-organized mapProbability distribution estimationAssociation rulesBayesian networksMETHODOLOGIES FOR MINING DATA REDUCTION PATTERNSPrincipal components analysisMulti-dimensional scalingLatent variable anal

  16. Monitoring, analyzing and simulating of spatial-temporal changes of landscape pattern over mining area

    Science.gov (United States)

    Liu, Pei; Han, Ruimei; Wang, Shuangting

    2014-11-01

    According to the merits of remotely sensed data in depicting regional land cover and Land changes, multi- objective information processing is employed to remote sensing images to analyze and simulate land cover in mining areas. In this paper, multi-temporal remotely sensed data were selected to monitor the pattern, distri- bution and trend of LUCC and predict its impacts on ecological environment and human settlement in mining area. The monitor, analysis and simulation of LUCC in this coal mining areas are divided into five steps. The are information integration of optical and SAR data, LULC types extraction with SVM classifier, LULC trends simulation with CA Markov model, landscape temporal changes monitoring and analysis with confusion matrixes and landscape indices. The results demonstrate that the improved data fusion algorithm could make full use of information extracted from optical and SAR data; SVM classifier has an efficient and stable ability to obtain land cover maps, which could provide a good basis for both land cover change analysis and trend simulation; CA Markov model is able to predict LULC trends with good performance, and it is an effective way to integrate remotely sensed data with spatial-temporal model for analysis of land use / cover change and corresponding environmental impacts in mining area. Confusion matrixes are combined with landscape indices to evaluation and analysis show that, there was a sustained downward trend in agricultural land and bare land, but a continues growth trend tendency in water body, forest and other lands, and building area showing a wave like change, first increased and then decreased; mining landscape has undergone a from small to large and large to small process of fragmentation, agricultural land is the strongest influenced landscape type in this area, and human activities are the primary cause, so the problem should be pay more attentions by government and other organizations.

  17. Information mining in remote sensing imagery

    Science.gov (United States)

    Li, Jiang

    The volume of remotely sensed imagery continues to grow at an enormous rate due to the advances in sensor technology, and our capability for collecting and storing images has greatly outpaced our ability to analyze and retrieve information from the images. This motivates us to develop image information mining techniques, which is very much an interdisciplinary endeavor drawing upon expertise in image processing, databases, information retrieval, machine learning, and software design. This dissertation proposes and implements an extensive remote sensing image information mining (ReSIM) system prototype for mining useful information implicitly stored in remote sensing imagery. The system consists of three modules: image processing subsystem, database subsystem, and visualization and graphical user interface (GUI) subsystem. Land cover and land use (LCLU) information corresponding to spectral characteristics is identified by supervised classification based on support vector machines (SVM) with automatic model selection, while textural features that characterize spatial information are extracted using Gabor wavelet coefficients. Within LCLU categories, textural features are clustered using an optimized k-means clustering approach to acquire search efficient space. The clusters are stored in an object-oriented database (OODB) with associated images indexed in an image database (IDB). A k-nearest neighbor search is performed using a query-by-example (QBE) approach. Furthermore, an automatic parametric contour tracing algorithm and an O(n) time piecewise linear polygonal approximation (PLPA) algorithm are developed for shape information mining of interesting objects within the image. A fuzzy object-oriented database based on the fuzzy object-oriented data (FOOD) model is developed to handle the fuzziness and uncertainty. Three specific applications are presented: integrated land cover and texture pattern mining, shape information mining for change detection of lakes, and

  18. Data mining for the social sciences an introduction

    CERN Document Server

    Attewell, Paul

    2015-01-01

    We live in a world of big data: the amount of information collected on human behavior each day is staggering, and exponentially greater than at any time in the past. Additionally, powerful algorithms are capable of churning through seas of data to uncover patterns. Providing a simple and accessible introduction to data mining, Paul Attewell and David B. Monaghan discuss how data mining substantially differs from conventional statistical modeling familiar to most social scientists. The authors also empower social scientists to tap into these new resources and incorporate data mining

  19. Application of an empiric Bayesian data mining algorithm to reports of pancreatitis associated with atypical antipsychotics.

    Science.gov (United States)

    Hauben, Manfred

    2004-09-01

    To compare the results from one frequently cited data mining algorithm with those from a study, which was published in a peer-reviewed journal, that examined the association of pancreatitis with selected atypical antipsychotics observed by traditional rule-based methods of signal detection. Retrospective pharmacovigilance study. The widely studied data mining algorithm known as the Multi-item Gamma Poisson Shrinker (MGPS) was applied to adverse-event reports from the United States Food and Drug Administration's Adverse Event Reporting System database through the first quarter of 2003 for clozapine, olanzapine, and risperidone to determine if a significant signal of pancreatitis would have been generated by this method in advance of their review or the addition of these events to the respective product labels. Data mining was performed by using nine preferred terms relevant to drug-induced pancreatitis from the Medical Dictionary for Regulatory Activities (MedDRA). Results from a previous study on the antipsychotics were reviewed and analyzed. Physicians' Desk References (PDRs) starting from 1994 were manually reviewed to determine the first year that pancreatitis was listed as an adverse event in the product label for each antipsychotic. This information was used as a surrogate marker of the timing of initial signal detection by traditional criteria. Pancreatitis was listed as an adverse event in a PDR for all three atypical antipsychotics. Despite the presence of up to 88 reports/drug-event combination in the Food and Drug Administration's Adverse Event Reporting System database, the MGPS failed to generate a signal of disproportional reporting of pancreatitis associated with the three antipsychotics despite the signaling of these drug-event combinations by traditional rule-based methods, as reflected in product labeling and/or the literature. These discordant findings illustrate key principles in the application of data mining algorithms to drug safety

  20. Multilevel Association Rule Mining for Bridge Resource Management Based on Immune Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    Yang Ou

    2014-01-01

    Full Text Available This paper is concerned with the problem of multilevel association rule mining for bridge resource management (BRM which is announced by IMO in 2010. The goal of this paper is to mine the association rules among the items of BRM and the vessel accidents. However, due to the indirect data that can be collected, which seems useless for the analysis of the relationship between items of BIM and the accidents, the cross level association rules need to be studied, which builds the relation between the indirect data and items of BRM. In this paper, firstly, a cross level coding scheme for mining the multilevel association rules is proposed. Secondly, we execute the immune genetic algorithm with the coding scheme for analyzing BRM. Thirdly, based on the basic maritime investigation reports, some important association rules of the items of BRM are mined and studied. Finally, according to the results of the analysis, we provide the suggestions for the work of seafarer training, assessment, and management.

  1. Quick fuzzy backpropagation algorithm.

    Science.gov (United States)

    Nikov, A; Stoeva, S

    2001-03-01

    A modification of the fuzzy backpropagation (FBP) algorithm called QuickFBP algorithm is proposed, where the computation of the net function is significantly quicker. It is proved that the FBP algorithm is of exponential time complexity, while the QuickFBP algorithm is of polynomial time complexity. Convergence conditions of the QuickFBP, resp. the FBP algorithm are defined and proved for: (1) single output neural networks in case of training patterns with different targets; and (2) multiple output neural networks in case of training patterns with equivalued target vector. They support the automation of the weights training process (quasi-unsupervised learning) establishing the target value(s) depending on the network's input values. In these cases the simulation results confirm the convergence of both algorithms. An example with a large-sized neural network illustrates the significantly greater training speed of the QuickFBP rather than the FBP algorithm. The adaptation of an interactive web system to users on the basis of the QuickFBP algorithm is presented. Since the QuickFBP algorithm ensures quasi-unsupervised learning, this implies its broad applicability in areas of adaptive and adaptable interactive systems, data mining, etc. applications.

  2. Educational Data Mining Application for Estimating Students Performance in Weka Environment

    Science.gov (United States)

    Gowri, G. Shiyamala; Thulasiram, Ramasamy; Amit Baburao, Mahindra

    2017-11-01

    Educational data mining (EDM) is a multi-disciplinary research area that examines artificial intelligence, statistical modeling and data mining with the data generated from an educational institution. EDM utilizes computational ways to deal with explicate educational information keeping in mind the end goal to examine educational inquiries. To make a country stand unique among the other nations of the world, the education system has to undergo a major transition by redesigning its framework. The concealed patterns and data from various information repositories can be extracted by adopting the techniques of data mining. In order to summarize the performance of students with their credentials, we scrutinize the exploitation of data mining in the field of academics. Apriori algorithmic procedure is extensively applied to the database of students for a wider classification based on various categorizes. K-means procedure is applied to the same set of databases in order to accumulate them into a specific category. Apriori algorithm deals with mining the rules in order to extract patterns that are similar along with their associations in relation to various set of records. The records can be extracted from academic information repositories. The parameters used in this study gives more importance to psychological traits than academic features. The undesirable student conduct can be clearly witnessed if we make use of information mining frameworks. Thus, the algorithms efficiently prove to profile the students in any educational environment. The ultimate objective of the study is to suspect if a student is prone to violence or not.

  3. Characteristic statistic algorithm (CSA) for in-core loading pattern optimization

    International Nuclear Information System (INIS)

    Liu Zhihong; Hu Yongming; Shi Gong

    2007-01-01

    To solve the problem of PWR in-core loading pattern optimization, a more suitable global optimization algorithm, i.e., Characteristic statistic algorithm (CSA), is used. The searching process of this algorithm and how to apply it to this problem are presented. Loading pattern optimization code SCYCLE is developed. Two different problems on real PWR models are calculated and the results are compared with other algorithms. It is shown that SCYCLE has high efficiency and good global performance on this problem. (authors)

  4. Toward a Progress Indicator for Machine Learning Model Building and Data Mining Algorithm Execution: A Position Paper

    Science.gov (United States)

    Luo, Gang

    2017-01-01

    For user-friendliness, many software systems offer progress indicators for long-duration tasks. A typical progress indicator continuously estimates the remaining task execution time as well as the portion of the task that has been finished. Building a machine learning model often takes a long time, but no existing machine learning software supplies a non-trivial progress indicator. Similarly, running a data mining algorithm often takes a long time, but no existing data mining software provides a nontrivial progress indicator. In this article, we consider the problem of offering progress indicators for machine learning model building and data mining algorithm execution. We discuss the goals and challenges intrinsic to this problem. Then we describe an initial framework for implementing such progress indicators and two advanced, potential uses of them, with the goal of inspiring future research on this topic. PMID:29177022

  5. Mining the multigroup-discrete ordinates algorithm for high quality solutions

    International Nuclear Information System (INIS)

    Ganapol, B.D.; Kornreich, D.E.

    2005-01-01

    A novel approach to the numerical solution of the neutron transport equation via the discrete ordinates (SN) method is presented. The new technique is referred to as 'mining' low order (SN) numerical solutions to obtain high order accuracy. The new numerical method, called the Multigroup Converged SN (MGCSN) algorithm, is a combination of several sequence accelerators: Romberg and Wynn-epsilon. The extreme accuracy obtained by the method is demonstrated through self consistency and comparison to the independent semi-analytical benchmark BLUE. (authors)

  6. Heuristics Miner for E-Commerce Visitor Access Pattern Representation

    Directory of Open Access Journals (Sweden)

    Kartina Diah Kesuma Wardhani

    2017-06-01

    Full Text Available E-commerce click stream data can form a certain pattern that describe visitor behavior while surfing the e-commerce website. This pattern can be used to initiate a design to determine alternative access sequence on the website. This research use heuristic miner algorithm to determine the pattern. σ-Algorithm and Genetic Mining are methods used for pattern recognition with frequent sequence item set approach. Heuristic Miner is an evolved form of those methods. σ-Algorithm assume that an activity in a website, that has been recorded in the data log, is a complete sequence from start to finish, without any tolerance to incomplete data or data with noise. On the other hand, Genetic Mining is a method that tolerate incomplete data or data with noise, so it can generate a more detailed e-commerce visitor access pattern. In this study, the same sequence of events obtained from six-generated patterns. The resulting pattern of visitor access is that visitors are often access the home page and then the product category page or the home page and then the full text search page.

  7. Genetic algorithms in loading pattern optimization

    International Nuclear Information System (INIS)

    Yilmazbayhan, A.; Tombakoglu, M.; Bekar, K. B.; Erdemli, A. Oe

    2001-01-01

    Genetic Algorithm (GA) based systems are used for the loading pattern optimization. The use of Genetic Algorithm operators such as regional crossover, crossover and mutation, and selection of initial population size for PWRs are discussed. Antithetic variates are used to generate the initial population. The performance of GA with antithetic variates is compared to traditional GA. The results of multi-cycle optimization are discussed for objective function taking into account cycle burn-up and discharge burn-up

  8. On Identifying Useful Patterns to Analyze Products in Retail Transaction Databases

    Science.gov (United States)

    Yun, Unil

    Mining correlated patterns in large transaction databases is one of the essential tasks in data mining since a huge number of patterns are usually mined, but it is hard to find patterns with the correlation. The needed data analysis should be made according to the requirements of the particular real application. In previous mining approaches, patterns with the weak affinity are found even with a high minimum support. In this paper, we suggest weighted support affinity pattern mining in which a new measure, weighted support confidence (ws-confidence) is developed to identify correlated patterns with the weighted support affinity. To efficiently prune the weak affinity patterns, we prove that the ws-confidence measure satisfies the anti-monotone and cross weighted support properties which can be applied to eliminate patterns with dissimilar weighted support levels. Based on the two properties, we develop a weighted support affinity pattern mining algorithm (WSP). The weighted support affinity patterns can be useful to answer the comparative analysis queries such as finding itemsets containing items which give similar total selling expense levels with an acceptable error range α% and detecting item lists with similar levels of total profits. In addition, our performance study shows that WSP is efficient and scalable for mining weighted support affinity patterns.

  9. Pattern Nulling of Linear Antenna Arrays Using Backtracking Search Optimization Algorithm

    Directory of Open Access Journals (Sweden)

    Kerim Guney

    2015-01-01

    Full Text Available An evolutionary method based on backtracking search optimization algorithm (BSA is proposed for linear antenna array pattern synthesis with prescribed nulls at interference directions. Pattern nulling is obtained by controlling only the amplitude, position, and phase of the antenna array elements. BSA is an innovative metaheuristic technique based on an iterative process. Various numerical examples of linear array patterns with the prescribed single, multiple, and wide nulls are given to illustrate the performance and flexibility of BSA. The results obtained by BSA are compared with the results of the following seventeen algorithms: particle swarm optimization (PSO, genetic algorithm (GA, modified touring ant colony algorithm (MTACO, quadratic programming method (QPM, bacterial foraging algorithm (BFA, bees algorithm (BA, clonal selection algorithm (CLONALG, plant growth simulation algorithm (PGSA, tabu search algorithm (TSA, memetic algorithm (MA, nondominated sorting GA-2 (NSGA-2, multiobjective differential evolution (MODE, decomposition with differential evolution (MOEA/D-DE, comprehensive learning PSO (CLPSO, harmony search algorithm (HSA, seeker optimization algorithm (SOA, and mean variance mapping optimization (MVMO. The simulation results show that the linear antenna array synthesis using BSA provides low side-lobe levels and deep null levels.

  10. LPaMI: A Graph-Based Lifestyle Pattern Mining Application Using Personal Image Collections in Smartphones

    Directory of Open Access Journals (Sweden)

    Kifayat Ullah Khan

    2017-11-01

    Full Text Available Normally, individuals use smartphones for a variety of purposes like photography, schedule planning, playing games, and so on, apart from benefiting from the core tasks of call-making and short messaging. These services are sources of personal data generation. Therefore, any application that utilises personal data of a user from his/her smartphone is truly a great witness of his/her interests and this information can be used for various personalised services. In this paper, we present Lifestyle Pattern MIning (LPaMI, which is a personalised application for mining the lifestyle patterns of a smartphone user. LPaMI uses the personal photograph collections of a user, which reflect the day-to-day photos taken by a smartphone, to recognise scenes (called objects of interest in our work. These are then mined to discover lifestyle patterns. The uniqueness of LPaMI lies in our graph-based approach to mining the patterns of interest. Modelling of data in the form of graphs is effective in preserving the lifestyle behaviour maintained over the passage of time. Graph-modelled lifestyle data enables us to apply variety of graph mining techniques for pattern discovery. To demonstrate the effectiveness of our proposal, we have developed a prototype system for LPaMI to implement its end-to-end pipeline. We have also conducted an extensive evaluation for various phases of LPaMI using different real-world datasets. We understand that the output of LPaMI can be utilised for variety of pattern discovery application areas like trip and food recommendations, shopping, and so on.

  11. A genetic algorithm approach to recognition and data mining

    Energy Technology Data Exchange (ETDEWEB)

    Punch, W.F.; Goodman, E.D.; Min, Pei [Michigan State Univ., East Lansing, MI (United States)] [and others

    1996-12-31

    We review here our use of genetic algorithm (GA) and genetic programming (GP) techniques to perform {open_quotes}data mining,{close_quotes} the discovery of particular/important data within large datasets, by finding optimal data classifications using known examples. Our first experiments concentrated on the use of a K-nearest neighbor algorithm in combination with a GA. The GA selected weights for each feature so as to optimize knn classification based on a linear combination of features. This combined GA-knn approach was successfully applied to both generated and real-world data. We later extended this work by substituting a GP for the GA. The GP-knn could not only optimize data classification via linear combinations of features but also determine functional relationships among the features. This allowed for improved performance and new information on important relationships among features. We review the effectiveness of the overall approach on examples from biology and compare the effectiveness of the GA and GP.

  12. GRAMI: Frequent subgraph and pattern mining in a single large graph

    KAUST Repository

    Elseidy, M.

    2014-01-01

    Mining frequent subgraphs is an important operation on graphs; it is defined as finding all subgraphs that appear frequently in a database according to a given frequency threshold. Most existing work assumes a database of many small graphs, but modern applications, such as social networks, citation graphs, or proteinprotein interactions in bioinformatics, are modeled as a single large graph. In this paper we present GRAMI, a novel framework for frequent subgraph mining in a single large graph. GRAMI undertakes a novel approach that only finds the minimal set of instances to satisfy the frequency threshold and avoids the costly enumeration of all instances required by previous approaches. We accompany our approach with a heuristic and optimizations that significantly improve performance. Additionally, we present an extension of GRAMI that mines frequent patterns. Compared to subgraphs, patterns offer a more powerful version of matching that captures transitive interactions between graph nodes (like friend of a friend) which are very common in modern applications. Finally, we present CGRAMI, a version supporting structural and semantic constraints, and AGRAMI, an approximate version producing results with no false positives. Our experiments on real data demonstrate that our framework is up to 2 orders of magnitude faster and discovers more interesting patterns than existing approaches. 2014 VLDB Endowment.

  13. Monitoring coal mine changes and their impact on landscape patterns in an alpine region: a case study of the Muli coal mine in the Qinghai-Tibet Plateau.

    Science.gov (United States)

    Qian, Dawen; Yan, Changzhen; Xing, Zanpin; Xiu, Lina

    2017-10-14

    The Muli coal mine is the largest open-cast coal mine in the Qinghai-Tibet Plateau, and it consists of two independent mining sites named Juhugeng and Jiangcang. It has received much attention due to the ecological problems caused by rapid expansion in recent years. The objective of this paper was to monitor the mining area and its surrounding land cover over the period 1976-2016 utilizing Landsat images, and the network structure of land cover changes was determined to visualize the relationships and pattern of the mining-induced land cover changes. In addition, the responses of the surrounding landscape pattern were analysed by constructing gradient transects. The results show that the mining area was increasing in size, especially after 2000 (increased by 71.68 km 2 ), and this caused shrinkage of the surrounding lands, including alpine meadow wetland (53.44 km 2 ), alpine meadow (6.28 km 2 ) and water (6.24 km 2 ). The network structure of the mining area revealed the changes in lands surrounding the mining area. The impact of mining development on landscape patterns was mainly distributed within a range of 1-6 km. Alpine meadow wetland was most affected in Juhugeng, while alpine meadow was most affected in Jiangcang. The results of this study provide a reference for the ecological assessment and restoration of the Muli coal mine land.

  14. Atlas Career Path Guidebook: Patterns and Common Practices in Systems Engineers’ Development

    Science.gov (United States)

    2018-01-16

    text mining principles to be used by systems...statistical and text mining principles facilitate the identification of patterns. Figure 1. Helix methodology for career path analysis In order to...illustrate how text mining algorithms might be used to identify similarities in position titles for systems engineers. In broad

  15. A new taxonomy of sublinear keyword pattern matching algorithms

    NARCIS (Netherlands)

    Cleophas, L.G.W.A.; Watson, B.W.; Zwaan, G.

    2004-01-01

    Abstract This paper presents a new taxonomy of sublinear (multiple) keyword pattern matching algorithms. Based on an earlier taxonomy by Watson and Zwaan [WZ96, WZ95], this new taxonomy includes not only suffix-based algorithms related to the Boyer-Moore, Commentz-Walter and Fan-Su algorithms, but

  16. Mining Temporal Patterns to Improve Agents Behavior: Two Case Studies

    Science.gov (United States)

    Fournier-Viger, Philippe; Nkambou, Roger; Faghihi, Usef; Nguifo, Engelbert Mephu

    We propose two mechanisms for agent learning based on the idea of mining temporal patterns from agent behavior. The first one consists of extracting temporal patterns from the perceived behavior of other agents accomplishing a task, to learn the task. The second learning mechanism consists in extracting temporal patterns from an agent's own behavior. In this case, the agent then reuses patterns that brought self-satisfaction. In both cases, no assumption is made on how the observed agents' behavior is internally generated. A case study with a real application is presented to illustrate each learning mechanism.

  17. Incremental temporal pattern mining using efficient batch-free stream clustering

    NARCIS (Netherlands)

    Lu, Y.; Hassani, M.; Seidl, T.

    2017-01-01

    This paper address the problem of temporal pattern mining from multiple data streams containing temporal events. Temporal events are considered as real world events aligned with comprehensive starting and ending timing information rather than simple integer timestamps. Predefined relations, such as

  18. Data Mining and Pattern Recognition Models for Identifying Inherited Diseases: Challenges and Implications.

    Science.gov (United States)

    Iddamalgoda, Lahiru; Das, Partha S; Aponso, Achala; Sundararajan, Vijayaraghava S; Suravajhala, Prashanth; Valadi, Jayaraman K

    2016-01-01

    Data mining and pattern recognition methods reveal interesting findings in genetic studies, especially on how the genetic makeup is associated with inherited diseases. Although researchers have proposed various data mining models for biomedical approaches, there remains a challenge in accurately prioritizing the single nucleotide polymorphisms (SNP) associated with the disease. In this commentary, we review the state-of-art data mining and pattern recognition models for identifying inherited diseases and deliberate the need of binary classification- and scoring-based prioritization methods in determining causal variants. While we discuss the pros and cons associated with these methods known, we argue that the gene prioritization methods and the protein interaction (PPI) methods in conjunction with the K nearest neighbors' could be used in accurately categorizing the genetic factors in disease causation.

  19. Exploring the potential of data mining techniques for the analysis of accident patterns

    DEFF Research Database (Denmark)

    Prato, Carlo Giacomo; Bekhor, Shlomo; Galtzur, Ayelet

    2010-01-01

    Research in road safety faces major challenges: individuation of the most significant determinants of traffic accidents, recognition of the most recurrent accident patterns, and allocation of resources necessary to address the most relevant issues. This paper intends to comprehend which data mining...... and association rules) data mining techniques are implemented for the analysis of traffic accidents occurred in Israel between 2001 and 2004. Results show that descriptive techniques are useful to classify the large amount of analyzed accidents, even though introduce problems with respect to the clear...... importance of input and intermediate neurons, and the relative importance of hundreds of association rules. Further research should investigate whether limiting the analysis to fatal accidents would simplify the task of data mining techniques in recognizing accident patterns without the “noise” probably...

  20. Reduct Driven Pattern Extraction from Clusters

    Directory of Open Access Journals (Sweden)

    Shuchita Upadhyaya

    2009-03-01

    Full Text Available Clustering algorithms give general description of clusters, listing number of clusters and member entities in those clusters. However, these algorithms lack in generating cluster description in the form of pattern. From data mining perspective, pattern learning from clusters is as important as cluster finding. In the proposed approach, reduct derived from rough set theory is employed for pattern formulation. Further, reduct are the set of attributes which distinguishes the entities in a homogenous cluster, hence these can be clear cut removed from the same. Remaining attributes are then ranked for their contribution in the cluster. Pattern is formulated with the conjunction of most contributing attributes such that pattern distinctively describes the cluster with minimum error.

  1. Calibration of Mine Ventilation Network Models Using the Non-Linear Optimization Algorithm

    Directory of Open Access Journals (Sweden)

    Guang Xu

    2017-12-01

    Full Text Available Effective ventilation planning is vital to underground mining. To ensure stable operation of the ventilation system and to avoid airflow disorder, mine ventilation network (MVN models have been widely used in simulating and optimizing the mine ventilation system. However, one of the challenges for MVN model simulation is that the simulated airflow distribution results do not match the measured data. To solve this problem, a simple and effective calibration method is proposed based on the non-linear optimization algorithm. The calibrated model not only makes simulated airflow distribution results in accordance with the on-site measured data, but also controls the errors of other parameters within a minimum range. The proposed method was then applied to calibrate an MVN model in a real case, which is built based on ventilation survey results and Ventsim software. Finally, airflow simulation experiments are carried out respectively using data before and after calibration, whose results were compared and analyzed. This showed that the simulated airflows in the calibrated model agreed much better to the ventilation survey data, which verifies the effectiveness of calibrating method.

  2. Walking pattern classification and walking distance estimation algorithms using gait phase information.

    Science.gov (United States)

    Wang, Jeen-Shing; Lin, Che-Wei; Yang, Ya-Ting C; Ho, Yu-Jen

    2012-10-01

    This paper presents a walking pattern classification and a walking distance estimation algorithm using gait phase information. A gait phase information retrieval algorithm was developed to analyze the duration of the phases in a gait cycle (i.e., stance, push-off, swing, and heel-strike phases). Based on the gait phase information, a decision tree based on the relations between gait phases was constructed for classifying three different walking patterns (level walking, walking upstairs, and walking downstairs). Gait phase information was also used for developing a walking distance estimation algorithm. The walking distance estimation algorithm consists of the processes of step count and step length estimation. The proposed walking pattern classification and walking distance estimation algorithm have been validated by a series of experiments. The accuracy of the proposed walking pattern classification was 98.87%, 95.45%, and 95.00% for level walking, walking upstairs, and walking downstairs, respectively. The accuracy of the proposed walking distance estimation algorithm was 96.42% over a walking distance.

  3. Urinary metabolic profiling of asymptomatic acute intermittent porphyria using a rule-mining-based algorithm.

    Science.gov (United States)

    Luck, Margaux; Schmitt, Caroline; Talbi, Neila; Gouya, Laurent; Caradeuc, Cédric; Puy, Hervé; Bertho, Gildas; Pallet, Nicolas

    2018-01-01

    Metabolomic profiling combines Nuclear Magnetic Resonance spectroscopy with supervised statistical analysis that might allow to better understanding the mechanisms of a disease. In this study, the urinary metabolic profiling of individuals with porphyrias was performed to predict different types of disease, and to propose new pathophysiological hypotheses. Urine 1 H-NMR spectra of 73 patients with asymptomatic acute intermittent porphyria (aAIP) and familial or sporadic porphyria cutanea tarda (f/sPCT) were compared using a supervised rule-mining algorithm. NMR spectrum buckets bins, corresponding to rules, were extracted and a logistic regression was trained. Our rule-mining algorithm generated results were consistent with those obtained using partial least square discriminant analysis (PLS-DA) and the predictive performance of the model was significant. Buckets that were identified by the algorithm corresponded to metabolites involved in glycolysis and energy-conversion pathways, notably acetate, citrate, and pyruvate, which were found in higher concentrations in the urines of aAIP compared with PCT patients. Metabolic profiling did not discriminate sPCT from fPCT patients. These results suggest that metabolic reprogramming occurs in aAIP individuals, even in the absence of overt symptoms, and supports the relationship that occur between heme synthesis and mitochondrial energetic metabolism.

  4. Data mining and Pattern Recognizing Models for Identifying Inherited Diseases: Challenges and Implications

    Directory of Open Access Journals (Sweden)

    Lahiru Iddamalgoda

    2016-08-01

    Full Text Available Data mining and pattern recognition methods reveal interesting findings in genetic studies, especially on how genetic makeup is associated with inherited diseases. Although researchers have proposed various data mining models for biomedical approaches, there remains a challenge in accurately determining the responsible genetic factors for prioritizing the single nucleotide polymorphisms (SNP associated with the disease. In this commentary, we review the state-of-art data mining and pattern recognition models for identifying inherited diseases and deliberate the need of binary classification and scoring based prioritization methods for determining causal variants. While we discuss the pros and cons associated with these methods known, we argue that the gene prioritization methods and the protein interaction (PPI methods in conjunction with the K nearest neighbors’ could be used in accurately categorizing the genetic factors in disease causation

  5. A study of the Bienstock-Zuckerberg algorithm, Applications in Mining and Resource Constrained Project Scheduling

    OpenAIRE

    Muñoz, Gonzalo; Espinoza, Daniel; Goycoolea, Marcos; Moreno, Eduardo; Queyranne, Maurice; Rivera, Orlando

    2016-01-01

    We study a Lagrangian decomposition algorithm recently proposed by Dan Bienstock and Mark Zuckerberg for solving the LP relaxation of a class of open pit mine project scheduling problems. In this study we show that the Bienstock-Zuckerberg (BZ) algorithm can be used to solve LP relaxations corresponding to a much broader class of scheduling problems, including the well-known Resource Constrained Project Scheduling Problem (RCPSP), and multi-modal variants of the RCPSP that consider batch proc...

  6. Application of Data Mining Algorithm to Recipient of Motorcycle Installment

    Directory of Open Access Journals (Sweden)

    Harry Dhika

    2015-12-01

    Full Text Available The study was conducted in the subsidiaries that provide services of finance related to the purchase of a motorcycle on credit. At the time of applying, consumers enter their personal data. Based on the personal data, it will be known whether the consumer credit data is approved or rejected. From 224 consumer data obtained, it is known that the number of consumers whose applications are approved is 87% or about 217 consumers and consumers whose application is rejected is 16% or as much as 6 consumers. Acceptance of motorcycle financing on credit by using the method of applying the algorithm through CRIS-P DM is the industry standard in the processing of data mining. The algorithm used in the decision making is the algorithm C4.5. The results obtained previously, the level of accuracy is measured with the Confusion Matrix and Receiver Operating characteristic (ROC. Evaluation of the Confusion Matrix is intended to seek the value of accuracy, precision value, and the value of recall data. While the Receiver Operating Characteristic (ROC is used to find data tables and comparison Area Under Curve (AUC.

  7. Data mining in soft computing framework: a survey.

    Science.gov (United States)

    Mitra, S; Pal, S K; Mitra, P

    2002-01-01

    The present article provides a survey of the available literature on data mining using soft computing. A categorization has been provided based on the different soft computing tools and their hybridizations used, the data mining function implemented, and the preference criterion selected by the model. The utility of the different soft computing methodologies is highlighted. Generally fuzzy sets are suitable for handling the issues related to understandability of patterns, incomplete/noisy data, mixed media information and human interaction, and can provide approximate solutions faster. Neural networks are nonparametric, robust, and exhibit good learning and generalization capabilities in data-rich environments. Genetic algorithms provide efficient search algorithms to select a model, from mixed media data, based on some preference criterion/objective function. Rough sets are suitable for handling different types of uncertainty in data. Some challenges to data mining and the application of soft computing methodologies are indicated. An extensive bibliography is also included.

  8. Mining dynamic noteworthy functions in software execution sequences.

    Science.gov (United States)

    Zhang, Bing; Huang, Guoyan; Wang, Yuqian; He, Haitao; Ren, Jiadong

    2017-01-01

    As the quality of crucial entities can directly affect that of software, their identification and protection become an important premise for effective software development, management, maintenance and testing, which thus contribute to improving the software quality and its attack-defending ability. Most analysis and evaluation on important entities like codes-based static structure analysis are on the destruction of the actual software running. In this paper, from the perspective of software execution process, we proposed an approach to mine dynamic noteworthy functions (DNFM)in software execution sequences. First, according to software decompiling and tracking stack changes, the execution traces composed of a series of function addresses were acquired. Then these traces were modeled as execution sequences and then simplified so as to get simplified sequences (SFS), followed by the extraction of patterns through pattern extraction (PE) algorithm from SFS. After that, evaluating indicators inner-importance and inter-importance were designed to measure the noteworthiness of functions in DNFM algorithm. Finally, these functions were sorted by their noteworthiness. Comparison and contrast were conducted on the experiment results from two traditional complex network-based node mining methods, namely PageRank and DegreeRank. The results show that the DNFM method can mine noteworthy functions in software effectively and precisely.

  9. Application of the pessimistic pruning to increase the accuracy of C4.5 algorithm in diagnosing chronic kidney disease

    Science.gov (United States)

    Muslim, M. A.; Herowati, A. J.; Sugiharti, E.; Prasetiyo, B.

    2018-03-01

    A technique to dig valuable information buried or hidden in data collection which is so big to be found an interesting patterns that was previously unknown is called data mining. Data mining has been applied in the healthcare industry. One technique used data mining is classification. The decision tree included in the classification of data mining and algorithm developed by decision tree is C4.5 algorithm. A classifier is designed using applying pessimistic pruning in C4.5 algorithm in diagnosing chronic kidney disease. Pessimistic pruning use to identify and remove branches that are not needed, this is done to avoid overfitting the decision tree generated by the C4.5 algorithm. In this paper, the result obtained using these classifiers are presented and discussed. Using pessimistic pruning shows increase accuracy of C4.5 algorithm of 1.5% from 95% to 96.5% in diagnosing of chronic kidney disease.

  10. Data mining methods

    CERN Document Server

    Chattamvelli, Rajan

    2015-01-01

    DATA MINING METHODS, Second Edition discusses both theoretical foundation and practical applications of datamining in a web field including banking, e-commerce, medicine, engineering and management. This book starts byintroducing data and information, basic data type, data category and applications of data mining. The second chapterbriefly reviews data visualization technology and importance in data mining. Fundamentals of probability and statisticsare discussed in chapter 3, and novel algorithm for sample covariants are derived. The next two chapters give an indepthand useful discussion of data warehousing and OLAP. Decision trees are clearly explained and a new tabularmethod for decision tree building is discussed. The chapter on association rules discusses popular algorithms andcompares various algorithms in summary table form. An interesting application of genetic algorithm is introduced inthe next chapter. Foundations of neural networks are built from scratch and the back propagation algorithm is derived...

  11. Filter Pattern Search Algorithms for Mixed Variable Constrained Optimization Problems

    National Research Council Canada - National Science Library

    Abramson, Mark A; Audet, Charles; Dennis, Jr, J. E

    2004-01-01

    .... This class combines and extends the Audet-Dennis Generalized Pattern Search (GPS) algorithms for bound constrained mixed variable optimization, and their GPS-filter algorithms for general nonlinear constraints...

  12. Fast Adapting Ensemble: A New Algorithm for Mining Data Streams with Concept Drift

    Science.gov (United States)

    Ortíz Díaz, Agustín; Ramos-Jiménez, Gonzalo; Frías Blanco, Isvani; Caballero Mota, Yailé; Morales-Bueno, Rafael

    2015-01-01

    The treatment of large data streams in the presence of concept drifts is one of the main challenges in the field of data mining, particularly when the algorithms have to deal with concepts that disappear and then reappear. This paper presents a new algorithm, called Fast Adapting Ensemble (FAE), which adapts very quickly to both abrupt and gradual concept drifts, and has been specifically designed to deal with recurring concepts. FAE processes the learning examples in blocks of the same size, but it does not have to wait for the batch to be complete in order to adapt its base classification mechanism. FAE incorporates a drift detector to improve the handling of abrupt concept drifts and stores a set of inactive classifiers that represent old concepts, which are activated very quickly when these concepts reappear. We compare our new algorithm with various well-known learning algorithms, taking into account, common benchmark datasets. The experiments show promising results from the proposed algorithm (regarding accuracy and runtime), handling different types of concept drifts. PMID:25879051

  13. An overview of data mining algorithms in drug induced toxicity prediction.

    Science.gov (United States)

    Omer, Ankur; Singh, Poonam; Yadav, N K; Singh, R K

    2014-04-01

    The growth in chemical diversity has increased the need to adjudicate the toxicity of different chemical compounds raising the burden on the demand of animal testing. The toxicity evaluation requires time consuming and expensive undertaking, leading to the deprivation of the methods employed for screening chemicals pointing towards the need to develop more efficient toxicity assessment systems. Computational approaches have reduced the time as well as the cost for evaluating the toxicity and kinetic behavior of any chemical. The accessibility of a large amount of data and the intense need of turning this data into useful information have attracted the attention towards data mining. Machine Learning, one of the powerful data mining techniques has evolved as the most effective and potent tool for exploring new insights on combinatorial relationships among various experimental data generated. The article accounts on some sophisticated machine learning algorithms like Artificial Neural Networks (ANN), Support Vector Machine (SVM), k-mean clustering and Self Organizing Maps (SOM) with some of the available tools used for classification, sorting and toxicological evaluation of data, clarifying, how data mining and machine learning interact cooperatively to facilitate knowledge discovery. Addressing the association of some commonly used expert systems, we briefly outline some real world applications to consider the crucial role of data set partitioning.

  14. Assessment of the information content of patterns: an algorithm

    Science.gov (United States)

    Daemi, M. Farhang; Beurle, R. L.

    1991-12-01

    A preliminary investigation confirmed the possibility of assessing the translational and rotational information content of simple artificial images. The calculation is tedious, and for more realistic patterns it is essential to implement the method on a computer. This paper describes an algorithm developed for this purpose which confirms the results of the preliminary investigation. Use of the algorithm facilitates much more comprehensive analysis of the combined effect of continuous rotation and fine translation, and paves the way for analysis of more realistic patterns. Owing to the volume of calculation involved in these algorithms, extensive computing facilities were necessary. The major part of the work was carried out using an ICL 3900 series mainframe computer as well as other powerful workstations such as a RISC architecture MIPS machine.

  15. An Autonomous Star Identification Algorithm Based on One-Dimensional Vector Pattern for Star Sensors.

    Science.gov (United States)

    Luo, Liyan; Xu, Luping; Zhang, Hua

    2015-07-07

    In order to enhance the robustness and accelerate the recognition speed of star identification, an autonomous star identification algorithm for star sensors is proposed based on the one-dimensional vector pattern (one_DVP). In the proposed algorithm, the space geometry information of the observed stars is used to form the one-dimensional vector pattern of the observed star. The one-dimensional vector pattern of the same observed star remains unchanged when the stellar image rotates, so the problem of star identification is simplified as the comparison of the two feature vectors. The one-dimensional vector pattern is adopted to build the feature vector of the star pattern, which makes it possible to identify the observed stars robustly. The characteristics of the feature vector and the proposed search strategy for the matching pattern make it possible to achieve the recognition result as quickly as possible. The simulation results demonstrate that the proposed algorithm can effectively accelerate the star identification. Moreover, the recognition accuracy and robustness by the proposed algorithm are better than those by the pyramid algorithm, the modified grid algorithm, and the LPT algorithm. The theoretical analysis and experimental results show that the proposed algorithm outperforms the other three star identification algorithms.

  16. Mining for Social Media: Usage Patterns of Small Businesses

    Directory of Open Access Journals (Sweden)

    Balan Shilpa

    2017-03-01

    Full Text Available Background: Information can now be rapidly exchanged due to social media. Due to its openness, Twitter has generated massive amounts of data. In this paper, we apply data mining and analytics to extract the usage patterns of social media by small businesses. Objectives: The aim of this paper is to describe with an example how data mining can be applied to social media. This paper further examines the impact of social media on small businesses. The Twitter posts related to small businesses are analyzed in detail. Methods/Approach: The patterns of social media usage by small businesses are observed using IBM Watson Analytics. In this paper, we particularly analyze tweets on Twitter for the hashtag #smallbusiness. Results: It is found that the number of females posting topics related to small business on Twitter is greater than the number of males. It is also found that the number of negative posts in Twitter is relatively low. Conclusions: Small firms are beginning to understand the importance of social media to realize their business goals. For future research, further analysis can be performed on the date and time the tweets were posted.

  17. GraMi: Generalized Frequent Pattern Mining in a Single Large Graph

    KAUST Repository

    Saeedy, Mohammed El

    2011-11-01

    Mining frequent subgraphs is an important operation on graphs. Most existing work assumes a database of many small graphs, but modern applications, such as social networks, citation graphs or protein-protein interaction in bioinformatics, are modeled as a single large graph. Interesting interactions in such applications may be transitive (e.g., friend of a friend). Existing methods, however, search for frequent isomorphic (i.e., exact match) subgraphs and cannot discover many useful patterns. In this paper the authors propose GRAMI, a framework that generalizes frequent subgraph mining in a large single graph. GRAMI discovers frequent patterns. A pattern is a graph where edges are generalized to distance-constrained paths. Depending on the definition of the distance function, many instantiations of the framework are possible. Both directed and undirected graphs, as well as multiple labels per vertex, are supported. The authors developed an efficient implementation of the framework that models the frequency resolution phase as a constraint satisfaction problem, in order to avoid the costly enumeration of all instances of each pattern in the graph. The authors also implemented CGRAMI, a version that supports structural and semantic constraints; and AGRAMI, an approximate version that supports very large graphs. The experiments on real data demonstrate that the authors framework is up to 3 orders of magnitude faster and discovers more interesting patterns than existing approaches.

  18. A New Approach for Mining Order-Preserving Submatrices Based on All Common Subsequences

    Directory of Open Access Journals (Sweden)

    Yun Xue

    2015-01-01

    Full Text Available Order-preserving submatrices (OPSMs have been applied in many fields, such as DNA microarray data analysis, automatic recommendation systems, and target marketing systems, as an important unsupervised learning model. Unfortunately, most existing methods are heuristic algorithms which are unable to reveal OPSMs entirely in NP-complete problem. In particular, deep OPSMs, corresponding to long patterns with few supporting sequences, incur explosive computational costs and are completely pruned by most popular methods. In this paper, we propose an exact method to discover all OPSMs based on frequent sequential pattern mining. First, an existing algorithm was adjusted to disclose all common subsequence (ACS between every two row sequences, and therefore all deep OPSMs will not be missed. Then, an improved data structure for prefix tree was used to store and traverse ACS, and Apriori principle was employed to efficiently mine the frequent sequential pattern. Finally, experiments were implemented on gene and synthetic datasets. Results demonstrated the effectiveness and efficiency of this method.

  19. A New Approach for Mining Order-Preserving Submatrices Based on All Common Subsequences.

    Science.gov (United States)

    Xue, Yun; Liao, Zhengling; Li, Meihang; Luo, Jie; Kuang, Qiuhua; Hu, Xiaohui; Li, Tiechen

    2015-01-01

    Order-preserving submatrices (OPSMs) have been applied in many fields, such as DNA microarray data analysis, automatic recommendation systems, and target marketing systems, as an important unsupervised learning model. Unfortunately, most existing methods are heuristic algorithms which are unable to reveal OPSMs entirely in NP-complete problem. In particular, deep OPSMs, corresponding to long patterns with few supporting sequences, incur explosive computational costs and are completely pruned by most popular methods. In this paper, we propose an exact method to discover all OPSMs based on frequent sequential pattern mining. First, an existing algorithm was adjusted to disclose all common subsequence (ACS) between every two row sequences, and therefore all deep OPSMs will not be missed. Then, an improved data structure for prefix tree was used to store and traverse ACS, and Apriori principle was employed to efficiently mine the frequent sequential pattern. Finally, experiments were implemented on gene and synthetic datasets. Results demonstrated the effectiveness and efficiency of this method.

  20. Design Pattern Mining Using Distributed Learning Automata and DNA Sequence Alignment

    Science.gov (United States)

    Esmaeilpour, Mansour; Naderifar, Vahideh; Shukur, Zarina

    2014-01-01

    Context Over the last decade, design patterns have been used extensively to generate reusable solutions to frequently encountered problems in software engineering and object oriented programming. A design pattern is a repeatable software design solution that provides a template for solving various instances of a general problem. Objective This paper describes a new method for pattern mining, isolating design patterns and relationship between them; and a related tool, DLA-DNA for all implemented pattern and all projects used for evaluation. DLA-DNA achieves acceptable precision and recall instead of other evaluated tools based on distributed learning automata (DLA) and deoxyribonucleic acid (DNA) sequences alignment. Method The proposed method mines structural design patterns in the object oriented source code and extracts the strong and weak relationships between them, enabling analyzers and programmers to determine the dependency rate of each object, component, and other section of the code for parameter passing and modular programming. The proposed model can detect design patterns better that available other tools those are Pinot, PTIDEJ and DPJF; and the strengths of their relationships. Results The result demonstrate that whenever the source code is build standard and non-standard, based on the design patterns, then the result of the proposed method is near to DPJF and better that Pinot and PTIDEJ. The proposed model is tested on the several source codes and is compared with other related models and available tools those the results show the precision and recall of the proposed method, averagely 20% and 9.6% are more than Pinot, 27% and 31% are more than PTIDEJ and 3.3% and 2% are more than DPJF respectively. Conclusion The primary idea of the proposed method is organized in two following steps: the first step, elemental design patterns are identified, while at the second step, is composed to recognize actual design patterns. PMID:25243670

  1. Design pattern mining using distributed learning automata and DNA sequence alignment.

    Directory of Open Access Journals (Sweden)

    Mansour Esmaeilpour

    Full Text Available CONTEXT: Over the last decade, design patterns have been used extensively to generate reusable solutions to frequently encountered problems in software engineering and object oriented programming. A design pattern is a repeatable software design solution that provides a template for solving various instances of a general problem. OBJECTIVE: This paper describes a new method for pattern mining, isolating design patterns and relationship between them; and a related tool, DLA-DNA for all implemented pattern and all projects used for evaluation. DLA-DNA achieves acceptable precision and recall instead of other evaluated tools based on distributed learning automata (DLA and deoxyribonucleic acid (DNA sequences alignment. METHOD: The proposed method mines structural design patterns in the object oriented source code and extracts the strong and weak relationships between them, enabling analyzers and programmers to determine the dependency rate of each object, component, and other section of the code for parameter passing and modular programming. The proposed model can detect design patterns better that available other tools those are Pinot, PTIDEJ and DPJF; and the strengths of their relationships. RESULTS: The result demonstrate that whenever the source code is build standard and non-standard, based on the design patterns, then the result of the proposed method is near to DPJF and better that Pinot and PTIDEJ. The proposed model is tested on the several source codes and is compared with other related models and available tools those the results show the precision and recall of the proposed method, averagely 20% and 9.6% are more than Pinot, 27% and 31% are more than PTIDEJ and 3.3% and 2% are more than DPJF respectively. CONCLUSION: The primary idea of the proposed method is organized in two following steps: the first step, elemental design patterns are identified, while at the second step, is composed to recognize actual design patterns.

  2. Design pattern mining using distributed learning automata and DNA sequence alignment.

    Science.gov (United States)

    Esmaeilpour, Mansour; Naderifar, Vahideh; Shukur, Zarina

    2014-01-01

    Over the last decade, design patterns have been used extensively to generate reusable solutions to frequently encountered problems in software engineering and object oriented programming. A design pattern is a repeatable software design solution that provides a template for solving various instances of a general problem. This paper describes a new method for pattern mining, isolating design patterns and relationship between them; and a related tool, DLA-DNA for all implemented pattern and all projects used for evaluation. DLA-DNA achieves acceptable precision and recall instead of other evaluated tools based on distributed learning automata (DLA) and deoxyribonucleic acid (DNA) sequences alignment. The proposed method mines structural design patterns in the object oriented source code and extracts the strong and weak relationships between them, enabling analyzers and programmers to determine the dependency rate of each object, component, and other section of the code for parameter passing and modular programming. The proposed model can detect design patterns better that available other tools those are Pinot, PTIDEJ and DPJF; and the strengths of their relationships. The result demonstrate that whenever the source code is build standard and non-standard, based on the design patterns, then the result of the proposed method is near to DPJF and better that Pinot and PTIDEJ. The proposed model is tested on the several source codes and is compared with other related models and available tools those the results show the precision and recall of the proposed method, averagely 20% and 9.6% are more than Pinot, 27% and 31% are more than PTIDEJ and 3.3% and 2% are more than DPJF respectively. The primary idea of the proposed method is organized in two following steps: the first step, elemental design patterns are identified, while at the second step, is composed to recognize actual design patterns.

  3. Fuzzy C-Means Clustering Model Data Mining For Recognizing Stock Data Sampling Pattern

    Directory of Open Access Journals (Sweden)

    Sylvia Jane Annatje Sumarauw

    2007-06-01

    Full Text Available Abstract Capital market has been beneficial to companies and investor. For investors, the capital market provides two economical advantages, namely deviden and capital gain, and a non-economical one that is a voting .} hare in Shareholders General Meeting. But, it can also penalize the share owners. In order to prevent them from the risk, the investors should predict the prospect of their companies. As a consequence of having an abstract commodity, the share quality will be determined by the validity of their company profile information. Any information of stock value fluctuation from Jakarta Stock Exchange can be a useful consideration and a good measurement for data analysis. In the context of preventing the shareholders from the risk, this research focuses on stock data sample category or stock data sample pattern by using Fuzzy c-Me, MS Clustering Model which providing any useful information jar the investors. lite research analyses stock data such as Individual Index, Volume and Amount on Property and Real Estate Emitter Group at Jakarta Stock Exchange from January 1 till December 31 of 204. 'he mining process follows Cross Industry Standard Process model for Data Mining (CRISP,. DM in the form of circle with these steps: Business Understanding, Data Understanding, Data Preparation, Modelling, Evaluation and Deployment. At this modelling process, the Fuzzy c-Means Clustering Model will be applied. Data Mining Fuzzy c-Means Clustering Model can analyze stock data in a big database with many complex variables especially for finding the data sample pattern, and then building Fuzzy Inference System for stimulating inputs to be outputs that based on Fuzzy Logic by recognising the pattern. Keywords: Data Mining, AUz..:y c-Means Clustering Model, Pattern Recognition

  4. SOMA: A Proposed Framework for Trend Mining in Large UK Diabetic Retinopathy Temporal Databases

    Science.gov (United States)

    Somaraki, Vassiliki; Harding, Simon; Broadbent, Deborah; Coenen, Frans

    In this paper, we present SOMA, a new trend mining framework; and Aretaeus, the associated trend mining algorithm. The proposed framework is able to detect different kinds of trends within longitudinal datasets. The prototype trends are defined mathematically so that they can be mapped onto the temporal patterns. Trends are defined and generated in terms of the frequency of occurrence of pattern changes over time. To evaluate the proposed framework the process was applied to a large collection of medical records, forming part of the diabetic retinopathy screening programme at the Royal Liverpool University Hospital.

  5. Application and Exploration of Big Data Mining in Clinical Medicine.

    Science.gov (United States)

    Zhang, Yue; Guo, Shu-Li; Han, Li-Na; Li, Tie-Ling

    2016-03-20

    To review theories and technologies of big data mining and their application in clinical medicine. Literatures published in English or Chinese regarding theories and technologies of big data mining and the concrete applications of data mining technology in clinical medicine were obtained from PubMed and Chinese Hospital Knowledge Database from 1975 to 2015. Original articles regarding big data mining theory/technology and big data mining's application in the medical field were selected. This review characterized the basic theories and technologies of big data mining including fuzzy theory, rough set theory, cloud theory, Dempster-Shafer theory, artificial neural network, genetic algorithm, inductive learning theory, Bayesian network, decision tree, pattern recognition, high-performance computing, and statistical analysis. The application of big data mining in clinical medicine was analyzed in the fields of disease risk assessment, clinical decision support, prediction of disease development, guidance of rational use of drugs, medical management, and evidence-based medicine. Big data mining has the potential to play an important role in clinical medicine.

  6. Step-by-Step Model for the Study of the Apriori Algorithm for Predictive Analysis

    Directory of Open Access Journals (Sweden)

    Daniel Grigore ROŞCA

    2015-06-01

    Full Text Available The goal of this paper was to develop an educational oriented application based on the Data Mining Apriori Algorithm which facilitates both the research and the study of data mining by graduate students. The application could be used to discover interesting patterns in the corpus of data and to measure the impact on the speed of execution as a function of problem constraints (value of support and confidence variables or size of the transactional data-base. The paper presents a brief overview of the Apriori Algorithm, aspects about the implementation of the algorithm using a step-by-step process, a discussion of the education-oriented user interface and the process of data mining of a test transactional data base. The impact of some constraints on the speed of the algorithm is also experimentally measured without a systematic review of different approaches to increase execution speed. Possible applications of the implementation, as well as its limits, are briefly reviewed.

  7. Algorithmic acquisition of diagnostic patterns in district heating billing system

    International Nuclear Information System (INIS)

    Kiluk, Sebastian

    2012-01-01

    An application of algorithmic exploration of billing data is examined for fault detection, diagnosis (FDD) based on evaluation of present state and detection of unexpected changes in energy efficiency of buildings. Large data sets from district heating (DH) billing systems are used for construction of feature space, diagnostic rules and classification of the buildings according to their energy efficiency properties. The algorithmic approach automates discovering knowledge about common, thus accepted changes in buildings’ properties, in equipment and in habitants’ behavior reflecting progress in technology and life style. In this article implementation of Data Mining and Knowledge Discovery (DMKD) method in supervision system with exemplary results based on real data is presented. Crucial steps of data processing influencing diagnostic results are described in details.

  8. Sequential Pattern Mining of Electronic Healthcare Reimbursement Claims: Experiences and Challenges in Uncovering How Patients are Treated by Physicians

    Energy Technology Data Exchange (ETDEWEB)

    Pullum, Laura L [ORNL; Ramanathan, Arvind [ORNL; Hobson, Tanner C [ORNL

    2015-01-01

    We examine the use of electronic healthcare reimbursement claims (EHRC) for analyzing healthcare delivery and practice patterns across the United States (US). We show that EHRCs are correlated with disease incidence estimates published by the Centers for Disease Control. Further, by analyzing over 1 billion EHRCs, we track patterns of clinical procedures administered to patients with autism spectrum disorder (ASD), heart disease (HD) and breast cancer (BC) using sequential pattern mining algorithms. Our analyses reveal that in contrast to treating HD and BC, clinical procedures for ASD diagnoses are highly varied leading up to and after the ASD diagnoses. The discovered clinical procedure sequences also reveal significant differences in the overall costs incurred across different parts of the US, indicating a lack of consensus amongst practitioners in treating ASD patients. We show that a data-driven approach to understand clinical trajectories using EHRC can provide quantitative insights into how to better manage and treat patients. Based on our experience, we also discuss emerging challenges in using EHRC datasets for gaining insights into the state of contemporary healthcare delivery and practice in the US.

  9. Data warehousing and data mining: A case study

    Directory of Open Access Journals (Sweden)

    Suknović Milija

    2005-01-01

    Full Text Available This paper shows design and implementation of data warehouse as well as the use of data mining algorithms for the purpose of knowledge discovery as the basic resource of adequate business decision making process. The project is realized for the needs of Student's Service Department of the Faculty of Organizational Sciences (FOS, University of Belgrade, Serbia and Montenegro. This system represents a good base for analysis and predictions in the following time period for the purpose of quality business decision-making by top management. Thus, the first part of the paper shows the steps in designing and development of data warehouse of the mentioned business system. The second part of the paper shows the implementation of data mining algorithms for the purpose of deducting rules, patterns and knowledge as a resource for support in the process of decision making.

  10. A Partial Join Approach for Mining Co-Location Patterns: A Summary of Results

    National Research Council Canada - National Science Library

    Yoo, Jin S; Shekhar, Shashi

    2005-01-01

    .... They propose a novel partial-join approach for mining co-location patterns efficiently. It transactionizes continuous spatial data while keeping track of the spatial information not modeled by transactions...

  11. Pattern recognition and data mining software based on artificial neural networks applied to proton transfer in aqueous environments

    International Nuclear Information System (INIS)

    Tahat Amani; Marti Jordi; Khwaldeh Ali; Tahat Kaher

    2014-01-01

    In computational physics proton transfer phenomena could be viewed as pattern classification problems based on a set of input features allowing classification of the proton motion into two categories: transfer ‘occurred’ and transfer ‘not occurred’. The goal of this paper is to evaluate the use of artificial neural networks in the classification of proton transfer events, based on the feed-forward back propagation neural network, used as a classifier to distinguish between the two transfer cases. In this paper, we use a new developed data mining and pattern recognition tool for automating, controlling, and drawing charts of the output data of an Empirical Valence Bond existing code. The study analyzes the need for pattern recognition in aqueous proton transfer processes and how the learning approach in error back propagation (multilayer perceptron algorithms) could be satisfactorily employed in the present case. We present a tool for pattern recognition and validate the code including a real physical case study. The results of applying the artificial neural networks methodology to crowd patterns based upon selected physical properties (e.g., temperature, density) show the abilities of the network to learn proton transfer patterns corresponding to properties of the aqueous environments, which is in turn proved to be fully compatible with previous proton transfer studies. (condensed matter: structural, mechanical, and thermal properties)

  12. Two related algorithms for root-to-frontier tree pattern matching

    NARCIS (Netherlands)

    Cleophas, L.G.W.A.; Hemerik, C.; Zwaan, G.

    2006-01-01

    Tree pattern matching (TPM) algorithms on ordered, ranked trees play an important role in applications such as compilers and term rewriting systems. Many TPM algorithms appearing in the literature are based on tree automata. For efficiency, these automata should be deterministic, yet deterministic

  13. Mining Branching Rules from Past Survey Data with an Illustration Using a Geriatric Assessment Survey for Older Adults with Cancer

    Directory of Open Access Journals (Sweden)

    Daniel R. Jeske

    2016-05-01

    Full Text Available We construct a fast data mining algorithm that can be used to identify high-frequency response patterns in historical surveys. Identification of these patterns leads to the derivation of question branching rules that shorten the time required to complete a survey. The data mining algorithm allows the user to control the error rate that is incurred through the use of implied answers that go along with each branching rule. The context considered is binary response questions, which can be obtained from multi-level response questions through dichotomization. The algorithm is illustrated by the analysis of four sections of a geriatric assessment survey used by oncologists. Reductions in the number of questions that need to be asked in these four sections range from 33% to 54%.

  14. Efficient Mining and Detection of Sequential Intrusion Patterns for Network Intrusion Detection Systems

    Science.gov (United States)

    Shyu, Mei-Ling; Huang, Zifang; Luo, Hongli

    In recent years, pervasive computing infrastructures have greatly improved the interaction between human and system. As we put more reliance on these computing infrastructures, we also face threats of network intrusion and/or any new forms of undesirable IT-based activities. Hence, network security has become an extremely important issue, which is closely connected with homeland security, business transactions, and people's daily life. Accurate and efficient intrusion detection technologies are required to safeguard the network systems and the critical information transmitted in the network systems. In this chapter, a novel network intrusion detection framework for mining and detecting sequential intrusion patterns is proposed. The proposed framework consists of a Collateral Representative Subspace Projection Modeling (C-RSPM) component for supervised classification, and an inter-transactional association rule mining method based on Layer Divided Modeling (LDM) for temporal pattern analysis. Experiments on the KDD99 data set and the traffic data set generated by a private LAN testbed show promising results with high detection rates, low processing time, and low false alarm rates in mining and detecting sequential intrusion detections.

  15. Mining Experiential Patterns from Game-Logs of Board Game

    Directory of Open Access Journals (Sweden)

    Liang Wang

    2015-01-01

    Full Text Available In board games, game-logs record past game processes, which can be regarded as an accumulation of experience. Similar to a real person, a computer player can gradually increase its skill by learning from game-logs. Therefore, the game becomes more interesting. This paper proposes an extensible approach to mine experiential patterns from increasing game-logs. The computer player improves its strategies by utilizing these growing patterns, just as it acquires experience. To evaluate the effect and performance of the approach, we designed a sample board game as a test platform and elaborated an experiment consisting of a series of tests. Experimental results show that our approach is effective and efficient.

  16. Synthesis of Steered Flat-top Beam Pattern Using Evolutionary Algorithm

    Directory of Open Access Journals (Sweden)

    D. Mandal

    2016-12-01

    Full Text Available In this paper a pattern synthesis method based on Evolutionary Algorithm is presented. A Flat-top beam pattern has been generated from a concentric ring array of isotropic elements by finding out the optimum set of elements amplitudes and phases using Differential Evolution algorithm. The said pattern is generated in three predefined azimuth planes instate of a single phi plane and also verified for a range of azimuth plane for the same optimum excitations. The main beam is steered to an elevation angle of 30 degree with lower peak SLL and ripple. Dynamic range ratio (DRR is also being improved by eliminating the weakly excited array elements, which simplify the design complexity of feed networks.

  17. Data mining scenarios for the discovery of subtypes and the comparison of algorithms

    NARCIS (Netherlands)

    Colas, Fabrice Pierre Robert

    2009-01-01

    A data mining scenario is a logical sequence of steps to infer patterns from data. In this thesis, we present two scenarios. Our first scenario aims to identify homogeneous subtypes in data. It was applied to clinical research on Osteoarthritis (OA) and Parkinson’s disease (PD) and in drug

  18. Output-Sensitive Pattern Extraction in Sequences

    DEFF Research Database (Denmark)

    Grossi, Roberto; Menconi, Giulia; Pisanti, Nadia

    2014-01-01

    Genomic Analysis, Plagiarism Detection, Data Mining, Intrusion Detection, Spam Fighting and Time Series Analysis are just some examples of applications where extraction of recurring patterns in sequences of objects is one of the main computational challenges. Several notions of patterns exist...... or extend them causes a loss of significant information (where the number of occurrences changes). Output-sensitive algorithms have been proposed to enumerate and list these patterns, taking polynomial time O(nc) per pattern for constant c > 1, which is impractical for massive sequences of very large length...

  19. Mining compressing sequential problems

    NARCIS (Netherlands)

    Hoang, T.L.; Mörchen, F.; Fradkin, D.; Calders, T.G.K.

    2012-01-01

    Compression based pattern mining has been successfully applied to many data mining tasks. We propose an approach based on the minimum description length principle to extract sequential patterns that compress a database of sequences well. We show that mining compressing patterns is NP-Hard and

  20. An AUTONOMOUS STAR IDENTIFICATION ALGORITHM BASED ON THE DIRECTED CIRCULARITY PATTERN

    Directory of Open Access Journals (Sweden)

    J. Xie

    2012-07-01

    Full Text Available The accuracy of the angular distance may decrease due to lots of factors, such as the parameters of the stellar camera aren't calibrated on-orbit, or the location accuracy of the star image points is low, and so on, which can cause the low success rates of star identification. A robust directed circularity pattern algorithm is proposed in this paper, which is developed on basis of the matching probability algorithm. The improved algorithm retains the matching probability strategy to identify master star, and constructs a directed circularity pattern with the adjacent stars for unitary matching. The candidate matching group which has the longest chain will be selected as the final result. Simulation experiments indicate that the improved algorithm has high successful identification and reliability etc, compared with the original algorithm. The experiments with real data are used to verify it.

  1. Length-Bounded Hybrid CPU/GPU Pattern Matching Algorithm for Deep Packet Inspection

    Directory of Open Access Journals (Sweden)

    Yi-Shan Lin

    2017-01-01

    Full Text Available Since frequent communication between applications takes place in high speed networks, deep packet inspection (DPI plays an important role in the network application awareness. The signature-based network intrusion detection system (NIDS contains a DPI technique that examines the incoming packet payloads by employing a pattern matching algorithm that dominates the overall inspection performance. Existing studies focused on implementing efficient pattern matching algorithms by parallel programming on software platforms because of the advantages of lower cost and higher scalability. Either the central processing unit (CPU or the graphic processing unit (GPU were involved. Our studies focused on designing a pattern matching algorithm based on the cooperation between both CPU and GPU. In this paper, we present an enhanced design for our previous work, a length-bounded hybrid CPU/GPU pattern matching algorithm (LHPMA. In the preliminary experiment, the performance and comparison with the previous work are displayed, and the experimental results show that the LHPMA can achieve not only effective CPU/GPU cooperation but also higher throughput than the previous method.

  2. Historical feature pattern extraction based network attack situation sensing algorithm.

    Science.gov (United States)

    Zeng, Yong; Liu, Dacheng; Lei, Zhou

    2014-01-01

    The situation sequence contains a series of complicated and multivariate random trends, which are very sudden, uncertain, and difficult to recognize and describe its principle by traditional algorithms. To solve the above questions, estimating parameters of super long situation sequence is essential, but very difficult, so this paper proposes a situation prediction method based on historical feature pattern extraction (HFPE). First, HFPE algorithm seeks similar indications from the history situation sequence recorded and weighs the link intensity between occurred indication and subsequent effect. Then it calculates the probability that a certain effect reappears according to the current indication and makes a prediction after weighting. Meanwhile, HFPE method gives an evolution algorithm to derive the prediction deviation from the views of pattern and accuracy. This algorithm can continuously promote the adaptability of HFPE through gradual fine-tuning. The method preserves the rules in sequence at its best, does not need data preprocessing, and can track and adapt to the variation of situation sequence continuously.

  3. Historical Feature Pattern Extraction Based Network Attack Situation Sensing Algorithm

    Directory of Open Access Journals (Sweden)

    Yong Zeng

    2014-01-01

    Full Text Available The situation sequence contains a series of complicated and multivariate random trends, which are very sudden, uncertain, and difficult to recognize and describe its principle by traditional algorithms. To solve the above questions, estimating parameters of super long situation sequence is essential, but very difficult, so this paper proposes a situation prediction method based on historical feature pattern extraction (HFPE. First, HFPE algorithm seeks similar indications from the history situation sequence recorded and weighs the link intensity between occurred indication and subsequent effect. Then it calculates the probability that a certain effect reappears according to the current indication and makes a prediction after weighting. Meanwhile, HFPE method gives an evolution algorithm to derive the prediction deviation from the views of pattern and accuracy. This algorithm can continuously promote the adaptability of HFPE through gradual fine-tuning. The method preserves the rules in sequence at its best, does not need data preprocessing, and can track and adapt to the variation of situation sequence continuously.

  4. Trace metal depositional patterns from an open pit mining activity as revealed by archived avian gizzard contents.

    Science.gov (United States)

    Bendell, L I

    2011-02-15

    Archived samples of blue grouse (Dendragapus obscurus) gizzard contents, inclusive of grit, collected yearly between 1959 and 1970 were analyzed for cadmium, lead, zinc, and copper content. Approximately halfway through the 12-year sampling period, an open-pit copper mine began activities, then ceased operations 2 years later. Thus the archived samples provided a unique opportunity to determine if avian gizzard contents, inclusive of grit, could reveal patterns in the anthropogenic deposition of trace metals associated with mining activities. Gizzard concentrations of cadmium and copper strongly coincided with the onset of opening and the closing of the pit mining activity. Gizzard zinc and lead demonstrated significant among year variation; however, maximum concentrations did not correlate to mining activity. The archived gizzard contents did provide a useful tool for documenting trends in metal depositional patterns related to an anthropogenic activity. Further, blue grouse ingesting grit particles during the time of active mining activity would have been exposed to toxicologically significant levels of cadmium. Gizzard lead concentrations were also of toxicological significance but not related to mining activity. This type of "pulse" toxic metal exposure as a consequence of open-pit mining activity would not necessarily have been revealed through a "snap-shot" of soil, plant or avian tissue trace metal analysis post-mining activity. Copyright © 2010 Elsevier B.V. All rights reserved.

  5. Application of a genetic algorithm to core reload pattern optimization

    International Nuclear Information System (INIS)

    Tanker, E.; Tanker, A.Z.

    1994-01-01

    A genetic algorithm is applied to reload pattern optimization of a PWR core. Evaluating all different distributions of a given batch load separately is found slow and ineffective. Allowing patterns from different distributions to combine reproduce, an optimized pattern better than that obtained from from linear programming is found, albeit in a longer time. (authors). 5 refs., 2 tabs

  6. Application and Exploration of Big Data Mining in Clinical Medicine

    Science.gov (United States)

    Zhang, Yue; Guo, Shu-Li; Han, Li-Na; Li, Tie-Ling

    2016-01-01

    Objective: To review theories and technologies of big data mining and their application in clinical medicine. Data Sources: Literatures published in English or Chinese regarding theories and technologies of big data mining and the concrete applications of data mining technology in clinical medicine were obtained from PubMed and Chinese Hospital Knowledge Database from 1975 to 2015. Study Selection: Original articles regarding big data mining theory/technology and big data mining's application in the medical field were selected. Results: This review characterized the basic theories and technologies of big data mining including fuzzy theory, rough set theory, cloud theory, Dempster–Shafer theory, artificial neural network, genetic algorithm, inductive learning theory, Bayesian network, decision tree, pattern recognition, high-performance computing, and statistical analysis. The application of big data mining in clinical medicine was analyzed in the fields of disease risk assessment, clinical decision support, prediction of disease development, guidance of rational use of drugs, medical management, and evidence-based medicine. Conclusion: Big data mining has the potential to play an important role in clinical medicine. PMID:26960378

  7. Pattern-set generation algorithm for the one-dimensional multiple stock sizes cutting stock problem

    Science.gov (United States)

    Cui, Yaodong; Cui, Yi-Ping; Zhao, Zhigang

    2015-09-01

    A pattern-set generation algorithm (PSG) for the one-dimensional multiple stock sizes cutting stock problem (1DMSSCSP) is presented. The solution process contains two stages. In the first stage, the PSG solves the residual problems repeatedly to generate the patterns in the pattern set, where each residual problem is solved by the column-generation approach, and each pattern is generated by solving a single large object placement problem. In the second stage, the integer linear programming model of the 1DMSSCSP is solved using a commercial solver, where only the patterns in the pattern set are considered. The computational results of benchmark instances indicate that the PSG outperforms existing heuristic algorithms and rivals the exact algorithm in solution quality.

  8. A Comparative Analysis of Classification Algorithms on Diverse Datasets

    Directory of Open Access Journals (Sweden)

    M. Alghobiri

    2018-04-01

    Full Text Available Data mining involves the computational process to find patterns from large data sets. Classification, one of the main domains of data mining, involves known structure generalizing to apply to a new dataset and predict its class. There are various classification algorithms being used to classify various data sets. They are based on different methods such as probability, decision tree, neural network, nearest neighbor, boolean and fuzzy logic, kernel-based etc. In this paper, we apply three diverse classification algorithms on ten datasets. The datasets have been selected based on their size and/or number and nature of attributes. Results have been discussed using some performance evaluation measures like precision, accuracy, F-measure, Kappa statistics, mean absolute error, relative absolute error, ROC Area etc. Comparative analysis has been carried out using the performance evaluation measures of accuracy, precision, and F-measure. We specify features and limitations of the classification algorithms for the diverse nature datasets.

  9. A Pilot-Pattern Based Algorithm for MIMO-OFDM Channel Estimation

    Directory of Open Access Journals (Sweden)

    Guomin Li

    2016-12-01

    Full Text Available An improved pilot pattern algorithm for facilitating the channel estimation in multiple input multiple output-orthogonal frequency division multiplexing (MIMO-OFDM systems is proposed in this paper. The presented algorithm reconfigures the parameter in the least square (LS algorithm, which belongs to the space-time block-coded (STBC category for channel estimation in pilot-based MIMO-OFDM system. Simulation results show that the algorithm has better performance in contrast to the classical single symbol scheme. In contrast to the double symbols scheme, the proposed algorithm can achieve nearly the same performance with only half of the complexity of the double symbols scheme.

  10. Frequent pattern mining

    CERN Document Server

    Aggarwal, Charu C

    2014-01-01

    Proposes numerous methods to solve some of the most fundamental problems in data mining and machine learning Presents various simplified perspectives, providing a range of information to benefit both students and practitioners Includes surveys on key research content, case studies and future research directions

  11. Trace metal depositional patterns from an open pit mining activity as revealed by archived avian gizzard contents

    Energy Technology Data Exchange (ETDEWEB)

    Bendell, L.I., E-mail: bendell@sfu.ca

    2011-02-15

    Archived samples of blue grouse (Dendragapus obscurus) gizzard contents, inclusive of grit, collected yearly between 1959 and 1970 were analyzed for cadmium, lead, zinc, and copper content. Approximately halfway through the 12-year sampling period, an open-pit copper mine began activities, then ceased operations 2 years later. Thus the archived samples provided a unique opportunity to determine if avian gizzard contents, inclusive of grit, could reveal patterns in the anthropogenic deposition of trace metals associated with mining activities. Gizzard concentrations of cadmium and copper strongly coincided with the onset of opening and the closing of the pit mining activity. Gizzard zinc and lead demonstrated significant among year variation; however, maximum concentrations did not correlate to mining activity. The archived gizzard contents did provide a useful tool for documenting trends in metal depositional patterns related to an anthropogenic activity. Further, blue grouse ingesting grit particles during the time of active mining activity would have been exposed to toxicologically significant levels of cadmium. Gizzard lead concentrations were also of toxicological significance but not related to mining activity. This type of 'pulse' toxic metal exposure as a consequence of open-pit mining activity would not necessarily have been revealed through a 'snap-shot' of soil, plant or avian tissue trace metal analysis post-mining activity. - Research Highlights: {yields} Archived gizzard samples reveals mining history. {yields} Grit ingestion exposes grouse to cadmium and lead. {yields} Grit selection includes particles enriched in cadmium. {yields} Cadmium enriched particles are of toxicological significance.

  12. A hybrid GA-TS algorithm for open vehicle routing optimization of coal mines material

    Energy Technology Data Exchange (ETDEWEB)

    Yu, S.W.; Ding, C.; Zhu, K.J. [China University of Geoscience, Wuhan (China)

    2011-08-15

    In the open vehicle routing problem (OVRP), the objective is to minimize the number of vehicles and the total distance (or time) traveled. This study primarily focuses on solving an open vehicle routing problem (OVRP) by applying a novel hybrid genetic algorithm and the Tabu search (GA-TS), which combines the GA's parallel computing and global optimization with TS's Tabu search skill and fast local search. Firstly, the proposed algorithm uses natural number coding according to the customer demands and the captivity of the vehicle for globe optimization. Secondly, individuals of population do TS local search with a certain degree of probability, namely, do the local routing optimization of all customer sites belong to one vehicle. The mechanism not only improves the ability of global optimization, but also ensures the speed of operation. The algorithm was used in Zhengzhou Coal Mine and Power Supply Co., Ltd.'s transport vehicle routing optimization.

  13. A Bayesian Scoring Technique for Mining Predictive and Non-Spurious Rules.

    Science.gov (United States)

    Batal, Iyad; Cooper, Gregory; Hauskrecht, Milos

    Rule mining is an important class of data mining methods for discovering interesting patterns in data. The success of a rule mining method heavily depends on the evaluation function that is used to assess the quality of the rules. In this work, we propose a new rule evaluation score - the Predictive and Non-Spurious Rules (PNSR) score. This score relies on Bayesian inference to evaluate the quality of the rules and considers the structure of the rules to filter out spurious rules. We present an efficient algorithm for finding rules with high PNSR scores. The experiments demonstrate that our method is able to cover and explain the data with a much smaller rule set than existing methods.

  14. Extracting Patterns from Educational Traces via Clustering and Associated Quality Metrics

    NARCIS (Netherlands)

    Mihaescu, Marian; Tanasie, Alexandru; Dascalu, Mihai; Trausan-Matu, Stefan

    2016-01-01

    Clustering algorithms, pattern mining techniques and associated quality metrics emerged as reliable methods for modeling learners’ performance, comprehension and interaction in given educational scenarios. The specificity of available data such as missing values, extreme values or outliers,

  15. AN EFFICIENT DATA MINING METHOD TO FIND FREQUENT ITEM SETS IN LARGE DATABASE USING TR- FCTM

    Directory of Open Access Journals (Sweden)

    Saravanan Suba

    2016-01-01

    Full Text Available Mining association rules in large database is one of most popular data mining techniques for business decision makers. Discovering frequent item set is the core process in association rule mining. Numerous algorithms are available in the literature to find frequent patterns. Apriori and FP-tree are the most common methods for finding frequent items. Apriori finds significant frequent items using candidate generation with more number of data base scans. FP-tree uses two database scans to find significant frequent items without using candidate generation. This proposed TR-FCTM (Transaction Reduction- Frequency Count Table Method discovers significant frequent items by generating full candidates once to form frequency count table with one database scan. Experimental results of TR-FCTM shows that this algorithm outperforms than Apriori and FP-tree.

  16. The Plasmodium falciparum Sexual Development Transcriptome: A Microarray Analysis using Ontology-Based Pattern Identification

    National Research Council Canada - National Science Library

    Young, Jason A; Fivelman, Quinton L; Blair, Peter L; de la Vega, Patricia; Le Roch, Karine G; Zhou, Yingyao; Carucci, Daniel J; Baker, David A; Winzeler, Elizabeth A

    2005-01-01

    ... a full-genome high-density oligonucleotide microarray. The interpretation of this transcriptional data was aided by applying a novel knowledge-based data-mining algorithm termed ontology-based pattern identification (OPI...

  17. AC-600 reactor reloading pattern optimization by using genetic algorithms

    International Nuclear Information System (INIS)

    Wu Hongchun; Xie Zhongsheng; Yao Dong; Li Dongsheng; Zhang Zongyao

    2000-01-01

    The use of genetic algorithms to optimize reloading pattern of the nuclear power plant reactor is proposed. And a new encoding and translating method is given. Optimization results of minimizing core power peak and maximizing cycle length for both low-leakage and out-in loading pattern of AC-600 reactor are obtained

  18. Data mining in agriculture

    CERN Document Server

    Mucherino, Antonio; Pardalos, Panos M

    2009-01-01

    Data Mining in Agriculture represents a comprehensive effort to provide graduate students and researchers with an analytical text on data mining techniques applied to agriculture and environmental related fields. This book presents both theoretical and practical insights with a focus on presenting the context of each data mining technique rather intuitively with ample concrete examples represented graphically and with algorithms written in MATLAB®. Examples and exercises with solutions are provided at the end of each chapter to facilitate the comprehension of the material. For each data mining technique described in the book variants and improvements of the basic algorithm are also given. Also by P.J. Papajorgji and P.M. Pardalos: Advances in Modeling Agricultural Systems, 'Springer Optimization and its Applications' vol. 25, ©2009.

  19. Application of data mining techniques for nuclear data and instrumentation

    International Nuclear Information System (INIS)

    Toshniwal, Durga

    2013-01-01

    Data mining is defined as the discovery of previously unknown, valid, novel, potentially useful, and understandable patterns in large databases. It encompasses many different techniques and algorithms which differ in the kinds of data that can be analyzed and the form of knowledge representation used to convey the discovered knowledge. Patterns in the data can be represented in many different forms, including classification rules, association rules, clusters, etc. Data mining thus deals with the discovery of hidden trends and patterns from large quantities of data. The field of data mining is emerging as a new, fundamental research area with important applications to science, engineering, medicine, business, and education. It is an interdisciplinary research area and draws upon several roots, including database systems, machine learning, information systems, statistics and expert systems. Data mining, when performed on time series data, is known as time series data mining (TSDM). A time series is a sequence of real numbers, each number representing a value at a point of time. During the past few years, there has been an explosion of research in the area of time series data mining. This includes attempts to model time series data, to design languages to query such data, and to develop access structures to efficiently process queries on such data. Time series data arises naturally in many real-world applications. Efficient discovery of knowledge through time series data mining can be helpful in several domains such as: Stock market analysis, Weather forecasting etc. An important application area of data mining techniques is in nuclear power plant and related data. Nuclear power plant data can be represented in form of time sequences. Often it may be of prime importance to analyze such data to find trends and anomalies. The general goals of data mining include feature extraction, similarity search, clustering and classification, association rule mining and anomaly

  20. A genetic algorithm approach for open-pit mine production scheduling

    Directory of Open Access Journals (Sweden)

    Aref Alipour

    2017-06-01

    Full Text Available In an Open-Pit Production Scheduling (OPPS problem, the goal is to determine the mining sequence of an orebody as a block model. In this article, linear programing formulation is used to aim this goal. OPPS problem is known as an NP-hard problem, so an exact mathematical model cannot be applied to solve in the real state. Genetic Algorithm (GA is a well-known member of evolutionary algorithms that widely are utilized to solve NP-hard problems. Herein, GA is implemented in a hypothetical Two-Dimensional (2D copper orebody model. The orebody is featured as two-dimensional (2D array of blocks. Likewise, counterpart 2D GA array was used to represent the OPPS problem’s solution space. Thereupon, the fitness function is defined according to the OPPS problem’s objective function to assess the solution domain. Also, new normalization method was used for the handling of block sequencing constraint. A numerical study is performed to compare the solutions of the exact and GA-based methods. It is shown that the gap between GA and the optimal solution by the exact method is less than % 5; hereupon GA is found to be efficiently in solving OPPS problem.

  1. Data Mining and Machine Learning Methods for Dementia Research.

    Science.gov (United States)

    Li, Rui

    2018-01-01

    Patient data in clinical research often includes large amounts of structured information, such as neuroimaging data, neuropsychological test results, and demographic variables. Given the various sources of information, we can develop computerized methods that can be a great help to clinicians to discover hidden patterns in the data. The computerized methods often employ data mining and machine learning algorithms, lending themselves as the computer-aided diagnosis (CAD) tool that assists clinicians in making diagnostic decisions. In this chapter, we review state-of-the-art methods used in dementia research, and briefly introduce some recently proposed algorithms subsequently.

  2. Mining Spatiotemporal Patterns of the Elder's Daily Movement

    Science.gov (United States)

    Chen, C. R.; Chen, C. F.; Liu, M. E.; Tsai, S. J.; Son, N. T.; Kinh, L. V.

    2016-06-01

    With rapid developments in wearable device technology, a vast amount of spatiotemporal data, such as people's movement and physical activities, are generated. Information derived from the data reveals important knowledge that can contribute a long-term care and psychological assessment of the elders' living condition especially in long-term care institutions. This study aims to develop a method to investigate the spatial-temporal movement patterns of the elders with their outdoor trajectory information. To achieve the goal, GPS based location data of the elderly subjects from long-term care institutions are collected and analysed with geographic information system (GIS). A GIS statistical model is developed to mine the elderly subjects' spatiotemporal patterns with the location data and represent their daily movement pattern at particular time. The proposed method first finds the meaningful trajectory and extracts the frequent patterns from the time-stamp location data. Then, a density-based clustering method is used to identify the major moving range and the gather/stay hotspot in both spatial and temporal dimensions. The preliminary results indicate that the major moving area of the elderly people encompasses their dorm and has a short moving distance who often stay in the same site. Subjects' outdoor appearance are corresponded to their life routine. The results can be useful for understanding elders' social network construction, risky area identification and medical care monitoring.

  3. Comparing sets of patterns with the Jaccard index

    Directory of Open Access Journals (Sweden)

    Sam Fletcher

    2018-03-01

    Full Text Available The ability to extract knowledge from data has been the driving force of Data Mining since its inception, and of statistical modeling long before even that. Actionable knowledge often takes the form of patterns, where a set of antecedents can be used to infer a consequent. In this paper we offer a solution to the problem of comparing different sets of patterns. Our solution allows comparisons between sets of patterns that were derived from different techniques (such as different classification algorithms, or made from different samples of data (such as temporal data or data perturbed for privacy reasons. We propose using the Jaccard index to measure the similarity between sets of patterns by converting each pattern into a single element within the set. Our measure focuses on providing conceptual simplicity, computational simplicity, interpretability, and wide applicability. The results of this measure are compared to prediction accuracy in the context of a real-world data mining scenario.

  4. Quantifying Surface Coal-Mining Patterns to Promote Regional Sustainability in Ordos, Inner Mongolia

    Directory of Open Access Journals (Sweden)

    Xiaoji Zeng

    2018-04-01

    Full Text Available Ordos became the new “coal capital” of China within a few decades since the country’s economic reform in 1978, as large-scale surface coal mining dramatically propelled its per capita GDP from being one of the lowest to one of the highest in China, exceeding Hong Kong in 2009. Surface coal-mining areas (SCMAs have continued to expand in this region during recent decades, resulting in serious environmental and socioeconomic consequences. To understand these impacts and promote regional sustainability, quantifying the spatiotemporal patterns of SCMAs is urgently needed. Thus, the main objectives of this study were to quantify the spatiotemporal patterns of SCMAs in the Ordos region from 1990 to 2015, and to examine some of the major environmental and socioeconomic impacts in the study region. We extracted the SCMAs using remote-sensing data, and then quantified their spatiotemporal patterns using landscape metrics. The loss of natural habitat and several socioeconomic indicators were examined in relation to surface coal mining. Our results show that the area of SCMAs increased from 7.12 km2 to 355.95 km2, an increase of nearly 49 times from 1990 to 2015 in the Ordos region. The number of SCMAs in this region increased from 82 to 651, a nearly seven-fold increase. In particular, Zhungeer banner (an administrative division, Yijinhuoluo banner, Dongsheng District and Dalate banner in the north-eastern part of the Ordos region had higher growth rates of SCMAs. The income gap between urban and rural residents increased along with the growth in SCMAs, undermining social equity in the Ordos region. Moreover, the rapid increase in SCMAs resulted in natural habitat loss (including grasslands, forests, and deserts across this region. Thus, we suggest that regional sustainability in Ordos needs to emphasize effective measures to curb large-scale surface coal mining in order to reduce the urban–rural income gap, and to restore degraded natural

  5. AN EFFECTIVE RECOMMENDATIONS BY DIFFUSION ALGORITHM FOR WEB GRAPH MINING

    Directory of Open Access Journals (Sweden)

    S. Vasukipriya

    2013-04-01

    Full Text Available The information on the World Wide Web grows in an explosive rate. Societies are relying more on the Web for their miscellaneous needs of information. Recommendation systems are active information filtering systems that attempt to present the information items like movies, music, images, books recommendations, tags recommendations, query suggestions, etc., to the users. Various kinds of data bases are used for the recommendations; fundamentally these data bases can be molded in the form of many types of graphs. Aiming at provided that a general framework on effective DR (Recommendations by Diffusion algorithm for web graphs mining. First introduce a novel graph diffusion model based on heat diffusion. This method can be applied to both undirected graphs and directed graphs. Then it shows how to convert different Web data sources into correct graphs in our models.

  6. Prediction of customer behaviour analysis using classification algorithms

    Science.gov (United States)

    Raju, Siva Subramanian; Dhandayudam, Prabha

    2018-04-01

    Customer Relationship management plays a crucial role in analyzing of customer behavior patterns and their values with an enterprise. Analyzing of customer data can be efficient performed using various data mining techniques, with the goal of developing business strategies and to enhance the business. In this paper, three classification models (NB, J48, and MLPNN) are studied and evaluated for our experimental purpose. The performance measures of the three classifications are compared using three different parameters (accuracy, sensitivity, specificity) and experimental results expose J48 algorithm has better accuracy with compare to NB and MLPNN algorithm.

  7. Investigation on the improvement of genetic algorithm for PWR loading pattern search and its benchmark verification

    International Nuclear Information System (INIS)

    Li Qianqian; Jiang Xiaofeng; Zhang Shaohong

    2009-01-01

    In this study, the age technique, the concepts of relativeness degree and worth function are exploited to improve the performance of genetic algorithm (GA) for PWR loading pattern search. Among them, the age technique endows the algorithm be capable of learning from previous search 'experience' and guides it to do a better search in the vicinity ora local optimal; the introduction of the relativeness degree checks the relativeness of two loading patterns before performing crossover between them, which can significantly reduce the possibility of prematurity of the algorithm; while the application of the worth function makes the algorithm be capable of generating new loading patterns based on the statistics of common features of evaluated good loading patterns. Numerical verification against a loading pattern search benchmark problem ora two-loop reactor demonstrates that the adoption of these techniques is able to significantly enhance the efficiency of the genetic algorithm while improves the quality of the final solution as well. (authors)

  8. A multi-pattern hash-binary hybrid algorithm for URL matching in the HTTP protocol.

    Directory of Open Access Journals (Sweden)

    Ping Zeng

    Full Text Available In this paper, based on our previous multi-pattern uniform resource locator (URL binary-matching algorithm called HEM, we propose an improved multi-pattern matching algorithm called MH that is based on hash tables and binary tables. The MH algorithm can be applied to the fields of network security, data analysis, load balancing, cloud robotic communications, and so on-all of which require string matching from a fixed starting position. Our approach effectively solves the performance problems of the classical multi-pattern matching algorithms. This paper explores ways to improve string matching performance under the HTTP protocol by using a hash method combined with a binary method that transforms the symbol-space matching problem into a digital-space numerical-size comparison and hashing problem. The MH approach has a fast matching speed, requires little memory, performs better than both the classical algorithms and HEM for matching fields in an HTTP stream, and it has great promise for use in real-world applications.

  9. Brand Switching Pattern Discovery by Data Mining Techniques for the Telecommunication Industry in Australia

    Directory of Open Access Journals (Sweden)

    Md Zahidul Islam

    2016-11-01

    Full Text Available There is more than one mobile-phone subscription per member of the Australian population. The number of complaints against the mobile-phone-service providers is also high. Therefore, the mobile service providers are facing a huge challenge in retaining their customers. There are a number of existing models to analyse customer behaviour and switching patterns. A number of switching models may also exist within a large market. These models are often not useful due to the heterogeneous nature of the market. Therefore, in this study we use data mining techniques to let the data talk to help us discover switching patterns without requiring us to use any models and domain knowledge. We use a variety of decision tree and decision forest techniques on a real mobile-phone-usage dataset in order to demonstrate the effectiveness of data mining techniques in knowledge discovery. We report many interesting patterns, and discuss them from a brand-switching and marketing perspective, through which they are found to be very sensible and interesting.

  10. An Interval Bound Algorithm of optimizing reactor core loading pattern by using reactivity interval schema

    International Nuclear Information System (INIS)

    Gong Zhaohu; Wang Kan; Yao Dong

    2011-01-01

    Highlights: → We present a new Loading Pattern Optimization method - Interval Bound Algorithm (IBA). → IBA directly uses the reactivity of fuel assemblies and burnable poison. → IBA can optimize fuel assembly orientation in a coupled way. → Numerical experiment shows that IBA outperforms genetic algorithm and engineers. → We devise DDWF technique to deal with multiple objectives and constraints. - Abstract: In order to optimize the core loading pattern in Nuclear Power Plants, the paper presents a new optimization method - Interval Bound Algorithm (IBA). Similar to the typical population based algorithms, e.g. genetic algorithm, IBA maintains a population of solutions and evolves them during the optimization process. IBA acquires the solution by statistical learning and sampling the control variable intervals of the population in each iteration. The control variables are the transforms of the reactivity of fuel assemblies or the worth of burnable poisons, which are the crucial heuristic information for loading pattern optimization problems. IBA can deal with the relationship between the dependent variables by defining the control variables. Based on the IBA algorithm, a parallel Loading Pattern Optimization code, named IBALPO, has been developed. To deal with multiple objectives and constraints, the Dynamic Discontinuous Weight Factors (DDWF) for the fitness function have been used in IBALPO. Finally, the code system has been used to solve a realistic reloading problem and a better pattern has been obtained compared with the ones searched by engineers and genetic algorithm, thus the performance of the code is proved.

  11. Comparison of optimization of loading patterns on the basis of SA and PMA algorithms

    International Nuclear Information System (INIS)

    Beliczai, Botond

    2007-01-01

    Optimization of loading patterns is a very important task from economical point of view in a nuclear power plant. The optimization algorithms used for this purpose can be categorized basically into two categories: deterministic ones and stochastic ones. In the Paks nuclear power plant a deterministic optimization procedure is used to optimize the loading pattern at BOC, so that the core would have maximal reactivity reserve. To the group of stochastic optimization procedures belong mainly simulated annealing (SA) procedures and genetic algorithms (GA). There are new procedures as well, which try to combine the advantages of SAs and GAs. One of them is called population mutation annealing algorithm (PMA). In the Paks NPP we would like to introduce fuel assemblies including burnable poison (Gd) in the near future. In order to be able to find the optimal loading pattern (or near-optimal loading patterns) in that case, we have to optimize our core not only for objective functions defined at BOC, but at EOC as well. For this purpose I used stochastic algorithms (SA and PMA) to investigate loading pattern optimization results for different objective functions at BOC. (author)

  12. Historical Feature Pattern Extraction Based Network Attack Situation Sensing Algorithm

    OpenAIRE

    Zeng, Yong; Liu, Dacheng; Lei, Zhou

    2014-01-01

    The situation sequence contains a series of complicated and multivariate random trends, which are very sudden, uncertain, and difficult to recognize and describe its principle by traditional algorithms. To solve the above questions, estimating parameters of super long situation sequence is essential, but very difficult, so this paper proposes a situation prediction method based on historical feature pattern extraction (HFPE). First, HFPE algorithm seeks similar indications from the history si...

  13. Performance evaluation of Genetic Algorithms on loading pattern optimization of PWRs

    International Nuclear Information System (INIS)

    Tombakoglu, M.; Bekar, K.B.; Erdemli, A.O.

    2001-01-01

    Genetic Algorithm (GA) based systems are used for search and optimization problems. There are several applications of GAs in literature successfully applied for loading pattern optimization problems. In this study, we have selected loading pattern optimization problem of Pressurised Water Reactor (PWR). The main objective of this work is to evaluate the performance of Genetic Algorithm operators such as regional crossover, crossover and mutation, and selection and construction of initial population and its size for PWR loading pattern optimization problems. The performance of GA with antithetic variates is compared to traditional GA. Antithetic variates are used to generate the initial population and its use with GA operators are also discussed. Finally, the results of multi-cycle optimization problems are discussed for objective function taking into account cycle burn-up and discharge burn-up.(author)

  14. Mining Product Data Models: A Case Study

    Directory of Open Access Journals (Sweden)

    Cristina-Claudia DOLEAN

    2014-01-01

    Full Text Available This paper presents two case studies used to prove the validity of some data-flow mining algorithms. We proposed the data-flow mining algorithms because most part of mining algorithms focuses on the control-flow perspective. First case study uses event logs generated by an ERP system (Navision after we set several trackers on the data elements needed in the process analyzed; while the second case study uses the event logs generated by YAWL system. We offered a general solution of data-flow model extraction from different data sources. In order to apply the data-flow mining algorithms the event logs must comply a certain format (using InputOutput extension. But to respect this format, a set of conversion tools is needed. We depicted the conversion tools used and how we got the data-flow models. Moreover, the data-flow model is compared to the control-flow model.

  15. Mining Web-based Educational Systems to Predict Student Learning Achievements

    Directory of Open Access Journals (Sweden)

    José del Campo-Ávila

    2015-03-01

    Full Text Available Educational Data Mining (EDM is getting great importance as a new interdisciplinary research field related to some other areas. It is directly connected with Web-based Educational Systems (WBES and Data Mining (DM, a fundamental part of Knowledge Discovery in Databases. The former defines the context: WBES store and manage huge amounts of data. Such data are increasingly growing and they contain hidden knowledge that could be very useful to the users (both teachers and students. It is desirable to identify such knowledge in the form of models, patterns or any other representation schema that allows a better exploitation of the system. The latter reveals itself as the tool to achieve such discovering. Data mining must afford very complex and different situations to reach quality solutions. Therefore, data mining is a research field where many advances are being done to accommodate and solve emerging problems. For this purpose, many techniques are usually considered. In this paper we study how data mining can be used to induce student models from the data acquired by a specific Web-based tool for adaptive testing, called SIETTE. Concretely we have used top down induction decision trees algorithms to extract the patterns because these models, decision trees, are easily understandable. In addition, the conducted validation processes have assured high quality models.

  16. Mining biological information from 3D short time-series gene expression data: the OPTricluster algorithm.

    Science.gov (United States)

    Tchagang, Alain B; Phan, Sieu; Famili, Fazel; Shearer, Heather; Fobert, Pierre; Huang, Yi; Zou, Jitao; Huang, Daiqing; Cutler, Adrian; Liu, Ziying; Pan, Youlian

    2012-04-04

    Nowadays, it is possible to collect expression levels of a set of genes from a set of biological samples during a series of time points. Such data have three dimensions: gene-sample-time (GST). Thus they are called 3D microarray gene expression data. To take advantage of the 3D data collected, and to fully understand the biological knowledge hidden in the GST data, novel subspace clustering algorithms have to be developed to effectively address the biological problem in the corresponding space. We developed a subspace clustering algorithm called Order Preserving Triclustering (OPTricluster), for 3D short time-series data mining. OPTricluster is able to identify 3D clusters with coherent evolution from a given 3D dataset using a combinatorial approach on the sample dimension, and the order preserving (OP) concept on the time dimension. The fusion of the two methodologies allows one to study similarities and differences between samples in terms of their temporal expression profile. OPTricluster has been successfully applied to four case studies: immune response in mice infected by malaria (Plasmodium chabaudi), systemic acquired resistance in Arabidopsis thaliana, similarities and differences between inner and outer cotyledon in Brassica napus during seed development, and to Brassica napus whole seed development. These studies showed that OPTricluster is robust to noise and is able to detect the similarities and differences between biological samples. Our analysis showed that OPTricluster generally outperforms other well known clustering algorithms such as the TRICLUSTER, gTRICLUSTER and K-means; it is robust to noise and can effectively mine the biological knowledge hidden in the 3D short time-series gene expression data.

  17. Comparison of phase unwrapping algorithms for topography reconstruction based on digital speckle pattern interferometry

    Science.gov (United States)

    Li, Yuanbo; Cui, Xiaoqian; Wang, Hongbei; Zhao, Mengge; Ding, Hongbin

    2017-10-01

    Digital speckle pattern interferometry (DSPI) can diagnose the topography evolution in real-time, continuous and non-destructive, and has been considered as a most promising technique for Plasma-Facing Components (PFCs) topography diagnostic under the complicated environment of tokamak. It is important for the study of digital speckle pattern interferometry to enhance speckle patterns and obtain the real topography of the ablated crater. In this paper, two kinds of numerical model based on flood-fill algorithm has been developed to obtain the real profile by unwrapping from the wrapped phase in speckle interference pattern, which can be calculated through four intensity images by means of 4-step phase-shifting technique. During the process of phase unwrapping by means of flood-fill algorithm, since the existence of noise pollution, and other inevitable factors will lead to poor quality of the reconstruction results, this will have an impact on the authenticity of the restored topography. The calculation of the quality parameters was introduced to obtain the quality-map from the wrapped phase map, this work presents two different methods to calculate the quality parameters. Then quality parameters are used to guide the path of flood-fill algorithm, and the pixels with good quality parameters are given priority calculation, so that the quality of speckle interference pattern reconstruction results are improved. According to the comparison between the flood-fill algorithm which is suitable for speckle pattern interferometry and the quality-guided flood-fill algorithm (with two different calculation approaches), the errors which caused by noise pollution and the discontinuous of the strips were successfully reduced.

  18. Evolving temporal association rules with genetic algorithms

    OpenAIRE

    Matthews, Stephen G.; Gongora, Mario A.; Hopgood, Adrian A.

    2010-01-01

    A novel framework for mining temporal association rules by discovering itemsets with a genetic algorithm is introduced. Metaheuristics have been applied to association rule mining, we show the efficacy of extending this to another variant - temporal association rule mining. Our framework is an enhancement to existing temporal association rule mining methods as it employs a genetic algorithm to simultaneously search the rule space and temporal space. A methodology for validating the ability of...

  19. A systematic review of data mining and machine learning for air pollution epidemiology.

    Science.gov (United States)

    Bellinger, Colin; Mohomed Jabbar, Mohomed Shazan; Zaïane, Osmar; Osornio-Vargas, Alvaro

    2017-11-28

    Data measuring airborne pollutants, public health and environmental factors are increasingly being stored and merged. These big datasets offer great potential, but also challenge traditional epidemiological methods. This has motivated the exploration of alternative methods to make predictions, find patterns and extract information. To this end, data mining and machine learning algorithms are increasingly being applied to air pollution epidemiology. We conducted a systematic literature review on the application of data mining and machine learning methods in air pollution epidemiology. We carried out our search process in PubMed, the MEDLINE database and Google Scholar. Research articles applying data mining and machine learning methods to air pollution epidemiology were queried and reviewed. Our search queries resulted in 400 research articles. Our fine-grained analysis employed our inclusion/exclusion criteria to reduce the results to 47 articles, which we separate into three primary areas of interest: 1) source apportionment; 2) forecasting/prediction of air pollution/quality or exposure; and 3) generating hypotheses. Early applications had a preference for artificial neural networks. In more recent work, decision trees, support vector machines, k-means clustering and the APRIORI algorithm have been widely applied. Our survey shows that the majority of the research has been conducted in Europe, China and the USA, and that data mining is becoming an increasingly common tool in environmental health. For potential new directions, we have identified that deep learning and geo-spacial pattern mining are two burgeoning areas of data mining that have good potential for future applications in air pollution epidemiology. We carried out a systematic review identifying the current trends, challenges and new directions to explore in the application of data mining methods to air pollution epidemiology. This work shows that data mining is increasingly being applied in air

  20. Data clustering algorithms and applications

    CERN Document Server

    Aggarwal, Charu C

    2013-01-01

    Research on the problem of clustering tends to be fragmented across the pattern recognition, database, data mining, and machine learning communities. Addressing this problem in a unified way, Data Clustering: Algorithms and Applications provides complete coverage of the entire area of clustering, from basic methods to more refined and complex data clustering approaches. It pays special attention to recent issues in graphs, social networks, and other domains.The book focuses on three primary aspects of data clustering: Methods, describing key techniques commonly used for clustering, such as fea

  1. A Neural-Network Clustering-Based Algorithm for Privacy Preserving Data Mining

    Science.gov (United States)

    Tsiafoulis, S.; Zorkadis, V. C.; Karras, D. A.

    The increasing use of fast and efficient data mining algorithms in huge collections of personal data, facilitated through the exponential growth of technology, in particular in the field of electronic data storage media and processing power, has raised serious ethical, philosophical and legal issues related to privacy protection. To cope with these concerns, several privacy preserving methodologies have been proposed, classified in two categories, methodologies that aim at protecting the sensitive data and those that aim at protecting the mining results. In our work, we focus on sensitive data protection and compare existing techniques according to their anonymity degree achieved, the information loss suffered and their performance characteristics. The ℓ-diversity principle is combined with k-anonymity concepts, so that background information can not be exploited to successfully attack the privacy of data subjects data refer to. Based on Kohonen Self Organizing Feature Maps (SOMs), we firstly organize data sets in subspaces according to their information theoretical distance to each other, then create the most relevant classes paying special attention to rare sensitive attribute values, and finally generalize attribute values to the minimum extend required so that both the data disclosure probability and the information loss are possibly kept negligible. Furthermore, we propose information theoretical measures for assessing the anonymity degree achieved and empirical tests to demonstrate it.

  2. Mining Emerging Patterns for Recognizing Activities of Multiple Users in Pervasive Computing

    DEFF Research Database (Denmark)

    Gu, Tao; Wu, Zhanqing; Wang, Liang

    2009-01-01

    Understanding and recognizing human activities from sensor readings is an important task in pervasive computing. Existing work on activity recognition mainly focuses on recognizing activities for a single user in a smart home environment. However, in real life, there are often multiple inhabitants...... activity models, and propose an Emerging Pattern based Multi-user Activity Recognizer (epMAR) to recognize both single-user and multiuser activities. We conduct our empirical studies by collecting real-world activity traces done by two volunteers over a period of two weeks in a smart home environment...... sensor readings in a home environment, and propose a novel pattern mining approach to recognize both single-user and multi-user activities in a unified solution. We exploit Emerging Pattern – a type of knowledge pattern that describes significant changes between classes of data – for constructing our...

  3. Pattern Recognition of Signals for the Fault-Slip Type of Rock Burst in Coal Mines

    Directory of Open Access Journals (Sweden)

    X. S. Liu

    2015-01-01

    Full Text Available The fault-slip type of rock burst is a major threat to the safety of coal mining, and effectively recognizing its signals patterns is the foundation for the early warning and prevention. At first, a mechanical model of the fault-slip was established and the mechanism of the rock burst induced by the fault-slip was revealed. Then, the patterns of the electromagnetic radiation, acoustic emission (AE, and microseismic signals in the fault-slip type of rock burst were proposed, in that before the rock burst occurs, the electromagnetic radiation intensity near the sliding surface increases rapidly, the AE energy rises exponentially, and the energy released by microseismic events experiences at least one peak and is close to the next peak. At last, in situ investigations were performed at number 1412 coal face in the Huafeng Mine, China. Results showed that the signals patterns proposed are in good agreement with the process of the fault-slip type of rock burst. The pattern recognition can provide a basis for the early warning and the implementation of relief measures of the fault-slip type of rock burst.

  4. Mining

    Directory of Open Access Journals (Sweden)

    Khairullah Khan

    2014-09-01

    Full Text Available Opinion mining is an interesting area of research because of its applications in various fields. Collecting opinions of people about products and about social and political events and problems through the Web is becoming increasingly popular every day. The opinions of users are helpful for the public and for stakeholders when making certain decisions. Opinion mining is a way to retrieve information through search engines, Web blogs and social networks. Because of the huge number of reviews in the form of unstructured text, it is impossible to summarize the information manually. Accordingly, efficient computational methods are needed for mining and summarizing the reviews from corpuses and Web documents. This study presents a systematic literature survey regarding the computational techniques, models and algorithms for mining opinion components from unstructured reviews.

  5. The Determination of Children's Knowledge of Global Lunar Patterns from Online Essays Using Text Mining Analysis

    Science.gov (United States)

    Cheon, Jongpil; Lee, Sangno; Smith, Walter; Song, Jaeki; Kim, Yongjin

    2013-01-01

    The purpose of this study was to use text mining analysis of early adolescents' online essays to determine their knowledge of global lunar patterns. Australian and American students in grades five to seven wrote about global lunar patterns they had discovered by sharing observations with each other via the Internet. These essays were analyzed for…

  6. Data Mining and Machine Learning in Astronomy

    Science.gov (United States)

    Ball, Nicholas M.; Brunner, Robert J.

    We review the current state of data mining and machine learning in astronomy. Data Mining can have a somewhat mixed connotation from the point of view of a researcher in this field. If used correctly, it can be a powerful approach, holding the potential to fully exploit the exponentially increasing amount of available data, promising great scientific advance. However, if misused, it can be little more than the black box application of complex computing algorithms that may give little physical insight, and provide questionable results. Here, we give an overview of the entire data mining process, from data collection through to the interpretation of results. We cover common machine learning algorithms, such as artificial neural networks and support vector machines, applications from a broad range of astronomy, emphasizing those in which data mining techniques directly contributed to improving science, and important current and future directions, including probability density functions, parallel algorithms, Peta-Scale computing, and the time domain. We conclude that, so long as one carefully selects an appropriate algorithm and is guided by the astronomical problem at hand, data mining can be very much the powerful tool, and not the questionable black box.

  7. Zoning method for environmental engineering geological patterns in underground coal mining areas.

    Science.gov (United States)

    Liu, Shiliang; Li, Wenping; Wang, Qiqing

    2018-09-01

    Environmental engineering geological patterns (EEGPs) are used to express the trend and intensity of eco-geological environment caused by mining in underground coal mining areas, a complex process controlled by multiple factors. A new zoning method for EEGPs was developed based on the variable-weight theory (VWT), where the weights of factors vary with their value. The method was applied to the Yushenfu mining area, Shaanxi, China. First, the mechanism of the EEGPs caused by mining was elucidated, and four types of EEGPs were proposed. Subsequently, 13 key control factors were selected from mining conditions, lithosphere, hydrosphere, ecosphere, and climatic conditions; their thematic maps were constructed using ArcGIS software and remote-sensing technologies. Then, a stimulation-punishment variable-weight model derived from the partition of basic evaluation unit of study area, construction of partition state-variable-weight vector, and determination of variable-weight interval was built to calculate the variable weights of each factor. On this basis, a zoning mathematical model of EEGPs was established, and the zoning results were analyzed. For comparison, the traditional constant-weight theory (CWT) was also applied to divide the EEGPs. Finally, the zoning results obtained using VWT and CWT were compared. The verification of field investigation indicates that VWT is more accurate and reliable than CWT. The zoning results are consistent with the actual situations and the key of planning design for the rational development of coal resources and protection of eco-geological environment. Copyright © 2018 Elsevier B.V. All rights reserved.

  8. Analysis of gas migration patterns in fractured coal rocks under actual mining conditions

    Directory of Open Access Journals (Sweden)

    Gao Mingzhong

    2017-01-01

    Full Text Available Fracture fields in coal rocks are the main channels for gas seepage, migration, and extraction. The development, evolution, and spatial distribution of fractures in coal rocks directly affect the permeability of the coal rock as well as gas migration and flow. In this work, the Ji-15-14120 mining face at the No. 8 Coal Mine of Pingdingshan Tian’an Coal Mining Co. Ltd., Pingdingshan, China, was selected as the test site to develop a full-parameter fracture observation instrument and a dynamic fracture observation technique. The acquired video information of fractures in the walls of the boreholes was vectorized and converted to planarly expanded images on a computer-aided design platform. Based on the relative spatial distances between the openings of the boreholes, simultaneous planar images of isolated fractures in the walls of the boreholes along the mining direction were obtained from the boreholes located at various distances from the mining face. Using this information, a 3-D fracture network under mining conditions was established. The gas migration pattern was calculated using a COMSOL computation platform. The results showed that between 10 hours and 1 day the fracture network controlled the gas-flow, rather than the coal seam itself. After one day, the migration of gas was completely controlled by the fractures. The presence of fractures in the overlying rock enables the gas in coal seam to migrate more easily to the surrounding rocks or extraction tunnels situated relatively far away from the coal rock. These conclusions provide an important theoretical basis for gas extraction.

  9. Hospitalization patterns associated with Appalachian coal mining.

    Science.gov (United States)

    Hendryx, Michael; Ahern, Melissa M; Nurkiewicz, Timothy R

    2007-12-01

    The goal of this study was to test whether the volume of coal mining was related to population hospitalization risk for diseases postulated to be sensitive or insensitive to coal mining by-products. The study was a retrospective analysis of 2001 adult hospitalization data (n = 93,952) for West Virginia, Kentucky, and Pennsylvania, merged with county-level coal production figures. Hospitalization data were obtained from the Health Care Utilization Project National Inpatient Sample. Diagnoses postulated to be sensitive to coal mining by-product exposure were contrasted with diagnoses postulated to be insensitive to exposure. Data were analyzed using hierarchical nonlinear models, controlling for patient age, gender, insurance, comorbidities, hospital teaching status, county poverty, and county social capital. Controlling for covariates, the volume of coal mining was significantly related to hospitalization risk for two conditions postulated to be sensitive to exposure: hypertension and chronic obstructive pulmonary disease (COPD). The odds for a COPD hospitalization increased 1% for each 1462 tons of coal, and the odds for a hypertension hospitalization increased 1% for each 1873 tons of coal. Other conditions were not related to mining volume. Exposure to particulates or other pollutants generated by coal mining activities may be linked to increased risk of COPD and hypertension hospitalizations. Limitations in the data likely result in an underestimate of associations.

  10. Gesture Recognition from Data Streams of Human Motion Sensor Using Accelerated PSO Swarm Search Feature Selection Algorithm

    Directory of Open Access Journals (Sweden)

    Simon Fong

    2015-01-01

    Full Text Available Human motion sensing technology gains tremendous popularity nowadays with practical applications such as video surveillance for security, hand signing, and smart-home and gaming. These applications capture human motions in real-time from video sensors, the data patterns are nonstationary and ever changing. While the hardware technology of such motion sensing devices as well as their data collection process become relatively mature, the computational challenge lies in the real-time analysis of these live feeds. In this paper we argue that traditional data mining methods run short of accurately analyzing the human activity patterns from the sensor data stream. The shortcoming is due to the algorithmic design which is not adaptive to the dynamic changes in the dynamic gesture motions. The successor of these algorithms which is known as data stream mining is evaluated versus traditional data mining, through a case of gesture recognition over motion data by using Microsoft Kinect sensors. Three different subjects were asked to read three comic strips and to tell the stories in front of the sensor. The data stream contains coordinates of articulation points and various positions of the parts of the human body corresponding to the actions that the user performs. In particular, a novel technique of feature selection using swarm search and accelerated PSO is proposed for enabling fast preprocessing for inducing an improved classification model in real-time. Superior result is shown in the experiment that runs on this empirical data stream. The contribution of this paper is on a comparative study between using traditional and data stream mining algorithms and incorporation of the novel improved feature selection technique with a scenario where different gesture patterns are to be recognized from streaming sensor data.

  11. Research of the Space Clustering Method for the Airport Noise Data Minings

    Directory of Open Access Journals (Sweden)

    Jiwen Xie

    2014-03-01

    Full Text Available Mining the distribution pattern and evolution of the airport noise from the airport noise data and the geographic information of the monitoring points is of great significance for the scientific and rational governance of airport noise pollution problem. However, most of the traditional clustering methods are based on the closeness of space location or the similarity of non-spatial features, which split the duality of space elements, resulting in that the clustering result has difficult in satisfying both the closeness of space location and the similarity of non-spatial features. This paper, therefore, proposes a spatial clustering algorithm based on dual-distance. This algorithm uses a distance function as the similarity measure function in which spatial features and non-spatial features are combined. The experimental results show that the proposed algorithm can discover the noise distribution pattern around the airport effectively.

  12. Strong compound-risk factors: efficient discovery through emerging patterns and contrast sets.

    Science.gov (United States)

    Li, Jinyan; Yang, Qiang

    2007-09-01

    Odds ratio (OR), relative risk (RR) (risk ratio), and absolute risk reduction (ARR) (risk difference) are biostatistics measurements that are widely used for identifying significant risk factors in dichotomous groups of subjects. In the past, they have often been used to assess simple risk factors. In this paper, we introduce the concept of compound-risk factors to broaden the applicability of these statistical tests for assessing factor interplays. We observe that compound-risk factors with a high risk ratio or a big risk difference have an one-to-one correspondence to strong emerging patterns or strong contrast sets-two types of patterns that have been extensively studied in the data mining field. Such a relationship has been unknown to researchers in the past, and efficient algorithms for discovering strong compound-risk factors have been lacking. In this paper, we propose a theoretical framework and a new algorithm that unify the discovery of compound-risk factors that have a strong OR, risk ratio, or a risk difference. Our method guarantees that all patterns meeting a certain test threshold can be efficiently discovered. Our contribution thus represents the first of its kind in linking the risk ratios and ORs to pattern mining algorithms, making it possible to find compound-risk factors in large-scale data sets. In addition, we show that using compound-risk factors can improve classification accuracy in probabilistic learning algorithms on several disease data sets, because these compound-risk factors capture the interdependency between important data attributes.

  13. Mining Views : database views for data mining

    NARCIS (Netherlands)

    Blockeel, H.; Calders, T.; Fromont, É.; Goethals, B.; Prado, A.; Nijssen, S.; De Raedt, L.

    2007-01-01

    We propose a relational database model towards the integration of data mining into relational database systems, based on the so called virtual mining views. We show that several types of patterns and models over the data, such as itemsets, association rules, decision trees and clusterings, can be

  14. Web Mining and Social Networking

    DEFF Research Database (Denmark)

    Xu, Guandong; Zhang, Yanchun; Li, Lin

    This book examines the techniques and applications involved in the Web Mining, Web Personalization and Recommendation and Web Community Analysis domains, including a detailed presentation of the principles, developed algorithms, and systems of the research in these areas. The applications of web ...... sense of individuals or communities. The volume will benefit both academic and industry communities interested in the techniques and applications of web search, web data management, web mining and web knowledge discovery, as well as web community and social network analysis.......This book examines the techniques and applications involved in the Web Mining, Web Personalization and Recommendation and Web Community Analysis domains, including a detailed presentation of the principles, developed algorithms, and systems of the research in these areas. The applications of web...... mining, and the issue of how to incorporate web mining into web personalization and recommendation systems are also reviewed. Additionally, the volume explores web community mining and analysis to find the structural, organizational and temporal developments of web communities and reveal the societal...

  15. Research reactor loading pattern optimization using estimation of distribution algorithms

    International Nuclear Information System (INIS)

    Jiang, S.; Ziver, K.; Carter, J. N.; Pain, C. C.; Eaton, M. D.; Goddard, A. J. H.; Franklin, S. J.; Phillips, H. J.

    2006-01-01

    A new evolutionary search based approach for solving the nuclear reactor loading pattern optimization problems is presented based on the Estimation of Distribution Algorithms. The optimization technique developed is then applied to the maximization of the effective multiplication factor (K eff ) of the Imperial College CONSORT research reactor (the last remaining civilian research reactor in the United Kingdom). A new elitism-guided searching strategy has been developed and applied to improve the local convergence together with some problem-dependent information based on the 'stand-alone K eff with fuel coupling calculations. A comparison study between the EDAs and a Genetic Algorithm with Heuristic Tie Breaking Crossover operator has shown that the new algorithm is efficient and robust. (authors)

  16. Hypotension Risk Prediction via Sequential Contrast Patterns of ICU Blood Pressure.

    Science.gov (United States)

    Ghosh, Shameek; Feng, Mengling; Nguyen, Hung; Li, Jinyan

    2016-09-01

    Acute hypotension is a significant risk factor for in-hospital mortality at intensive care units. Prolonged hypotension can cause tissue hypoperfusion, leading to cellular dysfunction and severe injuries to multiple organs. Prompt medical interventions are thus extremely important for dealing with acute hypotensive episodes (AHE). Population level prognostic scoring systems for risk stratification of patients are suboptimal in such scenarios. However, the design of an efficient risk prediction system can significantly help in the identification of critical care patients, who are at risk of developing an AHE within a future time span. Toward this objective, a pattern mining algorithm is employed to extract informative sequential contrast patterns from hemodynamic data, for the prediction of hypotensive episodes. The hypotensive and normotensive patient groups are extracted from the MIMIC-II critical care research database, following an appropriate clinical inclusion criteria. The proposed method consists of a data preprocessing step to convert the blood pressure time series into symbolic sequences, using a symbolic aggregate approximation algorithm. Then, distinguishing subsequences are identified using the sequential contrast mining algorithm. These subsequences are used to predict the occurrence of an AHE in a future time window separated by a user-defined gap interval. Results indicate that the method performs well in terms of the prediction performance as well as in the generation of sequential patterns of clinical significance. Hence, the novelty of sequential patterns is in their usefulness as potential physiological biomarkers for building optimal patient risk stratification systems and for further clinical investigation of interesting patterns in critical care patients.

  17. Mining Views : database views for data mining

    NARCIS (Netherlands)

    Blockeel, H.; Calders, T.; Fromont, É.; Goethals, B.; Prado, A.

    2008-01-01

    We present a system towards the integration of data mining into relational databases. To this end, a relational database model is proposed, based on the so called virtual mining views. We show that several types of patterns and models over the data, such as itemsets, association rules and decision

  18. Distributed genetic process mining

    NARCIS (Netherlands)

    Bratosin, C.C.; Sidorova, N.; Aalst, van der W.M.P.

    2010-01-01

    Process mining aims at discovering process models from data logs in order to offer insight into the real use of information systems. Most of the existing process mining algorithms fail to discover complex constructs or have problems dealing with noise and infrequent behavior. The genetic process

  19. Differential harmony search algorithm to optimize PWRs loading pattern

    Energy Technology Data Exchange (ETDEWEB)

    Poursalehi, N., E-mail: npsalehi@yahoo.com [Engineering Department, Shahid Beheshti University, G.C, P.O.Box: 1983963113, Tehran (Iran, Islamic Republic of); Zolfaghari, A.; Minuchehr, A. [Engineering Department, Shahid Beheshti University, G.C, P.O.Box: 1983963113, Tehran (Iran, Islamic Republic of)

    2013-04-15

    Highlights: ► Exploit of DHS algorithm in LP optimization reveals its flexibility, robustness and reliability. ► Upshot of our experiments with DHS shows that the search approach to optimal LP is quickly. ► On the average, the final band width of DHS fitness values is narrow relative to HS and GHS. -- Abstract: The objective of this work is to develop a core loading optimization technique using differential harmony search algorithm in the context of obtaining an optimal configuration of fuel assemblies in pressurized water reactors. To implement and evaluate the proposed technique, differential harmony search nodal expansion package for 2-D geometry, DHSNEP-2D, is developed. The package includes two modules; in the first modules differential harmony search (DHS) is implemented and nodal expansion code which solves two dimensional-multi group neutron diffusion equations using fourth degree flux expansion with one node per a fuel assembly is in the second module. For evaluation of DHS algorithm, classical harmony search (HS) and global-best harmony search (GHS) algorithms are also included in DHSNEP-2D in order to compare the outcome of techniques together. For this purpose, two PWR test cases have been investigated to demonstrate the DHS algorithm capability in obtaining near optimal loading pattern. Results show that the convergence rate of DHS and execution times are quite promising and also is reliable for the fuel management operation. Moreover, numerical results show the good performance of DHS relative to other competitive algorithms such as genetic algorithm (GA), classical harmony search (HS) and global-best harmony search (GHS) algorithms.

  20. Differential harmony search algorithm to optimize PWRs loading pattern

    International Nuclear Information System (INIS)

    Poursalehi, N.; Zolfaghari, A.; Minuchehr, A.

    2013-01-01

    Highlights: ► Exploit of DHS algorithm in LP optimization reveals its flexibility, robustness and reliability. ► Upshot of our experiments with DHS shows that the search approach to optimal LP is quickly. ► On the average, the final band width of DHS fitness values is narrow relative to HS and GHS. -- Abstract: The objective of this work is to develop a core loading optimization technique using differential harmony search algorithm in the context of obtaining an optimal configuration of fuel assemblies in pressurized water reactors. To implement and evaluate the proposed technique, differential harmony search nodal expansion package for 2-D geometry, DHSNEP-2D, is developed. The package includes two modules; in the first modules differential harmony search (DHS) is implemented and nodal expansion code which solves two dimensional-multi group neutron diffusion equations using fourth degree flux expansion with one node per a fuel assembly is in the second module. For evaluation of DHS algorithm, classical harmony search (HS) and global-best harmony search (GHS) algorithms are also included in DHSNEP-2D in order to compare the outcome of techniques together. For this purpose, two PWR test cases have been investigated to demonstrate the DHS algorithm capability in obtaining near optimal loading pattern. Results show that the convergence rate of DHS and execution times are quite promising and also is reliable for the fuel management operation. Moreover, numerical results show the good performance of DHS relative to other competitive algorithms such as genetic algorithm (GA), classical harmony search (HS) and global-best harmony search (GHS) algorithms

  1. PWR loading pattern optimization using Harmony Search algorithm

    International Nuclear Information System (INIS)

    Poursalehi, N.; Zolfaghari, A.; Minuchehr, A.

    2013-01-01

    Highlights: ► Numerical results reveal that the HS method is reliable. ► The great advantage of HS is significant gain in computational cost. ► On the average, the final band width of search fitness values is narrow. ► Our experiments show that the search approaches the optimal value fast. - Abstract: In this paper a core reloading technique using Harmony Search, HS, is presented in the context of finding an optimal configuration of fuel assemblies, FA, in pressurized water reactors. To implement and evaluate the proposed technique a Harmony Search along Nodal Expansion Code for 2-D geometry, HSNEC2D, is developed to obtain nearly optimal arrangement of fuel assemblies in PWR cores. This code consists of two sections including Harmony Search algorithm and Nodal Expansion modules using fourth degree flux expansion which solves two dimensional-multi group diffusion equations with one node per fuel assembly. Two optimization test problems are investigated to demonstrate the HS algorithm capability in converging to near optimal loading pattern in the fuel management field and other subjects. Results, convergence rate and reliability of the method are quite promising and show the HS algorithm performs very well and is comparable to other competitive algorithms such as Genetic Algorithm and Particle Swarm Intelligence. Furthermore, implementation of nodal expansion technique along HS causes considerable reduction of computational time to process and analysis optimization in the core fuel management problems

  2. Optimal Refueling Pattern Search for a CANDU Reactor Using a Genetic Algorithm

    International Nuclear Information System (INIS)

    Quang Binh, DO; Gyuhong, ROH; Hangbok, CHOI

    2006-01-01

    This paper presents the results from the application of genetic algorithms to a refueling optimization of a Canada deuterium uranium (CANDU) reactor. This work aims at making a mathematical model of the refueling optimization problem including the objective function and constraints and developing a method based on genetic algorithms to solve the problem. The model of the optimization problem and the proposed method comply with the key features of the refueling strategy of the CANDU reactor which adopts an on-power refueling operation. In this study, a genetic algorithm combined with an elitism strategy was used to automatically search for the refueling patterns. The objective of the optimization was to maximize the discharge burn-up of the refueling bundles, minimize the maximum channel power, or minimize the maximum change in the zone controller unit (ZCU) water levels. A combination of these objectives was also investigated. The constraints include the discharge burn-up, maximum channel power, maximum bundle power, channel power peaking factor and the ZCU water level. A refueling pattern that represents the refueling rate and channels was coded by a one-dimensional binary chromosome, which is a string of binary numbers 0 and 1. A computer program was developed in FORTRAN 90 running on an HP 9000 workstation to conduct the search for the optimal refueling patterns for a CANDU reactor at the equilibrium state. The results showed that it was possible to apply genetic algorithms to automatically search for the refueling channels of the CANDU reactor. The optimal refueling patterns were compared with the solutions obtained from the AUTOREFUEL program and the results were consistent with each other. (authors)

  3. CUDA-accelerated genetic feedforward-ANN training for data mining

    International Nuclear Information System (INIS)

    Patulea, Catalin; Peace, Robert; Green, James

    2010-01-01

    We present an implementation of genetic algorithm (GA) training of feedforward artificial neural networks (ANNs) targeting commodity graphics cards (GPUs). By carefully mapping the problem onto the unique GPU architecture, we achieve order-of-magnitude speedup over a conventional CPU implementation. Furthermore, we show that the speedup is consistent across a wide range of data set sizes, making this implementation ideal for large data sets. This performance boost enables the genetic algorithm to search a larger subset of the solution space, which results in more accurate pattern classification. Finally, we demonstrate this method in the context of the 2009 UC San Diego Data Mining Contest, achieving a world-class lift on a data set of 94682 e-commerce transactions.

  4. CUDA-accelerated genetic feedforward-ANN training for data mining

    Energy Technology Data Exchange (ETDEWEB)

    Patulea, Catalin; Peace, Robert; Green, James, E-mail: cpatulea@sce.carleton.ca, E-mail: rpeace@sce.carleton.ca, E-mail: jrgreen@sce.carleton.ca [School of Systems and Computer Engineering, Carleton University, Ottawa, K1S 5B6 (Canada)

    2010-11-01

    We present an implementation of genetic algorithm (GA) training of feedforward artificial neural networks (ANNs) targeting commodity graphics cards (GPUs). By carefully mapping the problem onto the unique GPU architecture, we achieve order-of-magnitude speedup over a conventional CPU implementation. Furthermore, we show that the speedup is consistent across a wide range of data set sizes, making this implementation ideal for large data sets. This performance boost enables the genetic algorithm to search a larger subset of the solution space, which results in more accurate pattern classification. Finally, we demonstrate this method in the context of the 2009 UC San Diego Data Mining Contest, achieving a world-class lift on a data set of 94682 e-commerce transactions.

  5. Continuous firefly algorithm applied to PWR core pattern enhancement

    International Nuclear Information System (INIS)

    Poursalehi, N.; Zolfaghari, A.; Minuchehr, A.; Moghaddam, H.K.

    2013-01-01

    Highlights: ► Numerical results indicate the reliability of CFA for the nuclear reactor LPO. ► The major advantages of CFA are its light computational cost and fast convergence. ► Our experiments demonstrate the ability of CFA to obtain the near optimal loading pattern. -- Abstract: In this research, the new meta-heuristic optimization strategy, firefly algorithm, is developed for the nuclear reactor loading pattern optimization problem. Two main goals in reactor core fuel management optimization are maximizing the core multiplication factor (K eff ) in order to extract the maximum cycle energy and minimizing the power peaking factor due to safety constraints. In this work, we define a multi-objective fitness function according to above goals for the core fuel arrangement enhancement. In order to evaluate and demonstrate the ability of continuous firefly algorithm (CFA) to find the near optimal loading pattern, we developed CFA nodal expansion code (CFANEC) for the fuel management operation. This code consists of two main modules including CFA optimization program and a developed core analysis code implementing nodal expansion method to calculate with coarse meshes by dimensions of fuel assemblies. At first, CFA is applied for the Foxholes test case with continuous variables in order to validate CFA and then for KWU PWR using a decoding strategy for discrete variables. Results indicate the efficiency and relatively fast convergence of CFA in obtaining near optimal loading pattern with respect to considered fitness function. At last, our experience with the CFA confirms that the CFA is easy to implement and reliable

  6. Continuous firefly algorithm applied to PWR core pattern enhancement

    Energy Technology Data Exchange (ETDEWEB)

    Poursalehi, N., E-mail: npsalehi@yahoo.com [Engineering Department, Shahid Beheshti University, G.C., P.O. Box 1983963113, Tehran (Iran, Islamic Republic of); Zolfaghari, A.; Minuchehr, A.; Moghaddam, H.K. [Engineering Department, Shahid Beheshti University, G.C., P.O. Box 1983963113, Tehran (Iran, Islamic Republic of)

    2013-05-15

    Highlights: ► Numerical results indicate the reliability of CFA for the nuclear reactor LPO. ► The major advantages of CFA are its light computational cost and fast convergence. ► Our experiments demonstrate the ability of CFA to obtain the near optimal loading pattern. -- Abstract: In this research, the new meta-heuristic optimization strategy, firefly algorithm, is developed for the nuclear reactor loading pattern optimization problem. Two main goals in reactor core fuel management optimization are maximizing the core multiplication factor (K{sub eff}) in order to extract the maximum cycle energy and minimizing the power peaking factor due to safety constraints. In this work, we define a multi-objective fitness function according to above goals for the core fuel arrangement enhancement. In order to evaluate and demonstrate the ability of continuous firefly algorithm (CFA) to find the near optimal loading pattern, we developed CFA nodal expansion code (CFANEC) for the fuel management operation. This code consists of two main modules including CFA optimization program and a developed core analysis code implementing nodal expansion method to calculate with coarse meshes by dimensions of fuel assemblies. At first, CFA is applied for the Foxholes test case with continuous variables in order to validate CFA and then for KWU PWR using a decoding strategy for discrete variables. Results indicate the efficiency and relatively fast convergence of CFA in obtaining near optimal loading pattern with respect to considered fitness function. At last, our experience with the CFA confirms that the CFA is easy to implement and reliable.

  7. Instruction of pattern recognition by MATLAB practice 1

    International Nuclear Information System (INIS)

    1999-06-01

    This book describes the pattern recognition by MATLAB practice. It includes possibility and limit of AI, introduction of pattern recognition a vector and matrix, basic status and a probability theory, a random variable and probability distribution, statistical decision theory, data-mining, gaussian mixture model, a nerve cell modeling such as Hebb's learning rule, LMS learning rule, genetic algorithm, dynamic programming and DTW, HMN on Markov model and HMM's three problems and solution, introduction of SVM with KKT condition and margin optimum, kernel trick and MATLAB practice.

  8. Practical graph mining with R

    CERN Document Server

    Hendrix, William; Jenkins, John; Padmanabhan, Kanchana; Chakraborty, Arpan

    2014-01-01

    Practical Graph Mining with R presents a "do-it-yourself" approach to extracting interesting patterns from graph data. It covers many basic and advanced techniques for the identification of anomalous or frequently recurring patterns in a graph, the discovery of groups or clusters of nodes that share common patterns of attributes and relationships, the extraction of patterns that distinguish one category of graphs from another, and the use of those patterns to predict the category of new graphs. Hands-On Application of Graph Data Mining Each chapter in the book focuses on a graph mining task, such as link analysis, cluster analysis, and classification. Through applications using real data sets, the book demonstrates how computational techniques can help solve real-world problems. The applications covered include network intrusion detection, tumor cell diagnostics, face recognition, predictive toxicology, mining metabolic and protein-protein interaction networks, and community detection in social networks. De...

  9. In-depth motivic analysis based on multiparametric closed pattern and cyclic sequence mining

    DEFF Research Database (Denmark)

    Lartillot, Olivier

    2014-01-01

    presents a much simpler description and justification of this general strategy, as well as significant simplifications of the model, in particular concerning the management of pattern cyclicity. A new method for automated bundling of patterns belonging to same motivic or thematic classes is also presented....... The good performance of the method is shown through the analysis of a piece from the JKUPDD database. Ground-truth motives are detected, while additional relevant information completes the ground-truth musicological analysis. The system, implemented in Matlab, is made publicly available as part of Mining......Suite, a new open-source framework for audio and music analysis....

  10. A DATA-MINING BASED METHOD FOR THE GAIT PATTERN ANALYSIS

    Directory of Open Access Journals (Sweden)

    Marcelo Rudek

    2015-12-01

    Full Text Available The paper presents a method developed for the gait classification based on the analysis of the trajectory of the pressure centres (CoP extracted from the contact points of the feet with the ground during walking. The data acquirement is performed ba means of a walkway with embedded tactile sensors. The proposed method includes capturing procedures, standardization of data, creation of an organized repository (data warehouse, and development of a process mining. A graphical analysis is applied to looking at the footprint signature patterns. The aim is to obtain a visual interpretation of the grouping by situating it into the normal walking patterns or deviations associated with an individual way of walking. The method consists of data classification automation which divides them into healthy and non-healthy subjects in order to assist in rehabilitation treatments for the people with related mobility problems.

  11. Research reactor loading pattern optimization using estimation of distribution algorithms

    Energy Technology Data Exchange (ETDEWEB)

    Jiang, S. [Dept. of Earth Science and Engineering, Applied Modeling and Computation Group AMCG, Imperial College, London, SW7 2AZ (United Kingdom); Ziver, K. [Dept. of Earth Science and Engineering, Applied Modeling and Computation Group AMCG, Imperial College, London, SW7 2AZ (United Kingdom); AMCG Group, RM Consultants, Abingdon (United Kingdom); Carter, J. N.; Pain, C. C.; Eaton, M. D.; Goddard, A. J. H. [Dept. of Earth Science and Engineering, Applied Modeling and Computation Group AMCG, Imperial College, London, SW7 2AZ (United Kingdom); Franklin, S. J.; Phillips, H. J. [Imperial College, Reactor Centre, Silwood Park, Buckhurst Road, Ascot, Berkshire, SL5 7TE (United Kingdom)

    2006-07-01

    A new evolutionary search based approach for solving the nuclear reactor loading pattern optimization problems is presented based on the Estimation of Distribution Algorithms. The optimization technique developed is then applied to the maximization of the effective multiplication factor (K{sub eff}) of the Imperial College CONSORT research reactor (the last remaining civilian research reactor in the United Kingdom). A new elitism-guided searching strategy has been developed and applied to improve the local convergence together with some problem-dependent information based on the 'stand-alone K{sub eff} with fuel coupling calculations. A comparison study between the EDAs and a Genetic Algorithm with Heuristic Tie Breaking Crossover operator has shown that the new algorithm is efficient and robust. (authors)

  12. An improved Pearson's correlation proximity-based hierarchical clustering for mining biological association between genes.

    Science.gov (United States)

    Booma, P M; Prabhakaran, S; Dhanalakshmi, R

    2014-01-01

    Microarray gene expression datasets has concerned great awareness among molecular biologist, statisticians, and computer scientists. Data mining that extracts the hidden and usual information from datasets fails to identify the most significant biological associations between genes. A search made with heuristic for standard biological process measures only the gene expression level, threshold, and response time. Heuristic search identifies and mines the best biological solution, but the association process was not efficiently addressed. To monitor higher rate of expression levels between genes, a hierarchical clustering model was proposed, where the biological association between genes is measured simultaneously using proximity measure of improved Pearson's correlation (PCPHC). Additionally, the Seed Augment algorithm adopts average linkage methods on rows and columns in order to expand a seed PCPHC model into a maximal global PCPHC (GL-PCPHC) model and to identify association between the clusters. Moreover, a GL-PCPHC applies pattern growing method to mine the PCPHC patterns. Compared to existing gene expression analysis, the PCPHC model achieves better performance. Experimental evaluations are conducted for GL-PCPHC model with standard benchmark gene expression datasets extracted from UCI repository and GenBank database in terms of execution time, size of pattern, significance level, biological association efficiency, and pattern quality.

  13. Application of data mining techniques to explore predictors of HCC in Egyptian patients with HCV-related chronic liver disease.

    Science.gov (United States)

    Omran, Dalia Abd El Hamid; Awad, AbuBakr Hussein; Mabrouk, Mahasen Abd El Rahman; Soliman, Ahmad Fouad; Aziz, Ashraf Omar Abdel

    2015-01-01

    Hepatocellular carcinoma (HCC) is the second most common malignancy in Egypt. Data mining is a method of predictive analysis which can explore tremendous volumes of information to discover hidden patterns and relationships. Our aim here was to develop a non-invasive algorithm for prediction of HCC. Such an algorithm should be economical, reliable, easy to apply and acceptable by domain experts. This cross-sectional study enrolled 315 patients with hepatitis C virus (HCV) related chronic liver disease (CLD); 135 HCC, 116 cirrhotic patients without HCC and 64 patients with chronic hepatitis C. Using data mining analysis, we constructed a decision tree learning algorithm to predict HCC. The decision tree algorithm was able to predict HCC with recall (sensitivity) of 83.5% and precession (specificity) of 83.3% using only routine data. The correctly classified instances were 259 (82.2%), and the incorrectly classified instances were 56 (17.8%). Out of 29 attributes, serum alpha fetoprotein (AFP), with an optimal cutoff value of ≥50.3 ng/ml was selected as the best predictor of HCC. To a lesser extent, male sex, presence of cirrhosis, AST>64U/L, and ascites were variables associated with HCC. Data mining analysis allows discovery of hidden patterns and enables the development of models to predict HCC, utilizing routine data as an alternative to CT and liver biopsy. This study has highlighted a new cutoff for AFP (≥50.3 ng/ml). Presence of a score of >2 risk variables (out of 5) can successfully predict HCC with a sensitivity of 96% and specificity of 82%.

  14. Using data mining and OLAP to discover patterns in a database of patients with Y-chromosome deletions.

    Science.gov (United States)

    Dzeroski, S; Hristovski, D; Peterlin, B

    2000-01-01

    The paper presents a database of published Y chromosome deletions and the results of analyzing the database with data mining techniques. The database describes 382 patients for which 177 different markers were tested: 364 of the 382 patients had deletions. Two data mining techniques, clustering and decision tree induction were used. Clustering was used to group patients according to the overall presence/absence of deletions at the tested markers. Decision trees and On-Line-Analytical-Processing (OLAP) were used to inspect the resulting clustering and look for correlations between deletion patterns, populations and the clinical picture of infertility. The results of the analysis indicate that there are correlations between deletion patterns and patient populations, as well as clinical phenotype severity.

  15. An optimization framework for process discovery algorithms

    NARCIS (Netherlands)

    Weijters, A.J.M.M.; Stahlbock, R.

    2011-01-01

    Today there are many process mining techniques that, based on an event log, allow for the automatic induction of a process model. The process mining algorithms that are able to deal with incomplete event logs, exceptions, and noise typically have many parameters to tune the algorithm. Therefore, the

  16. A Boyer-Moore (or Watson-Watson) type algorithm for regular tree pattern matching

    NARCIS (Netherlands)

    Watson, B.W.; Aarts, E.H.L.; Eikelder, ten H.M.M.; Hemerik, C.; Rem, M.

    1995-01-01

    In this chapter, I outline a new algorithm for regular tree pattern matching. The existence of this algorithm was first mentioned in the statements accompanying my dissertation, [2]. In order to avoid repeating the material in my dissertation, it is assumed that the reader is familiar with Chapters

  17. Graph Mining Meets the Semantic Web

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Sangkeun (Matt) [ORNL; Sukumar, Sreenivas R [ORNL; Lim, Seung-Hwan [ORNL

    2015-01-01

    The Resource Description Framework (RDF) and SPARQL Protocol and RDF Query Language (SPARQL) were introduced about a decade ago to enable flexible schema-free data interchange on the Semantic Web. Today, data scientists use the framework as a scalable graph representation for integrating, querying, exploring and analyzing data sets hosted at different sources. With increasing adoption, the need for graph mining capabilities for the Semantic Web has emerged. We address that need through implementation of three popular iterative Graph Mining algorithms (Triangle count, Connected component analysis, and PageRank). We implement these algorithms as SPARQL queries, wrapped within Python scripts. We evaluate the performance of our implementation on 6 real world data sets and show graph mining algorithms (that have a linear-algebra formulation) can indeed be unleashed on data represented as RDF graphs using the SPARQL query interface.

  18. Reloading pattern optimization of VVER-1000 reactors in transient cycles using genetic algorithm

    International Nuclear Information System (INIS)

    Rahmani, Yashar

    2017-01-01

    Highlights: • The genetic algorithm (GA) and the innovative weighting factors method were used. • The coupling of WIMSD5-B and CITATION-LDI2 neutronic codes with the thermohydraulic WERL code was employed. • Optimization of reloading patterns was carried out in two states. • First an arrangement with satisfactory excess reactivity and the flattest power distribution was searched. • Second, it is tried to obtain an arrangement with satisfactory safety threshold and the maximum K_e_f_f. - Abstract: The present paper proposes application of the genetic algorithm (GA) and the innovative weighting factor method to optimize the reloading pattern of Bushehr VVER-1000 reactor in the second cycle. To estimate the composition of fuel assemblies remaining from the first cycle and precisely calculate the objective parameters of each reloading pattern in the second cycle, coupling of WIMSD5-B and CITATION-LDI2 codes in the neutronic section and the WERL code in the thermo-hydraulic section was employed. Optimization of the reloading patterns was carried out in two states. To meet the mentioned objective, with application of the weighting factor method in the first state, the type and quantity of the loadable fresh assemblies were determined to enable the reactor core to maintain the core criticality over the entire cycle length. Afterwards, the genetic algorithm was used to optimize the reloading pattern of the reactor to obtain an arrangement with flat radial power distribution. In the second state, the optimization algorithm was free to select the type and number of fresh fuel assemblies to be able to search for an arrangement with the maximum effective multiplication factor and the safe power peaking factor. In addition, in order to ensure the safety and desirability of the proposed patterns in both states, a time-dependent examination of the thermo-neutronic behavior of the reactor core was carried out during the second cycle. With consideration of the new

  19. Analysis of the electrical disturbances in CERN power distribution network with pattern mining methods

    CERN Document Server

    Abramenko, Oleksii

    2017-01-01

    The current research focuses on the perturbations within the electrical network of the LHC and its subsystems by analyzing measurements collected from oscilloscopes installed across different CERN sites, and alarms by electrical equipments. We analyze amplitude and duration of the glitches and, together with other relevant variables, correlate them with beam stopping events. The work also tries to identify assets affected by such perturbations using data mining and, in particular, frequent pattern mining methods. On the practical side we summarize results of our work by putting forward a prototype of a software tool enabling online monitoring of the alarms coming from the electrical network and facilitating glitch detection and analysis by a technical operator.

  20. Big data mining analysis method based on cloud computing

    Science.gov (United States)

    Cai, Qing Qiu; Cui, Hong Gang; Tang, Hao

    2017-08-01

    Information explosion era, large data super-large, discrete and non-(semi) structured features have gone far beyond the traditional data management can carry the scope of the way. With the arrival of the cloud computing era, cloud computing provides a new technical way to analyze the massive data mining, which can effectively solve the problem that the traditional data mining method cannot adapt to massive data mining. This paper introduces the meaning and characteristics of cloud computing, analyzes the advantages of using cloud computing technology to realize data mining, designs the mining algorithm of association rules based on MapReduce parallel processing architecture, and carries out the experimental verification. The algorithm of parallel association rule mining based on cloud computing platform can greatly improve the execution speed of data mining.

  1. Self-karaoke patterns: an interactive audio-visual system for handsfree live algorithm performance

    OpenAIRE

    Eldridge, Alice

    2014-01-01

    Self-karaoke Patterns, is an audiovisual study for improvised cello and live algorithms. The work is motivated in part by addressing the practical needs of the performer in ‘handsfree’ live algorithm contexts and in part an aesthetic concern with resolving the tension between conceptual dedication to autonomous algorithms and musical dedication to coherent performance. The elected approach is inspired by recent work investing the role of ‘shape’ in musical performance.

  2. Mining Frequent Item Sets in Asynchronous Transactional Data Streams over Time Sensitive Sliding Windows Model

    International Nuclear Information System (INIS)

    Javaid, Q.; Memon, F.; Talpur, S.; Arif, M.; Awan, M.D.

    2016-01-01

    EPs (Extracting Frequent Patterns) from the continuous transactional data streams is a challenging and critical task in some of the applications, such as web mining, data analysis and retail market, prediction and network monitoring, or analysis of stock market exchange data. Many algorithms have been developed previously for mining FPs (Frequent Patterns) from a data stream. Such algorithms are currently highly required to develop new solutions and approaches to the precise handling of data streams. New techniques, solutions, or approaches are developed to address unbounded, ordered, and continuous sequences of data and for the generation of data at a rapid speed from data streams. Hence, extracting FPs using fresh or recent data involves the high-level analysis of data streams. We have suggested an efficient technique for the window sliding model; this technique extracts new and fresh FPs from high-speed data streams. In this study, a CPILT (Compacted Tree Compact Pattern Tree) is developed to capture the latest contents in the stream and to efficiently remove outdated contents from the data stream. The main concept introduced in this work on CPILT is the dynamic restructuring of a tree, which is helpful in producing a compacted tree and the frequency descending structure of a tree on runtime. With the help of the mining technique of FP growth, a complete list of new and fresh FPs is obtained from a CPILT using an existing window. The memory usage and time complexity of the latest FPs in high-speed data streams can efficiently be determined through proper experimentation and analysis. (author)

  3. Pattern Discovery in Time-Ordered Data; TOPICAL

    International Nuclear Information System (INIS)

    CONRAD, GREGORY N.; BRITANIK, JOHN M.; DELAND, SHARON M.; JENKIN, CHRISTINA L.

    2002-01-01

    This report describes the results of a Laboratory-Directed Research and Development project on techniques for pattern discovery in discrete event time series data. In this project, we explored two different aspects of the pattern matching/discovery problem. The first aspect studied was the use of Dynamic Time Warping for pattern matching in continuous data. In essence, DTW is a technique for aligning time series along the time axis to optimize the similarity measure. The second aspect studied was techniques for discovering patterns in discrete event data. We developed a pattern discovery tool based on adaptations of the A-priori and GSP (Generalized Sequential Pattern mining) algorithms. We then used the tool on three different application areas-unattended monitoring system data from a storage magazine, computer network intrusion detection, and analysis of robot training data

  4. Advanced and flexible genetic algorithms for BWR fuel loading pattern optimization

    International Nuclear Information System (INIS)

    Martin-del-Campo, Cecilia; Palomera-Perez, Miguel-Angel; Francois, Juan-Luis

    2009-01-01

    This work proposes advances in the implementation of a flexible genetic algorithm (GA) for fuel loading pattern optimization for Boiling Water Reactors (BWRs). In order to avoid specific implementations of genetic operators and to obtain a more flexible treatment, a binary representation of the solution was implemented; this representation had to take into account that a little change in the genotype must correspond to a little change in the phenotype. An identifier number is assigned to each assembly by means of a Gray Code of 7 bits and the solution (the loading pattern) is represented by a binary chain of 777 bits of length. Another important contribution is the use of a Fitness Function which includes a Heuristic Function and an Objective Function. The Heuristic Function which is defined to give flexibility on the application of a set of positioning rules based on knowledge, and the Objective Function that contains all the parameters which qualify the neutronic and thermal hydraulic performances of each loading pattern. Experimental results illustrating the effectiveness and flexibility of this optimization algorithm are presented and discussed.

  5. An efficient communication strategy for mobile agent based distributed spatial data mining application

    Science.gov (United States)

    Han, Guodong; Wang, Jiazhen

    2005-11-01

    An efficient communication strategy is proposed in this paper, which aims to improve the response time and availability of mobile agent based distributed spatial data mining applications. When dealing with decomposed complex data mining tasks or On-Line Analytical Processing (OLAP), mobile agents authorized by the specified user need to coordinate and cooperate with each other by employing given communication method to fulfill the subtasks delegated to them. Agent interactive behavior, e.g. messages passing, intermediate results exchanging and final results merging, must happen after the specified path is determined by executing given routing selection algorithm. Most of algorithms exploited currently run in time that grows approximately quadratic with the size of the input nodes where mobile agents migrate between. In order to gain enhanced communication performance by reducing the execution time of the decision algorithm, we propose an approach to reduce the number of nodes involved in the computation. In practice, hosts in the system are reorganized into groups in terms of the bandwidth between adjacent nodes. Then, we find an optimal node for each group with high bandwidth and powerful computing resources, which is managed by an agent dispatched by agent home node. With that, the communication pattern can be implemented at a higher level of abstraction and contribute to improving the overall performance of mobile agent based distributed spatial data mining applications.

  6. REGULAR PATTERN MINING (WITH JITTER ON WEIGHTED-DIRECTED DYNAMIC GRAPHS

    Directory of Open Access Journals (Sweden)

    A. GUPTA

    2017-02-01

    Full Text Available Real world graphs are mostly dynamic in nature, exhibiting time-varying behaviour in structure of the graph, weight on the edges and direction of the edges. Mining regular patterns in the occurrence of edge parameters gives an insight into the consumer trends over time in ecommerce co-purchasing networks. But such patterns need not necessarily be precise as in the case when some product goes out of stock or a group of customers becomes unavailable for a short period of time. Ignoring them may lead to loss of useful information and thus taking jitter into account becomes vital. To the best of our knowledge, no work has been yet reported to extract regular patterns considering a jitter of length greater than unity. In this article, we propose a novel method to find quasi regular patterns on weight and direction sequences of such graphs. The method involves analysing the dynamic network considering the inconsistencies in the occurrence of edges. It utilizes the relation between the occurrence sequence and the corresponding weight and direction sequences to speed up this process. Further, these patterns are used to determine the most central nodes (such as the most profit yielding products. To accomplish this we introduce the concept of dynamic closeness centrality and dynamic betweenness centrality. Experiments on Enron e-mail dataset and a synthetic dynamic network show that the presented approach is efficient, so it can be used to find patterns in large scale networks consisting of many timestamps.

  7. FraudMiner: A Novel Credit Card Fraud Detection Model Based on Frequent Itemset Mining

    Directory of Open Access Journals (Sweden)

    K. R. Seeja

    2014-01-01

    Full Text Available This paper proposes an intelligent credit card fraud detection model for detecting fraud from highly imbalanced and anonymous credit card transaction datasets. The class imbalance problem is handled by finding legal as well as fraud transaction patterns for each customer by using frequent itemset mining. A matching algorithm is also proposed to find to which pattern (legal or fraud the incoming transaction of a particular customer is closer and a decision is made accordingly. In order to handle the anonymous nature of the data, no preference is given to any of the attributes and each attribute is considered equally for finding the patterns. The performance evaluation of the proposed model is done on UCSD Data Mining Contest 2009 Dataset (anonymous and imbalanced and it is found that the proposed model has very high fraud detection rate, balanced classification rate, Matthews correlation coefficient, and very less false alarm rate than other state-of-the-art classifiers.

  8. FraudMiner: a novel credit card fraud detection model based on frequent itemset mining.

    Science.gov (United States)

    Seeja, K R; Zareapoor, Masoumeh

    2014-01-01

    This paper proposes an intelligent credit card fraud detection model for detecting fraud from highly imbalanced and anonymous credit card transaction datasets. The class imbalance problem is handled by finding legal as well as fraud transaction patterns for each customer by using frequent itemset mining. A matching algorithm is also proposed to find to which pattern (legal or fraud) the incoming transaction of a particular customer is closer and a decision is made accordingly. In order to handle the anonymous nature of the data, no preference is given to any of the attributes and each attribute is considered equally for finding the patterns. The performance evaluation of the proposed model is done on UCSD Data Mining Contest 2009 Dataset (anonymous and imbalanced) and it is found that the proposed model has very high fraud detection rate, balanced classification rate, Matthews correlation coefficient, and very less false alarm rate than other state-of-the-art classifiers.

  9. Integrating Entropy and Closed Frequent Pattern Mining for Social Network Modelling and Analysis

    Science.gov (United States)

    Adnan, Muhaimenul; Alhajj, Reda; Rokne, Jon

    The recent increase in the explicitly available social networks has attracted the attention of the research community to investigate how it would be possible to benefit from such a powerful model in producing effective solutions for problems in other domains where the social network is implicit; we argue that social networks do exist around us but the key issue is how to realize and analyze them. This chapter presents a novel approach for constructing a social network model by an integrated framework that first preparing the data to be analyzed and then applies entropy and frequent closed patterns mining for network construction. For a given problem, we first prepare the data by identifying items and transactions, which arc the basic ingredients for frequent closed patterns mining. Items arc main objects in the problem and a transaction is a set of items that could exist together at one time (e.g., items purchased in one visit to the supermarket). Transactions could be analyzed to discover frequent closed patterns using any of the well-known techniques. Frequent closed patterns have the advantage that they successfully grab the inherent information content of the dataset and is applicable to a broader set of domains. Entropies of the frequent closed patterns arc used to keep the dimensionality of the feature vectors to a reasonable size; it is a kind of feature reduction process. Finally, we analyze the dynamic behavior of the constructed social network. Experiments were conducted on a synthetic dataset and on the Enron corpus email dataset. The results presented in the chapter show that social networks extracted from a feature set as frequent closed patterns successfully carry the community structure information. Moreover, for the Enron email dataset, we present an analysis to dynamically indicate the deviations from each user's individual and community profile. These indications of deviations can be very useful to identify unusual events.

  10. Mining a database of single amplified genomes from Red Sea brine pool extremophiles—improving reliability of gene function prediction using a profile and pattern matching algorithm (PPMA)

    Science.gov (United States)

    Grötzinger, Stefan W.; Alam, Intikhab; Ba Alawi, Wail; Bajic, Vladimir B.; Stingl, Ulrich; Eppinger, Jörg

    2014-01-01

    Reliable functional annotation of genomic data is the key-step in the discovery of novel enzymes. Intrinsic sequencing data quality problems of single amplified genomes (SAGs) and poor homology of novel extremophile's genomes pose significant challenges for the attribution of functions to the coding sequences identified. The anoxic deep-sea brine pools of the Red Sea are a promising source of novel enzymes with unique evolutionary adaptation. Sequencing data from Red Sea brine pool cultures and SAGs are annotated and stored in the Integrated Data Warehouse of Microbial Genomes (INDIGO) data warehouse. Low sequence homology of annotated genes (no similarity for 35% of these genes) may translate into false positives when searching for specific functions. The Profile and Pattern Matching (PPM) strategy described here was developed to eliminate false positive annotations of enzyme function before progressing to labor-intensive hyper-saline gene expression and characterization. It utilizes InterPro-derived Gene Ontology (GO)-terms (which represent enzyme function profiles) and annotated relevant PROSITE IDs (which are linked to an amino acid consensus pattern). The PPM algorithm was tested on 15 protein families, which were selected based on scientific and commercial potential. An initial list of 2577 enzyme commission (E.C.) numbers was translated into 171 GO-terms and 49 consensus patterns. A subset of INDIGO-sequences consisting of 58 SAGs from six different taxons of bacteria and archaea were selected from six different brine pool environments. Those SAGs code for 74,516 genes, which were independently scanned for the GO-terms (profile filter) and PROSITE IDs (pattern filter). Following stringent reliability filtering, the non-redundant hits (106 profile hits and 147 pattern hits) are classified as reliable, if at least two relevant descriptors (GO-terms and/or consensus patterns) are present. Scripts for annotation, as well as for the PPM algorithm, are available

  11. Mining a database of single amplified genomes from Red Sea brine pool extremophiles-improving reliability of gene function prediction using a profile and pattern matching algorithm (PPMA).

    KAUST Repository

    Grötzinger, Stefan W.

    2014-04-07

    Reliable functional annotation of genomic data is the key-step in the discovery of novel enzymes. Intrinsic sequencing data quality problems of single amplified genomes (SAGs) and poor homology of novel extremophile\\'s genomes pose significant challenges for the attribution of functions to the coding sequences identified. The anoxic deep-sea brine pools of the Red Sea are a promising source of novel enzymes with unique evolutionary adaptation. Sequencing data from Red Sea brine pool cultures and SAGs are annotated and stored in the Integrated Data Warehouse of Microbial Genomes (INDIGO) data warehouse. Low sequence homology of annotated genes (no similarity for 35% of these genes) may translate into false positives when searching for specific functions. The Profile and Pattern Matching (PPM) strategy described here was developed to eliminate false positive annotations of enzyme function before progressing to labor-intensive hyper-saline gene expression and characterization. It utilizes InterPro-derived Gene Ontology (GO)-terms (which represent enzyme function profiles) and annotated relevant PROSITE IDs (which are linked to an amino acid consensus pattern). The PPM algorithm was tested on 15 protein families, which were selected based on scientific and commercial potential. An initial list of 2577 enzyme commission (E.C.) numbers was translated into 171 GO-terms and 49 consensus patterns. A subset of INDIGO-sequences consisting of 58 SAGs from six different taxons of bacteria and archaea were selected from six different brine pool environments. Those SAGs code for 74,516 genes, which were independently scanned for the GO-terms (profile filter) and PROSITE IDs (pattern filter). Following stringent reliability filtering, the non-redundant hits (106 profile hits and 147 pattern hits) are classified as reliable, if at least two relevant descriptors (GO-terms and/or consensus patterns) are present. Scripts for annotation, as well as for the PPM algorithm, are available

  12. Web Mining and Social Networking

    CERN Document Server

    Xu, Guandong; Li, Lin

    2011-01-01

    This book examines the techniques and applications involved in the Web Mining, Web Personalization and Recommendation and Web Community Analysis domains, including a detailed presentation of the principles, developed algorithms, and systems of the research in these areas. The applications of web mining, and the issue of how to incorporate web mining into web personalization and recommendation systems are also reviewed. Additionally, the volume explores web community mining and analysis to find the structural, organizational and temporal developments of web communities and reveal the societal s

  13. Genetic algorithm for the optimization of the loading pattern for reactor core fuel management

    International Nuclear Information System (INIS)

    Zhou Sheng; Hu Yongming; zheng Wenxiang

    2000-01-01

    The paper discusses the application of a genetic algorithm to the optimization of the loading pattern for in-core fuel management with the NP characteristics. The algorithm develops a matrix model for the fuel assembly loading pattern. The burnable poisons matrix was assigned randomly considering the distributed nature of the poisons. A method based on the traveling salesman problem was used to solve the problem. A integrated code for in-core fuel management was formed by combining this code with a reactor physics code

  14. Mining the Relationship between Spatial Mobility Patterns and POIs

    Directory of Open Access Journals (Sweden)

    Liping Huang

    2018-01-01

    Full Text Available Passengers move between urban places for diverse interests and drive the metropolitan regions as the aggregation of urban places to group into network communities. This paper aims to examine the relationship between the spatial patterns (represented by the network communities of mobility flows and places of interest (POIs. Furtherly, it intends to identify the categories of POIs that play the most significant role in shaping the spatial patterns of mobility flows. To achieve these purposes, we partition the study area into disjoint regions and construct the network with each partitioned region as a node and connection between them as links weighted by the mobility flows. The community detection algorithm is implemented on the network to discover spatial mobility patterns, and the multiclass classification based on the logistic regression method is adopted to classify spatial communities featured by POIs. Taking the taxi systems of Shanghai and Beijing as examples, we detect spatial communities based on the movement strengths among regions. Then we investigate their correlations with POIs. It finds that communities’ modularity correlates linearly with POIs; particularly governments, hotels, and the traffic facilities are of the most significance for generating the mobility patterns. This study can provide valuable insight into understanding the spatial mobility patterns from the perspective of POIs.

  15. Mining Building Metadata by Data Stream Comparison

    DEFF Research Database (Denmark)

    Holmegaard, Emil; Kjærgaard, Mikkel Baun

    2016-01-01

    to handle data streams with only slightly similar patterns. We have evaluated Metafier with points and data from one building located in Denmark. We have evaluated Metafier with 903 points, and the overall accuracy, with only 3 known examples, was 94.71%. Furthermore we found that using DTW for mining...... ways to annotate sensor and actuation points. This makes it difficult to create intuitive queries for retrieving data streams from points. Another problem is the amount of insufficient or missing metadata. We introduce Metafier, a tool for extracting metadata from comparing data streams. Metafier...... enables a semi-automatic labeling of metadata to building instrumentation. Metafier annotates points with metadata by comparing the data from a set of validated points with unvalidated points. Metafier has three different algorithms to compare points with based on their data. The three algorithms...

  16. High utility-itemset mining and privacy-preserving utility mining

    Directory of Open Access Journals (Sweden)

    Jerry Chun-Wei Lin

    2016-03-01

    Full Text Available In recent decades, high-utility itemset mining (HUIM has emerging a critical research topic since the quantity and profit factors are both concerned to mine the high-utility itemsets (HUIs. Generally, data mining is commonly used to discover interesting and useful knowledge from massive data. It may, however, lead to privacy threats if private or secure information (e.g., HUIs are published in the public place or misused. In this paper, we focus on the issues of HUIM and privacy-preserving utility mining (PPUM, and present two evolutionary algorithms to respectively mine HUIs and hide the sensitive high-utility itemsets in PPUM. Extensive experiments showed that the two proposed models for the applications of HUIM and PPUM can not only generate the high quality profitable itemsets according to the user-specified minimum utility threshold, but also enable the capability of privacy preserving for private or secure information (e.g., HUIs in real-word applications.

  17. Algorithms for Academic Search and Recommendation Systems

    DEFF Research Database (Denmark)

    Amolochitis, Emmanouil

    2014-01-01

    are part of a developed Movie Recommendation system, the first such system to be commercially deployed in Greece by a major Triple Play services provider. In the third part of the work we present the design of a quantitative association rule mining algorithm. The introduced mining algorithm processes......In this work we present novel algorithms for academic search, recommendation and association rules mining. In the first part of the work we introduce a novel hierarchical heuristic scheme for re-ranking academic publications. The scheme is based on the hierarchical combination of a custom...... implementation of the term frequency heuristic, a time-depreciated citation score and a graph-theoretic computed score that relates the paper’s index terms with each other. On the second part we describe the design of hybrid recommender ensemble (user, item and content based). The newly introduced algorithms...

  18. Kajian Algoritma Sequential Pattern Mining Dan Market Basket Analysis Dalam Pengenalan Pola Belanja Customer Untuk Layout Toko

    Directory of Open Access Journals (Sweden)

    Rusito Rusito

    2016-01-01

    Full Text Available Penelitian ini membahas tentang keterkaitan antar item yang dibeli oleh customer dalam toko ritel. Pengetahuan keterkaitan item yang dibeli dapat digunakan untuk  menentukan tata letak barang dagangan toko ritel. Hal ini penting agar konsumen dapat mudah mendapatkan barang yang dibutuhkan. Sehingga dapat meningkatkan omzet penjualan toko ritel sehingga akhirnya menambah keuntungan bagi pemilik toko ritel. Teknik yang digunakan untuk menyelesaikan penggalian data dan keterkaitan pembelian tersebut menggunakan pendekatan Association rule dan Market Basket Analysis. Sedangkan untuk mencari keterkaitan item tersebut digunakan algoritma Sequential Pattern Mining. Digunakan karena mampu menangani jumlah database yang besar dan sangat baik disisi kecepatan pemrosesan. Berbagai aplikasi telah diidentifikasi, termasuk misalnya, cross-selling, analisis situs Web, pendukung keputusan, evaluasi kredit, acara prediksi kriminal, analisis perilaku pelanggan  dan deteksi penipuan. Dari penelitian yang telah dilakukan diperoleh  pola-pola belanja customer untuk membentuk suatu layout display dalam toko ritel. Penelitian ini juga menyajikan suatu kerja algoritma yang lebih efektif dari algoritma asli karena terdapat pembatasan perulangan. Untuk kombinasi maksimal 5 item dengan waktu eksekusi 421.06 detik untuk 200 nota.   Kata kunci : Data Mining, Algoritma Sequential Pattern Mining, Market Basket Analysis, Apriori, Layout, Toko Ritel

  19. A novel procedure on next generation sequencing data analysis using text mining algorithm.

    Science.gov (United States)

    Zhao, Weizhong; Chen, James J; Perkins, Roger; Wang, Yuping; Liu, Zhichao; Hong, Huixiao; Tong, Weida; Zou, Wen

    2016-05-13

    Next-generation sequencing (NGS) technologies have provided researchers with vast possibilities in various biological and biomedical research areas. Efficient data mining strategies are in high demand for large scale comparative and evolutional studies to be performed on the large amounts of data derived from NGS projects. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. We report a novel procedure to analyse NGS data using topic modeling. It consists of four major procedures: NGS data retrieval, preprocessing, topic modeling, and data mining using Latent Dirichlet Allocation (LDA) topic outputs. The NGS data set of the Salmonella enterica strains were used as a case study to show the workflow of this procedure. The perplexity measurement of the topic numbers and the convergence efficiencies of Gibbs sampling were calculated and discussed for achieving the best result from the proposed procedure. The output topics by LDA algorithms could be treated as features of Salmonella strains to accurately describe the genetic diversity of fliC gene in various serotypes. The results of a two-way hierarchical clustering and data matrix analysis on LDA-derived matrices successfully classified Salmonella serotypes based on the NGS data. The implementation of topic modeling in NGS data analysis procedure provides a new way to elucidate genetic information from NGS data, and identify the gene-phenotype relationships and biomarkers, especially in the era of biological and medical big data. The implementation of topic modeling in NGS data analysis provides a new way to elucidate genetic information from NGS data, and identify the gene-phenotype relationships and biomarkers, especially in the era of biological and medical big data.

  20. Text Clustering Algorithm Based on Random Cluster Core

    Directory of Open Access Journals (Sweden)

    Huang Long-Jun

    2016-01-01

    Full Text Available Nowadays clustering has become a popular text mining algorithm, but the huge data can put forward higher requirements for the accuracy and performance of text mining. In view of the performance bottleneck of traditional text clustering algorithm, this paper proposes a text clustering algorithm with random features. This is a kind of clustering algorithm based on text density, at the same time using the neighboring heuristic rules, the concept of random cluster is introduced, which effectively reduces the complexity of the distance calculation.

  1. Multicycle Optimization of Advanced Gas-Cooled Reactor Loading Patterns Using Genetic Algorithms

    International Nuclear Information System (INIS)

    Ziver, A. Kemal; Carter, Jonathan N.; Pain, Christopher C.; Oliveira, Cassiano R.E. de; Goddard, Antony J. H.; Overton, Richard S.

    2003-01-01

    A genetic algorithm (GA)-based optimizer (GAOPT) has been developed for in-core fuel management of advanced gas-cooled reactors (AGRs) at HINKLEY B and HARTLEPOOL, which employ on-load and off-load refueling, respectively. The optimizer has been linked to the reactor analysis code PANTHER for the automated evaluation of loading patterns in a two-dimensional geometry, which is collapsed from the three-dimensional reactor model. GAOPT uses a directed stochastic (Monte Carlo) algorithm to generate initial population members, within predetermined constraints, for use in GAs, which apply the standard genetic operators: selection by tournament, crossover, and mutation. The GAOPT is able to generate and optimize loading patterns for successive reactor cycles (multicycle) within acceptable CPU times even on single-processor systems. The algorithm allows radial shuffling of fuel assemblies in a multicycle refueling optimization, which is constructed to aid long-term core management planning decisions. This paper presents the application of the GA-based optimization to two AGR stations, which apply different in-core management operational rules. Results obtained from the testing of GAOPT are discussed

  2. Setup of a testing environment for mission planning in mining

    NARCIS (Netherlands)

    Groenen, J.P.J.; Steinbuch, M.

    2013-01-01

    Mission planning algorithms for surface mining applications are difficult to test as a result of the large scale tasks. To validate these algorithms, a scaled setup is created where the mining excavator is mimicked by an industrial robot. This report discusses the development of a software

  3. World Wide Web Usage Mining Systems and Technologies

    Directory of Open Access Journals (Sweden)

    Wen-Chen Hu

    2003-08-01

    Full Text Available Web usage mining is used to discover interesting user navigation patterns and can be applied to many real-world problems, such as improving Web sites/pages, making additional topic or product recommendations, user/customer behavior studies, etc. This article provides a survey and analysis of current Web usage mining systems and technologies. A Web usage mining system performs five major tasks: i data gathering, ii data preparation, iii navigation pattern discovery, iv pattern analysis and visualization, and v pattern applications. Each task is explained in detail and its related technologies are introduced. A list of major research systems and projects concerning Web usage mining is also presented, and a summary of Web usage mining is given in the last section.

  4. Applied data mining

    CERN Document Server

    Xu, Guandong

    2013-01-01

    Data mining has witnessed substantial advances in recent decades. New research questions and practical challenges have arisen from emerging areas and applications within the various fields closely related to human daily life, e.g. social media and social networking. This book aims to bridge the gap between traditional data mining and the latest advances in newly emerging information services. It explores the extension of well-studied algorithms and approaches into these new research arenas.

  5. pubmed. mineR: An R package with text-mining algorithms to ...

    Indian Academy of Sciences (India)

    2016-08-26

    Aug 26, 2016 ... Three case studies are presented, namely, `Evolving role of diabetes educators', `Cancer risk assessment' and `Dynamic concepts on disease and comorbidity' to illustrate the use of pubmed.mineR. The package generally runs fast with small elapsed times in regular workstations even on large corpus ...

  6. Online Nonparametric Bayesian Activity Mining and Analysis From Surveillance Video.

    Science.gov (United States)

    Bastani, Vahid; Marcenaro, Lucio; Regazzoni, Carlo S

    2016-05-01

    A method for online incremental mining of activity patterns from the surveillance video stream is presented in this paper. The framework consists of a learning block in which Dirichlet process mixture model is employed for the incremental clustering of trajectories. Stochastic trajectory pattern models are formed using the Gaussian process regression of the corresponding flow functions. Moreover, a sequential Monte Carlo method based on Rao-Blackwellized particle filter is proposed for tracking and online classification as well as the detection of abnormality during the observation of an object. Experimental results on real surveillance video data are provided to show the performance of the proposed algorithm in different tasks of trajectory clustering, classification, and abnormality detection.

  7. Physics Mining of Multi-Source Data Sets

    Science.gov (United States)

    Helly, John; Karimabadi, Homa; Sipes, Tamara

    2012-01-01

    Powerful new parallel data mining algorithms can produce diagnostic and prognostic numerical models and analyses from observational data. These techniques yield higher-resolution measures than ever before of environmental parameters by fusing synoptic imagery and time-series measurements. These techniques are general and relevant to observational data, including raster, vector, and scalar, and can be applied in all Earth- and environmental science domains. Because they can be highly automated and are parallel, they scale to large spatial domains and are well suited to change and gap detection. This makes it possible to analyze spatial and temporal gaps in information, and facilitates within-mission replanning to optimize the allocation of observational resources. The basis of the innovation is the extension of a recently developed set of algorithms packaged into MineTool to multi-variate time-series data. MineTool is unique in that it automates the various steps of the data mining process, thus making it amenable to autonomous analysis of large data sets. Unlike techniques such as Artificial Neural Nets, which yield a blackbox solution, MineTool's outcome is always an analytical model in parametric form that expresses the output in terms of the input variables. This has the advantage that the derived equation can then be used to gain insight into the physical relevance and relative importance of the parameters and coefficients in the model. This is referred to as physics-mining of data. The capabilities of MineTool are extended to include both supervised and unsupervised algorithms, handle multi-type data sets, and parallelize it.

  8. Declarative Process Mining for DCR Graphs

    DEFF Research Database (Denmark)

    Debois, Søren; Hildebrandt, Thomas T.; Laursen, Paw Høvsgaard

    2017-01-01

    We investigate process mining for the declarative Dynamic Condition Response (DCR) graphs process modelling language. We contribute (a) a process mining algorithm for DCR graphs, (b) a proposal for a set of metrics quantifying output model quality, and (c) a preliminary example-based comparison...

  9. Text Mining Applications and Theory

    CERN Document Server

    Berry, Michael W

    2010-01-01

    Text Mining: Applications and Theory presents the state-of-the-art algorithms for text mining from both the academic and industrial perspectives.  The contributors span several countries and scientific domains: universities, industrial corporations, and government laboratories, and demonstrate the use of techniques from machine learning, knowledge discovery, natural language processing and information retrieval to design computational models for automated text analysis and mining. This volume demonstrates how advancements in the fields of applied mathematics, computer science, machine learning

  10. Mining a database of single amplified genomes from Red Sea brine pool extremophiles – Improving reliability of gene function prediction using a profile and pattern matching algorithm (PPMA

    Directory of Open Access Journals (Sweden)

    Stefan Wolfgang Grötzinger

    2014-04-01

    Full Text Available Reliable functional annotation of genomic data is the key-step in the discovery of novel enzymes. Intrinsic sequencing data quality problems of single amplified genomes (SAGs and poor homology of novel extremophile’s genomes pose significant challenges for the attribution of functions to the coding sequences identified. The anoxic deep-sea brine pools of the Red Sea are a promising source of novel enzymes with unique evolutionary adaptation. Sequencing data from Red Sea brine pool cultures and SAGs are annotated and stored in the INDIGO data warehouse. Low sequence homology of annotated genes (no similarity for 35% of these genes may translate into false positives when searching for specific functions. The Profile & Pattern Matching (PPM strategy described here was developed to eliminate false positive annotations of enzyme function before progressing to labor-intensive hyper-saline gene expression and characterization. It utilizes InterPro-derived Gene Ontology (GO-terms (which represent enzyme function profiles and annotated relevant PROSITE IDs (which are linked to an amino acid consensus pattern. The PPM algorithm was tested on 15 protein families, which were selected based on scientific and commercial potential. An initial list of 2,577 E.C. numbers was translated into 171 GO-terms and 49 consensus patterns. A subset of INDIGO-sequences consisting of 58 SAGs from six different taxons of bacteria and archaea were selected from 6 different brine pool environments. Those SAGs code for 74,516 genes, which were independently scanned for the GO-terms (profile filter and PROSITE IDs (pattern filter. Following stringent reliability filtering, the non-redundant hits (106 profile hits and 147 pattern hits are classified as reliable, if at least two relevant descriptors (GO-terms and/or consensus patterns are present. Scripts for annotation, as well as for the PPM algorithm, are available through the INDIGO website.

  11. Application of the distributed genetic algorithm for loading pattern optimization problems

    International Nuclear Information System (INIS)

    Hashimoto, Hiroshi; Yamamoto, Akio

    2000-01-01

    The distributed genetic algorithm (DGA) is applied for loading pattern optimization problems of the pressurized water reactors (PWR). Due to stiff nature of the loading pattern optimizations (e.g. multi-modality and non-linearity), stochastic methods like the simulated annealing or the genetic algorithm (GA) are widely applied for these problems. A basic concept of DGA is based on that of GA. However, DGA equally distributes candidates of solutions (i.e. loading patterns) to several independent 'islands' and evolves them in each island. Migrations of some candidates are performed among islands with a certain period. Since candidates of solutions independently evolve in each island with accepting different genes of migrants from other islands, premature convergence in the traditional GA can be prevented. Because many candidate loading patterns should be evaluated in one generation of GA or DGA, the parallelization in these calculations works efficiently. Parallel efficiency was measured using our optimization code and good load balance was attained even in a heterogeneous cluster environment due to dynamic distribution of the calculation load. The optimization code is based on the client/server architecture with the TCP/IP native socket and a client (optimization module) and calculation server modules communicate the objects of loading patterns each other. Throughout the sensitivity study on optimization parameters of DGA, a suitable set of the parameters for a test problem was identified. Finally, optimization capability of DGA and the traditional GA was compared in the test problem and DGA provided better optimization results than the traditional GA. (author)

  12. Implications of Emerging Data Mining

    Science.gov (United States)

    Kulathuramaiyer, Narayanan; Maurer, Hermann

    Data Mining describes a technology that discovers non-trivial hidden patterns in a large collection of data. Although this technology has a tremendous impact on our lives, the invaluable contributions of this invisible technology often go unnoticed. This paper discusses advances in data mining while focusing on the emerging data mining capability. Such data mining applications perform multidimensional mining on a wide variety of heterogeneous data sources, providing solutions to many unresolved problems. This paper also highlights the advantages and disadvantages arising from the ever-expanding scope of data mining. Data Mining augments human intelligence by equipping us with a wealth of knowledge and by empowering us to perform our daily tasks better. As the mining scope and capacity increases, users and organizations become more willing to compromise privacy. The huge data stores of the ‚master miners` allow them to gain deep insights into individual lifestyles and their social and behavioural patterns. Data integration and analysis capability of combining business and financial trends together with the ability to deterministically track market changes will drastically affect our lives.

  13. Multiwavelength Absolute Phase Retrieval from Noisy Diffractive Patterns: Wavelength Multiplexing Algorithm

    Directory of Open Access Journals (Sweden)

    Vladimir Katkovnik

    2018-05-01

    Full Text Available We study the problem of multiwavelength absolute phase retrieval from noisy diffraction patterns. The system is lensless with multiwavelength coherent input light beams and random phase masks applied for wavefront modulation. The light beams are formed by light sources radiating all wavelengths simultaneously. A sensor equipped by a Color Filter Array (CFA is used for spectral measurement registration. The developed algorithm targeted on optimal phase retrieval from noisy observations is based on maximum likelihood technique. The algorithm is specified for Poissonian and Gaussian noise distributions. One of the key elements of the algorithm is an original sparse modeling of the multiwavelength complex-valued wavefronts based on the complex-domain block-matching 3D filtering. Presented numerical experiments are restricted to noisy Poissonian observations. They demonstrate that the developed algorithm leads to effective solutions explicitly using the sparsity for noise suppression and enabling accurate reconstruction of absolute phase of high-dynamic range.

  14. Research of Improved Apriori Algorithm Based on Itemset Array

    Directory of Open Access Journals (Sweden)

    Naili Liu

    2013-06-01

    Full Text Available Mining frequent item sets is a major key process in data mining research. Apriori and many improved algorithms are lowly efficient because they need scan database many times and storage transaction ID in memory, so time and space overhead is very high. Especially, they are lower efficient when they process large scale database. The main task of the improved algorithm is to reduce time and space overhead for mining frequent item sets. Because, it scans database only once to generate binary item set array, it adopts binary instead of transaction ID when it storages transaction flag, it adopts logic AND operation to judge whether an item set is frequent item set. Moreover, the improved algorithm is more suitable for large scale database. Experimental results show that the improved algorithm has better efficiency than classic Apriori algorithm.

  15. Rating Algorithm for Pronunciation of English Based on Audio Feature Pattern Matching

    Directory of Open Access Journals (Sweden)

    Li Kun

    2015-01-01

    Full Text Available With the increasing internationalization of China, language communication has become an important channel for us to adapt to the political and economic environment. How to improve English learners’ language learning efficiency in limited conditions has turned into a problem demanding prompt solution at present. This paper applies two pronunciation patterns according to the actual needs of English pronunciation rating: to-be-evaluated pronunciation pattern and standard pronunciation pattern. It will translate the patterns into English pronunciation rating results through European distance. Besides, this paper will introduce the design philosophy of the whole algorithm in combination with CHMM matching pattern. Each link of the CHMM pattern will be given selective analysis while a contrast experiment between the CHMM matching pattern and the other two patterns will be conducted. From the experiment results, it can be concluded that CHMM pattern is the best option.

  16. EMiT: a process mining tool

    NARCIS (Netherlands)

    Dongen, van B.F.; Aalst, van der W.M.P.; Cortadella, J.; Reisig, W.

    2004-01-01

    Process mining offers a way to distill process models from event logs originating from transactional systems in logistics, banking, e-business, health-care, etc. The algorithms used for process mining are complex and in practise large logs are needed to derive a high-quality process model. To

  17. Use of a machine learning algorithm to classify expertise: analysis of hand motion patterns during a simulated surgical task.

    Science.gov (United States)

    Watson, Robert A

    2014-08-01

    To test the hypothesis that machine learning algorithms increase the predictive power to classify surgical expertise using surgeons' hand motion patterns. In 2012 at the University of North Carolina at Chapel Hill, 14 surgical attendings and 10 first- and second-year surgical residents each performed two bench model venous anastomoses. During the simulated tasks, the participants wore an inertial measurement unit on the dorsum of their dominant (right) hand to capture their hand motion patterns. The pattern from each bench model task performed was preprocessed into a symbolic time series and labeled as expert (attending) or novice (resident). The labeled hand motion patterns were processed and used to train a Support Vector Machine (SVM) classification algorithm. The trained algorithm was then tested for discriminative/predictive power against unlabeled (blinded) hand motion patterns from tasks not used in the training. The Lempel-Ziv (LZ) complexity metric was also measured from each hand motion pattern, with an optimal threshold calculated to separately classify the patterns. The LZ metric classified unlabeled (blinded) hand motion patterns into expert and novice groups with an accuracy of 70% (sensitivity 64%, specificity 80%). The SVM algorithm had an accuracy of 83% (sensitivity 86%, specificity 80%). The results confirmed the hypothesis. The SVM algorithm increased the predictive power to classify blinded surgical hand motion patterns into expert versus novice groups. With further development, the system used in this study could become a viable tool for low-cost, objective assessment of procedural proficiency in a competency-based curriculum.

  18. Grade Distribution Modeling within the Bauxite Seams of the Wachangping Mine, China, Using a Multi-Step Interpolation Algorithm

    Directory of Open Access Journals (Sweden)

    Shaofeng Wang

    2017-05-01

    Full Text Available Mineral reserve estimation and mining design depend on a precise modeling of the mineralized deposit. A multi-step interpolation algorithm, including 1D biharmonic spline estimator for interpolating floor altitudes, 2D nearest neighbor, linear, natural neighbor, cubic, biharmonic spline, inverse distance weighted, simple kriging, and ordinary kriging interpolations for grade distribution on the two vertical sections at roadways, and 3D linear interpolation for grade distribution between sections, was proposed to build a 3D grade distribution model of the mineralized seam in a longwall mining panel with a U-shaped layout having two roadways at both sides. Compared to field data from exploratory boreholes, this multi-step interpolation using a natural neighbor method shows an optimal stability and a minimal difference between interpolation and field data. Using this method, the 97,576 m3 of bauxite, in which the mass fraction of Al2O3 (Wa and the mass ratio of Al2O3 to SiO2 (Wa/s are 61.68% and 27.72, respectively, was delimited from the 189,260 m3 mineralized deposit in the 1102 longwall mining panel in the Wachangping mine, Southwest China. The mean absolute errors, the root mean squared errors and the relative standard deviations of errors between interpolated data and exploratory grade data at six boreholes are 2.544, 2.674, and 32.37% of Wa; and 1.761, 1.974, and 67.37% of Wa/s, respectively. The proposed method can be used for characterizing the grade distribution in a mineralized seam between two roadways at both sides of a longwall mining panel.

  19. Data Mining and Privacy of Social Network Sites' Users: Implications of the Data Mining Problem.

    Science.gov (United States)

    Al-Saggaf, Yeslam; Islam, Md Zahidul

    2015-08-01

    This paper explores the potential of data mining as a technique that could be used by malicious data miners to threaten the privacy of social network sites (SNS) users. It applies a data mining algorithm to a real dataset to provide empirically-based evidence of the ease with which characteristics about the SNS users can be discovered and used in a way that could invade their privacy. One major contribution of this article is the use of the decision forest data mining algorithm (SysFor) to the context of SNS, which does not only build a decision tree but rather a forest allowing the exploration of more logic rules from a dataset. One logic rule that SysFor built in this study, for example, revealed that anyone having a profile picture showing just the face or a picture showing a family is less likely to be lonely. Another contribution of this article is the discussion of the implications of the data mining problem for governments, businesses, developers and the SNS users themselves.

  20. Development of pattern recognition algorithms for the central drift chamber of the Belle II detector

    Energy Technology Data Exchange (ETDEWEB)

    Trusov, Viktor

    2016-11-04

    In this thesis, the development of one of the pattern recognition algorithms for the Belle II experiment based on conformal and Legendre transformations is presented. In order to optimize the performance of the algorithm (CPU time and efficiency) specialized processing steps have been introduced. To show achieved results, Monte-Carlo based efficiency measurements of the tracking algorithms in the Central Drift Chamber (CDC) has been done.

  1. A novel Neuro-fuzzy classification technique for data mining

    Directory of Open Access Journals (Sweden)

    Soumadip Ghosh

    2014-11-01

    Full Text Available In our study, we proposed a novel Neuro-fuzzy classification technique for data mining. The inputs to the Neuro-fuzzy classification system were fuzzified by applying generalized bell-shaped membership function. The proposed method utilized a fuzzification matrix in which the input patterns were associated with a degree of membership to different classes. Based on the value of degree of membership a pattern would be attributed to a specific category or class. We applied our method to ten benchmark data sets from the UCI machine learning repository for classification. Our objective was to analyze the proposed method and, therefore compare its performance with two powerful supervised classification algorithms Radial Basis Function Neural Network (RBFNN and Adaptive Neuro-fuzzy Inference System (ANFIS. We assessed the performance of these classification methods in terms of different performance measures such as accuracy, root-mean-square error, kappa statistic, true positive rate, false positive rate, precision, recall, and f-measure. In every aspect the proposed method proved to be superior to RBFNN and ANFIS algorithms.

  2. General asymmetric neutral networks and structure design by genetic algorithms: A learning rule for temporal patterns

    Energy Technology Data Exchange (ETDEWEB)

    Bornholdt, S. [Heidelberg Univ., (Germany). Inst., fuer Theoretische Physik; Graudenz, D. [Lawrence Berkeley Lab., CA (United States)

    1993-07-01

    A learning algorithm based on genetic algorithms for asymmetric neural networks with an arbitrary structure is presented. It is suited for the learning of temporal patterns and leads to stable neural networks with feedback.

  3. General asymmetric neutral networks and structure design by genetic algorithms: A learning rule for temporal patterns

    International Nuclear Information System (INIS)

    Bornholdt, S.

    1993-07-01

    A learning algorithm based on genetic algorithms for asymmetric neural networks with an arbitrary structure is presented. It is suited for the learning of temporal patterns and leads to stable neural networks with feedback

  4. Working with Data: Discovering Knowledge through Mining and Analysis; Systematic Knowledge Management and Knowledge Discovery; Text Mining; Methodological Approach in Discovering User Search Patterns through Web Log Analysis; Knowledge Discovery in Databases Using Formal Concept Analysis; Knowledge Discovery with a Little Perspective.

    Science.gov (United States)

    Qin, Jian; Jurisica, Igor; Liddy, Elizabeth D.; Jansen, Bernard J; Spink, Amanda; Priss, Uta; Norton, Melanie J.

    2000-01-01

    These six articles discuss knowledge discovery in databases (KDD). Topics include data mining; knowledge management systems; applications of knowledge discovery; text and Web mining; text mining and information retrieval; user search patterns through Web log analysis; concept analysis; data collection; and data structure inconsistency. (LRW)

  5. A Hybrid CPU/GPU Pattern-Matching Algorithm for Deep Packet Inspection.

    Directory of Open Access Journals (Sweden)

    Chun-Liang Lee

    Full Text Available The large quantities of data now being transferred via high-speed networks have made deep packet inspection indispensable for security purposes. Scalable and low-cost signature-based network intrusion detection systems have been developed for deep packet inspection for various software platforms. Traditional approaches that only involve central processing units (CPUs are now considered inadequate in terms of inspection speed. Graphic processing units (GPUs have superior parallel processing power, but transmission bottlenecks can reduce optimal GPU efficiency. In this paper we describe our proposal for a hybrid CPU/GPU pattern-matching algorithm (HPMA that divides and distributes the packet-inspecting workload between a CPU and GPU. All packets are initially inspected by the CPU and filtered using a simple pre-filtering algorithm, and packets that might contain malicious content are sent to the GPU for further inspection. Test results indicate that in terms of random payload traffic, the matching speed of our proposed algorithm was 3.4 times and 2.7 times faster than those of the AC-CPU and AC-GPU algorithms, respectively. Further, HPMA achieved higher energy efficiency than the other tested algorithms.

  6. Practical data mining and machine learning for optics applications: introduction to the feature issue.

    Science.gov (United States)

    Abdulla, Ghaleb; Awwal, Abdul; Borne, Kirk; Ho, Tin Kam; Vestrand, W Thomas

    2011-08-01

    Data mining algorithms utilize search techniques to explore hidden patterns and correlations in the data, which otherwise require a tremendous amount of human time to explore. This feature issue explores the use of such techniques to help understand the data, build better simulators, explain outlier behavior, and build better predictive models. We hope that this issue will spur discussions and expose a set of tools that can be useful to the optics community.

  7. Prediction of pork quality parameters by applying fractals and data mining on MRI

    DEFF Research Database (Denmark)

    Caballero, Daniel; Pérez-Palacios, Trinidad; Caro, Andrés

    2017-01-01

    This work firstly investigates the use of MRI, fractal algorithms and data mining techniques to determine pork quality parameters non-destructively. The main objective was to evaluate the capability of fractal algorithms (Classical Fractal algorithm, CFA; Fractal Texture Algorithm, FTA and One...... Point Fractal Texture Algorithm, OPFTA) to analyse MRI in order to predict quality parameters of loin. In addition, the effect of the sequence acquisition of MRI (Gradient echo, GE; Spin echo, SE and Turbo 3D, T3D) and the predictive technique of data mining (Isotonic regression, IR and Multiple linear...... regression, MLR) were analysed. Both fractal algorithm, FTA and OPFTA are appropriate to analyse MRI of loins. The sequence acquisition, the fractal algorithm and the data mining technique seems to influence on the prediction results. For most physico-chemical parameters, prediction equations with moderate...

  8. Computer-aided system for fire fighting in an underground mine

    Energy Technology Data Exchange (ETDEWEB)

    Rosiek, F; Sikora, M; Urbanski, J [Politechnika Wroclawska (Poland). Instytut Gornictwa

    1989-01-01

    Discusses structure of an algorithm for computer-aided planning of fire fighting and rescue in an underground coal mine. The algorithm developed by the Mining Institute of the Wroclaw Technical University consists of ten options: regulations on fire fighting, fire alarm for miners working underground (rescue ways, fire zones etc.), information system for mine management, movements of fire fighting teams, distribution of fire fighting equipment, assessment of explosion hazards of fire gases, fire gas temperature control of blower operation, detection of endogenous fires, ventilation control. 2 refs.

  9. Set-Oriented Mining for Association Rules in Relational Databases

    NARCIS (Netherlands)

    Houtsma, M.A.W.; Houtsma, M.A.W.; Swami, A.

    1995-01-01

    Describe set-oriented algorithms for mining association rules. Such algorithms imply performing multiple joins and may appear to be inherently less efficient than special-purpose algorithms. We develop new algorithms that can be expressed as SQL queries, and discuss the optimization of these

  10. Biosorption of metal and salt tolerant microbial isolates from a former uranium mining area. Their impact on changes in rare earth element patterns in acid mine drainage.

    Science.gov (United States)

    Haferburg, Götz; Merten, Dirk; Büchel, Georg; Kothe, Erika

    2007-12-01

    The concentration of metals in microbial habitats influenced by mining operations can reach enormous values. Worldwide, much emphasis is placed on the research of resistance and biosorptive capacities of microorganisms suitable for bioremediation purposes. Using a collection of isolates from a former uranium mining area in Eastern Thuringia, Germany, this study presents three Gram-positive bacterial strains with distinct metal tolerances. These strains were identified as members of the genera Bacillus, Micrococcus and Streptomyces. Acid mine drainage (AMD) originating from the same mining area is characterized by high metal concentrations of a broad range of elements and a very low pH. AMD was analyzed and used as incubation solution. The sorption of rare earth elements (REE), aluminum, cobalt, copper, manganese, nickel, strontium, and uranium through selected strains was studied during a time course of four weeks. Biosorption was investigated after one hour, one week and four weeks by analyzing the concentrations of metals in supernatant and biomass. Additionally, dead biomass was investigated after four weeks of incubation. The maximum of metal removal was reached after one week. Up to 80% of both Al and Cu, and more than 60% of U was shown to be removed from the solution. High concentrations of metals could be bound to the biomass, as for example 2.2 mg/g U. The strains could survive four weeks of incubation. Distinct and different patterns of rare earth elements of the inoculated and non-inoculated AMD water were observed. Changes in REE patterns hint at different binding types of heavy metals regarding incubation time and metabolic activity of the cells. (c) 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  11. Climatic zonation and land suitability determination for saffron in Khorasan-Razavi province using data mining algorithms

    Directory of Open Access Journals (Sweden)

    mehdi Bashiri

    2017-12-01

    Full Text Available Yield prediction for agricultural crops plays an important role in export-import planning, purchase guarantees, pricing, secure profits and increasing in agricultural productivity. Crop yield is affected by several parameters especially climate. In this study, the saffron yield in the Khorasan-Razavi province was evaluated by different classification algorithms including artificial neural networks, regression models, local linear trees, decision trees, discriminant analysis, random forest, support vector machine and nearest neighbor analysis. These algorithms analyzed data for 20 years (1989-2009 including 11 climatological parameters. The results showed that a few numbers of climatological parameters affect the saffron yield. The minimum, mean and maximum of temperature, had the highest positive correlations and the relative humidity of 6.5h, sunny hours, relative humidity of 18.5h, evaporation, relative humidity of 12.5h and absolute humidity had the highest negative correlations with saffron cultivation areas, respectively. In addition, in classification of saffron cultivation areas, the discriminant analysis and support vector machine had higher accuracies. The correlation between saffron cultivation area and saffron yield values was relatively high (r=0.38. The nearest neighbor analysis had the best prediction accuracy for classification of cultivation areas. For this algorithm the coefficients of determination were 1 and 0.944 for training and testing stages, respectively. However, the algorithms accuracy for prediction of crop yield from climatological parameters was low (the average coefficients of determination equal to 0.48 and 0.05 for training and testing stages. The best algorithm i.e. nearest neighbor analysis had coefficients of determination equal to 1 and 0.177 for saffron yield prediction. Results showed that, using climatological parameters and data mining algorithms can classify cultivation areas. By this way it is possible

  12. Data mining approach to web application intrusions detection

    Science.gov (United States)

    Kalicki, Arkadiusz

    2011-10-01

    Web applications became most popular medium in the Internet. Popularity, easiness of web application script languages and frameworks together with careless development results in high number of web application vulnerabilities and high number of attacks performed. There are several types of attacks possible because of improper input validation: SQL injection Cross-site scripting, Cross-Site Request Forgery (CSRF), web spam in blogs and others. In order to secure web applications intrusion detection (IDS) and intrusion prevention systems (IPS) are being used. Intrusion detection systems are divided in two groups: misuse detection (traditional IDS) and anomaly detection. This paper presents data mining based algorithm for anomaly detection. The principle of this method is the comparison of the incoming HTTP traffic with a previously built profile that contains a representation of the "normal" or expected web application usage sequence patterns. The frequent sequence patterns are found with GSP algorithm. Previously presented detection method was rewritten and improved. Some tests show that the software catches malicious requests, especially long attack sequences, results quite good with medium length sequences, for short length sequences must be complemented with other methods.

  13. Next Place Prediction Based on Spatiotemporal Pattern Mining of Mobile Device Logs.

    Science.gov (United States)

    Lee, Sungjun; Lim, Junseok; Park, Jonghun; Kim, Kwanho

    2016-01-23

    Due to the recent explosive growth of location-aware services based on mobile devices, predicting the next places of a user is of increasing importance to enable proactive information services. In this paper, we introduce a data-driven framework that aims to predict the user's next places using his/her past visiting patterns analyzed from mobile device logs. Specifically, the notion of the spatiotemporal-periodic (STP) pattern is proposed to capture the visits with spatiotemporal periodicity by focusing on a detail level of location for each individual. Subsequently, we present algorithms that extract the STP patterns from a user's past visiting behaviors and predict the next places based on the patterns. The experiment results obtained by using a real-world dataset show that the proposed methods are more effective in predicting the user's next places than the previous approaches considered in most cases.

  14. Discovering significant evolution patterns from satellite image time series.

    Science.gov (United States)

    Petitjean, François; Masseglia, Florent; Gançarski, Pierre; Forestier, Germain

    2011-12-01

    Satellite Image Time Series (SITS) provide us with precious information on land cover evolution. By studying these series of images we can both understand the changes of specific areas and discover global phenomena that spread over larger areas. Changes that can occur throughout the sensing time can spread over very long periods and may have different start time and end time depending on the location, which complicates the mining and the analysis of series of images. This work focuses on frequent sequential pattern mining (FSPM) methods, since this family of methods fits the above-mentioned issues. This family of methods consists of finding the most frequent evolution behaviors, and is actually able to extract long-term changes as well as short term ones, whenever the change may start and end. However, applying FSPM methods to SITS implies confronting two main challenges, related to the characteristics of SITS and the domain's constraints. First, satellite images associate multiple measures with a single pixel (the radiometric levels of different wavelengths corresponding to infra-red, red, etc.), which makes the search space multi-dimensional and thus requires specific mining algorithms. Furthermore, the non evolving regions, which are the vast majority and overwhelm the evolving ones, challenge the discovery of these patterns. We propose a SITS mining framework that enables discovery of these patterns despite these constraints and characteristics. Our proposal is inspired from FSPM and provides a relevant visualization principle. Experiments carried out on 35 images sensed over 20 years show the proposed approach makes it possible to extract relevant evolution behaviors.

  15. Mining Research on Vibration Signal Association Rules of Quayside Container Crane Hoisting Motor Based on Apriori Algorithm

    Science.gov (United States)

    Yang, Chencheng; Tang, Gang; Hu, Xiong

    2017-07-01

    Shore-hoisting motor in the daily work will produce a large number of vibration signal data,in order to analyze the correlation among the data and discover the fault and potential safety hazard of the motor, the data are discretized first, and then Apriori algorithm are used to mine the strong association rules among the data. The results show that the relationship between day 1 and day 16 is the most closely related, which can guide the staff to analyze the work of these two days of motor to find and solve the problem of fault and safety.

  16. A polynomial time biclustering algorithm for finding approximate expression patterns in gene expression time series

    Directory of Open Access Journals (Sweden)

    Madeira Sara C

    2009-06-01

    Full Text Available Abstract Background The ability to monitor the change in expression patterns over time, and to observe the emergence of coherent temporal responses using gene expression time series, obtained from microarray experiments, is critical to advance our understanding of complex biological processes. In this context, biclustering algorithms have been recognized as an important tool for the discovery of local expression patterns, which are crucial to unravel potential regulatory mechanisms. Although most formulations of the biclustering problem are NP-hard, when working with time series expression data the interesting biclusters can be restricted to those with contiguous columns. This restriction leads to a tractable problem and enables the design of efficient biclustering algorithms able to identify all maximal contiguous column coherent biclusters. Methods In this work, we propose e-CCC-Biclustering, a biclustering algorithm that finds and reports all maximal contiguous column coherent biclusters with approximate expression patterns in time polynomial in the size of the time series gene expression matrix. This polynomial time complexity is achieved by manipulating a discretized version of the original matrix using efficient string processing techniques. We also propose extensions to deal with missing values, discover anticorrelated and scaled expression patterns, and different ways to compute the errors allowed in the expression patterns. We propose a scoring criterion combining the statistical significance of expression patterns with a similarity measure between overlapping biclusters. Results We present results in real data showing the effectiveness of e-CCC-Biclustering and its relevance in the discovery of regulatory modules describing the transcriptomic expression patterns occurring in Saccharomyces cerevisiae in response to heat stress. In particular, the results show the advantage of considering approximate patterns when compared to state of

  17. The Pandora multi-algorithm approach to automated pattern recognition of cosmic-ray muon and neutrino events in the MicroBooNE detector

    Energy Technology Data Exchange (ETDEWEB)

    Acciarri, R.; Bagby, L.; Baller, B.; Carls, B.; Castillo Fernandez, R.; Cavanna, F.; Greenlee, H.; James, C.; Jostlein, H.; Ketchum, W.; Kirby, M.; Kobilarcik, T.; Lockwitz, S.; Lundberg, B.; Marchionni, A.; Moore, C.D.; Palamara, O.; Pavlovic, Z.; Raaf, J.L.; Schukraft, A.; Snider, E.L.; Spentzouris, P.; Strauss, T.; Toups, M.; Wolbers, S.; Yang, T.; Zeller, G.P. [Fermi National Accelerator Laboratory (FNAL), Batavia, IL (United States); Adams, C. [Harvard University, Cambridge, MA (United States); Yale University, New Haven, CT (United States); An, R.; Littlejohn, B.R.; Martinez Caicedo, D.A. [Illinois Institute of Technology (IIT), Chicago, IL (United States); Anthony, J.; Escudero Sanchez, L.; De Vries, J.J.; Marshall, J.; Smith, A.; Thomson, M. [University of Cambridge, Cambridge (United Kingdom); Asaadi, J. [University of Texas, Arlington, TX (United States); Auger, M.; Ereditato, A.; Goeldi, D.; Kreslo, I.; Lorca, D.; Luethi, M.; Rudolf von Rohr, C.; Sinclair, J.; Weber, M. [Universitaet Bern, Bern (Switzerland); Balasubramanian, S.; Fleming, B.T.; Gramellini, E.; Hackenburg, A.; Luo, X.; Russell, B.; Tufanli, S. [Yale University, New Haven, CT (United States); Barnes, C.; Mousseau, J.; Spitz, J. [University of Michigan, Ann Arbor, MI (United States); Barr, G.; Bass, M.; Del Tutto, M.; Laube, A.; Soleti, S.R.; De Pontseele, W.V. [University of Oxford, Oxford (United Kingdom); Bay, F. [TUBITAK Space Technologies Research Institute, Ankara (Turkey); Bishai, M.; Chen, H.; Joshi, J.; Kirby, B.; Li, Y.; Mooney, M.; Qian, X.; Viren, B.; Zhang, C. [Brookhaven National Laboratory (BNL), Upton, NY (United States); Blake, A.; Devitt, D.; Lister, A.; Nowak, J. [Lancaster University, Lancaster (United Kingdom); Bolton, T.; Horton-Smith, G.; Meddage, V.; Rafique, A. [Kansas State University (KSU), Manhattan, KS (United States); Camilleri, L.; Caratelli, D.; Crespo-Anadon, J.I.; Fadeeva, A.A.; Genty, V.; Kaleko, D.; Seligman, W.; Shaevitz, M.H. [Columbia University, New York, NY (United States); Church, E. [Pacific Northwest National Laboratory (PNNL), Richland, WA (United States); Cianci, D.; Karagiorgi, G. [Columbia University, New York, NY (United States); The University of Manchester (United Kingdom); Cohen, E.; Piasetzky, E. [Tel Aviv University, Tel Aviv (Israel); Collin, G.H.; Conrad, J.M.; Hen, O.; Hourlier, A.; Moon, J.; Wongjirad, T.; Yates, L. [Massachusetts Institute of Technology (MIT), Cambridge, MA (United States); Convery, M.; Eberly, B.; Rochester, L.; Tsai, Y.T.; Usher, T. [SLAC National Accelerator Laboratory, Menlo Park, CA (United States); Dytman, S.; Graf, N.; Jiang, L.; Naples, D.; Paolone, V.; Wickremasinghe, D.A. [University of Pittsburgh, Pittsburgh, PA (United States); Esquivel, J.; Hamilton, P.; Pulliam, G.; Soderberg, M. [Syracuse University, Syracuse, NY (United States); Foreman, W.; Ho, J.; Schmitz, D.W.; Zennamo, J. [University of Chicago, IL (United States); Furmanski, A.P.; Garcia-Gamez, D.; Hewes, J.; Hill, C.; Murrells, R.; Porzio, D.; Soeldner-Rembold, S.; Szelc, A.M. [The University of Manchester (United Kingdom); Garvey, G.T.; Huang, E.C.; Louis, W.C.; Mills, G.B.; De Water, R.G.V. [Los Alamos National Laboratory (LANL), Los Alamos, NM (United States); Gollapinni, S. [Kansas State University (KSU), Manhattan, KS (United States); University of Tennessee, Knoxville, TN (United States); and others

    2018-01-15

    The development and operation of liquid-argon time-projection chambers for neutrino physics has created a need for new approaches to pattern recognition in order to fully exploit the imaging capabilities offered by this technology. Whereas the human brain can excel at identifying features in the recorded events, it is a significant challenge to develop an automated, algorithmic solution. The Pandora Software Development Kit provides functionality to aid the design and implementation of pattern-recognition algorithms. It promotes the use of a multi-algorithm approach to pattern recognition, in which individual algorithms each address a specific task in a particular topology. Many tens of algorithms then carefully build up a picture of the event and, together, provide a robust automated pattern-recognition solution. This paper describes details of the chain of over one hundred Pandora algorithms and tools used to reconstruct cosmic-ray muon and neutrino events in the MicroBooNE detector. Metrics that assess the current pattern-recognition performance are presented for simulated MicroBooNE events, using a selection of final-state event topologies. (orig.)

  18. The Pandora multi-algorithm approach to automated pattern recognition of cosmic-ray muon and neutrino events in the MicroBooNE detector

    Science.gov (United States)

    Acciarri, R.; Adams, C.; An, R.; Anthony, J.; Asaadi, J.; Auger, M.; Bagby, L.; Balasubramanian, S.; Baller, B.; Barnes, C.; Barr, G.; Bass, M.; Bay, F.; Bishai, M.; Blake, A.; Bolton, T.; Camilleri, L.; Caratelli, D.; Carls, B.; Castillo Fernandez, R.; Cavanna, F.; Chen, H.; Church, E.; Cianci, D.; Cohen, E.; Collin, G. H.; Conrad, J. M.; Convery, M.; Crespo-Anadón, J. I.; Del Tutto, M.; Devitt, D.; Dytman, S.; Eberly, B.; Ereditato, A.; Escudero Sanchez, L.; Esquivel, J.; Fadeeva, A. A.; Fleming, B. T.; Foreman, W.; Furmanski, A. P.; Garcia-Gamez, D.; Garvey, G. T.; Genty, V.; Goeldi, D.; Gollapinni, S.; Graf, N.; Gramellini, E.; Greenlee, H.; Grosso, R.; Guenette, R.; Hackenburg, A.; Hamilton, P.; Hen, O.; Hewes, J.; Hill, C.; Ho, J.; Horton-Smith, G.; Hourlier, A.; Huang, E.-C.; James, C.; Jan de Vries, J.; Jen, C.-M.; Jiang, L.; Johnson, R. A.; Joshi, J.; Jostlein, H.; Kaleko, D.; Karagiorgi, G.; Ketchum, W.; Kirby, B.; Kirby, M.; Kobilarcik, T.; Kreslo, I.; Laube, A.; Li, Y.; Lister, A.; Littlejohn, B. R.; Lockwitz, S.; Lorca, D.; Louis, W. C.; Luethi, M.; Lundberg, B.; Luo, X.; Marchionni, A.; Mariani, C.; Marshall, J.; Martinez Caicedo, D. A.; Meddage, V.; Miceli, T.; Mills, G. B.; Moon, J.; Mooney, M.; Moore, C. D.; Mousseau, J.; Murrells, R.; Naples, D.; Nienaber, P.; Nowak, J.; Palamara, O.; Paolone, V.; Papavassiliou, V.; Pate, S. F.; Pavlovic, Z.; Piasetzky, E.; Porzio, D.; Pulliam, G.; Qian, X.; Raaf, J. L.; Rafique, A.; Rochester, L.; Rudolf von Rohr, C.; Russell, B.; Schmitz, D. W.; Schukraft, A.; Seligman, W.; Shaevitz, M. H.; Sinclair, J.; Smith, A.; Snider, E. L.; Soderberg, M.; Söldner-Rembold, S.; Soleti, S. R.; Spentzouris, P.; Spitz, J.; St. John, J.; Strauss, T.; Szelc, A. M.; Tagg, N.; Terao, K.; Thomson, M.; Toups, M.; Tsai, Y.-T.; Tufanli, S.; Usher, T.; Van De Pontseele, W.; Van de Water, R. G.; Viren, B.; Weber, M.; Wickremasinghe, D. A.; Wolbers, S.; Wongjirad, T.; Woodruff, K.; Yang, T.; Yates, L.; Zeller, G. P.; Zennamo, J.; Zhang, C.

    2018-01-01

    The development and operation of liquid-argon time-projection chambers for neutrino physics has created a need for new approaches to pattern recognition in order to fully exploit the imaging capabilities offered by this technology. Whereas the human brain can excel at identifying features in the recorded events, it is a significant challenge to develop an automated, algorithmic solution. The Pandora Software Development Kit provides functionality to aid the design and implementation of pattern-recognition algorithms. It promotes the use of a multi-algorithm approach to pattern recognition, in which individual algorithms each address a specific task in a particular topology. Many tens of algorithms then carefully build up a picture of the event and, together, provide a robust automated pattern-recognition solution. This paper describes details of the chain of over one hundred Pandora algorithms and tools used to reconstruct cosmic-ray muon and neutrino events in the MicroBooNE detector. Metrics that assess the current pattern-recognition performance are presented for simulated MicroBooNE events, using a selection of final-state event topologies.

  19. Data mining, mining data : energy consumption modelling

    Energy Technology Data Exchange (ETDEWEB)

    Dessureault, S. [Arizona Univ., Tucson, AZ (United States)

    2007-09-15

    Most modern mining operations are accumulating large amounts of data on production and business processes. Data, however, provides value only if it can be translated into information that appropriate users can utilize. This paper emphasized that a new technological focus should emerge, notably how to concentrate data into information; analyze information sufficiently to become knowledge; and, act on that knowledge. Researchers at the Mining Information Systems and Operations Management (MISOM) laboratory at the University of Arizona have created a method to transform data into action. The data-to-action approach was exercised in the development of an energy consumption model (ECM), in partnership with a major US-based copper mining company, 2 software companies, and the MISOM laboratory. The approach begins by integrating several key data sources using data warehousing techniques, and increasing the existing level of integration and data cleaning. An online analytical processing (OLAP) cube was also created to investigate the data and identify a subset of several million records. Data mining algorithms were applied using the information that was isolated by the OLAP cube. The data mining results showed that traditional cost drivers of energy consumption are poor predictors. A comparison was made between traditional methods of predicting energy consumption and the prediction formed using data mining. Traditionally, in the mines for which data were available, monthly averages of tons and distance are used to predict diesel fuel consumption. However, this article showed that new information technology can be used to incorporate many more variables into the budgeting process, resulting in more accurate predictions. The ECM helped mine planners improve the prediction of energy use through more data integration, measure development, and workflow analysis. 5 refs., 11 figs.

  20. Data Mining Web Services for Science Data Repositories

    Science.gov (United States)

    Graves, S.; Ramachandran, R.; Keiser, K.; Maskey, M.; Lynnes, C.; Pham, L.

    2006-12-01

    The maturation of web services standards and technologies sets the stage for a distributed "Service-Oriented Architecture" (SOA) for NASA's next generation science data processing. This architecture will allow members of the scientific community to create and combine persistent distributed data processing services and make them available to other users over the Internet. NASA has initiated a project to create a suite of specialized data mining web services designed specifically for science data. The project leverages the Algorithm Development and Mining (ADaM) toolkit as its basis. The ADaM toolkit is a robust, mature and freely available science data mining toolkit that is being used by several research organizations and educational institutions worldwide. These mining services will give the scientific community a powerful and versatile data mining capability that can be used to create higher order products such as thematic maps from current and future NASA satellite data records with methods that are not currently available. The package of mining and related services are being developed using Web Services standards so that community-based measurement processing systems can access and interoperate with them. These standards-based services allow users different options for utilizing them, from direct remote invocation by a client application to deployment of a Business Process Execution Language (BPEL) solutions package where a complex data mining workflow is exposed to others as a single service. The ability to deploy and operate these services at a data archive allows the data mining algorithms to be run where the data are stored, a more efficient scenario than moving large amounts of data over the network. This will be demonstrated in a scenario in which a user uses a remote Web-Service-enabled clustering algorithm to create cloud masks from satellite imagery at the Goddard Earth Sciences Data and Information Services Center (GES DISC).

  1. Support-Less Association Rule Mining Using Tuple Count Cube

    OpenAIRE

    Qin Ding; William Perrizo

    2007-01-01

    Association rule mining is one of the important tasks in data mining and knowledge discovery (KDD). The traditional task of association rule mining is to find all the rules with high support and high confidence. In some applications, we are interested in finding high confidence rules even though the support may be low. This type of problem differs from the traditional association rule mining problem; hence, it is called support-less association rule mining. Existing algorithms for association...

  2. Socioeconomic inequality of cancer mortality in the United States: a spatial data mining approach

    Directory of Open Access Journals (Sweden)

    Lam Nina SN

    2006-02-01

    Full Text Available Abstract Background The objective of this study was to demonstrate the use of an association rule mining approach to discover associations between selected socioeconomic variables and the four most leading causes of cancer mortality in the United States. An association rule mining algorithm was applied to extract associations between the 1988–1992 cancer mortality rates for colorectal, lung, breast, and prostate cancers defined at the Health Service Area level and selected socioeconomic variables from the 1990 United States census. Geographic information system technology was used to integrate these data which were defined at different spatial resolutions, and to visualize and analyze the results from the association rule mining process. Results Health Service Areas with high rates of low education, high unemployment, and low paying jobs were found to associate with higher rates of cancer mortality. Conclusion Association rule mining with geographic information technology helps reveal the spatial patterns of socioeconomic inequality in cancer mortality in the United States and identify regions that need further attention.

  3. An Unsupervised Opinion Mining Approach for Japanese Weblog Reputation Information Using an Improved SO-PMI Algorithm

    Science.gov (United States)

    Wang, Guangwei; Araki, Kenji

    In this paper, we propose an improved SO-PMI (Semantic Orientation Using Pointwise Mutual Information) algorithm, for use in Japanese Weblog Opinion Mining. SO-PMI is an unsupervised approach proposed by Turney that has been shown to work well for English. When this algorithm was translated into Japanese naively, most phrases, whether positive or negative in meaning, received a negative SO. For dealing with this slanting phenomenon, we propose three improvements: to expand the reference words to sets of words, to introduce a balancing factor and to detect neutral expressions. In our experiments, the proposed improvements obtained a well-balanced result: both positive and negative accuracy exceeded 62%, when evaluated on 1,200 opinion sentences sampled from three different domains (reviews of Electronic Products, Cars and Travels from Kakaku. com). In a comparative experiment on the same corpus, a supervised approach (SA-Demo) achieved a very similar accuracy to our method. This shows that our proposed approach effectively adapted SO-PMI for Japanese, and it also shows the generality of SO-PMI.

  4. Indexing amyloid peptide diffraction from serial femtosecond crystallography: new algorithms for sparse patterns

    Energy Technology Data Exchange (ETDEWEB)

    Brewster, Aaron S. [Lawrence Berkeley National Laboratory, Berkeley, CA 94720 (United States); Sawaya, Michael R. [University of California, Los Angeles, CA 90095-1570 (United States); University of California, Los Angeles, CA 90095-1570 (United States); University of California, Los Angeles, CA 90095-1570 (United States); Rodriguez, Jose [University of California, Los Angeles, CA 90095-1570 (United States); University of California, Los Angeles, CA 90095-1570 (United States); Hattne, Johan; Echols, Nathaniel [Lawrence Berkeley National Laboratory, Berkeley, CA 94720 (United States); McFarlane, Heather T. [University of California, Los Angeles, CA 90095-1570 (United States); University of California, Los Angeles, CA 90095-1570 (United States); Cascio, Duilio [University of California, Los Angeles, CA 90095-1570 (United States); University of California, Los Angeles, CA 90095-1570 (United States); University of California, Los Angeles, CA 90095-1570 (United States); Adams, Paul D. [Lawrence Berkeley National Laboratory, Berkeley, CA 94720 (United States); University of California, Berkeley, CA 94720 (United States); Eisenberg, David S. [University of California, Los Angeles, CA 90095-1570 (United States); University of California, Los Angeles, CA 90095-1570 (United States); University of California, Los Angeles, CA 90095-1570 (United States); Sauter, Nicholas K., E-mail: nksauter@lbl.gov [Lawrence Berkeley National Laboratory, Berkeley, CA 94720 (United States)

    2015-02-01

    Special methods are required to interpret sparse diffraction patterns collected from peptide crystals at X-ray free-electron lasers. Bragg spots can be indexed from composite-image powder rings, with crystal orientations then deduced from a very limited number of spot positions. Still diffraction patterns from peptide nanocrystals with small unit cells are challenging to index using conventional methods owing to the limited number of spots and the lack of crystal orientation information for individual images. New indexing algorithms have been developed as part of the Computational Crystallography Toolbox (cctbx) to overcome these challenges. Accurate unit-cell information derived from an aggregate data set from thousands of diffraction patterns can be used to determine a crystal orientation matrix for individual images with as few as five reflections. These algorithms are potentially applicable not only to amyloid peptides but also to any set of diffraction patterns with sparse properties, such as low-resolution virus structures or high-throughput screening of still images captured by raster-scanning at synchrotron sources. As a proof of concept for this technique, successful integration of X-ray free-electron laser (XFEL) data to 2.5 Å resolution for the amyloid segment GNNQQNY from the Sup35 yeast prion is presented.

  5. The Pandora multi-algorithm approach to automated pattern recognition of cosmic-ray muon and neutrino events in the MicroBooNE detector

    CERN Document Server

    Acciarri, R.; An, R.; Anthony, J.; Asaadi, J.; Auger, M.; Bagby, L.; Balasubramanian, S.; Baller, B.; Barnes, C.; Barr, G.; Bass, M.; Bay, F.; Bishai, M.; Blake, A.; Bolton, T.; Camilleri, L.; Caratelli, D.; Carls, B.; Castillo Fernandez, R.; Cavanna, F.; Chen, H.; Church, E.; Cianci, D.; Cohen, E.; Collin, G. H.; Conrad, J. M.; Convery, M.; Crespo-Anadón, J. I.; Del Tutto, M.; Devitt, D.; Dytman, S.; Eberly, B.; Ereditato, A.; Escudero Sanchez, L.; Esquivel, J.; Fadeeva, A. A.; Fleming, B. T.; Foreman, W.; Furmanski, A. P.; Garcia-Gamez, D.; Garvey, G. T.; Genty, V.; Goeldi, D.; Gollapinni, S.; Graf, N.; Gramellini, E.; Greenlee, H.; Grosso, R.; Guenette, R.; Hackenburg, A.; Hamilton, P.; Hen, O.; Hewes, J.; Hill, C.; Ho, J.; Horton-Smith, G.; Hourlier, A.; Huang, E.-C.; James, C.; Jan de Vries, J.; Jen, C.-M.; Jiang, L.; Johnson, R. A.; Joshi, J.; Jostlein, H.; Kaleko, D.; Karagiorgi, G.; Ketchum, W.; Kirby, B.; Kirby, M.; Kobilarcik, T.; Kreslo, I.; Laube, A.; Li, Y.; Lister, A.; Littlejohn, B. R.; Lockwitz, S.; Lorca, D.; Louis, W. C.; Luethi, M.; Lundberg, B.; Luo, X.; Marchionni, A.; Mariani, C.; Marshall, J.; Martinez Caicedo, D. A.; Meddage, V.; Miceli, T.; Mills, G. B.; Moon, J.; Mooney, M.; Moore, C. D.; Mousseau, J.; Murrells, R.; Naples, D.; Nienaber, P.; Nowak, J.; Palamara, O.; Paolone, V.; Papavassiliou, V.; Pate, S. F.; Pavlovic, Z.; Piasetzky, E.; Porzio, D.; Pulliam, G.; Qian, X.; Raaf, J. L.; Rafique, A.; Rochester, L.; Rudolf von Rohr, C.; Russell, B.; Schmitz, D. W.; Schukraft, A.; Seligman, W.; Shaevitz, M. H.; Sinclair, J.; Smith, A.; Snider, E. L.; Soderberg, M.; Söldner-Rembold, S.; Soleti, S. R.; Spentzouris, P.; Spitz, J.; St. John, J.; Strauss, T.; Szelc, A. M.; Tagg, N.; Terao, K.; Thomson, M.; Toups, M.; Tsai, Y.-T.; Tufanli, S.; Usher, T.; Van De Pontseele, W.; Van de Water, R. G.; Viren, B.; Weber, M.; Wickremasinghe, D. A.; Wolbers, S.; Wongjirad, T.; Woodruff, K.; Yang, T.; Yates, L.; Zeller, G. P.; Zennamo, J.; Zhang, C.

    2017-01-01

    The development and operation of Liquid-Argon Time-Projection Chambers for neutrino physics has created a need for new approaches to pattern recognition in order to fully exploit the imaging capabilities offered by this technology. Whereas the human brain can excel at identifying features in the recorded events, it is a significant challenge to develop an automated, algorithmic solution. The Pandora Software Development Kit provides functionality to aid the design and implementation of pattern-recognition algorithms. It promotes the use of a multi-algorithm approach to pattern recognition, in which individual algorithms each address a specific task in a particular topology. Many tens of algorithms then carefully build up a picture of the event and, together, provide a robust automated pattern-recognition solution. This paper describes details of the chain of over one hundred Pandora algorithms and tools used to reconstruct cosmic-ray muon and neutrino events in the MicroBooNE detector. Metrics that assess the...

  6. Next Place Prediction Based on Spatiotemporal Pattern Mining of Mobile Device Logs

    Directory of Open Access Journals (Sweden)

    Sungjun Lee

    2016-01-01

    Full Text Available Due to the recent explosive growth of location-aware services based on mobile devices, predicting the next places of a user is of increasing importance to enable proactive information services. In this paper, we introduce a data-driven framework that aims to predict the user’s next places using his/her past visiting patterns analyzed from mobile device logs. Specifically, the notion of the spatiotemporal-periodic (STP pattern is proposed to capture the visits with spatiotemporal periodicity by focusing on a detail level of location for each individual. Subsequently, we present algorithms that extract the STP patterns from a user’s past visiting behaviors and predict the next places based on the patterns. The experiment results obtained by using a real-world dataset show that the proposed methods are more effective in predicting the user’s next places than the previous approaches considered in most cases.

  7. A Comparison of the Effects of K-Anonymity on Machine Learning Algorithms

    OpenAIRE

    Hayden Wimmer; Loreen Powell

    2014-01-01

    While research has been conducted in machine learning algorithms and in privacy preserving in data mining (PPDM), a gap in the literature exists which combines the aforementioned areas to determine how PPDM affects common machine learning algorithms. The aim of this research is to narrow this literature gap by investigating how a common PPDM algorithm, K-Anonymity, affects common machine learning and data mining algorithms, namely neural networks, logistic regression, decision trees, and Baye...

  8. Multiagent data warehousing and multiagent data mining for cerebrum/cerebellum modeling

    Science.gov (United States)

    Zhang, Wen-Ran

    2002-03-01

    An algorithm named Neighbor-Miner is outlined for multiagent data warehousing and multiagent data mining. The algorithm is defined in an evolving dynamic environment with autonomous or semiautonomous agents. Instead of mining frequent itemsets from customer transactions, the new algorithm discovers new agents and mining agent associations in first-order logic from agent attributes and actions. While the Apriori algorithm uses frequency as a priory threshold, the new algorithm uses agent similarity as priory knowledge. The concept of agent similarity leads to the notions of agent cuboid, orthogonal multiagent data warehousing (MADWH), and multiagent data mining (MADM). Based on agent similarities and action similarities, Neighbor-Miner is proposed and illustrated in a MADWH/MADM approach to cerebrum/cerebellum modeling. It is shown that (1) semiautonomous neurofuzzy agents can be identified for uniped locomotion and gymnastic training based on attribute relevance analysis; (2) new agents can be discovered and agent cuboids can be dynamically constructed in an orthogonal MADWH, which resembles an evolving cerebrum/cerebellum system; and (3) dynamic motion laws can be discovered as association rules in first order logic. Although examples in legged robot gymnastics are used to illustrate the basic ideas, the new approach is generally suitable for a broad category of data mining tasks where knowledge can be discovered collectively by a set of agents from a geographically or geometrically distributed but relevant environment, especially in scientific and engineering data environments.

  9. A Framework To Support Management Of HIVAIDS Using K-Means And Random Forest Algorithm

    Directory of Open Access Journals (Sweden)

    Gladys Iseu

    2017-06-01

    Full Text Available Healthcare industry generates large amounts of complex data about patients hospital resources disease management electronic patient records and medical devices among others. The availability of these huge amounts of medical data creates a need for powerful mining tools to support health care professionals in diagnosis treatment and management of HIVAIDS. Several data mining techniques have been used in management of different data sets. Data mining techniques have been categorized into regression algorithms segmentation algorithms association algorithms sequence analysis algorithms and classification algorithms. In the medical field there has not been a specific study that has incorporated two or more data mining algorithms hence limiting decision making levels by medical practitioners. This study identified the extent to which K-means algorithm cluster patient characteristics it has also evaluated the extent to which random forest algorithm can classify the data for informed decision making as well as design a framework to support medical decision making in the treatment of HIVAIDS related diseases in Kenya. The paper further used random forest classification algorithm to compute proximities between pairs of cases that can be used in clustering locating outliers or by scaling to give interesting views of the data.

  10. Application of Three Existing Stope Boundary Optimisation Methods in an Operating Underground Mine

    Science.gov (United States)

    Erdogan, Gamze; Yavuz, Mahmut

    2017-12-01

    The underground mine planning and design optimisation process have received little attention because of complexity and variability of problems in underground mines. Although a number of optimisation studies and software tools are available and some of them, in special, have been implemented effectively to determine the ultimate-pit limits in an open pit mine, there is still a lack of studies for optimisation of ultimate stope boundaries in underground mines. The proposed approaches for this purpose aim at maximizing the economic profit by selecting the best possible layout under operational, technical and physical constraints. In this paper, the existing three heuristic techniques including Floating Stope Algorithm, Maximum Value Algorithm and Mineable Shape Optimiser (MSO) are examined for optimisation of stope layout in a case study. Each technique is assessed in terms of applicability, algorithm capabilities and limitations considering the underground mine planning challenges. Finally, the results are evaluated and compared.

  11. DTFP-Growth: Dynamic Threshold-Based FP-Growth Rule Mining Algorithm Through Integrating Gene Expression, Methylation, and Protein-Protein Interaction Profiles.

    Science.gov (United States)

    Mallik, Saurav; Bhadra, Tapas; Mukherji, Ayan; Mallik, Saurav; Bhadra, Tapas; Mukherji, Ayan; Mallik, Saurav; Bhadra, Tapas; Mukherji, Ayan

    2018-04-01

    Association rule mining is an important technique for identifying interesting relationships between gene pairs in a biological data set. Earlier methods basically work for a single biological data set, and, in maximum cases, a single minimum support cutoff can be applied globally, i.e., across all genesets/itemsets. To overcome this limitation, in this paper, we propose dynamic threshold-based FP-growth rule mining algorithm that integrates gene expression, methylation and protein-protein interaction profiles based on weighted shortest distance to find the novel associations among different pairs of genes in multi-view data sets. For this purpose, we introduce three new thresholds, namely, Distance-based Variable/Dynamic Supports (DVS), Distance-based Variable Confidences (DVC), and Distance-based Variable Lifts (DVL) for each rule by integrating co-expression, co-methylation, and protein-protein interactions existed in the multi-omics data set. We develop the proposed algorithm utilizing these three novel multiple threshold measures. In the proposed algorithm, the values of , , and are computed for each rule separately, and subsequently it is verified whether the support, confidence, and lift of each evolved rule are greater than or equal to the corresponding individual , , and values, respectively, or not. If all these three conditions for a rule are found to be true, the rule is treated as a resultant rule. One of the major advantages of the proposed method compared with other related state-of-the-art methods is that it considers both the quantitative and interactive significance among all pairwise genes belonging to each rule. Moreover, the proposed method generates fewer rules, takes less running time, and provides greater biological significance for the resultant top-ranking rules compared to previous methods.

  12. A Parallel Approach for Frequent Subgraph Mining in a Single Large Graph Using Spark

    Directory of Open Access Journals (Sweden)

    Fengcai Qiao

    2018-02-01

    Full Text Available Frequent subgraph mining (FSM plays an important role in graph mining, attracting a great deal of attention in many areas, such as bioinformatics, web data mining and social networks. In this paper, we propose SSiGraM (Spark based Single Graph Mining, a Spark based parallel frequent subgraph mining algorithm in a single large graph. Aiming to approach the two computational challenges of FSM, we conduct the subgraph extension and support evaluation parallel across all the distributed cluster worker nodes. In addition, we also employ a heuristic search strategy and three novel optimizations: load balancing, pre-search pruning and top-down pruning in the support evaluation process, which significantly improve the performance. Extensive experiments with four different real-world datasets demonstrate that the proposed algorithm outperforms the existing GraMi (Graph Mining algorithm by an order of magnitude for all datasets and can work with a lower support threshold.

  13. Data Mining in Education : A Review on the Knowledge Discovery Perspective

    OpenAIRE

    Pratiyush Guleria; Manu Sood

    2014-01-01

    Knowledge Discovery in Databases is the process of finding knowledge in massive amount of data where data mining is the core of this process. Data minin g can be used to mine understandable meaningful patterns from large databases and these patterns ma y then be converted into knowledge.Data mining is t he process of extracting the information and patterns derived by the KDD process which helps in crucial decision-making.Data mining works with data warehou se and...

  14. Expressive power of an algebra for data mining

    NARCIS (Netherlands)

    Calders, T.; Lakshmanan, L.V.S.; Ng, R.T.; Paredaens, J.

    2006-01-01

    The relational data model has simple and clear foundations on which significant theoretical and systems research has flourished. By contrast, most research on data mining has focused on algorithmic issues. A major open question is: what's an appropriate foundation for data mining, which can

  15. [Vegetation spatial and temporal dynamic characteristics based on NDVI time series trajectories in grassland opencast coal mining].

    Science.gov (United States)

    Jia, Duo; Wang, Cang Jiao; Mu, Shou Guo; Zhao, Hua

    2017-06-18

    The spatiotemporal dynamic patterns of vegetation in mining area are still unclear. This study utilized time series trajectory segmentation algorithm to fit Landsat NDVI time series which generated from fusion images at the most prosperous period of growth based on ESTARFM algorithm. Combining with the shape features of the fitted trajectory, this paper extracted five vegetation dynamic patterns including pre-disturbance type, continuous disturbance type, stabilization after disturbance type, stabilization between disturbance and recovery type, and recovery after disturbance type. The result indicated that recovery after disturbance type was the dominant vegetation change pattern among the five types of vegetation dynamic pattern, which accounted for 55.2% of the total number of pixels. The follows were stabilization after disturbance type and continuous disturbance type, accounting for 25.6% and 11.0%, respectively. The pre-disturbance type and stabilization between disturbance and recovery type accounted for 3.5% and 4.7%, respectively. Vegetation disturbance mainly occurred from 2004 to 2009 in Shengli mining area. The onset time of stable state was 2008 and the spatial locations mainlydistributed in open-pit stope and waste dump. The reco-very state mainly started since the year of 2008 and 2010, while the areas were small and mainly distributed at the periphery of open-pit stope and waste dump. Duration of disturbance was mainly 1 year. The duration of stable period usually sustained 7 years. The duration of recovery state of the type of stabilization between disturbances continued 2 to 5 years, while the type of recovery after disturbance often sustained 8 years.

  16. Discriminative Chemical Patterns: Automatic and Interactive Design.

    Science.gov (United States)

    Bietz, Stefan; Schomburg, Karen T; Hilbig, Matthias; Rarey, Matthias

    2015-08-24

    The classification of molecules with respect to their inhibiting, activating, or toxicological potential constitutes a central aspect in the field of cheminformatics. Often, a discriminative feature is needed to distinguish two different molecule sets. Besides physicochemical properties, substructures and chemical patterns belong to the descriptors most frequently applied for this purpose. As a commonly used example of this descriptor class, SMARTS strings represent a powerful concept for the representation and processing of abstract chemical patterns. While their usage facilitates a convenient way to apply previously derived classification rules on new molecule sets, the manual generation of useful SMARTS patterns remains a complex and time-consuming process. Here, we introduce SMARTSminer, a new algorithm for the automatic derivation of discriminative SMARTS patterns from preclassified molecule sets. Based on a specially adapted subgraph mining algorithm, SMARTSminer identifies structural features that are frequent in only one of the given molecule classes. In comparison to elemental substructures, it also supports the consideration of general and specific SMARTS features. Furthermore, SMARTSminer is integrated into an interactive pattern editor named SMARTSeditor. This allows for an intuitive visualization on the basis of the SMARTSviewer concept as well as interactive adaption and further improvement of the generated patterns. Additionally, a new molecular matching feature provides an immediate feedback on a pattern's matching behavior across the molecule sets. We demonstrate the utility of the SMARTSminer functionality and its integration into the SMARTSeditor software in several different classification scenarios.

  17. Data Mining for Understanding and Impriving Decision-Making Affecting Ground Delay Programs

    Science.gov (United States)

    Kulkarni, Deepak; Wang, Yao Xun; Sridhar, Banavar

    2013-01-01

    The continuous growth in the demand for air transportation results in an imbalance between airspace capacity and traffic demand. The airspace capacity of a region depends on the ability of the system to maintain safe separation between aircraft in the region. In addition to growing demand, the airspace capacity is severely limited by convective weather. During such conditions, traffic managers at the FAA's Air Traffic Control System Command Center (ATCSCC) and dispatchers at various Airlines' Operations Center (AOC) collaborate to mitigate the demand-capacity imbalance caused by weather. The end result is the implementation of a set of Traffic Flow Management (TFM) initiatives such as ground delay programs, reroute advisories, flow metering, and ground stops. Data Mining is the automated process of analyzing large sets of data and then extracting patterns in the data. Data mining tools are capable of predicting behaviors and future trends, allowing an organization to benefit from past experience in making knowledge-driven decisions. The work reported in this paper is focused on ground delay programs. Data mining algorithms have the potential to develop associations between weather patterns and the corresponding ground delay program responses. If successful, they can be used to improve and standardize TFM decision resulting in better predictability of traffic flows on days with reliable weather forecasts. The approach here seeks to develop a set of data mining and machine learning models and apply them to historical archives of weather observations and forecasts and TFM initiatives to determine the extent to which the theory can predict and explain the observed traffic flow behaviors.

  18. A STUDY OF TEXT MINING METHODS, APPLICATIONS,AND TECHNIQUES

    OpenAIRE

    R. Rajamani*1 & S. Saranya2

    2017-01-01

    Data mining is used to extract useful information from the large amount of data. It is used to implement and solve different types of research problems. The research related areas in data mining are text mining, web mining, image mining, sequential pattern mining, spatial mining, medical mining, multimedia mining, structure mining and graph mining. Text mining also referred to text of data mining, it is also called knowledge discovery in text (KDT) or knowledge of intelligent text analysis. T...

  19. Advanced land mine detection using a synthesis of conventional technologies

    International Nuclear Information System (INIS)

    Rappaport, C.M.

    1998-01-01

    A team at Northeastern University develops and optimizes land mine detection based on ground-penetrating radar, infrared thermography, electromagnetic induction (EM), and high frequency acoustic sensors. It implements sophisticated, physics-based mathematical models to describe the interaction of EM or acoustic waves with mines buried in realistic (electromagnetically loose, inhomogeneous) soil and as a result develops signal processing algorithms to identify and classify mines. These mathematical models are derived from actual soil and land mine measurements, and include detection statistics of the sensors. The novel aspects of Northeastern University's approach are: (1) to combine multiple sensors synergistically, yielding more information than would be available to any single sencor technology operating alone, and (2) to use signal-processing algorithms derived from physics-based models which take into account the actual sensor parameters as well as material and electrical characteristics of the soil and land mines

  20. Cooperative organic mine avoidance path planning

    Science.gov (United States)

    McCubbin, Christopher B.; Piatko, Christine D.; Peterson, Adam V.; Donnald, Creighton R.; Cohen, David

    2005-06-01

    The JHU/APL Path Planning team has developed path planning techniques to look for paths that balance the utility and risk associated with different routes through a minefield. Extending on previous years' efforts, we investigated real-world Naval mine avoidance requirements and developed a tactical decision aid (TDA) that satisfies those requirements. APL has developed new mine path planning techniques using graph based and genetic algorithms which quickly produce near-minimum risk paths for complicated fitness functions incorporating risk, path length, ship kinematics, and naval doctrine. The TDA user interface, a Java Swing application that obtains data via Corba interfaces to path planning databases, allows the operator to explore a fusion of historic and in situ mine field data, control the path planner, and display the planning results. To provide a context for the minefield data, the user interface also renders data from the Digital Nautical Chart database, a database created by the National Geospatial-Intelligence Agency containing charts of the world's ports and coastal regions. This TDA has been developed in conjunction with the COMID (Cooperative Organic Mine Defense) system. This paper presents a description of the algorithms, architecture, and application produced.

  1. Transparent data mining for big and small data

    CERN Document Server

    Quercia, Daniele; Pasquale, Frank

    2017-01-01

    This book focuses on new and emerging data mining solutions that offer a greater level of transparency than existing solutions. Transparent data mining solutions with desirable properties (e.g. effective, fully automatic, scalable) are covered in the book. Experimental findings of transparent solutions are tailored to different domain experts, and experimental metrics for evaluating algorithmic transparency are presented. The book also discusses societal effects of black box vs. transparent approaches to data mining, as well as real-world use cases for these approaches. As algorithms increasingly support different aspects of modern life, a greater level of transparency is sorely needed, not least because discrimination and biases have to be avoided. With contributions from domain experts, this book provides an overview of an emerging area of data mining that has profound societal consequences, and provides the technical background to for readers to contribute to the field or to put existing approaches to prac...

  2. WEKA-G: Parallel data mining on computational grids

    Directory of Open Access Journals (Sweden)

    PIMENTA, A.

    2009-12-01

    Full Text Available Data mining is a technology that can extract useful information from large amounts of data. However, mining a database often requires a high computational power. To resolve this problem, this paper presents a tool (Weka-G, which runs in parallel algorithms used in the mining process data. As the environment for doing so, we use a computational grid by adding several features within a WAN.

  3. An improved genetic algorithm for designing optimal temporal patterns of neural stimulation

    Science.gov (United States)

    Cassar, Isaac R.; Titus, Nathan D.; Grill, Warren M.

    2017-12-01

    Objective. Electrical neuromodulation therapies typically apply constant frequency stimulation, but non-regular temporal patterns of stimulation may be more effective and more efficient. However, the design space for temporal patterns is exceedingly large, and model-based optimization is required for pattern design. We designed and implemented a modified genetic algorithm (GA) intended for design optimal temporal patterns of electrical neuromodulation. Approach. We tested and modified standard GA methods for application to designing temporal patterns of neural stimulation. We evaluated each modification individually and all modifications collectively by comparing performance to the standard GA across three test functions and two biophysically-based models of neural stimulation. Main results. The proposed modifications of the GA significantly improved performance across the test functions and performed best when all were used collectively. The standard GA found patterns that outperformed fixed-frequency, clinically-standard patterns in biophysically-based models of neural stimulation, but the modified GA, in many fewer iterations, consistently converged to higher-scoring, non-regular patterns of stimulation. Significance. The proposed improvements to standard GA methodology reduced the number of iterations required for convergence and identified superior solutions.

  4. Uncertainty modeling for data mining a label semantics approach

    CERN Document Server

    Qin, Zengchang

    2014-01-01

    Outlining a new research direction in fuzzy set theory applied to data mining, this volume proposes a number of new data mining algorithms and includes dozens of figures and illustrations that help the reader grasp the complexities of the concepts.

  5. Algorithm for real-time detection of signal patterns using phase synchrony: an application to an electrode array

    Science.gov (United States)

    Sadeghi, Saman; MacKay, William A.; van Dam, R. Michael; Thompson, Michael

    2011-02-01

    Real-time analysis of multi-channel spatio-temporal sensor data presents a considerable technical challenge for a number of applications. For example, in brain-computer interfaces, signal patterns originating on a time-dependent basis from an array of electrodes on the scalp (i.e. electroencephalography) must be analyzed in real time to recognize mental states and translate these to commands which control operations in a machine. In this paper we describe a new technique for recognition of spatio-temporal patterns based on performing online discrimination of time-resolved events through the use of correlation of phase dynamics between various channels in a multi-channel system. The algorithm extracts unique sensor signature patterns associated with each event during a training period and ranks importance of sensor pairs in order to distinguish between time-resolved stimuli to which the system may be exposed during real-time operation. We apply the algorithm to electroencephalographic signals obtained from subjects tested in the neurophysiology laboratories at the University of Toronto. The extension of this algorithm for rapid detection of patterns in other sensing applications, including chemical identification via chemical or bio-chemical sensor arrays, is also discussed.

  6. Algorithmic Information Dynamics of Persistent Patterns and Colliding Particles in the Game of Life

    KAUST Repository

    Zenil, Hector

    2018-02-18

    We demonstrate the way to apply and exploit the concept of \\\\textit{algorithmic information dynamics} in the characterization and classification of dynamic and persistent patterns, motifs and colliding particles in, without loss of generalization, Conway\\'s Game of Life (GoL) cellular automaton as a case study. We analyze the distribution of prevailing motifs that occur in GoL from the perspective of algorithmic probability. We demonstrate how the tools introduced are an alternative to computable measures such as entropy and compression algorithms which are often nonsensitive to small changes and features of non-statistical nature in the study of evolving complex systems and their emergent structures.

  7. Vegetation pattern and heavy metal accumulation at a mine tailing at Gyöngyösoroszi, hungary.

    Science.gov (United States)

    Tamás, János; Kovács, Alza

    2005-01-01

    Vegetation at an abandoned heavy metal bearing mine tailing may have multifunctional roles such as modification of water balance, erosion control and landscape rehabilitation. Research on the vegetation of mine tailings can provide useful information on tolerance, accumulation and translocation properties of species potentially applicable at moderately contaminated sites. Analyses of the relationship between heavy metal content (Pb, Zn and Cu) and vegetation in a mine tailing were carried out. These analyses included: (1) spatial analysis of relationship among heavy metal distribution, pH and vegetation patterns, and (2) analysis of heavy metal accumulation and translocation in some plant species. Presence of vegetation was found to be significantly dependent on pH value, which confirms that phytotoxicity is a function of element concentration in solution, which is primarily controlled by pH value in mine tailings. Among the most abundant plant species, dewberry (Rubus caesius), vipersbugloss (Echium vulgare), scarlet pimpernel (Anagallis arvensis) and narrowleaf plantain (Plantago lanceolata) accumulate significant amounts of Pb, Cu and Zn, while in the case of annual bluegrass (Poa annua) only Pb can be measured in elevated contents. Considering the translocation features, scarlet pimpernel, narrowleaf plantain, and dewberry accumulate heavy metals primarily in their roots, while heavy metal concentration in vipersbugloss and annual bluegrass is higher in the shoots.

  8. Creating ensembles of oblique decision trees with evolutionary algorithms and sampling

    Science.gov (United States)

    Cantu-Paz, Erick [Oakland, CA; Kamath, Chandrika [Tracy, CA

    2006-06-13

    A decision tree system that is part of a parallel object-oriented pattern recognition system, which in turn is part of an object oriented data mining system. A decision tree process includes the step of reading the data. If necessary, the data is sorted. A potential split of the data is evaluated according to some criterion. An initial split of the data is determined. The final split of the data is determined using evolutionary algorithms and statistical sampling techniques. The data is split. Multiple decision trees are combined in ensembles.

  9. Improving clinical decision support using data mining techniques

    Science.gov (United States)

    Burn-Thornton, Kath E.; Thorpe, Simon I.

    1999-02-01

    Physicians, in their ever-demanding jobs, are looking to decision support systems for aid in clinical diagnosis. However, clinical decision support systems need to be of sufficiently high accuracy that they help, rather than hinder, the physician in his/her diagnosis. Decision support systems with accuracies, of patient state determination, of greater than 80 percent, are generally perceived to be sufficiently accurate to fulfill the role of helping the physician. We have previously shown that data mining techniques have the potential to provide the underpinning technology for clinical decision support systems. In this paper, an extension of the work in reverence 2, we describe how changes in data mining methodologies, for the analysis of 12-lead ECG data, improve the accuracy by which data mining algorithms determine which patients are suffering from heart disease. We show that the accuracy of patient state prediction, for all the algorithms, which we investigated, can be increased by up to 6 percent, using the combination of appropriate test training ratios and 5-fold cross-validation. The use of cross-validation greater than 5-fold, appears to reduce the improvement in algorithm classification accuracy gained by the use of this validation method. The accuracy of 84 percent in patient state predictions, obtained using the algorithm OCI, suggests that this algorithm will be capable of providing the required accuracy for clinical decision support systems.

  10. Percolator: Scalable Pattern Discovery in Dynamic Graphs

    Energy Technology Data Exchange (ETDEWEB)

    Choudhury, Sutanay; Purohit, Sumit; Lin, Peng; Wu, Yinghui; Holder, Lawrence B.; Agarwal, Khushbu

    2018-02-06

    We demonstrate Percolator, a distributed system for graph pattern discovery in dynamic graphs. In contrast to conventional mining systems, Percolator advocates efficient pattern mining schemes that (1) support pattern detection with keywords; (2) integrate incremental and parallel pattern mining; and (3) support analytical queries such as trend analysis. The core idea of Percolator is to dynamically decide and verify a small fraction of patterns and their in- stances that must be inspected in response to buffered updates in dynamic graphs, with a total mining cost independent of graph size. We demonstrate a) the feasibility of incremental pattern mining by walking through each component of Percolator, b) the efficiency and scalability of Percolator over the sheer size of real-world dynamic graphs, and c) how the user-friendly GUI of Percolator inter- acts with users to support keyword-based queries that detect, browse and inspect trending patterns. We also demonstrate two user cases of Percolator, in social media trend analysis and academic collaboration analysis, respectively.

  11. Web multimedia information retrieval using improved Bayesian algorithm.

    Science.gov (United States)

    Yu, Yi-Jun; Chen, Chun; Yu, Yi-Min; Lin, Huai-Zhong

    2003-01-01

    The main thrust of this paper is application of a novel data mining approach on the log of user's feedback to improve web multimedia information retrieval performance. A user space model was constructed based on data mining, and then integrated into the original information space model to improve the accuracy of the new information space model. It can remove clutter and irrelevant text information and help to eliminate mismatch between the page author's expression and the user's understanding and expectation. User space model was also utilized to discover the relationship between high-level and low-level features for assigning weight. The authors proposed improved Bayesian algorithm for data mining. Experiment proved that the authors' proposed algorithm was efficient.

  12. Data Mining Practical Machine Learning Tools and Techniques

    CERN Document Server

    Witten, Ian H; Hall, Mark A

    2011-01-01

    Data Mining: Practical Machine Learning Tools and Techniques offers a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations. This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining. Thorough updates reflect the technical changes and modernizations that have taken place

  13. [Exploring the clinical characters of Shugan Jieyu capsule through text mining].

    Science.gov (United States)

    Pu, Zheng-Ping; Xia, Jiang-Ming; Xie, Wei; He, Jin-Cai

    2017-09-01

    The study was main to explore the clinical characters of Shugan Jieyu capsule through text mining. The data sets of Shugan Jieyu capsule were downloaded from CMCC database by the method of literature retrieved from May 2009 to Jan 2016. Rules of Chinese medical patterns, diseases, symptoms and combination treatment were mined out by data slicing algorithm, and they were demonstrated in frequency tables and two dimension based network. Then totally 190 literature were recruited. The outcomess suggested that SC was most frequently correlated with liver Qi stagnation. Primary depression, depression due to brain disease, concomitant depression followed by physical diseases, concomitant depression followed by schizophrenia and functional dyspepsia were main diseases treated by Shugan Jieyu capsule. Symptoms like low mood, psychic anxiety, somatic anxiety and dysfunction of automatic nerve were mainy relieved bv Shugan Jieyu capsule.For combination treatment. Shugan Jieyu capsule was most commonly used with paroxetine, sertraline and fluoxetine. The research suggested that syndrome types and mining results of Shugan Jieyu capsule were almost the same as its instructions. Syndrome of malnutrition of heart spirit was the potential Chinese medical pattern of Shugan Jieyu capsule. Primary comorbid anxiety and depression, concomitant comorbid anxiety and depression followed by physical diseases, and postpartum depression were potential diseases treated by Shugan Jieyu capsule.For combination treatment, Shugan Jieyu capsule was most commonly used with paroxetine, sertraline and fluoxetine. Copyright© by the Chinese Pharmaceutical Association.

  14. Data mining in pharma sector: benefits.

    Science.gov (United States)

    Ranjan, Jayanthi

    2009-01-01

    The amount of data getting generated in any sector at present is enormous. The information flow in the pharma industry is huge. Pharma firms are progressing into increased technology-enabled products and services. Data mining, which is knowledge discovery from large sets of data, helps pharma firms to discover patterns in improving the quality of drug discovery and delivery methods. The paper aims to present how data mining is useful in the pharma industry, how its techniques can yield good results in pharma sector, and to show how data mining can really enhance in making decisions using pharmaceutical data. This conceptual paper is written based on secondary study, research and observations from magazines, reports and notes. The author has listed the types of patterns that can be discovered using data mining in pharma data. The paper shows how data mining is useful in the pharma industry and how its techniques can yield good results in pharma sector. Although much work can be produced for discovering knowledge in pharma data using data mining, the paper is limited to conceptualizing the ideas and view points at this stage; future work may include applying data mining techniques to pharma data based on primary research using the available, famous significant data mining tools. Research papers and conceptual papers related to data mining in Pharma industry are rare; this is the motivation for the paper.

  15. Vlsi implementation of flexible architecture for decision tree classification in data mining

    Science.gov (United States)

    Sharma, K. Venkatesh; Shewandagn, Behailu; Bhukya, Shankar Nayak

    2017-07-01

    The Data mining algorithms have become vital to researchers in science, engineering, medicine, business, search and security domains. In recent years, there has been a terrific raise in the size of the data being collected and analyzed. Classification is the main difficulty faced in data mining. In a number of the solutions developed for this problem, most accepted one is Decision Tree Classification (DTC) that gives high precision while handling very large amount of data. This paper presents VLSI implementation of flexible architecture for Decision Tree classification in data mining using c4.5 algorithm.

  16. Process mining online assessment data

    NARCIS (Netherlands)

    Pechenizkiy, M.; Trcka, N.; Vasilyeva, E.; Aalst, van der W.M.P.; De Bra, P.M.E.; Barnes, T.; Desmarais, M.; Romero, C.; Ventura, S.

    2009-01-01

    Traditional data mining techniques have been extensively applied to find interesting patterns, build descriptive and predictive models from large volumes of data accumulated through the use of different information systems. The results of data mining can be used for getting a better understanding of

  17. Discovering More Accurate Frequent Web Usage Patterns

    OpenAIRE

    Bayir, Murat Ali; Toroslu, Ismail Hakki; Cosar, Ahmet; Fidan, Guven

    2008-01-01

    Web usage mining is a type of web mining, which exploits data mining techniques to discover valuable information from navigation behavior of World Wide Web users. As in classical data mining, data preparation and pattern discovery are the main issues in web usage mining. The first phase of web usage mining is the data processing phase, which includes the session reconstruction operation from server logs. Session reconstruction success directly affects the quality of the frequent patterns disc...

  18. Data Mining and Statistics for Decision Making

    CERN Document Server

    Tufféry, Stéphane

    2011-01-01

    Data mining is the process of automatically searching large volumes of data for models and patterns using computational techniques from statistics, machine learning and information theory; it is the ideal tool for such an extraction of knowledge. Data mining is usually associated with a business or an organization's need to identify trends and profiles, allowing, for example, retailers to discover patterns on which to base marketing objectives. This book looks at both classical and recent techniques of data mining, such as clustering, discriminant analysis, logistic regression, generalized lin

  19. Depth data research of GIS based on clustering analysis algorithm

    Science.gov (United States)

    Xiong, Yan; Xu, Wenli

    2018-03-01

    The data of GIS have spatial distribution. Geographic data has both spatial characteristics and attribute characteristics, and also changes with time. Therefore, the amount of data is very large. Nowadays, many industries and departments in the society are using GIS. However, without proper data analysis and mining scheme, GIS will not exert its maximum effectiveness and will waste a lot of data. In this paper, we use the geographic information demand of a national security department as the experimental object, combining the characteristics of GIS data, taking into account the characteristics of time, space, attributes and so on, and using cluster analysis algorithm. We further study the mining scheme for depth data, and get the algorithm model. This algorithm can automatically classify sample data, and then carry out exploratory analysis. The research shows that the algorithm model and the information mining scheme can quickly find hidden depth information from the surface data of GIS, thus improving the efficiency of the security department. This algorithm can also be extended to other fields.

  20. Modeling N Cycling during Succession after Forest Disturbance: an Analysis of N Mining and Retention Hypothesis

    Science.gov (United States)

    Zhou, Z.; Ollinger, S. V.; Ouimette, A.; Lovett, G. M.; Fuss, C. B.; Goodale, C. L.

    2017-12-01

    Dissolved inorganic nitrogen losses at the Hubbard Brook Experimental Forest (HBEF), New Hampshire, USA, have declined in recent decades, a pattern that counters expectations based on prevailing theory. An unbalanced ecosystem nitrogen (N) budget implies there is a missing component for N sink. Hypotheses to explain this discrepancy include increasing rates of denitrification and accumulation of N in mineral soil pools following N mining by plants. Here, we conducted a modeling analysis fused with field measurements of N cycling, specifically examining the hypothesis relevant to N mining and retention in mineral soils. We included simplified representations of both mechanisms, N mining and retention, in a revised ecosystem process model, PnET-SOM, to evaluate the dynamics of N cycling during succession after forest disturbance at the HBEF. The predicted N mining during the early succession was regulated by a metric representing a potential demand of extra soil N for large wood growth. The accumulation of nitrate in mineral soil pools was a function of the net aboveground biomass accumulation and soil N availability and parameterized based on field 15N tracer incubation data. The predicted patterns of forest N dynamics were consistent with observations. The addition of the new algorithms also improved the predicted DIN export in stream water with an R squared of 0.35 (Ppay back the mined N in mineral soils. Predicted ecosystem N balance showed that N gas loss could account for 14-46% of the total N deposition, the soil mining about 103% during the early succession, and soil retention about 35% at the current forest stage at the HBEF.

  1. Object-oriented spatial-temporal association rules mining on ocean remote sensing imagery

    International Nuclear Information System (INIS)

    Xue, C J; Dong, Q; Ma, W X

    2014-01-01

    Using the long term marine remote sensing imagery, we develop an object-oriented spatial-temporal association rules mining framework to explore the association rules mining among marine environmental elements. Within the framework, two key issues are addressed. They are how to effectively deal with the related lattices and how to reduce the related dimensions? To deal with the first key issues, this paper develops an object-oriented method for abstracting marine sensitive objects from raster pixels and for representing them with a quadruple. To deal with the second key issues, by embedding the mutual information theory, we construct the direct association pattern tree to reduce the related elements at the first step, and then the Apriori algorithm is used to discover the spatio-temporal associated rules. Finally, Pacific Ocean is taken as a research area and multi- marine remote sensing imagery in recent three decades is used as a case study. The results show that the object-oriented spatio-temporal association rules mining can acquire the associated relationships not only among marine environmental elements in same region, also among the different regions. In addition, the information from association rules mining is much more expressive and informative in space and time than traditional spatio-temporal analysis

  2. Extracting software static defect models using data mining

    Directory of Open Access Journals (Sweden)

    Ahmed H. Yousef

    2015-03-01

    Full Text Available Large software projects are subject to quality risks of having defective modules that will cause failures during the software execution. Several software repositories contain source code of large projects that are composed of many modules. These software repositories include data for the software metrics of these modules and the defective state of each module. In this paper, a data mining approach is used to show the attributes that predict the defective state of software modules. Software solution architecture is proposed to convert the extracted knowledge into data mining models that can be integrated with the current software project metrics and bugs data in order to enhance the prediction. The results show better prediction capabilities when all the algorithms are combined using weighted votes. When only one individual algorithm is used, Naïve Bayes algorithm has the best results, then the Neural Network and the Decision Trees algorithms.

  3. Data mining for dummies

    CERN Document Server

    Brown, Meta S

    2014-01-01

    Delve into your data for the key to success Data mining is quickly becoming integral to creating value and business momentum. The ability to detect unseen patterns hidden in the numbers exhaustively generated by day-to-day operations allows savvy decision-makers to exploit every tool at their disposal in the pursuit of better business. By creating models and testing whether patterns hold up, it is possible to discover new intelligence that could change your business''s entire paradigm for a more successful outcome. Data Mining for Dummies shows you why it doesn''t take a data scientist to gain

  4. Searching for full power control rod patterns in a boiling water reactor using genetic algorithms

    Energy Technology Data Exchange (ETDEWEB)

    Montes, Jose Luis [Departamento Sistemas Nucleares, ININ, Carr. Mexico-Toluca Km. 36.5, Ocoyoacac, Edo. de Mexico (Mexico)]. E-mail: jlmt@nuclear.inin.mx; Ortiz, Juan Jose [Departamento Sistemas Nucleares, ININ, Carr. Mexico-Toluca Km. 36.5, Ocoyoacac, Edo. de Mexico (Mexico)]. E-mail: jjortiz@nuclear.inin.mx; Requena, Ignacio [Departamento Ciencias Computacion e I.A. ETSII, Informatica, Universidad de Granada, C. Daniel Saucedo Aranda s/n. 18071 Granada (Spain)]. E-mail: requena@decsai.ugr.es; Perusquia, Raul [Departamento Sistemas Nucleares, ININ, Carr. Mexico-Toluca Km. 36.5, Ocoyoacac, Edo. de Mexico (Mexico)]. E-mail: rpc@nuclear.inin.mx

    2004-11-01

    One of the most important questions related to both safety and economic aspects in a nuclear power reactor operation, is without any doubt its reactivity control. During normal operation of a boiling water reactor, the reactivity control of its core is strongly determined by control rods patterns efficiency. In this paper, GACRP system is proposed based on the concepts of genetic algorithms for full power control rod patterns search. This system was carried out using LVNPP transition cycle characteristics, being applied too to an equilibrium cycle. Several operation scenarios, including core water flow variation throughout the cycle and different target axial power distributions, are considered. Genetic algorithm fitness function includes reactor security parameters, such as MLHGR, MCPR, reactor k{sub eff} and axial power density.

  5. An AK-LDMeans algorithm based on image clustering

    Science.gov (United States)

    Chen, Huimin; Li, Xingwei; Zhang, Yongbin; Chen, Nan

    2018-03-01

    Clustering is an effective analytical technique for handling unmarked data for value mining. Its ultimate goal is to mark unclassified data quickly and correctly. We use the roadmap for the current image processing as the experimental background. In this paper, we propose an AK-LDMeans algorithm to automatically lock the K value by designing the Kcost fold line, and then use the long-distance high-density method to select the clustering centers to further replace the traditional initial clustering center selection method, which further improves the efficiency and accuracy of the traditional K-Means Algorithm. And the experimental results are compared with the current clustering algorithm and the results are obtained. The algorithm can provide effective reference value in the fields of image processing, machine vision and data mining.

  6. Data mining in radiology

    International Nuclear Information System (INIS)

    Kharat, Amit T; Singh, Amarjit; Kulkarni, Vilas M; Shah, Digish

    2014-01-01

    Data mining facilitates the study of radiology data in various dimensions. It converts large patient image and text datasets into useful information that helps in improving patient care and provides informative reports. Data mining technology analyzes data within the Radiology Information System and Hospital Information System using specialized software which assesses relationships and agreement in available information. By using similar data analysis tools, radiologists can make informed decisions and predict the future outcome of a particular imaging finding. Data, information and knowledge are the components of data mining. Classes, Clusters, Associations, Sequential patterns, Classification, Prediction and Decision tree are the various types of data mining. Data mining has the potential to make delivery of health care affordable and ensure that the best imaging practices are followed. It is a tool for academic research. Data mining is considered to be ethically neutral, however concerns regarding privacy and legality exists which need to be addressed to ensure success of data mining

  7. The use of antenna radiation pattern in node localisation algorithms for wireless sensor networks

    CSIR Research Space (South Africa)

    Mwila, MK

    2014-08-01

    Full Text Available due to the limited accuracy inherent to the current ranging model. These models, however, make the assumption that the antenna radiation pattern is omnidirectional targeted to simplifying the complexity of the algorithms. An increasing number of sensor...

  8. hs-CRP is strongly associated with coronary heart disease (CHD): A data mining approach using decision tree algorithm.

    Science.gov (United States)

    Tayefi, Maryam; Tajfard, Mohammad; Saffar, Sara; Hanachi, Parichehr; Amirabadizadeh, Ali Reza; Esmaeily, Habibollah; Taghipour, Ali; Ferns, Gordon A; Moohebati, Mohsen; Ghayour-Mobarhan, Majid

    2017-04-01

    Coronary heart disease (CHD) is an important public health problem globally. Algorithms incorporating the assessment of clinical biomarkers together with several established traditional risk factors can help clinicians to predict CHD and support clinical decision making with respect to interventions. Decision tree (DT) is a data mining model for extracting hidden knowledge from large databases. We aimed to establish a predictive model for coronary heart disease using a decision tree algorithm. Here we used a dataset of 2346 individuals including 1159 healthy participants and 1187 participant who had undergone coronary angiography (405 participants with negative angiography and 782 participants with positive angiography). We entered 10 variables of a total 12 variables into the DT algorithm (including age, sex, FBG, TG, hs-CRP, TC, HDL, LDL, SBP and DBP). Our model could identify the associated risk factors of CHD with sensitivity, specificity, accuracy of 96%, 87%, 94% and respectively. Serum hs-CRP levels was at top of the tree in our model, following by FBG, gender and age. Our model appears to be an accurate, specific and sensitive model for identifying the presence of CHD, but will require validation in prospective studies. Copyright © 2017 Elsevier B.V. All rights reserved.

  9. Fringe pattern analysis for optical metrology theory, algorithms, and applications

    CERN Document Server

    Servin, Manuel; Padilla, Moises

    2014-01-01

    The main objective of this book is to present the basic theoretical principles and practical applications for the classical interferometric techniques and the most advanced methods in the field of modern fringe pattern analysis applied to optical metrology. A major novelty of this work is the presentation of a unified theoretical framework based on the Fourier description of phase shifting interferometry using the Frequency Transfer Function (FTF) along with the theory of Stochastic Process for the straightforward analysis and synthesis of phase shifting algorithms with desired properties such

  10. Performance Evaluation of Machine Learning Algorithms for Urban Pattern Recognition from Multi-spectral Satellite Images

    Directory of Open Access Journals (Sweden)

    Marc Wieland

    2014-03-01

    Full Text Available In this study, a classification and performance evaluation framework for the recognition of urban patterns in medium (Landsat ETM, TM and MSS and very high resolution (WorldView-2, Quickbird, Ikonos multi-spectral satellite images is presented. The study aims at exploring the potential of machine learning algorithms in the context of an object-based image analysis and to thoroughly test the algorithm’s performance under varying conditions to optimize their usage for urban pattern recognition tasks. Four classification algorithms, Normal Bayes, K Nearest Neighbors, Random Trees and Support Vector Machines, which represent different concepts in machine learning (probabilistic, nearest neighbor, tree-based, function-based, have been selected and implemented on a free and open-source basis. Particular focus is given to assess the generalization ability of machine learning algorithms and the transferability of trained learning machines between different image types and image scenes. Moreover, the influence of the number and choice of training data, the influence of the size and composition of the feature vector and the effect of image segmentation on the classification accuracy is evaluated.

  11. Process Mining Online Assessment Data

    Science.gov (United States)

    Pechenizkiy, Mykola; Trcka, Nikola; Vasilyeva, Ekaterina; van der Aalst, Wil; De Bra, Paul

    2009-01-01

    Traditional data mining techniques have been extensively applied to find interesting patterns, build descriptive and predictive models from large volumes of data accumulated through the use of different information systems. The results of data mining can be used for getting a better understanding of the underlying educational processes, for…

  12. PERANCANGAN SISTEM PREDIKSI CHURN PELANGGAN PT. TELEKOMUNIKASI SELULER DENGAN MEMANFAATKAN PROSES DATA MINING

    Directory of Open Access Journals (Sweden)

    Rajesri Govindaraju

    2008-01-01

    Full Text Available The purpose of this research is to design a customer churn prediction system using data mining approach. This system is able to perform data integration, data cleaning, data transformation, sampling and data splitting, prediction model building, predicting customer churn, and show the results in certain agreed forms. Churn prediction variables were identified based on earlier research reports that include customer information, payment method, call pattern, complaint data, telecommunication services usage and change of telecommunication services usage behavior data. The preferred mining technique used is the classification with decision tree algorithm. The decision tree can present visual model which represents customer churn and non churn pattern behavior. This system was tested using Kartu Halo customer data in Bandung area and testing result showed 70,94% accuracy of the prediction model. Abstract in Bahasa Indonesia : Penelitian ini bertujuan merancang sistem prediksi churn pelanggan yang memanfaatkan proses data mining. Sistem yang dihasilkan dapat melakukan integrasi data, pembersihan data, transformasi data, sampling dan pemisahan data, konstruksi model prediksi, memprediksi churn pelanggan dan menampilkan hasil prediksi dalam format laporan tertentu yang diperlukan. Identifikasi variabel-variabel prediksi churn dilakukan berdasarkan model prediksi churn yang telah dikembangkan pada penelitian terdahulu yang antara lain mencakup informasi mengenai pelanggan, metode pembayaran, data percakapan, data penggunaan jenis-jenis layanan telekomunikasi dan data yang menggambarkan perubahan perilaku penggunaan layanan telekomunikasi tersebut. Teknik mining yang dipilih adalah teknik klasifikasi dengan algoritma decision tree. Decision tree menghasilkan model visual yang merepresentasikan pola perilaku pelanggan yang churn dan tidak churn. Uji coba sistem yang dilakukan menggunakan data pelanggan Kartu Halo daerah Bandung menghasilkan tingkat akurasi

  13. Data Stream Mining

    Science.gov (United States)

    Gaber, Mohamed Medhat; Zaslavsky, Arkady; Krishnaswamy, Shonali

    Data mining is concerned with the process of computationally extracting hidden knowledge structures represented in models and patterns from large data repositories. It is an interdisciplinary field of study that has its roots in databases, statistics, machine learning, and data visualization. Data mining has emerged as a direct outcome of the data explosion that resulted from the success in database and data warehousing technologies over the past two decades (Fayyad, 1997,Fayyad, 1998,Kantardzic, 2003).

  14. Mining top-k frequent closed itemsets in data streams using sliding window

    International Nuclear Information System (INIS)

    Rehman, Z.; Shahbaz, M.

    2013-01-01

    Frequent itemset mining has become a popular research area in data mining community since the last few years. T here are two main technical hitches while finding frequent itemsets. First, to provide an appropriate minimum support value to start and user need to tune this minimum support value by running the algorithm again and again. Secondly, generated frequent itemsets are mostly numerous and as a result a number of association rules generated are also very large in numbers. Applications dealing with streaming environment need to process the data received at high rate, therefore, finding frequent itemsets in data streams becomes complex. In this paper, we present an algorithm to mine top-k frequent closed itemsets using sliding window approach from streaming data. We developed a single-pass algorithm to find frequent closed itemsets of length between user's defined minimum and maximum- length. To improve the performance of algorithm and to avoid rescanning of data, we have transformed data into bitmap based tree data structure. (author)

  15. A data mining approach to optimize pellets manufacturing process based on a decision tree algorithm.

    Science.gov (United States)

    Ronowicz, Joanna; Thommes, Markus; Kleinebudde, Peter; Krysiński, Jerzy

    2015-06-20

    The present study is focused on the thorough analysis of cause-effect relationships between pellet formulation characteristics (pellet composition as well as process parameters) and the selected quality attribute of the final product. The shape using the aspect ratio value expressed the quality of pellets. A data matrix for chemometric analysis consisted of 224 pellet formulations performed by means of eight different active pharmaceutical ingredients and several various excipients, using different extrusion/spheronization process conditions. The data set contained 14 input variables (both formulation and process variables) and one output variable (pellet aspect ratio). A tree regression algorithm consistent with the Quality by Design concept was applied to obtain deeper understanding and knowledge of formulation and process parameters affecting the final pellet sphericity. The clear interpretable set of decision rules were generated. The spehronization speed, spheronization time, number of holes and water content of extrudate have been recognized as the key factors influencing pellet aspect ratio. The most spherical pellets were achieved by using a large number of holes during extrusion, a high spheronizer speed and longer time of spheronization. The described data mining approach enhances knowledge about pelletization process and simultaneously facilitates searching for the optimal process conditions which are necessary to achieve ideal spherical pellets, resulting in good flow characteristics. This data mining approach can be taken into consideration by industrial formulation scientists to support rational decision making in the field of pellets technology. Copyright © 2015 Elsevier B.V. All rights reserved.

  16. Quantification of Operational Risk Using A Data Mining

    Science.gov (United States)

    Perera, J. Sebastian

    1999-01-01

    What is Data Mining? - Data Mining is the process of finding actionable information hidden in raw data. - Data Mining helps find hidden patterns, trends, and important relationships often buried in a sea of data - Typically, automated software tools based on advanced statistical analysis and data modeling technology can be utilized to automate the data mining process

  17. Study of high speed complex number algorithms. [for determining antenna for field radiation patterns

    Science.gov (United States)

    Heisler, R.

    1981-01-01

    A method of evaluating the radiation integral on the curved surface of a reflecting antenna is presented. A three dimensional Fourier transform approach is used to generate a two dimensional radiation cross-section along a planer cut at any angle phi through the far field pattern. Salient to the method is an algorithm for evaluating a subset of the total three dimensional discrete Fourier transform results. The subset elements are selectively evaluated to yield data along a geometric plane of constant. The algorithm is extremely efficient so that computation of the induced surface currents via the physical optics approximation dominates the computer time required to compute a radiation pattern. Application to paraboloid reflectors with off-focus feeds in presented, but the method is easily extended to offset antenna systems and reflectors of arbitrary shapes. Numerical results were computed for both gain and phase and are compared with other published work.

  18. Below a Historic Mercury Mine: Non-linear Patterns of Mercury Bioaccumulation in Aquatic Organisms

    Science.gov (United States)

    Haas, J.; Ichikawa, G.; Ode, P.; Salsbery, D.; Abel, J.

    2001-12-01

    Unlike most heavy metals, mercury is capable of bioaccumulating in aquatic food-chains, primarily because it is methylated by bacteria in sediment to the more toxic methylmercury form. Mercury concentrations in a number of riparian systems in California are highly elevated as a result of historic mining activities. These activities included both the mining of cinnabar in the coastal ranges to recover elemental mercury and the use of elemental mercury in the gold fields of the Sierra Nevada Mountains. The most productive mercury mining area was the New Almaden District, now a county park, located in the Guadalupe River drainage of Santa Clara County, where cinnabar was mined and retorted for over 100 years. As a consequence, riparian systems in several subwatersheds of the Guadalupe River drainage are contaminated with total mercury concentrations that exceed state hazardous waste criteria. Mercury concentrations in fish tissue frequently exceed human health guidelines. However, the potential ecological effects of these elevated mercury concentrations have not been thoroughly evaluated. One difficulty is in extrapolating sediment concentrations to fish tissue concentrations without accounting for physical and biological processes that determine bioaccumulation patterns. Many processes, such as methylation and demethylation of mercury by bacteria, assimilation efficiency in invertebrates, and metabolic rates in fish, are nonlinear, a factor that often confounds attempts to evaluate the effects of mercury contamination on aquatic food webs. Sediment, benthic macroinvertebrate, and fish tissue samples were collected in 1998 from the Guadalupe River drainage in Santa Clara County at 13 sites upstream and downstream from the historic mining district. Sediment and macroinvertebrate samples were analyzed for total mercury and methylmercury. Fish samples were analyzed for total mercury as whole bodies, composited by species and size. While linear correlations of sediment

  19. Advances in Machine Learning and Data Mining for Astronomy

    Science.gov (United States)

    Way, Michael J.; Scargle, Jeffrey D.; Ali, Kamal M.; Srivastava, Ashok N.

    2012-03-01

    Advances in Machine Learning and Data Mining for Astronomy documents numerous successful collaborations among computer scientists, statisticians, and astronomers who illustrate the application of state-of-the-art machine learning and data mining techniques in astronomy. Due to the massive amount and complexity of data in most scientific disciplines, the material discussed in this text transcends traditional boundaries between various areas in the sciences and computer science. The book's introductory part provides context to issues in the astronomical sciences that are also important to health, social, and physical sciences, particularly probabilistic and statistical aspects of classification and cluster analysis. The next part describes a number of astrophysics case studies that leverage a range of machine learning and data mining technologies. In the last part, developers of algorithms and practitioners of machine learning and data mining show how these tools and techniques are used in astronomical applications. With contributions from leading astronomers and computer scientists, this book is a practical guide to many of the most important developments in machine learning, data mining, and statistics. It explores how these advances can solve current and future problems in astronomy and looks at how they could lead to the creation of entirely new algorithms within the data mining community.

  20. Mining-induced seismicity at the Lucky Friday Mine: Seismic events of magnitude >2.5, 1989--1994

    Energy Technology Data Exchange (ETDEWEB)

    Whyatt, J.K.; Williams, T.J. [USDOE, Spokane, WA (United States). Spokane Research Center; Blake, W. [Blake (W.), Hayden Lake, ID (United States); Sprenke, K. [Idaho Univ., Moscow, ID (United States); Wideman, C. [Montana Tech, Butte, MT (United States)

    1996-09-01

    An understanding of the types of seismic events that occur in a deep mine provides a foundation for assessing the seismic characteristics of these events and the degree to which initiation of these events can be anticipated or controlled. This study is a first step toward developing such an understanding of seismic events generated by mining in the Coeur d`Alene Mining District of northern Idaho. It is based on information developed in the course of a long-standing rock burst research effort undertaken by the U. S. Bureau of Mines in cooperation with Coeur d`Alene Mining District mines and regional universities. This information was collected for 39 seismic events with local magnitudes greater than 2.5 that occurred between 1989 and 1994. One of these events occurred, on average, every 8 weeks during the study period. Five major types of characteristic events were developed from the data; these five types describe all but two of the 39 events that were studied. The most common types of events occurred, on average, once every 30 weeks. The characteristic mechanisms, first-motion patterns, damage patterns, and relationships to mining and major geologic structures were defined for each type of event. These five types of events need to be studied further to assess their ability to camouflage clandestine nuclear tests as well as the degree to which they can be anticipated and controlled.

  1. Score Mining Rents in Terms of Investment Attractiveness of Peat Mining

    Science.gov (United States)

    Alexandrov, Gennady; Yablonev, Alexander

    2017-11-01

    In this article, as determinants in the system factors underlying the investment attractiveness of the peat industry is considered a rental factor, which predetermines the significant differences and peculiarities of the investment climate in the mining business and, in particular, in the sphere of peat mining. In contrast to modern studies treated the essence and role of rents in the economic mechanism, is proposed for a new approach to solving the problems of its formation. Our approach differs in that it, firstly, adequate rental relations, objectively in extractive industries, secondly, provides consensus in the interests of the owner of peat deposits and entrepreneurs, businesses in these deposits and, thus, thirdly, contributes to the creation of a favourable investment climate in the peat extraction industry. In practical terms, in accordance with the proposed approach, we have proposed specific allocation algorithm of mining rents from the profits of peat extraction enterprises.

  2. Analysis Of Data Mining For Car Sales Sparepart Using Apriori Algorithm (Case Study: PT. IDK 1 FIELD

    Directory of Open Access Journals (Sweden)

    Khairul Ummi

    2016-10-01

    Full Text Available PT. IDK 1 is one of the branch offices honda car dealership that sells various types of variants honda matic or manual car and motorcycle parts. Any sales or goods sold will be performed by inputting the database directly connected directly to the central office. But PT. IDK 1 do not know a couple items frequently purchased parts simultaneously. When the stock of spare parts which amount is low, the office is only asking them to send the stock of spare parts from the central office without knowing that the other parts if the parts were purchased then the other parts were also purchased. It was considered difficult when restocking of goods because of the many types of auto parts. Data mining techniques have been widely used to solve the existing problems with the implementation of the algorithm one A-Priori to obtain information about the association between the product of a database transaction. Sales transaction data honda car parts at PT. IDK 1 can be reprocessed using data mining applications resulting association rules is a strong link between itemset sales of spare parts so that it can provide recommendations and facilitate restocking items in the arrangement or placement of goods related to a strong interdependence.

  3. Design of Compressed Sensing Algorithm for Coal Mine IoT Moving Measurement Data Based on a Multi-Hop Network and Total Variation

    Directory of Open Access Journals (Sweden)

    Gang Wang

    2018-05-01

    Full Text Available As the application of a coal mine Internet of Things (IoT, mobile measurement devices, such as intelligent mine lamps, cause moving measurement data to be increased. How to transmit these large amounts of mobile measurement data effectively has become an urgent problem. This paper presents a compressed sensing algorithm for the large amount of coal mine IoT moving measurement data based on a multi-hop network and total variation. By taking gas data in mobile measurement data as an example, two network models for the transmission of gas data flow, namely single-hop and multi-hop transmission modes, are investigated in depth, and a gas data compressed sensing collection model is built based on a multi-hop network. To utilize the sparse characteristics of gas data, the concept of total variation is introduced and a high-efficiency gas data compression and reconstruction method based on Total Variation Sparsity based on Multi-Hop (TVS-MH is proposed. According to the simulation results, by using the proposed method, the moving measurement data flow from an underground distributed mobile network can be acquired and transmitted efficiently.

  4. Design of Compressed Sensing Algorithm for Coal Mine IoT Moving Measurement Data Based on a Multi-Hop Network and Total Variation.

    Science.gov (United States)

    Wang, Gang; Zhao, Zhikai; Ning, Yongjie

    2018-05-28

    As the application of a coal mine Internet of Things (IoT), mobile measurement devices, such as intelligent mine lamps, cause moving measurement data to be increased. How to transmit these large amounts of mobile measurement data effectively has become an urgent problem. This paper presents a compressed sensing algorithm for the large amount of coal mine IoT moving measurement data based on a multi-hop network and total variation. By taking gas data in mobile measurement data as an example, two network models for the transmission of gas data flow, namely single-hop and multi-hop transmission modes, are investigated in depth, and a gas data compressed sensing collection model is built based on a multi-hop network. To utilize the sparse characteristics of gas data, the concept of total variation is introduced and a high-efficiency gas data compression and reconstruction method based on Total Variation Sparsity based on Multi-Hop (TVS-MH) is proposed. According to the simulation results, by using the proposed method, the moving measurement data flow from an underground distributed mobile network can be acquired and transmitted efficiently.

  5. Spatiotemporal Data Mining: A Computational Perspective

    Directory of Open Access Journals (Sweden)

    Shashi Shekhar

    2015-10-01

    Full Text Available Explosive growth in geospatial and temporal data as well as the emergence of new technologies emphasize the need for automated discovery of spatiotemporal knowledge. Spatiotemporal data mining studies the process of discovering interesting and previously unknown, but potentially useful patterns from large spatiotemporal databases. It has broad application domains including ecology and environmental management, public safety, transportation, earth science, epidemiology, and climatology. The complexity of spatiotemporal data and intrinsic relationships limits the usefulness of conventional data science techniques for extracting spatiotemporal patterns. In this survey, we review recent computational techniques and tools in spatiotemporal data mining, focusing on several major pattern families: spatiotemporal outlier, spatiotemporal coupling and tele-coupling, spatiotemporal prediction, spatiotemporal partitioning and summarization, spatiotemporal hotspots, and change detection. Compared with other surveys in the literature, this paper emphasizes the statistical foundations of spatiotemporal data mining and provides comprehensive coverage of computational approaches for various pattern families. ISPRS Int. J. Geo-Inf. 2015, 4 2307 We also list popular software tools for spatiotemporal data analysis. The survey concludes with a look at future research needs.

  6. Algorithms bio-inspired for the pattern obtention of control bars in BWR reactors

    International Nuclear Information System (INIS)

    Ortiz, J.J.; Perusquia, R.; Montes, J.L.

    2003-01-01

    In this work methods based on Genetic Algorithms and Systems based on ant colonies for the obtention of the patterns of control bars of an equilibrium cycle of 18 months for the Laguna Verde nuclear power station are presented. A comparison of obtained results with the methods and with those of design of such equilibrium cycle is presented. As consequence of the study, it was found that the algorithm based on the ant colonies reached to diminish the coast down period (decrease of power at the end of the cycle) in five and half days with respect to the original design what represents an annual saving of $US 100,000. (Author)

  7. Fuzzy Clustering: An Approachfor Mining Usage Profilesfrom Web

    OpenAIRE

    Ms.Archana N. Boob; Prof. D. M. Dakhane

    2012-01-01

    Web usage mining is an application of data mining technology to mining the data of the web server log file. It can discover the browsing patterns of user and some kind of correlations between the web pages. Web usage mining provides the support for the web site design, providing personalization server and other business making decision, etc. Web mining applies the data mining, the artificial intelligence and the chart technology and so on to the web data and traces users' visiting characteris...

  8. Data Mining and Optimization Tools for Developing Engine Parameters Tools

    Science.gov (United States)

    Dhawan, Atam P.

    1998-01-01

    This project was awarded for understanding the problem and developing a plan for Data Mining tools for use in designing and implementing an Engine Condition Monitoring System. Tricia Erhardt and I studied the problem domain for developing an Engine Condition Monitoring system using the sparse and non-standardized datasets to be available through a consortium at NASA Lewis Research Center. We visited NASA three times to discuss additional issues related to dataset which was not made available to us. We discussed and developed a general framework of data mining and optimization tools to extract useful information from sparse and non-standard datasets. These discussions lead to the training of Tricia Erhardt to develop Genetic Algorithm based search programs which were written in C++ and used to demonstrate the capability of GA algorithm in searching an optimal solution in noisy, datasets. From the study and discussion with NASA LeRC personnel, we then prepared a proposal, which is being submitted to NASA for future work for the development of data mining algorithms for engine conditional monitoring. The proposed set of algorithm uses wavelet processing for creating multi-resolution pyramid of tile data for GA based multi-resolution optimal search.

  9. EVALUATION OF WEB SEARCHING METHOD USING A NOVEL WPRR ALGORITHM FOR TWO DIFFERENT CASE STUDIES

    Directory of Open Access Journals (Sweden)

    V. Lakshmi Praba

    2012-04-01

    Full Text Available The World-Wide Web provides every internet citizen with access to an abundance of information, but it becomes increasingly difficult to identify the relevant pieces of information. Research in web mining tries to address this problem by applying techniques from data mining and machine learning to web data and documents. Web content mining and web structure mining have important roles in identifying the relevant web page. Relevancy of web page denotes how well a retrieved web page or set of web pages meets the information need of the user. Page Rank, Weighted Page Rank and Hypertext Induced Topic Selection (HITS are existing algorithms which considers only web structure mining. Vector Space Model (VSM, Cover Density Ranking (CDR, Okapi similarity measurement (Okapi and Three-Level Scoring method (TLS are some of existing relevancy score methods which consider only web content mining. In this paper, we propose a new algorithm, Weighted Page with Relevant Rank (WPRR which is blend of both web content mining and web structure mining that demonstrates the relevancy of the page with respect to given query for two different case scenarios. It is shown that WPRR’s performance is better than the existing algorithms.

  10. Mining frequent binary expressions

    NARCIS (Netherlands)

    Calders, T.; Paredaens, J.; Kambayashi, Y.; Mohania, M.K.; Tjoa, A.M.

    2000-01-01

    In data mining, searching for frequent patterns is a common basic operation. It forms the basis of many interesting decision support processes. In this paper we present a new type of patterns, binary expressions. Based on the properties of a specified binary test, such as reflexivity, transitivity

  11. Protective and control relays as coal-mine power-supply ACS subsystem

    Science.gov (United States)

    Kostin, V. N.; Minakova, T. E.

    2017-10-01

    The paper presents instantaneous selective short-circuit protection for the cabling of the underground part of a coal mine and central control algorithms as a Coal-Mine Power-Supply ACS Subsystem. In order to improve the reliability of electricity supply and reduce the mining equipment down-time, a dual channel relay protection and central control system is proposed as a subsystem of the coal-mine power-supply automated control system (PS ACS).

  12. A framework for query optimization to support data mining

    NARCIS (Netherlands)

    S.R. Choenni (Sunil); A.P.J.M. Siebes (Arno)

    1996-01-01

    textabstractIn order to extract knowledge from databases, data mining algorithms heavily query the databases. Inefficient processing of these queries will inevitably have its impact on the performance of these algorithms, making them less valuable. In this paper, we describe an optimization

  13. A Study of Pattern Prediction in the Monitoring Data of Earthen Ruins with the Internet of Things.

    Science.gov (United States)

    Xiao, Yun; Wang, Xin; Eshragh, Faezeh; Wang, Xuanhong; Chen, Xiaojiang; Fang, Dingyi

    2017-05-11

    An understanding of the changes of the rammed earth temperature of earthen ruins is important for protection of such ruins. To predict the rammed earth temperature pattern using the air temperature pattern of the monitoring data of earthen ruins, a pattern prediction method based on interesting pattern mining and correlation, called PPER, is proposed in this paper. PPER first finds the interesting patterns in the air temperature sequence and the rammed earth temperature sequence. To reduce the processing time, two pruning rules and a new data structure based on an R-tree are also proposed. Correlation rules between the air temperature patterns and the rammed earth temperature patterns are then mined. The correlation rules are merged into predictive rules for the rammed earth temperature pattern. Experiments were conducted to show the accuracy of the presented method and the power of the pruning rules. Moreover, the Ming Dynasty Great Wall dataset was used to examine the algorithm, and six predictive rules from the air temperature to rammed earth temperature based on the interesting patterns were obtained, with the average hit rate reaching 89.8%. The PPER and predictive rules will be useful for rammed earth temperature prediction in protection of earthen ruins.

  14. A Stock Trading Recommender System Based on Temporal Association Rule Mining

    Directory of Open Access Journals (Sweden)

    Binoy B. Nair

    2015-04-01

    Full Text Available Recommender systems capable of discovering patterns in stock price movements and generating stock recommendations based on the patterns thus discovered can significantly supplement the decision-making process of a stock trader. Such recommender systems are of great significance to a layperson who wishes to profit by stock trading even while not possessing the skill or expertise of a seasoned trader. A genetic algorithm optimized Symbolic Aggregate approXimation (SAX–Apriori based stock trading recommender system, which can mine temporal association rules from the stock price data set to generate stock trading recommendations, is presented in this article. The proposed system is validated on 12 different data sets. The results indicate that the proposed system significantly outperforms the passive buy-and-hold strategy, offering scope for a layperson to successfully invest in capital markets.

  15. Anchor-Free Localization Method for Mobile Targets in Coal Mine Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Xiao Xu

    2009-04-01

    Full Text Available Severe natural conditions and complex terrain make it difficult to apply precise localization in underground mines. In this paper, an anchor-free localization method for mobile targets is proposed based on non-metric multi-dimensional scaling (Multi-dimensional Scaling: MDS and rank sequence. Firstly, a coal mine wireless sensor network is constructed in underground mines based on the ZigBee technology. Then a non-metric MDS algorithm is imported to estimate the reference nodes’ location. Finally, an improved sequence-based localization algorithm is presented to complete precise localization for mobile targets. The proposed method is tested through simulations with 100 nodes, outdoor experiments with 15 ZigBee physical nodes, and the experiments in the mine gas explosion laboratory with 12 ZigBee nodes. Experimental results show that our method has better localization accuracy and is more robust in underground mines.

  16. Anchor-free localization method for mobile targets in coal mine wireless sensor networks.

    Science.gov (United States)

    Pei, Zhongmin; Deng, Zhidong; Xu, Shuo; Xu, Xiao

    2009-01-01

    Severe natural conditions and complex terrain make it difficult to apply precise localization in underground mines. In this paper, an anchor-free localization method for mobile targets is proposed based on non-metric multi-dimensional scaling (Multi-dimensional Scaling: MDS) and rank sequence. Firstly, a coal mine wireless sensor network is constructed in underground mines based on the ZigBee technology. Then a non-metric MDS algorithm is imported to estimate the reference nodes' location. Finally, an improved sequence-based localization algorithm is presented to complete precise localization for mobile targets. The proposed method is tested through simulations with 100 nodes, outdoor experiments with 15 ZigBee physical nodes, and the experiments in the mine gas explosion laboratory with 12 ZigBee nodes. Experimental results show that our method has better localization accuracy and is more robust in underground mines.

  17. Genetic algorithms and artificial neural networks for loading pattern optimisation of advanced gas-cooled reactors

    Energy Technology Data Exchange (ETDEWEB)

    Ziver, A.K. E-mail: a.k.ziver@imperial.ac.uk; Pain, C.C; Carter, J.N.; Oliveira, C.R.E. de; Goddard, A.J.H.; Overton, R.S

    2004-03-01

    A non-generational genetic algorithm (GA) has been developed for fuel management optimisation of Advanced Gas-Cooled Reactors, which are operated by British Energy and produce around 20% of the UK's electricity requirements. An evolutionary search is coded using the genetic operators; namely selection by tournament, two-point crossover, mutation and random assessment of population for multi-cycle loading pattern (LP) optimisation. A detailed description of the chromosomes in the genetic algorithm coded is presented. Artificial Neural Networks (ANNs) have been constructed and trained to accelerate the GA-based search during the optimisation process. The whole package, called GAOPT, is linked to the reactor analysis code PANTHER, which performs fresh fuel loading, burn-up and power shaping calculations for each reactor cycle by imposing station-specific safety and operational constraints. GAOPT has been verified by performing a number of tests, which are applied to the Hinkley Point B and Hartlepool reactors. The test results giving loading pattern (LP) scenarios obtained from single and multi-cycle optimisation calculations applied to realistic reactor states of the Hartlepool and Hinkley Point B reactors are discussed. The results have shown that the GA/ANN algorithms developed can help the fuel engineer to optimise loading patterns in an efficient and more profitable way than currently available for multi-cycle refuelling of AGRs. Research leading to parallel GAs applied to LP optimisation are outlined, which can be adapted to present day LWR fuel management problems.

  18. Bit-Table Based Biclustering and Frequent Closed Itemset Mining in High-Dimensional Binary Data

    Directory of Open Access Journals (Sweden)

    András Király

    2014-01-01

    Full Text Available During the last decade various algorithms have been developed and proposed for discovering overlapping clusters in high-dimensional data. The two most prominent application fields in this research, proposed independently, are frequent itemset mining (developed for market basket data and biclustering (applied to gene expression data analysis. The common limitation of both methodologies is the limited applicability for very large binary data sets. In this paper we propose a novel and efficient method to find both frequent closed itemsets and biclusters in high-dimensional binary data. The method is based on simple but very powerful matrix and vector multiplication approaches that ensure that all patterns can be discovered in a fast manner. The proposed algorithm has been implemented in the commonly used MATLAB environment and freely available for researchers.

  19. Applying Supervised Opinion Mining Techniques on Online User Reviews

    Directory of Open Access Journals (Sweden)

    Ion SMEUREANU

    2012-01-01

    Full Text Available In recent years, the spectacular development of web technologies, lead to an enormous quantity of user generated information in online systems. This large amount of information on web platforms make them viable for use as data sources, in applications based on opinion mining and sentiment analysis. The paper proposes an algorithm for detecting sentiments on movie user reviews, based on naive Bayes classifier. We make an analysis of the opinion mining domain, techniques used in sentiment analysis and its applicability. We implemented the proposed algorithm and we tested its performance, and suggested directions of development.

  20. Data Reduction Algorithm Using Nonnegative Matrix Factorization with Nonlinear Constraints

    Science.gov (United States)

    Sembiring, Pasukat

    2017-12-01

    Processing ofdata with very large dimensions has been a hot topic in recent decades. Various techniques have been proposed in order to execute the desired information or structure. Non- Negative Matrix Factorization (NMF) based on non-negatives data has become one of the popular methods for shrinking dimensions. The main strength of this method is non-negative object, the object model by a combination of some basic non-negative parts, so as to provide a physical interpretation of the object construction. The NMF is a dimension reduction method thathasbeen used widely for numerous applications including computer vision,text mining, pattern recognitions,and bioinformatics. Mathematical formulation for NMF did not appear as a convex optimization problem and various types of algorithms have been proposed to solve the problem. The Framework of Alternative Nonnegative Least Square(ANLS) are the coordinates of the block formulation approaches that have been proven reliable theoretically and empirically efficient. This paper proposes a new algorithm to solve NMF problem based on the framework of ANLS.This algorithm inherits the convergenceproperty of the ANLS framework to nonlinear constraints NMF formulations.

  1. Analysing Customer Opinions with Text Mining Algorithms

    Science.gov (United States)

    Consoli, Domenico

    2009-08-01

    Knowing what the customer thinks of a particular product/service helps top management to introduce improvements in processes and products, thus differentiating the company from their competitors and gain competitive advantages. The customers, with their preferences, determine the success or failure of a company. In order to know opinions of the customers we can use technologies available from the web 2.0 (blog, wiki, forums, chat, social networking, social commerce). From these web sites, useful information must be extracted, for strategic purposes, using techniques of sentiment analysis or opinion mining.

  2. RANWAR: rank-based weighted association rule mining from gene expression and methylation data.

    Science.gov (United States)

    Mallik, Saurav; Mukhopadhyay, Anirban; Maulik, Ujjwal

    2015-01-01

    Ranking of association rules is currently an interesting topic in data mining and bioinformatics. The huge number of evolved rules of items (or, genes) by association rule mining (ARM) algorithms makes confusion to the decision maker. In this article, we propose a weighted rule-mining technique (say, RANWAR or rank-based weighted association rule-mining) to rank the rules using two novel rule-interestingness measures, viz., rank-based weighted condensed support (wcs) and weighted condensed confidence (wcc) measures to bypass the problem. These measures are basically depended on the rank of items (genes). Using the rank, we assign weight to each item. RANWAR generates much less number of frequent itemsets than the state-of-the-art association rule mining algorithms. Thus, it saves time of execution of the algorithm. We run RANWAR on gene expression and methylation datasets. The genes of the top rules are biologically validated by Gene Ontologies (GOs) and KEGG pathway analyses. Many top ranked rules extracted from RANWAR that hold poor ranks in traditional Apriori, are highly biologically significant to the related diseases. Finally, the top rules evolved from RANWAR, that are not in Apriori, are reported.

  3. Development of a BWR loading pattern design system based on modified genetic algorithms and knowledge

    International Nuclear Information System (INIS)

    Martin-del-Campo, Cecilia; Francois, Juan Luis; Avendano, Linda; Gonzalez, Mario

    2004-01-01

    An optimization system based on Genetic Algorithms (GAs), in combination with expert knowledge coded in heuristics rules, was developed for the design of optimized boiling water reactor (BWR) fuel loading patterns. The system was coded in a computer program named Loading Pattern Optimization System based on Genetic Algorithms, in which the optimization code uses GAs to select candidate solutions, and the core simulator code CM-PRESTO to evaluate them. A multi-objective function was built to maximize the cycle energy length while satisfying power and reactivity constraints used as BWR design parameters. Heuristic rules were applied to satisfy standard fuel management recommendations as the Control Cell Core and Low Leakage loading strategies, and octant symmetry. To test the system performance, an optimized cycle was designed and compared against an actual operating cycle of Laguna Verde Nuclear Power Plant, Unit I

  4. Collaborative mining and interpretation of large-scale data for biomedical research insights.

    Directory of Open Access Journals (Sweden)

    Georgia Tsiliki

    Full Text Available Biomedical research becomes increasingly interdisciplinary and collaborative in nature. Researchers need to efficiently and effectively collaborate and make decisions by meaningfully assembling, mining and analyzing available large-scale volumes of complex multi-faceted data residing in different sources. In line with related research directives revealing that, in spite of the recent advances in data mining and computational analysis, humans can easily detect patterns which computer algorithms may have difficulty in finding, this paper reports on the practical use of an innovative web-based collaboration support platform in a biomedical research context. Arguing that dealing with data-intensive and cognitively complex settings is not a technical problem alone, the proposed platform adopts a hybrid approach that builds on the synergy between machine and human intelligence to facilitate the underlying sense-making and decision making processes. User experience shows that the platform enables more informed and quicker decisions, by displaying the aggregated information according to their needs, while also exploiting the associated human intelligence.

  5. Mining High-Dimensional Data

    Science.gov (United States)

    Wang, Wei; Yang, Jiong

    With the rapid growth of computational biology and e-commerce applications, high-dimensional data becomes very common. Thus, mining high-dimensional data is an urgent problem of great practical importance. However, there are some unique challenges for mining data of high dimensions, including (1) the curse of dimensionality and more crucial (2) the meaningfulness of the similarity measure in the high dimension space. In this chapter, we present several state-of-art techniques for analyzing high-dimensional data, e.g., frequent pattern mining, clustering, and classification. We will discuss how these methods deal with the challenges of high dimensionality.

  6. Marine data users clustering using data mining technique

    Directory of Open Access Journals (Sweden)

    Farnaz Ghiasi

    2015-09-01

    Full Text Available The objective of this research is marine data users clustering using data mining technique. To achieve this objective, marine organizations will enable to know their data and users requirements. In this research, CRISP-DM standard model was used to implement the data mining technique. The required data was extracted from 500 marine data users profile database of Iranian National Institute for Oceanography and Atmospheric Sciences (INIOAS from 1386 to 1393. The TwoStep algorithm was used for clustering. In this research, patterns was discovered between marine data users such as student, organization and scientist and their data request (Data source, Data type, Data set, Parameter and Geographic area using clustering for the first time. The most important clusters are: Student with International data source, Chemistry data type, “World Ocean Database” dataset, Persian Gulf geographic area and Organization with Nitrate parameter. Senior managers of the marine organizations will enable to make correct decisions concerning their existing data. They will direct to planning for better data collection in the future. Also data users will guide with respect to their requests. Finally, the valuable suggestions were offered to improve the performance of marine organizations.

  7. Clustering and Candidate Motif Detection in Exosomal miRNAs by Application of Machine Learning Algorithms.

    Science.gov (United States)

    Gaur, Pallavi; Chaturvedi, Anoop

    2017-07-22

    The clustering pattern and motifs give immense information about any biological data. An application of machine learning algorithms for clustering and candidate motif detection in miRNAs derived from exosomes is depicted in this paper. Recent progress in the field of exosome research and more particularly regarding exosomal miRNAs has led much bioinformatic-based research to come into existence. The information on clustering pattern and candidate motifs in miRNAs of exosomal origin would help in analyzing existing, as well as newly discovered miRNAs within exosomes. Along with obtaining clustering pattern and candidate motifs in exosomal miRNAs, this work also elaborates the usefulness of the machine learning algorithms that can be efficiently used and executed on various programming languages/platforms. Data were clustered and sequence candidate motifs were detected successfully. The results were compared and validated with some available web tools such as 'BLASTN' and 'MEME suite'. The machine learning algorithms for aforementioned objectives were applied successfully. This work elaborated utility of machine learning algorithms and language platforms to achieve the tasks of clustering and candidate motif detection in exosomal miRNAs. With the information on mentioned objectives, deeper insight would be gained for analyses of newly discovered miRNAs in exosomes which are considered to be circulating biomarkers. In addition, the execution of machine learning algorithms on various language platforms gives more flexibility to users to try multiple iterations according to their requirements. This approach can be applied to other biological data-mining tasks as well.

  8. Improvements in seismic event locations in a deep western U.S. coal mine using tomographic velocity models and an evolutionary search algorithm

    Energy Technology Data Exchange (ETDEWEB)

    Adam Lurka; Peter Swanson [Central Mining Institute, Katowice (Poland)

    2009-09-15

    Methods of improving seismic event locations were investigated as part of a research study aimed at reducing ground control safety hazards. Seismic event waveforms collected with a 23-station three-dimensional sensor array during longwall coal mining provide the data set used in the analyses. A spatially variable seismic velocity model is constructed using seismic event sources in a passive tomographic method. The resulting three-dimensional velocity model is used to relocate seismic event positions. An evolutionary optimization algorithm is implemented and used in both the velocity model development and in seeking improved event location solutions. Results obtained using the different velocity models are compared. The combination of the tomographic velocity model development and evolutionary search algorithm provides improvement to the event locations. 13 refs., 5 figs., 4 tabs.

  9. An unsupervised text mining method for relation extraction from biomedical literature.

    Directory of Open Access Journals (Sweden)

    Changqin Quan

    Full Text Available The wealth of interaction information provided in biomedical articles motivated the implementation of text mining approaches to automatically extract biomedical relations. This paper presents an unsupervised method based on pattern clustering and sentence parsing to deal with biomedical relation extraction. Pattern clustering algorithm is based on Polynomial Kernel method, which identifies interaction words from unlabeled data; these interaction words are then used in relation extraction between entity pairs. Dependency parsing and phrase structure parsing are combined for relation extraction. Based on the semi-supervised KNN algorithm, we extend the proposed unsupervised approach to a semi-supervised approach by combining pattern clustering, dependency parsing and phrase structure parsing rules. We evaluated the approaches on two different tasks: (1 Protein-protein interactions extraction, and (2 Gene-suicide association extraction. The evaluation of task (1 on the benchmark dataset (AImed corpus showed that our proposed unsupervised approach outperformed three supervised methods. The three supervised methods are rule based, SVM based, and Kernel based separately. The proposed semi-supervised approach is superior to the existing semi-supervised methods. The evaluation on gene-suicide association extraction on a smaller dataset from Genetic Association Database and a larger dataset from publicly available PubMed showed that the proposed unsupervised and semi-supervised methods achieved much higher F-scores than co-occurrence based method.

  10. A novel tree-based algorithm to discover seismic patterns in earthquake catalogs

    Science.gov (United States)

    Florido, E.; Asencio-Cortés, G.; Aznarte, J. L.; Rubio-Escudero, C.; Martínez-Álvarez, F.

    2018-06-01

    A novel methodology is introduced in this research study to detect seismic precursors. Based on an existing approach, the new methodology searches for patterns in the historical data. Such patterns may contain statistical or soil dynamics information. It improves the original version in several aspects. First, new seismicity indicators have been used to characterize earthquakes. Second, a machine learning clustering algorithm has been applied in a very flexible way, thus allowing the discovery of new data groupings. Third, a novel search strategy is proposed in order to obtain non-overlapped patterns. And, fourth, arbitrary lengths of patterns are searched for, thus discovering long and short-term behaviors that may influence in the occurrence of medium-large earthquakes. The methodology has been applied to seven different datasets, from three different regions, namely the Iberian Peninsula, Chile and Japan. Reported results show a remarkable improvement with respect to the former version, in terms of all evaluated quality measures. In particular, the number of false positives has decreased and the positive predictive values increased, both of them in a very remarkable manner.

  11. Biomedical text mining and its applications in cancer research.

    Science.gov (United States)

    Zhu, Fei; Patumcharoenpol, Preecha; Zhang, Cheng; Yang, Yang; Chan, Jonathan; Meechai, Asawin; Vongsangnak, Wanwipa; Shen, Bairong

    2013-04-01

    Cancer is a malignant disease that has caused millions of human deaths. Its study has a long history of well over 100years. There have been an enormous number of publications on cancer research. This integrated but unstructured biomedical text is of great value for cancer diagnostics, treatment, and prevention. The immense body and rapid growth of biomedical text on cancer has led to the appearance of a large number of text mining techniques aimed at extracting novel knowledge from scientific text. Biomedical text mining on cancer research is computationally automatic and high-throughput in nature. However, it is error-prone due to the complexity of natural language processing. In this review, we introduce the basic concepts underlying text mining and examine some frequently used algorithms, tools, and data sets, as well as assessing how much these algorithms have been utilized. We then discuss the current state-of-the-art text mining applications in cancer research and we also provide some resources for cancer text mining. With the development of systems biology, researchers tend to understand complex biomedical systems from a systems biology viewpoint. Thus, the full utilization of text mining to facilitate cancer systems biology research is fast becoming a major concern. To address this issue, we describe the general workflow of text mining in cancer systems biology and each phase of the workflow. We hope that this review can (i) provide a useful overview of the current work of this field; (ii) help researchers to choose text mining tools and datasets; and (iii) highlight how to apply text mining to assist cancer systems biology research. Copyright © 2012 Elsevier Inc. All rights reserved.

  12. Service mining framework and application

    CERN Document Server

    Chang, Wei-Lun

    2014-01-01

    The shifting focus of service from the 1980s to 2000s has proved that IT not only lowers the cost of service but creates avenues to enhance and increase revenue through service. The new type of service, e-service, is mobile, flexible, interactive, and interchangeable. While service science provides an avenue for future service researches, the specific research areas from the IT perspective still need to be elaborated. This book introduces a novel concept-service mining-to address several research areas from technology, model, management, and application perspectives. Service mining is defined as "a systematical process including service discovery, service experience, service recovery, and service retention to discover unique patterns and exceptional values within the existing services." The goal of service mining is similar to data mining, text mining, or web mining, and aims to "detect something new" from the service pool. The major difference is the feature of service is quite distinct from the mining targe...

  13. Mathematics of an automatic control system for ventilation of gassy coal mines

    Energy Technology Data Exchange (ETDEWEB)

    Puchkov, L.A.; Bakhvalov, L.A.; Kravchenko, A.G.

    1987-09-01

    Describes and presents a circuit diagram of an automatic control system introduced to control ventilation in the Kommunist mine belonging to the Oktyabr'ugol' coal mining association. The system comprises: sensors to register the parameters of the mine atmosphere (e.g. methane and air flow rate); communications channels and remote control devices to convert and transmit the data; a CM-4 computer with a high-speed processor, an 128-256 kByte operating memory, external memory devices, polydiaphragm air flow controllers, devices for controlling the electric drive of the main ventilation system, devices for collecting, processing and displaying the data. This system uses two groups of algorithms: algorithms for a data subsystem responsible for centralized control of the mine atmosphere parameters and a control subsystem which forms and implements the necessary control commands. The main software is the DISMAIN program. Introducing this system increased the productivity of the mine by 2%, reduced energy consumption by 5-7% and increased safety levels. 2 refs

  14. Landsat-Based Long-Term Monitoring of Total Suspended Matter Concentration Pattern Change in the Wet Season for Dongting Lake, China

    Directory of Open Access Journals (Sweden)

    Zhubin Zheng

    2015-10-01

    Full Text Available Assessing the impacts of environmental change and anthropogenic activities on the historical and current total suspended matter (TSM pattern in Dongting Lake, China, is a large challenge. We addressed this challenge by using more than three decades of Landsat data. Based on in situ measurements, we developed an algorithm based on the near-infrared (NIR band to estimate TSM in Dongting Lake. The algorithm was applied to Landsat images to derive TSM distribution maps from 1978 to 2013 in the wet season, revealing significant inter-annual and spatial variability. The relationship of TSM to water level, precipitation, and wind speed was analyzed, and we found that: (1 sand mining areas usually coincide with regions that have high TSM levels in Dongting Lake; (2 water level and seven-day precipitation were both important to TSM variation, but no significant relationship was found between TSM and wind speed or other meteorological data; (3 the increased level of sand mining in response to rapid economic growth has deeply influenced the TSM pattern since 2000 due to the resuspension of sediment; and (4 TSM variation might be associated with policy changes regarding the management of sand mining; it might also be affected by lower water levels caused by the impoundment of the Three Gorges Dam since 2000.

  15. KID - an algorithm for fast and efficient text mining used to automatically generate a database containing kinetic information of enzymes

    Directory of Open Access Journals (Sweden)

    Schomburg Dietmar

    2010-07-01

    Full Text Available Abstract Background The amount of available biological information is rapidly increasing and the focus of biological research has moved from single components to networks and even larger projects aiming at the analysis, modelling and simulation of biological networks as well as large scale comparison of cellular properties. It is therefore essential that biological knowledge is easily accessible. However, most information is contained in the written literature in an unstructured way, so that methods for the systematic extraction of knowledge directly from the primary literature have to be deployed. Description Here we present a text mining algorithm for the extraction of kinetic information such as KM, Ki, kcat etc. as well as associated information such as enzyme names, EC numbers, ligands, organisms, localisations, pH and temperatures. Using this rule- and dictionary-based approach, it was possible to extract 514,394 kinetic parameters of 13 categories (KM, Ki, kcat, kcat/KM, Vmax, IC50, S0.5, Kd, Ka, t1/2, pI, nH, specific activity, Vmax/KM from about 17 million PubMed abstracts and combine them with other data in the abstract. A manual verification of approx. 1,000 randomly chosen results yielded a recall between 51% and 84% and a precision ranging from 55% to 96%, depending of the category searched. The results were stored in a database and are available as "KID the KInetic Database" via the internet. Conclusions The presented algorithm delivers a considerable amount of information and therefore may aid to accelerate the research and the automated analysis required for today's systems biology approaches. The database obtained by analysing PubMed abstracts may be a valuable help in the field of chemical and biological kinetics. It is completely based upon text mining and therefore complements manually curated databases. The database is available at http://kid.tu-bs.de. The source code of the algorithm is provided under the GNU General Public

  16. Data Mining at NASA: From Theory to Applications

    Science.gov (United States)

    Srivastava, Ashok N.

    2009-01-01

    This slide presentation demonstrates the data mining/machine learning capabilities of NASA Ames and Intelligent Data Understanding (IDU) group. This will encompass the work done recently in the group by various group members. The IDU group develops novel algorithms to detect, classify, and predict events in large data streams for scientific and engineering systems. This presentation for Knowledge Discovery and Data Mining 2009 is to demonstrate the data mining/machine learning capabilities of NASA Ames and IDU group. This will encompass the work done re cently in the group by various group members.

  17. Granular-relational data mining how to mine relational data in the paradigm of granular computing ?

    CERN Document Server

    Hońko, Piotr

    2017-01-01

    This book provides two general granular computing approaches to mining relational data, the first of which uses abstract descriptions of relational objects to build their granular representation, while the second extends existing granular data mining solutions to a relational case. Both approaches make it possible to perform and improve popular data mining tasks such as classification, clustering, and association discovery. How can different relational data mining tasks best be unified? How can the construction process of relational patterns be simplified? How can richer knowledge from relational data be discovered? All these questions can be answered in the same way: by mining relational data in the paradigm of granular computing! This book will allow readers with previous experience in the field of relational data mining to discover the many benefits of its granular perspective. In turn, those readers familiar with the paradigm of granular computing will find valuable insights on its application to mining r...

  18. Optimizing Fukushima Emissions Through Pattern Matching and Genetic Algorithms

    Science.gov (United States)

    Lucas, D. D.; Simpson, M. D.; Philip, C. S.; Baskett, R.

    2017-12-01

    Hazardous conditions during the Fukushima Daiichi nuclear power plant (NPP) accident hindered direct observations of the emissions of radioactive materials into the atmosphere. A wide range of emissions are estimated from bottom-up studies using reactor inventories and top-down approaches based on inverse modeling. We present a new inverse modeling estimate of cesium-137 emitted from the Fukushima NPP. Our estimate considers weather uncertainty through a large ensemble of Weather Research and Forecasting model simulations and uses the FLEXPART atmospheric dispersion model to transport and deposit cesium. The simulations are constrained by observations of the spatial distribution of cumulative cesium deposited on the surface of Japan through April 2, 2012. Multiple spatial metrics are used to quantify differences between observed and simulated deposition patterns. In order to match the observed pattern, we use a multi-objective genetic algorithm to optimize the time-varying emissions. We find that large differences with published bottom-up estimates are required to explain the observations. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

  19. DATA MINING TECHNIQUES FOR EDUCATIONAL DATA: A REVIEW

    OpenAIRE

    Pragati Sharma; Dr. Sanjiv Sharma

    2018-01-01

    Recently, data mining is gaining more popularity among researcher. Data mining provides various techniques and methods for analysing data produced by various applications of different domain. Similarly, Educational mining is providing a way for analyzing educational data set. Educational mining concerns with developing methods for discovering knowledge from data that come from educational field and it helps to extract the hidden patterns and to discover new knowledge from large educational da...

  20. Prediction and Analysis of students Behavior using BARC Algorithm

    OpenAIRE

    M.Sindhuja; Dr.S.Rajalakshmi; S.M.Nandagopal

    2013-01-01

    Educational Data mining is a recent trends where data mining methods are experimented for the improvement of student performance in academics. The work describes the mining of higher education students’ related attributes such as behavior, attitude and relationship. The data were collected from a higher education institution in terms of the mentioned attributes. The proposed work explored Behavior Attitude Relationship Clustering (BARC) Algorithm, which showed the improvement in students’ per...

  1. A text-based data mining and toxicity prediction modeling system for a clinical decision support in radiation oncology: A preliminary study

    Science.gov (United States)

    Kim, Kwang Hyeon; Lee, Suk; Shim, Jang Bo; Chang, Kyung Hwan; Yang, Dae Sik; Yoon, Won Sup; Park, Young Je; Kim, Chul Yong; Cao, Yuan Jie

    2017-08-01

    The aim of this study is an integrated research for text-based data mining and toxicity prediction modeling system for clinical decision support system based on big data in radiation oncology as a preliminary research. The structured and unstructured data were prepared by treatment plans and the unstructured data were extracted by dose-volume data image pattern recognition of prostate cancer for research articles crawling through the internet. We modeled an artificial neural network to build a predictor model system for toxicity prediction of organs at risk. We used a text-based data mining approach to build the artificial neural network model for bladder and rectum complication predictions. The pattern recognition method was used to mine the unstructured toxicity data for dose-volume at the detection accuracy of 97.9%. The confusion matrix and training model of the neural network were achieved with 50 modeled plans (n = 50) for validation. The toxicity level was analyzed and the risk factors for 25% bladder, 50% bladder, 20% rectum, and 50% rectum were calculated by the artificial neural network algorithm. As a result, 32 plans could cause complication but 18 plans were designed as non-complication among 50 modeled plans. We integrated data mining and a toxicity modeling method for toxicity prediction using prostate cancer cases. It is shown that a preprocessing analysis using text-based data mining and prediction modeling can be expanded to personalized patient treatment decision support based on big data.

  2. Data preprocessing in data mining

    CERN Document Server

    García, Salvador; Herrera, Francisco

    2015-01-01

    Data Preprocessing for Data Mining addresses one of the most important issues within the well-known Knowledge Discovery from Data process. Data directly taken from the source will likely have inconsistencies, errors or most importantly, it is not ready to be considered for a data mining process. Furthermore, the increasing amount of data in recent science, industry and business applications, calls to the requirement of more complex tools to analyze it. Thanks to data preprocessing, it is possible to convert the impossible into possible, adapting the data to fulfill the input demands of each data mining algorithm. Data preprocessing includes the data reduction techniques, which aim at reducing the complexity of the data, detecting or removing irrelevant and noisy elements from the data. This book is intended to review the tasks that fill the gap between the data acquisition from the source and the data mining process. A comprehensive look from a practical point of view, including basic concepts and surveying t...

  3. A Tutorial on Nonlinear Time-Series Data Mining in Engineering Asset Health and Reliability Prediction: Concepts, Models, and Algorithms

    Directory of Open Access Journals (Sweden)

    Ming Dong

    2010-01-01

    Full Text Available The primary objective of engineering asset management is to optimize assets service delivery potential and to minimize the related risks and costs over their entire life through the development and application of asset health and usage management in which the health and reliability prediction plays an important role. In real-life situations where an engineering asset operates under dynamic operational and environmental conditions, the lifetime of an engineering asset is generally described as monitored nonlinear time-series data and subject to high levels of uncertainty and unpredictability. It has been proved that application of data mining techniques is very useful for extracting relevant features which can be used as parameters for assets diagnosis and prognosis. In this paper, a tutorial on nonlinear time-series data mining in engineering asset health and reliability prediction is given. Besides that an overview on health and reliability prediction techniques for engineering assets is covered, this tutorial will focus on concepts, models, algorithms, and applications of hidden Markov models (HMMs and hidden semi-Markov models (HSMMs in engineering asset health prognosis, which are representatives of recent engineering asset health prediction techniques.

  4. Mining Hesitation Information by Vague Association Rules

    Science.gov (United States)

    Lu, An; Ng, Wilfred

    In many online shopping applications, such as Amazon and eBay, traditional Association Rule (AR) mining has limitations as it only deals with the items that are sold but ignores the items that are almost sold (for example, those items that are put into the basket but not checked out). We say that those almost sold items carry hesitation information, since customers are hesitating to buy them. The hesitation information of items is valuable knowledge for the design of good selling strategies. However, there is no conceptual model that is able to capture different statuses of hesitation information. Herein, we apply and extend vague set theory in the context of AR mining. We define the concepts of attractiveness and hesitation of an item, which represent the overall information of a customer's intent on an item. Based on the two concepts, we propose the notion of Vague Association Rules (VARs). We devise an efficient algorithm to mine the VARs. Our experiments show that our algorithm is efficient and the VARs capture more specific and richer information than do the traditional ARs.

  5. Towards the generic framework for utility considerations in data mining research

    NARCIS (Netherlands)

    Puuronen, S.; Pechenizkiy, M.; Soares, C.; Ghani, R.

    2010-01-01

    Rigor data mining (DM) research has successfully developed advanced data mining techniques and algorithms, and many organizations have great expectations to take more benefit of their vast data warehouses in decision making. Even when there are some success stories the current status in practice is

  6. Mapping Changes in a Recovering Mine Site with Hyperspectral Airborne HyMap Imagery (Sotiel, SW Spain

    Directory of Open Access Journals (Sweden)

    Jorge Buzzi

    2014-04-01

    Full Text Available Hyperspectral high spatial resolution HyMap data are used to map mine waste from massive sulfide ore deposits, mostly abandoned, on the Iberian Pyrite Belt (southwest Spain. Mine dams, mill tailings and mine dumps in variable states of pyrite oxidation are recognizable. The interpretation of hyperspectral remote sensing requires specific algorithms able to manage high dimensional data compared to multispectral data. The routine of image processing methods used to extract information from hyperspectral data to map geological features is explained, as well as the sequence of algorithms used to produce maps of the mine sites. The mineralogical identification capability of algorithms to produce maps based on archive spectral libraries is discussed. Trends of mineral growth differ spectrally over time according to the geological setting and the recovery state of the mine site. Subtle mineralogical changes are enhanced using the spectral response as indicators of pyrite oxidation intensity of the mine waste piles and pyrite mud tailings. The changes in the surface of the mill tailings deserve a detailed description, as the surfaces are inaccessible to direct observation. Such mineralogical changes respond faithfully to industrial activities or the influence of climate when undisturbed by human influence.

  7. Classification algorithm of Web document in ionization radiation

    International Nuclear Information System (INIS)

    Geng Zengmin; Liu Wanchun

    2005-01-01

    Resources in the Internet is numerous. It is one of research directions of Web mining (WM) how to mine the resource of some calling or trade more efficiently. The paper studies the classification of Web document in ionization radiation (IR) based on the algorithm of Bayes, Rocchio, Widrow-Hoff, and analyses the result of trial effect. (authors)

  8. Directory of Open Access Journals (Sweden)

    Bhawna Mallick

    2013-04-01

    Full Text Available Sequential pattern mining is a vital data mining task to discover the frequently occurring patterns in sequence databases. As databases develop, the problem of maintaining sequential patterns over an extensively long period of time turn into essential, since a large number of new records may be added to a database. To reflect the current state of the database where previous sequential patterns would become irrelevant and new sequential patterns might appear, there is a need for efficient algorithms to update, maintain and manage the information discovered. Several efficient algorithms for maintaining sequential patterns have been developed. Here, we have presented an efficient algorithm to handle the maintenance problem of CFM-sequential patterns (Compact, Frequent, Monetary-constraints based sequential patterns. In order to efficiently capture the dynamic nature of data addition and deletion into the mining problem, initially, we construct the updated CFM-tree using the CFM patterns obtained from the static database. Then, the database gets updated from the distributed sources that have data which may be static, inserted, or deleted. Whenever the database is updated from the multiple sources, CFM tree is also updated by including the updated sequence. Then, the updated CFM-tree is used to mine the progressive CFM-patterns using the proposed tree pattern mining algorithm. Finally, the experimentation is carried out using the synthetic and real life distributed databases that are given to the progressive CFM-miner. The experimental results and analysis provides better results in terms of the generated number of sequential patterns, execution time and the memory usage over the existing IncSpan algorithm.

  9. Study of acid mine drainage management with evaluating climate and rainfall in East Pit 3 West Banko coal mine

    Science.gov (United States)

    Rochyani, Neny

    2017-11-01

    Acid mine drainage is a major problem for the mining environment. The main factor that formed acid mine drainage is the volume of rainfall. Therefore, it is important to know clearly the main climate pattern of rainfall and season on the management of acid mine drainage. This study focuses on the effects of rainfall on acid mine water management. Based on daily rainfall data, monthly and seasonal patterns by using Gumbel approach is known the amount of rainfall that occurred in East Pit 3 West Banko area. The data also obtained the highest maximum daily rainfall on 165 mm/day and the lowest at 76.4 mm/day, where it is known that the rainfall conditions during the period 2007 - 2016 is from November to April so the use of lime is also slightly, While the low rainfall is from May to October and the use of lime will be more and more. Based on calculation of lime requirement for each return period, it can be seen the total of lime and financial requirement for treatment of each return period.

  10. Application of affinity propagation algorithm based on manifold distance for transformer PD pattern recognition

    Science.gov (United States)

    Wei, B. G.; Huo, K. X.; Yao, Z. F.; Lou, J.; Li, X. Y.

    2018-03-01

    It is one of the difficult problems encountered in the research of condition maintenance technology of transformers to recognize partial discharge (PD) pattern. According to the main physical characteristics of PD, three models of oil-paper insulation defects were set up in laboratory to study the PD of transformers, and phase resolved partial discharge (PRPD) was constructed. By using least square method, the grey-scale images of PRPD were constructed and features of each grey-scale image were 28 box dimensions and 28 information dimensions. Affinity propagation algorithm based on manifold distance (AP-MD) for transformers PD pattern recognition was established, and the data of box dimension and information dimension were clustered based on AP-MD. Study shows that clustering result of AP-MD is better than the results of affinity propagation (AP), k-means and fuzzy c-means algorithm (FCM). By choosing different k values of k-nearest neighbor, we find clustering accuracy of AP-MD falls when k value is larger or smaller, and the optimal k value depends on sample size.

  11. Application-Specific Graph Sampling for Frequent Subgraph Mining and Community Detection

    Energy Technology Data Exchange (ETDEWEB)

    Purohit, Sumit; Choudhury, Sutanay; Holder, Lawrence B.

    2017-12-11

    Graph mining is an important data analysis methodology, but struggles as the input graph size increases. The scalability and usability challenges posed by such large graphs make it imperative to sample the input graph and reduce its size. The critical challenge in sampling is to identify the appropriate algorithm to insure the resulting analysis does not suffer heavily from the data reduction. Predicting the expected performance degradation for a given graph and sampling algorithm is also useful. In this paper, we present different sampling approaches for graph mining applications such as Frequent Subgrpah Mining (FSM), and Community Detection (CD). We explore graph metrics such as PageRank, Triangles, and Diversity to sample a graph and conclude that for heterogeneous graphs Triangles and Diversity perform better than degree based metrics. We also present two new sampling variations for targeted graph mining applications. We present empirical results to show that knowledge of the target application, along with input graph properties can be used to select the best sampling algorithm. We also conclude that performance degradation is an abrupt, rather than gradual phenomena, as the sample size decreases. We present the empirical results to show that the performance degradation follows a logistic function.

  12. In situ solution mining technique

    International Nuclear Information System (INIS)

    Learmont, R.P.

    1978-01-01

    A method of in situ solution mining is disclosed in which a primary leaching process employing an array of 5-spot leaching patterns of production and injection wells is converted to a different pattern by converting to injection wells all the production wells in alternate rows

  13. Community Clustering Algorithm in Complex Networks Based on Microcommunity Fusion

    Directory of Open Access Journals (Sweden)

    Jin Qi

    2015-01-01

    Full Text Available With the further research on physical meaning and digital features of the community structure in complex networks in recent years, the improvement of effectiveness and efficiency of the community mining algorithms in complex networks has become an important subject in this area. This paper puts forward a concept of the microcommunity and gets final mining results of communities through fusing different microcommunities. This paper starts with the basic definition of the network community and applies Expansion to the microcommunity clustering which provides prerequisites for the microcommunity fusion. The proposed algorithm is more efficient and has higher solution quality compared with other similar algorithms through the analysis of test results based on network data set.

  14. Optimization of Boiling Water Reactor Loading Pattern Using Two-Stage Genetic Algorithm

    International Nuclear Information System (INIS)

    Kobayashi, Yoko; Aiyoshi, Eitaro

    2002-01-01

    A new two-stage optimization method based on genetic algorithms (GAs) using an if-then heuristic rule was developed to generate optimized boiling water reactor (BWR) loading patterns (LPs). In the first stage, the LP is optimized using an improved GA operator. In the second stage, an exposure-dependent control rod pattern (CRP) is sought using GA with an if-then heuristic rule. The procedure of the improved GA is based on deterministic operators that consist of crossover, mutation, and selection. The handling of the encoding technique and constraint conditions by that GA reflects the peculiar characteristics of the BWR. In addition, strategies such as elitism and self-reproduction are effectively used in order to improve the search speed. The LP evaluations were performed with a three-dimensional diffusion code that coupled neutronic and thermal-hydraulic models. Strong axial heterogeneities and constraints dependent on three dimensions have always necessitated the use of three-dimensional core simulators for BWRs, so that optimization of computational efficiency is required. The proposed algorithm is demonstrated by successfully generating LPs for an actual BWR plant in two phases. One phase is only LP optimization applying the Haling technique. The other phase is an LP optimization that considers the CRP during reactor operation. In test calculations, candidates that shuffled fresh and burned fuel assemblies within a reasonable computation time were obtained

  15. Data mining algorithms for land cover change detection: a review

    Indian Academy of Sciences (India)

    Sangram Panigrahi

    2017-11-24

    Nov 24, 2017 ... values, poor quality measurement, high resolution and high dimensional data. The land cover .... These data sets also include quality assurance information, ...... 2012 A new data mining framework for forest fire mapping.

  16. Pattern recognition applied to uranium prospecting

    Energy Technology Data Exchange (ETDEWEB)

    Briggs, P L; Press, F [Massachusetts Inst. of Tech., Cambridge (USA). Dept. of Earth and Planetary Sciences

    1977-07-14

    It is stated that pattern recognition techniques provide one way of combining quantitative and descriptive geological data for mineral prospecting. A quantified decision process using computer-selected patterns of geological data has the potential for selecting areas with undiscovered deposits of uranium or other minerals. When a natural resource is mined more rapidly than it is discovered, its continued production becomes increasingly difficult, and it has been noted that, although a considerable uranium reserve may remain in the U.S.A., the discovery rate for uranium is decreasing exponentially with cumulative exploration footage drilled. Pattern recognition methods of organising geological information for prospecting may provide new predictive power, as well as insight into the occurrence of uranium ore deposits. Often the task of prospecting consists of three stages of information processing: (1) collection of data on known ore deposits; (2) noting any regularities common to the known examples of an ore; (3) selection of new exploration targets based on the results of the second stage. A logical pattern recognition algorithm is here described that implements this geological procedure to demonstrate the possibility of building a quantified uranium prospecting guide from diverse geologic data.

  17. Efficiently Hiding Sensitive Itemsets with Transaction Deletion Based on Genetic Algorithms

    Directory of Open Access Journals (Sweden)

    Chun-Wei Lin

    2014-01-01

    Full Text Available Data mining is used to mine meaningful and useful information or knowledge from a very large database. Some secure or private information can be discovered by data mining techniques, thus resulting in an inherent risk of threats to privacy. Privacy-preserving data mining (PPDM has thus arisen in recent years to sanitize the original database for hiding sensitive information, which can be concerned as an NP-hard problem in sanitization process. In this paper, a compact prelarge GA-based (cpGA2DT algorithm to delete transactions for hiding sensitive itemsets is thus proposed. It solves the limitations of the evolutionary process by adopting both the compact GA-based (cGA mechanism and the prelarge concept. A flexible fitness function with three adjustable weights is thus designed to find the appropriate transactions to be deleted in order to hide sensitive itemsets with minimal side effects of hiding failure, missing cost, and artificial cost. Experiments are conducted to show the performance of the proposed cpGA2DT algorithm compared to the simple GA-based (sGA2DT algorithm and the greedy approach in terms of execution time and three side effects.

  18. Data mining applications in the context of casemix.

    Science.gov (United States)

    Koh, H C; Leong, S K

    2001-07-01

    In October 1999, the Singapore Government introduced casemix-based funding to public hospitals. The casemix approach to health care funding is expected to yield significant benefits, including equity and rationality in financing health care, the use of comparative casemix data for quality improvement activities, and the provision of information that enables hospitals to understand their cost behaviour and reinforces the drive for more cost-efficient services. However, there is some concern about the "quicker and sicker" syndrome (that is, the rapid discharge of patients with little regard for the quality of outcome). As it is likely that consequences of premature discharges will be reflected in the readmission data, an analysis of possible systematic patterns in readmission data can provide useful insight into the "quicker and sicker" syndrome. This paper explores potential data mining applications in the context of casemix by using readmission data as an illustration. In particular, it illustrates how data mining can be used to better understand readmission data and to detect systematic patterns, if any. From a technical perspective, data mining (which is capable of analysing complex non-linear and interaction relationships) supplements and complements traditional statistical methods in data analysis. From an applications perspective, data mining provides the technology and methodology to analyse mass volume of data to detect hidden patterns in data. Using readmission data as an illustrative data mining application, this paper explores potential data mining applications in the general casemix context.

  19. Applying Fuzzy Logic and Data Mining Techniques in Wireless Sensor Network for Determination Residential Fire Confidence

    Directory of Open Access Journals (Sweden)

    Mirjana Maksimović

    2014-09-01

    Full Text Available The main goal of soft computing technologies (fuzzy logic, neural networks, fuzzy rule-based systems, data mining techniques… is to find and describe the structural patterns in the data in order to try to explain connections between data and on their basis create predictive or descriptive models. Integration of these technologies in sensor nodes seems to be a good idea because it can significantly lead to network performances improvements, above all to reduce the energy consumption and enhance the lifetime of the network. The purpose of this paper is to analyze different algorithms in the case of fire confidence determination in order to see which of the methods and parameter values work best for the given problem. Hence, an analysis between different classification algorithms in a case of nominal and numerical d

  20. Air Pollution Monitoring and Mining Based on Sensor Grid in London.

    Science.gov (United States)

    Ma, Yajie; Richards, Mark; Ghanem, Moustafa; Guo, Yike; Hassard, John

    2008-06-01

    In this paper, we present a distributed infrastructure based on wireless sensors network and Grid computing technology for air pollution monitoring and mining, which aims to develop low-cost and ubiquitous sensor networks to collect real-time, large scale and comprehensive environmental data from road traffic emissions for air pollution monitoring in urban environment. The main informatics challenges in respect to constructing the high-throughput sensor Grid are discussed in this paper. We present a twolayer network framework, a P2P e-Science Grid architecture, and the distributed data mining algorithm as the solutions to address the challenges. We simulated the system in TinyOS to examine the operation of each sensor as well as the networking performance. We also present the distributed data mining result to examine the effectiveness of the algorithm.

  1. Data analysis and pattern recognition in multiple databases

    CERN Document Server

    Adhikari, Animesh; Pedrycz, Witold

    2014-01-01

    Pattern recognition in data is a well known classical problem that falls under the ambit of data analysis. As we need to handle different data, the nature of patterns, their recognition and the types of data analyses are bound to change. Since the number of data collection channels increases in the recent time and becomes more diversified, many real-world data mining tasks can easily acquire multiple databases from various sources. In these cases, data mining becomes more challenging for several essential reasons. We may encounter sensitive data originating from different sources - those cannot be amalgamated. Even if we are allowed to place different data together, we are certainly not able to analyse them when local identities of patterns are required to be retained. Thus, pattern recognition in multiple databases gives rise to a suite of new, challenging problems different from those encountered before. Association rule mining, global pattern discovery, and mining patterns of select items provide different...

  2. Hemodynamic and oxygen transport patterns for outcome prediction, therapeutic goals, and clinical algorithms to improve outcome. Feasibility of artificial intelligence to customize algorithms.

    Science.gov (United States)

    Shoemaker, W C; Patil, R; Appel, P L; Kram, H B

    1992-11-01

    A generalized decision tree or clinical algorithm for treatment of high-risk elective surgical patients was developed from a physiologic model based on empirical data. First, a large data bank was used to do the following: (1) describe temporal hemodynamic and oxygen transport patterns that interrelate cardiac, pulmonary, and tissue perfusion functions in survivors and nonsurvivors; (2) define optimal therapeutic goals based on the supranormal oxygen transport values of high-risk postoperative survivors; (3) compare the relative effectiveness of alternative therapies in a wide variety of clinical and physiologic conditions; and (4) to develop criteria for titration of therapy to the endpoints of the supranormal optimal goals using cardiac index (CI), oxygen delivery (DO2), and oxygen consumption (VO2) as proxy outcome measures. Second, a general purpose algorithm was generated from these data and tested in preoperatively randomized clinical trials of high-risk surgical patients. Improved outcome was demonstrated with this generalized algorithm. The concept that the supranormal values represent compensations that have survival value has been corroborated by several other groups. We now propose a unique approach to refine the generalized algorithm to develop customized algorithms and individualized decision analysis for each patient's unique problems. The present article describes a preliminary evaluation of the feasibility of artificial intelligence techniques to accomplish individualized algorithms that may further improve patient care and outcome.

  3. Mining known attack patterns from security-related events

    Directory of Open Access Journals (Sweden)

    Nicandro Scarabeo

    2015-10-01

    Full Text Available Managed Security Services (MSS have become an essential asset for companies to have in order to protect their infrastructure from hacking attempts such as unauthorized behaviour, denial of service (DoS, malware propagation, and anomalies. A proliferation of attacks has determined the need for installing more network probes and collecting more security-related events in order to assure the best coverage, necessary for generating incident responses. The increase in volume of data to analyse has created a demand for specific tools that automatically correlate events and gather them in pre-defined scenarios of attacks. Motivated by Above Security, a specialized company in the sector, and by National Research Council Canada (NRC, we propose a new data mining system that employs text mining techniques to dynamically relate security-related events in order to reduce analysis time, increase the quality of the reports, and automatically build correlated scenarios.

  4. A survey on Big Data Stream Mining

    African Journals Online (AJOL)

    pc

    2018-03-05

    Mar 5, 2018 ... huge amount of stream like telecommunication systems. So, there ... streams have many challenges for data mining algorithm design like using of ..... A. Bifet and R. Gavalda, "Learning from Time-Changing Data with. Adaptive ...

  5. Measures to restore metallurgical mine wasteland using ecological restoration technologies: A case study at Longnan Rare Earth Mine

    Science.gov (United States)

    Rao, Yunzhang; Gu, Ruizhi; Guo, Ruikai; Zhang, Xueyan

    2017-01-01

    Whereas mining activities produce the raw materials that are crucial to economic growth, such activities leave extensive scarring on the land, contributing to the waste of valuable land resources and upsetting the ecological environment. The aim of this study is therefore to investigate various ecological technologies to restore metallurgical mine wastelands. These technologies include measures such as soil amelioration, vegetation restoration, different vegetation planting patterns, and engineering technologies. The Longnan Rare Earth Mine in the Jiangxi Province of China is used as the case study. The ecological restoration process provides a favourable reference for the restoration of a metallurgical mine wasteland.

  6. Simulation of small-angle scattering patterns using a CPU-efficient algorithm

    Science.gov (United States)

    Anitas, E. M.

    2017-12-01

    Small-angle scattering (of neutrons, x-ray or light; SAS) is a well-established experimental technique for structural analysis of disordered systems at nano and micro scales. For complex systems, such as super-molecular assemblies or protein molecules, analytic solutions of SAS intensity are generally not available. Thus, a frequent approach to simulate the corresponding patterns is to use a CPU-efficient version of the Debye formula. For this purpose, in this paper we implement the well-known DALAI algorithm in Mathematica software. We present calculations for a series of 2D Sierpinski gaskets and respectively of pentaflakes, obtained from chaos game representation.

  7. Machine Learning Algorithms for Statistical Patterns in Large Data Sets

    Science.gov (United States)

    2018-02-01

    SUBJECT TERMS Text Analysis, Text Exploitation, Situation Awareness of Text , Document Processing, Document Ingestion, Full Text Search, Information...Assortativity: Proclivity Index for Attributed Networks (PRONE).” Pacific-Asia Conference on Knowledge Discovery and Data Mining , 2017. pp. 225-237...international conference on Knowledge discovery and data mining , 2013. pp. 212-220. [18] Sutherland, D.J., Xiong, L., Póczos, B., and Schneider, J

  8. Research of the Occupational Psychological Impact Factors Based on the Frequent Item Mining of the Transactional Database

    Directory of Open Access Journals (Sweden)

    Cheng Dongmei

    2015-01-01

    Full Text Available Based on the massive reading of data mining and association rules mining documents, this paper will start from compressing transactional database and propose the frequent complementary item storage structure of the transactional database. According to the previous analysis, this paper will also study the association rules mining algorithm based on the frequent complementary item storage structure of the transactional database. At last, this paper will apply this mining algorithm in the test results analysis module of team psychological health assessment system, and will extract the relationship between each psychological impact factor, so as to provide certain guidance for psychologists in their mental illness treatment.

  9. Data-Throughput Enhancement Using Data Mining-Informed Cognitive Radio

    Directory of Open Access Journals (Sweden)

    Khashayar Kotobi

    2015-03-01

    Full Text Available We propose the data mining-informed cognitive radio, which uses non-traditional data sources and data-mining techniques for decision making and improving the performance of a wireless network. To date, the application of information other than wireless channel data in cognitive radios has not been significantly studied. We use a novel dataset (Twitter traffic as an indicator of network load in a wireless channel. Using this dataset, we present and test a series of predictive algorithms that show an improvement in wireless channel utilization over traditional collision-detection algorithms. Our results demonstrate the viability of using these novel datasets to inform and create more efficient cognitive radio networks.

  10. Development of pattern recognition algorithms for particles detection from atmospheric images

    International Nuclear Information System (INIS)

    Khatchadourian, S.

    2010-01-01

    The HESS experiment consists of a system of telescopes destined to observe cosmic rays. Since the project has achieved a high level of performances, a second phase of the project has been initiated. This implies the addition of a new telescope which is more sensitive than its predecessors and which is capable of collecting a huge amount of images. In this context, all data collected by the telescope can not be retained because of storage limitations. Therefore, a new real-time system trigger must be designed in order to select interesting events on the fly. The purpose of this thesis was to propose a trigger solution to efficiently discriminate events (images) which are captured by the telescope. The first part of this thesis was to develop pattern recognition algorithms to be implemented within the trigger. A processing chain based on neural networks and Zernike moments has been validated. The second part of the thesis has focused on the implementation of the proposed algorithms onto an FPGA target, taking into account the application constraints in terms of resources and execution time. (author)

  11. Granular neural networks, pattern recognition and bioinformatics

    CERN Document Server

    Pal, Sankar K; Ganivada, Avatharam

    2017-01-01

    This book provides a uniform framework describing how fuzzy rough granular neural network technologies can be formulated and used in building efficient pattern recognition and mining models. It also discusses the formation of granules in the notion of both fuzzy and rough sets. Judicious integration in forming fuzzy-rough information granules based on lower approximate regions enables the network to determine the exactness in class shape as well as to handle the uncertainties arising from overlapping regions, resulting in efficient and speedy learning with enhanced performance. Layered network and self-organizing analysis maps, which have a strong potential in big data, are considered as basic modules,. The book is structured according to the major phases of a pattern recognition system (e.g., classification, clustering, and feature selection) with a balanced mixture of theory, algorithm, and application. It covers the latest findings as well as directions for future research, particularly highlighting bioinf...

  12. Kernel Methods for Mining Instance Data in Ontologies

    Science.gov (United States)

    Bloehdorn, Stephan; Sure, York

    The amount of ontologies and meta data available on the Web is constantly growing. The successful application of machine learning techniques for learning of ontologies from textual data, i.e. mining for the Semantic Web, contributes to this trend. However, no principal approaches exist so far for mining from the Semantic Web. We investigate how machine learning algorithms can be made amenable for directly taking advantage of the rich knowledge expressed in ontologies and associated instance data. Kernel methods have been successfully employed in various learning tasks and provide a clean framework for interfacing between non-vectorial data and machine learning algorithms. In this spirit, we express the problem of mining instances in ontologies as the problem of defining valid corresponding kernels. We present a principled framework for designing such kernels by means of decomposing the kernel computation into specialized kernels for selected characteristics of an ontology which can be flexibly assembled and tuned. Initial experiments on real world Semantic Web data enjoy promising results and show the usefulness of our approach.

  13. Physics Mining of Multi-source Data Sets, Phase I

    Data.gov (United States)

    National Aeronautics and Space Administration — We propose to implement novel physics mining algorithms with analytical capabilities to derive diagnostic and prognostic numerical models from multi-source...

  14. EAGLE: 'EAGLE'Is an' Algorithmic Graph Library for Exploration

    Energy Technology Data Exchange (ETDEWEB)

    2015-01-16

    The Resource Description Framework (RDF) and SPARQL Protocol and RDF Query Language (SPARQL) were introduced about a decade ago to enable flexible schema-free data interchange on the Semantic Web. Today data scientists use the framework as a scalable graph representation for integrating, querying, exploring and analyzing data sets hosted at different sources. With increasing adoption, the need for graph mining capabilities for the Semantic Web has emerged. Today there is no tools to conduct "graph mining" on RDF standard data sets. We address that need through implementation of popular iterative Graph Mining algorithms (Triangle count, Connected component analysis, degree distribution, diversity degree, PageRank, etc.). We implement these algorithms as SPARQL queries, wrapped within Python scripts and call our software tool as EAGLE. In RDF style, EAGLE stands for "EAGLE 'Is an' algorithmic graph library for exploration. EAGLE is like 'MATLAB' for 'Linked Data.'

  15. A Recommendation Algorithm for Automating Corollary Order Generation

    Science.gov (United States)

    Klann, Jeffrey; Schadow, Gunther; McCoy, JM

    2009-01-01

    Manual development and maintenance of decision support content is time-consuming and expensive. We explore recommendation algorithms, e-commerce data-mining tools that use collective order history to suggest purchases, to assist with this. In particular, previous work shows corollary order suggestions are amenable to automated data-mining techniques. Here, an item-based collaborative filtering algorithm augmented with association rule interestingness measures mined suggestions from 866,445 orders made in an inpatient hospital in 2007, generating 584 potential corollary orders. Our expert physician panel evaluated the top 92 and agreed 75.3% were clinically meaningful. Also, at least one felt 47.9% would be directly relevant in guideline development. This automated generation of a rough-cut of corollary orders confirms prior indications about automated tools in building decision support content. It is an important step toward computerized augmentation to decision support development, which could increase development efficiency and content quality while automatically capturing local standards. PMID:20351875

  16. Predicting mining activity with parallel genetic algorithms

    Science.gov (United States)

    Talaie, S.; Leigh, R.; Louis, S.J.; Raines, G.L.; Beyer, H.G.; O'Reilly, U.M.; Banzhaf, Arnold D.; Blum, W.; Bonabeau, C.; Cantu-Paz, E.W.; ,; ,

    2005-01-01

    We explore several different techniques in our quest to improve the overall model performance of a genetic algorithm calibrated probabilistic cellular automata. We use the Kappa statistic to measure correlation between ground truth data and data predicted by the model. Within the genetic algorithm, we introduce a new evaluation function sensitive to spatial correctness and we explore the idea of evolving different rule parameters for different subregions of the land. We reduce the time required to run a simulation from 6 hours to 10 minutes by parallelizing the code and employing a 10-node cluster. Our empirical results suggest that using the spatially sensitive evaluation function does indeed improve the performance of the model and our preliminary results also show that evolving different rule parameters for different regions tends to improve overall model performance. Copyright 2005 ACM.

  17. A comparative analysis of biclustering algorithms for gene expression data

    Science.gov (United States)

    Eren, Kemal; Deveci, Mehmet; Küçüktunç, Onur; Çatalyürek, Ümit V.

    2013-01-01

    The need to analyze high-dimension biological data is driving the development of new data mining methods. Biclustering algorithms have been successfully applied to gene expression data to discover local patterns, in which a subset of genes exhibit similar expression levels over a subset of conditions. However, it is not clear which algorithms are best suited for this task. Many algorithms have been published in the past decade, most of which have been compared only to a small number of algorithms. Surveys and comparisons exist in the literature, but because of the large number and variety of biclustering algorithms, they are quickly outdated. In this article we partially address this problem of evaluating the strengths and weaknesses of existing biclustering methods. We used the BiBench package to compare 12 algorithms, many of which were recently published or have not been extensively studied. The algorithms were tested on a suite of synthetic data sets to measure their performance on data with varying conditions, such as different bicluster models, varying noise, varying numbers of biclusters and overlapping biclusters. The algorithms were also tested on eight large gene expression data sets obtained from the Gene Expression Omnibus. Gene Ontology enrichment analysis was performed on the resulting biclusters, and the best enrichment terms are reported. Our analyses show that the biclustering method and its parameters should be selected based on the desired model, whether that model allows overlapping biclusters, and its robustness to noise. In addition, we observe that the biclustering algorithms capable of finding more than one model are more successful at capturing biologically relevant clusters. PMID:22772837

  18. IMPROVING THE ORGANIZATION OF THE SHOVEL-TRUCK SYSTEMS IN OPEN-PIT COAL MINES

    Directory of Open Access Journals (Sweden)

    Mark KORYAGIN

    2017-06-01

    Full Text Available The aim of the study is to reduce idle times of mining trucks and shovels in an open-pit coal mine. A heuristic algorithm for making dispatching decisions in conditions of dynamic allocation of trucks is developed. Priority parameters for choosing the shovel after the end-of-truck unloading are introduced. Also, an algorithm for searching for the optimal priority parameters to satisfy the required efficiency criterion is developed. This algorithm is based on a simulation model of a shovel-truck system. The proposed approach is applicable in terms of the group of shovels with a common dump point in various open-pit coal mines. The importance of this work lies in the fact that the proposed model takes into account the random factors related with the duration of loading and dumping, truck movement, repair of shovels and haul trucks, as well as the duration of periods between repairs.

  19. Data Mining Learning Models and Algorithms on a Scada System Data Repository

    Directory of Open Access Journals (Sweden)

    Mircea Rîşteiu

    2010-06-01

    Full Text Available This paper presents three data mining techniques applied
    on a SCADA system data repository: Naijve Bayes, k-Nearest Neighbor and Decision Trees. A conclusion that k-Nearest Neighbor is a suitable method to classify the large amount of data considered is made finally according to the mining result and its reasonable explanation. The experiments are built on the training data set and evaluated using the new test set with machine learning tool WEKA.

  20. Standard values of quality and ore mining costs in management of multi-plant mining company

    Energy Technology Data Exchange (ETDEWEB)

    Kudelko, Jan [KGHM CUPRUM Research and Development Center, Wroclaw (Poland); Wirth, Herbert [KGHM Polska Miedz S.A., Lubin (Poland)

    2010-03-15

    Profitability of copper deposit mining depends on three basic variables, electrolytic copper price, manufacturing and selling costs of copper and company property involved in production process. If the company property is adjusted to its tasks then the mining profiability depends on costs of copper mining and selling, because the price is the external variable defined by the market. We can shape the costs in two (complementary) ways, traditionally, reducing the labor, material and power consumption, and by adjusting the quality of mined ore (copper content) to the level required by the current copper prices. Required quality of copper ore in the whole company we determine according to the accepted profitability criteria and then we determine quality standard for individual mines. Algorithms determining the ore quality standard resulting from current market price of copper are presented in the paper. Calculation models for the mined ore quality standards, unit mining costs per one ton of copper, electrolytic copper production and ore output are given. Standards were established for one variable assuming that the other variables are determined in this calculation. Innovative solution, presented in the paper, is the method of decomposition of the company controllable variables into the tasks for individual mines providing reaching the targets to the whole technological circuit. Using the models, having relatively few data, it will be possible to calculate quickly the values which are interesting for managers such as for example the prognosis of rate of return (economic or operational), required copper content in the mined ore for the whole company and individual mines at given rate of return or boundary level of copper content in comparison with cost and production level. Examples of calculation are provided. (orig.)

  1. Predicting Students’ Performance using Modified ID3 Algorithm

    OpenAIRE

    Ramanathan L; Saksham Dhanda; Suresh Kumar D

    2013-01-01

    The ability to predict performance of students is very crucial in our present education system. We can use data mining concepts for this purpose. ID3 algorithm is one of the famous algorithms present today to generate decision trees. But this algorithm has a shortcoming that it is inclined to attributes with many values. So , this research aims to overcome this shortcoming of the algorithm by using gain ratio(instead of information gain) as well as by giving weights to each attribute at every...

  2. Radioactivity measurements in Egyptian phosphate mines and their significance in the occupational exposure of mine workers

    International Nuclear Information System (INIS)

    Bigu, J.; Hussein, Mohamed I.; Hussein, A.Z.

    2000-01-01

    Radioactivity measurements have been conducted in nine underground phosphate mines in the Egyptian Eastern Desert in order to estimate the occupational radiation exposure of mine workers in those mining sites. Measurements were carried out of airborne radon ( 222 Rn) and its short-lived decay products (progeny) and thoron ( 220 Rn) progeny, as well as γ-radiation from mine walls, ceilings and floors. Comparison of experimental data and theoretical predictions showed partial agreement between these two sets of data. This result is partly attributed to the complex layout of these mines which causes undesirable ventilation conditions, such as recirculation airflow patterns which could not be adequately identified or quantified. The radiation data obtained were used to estimate the Maximum Annual Dose (MAD), and other important occupational radiation exposure variables. These calculations indicate that in eight out of the nine mines surveyed, the MAD exceeded (by a factor of up to 7) the maximum recommended level by ICRP-60. A number of suggestions are made in order to reduce the MAD in the affected mines

  3. Simple, Scalable, Script-based, Science Processor for Measurements - Data Mining Edition (S4PM-DME)

    Science.gov (United States)

    Pham, L. B.; Eng, E. K.; Lynnes, C. S.; Berrick, S. W.; Vollmer, B. E.

    2005-12-01

    The S4PM-DME is the Goddard Earth Sciences Distributed Active Archive Center's (GES DAAC) web-based data mining environment. The S4PM-DME replaces the Near-line Archive Data Mining (NADM) system with a better web environment and a richer set of production rules. S4PM-DME enables registered users to submit and execute custom data mining algorithms. The S4PM-DME system uses the GES DAAC developed Simple Scalable Script-based Science Processor for Measurements (S4PM) to automate tasks and perform the actual data processing. A web interface allows the user to access the S4PM-DME system. The user first develops personalized data mining algorithm on his/her home platform and then uploads them to the S4PM-DME system. Algorithms in C and FORTRAN languages are currently supported. The user developed algorithm is automatically audited for any potential security problems before it is installed within the S4PM-DME system and made available to the user. Once the algorithm has been installed the user can promote the algorithm to the "operational" environment. From here the user can search and order the data available in the GES DAAC archive for his/her science algorithm. The user can also set up a processing subscription. The subscription will automatically process new data as it becomes available in the GES DAAC archive. The generated mined data products are then made available for FTP pickup. The benefits of using S4PM-DME are 1) to decrease the downloading time it typically takes a user to transfer the GES DAAC data to his/her system thus off-load the heavy network traffic, 2) to free-up the load on their system, and last 3) to utilize the rich and abundance ocean, atmosphere data from the MODIS and AIRS instruments available from the GES DAAC.

  4. Optimal sampling strategy for data mining

    International Nuclear Information System (INIS)

    Ghaffar, A.; Shahbaz, M.; Mahmood, W.

    2013-01-01

    Latest technology like Internet, corporate intranets, data warehouses, ERP's, satellites, digital sensors, embedded systems, mobiles networks all are generating such a massive amount of data that it is getting very difficult to analyze and understand all these data, even using data mining tools. Huge datasets are becoming a difficult challenge for classification algorithms. With increasing amounts of data, data mining algorithms are getting slower and analysis is getting less interactive. Sampling can be a solution. Using a fraction of computing resources, Sampling can often provide same level of accuracy. The process of sampling requires much care because there are many factors involved in the determination of correct sample size. The approach proposed in this paper tries to find a solution to this problem. Based on a statistical formula, after setting some parameters, it returns a sample size called s ufficient sample size , which is then selected through probability sampling. Results indicate the usefulness of this technique in coping with the problem of huge datasets. (author)

  5. Air Pollution Monitoring and Mining Based on Sensor Grid in London

    Science.gov (United States)

    Ma, Yajie; Richards, Mark; Ghanem, Moustafa; Guo, Yike; Hassard, John

    2008-01-01

    In this paper, we present a distributed infrastructure based on wireless sensors network and Grid computing technology for air pollution monitoring and mining, which aims to develop low-cost and ubiquitous sensor networks to collect real-time, large scale and comprehensive environmental data from road traffic emissions for air pollution monitoring in urban environment. The main informatics challenges in respect to constructing the high-throughput sensor Grid are discussed in this paper. We present a two-layer network framework, a P2P e-Science Grid architecture, and the distributed data mining algorithm as the solutions to address the challenges. We simulated the system in TinyOS to examine the operation of each sensor as well as the networking performance. We also present the distributed data mining result to examine the effectiveness of the algorithm. PMID:27879895

  6. Air Pollution Monitoring and Mining Based on Sensor Grid in London

    Directory of Open Access Journals (Sweden)

    John Hassard

    2008-06-01

    Full Text Available In this paper, we present a distributed infrastructure based on wireless sensors network and Grid computing technology for air pollution monitoring and mining, which aims to develop low-cost and ubiquitous sensor networks to collect real-time, large scale and comprehensive environmental data from road traffic emissions for air pollution monitoring in urban environment. The main informatics challenges in respect to constructing the high-throughput sensor Grid are discussed in this paper. We present a twolayer network framework, a P2P e-Science Grid architecture, and the distributed data mining algorithm as the solutions to address the challenges. We simulated the system in TinyOS to examine the operation of each sensor as well as the networking performance. We also present the distributed data mining result to examine the effectiveness of the algorithm.

  7. Near-line Archive Data Mining at the Goddard Distributed Active Archive Center

    Science.gov (United States)

    Pham, L.; Mack, R.; Eng, E.; Lynnes, C.

    2002-12-01

    NASA's Earth Observing System (EOS) is generating immense volumes of data, in some cases too much to provide to users with data-intensive needs. As an alternative to moving the data to the user and his/her research algorithms, we are providing a means to move the algorithms to the data. The Near-line Archive Data Mining (NADM) system is the Goddard Earth Sciences Distributed Active Archive Center's (GES DAAC) web data mining portal to the EOS Data and Information System (EOSDIS) data pool, a 50-TB online disk cache. The NADM web portal enables registered users to submit and execute data mining algorithm codes on the data in the EOSDIS data pool. A web interface allows the user to access the NADM system. The users first develops personalized data mining code on their home platform and then uploads them to the NADM system. The C, FORTRAN and IDL languages are currently supported. The user developed code is automatically audited for any potential security problems before it is installed within the NADM system and made available to the user. Once the code has been installed the user is provided a test environment where he/she can test the execution of the software against data sets of the user's choosing. When the user is satisfied with the results, he/she can promote their code to the "operational" environment. From here the user can interactively run his/her code on the data available in the EOSDIS data pool. The user can also set up a processing subscription. The subscription will automatically process new data as it becomes available in the EOSDIS data pool. The generated mined data products are then made available for FTP pickup. The NADM system uses the GES DAAC-developed Simple Scalable Script-based Science Processor (S4P) to automate tasks and perform the actual data processing. Users will also have the option of selecting a DAAC-provided data mining algorithm and using it to process the data of their choice.

  8. Automated Assessment of Patients' Self-Narratives for Posttraumatic Stress Disorder Screening Using Natural Language Processing and Text Mining.

    Science.gov (United States)

    He, Qiwei; Veldkamp, Bernard P; Glas, Cees A W; de Vries, Theo

    2017-03-01

    Patients' narratives about traumatic experiences and symptoms are useful in clinical screening and diagnostic procedures. In this study, we presented an automated assessment system to screen patients for posttraumatic stress disorder via a natural language processing and text-mining approach. Four machine-learning algorithms-including decision tree, naive Bayes, support vector machine, and an alternative classification approach called the product score model-were used in combination with n-gram representation models to identify patterns between verbal features in self-narratives and psychiatric diagnoses. With our sample, the product score model with unigrams attained the highest prediction accuracy when compared with practitioners' diagnoses. The addition of multigrams contributed most to balancing the metrics of sensitivity and specificity. This article also demonstrates that text mining is a promising approach for analyzing patients' self-expression behavior, thus helping clinicians identify potential patients from an early stage.

  9. Use of data mining to predict significant factors and benefits of bilateral cochlear implantation.

    Science.gov (United States)

    Ramos-Miguel, Angel; Perez-Zaballos, Teresa; Perez, Daniel; Falconb, Juan Carlos; Ramosb, Angel

    2015-11-01

    Data mining (DM) is a technique used to discover pattern and knowledge from a big amount of data. It uses artificial intelligence, automatic learning, statistics, databases, etc. In this study, DM was successfully used as a predictive tool to assess disyllabic speech test performance in bilateral implanted patients with a success rate above 90%. 60 bilateral sequentially implanted adult patients were included in the study. The DM algorithms developed found correlations between unilateral medical records and Audiological test results and bilateral performance by establishing relevant variables based on two DM techniques: the classifier and the estimation. The nearest neighbor algorithm was implemented in the first case, and the linear regression in the second. The results showed that patients with unilateral disyllabic test results below 70% benefited the most from a bilateral implantation. Finally, it was observed that its benefits decrease as the inter-implant time increases.

  10. Reestablishment of woody plants on mine spoils and management of mine water impoundments: an overview of Forest Service research on the northern High Plain

    Energy Technology Data Exchange (ETDEWEB)

    Bjugstad, A J

    1977-01-01

    The function of the research unit at Rapid city, S. Dakota, is to provide guidelines for the reestablisment of shrubs and trees on land characteristic of the High Plains, and for the mitigation of possible detrimental effects of surface mining on ground water and surface water. One possible problem posed by surface mining concerns the formation of land drainage patterns that could result in post-mining formations of large salt playas. Surface mining could affect shallow ground water aquifers up to /sup 1///sub 4/ mile from the mine site. Research is being conducted on the reclamation of mine spoils and on the rehabilitation and management of impounded mine water.

  11. Landuse change detection in a surface coal mine area using multi-temporal high resolution satellite images

    Energy Technology Data Exchange (ETDEWEB)

    Demirel, N.; Duzgun, S.; Kemal Emil, M. [Middle East Technical Univ., Ankara (Turkey). Dept. of Mining Engineering

    2010-07-01

    Changes in the landcover and landuse of a mine area can be caused by surface mining activities, exploitation of ore and stripping and dumping overburden. In order to identify the long-term impacts of mining on the environment and land cover, these changes must be continuously monitored. A facility to regularly observe the progress of surface mining and reclamation is important for effective enforcement of mining and environmental regulations. Remote sensing provides a powerful tool to obtain rigorous data and reduce the need for time-consuming and expensive field measurements. The purpose of this study was to conduct post classification change detection for identifying, quantifying, and analyzing the spatial response of landscape due to surface lignite coal mining activities in Goynuk, Bolu, Turkey, from 2004 to 2008. The paper presented the research algorithm which involved acquiring multi temporal high resolution satellite data; preprocessing the data; performing image classification using maximum likelihood classification algorithm and performing accuracy assessment on the classification results; performing post classification change detection algorithm; and analyzing the results. Specifically, the paper discussed the study area, data and methodology, and image preprocessing using radiometric correction. Image classification and change detection were also discussed. It was concluded that the mine and dump area decreased by 192.5 ha from 2004 to 2008 and was caused by the diminishing reserves in the area and decline in the required production. 5 refs., 2 tabs., 4 figs.

  12. An improved Pattern Search based algorithm to solve the Dynamic Economic Dispatch problem with valve-point effect

    International Nuclear Information System (INIS)

    Alsumait, J.S.; Qasem, M.; Sykulski, J.K.; Al-Othman, A.K.

    2010-01-01

    In this paper, an improved algorithm based on Pattern Search method (PS) to solve the Dynamic Economic Dispatch is proposed. The algorithm maintains the essential unit ramp rate constraint, along with all other necessary constraints, not only for the time horizon of operation (24 h), but it preserves these constraints through the transaction period to the next time horizon (next day) in order to avoid the discontinuity of the power system operation. The Dynamic Economic and Emission Dispatch problem (DEED) is also considered. The load balance constraints, operating limits, valve-point loading and network losses are included in the models of both DED and DEED. The numerical results clarify the significance of the improved algorithm and verify its performance.

  13. Building a Classification Model for Enrollment In Higher Educational Courses using Data Mining Techniques

    OpenAIRE

    Saini, Priyanka

    2014-01-01

    Data Mining is the process of extracting useful patterns from the huge amount of database and many data mining techniques are used for mining these patterns. Recently, one of the remarkable facts in higher educational institute is the rapid growth data and this educational data is expanding quickly without any advantage to the educational management. The main aim of the management is to refine the education standard; therefore by applying the various data mining techniques on this data one ca...

  14. Ultrabroadband photonic Internet: data mining approach to security aspects

    Science.gov (United States)

    Kalicki, Arkadiusz

    2009-06-01

    Web applications became most popular medium in the Internet. Popularity, easiness of web application frameworks together with careless development results in high number of vulnerabilities and attacks. There are several types of attacks possible because of improper input validation. SQL injection is ability to execute arbitrary SQL queries in a database through an existing application. Cross-site scripting is the vulnerability which allows malicious web users to inject code into the web pages viewed by other users. Cross-Site Request Forgery (CSRF) is an attack that tricks the victim into loading a page that contains malicious request. Web spam in blogs. In order to secure web applications intrusion detection (IDS) and intrusion prevention systems (IPS) are being used. Intrusion detection systems are divided in two groups: misuse detection (traditional IDS) and anomaly detection. Misuse detection systems are signature based, have high accuracy in detecting many kinds of known attacks but cannot detect unknown and emerging attacks. This can be complemented with anomaly based intrusion detection and prevention systems. This paper presents anomaly driven proxy as an IPS and data mining based algorithm which was used to detecting anomalies. The principle of this method is the comparison of the incoming HTTP traffic with a previously built profile that contains a representation of the "normal" or expected web application usage sequence patterns. The frequent sequence patterns are found with GSP algorithm. Some basic tests show that the software catches malicious requests.

  15. Analisis Data Lulusan dengan Data Mining untuk Mendukung Strategi Promosi Universitas Lancang Kuning

    Directory of Open Access Journals (Sweden)

    Elvira Asril

    2015-11-01

    Full Text Available Setiap perusahaan maupun organisasi yang ingin tetap bertahan perlu untuk menentukan strategi promosi yang tepat. Penentuan strategi promosi yang tepat akan dapat mengurangi biaya promosi dan mencapai sasaran promosi yang tepat. Salah satu cara yang dapat dilakukan untuk penentuan strategi promosi adalah dengan menggunakan teknik data mining. Teknik data mining yang digunakan dalam hal ini adalah dengan menggunakan algoritma Clustering K-Means. Clustering merupakan pengelompokkan record, observasi, atau kasus ke dalam kelas-kelas objek yang mirip. K-Means adalah metode klaster data non-hirarkis yang mencoba untuk membagi data ke dalam satu atau lebih klaster. Penelitian dilakukan dengan mengamati beberapa variabel penelitian yang sering dipertimbangkan oleh perguruan tinggi dalam menentukan sasaran promosinya yaitu asal sekolah, daerah, dan jurusan. Hasil penelitian ini adalah berupa pola menarik hasil data mining yang merupakan informasi penting untuk mendukung strategi promosi yang tepat dalam mendapatkan calon mahasiswa baru.Kata kunci: Data Mining, Clustering, K-Means Each company or organization that wants to survive needs to determine appropriate promotional strategies. Determination of appropriate promotional strategies will be able to reduce costs and achieve the goals the promotion of proper promotion. One way that can be done to determine campaign strategy is to use data mining techniques. Data mining techniques used in this case is to use a K-Means clustering algorithm. Clustering is the grouping of records, observation, or in the case of the object classes that are similar. K-Means is a method of non-hierarchical clustering of data that is trying to divide the data into one or more clusters. The study was conducted by observing some of the variables that are often considered by the college in determining the target of promotion that the school of origin, region, and department. Results of this study are interesting pattern of

  16. Seasonal and spatial patterns of metals at a restored copper mine site. I. Stream copper and zinc

    International Nuclear Information System (INIS)

    Bambic, Dustin G.; Alpers, Charles N.; Green, Peter G.; Fanelli, Eileen; Silk, Wendy K.

    2006-01-01

    Seasonal and spatial variations in metal concentrations and pH were found in a stream at a restored copper mine site located near a massive sulfide deposit in the Foothill copper-zinc belt of the Sierra Nevada, California. At the mouth of the stream, copper concentrations increased and pH decreased with increased streamflow after the onset of winter rain and, unexpectedly, reached extreme values 1 or 2 months after peaks in the seasonal hydrographs. In contrast, aqueous zinc and sulfate concentrations were highest during low-flow periods. Spatial variation was assessed in 400 m of reach encompassing an acidic, metal-laden seep. At this seep, pH remained low (2-3) throughout the year, and copper concentrations were highest. In contrast, the zinc concentrations increased with downstream distance. These spatial patterns were caused by immobilization of copper by hydrous ferric oxides in benthic sediments, coupled with increasing downstream supply of zinc from groundwater seepage. - Seasonal hydrology and benthic sediments control copper and zinc concentrations in a stream through a restored mine site

  17. Quantification of differences between nailfold capillaroscopy images with a scleroderma pattern and normal pattern using measures of geometric and algorithmic complexity.

    Science.gov (United States)

    Urwin, Samuel George; Griffiths, Bridget; Allen, John

    2017-02-01

    This study aimed to quantify and investigate differences in the geometric and algorithmic complexity of the microvasculature in nailfold capillaroscopy (NFC) images displaying a scleroderma pattern and those displaying a 'normal' pattern. 11 NFC images were qualitatively classified by a capillary specialist as indicative of 'clear microangiopathy' (CM), i.e. a scleroderma pattern, and 11 as 'not clear microangiopathy' (NCM), i.e. a 'normal' pattern. Pre-processing was performed, and fractal dimension (FD) and Kolmogorov complexity (KC) were calculated following image binarisation. FD and KC were compared between groups, and a k-means cluster analysis (n  =  2) on all images was performed, without prior knowledge of the group assigned to them (i.e. CM or NCM), using FD and KC as inputs. CM images had significantly reduced FD and KC compared to NCM images, and the cluster analysis displayed promising results that the quantitative classification of images into CM and NCM groups is possible using the mathematical measures of FD and KC. The analysis techniques used show promise for quantitative microvascular investigation in patients with systemic sclerosis.

  18. Moment Tensor Inversion with 3D sensor configuration of Mining Induced Seismicity (Kiruna mine, Sweden)

    Science.gov (United States)

    Ma, Ju; Dineva, Savka; Cesca, Simone; Heimann, Sebastian

    2018-03-01

    Mining induced seismicity is an undesired consequence of mining operations, which poses significant hazard to miners and infrastructures and requires an accurate analysis of the rupture process. Seismic moment tensors of mining-induced events help to understand the nature of mining-induced seismicity by providing information about the relationship between the mining, stress redistribution and instabilities in the rock mass. In this work, we adapt and test a waveform-based inversion method on high frequency data recorded by a dense underground seismic system in one of the largest underground mines in the world (Kiruna mine, Sweden). Stable algorithm for moment tensor inversion for comparatively small mining induced earthquakes, resolving both the double couple and full moment tensor with high frequency data is very challenging. Moreover, the application to underground mining system requires accounting for the 3D geometry of the monitoring system. We construct a Green's function database using a homogeneous velocity model, but assuming a 3D distribution of potential sources and receivers. We first perform a set of moment tensor inversions using synthetic data to test the effects of different factors on moment tensor inversion stability and source parameters accuracy, including the network spatial coverage, the number of sensors and the signal-to-noise ratio. The influence of the accuracy of the input source parameters on the inversion results is also tested. Those tests show that an accurate selection of the inversion parameters allows resolving the moment tensor also in presence of realistic seismic noise conditions. Finally, the moment tensor inversion methodology is applied to 8 events chosen from mining block #33/34 at Kiruna mine. Source parameters including scalar moment, magnitude, double couple, compensated linear vector dipole and isotropic contributions as well as the strike, dip, rake configurations of the double couple term were obtained. The orientations

  19. Moment tensor inversion with three-dimensional sensor configuration of mining induced seismicity (Kiruna mine, Sweden)

    Science.gov (United States)

    Ma, Ju; Dineva, Savka; Cesca, Simone; Heimann, Sebastian

    2018-06-01

    Mining induced seismicity is an undesired consequence of mining operations, which poses significant hazard to miners and infrastructures and requires an accurate analysis of the rupture process. Seismic moment tensors of mining-induced events help to understand the nature of mining-induced seismicity by providing information about the relationship between the mining, stress redistribution and instabilities in the rock mass. In this work, we adapt and test a waveform-based inversion method on high frequency data recorded by a dense underground seismic system in one of the largest underground mines in the world (Kiruna mine, Sweden). A stable algorithm for moment tensor inversion for comparatively small mining induced earthquakes, resolving both the double-couple and full moment tensor with high frequency data, is very challenging. Moreover, the application to underground mining system requires accounting for the 3-D geometry of the monitoring system. We construct a Green's function database using a homogeneous velocity model, but assuming a 3-D distribution of potential sources and receivers. We first perform a set of moment tensor inversions using synthetic data to test the effects of different factors on moment tensor inversion stability and source parameters accuracy, including the network spatial coverage, the number of sensors and the signal-to-noise ratio. The influence of the accuracy of the input source parameters on the inversion results is also tested. Those tests show that an accurate selection of the inversion parameters allows resolving the moment tensor also in the presence of realistic seismic noise conditions. Finally, the moment tensor inversion methodology is applied to eight events chosen from mining block #33/34 at Kiruna mine. Source parameters including scalar moment, magnitude, double-couple, compensated linear vector dipole and isotropic contributions as well as the strike, dip and rake configurations of the double-couple term were obtained

  20. Frequent Symptom Sets Identification from Uncertain Medical Data in Differentially Private Way

    Directory of Open Access Journals (Sweden)

    Zhe Ding

    2017-01-01

    Full Text Available Data mining techniques are applied to identify hidden patterns in large amounts of patient data. These patterns can assist physicians in making more accurate diagnosis. For different physical conditions of patients, the same physiological index corresponds to a different symptom association probability for each patient. Data mining technologies based on certain data cannot be directly applied to these patients’ data. Patient data are sensitive data. An adversary with sufficient background information can make use of the patterns mined from uncertain medical data to obtain the sensitive information of patients. In this paper, a new algorithm is presented to determine the top K most frequent itemsets from uncertain medical data and to protect data privacy. Based on traditional algorithms for mining frequent itemsets from uncertain data, our algorithm applies sparse vector algorithm and the Laplace mechanism to ensure differential privacy for the top K most frequent itemsets for uncertain medical data and the expected supports of these frequent itemsets. We prove that our algorithm can guarantee differential privacy in theory. Moreover, we carry out experiments with four real-world scenario datasets and two synthetic datasets. The experimental results demonstrate the performance of our algorithm.

  1. Applying data mining techniques to improve diagnosis in neonatal jaundice

    Directory of Open Access Journals (Sweden)

    Ferreira Duarte

    2012-12-01

    Full Text Available Abstract Background Hyperbilirubinemia is emerging as an increasingly common problem in newborns due to a decreasing hospital length of stay after birth. Jaundice is the most common disease of the newborn and although being benign in most cases it can lead to severe neurological consequences if poorly evaluated. In different areas of medicine, data mining has contributed to improve the results obtained with other methodologies. Hence, the aim of this study was to improve the diagnosis of neonatal jaundice with the application of data mining techniques. Methods This study followed the different phases of the Cross Industry Standard Process for Data Mining model as its methodology. This observational study was performed at the Obstetrics Department of a central hospital (Centro Hospitalar Tâmega e Sousa – EPE, from February to March of 2011. A total of 227 healthy newborn infants with 35 or more weeks of gestation were enrolled in the study. Over 70 variables were collected and analyzed. Also, transcutaneous bilirubin levels were measured from birth to hospital discharge with maximum time intervals of 8 hours between measurements, using a noninvasive bilirubinometer. Different attribute subsets were used to train and test classification models using algorithms included in Weka data mining software, such as decision trees (J48 and neural networks (multilayer perceptron. The accuracy results were compared with the traditional methods for prediction of hyperbilirubinemia. Results The application of different classification algorithms to the collected data allowed predicting subsequent hyperbilirubinemia with high accuracy. In particular, at 24 hours of life of newborns, the accuracy for the prediction of hyperbilirubinemia was 89%. The best results were obtained using the following algorithms: naive Bayes, multilayer perceptron and simple logistic. Conclusions The findings of our study sustain that, new approaches, such as data mining, may support

  2. Imitating manual curation of text-mined facts in biomedicine.

    Directory of Open Access Journals (Sweden)

    Raul Rodriguez-Esteban

    2006-09-01

    Full Text Available Text-mining algorithms make mistakes in extracting facts from natural-language texts. In biomedical applications, which rely on use of text-mined data, it is critical to assess the quality (the probability that the message is correctly extracted of individual facts--to resolve data conflicts and inconsistencies. Using a large set of almost 100,000 manually produced evaluations (most facts were independently reviewed more than once, producing independent evaluations, we implemented and tested a collection of algorithms that mimic human evaluation of facts provided by an automated information-extraction system. The performance of our best automated classifiers closely approached that of our human evaluators (ROC score close to 0.95. Our hypothesis is that, were we to use a larger number of human experts to evaluate any given sentence, we could implement an artificial-intelligence curator that would perform the classification job at least as accurately as an average individual human evaluator. We illustrated our analysis by visualizing the predicted accuracy of the text-mined relations involving the term cocaine.

  3. Average correlation clustering algorithm (ACCA) for grouping of co-regulated genes with similar pattern of variation in their expression values.

    Science.gov (United States)

    Bhattacharya, Anindya; De, Rajat K

    2010-08-01

    Distance based clustering algorithms can group genes that show similar expression values under multiple experimental conditions. They are unable to identify a group of genes that have similar pattern of variation in their expression values. Previously we developed an algorithm called divisive correlation clustering algorithm (DCCA) to tackle this situation, which is based on the concept of correlation clustering. But this algorithm may also fail for certain cases. In order to overcome these situations, we propose a new clustering algorithm, called average correlation clustering algorithm (ACCA), which is able to produce better clustering solution than that produced by some others. ACCA is able to find groups of genes having more common transcription factors and similar pattern of variation in their expression values. Moreover, ACCA is more efficient than DCCA with respect to the time of execution. Like DCCA, we use the concept of correlation clustering concept introduced by Bansal et al. ACCA uses the correlation matrix in such a way that all genes in a cluster have the highest average correlation values with the genes in that cluster. We have applied ACCA and some well-known conventional methods including DCCA to two artificial and nine gene expression datasets, and compared the performance of the algorithms. The clustering results of ACCA are found to be more significantly relevant to the biological annotations than those of the other methods. Analysis of the results show the superiority of ACCA over some others in determining a group of genes having more common transcription factors and with similar pattern of variation in their expression profiles. Availability of the software: The software has been developed using C and Visual Basic languages, and can be executed on the Microsoft Windows platforms. The software may be downloaded as a zip file from http://www.isical.ac.in/~rajat. Then it needs to be installed. Two word files (included in the zip file) need to

  4. Development of energy-saving technologies providing comfortable microclimate conditions for mining

    Directory of Open Access Journals (Sweden)

    Б. П. Казаков

    2017-03-01

    Full Text Available The paper contains analysis of natural and technogenic factors influencing properties of mine atmosphere, defining level of mining safety and probability of emergencies. Main trends in development of energy-saving technologies providing comfortable microclimate conditions are highlighted. A complex of methods and mathematical models has been developed to carry out aerologic and thermophysical calculations. Main ways of improvement for existing calculation methods of stationary and non-stationary air distribution have been defined: use of ejection draught sources to organize recirculation ventilation; accounting of depression losses at working intersections; inertance impact of  air streams and mined-out spaces for modeling transitory emergency scenarios. Based on the calculation algorithm of airflow rate distribution in the mine network, processing method has been developed for the results of air-depressive surveys under conditions of data shortage. Processes of dust transfer have been modeled in view of its coagulation and settlement, as well as interaction with water drops in case of wet dust prevention. A method to calculate intensity of water evaporation and condensation has been suggested, which allows to forecast time, duration and quantity of precipitation and its migration inside the mine during winter season. Solving the problem of heat exchange between mine airflow and timbering of the ventilation shaft in a conjugation formulation permits to estimate depression value of natural draught and conditions of convective balance between air streams. Normalization of microclimatic parameters for mine atmosphere is forecasted for the use of heat-exchange units either heating or cooling and dehumidifying ventilation air. Algorithms are presented that permit to minimize ventilation energy demands at the stages of mine design and exploitation.

  5. PROGRAMS WITH DATA MINING CAPABILITIES

    Directory of Open Access Journals (Sweden)

    Ciobanu Dumitru

    2012-03-01

    Full Text Available The fact that the Internet has become a commodity in the world has created a framework for anew economy. Traditional businesses migrate to this new environment that offers many features and options atrelatively low prices. However competitiveness is fierce and successful Internet business is tied to rigorous use of allavailable information. The information is often hidden in data and for their retrieval is necessary to use softwarecapable of applying data mining algorithms and techniques. In this paper we want to review some of the programswith data mining capabilities currently available in this area.We also propose some classifications of this softwareto assist those who wish to use such software.

  6. Data Mining Methods for Recommender Systems

    Science.gov (United States)

    Amatriain, Xavier; Jaimes*, Alejandro; Oliver, Nuria; Pujol, Josep M.

    In this chapter, we give an overview of the main Data Mining techniques used in the context of Recommender Systems. We first describe common preprocessing methods such as sampling or dimensionality reduction. Next, we review the most important classification techniques, including Bayesian Networks and Support Vector Machines. We describe the k-means clustering algorithm and discuss several alternatives. We also present association rules and related algorithms for an efficient training process. In addition to introducing these techniques, we survey their uses in Recommender Systems and present cases where they have been successfully applied.

  7. Text mining in cancer gene and pathway prioritization.

    Science.gov (United States)

    Luo, Yuan; Riedlinger, Gregory; Szolovits, Peter

    2014-01-01

    Prioritization of cancer implicated genes has received growing attention as an effective way to reduce wet lab cost by computational analysis that ranks candidate genes according to the likelihood that experimental verifications will succeed. A multitude of gene prioritization tools have been developed, each integrating different data sources covering gene sequences, differential expressions, function annotations, gene regulations, protein domains, protein interactions, and pathways. This review places existing gene prioritization tools against the backdrop of an integrative Omic hierarchy view toward cancer and focuses on the analysis of their text mining components. We explain the relatively slow progress of text mining in gene prioritization, identify several challenges to current text mining methods, and highlight a few directions where more effective text mining algorithms may improve the overall prioritization task and where prioritizing the pathways may be more desirable than prioritizing only genes.

  8. Privacy Preservation in Distributed Subgradient Optimization Algorithms

    OpenAIRE

    Lou, Youcheng; Yu, Lean; Wang, Shouyang

    2015-01-01

    Privacy preservation is becoming an increasingly important issue in data mining and machine learning. In this paper, we consider the privacy preserving features of distributed subgradient optimization algorithms. We first show that a well-known distributed subgradient synchronous optimization algorithm, in which all agents make their optimization updates simultaneously at all times, is not privacy preserving in the sense that the malicious agent can learn other agents' subgradients asymptotic...

  9. Spectral signature verification using statistical analysis and text mining

    Science.gov (United States)

    DeCoster, Mallory E.; Firpi, Alexe H.; Jacobs, Samantha K.; Cone, Shelli R.; Tzeng, Nigel H.; Rodriguez, Benjamin M.

    2016-05-01

    In the spectral science community, numerous spectral signatures are stored in databases representative of many sample materials collected from a variety of spectrometers and spectroscopists. Due to the variety and variability of the spectra that comprise many spectral databases, it is necessary to establish a metric for validating the quality of spectral signatures. This has been an area of great discussion and debate in the spectral science community. This paper discusses a method that independently validates two different aspects of a spectral signature to arrive at a final qualitative assessment; the textual meta-data and numerical spectral data. Results associated with the spectral data stored in the Signature Database1 (SigDB) are proposed. The numerical data comprising a sample material's spectrum is validated based on statistical properties derived from an ideal population set. The quality of the test spectrum is ranked based on a spectral angle mapper (SAM) comparison to the mean spectrum derived from the population set. Additionally, the contextual data of a test spectrum is qualitatively analyzed using lexical analysis text mining. This technique analyzes to understand the syntax of the meta-data to provide local learning patterns and trends within the spectral data, indicative of the test spectrum's quality. Text mining applications have successfully been implemented for security2 (text encryption/decryption), biomedical3 , and marketing4 applications. The text mining lexical analysis algorithm is trained on the meta-data patterns of a subset of high and low quality spectra, in order to have a model to apply to the entire SigDB data set. The statistical and textual methods combine to assess the quality of a test spectrum existing in a database without the need of an expert user. This method has been compared to other validation methods accepted by the spectral science community, and has provided promising results when a baseline spectral signature is

  10. Trace element patterns in lichens following uranium mine closures

    International Nuclear Information System (INIS)

    Fahselt, D.; Wu, T.W.; Mott, B.

    1995-01-01

    Instrumental neutron activation analysis was used to determine trace elements in Cladina mitis (Sandst). Hale ampersand Culb. along transects extending from uranium mines at Elliot Lake and Agnew Lake in central Ontario, Canada. Levels of 11 elements were reported and the presence of uranium (U) was confirmed, although U concentrations were much less than in Cladina rangiferina 10 years earlier. Among the elements identified in lichen thalli was Th, which occurred in higher concentrations than U. All trace elements, including the two radionuclides, were found in deteriorating thallus parts as well as living podetia, and five of these seem to have originated as airborne particulates from minesites. In spite of mine closures, levels of Th and U remained higher near sources of ore dust and there was little relationship between radionuclide concentrations in thallus and substrate. 24 refs., 4 figs., 3 tabs

  11. Pattern Extraction Algorithm for NetFlow-Based Botnet Activities Detection

    Directory of Open Access Journals (Sweden)

    Rafał Kozik

    2017-01-01

    Full Text Available As computer and network technologies evolve, the complexity of cybersecurity has dramatically increased. Advanced cyber threats have led to current approaches to cyber-attack detection becoming ineffective. Many currently used computer systems and applications have never been deeply tested from a cybersecurity point of view and are an easy target for cyber criminals. The paradigm of security by design is still more of a wish than a reality, especially in the context of constantly evolving systems. On the other hand, protection technologies have also improved. Recently, Big Data technologies have given network administrators a wide spectrum of tools to combat cyber threats. In this paper, we present an innovative system for network traffic analysis and anomalies detection to utilise these tools. The systems architecture is based on a Big Data processing framework, data mining, and innovative machine learning techniques. So far, the proposed system implements pattern extraction strategies that leverage batch processing methods. As a use case we consider the problem of botnet detection by means of data in the form of NetFlows. Results are promising and show that the proposed system can be a useful tool to improve cybersecurity.

  12. Assessing semantic similarity of texts - Methods and algorithms

    Science.gov (United States)

    Rozeva, Anna; Zerkova, Silvia

    2017-12-01

    Assessing the semantic similarity of texts is an important part of different text-related applications like educational systems, information retrieval, text summarization, etc. This task is performed by sophisticated analysis, which implements text-mining techniques. Text mining involves several pre-processing steps, which provide for obtaining structured representative model of the documents in a corpus by means of extracting and selecting the features, characterizing their content. Generally the model is vector-based and enables further analysis with knowledge discovery approaches. Algorithms and measures are used for assessing texts at syntactical and semantic level. An important text-mining method and similarity measure is latent semantic analysis (LSA). It provides for reducing the dimensionality of the document vector space and better capturing the text semantics. The mathematical background of LSA for deriving the meaning of the words in a given text by exploring their co-occurrence is examined. The algorithm for obtaining the vector representation of words and their corresponding latent concepts in a reduced multidimensional space as well as similarity calculation are presented.

  13. Mining Long, Sharable Patterns in Trajectories of Moving Objects

    DEFF Research Database (Denmark)

    Gidofalvi, Gyozo; Pedersen, Torben Bach

    2009-01-01

    The efficient analysis of spatio-temporal data, generated by moving objects, is an essential requirement for intelligent location-based services. Spatio-temporal rules can be found by constructing spatio-temporal baskets, from which traditional association rule mining methods can discover spatio...

  14. Automatic boiling water reactor loading pattern design using ant colony optimization algorithm

    Energy Technology Data Exchange (ETDEWEB)

    Wang, C.-D. [Department of Engineering and System Science, National Tsing Hua University, 101, Section 2 Kuang Fu Road, Hsinchu 30013, Taiwan (China); Nuclear Engineering Division, Institute of Nuclear Energy Research, No. 1000, Wenhua Rd., Jiaan Village, Longtan Township, Taoyuan County 32546, Taiwan (China)], E-mail: jdwang@iner.gov.tw; Lin Chaung [Department of Engineering and System Science, National Tsing Hua University, 101, Section 2 Kuang Fu Road, Hsinchu 30013, Taiwan (China)

    2009-08-15

    An automatic boiling water reactor (BWR) loading pattern (LP) design methodology was developed using the rank-based ant system (RAS), which is a variant of the ant colony optimization (ACO) algorithm. To reduce design complexity, only the fuel assemblies (FAs) of one eight-core positions were determined using the RAS algorithm, and then the corresponding FAs were loaded into the other parts of the core. Heuristic information was adopted to exclude the selection of the inappropriate FAs which will reduce search space, and thus, the computation time. When the LP was determined, Haling cycle length, beginning of cycle (BOC) shutdown margin (SDM), and Haling end of cycle (EOC) maximum fraction of limit for critical power ratio (MFLCPR) were calculated using SIMULATE-3 code, which were used to evaluate the LP for updating pheromone of RAS. The developed design methodology was demonstrated using FAs of a reference cycle of the BWR6 nuclear power plant. The results show that, the designed LP can be obtained within reasonable computation time, and has a longer cycle length than that of the original design.

  15. GPU-Accelerated Text Mining

    International Nuclear Information System (INIS)

    Cui, X.; Mueller, F.; Zhang, Y.; Potok, Thomas E.

    2009-01-01

    Accelerating hardware devices represent a novel promise for improving the performance for many problem domains but it is not clear for which domains what accelerators are suitable. While there is no room in general-purpose processor design to significantly increase the processor frequency, developers are instead resorting to multi-core chips duplicating conventional computing capabilities on a single die. Yet, accelerators offer more radical designs with a much higher level of parallelism and novel programming environments. This present work assesses the viability of text mining on CUDA. Text mining is one of the key concepts that has become prominent as an effective means to index the Internet, but its applications range beyond this scope and extend to providing document similarity metrics, the subject of this work. We have developed and optimized text search algorithms for GPUs to exploit their potential for massive data processing. We discuss the algorithmic challenges of parallelization for text search problems on GPUs and demonstrate the potential of these devices in experiments by reporting significant speedups. Our study may be one of the first to assess more complex text search problems for suitability for GPU devices, and it may also be one of the first to exploit and report on atomic instruction usage that have recently become available in NVIDIA devices

  16. Radon exposure in abandoned metalliferous mines of South America

    International Nuclear Information System (INIS)

    Silva, A.A.R. da; Umisedo, N.; Yoshimura, E.M.; Anjos, R.M.; Valladares, D.L.; Velasco, H.; Rizzotto, M.

    2011-01-01

    Since the days of the Spanish and Portuguese conquerors, South America has been closely associated with the metalliferous ore mining. Gold, silver, tin, lead, tungsten, nickel, copper, and palladium ores have been explored over the last centuries. In addition, there has also been the development and promotion of other economic activities related to mining, as the underground mine tourism. A few works have been published on radon levels in the South American mining. In this study, we investigated the radon transport process and its health hazard in two exhausted and abandoned mines in San Luis Province, Argentina. These mines were chosen because they have different physical configurations in their cavities, features which can affect the air flow patterns and radon concentrations. La Carolina gold mine (32 deg 48' 0'' S, 66 deg 60' 0'' W) is currently a blind end system, corresponding to a horizontal excavation into the side of a mountain, with only a main adit. Los Condores wolfram mine (32 deg 33' 25'' S, 65 deg 15' 20'' W) is also a horizontal excavation into the side of a mountain, but has a vertical output (a shaft) at the end of the main gallery. Three different experimental methodologies were used. Radon concentration measurements were performed by CR-39 nuclear track detectors. The distribution of natural radionuclide activities ( 40 K, 232 Th and 238 U) was determined from rock samples collected along their main adits, using in laboratory gamma-ray spectrometry. The external gamma dose rate was evaluated using thermoluminescent dosimeters and a portable survey meter. The values for the 222 Rn concentration ranged from 0.43 ± 0.04 to 1.48 ± 0.12 kBq/m 3 in the Los Condores wolfram mine and from 1.8 ± 0.1 to 6.0±0.5 kBq/m 3 in the La Carolina gold mine, indicating that, in this mine, the radon levels exceed up to four times the action level of 1.5 kBq/m 3 recommended by the ICRP. The patterns of the radon transport process revealed that the La Carolina

  17. Community Mining Method of Label Propagation Based on Dense Pairs

    Directory of Open Access Journals (Sweden)

    WENG Wei

    2014-03-01

    Full Text Available In recent years, with the popularity of handheld Internet equipments like mobile phones, increasing numbers of people are becoming involved in the virtual social network. Because of its large amount of data and complex structure, the network faces new challenges of community mining. A label propagation algorithm with low time complexity and without prior parameters deals easily with a large networks. This study explored a new method of community mining, based on label propagation with two stages. The first stage involved identifying closely linked nodes according to their local adjacency relations that gave rise to a micro-community. The second stage involved expanding and adjusting this community through a label propagation algorithm (LPA to finally obtain the community structure of the entire social network. This algorithm reduced the number of initial labels and avoided the merging of small communities in general LPAs. Thus, the quality of community discovery was improved, and the linear time complexity of the LPA was maintained.

  18. A Dynamic Fuzzy Cluster Algorithm for Time Series

    Directory of Open Access Journals (Sweden)

    Min Ji

    2013-01-01

    clustering time series by introducing the definition of key point and improving FCM algorithm. The proposed algorithm works by determining those time series whose class labels are vague and further partitions them into different clusters over time. The main advantage of this approach compared with other existing algorithms is that the property of some time series belonging to different clusters over time can be partially revealed. Results from simulation-based experiments on geographical data demonstrate the excellent performance and the desired results have been obtained. The proposed algorithm can be applied to solve other clustering problems in data mining.

  19. Deep image mining for diabetic retinopathy screening.

    Science.gov (United States)

    Quellec, Gwenolé; Charrière, Katia; Boudi, Yassine; Cochener, Béatrice; Lamard, Mathieu

    2017-07-01

    Deep learning is quickly becoming the leading methodology for medical image analysis. Given a large medical archive, where each image is associated with a diagnosis, efficient pathology detectors or classifiers can be trained with virtually no expert knowledge about the target pathologies. However, deep learning algorithms, including the popular ConvNets, are black boxes: little is known about the local patterns analyzed by ConvNets to make a decision at the image level. A solution is proposed in this paper to create heatmaps showing which pixels in images play a role in the image-level predictions. In other words, a ConvNet trained for image-level classification can be used to detect lesions as well. A generalization of the backpropagation method is proposed in order to train ConvNets that produce high-quality heatmaps. The proposed solution is applied to diabetic retinopathy (DR) screening in a dataset of almost 90,000 fundus photographs from the 2015 Kaggle Diabetic Retinopathy competition and a private dataset of almost 110,000 photographs (e-ophtha). For the task of detecting referable DR, very good detection performance was achieved: A z =0.954 in Kaggle's dataset and A z =0.949 in e-ophtha. Performance was also evaluated at the image level and at the lesion level in the DiaretDB1 dataset, where four types of lesions are manually segmented: microaneurysms, hemorrhages, exudates and cotton-wool spots. For the task of detecting images containing these four lesion types, the proposed detector, which was trained to detect referable DR, outperforms recent algorithms trained to detect those lesions specifically, with pixel-level supervision. At the lesion level, the proposed detector outperforms heatmap generation algorithms for ConvNets. This detector is part of the Messidor® system for mobile eye pathology screening. Because it does not rely on expert knowledge or manual segmentation for detecting relevant patterns, the proposed solution is a promising image

  20. On-Board Mining in the Sensor Web

    Science.gov (United States)

    Tanner, S.; Conover, H.; Graves, S.; Ramachandran, R.; Rushing, J.

    2004-12-01

    On-board data mining can contribute to many research and engineering applications, including natural hazard detection and prediction, intelligent sensor control, and the generation of customized data products for direct distribution to users. The ability to mine sensor data in real time can also be a critical component of autonomous operations, supporting deep space missions, unmanned aerial and ground-based vehicles (UAVs, UGVs), and a wide range of sensor meshes, webs and grids. On-board processing is expected to play a significant role in the next generation of NASA, Homeland Security, Department of Defense and civilian programs, providing for greater flexibility and versatility in measurements of physical systems. In addition, the use of UAV and UGV systems is increasing in military, emergency response and industrial applications. As research into the autonomy of these vehicles progresses, especially in fleet or web configurations, the applicability of on-board data mining is expected to increase significantly. Data mining in real time on board sensor platforms presents unique challenges. Most notably, the data to be mined is a continuous stream, rather than a fixed store such as a database. This means that the data mining algorithms must be modified to make only a single pass through the data. In addition, the on-board environment requires real time processing with limited computing resources, thus the algorithms must use fixed and relatively small amounts of processing time and memory. The University of Alabama in Huntsville is developing an innovative processing framework for the on-board data and information environment. The Environment for On-Board Processing (EVE) and the Adaptive On-board Data Processing (AODP) projects serve as proofs-of-concept of advanced information systems for remote sensing platforms. The EVE real-time processing infrastructure will upload, schedule and control the execution of processing plans on board remote sensors. These plans